Quality Assessment Methods to Evaluate the Performance of Edge Detection Algorithms for Digital Image: A Systematic Literature Review
NAZISH TARIQ 1, ROSTAM AFFENDI HAMZAH 2, THEAM FOO NG 3, SHIR LI WANG 4, and HAIDI IBRAHIM 1 (Senior Member, IEEE)
1 School of Electrical & Electronic Engineering, Engineering Campus, Universiti Sains Malaysia, 14300 Nibong Tebal, Pulau Pinang, Malaysia.
2 Fakulti Teknologi Kejuruteraan Elektrik & Elektronik, Universiti Teknikal Malaysia Melaka, 76100 Durian Tunggal, Melaka, Malaysia.
3 Centre of Global Sustainability Studies (CGSS), Level 5, Hamzah Sendut Library, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia.
4 Faculty of Art, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, 35900 Tanjung Malim, Perak, Malaysia.
Corresponding author: Haidi Ibrahim (e-mail: haidi_ibrahim@ieee.org).
This research is supported by the Ministry of Higher Education (MoHE) Malaysia, under the Fundamental Research Grant Scheme (FRGS) with grant number 203/PELECT/6071421.

ABSTRACT A segmentation process is usually required in order to analyze an image. One of the
available segmentation approaches is by detecting the edges on the image. Up to now, there are many
edge detection algorithms that researchers have proposed. Thus, the purpose of this systematic literature
review is to investigate the available quality assessment methods that researchers have utilized to evaluate
the performance of the edge detection algorithms. Due to the vast amount of literature available in this
area, we limited our search to open-access publications only. A systematic search of five publisher websites
(i.e., IEEExplore, IET digital library, Wiley, MDPI, and Hindawi) and the Scopus database was carried out
to gather resources that are related to the edge detection algorithms. Seventy-three publications that are
about developing or comparing edge detection algorithms have been chosen. From these publication
samples, we have identified 17 quality assessment methods used by researchers. Among the popular quality
assessment methods are visual inspection, processing time, confusion-matrix based measures, mean square
error (MSE)-based measures, and figure of merit (FOM). This survey also indicates that although most of
the researchers only use a small number of test images (i.e., less than 10 test images), there are available
datasets with a larger number of images for digital image segmentation that researchers can utilize.

INDEX TERMS Digital image processing, edge detection algorithm, image segmentation, assessment,
validation, quality measures, reviews.

I. INTRODUCTION
In some computer vision applications, image segmentation is required to identify objects and analyze images automatically, or to help humans find the region of interest [1]–[3]. Edge detection is one of the significant branches available for image segmentation. The applications of edge detection are not limited; many applications can benefit from edge detection. One can use edge detection to detect cracks on a surface captured in a photograph [4]–[6]. Edge detection is also commonly used to assess the shape of an object. For example, edge detection is used to find an object's perimeter, which is useful for finding other shape characteristics, such as the centroid and circularity [7]. Edge detection can also be used for art purposes. A natural picture can be transformed into a cartoon-type picture with the help of edge detection [8].

Many researchers have come out with various algorithms for edge detection. Edges can be identified by recognizing the locations where there are drastic changes in intensity levels. This identification can be made by inspecting the gradient or derivative of the image. Thus, many algorithms are based on the first derivative or second derivative of the image. Popular edge detection algorithms, such as the Sobel, Prewitt, Roberts, and Canny edge detectors, use this concept [9], [10]. In addition to the spatial domain approaches, some algorithms


are developed in the transformed domains, such as wavelet and curvelet [11], [12]. Recently, machine learning and deep learning approaches have also been used to develop edge detection algorithms [13]–[15].

The edge detection output is the edge or contour constructed by the edgels (edgel is the term used to represent an edge element). When developing an edge detection algorithm, researchers need to evaluate their method's performance in terms of the edge or contour generated.

Up to now, various quality assessment methods have been used by researchers to evaluate the edges generated by their approaches. Therefore, this observation motivates us to conduct a systematic literature review to see how previous studies related to the development or comparison of edge detection algorithms evaluated their works. This review aims to see whether there is already a standard procedure to evaluate edge detection algorithms or not. Four main aspects will be investigated in this work. They are: (1) the assessment method used by researchers, (2) the number of test images used, (3) how their ground truth is obtained, and (4) the number of benchmark methods utilized.

This review hopefully will bring benefits to several parties. First, for researchers developing a new edge detection algorithm, this review can become a guideline for selecting suitable evaluation approaches. This review can give them a basic idea to decide on their test images and the number of benchmark methods. On the other hand, this review can hopefully help researchers develop new quality assessment measures to evaluate edges. Based on the findings of this systematic literature review, we will also provide some suggestions for future works.

This paper is divided into five sections. Section I, which is this section, introduces this literature review. Then, Section II explains our methodology in executing this literature review. Next, Section III presents our results, followed by Section IV, which discusses the results. The final section, Section V, is for our conclusion.

II. METHODOLOGY
In this work, the systematic literature review was executed based on the guidelines provided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [16], [17]. As there are many works related to edge detection, we scoped our search to open-access articles only. The literature search was carried out on five publisher websites (i.e., IEEExplore, IET digital library, Wiley, MDPI, and Hindawi) and the Scopus database.

A. SEARCH STRATEGY
The search was performed in February 2021 by using the keyword "edge detection algorithm". The related flowchart is shown in Figure 1.

FIGURE 1. Flowchart of the publication selection process.

The search returned 1,149 results from the IEEE website (https://ieeexplore.ieee.org). Therefore, we limited the search to open-access publications only, which returned only 17 results.

For the IET (https://digital-library.theiet.org/), 144 articles were given as the search results. Then, we limited our search to only open-access publications. This search returned only eight articles.

Wiley's website (https://onlinelibrary.wiley.com/) returned 921 publications. The results were reduced to 44 publications when we chose only open-access publications.

MDPI is an open-access publisher. We executed the search on www.mdpi.com. This website presented us with

nine publications as a result.

We also did the search at https://www.hindawi.com/. Hindawi is another well-known open-access publisher. This website suggested 10,000 titles of publications.

Scopus (https://www.scopus.com/) gave us 3,197 publications as the initial result of the search. Therefore, we limited the search to only open-access publications (i.e., "All Open Access") and limited the subject areas to "Computer Science", "Engineering", and "Mathematics". With these settings, we obtained 290 articles.

B. STUDY SELECTION
Due to the vast number of suggested publications that we obtained from our searches, as presented in Subsection II-A, we first screened the publications' titles. The publications were included as candidates for our samples if their title met the following criteria:
• The title shows that the work is related to the digital image processing field.
• The title reflects that the publication is about the development, usage, or evaluation of an edge detection algorithm.
For Hindawi's website, most of the suggested publications are not related to this literature review. Therefore, we arranged the results based on relevancy. Then, we inspected only a few search pages of this website, which are really related to our work. Based on this approach, we kept five publications from IEEE, four publications from IET, five publications from Wiley, five publications from MDPI, 20 publications from Hindawi, and 90 publications from Scopus (please refer to Figure 1).

After this first screening, the publications from these six sources were combined, and duplicates were removed. Then, based on the full-text publications, the publications were selected for this systematic literature review if they fulfilled the following criteria:
• The text in the publication is written in English.
• The publication is on digital image processing.
• The publication is about the development of a new edge detection algorithm, an improvement to an edge detection algorithm, the evaluation of edge detection algorithms, or the usage of an edge detection algorithm.
• The evaluation used in the publication should not be too specific towards certain applications.
The full-text publications were excluded from this systematic literature review if any of the above conditions was not met. The process of selecting the full-text publications for inclusion in this work was distributed independently among the authors of this paper.

C. DATA EXTRACTION AND ANALYSIS APPROACH
For this systematic literature review, data extraction was concentrated on the following four aspects: (i) quality assessment method, (ii) test images, (iii) ground truth, and (iv) the number of benchmark methods. Data extracted from each publication were: (a) authors, (b) year of publication, (c) quality assessment method(s) used, (d) the number of test images used, (e) information about the ground truth, and (f) the number of benchmark methods used.

III. RESULTS
This section is divided into five subsections. Subsection III-A gives the search results for the publication selection process. Then, Subsection III-B presents the quality assessment methods that are used by our samples. Next, information about the test images used by the researchers is provided in Subsection III-C. After that, we explain our findings related to the ground truth in Subsection III-D. Finally, Subsection III-E presents the gathered information about the number of benchmark methods.

A. SEARCH RESULTS
After the removal of duplicates, we identified a total of 107 publications (i.e., n = 107) from six sources (i.e., IEEE, IET, Wiley, MDPI, Hindawi, and Scopus) as the candidates for our literature review. Following the full-text publication screening (i.e., to assess the publications' eligibility), 73 publications (i.e., n = 73) met the inclusion criteria. This publication selection process is shown in Figure 1.

B. QUALITY ASSESSMENT METHODS
There are 17 quality assessment methods that have been used in these 73 publications. Each of these methods is explained in the following subsections.

1) Visual inspection
Visual inspection is the most used quality assessment method found in these 73 publication samples. All of these publication samples use visual inspection to discuss the performance of their edge detection algorithm. Output images are shown for the evaluation. This quality assessment method is not an objective assessment method. The ground truth is not required for this assessment. The outputs are judged subjectively by humans, based on their preference. However, there are variations in how the authors presented their outputs. Some of the examples are shown in Figure 2.

Lasserre et al. [18] have used five images to evaluate their edge detection algorithm's performance. Two images were used to evaluate the directional gradient magnitudes (i.e., the approach shown in Figure 2(b)). These gradient magnitude images are used to show that the proposed method has successfully removed strong gradient values that are not related to the important edges, according to this work. The other three images were used to evaluate the appearance of the areas that are enclosed by the generated contours (i.e., the approach shown in Figure 2(d)). These images have also been used to discuss how close the generated contours are to the contour expected by humans.

Sun et al. [19] have used six hyperspectral images, consisting of artificial images and real natural scene images, for their evaluation. The artificial images were generated using

a freeware called HYper-spectral data viewer for Development of Research Applications (HYDRA) [20]. They have commented on the edges generated by the algorithms (i.e., the approach shown in Figure 2(c)). Poor methods usually generate many small, spurious, or grainy edges.

FIGURE 2. Examples of visual inspection. (a) Input image. (b) Gradient magnitude. (c) Detected edges. (d) Detected edges overlaid with the input image.

Chen and Chiu [21] compared the edges generated by their method with four other methods. Yahya et al. [22] evaluated the edges' appearance from their method and compared them with five other methods.

Researchers have also compared the images of edges obtained from different image resolutions. An example is a work on ore image edge detection by Wang and Zhang [23]. They have evaluated the edges that are obtained by the binarization process. Three input images with different resolutions have been used for this purpose.

To evaluate the segmentation results of their work, Xu, Xu, and Zuo [24] have utilized two images for this purpose. The images used show a wind turbine blade. They evaluate the performance in terms of the edges detected and also the segmented region.

2) Percentage of acceptable segmentation contour
In the work by Lasserre et al. [18], they use the percentage of acceptable segmentation contour as one of their evaluations. This measure can be considered as an extension of the visual inspection assessments. No ground truth is needed for this measure. This percentage is defined as the percentage of output images that have generated contours similar to those expected by a human.

3) Number of detected edgels
In the work by Sundani et al. [25], they have measured the number of edgels as one of their evaluations. They assume that a good edge detection algorithm will produce more edges. As they developed an improvement to the Canny algorithm, they have also calculated the increment of edgels with respect to the edges detected by the Canny algorithm. This measure is defined as:

\mathrm{Increment} = \frac{N_q - N_C}{S_i} \times 100\%    (1)

where N_q is the number of edgels detected by their method, N_C is the number of edgels from the Canny algorithm, and S_i is the image's size.

4) Confusion matrix-based measures
In this reference-based evaluation, in addition to the edges detected by the algorithm, the researchers also need to provide the ground truth for each image tested. The ground truth contains the targeted edgels, that is, how the edges or contour should be. By comparing the detected edges with the ground truth, four classes are generated, which are true positive (TP), true negative (TN), false negative (FN), and false positive (FP). This is defined by the confusion matrix shown in Figure 3.

FIGURE 3. A confusion matrix to define TP, TN, FP and FN.

The true positive TP is when the edgel is defined both by the algorithm and by the ground truth. The true negative TN is when the non-edge pixel defined by the ground truth is correctly identified as a non-edge pixel by the algorithm. The false positive FP is when the non-edge pixel defined by the ground truth is wrongly identified as an edgel by the algorithm. The false negative FN is when the actual edgel defined by the ground truth is wrongly identified as a non-edge pixel by the algorithm.

From these four classes, several other quality measures can be defined. Precision is defined as:

\mathrm{Precision} = \frac{\#(\mathrm{TP})}{\#(\mathrm{TP}+\mathrm{FP})}    (2)

where #(.) is the number of elements in a set. This measure is also called the positive predictive value (PPV).

Recall is expressed as:

\mathrm{Recall} = \frac{\#(\mathrm{TP})}{\#(\mathrm{TP}+\mathrm{FN})}    (3)
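As an illustration only (this code is written for this review and is not taken from any of the surveyed publications), the following Python sketch shows how the four confusion-matrix classes, and with them Eqs. (2) and (3), can be computed from a detected binary edge map E and a binary ground truth G of the same size. The array names follow the notation used above.

```python
import numpy as np

def confusion_counts(E, G):
    """Count TP, TN, FP, FN between a detected binary edge map E and a
    binary ground truth G (boolean arrays, True marks an edgel)."""
    E = E.astype(bool)
    G = G.astype(bool)
    tp = np.count_nonzero(E & G)    # edgel in both the output and the ground truth
    tn = np.count_nonzero(~E & ~G)  # non-edge pixel in both
    fp = np.count_nonzero(E & ~G)   # detected edgel that is not in the ground truth
    fn = np.count_nonzero(~E & G)   # ground-truth edgel missed by the detector
    return tp, tn, fp, fn

def precision_recall(E, G, eps=1e-12):
    """Precision (Eq. 2) and Recall (Eq. 3); eps avoids division by zero."""
    tp, tn, fp, fn = confusion_counts(E, G)
    return tp / (tp + fp + eps), tp / (tp + fn + eps)
```

The same four counts feed directly into the remaining confusion-matrix-based measures discussed next.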

Recall is also known as the true positive rate (TPR), sensitivity, or hit rate.

The false positive rate (FPR), or fall-out, is given as:

\mathrm{FPR} = \frac{\#(\mathrm{FP})}{\#(\mathrm{FP}+\mathrm{TN})}    (4)

The true negative rate (TNR) is described as:

\mathrm{TNR} = \frac{\#(\mathrm{TN})}{\#(\mathrm{TN}+\mathrm{FP})}    (5)

The negative predictive value (NPV) is presented by the following equation:

\mathrm{NPV} = \frac{\#(\mathrm{TN})}{\#(\mathrm{TN}+\mathrm{FN})}    (6)

The error rate (Err) is defined as:

\mathrm{Err} = \frac{\#(\mathrm{FP})}{\#(\mathrm{TP}+\mathrm{FN})}    (7)

The accuracy (Acc) is given as:

\mathrm{Acc} = \frac{\#(\mathrm{TP}+\mathrm{TN})}{\#(\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN})}    (8)

The F1-score is defined as:

\mathrm{F1\text{-}score} = \frac{\#(2\,\mathrm{TP})}{\#(2\,\mathrm{TP}+\mathrm{FN}+\mathrm{FP})}    (9)

The researchers in our samples did not use all the formulas listed above but used selected evaluations for their work. For example, Dong et al. [26] used only the true positive rate (i.e., Recall) and the false positive rate (FPR) in their work. Huang, Yu, and Zuo [27] utilized the sensitivity measure (i.e., Recall). Dorafshan, Thomas, and Maguire [6] evaluated the methods by using the positive predictive value (i.e., Precision), true positive rate (i.e., Recall), accuracy, negative predictive value, and F1-score. Luo et al. [28] have used the true positive rate (i.e., Recall), false positive rate (FPR), and accuracy (Acc). They have also added versions of Recall, FPR, and Acc that still accept edgels detected by the algorithms that deviate a few pixels from the ground truth. Bogdan, Bonchis, and Orhei [29] utilized Recall, Precision, and F1-score.

Khunteta and Ghosh [30] and Xu et al. [31] used a measure called P-value (or precision value) for their evaluation purpose. This measure is defined as:

\mathrm{P\text{-}value} = \frac{\#(\mathrm{TP})}{\#(\mathrm{TP}+\mathrm{FP}+\mathrm{FN})}    (10)

Sun et al. [19] have used the F-measure as their quantitative evaluation. Their definition of the F-measure (F_\alpha) is:

F_\alpha = \frac{\mathrm{Precision} \times \mathrm{Recall}}{\alpha \cdot \mathrm{Precision} + (1-\alpha) \cdot \mathrm{Recall}}    (11)

where \alpha \in [0, 1] is the weighting parameter.

Qu et al. [15] and Zheng et al. [32] have also used the F-measure in their work, but their definition for the F-measure is:

\mathrm{F\text{-}measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}    (12)

Related to the F-measure are two other quality measures, which are the optimal dataset scale (ODS) and the optimal image scale (OIS). The ODS is the value of the F-measure obtained from a fixed contour threshold, while the OIS is when the optimal threshold is applied individually to each test image. Qu et al. [15] have used the F-measure, ODS, and OIS in their work. Besides, they have also discussed the plot of precision versus recall (i.e., the PR curve) to compare the performance of their method with other methods.

5) Number of false edges
Al-Jarrah, Al-Jarrah, and Roth [33] compared the number of false edges obtained from several edge detection approaches. A low number of false edges indicates a good edge detection method. Unfortunately, the definition of the false edges is not mentioned in this work, whether it is false-positive edges, false-negative edges, or both.

6) Receiver operating characteristic (ROC)
The receiver operating characteristic (ROC) is obtained by plotting the true positive rate (TPR) versus the false positive rate (FPR). If the plot shows that the curve is nearer to the left-hand border and the top border of the ROC space, then the method is considered to produce accurate results. On the other hand, if the curve is around the 45° diagonal of the ROC space, this shows that the results are less accurate. Zheng et al. [32] used ROC as one of the evaluation criteria in their work.

7) Edge detection error, ε
The measure of edge detection error ε was introduced by Ma and Staunton [34]. It compares the edges detected by the algorithm with the ground truth, and is defined as:

\epsilon = 1 - \frac{\#(G \cap E)}{\#(G)}    (13)

where G is the ground truth, E is the detected edges, and #(.) is the number of elements in a set. A good edge detection algorithm will give a small value of ε. Guo and Sengur [35] have used this measure for their work.

8) Intersection over union (IoU)
The measure of intersection over union (IoU) can be used to measure the correlation between the edges in the ground truth and the edges detected by the algorithm. A high IoU value indicates a high correlation of edges between these two edge images. Zheng et al. [32] have used IoU as one of the performance measures in their work.

Yoon [36] has used IoU to measure the edge similarity. In this work, IoU is called an edge similarity or an overlapping rate. The similarity is defined as:

S(G, E) = \frac{G \cap E}{G \cup E}    (14)

where E is the result from the segmentation, and G is the ground truth.
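The set-based measures of Eqs. (13) and (14) can be computed from the same pair of binary maps. The sketch below is an illustrative implementation written for this review (it is not code from [34] or [36]); it treats the edgel sets as boolean arrays and guards against empty sets.

```python
import numpy as np

def edge_detection_error(G, E):
    """Edge detection error of Eq. (13): the fraction of ground-truth
    edgels that the detector failed to reproduce at the same pixel."""
    G = G.astype(bool)
    E = E.astype(bool)
    return 1.0 - np.count_nonzero(G & E) / max(np.count_nonzero(G), 1)

def edge_iou(G, E):
    """Intersection over union (overlapping rate) of Eq. (14)."""
    G = G.astype(bool)
    E = E.astype(bool)
    return np.count_nonzero(G & E) / max(np.count_nonzero(G | E), 1)
```

Both functions compare edgels pixel by pixel, so thin edges that are shifted by even one pixel count as misses, a limitation that is revisited in the Discussion.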

9) Misclassification rate (MCR)
The misclassification rate (MCR) is defined as:

\mathrm{MCR} = \frac{\sum |B_G \cap B_E| + \sum |F_G \cap F_E|}{\sum (B_G + F_E)} \times 100\%    (15)

where G and E are the ground truth and the detected edges, respectively. Then, F_i and B_i (with i = G or E) stand for the foreground pixels and the background pixels, respectively. A low MCR value indicates a good edge detection result. This measure has been used in the work by Ergen [37].

10) Buffer analysis method
In the work by Zhang et al. [14], one template SAR image has been used in their buffer analysis method. The buffer analysis method works by counting the number of detected edgels within a 3 × 3 buffer around the correct edgel (i.e., the ground truth). There are two types of analysis presented: the percentage B_i (i = 0, 1, 2, 3) and the cumulative percentage S_i (i = 0, 1, 2, 3), where i is the radius of the mask used for the evaluation.

11) Structural Similarity Index (SSIM)
Mittal et al. [38] and Arulpandy and Pricilla [39] have used SSIM to evaluate their work. SSIM can be defined as:

\mathrm{SSIM}(G, E) = [l(G, E), c(G, E), s(G, E)]    (16)

which means that this measure is based on the luminance (l), contrast (c), and structure (s).

12) Edge-based SSIM (ESSIM)
Arulpandy and Pricilla [39] have used the edge-based structural similarity index (ESSIM) to evaluate their method. This measure can be defined as:

\mathrm{ESSIM}(G, E) = [l(G, E), c(G, E), e(G, E)]    (17)

which means that this measure is based on the luminance (l), contrast (c), and edges (e).

13) Continuity evaluation
Zheng et al. [40] have used a measure called a continuity evaluation index in their work. The continuity index ρ is defined as:

\rho = \frac{\ell_1}{\ell_s} \cdot \frac{n_s}{n_1}    (18)

where \ell_1 and \ell_s are the numbers of edge points from the output and the ground truth, respectively, whereas n_1 and n_s are the numbers of edge lines in the output and the ground truth, respectively. A higher value of ρ indicates better performance.

Yang, Xia, and Juan [41] used a measure similar to ρ. However, this measure does not need any ground truth. This measure, denoted as R, is defined as:

R = \frac{\mathrm{TEN}}{\mathrm{CEN}}    (19)

where TEN is the number of detected edgels, while CEN is the number of edge segments. A higher value of R indicates that the segments are longer (i.e., more continuous).

The work by Yu et al. [42] used a similar but more detailed equation. They define A as the number of edgels detected by the algorithms, B as the number of segments detected by 4-connected component analysis, and C as the number of segments detected by 8-connected component analysis. They measured two values, which are C/A (to present the edge continuity) and C/B (to present the edge width). A good edge detection method is expected to give smaller values of these ratios.

14) Mean square error (MSE), root-mean-square error (RMSE), signal-to-noise ratio (SNR), or peak signal-to-noise ratio (PSNR)
Mittal et al. [38] and Yang, Xia, and Juan [41] use the mean square error (MSE) as one of their evaluation methods. If both the resultant edge image E and the ground truth image G are of size M × N pixels, the MSE is defined as:

\mathrm{MSE}(E, G) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \|E_{x,y} - G_{x,y}\|^2    (20)

where x, y are the spatial coordinates. A smaller magnitude of MSE indicates a better performance.

A similar measure to MSE is the root-mean-square error (RMSE), which is defined as:

\mathrm{RMSE}(E, G) = \sqrt{\mathrm{MSE}(E, G)}    (21)

RMSE has been used in the work by Mendhurwar et al. [43].

Another option is to use the peak signal-to-noise ratio (PSNR) to evaluate the performance of edge detection algorithms. PSNR is defined as:

\mathrm{PSNR}(E, G) = 10 \log_{10} \frac{255^2 MN}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} (G_{x,y} - E_{x,y})^2} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}(E, G)}    (22)

In contrast to the MSE, a higher magnitude of PSNR indicates a better performance. Researchers such as Yahya et al. [22], Fu [44], Chi and Gao [45], Mendhurwar et al. [43], Ren, Li and Chen [46], Sert and Avci [47], and Tang et al. [48] have used PSNR as one of the evaluation measures for their work. El Araby et al. [49], Hu et al. [50], and Sudhakara and Jenaki Meena [51] have used both MSE and PSNR.

Mendhurwar et al. [43] have also used the signal-to-noise ratio (SNR) to evaluate their method. This measure is defined as:

\mathrm{SNR}(E, G) = \sqrt{\frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} (E_{x,y})^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} (G_{x,y} - E_{x,y})^2}} = \sqrt{\frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} (E_{x,y})^2}{MN \times \mathrm{MSE}(E, G)}}    (23)

Kurdi, Grantner, and M. Abdel-Qader [52] have also used SNR in their work. However, their SNR is reported in decibels (dB). Zhao [53] has utilized a slightly different equation for their SNR measure.
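A compact Python sketch of Eqs. (20)–(23) is given below. It is an illustrative implementation written for this review (not code from the cited works), and it assumes 8-bit images, hence the peak value of 255 in the PSNR.

```python
import numpy as np

def mse(E, G):
    """Mean square error of Eq. (20) for two images of equal size."""
    E = E.astype(np.float64)
    G = G.astype(np.float64)
    return np.mean((E - G) ** 2)

def rmse(E, G):
    """Root-mean-square error, Eq. (21)."""
    return np.sqrt(mse(E, G))

def psnr(E, G, peak=255.0):
    """Peak signal-to-noise ratio in dB, Eq. (22); higher is better."""
    m = mse(E, G)
    return np.inf if m == 0 else 10.0 * np.log10(peak ** 2 / m)

def snr(E, G):
    """Signal-to-noise ratio as defined in Eq. (23)."""
    E = E.astype(np.float64)
    G = G.astype(np.float64)
    denom = np.sum((G - E) ** 2)
    return np.inf if denom == 0 else np.sqrt(np.sum(E ** 2) / denom)
```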

15) Entropy
In 1948, E. Shannon proposed a measure, called entropy, to evaluate the information content [38]. The information entropy is defined as:

H(I) = -\sum_{i=0}^{L} p_i \log_2 p_i    (24)

where I is the image that we want to evaluate, p_i is the rate of recurrence of pixels with intensity i, and H(I) is Shannon's entropy. This measure is also called the average information content (AIC). Researchers such as Mittal et al. [38] and Sudhakara and Jenaki Meena [51] have employed Shannon's entropy to investigate whether their generated edges are meaningful or not.

16) Figure of merit, FOM
The figure of merit (FOM) was introduced by Pratt [54]. This measure is defined as:

\mathrm{FOM} = \frac{1}{\max(N_E, N_G)} \sum_{k=1}^{N_E} \frac{1}{1 + \alpha d^2(k)}    (25)

where N_G is the number of actual edges, N_E is the number of edges detected by the algorithm, α is a scaling constant, and d(k) is the displacement of the detected edge from the actual edge. This measure has been used by researchers including Yahya et al. [22], Chen et al. [55], Mendhurwar et al. [43], Yang, Xia and Juan [41], Elaraby and Moratal [56], Ergen [37], Gonzalez, Melin and Castillo [57], Liu et al. [58], Ren, Li and Chen [46], Sadiq et al. [59], Sert and Avci [47], Guo and Sengur [35], and Nes [60]. In the work by Liu and Ren [61], they addressed FOM as the quality factor Q.

17) Processing time, or computational complexity
Processing time, also known as the CPU time, is a measure of how long the segmentation process needs to completely segment the image. It is normally measured in seconds or milliseconds. Lasserre et al. [18] have evaluated the processing time based on three categories of input, which are easy, medium, and complex. Processing time has also been used by researchers such as Wang and Zhang [23] and Yahya et al. [22].

In the work by Liu and Ren [61], in addition to the processing time, they have also reported the memory footprint, which was measured in megabytes (MB). A higher value of memory footprint indirectly indicates that the algorithm is more complex.

Ma, Ma and Chu [62] compared the complexity of the edge detection algorithms. They have evaluated the complexity of their method depending on the cases. On the other hand, Qu et al. [15] used frames per second (FPS) to indicate the processing time needed by the algorithms.

C. TEST IMAGES
Not all of the publication samples in this study reported the sources of their input images. Only 13 of the 73 publication samples (i.e., 17.81%) have reported their image source. Nevertheless, we have observed that some of the works are using standard images, such as "Lena" and "Cameraman", together with their unique input images. Zhang et al. [14] have used one SAR image obtained from the National Library Website. Xu and Xiao [24] used images from the work by Heath et al. [63]. Sun et al. [19] generated artificial images to represent hyperspectral images by using HYDRA [20]. Chen and Chiu [21] used five natural images, with textures, taken from Flickr (www.flickr.com).

Zhang et al. [64] used 795 training images and 654 testing images taken from the NYUD2 dataset. Tang et al. [48] used images obtained from the UCID test database. Cao et al. [65] have used the Pascal VOC2012 database (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) for their input. This database is available to researchers and has 17,125 images in 20 categories. Six publications have used the Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500). These publications are the works by Mittal et al. [38], Qu et al. [15], Zheng et al. [32], Chen et al. [55], Bogdan, Bonchis and Orhei [29], and Zheng et al. [40]. BSDS500 provides 200 images for training, 100 images for validation, and 200 images for testing.

We have also investigated the number of test images used by the researchers. Some of the publications did not explicitly specify the number of test images used by them. This is shown in Figure 4. From this figure, we can see that the number of test images used by the researchers varies. It can be observed that 40 of the 73 publication samples (i.e., 54.79%) are using five or fewer test images. On the other hand, eight publications have used more than 100 test images. These eight publications are the works by:
1) Alpert et al. [66] in the year 2010. The work was on detecting weak curved edges in noisy images. The test images consist of 63 binary images and 100 grayscale images.
2) Chen et al. [55] in the year 2015, which is using BSDS500. The edge detection is done based on the nonsubsampled contourlet transform and edge tracking.
3) Zhang et al. [64] in the year 2016, which is using the NYUD2 dataset. The work involves a machine learning approach, which is based on structured forests.
4) Dorafshan, Thomas and Maguire [6] in the year 2018. The work is on detecting cracks in concrete and involves implementing deep convolutional neural networks.
5) Cao et al. [65] in the year 2018, which is using Pascal VOC2012. This work implemented a parallel approach to detect edges utilizing the Hadoop platform. Sixty thousand images have been used in this work.
6) Qu et al. [15] in the year 2019, which is using BSDS500. This work involves a deep neural network.
7) Mittal et al. [38] in the year 2019, which is using

BSDS500. The aim is to produce edges with better connectivity.
8) Zheng et al. [40] in the year 2020, which is using BSDS500. The work was on developing an adaptive edge detection algorithm.

FIGURE 4. The total number of test images used by the publication samples.

From the list above, we can see that the works with more than 100 test images are mostly from the year 2018 and above. The majority of them are using the BSDS500 dataset. Some of them are implementing deep neural networks, which require a large number of images.

We also investigated the number of images that the researchers used for their visual inspection assessment. The finding is shown in Figure 5. From this figure, we can see that the majority of the publication samples are using fewer than five images for this purpose. Although the majority of the researchers are using a small number of test images for the visual inspection, it is worth noting that some of the researchers did further extensive assessments. These assessments include inspecting the algorithms' performance under different parameter settings, different image resolutions, or different levels of noise corruption.

FIGURE 5. The total number of test images used for the visual inspection.

D. GROUND TRUTH
Almost all of our publication samples do not discuss their ground truth, although reference-based evaluation methods have been utilized in their works. For researchers using the Berkeley Segmentation Data Set (BSDS500), this dataset provides 100 test images with ground truth. The boundaries in these ground truth images are labeled by a human. Researchers such as Mittal et al. [38], Qu et al. [15], and Zheng et al. [32] are using BSDS500 in their research.

In the work by Xu and Xiao [24], their own edge detection algorithm develops the ground truth. By changing the regularization parameters α_j, with j ∈ {1, 2, . . . , N}, their method will develop N different edge maps D_j, j ∈ {1, 2, . . . , N}. Next, they generate N potential ground truths (PGTs) based on the information from D_j. The Chi-square test was conducted on each PGT_j, and the PGT_j with the best test value is selected as the estimated ground truth (EGT).

E. NUMBER OF BENCHMARK METHODS
We have investigated the number of benchmark methods used by the publication samples. The result is shown in Figure 6. As shown by this figure, four publications do not have any benchmark method. Fifteen publications have only one method to compare. Nevertheless, most of these publications are extensions to a well-known edge detector algorithm, such as Canny or Sobel, where the comparisons were only made to this base method.

FIGURE 6. The number of benchmark methods.

From Figure 6, it can be seen that the majority of the methods have one to four benchmark methods. Canny, Sobel, and Prewitt are among the popular benchmark methods. In these samples, only one publication has more than nine methods for the comparisons. This work is by Qu et al. [15], where they have around 20 benchmark methods to assess their deep neural network.

IV. DISCUSSIONS
From Section III-B, those 17 assessment methods obtained from the 73 publication samples can be classified into the following three general categories:
(I) Subjective evaluation. Under this category, the performance of the edge detection algorithm is judged subjectively by the human(s). Therefore, the reference image or the ground truth is not needed. Methods under this category are:

1. Visual inspection (all 73 publications).
2. Percentage of acceptable contour (one publication).
(II) No-reference image quality assessment methods. Methods under this category do not require the reference image or the ground truth. The measurement is done with the information provided solely by the results from the algorithm. Methods under this category are:
1. Continuity evaluation (two publications).
2. Processing time, or computational complexity (23 publications).
(III) Full-reference image quality assessment methods. The reference image or the ground truth is needed by this category. The performance of the method is evaluated by comparing the result from the algorithm with the reference image. Methods under this category are:
1. Number of detected edgels (one publication).
2. Confusion matrix-based measures (14 publications).
3. Number of false edges (one publication).
4. ROC (one publication).
5. Edge detection error (one publication).
6. Intersection over union (IoU) (two publications).
7. Buffer analysis method (one publication).
8. SSIM (two publications).
9. Edge-based SSIM (ESSIM) (one publication).
10. Continuity evaluation (one publication).
11. MSE, RMSE, SNR, or PSNR (14 publications).
12. Entropy (two publications).
13. Figure of merit (FOM) (14 publications).

The continuity evaluation methods fall into two categories because Yang, Xia, and Juan [41] and Yu et al. [42] do the calculation based on the output edges only. In contrast, the method by Zheng et al. [40] needs the ground truth.

Researchers also combined assessment methods to evaluate the performance of edge detection algorithms. Table 1 presents the summary of the assessments used in these 73 publications. There are 25 combinations in total, ranging from one evaluation method to a combination of four methods.

From this literature review, we found that there is still no standard assessment method available for evaluating edge detection algorithms. There is no common assessment combination that can be found in Table 1. Each assessment method has its advantages and disadvantages, and the researchers use or develop the measure based on their research's aim. Most of the assessment methods are also used by only one publication.

We observed that visual inspection is the most popular assessment method used by researchers. All of these 73 publications have utilized visual inspection. Even 20 of them did the evaluation solely based on the visual inspection. Although relatively easy, the visual inspection is essential to verify that the algorithm produces results similar to the one expected by a human. However, the visual inspection used may not be the same from one work to another. The visual inspection may include checking the edges or contour, displaying the gradient magnitude, or evaluating the edges shown on the input image.

Works by Sze et al. [67] and Alpert et al. [66], for example, use only visual inspection for their assessments because the objects in their test images are simple or have strong contrast, where the edges of interest are relatively easy to identify. However, some other methods use more complex test images. For these cases, not all edges in the image will be considered for the evaluation. Judgment of the output quality depends on the preference of the human evaluator. Thus, the visual inspection is prone to inter-observer variation, where the result reported by one observer may not be the same as the result from other observers.

To get a more truthful result, the researchers should report who the evaluators are and how many evaluators are employed for their visual inspection procedure. In our opinion, the researchers should avoid being the evaluator, as the preference might be biased towards their own method. It would be much better to recruit people outside the research group to be the evaluators. It is also good to have more than one evaluator for this purpose so that the inter-observer variation effect can be reduced. We also believe that the visual inspection should be paired with other quantitative quality assessment methods.

The second most popular edge detection assessment method is the processing time or complexity analysis, with 23 of the 73 publications utilizing it. This measure is used to evaluate the processing speed of the methods, where a shorter processing time is desired. This measure is crucial, especially for those methods designed for real-time applications. The other popular assessment methods are (i) confusion matrix-based measures, (ii) MSE, RMSE, SNR, or PSNR, and (iii) the figure of merit (FOM), where each of them has been reported in 14 publications. The other evaluation methods can be considered less popular, as only three or fewer publications are using them.

From Table 1, we can see that most of the publications using a combination of three or four quality assessment methods are recent publications, which are from the year 2018 and above. This observation indicates that the researchers are still not satisfied with the currently available assessment methods, as a certain assessment only addresses a specific aspect of the evaluation. Therefore, we would like to suggest that researchers have at least four quality assessment methods to cover the following four evaluation aspects:
1) The appearance of the edges. This can be observed by using subjective evaluation methods.
2) Processing time, or complexity analysis. This measure is important as a simple and fast method is generally desired.
3) Pixel-by-pixel evaluation of the detected edges. This can be obtained by using full-reference image quality assessment methods.
4) Continuity evaluation. This evaluation is important because in most applications, a closed contour or continuous edges are required.
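To make this recommendation more concrete, the sketch below shows one possible way to combine three of the four suggested aspects in a single evaluation run: processing time, a full-reference pixel-level score (Pratt's FOM of Eq. (25)), and the no-reference continuity ratio R of Eq. (19). It is an illustrative harness written for this review, not a prescription from the surveyed works; the detector function detect_edges and the scaling constant α = 1/9 are placeholder assumptions, and the subjective (appearance) aspect still has to be judged by human evaluators.

```python
import time
import numpy as np
from scipy import ndimage

def pratt_fom(E, G, alpha=1.0 / 9.0):
    """Pratt's figure of merit, Eq. (25); d(k) is the distance from each
    detected edgel to the nearest ground-truth edgel."""
    E = E.astype(bool)
    G = G.astype(bool)
    d = ndimage.distance_transform_edt(~G)   # distance map to nearest GT edgel
    n_e, n_g = np.count_nonzero(E), np.count_nonzero(G)
    if max(n_e, n_g) == 0:
        return 1.0
    return np.sum(1.0 / (1.0 + alpha * d[E] ** 2)) / max(n_e, n_g)

def continuity_ratio(E):
    """No-reference continuity ratio R = TEN / CEN, Eq. (19):
    number of detected edgels divided by the number of 8-connected segments."""
    E = E.astype(bool)
    _, n_segments = ndimage.label(E, structure=np.ones((3, 3)))
    return np.count_nonzero(E) / max(n_segments, 1)

def evaluate(detect_edges, image, G):
    """Run a detector once and report time, FOM, and continuity together.
    `detect_edges` is any function that returns a binary edge map."""
    t0 = time.perf_counter()
    E = detect_edges(image)
    elapsed = time.perf_counter() - t0
    return {"time_s": elapsed,
            "fom": pratt_fom(E, G),
            "continuity": continuity_ratio(E)}
```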

TABLE 1. Summary of the quality assessment methods of 73 publications.

1. Visual inspection only (20 publications): Sze et al., 1997 [67]; Alpert et al., 2010 [66]; Yang and Xu, 2011 [68]; Liu et al., 2014 [4]; Liu, Wang and Duan, 2014 [69]; Asghari and Jalali, 2015 [70]; Singh, Bajpai and Pandey, 2015 [71]; Wei and Qiao, 2015 [72]; Mathur, Mathur and Mathur, 2016 [73]; Liu and Mao, 2018 [74]; Liu et al., 2018 [75]; Shi and Luo, 2018 [76]; Wang and Wang, 2018 [77]; Xu and Xiao, 2018 [24]; Xu, Xu and Zuo, 2019 [78]; Zhang et al., 2018 [79]; Bauganssa, Sbihi and Zaim, 2019 [80]; Chen and Chiu, 2019 [21]; Kabir and Mondal, 2019 [81]; Sekehravani, Babulak and Masoodi, 2020 [82].
2. (1) Visual inspection AND (2) MSE, RMSE, SNR, or PSNR (4 publications): Zhao, 2011 [53]; Chi and Gao, 2014 [45]; Kurdi, Grantner and M. Abdel-Qader, 2017 [52]; Hu et al., 2019 [50].
3. (1) Visual inspection AND (2) Figure of merit (FOM) (5 publications): Nes, 2012 [60]; Elaraby and Moratal, 2017 [56]; Gonzalez, Melin and Castillo, 2017 [57]; Liu et al., 2018 [58]; Sadiq et al., 2019 [59].
4. (1) Visual inspection AND (2) Processing time, or complexity (10 publications): Swargha and Rodrigues, 2012 [83]; Shi et al., 2015 [84]; Bastan, Bukhari and Breuel, 2017 [85]; Gunawan et al., 2017 [86]; Wang and Zhang, 2017 [23]; Cao et al., 2018 [65]; Jing et al., 2018 [87]; Zixin et al., 2018 [88]; Ezzaaki et al., 2020 [89]; Ma, Ma and Chu, 2020 [62].
5. (1) Visual inspection AND (2) Confusion matrix-based measures (6 publications): Khunteta and Ghosh, 2014 [30]; Dong et al., 2015 [26]; Zhang et al., 2016 [64]; Huang, Yu and Zuo, 2017 [27]; Dorafshan, Thomas and Maguire, 2018 [6]; Yang et al., 2018 [13].
6. (1) Visual inspection AND (2) Number of detected edgels (1 publication): Sundani et al., 2019 [25].
7. (1) Visual inspection AND (2) Buffer analysis method (1 publication): Zhang et al., 2019 [14].
8. (1) Visual inspection AND (2) MSE, RMSE, SNR, or PSNR AND (3) Figure of merit (FOM) (3 publications): Mendhurwar et al., 2011 [43]; Ren, Li and Chen, 2013 [46]; Sert and Avci, 2019 [47].
9. (1) Visual inspection AND (2) Misclassification rate (MCR) AND (3) Figure of merit (FOM) (1 publication): Ergen, 2014 [37].
10. (1) Visual inspection AND (2) Edge detection error AND (3) Figure of merit (FOM) (1 publication): Guo and Sengur, 2014 [35].
11. (1) Visual inspection AND (2) Confusion matrix-based measures AND (3) Figure of merit (FOM) (2 publications): Chen et al., 2015 [55]; Luo et al., 2017 [28].
12. (1) Visual inspection AND (2) Figure of merit (FOM) AND (3) Processing time, or complexity (1 publication): Liu and Ren, 2019 [61].
13. (1) Visual inspection AND (2) Percentage of acceptable contour AND (3) Processing time, or complexity (1 publication): Lasserre, Cutt and Moffat, 2008 [18].
14. (1) Visual inspection AND (2) Confusion matrix-based measures AND (3) Processing time, or complexity (4 publications): Xu et al., 2014 [31]; Sun et al., 2017 [19]; Qu et al., 2019 [15]; Bogdan, Bonchis and Orhei, 2020 [29].
15. (1) Visual inspection AND (2) Number of false edges AND (3) Processing time, or complexity (1 publication): Al-Jarrah, Al-Jarrah and Roth, 2018 [33].
16. (1) Visual inspection AND (2) Intersection over union (IoU) AND (3) Processing time, or complexity (1 publication): Yoon, 2016 [90].
17. (1) Visual inspection AND (2) MSE, RMSE, SNR, or PSNR AND (3) Processing time, or complexity (3 publications): El Araby et al., 2018 [49]; Fu, 2020 [44]; Tang et al., 2020 [48].
18. (1) Visual inspection AND (2) Continuity evaluation AND (3) Processing time, or complexity (1 publication): Yu et al., 2019 [42].
19. (1) Visual inspection AND (2) Continuity evaluation AND (3) Confusion matrix-based measures (1 publication): Zheng et al., 2020 [40].
20. (1) Visual inspection AND (2) MSE, RMSE, SNR, or PSNR AND (3) Entropy (1 publication): Sudhakara and Janaki Meena, 2019 [51].
21. (1) Visual inspection AND (2) Structural Similarity Index (SSIM) AND (3) Edge-based SSIM (ESSIM) (1 publication): Arulpandy and Pricilla, 2020 [39].
22. (1) Visual inspection AND (2) Continuity evaluation AND (3) MSE, RMSE, SNR, or PSNR AND (4) Figure of merit (FOM) (1 publication): Yang, Xia and Juan, 2016 [41].
23. (1) Visual inspection AND (2) Structural Similarity Index (SSIM) AND (3) MSE, RMSE, SNR, or PSNR AND (4) Entropy (1 publication): Mittal et al., 2019 [38].
24. (1) Visual inspection AND (2) MSE, RMSE, SNR, or PSNR AND (3) Figure of merit (FOM) AND (4) Processing time, or complexity (1 publication): Yahya et al., 2019 [22].
25. (1) Visual inspection AND (2) Confusion matrix-based measures AND (3) Receiver operating characteristic (ROC) AND (4) Intersection over union (IoU) (1 publication): Zheng et al., 2019 [32].


One of the main drawbacks of the reference-based evalua- in the future, we would like to encourage researchers to
tion method for evaluating edge detection algorithms is that compare the performance of the developed algorithm with
the edges detected by the algorithm, and the edges in the state-of-art methods. For example, suppose the work is to
ground truth, are usually presented by thin segments. There- introduce a new improvement to Canny edge detection. In
fore, it is challenging to find the overlap portion of these that case, the comparison should not only done towards the
segments (i.e., the true positive), especially in segmenting original Canny edge detector but should also include the
weak edges. As a consequence, some assessment measures recently improved Canny versions.
that consider the slight deviation of the detected edges from
the ground truth, similar to the work by Luo et al. [28], should V. CONCLUSION
be further investigated in the future. To conclude, from these 73 publication samples, we found
The development of algorithms, which do not require that there are 17 quality assessment methods used by re-
training or validation image dataset, may continue to use searchers to evaluate the performance of their edge detection
a low number of test images. The number of test images algorithms. These assessment methods can be generally clas-
between one to five images is widely accepted, as found from sified into three categories, which are (i) subjective evalua-
Section III-C of this literature survey. However, we believe tion, (ii) no-reference image quality assessment methods, and
that the use of more test images is better as this will indicate (iii) full-reference image quality assessment methods. Most
that their methods are robust and can work on various types of of the assessment methods are under full-reference methods.
input images. We want to suggest that there should be at least This literature review reveals that there is still no standard
three test images to evaluate the edge detection algorithm. method to evaluate edge detection algorithm. There is also
One image is to present a simple image, one image with no standard in terms of the number of test images and the
moderate complexity, and another one with highly complex benchmark methods used. Most of the researchers also not
structures or edges. explain how they obtain the ground truth.
Nevertheless, with the rapid development of deep learning Based on this review, we have suggested that the evaluation
methods for image processing, including for edge detection methods combine at least four approaches to cover four main
purposes, researchers may be involved with deep learning, evaluation aspects. The number of the test images should be
which requires a large number of images for their methods. at least three to cover at least three levels of edge complexity.
For this purpose, the researcher can create their dataset or use For the ground truth, we suggested that the ground truth be
the available dataset such as BSDS500 or Pascal VOC2012. created by more than one human and contains three classes
Most of the quality assessment methods found in this liter- (i.e., useful edges, non-useful edges, and non-edges). We also
ature review are full-reference methods. This finding shows encourage the researchers to compare their methods with
the importance of the ground truth. Unfortunately, Section recent related methods. More works are needed in developing
III-D of this literature review shows that most authors did a standard quality assessment method for edge detection
not explain how they obtained their ground truth. Generally, algorithms.
the ground truth is presented by a binary image, where each
pixel is labeled either as edgel or as non-edgel. It is worth REFERENCES
noting that not all the edges from the image are the desired [1] Farid Garcia-Lamont, Jair Cervantes, Asdrúbal López, and Lisbeth Ro-
output edges. Therefore, some explanation on how the edges driguez. Segmentation of images by color features: A survey. Neurocom-
are classified into useful and non-useful edges should be puting, 292:1–27, 2018.
[2] M.R.H. Mohd Adnan, A. Mohd Zain, H. Haron, M. Zulfaezal Che Azemin,
provided. In the future, we would like to suggest that the and M. Bahari. Consideration of Canny edge detection for eye redness
ground truth is not binary but should contain information image processing: A review. IOP Conference Series: Materials Science
about useful edges, non-useful edges, and non-edges (i.e., and Engineering, 551(1), 2019.
[3] R. Yaacob, C.D. Ooi, H. Ibrahim, N.F.N. Hassan, P.J. Othman, and
three categories). H. Hadi. Automatic extraction of two regions of creases from palmprint
Furthermore, suppose the ground truth is created by hu- images for biometric identification. Journal of Sensors, 2019, 2019.
mans. In order to reduce the effect of inter-human variation, [4] T.-S. Liu, R.-X. Liu, P. Zeng, and S.-W. Pan. Improved Canny algorithm
for edge detection of core image. Open Automation and Control Systems
more than one human operator is needed to mark the edges Journal, 6(1):426–432, 2014.
for the ground truth. Therefore, in the future, research on how [5] H.-Y. Yin and L. Tang. Crack detection of highway prestressed concrete
to combine edge segments from multiple human operators bridge based on image recognition. 2020 IEEE International Conference
on Industrial Application of Artificial Intelligence, (IAAI 2020), pages 82–
should be conducted.
87, 2020.
We also found, in Section III-E, that using fewer than five benchmark methods is generally accepted. Well-known edge detection algorithms such as Canny, Sobel, and Prewitt are among the popular choices for benchmark methods. We also found that comparison with state-of-the-art methods is not a requirement for research in this area, possibly because the developed algorithms are specific to a particular type of application.
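For readers who wish to reproduce such a baseline comparison, the sketch below generates Canny, Sobel, and Prewitt outputs with OpenCV (Prewitt via an explicit kernel, since OpenCV provides no built-in Prewitt operator) and records the processing time of each detector. The thresholds and kernel size are arbitrary example values, not recommendations taken from the reviewed papers.

import time
import cv2
import numpy as np

PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float32)

def benchmark_edges(gray):
    """Return {name: (edge_map, seconds)} for a grayscale uint8 image."""
    results = {}

    t0 = time.perf_counter()
    canny = cv2.Canny(gray, 100, 200)                      # thresholds are arbitrary
    results["Canny"] = (canny, time.perf_counter() - t0)

    t0 = time.perf_counter()
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    sobel = cv2.convertScaleAbs(cv2.magnitude(gx, gy))     # gradient-magnitude map
    results["Sobel"] = (sobel, time.perf_counter() - t0)

    t0 = time.perf_counter()
    px = cv2.filter2D(gray.astype(np.float32), -1, PREWITT_X)
    py = cv2.filter2D(gray.astype(np.float32), -1, PREWITT_X.T)
    prewitt = cv2.convertScaleAbs(cv2.magnitude(px, py))   # gradient-magnitude map
    results["Prewitt"] = (prewitt, time.perf_counter() - t0)

    return results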
However, based on this review, we have suggested that the evaluation should combine at least four approaches to cover four main evaluation aspects. The number of test images should be at least three, covering at least three levels of edge complexity. For the ground truth, we suggest that it be created by more than one human operator and that it contain three classes (i.e., useful edges, non-useful edges, and non-edges). We also encourage researchers to compare their methods with recent related methods. More work is needed to develop a standard quality assessment method for edge detection algorithms.
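To make this recommendation concrete, the sketch below gathers one commonly used measure from each of four evaluation aspects (processing time, a confusion-matrix-based measure, an MSE-based measure, and Pratt's figure of merit). The particular selection of measures and the scaling constant alpha = 1/9 are illustrative assumptions rather than a prescribed standard.

import time
import numpy as np
from scipy.ndimage import distance_transform_edt

def evaluate_detector(detector, image, gt_edges, alpha=1.0 / 9.0):
    """Illustrative wrapper covering four evaluation aspects.
    detector: callable returning a binary (0/1) edge map; gt_edges: binary ground truth."""
    t0 = time.perf_counter()
    detected = detector(image).astype(bool)
    elapsed = time.perf_counter() - t0                     # processing time

    gt = gt_edges.astype(bool)
    tp = np.logical_and(detected, gt).sum()
    fp = np.logical_and(detected, ~gt).sum()
    fn = np.logical_and(~detected, gt).sum()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

    mse = np.mean((detected.astype(float) - gt.astype(float)) ** 2)

    # Pratt's figure of merit: distance of each detected pixel to the nearest ideal edge.
    dist_to_gt = distance_transform_edt(~gt)
    n = max(gt.sum(), detected.sum())
    fom = np.sum(1.0 / (1.0 + alpha * dist_to_gt[detected] ** 2)) / n if n else 0.0

    return {"time_s": elapsed, "f1": f1, "mse": mse, "fom": fom}

Reporting these values side by side, rather than collapsing them into a single score, matches the multi-aspect evaluation advocated above.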




NAZISH TARIQ received the B.S. degree in biomedical engineering from Sir Syed University of Engineering & Technology, Pakistan, in 2011, and the M.E. degree in Industrial Control & Automation from Hamdard University, Pakistan, in 2017. She is currently pursuing a Ph.D. degree in electrical and electronic engineering with Universiti Sains Malaysia (USM). She is also a Research Associate with the Office of Research, Innovation and Commercialization, Dow University of Health Sciences, Pakistan. Her research interests include digital image processing, signal processing, and deep learning.

ROSTAM AFFENDI HAMZAH graduated from Universiti Teknologi Malaysia, where he received his B.Eng. degree in electronic engineering. He then received his M.Sc. degree in electronic system design engineering and his Ph.D. degree in image processing from Universiti Sains Malaysia. He is currently a senior lecturer at Universiti Teknikal Malaysia Melaka.


THEAM FOO NG obtained his B.Sc. (Hons.) in Mathematics and M.Sc. in Statistics from Universiti Sains Malaysia (USM), Penang, Malaysia. He received his Ph.D. degree from the University of New South Wales (UNSW), Australia. He previously worked with the School of Electrical & Electronic Engineering, Engineering Campus, USM, Nibong Tebal, Penang, Malaysia. He is now with the Centre for Global Sustainability Studies (CGSS), USM, Penang, Malaysia, where he is currently a senior lecturer. He was a permanent representative on behalf of CGSS for the Division of Industry and Community Network (DICN), USM, and is a coordinator for the South East Asia Sustainability Network (SEASN). He has experience as a guest editor, associate editor, section editor, and reviewer for several reputable journals and book series. He has published more than 30 research papers and book chapters in established international journals and books. His research interests include machine learning, pattern recognition, computational intelligence, image processing, education for sustainable development, and sustainability.

SHIR LI WANG is a senior lecturer in the Computing Department, Faculty of Art, Computing and Creative Industry, Universiti Pendidikan Sultan Idris (UPSI). She earned her doctorate from the University of New South Wales, Australia, and received her master's and bachelor's degrees from Universiti Sains Malaysia. Her research focuses on improving algorithms and problem solving based on evolutionary algorithms, neural networks, and deep learning. Her interest in artificial intelligence motivates her to explore its potential in other domains. Currently, her research focuses on adaptive parameters in evolutionary algorithms.

HAIDI IBRAHIM (M'07–SM'18) received his B.Eng. degree in electrical and electronic engineering from Universiti Sains Malaysia, Malaysia, and his Ph.D. degree in image processing from the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, United Kingdom, in 2005. His research interests include digital image and signal processing and analysis.
