
FFTMI: Features Fusion for Natural Tone-Mapped Images Quality Evaluation

Lukáš Krasula*, Karel Fliegel, Member, IEEE, Patrick Le Callet, Senior Member, IEEE

Abstract—Tone-mapping is a crucial step in the task of displaying high dynamic range (HDR) images on standard displays. Given the number of possible ways to tone-map such images, the development of an objective quality criterion, enabling selection of the most suitable tone-mapping operator (TMO) and setting of its parameters in order to maximize the quality of the reproduction, is of high interest. In this paper, a new objective metric for natural tone-mapped images is proposed. It is based on a fusion of several perceptually relevant features that have been carefully selected using an appropriate feature selection procedure. The outcome of the selection also provides valuable insight into the importance of particular perceptual aspects when judging the quality of tone-mapped HDR content. The performance of the resulting combination of features is thoroughly evaluated with respect to three publicly available databases and compared to several relevant state-of-the-art criteria. The proposed approach is shown to significantly outperform the tested metrics and can, therefore, be considered a competitive alternative for the evaluation of tone-mapped images.

Index Terms—High Dynamic Range Imaging, tone-mapping, quality assessment, feature selection.

I. INTRODUCTION

RECENT development in the image and video processing area aims at providing observers with as realistic and immersive an experience as possible. This is reflected in new technologies available on the market, such as systems with ultra high definition (UHD), 3DTV, high frame rate (HFR), or high dynamic range (HDR), as well as in the standardization efforts [1]. All of the stated enhancements bring new challenges to quality assessment.

The goal of HDR imaging is to capture the exact luminance values present in the real-world scene and reproduce them faithfully on the end-device. However, the dynamic range (DR), i.e. the ratio between the darkest and the brightest point, of common displays is around a hundred times smaller than the DR of a typical real-world scene as perceived by the human eye [2]. In order to reproduce such a scene, it is necessary to compress its DR. This is achieved by using tone-mapping operators (TMOs). An overview of existing TMOs can be found e.g. in [2].

Each TMO has a different impact on the resulting image quality. In order to compare the performance of various TMOs, a number of subjective studies have been performed. Most of them are summarized in [3]. Moreover, most of the operators can be tuned to the particular scene via a set of parameters, which also significantly influences the result [4]. In most real applications, it is impossible, or at least inconvenient, to manually select the appropriate TMO and its parameters. The existence of an objective quality assessment method, reliable within a TMO (for parameter setting) as well as among different TMOs, is therefore of very high importance. However, considering the difference in DR between the original image and its tone-mapped version, the usage of popular full-reference metrics, calculating a similarity or a difference between the reference and the distorted image, is out of the question.

Our study of the performance of the relevant objective metrics revealed their inability to reliably predict the preferences of human observers with respect to tone-mapped natural images [5]. The purpose of this paper is to propose a novel objective criterion applicable to such content. Motivated by the study of Čadík et al. [6], we decided to design the criterion as a fusion of multiple, perceptually relevant, features.

When designing a feature-based objective quality metric, a typical approach is to propose a number of estimators and combine them using a machine learning technique. While being demonstrably efficient, this approach has two fundamental flaws. Firstly, it does not guarantee that the particular estimators are the ones most suitable for capturing a desired feature, nor that the selected estimators are complementary. Secondly, the machine learning-based combination is not transparent with respect to the relative importance of the given estimators and mostly requires retraining for different conditions.

Considering these drawbacks, we decided to take a different approach. We propose to perform feature selection on an extensive set of both new and existing estimators to ensure that the selected ones are optimal for their particular purposes, as well as in combination with one another. We also only allow for a linear combination, which is in agreement with the outcomes of the study conducted by Čadík et al. [6] and, more importantly, provides the desired transparency and is less prone to over-fitting. The parameters of the combination are determined on only one database and are expected to remain fixed. The validation performed on data from different datasets supports this claim.

The paper is organized as follows: Section II provides an overview of the relevant concepts in objective evaluation of tone-mapped images and feature selection, section III describes our proposed procedure used to identify the features most relevant to tone-mapped images, section IV formulates the proposed metric, section V validates its performance, and, finally, section VI concludes the paper.

L. Krasula and P. Le Callet are with University of Nantes, LS2N CNRS UMR 6004, 44306, Nantes, France (email: {lukas.krasula, patrick.lecallet}@univ-nantes.fr). L. Krasula and K. Fliegel are with the Department of Radioelectronics, Faculty of Electrical Engineering, Czech Technical University in Prague, Technická 2, 166 27 Prague 6 (email: fliegek@fel.cvut.cz).


II. RELEVANT LITERATURE OVERVIEW

The goal of this section is to summarize efforts dedicated to the objective evaluation of tone-mapped images and to provide an overview of existing approaches towards feature selection.

Figure 1. Framework of feature selection algorithms (original set of features → subset generation → subset evaluation → stopping criterion → results validation). Redrawn from [21].

A. Objective Metrics for Tone-Mapped Images

The first objective metric designed for the evaluation of tone-mapped images is the Dynamic Range Independent Metric (DRIM) [7]. It uses a model of the human visual system (HVS) to identify the areas where the contrast in the tone-mapped image is lost, amplified, or reversed with respect to the HDR original.
Later on, Yeganeh and Wang developed a Tone-Mapped image Quality Index (TMQI) [8]. The method consists of two parts. The first part estimates the structural similarity of the HDR and tone-mapped image. The difference in DR is compensated by mapping according to the contrast sensitivity function (CSF). The second part of the index quantifies the naturalness of the tone-mapped version based on its overall brightness and contrast. The index was recently revised by Ma et al. [9], adding improvements to both parts. The improved index is referred to as TMQI-II.

Ziaei Nafchi et al. [10] proposed a feature similarity index for tone-mapped images (FSITM) comparing locally weighted mean phase angle (LWMPA) maps. The index is calculated for each channel separately.

Two more full-reference measures, for contrast loss and contrast waste, were introduced by Granados et al. [11]. They are based on a model of the HVS and camera noise estimation.

In our previous work [4], we used a simple measure of contrast reversal together with a new model of naturalness for TMO parameter optimization. The contrast reversal quantifies to what extent the gradient direction is changed between the HDR and tone-mapped versions, while the naturalness determines how likely the combination of brightness, contrast, and colorfulness in the tone-mapped image results in a natural looking picture. The suitable estimators of the particular features have been studied in [12].

More recently, Hadizadeh and Bajić [13] introduced a criterion based on a "bag of features". They combined 8 different quality-related features using support vector regression (SVR).

Another way is to evaluate the quality of the tone-mapped image alone using no-reference measures. However, most of these criteria are either designed for or trained on typical distortions [14]–[16] and their applicability out of that context, therefore, has to be verified. Nevertheless, some distortion unaware metrics, such as the Natural Image Quality Evaluator (NIQE) [17], have been introduced as well.

Recently, several no-reference metrics derived from the HDR Image GRADient based Evaluator (HIGRADE), designed specifically for images coming from tone-mapping and multi-exposure fusion, have been developed [18].

Aydın et al. [19] pointed out that the aesthetic properties of the tone-mapped image are of high importance in quality perception and proposed a measure based on sharpness, clarity, depth, and tone.

Gu et al. [20] developed a blind quality metric evaluating information, naturalness, and structure of tone-mapped images.

B. Feature Selection for Quality Assessment

The existing strategies for feature selection are thoroughly described in [21]. Such techniques were practically used in the area of image quality assessment, for example, by Nuutinen et al. [22], where the objective was to reliably compare images coming from different cameras.

The purpose of the feature selection strategies is to select a subset from the full set of features that will be able to model the data most accurately. The classical framework of feature selection algorithms is depicted in Figure 1. According to the way of generating subsets, the strategies can be roughly divided into three approaches:
• complete search algorithms,
• sequential search algorithms, and
• random search algorithms.

The complete search strategies guarantee the selection of the globally optimal subset (with respect to the given criterion). Even though some methods that do not require testing all possible combinations, such as branch and bound or beam algorithms, have been introduced, these strategies are mostly very time consuming and not practical for larger feature sets.

The sequential search algorithms either start with one feature and add one at a time, or begin with the full set and sequentially remove one feature at a time. Which type to choose depends on the desired size of the subset. If a smaller subset is more convenient, the adding of features is more appropriate [22]. The possible strategies include sequential forward selection, sequential backward selection, or bidirectional selection [23]. The disadvantage is that this approach cannot guarantee finding the global optimum and only focuses on a certain path.

The last group is formed by the random search algorithms. The initial subset is generated randomly. Further on, a sequential approach, adjusted to include randomness (such as random-start hill-climbing or simulated annealing), can be used. Alternatively, a new set can also be generated completely randomly again. Such an approach is also known as the Las Vegas algorithm [24]. The randomness helps to avoid following a single path.

In terms of evaluating the currently selected subset, the following approaches can be identified:
• filter models,
• wrapper models, and
• hybrid models.

The filter models use a criterion independent of any mining algorithm. These criteria can be based e.g. on distance,


information, dependency, or consistency. The wrapper models use the performance of a mining algorithm run on the selected subset to evaluate the subset. The wrapper models are generally more effective but also more computationally demanding. The hybrid models combine the two approaches.

The often used stopping criteria include:
• all the possibilities have been tested,
• a maximum number of iterations has been reached,
• a maximum number of features has been reached,
• adding more features does not provide any improvement,
• a sufficiently good subset has been found, etc.

The following sections will describe the selection procedure proposed for the identification of the features most relevant to quality assessment of tone-mapped images.

III. PROPOSED SELECTION OF FEATURES RELEVANT TO TONE-MAPPED IMAGES

Čadík et al. [6] identified the perceptual attributes contributing to overall quality perception for natural content. These are brightness, contrast, details, color, and artifacts. They also argue that the perceived contrast depends on lightness, chroma, and sharpness. The resulting fusion metric should, therefore, combine at least some of these perceptual attributes.

Considering that the goal is to meaningfully combine as small a number of features as possible while providing good performance, the sequential forward selection algorithm is an ideal choice [22]. However, as with all sequential procedures, the probability of ending up in a local minimum is high and particularly sensitive to initial conditions. We, therefore, propose to begin with a modified Las Vegas algorithm [24] in order to get a deeper insight into the behavior of the combinations. This allows us to then choose a more reliable initial subset for the sequential search used for the final selection.

A. Modified Las Vegas Algorithm

The following subsections describe the particular steps of the algorithm, as visualized in Figure 1.

1) Subset Generation: The full set consisted of 60 features. Note that the term feature in the remainder of the paper stands for the outcome of an estimator of a certain image attribute or its overall quality. Such an estimator can be a metric, measure, index, or model [25]. In a classical feature selection scenario, all the features are treated as independent entities. However, since the size of the subset is supposed to be small and should contain only the optimal feature estimators, it is not desirable to combine multiple features of the same type together, i.e. subsets including two different metrics measuring the same property (e.g. contrast) can be omitted. Therefore, several groups of features have been created. Every subset in the Las Vegas algorithm was generated by randomly selecting the groups from which the criteria would be taken. Each group was assigned different possibilities. Some groups enabled the selection of only one criterion from the group, some enabled random selection of several metrics, and some required the usage of all the criteria in the group. A sketch of this generation step is given after the group descriptions below.

a) Group 1: The first group was formed by the outcomes of full-reference algorithms comparing the contrast and structure of the HDR reference and the tone-mapped version. The estimators in the group were: structural similarity from TMQI [8], structural similarity from TMQI-II [9], contrast reversal from [4], contrast loss DRIMl, contrast reversal DRIMr, and contrast amplification DRIMa [7]. If Group 1 was used, a random number of its members was randomly selected for the subset (since no two of the members measure the same perceptual aspect).

b) Group 2: The features in the second group came from full-reference feature similarity estimators for each channel – FSITMr, FSITMg, and FSITMb [10]. If this group was selected, all three of the metrics were used, since it had been found less meaningful to estimate the feature similarity in one or two channels only.

c) Group 3: The third group included contrast features coming from GCF [26], Weber contrast [27], Michelson contrast [28], SDME [29], and RMS contrast [30]. Here, only one feature at a time could be chosen.

d) Group 4: The fourth group comprised no-reference colorfulness features. The used estimators were CIQI [31], CQE1 and CQE2 colorfulness [30], and color saturation (i.e. the mean of the S channel in the HSV color space). Again, only one feature from this group could be in the subset.

e) Group 5: The fifth group was formed by sharpness/blur estimators. The members were Variance [32], Frequency Threshold [33], Gradient, Laplacian, Autocorrelation metric [34], Histogram Frequency [35], Kurtosis [36], Marziliano [37], HP [38], Kurtosis of Wavelet Coefficients [39], Riemannian Tensor [40], JNBM [41], CPBD [42], S1, S2, S3 [43] with improved pooling (S3III from [44]), FISH, and its block based variant FISHbb [45]. Only one blur/sharpness feature was allowed to be in the subset.

f) Group 6: The sixth group was formed by the aesthetics features proposed by Aydın et al. [19]. These include sharpness, depth, clarity, and tone. Any number of them could be randomly selected into the subset.

g) Group 7: The seventh group consisted of the outcomes of saliency models. These were included since more details should provide more salient regions. The scores were therefore created from the saliency maps by averaging, assuming that more salient regions will result in a higher average saliency. The included models were the Frequency-tuned saliency model [46], Graph-based model [47], Itti-Koch model [48], Spectral residual model [49], Incremental coding length saliency model [50], and SUN [51]. Only one per subset could be selected.

h) Group 8: The last group was formed by the outcomes of estimators not belonging to any previous category. It included NIQE [17], CS [52], QAC [53], BIQI [14], BRISQUE [15], BLIINDS-II [16], the Curvelet based metric [54], statistical naturalness from TMQI [8], statistical naturalness from TMQI-II [9], feature naturalness from [4], mean intensity from [4], percentage of under- and over-exposed areas from [55], the JPEG2000 metric [56], and the JPEG metric [57]. Any number of these metrics could be randomly selected to be included in any subset.
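To make the generation step concrete, the following minimal Python sketch shows how one subset could be drawn while respecting the per-group rules described above. It is an illustration only: the group dictionary, the feature names, and the 50% group-inclusion probability are assumptions of this sketch, not values taken from the paper.

import random

# Illustrative group definitions following Table I; names are placeholders.
GROUPS = {
    1: {"rule": "any", "features": ["SS-TMQI", "SS-TMQI-II", "CR", "DRIM_l", "DRIM_r", "DRIM_a"]},
    2: {"rule": "all", "features": ["FSITM_r", "FSITM_g", "FSITM_b"]},
    3: {"rule": "one", "features": ["GCF", "Weber", "Michelson", "SDME", "RMS"]},
    4: {"rule": "one", "features": ["CIQI", "CQE1", "CQE2", "saturation"]},
    # Groups 5-8 follow the same pattern and are omitted for brevity.
}

def generate_subset(groups=GROUPS, p_group=0.5):
    """Draw one random subset for a single Las Vegas iteration."""
    subset = []
    while not subset:                       # retry if no group contributed
        for spec in groups.values():
            if random.random() > p_group:   # this group does not contribute
                continue
            if spec["rule"] == "one":       # exactly one feature allowed
                subset.append(random.choice(spec["features"]))
            elif spec["rule"] == "all":     # whole group must be used
                subset.extend(spec["features"])
            else:                           # "any": random non-empty sample
                k = random.randint(1, len(spec["features"]))
                subset.extend(random.sample(spec["features"], k))
    return subset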


Table I
DESCRIPTION OF THE GROUPS OF FEATURES.

Group no.  Category                        Type  # of features
1          Contrast/structure similarity   FR    6
2          LWMPA similarity                FR    3
3          Contrast                        NR    5
4          Colorfulness                    NR    4
5          Sharpness                       NR    18
6          Aesthetics                      NR    4
7          Saliency                        NR    6
8          Other                           NR    14
In total                                         60
FR – full reference, NR – no reference

All this effort has been dedicated to making every combination that can possibly be selected perceptually meaningful. The categories of the features and the number of features in each group are stated in Table I. In each iteration of the Las Vegas algorithm, the subset was randomly generated by selecting the groups that would contribute to the subset and, according to the group properties, the criteria from them.
2) Subset Evaluation: After selecting the subset Fi = {f1, f2, ..., fki}, where ki is the size of the i-th subset, the optimization algorithm has been run to train the weights of the contribution of each feature in the subset with respect to the natural part of the TMIQD dataset from [5] (i.e. excluding computer generated images). Here, 10 natural HDR images are tone-mapped in 9 different ways coming from 5 TMOs. Considering the fact that some features come from full-reference estimators, we use the results obtained from the experiment with the reference (i.e. scenario 1A in [5]). Only a linear combination of features has been allowed, since it ensures transparent insight into the contributions of each one of them. The combination in the i-th iteration was defined as

S_i = \tau_1 \times f_1 + \tau_2 \times f_2 + \ldots + \tau_{k_i} \times f_{k_i},    (1)

where τj is the weight for each feature fj and j ∈ {1, 2, ..., ki}. The ranges of the features were linearly normalized with respect to their highest theoretically possible value. Nevertheless, the used optimization procedure (as described later) is invariant to the differences in range. This step was, therefore, performed out of convention only.

The evaluation criterion that has been used as a basis for the optimization, as well as for evaluating the subsets, is based upon the two performance analyses described in [5]. The first analysis tests how well an objective metric under test (in this case the combination of features) can distinguish between qualitatively similar and different pairs. This is quantified by the AUCDS (Different vs. Similar) value. The second analysis determines the ability of the metric to recognize the qualitatively better image in the pair. The outcome is described by the AUCBW (Better vs. Worse) value and/or the percentage of correct classification C0. Further explanation can be found in the Appendix. For more information about the framework, analyses, interpretation, and outcomes, refer to [5].

For each generated subset, the optimization attempted to set the weights τ1 to τki in order to maximize the value Pi defined as

P_i = \frac{AUC_{DS,i} + C_{0,i}}{2}.    (2)

This should ideally lead to optimizing the performance with respect to both of the performance analyses. Alternatively, it would be possible to weight the importance of the analyses according to the target application.

With single dimensional ground truth data, this would be a simple regression problem. However, given the specific nature of the evaluation criterion, the optimal weights needed to be found using an optimization procedure. Considering the high dimensionality of the parameter space, the direct search methods, such as Nelder-Mead downhill simplex [58], were not found suitable, since they are too prone to ending up in local minima and are very dependent on the starting point. The genetic optimization algorithm [59] has also been tested; nevertheless, the pattern search method [60] was generally able to find the weights leading to the best performance for the given set of features among all the algorithms.
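Since the pattern search implementation itself is only referenced, a minimal compass-style pattern search is sketched below for illustration. It assumes a helper evaluate_P(tau) that computes P = (AUC_DS + C0)/2 from equation (2) for a given weight vector on the training data; that helper, the starting point, and the mesh parameters are assumptions of this sketch, not the authors' implementation.

import numpy as np

def pattern_search(evaluate_P, k, step=0.5, tol=1e-3, max_iter=500):
    """Compass-style pattern search maximizing P over the k weights tau."""
    tau = np.ones(k)                    # assumed starting point: equal weights
    best = evaluate_P(tau)
    for _ in range(max_iter):
        improved = False
        for j in range(k):              # poll +/- step along each coordinate
            for sign in (1.0, -1.0):
                cand = tau.copy()
                cand[j] += sign * step
                score = evaluate_P(cand)
                if score > best:
                    tau, best, improved = cand, score, True
        if not improved:
            step *= 0.5                 # shrink the mesh after a failed poll
            if step < tol:
                break
    return tau, best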
The parameters resulting in the best performance for the i-th subset have been saved together with the overall optimized performance Popt,i. Note that the full database has been used. The optimal value, therefore, showed how well the combination of the features in the subset could perform if trained and tested on the same data. This, by all means, leads to over-fitting. Nevertheless, for the purpose of feature selection, it provides information about which features can be used to model the data in the best way. To provide a combination for the general metric, a different training procedure has been adopted. This will be discussed in Section IV.

3) Stopping Criterion: Given that the purpose of the Las Vegas algorithm was to provide a closer look into the behavior of the combinations and to preselect a subset for the sequential search, a predefined number of iterations has been set as the stopping criterion. 2,000 iterations already provided a representative insight.

4) Results: The optimal performance values with respect to the size of the subset are depicted in Figure 2. An interesting outcome is that even with a high number of features, the overall performance does not get over 0.83. This is probably caused by the challenging nature of the content and the two performance analyses together. Combining the features to work universally with respect to both of the analyses would probably require a more sophisticated non-linear model. Nevertheless, the goal was not to perfectly model the data. To gain insight into which features are the most relevant for the modeling, a closer look has been taken at the best performing subsets for each subset size (i.e. the subsets having the rightmost performance values Popt in each row of Figure 2).

Figure 2. The optimal performance values with respect to the size of the subset after 2,000 iterations of the Las Vegas algorithm.

Interestingly, four features were present in each of the best performing subsets from size 4 to 13. These metrics were structural similarity from TMQI-II [9], FSITMr, FSITMg, and FSITMb [10]. Therefore, they have been selected as the initial subset for a forward selection sequential algorithm.

B. Sequential Search

The forward selection has been chosen since only a small number of features was assumed to be added without jeopardizing the generality. The subset evaluation remained


the same as in the case of the Las Vegas algorithm. The stopping criterion has been set according to Pareto optimality, i.e. to stop the algorithm when adding another feature does not result in an increase of Popt together with a simultaneous increase of both AUCDS and C0. The rationale was to avoid over-fitting the data by focusing more on one aspect of the performance than on the other.
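A minimal sketch of this forward selection step follows, assuming a helper evaluate(subset) that returns the optimized (Popt, AUC_DS, C0) triple for a subset (e.g. via the pattern search above); the helper name and structure are illustrative, not the authors' code.

def forward_selection(initial, candidates, evaluate):
    """Sequential forward selection from the Las Vegas preselection."""
    subset = list(initial)
    p, auc_ds, c0 = evaluate(subset)
    candidates = list(candidates)
    while candidates:
        # Try every remaining feature and keep the best extension.
        trials = [(f, evaluate(subset + [f])) for f in candidates]
        f_best, (p_new, auc_new, c0_new) = max(trials, key=lambda t: t[1][0])
        # Pareto-style stop: P_opt, AUC_DS, and C0 must all improve.
        if not (p_new > p and auc_new > auc_ds and c0_new > c0):
            break
        subset.append(f_best)
        candidates.remove(f_best)
        p, auc_ds, c0 = p_new, auc_new, c0_new
    return subset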
These conditions led to adding only one feature – the feature naturalness described in [4]. Such a subset has been found convenient for several reasons. Firstly, the size of the subset (five features) is relatively small and thus potentially provides more generality. Secondly, the combination of these features also makes sense with respect to the quality-related perceptual attributes described by Čadík et al. [6]. Structural similarity measures the reproduction of contrast and structure, which provides information about details and artifacts. The feature naturalness determines whether the combination of brightness, contrast, and colorfulness of the tone-mapped version is plausible for a natural looking image. The FSITM quantifies feature similarity and should therefore capture changes in the reproduction of details and detect the presence of artifacts. Moreover, since all three channels are included, eventual color artifacts should also be found. A detailed description of each of the selected features is provided in the next section.
IV. PROPOSED METRIC FORMULATION

The previous sections identified the most relevant features for quality assessment of tone-mapped images on the dataset from [5]. This section describes the individual selected features, the training of the parameters, and the validation of the results.

A. Description of the Particular Components of the Metric

As stated above, the features that have been identified as the most relevant to the quality are structural similarity from TMQI-II (SS-II) [9], feature naturalness (FN) [4], and feature similarity in each channel (FSITMr,g,b) [10].

1) Structural Similarity: The SS-II feature is obtained by a modification of the popular SSIM index for comparing HDR and tone-mapped images. This modification does not penalize a difference in signal strength if both signals are under or over the visibility threshold. This is determined by a non-linear mapping of the signals' standard deviations according to the contrast sensitivity function (CSF). SS-II for a local patch is obtained as

SS\text{-}II(x, y) = \frac{2\tilde{\sigma}_x \tilde{\sigma}_y + c_1}{\tilde{\sigma}_x^2 + \tilde{\sigma}_y^2 + c_1} \cdot \frac{\sigma_{xy} + c_2}{\sigma_x \sigma_y + c_2},    (3)

where x and y stand for the patch in the HDR and tone-mapped image, respectively, σ̃ is the standard deviation of the patch after the non-linear mapping, and c1 and c2 are small constants for numerical stability. Note that the luminance component is missing, compared to the SSIM definition, but the structural element (i.e. the second fraction in equation 3) remains the same.

The mapping is defined as

\tilde{\sigma} = \frac{1}{\sqrt{2\pi}\,\theta_\sigma} \int_{-\infty}^{\sigma} \exp\left(-\frac{(x - t_\sigma)^2}{2\theta_\sigma^2}\right) dx,    (4)

where

\theta_\sigma(\phi) = \frac{t_\sigma(\phi)}{k},    (5)

with φ being a spatial frequency and k representing a constant obtained from Crozier's law [61], typically ranging from 2.3 to 4. The authors propose to use k = 3. The threshold for the signal's standard deviation is

t_\sigma(\phi) = \frac{\mu}{\sqrt{2} \cdot c \cdot CSF(\phi)},    (6)

where μ is the mean intensity value (set to 128 by the authors) and c is a constant used to fit the physiological data. Here, the CSF as introduced by Mannos and Sakrison [62] and fit to the data measured by Kelly [63] is used. The map of SS-II is averaged on each scale.

The above-described procedure is the same for the structural similarity part of both TMQI and TMQI-II. However, in the latter, the contrast visibility model for HDR images has been adapted to the local luminance. The estimate of contrast in the HDR reference is therefore computed as the standard deviation in a patch divided by the local mean, i.e. σ/μ.
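The following sketch computes a single-scale SS-II map from equations (3)–(6). The integral of equation (4) is exactly the Gaussian CDF with mean t_sigma and spread theta_sigma = t_sigma/k, which the sketch uses directly. The window size and the constants c1 and c2 are placeholders, not the authors' values, and for simplicity the same threshold is passed for both signals, whereas TMQI-II adapts the HDR threshold to the local luminance.

import numpy as np
from scipy.ndimage import uniform_filter
from scipy.stats import norm

def ss_ii_map(hdr, tmo, t_sigma, k=3.0, c1=0.01, c2=10.0, win=11):
    """Sketch of the SS-II local map (eq. 3) on a single scale."""
    hdr = hdr.astype(np.float64)
    tmo = tmo.astype(np.float64)
    # Local means, standard deviations, and covariance over a sliding window.
    mu_x, mu_y = uniform_filter(hdr, win), uniform_filter(tmo, win)
    sig_x = np.sqrt(np.maximum(uniform_filter(hdr**2, win) - mu_x**2, 0))
    sig_y = np.sqrt(np.maximum(uniform_filter(tmo**2, win) - mu_y**2, 0))
    sig_xy = uniform_filter(hdr * tmo, win) - mu_x * mu_y
    # Non-linear mapping of eqs. (4)-(5): a Gaussian CDF centred on the
    # visibility threshold t_sigma with spread theta = t_sigma / k.
    theta = t_sigma / k
    # TMQI-II adaptation: HDR contrast is normalized by the local mean.
    sig_x_t = norm.cdf(sig_x / np.maximum(mu_x, 1e-9), loc=t_sigma, scale=theta)
    sig_y_t = norm.cdf(sig_y, loc=t_sigma, scale=theta)
    # Equation (3): mapped-contrast term times the structural term.
    return ((2 * sig_x_t * sig_y_t + c1) / (sig_x_t**2 + sig_y_t**2 + c1)
            * (sig_xy + c2) / (sig_x * sig_y + c2))

Averaging this map yields the scale's SS-II score, in line with the per-scale pooling described above.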


2) Feature Naturalness: The no-reference FN measure, developed in our previous work [4], is based on the assumption that naturalness is mainly defined by the image's brightness, contrast, and colorfulness. To study the behavior of these features in natural scenes, 5,000 colorful images of different sizes and contents were first obtained from the publicly available Image Net database (http://www.image-net.org/). Then the mean intensity (MI), global contrast factor (GCF) [26], and CQE1 colorfulness [30], which have been identified as reliable estimators in [12], were calculated for all of these images.

To capture the interaction of the three main properties, their product was computed for each image and the histogram over all 5,000 scenes has been obtained. The histogram is shown in Figure 3. It has been noticed that the result can be approximated by a Rayleigh distribution, as shown by the red curve. This distribution can, therefore, be interpreted as the probability that the image looks natural.

Figure 3. Probability distribution of the product of MI, GCF, and CQE1 colorfulness for 5,000 colorful natural images.

The probability density function (PDF) of the Rayleigh distribution fR is defined as

f_R(x, \upsilon) = \frac{x}{\upsilon^2} e^{-\frac{x^2}{2\upsilon^2}},    (7)

where x ≥ 0 and υ is a scaling factor. In this particular case υ = 0.27. FN is then obtained as

FN = \frac{f_R(MI \cdot GCF \cdot CQE1,\, 0.27)}{\max_x f_R(x,\, 0.27)}.    (8)

The denominator serves for normalization. It is the global maximum of the Rayleigh PDF with the respective υ from equation (7). The higher the FN value, the more natural the image should look.
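A minimal sketch of equations (7)–(8) follows, assuming MI, GCF, and CQE1 are already computed and pre-scaled so that their product falls into the range of Figure 3 (that scaling is an assumption of this sketch). Since the Rayleigh PDF peaks at x = υ, the denominator of equation (8) is simply f_R(υ, υ).

import numpy as np

def rayleigh_pdf(x, u):
    # Equation (7): Rayleigh PDF with scaling factor u.
    return (x / u**2) * np.exp(-x**2 / (2 * u**2))

def feature_naturalness(mi, gcf, cqe1, upsilon=0.27):
    # Equation (8): Rayleigh likelihood of the MI*GCF*CQE1 product,
    # normalized by the PDF's global maximum, reached at x = upsilon.
    x = mi * gcf * cqe1
    return rayleigh_pdf(x, upsilon) / rayleigh_pdf(upsilon, upsilon)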
3) Feature Similarity: The Feature Similarity Index for Tone-Mapped images (FSITM) was presented in [10]. It uses phase congruency features to calculate the difference between the HDR and tone-mapped version of the image. More specifically, it uses the Locally Weighted Mean Phase Angle (LWMPA) to compute the phase congruency. The main advantage of this feature is its robustness against noise.

Let lGe_{ρ,r} and lGo_{ρ,r} be a quadrature pair of log-Gabor wavelets, i.e. evenly and oddly symmetric, on the scale ρ and orientation r. The responses for a two-dimensional signal (e.g. an image I) are obtained as

[e_{\rho,r}(I), o_{\rho,r}(I)] = [I \ast lGe_{\rho,r}, I \ast lGo_{\rho,r}],    (9)

where the operator ∗ stands for a convolution. The LWMPA is then computed as

LWMPA(I) = \arctan2\left(\sum_{\rho,r} e_{\rho,r}(I), \sum_{\rho,r} o_{\rho,r}(I)\right).    (10)

The operator arctan2(.) is defined as

\arctan2(x, y) = 2 \arctan \frac{x}{\sqrt{x^2 + y^2} + y}.    (11)

The values of LWMPA(I) range from −π/2 to π/2. The binary phase congruency map PCG can then be obtained as

PCG(I) = H(LWMPA(I)),    (12)

where H(.) is a Heaviside (unit-step) function. The definition is

H(t) = \begin{cases} 1 & t > 0 \\ \frac{1}{2} & t = 0 \\ 0 & t < 0. \end{cases}    (13)

FSITM is calculated for each color channel ch separately. The similarity of the congruency maps (SCG) for a channel ch is computed as

SCG_{ch}(I_{HDR}, I_{TMO}) = \frac{\left| PCG\left(I_{HDR}^{ch}\right) \cap PCG\left(I_{TMO}^{ch}\right) \right|}{W \cdot H},    (14)

with W and H being the image width and height, respectively. The final index can then be obtained from

FSITM_{ch} = \lambda\, SCG_{ch}(I_{HDR}, I_{TMO}) + (1 - \lambda)\, SCG_{ch}(\ln(I_{HDR}), I_{TMO}),    (15)

where the parameter λ was set experimentally.
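A sketch of equations (12)–(15) is given below, assuming a callable lwmpa(img) that returns the locally weighted mean phase angle map (the log-Gabor filtering of equations (9)–(11) is not reproduced here). The λ value is a placeholder, since [10] sets it experimentally, and the intersection in equation (14) is interpreted here as per-pixel agreement of the binary maps.

import numpy as np

def pcg(lwmpa_map):
    # Equations (12)-(13): Heaviside of the LWMPA with H(0) = 1/2.
    return np.where(lwmpa_map > 0, 1.0, np.where(lwmpa_map < 0, 0.0, 0.5))

def fsitm_channel(hdr_ch, tmo_ch, lwmpa, lam=0.8):
    def scg(ref, tmo):
        # Equation (14): fraction of pixels on which the two maps agree.
        return np.mean(pcg(lwmpa(ref)) == pcg(lwmpa(tmo)))
    # Equation (15): blend of the linear- and log-domain HDR comparisons.
    return (lam * scg(hdr_ch, tmo_ch)
            + (1 - lam) * scg(np.log(hdr_ch + 1e-12), tmo_ch))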
B. Training of Parameters

The parameters of the fusion have been trained in order to provide higher generality of the proposed approach. Typically, the dataset is repeatedly randomly divided into training and testing parts and median results are provided. However, this still tests the performance within one dataset only. It has therefore been decided to train the parameters of the combination of the features on a completely different database.

For this purpose, the dataset developed by Yeganeh and Wang with TMQI [8] has been chosen. It is formed by 15 source images and 8 TMOs. Since the database contains only within-content ranks and, therefore, does not allow for any other analysis, a simple maximization of the average Kendall Rank Order Correlation Coefficient (KROCC) has been adopted. Only the ability to rank the content has, therefore, been trained. The resulting Features Fusion for Tone-Mapped Images (FFTMI) is, thus, defined as

FFTMI = \tau_1 \times SS\text{-}II + \tau_2 \times FN + \tau_3 \times FSITM_r + \tau_4 \times FSITM_g + \tau_5 \times FSITM_b,    (16)

where the parameters' values are determined by maximizing the average KROCC on Yeganeh's dataset and scaled so that the highest weight is equal to 1. The final values are τ1 = 0.2129, τ2 = 0.0443, τ3 = 1, τ4 = 0.0621, and τ5 = 0.0931. The highest weight has been assigned to FSITMr, the smallest to FN. Nevertheless, all the features provide a valuable contribution to the overall performance.

V. PROPOSED METRIC PERFORMANCE VERIFICATION

In this part, the performance of the proposed fusion metric is evaluated and compared to other existing criteria with available implementations on three publicly available datasets. The performance comparison procedures for Yeganeh's [8] and Čadík's databases are rank order correlation coefficients (namely KROCC and the Spearman Rank Order Correlation Coefficient – SROCC) only, since the nature of the subjective scores provided with the datasets does not allow for more sophisticated statistical analysis. Moreover, the same way has already been adopted when proposing new criteria [8], [10] and, therefore, allows for a direct comparison under the same conditions. The performance on the TMIQD dataset introduced in [5] is evaluated by the ROC based framework described and provided with the paper (http://mmtg.fel.cvut.cz/tmiqd-database/).

Table II
KROCC OF THE METRICS FOR THE DATASET DEVELOPED BY YEGANEH AND WANG [8].
(Columns 1–15 are the content numbers.)

Metric    1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    Average  Min
FFTMI     0.93  0.79  0.64  0.71  0.71  0.57  0.79  0.57  0.79  0.64  0.86  0.86  0.62  0.71  0.93  0.74     0.57
TMQI      0.79  0.64  0.64  0.71  0.64  0.93  0.57  0.57  0.57  0.86  0.71  0.57  0.55  0.64  0.86  0.68     0.55
TMQI-II   0.79  0.29  0.57  0.50  0.50  0.93  0.71  0.50  0.71  0.79  0.71  0.43  0.62  0.57  0.79  0.63     0.29
FSITMr    1.00  0.71  0.50  0.79  0.57  0.64  0.79  0.50  0.86  0.64  0.93  0.71  0.55  0.71  0.93  0.72     0.50
FSITMg    0.93  0.93  0.50  0.71  0.57  0.36  0.64  0.57  0.79  0.57  0.86  0.57  0.55  0.64  0.93  0.67     0.36
FSITMb    0.71  0.71  0.29  0.71  0.64  0.29  0.29  0.29  1.00  0.79  0.71  0.71  -0.25 0.50  1.00  0.56     -0.25

Table III
SROCC OF THE METRICS FOR THE DATASET DEVELOPED BY YEGANEH AND WANG [8].
(Columns 1–15 are the content numbers.)

Metric    1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    Average  Min
FFTMI     0.98  0.86  0.79  0.86  0.83  0.69  0.88  0.69  0.90  0.79  0.93  0.95  0.68  0.88  0.98  0.85     0.68
TMQI      0.90  0.79  0.81  0.88  0.74  0.98  0.69  0.71  0.69  0.93  0.88  0.71  0.68  0.74  0.95  0.81     0.68
TMQI-II   0.90  0.50  0.69  0.69  0.67  0.98  0.83  0.67  0.81  0.90  0.83  0.60  0.79  0.76  0.90  0.77     0.50
FSITMr    1.00  0.76  0.64  0.90  0.71  0.74  0.90  0.57  0.93  0.79  0.98  0.86  0.65  0.88  0.98  0.82     0.57
FSITMg    0.98  0.98  0.69  0.86  0.71  0.55  0.79  0.62  0.90  0.71  0.93  0.76  0.74  0.79  0.98  0.80     0.55
FSITMb    0.81  0.81  0.60  0.86  0.79  0.43  0.43  0.45  1.00  0.86  0.86  0.86  -0.18 0.71  1.00  0.68     -0.18
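To make the reported numbers reproducible in principle, the following sketch assembles the final FFTMI score from equation (16) with the trained weights and computes the per-content KROCC/SROCC as reported in Tables II and III. Only the weights come from the paper; the data-handling names are illustrative.

from scipy.stats import kendalltau, spearmanr

def fftmi(ss_ii, fn, fsitm_r, fsitm_g, fsitm_b):
    # Equation (16) with the weights trained on Yeganeh's dataset [8].
    return (0.2129 * ss_ii + 0.0443 * fn
            + 1.0 * fsitm_r + 0.0621 * fsitm_g + 0.0931 * fsitm_b)

def per_content_correlations(subjective, objective, contents):
    """KROCC/SROCC per source content; inputs aligned by index."""
    results = {}
    for c in set(contents):
        idx = [i for i, ci in enumerate(contents) if ci == c]
        s = [subjective[i] for i in idx]
        o = [objective[i] for i in idx]
        tau, _ = kendalltau(s, o)
        rho, _ = spearmanr(s, o)
        results[c] = (tau, rho)
    return results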

The recently introduced ESPL-LIVE HDR Image Quality Database [64] does not provide the original HDR images with their tone-mapped versions and is, therefore, not suitable for evaluating the performance of the full-reference metrics.

A. Dataset of Yeganeh and Wang [8]

Note that since the parameters have been trained on Yeganeh's database, the comparison on this data is not completely fair and is included for the purpose of completeness only. The resulting KROCC and SROCC per content, together with the average and minimum values, can be found in Tables II and III, respectively. In terms of KROCC, the proposed metric provides the highest average and minimum values. With respect to the SROCC, it still reaches the highest average coefficient value but results in the same minimum value as TMQI. Overall, the performance of the proposed metric is satisfactory and can be considered superior over the tested metrics.

It should be noted that Hadizadeh and Bajić [13] reported even higher mean KROCC and SROCC in their paper. However, these values were calculated from an 80-20 train-test procedure and are, therefore, not directly comparable. When the full database was used for training (as in our case), the performance on the TMIQD database [5] is lower than that of the proposed approach.

B. Dataset of Čadík et al. [6]

The database developed by Čadík et al. [6] contains 3 source images processed with 14 TMOs. The values of KROCC and SROCC, in the same format as in the previous case, are in Tables IV and V, respectively. In terms of KROCC, the proposed method ranks the highest in average value and shares the same highest minimum with TMQI-II. It should be noted that TMQI-II did not perform well on Yeganeh's dataset. This suggests a higher universality of the proposed FFTMI. Moreover, it reaches the highest average and minimum SROCC. Its performance can, therefore, again be considered superior over the other criteria.

Table IV
KROCC FOR THE DATASET DEVELOPED BY ČADÍK ET AL. [6].

Metric     1     2     3    Average   Min
FFTMI     0.64  0.74  0.77   0.72    0.64
TMQI      0.56  0.77  0.62   0.65    0.56
TMQI-II   0.67  0.64  0.69   0.67    0.64
FSITMr    0.44  0.67  0.59   0.56    0.44
FSITMg    0.44  0.87  0.62   0.64    0.44
FSITMb    0.41  0.56  0.74   0.57    0.41

Table V
SROCC FOR THE DATASET DEVELOPED BY ČADÍK ET AL. [6].

Metric     1     2     3    Average   Min
FFTMI     0.80  0.89  0.90   0.86    0.80
TMQI      0.71  0.91  0.77   0.80    0.71
TMQI-II   0.78  0.82  0.86   0.82    0.78
FSITMr    0.64  0.71  0.74   0.70    0.64
FSITMg    0.64  0.92  0.77   0.78    0.64
FSITMb    0.64  0.77  0.86   0.76    0.64

C. Tone-Mapped Images Quality Database (TMIQD) [5]

Finally, we determine the performance of the metrics with respect to the natural part of TMIQD (i.e. scenario 1A in [5]). The performance has been evaluated using the methodology proposed with the database (see Appendix) and calculated using the publicly available scripts. The results are depicted in Table VI.

Table VI
RESULTS FOR TMIQD [5].

Metric    AUCDS    pDS      AUCBW    pBW      C0      pC0
FFTMI      0.64     1        0.92     1       0.84     1
TMQI       0.55    0.02      0.63   <<0.01    0.58   <<0.01
TMQI-II    0.51   <0.01      0.69   <<0.01    0.67   <<0.01
FSITMr     0.58    0.10      0.72   <<0.01    0.65   <<0.01
FSITMg     0.49   <<0.01     0.57   <<0.01    0.55   <<0.01
FSITMb     0.53   <0.01      0.64   <<0.01    0.58   <<0.01

The value AUCDS quantifies the ability of the criteria to distinguish between images of similar and different qualities, while the AUCBW and C0 values show how reliable are


the metrics in the ranking of the images. More discussion about the meaning of these values can also be found in [65]. The p-values accompanying the results are the outcomes of the statistical analysis comparing the result of the proposed FFTMI with the method in the respective row. Note that the "<<" symbol identifies the cases where the p-value is lower by more than one order of magnitude. The p-values for AUCDS and AUCBW were obtained using the procedure proposed by Hanley and McNeil [66], while in the case of C0 the Fisher exact test [67] was employed.

The proposed method reaches the highest value in all aspects of the analysis. The only case where the difference in performance is not statistically significant at the confidence level of 95% is the AUCDS of FSITMr, where the p-value is 0.10. Moreover, the result of the proposed FFTMI suggests that there is still large space for improvement in terms of the separation of similar and different pairs. Future metrics should attempt to focus on this aspect in order to increase the reliability of quality estimations.

In terms of the correct recognition of the higher quality image (i.e. correct ranking of the images), a formidable difference in performance, compared to the other tested criteria, can be observed. All the p-values are lower than 4 · 10^−6. We can, therefore, conclude that the proposed FFTMI significantly outperforms all the tested algorithms.

Considering the outcomes from all three databases, the proposed metric generally provides a more reliable quality estimation compared to the state-of-the-art objective quality criteria. The selected features can, therefore, be considered highly relevant to the quality perception of natural tone-mapped images.

VI. CONCLUSION

This paper describes the design of a novel objective quality criterion, suitable for the automatic assessment of natural images after tone-mapping. The main contribution of the paper is in the feature selection strategy identifying particular estimators that are optimal with respect to capturing perceptually relevant aspects and mutually complementary. The proposed method is based on a linear combination of such features, which ensures transparency regarding their contributions and high generality.

The features were selected from a pool of 60 estimators. A modified version of the Las Vegas algorithm [24] was employed to preselect the most relevant subset, followed by the forward selection sequential algorithm to determine the final set. The selected estimators include the structural similarity part of TMQI-II [9], our novel measure of naturalness based on brightness, contrast, and colorfulness, and the feature similarity based on the locally weighted mean phase angle in all three color channels [10].

The combination is also meaningful with respect to the main aspects of quality perception, as described by Čadík et al. [6]. Structural similarity quantifies the reproduction of details and should also detect artifacts in the luminance domain. The feature naturalness determines if the combination of brightness, contrast, and color of the tone-mapped version is plausible for a natural looking image, and the feature similarity captures changes in the reproduction of details and detects the presence of artifacts in all three color channels.

The performance of the fusion was evaluated with respect to three publicly available databases of tone-mapped images. The proposed approach showed very good generality and managed to significantly outperform all the other tested state-of-the-art criteria.

As part of future work, we intend to study the possible extension of the proposed feature selection strategy to other means of feature fusion, beyond the linear combination, that may provide even better performance. Nevertheless, this would negatively affect the transparency, which was crucial for achieving the goals of the presented study. Furthermore, an effective extension of the metric towards tone-mapped video evaluation is planned.

APPENDIX
PERFORMANCE EVALUATION METHODOLOGY IN TMIQD

The framework for evaluating the performance of objective metrics with respect to the TMIQD (http://mmtg.fel.cvut.cz/tmiqd-database/) has been published in [5]. Two aspects are being tested:
Q1 How well can the metric distinguish between qualitatively distant and similar pairs?
Q2 How well can the metric recognize the better image in the pair (i.e. how well can the metric rank the images)?

The basic assumption is that a reliable metric should provide close scores for images which are qualitatively similar and more distant scores for qualitatively different images. It should also give a higher score to the qualitatively better image (in cases where we are able to determine which image is better, i.e. if the two images are significantly qualitatively different).

The framework of the whole methodology is depicted in Figure 4. The dataset is firstly divided into image pairs which are statistically significantly different and similar in quality (according to the subjective scores obtained from the pair comparison test). For these two groups, the distances of the objective scores for each pair are calculated. Considering the above-mentioned assumption, significantly different pairs should result in much higher distances. This is measured in the Different vs. Similar analysis (the top part of Figure 4). The outcome is the Area Under the ROC Curve (AUCDS) quantifying the metric's abilities with respect to aspect Q1.

In the second part, only significantly different pairs are considered and the differences of the objective scores are calculated for each of them. Here, a reliable criterion will provide the same sign of the objective score differences as the difference of the subjective scores obtained in the pair comparison experiment (i.e. if an image has a higher subjective value, it should also have a higher objective value). Two outcomes can be computed: the AUCBW value, signifying how well the differences of scores for the two groups are separated (see the bottom part of Figure 4), and the percentage of correct classification C0. A more detailed discussion about the difference between these two entities is provided in [65]. Nevertheless, both of the values provide information about the metric's performance regarding aspect Q2.


Figure 4. Framework of the evaluation methodology from [5], consisting of the Different vs. Similar analysis (AUC values showing how well the model distinguishes between significantly different and similar stimuli, plus a threshold on the model's score difference giving 95% probability that the images are significantly different) and the Better vs. Worse analysis (AUC values and the percentage of correct recognition of the qualitatively better stimulus of a pair).
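A minimal sketch of the two analyses follows, assuming per-pair inputs are available: the objective score difference of each pair, a flag for a statistically significant subjective difference, and the sign of the subjective difference. The function and variable names are illustrative, and the exact thresholding details of the published framework are not reproduced.

import numpy as np
from sklearn.metrics import roc_auc_score

def tmiqd_analyses(delta_model, is_different, subj_sign):
    d = np.asarray(delta_model, dtype=float)
    diff = np.asarray(is_different, dtype=bool)
    # Different vs. Similar: can |delta| separate the two groups? (AUC_DS)
    auc_ds = roc_auc_score(diff, np.abs(d))
    # Better vs. Worse: on significantly different pairs only, does the
    # signed delta point to the subjectively better image? (AUC_BW and C0)
    sign = np.asarray(subj_sign, dtype=float)[diff]
    auc_bw = roc_auc_score(sign > 0, d[diff])
    c0 = np.mean(np.sign(d[diff]) == sign)   # correct classification rate
    return auc_ds, auc_bw, c0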
ACKNOWLEDGMENT

This work was partially supported by the Czech Science Foundation within the project No. GA17-05840S "Multicriteria optimization of shift-variant imaging system models."

REFERENCES

[1] ITU-R Recommendation BT.2020, Parameter values for ultra-high definition television systems for production and international programme exchange, ITU-R Std., October 2015.
[2] F. Banterle, A. Artusi, K. Debattista, and A. Chalmers, Advanced High Dynamic Range Imaging: Theory and Practice. Natick, MA, USA: AK Peters (CRC Press), 2011.
[3] J. Petit and R. Mantiuk, "Assessment of video tone-mapping: Are cameras' s-shaped tone-curves good enough?" J. Vis. Commun. Image R., vol. 24, pp. 1020–1030, 2013.
[4] L. Krasula, M. Narwaria, K. Fliegel, and P. Le Callet, "Rendering of HDR content on LDR displays: An objective approach," in SPIE Applications of Digital Image Processing XXXVIII, 2015.
[5] ——, "Preference of experience in image tone-mapping: Dataset and framework for objective measures comparison," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 1, pp. 64–74, February 2017.
[6] M. Čadík, M. Wimmer, L. Neumann, and A. Artusi, "Evaluation of HDR tone mapping methods using essential perceptual attributes," Computers & Graphics, vol. 32, pp. 330–349, 2008.
[7] T. O. Aydın, R. Mantiuk, K. Myszkowski, and H.-P. Seidel, "Dynamic range independent image quality assessment," in International Conference on Computer Graphics and Interactive Techniques, 2008.
[8] H. Yeganeh and Z. Wang, "Objective quality assessment of tone-mapped images," IEEE Transactions on Image Processing, vol. 22, no. 2, pp. 657–667, February 2013.
[9] K. Ma, H. Yeganeh, K. Zeng, and Z. Wang, "High dynamic range image compression by optimizing tone mapped image quality index," IEEE Transactions on Image Processing, vol. 24, no. 10, pp. 3086–3097, 2015.
[10] H. Ziaei Nafchi, A. Shahkolaei, R. Farrahi Moghaddam, and M. Cheriet, "FSITM: A feature similarity index for tone-mapped images," IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1026–1029, 2015.
[11] M. Granados, T. Aydın, J. R. Tena, J. F. Lalonde, and C. Theobalt, "Contrast use metrics for tone mapping images," in IEEE International Conference on Computational Photography (ICCP), 2015.
[12] L. Krasula, K. Fliegel, P. Le Callet, and M. Klíma, "Objective evaluation of naturalness, contrast, and colorfulness of tone-mapped images," in Proc. SPIE 9217, Applications of Digital Image Processing XXXVII, 2014.
[13] H. Hadizadeh and I. V. Bajić, "Full-reference objective quality assessment of tone-mapped images," IEEE Transactions on Multimedia, vol. 20, pp. 392–404, 2018.
[14] A. K. Moorthy and A. C. Bovik, "A two-step framework for constructing blind image quality indices," IEEE Signal Processing Letters, vol. 17, no. 5, pp. 513–516, May 2010.
[15] A. Mittal, A. K. Moorthy, and A. C. Bovik, "Referenceless image spatial quality evaluation engine," in 45th Asilomar Conference on Signals, Systems and Computers, November 2011.
[16] M. A. Saad, A. C. Bovik, and C. Charrier, "Blind image quality assessment: A natural scene statistics approach in the DCT domain," IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3339–3352, 2012.
[17] A. Mittal, R. Soundararajan, and A. C. Bovik, "Making a completely blind image quality analyzer," IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209–212, 2013.
[18] D. Kundu, D. Ghadiyaram, A. C. Bovik, and B. L. Evans, "No-reference quality assessment of tone-mapped HDR pictures," IEEE Transactions on Image Processing, vol. 26, no. 6, pp. 2957–2971, June 2017.
[19] T. Aydın, A. Smolic, and M. Gross, "Automated aesthetic analysis of photographic images," IEEE Transactions on Visualization and Computer Graphics, vol. 21, no. 1, pp. 31–42, 2015.
[20] K. Gu, S. Wang, G. Zhai, S. Ma, X. Yang, W. Lin, W. Zhang, and W. Gao, "Blind quality assessment of tone-mapped images via analysis of information, naturalness, and structure," IEEE Transactions on Multimedia, vol. 18, pp. 432–443, 2016.
[21] H. Liu and L. Yu, "Towards integrating feature selection algorithms for classification and clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491–502, 2005.
[22] M. Nuutinen, T. Virtanen, and P. Oittinen, "Image feature subset for predicting the quality of consumer camera images and identifying quality dimensions," Journal of Electronic Imaging, vol. 23, no. 6, 2014.
[23] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining. Boston: Kluwer Academic, 1998.
[24] G. Brassard and P. Bratley, Fundamentals of Algorithms. New Jersey: Prentice Hall, 1996.
[25] T. Richter, "From index to metric: Using differential geometry to define a global visual quality metric," in SPIE 8135, Applications of Digital Image Processing XXXIV, 2011.
[26] K. Matković, L. Neumann, A. Neumann, T. Psik, and W. Purgathofer, "Global contrast factor – a new approach to image contrast," in Proceedings of the First Eurographics Conference on Computational Aesthetics in Graphics, Visualization and Imaging, ser. Computational Aesthetics'05. Aire-la-Ville, Switzerland: Eurographics Association, 2005, pp. 159–167. [Online]. Available: http://dx.doi.org/10.2312/COMPAESTH/COMPAESTH05/159-167
[27] S. Agaian, K. P. Lentz, and A. M. Grigoryan, "A new measure of image enhancement," in IASTED International Conference on Signal Processing and Communications, 2000.
[28] S. Agaian, B. Silver, and K. A. Panetta, "Transform coefficient histogram-based image enhancement algorithms using contrast entropy," IEEE Transactions on Image Processing, vol. 16, pp. 741–758, 2007.
[29] K. Panetta, Y. Zhou, and S. Agaian, "Nonlinear unsharp masking for mammogram enhancement," IEEE Transactions on Information Technology in Biomedicine, vol. 15, no. 4, pp. 918–928, 2011.
[30] K. Panetta, C. Gao, and S. Agaian, "No reference color image contrast and quality measures," IEEE Transactions on Consumer Electronics, vol. 59, no. 3, pp. 643–651, August 2013.
[31] Y.-Y. Fu, "Color image quality measures and retrieval," Ph.D. dissertation, Department of Computer Science, New Jersey Institute of Technology, 2003.
[32] S. Erasmus and K. Smith, "An automatic focusing and astigmatism correction system for the SEM and CTEM," in J. Microscopy, vol. 127, 1982, pp. 185–199.
[33] L. Firestone, K. Cook, N. Talsania, and K. Preston, "Comparison of autofocus methods for automated microscopy," in Cytometry, vol. 12, 1991, pp. 195–206.
[34] C. F. Batten, "Autofocusing and astigmatism correction in the scanning electron microscope," Master's thesis, University of Cambridge, Cambridge, U.K., 2000.
[35] X. Marichal, W. Ma, and H. J. Zhang, "Blur determination in the compressed domain using DCT information," in IEEE International Conference on Image Processing, vol. 2, 1999, pp. 386–390.
[36] N. Zhang, A. Vladar, M. Postek, and B. Larrabee, "A kurtosis-based statistical measure for two-dimensional processes and its application to image sharpness," in Proceedings Section of Physical and Engineering Sciences of American Statistical Society, 2003, pp. 4730–4736.
[37] P. Marziliano, F. Dufaux, S. Winkler, and T. Ebrahimi, "Perceptual blur and ringing metrics: Applications to JPEG2000," Signal Processing: Image Communication, vol. 19, no. 2, pp. 163–172, February 2004.
[38] D. Shaked and I. Tastl, "Sharpness measure: Towards automatic image enhancement," in IEEE International Conference on Image Processing, vol. 1, September 2005, pp. 937–940.
[39] R. Ferzli, L. J. Karam, and J. Caviedes, "A robust image sharpness metric based on kurtosis measurement of wavelet coefficients," in Proceedings of the 1st International Workshop on Video Processing and Quality Metrics for Consumer Electronics, 2005.
[40] R. Ferzli and L. J. Karam, "A no reference objective sharpness metric using Riemannian tensor," in Third International Workshop on Video Processing and Quality Metrics for Consumer Electronics VPQM-07, Scottsdale, Arizona, January 2007, pp. 25–26.
[41] ——, "A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB)," IEEE Transactions on Image Processing, vol. 18, no. 4, pp. 717–728, April 2009.
[42] N. Narvekar and L. J. Karam, "A no-reference image blur metric based on the cumulative probability of blur detection (CPBD)," IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2678–2683, September 2011.
[43] C. Vu, T. Phan, and D. Chandler, "S3: A spectral and spatial measure of local perceived sharpness in natural images," IEEE Transactions on Image Processing, vol. 21, no. 3, 2011.
[44] L. Krasula, P. Le Callet, K. Fliegel, and M. Klíma, "Quality assessment of sharpened images: Challenges, methodology, and objective metrics," IEEE Transactions on Image Processing, vol. 26, no. 3, pp. 1496–1508, 2017.
[45] P. V. Vu and D. M. Chandler, "A fast wavelet-based algorithm for global and local image sharpness estimation," IEEE Signal Processing Letters, vol. 19, no. 7, pp. 423–426, July 2012.
[46] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[47] J. Harel, C. Koch, and P. Perona, "Graph-based visual saliency," in Advances in Neural Information Processing Systems, 2006, pp. 545–552.
[48] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998.
[49] X. Hou and L. Zhang, "Saliency detection: A spectral residual approach," in IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[50] ——, "Dynamic visual attention: Searching for coding length increments," in Advances in Neural Information Processing Systems, 2008.
[51] L. Zhang, M. H. Tong, T. K. Marks, H. Shan, and G. W. Cottrell, "SUN: A Bayesian framework for saliency using natural statistics," Journal of Vision, vol. 8, no. 7, pp. 32, 1–20, 2008.
[52] Q. Sang, X. Wu, C. Li, and Y. Lu, "Universal blind image quality assessment using contourlet transform and singular-value decomposition," Journal of Electronic Imaging, vol. 23, no. 6, 2014.
[53] W. Xue, L. Zhang, and X. Mou, "Learning without human scores for blind image quality assessment," in IEEE Conference on Computer Vision and Pattern Recognition, 2013.
[54] L. Liu, H. Dong, H. Huang, and A. C. Bovik, "No-reference image quality assessment in curvelet domain," Signal Processing: Image Communication, vol. 29, no. 4, pp. 494–505, 2014.
[55] L. Krasula, M. Narwaria, and P. Le Callet, "An automated approach for tone mapping operator parameter adjustment in security applications," in SPIE 9138, Optics, Photonics, and Digital Technologies for Multimedia Applications III, 2014.
[56] H. R. Sheikh, A. C. Bovik, and L. Cormack, "No-reference quality assessment using natural scene statistics: JPEG2000," IEEE Transactions on Image Processing, vol. 14, no. 11, pp. 1918–1927, November 2005.
[57] Z. Wang, H. Sheikh, and A. Bovik, "No-reference perceptual quality assessment of JPEG compressed images," in IEEE International Conference on Image Processing, vol. 1, September 2002, pp. I-477–I-480.
[58] J. A. Nelder and R. Mead, "A simplex method for function minimization," The Computer Journal, vol. 7, no. 4, pp. 308–313, 1965.
[59] A. R. Conn, N. I. M. Gould, and P. L. Toint, "A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds," SIAM Journal on Numerical Analysis, vol. 28, no. 2, pp. 545–572, 1991.
[60] R. M. Lewis, A. Shepherd, and V. Torczon, "Implementing generating set search methods for linear constraint minimization," SIAM Journal on Scientific Computing, vol. 29, no. 6, pp. 2507–2530, 2007.
[61] W. J. Crozier, "On the variability of critical illumination for flicker fusion and intensity discrimination," Journal of General Physiology, vol. 19, no. 3, pp. 503–522, 1935.
[62] J. L. Mannos and D. J. Sakrison, "The effects of a visual fidelity criterion on the encoding of images," IEEE Transactions on Information Theory, vol. 20, no. 4, pp. 525–536, 1974.
[63] D. H. Kelly, "Effects of sharp edges on the visibility of sinusoidal gratings," Journal of the Optical Society of America, vol. 60, no. 1, pp. 98–103, 1970.
[64] D. Kundu, D. Ghadiyaram, A. C. Bovik, and B. L. Evans, "Large-scale crowdsourced study for tone-mapped HDR pictures," IEEE Transactions on Image Processing, vol. 26, no. 10, pp. 4725–4740, October 2017.
[65] L. Krasula, K. Fliegel, P. Le Callet, and M. Klíma, "On the accuracy of objective image and video quality models: New methodology for performance evaluation," in International Conference on Quality of Multimedia Experience (QoMEX), 2016.
[66] J. A. Hanley and B. J. McNeil, "A method of comparing the areas under two ROC curves derived from the same cases," Radiology, vol. 148, pp. 839–843, 1983.
[67] R. A. Fisher, "On the interpretation of χ² from contingency tables, and the calculation of P," Journal of the Royal Statistical Society, vol. 85, no. 1, pp. 87–94, 1922.
