

International Journal of Wavelets, Multiresolution


and Information Processing
Vol. 15, No. 4 (2017) 1750037 (15 pages)
© World Scientific Publishing Company
DOI: 10.1142/S0219691317500370

Multi-focus image fusion and super-resolution with convolutional neural network

Bin Yang∗ , Jinying Zhong† , Yuehua Li‡ and Zhongze Chen§


College of Electric Engineering, University of South China
Hengyang 421001, P. R. China
∗yangbin01420@163.com

†jy zhong1015@163.com
‡bhylyh.com@163.com
§zzchen801@163.com

Received 21 September 2016


Revised 21 February 2017
Accepted 26 April 2017
Published 25 May 2017

The aim of multi-focus image fusion is to create a synthetic all-in-focus image from
several images, each obtained with a different focus setting. However, if the resolution
of the source images is low, the images fused by traditional methods are also of low
quality, which hinders further image analysis even though the fused image is all-in-focus.
This paper presents a novel joint multi-focus image fusion and super-resolution method
based on a convolutional neural network (CNN). The first-layer network features of the
different source images are fused under the guidance of the local clarity computed from
the source images. The final high-resolution fused image is obtained with the
reconstruction network filters, which act like averaging filters. The experimental results
demonstrate that the proposed approach generates fused images with better visual quality
and acceptable computational efficiency compared with other state-of-the-art methods.

Keywords: Multi-focus image fusion; super-resolution; convolutional neural networks.

AMS Subject Classification: 62H35, 68U10

1. Introduction
Due to the finite depth of field of optical lenses, only objects at a certain distance
from the lens can be recorded clearly, so it is difficult to capture an image in which
all objects are in focus. The problem can be solved by multi-focus image fusion
technology, which creates a synthetic all-in-focus image by combining two or more images
of a scene obtained with different focus settings.6,13,8,9 In the past two decades,
researchers have proposed various fusion approaches, which can be categorized into
spatial domain and transform domain methods according to the stage at which the fusion
operation is performed.6

The spatial domain-based methods select the weights of the image pixels with a spatial
clarity measurement, while the transform domain-based methods fuse the transform
coefficients of different source images under the guidance of coefficient activity.
Multi-resolution transforms such as the wavelet transform2 and the dual-tree complex
wavelet transform (DTCWT)5 are usually used to perform the fusion.
In addition, most consumer-level image sensors also have limitations with respect
to their maximum resolution. If the resolution of the source images is low, the fused
image will still be of low resolution, which hinders many further image processing
tasks. Single-image super-resolution (SR) technology can be used to improve
the image resolution.18,17,2 Bilinear and bicubic interpolation are the basic super-
resolution methods, and they are still very popular in many applications due to their
simplicity and low computational cost. Recent state-of-the-art super-resolution methods
are mostly example-based, exploiting internal similarities or learning mapping
functions from external low- and high-resolution exemplar pairs.18,2 In particular, a
convolutional neural network-based super-resolution method (SRCNN) has been proposed,
which provides a direct end-to-end mapping between low- and high-resolution
images.2 Since the method does not need to solve any optimization problem, the
convolutional neural network (CNN)-based method is more efficient than most of
the example-based methods.
For the traditional multi-focus methods, the full-resolution source images must be
acquired in advance, which increases the storage burden as sensor data volumes grow.
In this paper, a novel joint multi-focus image fusion and super-resolution method
based on the CNN technique is proposed. The first-layer convolutional features are
fused according to the image local clarity. The final high-resolution fused image is
obtained with the reconstruction filters, which act like averaging filters. Several
pairs of multi-focus images are used to test the performance of the proposed method.
The experimental results demonstrate that the proposed approach generates fused
images with lower reconstruction error and better visual quality. Compared with
state-of-the-art methods such as those based on guided filtering,7 DTCWT,5 and the
dense scale-invariant feature transform (DSIFT),11 the proposed method also provides
better fused results in terms of both visual quality and objective evaluations.
Overall, the contributions of this study are mainly in three aspects:

(1) We present a novel joint multi-focus image fusion and super-resolution method
    based on a CNN, which directly produces super-resolution and all-in-focus output
    images simultaneously. The end-to-end CNN mapping is used to increase the
    resolution of the source images, and the first-layer convolutional features, which
    respond to the local structure of the source images, are fused to achieve the
    fusion operation. The proposed framework delivers state-of-the-art fusion output
    at a faster speed. We also explore several parameter settings to achieve better
    performance of the proposed method.
(2) A two-scale clarity measurement based on the spatial frequency (SF) is used to
    construct the fusion weight maps for the different source images.
    Furthermore, the weight maps are refined by a morphological operator and the
    guided filter, which makes full use of spatial consistency and preserves the
    intrinsic edge structures of the source images in the refined weight maps.
(3) We analyze the relationship between our CNN-based image fusion method and
    traditional sparse-coding-based image fusion methods. This relationship provides
    guidance for the design of the fusion strategy: the first CNN layer can be viewed
    as the patch extraction and representation stage of the sparse-coding-based fusion
    framework, so the first-layer network features can be fused to achieve the
    information fusion. The effectiveness of the proposed method also opens the
    possibility of designing more advanced CNN-based image fusion methods.
The remainder of this paper is organized as follows. Section 2 describes the
construction of the fusion weights from the image local clarity. Section 3 briefly
reviews the CNN-based image super-resolution framework. In Sec. 4, the steps of the
proposed scheme are presented in detail. The experimental results and discussions are
presented in Sec. 5. Finally, the conclusion of this work is given in Sec. 6.

2. Fusion Weights Construction


The key issue for both the spatial domain and the transform domain methods is the
focus measure of image pixels or transformed coefficients. Various image focus
measures have been proposed. Local sharpness features such as the SF,10 the
sum-modified-Laplacian (SML),4 and the energy of morphological gradients1 are usually
used to determine the clarity of the image pixels. These focus measures conform to
human visual perception, especially in texture or edge regions. In this paper, a
two-scale focus measurement is used to construct the fusion weight. The SF is used as
the image focus measure, and morphological operators and guided image filtering3 are
used for consistency verification. The fusion weight is constructed as shown in
Fig. 1 and is then used to fuse the network features of the source images.

Fig. 1. The flowchart of the fusion weight construction.


The local SF matrices S_A and S_B are calculated from the source images A and B,
respectively, as

S(i, j) = \sqrt{\mathrm{RF}^2(i, j) + \mathrm{CF}^2(i, j)},    (2.1)

where S(i, j) is the SF of the image patch at position (i, j). The row frequency RF
and the column frequency CF are calculated as

\mathrm{RF}(i, j) = \sqrt{\frac{1}{m \times m} \sum_{i=2}^{m} \sum_{j=1}^{m} [I(i, j) - I(i - 1, j)]^2}    (2.2)

and

\mathrm{CF}(i, j) = \sqrt{\frac{1}{m \times m} \sum_{i=1}^{m} \sum_{j=2}^{m} [I(i, j) - I(i, j - 1)]^2},    (2.3)

respectively. The image I can be A or B. The matrices S_A and S_B indicate the focus
measurement at the fine scale and are robust in texture or edge regions. However, in
smooth regions they are not always effective. To overcome this problem, the smoothed
SF matrices G_A and G_B are obtained by applying a Gaussian filter to S_A and S_B,
respectively; they indicate the focus measurement at the large scale. Both the
fine-scale and the large-scale focus measurements are considered to construct the
fusion weight map. The preliminary fusion weight maps are obtained as

W_d(i, j) = \begin{cases} 1, & S_A(i, j) \ge S_B(i, j), \\ 0, & \text{otherwise}, \end{cases}    (2.4)

and

W_r(i, j) = \begin{cases} 1, & G_A(i, j) \ge G_B(i, j), \\ 0, & \text{otherwise}, \end{cases}    (2.5)
respectively. The morphological opening and closing operators with a 15 × 15
structuring element whose entries are all logical "1" are used to reduce the effect
of noise on the weight map W_r. Then, at the edge positions of W_r, the weight values
are replaced by the corresponding values of W_d. The combined fusion weight map,
denoted W_f, is further refined by guided image filtering, with the source images used
as the guide images. The filter size γ and the blur degree ω are set to 15 and 10^{-3},
respectively. The final fusion weight map is denoted W.
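As an illustration, a minimal Python/NumPy sketch of the weight construction described
above is given below. It assumes registered grayscale images with values in [0, 1] and
relies on SciPy for the smoothing and morphological operations; the helper names
(local_sf, guided_filter, build_weight_map) and the edge-detection step are illustrative
choices, not the authors' original implementation.

import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter, grey_opening, grey_closing

def local_sf(img, m=5):
    # Local spatial frequency, Eqs. (2.1)-(2.3), over m x m neighborhoods.
    dr = np.zeros_like(img); dr[1:, :] = (img[1:, :] - img[:-1, :]) ** 2   # squared row differences
    dc = np.zeros_like(img); dc[:, 1:] = (img[:, 1:] - img[:, :-1]) ** 2   # squared column differences
    rf2 = uniform_filter(dr, size=m)    # mean squared row difference in the patch
    cf2 = uniform_filter(dc, size=m)    # mean squared column difference in the patch
    return np.sqrt(rf2 + cf2)

def guided_filter(guide, p, r=15, eps=1e-3):
    # Gray-scale guided filter (Ref. 3) with box windows of radius r.
    s = 2 * r + 1
    m_I, m_p = uniform_filter(guide, s), uniform_filter(p, s)
    var_I = uniform_filter(guide * guide, s) - m_I ** 2
    cov_Ip = uniform_filter(guide * p, s) - m_I * m_p
    a = cov_Ip / (var_I + eps)
    b = m_p - a * m_I
    return uniform_filter(a, s) * guide + uniform_filter(b, s)

def build_weight_map(A, B, m=5, g_size=45, g_sigma=15, se=15):
    SA, SB = local_sf(A, m), local_sf(B, m)                            # fine-scale focus measures
    trunc = (g_size // 2) / g_sigma                                    # approximate a 45 x 45 Gaussian window
    GA = gaussian_filter(SA, g_sigma, truncate=trunc)                  # large-scale focus measures
    GB = gaussian_filter(SB, g_sigma, truncate=trunc)
    Wd = (SA >= SB).astype(float)                                      # Eq. (2.4)
    Wr = (GA >= GB).astype(float)                                      # Eq. (2.5)
    Wr = grey_closing(grey_opening(Wr, size=(se, se)), size=(se, se))  # suppress isolated noise
    gy, gx = np.gradient(Wr)
    edges = (np.abs(gy) + np.abs(gx)) > 0                              # edge positions of Wr
    Wf = np.where(edges, Wd, Wr)                                       # fine-scale decision at edges
    W = guided_filter(A, Wf, r=15, eps=1e-3)                           # refinement with guide image A
    return np.clip(W, 0.0, 1.0)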

3. Image Super-Resolution Convolutional Neural Network


Different from traditional super-resolution methods, SRCNN2 has several appealing
properties. It has a simple structure but provides superior accuracy compared with
state-of-the-art example-based methods. Because it is fully feed-forward and no
optimization problem has to be solved at usage time, SRCNN is faster than most other
methods.

Fig. 2. An overview of the CNN model for image SR.


With larger datasets or models, the restoration quality of the network can be further
improved.
In our proposed method, the three-layer CNN model shown in Fig. 2 is used to achieve
the image super-resolution.


The first convolutional layer convolves the image with a set of filters, which can
be expressed as
F1 = max(0, W1 ∗ X + B1 ), (3.1)
where W1 and B1 are the filters and biases, respectively, X is the input image,
and ∗ denotes the convolution operation. We adopt the default network parameters
defined in Ref. 9. W1 contains 64 network filters, each of size 9 × 9, and B1 is a
64-dimensional vector, each element of which is associated with a filter. The filters
W1 can also be seen as bases or overcomplete dictionary atoms in sparse representation
theory. The first layer convolves the image with the 64 filters in W1, and the output
F1 is composed of 64 feature maps. Intuitively, the first-layer operation is equivalent
to extracting a 64-dimensional feature for each local patch of the source image.
The second layer can be expressed as
F2 = max(0, W2 ∗ F1 + B2 ). (3.2)
We define the second convolutional layer W2 as 32 filters of size 64 × 1 × 1. Thus,
this layer performs a nonlinear mapping from each 64-dimensional feature to a
32-dimensional vector, denoted F2. Each feature in F2 is a linear combination of the
64-dimensional features in F1. Therefore, F2 represents more complex features,
conceptually a representation of a high-resolution patch that will be used for
reconstruction. Since the Rectified Linear Unit (ReLU, max(0, x))2 is applied to the
filter responses, both convolutional layers F1 and F2 are nonlinear operations.
The reconstruction layer is also defined as a convolutional operation:
Y = W3 ∗ F2 + B3 , (3.3)
where W3 corresponds to a filter of size 32 × 5 × 5 and B3 is a scalar parameter.
The filter W3 acts like an averaging operation over the overlapping high-resolution
patches to produce the final full image. Although the above three operations have
different interpretations, they can all be implemented as convolutional layers.
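The three operations of Eqs. (3.1)-(3.3) amount to a short feed-forward pass. The
following Python/NumPy sketch is illustrative only: it assumes the pre-trained weights
W1 (64 filters of 9 × 9), W2 (32 filters of 64 × 1 × 1), W3 (one filter of 32 × 5 × 5)
and the corresponding biases are already available, and it ignores border handling.

import numpy as np
from scipy.signal import correlate2d

def conv_layer(feat, W, b):
    # feat: (C_in, H, W) feature maps; W: (C_out, C_in, k, k) filters; b: (C_out,) biases.
    out = np.zeros((W.shape[0],) + feat.shape[1:])
    for o in range(W.shape[0]):
        for c in range(W.shape[1]):
            out[o] += correlate2d(feat[c], W[o, c], mode="same")   # per-channel correlation
        out[o] += b[o]
    return out

def srcnn_forward(x, params):
    # x: bicubic-upscaled low-resolution image of shape (H, W); params: (W1, b1, W2, b2, W3, b3).
    W1, b1, W2, b2, W3, b3 = params
    f1 = np.maximum(0.0, conv_layer(x[None, ...], W1, b1))   # Eq. (3.1): 64 feature maps + ReLU
    f2 = np.maximum(0.0, conv_layer(f1, W2, b2))             # Eq. (3.2): nonlinear mapping to 32 maps
    return conv_layer(f2, W3, b3)[0]                         # Eq. (3.3): reconstruction (no ReLU)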

Training the end-to-end mapping amounts to updating the CNN parameters (weights W1,
W2, W3 and biases B1, B2, B3) with low/high-resolution training data.2 In the proposed
method, the parameters are updated by stochastic gradient descent with standard
back-propagation.15 Because the CNN is fully feed-forward and no optimization problem
has to be solved at usage time, the time consumed by the SR operation is acceptable.
It is possible to add more convolutional layers to increase the nonlinearity, but this
would increase the complexity of the model and thus demand a larger dataset and more
training time. Notice that the predefined upscale factor can be set to 2, 3, or 4, and
the CNN structure does not change for different upscale settings. In the training
phase, the ground-truth images are randomly cropped into patches as training samples.
The corresponding low-resolution training samples are synthesized by blurring the
ground-truth patches with a Gaussian kernel, subsampling them by the upscale factor,
and upscaling them by the same factor via bicubic interpolation.
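For illustration, the training-pair synthesis described above can be sketched as
follows. The blur width sigma is a placeholder (its value is not stated here), the
patch size is assumed to be divisible by the upscale factor, and SciPy's cubic-spline
zoom stands in for bicubic interpolation.

import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def synthesize_lr(hr_patch, factor=2, sigma=1.0):
    # Degrade a ground-truth patch into the corresponding low-resolution training input.
    blurred = gaussian_filter(hr_patch, sigma)     # Gaussian blurring
    lr = blurred[::factor, ::factor]               # subsample by the upscale factor
    return zoom(lr, factor, order=3)               # upscale back to the original size (cubic spline)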

4. Convolutional Neural Networks for Multi-Focus Image Fusion


In this paper, we assume that there are two multi-focus source images and that the
source images have been registered, i.e. the objects in all images are geometrically
aligned. The image fusion scheme using CNNs is summarized in Fig. 3. The whole fusion
procedure of the proposed method takes the following steps:
(1) The two low-resolution images A and B are upscaled by bicubic interpolation to the
    desired size. Our goal is to obtain a fused image Y with the same resolution as
    the ground-truth high-resolution image. For ease of presentation, we still call
    A and B low-resolution images, although they have the same size as the desired
    high-resolution image.
(2) The source images A and B are convolved with the 64 pre-trained network filters of
    the first layer. The output of the first layer contains 64 feature maps, denoted
    F1 (A) and F1 (B), respectively.

Fig. 3. The framework of the proposed non-linear method.

    Due to the convolution in the image domain, each local image patch is projected
    onto a 64-dimensional feature, and each feature corresponds to a network filter or
    basis. This process is equivalent to the sparse-coding solver projecting the patch
    onto a (low-resolution) dictionary; sparse coding can be viewed as a one-layer CNN.
    Like wavelet bases or sparse representation dictionary atoms, the CNN network
    filters are designed to represent the local salient features of an image.
    Therefore, we can achieve the information fusion by combining the local feature
    maps F1 (A) and F1 (B).
(3) The above analysis also helps us design the fusion rule. For multi-focus image
    fusion, we want all focused regions of the source images to be selected to
    construct the fused image. Therefore, the first-layer feature maps of the
    different source images are combined directly with the fusion weights constructed
    in Sec. 2 as

    F1 = W ⊙ F1 (A) + (1 − W ) ⊙ F1 (B), (4.1)

    where F1 denotes the fused feature maps, W is the fusion weight map constructed
    as in Fig. 1, and ⊙ denotes element-wise multiplication.
(4) The fused first-layer feature maps F1 are propagated through the second
    (nonlinear mapping) CNN layer and the third (reconstruction) layer serially to
    construct the final high-resolution fused all-in-focus image. A minimal sketch of
    the whole procedure is given after this list.
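The sketch below puts steps (1)-(4) together. It reuses the illustrative helpers
sketched earlier (build_weight_map from Sec. 2 and conv_layer from Sec. 3) and assumes
pre-trained SRCNN parameters; it omits color handling and border effects and is not the
authors' original implementation.

import numpy as np
from scipy.ndimage import zoom

def fuse_sr(A_lr, B_lr, params, factor=2):
    # Joint multi-focus fusion and super-resolution following steps (1)-(4) above.
    # Requires build_weight_map (Sec. 2 sketch) and conv_layer (Sec. 3 sketch).
    W1, b1, W2, b2, W3, b3 = params
    A = zoom(A_lr, factor, order=3)                            # step (1): upscale both inputs
    B = zoom(B_lr, factor, order=3)
    F1A = np.maximum(0.0, conv_layer(A[None, ...], W1, b1))    # step (2): first-layer features of A
    F1B = np.maximum(0.0, conv_layer(B[None, ...], W1, b1))    # step (2): first-layer features of B
    W = build_weight_map(A, B)                                 # fusion weight map of Sec. 2
    F1 = W * F1A + (1.0 - W) * F1B                             # step (3): Eq. (4.1), weight broadcast over maps
    F2 = np.maximum(0.0, conv_layer(F1, W2, b2))               # step (4): nonlinear mapping layer
    return conv_layer(F2, W3, b3)[0]                           # step (4): reconstruction layer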

5. Experiments and Analysis


Four pairs of multi-focus natural images, shown in Figs. 5(a), 5(b), 6(a), 6(b), 7(a),
7(b), 8(a) and 8(b), are used to test the performance of the proposed fusion scheme.
Figure 5(a) is near-focused: the left flowerpot is in focus and clear whereas the right
clock is blurred. Figure 5(b) is far-focused, and the situations of the flowerpot and
the clock are reversed. The remaining examples follow the same pattern: the near object
on the left is in focus in Figs. 6(a), 7(a) and 8(a), while Figs. 6(b), 7(b) and 8(b)
are focused on the far objects. These multi-focus images are used as the reference
high-resolution images for the evaluations. The experiments are carried out on
artificial low-resolution images, which are generated from the reference
high-resolution images by Gaussian filtering and down-sampling with a given sub-sample
factor. For ease of presentation, we only show the artificial low-resolution images
with sub-sample factor 2, as in Figs. 5(c), 5(d), 6(c), 6(d), 7(c), 7(d), 8(c) and
8(d). All the experiments are implemented on an AMD 2.70 GHz PC with Matlab 2010a.
In addition, to make an objective comparison between the proposed method and the other
methods, the quantitative evaluation criteria QW14 and QAB/F16 are used to compare the
different fusion methods. Both criteria should be as close to 1 as possible.
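For reference, a simplified Python sketch of Piella's weighted fusion quality index QW
(Ref. 14) is given below, using local variance as the saliency measure and the universal
quality index Q0 computed over sliding windows; the window size is an assumption, and
the edge-based QAB/F metric (Ref. 16) is not reproduced here, so this only approximates
the published metrics.

import numpy as np
from scipy.ndimage import uniform_filter

def q0_map(x, y, size=7, eps=1e-12):
    # Universal quality index Q0 between x and y over size x size windows.
    mx, my = uniform_filter(x, size), uniform_filter(y, size)
    sxx = uniform_filter(x * x, size) - mx ** 2
    syy = uniform_filter(y * y, size) - my ** 2
    sxy = uniform_filter(x * y, size) - mx * my
    return (4 * sxy * mx * my) / ((sxx + syy) * (mx ** 2 + my ** 2) + eps)

def piella_qw(A, B, F, size=7, eps=1e-12):
    # Weighted fusion quality: windows with higher saliency contribute more.
    sA = uniform_filter(A * A, size) - uniform_filter(A, size) ** 2   # local variance as saliency
    sB = uniform_filter(B * B, size) - uniform_filter(B, size) ** 2
    lam = sA / (sA + sB + eps)                                        # relative saliency of A
    c = np.maximum(sA, sB)                                            # per-window importance
    q = lam * q0_map(A, F, size) + (1 - lam) * q0_map(B, F, size)
    return float(np.sum(c * q) / (np.sum(c) + eps))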
In the proposed method, the size m of the local image patch is a free parameter. We
apply the two objective metrics QW and QAB/F to evaluate the impact of this parameter
on fusion performance.

Fig. 4. Comparison of the fused results with the parameter m ranging between [3, 57]. (a) QW
values of the test images. (b) QAB/F values of the test images.

The comparison of the fused results with the parameter m ranging over [3, 57] is shown
in Fig. 4. From Fig. 4, we can conclude that the performance of the proposed method is
nearly unaffected by the block size m in terms of both criteria. However, a large block
size increases the running time. Therefore, we set m equal to 5 in the proposed method.

Fig. 5. The fused "Flowerpot" images by different methods. (a) and (b) are the
high-resolution source multi-focus images of size 480 × 640; (c) and (d) are the
corresponding artificial low-resolution source multi-focus images of size 240 × 320,
which have been zoomed with bicubic interpolation; (e) the fused result of the
SWT-based method; (f) the fused result of the DTCWT-based method; (g) the fused result
of the GFF method; (h) the fused result of DSIFT; (i) the fused result of the proposed
method.

The size of the Gaussian filter and its standard deviation are set to 45 × 45 and 15,
respectively, which provides relatively better fused results. The structuring element
of the morphological opening and closing operators is a 15 × 15 matrix with all
element values equal to 1. The guided filter size γ and the blur degree ω are set to
15 and 10^{-3}, respectively.
The state-of-the-art fusion methods based on the stationary wavelet transform (SWT),12
DTCWT,5 guided filtering fusion (GFF),7 and DSIFT11 are used to test the effectiveness
of the proposed method. For a fair comparison, the low-resolution source images are
first enhanced with the same SRCNN network to obtain high-resolution images, and the
various fusion methods are then performed on the enhanced images. The implementations
of GFF and DSIFT are the publicly available codes provided by the authors. The "db1"
wavelet basis and a three-level transform are used for the SWT-based method. The DTCWT
is also decomposed into three levels, and the filters are set to "near sym b" and
"qshift b", respectively. The low-frequency coefficients are fused using the averaging
rule, and the high-frequency coefficients are fused with the maximum selection rule.
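As a point of reference, the SWT comparison rule (averaging for the low-frequency band,
maximum-absolute selection for the high-frequency bands) can be sketched with
PyWavelets as below; this assumes image dimensions divisible by 2^3 for a three-level
"db1" stationary transform and only approximates the reference implementation.

import numpy as np
import pywt

def swt_fuse(A, B, wavelet="db1", level=3):
    # Fuse two registered images with the stationary wavelet transform.
    cA = pywt.swt2(A, wavelet, level=level)
    cB = pywt.swt2(B, wavelet, level=level)
    fused = []
    for (aL, (aH, aV, aD)), (bL, (bH, bV, bD)) in zip(cA, cB):
        low = 0.5 * (aL + bL)                                   # averaging rule for approximation band
        high = tuple(np.where(np.abs(a) >= np.abs(b), a, b)     # maximum-selection rule for detail bands
                     for a, b in ((aH, bH), (aV, bV), (aD, bD)))
        fused.append((low, high))
    return pywt.iswt2(fused, wavelet)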
The fused results of the methods based on SWT, DTCWT, GFF, and DSIFT are presented in
Figs. 5(e)–5(h), 6(e)–6(h), 7(e)–7(h) and 8(e)–8(h), respectively. For ease of
presentation, we only list the fused results with upscale factor 2. The fused results
of the proposed method are given in Figs. 5(i), 6(i), 7(i) and 8(i). We can see that
all the results exhibit improved visual quality, and the objects in the source images
are clearly preserved in the fused images. Careful inspection of Figs. 5(c)–5(h)
reveals that the results of the proposed method exhibit the best visual quality. The
remaining examples in Figs. 6–8 show similar fused results, and the proposed method
again provides the best visual results. The marked regions are magnified and shown at
the bottom right of the result images in Fig. 8. All the processing results exhibit
enhanced visual quality with improved image resolution. However, there are serious
blurring artifacts in the SWT and DTCWT fused images because of their down-sampling
process. Although the fused results obtained by GFF and DSIFT have a better appearance
than those of SWT and DTCWT, there are also some reconstruction artifacts in the GFF
and DSIFT fused images. The proposed method provides the result with the best visual
appearance.

Fig. 6. The fused "Clock" images by different methods. (a) and (b) are the
high-resolution source multi-focus images of size 512 × 512; (c) and (d) are the
corresponding artificial low-resolution source multi-focus images of size 256 × 256,
which have been zoomed with bicubic interpolation; (e) the fused result of the
SWT-based method; (f) the fused result of the DTCWT-based method; (g) the fused result
of the GFF method; (h) the fused result of DSIFT; (i) the fused result of the proposed
method.

The QW and QAB/F values between the high-resolution source images and the fused images
of the five methods with upscale factors 2, 3, and 4 are listed in Tables 1–3,
respectively. The values in bold indicate the highest quality measure obtained over
all fusion methods.

Fig. 7. The fused "Bottle" images by different methods. (a) and (b) are the
high-resolution source multi-focus images of size 320 × 320; (c) and (d) are the
corresponding artificial low-resolution source multi-focus images of size 160 × 160,
which have been zoomed with bicubic interpolation; (e) the fused result of the
SWT-based method; (f) the fused result of the DTCWT-based method; (g) the fused result
of the GFF method; (h) the fused result of DSIFT; (i) the fused result of the proposed
method.

For Tables 1 and 3, the results of the proposed method are obviously better than those
of the other four methods on both criteria. For Table 2, when the upscale factor is 3,
the proposed method also provides competitive results. The experimental results
demonstrate the effectiveness of the proposed method. In addition, in order to estimate
the computational efficiency of the proposed method, the time costs for the different
source images are listed in Table 4.


Fig. 8. The fused "Pepsi" images by different methods. (a) and (b) are the
high-resolution source multi-focus images of size 512 × 512; (c) and (d) are the
corresponding artificial low-resolution source multi-focus images of size 256 × 256,
which have been zoomed with bicubic interpolation; (e) the fused result of the
SWT-based method; (f) the fused result of the DTCWT-based method; (g) the fused result
of the GFF method; (h) the fused result of DSIFT; (i) the fused result of the proposed
method.

Since the CNN structure is the same for different upscale factors, we only present the
time consumed by the different methods with upscale factor 2. From Table 4, we can see
that the proposed method performs the fastest, followed by the DTCWT- and GFF-based
methods, while the DSIFT method is the slowest.

Table 1. Objective performances of different fusion methods with resolution upscale factor 2.

Source images   Evaluation criteria   SWT      DTCWT    GFF      DSIFT    Ours
Flowerpot       QW                    0.8036   0.8070   0.8230   0.8234   0.8238
                QAB/F                 0.5537   0.5468   0.5626   0.5631   0.5635
Clock           QW                    0.7716   0.7803   0.8332   0.8248   0.8351
                QAB/F                 0.6256   0.6097   0.6334   0.6364   0.6434
Bottle          QW                    0.9053   0.9008   0.9056   0.9074   0.9076
                QAB/F                 0.6374   0.6282   0.6339   0.6305   0.6378
Pepsi           QW                    0.7754   0.7795   0.7857   0.7849   0.7859
                QAB/F                 0.4316   0.4319   0.4348   0.4357   0.4366

Table 2. Objective performances of different fusion methods with resolution upscale factor 3.

Source images   Evaluation criteria   SWT      DTCWT    GFF      DSIFT    Ours
Flowerpot       QW                    0.5663   0.5784   0.6011   0.6023   0.6019
                QAB/F                 0.3888   0.3799   0.3882   0.3883   0.3889
Clock           QW                    0.5396   0.5369   0.5769   0.5737   0.5781
                QAB/F                 0.4804   0.4687   0.4808   0.4841   0.4852
Bottle          QW                    0.5802   0.5959   0.5965   0.5984   0.5981
                QAB/F                 0.4818   0.4786   0.4720   0.4773   0.4774
Pepsi           QW                    0.2133   0.2160   0.2343   0.2352   0.2360
                QAB/F                 0.2125   0.2139   0.2194   0.2205   0.2202

Table 3. Objective performance of different fusion methods with resolution upscale factor 4.

Source images   Evaluation criteria   SWT      DTCWT    GFF      DSIFT    Ours
Flowerpot       QW                    0.5551   0.5646   0.5836   0.5833   0.5841
                QAB/F                 0.3359   0.3244   0.3385   0.3382   0.3387
Clock           QW                    0.5054   0.5070   0.5305   0.5300   0.5304
                QAB/F                 0.3816   0.3737   0.3843   0.3836   0.3864
Bottle          QW                    0.4943   0.5009   0.5027   0.5024   0.5050
                QAB/F                 0.3682   0.3645   0.3718   0.3720   0.3747
Pepsi           QW                    0.1604   0.1620   0.1769   0.1764   0.1771
                QAB/F                 0.1410   0.1412   0.1490   0.1472   0.1466

Table 4. The elapsed time of different fusion methods (seconds).

Source images   SWT       DTCWT     GFF       DSIFT     Ours
Flowerpot       23.1943   22.4345   22.5150   31.2199   15.2104
Clock           18.8257   18.3194   18.3980   31.5989   12.1085
Bottle           4.7447    4.6641    4.5996    7.7479    3.5839
Pepsi           18.8411   18.3084   18.4011   32.9836   12.1757


This is mainly because there is no need to perform the super-resolution preprocessing
in advance, which would be very time consuming, especially when the number of source
images is large.

6. Conclusions
In this paper, multi-focus image fusion and super-resolution are performed
simultaneously based on a CNN. The main contributions of the proposed method are
two-fold. Convolutional filters are used to extract patches from the source images and
represent them; fusion weights are then constructed to guide the fusion of the
multi-focus image patches; finally, the fused features are projected by nonlinear
mapping into high-resolution patches and aggregated to produce the final image, which
contains more detail information than the results of other state-of-the-art fusion
methods. Experimental results show that the proposed method gives superior performance
in both subjective and objective evaluations.

Acknowledgments
This paper is supported by the National Natural Science Foundation of China
(No. 61102108), the Scientific Research Fund of Hunan Provincial Education Department
(Nos. 16B225 and YB2013B039), the Natural Science Foundation of Hunan Province
(No. 2016JJ3106), the Young Talents Program of the University of South China, the
construct program of key disciplines in USC (No. NHXK04), and the Scientific Research
Fund of Hengyang Science and Technology Bureau (No. 2015KG51).

References
1. I. De and B. Chanda, Multi-focus image fusion using a morphology-based focus mea-
sure in a quad-tree structure, Inf. Fusion 14(2) (2013) 136–146.
2. C. Dong, C. C. Loy, K. M. He and X. O. Tang, Image super-resolution using deep
convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. 38(2) (2016) 295–
307.
3. K. M. He, J. Sun and X. O. Tang, Guided image filtering, IEEE Trans. Pattern Anal.
Mach. Intell. 35(6) (2013) 1397–1409.
4. W. Huang and Z. Jing, Evaluation of focus measures in multi-focus image fusion,
Pattern Recogn. Lett. 28(4) (2007) 493–500.
5. J. J. Lewis, R. J. Callaghan, S. G. Nikolov, D. R. Bull and N. Canagarajah, Pixel- and
region-based image fusion with complex wavelets, Inf. Fusion 8(2) (2007) 119–130.
6. S. T. Li, X. D. Kang, L. Y. Fang, J. W. Hu and H. T. Yin, Pixel-level image fusion:
A survey of the state of the art, Inf. Fusion 33(1) (2017) 100–112.
7. S. T. Li, X. D. Kang and J. W. Hu, Image fusion with guided filtering, IEEE Trans.
Image Process. 22 (2013) 2864–2875.
8. S. T. Li, J. T. Kwok and Y. N. Wang, Combination of images with diverse focuses
using the spatial frequency, Inf. Fusion 2(3) (2001) 169–176.
9. H. F. Li, X. K. Liu, Z. T. Yu and Y. F. Zhang, Performance improvement scheme
of multifocus image fusion derived by difference images, Signal Process. 128 (2016)
474–493.

10. S. T. Li and B. Yang, Multifocus image fusion using region segmentation and spatial
frequency, Image Vision Comput. 26(7) (2008) 971–979.
11. Y. Liu, S. P. Liu and Z. F. Wang, Multi-focus image fusion with dense SIFT, Inf.
Fusion 23 (2015) 139–155.
12. P. P. Mirajkar and D. R. Sachin, Image fusion based on stationary wavelet transform,
Int. J. Adv. Eng. Res. Stud. 2 (2013) 99–101.
13. S. Pertuz, D. Puig, M. A. Garcia and A. Fusiello, Generation of all-in-focus images
by noise-robust selective fusion of limited depth-of-field images, IEEE Trans. Image
Process. 22(3) (2013) 1242–1251.
14. G. Piella and H. Heijmans, A new quality metric for image fusion, in Proc. IEEE Int.
Conf. Image Processing, Vol. 2 (IEEE, 2003), pp. 173–176.
15. D. E. Rumelhart, G. E. Hinton and R. J. Williams, Learning representations by
back-propagating errors, Nature 323 (1986) 533–536.
16. C. S. Xydeas and V. Petrovic, Objective image fusion performance measure, Electron.
Lett. 36(4) (2000) 308–309.
17. Q. Yan, Y. Xu and X. K. Yang, Single image super-resolution based on gradient profile
sharpness, IEEE Trans. Image Process. 24(10) (2015) 187–202.
18. J. Yang, J. Wright, T. S. Huang and Y. Ma, Image super-resolution via sparse repre-
sentation, IEEE Trans. Image Process. 19(11) (2010) 2861–2873.
