The aim of multi-focus image fusion is to create a synthetic all-in-focus image from several images, each of which is obtained with a different focus setting. However, if the resolution of the source images is low, the fused image produced by traditional fusion methods will also be of low quality, which hinders further image analysis even though the fused image is all-in-focus. This paper presents a novel joint multi-focus image fusion and super-resolution method via convolutional neural network (CNN). The first-layer network features of the different source images are fused under the guidance of the local clarity calculated from the source images. The final high-resolution fused image is obtained with the reconstruction network filters, which act like averaging filters. The experimental results demonstrate that the proposed approach can generate fused images with better visual quality and acceptable computational efficiency compared to other state-of-the-art works.
1. Introduction
Due to the finite depth of field of optical lenses, only objects within a certain distance from the lens can be recorded clearly, so it is difficult to get an image with all objects in it being in focus. The problem can be solved by multi-focus image fusion technology, which creates a synthetic all-in-focus image by combining two or more images of a scene obtained with different focus settings.6,13,8,9 In the past two decades, researchers all over the world have proposed various fusion approaches, which can be categorized into spatial-domain and transform-domain methods according to the stage at which the fusion operation is performed.6 The spatial-domain methods select weights of the image pixels with the spatial
clarity measurement, while the transform-domain methods fuse the transform coefficients of different source images under the guidance of coefficient activity. Multi-resolution transforms such as the wavelet transform12 and the dual-tree complex wavelet transform (DTCWT)5 are usually used to perform the fusion.
In addition, most consumer-level image sensors also have limitations with respect to their maximum resolution. If the resolution of the source images is low, the fused image will still be low-resolution, which hinders many further image processing tasks. Single-image super-resolution (SR) technology can be used to improve the image resolution.18,17,2 Bilinear and bicubic interpolation are the basic super-resolution methods, which are still very popular in many applications due to their simplicity and low computational cost.

The main contributions of this paper are summarized as follows.
(1) We present a novel joint multi-focus image fusion and super-resolution method via CNN, which directly produces a super-resolved and all-in-focus output image. The end-to-end CNN mapping is used to increase the resolution of the source images, while the first-layer convolutional features responding to the local structure of the source images are fused to achieve the fusion operation. Our proposed framework demonstrates state-of-the-art fusion output and achieves faster speed. We also explore several parameter settings to achieve better performance of the proposed method.
(2) A two-scale clarity measurement based on spatial frequency (SF) is used to
construct the fusion weight maps for different source images. Furthermore, the
weight maps are refined by morphological operators and the guided filter, which makes full use of spatial consistency and preserves the intrinsic edge structures of the source images in the refined weight maps.
(3) We analyze the relationship between our CNN-based image fusion method and traditional sparse-coding-based image fusion methods. This relationship provides guidance for the design of the fusion strategy: the first CNN layer can also be viewed as the patch extraction and representation step of the sparse-coding-based image fusion framework, so the first-layer network features can be fused to achieve the information fusion. The effectiveness of the proposed method also suggests the possibility of designing more advanced CNN-based image
fusion methods.
The remainder of this paper is organized as follows. Section 2 describes the construction of the fusion weights via local image clarity. Section 3 briefly reviews the image super-resolution framework with CNN. In Sec. 4, the steps of the proposed scheme are presented in detail. The experimental results and discussions are presented in Sec. 5. Finally, the conclusion of this work is given in Sec. 6.
2. Fusion Weight Construction via Local Image Clarity

The local SF matrices SA and SB are calculated from the source images A and B, respectively, as

$$S(i,j) = \sqrt{\mathrm{RF}^2(i,j) + \mathrm{CF}^2(i,j)}, \quad (2.1)$$

where S(i, j) is the SF of an image patch at the position (i, j). The row frequency RF and column frequency CF are calculated as

$$\mathrm{RF}(i,j) = \sqrt{\frac{1}{m \times m} \sum_{i=2}^{m} \sum_{j=1}^{m} \left[ I(i,j) - I(i-1,j) \right]^2} \quad (2.2)$$

and

$$\mathrm{CF}(i,j) = \sqrt{\frac{1}{m \times m} \sum_{i=1}^{m} \sum_{j=2}^{m} \left[ I(i,j) - I(i,j-1) \right]^2}, \quad (2.3)$$
respectively. The image I can be A or B. The matrices SA and SB indicate the focus measurement at a fine scale, and they are robust in texture or edge regions. However, for smooth regions they are not always effective. To overcome this problem, the smoothed SF matrices GA and GB are obtained by applying a Gaussian filter to SA and SB, respectively; they indicate the focus measurement at a large scale. Both the fine-scale and the large-scale focus measurements are considered to construct the fusion weight map. The preliminary fusion weight maps are obtained as
$$W_d(i,j) = \begin{cases} 1, & S_A(i,j) \ge S_B(i,j), \\ 0, & \text{otherwise} \end{cases} \quad (2.4)$$

and

$$W_r(i,j) = \begin{cases} 1, & G_A(i,j) \ge G_B(i,j), \\ 0, & \text{otherwise}, \end{cases} \quad (2.5)$$
respectively. Morphological opening and closing operators with a 15 × 15 structuring element whose elements are all logical "1" are used to reduce the effect of noise on the weight map Wr. Then, at the edge positions of Wr, the weight values are replaced by the corresponding values of Wd. The combined fusion weight map, denoted Wf, is further refined by guided image filtering. The source images are used as the guide images. The filter size γ and the blur degree ω are set to 15 and 10^-3, respectively. The final fusion weight map is denoted as W.
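For illustration, the following is a minimal sketch in Python (NumPy/SciPy) of the two-scale clarity measurement and the weight-map construction of Eqs. (2.1)–(2.5). The function names are ours, the edge positions of Wr are approximated here by a morphological gradient, and the final guided-filter refinement is only indicated in a comment (e.g. via cv2.ximgproc.guidedFilter from opencv-contrib), not implemented:

```python
import numpy as np
from scipy.ndimage import (uniform_filter, gaussian_filter,
                           grey_opening, grey_closing)

def spatial_frequency(img, m=5):
    """Local SF map of Eqs. (2.1)-(2.3), computed over m x m windows."""
    img = img.astype(np.float64)
    rf2 = np.zeros_like(img)
    cf2 = np.zeros_like(img)
    rf2[1:, :] = (img[1:, :] - img[:-1, :]) ** 2    # squared row differences
    cf2[:, 1:] = (img[:, 1:] - img[:, :-1]) ** 2    # squared column differences
    # local averaging realizes the (1/(m x m)) * sum(...) terms of RF^2 and CF^2
    return np.sqrt(uniform_filter(rf2, size=m) + uniform_filter(cf2, size=m))

def fusion_weight_map(a, b, m=5, sigma=15, se=15):
    """Combined weight map W_f before guided-filter refinement (a sketch)."""
    sa, sb = spatial_frequency(a, m), spatial_frequency(b, m)
    ga, gb = gaussian_filter(sa, sigma), gaussian_filter(sb, sigma)  # large scale
    wd = (sa >= sb).astype(np.float64)    # fine-scale binary map, Eq. (2.4)
    wr = (ga >= gb).astype(np.float64)    # large-scale binary map, Eq. (2.5)
    # opening then closing with a 15 x 15 all-ones structuring element de-noises W_r
    wr = grey_closing(grey_opening(wr, size=(se, se)), size=(se, se))
    # at the edges of W_r (where its value changes), fall back to W_d
    edge = grey_closing(wr, size=(3, 3)) != grey_opening(wr, size=(3, 3))
    wf = np.where(edge, wd, wr)
    # W_f would then be refined by guided image filtering with the source image
    # as the guide (filter size 15, blur degree 1e-3) to obtain the final W
    return wf
```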
With larger training datasets or models available, the restoration quality of the network can be further improved.

In our proposed method, the three-layer CNN model shown in Fig. 2 is used to perform the super-resolution mapping.
Training the end-to-end mapping can be converted into updating the CNN parameters (weights W1, W2, W3 and biases B1, B2, B3) of the filters with low/high-resolution training data.2 In the proposed method, the parameters are updated by stochastic gradient descent with standard back-propagation.15 Since the CNN is fully feed-forward and no optimization problem is involved at test time, the time consumed by the SR operation is acceptable. It is possible to add more convolutional layers to increase the nonlinearity; however, this would increase the complexity of the model and thus demand a larger dataset and more training time. Notice that the predefined upscale factor can be set to 2, 3, or 4, and the CNN structure does not change for different upscale settings. In the training phase, the ground-truth image is blurred with a Gaussian kernel, subsampled by the upscale factor, and upscaled by the same factor via bicubic interpolation.
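As a concrete illustration of this preparation step, the following is a minimal sketch (assuming NumPy/SciPy; the helper name make_training_pair is ours, and scipy's cubic-spline zoom stands in for true bicubic interpolation) that builds a low/high-resolution training pair from a ground-truth image:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def make_training_pair(hr, scale=2, sigma=1.0):
    """Build an LR network input from a ground-truth HR image:
    blur -> subsample by the upscale factor -> upscale back by the same factor."""
    hr = hr.astype(np.float64)
    h, w = (hr.shape[0] // scale) * scale, (hr.shape[1] // scale) * scale
    hr = hr[:h, :w]                        # crop so the dimensions divide evenly
    blurred = gaussian_filter(hr, sigma=sigma)
    lr = blurred[::scale, ::scale]         # subsample by the upscale factor
    lr_up = zoom(lr, scale, order=3)       # cubic upscaling back to the HR grid
    return lr_up, hr                       # (network input, regression target)
```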
Each local image patch is projected onto a 64-dimensional feature, and each feature corresponds to a network filter or basis. This process is equivalent to the sparse-coding solver projecting the patch onto a (low-resolution) dictionary, so sparse coding can be viewed as a one-layer CNN. Like wavelet bases or sparse-representation dictionary atoms, the CNN filters are also designed to represent the salient local features of an image. Therefore, we can achieve the information fusion by combining the local feature maps F1(A) and F1(B).
(3) The above analysis also helps us to design the fusion rule. For multi-focus image fusion, we hope that all focused regions of the source images are selected to construct the fused image. Therefore, the first-layer feature maps of the different source images are directly combined with the fusion weights constructed in Sec. 2 as

$$F_1 = W \circ F_1(A) + (1 - W) \circ F_1(B),$$

where F1 denotes the fused feature maps, ∘ denotes element-wise multiplication, and W is the fusion weight map constructed as in Fig. 1.
(4) The fused first-layer feature maps F1 are propagated through the second CNN nonlinear mapping layer and then the third reconstruction layer to construct the final high-resolution fused all-in-focus image.
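To make the whole pipeline concrete, below is a minimal sketch in PyTorch. It is an illustration under stated assumptions, not the authors' released implementation: the 9-1-5 kernel sizes and 64/32 channel widths follow the SRCNN configuration of Ref. 2 (the exact settings of Fig. 2 are not given here), and the class name FusionSRCNN is ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionSRCNN(nn.Module):
    """Three-layer SRCNN with first-layer feature fusion (a sketch)."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=9, padding=4)  # patch extraction/representation
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1)            # nonlinear mapping
        self.conv3 = nn.Conv2d(32, 1, kernel_size=5, padding=2)  # reconstruction

    def forward(self, a, b, w):
        # a, b: bicubic-upscaled source images, shape (N, 1, H, W)
        # w: fusion weight map W from Sec. 2, shape (N, 1, H, W), values in [0, 1]
        f1_a = F.relu(self.conv1(a))       # first-layer features F1(A)
        f1_b = F.relu(self.conv1(b))       # first-layer features F1(B)
        f1 = w * f1_a + (1.0 - w) * f1_b   # weighted fusion; w broadcasts over channels
        return self.conv3(F.relu(self.conv2(f1)))
```

A forward pass such as FusionSRCNN()(a, b, w) then yields the high-resolution all-in-focus image directly, with no per-image optimization at test time.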
Fig. 4. Comparison of the fused results with the parameter m ranging over [3, 57]. (a) QW values of the test images. (b) QAB/F values of the test images.
The fused results with the parameter m ranging over [3, 57] are shown in Fig. 4. From Fig. 4, we can conclude that the performance of the proposed method is nearly unaffected by the block size m in terms of both criteria. However, more time elapses if the block size is set too large. Therefore, we set m equal to 5 in the proposed method.
Fig. 5. The fused "Flowerpot" images by different methods. (a) and (b) are the high-resolution source multi-focus images of size 480 × 640; (c) and (d) are the corresponding artificial low-resolution source multi-focus images of size 240 × 320, which have been zoomed with bicubic interpolation; (e) the fused result of the SWT-based method; (f) the fused result of the DTCWT-based method; (g) the fused result of the GFF method; (h) the fused result of the DSIFT method; (i) the fused result of the proposed method.
The size of the Gaussian filter and its standard deviation are set to 45 × 45 and 15, respectively, which provides relatively better fused results.
Fig. 6. The fused "Clock" images by different methods. (a) and (b) are the high-resolution source multi-focus images of size 512 × 512; (c) and (d) are the corresponding artificial low-resolution source multi-focus images of size 256 × 256, which have been zoomed with bicubic interpolation; (e) the fused result of the SWT-based method; (f) the fused result of the DTCWT-based method; (g) the fused result of the GFF method; (h) the fused result of the DSIFT method; (i) the fused result of the proposed method.
The QW and QAB/F values between the high-resolution source images and the fused images of the five methods with upscale factors 2, 3, and 4 are listed in Tables 1–3, respectively. The values in bold indicate the highest quality measure obtained over all fusion methods. For both Tables 1 and 3, the results of the proposed method are obviously
Fig. 7. The fused "Bottle" images by different methods. (a) and (b) are the high-resolution source multi-focus images of size 320 × 320; (c) and (d) are the corresponding artificial low-resolution source multi-focus images of size 160 × 160, which have been zoomed with bicubic interpolation; (e) the fused result of the SWT-based method; (f) the fused result of the DTCWT-based method; (g) the fused result of the GFF method; (h) the fused result of the DSIFT method; (i) the fused result of the proposed method.
better than those of the other four methods on both criteria. For Table 2, when the upscale factor is 3, the proposed method also provides competitive results. The experimental results demonstrate the effectiveness of the proposed method. In addition, in order to estimate the computational efficiency of the proposed method, the time costs for different source images are listed in Table 4. Since the CNN
Fig. 8. The fused "Pepsi" images by different methods. (a) and (b) are the high-resolution source multi-focus images of size 512 × 512; (c) and (d) are the corresponding artificial low-resolution source multi-focus images of size 256 × 256, which have been zoomed with bicubic interpolation; (e) the fused result of the SWT-based method; (f) the fused result of the DTCWT-based method; (g) the fused result of the GFF method; (h) the fused result of the DSIFT method; (i) the fused result of the proposed method.
structure is the same for different upscale factors, we only present the time consumed by the different methods with upscale factor 2. From Table 4, we can see that the proposed method performs the fastest, followed by the DTCWT- and GFF-based methods, while the DSIFT method performs the slowest. This is mainly because
Table 1. The QW and QAB/F values of the fused images obtained by different methods with upscale factor 2.

Source images   Criteria   SWT      DTCWT    GFF      DSIFT    Ours
Flowerpot       QW         0.8036   0.8070   0.8230   0.8234   0.8238
                QAB/F      0.5537   0.5468   0.5626   0.5631   0.5635
Clock           QW         0.7716   0.7803   0.8332   0.8248   0.8351
                QAB/F      0.6256   0.6097   0.6334   0.6364   0.6434
Bottle          QW         0.9053   0.9008   0.9056   0.9074   0.9076
                QAB/F      0.6374   0.6282   0.6339   0.6305   0.6378
Pepsi           QW         0.7754   0.7795   0.7857   0.7849   0.7859
Table 2. The QW and QAB/F values of the fused images obtained by different methods with upscale factor 3.

Source images   Criteria   SWT      DTCWT    GFF      DSIFT    Ours
Flowerpot       QW         0.5663   0.5784   0.6011   0.6023   0.6019
                QAB/F      0.3888   0.3799   0.3882   0.3883   0.3889
Clock           QW         0.5396   0.5369   0.5769   0.5737   0.5781
                QAB/F      0.4804   0.4687   0.4808   0.4841   0.4852
Bottle          QW         0.5802   0.5959   0.5965   0.5984   0.5981
                QAB/F      0.4818   0.4786   0.4720   0.4773   0.4774
Pepsi           QW         0.2133   0.2160   0.2343   0.2352   0.2360
                QAB/F      0.2125   0.2139   0.2194   0.2205   0.2202
Table 3. The QW and QAB/F values of the fused images obtained by different methods with upscale factor 4.

Source images   Criteria   SWT      DTCWT    GFF      DSIFT    Ours
Flowerpot       QW         0.5551   0.5646   0.5836   0.5833   0.5841
                QAB/F      0.3359   0.3244   0.3385   0.3382   0.3387
Clock           QW         0.5054   0.5070   0.5305   0.5300   0.5304
                QAB/F      0.3816   0.3737   0.3843   0.3836   0.3864
Bottle          QW         0.4943   0.5009   0.5027   0.5024   0.5050
                QAB/F      0.3682   0.3645   0.3718   0.3720   0.3747
Pepsi           QW         0.1604   0.1620   0.1769   0.1764   0.1771
                QAB/F      0.1410   0.1412   0.1490   0.1472   0.1466
6. Conclusions
In this paper, multi-focus image fusion and super-resolution are performed simultaneously based on CNN. The main contributions of the proposed method are twofold. First, we use convolutional filters to extract patches from the source images and represent them, and fusion weights are learned to guide the fusion of the multi-focus image patches. Second, the fused features are projected by nonlinear mapping into high-resolution patches and aggregated to produce the final image, which contains more detailed information than the results of other state-of-the-art fusion methods. Experimental results show that the proposed method gives superior performance in both subjective and objective evaluations.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61102108), the Scientific Research Fund of Hunan Provincial Education Department (Nos. 16B225 and YB2013B039), the Natural Science Foundation of Hunan Province (No. 2016JJ3106), the Young Talents Program of the University of South China, the construct program of key disciplines in USC (No. NHXK04), and the Scientific Research Fund of Hengyang Science and Technology Bureau (No. 2015KG51).
References
1. I. De and B. Chanda, Multi-focus image fusion using a morphology-based focus mea-
sure in a quad-tree structure, Inf. Fusion 14(2) (2013) 136–146.
2. C. Dong, C. C. Loy, K. M. He and X. O. Tang, Image super-resolution using deep
convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. 38(2) (2016) 295–
307.
3. K. M. He, J. Sun and X. O. Tang, Guided image filtering, IEEE Trans. Pattern Anal.
Mach. Intell. 35(6) (2013) 1397–1409.
4. W. Huang and Z. Jing, Evaluation of focus measures in multi-focus image fusion,
Pattern Recogn. Lett. 28(4) (2007) 493–500.
5. J. J. Lewis, R. J. Callaghan, S. G. Nikolov, D. R. Bull and N. Canagarajah, Pixel- and region-based image fusion with complex wavelets, Inf. Fusion 8(2) (2007) 119–130.
6. S. T. Li, X. D. Kang, L. Y. Fang, J. W. Hu and H. T. Yin, Pixel-level image fusion:
A survey of the state of the art, Inf. Fusion 33(1) (2017) 100–112.
7. S. T. Li, X. D. Kang and J. W. Hu, Image fusion with guided filtering, IEEE Trans.
Image Process. 22 (2013) 2864–2875.
8. S. T. Li, J. T. Kwok and Y. N. Wang, Combination of images with diverse focuses
using the spatial frequency, Inf. Fusion 2(3) (2001) 169–176.
9. H. F. Li, X. K. Liu, Z. T. Yu and Y. F. Zhang, Performance improvement scheme of multifocus image fusion derived by difference images, Signal Process. 128 (2016) 474–493.
10. S. T. Li and B. Yang, Multifocus image fusion using region segmentation and spatial
frequency, Image Vision Comput. 26(7) (2008) 971–979.
11. Y. Liu, S. P. Liu and Z. F. Wang, Multi-focus image fusion with dense SIFT, Inf. Fusion 23 (2015) 139–155.
12. P. P. Mirajkar and D. R. Sachin, Image fusion based on stationary wavelet transform, Int. J. Adv. Eng. Res. Stud. 2 (2013) 99–101.
13. S. Pertuz, D. Puig, M. A. Garcia and A. Fusiello, Generation of all-in-focus images
by noise-robust selective fusion of limited depth-of-field images, IEEE Trans. Image
Process. 22(3) (2013) 1242–1251.
14. G. Piella and H. Heijmans, A new quality metric for image fusion, in Proc. IEEE Int.
Conf. Image Processing, Vol. 2 (IEEE, 2003), pp. 173–176.
17. Q. Yan, Y. Xu and X. K. Yang, Single image super-resolution based on gradient profile sharpness, IEEE Trans. Image Process. 24(10) (2015) 3187–3202.
18. J. Yang, J. Wright, T. S. Huang and Y. Ma, Image super-resolution via sparse repre-
sentation, IEEE Trans. Image Process. 19(11) (2010) 2861–2873.