Scale-recurrent Network for Deep Image Deblurring (arXiv:1802.01770)
Xin Tao 1,2   Hongyun Gao 1,2   Yi Wang 1   Xiaoyong Shen 2   Jue Wang 3   Jiaya Jia 1,2
1 The Chinese University of Hong Kong   2 Youtu Lab, Tencent   3 Megvii Inc.
Figure 2. Different CNNs for image processing. (a) U-Net [27] or encoder-decoder network [24]. (b) Multi-scale [25] or cascaded
refinement network [4]. (c) Dilated convolutional network [5]. (d) Our proposed scale-recurrent network (SRN).
based methods for the same reasons. However, recent cascaded networks [4, 25] still use independent parameters for each of their scales. In this work, we propose sharing network weights across scales to significantly reduce training difficulty and introduce obvious stability benefits.

The advantages are twofold. First, it significantly reduces the number of trainable parameters. Even with the same training data, the recurrent exploitation of shared weights works similarly to using the data multiple times to learn parameters, which amounts to data augmentation across scales. Second, our proposed structure can incorporate recurrent modules, whose hidden state implicitly captures useful information and benefits restoration across scales.

Encoder-decoder ResBlock Network  Also inspired by the recent success of encoder-decoder structures for various computer vision tasks [23, 31, 33, 39], we explore an effective way to adapt them to image deblurring. In this paper, we show that directly applying an existing encoder-decoder structure does not produce optimal results. Our encoder-decoder ResBlock network, in contrast, amplifies the merits of various CNN structures and remains feasible to train. It also produces a very large receptive field, which is of vital importance for large-motion deblurring.

Our experiments show that, with the recurrent structure and the above advantages combined, our end-to-end deep image deblurring framework greatly improves training efficiency (≈1/4 of the training time of [25] to accomplish similar restoration). We use less than 1/3 of the trainable parameters, with much faster testing time. Besides training efficiency, our method also produces higher-quality results than existing methods, both quantitatively and qualitatively, as shown in Fig. 1 and elaborated later. We name this framework scale-recurrent network (SRN).

2. Related Work

In this section, we briefly review image deblurring methods and recent CNN structures for image processing.

Image/Video Deblurring  After the seminal work of Fergus et al. [12] and Shan et al. [29], many deblurring methods were proposed to improve both restoration quality and adaptiveness to different situations. Natural image priors were designed to suppress artifacts and improve quality, including total variation (TV) [3], sparse image priors [22], the heavy-tailed gradient prior [29], the hyper-Laplacian prior [21], the l0-norm gradient prior [38], etc. Most of these traditional methods follow the coarse-to-fine framework. Exceptions include frequency-domain methods [8, 14], which are applicable only in limited situations.

Image deblurring also benefits from recent advances in deep CNNs. Sun et al. [32] used a network to predict blur direction. Schuler et al. [28] stacked multiple CNNs in a coarse-to-fine manner to simulate iterative optimization. Chakrabarti [2] predicted the deconvolution kernel in the frequency domain. These methods follow the traditional framework, with several parts replaced by CNN versions. Su et al. [31] used an encoder-decoder network with skip connections to learn video deblurring. Nah et al. [25] trained a multi-scale deep network to progressively restore sharp images. These end-to-end methods make use of multi-scale information via different structures.

CNNs for Image Processing  Different from classification tasks, networks for image processing require special design. As one of the earliest methods, SRCNN [9] used three flat convolution layers (with the same feature-map size) for super-resolution. Improvement was yielded by U-Net [27]
[Figure: overall SRN architecture. Blurred inputs B3, B2, B1 are processed from the coarsest to the finest scale, producing restored outputs I3, I2, I1, with upsampling between scales. Components: conv. layers, ResBlocks, deconv. layers, and a ConvLSTM, linked by skip, scale, and recurrent connections; each ResBlock is conv → ReLU → conv with an identity skip connection.]
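The scale-recurrent scheme and the conv–ReLU–conv ResBlock shown above can be sketched in a few lines of NumPy. This is an illustrative single-channel toy, not our actual implementation: the real network uses learned multi-channel convolutions/deconvolutions and a ConvLSTM hidden state, whereas here the kernels `k1`, `k2`, the scale count, the 0.5-weighted input fusion, and the nearest-neighbor resizing are stand-ins chosen for brevity. What it does preserve is the key structural idea: one set of weights reused at every scale, with each finer scale receiving the upsampled result of the coarser one.

```python
import numpy as np

def conv3x3(x, k):
    """'Same'-padded 2-D correlation with a 3x3 kernel (single channel)."""
    p = np.pad(x, 1)
    return sum(k[i, j] * p[i:i + x.shape[0], j:j + x.shape[1]]
               for i in range(3) for j in range(3))

def resblock(x, k1, k2):
    """ResBlock as in the legend: conv -> ReLU -> conv, plus identity skip."""
    return x + conv3x3(np.maximum(conv3x3(x, k1), 0.0), k2)

def upsample(x, shape):
    """Nearest-neighbor resize (stand-in for learned upsampling)."""
    h, w = shape
    return x[np.arange(h) * x.shape[0] // h][:, np.arange(w) * x.shape[1] // w]

def run_srn(blurred, k1, k2, num_scales=3):
    """Apply ONE set of weights (k1, k2) at every scale, coarsest to finest,
    feeding each scale the upsampled restoration from the previous scale."""
    outs, prev = [], None
    for s in reversed(range(num_scales)):
        b = blurred[::2 ** s, ::2 ** s]            # blurred input B_s
        prev = b if prev is None else upsample(prev, b.shape)
        out = resblock(0.5 * (b + prev), k1, k2)   # shared-weight "network"
        outs.append(out)                           # restored estimate I_s
        prev = out
    return outs                                    # coarsest ... finest
```

Because `run_srn` calls the same `resblock` weights at every scale, adding scales adds no parameters; only the inputs and the carried-over estimate change.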
Figure 5. Visual comparisons on the testing dataset. From top to bottom: input, Whyte et al. [34], Sun et al. [32], Nah et al. [25], and ours (best viewed zoomed in on screen).
Figure 6. Real blurred images. (a) Input; (b) Sun et al. [32]; (c) Nah et al. [25]; (d) ours.