
Single Image Super-Resolution Using
Dictionary-Based Local Regression

Sundaresh Ram and Jeffrey J. Rodriguez
Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USA
Email: {ram, jjrodrig}@email.arizona.edu

Abstract—This paper presents a new method of producing a high-resolution image from a single low-resolution image without any external training image sets. We use a dictionary-based regression model for practical image super-resolution using local self-similar example patches within the image. Our method is inspired by the observation that image patches can be well represented as a sparse linear combination of elements from a chosen over-complete dictionary, and that a patch in the high-resolution image has good matches around its corresponding location in the low-resolution image. A first-order approximation of a nonlinear mapping function, learned using the local self-similar example patches, is applied to the low-resolution image patches to obtain the corresponding high-resolution image patches. We show that the proposed algorithm provides improved accuracy compared to the existing single image super-resolution methods by running them on various input images that contain diverse textures, and that are contaminated by noise or other artifacts.

Index Terms—Image restoration, dictionary learning, sparse recovery, image super-resolution, regression.

I. INTRODUCTION

Super-resolution image reconstruction is a very important task in many computer vision and image processing applications. The goal of image super-resolution (SR) is to generate a high-resolution (HR) image from one or more low-resolution (LR) images. Image SR is a widely researched topic, and numerous SR algorithms have been proposed in the literature [1]-[9], [11]-[18]. SR algorithms can be broadly classified into three main categories: interpolation-based algorithms, learning-based algorithms, and reconstruction-based algorithms. Interpolation-based SR algorithms [2], [8], [9], [11] are fast, but the results may lack some of the fine details. In learning-based SR algorithms [4]-[6], [14], detailed textures are elucidated by searching through a training set of LR/HR images. They need a careful selection of the training images; otherwise erroneous details may be found. Alternatively, reconstruction-based SR algorithms [1], [3], [12], [15]-[18] apply various smoothness priors and impose the constraint that, when properly downsampled, the HR image should reproduce the original LR image.

The image SR problem is severely ill-posed, since many HR images can produce the same LR image, and thus it has to rely on some strong image priors for robust estimation. The most common image prior is the simple analytical "smoothness" prior, e.g., bicubic interpolation. As an image contains sharp discontinuities, such as edges and corners, using the simple "smoothness" prior for its SR reconstruction will result in ringing, jagged, blurring and ghosting artifacts. Thus, more sophisticated statistical image priors learned from natural images have been explored [1], [2], [12]. Even though natural images are sparse signals, trying to capture their rich characteristics using only a few parameters is impossible. Further, example-based nonparametric methods [14]-[16], [18] have been used to predict the missing high-frequency component of the HR image, using a universal set of training example LR/HR image patches. But these methods require a large set of training patches, making them computationally inefficient.

Recently, many SR algorithms have been developed using the fact that images possess a large number of self-similarities, i.e., local image structures tend to reappear within and across different image scales [3], [5], [18], and thus the image SR problem can be regularized based on these examples rather than some external database. In particular, Glasner et al. [5] proposed a framework that uses the self-similar example patches from within and across different image scales to regularize the SR problem. Yang et al. [18] developed a SR method where the SR images are constructed using a learned dictionary formed from image patch pairs extracted by building an image pyramid of the LR image. Freedman et al. [3] extended the example-based SR framework by following a local self-similarity assumption on the example image patches and iteratively upscaling the LR image.

In this paper, we describe a new single image super-resolution method using a dictionary-based local regression approach. Our approach differs from prior work on single-image SR with respect to two aspects: 1) using the in-place self-similarity [17] to construct and train a dictionary from the LR image, and 2) using the trained dictionary to learn a robust first-order approximation of the nonlinear mapping from LR to HR image patches. The HR image patch is reconstructed from the given LR image patch using this learned nonlinear function. We describe our algorithm in detail and present both quantitative and qualitative results comparing it to several recent algorithms.

II. METHODS

We assume that some areas of the input LR image X₀ contain high-frequency content that we can borrow for image SR; i.e., X₀ is an image containing some sharp areas but overall having unsatisfactory pixel resolution. Let X₀ and X denote the LR (input) and HR (output) images, where the output pixel resolution is r times greater. Let Y₀ and Y denote the corresponding low-frequency bands. That is, Y₀ has the same spatial dimension as X₀, but is missing the high-frequency content, and likewise for Y and X.

978-1-4799-4053-0/14/$31.00 ©2014 IEEE — SSIAI 2014
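As a concrete illustration of how the low-frequency bands might be formed, the sketch below computes Y₀ by Gaussian low-pass filtering X₀, and Y by upsampling X₀ by the factor r. SciPy's cubic-spline `zoom` (order=3) stands in for bicubic interpolation here, and the function name is illustrative; the paper does not specify exact kernels:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def low_frequency_bands(X0, r, sigma=0.4):
    """Return (Y0, Y): Y0 is a low-pass (Gaussian) band of X0 at the same
    size; Y is X0 upsampled by the factor r with cubic interpolation."""
    X0 = np.asarray(X0, dtype=np.float64)
    Y0 = gaussian_filter(X0, sigma=sigma)
    Y = zoom(X0, zoom=r, order=3)  # cubic-spline stand-in for bicubic
    return Y0, Y

X0 = np.random.rand(32, 48)
Y0, Y = low_frequency_bands(X0, r=2)
# Y0 keeps the LR size; Y has r times the pixel resolution in each dimension.
```

Y then approximates the low-frequency band of the unknown HR image X, while Y₀ pairs with X₀ as the training example.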


Fig. 1. For each patch y of the upsampled low-frequency image Y, we find its in-place match y₀ from the low-frequency image Y₀, and then perform a first-order regression on x₀ to estimate the desired patch x for the target image X.

Let x₀ and x denote a × a HR image patches sampled from X₀ and X, respectively, and let y₀ and y denote a × a LR image patches sampled from Y₀ and Y, respectively. Let (i, j) and (p, q) denote coordinates in the 2-D image plane.

A. Proposed Super-Resolution Algorithm

The LR image is denoted as X₀ ∈ ℝ^{K₁×K₂}, from which we obtain its low-frequency image Y₀ ∈ ℝ^{K₁×K₂} by Gaussian filtering. We upsample X₀ using bicubic interpolation by a factor of r to get Y ∈ ℝ^{rK₁×rK₂}. Y is used to approximate the low-frequency component of the unknown HR image X ∈ ℝ^{rK₁×rK₂}. We aim to estimate X from the knowledge of X₀, Y₀ and Y.

Fig. 1 is a block-diagram description of the overall SR scheme presented. For each image patch y from the image Y at location (i, j), we find its in-place self-similar example patch y₀ around its corresponding coordinates (i_s, j_s) in the image Y₀, where i_s = ⌊i/r + 0.5⌋ and j_s = ⌊j/r + 0.5⌋. Similarly, we can obtain the image patch x₀ from image X₀, which is a HR version of y₀. The image patch pair {y₀, x₀} constitutes a LR/HR image prior example pair from which we learn a first-order regression model to estimate the HR image patch x for the LR patch y. We repeat the procedure using overlapping patches of image Y, and the final HR image X is generated by aggregating all the HR image patches x obtained. For large upscaling factors, the algorithm is run iteratively, each time with a constant scaling factor r.

B. Local Regression

The patch-based single image SR problem can be viewed as a regression problem, i.e., finding a nonlinear mapping function f from the LR patch space to the target HR patch space. However, due to the ill-posed nature of the inverse problem at hand, learning this nonlinear mapping function requires good image priors and proper regularization. From Section II-A, the in-place self-similar example patch pair {y₀, x₀} serves as a good prior example pair for inferring the HR version of y. Assuming that the mapping function f is continuously differentiable, we have the following Taylor series expansion:

  x ≈ f(y) = f(y₀ + y − y₀)
           = f(y₀) + ∇fᵀ(y₀)(y − y₀) + O(‖y − y₀‖²)
           ≈ x₀ + ∇fᵀ(y₀)(y − y₀).    (1)

Equation (1) is a first-order approximation for the nonlinear mapping function f. Instead of learning the mapping function f, we can learn its gradient ∇f, which should be simpler. We learn the mapping gradient ∇f by building a dictionary using the prior example pair {y₀, x₀}, as detailed in the next section. With the function values learned, given any LR input patch y, we first search for its in-place self-similar example patch pair {y₀, x₀}, then find ∇f(y₀) using the trained dictionary, and then use the first-order approximation to compute the HR image patch x.

Due to the discrete resampling process in downsampling and upsampling, we expect to find multiple approximate in-place examples for y in the 3 × 3 neighborhood of (i_s, j_s), which contains 9 patches. To reduce the regression variance, we perform regression on each of them and combine the results by a weighted average. Given the in-place self-similar example patch pairs {y₀ᵢ, x₀ᵢ}_{i=1}^{9} for y, we have

  x = Σ_{i=1}^{9} (x₀ᵢ + ∇fᵀ(y₀ᵢ)(y − y₀ᵢ)) wᵢ,    (2)

where wᵢ = (1/z) · exp{−‖y − y₀ᵢ‖₂² / (2σ²)}, with z the normalization factor.

C. Dictionary Learning

The proposed dictionary-based method to learn the mapping gradient ∇f is a modification of the work by Yang et al. [15], [16] to guarantee detail enhancement. Yang et al. [15], [16] developed a method for single image SR based on sparse modeling. This method utilizes an overcomplete dictionary D_h ∈ ℝ^{n×K} built using the HR image, which is an n × K matrix whose K columns represent K "atoms" of size n, where an "atom" is a sparse coefficient vector (i.e., a vector of weights/coefficients in the sparse basis). We assume that any patch x ∈ ℝⁿ in the HR image X can be represented as a sparse linear combination of the atoms of D_h as follows:

  x ≈ D_h α, with ‖α‖₀ ≪ K, α ∈ ℝᴷ.    (3)

A patch y in the observed LR image can be represented using a corresponding LR dictionary D_l with the same sparse coefficient vector α. This is ensured by co-training the dictionary D_h with the HR patches and the dictionary D_l with the corresponding LR patches.

For a given input LR image patch y, we determine the sparse solution vector

  α* = argmin_α ‖G D_l α − G y‖₂² + λ ‖α‖₁,    (4)

where G is a feature extraction operator to emphasize high-frequency detail. We use the following set of 1-D filters:

  g₁ = [−1, 0, 1],  g₂ = g₁ᵀ,  g₃ = [1, −2, 1],  g₄ = g₃ᵀ.    (5)
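The patch-level pipeline of Sections II-A and II-B — mapping a patch location to its in-place coordinates (i_s, j_s) and then applying the weighted first-order regression of (2) — can be sketched as follows. Here `grad_f` is a caller-supplied stand-in for the dictionary-based lookup of ∇f(y₀); the helper names are illustrative, not the authors' code:

```python
import numpy as np

def in_place_coords(i, j, r):
    # (i_s, j_s) = (floor(i/r + 0.5), floor(j/r + 0.5)), per Section II-A
    return int(np.floor(i / r + 0.5)), int(np.floor(j / r + 0.5))

def regress_patch(y, examples, grad_f, sigma=10.0):
    """Weighted first-order regression of Eq. (2).

    examples: list of in-place pairs (y0, x0), each patch a flat vector.
    grad_f(y0): the learned mapping gradient for y0 (an n-by-n matrix).
    """
    y = np.asarray(y, dtype=np.float64).ravel()
    estimates, weights = [], []
    for y0, x0 in examples:
        y0 = np.asarray(y0, dtype=np.float64).ravel()
        x0 = np.asarray(x0, dtype=np.float64).ravel()
        estimates.append(x0 + grad_f(y0) @ (y - y0))  # x0 + grad_f^T (y - y0)
        weights.append(np.exp(-np.sum((y - y0) ** 2) / (2.0 * sigma ** 2)))
    w = np.array(weights)
    w /= w.sum()  # the 1/z normalization of the weights w_i
    return sum(wi * ei for wi, ei in zip(w, estimates))
```

For clean images the paper uses only the nearest-neighbor in-place example, which corresponds to passing a single pair in `examples`; for noisy images all 9 neighborhood pairs are averaged as in (2).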

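The sparse code α* in (4) can be computed with any ℓ₁ solver. Below, the feature operator G is realized by convolving with the 1-D filters of (5), and the minimization is carried out with a small ISTA (iterative soft-thresholding) loop. This is an illustrative sketch assuming a column-normalized dictionary, not the authors' solver:

```python
import numpy as np

def feature_responses(patch):
    """Apply the 1-D filters of Eq. (5) (first and second derivatives,
    horizontally and vertically) and concatenate the responses."""
    g1 = np.array([-1.0, 0.0, 1.0])  # first-derivative filter
    g3 = np.array([1.0, -2.0, 1.0])  # second-derivative filter
    out = []
    for g in (g1, g3):
        out.append(np.apply_along_axis(np.convolve, 1, patch, g, mode="same"))
        out.append(np.apply_along_axis(np.convolve, 0, patch, g, mode="same"))
    return np.concatenate([f.ravel() for f in out])

def ista(D, y, lam, n_iter=200):
    """Minimize ||D a - y||_2^2 + lam * ||a||_1 by iterative
    soft-thresholding, with step size 1/L, L = 2 * sigma_max(D)^2."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a - 2.0 * (D.T @ (D @ a - y)) / L  # gradient step
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage
    return a
```

With G applied to both the dictionary atoms and the input patch, `ista(G_Dl, G_y, lam)` would yield α*, from which the mapping gradient follows as ∇f(y₀) = D_h α*.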
G is obtained as a concatenation of the responses from applying the above 1-D filters to the image. The sparsity of the solution vector α* is controlled by λ. In order to enhance the texture details while suppressing noise and other artifacts, we need to adapt the number of non-zero coefficients in the solution vector α*, as increasing the number of non-zero coefficients enhances the texture details but also enhances the noise and artifacts. We use the standard deviation (σ) of a patch to indicate the local texture content, and empirically adapted λ as follows:

  λ = 0.5 if σ < 15;  λ = 0.1 if 15 ≤ σ ≤ 25;  λ = 0.01 otherwise.

These σ thresholds are designed for our 8-bit gray-scale images and can easily be adapted for other image types. The mapping gradient ∇f for a given y₀ is obtained as ∇f(y₀) = D_h α*.

We make use of a bilateral filter as a degradation operator instead of a Gaussian blurring operator to obtain the image Y₀ from the given LR input image X₀ for dictionary training, as we are interested in enhancing the textures present while suppressing noise and other artifacts. Dictionary training starts by sampling in-place self-similar example image patch pairs {y₀ᵢ, x₀ᵢ}_{i=1}^{m} from the corresponding LR and HR images. We generate the HR patch vector X_h = {x₀₁, x₀₂, …, x₀ₘ}, the LR patch feature vector Y_l = {y₀₁, y₀₂, …, y₀ₘ} and the residue patch vector E = {x₀₁ − y₀₁, x₀₂ − y₀₂, …, x₀ₘ − y₀ₘ}. We use the residue patch vector E instead of the HR patch vector X_h for training. The residue patch vector is concatenated with the LR patch features, and a concatenated dictionary is defined by

  X_c = [ (1/√N) Y_l ; (1/√M) E ],  D_c = [ (1/√N) D_l ; (1/√M) D_h ].    (6)

Here, N and M are the dimensions of the LR and HR image patches in vector form. Optimized dictionaries are computed by

  min_{D_c, Z} ‖X_c − D_c Z‖₂² + λ ‖Z‖₁    (7)
  s.t. ‖D_c,i‖₂² ≤ 1, i = 1, …, K.

The training process is performed in an iterative manner, alternating between optimizing Z and D_c using the technique in [15].

III. EXPERIMENTS AND RESULTS

We evaluate the proposed SR algorithm both quantitatively and qualitatively, on a variety of example images used in the SR literature [17]. We compare our SR algorithm with recent algorithms proposed by Glasner et al. [5], Yang et al. [18] and Freedman et al. [3]. We used open source implementations of these three SR algorithms available online for comparison, carefully choosing the various parameters within each method for a fair comparison.

A. Algorithm Parameter Settings

We chose the image patch size as a = 5 and the iterative scaling factor as r = 2 in all of our experiments. Bicubic interpolation on the input LR image X₀ generates the low-frequency component Y of the target HR image X. A standard deviation of 0.4 is used in the low-pass Gaussian filtering to obtain the low-frequency component Y₀ of the input LR image X₀. For clean images, we use the nearest-neighbor in-place example for regression, whereas in the case of noisy images, we average all 9 in-place example regressions for robust estimation, where σ is the only tuning parameter needed to compute the weight wᵢ in (2), depending on the noise level. K = 512 atoms are used to train and build the dictionaries D_h and D_l used in the experiments.

B. Quantitative Results

In order to obtain an objective measure of performance for the SR algorithms under comparison, we validated the results of several example images taken from [10] (whose names appear in Table I) using the root mean square error (RMSE). The results of all the algorithms are shown in Table I for one upscaling step (2×).

TABLE I
PREDICTION RMSE FOR ONE UPSCALING STEP (2×)

Images     | Bicubic | Glasner [5] | Yang [18] | Freedman [3] | Ours
Chip       | 6.03    | 5.81        | 5.70      | 5.85         | 4.63
Child      | 7.47    | 6.74        | 7.06      | 6.51         | 5.92
Peppers    | 9.11    | 8.97        | 9.10      | 8.72         | 7.74
House      | 10.37   | 10.41       | 10.16     | 9.62         | 8.14
Cameraman  | 11.61   | 10.93       | 11.81     | 10.64        | 8.97
Lena       | 13.31   | 12.92       | 12.65     | 11.97        | 11.41
Barbara    | 14.93   | 14.24       | 13.92     | 13.23        | 12.22
Monarch    | 16.25   | 15.71       | 15.96     | 15.50        | 15.42

From Table I we observe that SR using simple bicubic interpolation performs the worst, due to the assumption of overly smooth image priors. Yang's SR algorithm performs better than bicubic interpolation in terms of RMSE values for the different images. Glasner's and Freedman's SR methods have very similar RMSE values, since both methods are closely related by using local self-similar patches to learn the HR image patches from a single LR image. The proposed SR algorithm has the best RMSE values, as it combines the advantages of in-place example patches and their corresponding local self-similarity learned using the dictionary-based approach.

C. Qualitative Results

Real applications requiring SR rely on three main aspects: image sharpness, image naturalness (affected by visual artifacts) and the speed of the algorithm to super-resolve. We will discuss the SR algorithms compared here with respect to these aspects.
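The RMSE values reported in Table I correspond to a computation of the following form between the ground-truth HR image and each super-resolved output (a sketch; the paper does not specify details such as border cropping):

```python
import numpy as np

def rmse(reference, estimate):
    """Root mean square error between two images on the same 0-255 scale."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))
```

Lower values indicate a super-resolved image closer to the ground truth, which is how the columns of Table I are compared.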

Fig. 2. Super-resolution results on "child" (4×), "cameraman" (3×) and "castle" (2×). From left to right: Original, Bicubic, Glasner, Yang, Freedman, Ours. Results are better viewed in zoomed mode.

Fig. 2 shows the SR results of the different approaches on "child" by 4×, "cameraman" by 3× and on "castle" by 2×. As shown, Glasner's and Freedman's SR algorithms give rise to overly sharp images, resulting in visual artifacts, e.g., ghosting and ringing artifacts around the eyes in "child", and jagged artifacts along the towers in "castle". Also, the details of the camera are smudged in "cameraman" for both algorithms. The results of Yang's SR algorithm are generally a little blurry, and they contain small visible noise-like artifacts across the images upon a closer look. In comparison, our algorithm is able to recover the local texture details as well as sharp edges without sacrificing the naturalness of the images.

IV. CONCLUSION

In this paper we propose a robust first-order regression model for single-image SR based on local self-similarity within the image. Our approach combines the advantages of learning from in-place examples and learning from local self-similar patches within the same image using a trained dictionary. The in-place examples allow us to learn a local regression function for the otherwise ill-posed mapping from LR to HR image patches. On the other hand, by learning from local self-similar patches elsewhere within the image, the regression model can overcome the problem of an insufficient number of in-place examples. By conducting various experiments and comparing with existing algorithms, we show that our new approach is more accurate and can produce more natural-looking results with sharp details by suppressing the noisy artifacts present within the images.

REFERENCES

[1] S. Dai, M. Han, W. Xu, Y. Wu, Y. Gong, and A. K. Katsaggelos, "SoftCuts: a soft edge smoothness prior for color image super-resolution," IEEE Trans. Image Process., vol. 18, no. 5, pp. 969-981, May 2009.
[2] R. Fattal, "Image upsampling via imposed edge statistics," ACM Transactions on Graphics, vol. 26, no. 3, pp. 95:1-95:8, Jul. 2007.
[3] G. Freedman and R. Fattal, "Image and video upscaling from local self-examples," ACM Transactions on Graphics, vol. 30, no. 2, pp. 12:1-12:11, Apr. 2011.
[4] W. T. Freeman, T. R. Jones, and E. C. Pasztor, "Example-based super-resolution," IEEE Comput. Graph. Appl., vol. 22, no. 2, pp. 56-65, Mar. 2002.
[5] D. Glasner, S. Bagon, and M. Irani, "Super-resolution from a single image," in Proc. IEEE Int. Conf. Computer Vision, pp. 349-356, 2009.
[6] H. He and W.-C. Siu, "Single image super-resolution using Gaussian process regression," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 449-456, 2011.
[7] K. I. Kim and Y. Kwon, "Single-image super-resolution using sparse regression and natural image prior," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 6, pp. 1127-1133, Jun. 2010.
[8] X. Li and M. T. Orchard, "New edge-directed interpolation," IEEE Trans. Image Process., vol. 10, no. 10, pp. 1521-1527, Oct. 2001.
[9] S. Mallat and G. Yu, "Super-resolution with sparse mixing estimators," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2889-2900, Nov. 2010.
[10] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. Int. Conf. Computer Vision, pp. 416-423, 2001.
[11] Q. Shan, Z. Li, J. Jia, and C.-K. Tang, "Fast image/video upsampling," ACM Transactions on Graphics, vol. 27, no. 5, pp. 153:1-153:8, Dec. 2008.
[12] J. Sun, J. Sun, Z. Xu, and H.-Y. Shum, "Gradient profile prior and its applications in image super-resolution and enhancement," IEEE Trans. Image Process., vol. 20, no. 6, Jun. 2011.
[13] R. Timofte, V. De Smet, and L. Van Gool, "Anchored neighborhood regression for fast example-based super-resolution," in Proc. IEEE Int. Conf. Computer Vision, 2013.
[14] Q. Wang, X. Tang, and H. Shum, "Patch based blind image super resolution," in Proc. IEEE Int. Conf. Computer Vision, pp. 709-716, 2005.
[15] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861-2873, Nov. 2010.
[16] J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. S. Huang, "Coupled dictionary training for image super-resolution," IEEE Trans. Image Process., vol. 21, no. 8, pp. 3467-3478, Aug. 2012.
[17] J. Yang, Z. Lin, and S. Cohen, "Fast image super-resolution based on in-place example regression," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1059-1066, 2013.
[18] C.-Y. Yang, J.-B. Huang, and M.-H. Yang, "Exploiting self-similarities for single frame super-resolution," in Proc. Asian Conf. Computer Vision, pp. 497-510, 2010.