Single Image Super-Resolution Using Compressive

Bhabesh Deka
Department of Electronics and Communication Engineering, Tezpur Central University, India

Abstract— This paper proposes a novel algorithm that unifies the
fields of compressed sensing and sparse representations to
generate a super-resolution image from a single, low-resolution
input along with the use of a training data set. Super-resolution
image reconstruction is currently an active area of research, as it
offers the promise of overcoming some of the inherent resolution
limitations of the imaging systems. In this paper, super-
resolution has been achieved by exploiting the fact that the image
data is highly sparse over some redundant transforms. Inspired
by this observation, we seek a sparse representation for each
patch of the low-resolution image, and then use the coefficients of
this representation to generate the high-resolution image. The
sparsifying dictionary is learned with the use of a training data
set that has been obtained from a collection of high resolution
images, to generate a global dictionary. When compared with
the existing techniques, the proposed method yields much better
results, both visually and quantitatively.

Keywords— Compressed sensing, sparse representation,
dictionary, single image super-resolution, matching pursuit
In recent years, image super-resolution has been an active
area of research in the signal processing community. Super-
resolution (SR) refers to recovering the high resolution (HR)
image from a single or multiple low resolution (LR) images of
the same scene, that have lost information embedded in the
higher frequencies during acquisition, transmission or storage
[1],[2]. Conventional reconstruction based SR methods
require alignment and registration of several LR images in
sub-pixel accuracy. The performance of these algorithms
degrades when the number of available input images is small
or the desired magnification factor is large [3]. In contrast,
single-image SR requires only one LR input to determine its
HR version, and is therefore more practical for real-world
applications. In some applications, such as medical imaging,
a higher resolution image may have to be generated from a
single lower resolution image. Moreover, the setup for high
resolution imaging is expensive, and it may not always be
feasible to obtain an HR image due to the inherent limitations
of the imaging sensor, optics manufacturing technology, etc.
These problems may be overcome through the use of image
processing algorithms, which are relatively inexpensive,
giving rise to the concept of super-resolution. Traditionally,
reconstruction-based SR algorithms [4], [5] define constraints
for the target high-resolution image to improve the quality of
image reconstruction. They require alignment and registration
of several LR images in sub-pixel accuracy. However, ill-
conditioned registration, and inappropriate blurring operator
assumptions limit the scalability of this type of approach.
While methods which introduce additional regularization
alleviate the above problems, their performance will still be
limited by the number of LR images/patches available. Also,
the magnification factor is typically limited to be less than 2
for this type of approach and may thus result in overly smooth
images, lacking important high-frequency details.
In the literature, there are basically two kinds of algorithms
for achieving single image SR [2]. They are as follows:
1. Learning-based or example based super-resolution
algorithms [6], [7].
2. Reconstruction-based super-resolution algorithms [4], [5].

Learning-based SR algorithms require dictionaries for
up-sampling an LR image to an SR image. They typically
require a database of millions of high resolution and
low-resolution patch pairs for building the dictionary.
Moreover, they are computationally intensive and often
produce blurring effects due to over-fitting or under-fitting
during reconstruction. Algorithms in the second category
reconstruct an SR image by interpolation from the LR image.
This category includes the standard bilinear and bicubic
interpolation techniques and the back projection method [8].
However, these algorithms also result in poor reconstruction
of the high frequency features in the image [4].
In this paper, we show the application of compressed
sensing (CS) to reconstruct the SR image from a single LR
image. In [2], [9], the authors assume that the patches in the
LR image are sparse, when represented on a suitable basis set.
The work in [2] solves the ill-posed SR problem through the
sparsity induced regularization within the framework of
compressed sensing. The major difference between the
proposed method and the method in [2] is that we seek the
sparsity of the image patches taken from an LR image using a
globally learned over-complete dictionary instead of a fixed
basis set, such as the wavelet. Then a sparse reconstruction
problem is solved, governed by the theory of compressed
sensing, to obtain an HR image patch. The HR patches are
finally combined through weighted averaging to obtain the SR
image. Thus, the proposed method amalgamates the fields of
learning, sparsity, and compressed sensing into one scheme, in
which they work in unison to solve the problem of single image
super-resolution.
The rest of the paper is organized as follows. Section II
gives a brief overview of sparse and redundant representations.
Section III introduces the concept of compressed sensing in
signal processing and mentions a few applications of it. Image
super-resolution using compressed sensing has been discussed
in Section IV. Section V describes the proposed algorithm.
Experimental results are discussed in Section VI. Finally,
Section VII concludes the paper.
Basically, sparse representation models a signal as a large
dimension vector with a few non-zero components. In the
Sparseland model [10], signals are represented by linear
combination of atoms from a so called over-complete
dictionary. Formally, an over-complete dictionary is a
collection of prototype signals, called the atoms, such that the
number of atoms exceeds the dimension of the signal,
meaning any signal can be represented by more than one
combination of different atoms.
Let us consider an over-complete dictionary matrix
$\Psi \in \mathbb{R}^{n \times K}$ containing $K$ atoms ($K > n$) and an image patch
of size $\sqrt{n} \times \sqrt{n}$, arranged as a column vector
$x \in \mathbb{R}^{n}$. If $\alpha$ represents the coordinate vector with respect
to $\Psi$ that generates $x$, then
$$x = \Psi\alpha. \tag{1}$$
Because the dictionary has more atoms than the signal has
dimensions, this system of equations is underdetermined and has
infinitely many solutions [10], [11]. Out of these solutions, we
require the one with the minimum number of non-zero
components. This requires imposing extra constraints that
enforce sparsity. Thus, the problem of finding a sparse
representation is an optimization problem and can be written as
$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_0 \quad \text{subject to} \quad x = \Psi\alpha \ \text{ or } \ \|x - \Psi\alpha\|_2 \le \epsilon, \tag{2}$$
where $\epsilon$ is the allowed root mean-square reconstruction error.
The $\ell_0$-norm $\|\cdot\|_0$ essentially counts the non-zero entries in $\alpha$.
This optimization problem can be written in Lagrangian form
as [10]:
$$\hat{\alpha} = \arg\min_{\alpha} \|x - \Psi\alpha\|_2^2 + \mu \|\alpha\|_0. \tag{3}$$
Thus, by seeking the sparsest representation, we can single out
one solution from the many that satisfy (1), which helps in
problems such as the removal of data redundancy. As seen above,
the selection of the dictionary $\Psi$ is crucial for the sparse
representation of the signal. It can be selected either from a
fixed basis set, such as the DCT, the KLT, the DWT, etc., or it
may be learned adaptively from a selected data set, e.g. using
the K-SVD [11].
Compressed sensing is a new data acquisition theory [12]. It is
based on the property that a sparse signal is highly
compressible, so its data can be condensed into very few
measurements. The Nyquist-Shannon theorem requires a signal to
be sampled at a rate that is at least twice the signal
bandwidth. On the other hand, the theory of compressive
sensing or compressed sensing (CS) needs only a few
measurements to encode a sparse signal. This makes the
processing faster and also decreases the storage space. The
application of CS in signal processing and other related fields
of science and engineering is increasing rapidly, ranging from
MRI [13] in medical science to sensing systems used in warfare.
It is finding applications in video processing [14] and
compressive imaging [15]. CS has also been used in computer
vision and graphics to solve problems such as face recognition
[16].
How can the concept of compressive sensing be used in
image super-resolution? To understand this, let $x$ be a
real-valued, finite-length, discrete-time signal of dimension
$n \times 1$. It can be represented in terms of an orthonormal
basis set of $n \times 1$ vectors $\{\psi_i\}_{i=1}^{n}$.
Using the $n \times n$ basis matrix
$\Psi = [\psi_1 | \psi_2 | \cdots | \psi_n]$ with
the vectors $\{\psi_i\}$ as columns, a signal $x$ can be expressed as
$$x = \sum_{i=1}^{n} \alpha_i \psi_i \quad \text{or} \quad x = \Psi\alpha, \tag{4}$$
where $\alpha$ is the equivalent representation of $x$ in the $\Psi$ domain,
of dimension $n \times 1$, holding the weight coefficients
$\alpha = \Psi^{T} x$.
Here, the signal $x$ is assumed to be $k$-sparse, which means
it is a linear combination of only $k$ basis vectors; only $k$ of
the coefficients in (4) are non-zero or significantly large to
represent the signal.
The compressibility of a $k$-sparse signal forms the basis of
transform coding [17]. In a conventional data acquisition system,
transform coding demands that the entire set of coefficients
$\alpha$ of the signal $x$ be computed via $\alpha = \Psi^{T} x$, in spite of
knowing that only $k$ of them are significant. After locating the
$k$ largest coefficients and discarding the rest, the values and
locations of these $k$ coefficients are encoded. This introduces
an overhead. Moreover, another inefficiency is that $k$ may be
much smaller than $n$.
On the other hand, compressive sensing takes only a few
measurements of $x$. This is achieved through a linear
measurement process that computes $m \ll n$ inner products
between $x$ and a collection of vectors $\{\phi_j\}_{j=1}^{m}$ [17]. This is
represented by
$$\tilde{x} = \Phi x, \tag{5}$$
where $\Phi$ is known as the sampling or sensing matrix of
dimension $m \times n$. So, substituting (4) in (5), we can represent
the down sampled signal $\tilde{x}$ as
$$\tilde{x} = \Phi\Psi\alpha, \quad \text{or} \quad \tilde{x} = A\alpha, \tag{6}$$
where the two matrices $\Phi$ and $\Psi$ are mutually incoherent, and
$A = \Phi\Psi$ is the combined equivalent matrix of size $m \times n$.
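The measurement model in (4) to (6) can be illustrated with a small numerical sketch. The sizes ($n = 64$, $m = 16$, $k = 3$), the orthonormal DCT basis chosen for $\Psi$, and the Gaussian choice of $\Phi$ are assumptions made only for this illustration; they are not the settings used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 64, 16, 3  # signal length, number of measurements, sparsity (illustrative)

# Orthonormal DCT-II basis used as Psi; each column is one basis vector psi_i.
idx = np.arange(n)
Psi = np.cos(np.pi * (idx[:, None] + 0.5) * idx[None, :] / n)
Psi[:, 0] *= 1.0 / np.sqrt(2.0)
Psi *= np.sqrt(2.0 / n)

# k-sparse coefficient vector alpha and the signal x = Psi @ alpha, as in (4).
alpha = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
alpha[support] = rng.standard_normal(k)
x = Psi @ alpha

# Random sensing matrix Phi (m x n), combined matrix A and measurements, as in (5)-(6).
Phi = rng.standard_normal((m, n))
A = Phi @ Psi
x_tilde = Phi @ x

print(A.shape, x_tilde.shape)
```

Because $\Psi$ is orthonormal here, `Psi.T @ x` returns the coefficient vector, and `x_tilde` equals `A @ alpha`, which is exactly the relation in (6).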
It is to be noted that in image super-resolution using CS, we
can assume either a redundant or an orthonormal basis set for
the sparse representation of the signal. A pictorial
representation of the sampling process, and the compressed
sensing process for an orthonormal basis set, such as the DCT,
the wavelet, etc., are shown in Fig. 1 and Fig. 2, respectively.

Now, if we consider the down sampled signal $\tilde{x}$ as the low
resolution input, we can solve (6) to get $\hat{\alpha}$. After obtaining
the sparse coefficients, we can compute $\Psi\hat{\alpha}$ to get $\hat{x}$, the
required HR output image.
To ensure reconstruction by CS, the combined matrix
$\Phi\Psi$ should obey an important property, known as the
restricted isometry property (RIP), which ensures that every
small subset of its columns behaves approximately like an
orthonormal set. It is defined as follows.
A. Restricted Isometry Property
For an arbitrary matrix $M$ of size $m \times n$ with $m \ll n$, we
cannot solve $y = Mz$ unless $M$ meets the
RIP condition [12], [17]:
$$(1 - \delta)\|z\|_2^2 \le \|Mz\|_2^2 \le (1 + \delta)\|z\|_2^2, \tag{7}$$
with parameters $(k, \delta)$, where $\delta \in (0, 1)$, for all $k$-sparse
vectors $z$. In words, for proper values of $\delta$, the RIP ensures
that a measurement matrix is valid if every possible set
of $k$ columns of $M$ forms an approximately orthogonal set.
Unless this condition is satisfied, we cannot use the
compressed sensing framework to solve the sparse
optimization problem in (6) for $m \ll n$ [2], [17].
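Verifying the RIP exactly requires checking every set of $k$ columns, which is impractical for realistic sizes. As a rough illustration only, the sketch below samples random $k$-sparse vectors and records the range of the ratio $\|Mz\|_2^2 / \|z\|_2^2$ from (7). The function name `rip_ratio_range`, the matrix sizes, and the scaled Gaussian matrix are assumptions for this example; random sampling can only give a lower bound on $\delta$, never a proof that the RIP holds.

```python
import numpy as np

def rip_ratio_range(M, k, trials=2000, seed=0):
    """Return (min, max) of ||Mz||^2 / ||z||^2 over random k-sparse vectors z."""
    rng = np.random.default_rng(seed)
    n = M.shape[1]
    ratios = np.empty(trials)
    for t in range(trials):
        z = np.zeros(n)
        z[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
        ratios[t] = np.sum((M @ z) ** 2) / np.sum(z ** 2)
    return ratios.min(), ratios.max()

m, n, k = 32, 64, 3  # illustrative sizes
M = np.random.default_rng(1).standard_normal((m, n)) / np.sqrt(m)
lo, hi = rip_ratio_range(M, k)

# Smallest delta compatible with the sampled vectors (a lower bound on the true delta).
delta_lower_bound = max(1.0 - lo, hi - 1.0)
print(lo, hi, delta_lower_bound)
```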
B. Selection of $\Phi$ and $\Psi$
Possible pairs for $\Phi$ and $\Psi$ include the
identity matrix and the DCT matrix, the noiselets matrix and
the wavelet basis, etc.
In the proposed method, we solve the CS problem by
choosing the measurement matrix $\Phi$ from one of the
following: the noiselets [18], the random matrix with
column vectors drawn uniformly at random on the unit sphere
[17], and the spike basis. Similarly, the representation matrix
or dictionary $\Psi$ is chosen from one of the following: the DFT,
the overcomplete DCT [10], and the overcomplete learned
dictionary using the K-SVD [11].
C. Reconstruction Algorithm
If $m > 2k$ and the combined matrix $A$ meets the RIP, then
according to the theory of CS, the problem in (6) can be
solved uniquely for the sparsest $\alpha$ that satisfies the equation.
Thus, we can find the desired $\hat{\alpha}$ by solving the following
minimization problem:
$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_0 \quad \text{subject to} \quad \tilde{x} = A\alpha. \tag{8}$$
In the present problem, given an initial low resolution
image $\tilde{x}$, we would like to solve (8) to get the high resolution
image $x$. This could be achieved through any method for the
representation of a sparse signal. Among other techniques, the
greedy pursuit algorithms are preferred over the convex
optimization method [19], as the former are faster.
The two important greedy pursuit algorithms are
the Matching Pursuit (MP) [20] and the Orthogonal Matching
Pursuit (OMP) [21]. We have implemented the OMP because,
unlike the MP, it never selects the same atom twice and
hence converges faster.
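A minimal sketch of the OMP is given below to make the greedy selection concrete. It is a generic textbook-style version, not the implementation used for the experiments; the function name `omp`, the stopping tolerance `tol`, and the demo sizes are assumptions for illustration. At each step it picks the column of $A$ most correlated with the current residual and then refits all selected coefficients by least squares, which is why a selected atom is never picked again.

```python
import numpy as np

def omp(A, y, k, tol=1e-10):
    """Greedy OMP sketch: approximate y with at most k columns of A."""
    n = A.shape[1]
    residual = np.asarray(y, dtype=float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break
        # Correlation of every atom with the residual (columns assumed unit norm).
        corr = np.abs(A.T @ residual)
        corr[support] = 0.0  # already selected atoms are excluded
        support.append(int(np.argmax(corr)))
        # Least-squares refit on the selected atoms, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    alpha = np.zeros(n)
    alpha[support] = coef
    return alpha

# Demo on a random overcomplete matrix with unit-norm columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
A /= np.linalg.norm(A, axis=0)
alpha_true = np.zeros(50)
alpha_true[[3, 17, 41]] = [2.0, -1.0, 0.5]
y = A @ alpha_true
alpha_hat = omp(A, y, k=3)
print(np.linalg.norm(y - A @ alpha_hat))
```

The least-squares refit is what distinguishes the OMP from the plain MP, which only updates the coefficient of the newly selected atom.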
In the proposed algorithm, $\Psi$ is learned from randomly
selected image patches taken from a database of high
resolution images other than the test images, using the K-SVD
algorithm. The K-SVD algorithm is trained using image
patches of reasonable size, because training a large
dictionary using the K-SVD algorithm would be
computationally intractable [10]. The sampling
matrix $\Phi$ is selected as discussed in subsection IV-B. After
learning a globally-trained K-SVD dictionary, the only unknown
in (6) is the sparse coefficient vector $\hat{\alpha}_i$ corresponding to
each patch $\tilde{x}_i$ of the LR image. Therefore, it can be solved
using the OMP and the given matrices $\Phi$ and $\Psi$. After
obtaining $\hat{\alpha}_i$ for all the patches, the reconstructed
image $\hat{x}$ can be obtained by the weighted averaging operation
given by
$$\hat{x} = \left(\sum_i L_i^{T} L_i\right)^{-1} \left(\sum_i L_i^{T} \hat{x}_i\right), \tag{9}$$
where $\hat{x}_i = \Psi\hat{\alpha}_i$ is the reconstructed HR patch and
$L_i$ is a binary matrix that extracts the $i$-th patch from the
image.
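The weighted averaging of overlapping patches can be sketched without forming the binary extraction matrices explicitly: adding a patch back at its location plays the role of the transposed extraction operator, and the per-pixel count of overlapping patches plays the role of the diagonal matrix that is inverted. The function name `average_patches`, the square-patch assumption, and the stride-2 demo are choices made only for this illustration.

```python
import numpy as np

def average_patches(patches, positions, image_shape):
    """Place square patches at (row, col) positions and average the overlaps."""
    acc = np.zeros(image_shape)
    weight = np.zeros(image_shape)
    for patch, (r, c) in zip(patches, positions):
        p = patch.shape[0]
        acc[r:r + p, c:c + p] += patch    # accumulate patch values per pixel
        weight[r:r + p, c:c + p] += 1.0   # count how many patches cover each pixel
    # Pixels covered by no patch keep the value 0.
    return acc / np.maximum(weight, 1.0)

# Demo: cut an 8 x 8 image into overlapping 4 x 4 patches (stride 2) and reassemble.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
positions = [(r, c) for r in range(0, 5, 2) for c in range(0, 5, 2)]
patches = [img[r:r + 4, c:c + 4] for r, c in positions]
recon = average_patches(patches, positions, img.shape)
print(np.allclose(recon, img))
```

In the demo the patches are exact copies of the image, so averaging returns the image unchanged; with patches reconstructed by sparse coding, the averaging smooths out disagreements between neighbouring patches.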

Fig. 1 Sampling process

Fig. 2 Compressed sensing process
The overall algorithm for single image super-resolution
using compressed sensing can be summarized as follows:
1. Down-sample the given image by a specified factor (2, 4,
8, etc.). Take the down-sampled image as the LR input.
2. Construct a sampling matrix $\Phi$ as discussed in subsection
IV-B.
3. Build a dictionary $\Psi$ for the sparse representation of the
signal using the methods described above.
4. Obtain the resultant dictionary $A = \Phi\Psi$.
5. Obtain overlapping patches $\tilde{x}_i$ from the low resolution
input image.
6. Perform the OMP on each patch to get $\hat{\alpha}_i$.
7. Finally, reconstruct the required high resolution image
using (9).
We have carried out a series of experiments with the
standard test images “Lena” ($512 \times 512$), “Barbara”
($512 \times 512$), “Boat” ($512 \times 512$), “Peppers” ($256 \times 256$),
“House” ($256 \times 256$), and “Cameraman” ($256 \times 256$).
We construct different sensing matrices $\Phi$, namely, the spike
basis, the noiselets, and the random matrix (using the “rand”
function in MATLAB), to get a better idea of their influence on
the reconstruction. The size of this matrix depends upon the size
of the patches formed out of the down sampled image
(for example, for a $2 \times 2$ patch size, the measurement matrix is
of size $4 \times 64$, and so on). The columns of $\Phi$ are normalized to
have unit norm.
A dictionary $\Psi$ of size $64 \times 256$ is learned using the
K-SVD algorithm from 100 000 image patches of size $8 \times 8$,
randomly selected from a database of high-resolution images
(sample images include “Couple” ($512 \times 512$), “Hill” ($512 \times 512$),
“Aerial” ($256 \times 256$), “Clock” ($256 \times 256$), etc.). We also take
dictionaries obtained by the overcomplete DCT ($64 \times 256$),
the overcomplete DFT, the orthonormal DCT ($64 \times 256$), and
the orthonormal DFT ($64 \times 256$) for comparison of the
performance of the proposed algorithm. The columns of $\Psi$
are also normalized to have unit norm.
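For reference, a $64 \times 256$ overcomplete DCT dictionary with unit-norm columns can be built as sketched below. This follows one commonly used construction (a one-dimensional $8 \times 16$ overcomplete cosine dictionary combined with itself through a Kronecker product); whether the experiments used exactly this construction is an assumption, and the function name `overcomplete_dct` is hypothetical.

```python
import numpy as np

def overcomplete_dct(patch_size=8, atoms_1d=16):
    """Separable overcomplete DCT dictionary of shape (patch_size**2, atoms_1d**2)."""
    t = np.arange(patch_size)[:, None]
    f = np.arange(atoms_1d)[None, :]
    D1 = np.cos(np.pi * t * f / atoms_1d)    # 1-D cosine atoms, one per column
    D1[:, 1:] -= D1[:, 1:].mean(axis=0)      # remove the mean from non-constant atoms
    D = np.kron(D1, D1)                      # 2-D separable atoms for vectorized patches
    return D / np.linalg.norm(D, axis=0)     # normalize every column to unit norm

Psi = overcomplete_dct()
print(Psi.shape)
```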
We carry out the following experiments to study the
performance of the proposed algorithm.
A. Comparison between reconstructed images by taking
different sensing matrices:
We performed a comparative study of the influence of
different sensing matrices, viz. the spike matrix, the noiselets
and the random matrix, on the quality of the output image. The
matrix constructed is of size $16 \times 64$ (considering a
magnification factor of 2 and a patch size of $4 \times 4$) and the
image tested upon is “Lena”. Fig. 3 shows the
reconstructed images obtained using different sensing
matrices.
The coherency values for different sensing matrices and
magnification factors, for the “Lena” image, are summarized in
Table I. All these results are obtained with the globally trained
over-complete K-SVD dictionary as the sparsifying basis.
Table I
Magnification   Noiselets   Spike    Random
2               0.9625      0.9682   0.9892
4               0.9952      0.9976   0.9998

The RIP criterion requires that the sensing and representation
matrices have the least coherency, i.e. the greatest
incoherency, between them to yield better results. So, from the
obtained values of coherency as well as from visual inspection
of the output images, we conclude that the noiselets matrix gives
the best result, followed by the spike and random matrices.
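The exact formula behind the coherency values in Table I is not stated here. One common measure, sketched below as an assumption, is the mutual coherence of the combined matrix $A = \Phi\Psi$: the largest absolute normalized inner product between two distinct columns. The function name `mutual_coherence` and the random stand-ins for $\Phi$ and $\Psi$ are hypothetical and serve only to show the computation.

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute normalized inner product between distinct columns of A."""
    An = A / np.linalg.norm(A, axis=0)
    G = np.abs(An.T @ An)       # Gram matrix of the normalized columns
    np.fill_diagonal(G, 0.0)    # ignore each column's product with itself
    return float(G.max())

# Random stand-ins with the sizes used for a magnification factor of 2.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((16, 64))
Psi = rng.standard_normal((64, 256))
mu = mutual_coherence(Phi @ Psi)
print(mu)
```

Values close to 1 mean that at least two columns of the combined matrix are nearly parallel, which makes the corresponding atoms hard to tell apart during sparse recovery.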
B. Comparison between images of different sizes zoomed
by a factor of 2:
Even though our work involves the use of three different
sensing matrices, namely, the noiselets, the spike and the
random matrices, of sizes $16 \times 64$ and $4 \times 64$ (for magnification
factors of 2 and 4, respectively), the results are shown only for
the noiselets matrix of size $16 \times 64$ for input images of size
$128 \times 128$ (obtained by down sampling a $256 \times 256$ image
by 2) and a patch size of $4 \times 4$. This is because the noiselets
matrix gives the least value of mutual coherence for the different
magnification factors. The same setting is used for images of
size $256 \times 256$, obtained by down sampling an image of size
$512 \times 512$ by a factor of 2. For the sparsifying basis, a globally
trained over-complete K-SVD dictionary of size $64 \times 256$ is
taken for all the above cases.
For images of size $512 \times 512$, when we down
sample them by a factor of 2 and then reconstruct their high
resolution versions (magnified by a factor of 2), the output images
obtained are of high quality, with little degradation in terms of
blurredness or jaggedness. Fig. 4 shows that the picture
quality is conserved and the output images appear nearly
identical to the original images. On the other hand, for images
of size $256 \times 256$, the outputs obtained are degraded to some
extent by blurredness or jaggedness. Fig. 5 clearly displays
this case. This shows that the reconstruction is highly data
dependent. We may conclude that the higher the density of data,
the higher the redundancy among them, and the better the
reconstructed image.
C. Comparison between images of different sizes zoomed
by a factor of 4:
This time we have down sampled each of the test images by
a factor of 4. The image patch considered is of size $2 \times 2$. The
noiselets matrix is now of size $4 \times 64$ and the sparsifying basis
is of size $64 \times 256$. Fig. 6 shows the original image
($512 \times 512$), the down sampled image ($128 \times 128$), and the
output image. The output is slightly degraded by blurredness or
jaggedness. Blocking artifacts are also introduced to some
extent. Fig. 7 shows the original image ($256 \times 256$), the down
sampled version ($64 \times 64$) and the reconstructed image
($256 \times 256$). The output image is highly degraded: a large amount
of blurredness or jaggedness is present, which degrades the image
quality further. Therefore, the same conclusions are drawn as
above.
D. Comparison between images obtained using different
interpolation methods and the proposed method:
For this purpose, we extract a small portion of the down
sampled Lena image (Fig. 8(a)) and magnify it by a factor
of 2 to obtain four different images, as arranged in
Figs. 8(b)-(e). From visual comparison, it is observed that the
proposed method yields the best result in terms of visual
quality. However, it is computationally the most complex
among the four methods taken for comparison.
E. Comparison in terms of RMSE value:
The root mean-square error (RMSE) measures how well a method
reconstructs an output image relative to the original image. From
Table II, we observe that the RMSE is minimum for the
$512 \times 512$ “Lena” image reconstructed using the noiselets
matrix with a magnification factor of 2. The $256 \times 256$ image
reconstructed using the random matrix with a magnification
factor of 4 yields the highest RMSE value. Therefore, we can
conclude that the first case gives the better reconstructed image.
Table II
Image        Magnification   Noiselets   Spike    Random
256 × 256    2               0.0738      0.0744   0.0928
256 × 256    4               0.1036      0.1073   0.1597
512 × 512    2               0.0312      0.0421   0.0586
512 × 512    4               0.0633      0.0680   0.1479

Table III
Magnification   Bilinear   Bicubic   Proposed
2               0.0409     0.0314    0.0312
4               0.0697     0.0633    0.0627

Table III shows the RMSE values for two
interpolation methods, viz. bilinear and bicubic, and the
proposed method. We get the best result with the proposed
method in terms of picture quality, followed by the bicubic and
bilinear methods.
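The RMSE between a reference image and a reconstruction can be computed as sketched below. The magnitudes reported in the tables suggest pixel intensities scaled to $[0, 1]$, but that scaling is an assumption, as is the function name `rmse`.

```python
import numpy as np

def rmse(reference, estimate):
    """Root mean-square error between two images of the same shape."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))

# Tiny check on images with intensities in [0, 1].
ref = np.zeros((4, 4))
est = np.full((4, 4), 0.1)
print(rmse(ref, est))
```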
In this paper, we have proposed a unified scheme that
blends the ideas of sparse and redundant representations and
compressed sensing to obtain single image super-resolution.
Extensive simulations were carried out to study the
performance of the proposed method both visually and
quantitatively. Results obtained by the proposed method are
very encouraging compared to the existing methods. Future
work is in progress to study the possibility of learning the
sensing matrix in order to improve the performance of the
proposed method further.

This work is funded by the Department of Science and
Technology, Govt. of India through the DST Fast-Track
Project for Young Scientists (No. SR/FTP/ETA-0112/2011
Science and Engineering Research Board (SERB)).
[1] S. C. Park, M. K. Park, and M. G. Kang, “Super-resolution image
reconstruction: A technical overview,” IEEE Signal Processing
Magazine, 2003.
[2] P. Sen and S. Darabi, “Compressive image super-resolution,”
in Proc. of the 43rd Asilomar Conference on Signals, Systems and
Computers, IEEE Press, 2009, pp. 1235-1242.
[3] R. C. Hardie, K. J. Barnard, and E. E. Armstrong, “Joint MAP
registration and high-resolution image estimation using a sequence of
undersampled images,” IEEE Transactions on Image Processing, vol.
6, pp. 1621-1633, 1997.
[4] S. Dai, M. Han, W. Xu, Y. Wu, and Y. Gong, “Soft edge smoothness
prior for alpha channel super-resolution,” IEEE CVPR, pp. 1-8, 2007.
[5] R. Fattal, “Image upsampling via imposed edge statistics,” ACM Trans.
Graph., vol. 26, p. 95, 2007.
[6] H. Chang, D. Y. Yeung, and Y. Xiong, “Super-resolution through
neighbor embedding,” IEEE CVPR, vol.1, pp. 275-282, 2004.
[7] W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Example based
super-resolution,” IEEE CG & A, vol. 22, pp. 56-65, 2002.
[8] M. Irani and S. Peleg, “Motion analysis for image enhancement:
Resolution, occlusion, and transparency,” Journal of visual
communication and image representation, vol 4, pp. 324-335, 1993.
[9] J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution via
sparse representation,” IEEE Transactions on Image Processing, vol.
19, pp. 2861-2873, 2010.
[10] M. Elad and M. Aharon, “Image denoising via sparse and redundant
representations over learned dictionaries,” IEEE Transactions On
Image Processing, vol. 15, pp. 3736-3745, 2006.
[11] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for
designing overcomplete dictionaries for sparse representation,” IEEE
Transactions on Signal Processing, vol. 54, pp. 4311-4322, Nov. 2006.
[12] E. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles:
exact signal reconstruction from highly incomplete frequency
information,” IEEE Transactions on Information Theory, vol. 52, pp.
489-509, Feb. 2006.
[13] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse-MRI: The application
of compressed sensing for rapid MR imaging,” Magnetic Resonance in
Medicine, vol. 58, pp. 1182-1195, 2007.
[14] R. Marcia and R. Willett, “Compressive coded aperture video
reconstruction,” in Proc. of the 16th European Sig. Proc. Conf.
(EUSIPCO), 2008.
[15] K. Egiazarian, A. Foi, and V. Katkovnik, “Compressed sensing image
reconstruction via recursive spatially adaptive filtering,” in Proc. of the
IEEE Conf. on Image Processing, 2007, pp. I-549 - I-552.
[16] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face
recognition via sparse representation,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 31, pp. 210-227, 2009.
[17] E. Candes and M. Wakin, “An introduction to compressive sampling,”
IEEE Signal Processing Magazine, vol. 25, pp. 21-30, 2008.
[18] R. Coifman, F. Geshwind, and Y. Meyer, “Noiselets,” Applied and
Computational Harmonic Analysis, vol. 10, pp. 27-44, 2001.
[19] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic
decomposition by basis pursuit,” SIAM Journal on Scientific
Computing, vol. 20, pp. 33-61, 1998.
[20] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency
dictionaries,” IEEE Transactions on Signal Processing, vol. 41, pp.
3397-3415, 1993.
[21] R. Y. Pati and P. Krishnaprasad, “Orthogonal matching pursuit:
recursive function approximation with applications to wavelet
decomposition,” In Conference Record of The Twenty-Seventh
Asilomar Conference on Signals, Systems and Computers, 1993.

Fig. 4: (a) Original image (512×512) (b) Down sampled image (256×256) and (c) Up sampled image (512×512) obtained by a
magnification factor 2.

Fig. 5: (a) Original image (256×256) (b) Down sampled image (128×128) and (c) Up sampled image (256×256) obtained by a
magnification factor 2.

Fig. 3: Reconstructed image (512×512) obtained by taking sensing matrix as (a) spike matrix (b) noiselets and (c) random matrix

Fig. 6: (a) Original image (512×512) (b) Down sampled image (128×128) and (c) Up sampled image (512×512) obtained by a
magnification factor 4.

Fig. 7: (a) Original image (256×256) (b) Down sampled image (64×64) (c) Up sampled image (256×256) obtained by a magnification
factor 4.

Fig. 8 (a): Downsampled image (256×256) and reconstructed images obtained by (b) nearest neighbour (c) bilinear (d) bicubic
and the (e) proposed methods.