Signal Processing 103 (2014) 142–154

Contents lists available at ScienceDirect

Signal Processing
journal homepage: www.elsevier.com/locate/sigpro

Generalized joint kernel regression and adaptive dictionary
learning for single-image super-resolution
Chen Huang*, Yicong Liang, Xiaoqing Ding, Chi Fang*
State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology,
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

ARTICLE INFO

Article history:
Received 4 June 2013
Received in revised form 30 September 2013
Accepted 18 November 2013
Available online 27 December 2013

Keywords:
Single-image super-resolution
Face hallucination
Face recognition
Joint kernel regression
Dictionary learning

ABSTRACT

This paper proposes a new approach to single-image super-resolution (SR) based on generalized adaptive joint kernel regression (G-AJKR) and adaptive dictionary learning. The joint regression prior aims to regularize the ill-posed reconstruction problem by exploiting the local structural regularity and nonlocal self-similarity of images. It is composed of multiple locally generalized kernel regressors defined over similar patches found in the nonlocal range, which are then combined, thus simultaneously exploiting both image statistics in a natural manner. Each regression group is weighted by a regional redundancy measure we propose, which adaptively controls its relative regularization strength. This joint regression prior is further generalized to a range of multi-scales and rotations. For robustness, adaptive dictionary learning and a dictionary-based sparsity prior are introduced to interact with this prior. We apply the proposed method to both general natural images and human face images (face hallucination); for the latter we incorporate a new global face prior into the SR reconstruction while preserving face discriminativity. In both cases, our method outperforms related state-of-the-art methods qualitatively and quantitatively. Besides, our face hallucination method also outperforms the others when applied to face recognition applications.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction features for recognition purposes. The SR of face images
is also called face hallucination [1–5].
Single-image super-resolution (SR) refers to the task of The imaging model in the SR problem is generally
estimating a high resolution (HR) image X A Rn from a expressed as
single low resolution (LR) image Y A Rm (lexicographically
ordered vector and m o n). SR techniques are central to Y ¼ DHX þV; ð1Þ
various applications, such as medical imaging, satellite
imaging and video surveillance. They are especially neces- where DA Rmn and H A Rnn are the downsampling
sary for face recognition applications in video surveillance matrix and blurring matrix respectively, and V A Rm is
systems because the face resolution is normally low in assumed to be an additive Gaussian white noise vector.
surveillance video, causing the loss of essential facial Then recovering an HR X from the input LR Y is an ill-
posed problem, and the optimal HR image X can be found
by maximizing the posterior probability pðXjYÞ based on
n
the maximum a posteriori (MAP) criterion and Bayesian
Corresponding authors. Tel.: þ 86 10 62772369 645.
rule
E-mail addresses: yach23@gmail.com (C. Huang),
liangyicong@ocrserv.ee.tsinghua.edu.cn (Y. Liang),
pðYjXÞpðXÞ
dxq@ocrserv.ee.tsinghua.edu.cn (X. Ding), X^ ¼ arg max pðXjYÞ ¼ argmax : ð2Þ
fangchi@ocrserv.ee.tsinghua.edu.cn (C. Fang). X X pðYÞ

0165-1684/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.sigpro.2013.11.042
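As a concrete illustration of the degradation model in Eq. (1), the minimal sketch below synthesizes an LR observation from an HR image. This is our illustration rather than the authors' code; the blur width, scale factor and noise level mirror the experimental settings reported in Section 4 but are otherwise illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(X, scale=3, sigma_blur=1.6, sigma_noise=5.0, rng=None):
    """Synthesize an LR observation Y = DHX + V from an HR image X (Eq. (1)).

    H is realized as Gaussian blurring, D as subsampling by `scale`,
    and V as additive white Gaussian noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    blurred = gaussian_filter(X.astype(np.float64), sigma=sigma_blur)  # HX
    low_res = blurred[::scale, ::scale]                                # DHX
    return low_res + rng.normal(0.0, sigma_noise, low_res.shape)       # + V
```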

Generally, single-image SR methods can be categorized into three main classes: interpolation-based methods, reconstruction-based methods, and example-based methods (usually dictionary induced). Interpolation techniques (e.g. [6]) are simple and fast but tend to blur the fine details. The reconstruction-based methods (e.g. [7–10]) follow the form of Eq. (3) below. Typically, $p(Y|X)$ is modeled by a Gaussian distribution, so that maximizing $p(Y|X)$ boils down to minimizing the data constraint [2] $\|Y - DHX\|_2^2$, while $p(X)$ codes the prior knowledge that we want to impose in the HR space. Under the MAP criterion, the task of SR reconstruction is then formulated as a regularized least-squares optimization problem:

$$\hat{X} = \arg\min_X \|Y - DHX\|_2^2 + \lambda\, C(X), \qquad (3)$$

where $\lambda$ is the parameter balancing the effects of the data constraint and the regularization term $C(X)$, which regularizes the reconstruction problem that is ill-posed by itself. $C(\cdot)$ is usually a smoothness constraint, and how to design a good image prior is always an essential issue; most past works focus on designing different formulations of $C(\cdot)$. The gradient profile prior, for example, is developed in [8] to preserve sharp edges. Other natural image priors are also studied in the literature; in particular, priors of image self-similarity and local/nonlocal regularities have been exploited for more robust estimation. Zhang et al. [10] improved on such priors by assembling the Steering Kernel Regression (SKR) [16] based local prior and the Nonlocal Means (NLM) [17] based nonlocal prior, but the connection between the two remains loose, and a separate deblurring process is needed.

The example-based methods (e.g. [9,11–15]) hallucinate detailed textures from a training set of LR/HR image or patch pairs, and many of them directly or implicitly use a co-occurrence prior to constrain the correspondence between LR and HR patches. However, such methods strongly rely on the chosen dataset. Yang et al. [11] explored the sparse representation of LR patches over an LR dictionary, and used the same representation coefficients to generate the HR output. In [9], the self-similarity properties of images both within and across spatial scales are fully exploited through an image pyramid built from the input itself.

On the other hand, SR can be viewed as a regression problem aiming to map LR images to target HR images; examples include SKR [16], Gaussian Process Regression (GPR) [12], Kernel Ridge Regression (KRR) [13] and Non-Local Kernel Regression (NLKR) [14]. In GPR the HR image prediction is pixelwise, which causes discontinuities and artifacts, while KRR suffers from inconsistency between neighboring patches. NLKR overcomes the drawbacks of the literature [10] by unifying the local and nonlocal priors into a single model in a complementary way; however, it only builds kernel regressors at the same scale and discards the further potential enabled by higher-order image statistics.

Another trend in SR is to combine the reconstruction- and example-based methods into a unified framework to produce more compelling results. In our previous work [18], we proposed an Adaptive Joint Kernel Regression (AJKR) method that combines a set of coherent NLM-generalized local regressors in the nonlocal range, with higher-order information (i.e. regional redundancy) injected in. This algorithm produces superior results to NLKR and excludes the necessity of separate deblurring. To exploit the full potential offered by such joint regression, however, the core algorithm should be generalized. In this paper we address the SR problem from the viewpoints of learning good regression priors and robust dictionary learning, and we generalize AJKR in two ways: (1) we extend the regression range to multi-scales and rotations, obtaining a Generalized AJKR (G-AJKR) method that exploits both image statistics simultaneously in a natural manner; and (2) we incorporate a new global structure prior of human faces into the G-AJKR method for face hallucination. Adaptive dictionary learning and a dictionary-based sparsity prior are further integrated for robustness.

As for face hallucination, the problem was first addressed in the pioneering work of Baker and Kanade [1], who learn a prior on the gradient distribution of frontal face images using the Bayesian theory. To generate high-quality HR face images, current face hallucination methods usually involve two steps: the first step reconstructs the global face in a face subspace using the MAP criterion, commonly with PCA [2,3] or manifold learning methods such as Locality Preserving Projections (LPP) [5] and Canonical Correlation Analysis (CCA) [4], and the second step produces a residue image to recover details [2,3]. The frequently used PCA, however, tends to yield results like the mean face, so individual discriminativity is often lost and neglected, which is very important when applied to face recognition applications. Moreover, the above methods dealing with general natural images cannot be readily applied to faces due to their ignorance of the special properties of face images. We therefore introduce a new global face prior preserving individual discriminativity based on Partial Least Squares (PLS) [19].

The remainder of the paper is organized as follows. Section 2 reviews related works on dictionary and manifold learning, as the development that follows relies on them. Section 3 details our G-AJKR framework and its extension to face hallucination. Experimental results of SR on generic and face images, with applications in face recognition, are provided in Section 4. We conclude the paper in Section 5.
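Before moving on, here is a schematic of how MAP formulations of the form of Eq. (3) are typically optimized. This is a minimal sketch of ours, not the authors' implementation: it minimizes the objective by plain gradient descent, with a simple smoothness prior standing in for the more elaborate regularizers discussed above, and the operators and step size are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import laplace

def sr_map_estimate(Y, DH, DH_T, lam=0.1, step=0.2, iters=200):
    """Gradient descent on ||Y - DH(X)||^2 + lam * ||grad X||^2 (cf. Eq. (3)).

    DH and DH_T are callables implementing the degradation operator and its
    adjoint; the smoothness prior contributes the gradient term -2*laplace(X).
    """
    X = DH_T(Y)  # crude initialization from the back-projected LR input
    for _ in range(iters):
        grad = 2.0 * DH_T(DH(X) - Y) - 2.0 * lam * laplace(X)
        X = X - step * grad
    return X
```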

Learning a good dictionary is important, since example-based and dictionary-based methods do local "regression" over bases learned from an external database or from the input image itself. Traditional choices are the analytically designed wavelet bases, which lack sufficient flexibility for a given image. Other methods learn a dictionary (usually overcomplete) from an image database using techniques like K-SVD [20] and PCA [21], but the flexibility is still limited since the dictionary is only learned to perform well on average. Online dictionary learning from the given image itself offers a promising alternative that exploits the rich information contained in the input [22,23]. One drawback is that the learning process easily runs the risk of building dictionaries with many artifacts under image corruptions.

2.2. Manifold learning for face hallucination

The most popular modeling method in face hallucination is PCA [2,3], but it is holistic and tends to yield faces like the mean face. Since face images are shown to reside on a low-dimensional nonlinear manifold, researchers have been inspired to use manifold learning to hallucinate global faces. Typical methods include Locally Linear Embedding (LLE) [24,25] and LPP [5], which project onto a subspace preserving neighborhood relationships. A major assumption in applying all these methods to infer HR faces is that the LR and HR manifolds have similar local topologies, so that the HR image can be represented as a linear combination of neighbors using the same weights derived in the LR space. To strengthen this assumption in practice, CCA [4] finds an optimal subspace that maximizes the correlation between LR and HR images: it finds two bases U and V that linearly map the two sets of vectors X and Y to a common subspace where their correlation is maximized.

Unfortunately, the local neighborhood similarity assumption does not hold well for LPP and LLE, and the correlation criterion of CCA has its own failure mode. Taking CCA for example, imagine an extreme case where the first coordinates of X and Y are perfectly correlated while the others are almost uncorrelated. CCA will give the first coordinate as the principal direction, which projects all the data points in X and Y to a common single point, making the local topology recovery therein very difficult: although the correlation is perfectly maintained, it becomes impossible to recover the neighborhood structure. More generally, these subspace methods suffer from a common drawback: too much effort on neighborhood preservation or correlation maximization may congregate different neighbors together in the projected subspace. This results in either wrong neighbor selection or indistinguishable reconstruction results in the subspace, both hampering face recognition with reduced discriminativity. Fig. 1 shows the LR and HR subspaces obtained from the PCA coefficients of real face images; as can be seen, the projections congregate with little discrimination preserved, and a recovery in such a subspace ends up with an unfaithful, near-mean face.

Recently, Partial Least Squares (PLS) [19] was proposed and successfully applied to face recognition and multi-modal recognition [26]. PLS tries to maintain correspondence as well as preserve the variance. It finds normalized bases U and V that maximize the covariance

$$\{U, V\} = \arg\max_{U,V} \operatorname{cov}(XU, YV)^2 = \arg\max_{U,V} \operatorname{var}(XU)\cdot\operatorname{corr}(XU, YV)^2\cdot\operatorname{var}(YV), \qquad (4)$$

$$\text{s.t.}\ \|U\| = \|V\| = 1, \qquad (5)$$

where $\operatorname{cov}(\cdot,\cdot)$ is the covariance operator. Clearly, unlike CCA, PLS preserves both correlation and discriminativity (variance) very well.

[Fig. 1. Two-dimensional embeddings of the PCA coefficients of LR (first row) and HR (second row) face images by different subspace methods.]
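A minimal sketch of extracting such paired PLS bases, via the dominant singular pair of the cross-covariance matrix with deflation; the data layout and component count are our illustrative assumptions, not the authors' code:

```python
import numpy as np

def pls_bases(X, Y, n_components=2):
    """Paired PLS bases U, V maximizing cov(XU, YV) under unit norm
    (cf. Eqs. (4)-(5)). X: (Q, p) LR features, Y: (Q, q) HR features,
    with rows paired across the two matrices.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    U, V = [], []
    for _ in range(n_components):
        left, _, right_t = np.linalg.svd(X.T @ Y, full_matrices=False)
        u, v = left[:, 0], right_t[0]          # dominant singular pair
        t = X @ u                              # score vector
        X = X - np.outer(t, t) @ X / (t @ t)   # deflate before next component
        Y = Y - np.outer(t, t) @ Y / (t @ t)
        U.append(u); V.append(v)
    return np.stack(U, axis=1), np.stack(V, axis=1)
```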

3. Proposed G-AJKR framework for single-image SR

To make better use of image local structural regularity and nonlocal self-similarity, a G-AJKR method is proposed in this section. An overview of the proposed method is shown in Fig. 2: a new joint regression prior is learned across scales and rotations and weighted by the regional redundancy measure (Fig. 2(a)), adaptive dictionary learning is integrated into the framework (Fig. 2(b)), and we also study how to introduce a new global face structure prior into G-AJKR so that it is tailored towards face hallucination (Fig. 2(c)).

[Fig. 2. (a) Graphical illustration of the G-AJKR framework, where the reference patch is marked as "R". (b) Block diagram of our generic image SR algorithm. (c) Extension to face hallucination by introducing a new global face structure prior based upon PLS.]

3.1. Generic image SR

3.1.1. Review of the previous AJKR method

The AJKR method in our previous work [18] combines cues from local and nonlocal image priors, inspired by SKR [16] and NLM [17] respectively.
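As background, the NLM-style patch similarity weighting that AJKR builds on can be sketched as follows. This is a simplified version of the weights formalized in Eq. (7) below; the Gaussian spatial weighting matrix $W_G$ is omitted here for brevity.

```python
import numpy as np

def similarity_weights(patches, ref_idx, h=15.0):
    """NLM-style weights between a reference patch and its nonlocal
    candidates; `patches` is (N, J), one flattened patch per row,
    and `h` is the decay parameter (cf. Eq. (7))."""
    d2 = np.sum((patches - patches[ref_idx]) ** 2, axis=1)  # ||Y_i - Y_j||^2
    return np.exp(-d2 / h**2)
```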

AJKR performs regression on each similar patch observation found by nonlocal search via structural regression, and combines the observations using patch similarity weights. Compared with Zhang et al. [10], we further propose an explicit measure of regional redundancy to determine the confidence of each regression group, which gives adaptive regularization. Let $x_i$ denote the location of the $i$th pixel in the HR grid, $Y_i$ the patch vector of pixels in $x_i$'s local neighborhood $N(x_i)$, and $Y_j$ a similar patch centered at $x_j$ found in a nonlocal range $P(x_i)$ within the same image scale (including $Y_i$ itself). The joint kernel regression model is then formulated as

$$\hat{a}_i = \arg\min_a \sum_{j \in P(x_i)} w_{ij}^N\, \|Y_j - \Phi a\|^2_{W_j^N}, \qquad (6)$$

where $\Phi$ contains the polynomial bases (say, second-order) of the Taylor expansion with regression coefficients $a$, and the patch similarity weights are

$$w_{ij}^N = \exp\left(-\frac{\|Y_i - Y_j\|^2_{W_G}}{h_n^2}\right), \qquad (7)$$

where $W_G$ is the weight matrix of a Gaussian kernel and $h_n$ is the decay parameter. The kernel weight matrix is $W_j^N = \operatorname{diag}[w_{j1}, w_{j2}, \ldots, w_{jJ}]$, $J = |N(x_j)|$, instead of the popular spatial kernel matrix $W_j^K$ in SKR [16]; the kernel weight $w_{ji}$ is calculated in the same way as in Eq. (7), but in the local neighborhood $N(x_j)$. Note that the combined local regressors are generalized from NLM as in [27]. The pixel estimate at $x_i$ is the first element of the vector solution of Eq. (6):

$$\hat{z}(x_i) = e_1^T \hat{a}_i = e_1^T \Big[\Phi^T \Big(\sum_{j \in P(x_i)} w_{ij}^N W_j^N\Big)\Phi\Big]^{-1} \Phi^T \sum_{j \in P(x_i)} w_{ij}^N W_j^N Y_j, \qquad (8)$$

where $e_1$ is a vector with the first element equal to one and the rest zero. Defining the row vectors $k_{ij}^T = e_1^T [\Phi^T (\sum_{j \in P(x_i)} w_{ij}^N W_j^N)\Phi]^{-1} \Phi^T w_{ij}^N W_j^N$ as the equivalent kernels with which we perform the regression for $x_i$, we can plug them into the SR optimization function in Eq. (3) to act as the regularization $C(\cdot)$:

$$\hat{X} = \arg\min_X \|Y - DHX\|_2^2 + \lambda \sum_{i=1}^{n} \Big\|X_i - \sum_{j \in P(x_i)} k_{ij}^T X_j\Big\|_2^2, \qquad (9)$$

where $X_i$ denotes the pixel to be estimated at location $x_i$. By properly arranging the $k_{ij}$ into an equivalent kernel matrix $K$, we obtain the matrix form

$$\hat{X} = \arg\min_X \|Y - DHX\|_2^2 + \lambda \|(I - K)X\|_2^2, \qquad (10)$$

where $I$ is the identity matrix. This brings out the inherent dissimilarity between the AJKR method and the NLKR method [14]: the joint regression scheme in Eq. (6) exploits the local and nonlocal priors simultaneously and collaboratively, instead of crudely imposing two penalty terms. Our aim is to "harmonize" the nonlocal fusion in a complete, adaptive and collaborative framework; by simultaneously exploiting both image priors in a higher-order collaborative manner, we obtain more reliable and robust results than similar unifying methods.

Considering further the degree of patch redundancy for the joint regression in Eq. (6), we quantify the regional redundancy as

$$R_i = \sum_{j \in P(x_i)} (w_{ij}^N)^2, \qquad (11)$$

which can also be regarded as penalizing the patch distances due to the way the $w_{ij}^N$ are calculated: the more similar the grouped patches are, and the more patch redundancy there is in the nonlocal region, the smaller the distances (and the larger $R_i$). Obviously, smooth areas have large values while textures have small values. Since the redundancy in Eq. (11) varies significantly across different regions within an image, we use it for adaptive regularization:

$$\hat{X} = \arg\min_X \|Y - DHX\|_2^2 + \lambda \|(I - K)X\|^2_R, \qquad (12)$$

where the diagonal matrix $R = \operatorname{diag}[R_1, R_2, \ldots, R_n]$ weights each regression group. The regional redundancy measure also makes our idea of nonlocally joint regression more adaptive and complete by building a global image-region vision, and the kernel regressors generalized from NLM are more consistent with this nonlocal fusion.

3.1.2. Proposed G-AJKR method

Image self-similarity tends to occur both at the same scale and rotation, and across different scales [9,15] and rotations. To attain the full power of joint kernel regression enabled by image self-similarity, we extend the AJKR method to a larger range of multi-scales and rotations in addition to just translations, generalizing the translation field to a mapping $f: \mathbb{R}^2 \mapsto \mathbb{R}^4$. Concretely, we generate a multi-resolution image hierarchy $\{X^s\}$ of decreasing resolutions, scaled down by the operator $D_s$ with scale factors $1.25^{-s}$, $s \in [0, S]$. To search at multi-scales, for the reference patch centered at $x_i$ on the current image plane we compare to find its nonlocal neighbors $X_j$ of the same patch size $J$ at locations $x_j$ on the image plane of $X^s$. Here $D_s$ is a patch downsampling operator which keeps the patch center on the LR grid, while $D_s^T$ is a patch upsampling operator with zero-padding [14]. Patch rotation is simply achieved by bicubic interpolation: to search a range of rotations $\theta \in [\theta_1, \theta_2]$, we additionally compare the upright unscaled reference patch with the target one that is scaled and rotated around its center. The search space of the original algorithm is thus extended from $(x_i)$ to $(x_i, s, \theta)$. This leads to a generalized nonlocally similar patch set $P^s(x_i)$, and further to $P^{s,\theta}(x_i)$ when the rotated patch versions $X_j^{s,\theta}$ are considered (see Fig. 2(a) for an example); a sketch of this 4D search-space construction is given below.
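A minimal sketch of building this multi-scale, multi-rotation search space. This is our illustration: rotating whole image planes, rather than individual candidate patches, is a simplification, and the scale and rotation sets mirror the experimental settings reported in Section 4.

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def build_search_space(X, S=3, thetas=(-90, -45, 0, 45, 90)):
    """Build the multi-scale, multi-rotation image set over which nonlocal
    neighbors are sought (the 4D search space (x, y, s, theta)).

    Scale factors follow the paper's 1.25^-s hierarchy.
    """
    planes = {}
    for s in range(S + 1):
        scaled = zoom(X, 1.25 ** (-s), order=3)   # cubic-spline downscaling
        for theta in thetas:
            planes[(s, theta)] = rotate(scaled, theta, reshape=False, order=3)
    return planes  # candidate patches are then gathered from each plane
```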

With the generalized patch set, we obtain a generalized joint kernel regression model

$$\hat{a}_i = \arg\min_a \sum_{s=0}^{S} \sum_{\theta=\theta_1}^{\theta_2} \sum_{j \in P^{s,\theta}(x_i)} w_{ij}^{N,s,\theta}\, \|X_j^{s,\theta} - \Phi a\|^2_{W_j^{N,s,\theta}}, \qquad (13)$$

where $W_j^{N,s,\theta}$ is the corresponding kernel weight matrix and

$$w_{ij}^{N,s,\theta} = \exp\left(-\frac{\|X_i - X_j^{s,\theta}\|^2_{W_G}}{h_n^2}\right).$$

Accordingly, the regional redundancy measure in Eq. (11) becomes

$$R_i = \sum_{s=0}^{S} \sum_{\theta=\theta_1}^{\theta_2} \sum_{j \in P^{s,\theta}(x_i)} (w_{ij}^{N,s,\theta})^2, \qquad (14)$$

and the equivalent kernels for the regression-based regularization in Eq. (12) become

$$(k_{ij}^{s,\theta})^T = e_1^T \Big[\Phi^T \Big(\sum_{s=0}^{S} \sum_{\theta=\theta_1}^{\theta_2} \sum_{j \in P^{s,\theta}(x_i)} w_{ij}^{N,s,\theta} W_j^{N,s,\theta}\Big)\Phi\Big]^{-1} \Phi^T w_{ij}^{N,s,\theta} W_j^{N,s,\theta}. \qquad (15)$$

It should be noted that the regression kernels $(k_{ij}^{s,\theta})^T$ are applied to the pyramid $\{X^s\}$ of different resolutions, not just $X^0 = X$. We therefore add them up using the patch downsampling operator $D_s$ and the upsampling operator $D_s^T$, and pack them into a single kernel matrix $\tilde{K}$. By plugging it into Eq. (12), we reach our final optimization function for SR:

$$\hat{X} = \arg\min_X \|Y - DHX\|_2^2 + \lambda \|(I - \tilde{K})X\|^2_R. \qquad (16)$$

Clearly, the rich redundancies offered by the 4D search across translations, rotations and scales give rise to the stability and robustness of the joint structural regression.
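The generalized redundancy measure of Eq. (14) can be sketched on top of the search space built earlier. Again this is our illustration; `candidates` is assumed to gather flattened same-size patches from each (s, θ) plane.

```python
import numpy as np

def regional_redundancy(ref_patch, candidates, h=15.0):
    """Generalized regional redundancy R_i of Eq. (14): the sum of squared
    similarity weights w^{N,s,theta}_{ij} over all scales and rotations.

    `candidates` maps (s, theta) -> (N, J) array of flattened patches
    gathered from the corresponding plane of the search space.
    """
    R_i = 0.0
    for patches in candidates.values():
        d2 = np.sum((patches - ref_patch) ** 2, axis=1)  # ||X_i - X_j^{s,th}||^2
        w = np.exp(-d2 / h**2)                           # w^{N,s,theta}_{ij}
        R_i += np.sum(w ** 2)
    return R_i
```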
3.1.3. Adaptive dictionary learning

The above G-AJKR framework can further benefit from dictionary-based methods that introduce natural image priors from learned bases. Generally, we can represent an image patch $X_i \in \mathbb{R}^J$ as a linear combination of the atoms of a dictionary $B$ such that $X_i = B\alpha_i$, $\alpha_i \in \mathbb{R}^d$. We adopt an adaptive dictionary learning scheme that combines an offline dictionary $B_0$ (learned from an external database, Fig. 3(a)) and an online dictionary $B_1$ (learned from the input image, Fig. 3(b)). Specifically, we adopt the adaptive PCA strategy in [21] to learn $B_0$, and also apply PCA to the already grouped similar patches to learn $B_1$ as in [22,23]. Note that $B_1$ is learned from patches only at the same scale and rotation as the reference, since atoms learned online from mutually dissimilar patches would deviate too far from the true signal space to accurately represent the patches.

Once the dictionary $B = [B_0\ B_1] \in \mathbb{R}^{J \times d}$ is built, we rewrite Eq. (16) over the representation coefficients while enforcing the remaining sparse representation prior:

$$\hat{\alpha} = \arg\min_\alpha \|Y - DH\,B \circ \alpha\|_2^2 + \lambda \|(I - \tilde{K})\,B \circ \alpha\|^2_R + \beta \|\alpha\|_1, \qquad (17)$$

where $\beta$ is the regularization parameter of the additional sparsity term, $\alpha$ is the concatenation of all the $\alpha_i$, and $\circ$ is the representation operator that lets the whole image be the average of all the overlapping patch estimates. Eq. (17) is solved iteratively together with our dictionary learning by the iterative shrinkage algorithm [28]. At each iteration $t$ we construct the image hierarchy from the current HR image estimate $X^t$ (so we get $\{X^{t,s}\}$, $s = 0, \ldots, S$); at the first iteration, when the HR image is not yet available, we initialize it by bicubic interpolation to generate the pyramid images.

What distinguishes our dictionary scheme from others is its ability to adapt to the G-AJKR prior in response to the regional redundancy, in a unified framework (Fig. 2(b)). In the case of high patch redundancy, the large $R_i$ imposes a strong effect of the joint regression prior in the G-AJKR process, avoiding the large variance or reduced descriptiveness of the dictionary. If, on the other hand, no or few similar patches are found for a given patch, its near-zero redundancy measure $R_i$ cancels out the erroneous joint regression prior, thus reducing Eq. (17) to sparse coding only at that patch. In doing so, $B_0$ introduces robustness through its learned rich priors that counterbalance the possible "outliers" in $B_1$ due to input corruptions, while $B_1$ goes beyond the universal nature of $B_0$ and hence gives rise to adaptivity to any given patch. This adaptive mechanism not only guarantees the best possible solution for individual patches (never worse than sparse coding only), but also makes our framework more tolerable to noise or blur that may cause, e.g., grouping errors. The superiority of this scheme is illustrated in Fig. 3(c). Fig. 2(b) shows the block diagram of our full G-AJKR algorithm.

[Fig. 3. (a) Centroids of the offline PCA dictionary B0. (b) Examples of the online PCA dictionaries B1, where the first 8 atoms are shown. (c) PSNR curves (×3) versus iterations for different dictionary learning schemes on the Lena image corrupted with a Gaussian blur (σb = 2) and Gaussian noise (σn = 5).]
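For reference, a minimal sketch of the iterative shrinkage scheme [28] applied to problems of the form of Eq. (17). For brevity the two quadratic terms are assumed folded into a single linear operator A; the step size and operator callables are our illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise soft-thresholding, the proximal map of tau*||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def iterative_shrinkage(apply_A, apply_AT, y, alpha0, beta, step, iters=100):
    """Iterative shrinkage for min_a ||y - A a||_2^2 + beta * ||a||_1,
    the generic form of the sparse problem in Eq. (17).
    """
    alpha = alpha0.copy()
    for _ in range(iters):
        grad = 2.0 * apply_AT(apply_A(alpha) - y)                 # gradient step
        alpha = soft_threshold(alpha - step * grad, step * beta)  # shrinkage
    return alpha
```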

3.2. Face hallucination

Face images differ from general natural images in that they are more regular in structure, so introducing a global facial structure prior can be conducive; indeed, face hallucination can handle more challenging tasks than generic image SR due to the regular face structure. As mentioned in Section 2, the commonly used manifold learning methods LLE and LPP potentially suffer from a great loss in discriminative ability, and the "hallucinated" facial features produced by these subspace methods may not be faithful to the ground truth but are more like mean features or unexpected ones. We here propose to use PLS [19] to learn an intermediate subspace, which strikes a balance between the objectives of CCA and PCA by maintaining correlation while capturing the projection variations (discrimination). To the best of our knowledge, this is the first time that PLS is used for the face hallucination problem.

Following [2,11], we also propose a two-step face hallucination method, where the first step constrains and reconstructs the global face in a discriminativity preserving subspace; the PLS-based global face reconstruction is implemented as the first step instead of simple bicubic interpolation, and the proposed G-AJKR algorithm then follows to recover details. From the Bayesian viewpoint in Eq. (2), the learned subspace introduces a strong facial prior to $p(X)$, while G-AJKR further imposes the local/nonlocal priors on $p(X)$ and applies the reconstruction constraint for $p(Y|X)$. Fig. 2(c) illustrates this relationship, which is beneficial to the neighbor-based reconstruction.

Suppose we have collected from the training set the PCA coefficients $\{b_i^L\}_{i=1}^Q$ and $\{b_i^H\}_{i=1}^Q$ of the LR/HR training images, with the corresponding mean faces $\mu^L$ and $\mu^H$ and orthogonal eigenvectors $E^L$ and $E^H$. LR/HR faces are thus represented as linear combinations of eigenfaces using coefficients $b^L$ and $b^H$:

$$Y = \mu^L + E^L b^L, \qquad X = \mu^H + E^H b^H. \qquad (18)$$

PLS is then applied to find two normalized bases V and U that maximize the covariance as in Eqs. (4) and (5). Projecting the PCA coefficients $b_i^L$ and $b_i^H$ by these bases into a common subspace, we have

$$c_i^L = V^T b_i^L, \qquad c_i^H = U^T b_i^H, \qquad (19)$$

where $c_i^L$ and $c_i^H$ are the subspace projections (the superscripts differentiate between them). Given an input LR face image $Y$ with its PCA coefficients $b^l$ computed, we obtain its PLS projection $c^l = V^T b^l$. Then for $c^l$ we seek its $M$ nearest neighbors $\{c_{i'}^L\}_{i'=1}^M$ in the trained subspace and the corresponding weights $\gamma = \{\gamma_{i'}\}_{i'=1}^M$:

$$\arg\min_\gamma \Big\|c^l - \sum_{i'=1}^{M} \gamma_{i'} c_{i'}^L\Big\|_2^2, \qquad \text{s.t.}\ \sum_{i'=1}^{M} \gamma_{i'} = 1. \qquad (20)$$

The closed-form solution for the weights is given by [25]

$$\gamma_{i'} = \frac{\sum_{j'} A^{-1}_{i'j'}}{\sum_{l,m} A^{-1}_{lm}}, \qquad A_{i'j'} = (c^l - c_{i'}^L)^T (c^l - c_{j'}^L). \qquad (21)$$

Using the same weights for the corresponding $\{c_{i'}^H\}_{i'=1}^M$, by the co-occurrence assumption we can reconstruct the HR projection features:

$$c^h = \sum_{i'=1}^{M} \gamma_{i'} c_{i'}^H. \qquad (22)$$

Subsequently, we reconstruct the PCA coefficients $b^h$ of the HR image and the hallucinated HR global face image $\hat{X}$ as

$$b^h = (UU^T)^{-1} U c^h, \qquad \hat{X} = \mu^H + E^H b^h. \qquad (23)$$
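A minimal sketch of the closed-form neighbor weights of Eqs. (20) and (21); the small regularizer that keeps the local Gram matrix invertible is our assumption, not spelled out in the text.

```python
import numpy as np

def neighbor_weights(c_l, neighbors, eps=1e-6):
    """Closed-form reconstruction weights of Eqs. (20)-(21).

    c_l: (k,) query projection; neighbors: (M, k) nearest neighbors c^L_{i'}.
    Returns gamma with sum(gamma) == 1.
    """
    D = c_l[None, :] - neighbors                 # rows: c^l - c^L_{i'}
    A = D @ D.T                                  # local Gram matrix A_{i'j'}
    A += eps * np.trace(A) * np.eye(len(A))      # regularize (assumed)
    A_inv = np.linalg.inv(A)
    return A_inv.sum(axis=1) / A_inv.sum()
```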
4. Experimental results

In this section we evaluate the effectiveness of our G-AJKR method for single-image SR on several standard test images. We first give an illustrative example to demonstrate the superiority of the generalization to the range of multi-scales and rotations. We then compare with several related as well as state-of-the-art algorithms, both qualitatively and quantitatively, where the quantitative results are reported in terms of the PSNR and Structural SIMilarity index (SSIM) [29] metrics. Since in real-world SR tasks the observed LR images are often contaminated by noise, the robustness of the SR methods with respect to noise is also evaluated. Finally, the face hallucination and recognition performance are presented.

In all experiments we use HR patches of 7×7 pixels (J = 49) with a 4-pixel overlap for both local kernel regression and patch matching. We set the support of the nonlocal search to the 15 nearest neighbors in a window of size 21×21 across all pyramid levels. Other parameters are set as λ = 40, β = 0.25 and h_n = 15. Since the high dimensionality of the 4D search space imposes a large computational burden, we apply the 4D search only to patches with high intensity variance (threshold Δ = 16), as the SR effect is expressed mostly in highly detailed image regions (e.g. edges and textures) rather than in uniform and low-frequency ones (we apply 2D search for them). We also only set S = 3 (i.e. 4 hierarchy levels) and the rotation range θ ∈ {−90°, −45°, 0°, 45°, 90°}, without much performance degradation. For solving the optimization problem in Eq. (17), we use the iterative shrinkage algorithm [28]. For color images, all the test algorithms are applied to the luminance channel only. It typically takes about 4 min to process a 256×256 image on a PC (3.2 GHz, Pentium IV) with our unoptimized MATLAB code.

The synthetic LR images were generated from the original images by a truncated 7×7 Gaussian kernel (σb = 1.6) and downsampling by a factor of 3; Gaussian noise (σn = 5) was also added for the noisy case. Our face hallucination experiments were performed on frontal-view face images from the CAS-PEAL [30] and FERET [31] databases.

For the CAS-PEAL database, we randomly selected Q = 500 images (one per person) with normal expression to train the PLS subspace, and other 40 images (disjoint from the training set) for testing. For the FERET database, Q = 800 images were randomly selected for subspace training, with other 403 images for testing. All the face images were aligned by the eye positions and mouth center and cropped to 128×96 pixels (HR) and 32×24 pixels (LR); we zoom the 32×24 LR test image by 4 times. The training set for PLS learning is thus composed of LR/HR image pairs. In the global face reconstruction phase, we use PCA retaining 98% of the variance, and 200 PLS bases are kept for the subsequent projection. The neighborhood size is set as M = 50. To prepare the offline dictionary B0 for our G-AJKR algorithm in this scenario, we sampled patches from the HR training images and learned the dictionary following [21].

4.1. Generic image SR experiments

Fig. 4 validates the efficacy of our 4D generalization of the basic algorithm on the noisy Parrot image. It is shown that the multi-scale-and-rotation version of AJKR preserves sharper edges and more faithful details than the original AJKR, with G-AJKR across rotations only being in-between.

[Fig. 4. Comparison of SR results (×3) on the noisy Parrot image obtained by AJKR [18] and G-AJKR using different scales and rotations. (a) LR input. (b) AJKR (PSNR: 29.62 dB, SSIM: 0.896). (c) G-AJKR across rotations but at the same scale (S = 0) (PSNR: 29.67 dB, SSIM: 0.892). (d) G-AJKR across rotations and scales (PSNR: 29.78 dB, SSIM: 0.903).]

Next we compare our method against four regression-based methods, GPR [12], KRR [13], NLKR [14] and the method of Zhang et al. [10]; two dictionary-based methods, Centralized Sparse Representation (CSR) [22] and the Sparse Coding (SC) method [11]; and three state-of-the-art methods, Shan et al. [7], Glasner et al. [9] and Freedman and Fattal [15].

[Table 1. Comparison of SR results (×3, PSNR/SSIM) for the noiseless case. Rows: GPR, KRR, SC, CSR, Shan et al., Glasner et al., Freedman and Fattal, Zhang et al., G-AJKR; columns: Bike, Butterfly, Girl, Parrot, Plants. G-AJKR attains the best PSNR and SSIM on every image.]

[Table 2. Comparison of SR results (×3, PSNR/SSIM) for the noisy case, with the same methods and test images as Table 1; G-AJKR again scores best throughout.]
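The PSNR values reported throughout can be computed as below, a standard definition assuming 8-bit images; SSIM [29] would follow its reference implementation.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference HR image and
    an SR estimate, as used in the quantitative comparisons."""
    mse = np.mean((reference.astype(np.float64) - estimate) ** 2)
    return 10.0 * np.log10(peak**2 / mse)
```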

Among them, GPR and KRR are two recent regression methods that capture from the input image the mapping between LR and HR patches, via Gaussian process regression and sparse kernel regression respectively. All the compared results were reproduced by the codes provided by the authors or downloaded from their websites, and we used their default parameter setups. Tables 1 and 2 show the quantitative results for the noiseless and noisy cases, respectively. Since the implementation of NLKR is not available, we do not include this method in our quantitative comparisons; a visual comparison with its reported result on a real-world image will be shown next.

[Fig. 5. Visual comparison of SR results (×3). (a) Noiseless case. (b) Noisy case.]

[Fig. 6. Visual comparison of SR results (×4) on real LR images.]

[Fig. 7. Visual comparison of hallucinated global faces by different subspace methods on the CAS-PEAL database. (a) LR input. (b) Reconstruction result directly in the PCA space without projection onto other subspaces. (c) PCA+LPP. (d) PCA+LLE. (e) PCA+CCA. (f) PCA+PLS. (g) Original HR image.]

[Fig. 8. Visual comparison between our two-step face hallucination algorithm and the G-AJKR algorithm alone on the CAS-PEAL database. (a) LR input. (b) Global hallucination result. (c) Result using the two-step algorithm. (d) Result using the G-AJKR algorithm only. (e) Original HR image.]

Fig. 5 visually illustrates the performance differences between our method and the others. As can be seen in Fig. 5(a), our method synthesizes more visual details and sharper edges without blur compared with the three regression methods GPR, KRR and Zhang et al. [10]; our result is also free of the jaggy and ringing artifacts of the SC and CSR methods, respectively. In comparison to the state-of-the-art methods of Freedman and Fattal [15], Shan et al. [7] and Glasner et al. [9], our improvements are also evident. Fig. 5(b) shows the noise robustness of our method: for the noisy input, not only must image details be recovered, but the noise also needs to be suppressed. Methods like SC tend to magnify noise during image up-sampling, which leads to lower results than ours, while our method preserves edges and small details. More challenging results (×4 magnification) on real LR images are shown in Fig. 6.

As can be seen from the tables, our method constantly outperforms the others across all metrics for both cases, with the largest PSNR improvements (over GPR) at 4.09 dB (noiseless) and 2.89 dB (noisy) on average, which are quite significant. This can be attributed to the full and adaptive exploitation of image self-similarity and the responsive dictionary scheme, i.e. our coherent and collaborative use of regression priors and the rich redundancies found across scales and rotations. It also validates the benefits of our adaptive dictionary learning scheme, which usually improves the baseline performance of G-AJKR as in our previous AJKR case [18].

4.2. Face hallucination and recognition performance

Face hallucination can handle more challenging tasks than generic image SR due to the regular face structure. Fig. 7 compares different subspace methods for global face hallucination, where the subspace dimension and neighborhood size are set the same as ours. We can see that our method outperforms all the others in terms of visual plausibility. After projecting the PCA coefficients onto subspaces learned by LPP, LLE and CCA, the resultant faces do not look like the original face but like a fused one, which means the discriminativity is lost. Direct reconstruction in the PCA space also suffers from the failure of the co-occurrence assumption, which results in perceptually distracting artifacts and makes the results deviate further from the ground truth (more like the mean face); using the co-occurrence prior to infer the HR face directly is thus usually not feasible. PLS realizes the tradeoff between PCA and CCA to generate smooth and faithful faces which are not only distinct but also similar to the original faces.

Fig. 8(c) and (d) compare the two-step approach with the direct G-AJKR method to validate the necessity of the first step: a direct application of the generic G-AJKR method cannot generate sharp edges and sufficient face details without the global facial structure prior. Our two-step method first hallucinates a global face in the learned face subspace, which incorporates the special properties of faces and compensates for the lost information in the input; the proposed G-AJKR algorithm then follows the global hallucination step to enhance the edges and recover more details (see Fig. 8(b) and (c)).

A more thorough comparison with recent representative methods is conducted on both the CAS-PEAL and FERET databases: we compare our method with the methods of Wang and Tang [3], Zhuang et al. [5], Neighbor Embedding (NE) [24] and Sparse Coding (SC) [11]. Fig. 9 shows that our method produces clean face regions with noticeably sharper edges and more faithful details than the others do, and this impression is confirmed by reading the corresponding average PSNR values.

[Fig. 9. Comparison of hallucination results on the CAS-PEAL (rows 1–2) and FERET (rows 3–4) databases along with the corresponding average PSNR values: (a) LR input, (b) Wang and Tang, (c) Zhuang et al., (d) NE, (e) SC, (f) our method, which attains the highest average PSNR (about 32 dB), and (g) original HR image.]

Among the compared methods, Wang and Tang [3] proposed a global hallucination method based on eigentransformation using PCA, while Zhuang et al. [5] used manifold learning techniques for both global face reconstruction (based on LPP) and local detail compensation (neighbor reconstruction for residue compensation). The NE method applies LLE to the LR neighbor patches (we set the neighborhood size to 150) and estimates the HR image patch by patch using the co-occurrence prior; the SC method differs in applying this prior to the sparse representations of image patches. NE and SC are two patch-based methods for generic image SR, and they are selected for comparison due to the similar ideas of using neighbor-based reconstruction and the co-occurrence assumption. We used 300 training image pairs for them, PCA retaining 99% of the variance, and the default parameter setups of the two methods.

In order to demonstrate the advantage of discriminativity preservation offered by our face hallucination method, we finally conducted face recognition experiments on the FERET database. The training (800 images) and testing (403 images) sets were as described before for PLS learning. For recognition, Gabor features are fed into the classic PCA+LDA (reduced to 600 dimensions) classifier using the cosine similarity measure. We then compare the verification rates produced from the hallucinated HR face images among the different methods; the performance of the LR and original HR images is used as benchmarks. The Receiver Operating Characteristic (ROC) curves are plotted in Fig. 10. It shows that the gap between the verification rates of LR and HR images is around 20% when the false accept rate is 0.1%, whereas the performance of Zhuang et al.'s method and SC is only slightly better than LR. Our method significantly improves the LR performance by around 18%, suggesting the robustness of our method to face variations (e.g. illumination), which is beneficial to the face recognition task. Since our method can firstly infer a global face while preserving discriminativity, it leads to faithful hallucination results as well as high face recognition performance. The improvements can be seen on both databases.

[Fig. 10. Comparison of recognition results (ROC curves) for different face hallucination methods on the FERET database.]

5. Conclusions

This paper introduces a generalized joint kernel regression framework for single-image super-resolution. It combines multiple coherent local kernel regressors in the nonlocal range to exploit the local and nonlocal image priors in a higher-order collaborative manner, and is further generalized to the range of multi-scales and rotations. An adaptive dictionary learning scheme is also integrated to interact with the regression prior for robustness. The extension to face hallucination tasks distinguishes itself by incorporating a discriminativity-preserving global face prior based on Partial Least Squares, which reduces undesirable artifacts and noise while preserving unique facial features. It is worth noting that tailoring face SR algorithms purely to good visual quality does not necessarily lead to high face recognition performance by machine. The large variety of experiments shows that the proposed algorithm achieves state-of-the-art performance and is also successfully applied to the specific domain of human faces and to face recognition applications.

Acknowledgments

This work was supported by the National Basic Research Program of China (973 Program) under Grant No. 2013CB329403.

References

[1] S. Baker, T. Kanade, Hallucinating faces, in: Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition, 2000, pp. 83–88.

[2] C. Liu, H.Y. Shum, W.T. Freeman, Face hallucination: theory and practice, Int. J. Comput. Vis. 75 (2007) 115–134.
[3] X. Wang, X. Tang, Hallucinating face by eigentransformation, IEEE Trans. Syst. Man Cybern. Part C 35 (3) (2005) 425–434.
[4] H. Huang, H. He, X. Fan, J. Zhang, Super-resolution of human face image using canonical correlation analysis, Pattern Recognit. 43 (7) (2010) 2532–2543.
[5] Y. Zhuang, J. Zhang, F. Wu, Hallucinating faces: LPH super-resolution and neighbor reconstruction for residue compensation, Pattern Recognit. 40 (11) (2007) 3178–3194.
[6] X. Li, M.T. Orchard, New edge-directed interpolation, IEEE Trans. Image Process. 10 (10) (2001) 1521–1527.
[7] Q. Shan, Z. Li, J. Jia, C.K. Tang, Fast image/video upsampling, ACM Trans. Graph. 27 (5) (2008) 1–7.
[8] J. Sun, J. Sun, Z. Xu, H.Y. Shum, Image super-resolution using gradient profile prior, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.
[9] D. Glasner, S. Bagon, M. Irani, Super-resolution from a single image, in: Proceedings of IEEE International Conference on Computer Vision, 2009, pp. 349–356.
[10] K. Zhang, X. Gao, D. Tao, X. Li, Single image super-resolution with non-local means and steering kernel regression, IEEE Trans. Image Process. 21 (11) (2012) 4544–4556.
[11] J. Yang, J. Wright, T.S. Huang, Y. Ma, Image super-resolution via sparse representation, IEEE Trans. Image Process. 19 (11) (2010) 2861–2873.
[12] H. He, W.C. Siu, Single image super-resolution using Gaussian process regression, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 449–456.
[13] K.I. Kim, Y. Kwon, Single-image super-resolution using sparse regression and natural image prior, IEEE Trans. Pattern Anal. Mach. Intell. 32 (6) (2010) 1127–1133.
[14] H. Zhang, J. Yang, Y. Zhang, T.S. Huang, Non-local kernel regression for image and video restoration, in: Proceedings of European Conference on Computer Vision, 2010, pp. 566–579.
[15] G. Freedman, R. Fattal, Image and video upscaling from local self-examples, ACM Trans. Graph. 28 (3) (2010) 1–10.
[16] H. Takeda, S. Farsiu, P. Milanfar, Kernel regression for image processing and reconstruction, IEEE Trans. Image Process. 16 (2) (2007) 349–366.
[17] A. Buades, B. Coll, J.M. Morel, A non-local algorithm for image denoising, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 60–65.
[18] C. Huang, X. Ding, C. Fang, Single-image super-resolution via adaptive joint kernel regression, in: Proceedings of the British Machine Vision Conference, 2013.
[19] R. Rosipal, N. Krämer, Overview and recent advances in partial least squares, in: Proceedings of International Conference on Subspace, Latent Structure and Feature Selection, 2006, pp. 34–51.
[20] J. Mairal, M. Elad, G. Sapiro, Sparse representation for color image restoration, IEEE Trans. Image Process. 17 (1) (2008) 53–69.
[21] W. Dong, L. Zhang, G. Shi, X. Wu, Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization, IEEE Trans. Image Process. 20 (7) (2011) 1838–1857.
[22] W. Dong, L. Zhang, G. Shi, Centralized sparse representation for image restoration, in: Proceedings of IEEE International Conference on Computer Vision, 2011, pp. 1259–1266.
[23] P. Chatterjee, P. Milanfar, Clustering-based denoising with locally learned dictionaries, IEEE Trans. Image Process. 18 (7) (2009) 1438–1451.
[24] H. Chang, D.Y. Yeung, Y. Xiong, Super-resolution through neighbor embedding, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2004, pp. 275–282.
[25] L. Saul, S. Roweis, Think globally, fit locally: unsupervised learning of low dimensional manifolds, J. Mach. Learn. Res. 4 (2003) 119–155.
[26] A. Sharma, D.W. Jacobs, Bypassing synthesis: PLS for face recognition with pose, low-resolution and sketch, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 593–600.
[27] P. Chatterjee, P. Milanfar, A generalization of non-local means via kernel regression, in: Proceedings of IS&T-SPIE Computational Imaging VI, 2008, p. 68140P.
[28] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math. 57 (11) (2004) 1413–1457.
[29] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.
[30] W. Gao, B. Cao, S. Shan, X. Chen, D. Zhou, X. Zhang, D. Zhao, The CAS-PEAL large-scale Chinese face database and baseline evaluations, IEEE Trans. Syst. Man Cybern. Part A 38 (1) (2008) 149–161.
[31] P.J. Phillips, H. Moon, S.A. Rizvi, P.J. Rauss, The FERET evaluation methodology for face-recognition algorithms, IEEE Trans. Pattern Anal. Mach. Intell. 22 (10) (2000) 1090–1104.