
Extraction of High-Resolution Frames from Video Sequences

Richard R. Schultz, Member, IEEE, and Robert L. Stevenson, Member, IEEE
Abstract—The human visual system appears to be capable of temporally integrating information in a video sequence in such a way that the perceived spatial resolution of a sequence appears much higher than the spatial resolution of an individual frame. While the mechanisms in the human visual system which accomplish this are unknown, the effect is not too surprising, given that temporally adjacent frames in a video sequence contain slightly different, but unique, information. This paper addresses how to utilize both the spatial and temporal information present in a short image sequence to create a single high-resolution video frame. A novel observation model based on motion compensated subsampling is proposed for a video sequence. Since the reconstruction problem is ill-posed, Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence. Estimates computed from a low-resolution image sequence containing a subpixel camera pan show dramatic visual and quantitative improvements over bilinear, cubic B-spline, and Bayesian single frame interpolations. Visual and quantitative improvements are also shown for an image sequence containing objects moving with independent trajectories. Finally, the video frame extraction algorithm is used for the motion compensated scan conversion of interlaced video data, with a visual comparison to the resolution enhancement obtained from progressively-scanned frames.

Keywords—Discontinuities, image enhancement, image sequence processing, interpolation, MAP estimation, scan conversion, stochastic image models, video sequence processing.

I. Introduction

Image interpolation involves the selection of data values between known pixel constraints. Image processing applications of interpolation include region-of-interest image magnification, subpixel image registration, and image decompression, among others. Single frame interpolation techniques [1]-[9] have been researched quite extensively, with the nearest-neighbor, bilinear, and various cubic spline interpolation methods providing progressively more accurate solutions. However, all of these methods are inherently limited by the number of constraints available within the data. For this reason, multiframe methods have been proposed [10]-[17] which use the additional data present within a sequence of temporally-correlated frames to improve resolution.

The sinc basis function provides a perfect reconstruction of a continuous function, provided that the data was obtained by uniform sampling at or above the Nyquist rate [3]. However, sinc interpolation does not give good results within an image processing environment, since image data is generally acquired at a much lower sampling rate. Even polynomial methods such as Lagrange interpolation do not perform satisfactorily for image data, since globally-defined polynomials do not model local image properties well. Rather, researchers have investigated piecewise polynomial approaches to the interpolation problem. In the simplest method of image magnification, a zero-order hold of the low-resolution data is used to compute a high-resolution data set with a blocky appearance; this corresponds to a nearest-neighbor interpolation kernel [6]. Bilinear interpolation [18] and cubic spline interpolation [2], [3], [5], [6], [8], [9] have also received much attention. Hou and Andrews [3] originally applied the cubic B-spline basis function to the image interpolation problem. Keys [5] introduced a basis function similar to a windowed sinc function, termed the cubic convolution interpolation kernel. An analysis by Parker et al. [6] showed that the frequency domain properties of the cubic convolution kernel correspond more closely to an ideal low-pass filter than those of the cubic B-spline. T. C. Chen and deFigueiredo [2] showed the correspondence between spline filters and partial differential equation (PDE) image models. They derived a spline filter pertaining to a noncausal image model with a seven sample region of support, also with the shape of a windowed sinc function. This kernel consistently produced interpolations with lower mean square error values than the cubic B-spline.

Similar to image restoration and reconstruction, image interpolation is an ill-posed inverse problem, since too few data points exist in an image frame to properly constrain the solution. Intuitively, the mapping between the unknown high-resolution image and the low-resolution observations is not invertible, and thus a unique solution to the inverse problem cannot be computed. Regularization techniques include prior knowledge about the data in order to compute an approximate solution [1], [4], [7], [19], [20].

Accepted to appear in IEEE Transactions on Image Processing, Special Issue on Nonlinear Image Processing. This research was presented in part at the IEEE International Conference on Acoustics, Speech, and Signal Processing, Detroit, MI, May 8-12, 1995. Effort sponsored by Rome Laboratory, Air Force Materiel Command, USAF, under grant number F30602-94-1-0017. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Rome Laboratory or the U.S. Government. Please address all correspondence to: Robert L. Stevenson, Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556; Phone: (219) 631-8308; Fax: (219) 631-4393.
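To make the simplest kernels discussed above concrete, the following minimal NumPy sketch (not from the paper; function names are ours) magnifies an image by an integer factor q with a zero-order hold and with separable bilinear interpolation:

```python
import numpy as np

def zero_order_hold(img, q):
    """Nearest-neighbor magnification: replicate each pixel into a q x q block."""
    return np.repeat(np.repeat(img, q, axis=0), q, axis=1)

def bilinear_upsample(img, q):
    """Bilinear magnification by integer factor q (separable linear interpolation)."""
    m, n = img.shape
    # High-resolution grid coordinates expressed in low-resolution units.
    rows = np.linspace(0, m - 1, m * q)
    cols = np.linspace(0, n - 1, n * q)
    r0 = np.floor(rows).astype(int); c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, m - 1);  c1 = np.minimum(c0 + 1, n - 1)
    fr = (rows - r0)[:, None];       fc = (cols - c0)[None, :]
    # Interpolate horizontally on the two bracketing rows, then vertically.
    top = (1 - fc) * img[np.ix_(r0, c0)] + fc * img[np.ix_(r0, c1)]
    bot = (1 - fc) * img[np.ix_(r1, c0)] + fc * img[np.ix_(r1, c1)]
    return (1 - fr) * top + fr * bot

low = np.array([[0.0, 1.0], [1.0, 0.0]])
print(zero_order_hold(low, 2))   # blocky 4 x 4 result
print(bilinear_upsample(low, 2)) # smoother 4 x 4 result
```

The zero-order hold exhibits exactly the blocky appearance described in the text, while the bilinear result is smooth but blurs the edge between the two gray levels.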


A Tikhonov regularization approach to image interpolation was proposed by Karayiannis and Venetsanopoulos [4], in which a quadratic stabilizing functional added to a fidelity term for the constraints was defined, resulting in a convex, constrained optimization problem. In this case, the solution was constrained to pass directly through the given image pixels. An unconstrained optimization was also derived in this research, to account for additive noise corrupting the data. The resulting unconstrained optimization problem allowed for some noise within the data, since the minimal solution was not required to meet the constraint values exactly. A similar problem formulation was suggested by G. Chen and deFigueiredo [1], although as a constrained optimization problem.

Previously proposed methods, from bilinear and spline interpolation to quadratic functional minimization, result in smooth solutions to the image interpolation problem. Although smoothness in one-dimensional function interpolation is often acceptable, the human visual system is acutely aware of discontinuities within images. To better preserve edges, a cubic "spline-under-tension" kernel has been developed [9], with interpolation kernel weights adjusted adaptively according to edge information in a local neighborhood. Another approach is a stochastic regularization technique using a discontinuity-preserving prior model for the image data [7]. Bayesian estimates computed by this method contained preserved edges, which is an advantage over the regularization methods characterized by quadratic stabilizing functionals.

Although the field of single frame image interpolation is far from mature, the quality of an estimate generated by any method is inherently limited by the amount of data available in the frame. To achieve significant improvements in this area, the next step requires the investigation of multiframe data sets, in which additional data constraints from sequential frames can be used.

Multiframe image restoration was introduced by Tsai and Huang [17]. Their motivation came from generating a high-resolution frame from misregistered pictures obtained by a Landsat satellite. A frequency domain observation model was defined for this problem which considered only globally shifted versions of the same scene. Provided that enough frames are available with different subpixel shifts, the observation mapping becomes invertible and a unique solution may be computed. If this is not the case, a least squares approximation is computed through a pseudoinverse of the constraint mapping matrix. An extension of this algorithm for noisy data was provided by Kim et al. [13], resulting in a weighted least squares algorithm for computing the high-resolution estimate. Another approach to this problem involves mapping several low-resolution images onto a single high-resolution image plane and then interpolating between the non-uniformly spaced samples. Stark and Oskoui [15] formulated a projection onto convex sets (POCS) algorithm to compute an estimate from observations obtained by scanning or rotating an image with respect to the CCD image acquisition sensor array. Tekalp et al. [16] then extended this POCS formulation to include sensor noise. Patti joined with Tekalp to account for time-varying motion blur within video sequences [14], and to accommodate interlaced frames and other video sampling patterns [21]. In the approach most related to this research, Cheeseman et al. [10] applied Bayesian estimation with a Gaussian prior model to the problem of integrating multiple satellite frames observed by the Viking Orbiter. However, all of their examples used a large number of frames which were slightly misregistered, which is rather impractical if a multiframe technique is to be applied to a video sequence containing objects with independent motion trajectories.

Existing multiframe methods suffer from several impractical assumptions. Previous research deals with some type of global displacement or rotation occurring between frames, which requires knowledge of the exact image displacements. If enough frames with the correct subpixel displacements are available, which happens in images acquired with controlled subpixel camera displacement [12] and images captured by stereo cameras, then the image interpolation problem is no longer ill-posed. Generally these frames are not available, since precise control over the data acquisition process is rarely available in real-life electronic imaging applications. This is a poor assumption if the multiframe algorithm is to be applied to an arbitrary video sequence, because the motion occurring between frames is not known exactly. In other words, motion estimates must be computed to determine pixel displacements between frames, and therefore a suboptimal solution must be computed. Furthermore, existing multiframe methods use either least squares techniques or POCS algorithms with quadratic constraint sets to regularize the interpolation problem. These techniques result in smooth estimates of the high-resolution frame containing blurred edges.

In this paper, the problem of enhancing the definition of a single video frame using both the spatial and temporal information available in a video sequence is addressed. The multiframe interpolation problem is placed into a Bayesian framework, featuring a novel observation model for video sequence data. The resulting algorithm incorporates several ideas which enhance both the usability and the quality of the estimated image frame. Independent object motion in the video sequence will be assumed, rather than the simple cases of global displacement or rotation. Block matching methods [22], [23] will be used to estimate the subpixel displacement vectors between frames. Obviously, the quality of these motion estimates will have a direct effect on the quality of the enhancement algorithm; modeling error due primarily to inaccurate motion vector estimates will be represented by a probability density function included in the Bayesian estimation algorithm. Bayesian maximum a posteriori (MAP) estimation will be used to regularize the ill-posed interpolation problem. This stochastic regularization technique requires a density for the data known as a prior image model, representing the underlying distribution of the image pixel values.
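The block matching idea referenced above can be illustrated with a short sketch. This is a hedged, minimal full-search matcher using the mean absolute difference (MAD) criterion, not the authors' hierarchical subpixel estimator; the function names and search parameters are ours:

```python
import numpy as np

def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    return np.mean(np.abs(block_a.astype(float) - block_b.astype(float)))

def block_match(ref, cur, i, j, b=4, search=3):
    """Full-search block matching: find the integer displacement (di, dj)
    minimizing the MAD between the b x b block at (i, j) in `cur` and a
    shifted block in the reference frame `ref`."""
    block = cur[i:i + b, j:j + b]
    best, best_v = np.inf, (0, 0)
    for di in range(-search, search + 1):
        for dj in range(-search, search + 1):
            r, c = i + di, j + dj
            if r < 0 or c < 0 or r + b > ref.shape[0] or c + b > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            err = mad(block, ref[r:r + b, c:c + b])
            if err < best:
                best, best_v = err, (di, dj)
    return best_v

# A bright square displaced by (1, 2) between two frames is recovered exactly.
ref = np.zeros((16, 16)); ref[5:9, 6:10] = 1.0
cur = np.zeros((16, 16)); cur[4:8, 4:8] = 1.0
print(block_match(ref, cur, 4, 4))  # -> (1, 2)
```

The paper's hierarchical scheme refines such integer-valued estimates to subpixel accuracy by repeating the search on Bayesian-interpolated (up-sampled) frames.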

An edge-preserving image prior will be assumed for the data, resulting in an estimate of the high-resolution image containing distinct edges. This has the intent of improving upon least squares solutions and POCS solutions with quadratic constraint sets. A visual comparison of multiframe estimates computed from progressively-scanned and interlaced frames is also shown for an actual video sequence. Applications of this research include Improved Definition Television (IDTV), video hardcopy and display, and preprocessing for image/video analysis, among others.

This paper will be organized as follows. Section II proposes an observation model for several frames from a low-resolution video sequence, including a subsampling model for the center frame of a particular sequence and a motion compensated subsampling model for other frames within the multiframe model. A hierarchical block matching technique is presented to estimate the displacement vector fields required in constructing the motion compensated subsampling matrices, with error in the estimated motion vectors assumed to be Gaussian-distributed. The video frame extraction algorithm is formulated within a Bayesian framework in Section III, including a discontinuity-preserving prior model for the data and a probability density function for the motion error. Visual and quantitative simulation results are described in Section IV for a synthetically-generated sequence containing motion modeled with a camera pan and a video sequence containing independent object motion. Section V provides a brief summary, along with future research issues to be explored.

II. Video Sequence Observation Model

A. Problem Statement

Assume that each frame in a low-resolution video sequence contains N1 × N2 square pixels. Consider a short low-resolution video sequence

  { y^(l) },  l = k − (M−1)/2, ..., k, ..., k + (M−1)/2,  (1)

where M represents an odd number of frames. A single high-resolution frame z^(k), coincident with the center frame y^(k), is to be estimated from the low-resolution sequence. This unknown high-resolution data consists of qN1 × qN2 square pixels, where q is an integer-valued interpolation factor in both the horizontal and vertical directions. The idea is to extract knowledge about the high-resolution center frame z^(k) from the temporally neighboring low-resolution frames y^(l), for l ≠ k. Increasing the number of low-resolution frames used as constraints should improve interpolation results, up to a practical limit. The multiframe algorithm proposed in this paper reduces to the Bayesian interpolation method presented previously [7] when only a single video frame is available.

B. Subsampling Model for Center Frame

A subsampling matrix is defined to map the high-resolution data pixels into a low-resolution frame via spatial averaging. This models the spatial integration of light intensity over a square surface region performed by CCD image acquisition sensors [7]. Subsampling for the center frame is accomplished by averaging a square block of high-resolution pixels,

  y^(k)_{i,j} = (1/q²) Σ_{r=qi−q+1}^{qi} Σ_{s=qj−q+1}^{qj} z^(k)_{r,s},  (2)

for i = 1, ..., N1 and j = 1, ..., N2. Let y^(l)_{i,j} represent the (i,j)th pixel of frame l, or equivalently the (iN2 + j − N2)th element of y^(l). The subsampling model for the center frame can be expressed in matrix notation as

  y^(k) = A^(k,k) z^(k),  (3)

where A^(k,k) ∈ IR^{N1N2 × q²N1N2} will be referred to as the subsampling matrix. Each row of A^(k,k) maps a square block of q × q high-resolution samples into a single low-resolution pixel. If the video sequence is interlaced, this matrix maps only the even- or odd-numbered scan lines from the high-resolution data into the low-resolution frame.

C. Motion Compensated Subsampling Model

The second part of the observation model is defined to account for motion occurring within the sequence. Motion compensated subsampling matrices incorporate object motion between frames into the model. This will be modeled as

  y^(l) = A^(l,k) z^(k) + u^(l,k),  (4)

for l = k − (M−1)/2, ..., k−1 and l = k+1, ..., k + (M−1)/2. In this expression, A^(l,k) is the motion compensated subsampling matrix which models the subsampling of the high-resolution frame and accounts for object motion occurring between frames y^(l) and y^(k). Object motion will also cause pixels to be present in y^(l) which are not in z^(k); the vector u^(l,k) accommodates for these pixels with nonzero elements. For pixels in z^(k) which are not observable in y^(l), A^(l,k) contains a column of zeros. Since u^(l,k) is unknown, it is obviously difficult to utilize the rows of (4) with nonzero elements of u^(l,k). The only rows of A^(l,k) containing useful information are those for which elements of y^(l) are observed entirely from motion compensated elements of z^(k). Write these useful rows as the reduced set of equations

  y′ = A′ z^(k).  (5)

In practice, the motion compensated subsampling matrix must be estimated initially from the low-resolution frames. Therefore, an estimate Â′ must be computed from y^(l) and y^(k), and the relationship between y^(l) and z^(k) for l ≠ k will be defined as

  y′ = Â′ z^(k) + n^(l,k),  (6)

where n^(l,k) is an additive noise term representing the error in estimating Â′. The additive noise is assumed to be independent and identically distributed (i.i.d.) Gaussian, although this may not represent the most accurate error distribution.

To construct the motion compensated subsampling matrix A^(l,k), the exact vectors describing translational motion between frames y^(l) and y^(k) are required. Assume that intensity is constant along object trajectories in a video sequence, and that translational displacements are sufficient to describe the motion over small time periods. Consider the motion of the (i,j)th pixel between frames l and k, where the translational displacement is denoted as v^(l,k)_{i,j} = [v_i v_j]^t. Vertical and horizontal displacements, represented as v_i and v_j respectively, may be fractional values to account for subpixel motion in the low-resolution data. This can be expressed by the relation

  y^(l)_{i,j} = y^(k)_{i−v_i, j−v_j}.  (7)

Since a single displacement vector is used to describe the motion of each low-resolution pixel y^(l)_{i,j}, this vector will be used to describe the motion for all q × q high-resolution elements of z^(k) contained within y^(l)_{i,j}. Provided that the low-resolution pixel y^(l)_{i,j} is observed entirely from pixels in frame y^(k), and assuming that the shift (qv_i, qv_j) is an integer number of pixels,

  y^(l)_{i,j} = y^(k)_{i−v_i, j−v_j}
            = (1/q²) Σ_{r=q(i−v_i)−q+1}^{q(i−v_i)} Σ_{s=q(j−v_j)−q+1}^{q(j−v_j)} z^(k)_{r,s}
            = (1/q²) Σ_{r=qi−q+1}^{qi} Σ_{s=qj−q+1}^{qj} z^(k)_{r−qv_i, s−qv_j}.  (8)

Thus, for row iN2 + j − N2, this corresponds to a row of the center frame subsampling matrix A^(k,k) being shifted to form the row with the same index within A^(l,k), but with the summation over a shifted set of pixels. This can be generalized to a fractional shift by adding a weighting term in the summation to compensate for the fractional part of the pixels in the shifted region.

D. Hierarchical Subpixel Motion Estimation

In order to construct the motion compensated subsampling model, subpixel motion vectors must be computed from the low-resolution video frames. Exact motion vectors are rarely available, so that an estimate of the displacement field v̂^(l,k) must be computed from the low-resolution frames y^(l) and y^(k). The hierarchical block matching algorithm [22], [24] can be modified to quickly and accurately estimate subpixel displacement vectors [25]. The center frame of the video sequence, y^(k), is used as the reference frame in the block matching scheme. An integer-valued displacement v̂^(l,k)_{i,j} is estimated initially for each pixel y^(l)_{i,j} using block matching with the mean absolute difference (MAD) criterion [23]. Each subsequent level of the hierarchy involves an up-sampling of the low-resolution frames through Bayesian MAP interpolation [7], with subpixel refinements computed for each motion vector [26]. This is repeated in order to estimate the up-sampled vectors at the finest level of the hierarchy, at which the up-sampled frames have a resolution of 1/q pixels in the vertical and horizontal directions. Figure 1 depicts the hierarchical subpixel motion estimator for q = 4. The motion estimator output, denoted D̂, is the set of all estimated displacements.

Pixels entering frame y^(l) from behind an object in y^(k) or from the image boundaries are unobservable in the high-resolution frame z^(k). These samples are not useful in the video observation model, and they must be detected so that the corresponding rows of A^(l,k) can be deleted in the construction of the reduced matrix Â′. To detect these pixels, an approach loosely based on the change detection algorithm in [27] will be used. An unobservable pel detection (UPD) algorithm is applied to the data at the finest level of the hierarchy to determine which motion vectors will be useful in the construction of Â′. The displaced frame difference

  DFD^(l,k)_{m,n} = | y″^(l)_{m,n} − y″^(k)_{m−qv̂_i, n−qv̂_j} |  (9)

will serve as the criterion for determining whether the pixel y″^(l)_{m,n} is observable in the up-sampled reference frame y″^(k), where (v̂_i, v̂_j) is the estimated displacement associated with y″^(l)_{m,n}. A large value for DFD^(l,k)_{m,n} detects a pixel in y″^(l) which is not present in y″^(k), or a poor displacement vector estimate. In either case, y″^(l)_{m,n} is detected as an unobservable pixel in z^(k), since the corresponding element of the motion compensated subsampling model will not provide useful information, and the associated row is deleted in the construction of the reduced matrix Â′.
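The subsampling observation model can be sketched numerically: each low-resolution pixel is the average of a q × q block of high-resolution pixels, and a motion compensated observation is obtained by subsampling a shifted high-resolution frame. This is an illustrative NumPy sketch with our own function names; the circular shift (np.roll) is an assumption for simplicity, whereas real motion introduces unobservable pixels at object occlusions and frame boundaries, which the paper's UPD step removes:

```python
import numpy as np

def subsample(z, q):
    """Average each q x q block of the high-resolution frame z."""
    m, n = z.shape
    return z.reshape(m // q, q, n // q, q).mean(axis=(1, 3))

def mc_subsample(z, q, vi, vj):
    """Subsample after shifting z by (q*vi, q*vj) high-resolution pixels,
    assuming integer-valued low-resolution displacements (vi, vj).
    np.roll wraps around at the borders, so border pixels are not
    physically meaningful; they stand in for the unobservable pels."""
    shifted = np.roll(np.roll(z, q * vi, axis=0), q * vj, axis=1)
    return subsample(shifted, q)

q = 4
z = np.arange(32 * 32, dtype=float).reshape(32, 32)   # toy high-resolution frame
y_center = subsample(z, q)           # center frame observation, 8 x 8
y_shift = mc_subsample(z, q, 1, 0)   # neighboring frame under a one-pixel pan
print(y_center.shape, y_shift.shape)  # (8, 8) (8, 8)
```

Away from the wrapped border, each pixel of the shifted observation equals a pixel of the center observation displaced by one low-resolution position, which is exactly the row-shifting structure relating the motion compensated matrix to the center frame subsampling matrix.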

III. Video Frame Extraction Algorithm

A. Bayesian MAP Estimation

The problem of estimating the high-resolution frame z^(k) given the low-resolution sequence y^(l) is ill-posed in the sense of Hadamard [28], since a number of solutions satisfy the video sequence observation model constraints. A well-posed problem will be formulated using the stochastic regularization technique of Bayesian MAP estimation, which distinguishes between possible solutions through the use of a prior image model. The objective is to estimate a high-resolution frame by reconstructing the high-frequency components of the image lost through undersampling.

The MAP estimate is located at the maximum of the posterior probability,

  ẑ^(k) = arg max_{z^(k)} Pr{ z^(k) | y^(k−(M−1)/2), ..., y^(k), ..., y^(k+(M−1)/2) }.  (10)

Applying Bayes' theorem to the conditional probability, or equivalently maximizing the log-likelihood function, results in the optimization problem

  ẑ^(k) = arg max_{z^(k)} { log Pr{ z^(k) } + log Pr{ y^(k−(M−1)/2), ..., y^(k+(M−1)/2) | z^(k) } }.  (11)

Both the prior image model Pr{ z^(k) } and the conditional density Pr{ y^(l) | z^(k) } must be defined.

Commonly, an assumption of global smoothness is made for the data through a Gaussian prior. A quadratic edge penalty, ρ(x) = x², characterizes the Gauss-Markov image model; edges are severely penalized by the quadratic function, making discontinuities within the Gaussian image model unlikely. Effectively, high-frequency components are suppressed by the image model, so that edges are statistically unlikely to appear in the MAP estimate, since smooth edges will be more highly probable than sharp discontinuities. A more reasonable prior assumption is that image data consists of piece-wise smooth regions separated by discontinuities. The Huber-Markov random field (HMRF) model [7] is a Gibbs prior which represents piece-wise smooth data, with the probability density given as

  Pr{ z^(k) } = (1/Z) exp{ −(1/2β) Σ_{c∈C} ρ_α( d_c^t z^(k) ) }.  (12)

In this expression, β is the "temperature" parameter for the Gibbs prior, Z is a normalizing constant known as the partition function, and c is a local group of pixels contained within the set of all image cliques C. The quantity d_c^t z^(k) is a spatial activity measure within the data, with a small value in smooth image locations and a large value at edges. Four spatial activity measures are computed at each pixel in the high-resolution image, given by the following second-order finite differences:

  d^t_{m,n,1} z^(k) = z^(k)_{m,n−1} − 2 z^(k)_{m,n} + z^(k)_{m,n+1},  (13)
  d^t_{m,n,2} z^(k) = 0.5 z^(k)_{m−1,n+1} − z^(k)_{m,n} + 0.5 z^(k)_{m+1,n−1},  (14)
  d^t_{m,n,3} z^(k) = z^(k)_{m−1,n} − 2 z^(k)_{m,n} + z^(k)_{m+1,n},  (15)
  d^t_{m,n,4} z^(k) = 0.5 z^(k)_{m−1,n−1} − z^(k)_{m,n} + 0.5 z^(k)_{m+1,n+1}.  (16)

These quantities approximate second-order directional derivatives computed at z^(k)_{m,n}, with directions selected to account for horizontal, vertical, and diagonal edge orientations.

The likelihood of edges in the data is controlled by the Huber edge penalty function [7],

  ρ_α(x) = x²,              |x| ≤ α,
           2α|x| − α²,      |x| > α,  (17)

where α is a threshold parameter separating the quadratic and linear regions. The threshold parameter controls the size of discontinuities modeled by the prior. For |x| > α, a longer-tailed density than the Gaussian results, in which discontinuities are more probable; the HMRF model improves upon the Gauss-Markov model [20] by providing a less severe edge penalty.

The conditional density models the error in estimating the displacement vectors used in the construction of the motion compensated subsampling matrices A^(l,k), which is incorporated into the estimation problem. The error between frames is assumed to be independent, so that the complete conditional density may be written as

  Pr{ y^(k−(M−1)/2), ..., y^(k+(M−1)/2) | z^(k) } = Π_{l=k−(M−1)/2}^{k+(M−1)/2} Pr{ y^(l) | z^(k) }.  (18)

Since A^(k,k) is known exactly,

  Pr{ y^(k) | z^(k) } = 1,  for y^(k) = A^(k,k) z^(k),
                        0,  otherwise.  (19)

For l ≠ k, the error in estimating Â′ is assumed to be zero-mean Gaussian-distributed, with the probability density given as

  Pr{ y^(l) | z^(k) } ∝ exp{ −(1/2λ^(l,k)) ‖ y′ − Â′ z^(k) ‖² },  (20)

although the error variance λ^(l,k) for each frame is unknown. Since this variance is unknown, it is assumed to be proportional to the frame index difference |l − k|, so that frames further from the center receive less confidence.

B. Gradient Projection Algorithm

To formulate the frame extraction optimization problem, the Huber-Markov prior model and the complete conditional density are incorporated into (11). The objective function

  f( z^(k) ) = Σ_{m,n} Σ_{r=1}^{4} ρ_α( d^t_{m,n,r} z^(k) ) + Σ_{l≠k} (1/2λ^(l,k)) ‖ y′ − Â′ z^(k) ‖²  (21)

is convex, since the convex Huber function is used to penalize edges within the data, and therefore a unique minimum exists. Because the center frame constraints in (19) must be met exactly, the MAP estimate of the high-resolution data given the low-resolution sequence becomes the constrained optimization problem

  ẑ^(k) = arg min_{z^(k) ∈ Z} f( z^(k) ),  (22)

where the set of constraints is defined as

  Z = { z^(k) : y^(k) = A^(k,k) z^(k) }.  (23)

The problem can be expressed more compactly by stacking the reduced observation vectors from the frames l ≠ k into a single vector y′, stacking the estimated motion compensated subsampling matrices accordingly into Â′, and collecting the confidence parameters into the block-diagonal matrix diag( λ^(l,k) I ).

Gradient-based methods may be employed to compute ẑ^(k). Gradient optimization techniques converge to a local minimum of the objective function by following the trajectory defined by the negative gradient. In constrained optimization problems such as (22), all iterates must belong to the constraint set Z. To ensure that the constraints are met, the gradient projection technique [29] has been selected. At each iteration, the gradient is computed as g_i^(k) = ∇f( z_i^(k) ), and the negative gradient is mapped onto the constraint space through a projection operator,

  p_i^(k) = −P g_i^(k),  (24)

where the projection operator, P ∈ IR^{q²N1N2 × q²N1N2}, is given by

  P = I − q² A^(k,k)t A^(k,k).  (25)

A sequence of iterates { z_i^(k) }, i = 0, ..., K, is generated, in which states denoted by increasing iteration numbers more closely approximate the estimate ẑ^(k). Any starting point z_0^(k) which is a member of Z is valid. A zero-order hold, or pel-wise replication, of the low-resolution data y^(k) by a factor of q in both directions satisfies the constraints, and it will be used as the initial condition, z_0^(k) = q² A^(k,k)t y^(k). Each iteration requires a movement in the descent direction p_i^(k) with step size γ_i. By making a second-order Taylor series approximation to the objective function at the current state z_i^(k), a quadratic step size approximation becomes

  γ_i = −( g_i^(k)t p_i^(k) ) / ( p_i^(k)t ∇²f( z_i^(k) ) p_i^(k) ),  (26)

where ∇²f( z_i^(k) ) is the Hessian matrix of the objective function. The image is then updated as

  z_{i+1}^(k) = z_i^(k) + γ_i p_i^(k),  (27)

and convergence is achieved once the relative state change for a single iteration has fallen below a predetermined threshold ε,

  ‖ z_{i+1}^(k) − z_i^(k) ‖ / ‖ z_i^(k) ‖ ≤ ε.  (28)

If this criterion is satisfied, then ẑ^(k) = z_{i+1}^(k). The iterative technique is summarized below.
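The two ingredients above, the Huber edge penalty and the constraint projection, can be sketched as follows. This is an illustrative fragment under our own names, assuming the block-averaging subsampling matrix; the projection z − q²Aᵗ(Az − y), which restores the constraint y = Az, is implemented implicitly by correcting block means rather than by forming A:

```python
import numpy as np

def huber(x, alpha):
    """Huber edge penalty: quadratic for |x| <= alpha, linear beyond."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= alpha, x**2, 2 * alpha * np.abs(x) - alpha**2)

def project(z, y, q):
    """Affine projection of a high-resolution iterate z onto the constraint
    set {z : y = A z}, where A averages q x q blocks (A A^t = I / q^2).
    Equivalent to z - q^2 A^t (A z - y): replace each block mean of z with
    the observed low-resolution pixel from y."""
    m, n = z.shape
    blocks = z.reshape(m // q, q, n // q, q)
    means = blocks.mean(axis=(1, 3), keepdims=True)
    corrected = blocks - means + y[:, None, :, None]
    return corrected.reshape(m, n)

q = 2
y = np.array([[1.0, 2.0], [3.0, 4.0]])
z = np.random.rand(4, 4)                 # arbitrary infeasible iterate
zp = project(z, y, q)
# After projection the block averages of zp match the observations y exactly.
print(np.allclose(zp.reshape(2, 2, 2, 2).mean(axis=(1, 3)), y))  # True
```

The same structure underlies the operator P = I − q²AᵗA: correcting by a per-block constant moves the iterate only along directions that A can observe, leaving the within-block detail, where the Huber-regularized objective does its work, untouched.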

In summary, the gradient projection algorithm proceeds as follows:

1. Set the initial condition z_0^(k) = q² A^(k,k)T y^(k), a zero-order hold of the center frame, and let i = 0.
2. Compute the gradient of the objective function, g_i^(k) = ∇f( z_i^(k) ).
3. Project the negative gradient onto the constraint space, p_i^(k) = −P g_i^(k).
4. Compute the step size α_i as defined in (26).
5. Update the state, z_{i+1}^(k) = z_i^(k) + α_i p_i^(k).
6. If the convergence criterion is satisfied, then ẑ^(k) = z_{i+1}^(k). Otherwise, increment i and return to Step 2.

IV. Simulations

In order to show the resolution improvement achievable with this approach, three low-resolution video sequences were used in the simulations. The first image sequence, Airport, was synthetically-generated from a single digital image to model a camera pan. It was generated by extracting subimages from a digitized image of an airport, shifting each successive frame seven horizontal pixels to the right and seven vertical pixels down. This video sequence simulates a diagonal panning of the scene acquired by a video camera mounted on an airplane flying overhead. The second data set, Mobile Calendar, is a digitized video sequence composed of several objects moving independently. Within the sequence, a wall calendar moves with subpixel translational motion, a toy train engine moves with translational motion, and the train is pushing a ball undergoing rotational motion. A third sequence, Dome, is an actual image sequence acquired using a video camera. An interpretation of the results will be provided for each case.

A. Visual and Quantitative Results for Subsampled Sequences

The original Airport test sequence consists of seven high-resolution frames, with the center frame z^(k) shown in Figure 2a). Each low-resolution frame y^(l) was generated by averaging 4 × 4 pixel blocks within each high-resolution frame z^(l) and then subsampling by a factor of q = 4, with the center low-resolution frame y^(k) shown in Figure 2b). Each of the high-resolution frames was first subsampled and then interpolated back to its original dimensions so that quantitative comparisons could be made using the quadratic signal-to-noise ratio (SNR),

    SNR = 10 log_10 ( ‖ z^(k) − z_0^(k) ‖² / ‖ z^(k) − ẑ^(k) ‖² )  (dB),

where z^(k) is the original high-resolution test frame and z_0^(k) is the zero-order hold of y^(k).

The center low-resolution frame y^(k) was expanded using the single frame techniques of bilinear interpolation [18], cubic B-spline interpolation [3], Bayesian MAP estimation assuming a Gauss-Markov image model with λ = 1 [7], and Bayesian MAP estimation using a Huber-Markov image model with α = 1 [7]. Next, the video frame extraction algorithm was used to estimate the center frame. Displacement vectors were first estimated using the hierarchical subpixel motion estimation algorithm, and a nonlinear estimate was generated with M = 7, α = 1, and λ^(l,k) = 10/|l − k|. Note that the relatively small value for λ^(l,k) = 10/|l − k| shows little confidence in the motion estimates. However, it is known that only panning occurs within the sequence. With this knowledge, greatly improved motion vector estimates can be recovered by averaging over the estimated displacement vector fields, and the parameter value λ^(l,k) = 1000/|l − k| represents high confidence in the estimate of Â_0^(k). Linear and nonlinear estimates were computed with parameter values M = 7 and λ^(l,k) = 1000/|l − k|. Table I provides a quantitative comparison of the estimates by showing the improved signal-to-noise ratio. Evidently, the video frame extraction algorithm with the Huber-Markov image model provides the best estimate.

The original Mobile Calendar test sequence consists of seven frames, which were subsampled by a factor of q = 4 in the same manner as the Airport sequence. The two Bayesian estimates were computed to compare linear and nonlinear estimates. Figures 3 and 4 show the complete unprocessed and estimated frames, while Figure 5 shows details in a region of the wall calendar. Table II shows quantitative results for various Mobile Calendar frame interpolations. Again, visual resolution is improved, although the improvement is not as dramatic as in the Airport sequence.

To minimize the number of computations required in estimating a high-resolution video still, the objective is to include the fewest number of frames in the multiframe model which will provide a significant improvement over single frame interpolation techniques. The number of low-resolution frames which should be included in the video observation model can be determined experimentally by plotting the improved SNR versus M. Figure 6a) shows this information for the Airport sequence for a number of different image and motion model combinations. Evidently, a multiframe model composed of M = 5 frames from an arbitrary image sequence generates a significantly higher resolution frame than any single frame interpolation technique. It appears that several more frames could have been used from the Airport sequence to improve resolution, although gains would likely be incremental at best. Results for the Mobile Calendar sequence are shown in Figure 6b). Using M = 5 frames for computing the high-resolution estimate is sufficient for this sequence.
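The simulation setup described above — 4 × 4 block averaging with subsampling by q = 4, and the quadratic SNR measured relative to the zero-order hold baseline — can be sketched as follows. Function names are illustrative, not from the paper.

```python
import numpy as np

def decimate(z, q=4):
    """Average q x q blocks of a 2-D frame, then subsample by q."""
    H, W = z.shape
    return z.reshape(H // q, q, W // q, q).mean(axis=(1, 3))

def zero_order_hold(y, q=4):
    """Pel-wise replication of each low-resolution pixel into a q x q block."""
    return np.kron(y, np.ones((q, q)))

def snr_db(z, z_hat, z0):
    """Improvement SNR (dB) of estimate z_hat relative to the zero-order hold z0."""
    return 10.0 * np.log10(np.sum((z - z0) ** 2) / np.sum((z - z_hat) ** 2))
```

By this definition the zero-order hold itself scores 0 dB, so positive table entries measure improvement over simple pixel replication.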

B. Visual Results for an Actual Video Sequence

The true indication as to whether a video processing algorithm is effective is to test it on an actual image sequence. Dome is a short video sequence of a landmark on the University of Notre Dame campus, acquired using a video camera. Each frame was progressively-scanned, containing 160 × 120 pixels, and subpixel global motion occurs between each frame. Figure 7a) shows the center frame of the M = 5 frame sequence, interpolated by a factor of q = 4. The video frame extraction from these five frames, interpolated by a factor of q = 4, is shown in Figure 7b). Visual resolution is significantly improved, and substantial improvement over the low-resolution data is achieved with the video frame extraction algorithm. This relatively small number of frames provides a decent trade-off between video frame quality and the computation time necessary to estimate the high-resolution data.

In order to show how the multiframe algorithm may be used for scan conversion, an interlaced version of the Dome sequence was created as well. Only the even- or odd-numbered scan lines from the progressively-scanned frames were used in alternating interlaced frames. A simplistic method employed in the generation of interlaced video hardcopy involves the integration of two fields by placing them together in the same frame. An image produced by this method is shown in Figure 8a). Note the severe motion artifacts between the even and odd fields. The deinterlaced frame generated from two even fields and three odd fields is depicted in Figure 8b). As expected, the frame generated from the progressively-scanned data shown in Figure 7b) has higher resolution than the frame computed from the interlaced sequence shown in Figure 8b), and serves as an "upper-bound" estimate of the video frame extraction algorithm. This is certainly no surprise, since twice as much information is present in the progressively-scanned sequence.

C. Interpretation of Results

Single frame interpolation methods are inherently limited by the number of constraints available within a given image. All rows within A^(k,k) are linearly independent. To provide additional information in the video observation model, additional linearly independent rows must be available from the motion-compensated subsampling matrices Â^(l,k). Since the multiframe technique uses motion compensation between frames, additional linearly independent constraints are available from the adjacent frames within a video sequence, and the estimate computed by the frame extraction algorithm has the potential to be substantially improved over single frame interpolations. If a low-resolution pixel undergoes a subpixel shift, this will provide another useful constraint. Otherwise, the constraints will be redundant and will not provide more information than the pixel found within the center frame.

The video frame extraction algorithm performs extremely well given the image sequence with camera pan motion. A basic assumption for the motion occurring between frames is that intensity is constant along object trajectories. For the synthetic sequence this assumption holds exactly, since the sequence was generated from a single digitized image. This allows for the accurate estimation of displacement vectors within the Airport sequence. In the case of camera panning, object features orthogonal to the direction of motion will be enhanced by the frame extraction algorithm. For instance, if the motion is a horizontal panning of the scene rather than diagonal, vertical structures will be reconstructed with enhanced resolution, and horizontal structures will remain essentially as they were in the low-resolution data. For the diagonal pan of the Airport sequence, the orthogonal axis to the direction of motion can be described by the addition of vertical and horizontal vectors with equal magnitudes, so the diagonal shift of each frame results in the greatest number of linearly independent constraints in the observation model. This is why the Airport estimate in Figure 2d) contains both vertical and horizontal structures with high resolution.

Independent object motion occurs in the Mobile Calendar sequence. Provided that the object motion has subpixel resolution, the adjacent frames contribute useful constraints. For actual digitized sequences such as Mobile Calendar, however, the constant intensity assumption often does not hold due to noise within each frame. In regions such as the wall calendar in Figure 4, resolution improvement is evident since the wall calendar undergoes a subpixel translation. In the region containing the ball with pixels undergoing rotational motion, the motion estimator fails and does not produce an accurate vector field. Incorrect motion estimates can severely degrade the quality of the estimate, affecting the appearance of the overall frame extraction. Furthermore, much of the background is stationary, and in this region the motion-compensated subsampling matrices provide little additional information in the observation model.

V. Conclusion

A novel observation model was proposed for low-resolution video frames, which models the subsampling of the unknown high-resolution data and accounts for independent object motion occurring between frames. A hierarchical subpixel motion estimator was presented to estimate the displacement vectors required in constructing the observation model, and a Bayesian frame extraction algorithm was proposed to estimate a single high-resolution frame given a short low-resolution video sequence. Visual and quantitative simulation results were reported for a synthetically-generated sequence containing a global camera pan and a digitized image sequence containing independent object motion. For the sequence with camera pan motion, the improvement provided by the algorithm was quite remarkable, with more moderate improvements shown for the digitized sequence with independent object motion. Several justifications are proposed for the performance of the algorithm in each case.
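The interlaced-sequence construction described above can be illustrated with a short sketch (helper names are hypothetical): even- or odd-numbered scan lines are kept in alternating frames, and the naive hardcopy method weaves two adjacent fields back into a single frame, which is what produces the motion artifacts noted in the text.

```python
import numpy as np

def to_fields(frames):
    """Keep only even- or odd-numbered scan lines in alternating frames."""
    fields = []
    for k, frame in enumerate(frames):
        parity = k % 2                     # 0 -> even field, 1 -> odd field
        fields.append((parity, frame[parity::2, :]))
    return fields

def merge_two_fields(even_field, odd_field):
    """Naive field integration: weave both fields back into a single frame."""
    H = even_field.shape[0] + odd_field.shape[0]
    frame = np.zeros((H, even_field.shape[1]), dtype=even_field.dtype)
    frame[0::2, :] = even_field
    frame[1::2, :] = odd_field
    return frame
```

If the scene moves between the two fields, the woven frame interleaves two different time instants line by line, exactly the artifact visible in Figure 8a).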

Simulations were also conducted on an actual video sequence to compare the performance of the algorithm on progressively-scanned and interlaced frames.

A number of issues will be explored in future research. The most critical aspect of accurately modeling the video data is the accurate estimation of motion. Regularization techniques can be applied to the ill-posed inverse problem of motion estimation which are robust to spatial noise and sparsity (e.g., interlaced and compressed video frames) and temporal discontinuities within the image sequence [30]. The improved displacement vector estimates should provide a more accurate estimate of Â_0^(k). To further improve the video observation model, a more accurate sensor model is under investigation which incorporates a realistic point spread function (PSF) for the electronic imaging system. Color video sequences contain spectral information in each frame which can further improve the quality of the high-resolution frame estimates; spectral correlations can be included in the Huber-Markov image model through the addition of between-channel cliques [31]. Finally, the scan conversion capabilities of this algorithm will be investigated more fully.

References

[1] G. Chen and R. J. P. deFigueiredo, "A unified approach to optimal image interpolation problems based on linear partial differential equation models," IEEE Trans. Image Processing, vol. 2, no. 1, pp. 41-49, 1993.
[2] T. C. Chen and R. J. P. deFigueiredo, "Two-dimensional interpolation by generalized spline filters based on partial differential equation image models," IEEE Trans. Acoust., Speech, Signal Processing, vol. 33, no. 3, pp. 631-642, 1985.
[3] H. S. Hou and H. C. Andrews, "Cubic splines for image interpolation and digital filtering," IEEE Trans. Acoust., Speech, Signal Processing, vol. 26, no. 6, pp. 508-517, 1978.
[4] N. B. Karayiannis and A. N. Venetsanopoulos, "Image interpolation based on variational principles," Signal Processing, vol. 25, no. 3, pp. 259-288, 1991.
[5] R. G. Keys, "Cubic convolution interpolation for digital image processing," IEEE Trans. Acoust., Speech, Signal Processing, vol. 29, no. 6, pp. 1153-1160, 1981.
[6] J. A. Parker, R. V. Kenyon, and D. E. Troxel, "Comparison of interpolating methods for image resampling," IEEE Trans. Med. Imaging, vol. 2, no. 1, pp. 31-39, 1983.
[7] R. R. Schultz and R. L. Stevenson, "A Bayesian approach to image expansion for improved definition," IEEE Trans. Image Processing, vol. 3, no. 3, pp. 233-242, 1994.
[8] M. Unser, A. Aldroubi, and M. Eden, "Fast B-spline transforms for continuous image representation and interpolation," IEEE Trans. Patt. Anal. Mach. Intell., vol. 13, no. 3, pp. 277-285, 1991.
[9] K. Xue, A. Winans, and E. Walowit, "An edge-restricted spatial interpolation algorithm," J. Elec. Imaging, vol. 1, no. 2, pp. 152-161, 1992.
[10] P. Cheeseman, B. Kanefsky, R. Kraft, J. Stutz, and R. Hanson, "Super-resolved surface reconstruction from multiple images," Tech. Rep. FIA-94-12, NASA Ames Research Center, Moffett Field, CA, December 1994.
[11] M. Irani and S. Peleg, "Improving resolution by image registration," CVGIP: Graphical Models and Image Processing, vol. 53, no. 3, pp. 231-239, 1991.
[12] G. Jacquemod, C. Odet, and R. Goutte, "Image resolution enhancement using subpixel camera displacement," Signal Processing, vol. 26, no. 1, pp. 139-146, 1992.
[13] S. P. Kim, N. K. Bose, and H. M. Valenzuela, "Recursive reconstruction of high resolution image from noisy undersampled multiframes," IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, no. 6, pp. 1013-1027, 1990.
[14] A. J. Patti, M. I. Sezan, and A. M. Tekalp, "High-resolution image reconstruction from a low-resolution image sequence in the presence of time-varying motion blur," in Proc. IEEE Int. Conf. Image Processing, (Austin, TX), pp. I-343 to I-347, 1994.
[15] H. Stark and P. Oskoui, "High-resolution image recovery from image-plane arrays, using convex projections," J. Opt. Soc. Am. A, vol. 6, no. 11, pp. 1715-1726, 1989.
[16] A. J. Patti, M. I. Sezan, and A. M. Tekalp, "High-resolution standards conversion of low resolution video," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, (Detroit, MI), pp. 2197-2200, 1995.
[17] R. Y. Tsai and T. S. Huang, "Multiframe image restoration and registration," in Advances in Computer Vision and Image Processing (T. S. Huang, ed.), vol. 1, pp. 317-339, JAI Press Inc., 1984.
[18] A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[19] D. J. C. MacKay, "Bayesian interpolation," Neural Comp., vol. 4, no. 3, pp. 415-447, 1992.
[20] R. L. Stevenson, B. E. Schmitz, and E. J. Delp, "Discontinuity preserving regularization of inverse visual problems," IEEE Trans. Syst., Man, Cybern., vol. 24, no. 3, pp. 455-469, 1994.
[21] A. M. Tekalp, M. K. Ozkan, and M. I. Sezan, "High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, (San Francisco, CA), vol. III, pp. III-169 to III-172, 1992.
[22] M. Bierling, "Displacement estimation by hierarchical blockmatching," in Proc. SPIE Conf. Visual Commun. Image Processing '88 (T. R. Hsing, ed.), vol. 1001, pp. 942-951, 1988.
[23] H. G. Musmann, P. Pirsch, and H.-J. Grallert, "Advances in picture coding," Proc. IEEE, vol. 73, no. 4, pp. 523-548, 1985.

[24] M. Bierling and R. Thoma, "Motion compensating field interpolation using a hierarchically structured displacement estimator," Signal Processing, vol. 11, no. 4, pp. 387-404, 1986.
[25] G. de Haan and P. Biezen, "Sub-pixel motion estimation with 3-D recursive search block-matching," Signal Processing: Image Commun., vol. 6, no. 3, pp. 229-239, 1994.
[26] M. Orchard, "A comparison of techniques for estimating block motion in image sequence coding," in Proc. SPIE Conf. Visual Commun. Image Processing IV, vol. 1199, pp. 248-258, 1989.
[27] M. Hötter and R. Thoma, "Image segmentation based on object oriented mapping parameter estimation," Signal Processing, vol. 15, no. 3, pp. 315-334, 1988.
[28] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables. Computer Science and Applied Mathematics, New York: Academic Press, 1970.
[29] J. Hadamard, Lectures on the Cauchy Problem in Linear Partial Differential Equations. New Haven, CT: Yale University Press, 1923.
[30] D. Shulman and J. Hervé, "Regularization of discontinuous flow fields," in Proc. Workshop on Visual Motion, (Irvine, CA), pp. 81-86, 1989.
[31] R. R. Schultz and R. L. Stevenson, "Stochastic modeling and estimation of multispectral image data," IEEE Trans. Image Processing, vol. 4, no. 8, August 1995.

Richard R. Schultz was born on March 19 in Grafton, North Dakota. He received the B.S.E.E. degree (summa cum laude) from the University of North Dakota in 1990, and the M.S.E.E. and Ph.D. degrees from the University of Notre Dame in 1992 and 1995, respectively. He joined the faculty of the Department of Electrical Engineering at the University of North Dakota in the Fall of 1995, where he is currently an Assistant Professor. His current research interests include digital image and video processing and the analysis of biomedical imagery. Dr. Schultz is a member of the IEEE, Eta Kappa Nu, Tau Beta Pi, and Phi Kappa Phi.

Robert L. Stevenson was born on December 20, 1963, in Ridley Park, Pennsylvania. He received the B.E.E. degree (summa cum laude) from the University of Delaware in 1986, and the Ph.D. in Electrical Engineering from Purdue University in 1990. While at Purdue he was supported by a National Science Foundation Graduate Fellowship and a graduate fellowship from the DuPont Corporation. He joined the faculty of the Department of Electrical Engineering at the University of Notre Dame in 1990, where he is currently an Assistant Professor. His research interests include multidimensional signal processing, image processing, and computer vision. Dr. Stevenson is a member of the IEEE, Eta Kappa Nu, and Tau Beta Pi.

TABLE I
Comparison of Interpolation Methods on the Airport Sequence

Technique | SNR (dB)
Single Frame Bilinear Interpolation | 0.47
Single Frame Cubic B-Spline Interpolation | 0.97
Single Frame MAP Estimation (Gauss-Markov), λ = 1 | 1.05
Single Frame MAP Estimation (Huber-Markov), α = 1 | 1.25
Video Frame Extraction with Motion Estimates, M = 7, λ^(l,k) = 10/|l−k| | 3.51
Video Frame Extraction with Motion Estimates, M = 7, α = 1, λ^(l,k) = 10/|l−k| | 5.43
Video Frame Extraction with Panning, M = 7, λ^(l,k) = 1000/|l−k| | 6.48
Video Frame Extraction with Panning, M = 7, α = 1, λ^(l,k) = 1000/|l−k| | 7.97

TABLE II
Comparison of Interpolation Methods on the Mobile Calendar Sequence

Technique | SNR (dB)
Single Frame Bilinear Interpolation | 0.24
Single Frame Cubic B-Spline Interpolation | 0.72
Single Frame MAP Estimation (Gauss-Markov), λ = 1 | 0.82
Single Frame MAP Estimation (Huber-Markov), α = 1 | 1.27
Video Frame Extraction with Motion Estimates, M = 7, λ^(l,k) = 10/|l−k| | 1.57
Video Frame Extraction with Motion Estimates, M = 7, α = 1, λ^(l,k) = 10/|l−k| | 2.00

Fig. 1. Hierarchical subpixel motion estimator for q = 4. Triangles within each block represent initial conditions. ZOH-2 corresponds to a zero-order hold of the input field by a factor of two, MAP-n denotes an nth order up-sampler using the Bayesian MAP interpolation algorithm, MAD represents the block matching motion estimator using the mean absolute difference criterion, and UPD corresponds to the unobservable pel detector.
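As a rough illustration of the block-matching stage in Figure 1, the sketch below computes an integer-pel displacement field with the mean absolute difference (MAD) criterion. It is a simplified stand-in for the hierarchical estimator, which additionally MAP-interpolates the fields between levels to reach subpixel accuracy; names and parameters are illustrative.

```python
import numpy as np

def mad_block_match(ref, tgt, block=8, search=4):
    """Return an integer displacement (dy, dx) per block, minimizing the MAD."""
    H, W = ref.shape
    field = np.zeros((H // block, W // block, 2), dtype=int)
    for bi in range(H // block):
        for bj in range(W // block):
            y0, x0 = bi * block, bj * block
            patch = ref[y0:y0 + block, x0:x0 + block]
            best = (np.inf, 0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ys, xs = y0 + dy, x0 + dx
                    if ys < 0 or xs < 0 or ys + block > H or xs + block > W:
                        continue  # candidate block falls outside the frame
                    cand = tgt[ys:ys + block, xs:xs + block]
                    mad = np.mean(np.abs(patch - cand))
                    if mad < best[0]:
                        best = (mad, dy, dx)
            field[bi, bj] = best[1], best[2]
    return field
```

In the hierarchy of Figure 1, an estimate like this at a coarse level initializes the search at the next finer (MAP-interpolated) level, which narrows the search window while refining the vectors toward subpixel precision.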

Fig. 2. Synthetic Airport sequence. (a) Original high-resolution frame z^(k). (b) Low-resolution frame y^(k). (c) Single frame MAP estimate, M = 1, α = 1. (d) Video frame extraction with averaged motion estimates, M = 7, α = 1, λ^(l,k) = 1000/|l−k|.

Fig. 3. Mobile Calendar sequence. (a) Original high-resolution frame z^(k). (b) Low-resolution frame y^(k).

Fig. 4. Mobile Calendar sequence. (a) Single frame MAP estimate, M = 1, α = 1. (b) Video frame extraction with motion estimates, M = 7, α = 1, λ^(l,k) = 10/|l−k|.

Fig. 5. Details of the Mobile Calendar sequence. (a) Original high-resolution frame z^(k). (b) Low-resolution frame y^(k). (c) Single frame MAP estimate, M = 1, α = 1. (d) Video frame extraction with motion estimates, M = 7, α = 1, λ^(l,k) = 10/|l−k|.

Fig. 6. Improved SNR versus number of frames M in the video observation model. (a) Airport sequence. (b) Mobile Calendar sequence. Prior image models include the Huber-Markov random field (HMRF) model and the Gauss-Markov random field (GMRF) model. "Motion Vectors" corresponds to displacement vectors estimated independently by the hierarchical subpixel motion estimator, while "Camera Pan" designates a single motion vector for each frame, obtained by averaging each estimated displacement field.

Fig. 7. Progressively-scanned Dome sequence. (a) Low-resolution frame y^(k) of progressively-scanned sequence. (b) Video frame extraction with averaged motion vectors, M = 5, α = 1, λ^(l,k) = 5/|l−k|.

Fig. 8. Interlaced Dome sequence. (a) Low-resolution frames y^(k) (even field) and y^(k+1) (odd field) of interlaced sequence, combined into a single frame. (b) Video frame extraction with averaged motion vectors, M = 5 (2 even fields, 3 odd fields), α = 1, λ^(l,k) = 5/|l−k|.