
CSE412

SELECTED TOPICS IN
COMPUTER ENGINEERING

VIDEO ENCODING
ITU-R BT.601 Digital Video
Spatial Resolution
• A digital video can be obtained either by sampling a raster
scan, or directly from a digital video camera.
• The International Telecommunication Union - Radiocommunication
Sector (ITU-R) developed the BT.601 standard for digital
video formats to represent different analog TV video
signals for both 4:3 and 16:9 aspect ratios.

ITU-R BT.601 Digital Video
Color Coordinate
• The BT.601 standard also defines a digital color coordinate, known
as YCbCr.
• Assuming the range of RGB values is 0-255, the transformation is:

  [ Y  ]   [  0.257   0.504   0.098 ] [ R ]   [  16 ]
  [ Cb ] = [ -0.148  -0.291   0.439 ] [ G ] + [ 128 ]
  [ Cr ]   [  0.439  -0.368  -0.071 ] [ B ]   [ 128 ]
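As a quick illustration, the forward transform can be applied directly with NumPy. This is a minimal sketch: the function name and the test values are our own choices, and the studio-range BT.601 coefficients are rounded to three decimals as on the slide.

```python
# Sketch of the BT.601 RGB -> YCbCr transform (8-bit studio range).
import numpy as np

M = np.array([[ 0.257,  0.504,  0.098],
              [-0.148, -0.291,  0.439],
              [ 0.439, -0.368, -0.071]])
OFFSET = np.array([16.0, 128.0, 128.0])

def rgb_to_ycbcr(rgb):
    """Map an (..., 3) array of 8-bit RGB values to (Y, Cb, Cr)."""
    return np.asarray(rgb, dtype=float) @ M.T + OFFSET

# White maps to the top of the studio luma range; black to the bottom.
print(np.round(rgb_to_ycbcr([255, 255, 255])))   # [235. 128. 128.]
print(np.round(rgb_to_ycbcr([0, 0, 0])))         # [ 16. 128. 128.]
```

The offset vector places Y in the studio range [16, 235] and centers the chroma components at 128.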

ITU-R BT.601 Digital Video
Color Coordinate
• The inverse transformation is:

  R = 1.164 (Y - 16) + 1.596 (Cr - 128)
  G = 1.164 (Y - 16) - 0.813 (Cr - 128) - 0.392 (Cb - 128)
  B = 1.164 (Y - 16) + 2.017 (Cb - 128)

In the relation above, R = 255R', G = 255G', B = 255B' are the
digital equivalents of the normalized RGB primaries R', G', B'.

ITU-R BT.601 Digital Video
Chrominance Subsampling
• Chrominance is usually subsampled at a lower rate than the
luminance, resulting in different formats.
• In 4:2:2 format, the chrominance is subsampled at half the
sampling rate of luminance. This leads to half the number of Cb &
Cr pixels per line.
• To further reduce the sampling rate, the chrominance is subsampled by
a factor of 4 along each line, resulting in the 4:1:1 format. However,
this format yields asymmetrical resolutions in the horizontal and
vertical directions.
• To solve the above problem, the chrominance components are
subsampled by half along both the horizontal and vertical directions
in the 4:2:0 format.
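The 4:2:0 reduction can be sketched as averaging each 2x2 block of a chroma plane. A minimal illustration; the function name is our own, and averaging is just one common way to subsample.

```python
# 4:2:0-style chroma subsampling: one Cb (or Cr) sample per 2x2 block.
import numpy as np

def subsample_420(chroma):
    """Average non-overlapping 2x2 blocks of an even-sized chroma plane."""
    c = np.asarray(chroma, dtype=float)
    h, w = c.shape
    return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

cb = np.arange(16.0).reshape(4, 4)   # toy 4x4 chroma plane
print(subsample_420(cb).shape)       # (2, 2): a quarter as many samples
```

The luminance plane is left untouched, so each 2x2 group of Y pixels shares one Cb and one Cr sample.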

ITU-R BT.601 Digital Video
4:4:4: for every 2x2 Y pixels, 4 Cb & 4 Cr pixels (no subsampling).
4:2:2: for every 2x2 Y pixels, 2 Cb & 2 Cr pixels (subsampling by 2:1 horizontally only).
[Figure: BT.601 chrominance subsampling formats, Y pixels vs. Cb & Cr pixels]
ITU-R BT.601 Digital Video
4:1:1: for every 4x1 Y pixels, 1 Cb & 1 Cr pixel (subsampling by 4:1 horizontally only).
4:2:0: for every 2x2 Y pixels, 1 Cb & 1 Cr pixel (subsampling by 2:1 both horizontally and vertically).
[Figure: BT.601 chrominance subsampling formats, Y pixels vs. Cb & Cr pixels]


Digital video - Chroma sub-sampling
• 4:4:4: 4 pixels of Y, Cb and Cr each
• 4:2:2: Cb and Cr are halved horizontally; NTSC digital video uses this
subsampling
• 4:1:1: Cb and Cr are reduced by a factor of four horizontally
• 4:2:0: Cb and Cr are subsampled both horizontally and vertically,
giving the same chroma sample count as 4:1:1
– Used in JPEG and MPEG
ITU-R BT.601 Digital Video

Chroma sub-sampling
Frequency Domain Characterization of
Video Signals
• A video is a 3-D signal, having two spatial dimensions (horizontal and
vertical) and one temporal dimension (time).
• The 2D spatial frequency is a measure of how fast the image intensity
or color changes in the 2D image plane.

Compressing Digital Video
• Exploit spatial redundancy within frames (like JPEG:
transforming, quantizing, variable length coding)
• Exploit temporal redundancy between frames
– Only the sun has changed position between these 2 frames

[Figure: previous frame and current frame of the example sequence]



Simplest Temporal Coding - DPCM


• Frame 0 (still image)
• Difference frame 1 = Frame 1 - Frame 0
• Difference frame 2 = Frame 2 - Frame 1
• If no movement in the scene, all difference frames are 0. Can be
greatly compressed!
• If movement, can see it in the difference images
[Figure: frames 0-3 and their difference frames]
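The DPCM scheme above amounts to sending the first frame plus successive differences. A minimal sketch; the function name is our own.

```python
# Sketch of the DPCM idea: send frame 0, then successive differences.
import numpy as np

def difference_frames(frames):
    """Return [frame 0, frame 1 - frame 0, frame 2 - frame 1, ...]."""
    out = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        out.append(cur - prev)
    return out

still = np.full((4, 4), 100)                    # a static scene
diffs = difference_frames([still, still.copy(), still.copy()])
print(all(np.all(d == 0) for d in diffs[1:]))   # True: nothing moved
```

All-zero difference frames compress extremely well, which is the whole point of exploiting temporal redundancy.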
Difference Frames
• Differences between two frames can be
caused by
– Camera motion: the outlines of background or
stationary objects can be seen in the Diff Image
– Object motion: the outlines of moving objects can
be seen in the Diff Image
– Illumination changes (sun rising, headlights, etc.)
– Scene Cuts: Lots of stuff in the Diff Image
– Noise
Difference Frames
• If the only difference between two frames is
noise (nothing moved), then you won’t
recognize anything in the Difference Image
• But, if you can see something in the Diff Image
and recognize it, there’s still correlation in the
difference image
• Goal: remove the correlation by compensating
for the motion
Types of Motion

• Translation: simple movement of a typically rigid object
• Rotation: spinning about an axis
❑ Camera rotation versus object rotation
• Zoom in/out
❑ Camera zoom versus object zoom (movement in/out)
[Figure: frames n, n+1, n+2 illustrating translation, rotation, and zoom]
Describing Motion
• Translational
– Move (object) from (x,y) to (x+dx,y+dy)
• Rotational
– Rotate (object) by (r rads) (counter-clockwise
/ clockwise)
• Zoom
– Move (in/out) from (object) to
increase/decrease its size by (t times)
Motion Estimation
• Determining parameters for the motion
descriptions
• For some portion of the frame, estimate its
movement between 2 frames - the current
frame and the reference frame
General Idea
• For a region PC in the current frame, find a
region PR in the search window in reference
frame so that Error(PR,PC) is minimized

Current
Portion
Reference of
Search Frame
Frame interest
window
PC
Block-based Motion Estimation
• PC is a block of pixels (in the current frame)
• The search window is a rectangular segment (in
the reference frame)

T=1 (reference) T=2 (current)


Motion Vectors
• A motion vector (MV) describes the offset
between the location of the block being coded (in
the current frame) and the location of the best-
match block in the reference frame

T=1 (reference) T=2 (current)


Motion Compensation

The blocks being predicted are on a grid.

[Figure: blocks 1-16 on a 4x4 grid in the current frame, and the displaced positions of their best-match blocks in the reference frame]

Motion Vector Search

Error measures:
1. Mean squared error: select a block in the reference frame to
minimize Σ (b(Bref) - b(Bcurr))²
2. Mean absolute error: select a block in the reference frame to
minimize Σ |b(Bref) - b(Bcurr)|

Given an error measure, how to efficiently determine the best-match
block in the search window?
– Full search: best results, most computation
– Logarithmic search: heuristic, faster
– Hierarchical motion estimation
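A full search with the mean absolute error measure can be sketched as follows. The helper names `mad` and `full_search` are our own; the slides do not fix an API.

```python
# Sketch of full-search block matching with the MAD error measure.
import numpy as np

def mad(a, b):
    return np.mean(np.abs(a - b))

def full_search(ref, cur_block, top, left, R):
    """Best (dy, dx) within +/-R of (top, left), minimizing MAD."""
    n = cur_block.shape[0]
    best, best_mv = np.inf, (0, 0)
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + n <= ref.shape[0] and x + n <= ref.shape[1]:
                err = mad(ref[y:y + n, x:x + n], cur_block)
                if err < best:
                    best, best_mv = err, (dy, dx)
    return best_mv, best

# Toy frame: a block that "moved" from (5, 6) to (4, 4) is recovered.
ref = np.arange(256.0).reshape(16, 16)
mv, err = full_search(ref, ref[5:9, 6:10], 4, 4, 3)
print(mv, err)   # (1, 2) 0.0
```

Because every candidate position is evaluated, the full search is guaranteed to find the global minimum of the error measure within the window.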
Motion Vector Search
• Full search: evaluate every position in the search window.
• Logarithmic search: first examine the positions marked 1; choose the
best of these (lowest error measure) and examine the positions marked 2
surrounding it; choose the best of these and examine the positions
marked 3; the final result is the best of these.
Hierarchical Motion Estimation
• Use an averaging filter on the image, then
downsample by a factor of 2
• Conduct a search on the downsampled image
(only ¼ of the size)
• Given the results of the search on the
downsampled image, return to the full resolution
image and refine the search there
Motion Compensation
• The standards do not specify HOW the encoder will
find the motion vectors (MVs)
• The encoder can use exhaustive/fast search, MSE
/MAE/other error metric, etc.
• The standard DOES specify
– The allowable syntax for specifying the MVs
– What the decoder will do with them
• What the decoder does is grab the indicated block from the
reference frame and glue it in place
Motion Estimation
Full-search Block Matching Algorithm (FSBMA)
• Motion estimation (ME) finds the closest estimate of the data in
the current frame from the data in the previous frame.
• Motion estimation is performed by block matching and
translational motion is assumed in ME.

= Motion
vector

Frame (n–1) Frame n

28
Motion Estimation
[Figure: N×N block X(j,k) in frame n and a candidate N×N block W(j,k) inside the M×M search window in frame (n-1)]

MAD = (1/N²) Σ_{j=1..N} Σ_{k=1..N} |W(j,k) - X(j,k)|
Motion Estimation
• Calculate the MAD for all positions of X(j,k) within the search
window W(j,k). R is the maximum search range.
• Find the minimum MAD to be the best match.
• The motion vector is defined by MV(vx, vy).

If an operation consists of 1 subtraction, 1 comparison and 1
addition, then the number of operations for a search block is
(2R + 1)² N². For an L×L image, the number of operations is
(2R + 1)² L².

[Figure: search window of range ±R around the block, with motion vector components vx and vy]
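Plugging illustrative values into the counts above makes the cost concrete; the concrete values R = 7, N = 16, L = 256 are our own choices, not from the slides.

```python
# Worked instance of the full-search operation counts.
R, N, L = 7, 16, 256
ops_per_block = (2 * R + 1) ** 2 * N ** 2   # 15^2 * 256 candidate ops
ops_per_image = (2 * R + 1) ** 2 * L ** 2   # summed over all (L/N)^2 blocks
print(ops_per_block, ops_per_image)         # 57600 14745600
```

Nearly 15 million operations for a single modest-sized frame is what motivates the fast algorithms that follow.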
Motion Estimation - Fast Algorithms
3-step Search BMA
1. In the first step, 8 coarsely spaced pixels around the center pixel
z0 are tested, as shown in the figure.
2. In the second step, again 8 pixels are used around the pixel of
minimum MAD found in step 1, but the stepsize is halved.
3. The final step is basically a repetition of step 2, but with a
stepsize of 1.
4. The initial stepsize R0 is usually equal to or slightly larger than
half of the maximum search range R.
5. The number of steps required is S = log₂R. The number of search
points would be (8S + 1).
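The three steps can be sketched as follows, with MAD as the error measure. This is a minimal illustration: helper names are ours, and the initial stepsize is taken as roughly R/2 as stated above.

```python
# Sketch of the 3-step search block matching algorithm.
import numpy as np

def mad(a, b):
    return np.mean(np.abs(a - b))

def three_step_search(ref, cur_block, top, left, R=7):
    n = cur_block.shape[0]
    cy, cx = top, left                   # search center, updated each step
    step = max((R + 1) // 2, 1)          # initial stepsize ~ R/2
    while True:
        best, best_pos = np.inf, (cy, cx)
        for dy in (-step, 0, step):      # 8 neighbors + center
            for dx in (-step, 0, step):
                y, x = cy + dy, cx + dx
                if 0 <= y and 0 <= x and y + n <= ref.shape[0] and x + n <= ref.shape[1]:
                    err = mad(ref[y:y + n, x:x + n], cur_block)
                    if err < best:
                        best, best_pos = err, (y, x)
        cy, cx = best_pos
        if step == 1:                    # final step reached
            break
        step //= 2                       # halve the stepsize
    return (cy - top, cx - left)         # motion vector

ref = np.arange(400.0).reshape(20, 20)
print(three_step_search(ref, ref[10:14, 10:14], 6, 6))   # (4, 4)
```

Only 8S + 1 positions are tested instead of (2R + 1)², at the cost of possibly missing the global best match.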
Motion Estimation - Fast Algorithms
[Figure: 3-step search motion estimation, steps 1-3 on a ±7 search grid centered at z0, converging to the MV]
Motion Estimation - Fast Algorithms
2D-logarithmic Search BMA
1. It starts from the center of a diamond search region. Each step
consists of calculating 5 search points, located at the center and
the four corners n pixels (stepsize) away, as illustrated in the
figure.
2. The search in the next step is centered at the best matching
point of the current step.
3. The stepsize is reduced to half of its current value if the best
match is located at the center or at the boundary of the search
range. Otherwise it remains the same.
4. When the stepsize is reduced to 1, we have arrived at the final
step.
Motion Estimation - Fast Algorithms
[Figure: 2D-logarithmic search motion estimation, steps 1-4 on a ±6 search grid centered at z0, converging to the MV]


Motion Estimation - FSBMA

[Figure: frames (n-1) and n with the resulting motion vectors overlaid]
Motion Compensation
• Consider a sequence of images,

S = {f(t), t = 0, 1, 2, ...},

where f(t) is a 2-D intensity distribution at time t.

• The frames are partitioned into blocks of N×N pixels; let
b(s,t) denote the pixel intensities in a block s at time t. The block
difference bd(s,t) is obtained by subtracting the previous block
b(s, t-1) from b(s,t).
• If the block under consideration has moved, the displaced block
difference dbd(s,t), which might reduce the block difference
bd(s,t), is given by

dbd(s,t) = b(s,t) - b(s - D, t - 1)

where D is the motion vector.
Motion Compensation

[Figure: frame-difference image between frames 1 and 6 of Calendar (left), and motion-compensated frame-difference image between the same frames (right)]
Half-pixel Accuracy Search

[Figure: half-pixel accuracy search motion estimation, showing the current block Bm, the matching block B'm, and the motion vector dm at half-pixel resolution]


Half-pixel Interpolation
[Figure: integer pixels I(x,y), I(x+1,y), I(x,y+1), I(x+1,y+1), with half-pixel positions A, B, C, D, E between them]

A = [I(x, y) + I(x+1, y)] / 2,   C = [I(x+1, y) + I(x+1, y+1)] / 2,
B = [I(x, y) + I(x, y+1)] / 2,   D = [I(x, y+1) + I(x+1, y+1)] / 2,
E = [I(x, y) + I(x+1, y) + I(x, y+1) + I(x+1, y+1)] / 4.
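The five interpolation rules translate directly into code; the function name and the toy 2x2 image are our own.

```python
# The half-pixel (bilinear) interpolation rules, written out directly.
def half_pixel(I, x, y):
    """Return (A, B, C, D, E) around the integer pixel (x, y)."""
    A = (I[y][x] + I[y][x + 1]) / 2          # top-edge half-pixel
    B = (I[y][x] + I[y + 1][x]) / 2          # left-edge half-pixel
    C = (I[y][x + 1] + I[y + 1][x + 1]) / 2  # right-edge half-pixel
    D = (I[y + 1][x] + I[y + 1][x + 1]) / 2  # bottom-edge half-pixel
    E = (I[y][x] + I[y][x + 1] + I[y + 1][x] + I[y + 1][x + 1]) / 4  # center
    return A, B, C, D, E

img = [[10, 20],
       [30, 40]]
print(half_pixel(img, 0, 0))   # (15.0, 20.0, 30.0, 35.0, 25.0)
```

Searching over these interpolated positions doubles the effective resolution of the motion vector grid.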
Comparison of ME Algorithms

[Figure: anchor frame, target frame, motion field with half-pixel accuracy, and the predicted image]
Hierarchical Motion Estimation
• The full-search motion estimation algorithm is extremely
computationally intensive.
• To reduce the number of computations, a hierarchical approach can
be employed: first search for the solution at a coarse resolution,
then refine it at a finer resolution within a small neighborhood of
the previous solution.
• The following figure illustrates a 3-level hierarchical block
matching algorithm where the spatial resolution of each level is
halved, both horizontally and vertically, of the level below it in
the pyramid.
• We assume the same block size is used at different levels, so
that the number of blocks is reduced by half as well in each
dimension.
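A two-level version of this coarse-to-fine idea can be sketched as follows: a full search on a 2:1 downsampled pair with halved block size and range, then a ±1 pixel refinement at full resolution. Helper names are ours, and using one coarse level (rather than the 3-level pyramid in the figure) is a simplification.

```python
# Two-level hierarchical block matching sketch.
import numpy as np

def downsample2(img):
    """2x2 averaging filter plus 2:1 downsampling."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def mad(a, b):
    return np.mean(np.abs(a - b))

def search(ref, cur, top, left, n, R):
    """Full search for the n x n block of cur at (top, left), range +/-R."""
    block = cur[top:top + n, left:left + n]
    best, best_mv = np.inf, (0, 0)
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + n <= ref.shape[0] and x + n <= ref.shape[1]:
                err = mad(ref[y:y + n, x:x + n], block)
                if err < best:
                    best, best_mv = err, (dy, dx)
    return best_mv

def hierarchical_search(ref, cur, top, left, n, R):
    cdy, cdx = search(downsample2(ref), downsample2(cur),
                      top // 2, left // 2, n // 2, R // 2)  # coarse level
    dy0, dx0 = 2 * cdy, 2 * cdx          # coarse MV scaled to full resolution
    block = cur[top:top + n, left:left + n]
    best, best_mv = np.inf, (dy0, dx0)
    for ey in (-1, 0, 1):                # refine in a small neighborhood
        for ex in (-1, 0, 1):
            y, x = top + dy0 + ey, left + dx0 + ex
            if 0 <= y and 0 <= x and y + n <= ref.shape[0] and x + n <= ref.shape[1]:
                err = mad(ref[y:y + n, x:x + n], block)
                if err < best:
                    best, best_mv = err, (dy0 + ey, dx0 + ex)
    return best_mv

ref = np.arange(256.0).reshape(16, 16)
cur = np.roll(ref, (2, 2), axis=(0, 1))   # whole frame shifted by (2, 2)
print(hierarchical_search(ref, cur, 8, 8, 4, 4))   # (-2, -2)
```

The coarse search covers the same spatial area with a quarter of the candidates, and the refinement restores full-resolution accuracy.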
Hierarchical Motion Estimation

[Figure: comparison of motion estimation algorithms]

Motion Compensation Example

[Figure: frame n-1, frame n, and the motion compensated frame n]
Motion Compensation
• This glued-together frame is called
the motion compensated frame
• The encoder can also form the
difference between the motion
compensated frame and the actual
frame.
• This is called the motion
compensated difference frame
• This difference frame formed using
MC should have less correlation
between pixels than the difference
frame formed without using MC
Motion Compensated Difference
Frames

Suppose we are doing lossless coding


Encoder has sequence of frames: …, F(n-2), F(n-1)
Next: encode F(n)
Past frames have been losslessly encoded, so the
decoder knows F(n-1) perfectly already
Encoder sends the motion vectors for frame F(n)
relative to frame F(n-1), to form motion
compensated frame M(n)
– Encoder knows M(n), Decoder knows M(n)

Motion Compensation Example

[Figure: frame n-1, frame n, and the motion compensated frame n]
Encoding Difference Frames
With motion compensation, the encoder forms the motion compensated
difference frame:
MCD(n) = F(n) - M(n)
and losslessly encodes MCD(n). The decoder can then do
F(n) = MCD(n) + M(n)
→ knows F(n) exactly.

With no motion compensation, the encoder could do a frame difference:
FD(n) = F(n) - F(n-1)
and losslessly encode FD(n). The decoder can then do
F(n) = FD(n) + F(n-1)
→ knows F(n) exactly.

If successive frames are very similar:
fewer bits to send Motion Vectors + MCD(n) instead of FD(n)
fewer bits to send FD(n) instead of F(n)
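The two lossless round trips can be checked on a toy example; the arrays and the trivial choice M = F(n-1) are our own.

```python
# The decoder recovers F(n) exactly from MCD(n) + M(n), or FD(n) + F(n-1).
import numpy as np

F_prev = np.array([[1, 2], [3, 4]])        # F(n-1), known to the decoder
F_cur = np.array([[1, 2], [5, 4]])         # F(n), to be encoded
M = F_prev                                 # toy motion compensated frame

MCD = F_cur - M                            # sent with the motion vectors
FD = F_cur - F_prev                        # alternative without MC

print(np.array_equal(MCD + M, F_cur))      # True: decoder recovers F(n)
print(np.array_equal(FD + F_prev, F_cur))  # True: likewise without MC
```

Either path is exact; the difference is only in how many bits the encoded residual costs.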
Motion compensated difference frames

• Decoder knows F(n-1) and, once you send the motion vectors, it
knows M(n)

Send FD(n):
[Figure: reference frame F(n-1), original frame F(n), and difference image FD(n) = F(n) - F(n-1)]

Send Motion Vectors + MCD(n):
[Figure: motion compensated frame M(n) and motion compensated difference image MCD(n) = F(n) - M(n)]
Motion estimation philosophy
• Goal of motion estimation is NOT to provide a
careful analysis of the actual motion
• Goal is to achieve a given quality of representation
of the video while globally minimizing the bit rate
required to send
– The motion information
– The prediction error information
• Most of the time, for a given representation quality
– fewer bits to send MV+MCD(n) instead of sending FD(n)
– fewer bits to send FD(n) instead of sending F(n) itself.
Motion Estimation/Compensation
Summary
At the encoder:
– For each block in the frame being coded,
examine the search window(s) in the
reference frame to find the best match block
– Form the MC difference image = original
image minus motion compensated image
Motion Estimation/Compensation
Summary
At the decoder:
– Decode the reference frames
– For each block in a temporally coded frame,
use the motion vector to select a block from
the reference frame and glue it in place
– Add the difference image
Temporal Location of Reference
• The reference frame need not occur before the
temporally coded frames which use it
Flavors of Motion Estimation
1. Forward predicted blocks: the best-match block
occurs in the reference frame before the block’s
frame
2. Backward predicted blocks: the best-match block
occurs in the reference frame after the block’s frame
3. Interpolatively predicted blocks: the best-match
block is the average of the best-match blocks from
reference frames before & after

The motion compensation direction can be selected


independently for each block in a frame.
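The three prediction flavors differ only in which reference block feeds the prediction; interpolative prediction averages the forward and backward best matches. The toy block values are our own.

```python
# Forward, backward, and interpolative prediction of one block.
import numpy as np

past_match = np.full((2, 2), 10.0)     # best match in the earlier reference
future_match = np.full((2, 2), 20.0)   # best match in the later reference

forward = past_match                         # forward prediction
backward = future_match                      # backward prediction
interp = (past_match + future_match) / 2     # interpolative prediction

print(interp[0, 0])   # 15.0
```

Averaging the two matches tends to cancel noise and handle uncovered regions better than either one-sided prediction alone.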
