
# Coding Technologies for Video:

## acquisition, compression and display.

Dunai Fuentes, Maria Jos Giner, Pablo Lanaspa, Zhiheng Xu

INDEX

- 1 Capture and Editing
- 2 Video Compression
- 3 Video Analysis
- 4 Entropy Coding
- 4.1 Homemade Entropy Encoder
- 5 Transform and Quantization
- 6 Controlling the Backlight
- 7 Professional Stuff (all-night Launching)
- Annex 1 FFMPEG
- Annex 2 PSNR
- Annex 3 Entropy
- Annex 4 Q

## 1 CAPTURE AND EDITING

To calculate the mean luminance we used the Matlab function mean twice: first to obtain the mean of every column, and then to obtain the mean of the resulting row.

    Mean_y = mean(mean(y));

What remains is a single value, the mean luminance of the frame. For the frame shown above, Mean_y = 62.6867. This value tells us how bright or dark the image is on a scale from 0 to 255.

Displaying the image in colour with imshow() is easier if we have its RGB representation. The other function we were given, yuv2rgb.m, simply converts the selected frame to RGB.

    RGB = yuv2rgb(y,u,v);
    imshow(RGB);
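How a converter such as yuv2rgb.m might work can be sketched as follows. This is only an illustration, not the function we were given: it assumes 8-bit 4:2:0 data, chroma upsampling by sample repetition and full-range BT.601 coefficients, and the tiny 2x2 frame is made up.

```matlab
% Made-up 2x2 luminance plane with a single chroma sample pair (4:2:0).
y = [16 128; 128 235];
u = 128;                      % neutral chroma, so every pixel is grey
v = 128;

% Mean luminance on the 0..255 scale; same value as mean(mean(y)).
mean_y = mean(y(:));

% Bring chroma to luminance resolution by repeating each sample 2x2.
U = kron(u, ones(2)) - 128;
V = kron(v, ones(2)) - 128;

% Full-range BT.601 conversion (assumed; yuv2rgb.m may use other constants).
R = y + 1.402 * V;
G = y - 0.344136 * U - 0.714136 * V;
B = y + 1.772 * U;

% Stack the channels, scale to 0..1 and clip, ready for imshow(RGB).
RGB = min(max(cat(3, R, G, B) / 255, 0), 1);
```

With neutral chroma the three channels are identical, which is a quick sanity check for any converter of this kind.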

## 2 VIDEO COMPRESSION

First we describe the processing of digital video from capture to display, and then we explain and illustrate H.264 coding.

Compression takes place in the camera while the video is being recorded, but further processing can be executed on a computer. It is essential to understand how the video changes when we compress it, so that we can minimize the losses incurred in the process. During video compression, the stream is analyzed and unnecessary parts of the data are discarded in order to make a large video file smaller. There are essentially two ways to compress the data in a video file: intraframe and interframe.

Intraframe compression compresses each frame of the video individually, as if it were a still image (similar to JPEG compression); frames coded this way are called I-frames. Because every frame is stored in full in the compressed version, the file size is reduced only moderately. Interframe compression compares each frame with the previous one and stores only the data that changed from frame to frame, so the resulting file is much smaller than with intraframe compression alone.

The diagram below summarizes the changes that every frame undergoes during video compression:

The original image is divided into a set of square blocks, usually 8x8 pixels. The image data are transformed with the DCT into a new set of coefficients. The transform coefficients are then quantized using a simple divide-round-multiply operation.

Many of the coefficients are zeroed by this operation, which makes the image well suited to efficient lossless compression prior to storage or transmission. To reconstruct the image, the quantized coefficients are passed through the inverse DCT, creating a new image that approximates the original. The error is the difference between the original and the reconstruction, and it consists mainly of high-frequency texture. (Reference: https://www.stanford.edu/group/vista/cgi-bin/FOV/chapter-8multiresolution-image-representations/)
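The chain described above (block, DCT, quantization, inverse DCT, error) can be sketched on a single block. This is a minimal sketch under stated assumptions: the 8x8 block is made up, the step size 40 is borrowed from the experiments later in this report, and the DCT matrix is built by hand so that no toolbox is needed.

```matlab
% Made-up 8x8 luminance block: a smooth ramp plus a fine checkerboard.
N = 8;
[c, r] = meshgrid(0:N-1, 0:N-1);
block = 100 + 5*r + 3*c + 2*(-1).^(r + c);

% Orthonormal DCT-II matrix, so that D*block*D' is the 2-D DCT of the block.
D = sqrt(2/N) * cos(pi * (0:N-1)' * (2*(0:N-1) + 1) / (2*N));
D(1, :) = sqrt(1/N);

Q = 40;                              % quantization step
coeff = D * block * D';              % forward 2-D DCT
levels = round(coeff / Q);           % divide and round: this is what is stored
recon = D' * (levels * Q) * D;       % multiply back, then inverse DCT
err = block - recon;                 % reconstruction error

fprintf('non-zero levels: %d of %d\n', nnz(levels), N*N);
fprintf('largest absolute error: %.2f\n', max(abs(err(:))));
```

Since each coefficient is off by at most Q/2 and the transform is orthonormal, the total error energy is bounded by the step size, which is why a larger Q gives a visibly worse picture.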

Seeing the effect of these parameters on your own video is always instructive, so we created some clips in which the differences are visible (the commands are listed in Annex 1).

We adjusted the encoder settings in several ways, which produced different outputs:

- Streams with constant bitrate:
  - High bitrate: low compression
  - Low bitrate: high compression
- Streams with variable bitrate: to produce a variable-bitrate stream with ffmpeg you need to fix the quantization factor q. With q = 5 we created a lightly compressed video of good quality. We then changed the value to 40 to create a highly compressed video, and the drop in quality was easy to notice.

In conclusion, we found that the lower the bitrate we chose, the lower the quality of the video. After trying several bitrate levels, we decided that for Full HD video (1920x1080) the minimum bitrate that produced acceptable image quality was 0.6 Mb/s.
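To put the 0.6 Mb/s figure in perspective, the raw bitrate of the same video can be worked out. The sketch below assumes 8-bit 4:2:0 sampling (12 bits per pixel on average) and 25 frames per second, the latter taken from the 250 frames in 10 seconds used in Annex 1.

```matlab
width = 1920;
height = 1080;
fps = 250 / 10;              % 250 frames in 10 seconds
bits_per_pixel = 8 * 1.5;    % 8-bit luma plus two quarter-size chroma planes

raw_bitrate = width * height * bits_per_pixel * fps;   % bits per second
target_bitrate = 0.6e6;                                % 0.6 Mb/s
ratio = raw_bitrate / target_bitrate;

fprintf('raw: %.2f Mb/s, compression ratio: %.1f to 1\n', raw_bitrate/1e6, ratio);
```

Under these assumptions the encoder has to shrink the stream by roughly three orders of magnitude to reach the lowest bitrate we considered acceptable.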

## 3 VIDEO ANALYSIS

For this part of the work we had to change tools. We used Elecard StreamEye to open and analyse coded H.264 streams. The program shows information that is useful for analysing the video, such as the GOP structure, the picture size and the motion vectors (MV).

- The frames that normally take up the most space are the intra-coded ones (I-frames), and the frames that normally take up the least space are the bi-directionally predicted ones (B-frames).
- The motion vectors describe how the content moves between frames. For each block of the current frame, the encoder searches the surrounding area of a reference frame for the block that matches it best. In this way the next frame can be predicted from data that has already been sent, and only the prediction error has to be coded.
- 16x16 prediction is normally used in parts of the frame where little change is expected or where the pixels are very similar, whereas 4x4 prediction is usually used for blocks on edges or in areas with a lot of detail, in order to preserve as much quality as possible.

Later we wrote a Matlab script to calculate the average PSNR of the luminance component of our videos (Annex 2). On the one hand, we observed that PSNR is not always a good measure of quality: the PSNR value hardly changed between the low- and high-compression constant-bitrate videos, whereas the difference in quality was obvious. On the other hand, PSNR was a good quality measure when comparing the variable-bitrate compressions (see the following table). This is probably caused by the optimization algorithms of H.264: when given a low constant bitrate, the encoder tries to achieve the best possible compression, which keeps the PSNR as high as it can. However, when a high q is imposed, it cannot do much to compensate.

| Encoding | q | Average PSNR (dB) |
|---|---|---|
| Constant high compression (0.6 Mb/s) | | 28.1844 |
| Automatically set by FFMPEG | | 27.5440 |
| Variable high compression | 40 | 42.3248 |
| Variable low compression | 5 | 61.0033 |
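The block search behind the motion vectors discussed in this section can be sketched as an exhaustive search that minimises the sum of absolute differences (SAD). This is a simplified illustration, not what H.264 encoders do in detail: the frames are synthetic, and the block size and search window are arbitrary choices.

```matlab
% Reference frame with distinct values, so that the best match is unique.
ref = magic(32);
% Current frame: the same content moved 2 rows down and 3 columns left.
cur = circshift(ref, [2 -3]);

B = 8;                 % block size
r0 = 13; c0 = 13;      % top-left corner of the block in the current frame
win = 4;               % search window of +/- 4 pixels

block = cur(r0:r0+B-1, c0:c0+B-1);
best = Inf;
mv = [0 0];
for dy = -win:win
    for dx = -win:win
        cand = ref(r0+dy:r0+dy+B-1, c0+dx:c0+dx+B-1);
        sad = sum(abs(block(:) - cand(:)));    % sum of absolute differences
        if sad < best
            best = sad;
            mv = [dy dx];      % offset of the best match in the reference
        end
    end
end
fprintf('motion vector (dy, dx) = (%d, %d), SAD = %d\n', mv(1), mv(2), best);
```

The vector found points from the block back to where its content sits in the reference frame, so it is the opposite of the shift applied to create the current frame.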

## 4 ENTROPY CODING

Exercise 1:

Exercise 2:

After some rest our minds were clearer, and we soon had a working (just working) script that goes from the 1080x1920 matrix, 1 byte per pixel, to a long string of binary states (0, 1) that encodes the information stored in the matrix using Huffman coding.

    function [SYM,PROB,sig_a,DICT,sig_encoded] = encoder(filename,width,height,framenumber)
    % Huffman-encode the luminance plane of one frame.
    [y,u,v] = extractyuv420(filename,width,height,framenumber);
    COUNT = zeros(256,1);              % histogram of the grey levels 0..255
    sig_a = zeros(1,width*height);     % luminance as one row vector, row by row
    for i = 1:1:height
        for j = 1:1:width
            number = double(y(i,j));
            sig_a((i-1)*width+j) = number;
            COUNT(number+1) = COUNT(number+1)+1;   % grey level n is stored at index n+1
        end
    end
    PROB = COUNT/(width*height);
    SYM = 0:1:255;
    DICT = huffmandict(SYM,PROB);
    sig_encoded = huffmanenco(sig_a,DICT);

For frame 50 of our video (framenumber=50), sig_encoded had a total of 13428574 bits, far fewer than the 1920*1080*8 = 16588800 bits needed to represent the luminance of the same frame without coding. Recovering the original data from the coded stream is as simple as typing

    DECO = huffmandeco(sig_encoded, DICT);

where DECO contains the original luminance values again (because the coding is lossless). To calculate the entropy and the average number of bits per symbol we implemented two more scripts (entropy.m and average.m) that work with the information we already had.

The results for this frame were: Entropy = 6.14083; Average = 6.4083. With an efficiency of 95.97% we can say that the compression is really good. But we had been overlooking something all along: we used a tailor-made dictionary that fitted the frame we were working on perfectly. This approach makes no sense for a versatile encoder. A dictionary built from the probabilities of each representable luminance (or colour) value, gathered from experience with different kinds of video material, won't perform as well as ours did on our working frame, but it will give better results over a wider range of videos. Moreover, since it can be defined in the codec itself, there is no need to send it to the decoder. H.264 uses one of a few tables (dictionaries) depending on the properties of the video. That way it keeps its flexibility while improving its accuracy. Benchmarking against H.264 was not even among our expectations after seeing that the Huffman encoding of a single frame took almost 20 minutes.
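The relation between entropy, average code length and efficiency can be checked on a toy source. The sketch below is not our entropy.m or average.m: the four probabilities are made up, and the Huffman code lengths are obtained with a plain merge loop instead of huffmandict so that it runs without any toolbox.

```matlab
% Toy source: four symbols with made-up probabilities.
p = [0.5 0.25 0.125 0.125];

% Entropy in bits per symbol, skipping zero-probability symbols.
nz = p(p > 0);
H = -sum(nz .* log2(nz));

% Huffman code lengths: repeatedly merge the two least probable groups;
% every symbol inside a merged group gets one more bit.
len = zeros(size(p));
groups = num2cell(find(p > 0));   % each group lists the symbols it contains
w = p(p > 0);                     % total probability of each group
while numel(w) > 1
    [~, order] = sort(w);
    a = order(1);
    b = order(2);
    merged = [groups{a}, groups{b}];
    len(merged) = len(merged) + 1;
    keep = true(size(w));
    keep([a b]) = false;
    groups = [groups(keep), {merged}];
    w = [w(keep), w(a) + w(b)];
end

L = sum(p .* len);        % average code length in bits per symbol
efficiency = H / L;       % 1 means the code reaches the entropy
fprintf('H = %.4f, L = %.4f, efficiency = %.2f%%\n', H, L, 100*efficiency);
```

For these particular probabilities (all powers of 1/2) the Huffman code reaches the entropy exactly; for a real luminance histogram the average length is slightly above it, as in our results.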

## 5 TRANSFORM AND QUANTIZATION

To get started with this part, we coded the frame with a constant Q for every DCT coefficient:

    y_0 = dct2(y);
    y_q40 = round(y_0/40);
    y_q5 = round(y_0/5);
    y_display40 = idct2(y_q40*40);
    imshow(y_display40,[0,255]);
    figure;
    y_display5 = idct2(y_q5*5);
    imshow(y_display5,[0,255]);

where y is the luminance of the second frame of video3.yuv. The results were as expected:

[Figure: High quantization level]

[Figure: Low quantization level]

We can improve the algorithm with a scaled quantization: the high frequencies (responsible for the fine details, which the human eye does not easily perceive) are eliminated or coarsely quantized, while the low frequencies are quantized finely so that nothing is lost there. The pattern we followed is shown below:

Scripts used: quantization.m and reverse_quantization.m (see Annex 4 Q). The resulting image:
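The idea of a frequency-dependent step size can be sketched on one 8x8 block. This is a simplified illustration and not the pattern of our scripts: the block is made up, the step sizes 5, 10 and 30 are borrowed from Annex 4, and the frequency thresholds are arbitrary assumptions.

```matlab
N = 8;
% Orthonormal DCT-II matrix built by hand (no toolbox needed).
D = sqrt(2/N) * cos(pi * (0:N-1)' * (2*(0:N-1) + 1) / (2*N));
D(1, :) = sqrt(1/N);

[c, r] = meshgrid(0:N-1, 0:N-1);
block = 100 + 5*r + 3*c + 2*(-1).^(r + c);   % made-up luminance block

% Step size grows with frequency; Inf means the coefficient is dropped.
f = r + c;                    % crude frequency index of each coefficient
Qmap = 5 * ones(N);           % fine steps for the lowest frequencies
Qmap(f >= 3) = 10;
Qmap(f >= 6) = 30;
Qmap(f >= 10) = Inf;

coeff = D * block * D';
levels = round(coeff ./ Qmap);            % x/Inf is 0, so dropped terms vanish

% Dequantize only the kept coefficients (0*Inf would give NaN).
kept = isfinite(Qmap);
deq = zeros(N);
deq(kept) = levels(kept) .* Qmap(kept);
recon = D' * deq * D;

fprintf('largest absolute error: %.2f\n', max(abs(recon(:) - block(:))));
```

Because the lowest frequencies keep the small step, the mean brightness of the block is preserved closely while the fine texture is what gets sacrificed.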

## 6 CONTROLLING THE BACKLIGHT

For all the tests in this part we again use the second frame of the video:

    [y,u,v] = extractyuv420('video3.yuv',1920,1080,2);
    RGB = yuv2rgb(y,u,v);
    imshow(RGB);

[Figure: Original]

When displaying an image on a screen we have to set a backlight value. By decreasing this value we can save energy and get better blacks (because a strongly backlit black looks grey). On the other hand, if we decrease the backlight too much we get a darker image, which is not what we want. Our goal is to reduce the backlight, especially in dark areas where doing so also improves image quality, while keeping the displayed image the same as the original. We will begin by calculating a single backlight value for the whole picture.

This can be done in four ways:

1 Maximum LED values: In RGB colour space three values determine the colour of each pixel. These values are scaled from 0 to 1 but are rarely pushed to their limits. In our case the largest value in the second frame is in the blue channel, at 0.5847.

    [y,u,v] = extractyuv420('video3.yuv',1920,1080,2);
    RGB = yuv2rgb(y,u,v);
    red = RGB(:,:,1);
    green = RGB(:,:,2);
    blue = RGB(:,:,3);
    R = max(max(red));
    G = max(max(green));
    B = max(max(blue));

We can take advantage of this limit: we raise the pixel values and lower the backlight by the same factor, and the final result stays the same.

Summing up, the new image for a backlight of 58.47% would be:

    RGB2 = RGB/0.5847;
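Why the picture does not change with the maximum-based method can be verified numerically. The sketch assumes a simple linear display model in which the light shown is the pixel value times the backlight (gamma and light leakage are ignored), and the 2x2 image is made up apart from the 0.5847 maximum in the blue channel.

```matlab
% Made-up 2x2 RGB image whose largest value, 0.5847, is in the blue channel.
RGB = cat(3, [0.10 0.30; 0.20 0.40], ...
             [0.05 0.25; 0.15 0.35], ...
             [0.20 0.50; 0.30 0.5847]);

backlight = max(RGB(:));       % single backlight value for the whole frame
RGB2 = RGB / backlight;        % pixel values stretched up to the full range
shown = RGB2 * backlight;      % light reaching the viewer (linear model)

fprintf('backlight %.2f%%, largest difference %g\n', ...
        100*backlight, max(abs(shown(:) - RGB(:))));
```

No value exceeds 1 after the division, so nothing is clipped and the displayed image matches the original up to rounding, while the backlight runs at about 58% instead of 100%.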

2 Maximum luminance of the image: For this we introduce a colour space better suited to the task, YCbCr. It is quite similar to YUV, with one luminance (Y) and two chrominance components, but they are all scaled from 0 to 1 instead of 0 to 255. Again we can push the maximum value to 1 and divide every other value by the same factor, so that the backlight becomes that very factor, which is less than 100%.

    YCBCRMAP = rgb2ycbcr(RGB);
    Y = YCBCRMAP(:,:,1);                  % luminance plane
    Ymax = max(max(Y));
    BackLight2 = Ymax;                    % backlight equals the scaling factor
    Ynew = Y/Ymax;
    YCBCRMAP2 = Ynew;
    YCBCRMAP2(:,:,2) = YCBCRMAP(:,:,2);   % chrominance is left untouched
    YCBCRMAP2(:,:,3) = YCBCRMAP(:,:,3);
    RGB3 = ycbcr2rgb(YCBCRMAP2);
    outframe2 = saveSIM2frame1Value(255*RGB3, BackLight2, 'testing2');

The resulting value for the backlight is 0.8273.

3 Average luminance of the image: Now we scale the average luminance to 1. This makes some values (every one above the average) greater than 1, which is not possible because the LCD can't generate more light. We clip every value above 1 to 1 and lose some information.

    Ymean = Y/mean(mean(Y));
    Ymean(Ymean>1) = 1;                   % clip what the LCD cannot show
    BackLight3 = mean(mean(Y));
    YCBCRMAP3 = Ymean;
    YCBCRMAP3(:,:,2) = YCBCRMAP(:,:,2);
    YCBCRMAP3(:,:,3) = YCBCRMAP(:,:,3);
    RGB4 = ycbcr2rgb(YCBCRMAP3);
    outframe3 = saveSIM2frame1Value(255*RGB4, BackLight3, 'testing3');

BackLight3 = 0.3759

4 Square root of the average luminance: Repeat the steps in 3 but with the square root:

    Yroot = Y/sqrt(mean(mean(Y)));
    Yroot(Yroot>1) = 1;
    BackLight4 = sqrt(mean(mean(Y)));
    YCBCRMAP4 = Yroot;
    YCBCRMAP4(:,:,2) = YCBCRMAP(:,:,2);
    YCBCRMAP4(:,:,3) = YCBCRMAP(:,:,3);
    RGB5 = ycbcr2rgb(YCBCRMAP4);
    outframe4 = saveSIM2frame1Value(255*RGB5, BackLight4, 'testing4');

BackLight4 = 0.6131

The square-root method gives better quality because fewer values exceed 1 after the division.
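The effect of the two divisors on clipping can be seen on a handful of values. This is a minimal sketch with made-up luminance values in the range 0 to 1 and the same linear display model (pixel value times backlight) used throughout this section.

```matlab
Y = [0.1 0.2 0.3 0.5 0.9];          % made-up luminance values in 0..1
b_avg = mean(Y);                     % backlight from the average
b_sqrt = sqrt(mean(Y));              % backlight from its square root

% Pixels that would need a value above 1 and are therefore clipped.
clipped_avg = nnz(Y / b_avg > 1);
clipped_sqrt = nnz(Y / b_sqrt > 1);

% Light actually shown: clipped pixel value times backlight.
shown_avg = min(Y / b_avg, 1) * b_avg;
shown_sqrt = min(Y / b_sqrt, 1) * b_sqrt;
err_avg = max(abs(Y - shown_avg));
err_sqrt = max(abs(Y - shown_sqrt));

fprintf('average: %d clipped, max error %.4f\n', clipped_avg, err_avg);
fprintf('sqrt   : %d clipped, max error %.4f\n', clipped_sqrt, err_sqrt);
```

Since the mean of values between 0 and 1 is never larger than its square root, the square-root backlight is always the brighter of the two, which means less clipping at the cost of a smaller energy saving.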

The backlights for the last two methods, the average and the square root respectively:


## Combining the pictures with their backlights:


To actually see the differences between the last two images and the original, we can subtract them from the original, square the difference and display the result as a greyscale picture.

    MSE1 = (RGB - RGB4*BackLight3).^2;
    Grey1 = max(MSE1,[],3);        % largest squared error over the three channels
    max(max(Grey1))                % 0.0482
    imshow(Grey1, [0, 0.0482]);

    MSE2 = (RGB - RGB5*BackLight4).^2;
    Grey2 = max(MSE2,[],3);
    max(max(Grey2))                % 0.0817
    imshow(Grey2, [0, 0.0817]);

The higher maximum of Grey2 (0.0817, square root of the average) compared with Grey1 (0.0482, average) indicates that the largest single difference from the original is greater for the square-root method. Even so, we can clearly see that the second image is darker overall than the first, which means that the differences from the original are lower across most of the frame (zero difference is shown as black). Note that each image is scaled to its own maximum when displayed.

## 7 PROFESSIONAL STUFF (ALL-NIGHT LAUNCHING)

Once we had learnt the whole video processing chain (acquisition, compression and backlight dimming), we had to prepare our videos for display. We used some functions written at DTU to compare the outputs of different algorithms. We could choose between two different structures for the modelled backlight and several algorithms: full backlight, maximum luminance value, average luminance value, square root of the average luminance value, and our homemade algorithm.

The DTU algorithms in Matlab only work with uncompressed avi files. Thus we had to convert our videos, and we created two different versions. The first was simply an uncompressed avi version of the original. The second was a little trickier: we compressed the original at a constant bitrate of 0.6 Mb/s and then converted the result to uncompressed avi so that we could work with it in Matlab. In this way we had two videos in the same format, the first with good quality and the second with worse quality.

Some post-processing calculations were required:

- Avg, 8 rows x 2 columns backlight, previously highly compressed video: PSNR = 9.626
- Avg, 2202 LEDs, previously highly compressed video: PSNR = 10.2525
- Bbgd, 8 rows x 2 columns backlight, previously highly compressed video: PSNR = 8.522

With this very low-quality video we can notice some improvement when the backlight dimming is more precise (2202 LEDs instead of 8 rows by 2 columns).
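How a segmented backlight can be derived from a frame is sketched below for an 8 rows by 2 columns structure. This is not the DTU code: the luminance plane is made up, each zone simply takes the maximum luminance it contains, and light spreading between neighbouring zones is ignored.

```matlab
% Made-up 16x4 luminance plane (values 0..1), darker at the top.
Y = repmat(linspace(0.1, 0.9, 16)', 1, 4);

zr = 8;                      % backlight rows
zc = 2;                      % backlight columns
[h, w] = size(Y);
bh = h / zr;                 % zone height in pixels
bw = w / zc;                 % zone width in pixels

BL = zeros(zr, zc);
for r = 1:zr
    for c = 1:zc
        zone = Y((r-1)*bh+1:r*bh, (c-1)*bw+1:c*bw);
        BL(r, c) = max(zone(:));       % per-zone maximum luminance
    end
end

BLfull = kron(BL, ones(bh, bw));       % backlight seen by each pixel
LC = min(Y ./ BLfull, 1);              % compensated pixel values
shown = LC .* BLfull;                  % light reaching the viewer

fprintf('mean backlight %.3f vs global maximum %.3f\n', mean(BL(:)), max(Y(:)));
```

With the per-zone maximum nothing is clipped, yet the average backlight level is well below the single global maximum, which is where the finer structures gain their advantage.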


## ANNEX 1 FFMPEG

For the sake of simplicity we created several videos from the third original video (the one with the most contrast), each with different format and compression settings:

1 Original, reduced to 10 seconds, 250 frames, Full HD, uncompressed .avi

    ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec rawvideo -s 1920x1080 uncompressed.avi

2 Original, reduced to 10 seconds, 250 frames, Full HD, (uncompressed) .yuv

    ffmpeg -i uncompressed.avi video3.yuv

3 Original, reduced to 10 seconds, 250 frames, Full HD, h.264 MPEG-4 AVC codec, low bitrate .mp4

    ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -b 0.6M lowbitrate.mp4

4 Original, reduced to 10 seconds, 250 frames, Full HD, h.264 MPEG-4 AVC codec, high bitrate .mp4

    ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -b 5M highbitrate.mp4

5 Original, reduced to 10 seconds, 250 frames, Full HD, h.264 MPEG-4 AVC codec, variable bitrate, low q .mp4

    ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -qmax 5 lowq.mp4

6 Original, reduced to 10 seconds, 250 frames, Full HD, h.264 MPEG-4 AVC codec, variable bitrate, high q .mp4

    ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -qmin 40 highq.mp4

7 Original, reduced to 10 seconds, 250 frames, Full HD, from previously highly compressed video, uncompressed .avi

    ffmpeg -i lowbitrate.mp4 -vcodec rawvideo uncompressedfromcompressed.avi

NOTE: many more videos were created but those are the ones remaining and used in this report.


## ANNEX 2 PSNR
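A minimal sketch of an average-PSNR calculation over the luminance of several frames is given here. It is an illustration under stated assumptions, not the script used for the results in section 3: the frames are made up, the peak value is 255 (8-bit luminance), and the average is taken over the per-frame PSNR values.

```matlab
% Two made-up 2x2 luminance frames: reference and degraded versions.
ref = cat(3, [50 60; 70 80], [90 100; 110 120]);
deg = ref + cat(3, [1 -1; 2 -2], [3 -3; 1 -1]);

peak = 255;                        % largest 8-bit luminance value
n = size(ref, 3);                  % number of frames
psnr_frame = zeros(1, n);
for k = 1:n
    d = double(ref(:,:,k)) - double(deg(:,:,k));
    mse = mean(d(:).^2);                         % mean squared error
    psnr_frame(k) = 10 * log10(peak^2 / mse);    % PSNR in dB
end
avg_psnr = mean(psnr_frame);

fprintf('average PSNR: %.2f dB\n', avg_psnr);
```

Identical frames give a mean squared error of zero and therefore an infinite PSNR, so a real script has to decide how to treat that case before averaging.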

## ANNEX 3 ENTROPY

average.m:

    function [average] = average(filename,width,height,framenumber,DICT,PROB)
    average = 0;
    n_boits = zeros(256,1);

entropy.m:

    function [SYM,PROB,DICT,result] = entropy(filename,width,height,framenumber)
    [y,u,v] = extractyuv420(filename,width,height,framenumber);
    COUNT = zeros(256,1);
    for i = 1:1:1080
        for j = 1:1:1920
            number = double(y(i,j));
            COUNT(number+1) = COUNT(number+1)+1;   % grey level n is stored at index n+1
        end
    end
    PROB = COUNT/(1080*1920);
    SYM = 0:1:255;
    DICT = huffmandict(SYM,PROB);
    result = 0;
    for i = 1:1:256
        if (PROB(i) ~= 0)                          % skip levels that never occur
            result = result - PROB(i)*log2(PROB(i));
        end
    end

## ANNEX 4 Q

quantization.m:

    Matrix = zeros(height,width);          % coefficients not visited below stay at 0
    for i = 1:1:height/4
        for j = 1:1:width/4
            Matrix(i,j) = round(y_0(i,j)/5);
        end
    end
    for i = height/4:1:height/2
        for j = width/4:1:width/2
            Matrix(i,j) = round(y_0(i,j)/10);
        end
    end
    for i = height/2:1:height*3/4
        for j = width/2:1:width*3/4
            Matrix(i,j) = round(y_0(i,j)/30);
        end
    end
    for i = height*3/4:1:height
        for j = width*3/4:1:width
            Matrix(i,j) = 0;
        end
    end

reverse_quantization.m:

    Matrix_R = zeros(height,width);        % keep the full frame size for idct2
    for i = 1:1:height/4
        for j = 1:1:width/4
            Matrix_R(i,j) = Matrix(i,j)*5;
        end
    end
    for i = height/4:1:height/2
        for j = width/4:1:width/2
            Matrix_R(i,j) = Matrix(i,j)*10;
        end
    end
    for i = height/2:1:height*3/4
        for j = width/2:1:width*3/4
            Matrix_R(i,j) = Matrix(i,j)*30;
        end
    end
    Matrix_R = idct2(Matrix_R);