CHAPTER 1

INTRODUCTION

1.1 OVERVIEW

Data compression is the process of reducing the memory space required to store data from any source. It is a reversible process that can regenerate the original data, or an approximated version of it, after compression. Data can be text, signals or multimedia contents like images, audio or video. Based on how faithfully the original can be regenerated, compression methods are divided into lossless methods and lossy methods. Lossless methods do not compromise the originality of the content at the time of regeneration. Text compression needs lossless compression, because the loss of a single character changes the meaning of the content or renders it meaningless. Multimedia content, however, can afford lossy compression with an acceptable degree of loss.

Video compression is the process of encoding a digital video file into a form that takes limited storage space and transmission bandwidth. Constraints in video encoding and decoding include video quality, storage space and bit rate consumption (Stuttgen 1995). In this research, the audio signals of a video clip are not considered. Video compression is an important preprocessing step in video streaming (Chwan 1998). A video compression method uses two components: an encoder at the video streaming transmission end and a decoder at the receiving end.

ANNA UNIVERSITY, CHENNAI - 600 025

The output of the encoder is a compact version of the original video content. The decoder applies the inverse of the transform used in the encoder to decode the encoded video into an approximated version. The result is an approximation because the content may be distorted by the different steps in the encoding process. The role of video compression in video streaming is illustrated in Figure 1.1.
Figure 1.1 Video streaming (source node: encoder; transmitting channel; destination node: decoder)

A video sequence can be divided into several segments, each containing a sequence of shots with a set of frames. A shot is a collection of continuously captured frames that corresponds to a scene. A hierarchy of contents in a video sequence is given in Figure 1.2. In general, digital video processing is performed at the frame level. Frames are further divided into macroblocks for exploiting redundancy.

Figure 1.2 A hierarchy of video contents (segments, shots, frames)

In the initial stages of video compression research, a video was considered a series of motion pictures that are processed separately. Key factors to be considered for video processing are the storage requirements for the video, as they relate to the compression ratio, and the quality of the video (Martinez 2006) in terms of the peak signal to noise ratio (PSNR) and the mean square error (MSE). Major factors associated with video compression include:

• The mode of the input stream (real-time or not)
In real-time video compression, there is no delay between video capture and storage because the video content is immediately coded and stored in a compressed form. In non-real-time compression, recorded video contents are first stored without compression, using a format like the uncompressed audio video interleave (AVI).

• Symmetry vs. asymmetry
Symmetric compression refers to compressing frames with a fixed size and at a constant rate. For example, a video that has frames sized 640 by 480 will be compressed at a rate of 30 frames per second. The degree of asymmetry is expressed as a ratio that gives the duration of compression for a video with a playback length of one minute. If the degree of asymmetry is 100:1, the method takes 100 minutes to compress a one-minute video. The higher the degree of symmetry, the better the quality.
• The ability to reduce storage space (compression ratio)
The ratio between the original video size and that of the compressed video is called the compression ratio. If the compression ratio is 50:1, every 50 bits (or bytes) of the original video are represented by 1 bit (or byte) in the compressed video.

• The ability to reproduce the original (lossy vs. lossless)
The loss factor refers to the quality of the decompressed video. Without an error correction method, a high compression ratio will lead to a steep loss. Visually, the impact of the loss may be low, but its effect will be high when the decompressed video is subjected to further analysis like object detection or recognition.

• Macroblock referencing (inter-frame vs. intra-frame)
Inter-frame decoding references the reconstructed macroblocks of neighboring frames to rebuild a new block in the current frame. Intra-frame decoding references a block of the current frame to reconstruct a new block in the same frame.

• Bit rate control
The bit rate is an important factor for video streaming if the system is bandwidth-limited.

A suitable compression method has to support both the existing software and hardware parameters. Video compression has applications in teleconferencing, high definition television (HDTV) broadcasts, audiovisual communication, remote sensing, and medical applications.

1.2 NEED FOR COMPACT REPRESENTATION IN VIDEO COMPRESSION

Video streaming is the transmission of signals through a transmission medium. Given that the transfer of large masses of data will lead to high bandwidth consumption, video file sizes must be reduced. This is achieved by compression, which is the process of reducing or transforming redundant video contents, leading to an efficient utilization of the bandwidth at the time of video streaming.
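The quality and storage measures named in the overview (MSE, PSNR and compression ratio) can be made concrete with a short sketch. The Python below is illustrative only: the function names, the tiny 2 × 2 example frames, the file sizes and the 8-bit peak value of 255 are assumptions, and it uses the common definitions in which MSE is the mean of the squared pixel differences and PSNR is 10 log10(peak² / MSE).

```python
import math


def mse(original, decoded):
    """Mean square error between two equally sized frames (lists of rows)."""
    diffs = [
        (o - d) ** 2
        for row_o, row_d in zip(original, decoded)
        for o, d in zip(row_o, row_d)
    ]
    return sum(diffs) / len(diffs)


def psnr(original, decoded, peak=255):
    """Peak signal to noise ratio in dB; infinite for identical frames."""
    error = mse(original, decoded)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


def compression_ratio(original_size, compressed_size):
    """Original size divided by compressed size; 50.0 corresponds to 50:1."""
    return original_size / compressed_size


# Hypothetical 2 x 2 grayscale frames with 8-bit values, for illustration only.
frame = [[52, 55], [61, 59]]
approx = [[50, 55], [61, 63]]

print(mse(frame, approx))      # squared differences 4, 0, 0, 16 averaged: 5.0
print(psnr(frame, approx))     # quality of the approximation in dB
print(compression_ratio(5_000_000, 100_000))  # hypothetical sizes in bytes
```

A lower MSE gives a higher PSNR, so the two measures move in opposite directions as the reconstruction improves.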
Image/video compression methods generally use certain transforms (Bairagi 2013) like the discrete cosine transform (DCT), continuous wavelet transform (CWT), Karhunen-Loeve transform (KLT), Walsh-Hadamard transform (WHT), discrete wavelet transform (DWT) and integer wavelet transform (IWT), followed by redundancy mapping like motion estimation and size-reducing processes like quantization. The output of these transforms is a compact representation of multimedia content that can be reconstructed approximately. These transforms are liable to high data loss at the time of reconstruction for a video with many dynamic scenes. A compact representation or transform is therefore necessary to represent relevant video contents that can be reconstructed with minimal loss. The compact representation must allow different levels of decomposition and reconstruction with minimum error.

Recent High Definition (HD) videos contain a broad range of colors. Similarly, videos with moving objects recorded by moving cameras may include dynamically changing textures. Such videos contain a large mass of spatially and temporally redundant data, and dynamically changing textures demand more bandwidth at the time of video streaming. This demand can be reduced by streaming the video in a compact representation that supports reconstruction with error correction or distortion reduction. Thus, a compact description of redundant video contents is essential, with the properties of errorless reconstruction and residual error correction.

For easy understanding, let us consider a video sequence with a resolution of 720 × 576 pixels, a refresh rate of 25 frames per second and a color depth of 8 bits. The bandwidth requirement for video streaming with a luminance-chrominance representation is approximately 166 Mb/s (720 × 576 × 25 × 8 + 2 × (360 × 576 × 25 × 8) = 165,888,000 bits per second).
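The raw bit rate expression can be checked with a few lines of Python. This is a minimal sketch: the function name and its parameters are illustrative, and it assumes, as the expression in the text implies, one full-resolution luminance plane plus two chrominance planes at half the horizontal resolution.

```python
def raw_bit_rate(width, height, fps, bit_depth, chroma_width=None):
    """Uncompressed bit rate in bits per second for one luminance plane
    and two chrominance planes. chroma_width defaults to half the luma
    width, matching the 360-sample chroma planes of the 720 x 576 example."""
    if chroma_width is None:
        chroma_width = width // 2
    luma = width * height * fps * bit_depth
    chroma = 2 * (chroma_width * height * fps * bit_depth)
    return luma + chroma


sd = raw_bit_rate(720, 576, 25, 8)
print(sd)        # 165888000 bits per second
print(sd / 1e6)  # 165.888 Mb/s
```

The same function can be called with other resolutions and frame rates to compare formats.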
For the same video to be transmitted as an HDTV stream with a resolution of 1920 × 1080 pixels at 60 frames per second, the requirement is 1.99 Gb/s (1920 × 1080 × 60 × 8 + 2 × (960 × 1080 × 60 × 8)).

The contents of a video may have static and dynamic structures that create a large amount of redundant data. At the time of compression, this data must be stored compactly, subject to minimal error in reproduction, to reduce the volume of redundant data, leading to reconstruction with minimal or no loss. There are many compact representations to store redundant data. This research uses a tensor, which is an N-way representation of video contents, for compression.

1.3 STRUCTURE OF THE VIDEO COMPRESSION METHOD

Video compression is a multi-step process that handles certain operations on video contents. The sequence of video compression operations varies, depending on the methods, applications and type of video contents. The general series of actions for video compression is given in Figure 1.3.

Figure 1.3 General structure of a video codec (encoder: in-stream video, redundancy mapping, size reduction methods; decoder: reverse size reduction methods, redundancy de-mapping, retrieve approximated video, video out)

1.3.1 Preprocessing

Preprocessing (Shi & Sun 2008) is preparing video contents for compression by applying one or more simple methods. The steps listed in preprocessing are optional, and depend on the device used to capture the video. Preprocessing includes filtering to remove noise, dividing video frames into macroblocks, or grouping them into a pattern that contains a sequence of forward predicted frames (P-frames), bi-directional predicted frames (B-frames) and intra-coded frames (I-frames). Filtering is applied when using video capture or recording devices with poor configuration and lighting.
Videos captured by devices with poor configuration have enhancement filters, like soothing and smoothing, applied to them for spatial enhancement. For frequency domains, low-pass and high-pass filters are used, while low lighting and blurring can be rectified using contrast adjustment and sharpening filters. Preprocessing improves the quality of in-stream video frames prior to the application of encoding.

1.3.2 Redundancy Mapping

An original video stream is continuous in both spatial and temporal domains. The digital processing of a video needs a sampling of the video in both domains, as shown in Figure 1.4. The spatial domain consists of pixels represented by positive integers in a range that depends on the type of pixels: binary, gray scale and RGB.

Figure 1.4 The 3D organization of a video

Redundancy in video streams can be identified in spatial and temporal domains. To reduce the memory space, these redundant contents have to be mapped into a simple format that can be remapped to the original format. The types of redundancies are perceptual, spatial, and temporal.

Perceptual redundancy (Wang et al. 2013, Tang 2007) depends on the consistency of the human eye in recognizing visual content. This kind of redundancy can be exploited in both spatial and temporal domains.

Spatial redundancy relates to redundant contents exploited among the pixels in a single frame. Predictive coding and transform coding map redundant spatial contents. Predictive coding depends on the correlation between the pixels inside a frame. The prediction of pixels can be made in any forward direction inside the frame, based on the correlation between pixels, as shown in Figure 1.5.

Figure 1.5 Examples of intra prediction of the current block from neighboring blocks through four directions: (a) Horizontal (b) Diagonal, down right (c) Vertical (d) Diagonal, down left
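The horizontal and vertical directions of Figure 1.5 can be sketched in a few lines of Python. This is a simplified illustration rather than the procedure of any particular standard: the function names, the 3 × 3 example frame and the use of the sum of absolute differences as the selection cost are all assumptions made for the example.

```python
def predict_block(frame, top, left, size, mode):
    """Predict a size x size block from already-coded neighboring pixels.
    'vertical' copies the row above the block downwards;
    'horizontal' copies the column left of the block rightwards."""
    if mode == "vertical":
        above = frame[top - 1][left:left + size]
        return [list(above) for _ in range(size)]
    if mode == "horizontal":
        return [[frame[top + r][left - 1]] * size for r in range(size)]
    raise ValueError(f"unsupported mode: {mode}")


def residual(frame, top, left, size, prediction):
    """Difference between the actual block and its prediction."""
    return [
        [frame[top + r][left + c] - prediction[r][c] for c in range(size)]
        for r in range(size)
    ]


def sad(block):
    """Sum of absolute differences, used here as the prediction cost."""
    return sum(abs(value) for row in block for value in row)


# Hypothetical 3 x 3 grayscale frame whose columns are nearly constant.
frame = [
    [10, 20, 30],
    [10, 21, 30],
    [11, 20, 31],
]

costs = {}
for mode in ("vertical", "horizontal"):
    pred = predict_block(frame, top=1, left=1, size=2, mode=mode)
    costs[mode] = sad(residual(frame, 1, 1, 2, pred))

best = min(costs, key=costs.get)
print(costs, best)  # vertical prediction leaves the smaller residual here
```

Only the chosen direction and the small residual need to be stored for the block, which is how the correlation between neighboring pixels is turned into a saving.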
Temporal redundancy is identified between successive frames. P-frame and B-frame coding maps redundancies between frames in the forward direction and, for B-frames, in the backward direction as well, which makes B-frame coding bidirectional. Frame differencing, the simplest form of inter-frame redundancy removal, is the process of ascertaining the difference between the current and neighboring frames. A prediction between the frames is made with reference to the difference. The prediction can be the process of estimating the motion of objects using the motion compensation of an N × N macroblock of a frame. This motion can be rigid, characterized by regular movements like displacement and rotation, or nonrigid, with irregular movements like the affine transformation, as seen in Figure 1.6 (a), (b) and (c) respectively.

Figure 1.6 Inter prediction by (a) Displacement (b) Rotation (c) Affine transform

1.3.3 Size Reduction Methods

Redundancy mapping methods reduce the amount of redundant content within a frame as well as between frames. However, recent video compression methods demand good bit rate savings to transfer video contents through a channel, calling for additional steps to reduce the mapped data size. In general, quantization methods, entropy coding and intermediate information encoding with host encoders are used for different video codecs. Quantization (Nanda & Kaulgud 2002) is the process of mapping a range of values into a single value to give an additional level of compression. Entropy coding (Sunil et al. 2018) is the final step of video compression that reduces memory space dramatically. A host encoder (Naser et al. 2015) incorporates an existing video compression standard like MPEG, H.264, and H.265 to encode content requiring a high reproduction rate.

1.3.4 Decoders

Decoding is the inverse of encoding, or encoding in reverse order.
It applies the inverse transform and reverse size reduction methods to regenerate an approximated version of the original. The difference between the original video sequence and the approximated one affects the quality of the video.

1.4 TENSOR ALGEBRA

The proposed video compression methods use a tensor, which is an N-way representation, to represent an entire video or the intermediate results of the video.

1.4.1 Tensor and its Terminologies

A tensor, $T \in \mathbb{R}^{I \times J \times K}$, is a 3D array that represents three different dimensions of the given data. Each element of the tensor $T$ is a real value represented by $t_{ijk}$, where $i \in I$, $j \in J$ and $k \in K$.

1.4.2 Terminologies

• Rank of a tensor: Rank is the number of contravariant and covariant indices of a tensor, independent of the number of dimensions that represent the tensor.

• The inner product of two tensors, $T_1$ and $T_2$, is given by Equation (1.1):

$$\langle T_1, T_2 \rangle = \sum_{i,j,k} (T_1)_{ijk}\,(T_2)_{ijk} \qquad (1.1)$$

• The outer product of two tensors, $T_1$ and $T_2$, is given by Equation (1.2):

$$T_1 \circ T_2 = T_1 T_2^{T} \qquad (1.2)$$

• Tensor decomposition: Tensor decomposition is the process of dividing an N-dimensional tensor into a number of rank-1 tensors. The basic idea underlying an array of tensor decomposition methods (Doulamis 2000) is that any decomposition method is reversible to obtain an approximated version of the original tensor. Consider a third-order tensor, $A$, with dimensions $N_1 \times N_2 \times N_3$. $A$ can be represented as the sum of rank-1 tensors in a linear combination. Singular value decomposition (Kilmer & Martin 2004) is the basic structure of all other higher-order decompositions. For a tensor $A$ of order 3,

$$A = S \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)} \qquad (1.3)$$

where $S$ is the core tensor with dimensions $n_1 \times n_2 \times n_3$, where $n_1 < N_1$, $n_2 < N_2$ and $n_3 < N_3$. $S$ is the projection of tensor $A$ onto the tensor basis in terms of the factor matrices $U^{(n)}$, where the dimension of $U^{(n)}$ is $N_n \times n_n$.
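The reconstruction side of Equation (1.3) can be written out element by element. The Python below is a minimal sketch under stated assumptions: the function names are illustrative, each factor matrix is taken to have one row per index of the full tensor and one column per index of the core, and the 1 × 1 × 1 core with its factor matrices is a made-up example rather than the output of an actual decomposition.

```python
def tucker_reconstruct(core, u1, u2, u3):
    """Rebuild a third-order tensor from a core tensor and three factor
    matrices: A[i][j][k] is the sum over p, q, r of
    core[p][q][r] * u1[i][p] * u2[j][q] * u3[k][r]."""
    n1, n2, n3 = len(core), len(core[0]), len(core[0][0])
    return [
        [
            [
                sum(
                    core[p][q][r] * u1[i][p] * u2[j][q] * u3[k][r]
                    for p in range(n1)
                    for q in range(n2)
                    for r in range(n3)
                )
                for k in range(len(u3))
            ]
            for j in range(len(u2))
        ]
        for i in range(len(u1))
    ]


def count_entries(core, factors):
    """Number of values stored by the core tensor and factor matrices."""
    core_entries = len(core) * len(core[0]) * len(core[0][0])
    factor_entries = sum(len(u) * len(u[0]) for u in factors)
    return core_entries + factor_entries


# Hypothetical 1 x 1 x 1 core with 4 x 1, 3 x 1 and 2 x 1 factor matrices,
# so the reconstructed tensor has dimensions 4 x 3 x 2.
core = [[[2]]]
u1 = [[1], [2], [3], [4]]
u2 = [[1], [0], [1]]
u3 = [[1], [-1]]

a = tucker_reconstruct(core, u1, u2, u3)
print(a[3][2][1])                                     # 2 * 4 * 1 * (-1) = -8
print(4 * 3 * 2, count_entries(core, [u1, u2, u3]))   # 24 full entries vs 10 stored
```

In this toy case the decomposed form stores 10 values instead of 24, which is the kind of saving a compact tensor representation aims for when the core is much smaller than the original tensor.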
• Rank approximation: Rank approximation is a minimization problem that drives the misfit between the original tensor and the approximated tensor, expressed as a cost function, to its minimum. The fit is measured in terms of the Frobenius norm: $\min \lVert A - A' \rVert$, where $A'$ is the approximated tensor and $\mathrm{rank}(A') < \mathrm{rank}(A)$. The Frobenius norm for a tensor of order 3 is given by Equation (1.4):

$$\lVert A \rVert = \sqrt{\sum_{i=1}^{N_1} \sum_{j=1}^{N_2} \sum_{k=1}^{N_3} a_{ijk}^{2}} \qquad (1.4)$$

• Residual error: As given in Equation (1.3), the reconstructed tensor obtained from the product of the core tensor and factor matrices is the approximated version, $A'$, of $A$. The difference between the original tensor and the approximated tensor is the residual tensor, and its norm gives the residual error, as given in Equation (1.5):

$$R = \lVert A - A' \rVert \qquad (1.5)$$

1.5 OBJECTIVES OF THE STUDY

The principal objectives of the proposed research work are twofold:

• The primary objective is to use a tensor, which is an N-way representation of all types of data, to compactly represent data, store compressed data, and enable intermediate data compression. Here, a number of proven operators and the decomposition methods of tensor algebra can be used.

• The secondary objective is to be able to compress videos with dynamic and static textures (for example, underwater videos) with a high PSNR value, efficient compression rate and better bit rate.

1.6 CONTRIBUTIONS OF THE STUDY

The contributions of the proposed research work are as follows:

• Three new video compression schemes are proposed by applying tensor representation and operations on tensor algebra in different stages of the compression.

• A compact video content representation for video coding, using a multi-linear tensor rank approximation with a dynamic core tensor order, is the first video compression framework. The contributions of this work in using tensor algebra are listed below:
  i. The default Low Multilinear Rank Approximation (LMLRA) is an iterative method to decompose a tensor by finding an optimal core tensor size, which ensures minimum data loss during the decomposition.
     To eliminate a number of repeated calculations, fixing the core tensor size is considered an optimization problem and is solved using Tikhonov regularization. Thus, LMLRA becomes a single-step decomposition routine with a fixed core tensor size.
  ii. The Tikhonov regularization problem is solved using the corner of the L-curve method to minimize the residual error. With the dimension of the core tensor fixed in this way, the 4D tensor is decomposed into a core tensor and four factor matrices of smaller sizes.
  iii. To achieve lossless compression, the residual tensor also has to be coded. After decomposition, the least residual error entries are ignored by making them sparse.

• A new framework for video compression with a tensor representation using Canonical Correlation Analysis-based Keyframe selection (CCA-KeyCodec) is the next video compression framework. This framework makes the following contributions:
  i. In general, Canonical Correlation Analysis (CCA) is used to find the amount of redundancy between frames of different sources in multi-view video compression. In this method, it is used to find the redundancy between successive frames. The frame with the maximum correlation with the largest number of consecutive frames is used as the keyframe.
  ii. Variable-sized blocks are identified to map different transformation models onto a single frame.
  iii. Data fusion is used to combine the data obtained from the CCA keyframe identification, variable block coordinates, and the geometrical transform model attributes. Further, the LMLRA tensor decomposition is applied on the processed video contents.

• A novel video codec with synthesis analysis coding and Feature-based Video quality Metrics for Distortion Error Control (FVMDEC) is the third video compression framework.
The contributions, in terms of the various steps of the video compression method, are listed below:
  i. A new voting-based segmentation method is proposed, using a sparse tensor representation for fast and compact segmentation.
  ii. All segment details are stored as tensors for minimal storage.
  iii. A set of quality metrics is used to improve the quality of the reconstructed segments previously degraded by distortion.

All three methods are tested against H.264, H.265, and other state-of-the-art techniques. The ReefVid database (ReefVid database), which has a broad range of underwater videos of different species, and general videos with high redundancy are used for a performance comparison.

1.7 ORGANIZATION OF THE THESIS

This thesis is organized and presented in seven chapters.

Chapter 1 offers an overview of video compression, the need for a compact representation of video contents and intermediate results, and the structure of general video codecs. It explains the objectives and contributions of the research and the organization of the chapters in this thesis.

Chapter 2 surveys in detail video compression standards, video compression using different transforms, tensors in image and video processing, and existing video compression methods using tensors as a representation for a whole video file or intermediate data.

Chapter 3 introduces a new video codec using a 4D tensor to represent a video file. The Low Multilinear Rank Approximation (LMLRA) is used to decompose the 4D tensor into a lower-order core tensor and four different rank-1 factor matrices. A new method to find the order of the core tensor is proposed to attain maximum decomposition with high reconstruction quality. This method is compared with current video compression standards and state-of-the-art video compression methods with tensor representation.

Chapter 4 proposes a video codec with Canonical Correlation Analysis (CCA) to choose the keyframe.
LMLRA decomposition and geometric transformations are used to efficiently map the motion between the keyframe and successive frames. This method is compared with state-of-the-art methods and video compression standards in terms of compression metrics and time consumption.

Chapter 5 advances a new video codec to achieve faster, enhanced compression with high quality. A new set of Feature-based Video Quality Metrics (FVM) is defined to improve the quality of the reconstructed video. This method is compared with state-of-the-art methods and video compression standards.

Chapter 6 presents a comparative analysis of all the proposed video compression methods in terms of compression, quality of reconstruction and computational requirements.

Chapter 7 concludes the thesis and elaborates on the future scope of this work. It discusses the advantages and disadvantages of the proposed research.

The references used for this research work are given at the end of this thesis.

1.8 SUMMARY

This chapter has presented an introduction to video compression, the need for a compact representation of video contents and intermediate results, the structure of general video codecs, and the objectives and contributions of the research. This chapter has also given an outline of the succeeding chapters. Chapter 2 presents a review of the literature on the subject.
