
Poster Paper Proc. of Int. Conf.

on Control, Communication and Power Engineering 2013

An Overview on Multimedia Transcoding Techniques on Streaming Digital Contents
Shakti Awaghad1 and Sanjay Pokle2

1 Research Scholar, Dept. of Electronics & Communication Engineering, G. H. Raisoni College of Engg., Nagpur, India
2 Professor & Head, Dept. of Electronics & Communication Engineering, R.C.O.E.M, Nagpur, India

© 2013 ACEEE DOI: 03.LSCS.2013.2.42

Abstract- Current IT infrastructure and many commercial applications depend on multimedia systems, for example in education, marketing, risk management, telemedicine, and the military. One challenge in such applications is delivering an uninterrupted video stream to diverse terminals such as smartphones, PDAs, laptops, and IPTV. Research shows a clear need for novel bit-rate adjustment mechanisms and format conversion policies so that a source stream plays well on end devices with differing processor, memory, and decoding capabilities. This paper discusses prominent points from the literature that help in understanding transcoding, the direct digital-to-digital conversion of data from one encoding to another. Although multimedia transcoding has been researched for more than a decade, there remains a significant tradeoff among application, service, resource constraints, and hardware design, which gives rise to QoS issues.

Keywords- Multimedia Transcoding, Video Transcoding, data-format conversion.

I. INTRODUCTION
The amount of video content on the Internet has grown rapidly during the last decade. The introduction of video services such as YouTube has enabled users to generate their own content and upload it to online video services. In 2007, more than 500,000 items of user-generated content were uploaded daily to those video services [1]. This created a huge amount of multimedia content that needs to be available to users on the Internet.

All multimedia content needs to be transformed into the right format for every user. For example, a user on a desktop computer may want to watch a high-definition video, while a user watching on a mobile phone with a small screen needs video optimized for a low-resolution screen and limited network bandwidth. This means that every video must be transcoded into the right resolution, frame rate, bit rate, and codec.

In a multimedia system, an original video is recorded by a camera or created on a computer and encoded into a compressed format. The video is then uploaded to an Internet server, where the transcoding process begins. When a user requests a video, the server needs to transcode the video into the desired format and send it to the user. The video on the server is transcoded into a format with the desired resolution, bit rate, frame rate, and codec, and any other specific processing is done (such as adding a logo or commercial to the video). This process requires high computational power and data flow.

During transmission of the videos, the available network bandwidth is limited and transmission errors can occur. Some techniques [2] allow recovery of the video from errors that occur on the end-user side or during transmission. On the end-user side there are different kinds of devices. Depending on the device, there are limitations in processing power, display resolution, available codecs, etc.

Transcoding is used in various multimedia systems such as conferencing, telemedicine, education, and the military. In these systems, audio and video need to be delivered smoothly without interruptions. The quality of the user experience must meet the level of quality of experience (QoE) prescribed for that system.

This paper presents a review of multimedia transcoding. Section II highlights the significance of the transcoding process in multimedia. Section III illustrates transcoding architectures, followed by a classification of transcoding in Section IV. Past research work and conclusions are discussed in Section V.

II. SIGNIFICANCE
Quality of experience (QoE) is used to measure the real efficiency of transcoding algorithms, and of video coding algorithms in general. User experience depends on both the video and the audio quality of the streamed content. Depending on the content type (news shows, sports, movies, etc.), people react differently to the balance of video and audio quality. In particular, lower video quality is acceptable in a news show if the audio is of good quality. Conversely, lower audio quality with good video would be acceptable for some sports [3].

Delay is also an important factor that must be planned carefully so as not to degrade QoE. After requesting a stream from a server and specifying the device's capabilities, the end user has to wait for the transcoding process to begin and for the content to be transmitted to the device. With perceptual tricks such as displaying the channel logo, the waiting time can be shortened, but only perceptually [4].

Transcoding algorithms should be able to provide a lower-bandwidth video when the available network bandwidth degrades. This keeps the video from stopping; it is played at lower quality instead. Unfortunately, this approach will not help if the bit-error rate or jitter is high. QoE research has been done for standard devices such as desktop computers, video consoles, and televisions, but further research is needed to see the difference for mobile devices.

The first and most important challenge in the context of video conferencing is to provide transcoding on the fly, at real-time speed and without any interruption of the video flow [5], [6]. There are three basic requirements in transcoding [7]: 1) the information in the original bitstream should be exploited as much as possible; 2) the resulting video quality of the new bitstream should be as high as possible, or as close as possible to that of the bitstream created by coding the original source video at the reduced rate; 3) in real-time applications, the transcoding delay and memory requirement should be minimized to meet real-time constraints.

A video transcoder can provide several functions, including adjustment of bit rate and format conversion. We illustrate these functionalities and their classification in Fig. 1.

III. UNDERSTANDING TRANSCODING ARCHITECTURES
The most straightforward transcoding architecture is to cascade the decoder and encoder directly. In this architecture, the incoming source video stream (Vs) is fully decoded, and the decoded video is then re-encoded into the target video stream (VT) with the desired bit rate or format, with no degradation in the visual quality due to transcoding. This is the spatial-domain transcoding architecture (SDTA), which can perform dynamic bit-rate adaptation via rate control at the encoder side. This architecture is flexible, since the decoder loop and the encoder loop can be totally independent of each other (e.g., they can operate at different bit rates, frame rates, picture resolutions, coding modes, and even different standards).
This architecture is drift-free, but its computational complexity is high for real-time applications.
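The cascaded decode-then-re-encode idea can be illustrated with a toy sketch. It is not the paper's implementation: it assumes a single 1-D block of eight samples, a uniform quantizer, and no motion compensation or entropy coding, and the function names (`encode`, `decode`, `cascaded_transcode`) are hypothetical. It only shows that the decoder and encoder stages are independent, so the target quantization step can be chosen freely.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * acc)
    return out


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(acc)
    return out


def encode(pixels, qstep):
    """Toy encoder: transform, then uniform quantization to integer levels."""
    return [round(c / qstep) for c in dct(pixels)]


def decode(levels, qstep):
    """Toy decoder: inverse quantization, then inverse transform."""
    return idct([level * qstep for level in levels])


def cascaded_transcode(levels, src_qstep, dst_qstep):
    """Fully decode to the pixel domain, then re-encode independently."""
    pixels = decode(levels, src_qstep)
    return encode(pixels, dst_qstep)


# Example: a source block coded with step 4 is transcoded to step 16.
pixels = [52, 55, 61, 66, 70, 61, 64, 73]
src = encode(pixels, 4)
dst = cascaded_transcode(src, 4, 16)
print("source levels:", src)
print("target levels:", dst)
```

A coarser target step leaves fewer nonzero levels to code, which is how this toy model lowers the bit rate. A real cascaded transcoder would also repeat motion estimation in the encoder loop, which is where most of the computational cost mentioned above comes from.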

[Figure: transcoder block diagram. Block labels: variable length decoding, inverse quantization of the source, inverse DCT, motion compensation, reference frame, motion estimation, discrete cosine transform, quantization of the target video, variable length coding]

Figure 1. Different transcoding techniques

Homogeneous transcoding performs conversion between video bitstreams of the same standard. A simple technique to transcode a video to a lower bit rate is to increase the quantization step at the encoder part of the transcoder [8], [9]. Spatial resolution reduction can be done in a number of ways (see Fig. 3) [10]. One possibility is to transcode from normal video to a video containing only the region of interest. Fig. 4 illustrates that a transcoder can down-sample a scene to the object of interest, which is determined through meta information. When subsampling, filtering, or pixel averaging is used to reduce spatial resolution [10], problems arise if motion vectors are passed directly from the decoder to the encoder. Thus, the motion vectors need to be refined. Frame-rate conversion is needed when the end system supports only a lower frame rate. With dropped frames, the incoming motion information is invalid because the motion vectors point to frames that do not exist in the transcoded bitstream.

A heterogeneous video transcoder provides conversions between existing and future video coding standards. It provides syntax conversion between these standards. Further, a heterogeneous video transcoder may also provide the functionalities of homogeneous transcoding. Transcoding may include additional functions such as error resilience and logo or watermark insertion. These functions will be described in the paper subsequently.

Exploiting the structural redundancy of the architecture and the linearity of the DCT/IDCT, a structurally simpler but functionally equivalent frequency-domain transcoding architecture is possible, which can be further simplified as shown in Fig. 2.

Figure 2. Frequency-domain transcoder architecture (FDTA)

In this architecture, only VLD and inverse quantization are performed to get the DCT values of each block at the decoder end. At the encoder end, the motion-compensated residual errors are encoded through requantization and VLC. The reference frame memory at the encoder end stores the DCT values after inverse quantization, which are then fed to the frequency-domain MC module to reduce drift error. This is referred to as the frequency-domain transcoding architecture (FDTA).

IV. CLASSIFICATION
Transcoding is the process of decoding video from some format to a (usually uncompressed) intermediate format and encoding it into the desired format. For example, during transcoding an MPEG-2 video would be decoded to a raw format and then encoded into H.264. This process is computationally intensive for the server processor. The most intensive part of transcoding is motion estimation, because it must find how blocks of the picture move from one frame to another.

A. Types of Transcoding
The proliferation of wireless devices with very different form factors multiplies the costs associated with Web applications. Different devices require pages to be presented in different markup languages, such as HyperText Markup Language (HTML), compact HTML (including a version called i-mode that is popular in Japan), Unwired Planet's Handheld Device Markup Language (HDML), Wireless Markup Language (WML), and VoiceXML. There are two main approaches to handling the need to present the content delivered by an application differently in different circumstances, for different devices, and for different levels of network service.

Static transcoding: With static transcoding of static content, different versions of the same content are produced by a designer, as shown in Figure 3, generally using various studio tools, and stored in the file system of the Web server.
With static transcoding of dynamic content, either different versions of the application are provided or the application contains logic that produces different versions of the content.

Figure 3. Static transcoding

The primary disadvantage of static transcoding is the number of different pages and applications that have to be created, tested, organized, and maintained. As the number of devices increases, the burden of hand-generating different versions of each application or page for different presentations will become onerous.

Figure 4. Dynamic transcoding

Dynamic transcoding: Dynamic transcoding promises to allow the problem of creating content to be separated from the problem of creating different presentations. Dynamic transcoding, shown in Figure 4, consists of a set of techniques for tailoring the information generated for a user to match the specific presentation requirements of a given user on a given device at a given level of network service. Because the mechanisms used by dynamic transcoding can be applied across a wide variety of content for a wide variety of devices, they reduce the overall development and maintenance costs of Web applications.
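As a rough illustration of the dynamic approach, the sketch below keeps one copy of the content and produces the presentation at request time from a device profile. Everything in it is assumed for illustration: the `DeviceProfile` fields, the two renderers, and the truncation rule are hypothetical and are not taken from the paper or from any particular transcoding product.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical description of the requesting device."""
    markup: str      # "html" or "wml"
    max_chars: int   # longest body text the device should receive


def to_html(title, body):
    """Render the content as a minimal HTML page."""
    return (f"<html><head><title>{escape(title)}</title></head>"
            f"<body><p>{escape(body)}</p></body></html>")


def to_wml(title, body):
    """Render the content as a minimal single-card WML deck."""
    return (f'<wml><card id="main" title="{escape(title)}">'
            f"<p>{escape(body)}</p></card></wml>")


RENDERERS = {"html": to_html, "wml": to_wml}


def transcode(content, profile):
    """Tailor one content record to the device at request time."""
    body = content["body"]
    if len(body) > profile.max_chars:
        # Shorten the text for small displays (an assumed policy).
        body = body[: profile.max_chars].rstrip() + "..."
    render = RENDERERS[profile.markup]
    return render(content["title"], body)


article = {
    "title": "Transcoding",
    "body": "Transcoding converts one encoding into another.",
}
desktop = DeviceProfile(markup="html", max_chars=1000)
phone = DeviceProfile(markup="wml", max_chars=20)

print(transcode(article, desktop))
print(transcode(article, phone))
```

Supporting a new device class in this sketch means adding one renderer and one profile, rather than authoring a new version of every page as static transcoding would require.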

Spatial and temporal transcoding: The heterogeneity of communication networks and network access terminals often demands the conversion of compressed video not only in bit rate but also in spatial and temporal resolution. One of the challenging tasks in spatial/temporal transcoding is how to efficiently re-estimate (or map) the target motion vectors from the input motion vectors. Many works on motion re-estimation for spatial transcoding consider the simple case of 2:1 downscaling. For transcoding with temporal resolution changes, frames are dropped, so one has to derive a new set of motion vectors that do not exist in the input video.

Video services have traditionally pre-transcoded an original video and stored it in one or more formats that were later used for streaming to the end user. Lately, the number of different multimedia stream formats requested by users has been growing with the number of new devices having different capabilities (in terms of resolution, processing power, and network bandwidth). This trend increases the number of different files needed for every single piece of multimedia content. An increase in the number of transcoded files requires more processing power for transcoding and more storage space for storing those files. Even though storage space is considered a low-cost resource, this increase in space is significant and requires new ways of transcoding.

IDC proposes a different approach called transactional transcoding [1]. In transactional transcoding, multimedia content is transcoded only when there is a need for it. The original multimedia content is kept only in its original format, or in some format that is more appropriate for transcoding. When a transcoding request comes from the end user, transcoding starts. The user can request the multimedia format he or she needs.

The transcoding techniques in this literature are broadly classified into two categories: homogeneous and heterogeneous video transcoding.
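Returning to the motion re-estimation problem raised above for spatial and temporal transcoding, the following sketch shows two simple mapping rules. They are assumptions chosen for illustration, not methods proposed in this paper. For 2:1 downscaling, the four input vectors covering one output macroblock are averaged and halved. For frame dropping, the vectors along the dropped frames are summed to approximate the motion relative to the last retained frame. The function names are hypothetical, and a practical transcoder would normally refine the result with a small local search.

```python
def downscale_mv_2to1(mvs):
    """Map the motion vectors of a 2x2 group of input macroblocks
    to one vector for the corresponding output macroblock.

    mvs: list of (dx, dy) tuples from the full-resolution stream.
    Returns the average vector scaled by 1/2 to match the halved
    picture dimensions.
    """
    n = len(mvs)
    avg_dx = sum(dx for dx, _ in mvs) / n
    avg_dy = sum(dy for _, dy in mvs) / n
    return (avg_dx / 2, avg_dy / 2)


def compose_over_dropped(mv_chain):
    """Approximate the motion across dropped frames.

    mv_chain: per-frame (dx, dy) vectors, from the current frame back
    to the last frame that is kept in the output stream.
    Returns their sum, which points from the current frame to the
    retained reference frame under a constant-block assumption.
    """
    total_dx = sum(dx for dx, _ in mv_chain)
    total_dy = sum(dy for _, dy in mv_chain)
    return (total_dx, total_dy)


# Four input vectors that agree: the output vector is simply halved.
print(downscale_mv_2to1([(4, 2), (4, 2), (4, 2), (4, 2)]))

# One frame is dropped between the current frame and its new reference.
print(compose_over_dropped([(1, 0), (2, -1)]))
```

Averaging works well when the four input vectors are similar. When they diverge, alternatives such as taking the median or weighting by residual energy are often preferred, at the cost of extra computation.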
Homogeneous video transcoding: Homogeneous transcoding performs conversion between video bitstreams of the same standard. A high-quality source video may be transcoded to a target video bitstream of lower quality, with different spatial/temporal resolutions and different bit rates. Some of the research issues in this type of transcoding are bit-rate reduction at fixed resolution, spatial resolution reduction, temporal resolution reduction, and transcoding between multiple layers and a single layer.

Heterogeneous video transcoding: A heterogeneous video transcoder, as shown in Figure 5, provides conversions between various standards, for instance an MPEG-2 to H.263 transcoder, an MPEG-2 to MPEG-4 transcoder, or an H.263 to MPEG-4 transcoder. In this architecture, syntax conversion (SC) is needed to convert the syntax of the source video to that of the target video. A higher-resolution decoder decodes the incoming bitstream. The extracted MVs are then post-processed according to the desired output encoding structure and, if required, properly scaled down to suit the lower spatial-temporal resolution encoder. If post-processing is not sufficient, the extracted MVs are refined to improve the encoding efficiency. The decoded pictures are downsampled spatially or temporally as required, and the downsampled images are encoded with the new MVs. Since the incoming MVs are reused, and other encoding decisions such as macroblock types can be extracted from the incoming bitstream, the architecture of this transcoder can be further simplified.

Figure 5. Heterogeneous video transcoder

V. CONCLUSION
Multimedia content is taking a primary role on the Internet. In the early Internet, text with a small number of images was the leading element; in this decade the primary role is being taken by multimedia content. Progress has been driven by higher network speeds and better processing power of end-user devices. The development and spread of mobile devices capable of playing multimedia has also opened new approaches to the video experience.
All this, together with the vast number of user-generated videos being uploaded and watched every day, is contributing to research in the field of transcoding. To keep up with the increasing amount of user-generated video and the diversity of devices for watching it, providers will need to change the way they transcode multimedia. Some videos might still be streamed in the old way, by pre-transcoding, but the majority of videos will need to be streamed in the novel way of transactional transcoding. Transactional transcoding will allow video to be optimized for the end user to gain the best possible quality of experience. It will also provide the means for extra monetization features such as delivering ads targeted using location-based services.

REFERENCES
[1] G. Ireland and L. Ward, “Transcoding Internet and Mobile Video: Solutions for the Long Tail,” IDC, 2007.
[2] L. Superiori, O. Nemethova, and M. Rupp, “An H.264/AVC Error Detection Algorithm Based on Syntax Analysis,” in A. M. A. Ahmad and I. K. Ibrahim, “Multimedia Transcoding in Mobile and Wireless Networks,” Information Science Reference, London, 2009, pp. 215–234.
[3] H. Knoche and M. A. Sasse, “Getting the Big Picture on Small Screens: Quality of Experience in Mobile TV,” in A. M. A. Ahmad and I. K. Ibrahim, “Multimedia Transcoding in Mobile and Wireless Networks,” Information Science Reference, London, 2009, pp. 31–46.
[4] N. Roma and L. Sousa, “Insertion of irregular-shaped logos in the compressed DCT domain,” in Proc. 14th Int. Conf. Digital Signal Processing (DSP) 2002, vol. 1, Jul. 2002, pp. 125–128.
[5] G. Keesman, R. Hellinghuizen, F. Hoeksema, and G. Heideman, “Transcoding of MPEG bitstreams,” Signal Processing: Image Commun., vol. 8, no. 6, pp. 481–500, 1996.
[6] J. Youn, M.-T. Sun, and C.-W. Lin, “Motion vector refinement for high performance transcoding,” IEEE Trans. Multimedia, vol. 1, no. 1, pp. 30–40, Mar. 1999.
[7] X. Wang, W. Zheng, and I. Ahmad, “MPEG-2 to MPEG-4 transcoding,” in Proc. Workshop and Exhibition on MPEG-4, San Jose, CA, Jun. 18–20, 2001, pp. 83–86.
[8] M.-T. Sun, T.-D. Wu, and J.-N. Hwang, “Dynamic bit allocation in video combining for multipoint conferencing,” IEEE Trans. Circuits Syst. II, vol. 45, no. 5, pp. 644–648, May 1998.
[9] O. Werner, “Requantization for transcoding of MPEG-2 intraframes,” IEEE Trans. Image Process., vol. 8, no. 2, pp. 179–191, Feb. 1999.
[10] R. Mohan, J. R. Smith, and C. Li, “Adapting multimedia internet content for universal access,” IEEE Trans. Multimedia, vol. 1, no. 1, pp. 104–114, Mar. 1999.