Prepared by Arun Dushing

TABLE OF CONTENTS

FUNDAMENTALS OF VIDEO
  Video Components
  Video Signal
  Types of Analog Video Signal
    • Component Video
    • Composite Video
    • S-Video
DIGITAL VIDEO
  Analog Video Scanning Process
    • Progressive Scanning
    • Interlaced Scanning
  Color Video
  Digitizing Video
  Digital Video Color Sampling
VIDEO COMPRESSION
  Video Compression Requirements
  Coding Techniques
    • Entropy coding
    • Source coding
    • Hybrid encoding
  Methods for Compression
STEPS IN VIDEO COMPRESSION
AUDIO, VIDEO CODECS
  Speech Codecs
    • G.723.1 • G.711 • G.729 • GSM-AMR
  Video Codecs
    • H.261 • H.263 • H.264 • MPEG-1 • MPEG-2 • MPEG-4
  Comparison of the Codecs
  Most Commonly Used Video/Audio File Formats
VIDEO-CONFERENCING
  Benefits of Videoconferencing
  Videoconferencing Protocols
    • H.320 • H.323
  Videoconferencing Terms
  Types of Videoconferencing
    • Point-to-point • Multipoint • Multicast
VIDEO OVER IP
  Data/Video/Voice in One Net
  Video over IP Solution Structure
IP VIDEO TECHNOLOGIES
  The ISDN to IP Migration for Videoconferencing
    • ISDN-Only Environments • Converged IP Environments • IP Overlay Environments • Hybrid Video Environments
FREQUENTLY ASKED QUESTIONS

Video over IP
Video components, digital video, pictures and audio, video codecs, issues and solutions, video conferencing, multipoint video conferencing, video protocol stack, multicasting.

Persistence of Vision
The rapid presentation of frames of video information gives the viewer the illusion of smooth motion.

Fundamentals of Video

• Video is a sequence of still pictures.
• To create an illusion of motion, the pictures have to be played at a rate > 24 frames/sec.
• A picture is divided into small areas called pixels.
• Picture qualities:
  o Brightness: the overall/average intensity of illumination of the picture; it determines the background level in the reproduced picture.
  o Contrast: the difference in intensity between the dark parts and the bright parts of the picture.
  o Detail or Resolution: depends on the number of picture elements; also known as the definition.

Video components
a. Voltage circuit
b. Luminance
c. Color
d. Timing

Scan rates
a. Video: 525 lines, interlaced
b. Computer: pixels; lines vs. pixels
c. How many pixels are in the frame

Refresh rates
a. Traditional video: 15.75 kHz = 525 lines x 30 frames per second
b. Computer graphics: 640 x 480 up to 1390 x 1024, up to 110 kHz


Video Signal

• A picture has four variables: two along the spatial axes, intensity variation, and one along the temporal axis.
• An electrical signal can only represent a single variable with time.
• The picture is therefore scanned horizontally in lines to produce an electrical signal corresponding to the brightness level of the pixels along each line.
• The vertical resolution of the picture is determined by the number of scanning lines.

Types of Analog Video Signal
1. Component Video
2. Composite Video
3. S-Video

Component video: Higher-end video systems use three separate video signals for the red, green, and blue image planes; each color channel is sent as a separate video signal.
(a) Most computer systems use component video, with separate signals for the R, G, and B components.
(b) For any color separation scheme, component video gives the best color reproduction, since there is no "crosstalk" between the three channels.
(c) This is not the case for S-Video or composite video, discussed next.
Component video, however, requires more bandwidth and good synchronization of the three components.

Composite video: A composite video signal is a combination of the luminance level and the line synchronization information. Color ("chrominance") and intensity ("luminance") signals are mixed into a single carrier wave.
a) Chrominance is a composition of two color components (I and Q, or U and V).
b) In NTSC TV, for example, I and Q are combined into a chroma signal, and a color subcarrier is then employed to put the chroma signal at the high-frequency end of the spectrum shared with the luminance signal.
c) The chrominance and luminance components can be separated at the receiver end, and the two color components can then be further recovered.
d) When connecting to TVs or VCRs, composite video uses only one wire, and the video color signals are mixed, not sent separately. The audio and sync signals are additions to this one signal. Since color and intensity are wrapped into the same signal, some interference between the luminance and chrominance signals is inevitable.

S-Video: As a compromise, S-Video (Separated Video, or Super-Video, e.g., in S-VHS) uses two wires, one for luminance and another for a composite chrominance signal. As a result, there is less crosstalk between the color information and the crucial grayscale information. The reason for placing luminance into its own part of the signal is that black-and-white information is most crucial for visual perception: humans can differentiate spatial resolution in grayscale images with much higher acuity than in the color part of color images. As a result, less accurate color information can be sent than is needed for intensity information; we can only see fairly large blobs of color, so it makes sense to send less color detail.

Digital Video
Digital video is obtained by:
• Sampling an analog video signal V(t)
• Sampling the 3-D space-time intensity distribution I(x,y,t)

Analog Video Scanning Process
An analog signal f(t) samples a time-varying image. So-called "progressive" scanning traces through a complete picture (a frame) row-wise for each time interval. In TV, and in some monitors and multimedia standards as well, another system, called "interlaced" scanning, is used:

a) The odd-numbered lines are traced first, and then the even-numbered lines are traced. This results in "odd" and "even" fields; two fields make up one frame.
b) In fact, the odd lines (starting from 1) end up at the middle of a line at the end of the odd field, and the even scan starts at a half-way point.

Video Sampling
Progressive scanning: one full frame every 1/30th of a second.
Interlaced scanning: two separate fields every 1/60th of a second (2:1 interlacing).

Interlaced Scanning
Because of interlacing, the odd and even lines are displaced in time from each other. This is generally not noticeable, except when very fast action is taking place on screen, when blurring may occur.
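The field structure described above can be sketched in a few lines of Python (a toy sketch, assuming a frame is simply a list of pixel rows; real video uses planar pixel buffers):

```python
# Sketch: splitting a progressive frame into its two interlaced fields
# and weaving them back together.

def split_fields(frame):
    """Return (odd_field, even_field): lines 1, 3, 5, ... form the odd
    field in the 1-based numbering used by analog TV."""
    odd = frame[0::2]    # lines 1, 3, 5, ... (indices 0, 2, 4, ...)
    even = frame[1::2]   # lines 2, 4, 6, ...
    return odd, even

def weave_fields(odd, even):
    """Re-interleave two fields into a full frame."""
    frame = []
    for o, e in zip(odd, even):
        frame.append(o)
        frame.append(e)
    return frame

frame = [[i] * 4 for i in range(6)]   # toy 6-line "frame"
odd, even = split_fields(frame)
assert weave_fields(odd, even) == frame   # two fields make up one frame
```

Displaying the odd field and then the even field in successive 1/60-second intervals is exactly the 2:1 interlacing described above.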

Scanning and Interlacing
• Even at rates > 24 frames/sec, the user will be able to see flicker at high intensity levels.
• To avoid flicker, a single frame is displayed in two interlaced fields.
• Interlaced video standards:
  o NTSC – 525 / 60
  o PAL – 625 / 50

NTSC (National Television System Committee)
NTSC is the video system or standard used in North America, most of South America, and Japan. In NTSC, 30 frames are transmitted each second, and each frame is made up of 525 individual scan lines. NTSC uses the familiar 4:3 aspect ratio (the ratio of picture width to picture height).
a) NTSC follows the interlaced scanning system, and each frame is divided into two fields, with 262.5 lines/field.
b) Thus the horizontal sweep frequency is 525 x 29.97 ≈ 15,734 lines/sec, so each line is swept out in about 63.6 µs.
c) Since the horizontal retrace takes 10.9 µs, this leaves 52.7 µs for the active line signal, during which image data is displayed.
NTSC video is an analog signal with no fixed horizontal resolution. Therefore one must decide how many times to sample the signal for display: each sample corresponds to one pixel output. A "pixel clock" is used to divide each horizontal line of video into samples. The higher the frequency of the pixel clock, the more samples per line there are. Different video formats provide different numbers of samples per line, as listed in the following table:

Format          Resolution/Lines
VHS             240
S-VHS           400-425
Betamax         500
Standard 8 mm   300
Hi-8 mm         425
Mini DV         480 (720x480)
DVD             720x480
HD-DVD          up to 1920x1080
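The timing figures in (b) and (c) can be checked with a few lines of arithmetic (a Python sketch; the 29.97 fps rate and 10.9 µs retrace are the values quoted above):

```python
# Verifying the NTSC line-timing numbers quoted in the text.
LINES_PER_FRAME = 525
FRAME_RATE = 29.97            # frames per second (fields: 59.94 Hz)
RETRACE_US = 10.9             # horizontal retrace time in microseconds

line_freq = LINES_PER_FRAME * FRAME_RATE   # lines swept per second
line_time_us = 1e6 / line_freq             # duration of one scan line
active_us = line_time_us - RETRACE_US      # time left for image data

print(round(line_freq))        # ~15734 lines/sec
print(round(line_time_us, 1))  # ~63.6 microseconds per line
print(round(active_us, 1))     # ~52.7 microseconds of active line
```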

PAL (Phase Alternating Line)
PAL is the predominant video system or standard in most of the rest of the world, including Western Europe, China, and India. In PAL, 25 frames are transmitted each second, and each frame is made up of 625 individual scan lines, with a 4:3 aspect ratio and interlaced fields.
(a) PAL uses the YUV color model. It uses an 8 MHz channel and allocates a bandwidth of 5.5 MHz to Y, and 1.8 MHz each to U and V. The color subcarrier frequency is fsc = 4.43 MHz.
(b) In order to improve picture quality, chroma signals have alternate signs (e.g., +U and -U) in successive scan lines, hence the name "Phase Alternating Line".
(c) This facilitates the use of a (line-rate) comb filter at the receiver: the signals in consecutive lines are averaged so as to cancel the chroma signals (which always carry opposite signs), separating Y and C and obtaining high-quality Y signals.

Digital Levels

Video Level   White   Black   Blank   Sync
NTSC          200     70      60      4
PAL           200     63      63      4

Color Video

• A color video camera produces RGB output signals.
• To maintain compatibility with the monochrome receiver, the color signals are converted into luminance (Y) and chrominance, or color difference (R-Y, B-Y), signals.
• Widely used color formats:
  o YUV: this color space is the rescaled version of the color difference signals, made compatible with the analog channel bandwidth.
  o YCbCr: recommended for digital TV broadcasting by ITU-R BT.601.
  o NTSC color bars.

Digitizing Video

• A composite video signal is sampled at a rate 4 times the fundamental sampling frequency recommended by the ITU (4 x 3.375 = 13.5 MHz).
• With the recommended sampling rate, the number of samples during the active line period is the same for both NTSC and PAL.
• The signal is converted into 8-bit samples using an A/D converter.
• Color difference signals are sampled at a reduced rate, which is also an integral multiple of 3.375 MHz.

Digital Video Color Sampling
The advantages of digital representation for video are many. For example:
(a) Video can be stored on digital devices or in memory, ready to be processed (noise removal, cut and paste, etc.) and integrated into various multimedia applications;
(b) Direct access is possible, which makes nonlinear video editing a simple rather than a complex task;
(c) Repeated recording does not degrade image quality;
(d) Encryption is easy, and tolerance to channel noise is better.

Since humans see color with much less spatial resolution than they see black and white, it makes sense to "decimate" the chrominance signal. Interesting (but not necessarily informative!) names have arisen to label the different schemes used. To begin with, numbers are given stating how many pixel values, per four original pixels, are actually sent:
(a) The chroma subsampling scheme "4:4:4" indicates that no chroma subsampling is used: each pixel's Y, Cb and Cr values are transmitted, 4 for each of Y, Cb, Cr.
(b) The scheme "4:2:2" indicates horizontal subsampling of the Cb, Cr signals by a factor of 2. That is, of four pixels horizontally labeled 0 to 3, all four Ys are sent, and every two Cb's and two Cr's are sent, as (Cb0, Y0)(Cr0, Y1)(Cb2, Y2)(Cr2, Y3)(Cb4, Y4), and so on (or averaging is used).
(c) The scheme "4:1:1" subsamples horizontally by a factor of 4.
(d) The scheme "4:2:0" subsamples in both the horizontal and vertical dimensions by a factor of 2. Theoretically, an average chroma pixel is positioned between the rows and columns, as shown in Fig. 5.6. Scheme 4:2:0, along with the other schemes, is commonly used in JPEG and MPEG.

Color Sampling 4:2:2 At the first sample point on a line, Y (luminance), Cr (R-Y), and Cb (B-Y) samples are all taken; at the second sample point only a Y sample is taken; at the third sample point a Y, a Cb and a Cr are taken, and this process is repeated throughout the line

4:2:0 At the first sample site in the first line, a Y sample and a Cb sample are taken. At the second site a Y sample only is taken, while at the third site a Y and a Cb are taken and this is repeated across the line. Similarly Cr samples are taken in the second line
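The chroma decimation used in 4:2:0 can be sketched as follows (a minimal Python illustration, assuming a chroma plane is a plain list of rows and using simple 2x2 averaging; real encoders may position the averaged sample differently, as in Fig. 5.6):

```python
# 4:2:0 chroma subsampling sketch: keep every Y sample, but reduce
# each 2x2 block of Cb (and, likewise, Cr) to a single averaged sample.

def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (height and width
    are assumed even for simplicity)."""
    out = []
    for r in range(0, len(plane), 2):
        row = []
        for c in range(0, len(plane[0]), 2):
            total = (plane[r][c] + plane[r][c + 1] +
                     plane[r + 1][c] + plane[r + 1][c + 1])
            row.append(total / 4.0)
        out.append(row)
    return out

cb = [[10, 20, 30, 40],
      [10, 20, 30, 40]]
print(subsample_420(cb))   # [[15.0, 35.0]]
```

Per four original pixels, 4:2:0 thus transmits 4 Y + 1 Cb + 1 Cr samples instead of 12, halving the raw data before any further compression.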

Video compression

The goal of video compression is to minimize the bit rate in the digital representation of the video signal while:
– maintaining required levels of signal quality,
– minimizing the complexity of the codec, and
– containing the delay.

Video compression is all about reducing the number of bytes by which a video can be transmitted or stored, without costing much in quality. It also reduces the time needed to transmit a video over a channel, thanks to the reduced size, so compressed video can be transmitted more economically over a smaller carrier. Most networks handle approximately 120 Mbit/s of data. Uncompressed video normally exceeds a network's bandwidth capacity, does not get displayed properly, and requires a large amount of disk space for storage. Therefore, it is not practical to transmit video sequences without using compression.

Most compression methods concentrate on the differences within a frame or between different frames to minimize the amount of data required for storing the video sequence. For differences within a single frame, compression techniques take advantage of the fact that the human eye is unable to distinguish small differences in color. Between frames, only the changes are encoded: by ignoring redundant pixels, only the changed portion of a video sequence is compressed, thereby reducing the overall file size. There are well-defined standards and protocols describing how the information should be encoded, decoded, and otherwise represented.

Video Compression Requirements

• General requirements
  o format independent of frame size and frame/audio data rate
  o synchronization of audio and video (and other) data
  o compatibility between hardware platforms

• Further requirements for "retrieval mode" systems
  o fast-forward and fast-backward searching
  o random access to single images and audio frames
  o independence of compressed data units for random access and editing

Coding Techniques
Entropy coding
• Lossless coding is a reversible process: the data is recovered perfectly, so values before and after are identical. It is used regardless of the media's specific characteristics and gives low compression ratios.
• The data is taken as a simple digital sequence, and the decompression process regenerates the data completely.
• Examples: run-length coding (RLC), Huffman coding, arithmetic coding.
• Lossless encoding techniques are used in the final stage of video compression to represent the "remaining samples" with an optimal number of bits.
• Run-length coding represents each row of samples by a sequence of lengths that describe the successive runs of the same sample value.
• Variable Length Coding (VLC) assigns the shortest possible bit sequence based on the probability distribution of the sample values.

Source coding
• Takes advantage of the nature of the data to generate a one-way relationship between the original and compressed information: "lossy" techniques.
• Lossy coding is an irreversible process: the recovered data is degraded, and the reconstructed video is numerically not identical to the original. It takes into account the semantics of the data. Quality depends on the compression method and the compression ratio.
• The degree of compression depends on the data content.
• Examples: content prediction techniques such as DPCM and delta modulation.

Hybrid encoding
• Uses elements from both entropy and source coding.
• Most techniques used in multimedia systems are hybrid, e.g., JPEG, H.263, MPEG-1, MPEG-2, MPEG-4.
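Run-length coding, the simplest of the entropy-coding techniques above, can be sketched as follows (a minimal Python illustration over one row of samples; real coders work on bit-level, zigzag-scanned coefficients):

```python
# Run-length coding sketch: each run of identical samples in a row
# becomes a (value, length) pair; decoding expands the pairs back.

def rle_encode(samples):
    runs = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([s, 1])       # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out

row = [0, 0, 0, 0, 5, 5, 0, 0, 0]
encoded = rle_encode(row)
print(encoded)                      # [(0, 4), (5, 2), (0, 3)]
assert rle_decode(encoded) == row   # lossless: perfect reconstruction
```

Because quantized DCT blocks contain long runs of zeros, this simple scheme already removes a large share of the remaining redundancy.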

Methods for Compression
• Intra-Coded Compression
  o Pictures encoded in this method are called I-pictures.
  o Compression is achieved by removing the redundancy along the spatial axes.
• Inter-Coded Compression and Prediction
  o This method takes advantage of the similarities between successive pictures.
  o The next picture is predicted from a limited number of previous or future pictures.
  o Pictures that are predicted in one direction are called P-pictures, and pictures that are predicted in both directions are known as B-pictures.

Steps in Video Compression
There exists a general sequence of steps by which a video is compressed. Apart from these basic steps the various standards mentioned above customize their own procedure of compression. A video is nothing but a series of image frames. When a motion picture is displayed, each frame is displayed for a short period of time, usually 1/24th, 1/25th or 1/30th of a second, followed by the next frame. This creates the illusion of a moving image.

The difference between subsequent frames is minimal, and video compression uses this property to reduce the size of a video. A video encoder is the device that does the compression. The encoder compares consecutive frames, picks out only the difference, and encodes that instead of encoding the entire frame. The compression is done on a frame-by-frame basis. The diagram below gives the basic procedure of video compression, irrespective of the standard used.

RGB to YUV: This is the first step in compressing a video sequence. RGB (Red, Green, Blue) and YUV (luminance plus blue and red chrominance) are color formats by which a video can be represented; each frame has a particular value for the red, green, and blue components. When a camera captures a video, it is in RGB format, but RGB video requires more storage space than YUV. Therefore, to make transmission and storage easier, the video sequence is converted from RGB to YUV. This conversion is done for each frame of the video, using the formulas given below:

Y = 0.299R + 0.587G + 0.114B
U = -0.147R - 0.289G + 0.436B
V = 0.615R - 0.515G - 0.100B

Motion Estimation – Motion Compensation: Motion estimation is one of the key elements in video compression. To achieve compression, the redundancy between adjacent frames can be exploited: a reference frame is selected, and subsequent frames are predicted from the reference using motion estimation. In motion compensation, the current frame is subtracted from the reference frame to create a residual frame, which is then encoded.

In a series of frames, the current frame is predicted from a previous frame known as the reference frame. The current frame is divided into macroblocks, typically 16 x 16 pixels in size. However, motion estimation techniques may choose different block sizes, and may vary the size of the blocks within a given frame. Each macroblock of the current frame is compared with the reference frame, and the best matching block is selected. A vector describing the displacement of the macroblock in the reference frame with respect to the macroblock in the current frame is determined. This vector is known as the motion vector (MV).
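The RGB-to-YUV step uses exactly the coefficients given above; a per-pixel sketch in Python:

```python
# RGB -> YUV conversion using the coefficients from the text.

def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return y, u, v

# A grey pixel (R = G = B) carries no chroma: Y equals the grey level
# and both color-difference components vanish.
y, u, v = rgb_to_yuv(128, 128, 128)
assert abs(y - 128) < 1e-9
assert abs(u) < 1e-9 and abs(v) < 1e-9
```

The coefficients of each chroma row sum to zero, which is why the U and V planes are small for low-saturation content and compress well after subsampling.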

If the comparison of the current frame is done with the previous frame, it is called backward estimation. If it is done with the next frame, it is called forward estimation. If it is done based on both previous and next frame, it is called bi-directional estimation.
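The block-matching search described above can be sketched as a toy full search (hypothetical 2x2 blocks on 8x8 frames; real codecs use 16x16 macroblocks, a bounded search window, and fast search patterns):

```python
# Full-search block matching: for one block of the current frame, find
# the best-matching block in the reference frame by the sum of
# absolute differences (SAD), and report the motion vector.

def sad(ref, cur, rx, ry, cx, cy, n):
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(n) for i in range(n))

def best_match(ref, cur, cx, cy, n):
    """Return the motion vector (dx, dy) minimizing SAD for the n x n
    block of `cur` whose top-left corner is (cx, cy)."""
    best = None
    for ry in range(len(ref) - n + 1):
        for rx in range(len(ref[0]) - n + 1):
            cost = sad(ref, cur, rx, ry, cx, cy, n)
            if best is None or cost < best[0]:
                best = (cost, rx - cx, ry - cy)
    return best[1], best[2]

# Toy frames: a bright 2x2 patch moves right by 2 and down by 1.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
ref[3][4] = ref[3][5] = ref[4][4] = ref[4][5] = 9
cur[2][2] = cur[2][3] = cur[3][2] = cur[3][3] = 9
print(best_match(ref, cur, 2, 2, 2))   # (2, 1)
```

Only the motion vector and the (here all-zero) residual would then be encoded, instead of the whole block.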

[Figure: motion estimation and compensation — reference frame, current frame, motion vectors, residual frame]

DCT (Discrete Cosine Transform): The Discrete Cosine Transform converts the frames from the spatial domain to the frequency domain. A DCT is performed on small blocks (8 pixels by 8 lines) of each component of the motion-compensated frame to produce blocks of DCT coefficients. The magnitude of each DCT coefficient indicates the contribution of a particular combination of horizontal and vertical spatial frequencies to the original picture block. The coefficient corresponding to zero horizontal and vertical frequency is called the DC coefficient.
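The 8x8 DCT can be written directly from its definition (a slow but self-contained Python sketch using the JPEG/MPEG normalization; real encoders use fast factorized transforms):

```python
import math

# 2D DCT-II on an 8x8 block:
#   F(u,v) = 1/4 C(u) C(v) sum_{x,y} f(x,y)
#            cos((2x+1) u pi / 16) cos((2y+1) v pi / 16)
# with C(0) = 1/sqrt(2) and C(k) = 1 otherwise.

N = 8

def C(k):
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def dct_8x8(block):
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y] *
                          math.cos((2 * x + 1) * u * math.pi / 16) *
                          math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * C(u) * C(v) * s
    return out

# A flat (constant) block puts all its energy into the DC coefficient:
flat = [[100] * N for _ in range(N)]
coeffs = dct_8x8(flat)
print(round(coeffs[0][0]))   # 800 (the DC coefficient is 8 * 100)
```

All the other 63 coefficients of the flat block are (numerically) zero, which is what makes the transform useful: smooth picture areas concentrate into a few low-frequency values.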

Quantization: Quantization is the approximation of continuous, high-precision values by discrete integer values, and it plays an important role in data compression. The coefficients that come from the discrete cosine transform are very high in precision. By quantizing the values to approximate integers, the size of the frame is reduced: instead of using large numbers, we reduce them to inexpensive integer values by dividing them by constant values. There are, however, losses associated with quantization.
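Quantization and its inverse are just a divide-and-round followed by a multiply; a sketch with an arbitrary illustrative step size of 16 (real codecs use a full 8x8 matrix of step sizes):

```python
# Quantization of DCT coefficients: divide by a step size and round to
# an integer. The rounding is the lossy step; inverse quantization
# multiplies back but cannot recover the discarded precision.

Q = 16   # illustrative uniform step size

def quantize(coeffs):
    return [[round(c / Q) for c in row] for row in coeffs]

def dequantize(levels):
    return [[lvl * Q for lvl in row] for row in levels]

block = [[803.2, -41.7], [12.4, 3.1]]
levels = quantize(block)
print(levels)           # [[50, -3], [1, 0]]
restored = dequantize(levels)
print(restored)         # [[800, -48], [16, 0]] -- close, but not exact
```

Note how the small coefficient 3.1 quantizes to zero: this is where most of the compression gain (and the loss) comes from.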

[Figure: worked example matrices — the frame after DCT, the quantizing constants, and the frame after quantization]

Inverse Quantization: Inverse quantization helps in reconstructing the frame, which can then be used as a reference frame for motion estimation. The quantized frame is multiplied by the same quantizing constant by which it was divided during quantization.

Huffman Coding: The quantized frame has discrete values associated with each pixel. Huffman coding associates each pixel value with a symbol that can be transmitted easily through a channel. During decompression, the symbols are remapped to their corresponding values and the frame can be reconstructed. Once the frame is out of the Huffman coding phase, the video stream is ready to be transmitted.
---------------------------------------------
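Huffman coding can be sketched with Python's heapq (a minimal illustration of building a prefix code from symbol frequencies; real codecs ship predefined VLC tables rather than building trees per frame):

```python
import heapq
from collections import Counter

# Huffman coding sketch: repeatedly merge the two least-frequent
# subtrees, so common quantized values end up with short bit strings.

def huffman_codes(symbols):
    freq = Counter(symbols)
    # Heap entries: (frequency, tiebreak, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate one-symbol input
        return {s: "0" for s in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

pixels = [0, 0, 0, 0, 0, 0, 1, 1, 2]        # mostly-zero quantized values
codes = huffman_codes(pixels)
assert len(codes[0]) < len(codes[2])        # frequent symbol, shorter code
bits = "".join(codes[p] for p in pixels)
print(len(bits), "bits instead of", len(pixels) * 8)
```

Because the code is prefix-free, the decoder can walk the bit stream symbol by symbol and remap each code back to its pixel value, as described above.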

Audio, video codec’s
• Broadcast (high bit rate):
  – MPEG-1
  – MPEG-2
• Video conferencing (low bit rate):
  – H.261
  – H.263
• Interactive (full range of bit rates):
  – MPEG-4

G.723.1
G.723.1 is an optional legacy codec included in the 3rd Generation Partnership Project (3GPP) recommendation for compatibility with standards such as H.323. It operates at two bit rates, 6.3 kbit/s and 5.3 kbit/s, and uses a look-ahead of 7.5 ms duration. Music or tones such as DTMF or fax tones cannot be transported reliably with this codec, so some other method such as G.711 or out-of-band signaling should be used to transport these signals.

G.711
G.711 is an ITU-T standard for audio companding. It represents voice-frequency signals as 8-bit compressed pulse code modulation (PCM) samples, taken at 8000 samples/second; a G.711 encoder thus creates a 64 kbit/s bitstream. This codec is used to transmit DTMF and fax tones in E1/T1 lines.

There are two main algorithms defined in the standard: the mu-law algorithm (used in North America and Japan) and the a-law algorithm (used in Europe and the rest of the world).

G.729
G.729 is mostly used in Voice over IP (VoIP) applications for its low bandwidth requirement. Music or tones such as DTMF or fax tones cannot be transported reliably with this codec, so G.711 or out-of-band methods should be used to transport these signals. Also very common is G.729a, which is compatible with G.729 but requires less computation. This lower complexity is not free: speech quality is marginally worsened. Annex B of G.729 is a silence compression scheme, which has a Voice Activity Detection (VAD) module (used to distinguish speech from non-speech), a comfort noise generator (CNG), and a DTX module which decides on updating the background noise parameters for non-speech (noisy) frames, also called SID frames. G.729 operates at 8 kbit/s, but there are extensions which also provide 6.4 kbit/s and 11.8 kbit/s rates, for marginally worse and better speech quality respectively.

GSM-AMR
Under 3G-324M, the adaptive multi-rate (AMR) codec is the mandatory speech codec. AMR can operate at different rates between 12.2 and 4.75 kbps. It also supports comfort noise generation (CNG) and a discontinuous transmission (DTX) mode. It can dynamically adjust its rate and error control, providing the best speech quality for the current channel conditions. The AMR codec also supports unequal error detection and protection (UED/UEP). This scheme partitions the bit stream into classes on the basis of their perceptual relevance. An AMR frame is discarded if errors are detected in the most perceptually relevant data; otherwise it is decoded and error concealment is applied.
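The mu-law companding mentioned above follows the classic segment/mantissa scheme; a Python sketch of a G.711-style encoder and decoder (the constants follow the widely used reference implementation; treat this as an illustration, not a conformance-tested codec):

```python
# G.711 mu-law companding sketch: a linear PCM sample is compressed
# into one 8-bit codeword (sign bit, 3-bit segment, 4-bit mantissa),
# with all bits inverted on the wire.

BIAS, CLIP = 0x84, 32635

def linear_to_ulaw(sample):
    sign = 0x80 if sample < 0 else 0x00
    sample = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not sample & mask:   # locate the top set bit
        exponent -= 1
        mask >>= 1
    mantissa = (sample >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF

def ulaw_to_linear(code):
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    magnitude = ((((code & 0x0F) << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude

assert linear_to_ulaw(0) == 0xFF            # silence encodes to 0xFF
assert ulaw_to_linear(linear_to_ulaw(0)) == 0
for s in range(-8000, 8000, 257):           # round trip: close, not exact
    assert abs(ulaw_to_linear(linear_to_ulaw(s)) - s) <= 1024
```

The logarithmic segments give small samples fine quantization steps and loud samples coarse ones, which is exactly the companding behavior the standard is for.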
Since the ability to suppress silence is one of the primary motivations for using packets to transmit voice, the Real-time Transport Protocol (RTP) header carries both a sequence number and a timestamp, allowing a receiver to distinguish between lost packets and periods when no data was transmitted. Some payload formats define a "silence insertion descriptor" or "comfort noise" (CN) frame (as with the G.711 codec, which is sample-based, i.e., the encoding produces one or more octets per sample) to specify parameters for artificial noise that may be generated during a period of silence to approximate the background noise at the source. Some codecs, like G.729, have silent frames as part of the codec frame structure and hence do not need a separate payload format for the silent frame (G.729 is a frame-based codec because it encodes a fixed-length block of audio into another block of compressed data, typically also of fixed length). When the CN payload format is used with another payload format, different values in the RTP payload type field distinguish comfort-noise packets from those of the selected payload format. The RTP header for the comfort noise packet SHOULD be constructed as if the comfort noise were an independent codec, and each RTP packet containing comfort noise MUST contain exactly one CN payload per channel, since the CN payload has a variable length. The CN packet update rate is left implementation-specific. The CN payload format provides a minimum interoperability specification for communication of comfort noise parameters; the comfort noise analysis and synthesis, as well as the VAD and DTX algorithms, are unspecified and left implementation-specific.

H.261
Designed for video phone and video conference over ISDN.
• Bit rate: n x 64 kbps, n ∈ [1, 30]
• Picture formats: QCIF (176x144), CIF (352x288)
• Coding scheme:
  – DCT-based compression to reduce spatial redundancy (similar to JPEG)
  – Block-based motion compensation to reduce temporal redundancy

H.263
Designed for low-bit-rate video applications.
• Bit rate: 10 ~ 384 kbps
• Picture formats: SQCIF (128x96) ~ 16CIF (1408x1152)
• Coding similar to H.261, but more efficient
H.263 is a video codec designed by the ITU-T as a low-bit-rate encoding solution for videoconferencing. It is a legacy codec that is used by existing H.323 systems and has been kept for compatibility. It was further enhanced into codecs such as H.263v2 (a.k.a. H.263+ or H.263 1998) and H.263v3 (a.k.a. H.263++ or H.263 2000).

H.264
This is one of the most advanced standards for video compression. It is based on the same basic compression principles as most standards, but has some unique features. The average bit-rate reduction in H.264 is 50%, higher than in any of the other standards mentioned above. Video conferencing, telemedicine, and satellite telecasts are some of the applications that use H.264.

MPEG-1
Designed for storage/retrieval of VHS-quality video on CD-ROM.
• Bit rate: ~1.5 Mbps
• Coding scheme similar to H.261, with:
  – Random access support
  – Fast forward/backward support
MPEG-1 is a standard for the compression of moving pictures and audio. It was based on CD-ROM video applications and is a popular standard for transmitting video sequences over the Internet. In addition, layer 3 of MPEG-1 audio is the most popular standard for digital compression of audio, known as MP3. MPEG-1 is designed for bit rates up to 1.5 Mbit/sec.

MPEG-2
Designed for broadcast-quality video storage and transport.
• HDTV support
• Bit rate: 2 Mbps or higher (CBR/VBR)
• Two system bit streams: Program Stream and Transport Stream
• Used for:
  – DVD
  – DirecTV
  – Digital CATV
This standard is mainly used in digital television set-top boxes and DVD video. It is based on MPEG-1, but has special features for digital broadcast television. The most significant enhancement over MPEG-1 is its ability to efficiently compress interlaced video. MPEG-2 scales well to HDTV resolution and bit rates, reducing the need for an MPEG-3. It is designed for video with bit rates between 1.5 and 15 Mbit/sec.

Video Compression: Deficiencies of existing standards

• Designed for specific usage:
  – H.263 cannot be stored (no random access)
  – MPEG-1 & MPEG-2: not optimized for IP transport
• No universal file format for both local storage and network streaming
• Output cannot be reused efficiently after composition: encoded once, no versatility

Video Compression: Requirements for New Standard

• Efficient coding scheme
  – Code once, use and reuse everywhere
  – Optimized for both local access and network streaming
• Works well in both error-prone and error-free environments
  – Scalable for different bandwidth usage
  – Video format can be changed on the fly
  – Transparent to the underlying transport network
• Supports efficient interactivity over the network

The solution is: MPEG-4

MPEG-4
• The Internet of the future will carry not only text and graphics, but also audio and video.
• Fast and versatile interactivity:
  – Zoom in / zoom out (remote monitoring)
  – Fast forward and fast backward (video on demand)
  – Change of viewing point (online shopping, sports)
  – Triggering a series of events (distance learning)
  – On-the-fly composition
  – Virtual environments
• Supports both low-bandwidth connections (wireless/mobile) and high bit rates (fixed/wireline).

MPEG-4 is a standard used primarily to compress audio and video (AV) digital data. It is more flexible than H.263 baseline and offers advanced error detection and correction schemes. MPEG-4 absorbs many of the features of MPEG-1, MPEG-2, and other related standards, adding new features such as (extended) VRML support for 3D rendering, object-oriented composite files (including audio, video, and VRML objects), support for externally specified Digital Rights Management, and various types of interactivity. AAC (Advanced Audio Coding) was standardized as an adjunct to MPEG-2 (as Part 7) before MPEG-4 was issued.

The MPEG-4 standard is predominantly used for multimedia and Web compression. MPEG-4 involves object-based compression, similar in nature to the Virtual Reality Modeling Language: individual objects within a scene are tracked separately and compressed together to create an MPEG-4 file. This leads to an efficient compression that is very scalable, from low bit rates to very high. It allows developers to access objects in a scene independently, and therefore to introduce interactivity. Most of the features included in MPEG-4 are left to individual developers to decide whether to implement, which is why the standard is divided into many parts, ranging from Part 1 to Part 22.
-----------------------------------------

There are frame-based and stream-based codecs; the video codecs are compared in Table 3. The comparison of latency, quality, and applications shows that the codec type needs to be matched to the intended use of the system, taking into consideration the bandwidth of the system. In Table 4, the speech codecs are compared based on the differences in their frame duration, frame size, bit rate, and RTP payload type. The RTP payload type is the number specified in the RFCs for the respective codec.

Codec    Compression   Transform        Bit Rate (kbps)  Resolution           Frame Rate  Latency  Quality            Application
MJPEG    Frame-based   DCT              10~3000          Any size             0~30        Low      Broadcast          IP networks
Wavelet  Frame-based   Wavelet          30~7500          160x120 ~ 320x240    8~30        High     Visually lossless  Various
MPEG-4   Stream-based  DCT and Wavelet  10~10000         64x48 ~ 4096x4096    1~60        Medium   Internet           Wireless to Digital TV
H.263    Stream-based  DCT              30~200           128x96 ~ 1408x1152   10~15       Low      Video phone        Teleconference

Table 3: Comparison of video codecs

SI No  Codec     Frame duration (ms)  Frame size (bytes)  Bit rate (kbps)  RTP payload type
1      G711      5-10                 48                  64               0
2      G723      30                   24                  53.3             4
3      G723-LO   30                   24                  53.3             4
4      G723-HI   30                   24                  64               4
5      G729      10                   10                  8                18
6      G729A     10                   10                  8                18
7      GSMEFR    20                   31                  13.2             99

Table 4: Comparison of the speech codecs

Many leading commercial DSP processors from Analog Devices, Motorola, Texas Instruments, and Freescale, as well as ARM cores (not strictly DSPs, but the core for many DSPs), are used in these gateways. The user has to analyze the candidate processors based on:
• cycle count
• speed
• cost/performance
• energy efficiency
• memory usage
• the different call scenarios the gateway will be handling
The codecs may not be ported to every one of these processors for the purpose of determining the above factors, but standard DSP modules such as FIR filters and FFTs can be used for the evaluation. There are benchmarking suites available in the market which might help the reader decide on a suitable processor for the intended gateway.
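The frame sizes and durations in Table 4 determine the on-the-wire bandwidth once RTP packetization overhead is added. The sketch below assumes one codec frame per RTP packet and the standard 40 bytes of IPv4/UDP/RTP headers; the specific figures in the example are illustrative.

```python
# Sketch: on-the-wire VoIP bandwidth from a codec's frame size and duration,
# assuming one frame per packet and 40 bytes of headers (20 IPv4 + 8 UDP + 12 RTP).

HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP

def wire_bandwidth_kbps(frame_bytes, frame_ms, frames_per_packet=1):
    """Payload plus header bits sent per second, in kbps."""
    packet_bytes = frame_bytes * frames_per_packet + HEADER_BYTES
    packets_per_sec = 1000.0 / (frame_ms * frames_per_packet)
    return packet_bytes * 8 * packets_per_sec / 1000.0

# G.729-style framing: 10-byte frames every 10 ms carry 8 kbps of speech,
# but header overhead raises the on-the-wire rate considerably.
print(round(wire_bandwidth_kbps(10, 10), 1))  # → 40.0
```

Packing several codec frames into one RTP packet (the `frames_per_packet` parameter) amortizes the header cost at the price of added latency, which is one reason low-bit-rate codecs often bundle frames.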

Most Commonly used Video/Audio File Formats
Extension       File Description
.3g2            3GPP2 Multimedia File
.3gp            3GPP Multimedia File
.3gp2           3GPP Multimedia File
.3gpp           3GPP Media File
.3mm            3D Movie Maker Movie Project
.60d            CCTV Video Clip
.aep            After Effects Project
.ajp            CCTV Video File
.amv            Anime Music Video File
.asf            Advanced Systems Format File
.asx            Microsoft ASF Redirector File
.avb            Avid Bin File
.avi            Audio Video Interleave File
.avs            AviSynth Script File
.avs            Application Visualization System File
.bik            BINK Video File
.bix            Kodicom Video File
.box            Kodicom Video
.byu            Brigham Young University Movie
.camrec         Camtasia Studio Screen Recording
.cvc            cVideo
.d2v            DVD2AVI File
.d3v            Datel Video File
.dat            VCD Video File
.dce            DriveCam Video
.dif            Digital Interface Format
.dir            Adobe Director Movie
.divx           DivX-Encoded Movie File
.dmb            Digital Multimedia Broadcasting File
.dpg            Nintendo DS Movie File
.dv             Digital Video File
.dvr-ms         Microsoft Digital Video Recording
.dvx            DivX Video File
.dxr            Protected Macromedia Director Movie
.eye            Eyemail Video Recording File
.fcp            Final Cut Project
.flc            FLIC Animation
.fli            FLIC Animation
.flv            Flash Video File
.flx            FLIC Animation
.gl             GRASP Animation
.grasp          GRASP Animation
.gvi            Google Video File
.gvp            Google Video Pointer
.ifo            DVD-Video Disc Information File
.imovieproj     iMovie Project File
.imovieproject  iMovie Project
.ivf            Indeo Video Format File
.ivr            Internet Video Recording
.ivs            Internet Streaming Video
.izz            Isadora Media Control Project
.izzy           Isadora Project
.lsf            Streaming Media Format
.lsx            Streaming Media Shortcut
.m1pg           iFinish Video Clip
.m1v            MPEG-1 Video File
.m21            MPEG-21 File
.m2t            HDV Video File
.m2ts           Blu-ray BDAV Video File
.m2v            MPEG-2 Video
.m4e            MPEG-4 Video File
.m4u            MPEG-4 Playlist
.m4v            iTunes Video File
.mjp            MJPEG Video File
.mkv            Matroska Video File
.mod            JVC Recorded Video File
.moov           Apple QuickTime Movie
.mov            Apple QuickTime Movie
.movie          QuickTime Movie File
.mp21           MPEG-21 Multimedia File
.mp4            MPEG-4 Video File
.mpe            MPEG Movie File
.mpeg           MPEG Video File
.mpg            MPEG Video File
.mpv2           MPEG-2 Video Stream
.mqv            Sony Movie Format File
.msh            Visual Communicator Project File
.mswmm          Windows Movie Maker Project
.mts            AVCHD Video File
.mvb            Multimedia Viewer Book Source File
.mvc            Movie Collector Catalog
.nsv            Nullsoft Streaming Video File
.nvc            NeroVision Express Project File
.ogm            Ogg Media File
.par            Dedicated Micros DVR Recording
.pds            PowerDirector Project File
.piv            Pivot Stickfigure Animation
.playlist       CyberLink PowerDVD Playlist
.pmf            PSP Movie File
.pro            ProPresenter Export File
.prproj         Premiere Pro Project
.prx            Windows Media Profile
.qt             Apple QuickTime Movie
.qtch           QuickTime Cache File
.qtz            Quartz Composer File
.rm             Real Media File
.rmvb           RealVideo Variable Bit Rate File
.rp             RealPix Clip
.rts            RealPlayer Streaming Media
.rts            QuickTime Real-Time Streaming Format
.rum            Bink Video Subtitle File
.rv             Real Video File
.sbk            SWiSH Project Backup File
.scm            ScreenCam Screen Recording
.scm            Super Chain Media File
.scn            Pinnacle Studio Scene File
.sfvidcap       Sonic Foundry Video Capture File
.smil           SMIL Presentation File
.smk            Smacker Compressed Movie File
.smv            VideoLink Mail Video
.spl            FutureSplash Animation
.srt            Subtitle File
.ssm            Standard Streaming Metafile
.str            PlayStation Video Stream
.svi            Samsung Video File
.swf            Macromedia Flash Movie
.swi            SWiSH Project File
.tda3mt         DivX Author Template File
.tivo           TiVo Video File
.tod            JVC Everio Video Capture File
.ts             Video Transport Stream File
.vdo            VDOLive Media File
.veg            Vegas Video Project
.vf             Vegas Movie Studio Project File
.vfw            Video for Windows
.vid            Generic Video File
.viewlet        Qarbon Viewlet
.viv            VivoActive Video File
.vivo           VivoActive Video File
.vlab           VisionLab Studio Project File
.vob            DVD Video Object File
.vp6            TrueMotion VP6 Video File
.vp7            TrueMotion VP7 Video File
.vro            DVD Video Recording Format
.w32            WinCAPs Subtitle File
.wcp            WinDVD Creator Project File
.wm             Windows Media File
.wmd            Windows Media Download Package
.wmmp           Windows Movie Maker Project File
.wmv            Windows Media Video File
.wmx            Windows Media Redirector
.wvx            Windows Media Video Redirector
.xvid           Xvid-Encoded Video File
.yuv            YUV Video File
.zm1            ZSNES Movie #1 File
.zm2            ZSNES Movie #2 File
.zm3            ZSNES Movie #3 File
.zmv            ZSNES Movie File

What Is Videoconferencing?
Videoconferencing is the conducting of a conference between two or more participants at different sites by using ISDN or computer networks to transmit audio and video data. It is a video communications session among two or more people who are geographically separated. This form of conferencing started with room systems, where groups of people met in a room with a wide-angle camera and large monitors to hold a conference with other groups at remote locations. Federal, state and local governments are making major investments in group videoconferencing for distance learning and telemedicine.

Benefits of Videoconferencing
• Interaction with people and classrooms anywhere in the world
• Sharing of and collaboration on data
• Exposing students to the latest technology available
• Saving the time and money involved in travel for meetings
• Distance learning - providing opportunities for learning that would otherwise be unavailable

Videoconferencing Protocols
Videoconferencing protocols are based on the standards set by the ITU*.

• H.323 - Videoconferencing over LAN
• H.320 - Videoconferencing over ISDN

*International Telecommunication Union

H.320 - A dedicated pipe (mapped circuit) connecting locations over ISDN.
Advantage: Always-on connections.
Disadvantages: Pricey for the equipment and the dedicated line; errors can cause the call to drop.

H.323 - Video over IP. Has the ability to dial by IP address or alias. Includes the T.120 capabilities for sharing and collaboration. Can be used on both private WANs and the public Internet. It is packet based.
Advantages:
• More cost effective (higher speeds at lower cost than H.320)
• Ability to integrate into an existing network
• Can connect to an existing H.320 infrastructure
• Has the ability to go over the public Internet
Disadvantages:
• Firewalls block video traffic
• Insufficient bandwidth on the IP network results in choppy video
• Non-secure transmission of data

Videoconferencing Terms
• MCU
• Gatekeeper
• Gateway
• CODEC

Multipoint Control Unit (MCU): Negotiates among multiple clients in a conference. The client does scheduling from a GUI that allows it to pick a "virtual" conference room and decide whether the meeting is private or public. The host client can then invite other participants to join scheduled or impromptu virtual meetings right from the desktop. The MCU translates among the various protocols (e.g. H.320, H.323) in one videoconference, so all endpoints can participate regardless of which protocol they are running.

Gatekeeper: This component of H.323 manages the inbound and outbound bandwidth of the LAN. The gatekeeper registers clients and coordinates communications with other gatekeepers. It verifies users' identities through static IP addressing and allows them to pass through to the MCU. There are four features within a gatekeeper:

• Admission control authorizes clients' access to the LAN.
• Bandwidth control manages bandwidth for each network segment.
• Address translation lets participants dial network locations with aliases (such as e-mail addresses) instead of IP addresses.
• Call management monitors H.323 calls, tracks rejected calls, accounts for use of WAN links, and monitors other H.323 components.

Gateway: In this context, the gateway is the IP address of YOUR router, not MOREnet's router.

CODEC: CODEC stands for coder-decoder. It translates signals from analog to digital and back again.
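The gatekeeper's address-translation feature can be sketched as a registry: endpoints register an alias with the gatekeeper, and callers dial the alias rather than an IP address. This is a minimal sketch; the class, alias, and address below are hypothetical, not taken from any product.

```python
# Minimal sketch of gatekeeper address translation: endpoints register an
# alias (here an e-mail-style name), callers resolve it to an IP address.

class Gatekeeper:
    def __init__(self):
        self._registry = {}

    def register(self, alias, ip_address):
        """Endpoint registration (the RRQ exchange in H.225 RAS terms)."""
        self._registry[alias] = ip_address

    def resolve(self, alias):
        """Address lookup: translate a dialed alias to an IP address."""
        return self._registry.get(alias)

gk = Gatekeeper()
gk.register("room1@example.com", "10.0.0.15")
print(gk.resolve("room1@example.com"))  # → 10.0.0.15
```

A real gatekeeper layers admission and bandwidth control on top of this lookup, rejecting a resolution when the caller is not authorized or the segment is out of bandwidth.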

Types of Videoconferencing
• Point-to-point
• Multipoint
• Multicast

(A) Point-to-point:

Point-to-point: a videoconference between two end points, directly connected to each other by IP or ISDN.
Advantages:
• Clearer reception between the two sites
• Less scheduling - only the two parties involved in the conference need to schedule
Disadvantages:
• Both sites must be using the same protocol
• Only two sites are allowed

(B) Multipoint:

Multipoint: three or more end points participating in a conference, accomplished by connecting to a Multipoint Control Unit (MCU).
Advantages:
• Many sites using differing protocols can be connected in the same conference
• Better monitoring of the connections
Disadvantages:
• Slight increase in latency
• Must be scheduled in advance with the Multipoint Control Unit (MCU)

(C) Multicast:

Multicast: one-way communication to multiple locations, like a TV broadcast.

Disadvantages:
• No interaction from students
• "Talking head" presentation
• MOREnet currently does not support it

Video over IP
Data/ Video/ Voice in ONE Net


Traditional CCTV System
• Coaxial cable
• Analog signal

Problems:
• Hard to manage and maintain remotely
• Video stored on tape, making the video data difficult to manage and its quality hard to maintain
• Analog signal system, hard to integrate with other systems

DVR Solution
• Video stored as digital data
• PC-based infrastructure

Problems:
• Stand-alone system, poor in integration
• In Windows-based DVRs, system stability is a problem
• In Linux or single-chip DVRs, servicing is a key maintenance issue
• Hard to manage in a large or distributed system

IP Network in Video Surveillance
• Transmitted over an IP network
• Client/server-based infrastructure

Benefits:
• Expandable and integrated network system
• Suitable for large or distributed systems
• Lower total cost of ownership
• Capable of remote management and maintenance
• Good flexibility for system upgrade or re-layout

(Diagram: field cameras connected to the control room over the IP network)

Video over IP Solution Structure

Video Capture

Resolution by NTSC and PAL
NTSC and PAL are different TV resolution standards (lines/fields); NTSC is used mostly in the US, and PAL in Europe.

NTSC        PAL
720 x 480   720 x 576
704 x 480   704 x 576
640 x 480   640 x 576
352 x 240   352 x 288
176 x 112   176 x 144
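The table above can be kept as a lookup so capture software picks the right frame size for the configured video standard. A minimal sketch; the quality-level labels used as keys here are hypothetical convenience names, not defined by NTSC or PAL.

```python
# NTSC/PAL capture resolutions from the table above, keyed by hypothetical
# quality-level labels for illustration.

RESOLUTIONS = {
    "NTSC": {"full": (720, 480), "4CIF": (704, 480), "VGA": (640, 480),
             "CIF": (352, 240), "QCIF": (176, 112)},
    "PAL":  {"full": (720, 576), "4CIF": (704, 576), "VGA": (640, 576),
             "CIF": (352, 288), "QCIF": (176, 144)},
}

def capture_size(standard, level):
    """Return (width, height) for a video standard and quality level."""
    return RESOLUTIONS[standard][level]

print(capture_size("PAL", "CIF"))  # → (352, 288)
```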

Other similar standards:
• CCIR 601, RS-170: much like NTSC
• SECAM: much like PAL

Compression - MJPEG Algorithm
• MJPEG (Motion JPEG): each frame is compressed independently as a still image
• Similar schemes: JPEG2000, Wavelet

Compression - MPEG Algorithm
• MPEG: compresses across frames, coding the motion of objects between pictures
• Similar schemes: H.261/H.263, MPEG-1, MPEG-2, MPEG-4

Video Capture IP Products
IP Camera: a camera that converts video directly into an IP stream (equivalent to an analog camera plus a 1-channel video server). Other alias: Network Camera.

Video Server: a device that digitizes an analog video signal for transmission over an IP network. Other aliases: Encoder, IP Codec, Camera Server.

Transmission Media
Video transmission needs more bandwidth than data and voice. Higher bandwidth yields better video performance (frame rate and quality).

Bandwidth Requirement
• Simple calculation:
Bandwidth requirement = image size per frame x FPS (frames per second) x (1 + 3% IP overhead) x (1 + 30% margin) x 8 bits
For example: 5 KB x 30 FPS x 1.03 x 1.3 x 8 bits ≈ 1.6 Mbps
Note: video recording storage space can also be calculated from these figures:
Image size per frame x FPS x recording time = total storage space required

Network Protocols
Advanced network protocols can help make video transmission more efficient.
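The bandwidth formula above translates directly into code. This sketch assumes 1 KB = 1000 bytes (the document's example works out the same to one decimal place either way).

```python
# The document's bandwidth formula: frame size x frame rate x (1 + 3% IP
# overhead) x (1 + 30% margin) x 8 bits, converted to Mbps.

def video_bandwidth_mbps(frame_kbytes, fps, ip_overhead=0.03, margin=0.30):
    bits_per_sec = frame_kbytes * 1000 * fps * (1 + ip_overhead) * (1 + margin) * 8
    return bits_per_sec / 1_000_000

def storage_gbytes(frame_kbytes, fps, record_seconds):
    """The note's storage formula: frame size x FPS x recording time."""
    return frame_kbytes * fps * record_seconds / 1_000_000

# The document's example: 5 KB frames at 30 FPS ≈ 1.6 Mbps
print(round(video_bandwidth_mbps(5, 30), 2))      # → 1.61
print(round(storage_gbytes(5, 30, 24 * 3600), 1))  # one day of recording, in GB
```

One camera at these settings fills roughly 13 GB per day, which is why FPS and frame size are the first knobs turned when sizing surveillance storage.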

Integrated with Alarm Systems

Video Motion Detection (VMD)
Detects changes of objects in the images to decide whether or not to trigger an alarm.
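The simplest form of the VMD idea above is frame differencing: compare successive frames pixel by pixel and trigger when enough pixels have changed. A minimal sketch on plain lists of grey values; both threshold values are illustrative, and real systems add noise filtering and region masking.

```python
# Sketch of video motion detection by frame differencing: count pixels whose
# brightness changed by more than pixel_thresh, trigger if the changed
# fraction exceeds ratio_thresh. Thresholds are illustrative.

def motion_detected(prev_frame, curr_frame, pixel_thresh=30, ratio_thresh=0.01):
    """Frames are equal-length sequences of 0-255 grey values."""
    changed = sum(
        1 for a, b in zip(prev_frame, curr_frame) if abs(a - b) > pixel_thresh
    )
    return changed / len(curr_frame) > ratio_thresh

still = [100] * 1000
moved = [100] * 980 + [200] * 20   # 2% of pixels changed brightness
print(motion_detected(still, moved))  # → True
print(motion_detected(still, still))  # → False
```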

DI/DO Control
Integrating DI (digital input) sensors and DO (relay output) alarms can build an intelligent video surveillance system.

Integrated with Other Systems
On an IP network, all systems can be integrated into one system for centralized control and interoperation.

View & Record

Multiple Camera Viewing Formats
Capability to view multiple cameras' images in one window.

Video over IP Solution (Application)

IP Video Technologies - Video Conference Architecture: a Typical H.323 Terminal

Types of H.323 Endpoints

Video Conference Architecture H.323 Components

Video Conference Architecture- Call Signaling and Flow

H.323 Multipoint Videoconference

The ISDN to IP Migration for Videoconferencing
Introduction
Since the release of IP-capable videoconferencing solutions in the mid-1990s, the percentage of video calls hosted over IP networks has continued to grow. WR estimates that in 2004 IP became the most common network used for hosting videoconference calls. Virtually all video systems today include IP network capability, while only a limited percentage support ISDN. For some, the justification for migrating from ISDN to IP for videoconferencing was purely financial, as it allowed companies to enjoy a pay-one-price cost structure for unlimited videoconferencing usage. For many others, however, it was the soft benefits of running videoconferencing over IP, such as enhanced reliability and manageability, tighter security, and an improved user experience, that prompted the shift. This session provides insight into the pros and cons of the four most common network architectures in use today for videoconferencing:
• ISDN networks - using digital phone lines from telephone companies
• Converged IP - using the enterprise's data network to host video traffic

• IP Overlay - deploying a dedicated network to host video traffic
• Hybrid - utilizing a combination of the above options to meet specific business challenges
For both new and existing VC users, there are many benefits of and reasons for running videoconferencing traffic over IP. Even if customers won't save significant costs by migrating from ISDN to IP, the IP strategy allows enterprise managers to turn videoconferencing into a manageable enterprise business tool, instead of a technology gadget or curiosity.

Architecting the Videoconferencing Environment
Modern-day videoconferencing environments follow one of four basic network architectures: ISDN-only, converged IP, IP overlay, or some combination of the three, which we call a hybrid environment.

ISDN-Only Environments
The diagram below highlights a traditional videoconferencing environment using only ISDN service from a local telephone provider. Note that this organization may not be able to connect to IP-only external endpoints (listed as Client Location below).

Figure 2: Traditional ISDN-Only Videoconferencing Environment

ISDN-Only Advantages
Data Isolation - In an ISDN videoconferencing environment the video traffic does not touch the organization's data network, which is a source of comfort for IT and network managers.
Universal Availability - ISDN service is available almost anywhere in the world (or at least in most places where phone service is available).
Low Fixed Costs - The fixed monthly cost for ISDN service is relatively low (typically $150 per month for 384 kbps connectivity), which makes ISDN cost-effective for organizations with limited monthly video usage.

ISDN-Only Disadvantages
Endpoint Cost - With today's videoconferencing systems, ISDN network support is typically an option costing several thousand dollars per endpoint.
Endpoint Monitoring - ISDN-only environments typically include a number of legacy, ISDN-only video systems which do not support the advanced endpoint monitoring features available on current video endpoints. It is not possible to monitor the health and "readiness" of these video endpoints.
Network Monitoring - Like the plain old telephone network (POTS), ISDN is a switched technology in which the network is only connected when calls are in progress. This means that an ISDN problem, such as a down ISDN line, will not be apparent until a call is attempted - at which point the likelihood is that users will be impacted. Even commercially available video network management systems are not able to detect ISDN issues unless a call is connected.
Network Efficiency and Scalability - The typical ISDN environment requires that each endpoint have its own dedicated bandwidth, which means that even though the ISDN lines connected to a specific system may only be in use for a few hours each month, that system's ISDN bandwidth cannot be shared with other endpoints. Deploying additional endpoints will require additional ISDN lines.
Usage Costs – In most ISDN environments, every single video call – whether across town, across the world, or simply between two rooms in the same building – will involve per-minute ISDN transport and usage fees. Depending upon the frequency of usage, these

fees can be quite high on a monthly basis and can negatively impact the adoption of videoconferencing within the enterprise.
Global Reach - In order to communicate with IP-only endpoints, such as those deployed at the partner location shown above, either an ISDN-to-IP gateway device or an external gateway service must be used.
Lack of Redundancy - In the event that one or more of an endpoint's ISDN lines experiences problems, the endpoint's ability to communicate will be either blocked or impacted. There is no alternate network to host the video traffic.
Limited TELCO Support - The decreased demand for ISDN lines for videoconferencing has prompted telephone companies to reduce their ISDN support staff, a phenomenon that can significantly impact ISDN troubleshooting and problem resolution efforts.

Converged IP Environments
In a converged IP environment, videoconferencing traffic rides over the organization's primary IP data network, as shown in the diagram below. Note that unless an ISDN gateway (or gateway service) is used, this enterprise may not be able to connect to ISDN-only endpoints (labeled Partner Location below).

Figure 3: Converged IP Videoconferencing Environment

Converged IP Advantages
Ability to Leverage Infrastructure - Since the endpoints are connected to the corporate IP network, the enterprise can leverage its existing network lines, support staff, and monitoring/management systems.
Improved Reliability - IP endpoints and networks can be monitored continuously, meaning that should a problem arise, the support team will be proactively notified, unlike in an ISDN environment, where problems are only discovered once a call is attempted. In addition, ISDN video calls use multiple lines bonded together to form a single data pipe, a process that often causes problems during ISDN video calls.
Enhanced Manageability - IP-capable video systems can be remotely managed either individually or using a centralized management system like TMS, a software solution available from TANDBERG, the sponsor of this white paper. Management features include remote call launching and termination, endpoint configuration, software upgrades, and more. Note that some legacy ISDN-only video systems include IP connections for remote

management, but the management capabilities do not include monitoring of the ISDN network lines.
Installation Simplicity - By using IP instead of ISDN, organizations can avoid the headaches often associated with the deployment of ISDN lines, including the assignment of SPIDs and the activation of long distance service.
Expanded Scalability - In an IP environment, the deployment of an additional video system does not require the activation of dedicated lines. Instead, the enterprise simply needs to connect the video system to the enterprise network. This is especially important for organizations planning to make desktop videoconferencing available to their user base, as these deployments typically involve thousands of endpoints.
Decreased Cost of Ownership - IP-only endpoints are less expensive to purchase (ISDN is now an optional add-on for most endpoints), cheaper to keep under a service plan (fees are based on purchase price), and do not require dedicated ISDN lines, resulting in a lower total cost of ownership.
Predictable Usage Fees - While ISDN is a "metered" service with transport fees charged on a per-minute basis, IP networks typically include unlimited usage for a fixed monthly fee. This allows enterprise organizations to predict and budget for the monthly costs associated with videoconferencing.
Call Speed Flexibility - In ISDN environments, the maximum possible connection speed stems from the number of installed ISDN lines (e.g. 3 ISDN lines permit a single call up to 384 kbps). In an IP environment, endpoints are usually connected to high-bandwidth connections on either the LAN or WAN, and therefore higher-bandwidth calls are often possible. This is especially important for multisite meetings, during which the host endpoint may require additional bandwidth to host the meeting.
Tighter Security – Although most IP video endpoints include support for AES data encryption, including secure password authentication, most legacy ISDN systems do not support encryption. Because securing ISDN calls on legacy endpoints requires the use of expensive and complex external encryption systems, these are used primarily in military and government environments. Converged IP Disadvantages Network Capability - Many enterprise networks are not equipped to host video traffic, and cannot be cost-effectively upgraded to do so in some locations. For example, in one organization the connections to the Los Angeles and London offices may be “video-ready,” but those to the Milan and Singapore offices are not up to the task. In an IP-only environment, the Milan and Singapore offices would be unreachable from the enterprise’s IP video systems (unless an ISDN gateway product / service or an IP-overlay solution was used). Endpoint Capability – Many legacy video systems are not IP-capable and would need to be replaced or upgraded to function in an IP-only environment. Global Reach – In order to communicate with ISDN-only endpoints, such as those deployed at the client location shown above, either an IP to ISDN gateway device or an external gateway service must be used. In addition, corporate security systems, including the enterprise firewalls and NAT systems, often block IP traffic between enterprises, making it impossible to host IP video calls between organizations. Lack of Redundancy – In the event that the enterprise LAN or WAN experiences problems, one or more endpoints may be unable to place or receive video calls. Once again, there is no alternate network to host the video traffic. Potential Impact on Network – If not properly planned and managed, it is possible that the videoconferencing traffic could negatively impact the other traffic on the data network. This risk, however, is easily avoided through the use of a videoconferencing gatekeeper.

IP Overlay Environments Many organizations are unable to host videoconferencing traffic on all or specific segments of their primary data network due to limited bandwidth or lack of QoS (quality of service). To bypass these issues, some organizations choose to replace their ISDN network with a totally separate IP network dedicated to hosting IP video traffic. The graphic below highlights an IP overlay environment. Note the use of the IP overlay network provider’s ISDN and Internet gateways to allow the host organization to connect to external ISDN and IP endpoints.

Figure 4: Pure IP-Overlay Videoconferencing Environment IP Overlay Advantages IP video overlay solutions share many of the advantages of the converged IP solution, plus several key advantages: Network Isolation - the IP overlay architecture allows organizations to enjoy the benefits of IP videoconferencing without impacting the existing data network. Upgrade Avoidance – the IP overlay method allows an organization to avoid the need for network capacity and/or performance upgrades in some or all locations. IP-Overlay Disadvantages IP video overlay solution disadvantages include the need to purchase additional network services dedicated to hosting IP video traffic, and the fact that gateways (which never improve but often detract from the user experience) must be used to conduct calls with any locations not on the IP overlay network. Hybrid Video Environments The fourth videoconferencing architecture involves a combination of two or more of the ISDN, converged IP, and IP overlay methodologies as shown below.

Figure 5: Hybrid IP / ISDN Videoconferencing Environment As shown above, in a well-designed hybrid environment, the majority of the enterprise endpoints have access to both IP and ISDN connections, either directly or using the enterprise gateway. In addition, the use of a session border controller (labeled SBC in the diagram above) allows internal IP endpoints to connect to external IP (Internet) endpoints without compromising enterprise network security. Hybrid Environment Advantages This architecture affords many advantages from the three prior methods, plus additional benefits: Endpoint Flexibility –The enterprise can utilize a mixture of new (and relatively inexpensive) IP-capable video endpoints and legacy, ISDN-only endpoints. Network Redundancy – Since most endpoints have access to IP and ISDN connections, video connections can be made even if one of the networks (IP or ISDN) is experiencing problems. Global Reach – The support for IP and ISDN video traffic throughout the enterprise makes it easier to host video calls between different organizations. Hybrid Environment Disadvantages The most significant disadvantage of this method is the frequent use of gateways (products or services) to connect to internal and external video endpoints.

Frequently Asked Questions

1. What are the benefits of IP vs. ISDN for business-quality videoconferencing? The business case for IP vs. ISDN-based videoconferencing spans quality, cost, management, efficiency, reliability, and scalability.
a) ISDN is usually inexpensive to own, but it is expensive to use. Besides an initial capital outlay to provision select conference rooms with ISDN connectivity, there are few additional costs required to begin videoconferencing using ISDN. A standard ISDN business-level videoconferencing call at 384 kbps requires the bonding together of 6 ISDN channels; higher call speeds require the bonding of additional channels. Enterprises pay for ISDN on a per-minute-per-B-channel basis (often based on distance as well), making the use of the equipment costly. TV-quality video at 768 kbps on an ISDN system quickly becomes prohibitive in cost. These expensive ISDN usage fees often prohibit deep adoption of ISDN videoconferencing within an organization or enterprise.
b) The availability of flat-rate pricing for IP videoconferencing, on the other hand, allows calls at bandwidths too expensive for ISDN, including some IP calls up to 2 Mbps and beyond. These high-bit-rate calls enable higher-quality audio and video communications. Because IP is so affordable, and due to the pervasiveness of IP network connections, IP videoconferencing endpoints can be deployed across the enterprise economically.
c) Furthermore, since IP systems do not bond channels together like ISDN systems do, very high call reliability rivaling the POTS network can be achieved. ISDN has proven itself unreliable over the years due to the channel-bonding problem: if one of the bonded channels is dropped during a call, often the entire call goes down. Most companies using ISDN are delighted to achieve a 92-94% call success rate, while those using IP videoconferencing often achieve greater than 99% reliability.
d) IP videoconferencing also permits significant management benefits.
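The channel-bonding reliability argument in (c) can be put in numbers: a bonded call survives only if every B channel stays up, so per-channel reliability compounds with the channel count. The 99% per-channel figure below is an assumption for illustration, not from the source.

```python
# Sketch: probability a bonded ISDN call succeeds, assuming independent
# channels with the same per-channel success probability (an assumption).

def bonded_call_success(per_channel_success, channels):
    """All bonded channels must stay up for the call to survive."""
    return per_channel_success ** channels

# A 384 kbps call bonds 6 B channels (6 x 64 kbps):
print(round(bonded_call_success(0.99, 6), 3))  # → 0.941
```

With 99% reliability per channel, six bonded channels land squarely in the 92-94% success range the text cites, while a single IP session has no bonding to fail.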
IP-based video systems are always connected to the packet-switched network. This constant connectivity allows these systems to be remotely controlled and managed from a central, remote location. Large-scale conferencing environments often use an IP-based software product, called a gatekeeper, to control and track the usage of their videoconferencing systems, enabling improved measurement of ROI and convenient billing mechanisms.
e) One of the primary advantages of deploying IP-based videoconferencing is the ability to use an organization's existing data network as the means of transport. This is called "converged networking". Converged networking can result in both cost savings and efficiency enhancements, because only one network is deployed, maintained, and managed.
f) Furthermore, since IP connections are already nearly everywhere - to every enterprise conference room and to every enterprise desktop - scaling voice and video over IP applications is easy because the network is already deployed, debugged, up and running. ISDN requires a separate network infrastructure and a separate management team, and will usually be limited to niche deployments within the enterprise.

2. What network protocols are used for IP videoconferencing services? The two most important protocols are H.323 and the Session Initiation Protocol (SIP).

H.323. H.323 is an ITU umbrella standard describing a family of protocols used to perform call control for multimedia communication on packet networks. The most important protocols used to set up, manage, and tear down calls are H.225 and H.245: H.225 is used to perform call control, and H.245 is used to perform call management. In the most basic use of H.323v1 to set up a call, an endpoint initiates an H.225 exchange on a TCP well-known port with another endpoint. This exchange uses the Q.931 signaling protocol. Once a call has been established using Q.931 procedures, the H.245 call management phase of the call begins. H.245 negotiations take place on a separate channel from the one used for H.225 call setup (although with the use of H.245 tunneling, H.245 messages can be encapsulated in Q.931 messages on existing H.225 channels), and the H.245 channel is dynamically allocated during the H.225 phase. The port number to be used for H.245 negotiation is not known in advance. The media channels (those used to transport voice and video) are similarly dynamically allocated, this time using the H.245 Open Logical Channel procedure. Note that H.245 channels are unidirectional. In a minimal situation with direct call signaling between endpoints and the use of one bi-directional voice channel, there will be a minimum of five channels for each call (one H.225 channel, one H.245 channel, and one shared voice channel). Three of these will be on dynamically allocated ports. Business-quality IP video communication between two H.323 endpoints typically requires in excess of 380 kbps for each unidirectional media channel, or aggregate data rates of over 750 kbps.

SIP.
Session Initiation Protocol (SIP) is an Internet multimedia architecture standard established by the Internet Engineering Task Force (IETF). SIP may be used for Voice over IP (VoIP), video conferencing, instant messaging, and new converged data and voice applications, and is being planned for use in 3G wireless applications. It is an application-layer signaling protocol used to establish, modify, and terminate multimedia sessions. SIP applications include voice, video, gaming, instant messaging, presence, call control, etc. In the spirit of other Internet-based applications, SIP relies on a number of other computer communications standards, including the Session Description Protocol (SDP), the Real-Time Protocol (RTP), TCP, UDP, and so on. SIP messages are based on the HTTP protocol and have a similar text-based structure. SIP uses Uniform Resource Indicators (URIs), which are a more general form of the world-wide web's Uniform Resource Locators (URLs). There are a number of URI forms, including user@domain, domain, user@ipaddr, and telephone-number@domain. SIP messages can also use other URIs, such as the telephone URL (as defined in IETF RFC 2806). Generally, the SIP components are defined as user agents, proxies, redirect servers, and registrars; user agents are much like an endpoint in H.323 and may be telephones, video units, PDAs, etc. SIP communicates between these components using a request-response data model. Messages between components are initiated when one component sends a request message (called a method) to a second component. Responses consist of a numerical code and a textual "reason". To initiate a session, one SIP device sends an "invite" message to another SIP device. SDP is carried in the SIP message to describe the media streams, and RTP is used to exchange real-time media streams.
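SIP's HTTP-like, text-based structure described above can be seen by building a request by hand. This sketch assembles a minimal, incomplete INVITE as plain text; the URIs and header values are illustrative, and a real message would also carry Via, Contact, and Content-Length headers plus an SDP body.

```python
# Sketch of SIP's text-based request format: a minimal (incomplete) INVITE,
# with CRLF line endings as in HTTP. Values are illustrative only.

def make_invite(from_uri, to_uri, call_id):
    lines = [
        f"INVITE {to_uri} SIP/2.0",        # request line: method, URI, version
        f"From: <{from_uri}>",
        f"To: <{to_uri}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Type: application/sdp",   # an SDP body would describe the media
        "",                                # blank line ends the headers
    ]
    return "\r\n".join(lines)

msg = make_invite("sip:alice@example.com", "sip:bob@example.com", "abc123")
print(msg.splitlines()[0])  # → INVITE sip:bob@example.com SIP/2.0
```

The callee answers with a numeric response ("180 Ringing", then "200 OK"), and the caller completes the three-way handshake with an ACK, after which RTP media flows directly between the parties.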

3. What are the main IP videoconferencing deployment issues?

The main technical issues surrounding IP videoconferencing deployment include the following:
• Quality of Service. Quality of service (QoS) is a network term that specifies a guaranteed throughput level. In layman's terms, it means that the network must be designed so that voice and video data are transmitted through the network with a minimum of delay and loss. The network must be carefully evaluated to ensure that it will be able to transmit voice and video data properly. Often components in the network must be upgraded, or additional routers, switches, or "packet shaping" devices may be required.
• Overlay Network vs. Converged Network Architecture. Enterprises may not want to put voice and video data in competition with mission-critical data applications, such as market or manufacturing data, running across the same network. Consequently, a separate QoS-enabled "overlay" network may be deployed for voice and video applications.
• Security. Most enterprise networks employ firewalls and network address translation (NAT) to prevent hackers or unauthorized persons from gaining access to data on the network. Voice and video over IP are not NAT- and firewall-friendly, so organizations will need to consider how video traffic will securely traverse the corporate firewall: must the firewall or NAT system be modified, re-configured, or upgraded to allow IP-based videoconferencing traffic?
• Bandwidth Over the WAN. IP data connections must be available at the locations where the enterprise needs to use video. These will typically be available from service providers; organizations should consider what alternatives are available and what bandwidth will be required. In general, satellite communications will not offer sufficient quality of service for IP videoconferencing due to excessive latency.
• Multipoint Bridging Capability. Organizations will need to consider whether more than two parties will need to participate in a video call.
If so, some type of video multipoint bridging capability will be necessary. The MCU may be purchased and managed internally, or all bridging functions may be outsourced to a service provider. If an internal MCU is to be purchased, the magnitude of the initial investment increases and internal staff will need to be allocated to manage the video bridge.
• IP-ISDN Gateway Needs. As the transition to IP will not be complete for some years to come, organizations using IP video systems will likely need to communicate with others using ISDN. Organizations will need to consider how many gateways are needed and how many ISDN lines should be provisioned for each. Rather than owning the gateway, with its associated capital outlay and management costs, an enterprise may utilize a gateway owned and managed by a service provider.
• Additional IT Resource Requirements. Network maintenance and support resources must be willing, capable, and available to support a converged network carrying data, video, and voice traffic. Organizations should consider whether additional resources will be required to manage new video-centric devices on the network.

4. What are quality-of-service (QoS) requirements for IP-based voice and video?

Real-time IP applications, such as videoconferencing and voice over IP, are much more sensitive to network quality of service than store-and-forward data applications, such as e-mail and file transfer. Quality of Service (QoS) refers to intelligence in the network that grants each application the network performance it requires. For multimedia over IP networks, the goal is both to preserve mission-critical data in the presence of multimedia voice and video, and to preserve voice and video quality in the presence of bursty data traffic. Four parameters are generally used to describe quality of service: latency or delay, the amount of time it takes a packet to traverse the network; jitter, the variation in delay from packet to packet; bandwidth, the data rate that can be supported on the network; and packet loss, the percentage of packets that do not reach their destination for various reasons.
• End-to-end latency. End-to-end latency refers to the total transit time for packets in a data stream to arrive at the remote endpoint. The upper bound on latency for H.323 voice and video packets should be no more than 125-150 milliseconds. The average size of video packets is usually large (800-1500 bytes) while audio packets are generally small (480 bytes or less). This means that the average latency for an audio packet may be less than that for a video packet, as intervening routers and switches typically prioritize smaller packets over larger ones when encountering network congestion. In addition, an H.323 video call actually comprises four streams: each station sends and receives both audio and video. Differences in latency between the streams will manifest themselves as additional delay (both H.323 and SIP convey sufficient information to lip-synch the various streams).
• Jitter, or variability of delay. This refers to the variability of latencies for packets within a given data stream and should not exceed 20-50 milliseconds. Consider, for example, a data stream in a 30 fps H.323 session with an average transit time of 115 milliseconds. If a single packet encountered jitter of 145 milliseconds or more (relative to a prior packet), an underrun condition may occur at the receiving endpoint, potentially causing blocky or jerky video or broken audio. Too much jitter can also cause the inter-stream latency problems discussed next.
• Inter-stream latency. This refers to the relative latencies encountered between the audio and video data streams, and depends on how the average transit times of the streams vary from each other at any given point.
The tolerances here are not symmetrical, because the human brain already compensates for audio latency relative to video. An audio stream that starts arriving at an endpoint 30 milliseconds ahead of its video counterpart will produce detectable lip-synchronization problems for most participants, whereas an audio stream that arrives later than its associated video has a slightly higher tolerance of 40 milliseconds before the loss of audio and video synchronization becomes generally detectable.
• Packet loss. This term refers to the loss or de-sequencing of data packets in a real-time audio/video stream. A packet loss rate of 1% produces roughly one lost fast video update per second, yielding jerky video; lost audio packets produce choppy, broken audio. Since audio operates with smaller packets at a lower bandwidth, it is generally less likely to suffer packet loss, but an audio stream is not immune to its effects. A 2% packet loss rate starts to render the video stream generally unusable, though audio may remain minimally acceptable. Consistent packet loss above 2% is definitely unacceptable for H.323 videoconferencing unless some type of packet loss correction algorithm is used between the endpooints. Packet loss in the 1-2% range should still be considered a poor network environment, and the cause of this type of consistent, significant packet loss should be identified and resolved.
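The four parameters above can be checked programmatically. The sketch below computes average latency, packet-to-packet jitter, and packet loss for a hypothetical trace of one-way transit times and compares them against the thresholds quoted in the text (150 ms latency, 50 ms jitter, 2% loss); the sample data is invented for illustration.

```python
# Illustrative QoS check against the thresholds described above.
# The transit-time trace and packet counts are hypothetical sample data.

def qos_report(transit_ms, sent, received):
    """Summarize latency, jitter, and loss for a packet trace (times in ms)."""
    avg_latency = sum(transit_ms) / len(transit_ms)
    # jitter measured here as the largest packet-to-packet variation in delay
    jitter = max(abs(a - b) for a, b in zip(transit_ms, transit_ms[1:]))
    loss_pct = 100.0 * (sent - received) / sent
    return {
        "avg_latency_ms": avg_latency,
        "jitter_ms": jitter,
        "loss_pct": loss_pct,
        # thresholds from the text: <=150 ms latency, <=50 ms jitter, <=2% loss
        "acceptable": avg_latency <= 150 and jitter <= 50 and loss_pct <= 2.0,
    }

# Five sample one-way transit times around the 115 ms example in the text,
# with 990 of 1000 packets delivered (1% loss).
report = qos_report([110, 115, 120, 112, 118], sent=1000, received=990)
```

At 1% loss this trace passes; per the text, it would still show roughly one lost fast video update per second, so a real monitor would flag it for investigation.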

Three tools for network quality of service

Three types of tools or solutions are available to the network engineer to build quality of service into the network:
1) Provisioning means providing adequate bandwidth for all voice, video, and data applications that traverse a common network. By using a 100 Mbps Ethernet network instead of a 10 Mbps network, for example, the network is more likely to support multimedia traffic together with data. Note that IP networks typically carry significant packet overhead: a 384 kbps video call actually requires about 10% additional bandwidth for IP overhead, and when going from IP to ATM or frame relay, an additional 10% of the call bandwidth should be allocated for encapsulation. Hence, a 384 kbps IP call traversing an ATM backbone may require as much as 460 kbps of bandwidth.
2) Classifying means giving packets a classification based on their priority. Voice packets would be given the highest priority, since they are very delay- and jitter-sensitive even though they are not particularly bandwidth-hungry; video packets might be given a slightly lower priority; and e-mail packets, for example, the lowest. Many different classification schemes are possible, including some that are in the process of being standardized. One common scheme is to give VoIP packets an IP precedence of 5 and videoconferencing applications an IP precedence of 4.
3) Queuing refers to a process that takes place in the routers and switches whereby different queues or buffers are established for the different packet classifications. One of the buffers, for example, might be a delay- and drop-sensitive buffer designed to handle voice and/or video packets. Many queuing schemes are available for implementation.
Solving QoS over IP networks for multimedia conferencing is a two-phase problem:
1. Guarantee QoS within a specific, controlled enterprise intranet or service provider network.
2.
Guarantee QoS across the hand-off (peering) points between networks; the public Internet presents this second challenge in the extreme. Four major QoS initiatives are RSVP (Resource ReSerVation Protocol), IP Precedence, and Differentiated Services (DiffServ) from the IETF, and 802.1p from the IEEE. Improved quality of service through standard mechanisms, such as DiffServ and MPLS, is the key factor behind the promise of broad-based use of interactive, business-quality IP video. The underlying requirement is for the IP video infrastructure to enable end-to-end prioritized processing and delivery of video traffic between subscriber networks and carrier core networks. This requires prioritized treatment of video traffic over the "last mile" access network, through the metro network, and through the carrier networks. While DiffServ is gaining broad support to
enable "soft" QoS through prioritized processing of traffic by service provider routers, MPLS is being deployed in service provider networks as an adjunct to enable fine-grained delivery of a number of value-added services.

5. What are the enterprise network policy-related challenges for H.323- and SIP-based video usage?

Use of high-data-rate applications, such as H.323/SIP-based business-quality video, has the potential to significantly impact available LAN capacity for data traffic. Even with the on-going migration of corporate LANs to gigabit backbones and 100BT switched subnets, uncontrolled usage of interactive video services can severely reduce response times for business applications. Thus, the H.323 video delivery infrastructure must permit corporations to implement fine-grained controls over who can use interactive video and under what conditions. Specifically, corporations require the ability to control:
• Who (by user and IP address) can use IP videoconferencing services.
• Whether specific users and endpoints can only receive and/or only initiate calls.
• What types of codecs specific endpoints/users can use for calls.
• The maximum aggregate video traffic throughput entering or exiting the enterprise network.
• All of the above by time of day.

6. What are the formats of videoconferencing?

There are mainly two formats for videoconferencing:
1. Point-to-Point Videoconferencing: This is conferencing with video and audio on the network, much like a video telephone. It is a conference between two sites, where each site can have capabilities such as document sharing, chatting, etc.
2. Multipoint Videoconferencing: Multipoint videoconferencing allows three or more participants to sit in a virtual conference room and communicate as if they were sitting right next to each other.
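A rough way to see why a bridging capability (MCU) matters as conferences grow is to count the unidirectional video streams required. The sketch below compares a hypothetical full mesh, where every site sends directly to every other site, with MCU bridging, where each site exchanges a single send/receive pair with the bridge; this arithmetic is an illustration, not a figure from the text.

```python
# Illustrative stream-count comparison for an n-site conference (hypothetical
# model): full mesh vs. bridging through an MCU.

def mesh_streams(n):
    """Full mesh: each of the n sites sends its stream to the other n - 1."""
    return n * (n - 1)

def mcu_streams(n):
    """MCU bridging: each site sends one stream to the MCU and receives one."""
    return 2 * n

# A point-to-point call (n = 2) is cheap either way, but beyond three sites
# the mesh grows quadratically while the MCU load on each site stays constant.
for sites in (2, 3, 6):
    print(sites, mesh_streams(sites), mcu_streams(sites))
```

At six sites the mesh needs 30 unidirectional streams across the network while MCU bridging needs only 12, which is why multipoint deployments concentrate the cost in the bridge rather than in each endpoint.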
Related to multipoint videoconferencing is bridging, where sites connect through meeting-point software that supports capabilities such as document sharing, chatting, etc. See also: Desktop Videoconferencing and Room-Based Videoconferencing.

7. What are the most used protocols in video conferencing?

H.323: An Internet-based connection and generally the cheapest way to go; the Internet is used as the medium to transmit audio and video.
H.320: Also known as ISDN videoconferencing, transmitted through digital telephone lines; there is a cost associated with the use of this protocol.

Video Essential: