High-speed Distributed Video Transcoding for Multiple Rates and Formats

IEICE TRANS. INF. & SYST., VOL.Exx–??, NO.xx XXXX 200x
PAPER
Yasuo SAMBE†a), Member, Shintaro WATANABE†, Dong YU†, Nonmembers, Taichi NAKAMURA††, and Naoki WAKAMIYA†††, Members
SUMMARY This paper describes a distributed video transcoding system that can simultaneously transcode an MPEG-2 video file into various video coding formats with different rates. The transcoder divides the MPEG-2 file into small segments along the time axis and transcodes them in parallel. Efficient video segment handling methods are proposed that minimize the inter-processor communication overhead and eliminate temporal discontinuities from the re-encoded video. We investigate how segment transcoding should be distributed to obtain the shortest total transcoding time. Experimental results show that implementing distributed transcoding on 10 PCs can decrease the total transcoding time by a factor of about 7 for single transcoding and by a factor of about 9.5 for simultaneous transcoding at three rates.
key words: MPEG, video-transcoding, distributed computing

1. Introduction

With the launch of digital broadcasting in many countries and the proliferation of digital video disks (DVD), both of which use the MPEG-2 video coding standard [1], it is expected that MPEG-2 will become the de facto video compression format in video archives. On the other hand, as the number of different video compression algorithms used in networked video applications over the Internet increases, there is a growing demand to convert pre-encoded MPEG-2 digital video in archives to other compressed formats such as MPEG-1, MPEG-4, and H.263.

Besides converting formats, video content will be altered in terms of bit-rate and resolution to meet the network bandwidth and terminal capability. For instance, the bandwidth of end user access networks can vary from several tens of kilobits per second to 20 to 30 megabits per second. Moreover, modern terminals use displays with various sizes and resolutions. Therefore, it is often necessary for service providers delivering video over the Internet to transcode the same content to yield different video formats, spatial resolutions, and bit-rates simultaneously.

Several video transcoding techniques have been proposed [2]–[7], and most attempt to decrease the computational complexity by using information such as DCT coefficients and motion vectors extracted from the original coded data. None of them, however, was designed to produce multiple formats and rates. The aim of our work is to provide a video transcoding system that can convert MPEG-2 video files into other formats and bit-rates at high speed. To realize multiple transcoding and speed up the transcoding process, we integrate multiple processors to fully decode and re-encode incoming video.

Our transcoding system divides the MPEG-2 file into small segments along the time axis and transcodes them in parallel [8]. Parallel transcoding along the time axis usually suffers from quality discontinuity and degradation around the segment cut points in the re-encoded video, because of a lack of information such as the coding complexity of the previous video segment. Obtaining the information of the previous segment from another processor requires inter-processor communication, which leads to additional overhead and a significant performance degradation.

To achieve high performance without significant quality degradation due to parallel transcoding, we propose a segment handling method that divides incoming MPEG-2 data with minimum duplication and uses the data to determine re-encoding parameters. We also investigate scheduling algorithms and the segment length of the distributed transcoder that minimizes overall transcoding time.

The organization of this paper is as follows. Section 2 introduces the distributed video transcoder architecture. Section 3 proposes video segment handling techniques for distributed transcoding along with experimental results. In Section 4, we investigate the performance model of the transcoder and the optimum segment length. Section 5 uses experimental results to assess the performance model. Our conclusion is presented in Section 6.
† The authors are with the R&D Headquarters, NTT DATA Corporation, Tokyo, Japan.
†† The author is with the School of Computer Science, Tokyo University of Technology, Tokyo, Japan.
††† The author is with the Department of Information Networking, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan.
a) E-mail: sanbey@nttdata.co.jp

2. System Overview

Our transcoding system consists of a source PC, several transcoding PCs, and a merging PC. These PCs are connected by a Gigabit Ethernet LAN, as shown in Fig. 1.
Fig. 1 Distributed video transcoder (the source PC sends video segments of the MPEG-2 source video over the LAN to the transcoding PCs; the merging PC collects the transcoded video segments into the transcoded video)

Fig. 2 Block diagram of a transcoding PC (source segments, MPEG-2 decoder, filter and resize, then MPEG-1, MPEG-2, and MPEG-4 encoders producing transcoded segments)

The source PC has a source MPEG-2 Program Stream (PS) input file in which audio and video data are multiplexed. Upon user or operator request, the source PC demultiplexes the audio and video data, divides the MPEG-2 video file into video segments of appropriate length, and transmits these segments to the transcoding PCs. Segment length is determined so that the total transcoding time will be minimized with additional overlapped data, as explained in Sections 3 and 4. When the source PC transmits segments to a transcoding PC, it also sends the transcoding parameters that specify the operation of the filter and the encoder of the transcoding PC. These parameters include filter function specifications, spatial resolution, temporal resolution, and the desired re-encoding formats. The demultiplexing, dividing, and transmitting processes are implemented by multi-threaded programming so that they can be performed in parallel. A video segment consists of one or more consecutive Groups of Pictures (GOP). Audio transcoding is done on a single transcoding PC, because it is not computationally expensive to transcode audio. Audio transcoding is ignored hereafter.

Each transcoding PC decodes and re-encodes the video segments into the different video formats specified. Decoded frames are filtered, resized, and passed to the encoder frame by frame. Figure 2 shows a block diagram of the transcoding PC. In the segment transcoding process, a frame is decoded only once and the encoding modules specified by the transcoding parameters re-encode the frame. This frame-by-frame transcoding architecture gives greater flexibility in transcoding. For example, new encoding modules or new filter operations such as digital watermarking can be easily added. All of the modules shown in the figure, including transmission modules, are implemented by multi-threaded programming. Therefore, the transmitting and transcoding processes can also run in parallel.

Transcoded segments are sent to the merging PC and concatenated there to form files in the desired video formats. After the second segment for each desired file is received, the concatenation process begins and runs in parallel with the transcoding of the following segments. This concatenation process includes modification of the time-code. In this paper, video buffer verifier (VBV) requirements are not taken into account, because the probability that a bitstream concatenated from segments does not meet the VBV requirements can be decreased by overlapped segment transcoding. In order to strictly guarantee the requirements, efficient methods for compressed domain video editing as proposed in [12] should be applied.

3. Video Segment Handling

If the video segments are to be transcoded in parallel with minimal communication overhead, we need efficient video segment handling techniques with regard to Open-GOP as well as rate control for re-encoding the segments.

3.1 Segmentation at Open-GOP

There are two kinds of GOP in MPEG-2: Closed-GOP and Open-GOP. In the case of Closed-GOP, all frames of the GOP can be independently decoded and no problem occurs in the decoding process. In contrast, if the first GOP of the segment is an Open-GOP, the last reference frame of the previous segment is needed to decode the first bidirectionally coded frames. To better understand this, Fig. 3 shows a typical GOP structure. In this figure, GOP(k−1) is a Closed-GOP and GOP(k) is an Open-GOP. Decoding bidirectionally coded frames B14 and B15 requires both P13 of GOP(k−1) and I16. If GOP(k−1) is located on another transcoding PC, the transcoding PC processing GOP(k) should get the decoded P13 frame via inter-processor communication. Generally, Open-GOP is more efficient in coding than Closed-GOP, because the former reduces the temporal redundancy between consecutive frames around GOP boundaries. Therefore most GOPs in MPEG-2 encoded files are Open-GOPs and most segments would
begin with Open-GOP.

Fig. 3 Segmentation at Open-GOP
Display order: I1 B2 B3 P4 B5 B6 P7 B8 B9 P10 B11 B12 P13 B14 B15 I16 B17 B18 P19
Coded order, GOP(k−1): I1 P4 B2 B3 P7 B5 B6 P10 B8 B9 P13 B11 B12
Coded order, GOP(k): I16 B14 B15 P19 B17 B18

This leads to the following approaches: (i) transmitting decoded reference frames among transcoding PCs, or (ii) duplicating the coded frame data at segmentation so that each segment can be decoded independently. The second approach is more suitable, because coded data are much smaller than decoded data and take less time to transmit. In this distributed transcoding system, the source PC builds each video segment by duplicating one GOP after the segmentation point and transmits it to a transcoding PC. The first bidirectionally coded frames of the duplicated GOP are transcoded by that PC. The next segment, except for these first bidirectionally coded frames, is transcoded by another PC. By this method, all frames of the Open-GOP can be transcoded and the transcoded frames have the same structure as the source coded video.

3.2 Re-Encoding Rate Control around Cut Points

If each video segment is encoded independently, segment encoding quality may differ, which leads to discontinuity around the segmentation points and irregular video quality. This is because the re-encoding parameters of each segment are determined without regard to the coding complexity of the previous segment. The more complex a video frame is, the more bits must be allocated to keep coded video quality constant.

In the widely used MPEG-2 Test Model 5 (TM5) rate control [10], the target size of the frame is made proportional to complexity. In TM5, the frame complexity $X_n$ is defined as the product of the coded frame size $S_n$ (in bits) and the average quantization scale $Q_n$ of the frame, where $n$ denotes the coding picture type (I, P, B). The complexity of the frame to be coded is estimated to be the same as that of the previous coded frame of the same type. For example, in Fig. 3, when encoding B14, the complexity of the frame is estimated to be the same as that of B12. By doing this, TM5 keeps the video quality consistent. TM5 calculates the target frame size using the complexity. After deciding the target frame size, the quantization scale $Q$ of each macro-block is determined so that the actual coded size will be equal to the target size, using the virtual buffer memory $d_n$. The virtual buffer $d_n$ is the accumulated difference between the actual size and the target size of coded frames. The quantization scale $Q$ is calculated as $Q = 31 d_n / r$, where $r$ is the reaction parameter, defined as $r = 2 \cdot \mathit{bitrate} / \mathit{framerate}$.

Therefore, if the $X_n$ and $d_n$ of the first frames in a segment can be estimated properly, the re-encoder can calculate the target bit budget and the initial quantization scale so that re-encoded video quality is kept constant around the segment cut point. In our system, each transcoding PC determines the complexities of the first frames of the video segment from the source MPEG-2 coded data. In fact, the complexities of the transcoded frames differ from those of the input video because of the change in frame resolution and bit-rate. However, since it has been shown that there are strong correlations between the input and output video [6], we employ the complexities of the input coded data, multiplying each of them by the ratio of output resolution to input resolution. The virtual buffer memory $d_n$ cannot be estimated without the actual frame sizes of all re-encoded frames, including those of frames on other transcoding PCs. However, $d_n$ is expected to stabilize within the transcoding of a few GOPs. To shorten the stabilization period, $d_n$ of the first frames is calculated from the source MPEG-2 data as $d_n = r Q_n / 31$. The re-encoder of the transcoding PC begins to transcode one GOP before the segmentation point with the initial complexities and virtual buffer memory.

Table 1 Experimental transcoding conditions
Source Video Format    MPEG-2 MP@ML
Source Video           football, flower garden, mobile & calendar, sailboat
Source Video Rate      8 Mbit/s
Source Frame Size      Horizontal 720 pixels, Vertical 480 lines
Output Video Coding    MPEG-2, MPEG-4 ASP
Output Video Rate      MPEG-2: 2 Mbit/s, 1 Mbit/s; MPEG-4: 1 Mbit/s, 750 kbit/s
Output Frame Size      Horizontal 360 pixels, Vertical 240 lines
GOP/GOV Structure      M=3, N=15

To verify the above proposed segment handling, we conducted segment transcoding experiments as follows:
(a) simply segmented and transcoded independently, (b) 1GOP overlapped transcoding without initial parameter adjustment, and (c) 1GOP overlapped transcoding with initial parameters, which is the method proposed above. These methods are illustrated in Fig. 4 and the transcoding parameters are listed in Table 1. The first segment consists of the first 60 frames and the second segment of the next 60 frames. Figures 5 and 6 show the complexity $X_n$ and virtual buffer memory $d_n$ using the standard video mobile & calendar for MPEG-2 transcoded at 2 Mbit/s, and Figs. 7 and 8 show them using sailboat for MPEG-4 transcoded at 750 kbit/s. We can see that the $X_n$ and $d_n$ of the proposed method (c) in the second segment are the closest to those of transcoding without segmentation. The proposed initial adjustment shortens the period in which $X_n$ and $d_n$ approach those of non-segmented transcoding, compared with the 1GOP overlapped transcoding without initial adjustment. Figure 9 shows the Peak Signal-to-Noise Ratio (PSNR) of these methods using the standard video mobile & calendar for MPEG-2 transcoded at 2 Mbit/s, and Fig. 10 shows it using sailboat for MPEG-4 transcoded at 750 kbit/s. These results show that the proposed method (c) achieves the same quality and continuity as the non-segmented video sequence. Example transcoded pictures are shown in Fig. 11 and Fig. 12. These pictures are frame 60 of the simply segmented method and of the proposed method in Fig. 10, respectively. The proposed method is especially effective for video having little movement, like this sailboat video, because even a short-time quality degradation such as that in Fig. 11 is very noticeable for this kind of video.

Fig. 4 Video segment handling methods: (a) simply segmented, (b) 1GOP overlapped, (c) 1GOP overlapped with initial adjustment (initial re-encoding rate-control parameters estimated with the source MPEG-2 PS file)
Fig. 5 Complexity around cut-point: MPEG-2 of 2 Mbit/s, mobile & calendar (complexity, ×10^5, versus frame number)
Fig. 6 Virtual buffer memory around cut-point: MPEG-2 of 2 Mbit/s, mobile & calendar (virtual buffer memory, ×10^4, versus frame number)
Fig. 7 Complexity around cut-point: MPEG-4 of 750 kbit/s, sailboat (complexity, ×10^5, versus frame number)
Fig. 8 Virtual buffer memory around cut-point: MPEG-4 of 750 kbit/s, sailboat (virtual buffer memory, ×10^4, versus frame number)
Fig. 9 Transcoded video quality around cut-point: MPEG-2 of 2 Mbit/s, mobile & calendar (PSNR in dB versus frame number)
Fig. 10 Transcoded video quality around cut-point: MPEG-4 of 750 kbit/s, sailboat (PSNR in dB versus frame number)
Fig. 11 An example frame of simply segmented method (a) at cut-point
Fig. 12 An example frame of the proposed transcoding method (c) at cut-point
Fig. 13 Virtual buffer memory around cut-point: MPEG-4 of 1 Mbit/s, sailboat (virtual buffer memory, ×10^4, versus frame number)
Fig. 14 Transcoded video quality around cut-point: MPEG-4 of 1 Mbit/s, sailboat (PSNR in dB versus frame number)

Table 2 compares the quality degradation during the 30 frames after the cut-point (frame 60), relative to video transcoded without segmentation. Although the results show that the proposed method achieves the least quality degradation for many transcoding conditions, there are some cases in which the degradation of method (b) is smaller than that of the proposed method. This is because the initial estimation of the virtual buffer memory $d_n$ does not work well when the quantization scale $Q$ estimated for the output video segment saturates. When this occurs, the estimated size of the previously coded frames is smaller than the actual size, and $d_n$ is then estimated to be much higher than that of video without segmentation.
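The rate-control quantities used in Section 3.2 can be illustrated with a small sketch. It assumes the TM5 relations $X = S \cdot Q$, $r = 2 \cdot \mathit{bitrate} / \mathit{framerate}$ and $Q = 31 d / r$ from [10], and scales the source complexity by the output-to-input pixel ratio as described in the text. The function names, the clamping of the quantization scale to 1..31 (a simple stand-in for the saturation discussed above), and the numeric inputs are hypothetical and chosen only for illustration; they are not taken from the authors' implementation.

```python
def reaction_parameter(bitrate_bps, frame_rate):
    """TM5 reaction parameter r = 2 * bitrate / framerate."""
    return 2.0 * bitrate_bps / frame_rate


def complexity(coded_bits, avg_qscale):
    """TM5 frame complexity X = S * Q (coded size in bits times average quantization scale)."""
    return coded_bits * avg_qscale


def initial_complexity(src_bits, src_qscale, src_size, out_size):
    """Source complexity scaled by the ratio of output to input resolution (pixel counts)."""
    ratio = (out_size[0] * out_size[1]) / (src_size[0] * src_size[1])
    return complexity(src_bits, src_qscale) * ratio


def initial_virtual_buffer(qscale, bitrate_bps, frame_rate):
    """Initial virtual buffer d = r * Q / 31, the inverse of Q = 31 * d / r.

    Assumption: Q is clamped to the 1..31 range, so any estimate above 31 saturates.
    """
    q = min(max(float(qscale), 1.0), 31.0)
    return q * reaction_parameter(bitrate_bps, frame_rate) / 31.0


if __name__ == "__main__":
    # Hypothetical source frame: 400,000 bits at average Q = 8, 720x480 -> 360x240.
    print(initial_complexity(400_000, 8, (720, 480), (360, 240)))
    # Hypothetical output: 2 Mbit/s at 30 frames/s with an estimated Q of 10.
    print(initial_virtual_buffer(10, 2_000_000, 30))
```

With these illustrative inputs the initial virtual buffer is on the order of 10^4, which is the scale used on the virtual buffer axes of Figs. 6, 8 and 13.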
As an example of such a case, $d_n$ using mobile & calendar for MPEG-4 transcoded at 1 Mbit/s is shown in Fig. 13. There are large discrepancies at the initial frame (frame 45) between the $d_n$ of the proposed method and that of the non-segmented method, and some differences remain in the second segment. However, as shown in Fig. 14, the period in which the proposed method has the most quality degradation is limited to a few frames.

Table 2 Comparison of image quality degradation for segment transcoding methods

Format / video      Rate      Simply       1GOP         1GOP overlapped with
                    (bit/s)   segmented    overlapped   initial adjustment
                              (dB)         (dB)         (dB)
MPEG-4
football            750K      -24.6        -16.7        -6.1
flower              750K      -13.8        -2.3         -5.3
mobile&calendar     750K      -15.0        -8.9         -6.2
sailboat            750K      -44.8        -14.4        -9.7
football            1M        -17.3        -6.4         -3.1
flower              1M        -11.9        -2.2         -5.0
mobile&calendar     1M        -11.3        -4.2         -6.5
sailboat            1M        -46.2        -12.8        -3.3
MPEG-2
football            1M        -33.9        -13.2        -2.0
flower              1M        -15.0        -5.2         -2.0
sailboat            1M        -32.5        -5.6         -5.8
football            2M        -12.9        -1.4         -1.3
flower              2M        -11.7        -1.0         -0.4
sailboat            2M        -28.9        -0.2         -0.2

4. Segment Allocation of Distributed Transcoding

This section describes how the video segments should be allocated to transcoding PCs in order to minimize the overall transcoding time.

4.1 Segment Transcoding Time

In a transcoding PC, since the MPEG-2 decoder, filter, and encoders are implemented as threads operating in parallel and the encoding process is the most time-consuming process, the segment transcoding time primarily depends on the encoder.

Most encoding algorithms include the discrete cosine transform and motion compensation (MC). In this paper, the encoders we use employ simple block matching motion estimation with a fixed search range in the MC. Therefore, the time taken to transcode a segment is expected to be constant irrespective of the degree of movement and the texture.

In order to confirm the assumption that segment transcoding time can be treated as constant, and to investigate how the video segment transcoding time can be determined, we measured the MPEG-4 transcoding time of 40-second video segments created from a one-hour football video. The video sequence included a wide variety of frames in terms of the degree of movement and texture. The other transcoding parameters were the same as those in Table 1. The transcoding PC had two 1.26 GHz Pentium processors and 2 GB of memory. Fig. 15 and Fig. 16 show the transcoding time and its histogram, respectively. These results show that the fluctuation in segment transcoding time is only about 5% and that the transcoding speed can be regarded as constant within the video sequence. Therefore, for a transcoding job, the video segment transcoding time can be taken as the product of $c$ and $d$, where $c$ is the transcoding performance and $d$ is the segment length in terms of display time.

Here $c$ is the time taken to transcode a unit length of source video and is the sum of the decoding time $c_{dec}$ and the re-encoding time $c_{enc}$. For multiple transcoding into $p$ kinds of rates or formats, only the time taken to encode increases: $c = c_{dec} + p\,c_{enc}$. However, when the transcoding PC has multiple CPUs, as in our experimental system, part of the encoding may be done in parallel and the re-encoding time is less than $p\,c_{enc}$. This parallelization effect depends on the implementation of the encoder. In our dual-CPU system, when two encoders run, about 60% of the encoding process is overlapped and $c$ can be estimated as

\[
c = \begin{cases}
c_{dec} + c_{enc} & (p = 1) \\
c_{dec} + 1.4 \left\lceil \dfrac{p}{2} \right\rceil c_{enc} & (p \ge 2)
\end{cases}
\tag{1}
\]

where $c_{dec} \approx 0.5$ and $c_{enc} \approx 1.5$ for the transcoding conditions listed in Table 1. The estimation of $c_{dec}$ and $c_{enc}$ under other transcoding conditions remains for further study.
Fig. 15 Transcoding time of each 40 second video segment (segment transcoding time in seconds versus segment number)
Fig. 16 Histogram of the segment transcoding time (frequency versus segment transcoding time in seconds)

4.2 Performance model and the optimum segment length

Since the segment transcoding time can be considered constant, the proposed system implements a simple round-robin scheduling method (Fig. 17). Segment lengths are equal, as is transcoding PC performance. For this allocation, shortening the video segment decreases the waiting time of each transcoding PC, because the delay until the first video segment is transmitted to the transcoding PC becomes shorter. However, this increases overhead costs, including the segment handling process described in the previous section and transmission overhead such as connection setup, making the total transcoding time longer.

Fig. 17 Process flow of the distributed transcoding (MPEG-2 PS file of rate $R_s$ bit/s and length $F$ sec in terms of display time; the source PC cuts segments of length $d$ sec in terms of display time, with demultiplexing and transmission at $R_{demux}$ bit/s; transcoding PCs PC(1) to PC(m) perform segment transcoding, taking $c \times d$, and segment handling, taking $T_{OH}$ sec; transcoded video segments are transmitted over the LAN to the merging PC)

If we assume that segment transmission time is relatively small and can be neglected, the total transcoding time $T_{total}$ can be estimated as the sum of the time taken to transcode a source file (length $F$ in display time) using $m$ PCs in parallel and the time $T_c$ that cannot be parallelized. The former is $cF/m$. According to Amdahl's law [9], the total transcoding time $T_{total}$ is estimated using the parallelism $a$ as

\[
T_{total} = (T_c + cF)\left\{(1 - a) + \frac{a}{m}\right\}
\tag{2}
\]

\[
a = \frac{cF}{cF + T_c}
\tag{3}
\]

$T_c$ is the sum of the time taken to transmit the first video segment to the last PC (denoted by PC(m) in Fig. 17) and the time taken to transcode the overlapped 2-GOP data described in Section 3.2 together with the communication setup overhead. We assume the latter time is constant for each segment and denote it as $T_{OH}$. We also assume that segment merging time can be neglected, because the merging time is relatively small compared with the transcoding time and only the last few segments contribute to $T_c$. Then $T_c$ can be estimated as

\[
T_c = \frac{d R_s m}{R_{demux}} + \frac{T_{OH} F}{d m}
\tag{4}
\]

where $R_s$ and $R_{demux}$ are the source video coding rate and the demultiplexing speed, respectively, both in bit/s. Therefore, using the above equations,

\[
T_{total} = \frac{d R_s m}{R_{demux}} + \frac{cF}{m} + \frac{T_{OH} F}{d m}
\tag{5}
\]
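The behaviour of Eq. (5) as a function of segment length can be explored numerically. The sketch below evaluates Eq. (5) and searches integer segment lengths for the minimum, using the parameter values reported in Section 5 ($c = 2.0$, $R_{demux}$ = 120 Mbit/s, $T_{OH}$ = 1.5 s) together with the 8 Mbit/s source rate of Table 1 and the one-hour source length. The function names and the 1 to 360 second search range are choices made for this illustration.

```python
def total_transcoding_time(d, m, c=2.0, F=3600.0, Rs=8e6, Rdemux=120e6, Toh=1.5):
    """Estimated total transcoding time from Eq. (5).

    d: segment length (s), m: number of transcoding PCs, c: transcoding performance,
    F: source length (s), Rs: source rate (bit/s), Rdemux: demultiplexing speed (bit/s),
    Toh: per-segment handling overhead (s).
    """
    startup = d * Rs * m / Rdemux   # time until the last PC receives its first segment
    parallel = c * F / m            # transcoding work shared by m PCs
    overhead = Toh * F / (d * m)    # per-segment overhead, F / d segments over m PCs
    return startup + parallel + overhead


def best_segment_length(m, candidates=range(1, 361), **params):
    """Integer segment length in `candidates` that minimizes Eq. (5)."""
    return min(candidates, key=lambda d: total_transcoding_time(d, m, **params))


if __name__ == "__main__":
    for m in (5, 10):
        d = best_segment_length(m)
        print(m, d, round(total_transcoding_time(d, m), 1))
```

Under these assumptions the minimum falls near 57 s for 5 PCs and near 28 s for 10 PCs: short segments are penalized by the per-segment overhead term, and long segments by the start-up delay term.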
The optimum segment length $d_{opt}$ that minimizes the total transcoding time $T_{total}$ can be calculated by differentiating (5) with respect to the segment length $d$:

\[
d_{opt} = \sqrt{\frac{R_{demux}\, T_{OH}\, F}{R_s\, m^2}}
\tag{6}
\]

5. Experimental Results

In order to verify the performance model and the optimum segment length, we conducted experiments using the one-hour football video introduced in the previous section and the transcoding parameters in Table 1. An overview of the experimental system is shown in Fig. 18. All PCs had two 1.26 GHz Intel Pentium processors and were connected by a 1000Base-T LAN.

Fig. 18 Overview of the distributed transcoding system
Fig. 19 Performance of the distributed transcoder using 5 transcoding PCs (experimental and estimated total transcoding time in seconds versus segment length in seconds; the estimated optimum is marked at $d_{opt}$ = 60)
Fig. 20 Performance of the distributed transcoder using 10 transcoding PCs (experimental and estimated total transcoding time in seconds versus segment length in seconds; the estimated optimum is marked at $d_{opt}$ = 30)
Fig. 21 Performance of the distributed transcoder (experimental and estimated total transcoding time in seconds versus number of transcoding PCs)

Figure 19 shows the total transcoding time for various segment lengths from 10 sec to 360 sec using 5 transcoding PCs, and Fig. 20 shows that for 10 transcoding PCs. The estimated performance and optimum segment length, calculated by Eq. (5) and Eq. (6), predict the experimental results well. The estimation used $c$, $R_{demux}$ and $T_{OH}$ values of 2.0, 120 Mbit/s and 1.5 sec, respectively; these values were obtained in a preliminary experiment. The merging performance is 30 Mbit/s to form each output video listed in Table 1. Figure 21 shows how the total transcoding time decreases with the number of transcoding PCs. In this experiment, each segment length was determined by (6).

With 10 PCs, the proposed transcoding system decreases the total transcoding time achieved with one PC by a factor of 7. While performance seems to saturate at about 10 processors due to the bottleneck of the demultiplexing process in our current implementation, higher performance can be achieved by improving the demultiplexing algorithm and tuning operations such as disk I/O. Also, to achieve higher performance, the optimum segment allocation taking into account the fluctuation in the segment transcoding time needs to be investigated. Figure 22 shows
the simultaneous multiple MPEG-4 transcoding performance using 10 transcoding PCs. The values of $c$ used in the estimation, calculated by Eq. (1), are 2.0 for 500 kbit/s, 2.6 for 500 kbit/s + 1 Mbit/s, and 4.7 for 500 kbit/s + 1 Mbit/s + 2 Mbit/s. These results show that distributed transcoding is efficient for speeding up multiple transcoding. If three kinds of transcoding are done on one transcoding PC, it will take about 16920 (= 3600 × 4.7) seconds. Distributed transcoding on 10 PCs decreases the total time by a factor of about 9.5 (= 16920/1780). The speedup is larger for multiple transcoding than for single transcoding, because the decoding process is done only once and the time for repeated decoding is eliminated. The estimated performance does not predict the results well for multiple transcoding at short segment lengths, because the segment handling overhead time $T_{OH}$ is assumed to be constant and the merging time is neglected. These assumptions may cause underestimation of the total segment overhead time.

Fig. 22 Performance of the distributed transcoder for simultaneous multiple transcoding (experimental and estimated total transcoding time in seconds versus segment length in seconds, for 500 kbit/s, 500 kbit/s + 1 Mbit/s, and 500 kbit/s + 1 Mbit/s + 2 Mbit/s)

6. Conclusion

In this paper, we investigated high-speed distributed video transcoding for multiple rates and formats, using our proposed segment handling method and rate control to ensure uniform transcoded video quality. Experimental results show that the distributed transcoding system with 10 PCs decreases the total transcoding time by a factor of 7. More work needs to be done on better video rate control for variable bit rate encoding, and on a better scheduling algorithm that handles unequal PC performance [13] and can deal with PC failure during operation.

Acknowledgment

The authors would like to thank Professor Masayuki Murata of Osaka Univ. for his valuable suggestions on distributed multimedia networks and Dr. Sakuichi Ohtsuka of NTT DATA Corp. for his comments on video coding quality.

References

[1] ITU-T H.262/ISO-IEC 13818-2, MPEG-2, H.262, 1996.
[2] G. Morrison, “Video transcoders with low delay,” IEICE Trans. Commun., vol.E80-B, no.6, pp.963–969, June 1997.
[3] T. Shanableh and M. Ghanbari, “Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats,” IEEE Trans. Multimedia, vol.2, no.2, pp.101–110, June 2000.
[4] Z. Lei and N.D. Georganas, “Rate adaptation transcoding for precoded video streams,” Proc. ACM Multimedia02, pp.127–136, Juan-les-Pins, Dec. 2002.
[5] J. Youn, M.T. Sun and J. Xin, “Video transcoder architecture for bit rate scaling of H.263 bit streams,” Proc. ACM Multimedia, pp.243–250, Orlando, Nov. 1999.
[6] J. Xin, M.T. Sun, B.S. Choi and K.W. Chun, “An HDTV-to-SDTV spatial transcoder,” IEEE Trans. Circuits & Systems Video Technology, vol.12, no.11, pp.998–1008, Nov. 2002.
[7] Y. Nakajima and M. Sugano, “MPEG bit rate and format conversions for heterogeneous network/storage applications,” IEICE Trans. Electron., vol.E85-C, no.3, pp.492–504, Mar. 2002.
[8] Y. Sambe, S. Watanabe, Dong Yu, T. Nakamura, N. Wakamiya, “A High Speed Distributed Video Transcoding for Multiple Rates and Formats,” Proc. ITC-CSCC2003, pp.921–924, 2003.
[9] J.L. Hennessy and D.A. Patterson, “Computer architecture: A quantitative approach,” Morgan Kaufmann Inc., 1990.
[10] Test Model 5, ISO/IEC JTC1/SC29/WG11/N0400, MPEG93/457, April 1993.
[11] P. Tiwari and E. Viscito, “A parallel MPEG-2 video encoder with look-ahead rate control,” Proc. IEEE International Acoustics, Speech and Signal Processing Conf., vol.4, pp.1994–1997, 1996.
[12] R. Egawa, A.A. Alatan, and A.N. Akansu, “Compressed domain MPEG-2 video editing with VBV requirement,” IEEE Proc. ICIP2000, 2000.
[13] Y. Sambe, S. Watanabe, Dong Yu, T. Nakamura, N. Wakamiya, “Distributed video transcoding and its application to grid delivery,” IEEE Proc. APCC2003, vol.1, pp.98–102, 2003.
Yasuo Sambe received the B.E. and M.E. degrees from Osaka University, in 1984 and 1986, respectively. Since 1986, he has been with NTT Data Corporation and engaged in research and development of multimedia communication systems and video coding. He is a member of IEICE, ACM and IEEE.

Shintaro Watanabe received his B.S. degree from Tokyo Institute of Technology in 1994 and M.E. degree from Nara Institute of Science and Technology in 1996. He has been working for Research and Development Headquarters at NTT DATA Corporation since 1996. His main interests are in image processing and video processing.

Dong Yu received the M.E. and Ph.D degrees from University of Tokyo, in 1994 and 1991, respectively. Since 1994, he has been with NTT Data Corporation and engaged in research and development of multimedia communication systems.

Taichi Nakamura received the B.E. and M.E. degrees from Chiba University, Japan, in 1972 and 1974, respectively. He received the Ph.D. degree from Hokkaido University in 1988. From 1974, he worked with NTT Research Laboratory and engaged in research of image communications. From 1988, he was with NTT DATA and engaged in research and development of multimedia systems. Since 2003, he has been a Professor of the School of Computer Science, Tokyo University of Technology. He is a member of IEICE and IEEE.

Naoki Wakamiya received the M.E. and Ph.D. degrees from Osaka University in 1994 and 1996, respectively. He was a Research Associate of the Graduate School of Engineering Science, Osaka University, from 1996 to March 1997, and a Research Associate of the Educational Center for Information Processing, Osaka University, from 1997 to March 1999. He has been an Assistant Professor of the Graduate School of Information Science and Technology, Osaka University, since April 1999. His research interests include performance evaluation of computer communication networks, and distributed multimedia systems. He is a member of IEICE, ACM and IEEE.
