PAPER
High-speed Distributed Video Transcoding for Multiple
Rates and Formats
Yasuo SAMBE†a) , Member, Shintaro WATANABE† , Dong YU† , Nonmembers,
Taichi NAKAMURA†† , and Naoki WAKAMIYA††† , Members
SUMMARY This paper describes a distributed video transcoding system that can simultaneously transcode an MPEG-2 video file into various video coding formats at different rates. The transcoder divides the MPEG-2 file into small segments along the time axis and transcodes them in parallel. Efficient video segment handling methods are proposed that minimize the inter-processor communication overhead and eliminate temporal discontinuities from the re-encoded video. We investigate how segment transcoding should be distributed to obtain the shortest total transcoding time. Experimental results show that distributed transcoding on 10 PCs decreases the total transcoding time by a factor of about 7 for single transcoding and by a factor of about 9.5 for simultaneous transcoding into three rates.
key words: MPEG, video-transcoding, distributed computing

1. Introduction

With the launch of digital broadcasting in many countries and the proliferation of digital video disks (DVD), both of which use the MPEG-2 video coding standard [1], it is expected that MPEG-2 will become the de facto video compression format in video archives. At the same time, as the number of different video compression algorithms used by networked video applications over the Internet increases, there is a growing demand to convert pre-encoded MPEG-2 digital video in archives to other compressed formats such as MPEG-1, MPEG-4, and H.263.

Besides converting formats, video content will be altered in terms of bit-rate and resolution to meet the network bandwidth and terminal capability. For instance, the bandwidth of end-user access networks can vary from several tens of kilobits per second to 20 to 30 megabits per second. Moreover, modern terminals use displays with various sizes and resolutions. Therefore, it is often necessary for service providers delivering video over the Internet to transcode the same content to yield different video formats, spatial resolutions, and bit-rates simultaneously.

Several video transcoding techniques have been proposed [2]–[7], and most attempt to decrease the computational complexity by using information such as the DCT coefficients and the motion vectors extracted from the original coded data. None of them, however, was designed to produce multiple formats and rates. The aim of our work is to provide a video transcoding system that can convert MPEG-2 video files into other formats and bit-rates at high speed. To realize multiple transcoding and speed up the transcoding process, we integrate multiple processors to fully decode and re-encode incoming video.

Our transcoding system divides the MPEG-2 file into small segments along the time axis and transcodes them in parallel [8]. Parallel transcoding along the time axis usually suffers from quality discontinuity and degradation around the segment cut points in the re-encoded video, because information such as the coding complexity of the previous video segment is missing. Obtaining the information of the previous segment from another processor requires inter-processor communication, which leads to additional overhead and a significant performance degradation.

To achieve high performance without significant quality degradation due to parallel transcoding, we propose a segment handling method; it divides incoming MPEG-2 data with minimum duplication, and the duplicated data are used to determine the re-encoding parameters. We also investigate scheduling algorithms and the segment length of the distributed transcoder that minimize the overall transcoding time.

The organization of this paper is as follows. Section 2 introduces the distributed video transcoder architecture. Section 3 proposes video segment handling techniques for distributed transcoding along with experimental results. In Section 4, we investigate the performance model of the transcoder and the optimum segment length. Section 5 uses experimental results to assess the performance model. Our conclusion is presented in Section 6.
† The authors are with the R&D Headquarters, NTT DATA Corporation, Tokyo, Japan.
†† The author is with the School of Computer Science, Tokyo University of Technology, Tokyo, Japan.
††† The author is with the Department of Information Networking, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan.
a) E-mail: sanbey@nttdata.co.jp

2. System Overview

Our transcoding system consists of a source PC, several transcoding PCs, and a merging PC. These PCs are connected by a Gigabit Ethernet LAN, as shown in Fig. 1.
IEICE TRANS. INF. & SYST., VOL.Exx–??, NO.xx XXXX 200x
[Figure: frame ordering across GOP(k-1) and GOP(k).
Display order: I B B P B B P B B P B B P B B I B B P (frames 1 to 19).
Coded order: I P B B P B B P B B P B B I B B P B B (frames 1, 4, 2, 3, 7, 5, 6, 10, 8, 9, 13, 11, 12, 16, 14, 15, 19, 17, 18).]

Fig. 4 Video segment handling methods. [Diagram: the source PC splits the MPEG-2 PS file into segment (s) and segment (s+1) around GOPs k-1, k and k+1, which are handed to transcoding PC(j) and PC(j+1). Recovered panel titles: (b) 1GOP overlapped; (c) 1GOP overlapped with initial adjustment, where the initial re-encoding rate-control parameters are estimated with the source MPEG-2 PS file.]

Fig. 5 Complexity around cut-point: MPEG-2 of 2Mbit/s, mobile & calendar. [Plot: complexity (×10^5) versus frame number (45 to 90), second segment marked. Curves: No segmented, Simply Segmented, 1GOP overlapped, 1GOP overlapped with initial adjustment.]

[Plot, caption not recovered: virtual buffer (×10^4) versus frame number (45 to 90), second segment marked, same four curves.]

Fig. 7 Complexity around cut-point: MPEG-4 of 750Kbit/s, sailboat. [Plot: complexity (×10^5) versus frame number (45 to 90), second segment marked, same four curves.]
Fig. 8 Virtual buffer memory around cut-point: MPEG-4 of 750Kbit/s, sailboat. [Plot: virtual buffer memory (×10^4) versus frame number (45 to 90), second segment marked. Curves: No segmented, Simply Segmented, 1GOP overlapped, 1GOP overlapped with initial adjustment.]

(a) simply segmented and transcoded independently, (b) 1GOP overlapped transcoding without initial parameter adjustment, and (c) 1GOP overlapped transcoding with initial parameters, which is the method proposed above. These methods are illustrated in Fig. 4, and the transcoding parameters are listed in Table 1. The first segment contains the first 60 frames and the second segment contains the next 60 frames. Figures 5 and 6 show the complexity $X_n$ and virtual buffer memory $d_n$ using the standard video mobile & calendar for MPEG-2 transcoded at 2 Mbit/s, and Figs. 7 and 8 show them using sailboat for MPEG-4 transcoded at 750 kbit/s. We can see that the $X_n$ and $d_n$ of the proposed method (c) in the second segment are the closest to those of transcoding without segmentation. Compared with one-GOP overlapped transcoding without initial adjustment, the proposed initial adjustment shortens the period needed for $X_n$ and $d_n$ to approach those of non-segmented transcoding.

Figure 9 shows the Peak Signal-to-Noise Ratio (PSNR) of these methods using the standard video mobile & calendar for MPEG-2 transcoded at 2 Mbit/s, and Fig. 10 shows it using sailboat for MPEG-4 transcoded at 750 kbit/s. These results show that the proposed method (c) achieves the same quality and continuity as the non-segmented video sequence. Example transcoded pic-
SAMBE et al.: HIGH-SPEED DISTRIBUTED VIDEO TRANSCODING FOR MULTIPLE RATES AND FORMATS
Fig. 9 Transcoded video quality around cut-point: MPEG-2 of 2Mbit/s, mobile & calendar. [Plot: PSNR (dB) versus frame number (55 to 105). Curves: No segmented, Simply segmented, 1GOP overlapped, 1GOP overlapped with initial adjustment.]

Fig. 10 Transcoded video quality around cut-point: MPEG-4 of 750Kbit/s, sailboat. [Plot: PSNR (dB) versus frame number (55 to 100), same four curves.]

Fig. 12 An example frame of the proposed transcoding method (c) at cut-point

Fig. 13 Virtual buffer memory around cut-point: MPEG-4 of 1Mbit/s, sailboat. [Plot: virtual buffer memory (×10^4) versus frame number (45 to 90), second segment marked, same four curves.]

[Plot, caption not recovered: PSNR (dB), same four curves.]
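The one-GOP overlapped segmentation compared in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `plan_segments` and `Segment` are hypothetical, the 15-frame GOP length is read off the frame-order figure (I-pictures at frames 1 and 16), frames are numbered from 0, and the overlapped GOP is assumed to serve only to initialize the rate control before being discarded from the output.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    decode_start: int  # first frame handed to the transcoding PC
    output_start: int  # first frame kept in the merged output
    output_end: int    # one past the last frame kept in the merged output


def plan_segments(total_frames: int, seg_len: int, gop_len: int) -> List[Segment]:
    """Split a sequence into segments that each overlap the previous GOP.

    Assumption: segments are cut on GOP boundaries, so seg_len must be a
    whole number of GOPs.
    """
    if seg_len % gop_len != 0:
        raise ValueError("segment length must be a whole number of GOPs")
    segments = []
    for start in range(0, total_frames, seg_len):
        end = min(start + seg_len, total_frames)
        # The first segment has no predecessor, so it gets no overlap.
        decode_start = max(0, start - gop_len)
        segments.append(Segment(decode_start, start, end))
    return segments


if __name__ == "__main__":
    for seg in plan_segments(total_frames=120, seg_len=60, gop_len=15):
        print(seg)
```

With two 60-frame segments and a 15-frame GOP, the second transcoding PC would start reading at frame 45 and contribute frames 60 to 119 to the merged output, which agrees with the plots in this section starting at frame 45.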
As an example of this case, $d_n$ using mobile & calendar for MPEG-4 transcoded at 1 Mbit/s is shown in Fig. 13. There are large discrepancies at the initial frame (frame 45) between the $d_n$ of the proposed method and that of the non-segmented method, and some differences remain in the second segment. However, as shown in Fig. 14, the period in which the proposed method has the most quality degradation is limited to a few frames.

4. Segment Allocation of Distributed Transcoding

This section describes how the video segments should be allocated to transcoding PCs in order to minimize the overall transcoding time.

4.1 Segment Transcoding Time

In a transcoding PC, the MPEG-2 decoder, filter, and encoders are implemented as threads operating in parallel, and encoding is the most time-consuming process, so the segment transcoding time depends primarily on the encoder.

Most encoding algorithms include a discrete cosine transform and motion compensation (MC). The encoders used in this paper employ simple block-matching motion estimation with a fixed search range in the MC. Therefore, the time taken to transcode a segment is expected to be constant irrespective of the degree of movement and the texture.

To verify the assumption that the segment transcoding time can be treated as constant, and to investigate how the segment transcoding time can be determined, we measured the MPEG-4 transcoding time of 40-second video segments created from a one-hour football video. The video sequence included a wide variety of frames in terms of the degree of movement and texture. The other transcoding parameters were the same as those in Table 1. The transcoding PC had two 1.26 GHz Pentium processors and 2 GB of memory. Fig. 15 and Fig. 16 show the transcoding time and its histogram, respectively. These results show that the fluctuation in segment transcoding time is only about 5% and that the transcoding speed can be regarded as constant within the video sequence. Therefore, for a transcoding job, the video segment transcoding time can be taken as the product of $c$ and $d$, where $c$ is the transcoding performance and $d$ is the segment length in terms of display time.

Here $c$ is the time taken to transcode a unit length of source video and is the sum of the decoding time $c_{dec}$ and the re-encoding time $c_{enc}$. For multiple transcoding into $p$ kinds of rates or formats, only the encoding time increases: $c = c_{dec} + p\,c_{enc}$. However, when the transcoding PC has multiple CPUs, as in our experimental system, part of the encoding may be done in parallel and the re-encoding time is less than $p\,c_{enc}$. This parallelization effect depends on the implementation of the encoder. In our dual-CPU system, when two encoders run, about 60% of the encoding process is overlapped and $c$ can be estimated as

$$
c =
\begin{cases}
c_{dec} + c_{enc} & (p = 1) \\
c_{dec} + 1.4 \left\lceil \dfrac{p}{2} \right\rceil c_{enc} & (p \ge 2)
\end{cases}
\qquad (1)
$$

where $c_{dec} \approx 0.5$ and $c_{enc} \approx 1.5$ for the transcoding conditions listed in Table 1. The estimation of $c_{dec}$ and $c_{enc}$ under other transcoding conditions remains for further study.
Fig. 15 Transcoding time of each 40 second video segment. [Plot: segment transcoding time (sec, 0 to 100) versus segment number (0 to 90).]

Fig. 16 Histogram of the segment transcoding time. [Plot: frequency (0 to 45) versus segment transcoding time (sec), in bins from below 60 to 90 or more.]

4.2 Performance model and the optimum segment length

Since the segment transcoding time can be considered constant, the proposed system implements a simple round-robin scheduling method (Fig. 17). Segment lengths are equal, as is transcoding PC performance. For this allocation, shortening the video segment decreases the waiting time of each transcoding PC, because the delay until the first video segment is transmitted to the transcoding PC becomes shorter. However, this increases overhead costs, including the segment handling process described in the previous section and transmission overhead such as connection setup, making the total transcoding time longer.

[Diagram, caption not recovered: the source PC sends segments of length $d$ (sec, in terms of display time) to the transcoding PCs PC(1), PC(2), PC(3), ..., PC(m); each segment incurs segment handling ($T_{OH}$ sec) and segment transcoding ($= c \times d$); the transcoded video segments are transmitted over the LAN to the merging PC.]

If we assume that the segment transmission time is relatively small and can be neglected, the total transcoding time $T_{total}$ can be estimated as the sum of the time taken to transcode a source file (of length $F$ in display time) using $m$ PCs in parallel and the time $T_c$ that cannot be parallelized. The former is $cF/m$. According to Amdahl's law [9], the total transcoding time $T_{total}$ is estimated using the parallelism $a$ as

$$
T_{total} = (T_c + cF)\left\{(1 - a) + \frac{a}{m}\right\} \qquad (2)
$$

$$
a = \frac{cF}{cF + T_c} \qquad (3)
$$

$T_c$ is the sum of the time taken to transmit the first video segment to the last PC (denoted by PC(m) in Fig. 17) and the time taken to transcode the overlapped 2GOP data, as described in Section 3.2, together with the communication setup overhead. We assume that the latter time is constant for each segment and denote it $T_{OH}$. We also assume that the segment merging time can be neglected, because it is relatively small compared to the transcoding time and only the last few segments contribute to $T_c$. Then $T_c$ can be estimated as

$$
T_c = \frac{d R_s m}{R_{demux}} + \frac{T_{OH} F}{d m} \qquad (4)
$$

where $R_s$ is the source video coding rate in bits/sec and $R_{demux}$ is the demultiplexing speed in bits/sec. Substituting Eq. (3) into Eq. (2) gives $T_{total} = T_c + cF/m$. Therefore, using Eq. (4),

$$
T_{total} = \frac{d R_s m}{R_{demux}} + \frac{cF}{m} + \frac{T_{OH} F}{d m} \qquad (5)
$$
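The trade-off expressed by Eq. (5) can be explored numerically. The sketch below is a minimal illustration: the function names are hypothetical, and the closed form for the optimum segment length is derived here by setting the derivative of Eq. (5) with respect to $d$ to zero, rather than quoted from the text. The parameter values in the usage part are placeholders for illustration, not measurements from the experiments.

```python
import math


def total_time(d: float, m: int, F: float, c: float,
               t_oh: float, r_s: float, r_demux: float) -> float:
    """Total transcoding time of Eq. (5) for segment length d (sec)."""
    startup = d * r_s * m / r_demux   # first segments reaching all m PCs
    parallel = c * F / m              # transcoding work shared by m PCs
    overhead = t_oh * F / (d * m)     # per-segment handling overhead
    return startup + parallel + overhead


def optimum_segment_length(m: int, F: float, t_oh: float,
                           r_s: float, r_demux: float) -> float:
    """Segment length minimizing Eq. (5).

    dT/dd = r_s*m/r_demux - t_oh*F/(d^2*m) = 0
      =>  d_opt = sqrt(t_oh * F * r_demux / r_s) / m
    """
    return math.sqrt(t_oh * F * r_demux / r_s) / m


if __name__ == "__main__":
    # Placeholder parameters (assumptions, not measured values).
    params = dict(F=3600.0, t_oh=1.0, r_s=6e6, r_demux=150e6)
    for m in (5, 10):
        d_opt = optimum_segment_length(m, **params)
        t = total_time(d_opt, m, c=2.0, **params)
        print(m, round(d_opt, 1), round(t, 1))
```

Under this model the optimum segment length is inversely proportional to the number of transcoding PCs, so doubling $m$ halves $d_{opt}$; the first term of Eq. (5) grows with $d$ while the last term shrinks, which is why an interior minimum exists.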
Fig. 18 Overview of the distributed transcoding system

Fig. 19 Performance of the distributed transcoder using 5 transcoding PCs. [Plot: total transcoding time (sec, 1300 to 1600) versus segment length (sec, 0 to 350); experimental and estimated curves; d_opt = 60 marked.]

Fig. 20 Performance of the distributed transcoder using 10 transcoding PCs. [Plot: total transcoding time (sec, 700 to 1000) versus segment length (sec, 0 to 350); experimental and estimated curves; d_opt = 30 marked.]

Fig. 21 Performance of the distributed transcoder. [Plot: total transcoding time (sec, 0 to 8000) versus number of transcoding PCs (0 to 10); experimental and estimated curves.]
Fig. 22 Performance of the distributed transcoder for simultaneous multiple transcoding. [Plot: total transcoding time (sec, 0 to 2000) versus segment length (sec, 0 to 350). Curves: 500Kb/s; 500Kb/s + 1Mb/s; 500Kb/s + 1Mb/s + 2Mb/s; Estimated.]

... the simultaneous multiple MPEG-4 transcoding performance using 10 transcoding PCs. The estimation uses $c$ = 2.0 for 500 kb/s, 2.6 for 500 kb/s + 1 Mb/s, and 4.7 for 500 kb/s + 1 Mb/s + 2 Mb/s, calculated by Eq. (1). These results show that distributed transcoding is efficient for speeding up multiple transcoding. If three kinds of transcoding are done on one transcoding PC, it will take about 16920 (= 3600 × 4.7) seconds. Distributed transcoding on 10 PCs decreases the total time by a factor of about 9.5 (= 16920/1780). The performance improves for multiple transcoding compared with single transcoding, because the decoding process is done only once, which eliminates the repeated decoding time. For multiple transcoding, the estimate does not predict the measured performance well at short segment lengths, because the segment handling overhead time $T_{OH}$ is assumed to be constant and the merging time is neglected. These assumptions may cause the total segment overhead time to be underestimated.

6. Conclusion

... operation.

Acknowledgment

The authors would like to thank Professor Masayuki Murata of Osaka Univ. for his valuable suggestions on distributed multimedia networks and Dr. Sakuichi Ohtsuka of NTT DATA Corp. for his comments on video coding quality.

References

[1] ITU-T H.262/ISO-IEC 13818-2, MPEG-2, H.262, 1996.
[2] G. Morrison, “Video transcoders with low delay,” IEICE Trans. Commun., vol.E80-B, no.6, pp.963–969, June 1997.
[3] T. Shanableh and M. Ghanbari, “Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats,” IEEE Trans. Multimedia, vol.2, no.2, pp.101–110, June 2000.
[4] Z. Lei and N.D. Georganas, “Rate adaptation transcoding for precoded video streams,” Proc. ACM Multimedia02, pp.127–136, Juan-les-Pins, Dec. 2002.
[5] J. Youn, M.T. Sun, and J. Xin, “Video transcoder architecture for bit rate scaling of H.263 bit streams,” Proc. ACM Multimedia, pp.243–250, Orlando, Nov. 1999.
[6] J. Xin, M.T. Sun, B.S. Choi, and K.W. Chun, “An HDTV-to-SDTV spatial transcoder,” IEEE Trans. Circuits & Systems Video Technology, vol.12, no.11, pp.998–1008, Nov. 2002.
[7] Y. Nakajima and M. Sugano, “MPEG bit rate and format conversions for heterogeneous network/storage applications,” IEICE Trans. Electron., vol.E85-C, no.3, pp.492–504, Mar. 2002.
[8] Y. Sambe, S. Watanabe, D. Yu, T. Nakamura, and N. Wakamiya, “A high speed distributed video transcoding for multiple rates and formats,” Proc. ITC-CSCC2003, pp.921–924, 2003.
[9] J.L. Hennessy and D.A. Patterson, “Computer architecture: A quantitative approach,” Morgan Kaufmann Inc., 1990.
[10] Test Model 5, ISO/IEC JTC1/SC29/WG11/N0400, MPEG93/457, April 1993.
[11] P. Tiwari and E. Viscito, “A parallel MPEG-2 video encoder with look-ahead rate control,” Proc. IEEE International Acoustics, Speech and Signal Processing Conf., vol.4, pp.1994–1997, 1996.
[12] R. Egawa, A.A. Alatan, and A.N. Akansu, “Compressed domain MPEG-2 video editing with VBV requirement,” IEEE Proc. ICIP2000, 2000.
[13] Y. Sambe, S. Watanabe, D. Yu, T. Nakamura, and N. Wakamiya, “Distributed video transcoding and its application to grid delivery,” IEEE Proc. APCC2003, vol.1, pp.98–102, 2003.