Conversion of Multichannel Sound Signal Maintaining Physical Properties of Sound in Reproduced Sound Field
Abstract—In this paper, we describe a new method for converting the signal of the original multichannel sound system into that of an alternative system with a different number of channels while maintaining the physical properties of sound at the listening point in the reproduced sound field. Such a conversion problem can be described by the underdetermined linear equation. To obtain an analytical solution to the equation, the method partitions the sound field of the alternative system on the basis of the positions of three loudspeakers and solves the “local solution” in each subfield. As a result, the alternative system localizes each channel signal of the original sound system at the corresponding loudspeaker position as a phantom source. The composition of the local solutions introduces the “global solution,” that is, the analytical solution to the conversion problem. 22-channel signals of a 22.2 multichannel sound system without the two low-frequency effect channels were converted into 10-, 8-, and 6-channel signals by the method. Subjective evaluations showed that the proposed method could reproduce the spatial impression of the original 22-channel sound with eight loudspeakers.

Index Terms—Acoustic signal processing, sound field reproduction, sound localization, spatial and multichannel audio, three-dimensional sound.

I. INTRODUCTION

BROADCASTING and packaged media have popularized 5.1 multichannel sound as a home sound system. Research on multichannel audio has now shifted to advanced systems with more channels to provide enhanced spatial impressions. For example, a 22.2 multichannel sound system has been developed for ultrahigh-definition television with 4320 scanning lines [1]. Such advanced sound systems require their own loudspeaker arrangement to bring out the best performance. Although loudspeakers can be arranged optimally in a theater, they are difficult to set up in a typical home environment.

“Down-mixing” is a widely known method of reducing the number of channels in multichannel audio. Down-mixing from 5.1 to two-channel stereo or monophonic has already been standardized in an ITU-R Recommendation [2] and is applied to some television receivers. Although such a down-mixing algorithm was reported as having a certain effectiveness [3], it does not work for an arbitrary loudspeaker arrangement.¹ To enable down-mixing between a number of systems, a technology for converting or recreating sound fields is necessary.

A sound field reconstruction method was proposed in the 1960s [4]. It tried to recreate the primary sound field of a large room, such as a concert hall, in a small room (the secondary sound field) based on Huygens’ principle, which can be regarded as an approximation of the Kirchhoff–Helmholtz integral theorem [5]. There are other methods using the Kirchhoff–Helmholtz integral theorem directly [6], [7]. In those methods, it is necessary to enclose the listening area by a boundary surface, on which sound pressure and its gradient are controlled. Another method of sound field reconstruction is wave field synthesis (WFS) [8], [9]. In WFS, the boundary surface enveloping the listening area degenerates to a plane, and then, it uses a Rayleigh I or II integral instead of the Kirchhoff–Helmholtz integral [10]. However, those sound field reconstruction methods require many loudspeakers or a loudspeaker array in order to satisfy their physical theory.

On the other hand, there are some methods that can be used to reproduce the original sound pressure at a listening point, instead of the listening area that has a certain volume. Among them, ambisonics is a widely accepted sound pickup and reproduction method [11], and higher order ambisonics [12], [13] can expand the listening area if a certain number of secondary sources (loudspeakers) are available and uniformly arranged [14]. Ambisonics represents the observed sound pressure by spherical harmonic expansion with a finite number of components up to a given order and tries to reproduce the sound that has the same expansion coefficients through several loudspeakers surrounding the listening point or area. It offers a complete hierarchical approach to sound pickup, transmission and reproduction, which corresponds to mono, stereo, horizontal surround, or three-dimensional sound reproduction. However, it has the problem of the unavailability of higher order sound pickup devices, and even with the specially designed microphone array [15], the expansion in actual ambisonic encoding is still limited up to several orders at present.

All sounds reproduced can be considered binaural because they are heard by both ears of a listener. Thus, the binaural recording with the so-called “dummy head” could bring the spatial sound impression of the original sound field to the listener.

Manuscript received June 29, 2010; revised October 12, 2010 and November 01, 2010; accepted November 02, 2010. Date of publication November 15, 2010; date of current version May 25, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Lauri Savioja.
The author is with NHK Science and Technology Research Laboratories, Tokyo 157-8510, Japan (e-mail: ando.a-io@nhk.or.jp).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TASL.2010.2092429

¹The down-mixing method standardized in [2] uses only three down-mixing coefficients, i.e., 1.0, 0.7071, and 0.0.
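To illustrate why a fixed set of three coefficients cannot adapt to an arbitrary loudspeaker arrangement, the following minimal Python sketch applies the coefficients 1.0, 0.7071, and 0.0 from the footnote to one 5.1 sample frame. The channel order, the names `CHANNELS`, `DOWNMIX_5_1_TO_STEREO`, and `downmix_frame`, and the assignment of each coefficient to each channel (the commonly quoted form Lo = L + 0.7071 C + 0.7071 Ls, with the LFE channel discarded) are assumptions made for illustration and are not stated in the text.

```python
# Coefficients named in the footnote: 1.0, 0.7071 (about 1/sqrt(2)), and 0.0.
FULL, HALF_POWER, MUTE = 1.0, 0.7071, 0.0

# Assumed channel order of one 5.1 sample frame (hypothetical, for illustration).
CHANNELS = ("L", "R", "C", "LFE", "Ls", "Rs")

# Assumed stereo down-mix matrix: rows are the outputs (Lo, Ro),
# columns follow CHANNELS. The matrix is fixed; it never looks at
# where the loudspeakers actually are.
DOWNMIX_5_1_TO_STEREO = (
    (FULL, MUTE, HALF_POWER, MUTE, HALF_POWER, MUTE),  # Lo
    (MUTE, FULL, HALF_POWER, MUTE, MUTE, HALF_POWER),  # Ro
)


def downmix_frame(frame):
    """Down-mix one 5.1 sample frame (six values) to a (Lo, Ro) pair."""
    if len(frame) != len(CHANNELS):
        raise ValueError("expected one sample per 5.1 channel")
    return tuple(
        sum(coeff * sample for coeff, sample in zip(row, frame))
        for row in DOWNMIX_5_1_TO_STEREO
    )
```

Because the matrix is a constant, the same weights are used whatever the loudspeaker positions are, which is the limitation the conversion method in this paper is meant to remove.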
in the reproduction field should be selected, satisfying the following conditions: 1) a spherical triangle formed by the three loudspeaker directions includes the direction of , and 2) the spherical area of the triangle is the minimum among those satisfying 1).

Using the notation in Fig. 2, we define three vectors:

(9)

where × denotes the vector product. If and only if the position is inside the spherical triangle determined by , , and , the three vectors , , and have the same direction. Therefore, we can determine that the direction of is inside the triangle if

(10)
(11)
(12)

A virtual source of the original loudspeaker is localized at in the reproduction space if

(13)

holds. A local solution can be then obtained as

(14)

where

where we have the equation shown at the bottom of the page, and

(16)

From (15) and (16), the coincidence of sound pressures is obtained when

(17)
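The two selection conditions (the triangle must contain the target direction, and must have the minimum spherical area among those that do) can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of (9)–(12), which are not recoverable from this text: it uses the standard same-sign test against the three edge-plane normals, the Van Oosterom–Strackee formula for the area of a spherical triangle, and an exhaustive search over all loudspeaker triples. All function names are hypothetical, and directions are assumed to be unit vectors from the listening point.

```python
import math
from itertools import combinations


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(v):
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def inside_spherical_triangle(p, a, b, c, tol=1e-9):
    """True if direction p lies in the spherical triangle a, b, c."""
    orient = dot(cross(a, b), c)
    if abs(orient) < tol:
        return False  # degenerate: the three directions are coplanar
    sign = 1.0 if orient > 0 else -1.0
    sides = (dot(cross(a, b), p), dot(cross(b, c), p), dot(cross(c, a), p))
    # p must be on the inner side of all three edge planes.
    return all(sign * s >= -tol for s in sides)


def spherical_triangle_area(a, b, c):
    """Area on the unit sphere (solid angle) spanned by unit vectors a, b, c."""
    triple = abs(dot(a, cross(b, c)))
    denom = 1.0 + dot(a, b) + dot(b, c) + dot(c, a)
    return 2.0 * math.atan2(triple, denom)


def select_triangle(p, speakers):
    """Indices of the smallest-area loudspeaker triple containing p, or None."""
    best, best_area = None, math.inf
    for i, j, k in combinations(range(len(speakers)), 3):
        a, b, c = speakers[i], speakers[j], speakers[k]
        if inside_spherical_triangle(p, a, b, c):
            area = spherical_triangle_area(a, b, c)
            if area < best_area:
                best, best_area = (i, j, k), area
    return best
```

The exhaustive search visits every triple, which is affordable for the loudspeaker counts discussed here (6 to 10 reproduction channels); `select_triangle` returns `None` when no triple encloses the target direction.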
(18)

Although solution (18) does not satisfy condition (13), it guarantees the coincidence of the directions of particle velocities. Therefore, solution (18) gives the same sound pressure and sound direction in both fields.

The obtained solution depends not on the input signal but on the positions of the loudspeakers and receiving point in both fields. One of the obtained signals is

be included in the area bounded by the three loudspeakers of the th, th, and th channels in the reproduction field. The elements of the local solution in this case are denoted as

Using this notation, we define as
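As a rough illustration of how local solutions might be computed and composed into a single conversion matrix, the Python sketch below makes simplifying assumptions that are not taken from the paper's equations (13)–(18), which are not reproduced in this text: all loudspeakers are taken to be equidistant from the listening point, matching the sound direction is modelled as a weighted sum of the three loudspeaker direction vectors pointing toward the original loudspeaker, and matching the sound pressure is modelled as the three weights summing to one. The names `local_weights`, `build_conversion_matrix`, and `convert_frame` are hypothetical.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c."""
    return dot(a, cross(b, c))


def local_weights(p, l1, l2, l3):
    """Weights for three loudspeaker directions l1, l2, l3 and target p.

    Solves w1*l1 + w2*l2 + w3*l3 = p by Cramer's rule (direction match),
    then rescales so the weights sum to one (assumed pressure match).
    """
    d = det3(l1, l2, l3)
    if abs(d) < 1e-12:
        raise ValueError("loudspeaker directions are coplanar")
    g = (det3(p, l2, l3) / d, det3(l1, p, l3) / d, det3(l1, l2, p) / d)
    total = sum(g)
    if abs(total) < 1e-12:
        raise ValueError("weights cannot be normalized")
    return tuple(x / total for x in g)


def build_conversion_matrix(n_out, local_solutions):
    """Compose local solutions into an n_out x n_in matrix.

    local_solutions[i] = ((j1, j2, j3), (w1, w2, w3)) gives, for original
    channel i, the three reproduction channels and their weights.
    """
    matrix = [[0.0] * len(local_solutions) for _ in range(n_out)]
    for i, (triple, weights) in enumerate(local_solutions):
        for j, w in zip(triple, weights):
            matrix[j][i] += w
    return matrix


def convert_frame(matrix, frame):
    """Apply the conversion matrix to one frame of original channel samples."""
    return [sum(w * s for w, s in zip(row, frame)) for row in matrix]
```

In this sketch the matrix depends only on the loudspeaker directions, so it can be computed once and then applied to every sample frame, consistent with the statement that the solution does not depend on the input signal.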
TABLE III. RESULTS OF AUTOMATIC SELECTION OF THREE LOUDSPEAKER POSITIONS IN REPRODUCTION FIELD WITH RESPECT TO EACH LOUDSPEAKER POSITION IN ORIGINAL FIELD AND RESULTANT WEIGHTING COEFFICIENTS
Fig. 4. Double-blind triple-stimulus with hidden reference method.
TABLE II. SCALES USED FOR SUBJECTIVE EVALUATION
Akio Ando (M’80) received the B.S. and M.S. degrees from Kyushu Institute of Design, Fukuoka, Japan, in 1978 and 1980, respectively, and the Dr.Eng. degree from Toyohashi University of Technology, Toyohashi, Japan, in 2001.
In 1980, he joined the Japan Broadcasting Corporation (NHK). He has been with the NHK Science and Technology Research Laboratories, Tokyo, Japan, since August 1983. He was in charge of developing simultaneous subtitling systems for live broadcast TV programs using speech recognition, with which NHK started simultaneous subtitled broadcasting for daily news programs in March 2000 and sports and variety programs in December 2001. Since 2002, he has been engaged in research on audio and acoustics including acoustic signal processing, electroacoustical transducers, cognitive science of acoustics, and spatial sound reproduction. From 2004 to 2006, he was the Director of the Acoustics and Audio Signal Processing Division, and currently he is a Senior Research Engineer of the Advanced Television Systems Research Division, NHK Science and Technology Research Laboratories. Since 2010, he has been a Guest Professor at the Tokyo Institute of Technology, Japan. His research interests include pattern recognition, signal processing, theoretical acoustics, and multichannel audio coding.