
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 6, AUGUST 2011, p. 1467

Conversion of Multichannel Sound Signal Maintaining Physical Properties of Sound in Reproduced Sound Field
Akio Ando, Member, IEEE

Abstract: In this paper, we describe a new method for converting the signal of the original multichannel sound system into that of an alternative system with a different number of channels while maintaining the physical properties of sound at the listening point in the reproduced sound field. Such a conversion problem can be described by the underdetermined linear equation. To obtain an analytical solution to the equation, the method partitions the sound field of the alternative system on the basis of the positions of three loudspeakers and solves the “local solution” in each subfield. As a result, the alternative system localizes each channel signal of the original sound system at the corresponding loudspeaker position as a phantom source. The composition of the local solutions introduces the “global solution,” that is, the analytical solution to the conversion problem. 22-channel signals of a 22.2 multichannel sound system without the two low-frequency effect channels were converted into 10-, 8-, and 6-channel signals by the method. Subjective evaluations showed that the proposed method could reproduce the spatial impression of the original 22-channel sound with eight loudspeakers.

Index Terms: Acoustic signal processing, sound field reproduction, sound localization, spatial and multichannel audio, three-dimensional sound.

I. INTRODUCTION

BROADCASTING and packaged media have popularized 5.1 multichannel sound as a home sound system. Research on multichannel audio has now shifted to advanced systems with more channels to provide enhanced spatial impressions. For example, a 22.2 multichannel sound system has been developed for ultrahigh-definition television with 4320 scanning lines [1]. Such advanced sound systems require their own loudspeaker arrangement to bring out the best performance. Although loudspeakers can be arranged optimally in a theater, they are difficult to set up in a typical home environment.

“Down-mixing” is a widely known method of reducing the number of channels in multichannel audio. Down-mixing from 5.1 to two-channel stereo or monophonic has already been standardized in an ITU-R Recommendation [2] and is applied to some television receivers. Although such a down-mixing algorithm was reported as having a certain effectiveness [3], it does not work for an arbitrary loudspeaker arrangement.¹ To enable down-mixing between a number of systems, a technology for converting or recreating sound fields is necessary.

A sound field reconstruction method was proposed in the 1960s [4]. It tried to recreate the primary sound field of a large room, such as a concert hall, in a small room (the secondary sound field) based on Huygens’ principle, which can be regarded as an approximation of the Kirchhoff–Helmholtz integral theorem [5]. There are other methods using the Kirchhoff–Helmholtz integral theorem directly [6], [7]. In those methods, it is necessary to enclose the listening area by a boundary surface, on which sound pressure and its gradient are controlled. Another method of sound field reconstruction is wave field synthesis (WFS) [8], [9]. In WFS, the boundary surface enveloping the listening area degenerates to a plane, and then, it uses a Rayleigh I or II integral instead of the Kirchhoff–Helmholtz integral [10]. However, those sound field reconstruction methods require many loudspeakers or a loudspeaker array in order to satisfy their physical theory.

On the other hand, there are some methods that can be used to reproduce the original sound pressure at a listening point, instead of the listening area that has a certain volume. Among them, ambisonics is a widely accepted sound pickup and reproduction method [11], and higher order ambisonics [12], [13] can expand the listening area if a certain number of secondary sources (loudspeakers) are available and uniformly arranged [14]. Ambisonics represents the observed sound pressure by spherical harmonic expansion with a finite number of components up to a given order and tries to reproduce the sound that has the same expansion coefficients through several loudspeakers surrounding the listening point or area. It offers a complete hierarchical approach to sound pickup, transmission and reproduction, which corresponds to mono, stereo, horizontal surround, or three-dimensional sound reproduction. However, it has the problem of the unavailability of higher order sound pickup devices, and even with the specially designed microphone array [15], the expansion in actual ambisonic encoding is still limited up to several orders at present.

All sounds reproduced can be considered binaural because they are heard by both ears of a listener. Thus, the binaural recording with the so-called “dummy head” could bring the spatial sound impression of the original sound field to the listener.

Manuscript received June 29, 2010; revised October 12, 2010 and November 01, 2010; accepted November 02, 2010. Date of publication November 15, 2010; date of current version May 25, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Lauri Savioja.
The author is with NHK Science and Technology Research Laboratories, Tokyo 157-8510, Japan (e-mail: ando.a-io@nhk.or.jp).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TASL.2010.2092429
¹The down-mixing method standardized in [2] uses only three down-mixing coefficients, i.e., 1.0, 0.7071, and 0.0.
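
As a rough illustration of a fixed-coefficient down-mix of the kind quoted for [2], the following Python sketch builds a 5-channel-to-stereo matrix from the three coefficients 1.0, 0.7071, and 0.0. The channel order (L, R, C, Ls, Rs) and the exact placement of the coefficients are assumptions made for illustration and should be checked against the Recommendation itself; the names `DOWNMIX` and `downmix` are hypothetical.

```python
# Assumed 3/2 -> 2/0 down-mix matrix built only from 1.0, 0.7071 and 0.0.
# Assumed input channel order: L, R, C, Ls, Rs (LFE is left out).
DOWNMIX = [
    [1.0, 0.0, 0.7071, 0.7071, 0.0],  # left output
    [0.0, 1.0, 0.7071, 0.0, 0.7071],  # right output
]


def downmix(frame):
    """Apply the fixed matrix to one frame of five input samples."""
    return [sum(c * s for c, s in zip(row, frame)) for row in DOWNMIX]


# A frame that contains only the centre channel is shared equally by both outputs.
left, right = downmix([0.0, 0.0, 1.0, 0.0, 0.0])
```

Because the matrix is fixed, it cannot adapt to an arbitrary loudspeaker arrangement, which is the limitation the text points out.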

1558-7916/$26.00 © 2010 IEEE



Fig. 1. Block diagram of proposed method.

TABLE I
REQUIREMENTS FOR SOUND REPRODUCTION

Moreover, there are some methods to reproduce binaural signals on loudspeakers [16], [17]. However, a mismatch of the head-related transfer function (HRTF) [18] between the dummy head and the listener’s real head degrades the spatial impression of the reproduced sound.

The conventional methods mentioned above aim at recreating the primary sound field, such as that of a live performance in a concert hall, in a different room. Such a problem is difficult and very challenging, and it was once thought to be “impossible” [4]. On the other hand, the problem of recreating the original “reproduced sound field,” such as the sound field in the mixing studio equipped with a multichannel sound system, in another room seems easier to solve than the conventional problem. However, only a few attempts have so far been made regarding this problem, which is the subject of this study. In this paper, the sound field reproduced by the original sound system is referred to as the “original” sound field and the sound field recreated by an alternative system with a different number of channels is referred to as the “reproduction” sound field. The problem of recreating the primary sound field, such as that in a concert hall, is the task of a recording engineer and is not dealt with in this paper. Our problem of recreating the original sound field in another room corresponds to that of finding a good conversion from the sound signal of the original sound system to that of the reproduction sound system.

In such conversion, we aim for the faithful reproductions of 1) timbre, 2) sound localization, and 3) the sound envelopment of the original sound [19]. Such requirements are shown in Table I. Regarding requirement 1), the conversion discussed in this paper should not be a function of frequency, because a frequency-dependent conversion might change the timbre. Among the conventional methods, ambisonics [11]–[13] seems to meet requirements 1) and 3). However, ambisonics cannot prevent the generation of a negative conversion coefficient that brings out an opposite-phase signal that would degrade the localization of the reproduced sound [20]. Consequently, we developed a new conversion method satisfying all three requirements.

In this paper, a new conversion method that is applicable to automatic down-mixing and up-mixing is presented. The basic concept of this method is to solve the problem on the conversion matrix that converts the input signals for loudspeakers in the original sound field into those in the reproduction sound field in such a way that the physical properties of sound coincide at the receiving point (listening point). This problem can be solved analytically and its solution is frequency-independent, indicating that the conversion does not change the timbre.

II. FORMULATION OF SOUND FIELD CONVERSION

The proposed method converts the original signal $\mathbf{x}$ into the converted signal $\mathbf{y}$ through matrix operation:

$$\mathbf{y} = W\mathbf{x} \qquad (1)$$

where

$$\mathbf{x} = (x_1, \ldots, x_N)^T, \quad \mathbf{y} = (y_1, \ldots, y_M)^T, \quad W = \begin{pmatrix} w_{11} & \cdots & w_{1N} \\ \vdots & \ddots & \vdots \\ w_{M1} & \cdots & w_{MN} \end{pmatrix}.$$

The matrix $W$ should be solved such that the physical properties at the receiving point in the original and reproduction sound fields are the same. Fig. 1 shows the block diagram of the proposed method, where $A$ denotes the sound propagation from loudspeakers to the receiving point in the original field and $B$ denotes that in the reproduction field. If a physical property of sound is a linear function of the audio signal, the equation that $W$ should satisfy becomes

$$A = BW \qquad (2)$$

and $W$ could be formally written as

$$W = B^{-1}A.$$

Equation (2) can be solved with numerical algorithms. However, the presented approach tries to solve it analytically.

Because a physical property of sound is a function of location in a three-dimensional space, its dimension is at most three. We select the three-dimensional property of sound in this paper. As

a result, the propagation matrix of the reproduction field, $B$, becomes a $3 \times M$ matrix, where $M$ is the number of loudspeakers in the reproduction field. In such a case, (2) becomes an underdetermined linear equation if $M > 3$, which is the usual situation in a three-dimensional sound field reproduction. To obtain an analytical solution, we divided the matrix $B$ into $3 \times 3$ partial matrices and solved each partial problem. The physical meaning of this division is to partition the reproduction field on the basis of the positions of three loudspeakers. Using the linearity of the physical property of sound again, the conversion matrix is further divided into the problems of conversion from one channel to three channels. The solution to this conversion problem is called the “local solution” in this paper. In contrast, the analytical solution $W$, which is composed of local solutions, is called the “global solution.” The physical meaning of the local solution is to create a phantom source using three loudspeakers in the reproduction field. Since the restriction on $N$ (the number of loudspeakers in the original field) and $M$ is only $M \geq 3$, the method is applicable to both down-mixing and up-mixing.

The following assumptions are made: 1) each loudspeaker can be modeled as a point source; 2) the sound pressure at a unit distance from a loudspeaker is proportional to the input to the loudspeaker (the proportionality coefficient is denoted as $G$); 3) only the outgoing wave from the loudspeaker is considered; and 4) the reflected sound can be neglected in the original and converted fields. Furthermore, 5) $kr \gg 1$ is assumed, where $k$ is the wave number and $r$ is the minimum distance between the loudspeakers and the receiving point. Assumption 5) is valid except for the low-frequency sound that does not contribute to the perception of sound localization.

It is well known that sound can be described by the following two physical properties: sound pressure and particle velocity [21]. Sound pressure is a scalar variable of time and location. On the other hand, particle velocity is generally a three-dimensional variable. Note that both physical properties are linear functions of the source signal. On the basis of the assumptions mentioned above, if a loudspeaker with a given input signal is located at a given position, the Fourier transforms of sound pressure and particle velocity at the receiving point can be written as

(3)

and

(4)

respectively, where $j$ is the imaginary unit, superscript $T$ denotes the transposition of the vector or matrix, and $G$ is the proportionality coefficient stated in the preceding paragraph. We used assumption 5) to derive the last row of (4).

Suppose there are $N$ loudspeakers in the original sound field and $M$ loudspeakers in the reproduction sound field, each at a known location. Let the input signals of the loudspeakers in the original space be $x_n$, $n = 1, \ldots, N$, and those in the reproduction space be $y_m$, $m = 1, \ldots, M$; the Fourier transforms of sound pressure and particle velocity at the receiving point in the original sound field are represented as

(5)

(6)

and those in the reproduction sound field as

(7)

(8)

To obtain an analytical solution in which the physical properties of sound coincide at the receiving point, the proposed method finds a local solution creating a phantom source of an original loudspeaker at the corresponding position in the reproduction field. The phantom source is generated by three loudspeakers, whose directions make a spherical triangle including the source direction. Fig. 2 shows this step of the method. The method then creates a global solution $W$ by summing up such local solutions for all the loudspeakers of the original field based on the linearity of the physical property of sound.

Fig. 2. Generation of phantom source at the position of a loudspeaker in the original field with three loudspeakers in the reproduction field.

III. ANALYTICAL SOLUTION

A. Selection of Three Loudspeakers in Reproduction Field to Generate Phantom Source in Original Field

We shall use polar coordinates originating at the receiving point and represent a position as $(r, \theta, \phi)$, where $r$ is the distance from the origin, $\theta$ is the azimuthal angle, and $\phi$ is the elevation angle. When only the direction is considered, the position is represented as $(\theta, \phi)$ in this paper.

To create a phantom source as shown in Fig. 2, three loudspeakers in the reproduction field should be selected for each loudspeaker position in the original field. Only the loudspeaker direction is considered here. We assume that a loudspeaker in the original field is located at a given direction $(\theta, \phi)$. Three loudspeaker positions

in the reproduction field should be selected, satisfying the following conditions: 1) a spherical triangle formed by the three loudspeaker directions includes the direction of the original loudspeaker, and 2) the spherical area of the triangle is the minimum among those satisfying 1).

Using the notation in Fig. 2, we define three vectors:

(9)

where $\times$ denotes the vector product. If and only if the position of the original loudspeaker is inside the spherical triangle determined by the three loudspeakers, the three vectors have the same direction. Therefore, we can determine that the direction of the original loudspeaker is inside the triangle if

(10)

holds, where $\cdot$ denotes the inner product of two vectors. We used this method to verify condition 1).

B. Local Solution That Generates Phantom Source

Assuming that a loudspeaker is located at $(r, \theta, \phi)$ in the original space and its input signal is given, the Fourier transform of the particle velocity observed at the receiving point is

(11)

where $\rho$ is the density of air and $c$ is the speed of sound. Here, $r$ is the distance between the source and the receiving point, $\theta$ is the azimuthal angle, and $\phi$ is the elevation angle. On the other hand, if there are three loudspeakers in the reproduction space, each with its own input signal and its own location $(r_i, \theta_i, \phi_i)$, $i = 1, 2, 3$, the Fourier transform of the observed particle velocity is

(12)

where we have the equation shown at the bottom of the page.

A virtual source of the original loudspeaker is localized at $(r, \theta, \phi)$ in the reproduction space if

(13)

holds. A local solution can then be obtained as

(14)

Solution (14) guarantees the coincidence of the particle velocities at the receiving point. Now, let us introduce a condition for the coincidence of sound pressures. In the current case, the Fourier transform of the sound pressure at the receiving point due to a loudspeaker at $(r, \theta, \phi)$ is

(15)

and that due to the three loudspeakers at $(r_i, \theta_i, \phi_i)$, $i = 1, 2, 3$, is

(16)

From (15) and (16), the coincidence of sound pressures is obtained when

(17)

is true. Solution (14) does not satisfy condition (17) because substituting (14) into (17) yields an incorrect equation. On the other hand, a new vector

(18)

satisfies condition (17), where

(19)

Although solution (18) does not satisfy condition (13), it guarantees the coincidence of the directions of particle velocities. Therefore, solution (18) gives the same sound pressure and sound direction in both fields.

The obtained solution depends not on the input signal but on the positions of the loudspeakers and receiving point in both fields. Each obtained signal is the input signal multiplied by one element of the solution, $Y_i(\omega) = w_i X(\omega)$, in the frequency domain. The time-domain representation is

$$y_i(t) = w_i x(t) \qquad (20)$$

From (20), it is clear that the conversion is frequency-independent and satisfies requirement 1) stated in Section I.

The problem of creating a phantom source is known as “panning” [22]–[26]. Among the panning methods, vector-based amplitude panning (VBAP) [25], [26] is widely regarded as a promising method of three-dimensional panning. VBAP could be considered a generalization of the “tangent law” [22] to a three-dimensional loudspeaker setup. If all distances between loudspeakers and the receiving point shown in Fig. 2 are the same, local solution (14) yields the same result as VBAP. Thus, solution (14) provides physical underpinnings for the use of VBAP [27].

If we select only particle velocity as the physical property of sound, it is necessary to “normalize” solution (14) to control the loudness of the reproduced sound. Actually such normalization was introduced in VBAP [25]. In the panning problem, there is little restriction on such normalization. However, in the multichannel sound conversion problem considered in this paper, the normalization of each local solution loses the loudness balance among phantom sources and fails to compose the global solution. On the other hand, solution (18) normalizes (14) in a natural way that coincides with the sound pressure. Thus, we adopted solution (18) as a local solution.

C. Global Solution

A global solution ($M \times N$ matrix) can be obtained by putting each element of local solutions into its appropriate portion. Let the loudspeaker of the $n$th channel in the original space be included in the area bounded by the three loudspeakers of the $m_1$th, $m_2$th, and $m_3$th channels in the reproduction field. The elements of the local solution in this case are denoted as $\tilde{w}_{m_1 n}$, $\tilde{w}_{m_2 n}$, and $\tilde{w}_{m_3 n}$. Using this notation, we define $w_{mn}$ as

$$w_{mn} = \begin{cases} \tilde{w}_{mn}, & m \in \{m_1, m_2, m_3\} \\ 0, & \text{otherwise} \end{cases}$$

where $m = 1, \ldots, M$ and $n = 1, \ldots, N$. Accordingly, the matrix $W = (w_{mn})$ becomes the global solution because the sound pressure and particle velocity are linear functions of the input signal.

Instead of sound pressure and particle velocity, we could use sound intensity [28] as the physical property of sound and obtain an alternative solution of (18). Although doing so is valid for the local solution, the global solution cannot be obtained by the simple addition of local solutions, because sound intensity is a quadratic variable of the original signal. This is the reason we did not select sound intensity as the physical property of sound.

IV. SUBJECTIVE EVALUATION

Two subjective experiments were carried out in order to evaluate the proposed method. In the first experiment, a 22.2 sound signal without the two low-frequency effect (LFE) channels was converted into a 10-channel signal. The converted sound was compared with the original sound by subjective evaluation. In the second experiment, a 22-channel signal was converted into 8- and 6-channel signals, and the converted sound was compared with the original sound. Fig. 3 shows the 22.2 multichannel loudspeaker arrangement used in the experiments. All the experiments were carried out in a soundproof room where the reverberation time at 500 Hz was 0.18 s. The distance between the listening point and each loudspeaker was 2 m.

Fig. 3. 22.2 multichannel loudspeaker arrangement without LFE channels.

Eight sound stimuli were obtained from 22.2 multichannel programs exhibited at World Expo 2005 held in Aichi, Japan, and NAB 2006 and 2007 shows held in Las Vegas, NV. Each stimulus was from 10 to 12 s long. The stimuli included musical and sport sounds, birds singing, and the sound of a light breeze.
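
The local and global solutions of Section III can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's exact equations: all loudspeakers are taken to be at the same distance from the receiving point, so that the direction condition reduces to a VBAP-style 3 x 3 linear system and the pressure condition reduces to gains that sum to one; the inside-triangle test is done by checking that no gain is negative; and the minimum-area rule of Section III-A is omitted (the first enclosing triangle is used). All function and variable names are hypothetical.

```python
import math


def unit_vector(direction):
    """(azimuth, elevation) in degrees -> Cartesian unit vector."""
    az, el = math.radians(direction[0]), math.radians(direction[1])
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def local_solution(source, speakers, eps=1e-9):
    """Gains of three loudspeakers that place a phantom source at `source`.

    Returns None when the source direction lies outside the spherical
    triangle spanned by the three loudspeaker directions.
    """
    s = unit_vector(source)
    l1, l2, l3 = (unit_vector(d) for d in speakers)
    d = det3(l1, l2, l3)
    if abs(d) < eps:
        return None  # degenerate triangle
    # Cramer's rule for g1*l1 + g2*l2 + g3*l3 = s (direction coincidence).
    g = (det3(s, l2, l3) / d, det3(l1, s, l3) / d, det3(l1, l2, s) / d)
    if min(g) < -eps:
        return None  # a negative gain means the source is outside the triangle
    total = sum(g)
    # Assumed equal distances: pressure coincidence <=> gains sum to one.
    return tuple(max(x, 0.0) / total for x in g)


def global_solution(original_dirs, reproduction_dirs, triangles):
    """Assemble the M x N conversion matrix from the local solutions."""
    rows, cols = len(reproduction_dirs), len(original_dirs)
    W = [[0.0] * cols for _ in range(rows)]
    for n, src in enumerate(original_dirs):
        for tri in triangles:  # first enclosing triangle (minimum-area rule omitted)
            gains = local_solution(src, [reproduction_dirs[m] for m in tri])
            if gains is not None:
                for m, gain in zip(tri, gains):
                    W[m][n] = gain
                break  # a column stays zero if no triangle encloses the source
    return W


speakers = [(120, 0), (60, 0), (90, 45)]      # reproduction directions
gains = local_solution((90, 0), speakers)      # source midway between the first two
W = global_solution([(90, 0), (60, 0)], speakers, [(0, 1, 2)])
x = [1.0, 2.0]                                 # one time sample per original channel
y = [sum(w * s for w, s in zip(row, x)) for row in W]
```

With these inputs the source at (90, 0) is shared equally by the loudspeakers at (120, 0) and (60, 0), and the same matrix is applied to every time sample, which reflects the frequency-independence noted in the text.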

Fig. 4. Double-blind triple-stimulus with hidden reference method.

TABLE II
SCALES USED FOR SUBJECTIVE EVALUATION

TABLE III
RESULTS OF AUTOMATIC SELECTION OF THREE LOUDSPEAKER POSITIONS IN REPRODUCTION FIELD WITH RESPECT TO EACH LOUDSPEAKER POSITION IN ORIGINAL FIELD AND RESULTANT WEIGHTING COEFFICIENTS

Fig. 5. Ten-channel loudspeaker arrangements in reproduction space.

A. Subjective Evaluation Method


Subjective evaluations used the “double-blind triple-stimulus with hidden reference” method [29], which can be used for the subjective assessment of small impairments in a multichannel sound system. Fig. 4 shows the method. In Fig. 4, stimulus “R” indicates the reference sound and stimuli “A” and “B” the sounds for evaluation. The subject was asked to assess the impairment on A and B compared with R, according to the continuous five-grade impairment scale shown in Table II. Either stimulus A or B was the same as R. The stimulus that is the same as the reference is referred to as the “hidden reference,” and the other stimulus as the “object.” In the experiment, the reference was the original 22-channel sound and the object was the converted sound. The impairment was assessed from the following two viewpoints: sound localization and sound envelopment. Timbre was not assessed in the experiments because in principle it is kept the same by the proposed method [see (20)]. After the experiment, the “difference grade” was calculated for each object by subtracting the grade given to the hidden reference from that given to the object; it should therefore be a nonpositive value.
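
The difference grade described above can be computed as in the following Python sketch. The grades in `trials` are made-up values for illustration only, not data from the experiments, and the function name is hypothetical.

```python
def difference_grade(object_grade, hidden_reference_grade):
    """Grade given to the object minus the grade given to the hidden reference."""
    return object_grade - hidden_reference_grade


# Hypothetical (object, hidden reference) grades on the five-grade scale.
trials = [(4.2, 5.0), (4.6, 4.9), (3.9, 5.0)]
grades = [difference_grade(obj, ref) for obj, ref in trials]
mean_grade = sum(grades) / len(grades)
```

When the subject identifies the hidden reference correctly, each difference grade is zero or negative, and values closer to zero indicate a smaller perceived impairment.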

B. Experiment-1: Conversion From 22-Channel Sound Signal to 10-Channel Sound Signal

The three loudspeaker arrangements shown in Fig. 5 were used to convert the 22-channel signal into a 10-channel signal. Layout 1 had 4 loudspeakers in the top layer, 5 loudspeakers in the middle layer, and 1 loudspeaker in the bottom layer. Layout 2 had 3, 6, and 1 loudspeakers in the top, middle, and bottom layers, and layout 3 had 3, 5, and 2 loudspeakers in the top, middle, and bottom layers, respectively. The result of the automatic selection of three loudspeakers (described in Section III-A) and the resultant weighting coefficients (described in Section III-B) for layout 2 are shown in Table III as an example.

In the experiment, subjects were 38 people in their twenties, thirties, and forties, who were experienced in playing musical instruments. They evaluated each conversion twice. Eight sound stimuli were used in the experiment.

The subjective evaluation results are shown in Fig. 6. P1, P2, and P3 in Fig. 6 show the proposed method results for layouts 1, 2, and 3, respectively. D2 and D3 show the results of the conventional down-mixing algorithm where each original signal was converted with the coefficients of 1.0, 0.5, and 0.0 satisfying the coincidence of sound pressures at the receiving point. Layout 1 had every other loudspeaker of the original 22-channel sound system, and hence, the conversion matrix of the conventional

down-mixing algorithm was almost the same as that of the proposed method for this layout. For example, both methods evenly distributed the sound signal of a channel located in the (90, 0) direction in the original space to two channels located in the (120, 0) and (60, 0) directions in the reproduction space. Because of this, the conventional down-mixing algorithm was not evaluated for layout 1. The proposed method obtained a difference grade of more than −1.0 for both spatial impressions. Thus, even with 10 channels, the proposed method kept the spatial impressions of the original 22-channel sound and gave a better result than the conventional down-mixing algorithm. The difference between these methods was significant at a level of 0.05.

Fig. 6. Results of experiment-1. (a) Sound localization. (b) Sound envelopment. Mean scores and 95% confidence limits. P1: Proposed, layout 1; P2: Proposed, layout 2; P3: Proposed, layout 3; D2: Conventional, layout 2; D3: Conventional, layout 3.

C. Experiment-2: Conversion From 22-Channel Sound Signal to 8- and 6-Channel Sound Signals

In the first experiment, the proposed method obtained a difference grade of more than −1.0 in converting 22-channel sound into 10-channel sound. The question arose as to how many channels in the reproduction space can keep a difference grade of more than −1.0 for both spatial impressions. Hence, we conducted the second experiment in which the original 22-channel sound signal was converted into 8- and 6-channel sound signals. We set up three loudspeaker arrangements for both conversions, as shown in Fig. 7. The subjects were 32 people in their twenties, thirties, and forties, who were experienced in playing musical instruments. They evaluated each conversion twice. Since the number of loudspeaker arrangements in this experiment was twice that in the first experiment, we considered four representative stimuli and used them in this experiment. The other experimental conditions in this experiment were the same as those in the first experiment.

Fig. 7. 8- and 6-channel loudspeaker arrangements in reproduction space. (a) 8-channel loudspeaker arrangements; (b) 6-channel loudspeaker arrangements.

The results are shown in Fig. 8. Regarding layout 1 for the 8- and 6-channel loudspeaker arrangements, the difference between the conventional down-mixing method and the proposed method is not significant at a level of 0.05. The reason for this is likely that both methods distributed the sounds of the front channels of the original 22-channel sound in almost the same manner. For the other layouts, the proposed method yielded better results than the conventional down-mixing method did on both spatial impressions, and the difference was significant at a level of 0.05. The converted 8-channel sound in layouts 1 and 2 gave spatial impressions with a difference grade of more than −1.0 (layout 3 had only a few loudspeakers in the frontal area). On the other hand, the converted 6-channel sound did not maintain a difference grade of more than −1.0.

Fig. 8. Results of experiment-2. (a) Sound localization; (b) sound envelopment. Mean scores and 95% confidence limits. D1: Conventional, layout 1; D2: Conventional, layout 2; D3: Conventional, layout 3; P1: Proposed, layout 1; P2: Proposed, layout 2; P3: Proposed, layout 3.

V. DISCUSSION

The proposed method guarantees the coincidences of pressures and directions of sound only at the receiving point. It can be extended into the method of minimizing the square errors of those sound properties over a receiving area (an infinite set of receiving points) [30]. However, in such a case, the obtained

solution will inevitably be a frequency-dependent function and would not satisfy requirement 1) mentioned in Section I. Although the proposed method ensures the coincidences of the physical properties only at a receiving point, the subjective evaluation results showed that it is effective in a nearby area because both ears would not be on the receiving point at the same time. We shall study the size of the effective listening area reproduced by the proposed method.

The down-mixing method proposed in [2] has coefficients based on the conservation of sound power. On the other hand, the conventional down-mixing method used in this study has coefficients based on the conservation of sound pressure, so as to match the proposed method. The local solution of the proposed method can be naturally made to conserve power by modifying (15). However, in such a case, the global solution cannot be obtained by adding the local solutions for the same reason as in the coincidence of sound intensities.

Notwithstanding the above remarks, the down-mixing method described in this paper performed very well. There would be no reason for these results other than that it guarantees the coincidence of sound pressures. The principal difference between the conventional and proposed methods is the reproducibility of the sound direction. A subjective evaluation of sound materials with a clear sound direction would increase the difference between the two methods.

In this paper, we used subjective evaluation to evaluate the spatial impression of sound. On the other hand, there are objective evaluation methods with interaural time difference (ITD), interaural level difference (ILD), and interaural cross correlation (IACC) [19]. Such methods could extract the spatial impressions adequately with artificial sound sources, such as white noise. However, it is difficult for such methods to evaluate the impressions of a natural sound source that is a good test signal for overall evaluation. Thus, we did not adopt the objective evaluation in this study. Both subjective and objective evaluation methods with various test signals will be further studied in evaluation experiments.

In this paper, the proposed method was evaluated as a down-mixing method. Obviously, it can be used as an up-mixing method. We shall study this issue in due course.

VI. CONCLUSION

In this paper, we proposed a new method for converting multichannel sound signals while maintaining the pressure and direction of sound at the receiving point in the reproduced sound field. We found that the conventional down-mixing

REFERENCES

[1] K. Hamasaki, T. Nishiguchi, R. Okumura, Y. Nakayama, and A. Ando, “A 22.2 multichannel sound system for ultrahigh-definition TV (UHDTV),” SMPTE Motion Imaging J., pp. 40–49, Apr. 2008.
[2] ITU-R Rec. BS.775-2, “Multi-channel stereophonic sound system with and without accompanying picture,” ITU. Geneva, Switzerland, 2006.
[3] S. K. Zielinski, F. Rumsey, and S. Bech, “Effects of down-mix algorithms on quality of surround sound,” J. Audio Eng. Soc., vol. 51, no. 9, pp. 780–798, Sep. 2003.
[4] M. Camras, “Approach to recreating sound field,” J. Acoust. Soc. Amer., vol. 43, no. 6, pp. 1425–1431, Jun. 1968.
[5] M. Born and E. Wolf, Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light. Cambridge, U.K.: Cambridge Univ. Press, 1999.
[6] S. Ise, “A principle of sound field control based on the Kirchhoff–Helmholtz integral equation and the theory of inverse systems,” Acustica, vol. 85, pp. 78–87, 1999.
[7] S. Takane, Y. Suzuki, and T. Sone, “A new method for global sound field reproduction based on Kirchhoff’s integral equation,” Acustica, vol. 85, pp. 250–257, 1999.
[8] A. J. Berkhout, “A holographic approach to acoustic control,” J. Audio Eng. Soc., vol. 36, no. 12, pp. 977–995, Dec. 1988.
[9] A. J. Berkhout, D. de Vries, and P. Vogel, “Acoustic control by wave field synthesis,” J. Acoust. Soc. Amer., vol. 93, no. 5, pp. 2764–2778, May 1993.
[10] D. de Vries, “Wave field synthesis,” in AES Monograph 2009.
[11] M. A. Gerzon, “Periphony: With-height sound reproduction,” J. Audio Eng. Soc., vol. 21, no. 1, pp. 2–10, Jan./Feb. 1973.
[12] R. Nicol and M. Emerit, “3D-sound reproduction over an extensive listening area: A hybrid method derived from holophony and ambisonic,” in Proc. AES 16th Int. Conf., 1999.
[13] D. B. Ward and T. D. Abhayapala, “Reproduction of a plane-wave sound field using an array of loudspeakers,” IEEE Trans. Speech Audio Process., vol. 9, no. 6, pp. 687–707, Sep. 2001.
[14] J. Ahrens and S. Spors, “An analytical approach to sound field reproduction using circular and spherical loudspeaker distributions,” Acta Acustica United With Acustica, vol. 94, pp. 988–999, 2008.
[15] B. Rafaely, “Analysis and design of spherical microphone arrays,” IEEE Trans. Speech Audio Process., vol. 13, no. 1, pp. 135–143, Jan. 2005.
[16] B. B. Bauer, “Stereophonic earphones and binaural loudspeakers,” J. Audio Eng. Soc., vol. 9, no. 2, pp. 148–151, Apr. 1961.
[17] B. S. Atal, M. Hill, and M. R. Schroeder, “Apparent Sound Source Translator,” U.S. Patent 2,236,949, Feb. 1966.
[18] J. Blauert, Spatial Hearing. The Psychophysics of Human Sound Localization. Cambridge, MA: MIT Press, 1997.
[19] F. Rumsey, Spatial Audio. Burlington, MA: Focal Press, Elsevier, 2001.
[20] G. Martin, W. Woszczyk, J. Corey, and R. Quesnel, “Controlling phantom image focus in a multichannel reproduction system,” in Proc. AES 107th Conv. Paper, Sep. 1999.
[21] A. D. Pierce, “Acoustics, an introduction to its physical principles and applications,” Acoust. Soc. Amer., 1989.
[22] B. Bernfeld, “Attempts for better understanding of the directional stereophonic listening mechanism,” in Proc. 44th AES Conv., Feb. 1973.
[23] M. Gerzon, “Panpot laws for multispeaker stereo,” in Proc. 92nd AES Conv., March 1992.
[24] M. Poletti, “The design of encoding functions for stereophonic and polyphonic sound system,” J. Audio Eng. Soc., vol. 44, no. 11, pp. 948–963, Nov. 1996.
[25] V. Pulkki, “Virtual sound source positioning using vector base amplitude panning,” J. Audio Eng. Soc., vol. 45, no. 6, pp. 456–466, Jun. 1997.
[26] V. Pulkki, “Localization of amplitude-panned virtual sources II: Two- and three-dimensional panning,” J. Audio Eng. Soc., vol. 49, no. 9, pp. 753–767, Sep. 2001.
[27] A. Ando and K. Hamasaki, “Sound intensity based three-dimensional
method would be effective if the sound pressure were con- panning,” in Proc. AES 126th Conv., May 2009.
served. Even in such cases, the proposed method performed [28] F. J. Fahy, Sound Intensity, 2nd ed. London, U.K.: E & FN Spon,
better than the conventional method because it can reproduce 1995.
[29] ITU-R Rec. BS.1116-1, Methods for the subjective assessment of small
the sound direction. Subjective evaluations revealed that the impairments in audio systems including multichannel sound systems,
8-channel sound converted by the proposed method gave almost ITU. Geneva, Switzerland, 1997.
[30] A. Ando, “Adaptation of multichannel sound reproduction to restricted
the same spatial impressions as the original 22-channel sound speaker arrangement,” presented at the Proc. 19th Int. Congr. Acoust.
at a receiving point. (ICA2007), Sep. 2007, ELE-01-002.

Akio Ando (M’80) received the B.S. and M.S. degrees from Kyushu Institute of Design, Fukuoka, Japan, in 1978 and 1980, respectively, and the Dr.Eng. degree from Toyohashi University of Technology, Toyohashi, Japan, in 2001.
In 1980, he joined the Japan Broadcasting Corporation (NHK). He has been with the NHK Science and Technology Research Laboratories, Tokyo, Japan, since August 1983. He was in charge of developing simultaneous subtitling systems for live broadcast TV programs using speech recognition, with which NHK started simultaneous subtitled broadcasting for daily news programs in March 2000 and sports and variety programs in December 2001. Since 2002, he has been engaged in research on audio and acoustics including acoustic signal processing, electroacoustical transducers, cognitive science of acoustics, and spatial sound reproduction. From 2004 to 2006, he was the Director of the Acoustics and Audio Signal Processing Division, and currently he is a Senior Research Engineer of the Advanced Television Systems Research Division, NHK Science and Technology Research Laboratories. Since 2010, he has been a Guest Professor at the Tokyo Institute of Technology, Japan. His research interests include pattern recognition, signal processing, theoretical acoustics, and multichannel audio coding.
