This Convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at least
two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This convention paper has been
reproduced from the author's advance manuscript without editing, corrections, or consideration by the Review Board. The AES
takes no responsibility for the contents. This paper is available in the AES E-Library, http://www.aes.org/e-lib. All rights
reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the
Audio Engineering Society.
ABSTRACT
Recently, the new MPEG-H 3D Audio standard has been finalized. It has been designed for delivery of next
generation audio content to the user. In addition to highly efficient immersive audio transmission, MPEG-H 3D
Audio allows new capabilities such as personalization and adaptation of the audio content to different use scenarios.
It also provides an enhanced concept for loudness and dynamic range control (DRC) to adapt the characteristics of
the audio content to the requirements of different playback scenarios and listening conditions. This paper gives a
detailed overview of the loudness control and DRC functionality of MPEG-H 3D Audio. Relevant use cases are
discussed to exemplify the application of the enhanced DRC and loudness management features.
mobile networks) and is consumed on a variety of devices (e.g. AVR, TV set, mobile device) in different environments (e.g. silent living room, noisy public transport).

From the consumer’s point of view, the characteristics of the audio content should fit the individual listening condition and preference irrespective of the origin and distribution channel of the content. As a consequence of the wide variability of listening environments, flexible adaptation of the audio content is required to avoid user annoyance in many cases. To be more specific, here are a few examples that illustrate common problems that may result from the scenario described above:

• The user needs to adjust the playback volume when switching between different distribution channels because the loudness is not consistent.

• Consecutive program items don’t share a common loudness level, e.g. a movie is followed by an annoyingly loud commercial.

• The intelligibility of movie dialog is adversely affected in soft parts due to a noisy listening environment.

• The dynamic range of an asset is too large for the desired playback level: The level of loud parts of a movie is annoyingly high when soft parts are just loud enough; or soft parts are inaudible when loud parts are at a reasonable level.

• The dynamic range of an asset is too large for the employed playback device, e.g. low-quality loudspeakers in portable devices.

• The audio signal clips after downmixing the original immersive content to a lower number of playback loudspeakers.

To meet the requirements of multi-platform environments, a sophisticated control of loudness and dynamics over a large range is necessary. Such a solution is provided in MPEG-H 3DA based on the loudness control and DRC tools of the new MPEG-D Dynamic Range Control standard [2][6]. MPEG-D DRC provides an enhanced feature set and improved quality compared to DRC systems of legacy audio codecs. It defines a comprehensive metadata format (including program loudness information and time-varying gain values) and how it is applied. The metadata is typically generated by the content provider and attached to the content. The audio content is delivered unmodified and the metadata can be applied at the receiver if desired. The content provider has full control of the whole process and can ensure that the DRC metadata produces a high-quality result in all scenarios.

For the integration into MPEG-H 3DA, additional requirements for interactive and immersive audio have been taken into account and corresponding extensions have been added. The loudness control and DRC tool of MPEG-H 3DA provides automatic loudness normalization to a desired target level regardless of the given content format, the loudspeaker configuration for playback, or the user’s selection of a specific presentation of the delivered program. It allows reversible adaptation of the dynamic range of the audio as appropriate for the content type, listening environment, capabilities of the receiving device as well as user preferences.

This paper provides an overview of dynamic range and loudness control as specified in the MPEG-H 3DA standard. First, a brief summary of the MPEG-H 3DA decoder is presented in Section 2. The different functional blocks that are related to DRC and loudness processing are discussed in more detail in Section 3. Practical examples for configuring the loudness/DRC system and how to control it are given in Section 4, where both content production side and receiving side are considered.

2. OVERVIEW OF MPEG-H 3D AUDIO

In this section, a brief overview of the MPEG-H 3DA system is presented. Figure 1 shows a block diagram illustrating the main functional blocks of the MPEG-H 3DA decoder.

As a first step, all transmitted audio signals, in either channel, object or HOA format, are decoded by the core decoder stage. Channel signals are mapped to the target reproduction loudspeaker setup using a format converter. For object-based representations, the audio signals for each object are rendered to the target reproduction loudspeaker setup by the object renderer using the associated object metadata. Alternatively, signals coded via extended Spatial Audio Object Coding (SAOC-3D) [3], i.e. parametrically coded channel signals and audio objects, are rendered to the
[Figure 1: Block diagram of the MPEG-H 3D Audio decoder. Block labels: MPEG-H 3D Audio bitstream in, Core Decoder, Channel Content, Object Content, SAOC 3D Content, HOA Content, DRC, Format Converter, Object Renderer, SAOC 3D Renderer, HOA Renderer, Mixer, DRC, LN, Peak Limiter, audio out.]
Figure 2 - Legend for loudness and DRC related illustrations (axis: loudness [LKFS]; marker: loudness range). Loudness is measured in LKFS according to [8].

More advanced loudness related parameters including loudness range, maximum short-term and maximum momentary loudness (see ITU-R BS.1771-1 [9], EBU R-128 [10][11]), as well as signal peak information can be optionally transmitted in the MPEG-H 3DA metadata if desired. A detailed overview of supported descriptors is provided in [7].
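To illustrate how transmitted program loudness and signal peak information could be combined at a receiver, the following minimal Python sketch computes a loudness normalization gain and checks whether the normalized signal would exceed full scale. It is not the normative MPEG-D DRC procedure: the class `LoudnessInfo`, its field names and the numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoudnessInfo:
    """Hypothetical container for loudness metadata attached to a program."""
    program_loudness_lkfs: float              # integrated program loudness
    sample_peak_dbfs: Optional[float] = None  # optional signal peak information


def normalization_gain_db(info: LoudnessInfo, target_lkfs: float) -> float:
    """Gain in dB that shifts the program loudness to the target loudness."""
    return target_lkfs - info.program_loudness_lkfs


def predicted_peak_dbfs(info: LoudnessInfo, target_lkfs: float) -> Optional[float]:
    """Peak level after normalization, or None if no peak metadata is present."""
    if info.sample_peak_dbfs is None:
        return None
    return info.sample_peak_dbfs + normalization_gain_db(info, target_lkfs)


def may_clip(info: LoudnessInfo, target_lkfs: float) -> bool:
    """True if the normalized peak would exceed 0 dBFS (full scale)."""
    peak = predicted_peak_dbfs(info, target_lkfs)
    return peak is not None and peak > 0.0


if __name__ == "__main__":
    # Illustrative values only: a program at -27 LKFS with a -4 dBFS peak.
    info = LoudnessInfo(program_loudness_lkfs=-27.0, sample_peak_dbfs=-4.0)
    for target in (-24.0, -16.0):
        print(target, normalization_gain_db(info, target), may_clip(info, target))
```

With these assumed values, normalizing to a high target loudness pushes the peak above full scale, which is the kind of situation in which dynamic range compression or a peak limiter becomes relevant.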
[Figure: loudness [LKFS] axis showing DRC set 1, DRC set 2 and DRC set 3; labels: playback on TV speakers, mobile playback, target loudness at decoder; tick value -24.]

There are several processing steps within the MPEG-H 3DA decoding that can potentially lead to clipping of signal peaks. The most important examples for such processing steps are loudness normalization to high [...]

[...] gains, where the level of certain objects or channel groups is increased.

[...] clipping, the DRC and loudness control tool of MPEG-H 3DA includes an optional peak limiter at the decoder. As shown in Figure 1, the limiting is performed at the very end of the decoder processing chain.

Figure 8 illustrates the signal of a narration object and the corresponding ducking gain.

[Figure 8: narration signal and corresponding ducking gain over time.]

[Table header: Receiver type | Target loudness | Dynamic range; rows not recoverable.]
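The receiver-side choice of a DRC set by target loudness and optional effect type can be sketched as follows. This is a simplified illustration, not the selection algorithm of the standard: the `DrcSet` structure, the set names, the target level ranges and the fallback to the first matching set are all assumptions. The targets of -24 LKFS and -16 LKFS are used only as example values for TV and mobile playback.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class DrcSet:
    """Hypothetical description of one DRC set from the stream metadata."""
    name: str
    target_min_lkfs: float  # lower bound of the associated target level range
    target_max_lkfs: float  # upper bound of the associated target level range
    effects: FrozenSet[str] = frozenset()  # effect types, e.g. {"night"}


def select_drc_set(
    sets: Sequence[DrcSet],
    target_lkfs: float,
    requested_effect: Optional[str] = None,
) -> Optional[DrcSet]:
    """Pick a DRC set whose target level range contains the decoder target.

    If an effect is requested, a matching set with that effect is preferred.
    Otherwise (assumption) the first matching set in stream order is used.
    """
    candidates = [
        s for s in sets
        if s.target_min_lkfs <= target_lkfs <= s.target_max_lkfs
    ]
    if not candidates:
        return None
    if requested_effect is not None:
        for s in candidates:
            if requested_effect in s.effects:
                return s
    return candidates[0]


# Assumed example metadata; the ranges are illustrative only.
EXAMPLE_SETS = [
    DrcSet("tv", -27.0, -20.0),
    DrcSet("tv_night", -27.0, -20.0, frozenset({"night"})),
    DrcSet("mobile", -19.0, -10.0),
    DrcSet("mobile_noisy", -19.0, -10.0, frozenset({"noisy"})),
]

if __name__ == "__main__":
    print(select_drc_set(EXAMPLE_SETS, -24.0).name)
    print(select_drc_set(EXAMPLE_SETS, -16.0, "noisy").name)
```

Filtering by target level range first and applying the effect preference second keeps the loudness target as the primary criterion; a real decoder may weigh these criteria differently.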
also fully controlled at the encoder, the audio signal to be expected at the output of the decoder can be evaluated already at the encoder. Potential clipping can therefore be identified in advance at the encoder and be anticipated appropriately.

4.2. Receiving Side

In the following we discuss the use cases considered in the previous section from the perspective of the receiving side. In general, the DRC tool of the MPEG-H 3DA decoder automatically selects the best-suited DRC configuration included in the metadata of the audio stream. For this, it takes into account information on the reproduction configuration (e.g. target loudness, loudspeaker configuration) and potential additional user input (selection of a presentation of a program, special effect types).

When the user receives the MPEG-H 3DA stream on a TV, the target loudness at the decoder will be set to, e.g., -24 LKFS. The selection process of the DRC tool automatically picks the DRC configuration from the stream that includes this particular value in its associated target level range. Some receiving devices may offer a control interface that allows the user to select a specific DRC effect, e.g. for listening late at night or in noisy environments. In this case, the selection process of the DRC tool will preferably choose a DRC set that has the corresponding effect type defined in its configuration data.

When receiving an MPEG-H 3DA stream on a mobile device, a procedure similar to the one described above for the TV applies: The increased playback target loudness of, e.g., -16 LKFS is taken into account by the selection process to choose the appropriate DRC configuration. User input or other control mechanisms on the mobile device may lead to another DRC configuration that, e.g., was specially optimized to provide an improved listening experience in noisy environments.

Additional aspects of controlling DRC and loudness related processing result from application scenarios in which a device only receives the MPEG-H 3DA stream for decoding, while the actual playback of the rendered audio content is done on a different device. For example, a tablet can be used to receive and decode movie content from a media streaming service. The HDMI output of the tablet is connected to an AVR such that the audio is played back over a high-quality multi-channel loudspeaker system. In this scenario, the tablet should be considered as an AVR rather than a mobile device when configuring the DRC and loudness processing in the MPEG-H 3DA decoder. Such decoding behavior can easily be achieved if the receiving device is aware of its currently active audio output channel. Then, different values for the playback target loudness can be used to perform loudness normalization and to select the DRC set that is appropriate for the given output configuration used for playback. On a tablet, a practical approach is to trigger different configurations depending on whether the internal loudspeakers, headphones or the HDMI output is in use.

5. SUMMARY

In this paper a technical overview of the comprehensive loudness and DRC concept in MPEG-H 3D Audio has been provided. The different functional blocks and their integration into the MPEG-H 3D Audio system have been presented. The practical application of the loudness and DRC features has been discussed for relevant use cases together with illustrative examples for corresponding system configuration and control of the loudness and DRC tools.

6. REFERENCES

[1] ISO/IEC, “Information technology -- High efficiency coding and media delivery in heterogeneous environments -- Part 3: 3D Audio”, International Standard ISO/IEC 23008-3:2015.

[2] ISO/IEC, “Information technology -- MPEG audio technologies -- Part 4: Dynamic Range Control”, International Standard ISO/IEC 23003-4:2015.

[3] ISO/IEC, “Information technology -- MPEG audio technologies -- Part 2: Spatial Audio Object Coding”, International Standard ISO/IEC 23003-2:2010.

[4] Herre, J. et al., “MPEG-H Audio - The Upcoming Standard for Universal Spatial / 3D Audio Coding”, International Conference on Spatial Audio (ICSA), 2014, Erlangen, Germany.

[5] Herre, J. et al., “MPEG-H Audio - The New Standard for Universal Spatial / 3D Audio Coding”, 137th AES Convention, Los Angeles, USA, 2014.

[6] ISO/IEC JTC1/SC29/WG11 N15071, “White Paper on MPEG-D Dynamic Range Control”, ISO/IEC