
Audio Engineering Society

Convention Paper 9465


Presented at the 139th Convention
2015 October 29–November 1 New York, USA

This Convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at least
two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This convention paper has been
reproduced from the author's advance manuscript without editing, corrections, or consideration by the Review Board. The AES
takes no responsibility for the contents. This paper is available in the AES E-Library, http://www.aes.org/e-lib. All rights
reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the
Audio Engineering Society.

Dynamic Range and Loudness Control in MPEG-H 3D Audio

Fabian Kuech¹, Michael Kratschmer¹, Bernhard Neugebauer¹, Michael Meier¹ and Frank Baumgarte²

¹ Fraunhofer IIS, Erlangen, Germany
² Apple Inc., Cupertino, USA

ABSTRACT

Recently, the new MPEG-H 3D Audio standard has been finalized. It has been designed for delivery of next
generation audio content to the user. In addition to highly efficient immersive audio transmission, MPEG-H 3D
Audio allows new capabilities such as personalization and adaptation of the audio content to different use scenarios.
It also provides an enhanced concept for loudness and dynamic range control (DRC) to adapt the characteristics of
the audio content to the requirements of different playback scenarios and listening conditions. This paper gives a
detailed overview of the loudness control and DRC functionality of MPEG-H 3D Audio. Relevant use cases are
discussed to exemplify the application of the enhanced DRC and loudness management features.

1. INTRODUCTION

The new MPEG-H 3D Audio standard [1] has recently been finalized. It allows a highly efficient transmission of immersive audio content in three different formats, channel-based, object-based and scene-based using Higher Order Ambisonics (HOA). It has been designed to offer new capabilities such as user interaction for personalization and adaptation of the audio content for different use scenarios. Additionally, MPEG-H 3D Audio (MPEG-H 3DA) offers a flexible and comprehensive concept for loudness control and DRC to adapt the characteristics of the audio content as appropriate for the content type, listening environment, capabilities of the receiving device as well as user preferences.

The traditional TV broadcast environment uses a well-defined end-to-end solution to deliver audio content to the end user. Accordingly, it has been a good compromise to define a particular target loudness and dynamic range for this specific delivery channel and well-known type of sound reproduction system of the receiving device.

However, new types of delivery platforms and infrastructures have become significant and are constantly evolving. In a multi-platform environment, the same content is delivered through different distribution networks (e.g. broadcast, broadband and
Kuech et al. DRC and Loudness Control in MPEG-H 3D Audio

mobile networks) and is consumed on a variety of devices (e.g. AVR, TV set, mobile device) in different environments (e.g. silent living room, noisy public transport).

From the consumer’s point of view, the characteristics of the audio content should fit the individual listening condition and preference irrespective of the origin and distribution channel of the content. As a consequence of the wide variability of listening environments, flexible adaptation of the audio content is required to avoid user annoyance in many cases. To be more specific, here are a few examples that illustrate common problems that may result from the scenario described above:

• The user needs to adjust the playback volume when switching between different distribution channels because the loudness is not consistent.

• Consecutive program items don’t share a common loudness level, e.g. a movie is followed by an annoyingly loud commercial.

• The intelligibility of movie dialog is adversely affected in soft parts due to a noisy listening environment.

• The dynamic range of an asset is too large for the desired playback level: The level of loud parts of a movie is annoyingly high when soft parts are just loud enough; or soft parts are inaudible when loud parts are at a reasonable level.

• The dynamic range of an asset is too large for the employed playback device, e.g. low-quality loudspeakers in portable devices.

• The audio signal clips after downmixing the original immersive content to a lower number of playback loudspeakers.

To meet the requirements of multi-platform environments, a sophisticated control of loudness and dynamics over a large range is necessary. Such a solution is provided in MPEG-H 3DA based on the loudness control and DRC tools of the new MPEG-D Dynamic Range Control standard [2][6]. MPEG-D DRC provides an enhanced feature set and improved quality compared to DRC systems of legacy audio codecs. It defines a comprehensive metadata format, including program loudness information and time-varying gain values, and how it is applied. The metadata is typically generated by the content provider and attached to the content. The audio content is delivered unmodified and the metadata can be applied at the receiver if desired. The content provider has full control of the whole process and can ensure that the DRC metadata produces a high-quality result in all scenarios.

For the integration into MPEG-H 3DA, additional requirements for interactive and immersive audio have been taken into account and corresponding extensions have been added. The loudness control and DRC tool of MPEG-H 3DA provides automatic loudness normalization to a desired target level regardless of the given content format, the loudspeaker configuration for playback, or the user’s selection of a specific presentation of the delivered program. It allows reversible adaptation of the dynamic range of the audio as appropriate for the content type, listening environment, capabilities of the receiving device as well as user preferences.

This paper provides an overview of dynamic range and loudness control as specified in the MPEG-H 3DA standard. First, a brief summary of the MPEG-H 3DA decoder is presented in Section 2. The different functional blocks that are related to DRC and loudness processing are discussed in more detail in Section 3. Practical examples for configuring the loudness/DRC system and how to control it are given in Section 4, where both content production side and receiving side are considered.

2. OVERVIEW OF MPEG-H 3D AUDIO

In this section, a brief overview of the MPEG-H 3DA system is presented. Figure 1 shows a block diagram illustrating the main functional blocks of the MPEG-H 3DA decoder.

As a first step, all transmitted audio signals, in either channel, object or HOA format, are decoded by the core decoder stage. Channel signals are mapped to the target reproduction loudspeaker setup using a format converter. For object-based representations, the audio signals for each object are rendered to the target reproduction loudspeaker setup by the object renderer using the associated object metadata. Alternatively, signals coded via extended Spatial Audio Object Coding (SAOC-3D) [3], i.e. parametrically coded channel signals and audio objects, are rendered to the

AES 139th Convention, New York, USA, 2015 October 29–November 1



target reproduction loudspeaker setup using the associated metadata. HOA content is rendered to the target reproduction loudspeaker setup using the associated HOA metadata.

[Figure 1: block diagram. MPEG-H 3D Audio bitstream in, Core Decoder; channel content with DRC and Format Converter; object content with DRC and Object Renderer; SAOC 3D content with SAOC 3D Renderer and DRC; HOA content with HOA Renderer and DRC; then Mixer, DRC, LN, Peak Limiter, audio out.]

Figure 1 - Top level block diagram of MPEG-H 3D Audio decoder

After decoding and rendering in the individual rendering blocks, the final output signal is created in the mixing stage. The mixer generates the final output signal for playback based on the output signals of each channel-, object-, and HOA-related rendering block.

Note that the mixer is not simply superimposing its different input signals, but it also takes into account additional control information on how to create the output signal of the final audio scene. This control information can either be transmitted within the MPEG-H 3DA metadata or may result from user interaction with the content. This feature is especially useful to support personalization of broadcast programs, where different presentations of the same program can be provided within the same MPEG-H 3DA stream. The user can then select between different presentations, e.g. corresponding to different dialog languages in movies or alternative commentary tracks in sports broadcasts (e.g. “home vs. away team”). The user can additionally select optional audio overlays for video description or other voice-over scenarios. It is also possible to adjust the level of a dialog object compared to music and effects in order to, e.g. increase speech intelligibility.

Figure 1 also shows the different processing blocks related to DRC and loudness control. The first set of DRC blocks is located directly after the core decoding of the audio signals in order to allow for DRC processing of each channel and object individually. In case of HOA content formats, the DRC processing is integrated into the HOA renderer. For parametric representations based on SAOC, the DRC processing of individual channels and objects is performed in the corresponding rendering block, too. A second DRC stage is located directly after the mixer. It is used to control the dynamic range of the entire audio scene rendered to the target loudspeaker setup. The post-mixing DRC block can also be used to apply optimized processing for specific loudspeaker setups. After rendering, potential format conversion, mixing and DRC processing, the loudness normalization block (LN) ensures that the signal is played out at the pre-defined loudness level. The processing chain is completed by an optional peak limiter that prevents the audio signal from potential clipping that may be caused by signal amplifications due to downmixing and rendering, user interaction or loudness normalization to high target levels.

A more detailed discussion of the MPEG-H 3DA system can be found in [4][5].

3. FUNCTIONALITY

The dynamic range and loudness control tool included in the MPEG-H 3D Audio standard supports audio processing in the following main categories.

• Loudness control of an entire audio program or presentation.
• Dynamic range control within an audio program.
• Peak and clipping control.
• Ducking for voice-over applications.
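As a rough illustration of how the categories listed above combine, the following Python sketch applies the post-mixer part of the chain from Section 2 in order: time-varying DRC gains, a time-invariant loudness normalization gain, and a peak limiter. It is a simplified sketch under stated assumptions and not the processing defined in [1] or [2]: the function names, the per-sample gain list and the sample values are hypothetical, and the "limiter" is reduced to one static gain for the whole block.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def post_mixer_chain(mix, drc_gains_db, normalization_gain_db, ceiling=1.0):
    """Apply DRC gains, loudness normalization and a crude limiter, in that order.

    mix                   : mixed output samples, full scale = 1.0 (hypothetical data)
    drc_gains_db          : one DRC gain in dB per sample, as decoded from metadata
    normalization_gain_db : time-invariant loudness normalization gain in dB
    ceiling               : maximum allowed absolute sample value
    """
    if len(mix) != len(drc_gains_db):
        raise ValueError("one DRC gain per sample is expected in this sketch")

    # 1) Dynamic range control: time-varying gain per sample.
    after_drc = [x * db_to_linear(g) for x, g in zip(mix, drc_gains_db)]

    # 2) Loudness normalization: one constant gain for the whole item.
    norm = db_to_linear(normalization_gain_db)
    after_ln = [x * norm for x in after_drc]

    # 3) Peak control, simplified: scale the block down if its peak exceeds
    #    the ceiling. A real limiter would use smoothed, time-varying gains.
    peak = max((abs(x) for x in after_ln), default=0.0)
    limiter_gain = min(1.0, ceiling / peak) if peak > 0.0 else 1.0
    return [x * limiter_gain for x in after_ln]


mix = [0.05, 0.5, -0.9, 0.1]
drc_gains_db = [0.0, -3.0, -6.0, 0.0]   # compress the loud samples
out = post_mixer_chain(mix, drc_gains_db, normalization_gain_db=8.0)
```

In this example the 8 dB normalization gain would push the third sample above full scale even after 6 dB of DRC attenuation, so the limiter stage reduces the block level. This mirrors the observation in the text that high target levels typically call for compression and peak limiting together.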


Typically a combination of these different processing blocks is required to achieve a desired adaptation of the audio characteristics. For example, to normalize an audio asset to a high target loudness level, dynamic range compression is needed to provide the required headroom and additional peak limiting should be performed to avoid clipping distortions.

The specific features provided by MPEG-H 3DA for each of the functional categories mentioned above are discussed in this section. Whenever necessary, all functions work together to produce the most suitable output for a given playback scenario.

3.1. Loudness Control

The MPEG-H 3DA system can automatically normalize and control the loudness of the reproduced audio content at the decoder. The system supports mandatory loudness information for the program that is included in the metadata of the MPEG-H 3DA stream. Various loudness measurement systems (e.g. ITU-R BS.1770-3 [8], EBU R-128 [10], ATSC A/85 [13]) are supported in order to fulfill applicable broadcast regulations and recommendations. It is possible to specify whether a loudness descriptor relates to the loudness of the full program or whether it refers to a specific anchor element of the program such as the dialog or commentary.

The general concept of loudness normalization is illustrated in Figure 3, where the meaning of the different bars is shown in Figure 2. The thick horizontal line in Figure 2 corresponds to the measured loudness of the considered audio asset. The blue bar represents the loudness range [11] and the top of the red bar illustrates the peak level [8] of the audio asset. It is important to note that the loudness range of an audio asset is closely related to its dynamic range, i.e. a large loudness range indicates high dynamic range audio, and consequently, a small loudness range implies a reduced dynamic range of the audio asset.

[Figure 2: legend with the markers "peak", "loudness" and "loudness range" on a vertical axis labeled loudness [LKFS].]

Figure 2 - Legend for loudness and DRC related illustrations. Loudness is measured in LKFS according to [8].

[Figure 3: two panels, "playback without loudness normalization" (top) and "playback with loudness normalization" (bottom), each showing the items commercial, sports and movie on a loudness axis with the target loudness at decoder marked.]

Figure 3 - Example of loudness normalization for three program items having different loudness.

Figure 3 shows three example items of a broadcast program. The top of Figure 3 illustrates the case of unprocessed audio assets. None of the three different program items matches the target loudness level at the decoder, where the commercial and the sports program are too loud and the movie’s loudness is too low compared to the desired target level. After loudness normalization has been applied, as illustrated at the bottom of Figure 3, the playback loudness of all three program items is the same and matches the desired target level at the decoder. Note that the loudness range and the signal peak relative to the program loudness remain unchanged after loudness normalization, since loudness normalization is performed based on a time-invariant normalization gain derived from the difference of the loudness of the original program item and the target loudness level for playback at the decoder.

More advanced loudness related parameters including loudness range, maximum short-term and maximum momentary loudness (see ITU-R BS.1771-1 [9], EBU R-128 [10][11]), as well as signal peak information can be optionally transmitted in the MPEG-H 3DA metadata if desired. A detailed overview of supported descriptors is provided in [7].
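The normalization described for Figure 3 amounts to one subtraction per program item. The sketch below works through it for three items; it is only an illustration, and the loudness, peak and loudness range values as well as the function and field names are hypothetical rather than taken from the paper or from [1].

```python
def normalization_gain_db(program_loudness_lkfs, target_loudness_lkfs):
    """Time-invariant gain (dB) that moves the program loudness to the target."""
    return target_loudness_lkfs - program_loudness_lkfs


def normalize(item, target_loudness_lkfs):
    """Return the item's descriptors after applying the normalization gain.

    A constant gain shifts loudness and peak by the same amount, so the peak
    relative to the program loudness and the loudness range stay unchanged.
    """
    gain = normalization_gain_db(item["loudness"], target_loudness_lkfs)
    return {
        "gain": gain,
        "loudness": item["loudness"] + gain,
        "peak": item["peak"] + gain,
        "range": item["range"],
    }


# Hypothetical metadata: program loudness (LKFS), peak (dBFS), loudness range (LU).
items = {
    "commercial": {"loudness": -18.0, "peak": -1.0, "range": 6.0},
    "sports": {"loudness": -21.0, "peak": -2.0, "range": 10.0},
    "movie": {"loudness": -27.0, "peak": -3.0, "range": 20.0},
}

TARGET_LKFS = -24.0
normalized = {name: normalize(item, TARGET_LKFS) for name, item in items.items()}
```

With these assumed numbers the commercial and the sports item are attenuated while the movie is raised by 3 dB, which moves its peak up to 0 dBFS. This is the kind of case in which additional compression or peak limiting becomes relevant.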


DRC processing or downmixing for format conversion may potentially change the loudness of the original audio content. Loudness information of the program after applying a specific DRC or performing format conversion can be additionally included in the metadata to compensate for any related loudness variation.

If the MPEG-H 3DA stream contains multiple presentations of the same program, loudness information is provided for each of them separately at the encoder. This enables immediate and automated loudness control for interactive and personalized audio reproduction. For example, when the user switches between different presentations, the loudness normalization gain is instantaneously adjusted.

As mentioned in Section 2, MPEG-H 3DA allows users to interact and control the rendering of individual objects or signal groups. For example, the level of a dialog or commentary object within the program mix can be changed if a corresponding control interface is provided to the user at the receiving device. However, when increasing the level of the dialog object, the overall loudness of the resulting mix will also be increased compared to the original presentation. This behavior interferes with the requirement of consistent loudness. Therefore, the loudness control concept in MPEG-H 3DA includes a tool to compensate for loudness variations due to user interaction with rendering gains. The method is based on metadata included in the audio stream that provides the measured loudness for each signal group or object that is part of the program mix. From these individual loudness values, a compensation gain can be determined after any gain interaction by the user, which is applied in the loudness normalization block together with the loudness normalization gain. The loudness compensation concept is illustrated in Figure 4 for the example of a program consisting of a dialog object and a channel bed.

[Figure 4: three stacked bars labeled "original loudness", "increased dialog level" and "after loudness compensation", each split into a dialog portion and a channel bed portion.]

Figure 4 - Illustration of the loudness compensation concept for user interaction with the dialog object gain.

On the left hand side of Figure 4 the loudness of the original presentation is shown, where the different color bars correspond to the loudness portion of the dialog object and the channel bed. In the center, the loudness distribution is shown if the level of the dialog is increased and no loudness compensation is applied. Obviously, increasing the level of a dialog object also results in an increase of the overall loudness of the full program mix. The right-hand side depicts the loudness distribution within the mix after applying the loudness compensation gain to restore the desired target level. As can be seen, the relative level of the dialog object to the channel bed in the mix is increased as desired, where this effect is partly achieved by boosting the dialog, but also by attenuating the channel bed signals when applying the loudness compensation gain.

It should be noted that the loudness compensation concept described above is not included in [1] but is part of a proposed amendment to [1] that is currently being standardized in the MPEG Audio Subgroup.

3.2. Dynamic Range Control

Traditionally, dynamic range control is associated with compression, i.e. reduction of the dynamic range by means of a dynamic range compressor. This results in softer segments of an audio asset being louder and/or loud segments being softer. A simple dynamic range compressor generates time-varying gain values that are applied to the audio signal to achieve the desired compression effect.

Instead of applying the gain values immediately, MPEG-H 3DA employs a reversible approach: The DRC gain values are provided as metadata accompanying the audio content such that they can be applied at the receiver if desired. If no DRC processing is required, the audio remains unchanged and is played back without any modifications. The DRC configuration data is fully controlled at the encoder and consists of static information describing the different DRC configurations included in the DRC metadata and dynamic data representing encoded DRC gain sequences. The temporal resolution of the encoded DRC gains can be chosen to be as fine as 1 ms and thus allows for transmitted DRC sequences meeting requirements of professional dynamic range control. The efficient gain coding schemes offered by the DRC tool avoid undesirable high bitrate overhead for the dynamic DRC metadata.

There are various parameters specified in [1] and [2] that can be used to control the DRC processing, where important examples include:

• Compressor configurations (including DRC characteristics) that are used to compute the DRC gain sequences.

• The target level range for which the DRC configuration is optimized. This relates to the appropriate playback target levels commonly used for different reproduction devices and scenarios (e.g. AVR, TV, mobile device).

• The DRC effect types provided by a certain DRC configuration. They serve a wide range of scenarios such as Late Night Listening, Noisy Environment, Limited Playback Range, Dialog Enhancement, or General Compression.

• The frequency band configuration for flexible multiband DRC processing.

• The target loudspeaker configuration: Unique identifiers define whether a DRC configuration is designed for processing the original content format or for playback with a specific loudspeaker setup (e.g. optimized for stereo playback).

• Presentation identifiers can be used to indicate that a DRC configuration is assigned to a specific presentation of an audio scene.

The configuration and gains of several independent compressors can be included in the DRC metadata to achieve optimized compression effects for various playback scenarios. The DRC system allows specifying separate DRC configurations for single channels or groups of channels, individual objects or groups of objects, HOA content or different pre-defined presentations of the program. At the decoder, the appropriate DRC configuration is automatically selected considering the given target loudness level for playback, the desired DRC effect type, the reproduction setup, the selected presentation of a program or other user input.

[Figure 5: level axis from 0 dB FS downwards with the target levels -16 (Mobile), -24 (TV) and -31 (AVR).]

Figure 5 - Typical relations of target level and dynamic range for different receiver types.

It is common practice to use different compressors for different playback target levels. While for low playback target levels only moderate or even no compression is desired, high target levels imply the need for a considerable amount of signal compression. Typical relations of target level and dynamic range for different receiver types are illustrated in Figure 5. It shows practical examples for three different target levels, representing typical choices as used for playback over AVRs, TVs, and mobile devices.

Consequently, a specific target level range is defined for each DRC configuration to identify the most suitable one for a given playback system. Figure 6 illustrates an example of three different sets of DRC configurations that are designed for specific receiver types (AVR, TV, mobile device) and that declare corresponding target level ranges.

The DRC effect types supported by MPEG-H 3DA cover a wide range of use cases and include: Late Night Listening, Noisy Environment, Limited Playback Range (as for the internal loudspeakers of mobile devices), Dialog Enhancement, and General Compression. Figure 7 illustrates a comparison of regular playback on TV with watching a movie late at night. For the latter case, loud parts of the movie should be attenuated, e.g. to avoid disturbance of family members, while soft parts of the dialog should still be intelligible at low listening levels. This results in a reduced dynamic range compared to the audio signal after processing with the default DRC configuration for a high-quality receiver type.
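The decoder-side choice between DRC configurations can be pictured as a lookup over declared target level ranges and effect types. The following Python sketch is a deliberately simplified illustration of that idea and is not the selection process specified in [1] or [2]: the `DrcSet` structure, the range boundaries and the fallback order are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrcSet:
    """Hypothetical, reduced description of one DRC configuration."""
    name: str
    target_min: float          # lower end of the target level range (LKFS)
    target_max: float          # upper end of the target level range (LKFS)
    effect_types: frozenset    # effect types this configuration provides


def select_drc_set(drc_sets, target_loudness, requested_effect=None):
    """Pick a DRC set whose target level range contains the target loudness.

    A set offering the requested effect type is preferred. Otherwise the
    first matching "General Compression" set is used, then any matching set.
    Returns None if no set covers the target loudness.
    """
    candidates = [
        s for s in drc_sets if s.target_min <= target_loudness <= s.target_max
    ]
    if requested_effect is not None:
        for s in candidates:
            if requested_effect in s.effect_types:
                return s
    for s in candidates:
        if "General Compression" in s.effect_types:
            return s
    return candidates[0] if candidates else None


# Assumed range boundaries, loosely following the AVR/TV/mobile example.
drc_sets = [
    DrcSet("DRC set 1 (AVR)", -31.0, -27.5, frozenset({"General Compression"})),
    DrcSet("DRC set 2 (TV)", -27.0, -20.5, frozenset({"General Compression"})),
    DrcSet("DRC set 3 (mobile)", -20.0, -10.0, frozenset({"General Compression"})),
    DrcSet("TV late night", -27.0, -20.5, frozenset({"Late Night Listening"})),
]

tv_default = select_drc_set(drc_sets, -24.0)
tv_late_night = select_drc_set(drc_sets, -24.0, "Late Night Listening")
mobile_default = select_drc_set(drc_sets, -16.0)
```

In this sketch a request for an effect type that no matching set provides falls back to general compression. That fallback is a design choice of the example and makes no claim about the normative behavior.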


[Figure 6: loudness [LKFS] axis with 0 at the top, showing the target level ranges of DRC set 1 (high-quality playback on AVR), DRC set 2 (playback on TV speakers) and DRC set 3 (mobile playback), and the target loudness at decoder as a dashed line.]

Figure 6 - Illustration of the target level range information included for different sets of DRCs. The dashed line indicates the desired target loudness for this example.

[Figure 7: two cases, "watching TV" and "TV late at night", both with a target level of -24 below 0 dB FS.]

Figure 7 - Illustration of dynamic range characteristics after processing with the default DRC configuration for TV and a DRC configuration with effect type “Late Night”.

The user or an application can request DRC effect types via the control interface of the decoder.

More details and examples for practical DRC system configurations and decoder side control of DRC processing are discussed in Section 4.

3.3. Peak and Clipping Control

There are several processing steps within the MPEG-H 3DA decoding that can potentially lead to clipping of signal peaks. The most important examples for such processing steps are loudness normalization to high target levels, downmixing to a lower number of playback loudspeakers during format conversion of channel content and user interaction with rendering gains, where the level of certain objects or channel groups is increased.

To avoid annoying audible distortion due to digital clipping, the DRC and loudness control tool of MPEG-H 3DA includes an optional peak limiter at the decoder. As shown in Figure 1, the limiting is performed at the very end of the decoder processing chain.

In case of downmixing, rendering, or loudness normalization, the clipping prevention can also be controlled at the production side. So-called clipping prevention gains can be generated by a peak limiter at the encoder side, transmitted within the DRC metadata and applied to audio content in the same way as compressor gain values. Based on the peak information in the metadata, the DRC tool automatically scales back and applies the clipping prevention gains according to the remaining headroom after adjusting the level to the target loudness level if applicable. If such an adjustment is not required, the clipping prevention gains can be merged with the DRC gains to one combined gain sequence at the encoder side.

3.4. Ducking

Ducking is useful if an audio signal is overlaid over the main audio signal, such as narration, director’s comments, video description, and other related use cases. In common approaches the main audio signal is attenuated when the narration is active. The attenuation and transitions can be precisely controlled and delivered by the DRC tool of MPEG-H 3DA. The realization of ducking via encoder generated gains transmitted within the DRC metadata eliminates the look-ahead needed in traditional playback systems.

The ducking gain sequences for the main audio program are coupled to a corresponding audio object or track. The ducking of the main program is automatically performed if, e.g. the user activates audio description or selects a pre-defined presentation including it.

Figure 8 illustrates the signal of a narration object and the corresponding ducking gain.
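Because the ducking gains are generated at the encoder, where the complete narration track is known, the fade can start before the narration begins without any look-ahead at the receiver. The Python sketch below illustrates this with a frame-based gain curve. It is a toy example under assumed parameters (12 dB attenuation, linear ramps in dB, a hypothetical per-frame activity flag) and does not reproduce the gain coding of the DRC tool.

```python
def ducking_gains_db(narration_active, duck_db=-12.0, ramp_frames=2):
    """Per-frame ducking gain (dB) for the main program.

    narration_active : list of booleans, one per frame (hypothetical input)
    duck_db          : attenuation while the narration is active
    ramp_frames      : length of the linear fade before and after narration
    """
    if ramp_frames <= 0:
        raise ValueError("ramp_frames must be positive")
    active = [i for i, on in enumerate(narration_active) if on]
    gains = []
    for i in range(len(narration_active)):
        if not active:
            gains.append(0.0)
            continue
        # Distance to the nearest active frame; the encoder can look in both
        # directions because it has access to the whole narration track.
        distance = min(abs(i - j) for j in active)
        weight = max(0.0, 1.0 - distance / ramp_frames)
        gains.append(duck_db * weight)
    return gains


def apply_gains(samples, gains_db):
    """Apply one gain in dB per sample (or frame) to the main program."""
    return [x * 10.0 ** (g / 20.0) for x, g in zip(samples, gains_db)]


narration = [False] * 4 + [True] * 3 + [False] * 5
gains = ducking_gains_db(narration)
```

At the decoder only `apply_gains` would be needed, since the gain sequence arrives as metadata alongside the narration object.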


[Figure 8: two plots over time, "Narration signal" (top) and "Ducking gain [dB]" (bottom).]

Figure 8 - Narration object signal (top) and corresponding ducking gain (bottom) applied to the main program.

4. SYSTEM CONFIGURATION AND CONTROL

4.1. Production Side

As already mentioned earlier in this paper, today’s content providers, e.g. broadcasters, have to expect that consumers will listen to the delivered program on a variety of receiving devices. Accordingly, the content and its associated DRC and loudness related metadata within the transmitted MPEG-H 3DA stream need to be prepared to provide the best user experience for different playback scenarios.

As a basic step, different compressors should be configured at the encoder to generate DRC gain sequences that are suited to adapt the audio content for playback on high-quality home theater systems/AVRs, TV sets, and mobile devices such as smart phones or tablets. While on AVRs the target loudness level should be rather low and no or only very little dynamic range compression is required, the small loudspeakers in flat panel TV sets imply that the reproduced content should have only moderate dynamic range and the loudness level at the decoder should also be increased. The situation is even more challenging on mobile devices, where miniaturized transducers and potentially noisy listening environments demand high target loudness levels and small dynamic range. In Table 1, typical DRC configurations for three receiver types are summarized for broadcast or movie streaming service scenarios.

Receiver type    Target loudness    Dynamic range
AVR              -31 LKFS           High
TV               -24/-23 LKFS       Medium
Mobile           -16 LKFS           Low

Table 1 - Typical DRC configurations for different receiver types.

The compressor configuration at the MPEG-H 3DA encoder can either be based on default compression curves as specified in [12] or on additional pre-defined individual parameter settings. Alternatively, the DRC gains provided by custom compression tools can be passed to an encoder interface and included into the MPEG-H 3DA stream. The range of playback target levels for the AVR/TV/mobile DRC configurations should cover the entire range of playback loudness levels such as shown in Figure 6. The effect type description of the DRC sequences for receiver adaptation may simply be set to “General Compression”.

In addition to receiver type adaptation of the audio, special DRC configurations supporting typical listening conditions can be included in the MPEG-H 3DA stream. Practical examples are watching a movie on a mobile device while being in a noisy environment such as public transport, or watching TV at home late at night as already discussed in Section 3.2.

While the previously considered configurations for receiver type adaptation typically apply the same DRC gains to each audio channel, object and HOA scene, it can be beneficial to provide different DRC gain sequences for individual objects or channel groups. For example, in a noisy environment it is sometimes difficult to understand the dialog of a movie. In this case, it is advantageous to use an individual compressor configuration for the dialog object, or the channels that dominantly carry the dialog, in order to increase speech intelligibility. Similarly, for a late night DRC configuration, the audio of the channel bed can be compressed more aggressively to reduce high level peaks in action scenes, while applying more conservative processing to the dialog parts of the content.

Let us revisit the case of mobile receiving devices: In this scenario, downmixing of immersive sound for reproduction on stereo loudspeakers has to be considered in addition to high target loudness levels, leading to a high risk of clipping of the rendered audio. The downmix processing of the format converter at the MPEG-H 3DA decoder is fully defined and therefore known at the encoder side. Since the DRC processing is


also fully controlled at the encoder, the audio signal to
be expected at the output of the decoder can be
evaluated already at the encoder. Potential clipping can
therefore be identified in advance at the encoder and be
anticipated appropriately.

4.2. Receiving Side

In the following we discuss the use cases considered in
the previous section from the perspective of the
receiving side. In general, the DRC tool of the MPEG-H
3DA decoder automatically selects the best-suited DRC
configuration included in the metadata of the audio
stream. For this, it takes into account information on the
reproduction configuration (e.g. target loudness,
loudspeaker configuration) and potential additional user
input (selection of a presentation of a program, special
effect types).

If the user receives the MPEG-H 3DA stream on a TV,
the target loudness at the decoder will be set to, e.g.,
-24 LKFS. The selection process of the DRC tool
automatically picks the DRC configuration from the
stream that includes this particular value in its
associated target level range. Some receiving devices
may offer a control interface that allows the user to
select a specific DRC effect, e.g. for listening late at
night or in noisy environments. In this case, the
selection process of the DRC tool will preferably choose
a DRC set that has the corresponding effect type defined
in its configuration data.

When receiving an MPEG-H 3DA stream on a mobile
device, a procedure similar to the one described above
for the TV applies: the increased playback target
loudness of, e.g., -16 LKFS is taken into account by the
selection process to choose the appropriate DRC
configuration. User input or other control mechanisms on
the mobile device may lead to another DRC configuration
that, e.g., was specially optimized to provide an improved
listening experience in noisy environments.

Additional aspects of controlling DRC and loudness
related processing result from application scenarios in
which a device only receives the MPEG-H 3DA stream
for decoding, while the actual playback of the rendered
audio content is done on a different device. For
example, a tablet can be used to receive and decode
movie content from a media streaming service. The
HDMI output of the tablet is connected to an AVR such
that the audio is played back over a high-quality
multi-channel loudspeaker system. Obviously, in this
scenario the tablet should be considered as an AVR
rather than a mobile device when configuring the DRC
and loudness processing in the MPEG-H 3DA decoder.
Such decoding behavior can easily be achieved if the
receiving device is aware of its currently active audio
output channel. Then, different values for the playback
target loudness can be used to perform loudness
normalization and to select the DRC set that is
appropriate for the given output configuration used for
playback. On a tablet, a practical approach is to trigger
different configurations depending on whether the
internal loudspeakers, headphones or the HDMI output
is used.

5. SUMMARY

In this paper a technical overview of the comprehensive
loudness and DRC concept in MPEG-H 3D Audio has
been provided. The different functional blocks and their
integration into the MPEG-H 3D Audio system have
been presented. The practical application of the
loudness and DRC features has been discussed for
relevant use cases together with illustrative examples
for corresponding system configuration and control of
the loudness and DRC tools.

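As an illustration of the receiving-side behavior discussed in Section 4.2, the following is a minimal sketch of how a decoder-side application might pick a DRC set from the playback target loudness and an optional user-requested effect. It is not the normative selection process of the standard, which is considerably more elaborate. The names DrcSet, select_drc_set, select_for_output and TARGET_LOUDNESS_BY_OUTPUT are hypothetical, and the mapping from output route to target loudness is an assumption: only the values -24 LKFS (TV) and -16 LKFS (mobile device) are taken from the text above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DrcSet:
    """Hypothetical, simplified stand-in for one DRC configuration in the stream."""
    name: str
    target_min_lkfs: float  # lower bound of the associated target level range
    target_max_lkfs: float  # upper bound of the associated target level range
    effect_type: Optional[str] = None  # e.g. "night" or "noisy"; None = general purpose


# Assumed mapping from the active audio output to a playback target loudness.
# Treating HDMI like the TV/AVR case and headphones like the mobile case is an
# illustrative choice, not something the paper specifies.
TARGET_LOUDNESS_BY_OUTPUT = {
    "internal_speakers": -16.0,
    "headphones": -16.0,
    "hdmi": -24.0,
}


def select_drc_set(
    drc_sets: Sequence[DrcSet],
    target_lkfs: float,
    requested_effect: Optional[str] = None,
) -> Optional[DrcSet]:
    """Pick a DRC set whose target level range contains target_lkfs.

    A set with the requested effect type is preferred; otherwise the first
    general-purpose set in range is used, then any set in range.
    Returns None if no set covers the target loudness.
    """
    candidates = [
        s for s in drc_sets
        if s.target_min_lkfs <= target_lkfs <= s.target_max_lkfs
    ]
    if not candidates:
        return None
    if requested_effect is not None:
        for s in candidates:
            if s.effect_type == requested_effect:
                return s
    for s in candidates:
        if s.effect_type is None:
            return s
    return candidates[0]


def select_for_output(
    drc_sets: Sequence[DrcSet],
    output: str,
    requested_effect: Optional[str] = None,
) -> Optional[DrcSet]:
    """Derive the target loudness from the active output, then select a DRC set."""
    return select_drc_set(drc_sets, TARGET_LOUDNESS_BY_OUTPUT[output], requested_effect)
```

With such a structure, a tablet that switches from its internal loudspeakers to the HDMI output would simply call select_for_output again with the new output name, so that both the loudness normalization target and the DRC set follow the active output.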
