High Dynamic Range Simultaneous Signal Compositing Applied To Audio

Ryan Janzen and Steve Mann
University of Toronto
2012 25th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE)
978-1-4673-1433-6/12/$31.00 ©2012 IEEE

ABSTRACT

High Dynamic Range (HDR) compositing is well established in the field of image processing, where a sequence of differently-exposed images of the same scene is combined to overcome the limited dynamic range of ordinary cameras.

We extend this technique to audio. Rather than acquiring samples separated by time or space, as is done in HDR image processing, we propose to perform simultaneous sampling of the same input signal, using differently-gained versions of the same HDR signal fed into separate analog to digital converters (ADCs). An HDR audio signal is thus sampled by merging a set of low dynamic range (LDR) samplings of the original HDR input signal. We optimize the choice of LDR input gains to achieve as high a dynamic range as possible for a desired sampling accuracy.

Index Terms: high dynamic range compositing, simultaneous HDR compositing, composited dynamic range (CDR), HDR audio, CDR audio

1. INTRODUCTION AND PRIOR ART

In digital photography, sequentially capturing and combining different images of the same scene is an established area of research [1, 2, 3, 4, 5, 6, 7].

Cameras have a limited dynamic range, but it is possible to capture a high dynamic range (HDR) scene in an HDR image, by combining a series of low dynamic range (LDR) images, each with different exposures [1, 2]. An overexposed image is saturated in bright regions of the scene, but captures dark areas well. On the other hand, an underexposed image has its response cut off at (or near) 0 in dark regions, but captures bright areas well. By properly merging a series of differently exposed images, it is possible to capture an HDR scene that cannot be captured accurately by one exposure alone.

We have also applied HDR compositing to RADAR imaging, to distinguish strong reflections (such as from large ships) from weak reflections (e.g. from a small iceberg fragment), when both reflections are received simultaneously. Under these circumstances, other existing methods of handling dynamic range, such as STC (Sensitivity Time Control [8]), which adjusts receiver input gain over time, were ineffective.

[Figure 1 labels: real-world HDR phenomenon; sensor; LDR exposures; HDR compositing; time]
Fig. 1: “Sequential HDR” vs. proposed “Simultaneous HDR”

2. SIMULTANEOUS HDR AUDIO COMPOSITING

HDR photography and video typically use time-separated exposures (subject to ghosting problems when the subject matter is in motion [7]) or spatially separated sensors (e.g. multiple CCD arrays with beamsplitters). However, we propose parallel simultaneous samplings of an input signal from a single acoustic sensor.

To the best of our knowledge, this may be the first publication where HDR compositing is applied to audio. We focus on extreme dynamic ranges which are beyond the sampling capability of a given analog to digital converter (ADC).

Acoustic HDR compositing may have applications in biomedical pulsed ultrasound [9], and research on water-hammer effects [10, 11], where a very strong acoustic impulse occurs periodically, and very weak sounds need to be sensed as well. Another application is capturing sound from a wearable microphone adjacent to a person’s mouth, where we wish to use it to also capture more distant (i.e. quiet) voices or ambient sounds in the room, including while the wearer is speaking, i.e. a situation that previously known methods like AGC (Automatic Gain Control) would not be able to handle.

Due to the limited dynamic range of conventional audio recorders, there is an unfortunate common need to adjust the gain of a recorder (either manually or by AGC, in one or more stages) depending on the sound level being recorded [12]. Instead, it would be far superior if one could simply press “record”, without any saturation or SQNR problems, over a wide dynamic range. Furthermore, we believe that audio mixing boards [13] should be HDR-capable, so that only one gain control is required per channel, rather than a separate gain control at every amplifier stage of each channel. For conventional or extreme dynamic range inputs, we would wish to have no distortion other than at the final output stage, so the operator does not need to constantly worry about saturation (gain too high) and quantization (gain too low) for each of the separate stages of processing.

HDR audio compositing could serve as a low cost input solution using existing input sampling hardware, with simple gain circuitry added. This method captures a high dynamic range (HDR) audio signal using a set of simultaneous conventional low dynamic range (LDR) samplings of the same signal, gained differently for each ADC. Fig. 2 shows the conceptual structure of the system.

[Figure 2 labels: audio input; gains g1, g2, g3 producing the loud, medium and quiet exposures; electronic preprocessing (gain, clip signal prior to ADC); quantization (this example illustrates a 3-bit ADC, 8 levels of quantization, 2’s complement signed numbers); rescaling; certainty functions giving a certainty signal per exposure; HDR compositing; HDR output]
Fig. 2: HDR input compositing system: illustrated for the simple example of 3-bit A-to-D conversion.

3. AUDIO “EXPOSURES”, CERTAINTY FUNCTIONS

Analogously to image exposures in photography, we combine a Wyckoff set [2, 14] of several LDR audio exposures, each with a different gain before ADC. See Fig. 3.

[Figure 3 labels: useable dynamic range of each exposure (g3), (g2), (g1); extended dynamic range with reduced SQNR; gain factor g∆; safety margin αSM; Exposure 1 saturates; Exposure 2 saturates; Exposure 3 saturates; M = 3, g∆ = 8]
Fig. 3: Wyckoff set consisting of three exposures of the same input signal: a sinusoid which crescendos over a 60 dB range. This simple example illustrates a geometric series of gains, causing the exposures to be evenly spaced (logarithmically) across a chosen dynamic range.

We use sensors and ADCs which are ordinarily used for LDR audio sampling, and thus are required to be linear (but quantized) up to a maximum measurable quantity, after which the response may become nonlinear. (Otherwise, stray frequencies would be added to the signal during normal LDR sampling.) Thus, we bypass the comparametric analysis [4] that is often needed when compositing HDR images.

Each captured exposure signal is most useful near, but just short of, the ADC’s maximum. The signal thus overpowers quantization noise as much as possible without saturating. We quantify this in a certainty function (similar to a certainty function in HDR imaging [1]), which is used to weight each exposure when combining samples. We design a certainty function as follows: (1) for ADC output level 0, the certainty function is small but nonzero, to recognize that quantization error overpowers the signal, and the only information gained is the fact that the signal is between ADC levels ±1; (2) affine increase starting from ADC 0, representing improving SQNR where the signal overpowers the quantization noise; (3) ramp-down near ±ADC maximum, to prevent sudden switchover to another exposure and thus prevent sudden glitches in the case of imperfect gain or bias calibration; (4) at ±ADC maximum, certainty drops to zero to reject saturated signals (no information can be gained as to how large the signal is beyond the ADC limits).

Unlike HDR imaging, the result of applying the certainty function is now a time-varying certainty signal. Fig. 2 shows the process of combining LDR exposures, using a separate time-varying certainty signal for each exposure. Later, we will see (in Fig. 7) the total certainty fluctuating, depending on the total information available from each exposure.
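The certainty-weighted combination described in Section 3 can be sketched numerically. The following is a minimal illustration rather than the authors' implementation: it assumes an ideal rounding ADC, a piecewise-linear certainty function whose `floor` and `knee` parameters are hypothetical, and a normalized weighted average as the combining rule. The gains of 1, 8 and 64 follow the Fig. 3 example (M = 3, g∆ = 8); the 8-bit resolution is chosen only to make the quantization visible.

```python
import numpy as np

def sample_exposure(x, gain, adc_max):
    """Ideal exposure: gain, clip, then round to signed integer ADC codes.
    x is in units where 1.0 is full scale at unity gain."""
    return np.clip(np.round(gain * x * adc_max), -adc_max, adc_max)

def certainty(codes, adc_max, floor=0.01, knee=0.9):
    """Piecewise-linear certainty following the four-part design in the text.
    floor and knee are illustrative assumptions, not values from the paper."""
    u = np.abs(codes) / adc_max                # 0 at code 0, 1 at full scale
    rise = floor + (1.0 - floor) * u / knee    # (1) small at 0, (2) affine increase
    fall = (1.0 - u) / (1.0 - knee)            # (3) ramp-down, (4) zero at +/- max
    return np.clip(np.minimum(rise, fall), 0.0, 1.0)

def composite(x, gains, adc_max):
    """Certainty-weighted average of the rescaled exposures (assumed rule)."""
    codes = np.array([sample_exposure(x, g, adc_max) for g in gains])
    weights = certainty(codes, adc_max)
    rescaled = codes / (np.asarray(gains, dtype=float)[:, None] * adc_max)
    total = weights.sum(axis=0)                # time-varying total certainty
    # Where every exposure is saturated, total is 0 and the estimate falls back to 0.
    estimate = (weights * rescaled).sum(axis=0) / np.maximum(total, 1e-12)
    return estimate, total

# Test signal: sinusoid with an exponential crescendo of roughly 60 dB.
t = np.linspace(0.0, 1.0, 20000)
x = 1e-3 * 900.0 ** t * np.sin(2 * np.pi * 50 * t)
adc_max = 127                                  # 8-bit signed ADC (assumed)
x_hdr, total_certainty = composite(x, [1.0, 8.0, 64.0], adc_max)
x_ldr = sample_exposure(x, 1.0, adc_max) / adc_max

rms = lambda e: float(np.sqrt(np.mean(e ** 2)))
print("single-exposure RMS error:", rms(x_ldr - x))
print("composited RMS error:     ", rms(x_hdr - x))
```

Because saturated codes receive zero weight, each contributing exposure is within half a quantization step of the true value, so the composite can only match or improve on the least-gained exposure in this idealized setting. Real hardware would add gain and bias mismatch, which is what the ramp-down in step (3) is meant to soften.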
4. OCCUPYING A DYNAMIC RANGE: RANGE DENSITY FUNCTIONS

We now examine HDR time-varying signals more closely. Let us say “quantity” $q$ when we refer to the instantaneous value (voltage, current, etc.) of a signal $x(t)$. Then, “amplitude” $A(t)$ strictly refers to the envelope (peak or RMS) of the quantities taken on by $x(t)$ in one cycle of a waveform of period $T$ (e.g. values of $x(\tau)$ in the interval $\tau \in [t - \frac{T}{2}, t + \frac{T}{2}]$). Thirdly, “dynamic range” refers to the ratio between the maximum amplitude and minimum amplitude of the signal (which occur at different times).

We propose a Range Density Function (RDF) which represents the proportion of the entire signal which occupies each given quantity in the range of the signal. This RDF can be approximated by taking a histogram of the signal quantity (as time progresses), and can be approached in the limit with asymptotically increasing histogram resolution. Each bin of a histogram (between quantities $a$ and $b$) is related to the RDF as follows:

$$\int_a^b R_x(q)\,dq = \mathrm{Histogram}\,[a \le x(t) \le b] = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \left[ u(x(\tau) - a) \cdot u(b - x(\tau)) \right] d\tau \tag{1}$$

where $u(\cdot)$ is the Heaviside step function, and the histogram is found by observing $x(t)$ between times $t_1$ and $t_2$.

The RDF of a function is a deterministic analogue to the Probability Density Function (PDF) of a statistical random variable, and should obey the properties: $R_x(q) \ge 0$, and $\int_{-\infty}^{\infty} R_x(q)\,dq = 1$. For a periodic waveform (with period $T$), we can determine the RDF directly:

$$R_x(q) = \frac{1}{T} \sum_{\{t \,|\, x(t) = q\}} \frac{1}{|x'(t)|} \tag{2}$$

for any values of $q$ at which $x(t)$ is differentiable and $x'(t)$ is nonzero.¹ If $x(t)$ holds steady at quantity $q^\star$, a delta measure occurs at $R_x(q^\star)$. For example, this occurs twice for a square wave: $R_{SQUARE}(q) = \frac{1}{2}\delta(q - 1) + \frac{1}{2}\delta(q + 1)$. A sine wave leads to two asymptotes: $R_{SINE}(q) = \frac{1}{\pi} \frac{1}{\sqrt{1 - q^2}}$, for $q \in [-1, 1]$. See Fig. 4.

¹ This calculation is analogous to finding the PDF of a function of a random variable having a uniform distribution over the phases $[0, 2\pi]$.

In the frequency domain, the most pure waveform is the complex exponential, and a uniform distribution is created by a sinc function. In the amplitude domain, the most pure waveform is the square wave, and a uniform distribution is created by a sawtooth wave.² (See Fig. 4.)

² As well, an inverse sawtooth or triangular pattern would also suffice.

We further propose a Dynamic Range Density Function (DRDF) which represents the proportion of the entire signal which occupies each given amplitude in the dynamic range of the signal. The signals in Figs. 3 (top plot) and 4 (bottom plot) have a uniform DRDF in a logarithmic sense, due to the ramp from minimum to maximum amplitude. This straight line (when viewed on a log plot) represents an exponential crescendo in amplitude.

[Figure 4 panels: $x(t)$ = sinusoid with its RDF $R_x(q)$; $y(t)$ = sawtooth with its RDF $R_y(q)$; $z(t)$ = exponential crescendo, with $\log(A_z(t))$, its RDF $R_z(q)$ and its DRDF $R_{A_z}(\log(a))$]
Fig. 4: RDF (range density function). Note that RDF is an observed distribution, which varies for each observation or realization of a random process, unlike the PDF (probability density function), which is a statistical distribution. In the third example, even though the RDF at each given amplitude is still that of a sinusoid, a different overall RDF is created when the sinusoid is modulated with an exponential crescendo. This pattern gives a uniform DRDF (when measured logarithmically).

5. CHOOSING GAIN SPACING OF M EXPOSURES

We would like to carefully choose the relationship between the $M$ exposures that will be combined together to sample an HDR signal. We examine the case where the signal is detected by a single sensing element, and each exposure is a gained version of this sensed signal. We thus want to carefully choose the spacing between the gains $g_m$ of each exposure (“exposure packing”).

To make HDR sampling robust for a variety of situations, we space the exposures in a geometric sequence to equally cover all amplitudes available within the given dynamic range:

$$g_m = g_\Delta^{m-1}\, g_1 \quad \text{for } m = 1 \ldots M \tag{3}$$

where $g_\Delta$ is the gain factor separating each exposure.

If the input signal’s dynamic range is not occupied uniformly (biased DRDF or RDF), then an uneven spacing of the exposures may be optimal.³ However, we attempt here to use a stringent set of requirements to create a generalized HDR system with demands on the full dynamic range (as represented by uniform DRDF and RDF).

³ This is a multidimensional optimization problem, for future work.

We position the first exposure to capture the strongest expected input signal $A_{0,MAX}$ entirely within the sampling range $A_{ADC,MAX}$, less a safety margin $\alpha_{SM}$:

$$g_1 \cdot A_{0,MAX} < A_{ADC,MAX}\,(1 - \alpha_{SM}) \tag{4}$$

Define an “exposure packing” dynamic range, $D_{EP}$, as the dynamic range spanned by the $M$ exposures, from the minimum amplitude desired to be sensed by the most-gained exposure, to the max. amplitude sensed by the least-gained exposure:

$D_{EP}$ = HDR exposure packing dynamic range $= \frac{g_M}{g_1}\, g_\Delta = g_\Delta^M$
$D_0$ = Dynamic range of original signal $= \frac{A_{0,MAX}}{A_{0,MIN}}$
$Q_{ADC}$ = Quantization ratio $= \frac{\text{maximum quantity limit of ADC}}{\text{quantization step of ADC}}$
$g_\Delta$ = Gain factor, creating a geometric sequence of gains
$M$ = Number of exposures in Wyckoff set

We hypothesize that the exposure packing may be best chosen with relation to some function of the quantization noise, the HDR signal’s dynamic range, and number of exposures:

$$g_\Delta = \sqrt[M]{D_{EP}} = f(Q_{ADC}, D_0, M) \tag{5}$$

Initially, we can devise some rough, approximate constraints. These will be compared against test results. To contain $D_0$ within successive exposures, and to maintain precision greater than one quantization step when graduating from one exposure to the next, we must have:

$$D_0 < D_{EP} = g_\Delta^M < Q_{ADC}^M \tag{6}$$

This constraint defines a triangular region of the $D_0$ vs. $D_{EP}$ plane, as drawn in Fig. 5. Again, this rough guideline is only a hypothesis. Interestingly, it is indeed observed/confirmed by our tests (e.g. Fig. 8). We created an automated routine, for rebuilding and testing the exposures while varying their packing (Figs. 6 and 8). Even though it is desirable to increase $D_0$ as much as possible, Fig. 8 reveals a tradeoff against reconstruction error. Therefore, we may need to retreat back from the $D_{EP} = Q_{ADC}^M$ edge of the triangle by an exposure overlap factor $\alpha_{EO} \equiv \log(Q_{ADC}) / \log(g_\Delta) - 1$, as desired in this tradeoff. Therefore, we choose a number of exposures:

$$M \ge \frac{\log D_0}{\log Q_{ADC}}\,(\alpha_{EO} + 1) \tag{7}$$

and space them out evenly within a dynamic range less than that available from $Q_{ADC}$, but more than the dynamic range of the original signal. An equal compromise would be:

$$g_\Delta = \sqrt[M]{D_{EP}} = 10^{\frac{\log_{10} D_0 + M \log_{10} Q_{ADC}}{2M}} \tag{8}$$

which has $D_{EP}$ positioned between the bounds in Eqn. 6. Alternatively, Fig. 6 demonstrates that $g_\Delta$ can be optimized computationally for a specific RDF and DRDF.

[Figure 5 labels: log-log scale; vertical axis: dynamic range of original signal $D_0$; horizontal axis: HDR exposure packing dynamic range $D_{EP}$; hypothesized region for safe HDR; one possible choice of $(D_{EP}, D_0)$; gains $g_1$ to $g_4$]
Fig. 5: The proposed constraints on the input signal and the exposure packing form a triangular region. In this example, $M = 4$ exposures. In this visualization, $D_0$ and $D_{EP}$ are treated as changeable quantities, and the sequence of gains stretch or contract in tandem along the axes. Contracting the gain sequence allows the exposures to overlap. We are working here with gains in a geometric sequence, as was explained in the text. (Note that, in this visualization, one should be wary of imagining a crescendo input signal along these axes, as was done earlier, since here (a) the axes represent different possible dynamic ranges, and (b) the gains are arranged such that the weakest signal is to be sampled by $g_4 \propto g_\Delta^3$.) Returning now to the practicalities of choosing the exposure spacing: Alternatively, instead of varying $(D_{EP}, D_0)$, we can take them as given, and instead choose the hardware, i.e. $M$ and $Q_{ADC}$, to allow the $(D_{EP}, D_0)$ point to fall comfortably within the bounds of the triangle.

Fig. 6: Sampling a 200 dB dynamic range signal using four 16-bit ADCs: Optimization of the gain separation between the conditioned signals fed into each ADC input. Each ADC on its own cannot accurately capture the input signal across its entire dynamic range; the mean normalized errors from each ADC’s exposure are included for comparison. The data were produced by a computationally-generated exponential crescendo input followed by HDR compositing. The gains were varied according to Eqns. 3 and 4. This graph will end up being a slice of a later graph (Fig. 8), since $D_{EP} = g_\Delta^M$.

To conclude: By increasing the number of exposures beyond the bare minimum, we can spread and overlap the exposures to increase the coverage of quantities near (but not quite saturating at) the extrema of each exposure, thus increasing the compositing certainty.
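The gain-spacing rules of Eqns. 3, 4, 7 and 8 are simple enough to evaluate directly. The sketch below is illustrative only: the function names are hypothetical, decibels are assumed to follow the 20·log10 amplitude convention, and the quantization ratio of an ideal signed ADC is assumed to be 2^(bits − 1). The example scenario resembles the one in Fig. 6 (a 200 dB signal and four 16-bit ADCs).

```python
import math

def quantization_ratio(bits):
    """Q_ADC for an ideal signed ADC, assumed here to be 2**(bits - 1)."""
    return 2.0 ** (bits - 1)

def db_to_ratio(db):
    """Amplitude ratio for a level in dB (20*log10 convention, assumed)."""
    return 10.0 ** (db / 20.0)

def min_exposures(d0, q_adc, overlap=0.0):
    """Smallest integer M allowed by Eqn. 7 for overlap factor alpha_EO."""
    return math.ceil(math.log(d0) / math.log(q_adc) * (overlap + 1.0))

def max_overlap(d0, q_adc, m):
    """Largest alpha_EO that Eqn. 7 permits for a given M."""
    return m * math.log(q_adc) / math.log(d0) - 1.0

def compromise_gain_step(d0, q_adc, m):
    """Eqn. 8: g_delta that puts D_EP at the geometric mean of D_0 and Q_ADC**M."""
    return 10.0 ** ((math.log10(d0) + m * math.log10(q_adc)) / (2.0 * m))

def exposure_gains(g1, g_delta, m):
    """Eqn. 3: geometric sequence g_m = g_delta**(m - 1) * g1."""
    return [g1 * g_delta ** k for k in range(m)]

def g1_upper_bound(a0_max, a_adc_max, safety_margin):
    """Eqn. 4 rearranged: g1 must stay below this value."""
    return a_adc_max * (1.0 - safety_margin) / a0_max

# Scenario resembling Fig. 6: 200 dB signal, four 16-bit ADCs.
d0, q, m = db_to_ratio(200.0), quantization_ratio(16), 4
g_delta = compromise_gain_step(d0, q, m)
print("minimum M without overlap:", min_exposures(d0, q))
print("g_delta in dB:", 20 * math.log10(g_delta))
print("D_0 < D_EP < Q_ADC**M:", d0 < g_delta ** m < q ** m)
```

Eqn. 8 is a midpoint rule in the logarithmic domain, so the resulting D_EP always lies strictly between the two bounds of Eqn. 6 whenever D_0 < Q_ADC^M. A computational search over g∆, as in Fig. 6, could start from this value.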
[Figure 7 panels: Input Signal; Sampling gives an imperfect representation; Reduced error from composited LDR exposures; Recovered HDR Signal]
Fig. 7: The proposed algorithm, after sampling an HDR computational test signal with M = 4 exposures and compositing the result. This simple example, for clarity, has the sampling encumbered by very coarse quantization steps of 0.05 (visible on 2nd plot, upon close inspection). Under these harsh conditions, the quantization error from one of the exposures is compared to the reduced error after reconstruction. We show the reconstructed output error signal (pink), along with the error of one of the exposures (blue), for clarity. Even when subjected to severe quantization in this test, the reduction in error is visible, as the algorithm consolidates information from all four exposures.
6. EVALUATING THE SYSTEM IN OPERATION

We evaluated the proposed system first using a low-cost PC stereo sound card (M = 2). The system was able to sample a 101 dB dynamic range test signal⁴ (which is beyond what the device can capture on its own), producing a 21 dB reduction in THD, as compared to feeding the one signal into one input, as would otherwise conventionally be done.

This initial prototype successfully proved the ability to expand the dynamic range of a mediocre audio capture device, without the time-varying artifacts associated with AGC. The audio capture device was a Realtek ALC880 (16-bit resolution, 44.1 kHz sampling).

The prototype used a simple circuit to gain each exposure, and electronically saturate the highly gained exposures to avoid damaging the ADCs. This circuit was constructed with series-connected fast recovery diodes, with two parallel chains to limit both polarities, thus limiting the signals to approximately ±1.4 V just before entry into the ADCs.

An important consideration was “Dynamage Range”: the ratio between the amplitude leading to damaging the sensor or ADC, and the amplitude of the smallest detectable signal. For example, when we used a hydrophone to listen to water flow in a pipe, with high gain, we needed to ensure that a strong impulse produced by the water-hammer effect would not damage the hydrophone or ADC.

Additional simulations tested operation with M = 4 gained inputs. Fig. 7 shows how our technique operates on a computer-generated input signal. Examining the graph titled “error signals normalized”: the reduced error signal after HDR compositing is plotted in the purple trace, reduced from the blue trace, which illustrates how Exposure 1, alone, was only useful for the largest amplitudes. (This figure shows very coarse quantization for illustration purposes.)

Fig. 8 verifies operation over a wide variation in D0 and DEP. A triangle wave crescendo was used as an input, in order to test under full coverage of the dynamic range. That is, the triangular waveform gave a uniform RDF within each period, and the exponential crescendo gave a uniform DRDF (full, uniform amplitude coverage of the dynamic range).

⁴ According to Eqn. 7, M = 2 exposures is sufficient for this signal, with 78% exposure overlap. The test signal was generated by a hardware signal generator, traversing the dynamic range in 16 logarithmically-equal steps.
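The footnote's overlap figure can be reproduced from Eqns. 6 and 7. The derivation below is a sketch under two assumptions not stated explicitly in the text: that decibels follow the 20·log10 amplitude convention, and that a 16-bit signed ADC has a quantization ratio of 2^15. Under those assumptions the numbers agree with the roughly 78% quoted.

```latex
\begin{align*}
D_0 < g_\Delta^{M}
  &\;\Longrightarrow\; \log D_0 < M \log g_\Delta
  && \text{(left inequality of Eqn.~6)} \\
\alpha_{EO} + 1 = \frac{\log Q_{ADC}}{\log g_\Delta}
  &\;\Longrightarrow\; \log g_\Delta = \frac{\log Q_{ADC}}{\alpha_{EO} + 1}
  && \text{(definition of } \alpha_{EO}\text{)} \\
&\;\Longrightarrow\; M > \frac{\log D_0}{\log Q_{ADC}}\,(\alpha_{EO} + 1)
  && \text{(Eqn.~7, written there with } \ge\text{)}
\end{align*}
% Worked check for the M = 2 sound-card test, with
% log10(D_0) = 101/20 = 5.05 and log10(Q_ADC) = 15 log10(2) = 4.515 (assumed):
\[
\alpha_{EO} \le \frac{M \log_{10} Q_{ADC}}{\log_{10} D_0} - 1
  = \frac{2 \times 4.515}{5.05} - 1 \approx 0.788
\]
```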
Fig. 8: Verification of HDR reconstruction, and optimization of synchronized exposures across a space of possible dynamic ranges. The
hypothesized constraints are evidenced in the triangular region, visible with reduced error. While we wish to permit as wide an input dynamic
range (D0 ) as possible, it is clear that there is a tradeoff against reconstruction error. Therefore, as was explained in the text, we can retreat
D0 and DEP back from the edge of this triangle, or otherwise increase M , as desired to reduce reconstruction error.
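The test input behind Fig. 8 was chosen because a triangle wave has a uniform RDF and an exponential crescendo has a uniform DRDF. Both properties can be checked with the histogram approximation of Eqn. 1. The sketch below is an illustration with hypothetical names: it estimates the RDF of a triangle wave and of a sine wave (compared against the closed form given in Section 4), and it treats the crescendo envelope as known in dB rather than estimating it from the waveform.

```python
import numpy as np

def rdf(x, bins, span):
    """Histogram estimate of the range density function R_x(q)."""
    density, edges = np.histogram(x, bins=bins, range=span, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density

n, cycles = 200_000, 10
phase = (cycles * np.arange(n) / n) % 1.0         # whole number of periods
triangle = 2.0 * np.abs(2.0 * phase - 1.0) - 1.0  # triangle wave in [-1, 1]
sine = np.sin(2.0 * np.pi * phase)

q_tri, r_tri = rdf(triangle, 20, (-1.0, 1.0))
q_sin, r_sin = rdf(sine, 20, (-1.0, 1.0))
r_sin_closed_form = 1.0 / (np.pi * np.sqrt(1.0 - q_sin ** 2))

# Envelope of an exponential crescendo spanning 60 dB, expressed in dB.
envelope_db = 60.0 * np.arange(n) / n
level_db, drdf = rdf(envelope_db, 12, (0.0, 60.0))

print("triangle RDF min/max:", r_tri.min(), r_tri.max())
print("DRDF min/max (per dB):", drdf.min(), drdf.max())
```

The sine-wave estimate tracks the closed form well away from the asymptotes at ±1, where a finite bin can only report the average of an integrable singularity. Finer bins approach the limit described in Section 4.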
7. CONCLUSION

We proposed a new technique to capture a high dynamic range (HDR) signal, by compositing simultaneous, differently-gained low dynamic range (LDR) samplings of it. HDR audio compositing is a low cost solution, using existing sampling hardware, with the addition of simple input gain circuitry.

This work may have applications in RADAR, SONAR, biomedical ultrasound, and high dynamic range audio recording. Extreme dynamic range events, in particular, can be sensed by geophones (for solid vibrations, such as glass and steel breakage), hydrophones (for underwater recordings such as for research on the water-hammer effect), or microphones (for gas environments, e.g. a headset microphone sensing both the user’s voice as well as distant quiet sounds).

8. REFERENCES

[1] S. Mann, “Compositing multiple pictures of the same scene,” in Proc. Imaging Science and Tech. Conf., 1993, pp. 50–52.
[2] S. Mann and R.W. Picard, “Being ‘undigital’ with digital cameras: Extending dynamic range by combining differently exposed pictures,” in Proc. IS&T, May 7–11 1995, pp. 422–428.
[3] P. E. Debevec and J. Malik, “Recovering high dynamic range radiance maps from photog.,” SIGGRAPH, pp. 369–378, 1997.
[4] S. Mann, “Comparametric equations with practical applications in quantigraphic image processing,” IEEE Trans. Image Processing, vol. 9, no. 8, pp. 1389–1406, August 2000.
[5] M.A. Robertson, S. Borman, and R.L. Stevenson, “Estimation-theoretic approach to dynamic range enhancement using mult. exposures,” J. Electronic Img., vol. 12(2), pp. 219–228, 2003.
[6] S.B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High dynamic range video,” ACM Trans. Graphics (Proc. SIGGRAPH 2003), vol. 22(3), pp. 319–325, 2003.
[7] E.A. Khan, O. Akyuz, and E. Reinhard, “Ghost removal in high dynamic range images,” Proc. ICIP, pp. 2005–8, 2006.
[8] H. Meikle, Modern radar systems, Artech House, 2008.
[9] K. Nightingale, M.S. Soo, R. Nightingale, and G. Trahey, “Acoustic radiation force impulse imaging: in vivo demonstration of clinical feasibility,” Ultrasound Med. Biol., vol. 28, pp. 227–235, 2002.
[10] S. Mann, R. Janzen, J. Huang, M. Kelly, J. Ba, and A. Chen, “User-interfaces based on the water-hammer effect,” in Proc. TEI, 2011, pp. 1–8.
[11] Ryan Janzen and Steve Mann, “Arrays of water jets as user interfaces.: Detection...of flow by listening to turbulence signatures using hydrophones,” in Proc. ACMMM, 2007, pp. 505–8.
[12] John Eargle, Handbook of Recording Engineering, NY: Springer, 2005.
[13] Roey Izhaki, Mixing Audio, Elsevier Science, 2011.
[14] Charles W. Wyckoff, “An experimental extended response film,” Tech. Rep. NO. B-321, Edgerton, Germeshausen & Grier, Inc., Boston, Massachusetts, MARCH 1961.
