University of Toronto
2012 25th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE)
978-1-4673-1433-6/12/$31.00 ©2012 IEEE
Authorized licensed use limited to: Macquarie University. Downloaded on January 08,2022 at 09:16:40 UTC from IEEE Xplore. Restrictions apply.

ABSTRACT

High Dynamic Range (HDR) compositing is well established in the field of image processing, where a sequence of differently-exposed images of the same scene are combined to overcome the limited dynamic range of ordinary cameras.

We extend this technique to audio. Rather than acquiring samples separated by time or space, as is done in HDR image processing, we propose to perform simultaneous sampling of the same input signal, using differently-gained versions of the same HDR signal fed into separate analog to digital converters (ADCs). An HDR audio signal is thus sampled by merging a set of low dynamic range (LDR) samplings of the original HDR input signal. We optimize the choice of LDR input gains to achieve as high a dynamic range as possible for a desired sampling accuracy.

Index Terms: high dynamic range compositing, simultaneous HDR compositing, composited dynamic range (CDR), HDR audio, CDR audio

1. INTRODUCTION AND PRIOR ART

In digital photography, sequentially capturing and combining different images of the same scene is an established area of research [1, 2, 3, 4, 5, 6, 7].

Cameras have a limited dynamic range, but it is possible to capture a high dynamic range (HDR) scene in an HDR image, by combining a series of low dynamic range (LDR) images, each with different exposures [1, 2]. An overexposed image is saturated in bright regions of the scene, but captures dark areas well. On the other hand, an underexposed image has its response cut off at (or near) 0 in dark regions, but captures bright areas well. By properly merging a series of differently exposed images, it is possible to capture an HDR scene that cannot be captured accurately by one exposure alone.

We have also applied HDR compositing to RADAR imaging, to distinguish strong reflections (such as from large ships) from weak reflections (e.g. from a small iceberg fragment), when both reflections are received simultaneously. Under these circumstances, other existing methods of handling dynamic range such as STC (Sensitivity Time Control [8]) which adjust receiver input gain over time, were ineffective.

Fig. 1: “Sequential HDR” vs. proposed “Simultaneous HDR” (diagram labels: real-world HDR phenomenon, sensor, time, LDR exposures, HDR compositing)

2. SIMULTANEOUS HDR AUDIO COMPOSITING

HDR photography and video typically use time-separated exposures (subject to ghosting problems when the subject matter is in motion [7]) or spatially separated sensors (e.g. multiple CCD arrays with beamsplitters). However, we propose parallel simultaneous samplings of an input signal from a single acoustic sensor.

To the best of our knowledge, this may be the first publication where HDR compositing is applied to audio. We focus on extreme dynamic ranges which are beyond the sampling capability of a given analog to digital converter (ADC).

Acoustic HDR compositing may have applications in biomedical pulsed ultrasound [9], and research on water-hammer effects [10, 11], where a very strong acoustic impulse occurs periodically, and very weak sounds need to be sensed as well. Another application is capturing sound from a wearable microphone adjacent to a person’s mouth where we wish to use it to also capture more distant (i.e. quiet) voices or ambient sounds in the room, including while the wearer is speaking, i.e. a situation that previously known methods like AGC (Automatic Gain Control) would not be able to handle.

Due to the limited dynamic range of conventional audio recorders, there is an unfortunate common need to adjust the gain of a recorder (either manually or by AGC, in one or more stages) depending on the sound level being recorded [12]. Instead, it would be far superior if one could simply press “record”, without any saturation or SQNR problems, over a wide dynamic range. Furthermore, we believe that audio mixing boards [13] should be HDR-capable, so that only one gain control is required per channel, rather than a separate gain control at every stage of amplifier for each channel. For conventional or extreme dynamic range inputs, we would wish to have no distortion other than at the final output stage, so the operator does not need to constantly worry about saturation (gain too high) and quantization (gain too low) for each of the separate stages of processing.

HDR audio compositing could serve as a low cost input solution using existing input sampling hardware, with simple gain circuitry added. This method captures a high dynamic range (HDR) audio signal using a set of simultaneous conventional low dynamic range (LDR) samplings of the same signal, gained differently for each ADC. Fig. 2 shows the conceptual structure of the system.

Fig. 2: HDR input compositing system: illustrated for the simple example of 3-bit A-to-D conversion. (Diagram labels: gains $g_1$, $g_2$, $g_3$; loud and medium exposures; certainty functions; compositing; HDR output.)

Fig. 3: Wyckoff set consisting of three exposures of the same input signal: a sinusoid which crescendos over a 60 dB range. This simple example illustrates a geometric series of gains, causing the exposures to be evenly spaced (logarithmically) across a chosen dynamic range. (Diagram labels: $M = 3$, $g_\Delta = 8$; useable dynamic range of each exposure; gain factor $g_\Delta$; safety margin $\alpha_{SM}$; extended dynamic range with reduced SQNR; saturation points of exposures 1, 2 and 3.)

3. AUDIO “EXPOSURES”, CERTAINTY FUNCTIONS

Analogously to image exposures in photography, we combine a Wyckoff set [2, 14] of several LDR audio exposures, each with a different gain before ADC. See Fig. 3.

We use sensors and ADCs which are ordinarily used for LDR audio sampling, and thus are required to be linear (but quantized) up to a maximum measurable quantity, after which the response may become nonlinear. (Otherwise, stray frequencies would be added to the signal during normal LDR sampling.) Thus, we bypass the comparametric analysis [4] that is often needed when compositing HDR images.

Each captured exposure signal is most useful near, but just short of, the ADC’s maximum. The signal thus overpowers quantization noise as much as possible without saturating. We quantify this in a certainty function (similar to a certainty function in HDR imaging [1]), which is used to weight each exposure when combining samples. We design a certainty function as follows: (1) for ADC output level 0, the certainty function is small but nonzero, to recognize that quantization error overpowers the signal, and the only information gained is the fact that the signal is between ADC levels ±1; (2) affine increase starting from ADC 0, representing improving SQNR where the signal overpowers the quantization noise; (3) ramp-down near ±ADC maximum, to prevent sudden switchover to another exposure and thus prevent sudden glitches in the case of imperfect gain or bias calibration; (4) at ±ADC maximum, certainty drops to zero to reject saturated signals (no information can be gained as to how large the signal is beyond the ADC limits).

Unlike HDR imaging, the result of applying the certainty function is now a time-varying certainty signal. Fig. 2 shows the process of combining LDR exposures, using a separate time-varying certainty signal for each exposure. Later, we will see (in Fig. 7) the total certainty fluctuating, depending on the total information available from each exposure.
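The four design rules for the certainty function, and its use as a per-sample weight, can be sketched in Python as follows. This is only an illustration under stated assumptions: the piecewise-linear shape, the `floor` and `knee` parameters, the certainty-weighted average used for merging, and the helper names `certainty`, `sample` and `composite` are all hypothetical, since the text describes the rules qualitatively and does not give a formula.

```python
import numpy as np

def certainty(code, adc_max, floor=0.05, knee=0.9):
    """Certainty weight for signed ADC codes (floor and knee are assumed values)."""
    u = np.abs(np.asarray(code, dtype=float)) / adc_max  # normalised magnitude
    rise = floor + (1.0 - floor) * u / knee   # rules (1), (2): small at 0, affine increase
    fall = (1.0 - u) / (1.0 - knee)           # rules (3), (4): ramp down to 0 at saturation
    return np.clip(np.minimum(rise, fall), 0.0, 1.0)

def sample(x, gain, adc_max):
    """Idealised ADC: apply gain, round to integer codes, saturate at +/-adc_max."""
    return np.clip(np.round(gain * np.asarray(x, dtype=float)), -adc_max, adc_max)

def composite(codes, gains, adc_max):
    """Assumed merge rule: certainty-weighted average of gain-corrected exposures."""
    codes = np.asarray(codes, dtype=float)            # shape (M, N)
    g = np.asarray(gains, dtype=float)[:, None]       # shape (M, 1)
    w = certainty(codes, adc_max)                     # time-varying certainty signals
    total = w.sum(axis=0)                             # total certainty per sample
    safe = np.where(total > 0.0, total, 1.0)          # avoid 0/0 if every exposure saturates
    return (w * codes / g).sum(axis=0) / safe, total

# Three exposures with the gains of the Fig. 3 example (g_delta = 8), 8-bit-like ADC.
adc_max = 127
gains = [1.0, 8.0, 64.0]
x = np.array([0.3, 5.0, 100.0])                       # inputs in units of one ADC step at gain 1
codes = np.stack([sample(x, g, adc_max) for g in gains])
hdr, total = composite(codes, gains, adc_max)
print(hdr, total)
```

In this sketch a saturated exposure receives zero weight, so the loudest sample is recovered from the least-gained exposure alone, while the quietest sample is dominated by the most-gained exposure. The nonzero floor at code 0 still lets the least-gained exposure contribute slightly for quiet samples; how strongly it should do so is a design choice the sketch does not settle.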
4. OCCUPYING A DYNAMIC RANGE: RANGE DENSITY FUNCTIONS

We now examine HDR time-varying signals more closely. Let us say “quantity” $q$ when we refer to the instantaneous value (voltage, current, etc.) of a signal $x(t)$. Then, “amplitude” $A(t)$ strictly refers to the envelope (peak or RMS) of [...] the quantities taken on by $x(t)$ in one cycle of a waveform [...] for any values of $q$ at which $x(t)$ is differentiable and $x'(t)$ is nonzero.1 If $x(t)$ holds steady at quantity $q^\star$, a delta measure occurs at $R_x(q^\star)$. For example, this occurs twice for a square wave: $R_{SQUARE}(q) = \frac{1}{2}\delta(q-1) + \frac{1}{2}\delta(q+1)$. A sine wave leads to two asymptotes: $R_{SINE}(q) = \frac{1}{\pi}\frac{1}{\sqrt{1-q^2}}$, for $q \in [-1, 1]$. See Fig. 4.

1 This calculation is analogous to finding the PDF of a function of a random variable having a uniform distribution over the phases $[0, 2\pi]$.

In the frequency domain, the most pure waveform is the complex exponential, and a uniform distribution is created by a sinc function. In the amplitude domain, the most pure waveform is the square wave, and a uniform distribution is created by a sawtooth wave.2 (See Fig. 4.)

2 As well, an inverse sawtooth or triangular pattern would also suffice.

[Fig. 4 diagram labels: $x(t)$ = sinusoid and $y(t)$ = sawtooth plotted against $t$, each with its RDF $R_x(q)$ plotted against $q$; caption not recovered]

We further propose a Dynamic Range Density Function (DRDF) which represents the proportion of the entire signal [...]

5. CHOOSING GAIN SPACING OF M EXPOSURES

We would like to carefully choose the relationship between the $M$ exposures that will be combined together to sample an HDR signal. We examine the case where the signal is detected by a single sensing element, and each exposure is a gained version of this sensed signal. We thus want to carefully choose the spacing between the gains $g_m$ of each exposure (“exposure packing”).

To make HDR sampling robust for a variety of situations, we space the exposures in a geometric sequence to equally cover all amplitudes available within the given dynamic range:

$g_m = g_\Delta^{m-1} g_1$ for $m = 1 \ldots M$ (3)

where $g_\Delta$ is the gain factor separating each exposure.

If the input signal’s dynamic range is not occupied uniformly (biased DRDF or RDF), then an uneven spacing of the exposures may be optimal.3 However, we attempt here to use a stringent set of requirements to create a generalized HDR system with demands on the full dynamic range (as represented by uniform DRDF and RDF).

3 This is a multidimensional optimization problem, for future work.

[Figure diagram labels, log-log scale: dynamic range of original signal $D_0$; HDR exposure packing dynamic range $D_{EP}$; hypothesized region for safe HDR; one possible choice of $(D_{EP}, D_0)$; exposures $g_1$, $g_2$, $g_3$, $g_4$; ADC dynamic range; caption not recovered]

We position the first exposure to capture the strongest expected input signal $A_{0,MAX}$ entirely within the sampling range $A_{ADC,MAX}$, less a safety margin $\alpha_{SM}$:

$g_1 \cdot A_{0,MAX} < A_{ADC,MAX} (1 - \alpha_{SM})$ (4)

Define an “exposure packing” dynamic range, $D_{EP}$, as the dynamic range spanned by the $M$ exposures, from the minimum amplitude desired to be sensed by the most-gained exposure, to the max. amplitude sensed by the least-gained exposure:

$D_{EP}$ = HDR exposure packing dynamic range $= \frac{g_M}{g_1} g_\Delta = g_\Delta^M$

$D_0$ = Dynamic Range of original signal $= \frac{A_{0,MAX}}{A_{0,MIN}}$
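As a worked illustration of the geometric gain spacing of Eq. (3), the bound on the first gain from Eq. (4), and the exposure packing dynamic range, the following Python sketch uses the values annotated in Fig. 3 (M = 3, gain factor 8). The function names, the example amplitudes passed to `first_gain_limit`, and the use of 20·log10 to express an amplitude ratio in dB are assumptions made for this example.

```python
import math

def geometric_gains(g1, g_delta, M):
    """Eq. (3): g_m = g_delta**(m - 1) * g1 for m = 1..M."""
    return [g1 * g_delta ** (m - 1) for m in range(1, M + 1)]

def first_gain_limit(a_adc_max, a0_max, alpha_sm):
    """Eq. (4) rearranged: g1 must stay below this value."""
    return a_adc_max * (1.0 - alpha_sm) / a0_max

def to_db(ratio):
    """Amplitude ratio in decibels (assumed 20*log10 convention)."""
    return 20.0 * math.log10(ratio)

g_delta, M = 8.0, 3
gains = geometric_gains(1.0, g_delta, M)       # least-gained exposure first
d_ep = gains[-1] / gains[0] * g_delta          # (g_M / g_1) * g_delta = g_delta**M

print(gains)                                   # [1.0, 8.0, 64.0]
print(d_ep, round(to_db(d_ep), 1))             # 512.0 54.2
print(first_gain_limit(a_adc_max=1.0, a0_max=2.0, alpha_sm=0.1))
```

With these numbers each additional exposure extends the packed range by a factor of 8 (about 18 dB), so the spacing and the number of exposures trade off directly against each other for a given target range.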
Fig. 7: The proposed algorithm, after sampling an HDR computational test signal with $M = 4$ exposures, compositing the result. This simple example, for clarity, has the sampling encumbered by very coarse quantization steps of 0.05 (visible on 2nd plot, upon close inspection). Under these harsh conditions, the quantization error from one of the exposures is compared to the reduced error after reconstruction. We show the reconstructed output error signal (pink), along with the error of one of the exposures (blue), for clarity. Even when subjected to severe quantization in this test, the reduction in error is visible, as the algorithm consolidates information from all four exposures. (Plot annotations: input signal; sampling gives an imperfect representation; reduced error from composited LDR exposures; recovered HDR signal.)

6. EVALUATING THE SYSTEM IN OPERATION

We evaluated the proposed system first using a low-cost PC stereo sound card ($M = 2$). The system was able to sample a 101 dB dynamic range test signal4 (which is beyond what the device can capture on its own), producing a 21 dB reduction in THD, as compared to feeding the one signal into one input, as would otherwise conventionally be done.

4 According to Eqn. 7, $M = 2$ exposures is sufficient for this signal, with 78% exposure overlap. The test signal was generated by a hardware signal generator, traversing the dynamic range in 16 logarithmically-equal steps.

This initial prototype successfully proved the ability to expand the dynamic range of a mediocre audio capture device, without the time-varying artifacts associated with AGC. The audio capture device was a Realtek ALC880 (16-bit resolution, 44.1 kHz sampling).

The prototype used a simple circuit to gain each exposure, and electronically saturate the highly gained exposures to avoid damaging the ADCs. This circuit was constructed with series-connected fast recovery diodes, with two parallel chains to limit both polarities, thus limiting the signals to approximately ±1.4 V just before entry into the ADCs.

An important consideration was “Dynamage Range”: the ratio between the amplitude leading to damaging the sensor or ADC, and the amplitude of the smallest detectable signal. For example, when we used a hydrophone to listen to water flow in a pipe, with high gain, we needed to ensure that a strong impulse produced by the water-hammer effect would not damage the hydrophone or ADC.

Additional simulations tested operation with $M = 4$ gained inputs. Fig. 7 shows how our technique operates on a computer-generated input signal. Examining the graph titled “error signals normalized”: the reduced error signal after HDR compositing is plotted in the purple trace, reduced from the blue trace, which illustrates how Exposure 1, alone, was only useful for the largest amplitudes. (This figure shows very coarse quantization for illustration purposes.)

Fig. 8 verifies operation over a wide variation in $D_0$ and $D_{EP}$. A triangle wave crescendo was used as an input, in order to test under full coverage of the dynamic range. That is, the triangular waveform gave a uniform RDF within each period, and the exponential crescendo gave a uniform DRDF (full, uniform amplitude coverage of the dynamic range).
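The triangle-wave crescendo test input described above can be sketched in Python as follows. The sample rate, tone frequency, duration, 60 dB span and function names are assumptions chosen for this example, not the values used in the experiments. The sketch estimates the RDF of the unit triangle carrier with a histogram, to check that it is close to uniform, and confirms that the exponential envelope covers the chosen range linearly in dB.

```python
import numpy as np

def triangle(phase):
    """Unit triangle wave in [-1, 1]; phase is measured in cycles."""
    frac = phase - np.floor(phase)
    return 4.0 * np.abs(frac - 0.5) - 1.0

def crescendo_envelope(t, duration, a_min, a_max):
    """Exponential envelope: the level in dB rises linearly with time."""
    return a_min * (a_max / a_min) ** (t / duration)

fs = 48_000          # sample rate in Hz (assumed)
f0 = 97.0            # tone frequency in Hz (assumed, not a divisor of fs)
duration = 2.0       # seconds (assumed)

t = np.arange(int(fs * duration)) / fs
carrier = triangle(f0 * t)
envelope = crescendo_envelope(t, duration, a_min=1e-3, a_max=1.0)
x = envelope * carrier                      # HDR test signal

# Histogram estimate of the carrier's RDF; a uniform density on [-1, 1] is 0.5.
rdf, edges = np.histogram(carrier, bins=10, range=(-1.0, 1.0), density=True)
span_db = 20.0 * np.log10(envelope[-1] / envelope[0])

print(np.round(rdf, 3))
print(round(float(span_db), 2))
```

Replacing `triangle` with a sine in the same script would show the histogram piling up near ±1, matching the two asymptotes of the sine-wave RDF given in Section 4.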
Fig. 8: Verification of HDR reconstruction, and optimization of synchronized exposures across a space of possible dynamic ranges. The hypothesized constraints are evidenced in the triangular region, visible with reduced error. While we wish to permit as wide an input dynamic range ($D_0$) as possible, it is clear that there is a tradeoff against reconstruction error. Therefore, as was explained in the text, we can retreat $D_0$ and $D_{EP}$ back from the edge of this triangle, or otherwise increase $M$, as desired to reduce reconstruction error.