
Dynamic Convolution Modeling, a Hybrid Synthesis Strategy

David Bessell
Interdisciplinary Centre for Computer Music Research (ICCMR)
Plymouth University
Drake Circus, Plymouth
Devon PL4 8AA, UK
David.Bessell@Plymouth.ac.uk
https://sites.google.com/site/davebessellmusic/home

Abstract: This article outlines a hybrid approach to the synthesis of percussion sounds. The synthesis method described
here combines techniques and concepts from physical modeling and convolution to produce audio synthesis of
percussive instruments. This synthesis method not only achieves a high degree of realism in comparison with audio
samples but also retains some of the flexibility associated with waveguide physical models. When the results are
analyzed, the method exhibits some interesting detailed spectral features that have some aspects in common with
the behavior of acoustic percussion instruments. In addition to outlining the synthesis process, the article discusses
some of the more creative possibilities inherent in this approach, e.g., the use and free combination of excitation and
resonance sources from beyond the realms of the purely percussive examples given.
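
As a rough orientation, the signal chain summarized above (an envelope taken from a strike recording, used to shape noise, which is then convolved with a recording of the resonant body) might be sketched as follows. This is a minimal illustration and not the Max/MSP implementation: the function names, the moving-average smoother, the use of white rather than pink noise, the direct time-domain convolution, and the synthetic test signals are all assumptions made for the example, and velocity is reduced to a simple gain with no low-pass filter.

```python
import math
import random


def strike_envelope(strike, smooth=32):
    """Full-wave rectify the strike sample, then smooth it with a moving average."""
    rectified = [abs(s) for s in strike]
    env = []
    for i in range(len(rectified)):
        window = rectified[max(0, i - smooth + 1): i + 1]
        env.append(sum(window) / len(window))
    return env


def convolve(x, h):
    """Direct (time-domain) linear convolution; adequate only for short signals."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def dcm_hit(strike, body_ir, velocity=1.0, seed=None):
    """One synthesized hit: enveloped noise convolved with the body response."""
    rng = random.Random(seed)
    env = strike_envelope(strike)
    # White noise stands in for the noise source here (an assumption).
    excitation = [velocity * e * rng.uniform(-1.0, 1.0) for e in env]
    return convolve(excitation, body_ir)


# Synthetic stand-ins for the two recordings (hypothetical data, not real samples).
strike = [math.exp(-i / 20.0) * math.sin(0.9 * i) for i in range(200)]
body_ir = [math.exp(-i / 150.0) * math.sin(0.05 * i) for i in range(500)]

hit_a = dcm_hit(strike, body_ir, velocity=1.0, seed=1)
hit_b = dcm_hit(strike, body_ir, velocity=1.0, seed=2)
```

Because the noise differs from call to call, `hit_a` and `hit_b` share the same envelope and resonance but are not sample-identical. A practical real-time version would replace `convolve` with a partitioned FFT convolution.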

Computer Music Journal, 37:1, pp. 44–51, Spring 2013
doi:10.1162/COMJ_a_00159
© 2013 Massachusetts Institute of Technology.

Physical modeling offers the contemporary composer the promise of many interesting and desirable
technical and creative attributes. Conceptually it straddles the worlds of traditional acoustic and
electronic composition, seeming to offer possibilities to both. (As just one example, consider the
virtual creation of hybrid instruments made from disparate components derived from real-world models
but impossible to actually build in practice.) It is perhaps surprising, then, that physical modeling
has not become more widely used in contemporary musical works to date. Might there be some barrier
that is limiting the use of this technology by the wider body of composers and instrumentalists?

Fully exploring the potential offered by physical modeling requires the acquisition of quite
high-level skills in computing, mathematics, and mechanical and electrical engineering. Allowing the
composer to simulate conventional acoustic instruments, or to create new instruments, that are
expressive and that have specific timbral characteristics suited to an individual piece raises the
question: At what technical level must a musician engage to fully exploit the possibilities? It is
desirable for a composer or player to have as much control over the model as possible, to be truly
able to fit the instrument to the musical task at hand. The acquisition of traditional musical skills
is no trivial task in itself, however; if we add to this the demands of computing, mathematics, and
associated fields, there comes a point even for the most gifted where the time involved and learning
curve become problematic. This has led to a noticeable division between technically oriented
researchers and software developers, on the one hand, and practicing musicians and composers, on the
other. The practical outcome of this is that important advances in physical modeling research are
generally not being utilized creatively in the majority of new music pieces.

Currently there are a variety of levels at which a player or composer can engage with
physical-modeling technologies. There are (1) the pre-built commercial instruments with fixed
interface options, such as Applied Acoustics’ String Studio (Applied Acoustics 2011); (2) the modular
systems with more or less pre-built building blocks, exemplified by Native Instruments’ Reaktor
(Native Instruments 2011) and by Modalys (http://forumnet.ircam.fr/701.html?&L=1) from the Institut
de Recherche et Coordination Acoustique/Musique (IRCAM); (3) graphical programming environments,
such as Cycling ’74’s Max/MSP (http://cycling74.com); (4) libraries for various text-based software
languages, such as STK, the Synthesis ToolKit in C++ (Cook and Scavone 2004); and finally, (5) work
from first principles implemented as mathematical expressions in text-based code, such as the work of
Stefan Bilbao (Bilbao 2009). Each of these approaches falls somewhere along a continuum between
maximum ease of use and maximum flexibility. The approach proposed here, a variant
on physical-modeling and convolution-synthesis methods, offers a further option to this list. The
work focuses on percussive sounds in which an attack part (source) and a decay part (filter) are
obtained from recorded sounds. The amplitude of the attack component is used to derive an envelope
for a noise source.

A Hybrid Software Approach

Dynamic convolution modeling (DCM) is an approach to realistic sound synthesis combining techniques
and paradigms derived from the fields of physical modeling (Välimäki et al. 2006), real-time
convolution (Stockham 1966; Gardner 1995), and audio sampling (Rabiner and Gold 1975). Audio examples
of the process in action were realized with an implementation in Max/MSP. The Max/MSP implementation
can be downloaded from https://sites.google.com/site/davebessellmusic/software-maxmsp.

Software development in this project is ongoing, and this Max/MSP iteration is currently designed
primarily to recreate realistic percussion instruments that can be played from a standard MIDI
keyboard, although it already has some capabilities beyond simple replication of realistic
percussion. The motivation for attempting to create a hybrid synthesis method was to try to combine
the strengths of some current staples of computer audio manipulation into a new expressive tool that
could be easily managed by non-expert users or more traditionally oriented composers while still
maintaining sophisticated audio and performance flexibility. During research into a practical
implementation of this concept, it was discovered that, in comparison with the large and somewhat
unwieldy sample resources typically used for realistic and expressive drum emulation in commercial
libraries, comparable audio results could be achieved with significant data reduction while retaining
a quite modest demand on CPU load. In comparison to a well-known and popular drum sample library that
uses 1.6 MB of data-compressed samples to recreate a velocity-sensitive snare drum, the method
proposed here uses just 4 kB (uncompressed) for the snare-strike sample and 152 kB for the
convolution impulse sample. This potential for data reduction in approaches based on physical
modeling has been noted by Rauhala, Lehtonen, and Välimäki (2007).

Limitations of Existing Techniques Used in Isolation

Although each of the three fields mentioned earlier has its own well-established body of techniques,
each also has its own strengths and limitations. For example, convolution as normally implemented has
a tendency to produce rather static aural results. Sampling likewise can only produce static
snapshots of audio, which require further manipulation to render them more expressive in a musical
context. Physical modeling can be powerfully expressive, but at the expense of some fairly high-order
complexity both in control and implementation. As noted, this complexity has led to some resistance
to its use among non-specialist musicians; note, for example, the relative commercial failure of
Yamaha’s VL1 physical-modeling synthesizer, which was launched in 1994 and was based on research by
Julius Smith (Smith 2005; Stanford University 2011). Dynamic convolution synthesis, as it is
presented here, attempts to marry some of the conceptual and practical ease of use found in typical
audio sampling software with some of the realism of convolution filtering and reverberation software,
without entirely losing the flexibility and expressiveness of physical modeling.

Precedents

A number of previous studies have elements in common with the methods outlined here. There are
precedents for the use of physical modeling for drums and percussion, such as the 2-D waveguide
percussion models by Van Duyne and Smith (1993), the 3-D waveguides in the work of Laird (2001), and
the “physically informed” percussion-synthesis approach of Perry Cook (Cook 1997). A number of
approaches based on source-filter decomposition
have also been proposed (Sandler 1990; Laroche and Meillier 1994). Further work on physically modeled
percussion encompasses both modal approaches and finite difference models (Avanzini and Marogna 2010;
Rabenstein, Koch, and Popp 2010; Bilbao 2012), and there are precedents for the use of noise inputs
in a physical-modeling context, such as Karplus and Strong (1983). Demoucron (2008) and Demoucron and
Rasamimanana (2009) have presented a violin system that uses physical modeling for the input combined
with convolution. Early work on convolution as a synthesis tool was done by Curtis Roads (Roads
1993). Mackenzie, Kale, and Cain (1995) investigated percussion synthesis using a variety of filters
excited by a noise source; in this case, the result of the uniform noise excitation is then
amplitude-scaled after the filter stage. Similarly, Karjalainen et al. (2002) investigated the use of
autoregression, moving average, linear prediction, and frequency zooming in a similar context. Macon
et al. (1998) implemented percussion synthesis using an all-pole model that emphasizes efficiency.

Component Parts

The hybrid synthesis approach proposed here can be broken down into a number of component parts, as
outlined here.

Physical Modeling

The principal element that the proposed hybrid strategy takes from physical modeling is the concept
of separating the excitation source from the rest of the process. In particular, techniques of noise
excitation seen in waveguide model implementations (Smith 2006) were the starting point.

  the excitation wavetable signals [were] obtained by inverse filtering (deconvolution) of the
  recorded sound by the SDL response. For practical synthesis, only the initial transient part of
  the inverse-filtered excitation is used, typically covering several tens of milliseconds
  (Välimäki et al. 2006).

The particular form it takes in the current article, however, is a little different from that
commonly seen in waveguide models. In this case, a noise source is amplitude-modulated by an envelope
derived directly from an initial short transient audio sample. (See the excitation sample, excitation
envelope, and pink noise sections of Figure 1.) This amplitude-modulated noise source is then used as
the excitation for an impulse response derived from an audio sample of a resonant body. (See Figure
1, drum sample.) In order to generate this excitation envelope, the initial transient sample waveform
is first rectified. This process is performed in order to create a less colored result when the
amplitude-scaled noise source is convolved with the drum resonance impulse. The excitation sample
might typically be drum-beater strike noise generated by the user; see the subsequent section on
sampling for further details. The architecture of the whole process can be seen in Figure 1. This
architecture draws generally on ideas found in the work of Välimäki and Smith but is implemented here
in a way specific to this application; more on the implications of this later.

Figure 1. Structure of the synthesis process.

Convolution

Developments in partitioned convolution implementations (Gardner 1995; Stockham 1966) combined with
increases in computing speed now mean that complex filter responses such as those
associated with resonant objects can realistically be implemented as part of a real-time process. In
the Max/MSP implementation utilized here, an external object written by Alex Harker (Harker and
Tremblay 2012) is used. This implements in Max/MSP a strategy for real-time convolution similar to
that outlined by Gardner (1995). The Harker external takes advantage of Macintosh-specific code to
give an efficient realization for this application. (Commercial PC alternatives such as SIR2 or
Freeverb3 could be substituted in this context.)

An overview of the real-time convolution process that the Harker external uses is as follows. The
sample to be used as the impulse response is partitioned, and the first partition is convolved in the
time domain using a process similar to that used by the Max/MSP object buffir~ (a
finite-impulse-response, or FIR, filter). An FIR filter convolves an input signal with an impulse
response, and this can be implemented in a process known as direct convolution. Direct convolution
performed in the time domain has no inherent latency but comes with the disadvantage of a high
computational cost. This means that long convolutions such as the ones needed for the DCM process are
not viable in real time using just FIR filters. In this case, however, only the short first partition
uses direct convolution. The audio output of this first partition is therefore available with zero
latency. Playback of this first-partition audio allows time for the delay inherent in the
frequency-domain convolution of the second partition to elapse, and the second-partition audio output
can then be appended to that of the first partition. This in turn allows time for the longer delay
associated with the larger fast-Fourier-transform block size of the third partition, and so on, in a
cascade of just-in-time calculations.

Sampling

The current software implementation takes as its starting point an audio sample of the drum, or
indeed any resonant body that is to be synthesized. This “reverberant body” sample is used as the
impulse response for the convolution stage. Then a second transient “strike” sample is needed as a
source for the amplitude envelope used to shape the noise excitation for the convolution stage. This
second sample is created by sampling a damped strike on the resonant body used for the first sample.
Alternative strike samples derived from impacts on other resonant bodies using a variety of beater
materials and attack characters can also be substituted, for a wider range of creative results like
those normally associated with physical modeling. As one of our primary aims was user-friendliness,
techniques such as deconvolution as a possible means of deriving strike samples from a single overall
source sample were not used. Combining strike samples deconvolved from the same sample as that of the
resonant body would also have the potential to introduce unwanted phase-related coloring into the
final audio result and would have an associated increase in CPU usage. The two samples (the resonant
impulse and the beater strike sample to be used as excitation envelope) are all that is required to
create a sense of realistic dynamic behaviors. In an informal test, a small group of five expert
listeners was consulted to confirm the subjective audio-quality equivalence between the results of
the DCM process and conventional commercial sample libraries. Readers can judge for themselves by
listening to audio samples created with this process at
http://soundcloud.com/dave-bessell/drum-synthesis. [Editor’s note: The sound examples will also
appear on the DVD accompanying Vol. 37, No. 4 (Winter 2013).]

The Sum of the Parts

Although the techniques associated with sampling, convolution, and physical modeling have many
well-understood aspects, the particular configuration suggested in this article gives some useful
results when synthesizing resonant bodies such as drums.

Features

Combining the elements outlined previously allows the user control over the following aspects.
Separate
samples can be loaded for the resonant-body impulse and the beater strike. Audio output is triggered
by MIDI note input, and the user can specify the level of variability and liveliness at the spectral
level. This control of variability in consecutive strikes can be achieved by manipulating various
factors, including the pitch of the strike-sample playback and the balance between the shaped noise
excitation and a more conventional single-sample excitation impulse. MIDI velocity response is
implemented by scaling the amplitude of the strike envelope and the frequency cutoff of a one-pole
low-pass filter on the output of this envelope (see Figure 1).

Creative Flexibility

Interestingly, this architecture retains some of the flexibility and creative possibilities of the
type of physical model with which it shares some techniques. A variety of different beater samples
can be used to “strike” the drum impulse. This variety of strike excitations is by no means as
flexible as the hybrid acoustic/synthesized system proposed in the work of Roberto Aimi (Aimi 2007),
or the somewhat similar Korg Wavedrum commercial product. The system proposed here is primarily aimed
at a user group oriented towards sampling technology (rather than the live percussion players of
Aimi’s study), and for this user group the flexibility is extended. The strike excitations can also
include nonrealistic combinations such as the pizzicato gong hybrid heard at
http://soundcloud.com/dave-bessell/pizzgong, in which a pizzicato sample is used in place of a
conventional percussion beater as a noise envelope for a gong sample. In this case, the pizzicato
excitation sample is created by recording a heavily damped pizzicato on a violin string. The MIDI
control on the particular iteration of the software used to create this example is somewhat more
elaborate than the examples presented so far.

This kind of creative flexibility was one of the primary design aims for this approach. The general
approach is similar to that outlined by Rauhala et al. (2007), although the method of excitation is
somewhat different.

  A digital filter model for the soundboard has been designed based on recorded bridge impulse
  responses of the harpsichord. The output of the string models is injected into the soundboard
  filter that imitates the reverberant nature of the soundbox . . . (Rauhala et al. 2007).

The excitation sample could be manipulated in ways that give results comparable to those associated
with varying the hardness of the material that is used to excite the resonant body in waveguide
models. This is not implemented in this particular example, though, as it was found that
more-convincing results were obtained by just using a sample of a different beater material in the
first place.

Variability of Spectral Detail

As a consequence of the noise excitation method used, the audio output also exhibits some properties
normally not associated with conventional sampling. For instance, each audio response, although
broadly similar to the others, is nevertheless unique in its micro detail at the spectral level. For
an example, see Figures 2 and 3 and compare the spectral detail of two consecutive gong strikes
created with the DCM software.

This low-level spectral variability allows the possibility of using multiple simultaneous layers of
audio without the obtrusive phase-cancellation artifacts associated with static audio samples.
Similarly, multiple consecutive repeats of a sound are each unique, obviating the need for the “round
robin” sample strategies typically found in commercial sample libraries and playback engines. One
desirable aspect of conventional sampling and convolution applications that is maintained, however,
is the sense of convincing audio realism, to the extent that it is almost impossible in some contexts
to distinguish the result from an actual recording of the original drum or resonant body.
Particularly good results were obtained for this implementation with metallic percussion such as
cymbals or gongs. Accompanying audio examples (http://soundcloud.com/dave-bessell/drum-synthesis)
illustrate DCM synthesis of a variety of
percussion sounds: snare, gong, cymbals, tom-tom, orchestral bass drum, and drumsticks.

Figure 2. DCM gong strike 1.

Velocity Response

A further consequence of the combination of noise excitation with the velocity-sensitivity
implementation (see MIDI velocity, low-pass filter, and amplitude in Figure 1), which is used as the
input for the convolution resonance section, is that to a great degree the static nature of
conventional convolution is not apparent in the final result. Provided that the original drum sample
used as the impulse response is sampled from a loud (fff) drum hit, a high degree of realism in the
velocity response can be achieved. There are some small caveats to this aspect: A response that is
“louder” than the original resonance sample cannot be created, and at extremely low volumes there is
some small subjective deviation from the response of the actual drum in the real world. In most
practical musical situations this is likely to be too small to be significant, but it could easily be
rectified by velocity-switching to a second set of samples for the low-velocity extreme.

This strategy of placing the velocity sensitivity outside the convolution process is conceptually
analogous to procedures followed in commuted waveguide models (Smith 2010) and, combined with the
noise excitation, conveys a convincing sense of dynamic response in the otherwise static convolution
output.

Notwithstanding some minor novel aspects in the implementation of the noise excitation, the
individual elements mentioned previously are largely well-established techniques. The particular
configuration presented in this work, however, exhibits some surprisingly realistic, expressive, and
flexible audio attributes along with a significant reduction in data storage, for results that can
exceed the realism of a conventional static sample. Typical CPU load figures for this Max/MSP
implementation on a 2011 MacBook Pro for a single drum are in the range of 1.5 to 2 percent. In
short, the whole
is considerably more than the sum of the parts in terms of achievable audio performance and
user-friendliness for non-specialist users.

Figure 3. DCM gong strike 2.

Outcomes and Areas for Further Research

There is the potential for further research into extending this technique to wind and string
instruments, but already this implementation allows the free mixing of any resonant-body response
with any percussive-excitation response. Crucially, these resonant bodies and percussive excitations
can easily be created by the user, thus in theory allowing greater flexibility than modular physical
modeling approaches that rely on a menu of pre-created building blocks. The possible excitation
sources include just about anything that can strike, pluck, or scrape, and it is possible to imagine
a more abstract approach beyond the conventional, such as excitation by water droplets. The user
interface can be made conceptually easy to manage, even for those without specialist knowledge, while
maintaining a high degree of audio realism when compared to conventional sampling. Some of the
creative aspects of conventional physical modeling techniques are maintained, such as the
possibilities for creating expressive new hybrid instruments.

References

Aimi, R. M. 2007. “Hybrid Percussion: Extending Physical Instruments Using Sampled Acoustics.” PhD dissertation, Media Arts and Sciences, Massachusetts Institute of Technology.
Applied Acoustics. 2011. “String Studio Overview.” http://www.applied-acoustics.com/stringstudio/overview/. Accessed 8 January.
Avanzini, F., and R. Marogna. 2010. “A Modular Physically Based Approach to the Sound Synthesis of Membrane Percussion Instruments.” IEEE Transactions on Audio, Speech, and Language Processing 18(4):891–902.
Bilbao, S. 2009. Numerical Sound Synthesis: Finite Difference Schemes and Simulation in Musical Acoustics. Chichester, UK: John Wiley and Sons.
Bilbao, S. 2012. “Time Domain Simulation and Sound Synthesis for the Snare Drum.” Journal of the Acoustical Society of America 131(1):914–925.
Cook, P. R. 1997. “Physically Informed Sonic Modeling (PhISM): Synthesis of Percussive Sounds.” Computer Music Journal 21(3):38–49.
Cook, P. R., and G. P. Scavone. 2004. “The Synthesis ToolKit (STK) in C++.” In K. Greenbaum, ed. Audio Anecdotes: A Cookbook of Audio Algorithms and Techniques. Natick, Massachusetts: A.K. Peters, pp. 237–253.
Demoucron, M. 2008. “On the Control of Virtual Violins: Physical Modelling and Control of Bowed String Instruments.” PhD thesis, L’Université Pierre et Marie Curie and the Royal Institute of Technology (KTH Stockholm).
Demoucron, M., and N. Rasamimanana. 2009. “Score Based Real-Time Performance with a Virtual Violin.” In Proceedings of the 12th International Conference on Digital Audio Effects (DAFx-09). ID 38.
Gardner, W. G. 1995. “Efficient Convolution without Input-Output Delay.” Journal of the Audio Engineering Society 43(3):127–136.
Harker, A., and P. A. Tremblay. 2012. “The HISSTools Impulse Response Toolbox: Convolution for the Masses.” In Proceedings of the International Computer Music Conference, pp. 148–155.
Karjalainen, M., et al. 2002. “Frequency-Zooming ARMA Modeling of Resonant and Reverberant Systems.” Journal of the Audio Engineering Society 50(12):1012–1029.
Karplus, K., and A. Strong. 1983. “Digital Synthesis of Plucked-String and Drum Timbres.” Computer Music Journal 7(2):43–55.
Laird, J. 2001. “The Physical Modelling of Drums Using Digital Waveguides.” PhD dissertation, Electrical and Electronic Engineering, Bristol University, UK.
Laroche, J., and J. L. Meillier. 1994. “Multichannel Excitation/Filter Modeling of Percussive Sounds with Application to the Piano.” IEEE Transactions on Speech and Audio Processing 2(2):329–344.
Mackenzie, J., I. Kale, and G. D. Cain. 1995. “Applying Balanced Model Truncation to Sound Analysis/Synthesis Models.” In Proceedings of the International Computer Music Conference, pp. 400–403.
Macon, M., et al. 1998. “Efficient Analysis/Synthesis of Percussion Musical Instrument Sound Using an All-Pole Model.” In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3589–3592.
Native Instruments. 2011. “Reaktor 5.” http://www.native-instruments.com/#/en/products/producer/reaktor-5/. Accessed 9 January 2012.
Rabenstein, R., T. Koch, and C. Popp. 2010. “Tubular Bells: A Physical and Algorithmic Model.” IEEE Transactions on Audio, Speech, and Language Processing 18(4):881–890.
Rabiner, L. R., and B. Gold. 1975. Theory and Application of Digital Signal Processing. Englewood Cliffs, New Jersey: Prentice-Hall.
Rauhala, J., H. M. Lehtonen, and V. Välimäki. 2007. “Toward Next-Generation Digital Keyboard Instruments.” IEEE Signal Processing Magazine 24(2):12–20.
Roads, C. 1993. “Musical Sound Transformation by Convolution.” In Proceedings of the International Computer Music Conference, pp. 102–109.
Sandler, M. 1990. “Analysis and Synthesis of Atonal Percussion Using High Order Linear Predictive Coding.” Applied Acoustics 30(2–3):247–264.
Smith, J. O. 2005. “Physical Audio Signal Processing for Virtual Musical Instruments and Audio Effects.” https://ccrma.stanford.edu/~jos/pasp05/. Last modified December 2005; accessed 12 December 2010.
Smith, J. O. 2006. “A Basic Introduction to Digital Waveguide Synthesis (for the Technically Inclined).” https://ccrma.stanford.edu/~jos/swgt/swgt.html. Last modified February 2006; accessed 13 December 2010.
Smith, J. O. 2010. “Commuted Waveguide Synthesis.” https://ccrma.stanford.edu/~jos/cs.html. Accessed 12 December 2010.
Stanford University. 2011. “Patents and Applications.” www.sondiusxg.com/patent.html. Accessed 2 February 2011.
Stockham, T. G. 1966. “High-Speed Convolution and Correlation.” In AFIPS Proceedings 1966 Spring Joint Computer Conference, Vol. 28, pp. 229–233.
Välimäki, V., et al. 2006. “Discrete-Time Modelling of Musical Instruments.” Reports on Progress in Physics 69(1):1–78.
Van Duyne, S. A., and J. O. Smith. 1993. “Physical Modeling with the 2-D Digital Waveguide Mesh.” In Proceedings of the International Computer Music Conference, pp. 40–47.
