Abstract: Music information retrieval (MIR) has great potential in musical live coding because it can help the musician–programmer to make musical decisions based on audio content analysis and to explore new sonorities by means of MIR techniques. Real-time MIR techniques can be computationally demanding and have therefore rarely been used in live coding; when they have been used, the focus has been on low-level feature extraction. This article surveys and discusses the potential of MIR applied to live coding at a higher musical level. We propose a conceptual framework of three categories: (1) audio repurposing, (2) audio rewiring, and (3) audio remixing. We explored the three categories in live performance through MIRLC, an application programming interface library written in SuperCollider. We found that it is still a technical challenge to use high-level features in real time, yet using rhythmic and tonal properties (midlevel features) in combination with text-based information (e.g., tags) helps to achieve a closer perceptual level centered on pitch and rhythm when using MIR in live coding. We discuss challenges and future directions for utilizing MIR approaches in the computer music field.
Live coding in music is an improvisation practice based on generating code in real time by either writing it directly or using interactive programming (Brown 2006; Collins et al. 2007; Rohrhuber et al. 2007; Freeman and Troyer 2011). Music information retrieval (MIR) is an interdisciplinary research field that targets, generally speaking, the analysis and retrieval of music-related information, with a focus on extracting information from audio recordings (Lerch 2012). Applying MIR techniques to live coding can contribute to the process of music generation, both creatively and computationally. A potential scenario would be to create categorizations of audio streams and extract information on timbre and performance content, as well as to drive semiautomatic audio remixing, enabling the live coder to focus on high-level musical properties and decisions. Another potential scenario is to be able to model a high-level music space with certain algorithmic behaviors and allow the live coder to combine the semiautomatic retrieval of sounds from both crowdsourced and personal databases.

This article explores the challenges and opportunities that MIR techniques can offer to live coding practices. Its main contributions are: a survey review of the state of the art of MIR in live coding; a categorization of approaches to live coding using MIR, illustrated with examples; an implementation in SuperCollider of the three approaches that are demonstrated in test-bed performances; and a discussion of future directions of real-time computer music generation based on MIR techniques.

Background

In this section, we overview the state of the art of live coding environments and of MIR related to live performance applications.
coding in music, will be used interchangeably in this article. Programming languages that have been used for live coding include ChucK (Wang 2008), Extempore (previously Impromptu; Sorensen 2018), FoxDot (Kirkbride 2016), Overtone (Aaron and Blackwell 2013), Sonic Pi (Aaron and Blackwell 2013), SuperCollider (McCartney 2002), TidalCycles (McLean and Wiggins 2010), Max and Pd (Puckette 2002), and Scratch (Resnick et al. 2009). With the advent and development of JavaScript and Web Audio, a new generation of Web-based live coding languages has emerged, e.g., Gibber (Roberts and Kuchera-Morin 2012). Visual live coding is also an emerging practice, notably with Hydra (https://github.com/ojack/hydra) and Cyril (http://cyrilcode.com/).

Commercial Software Using MIR for Live Performance

Real-time MIR has been investigated and implemented in mainstream software, ranging from digital audio workstations (DAWs) such as Ableton Live; to karaoke software such as Karaoki (http://www.pcdj.com/karaoke-software/karaoki/) and My Voice Karaoke (http://www.emediamusic.com/karaoke-software/my-voice-karaoke.html); to DJ software such as Traktor (https://www.native-instruments.com/en/products/traktor/), Dex 3 (http://www.pcdj.com/dj-software/dex-3/), and Algoriddim's Djay Pro 2 (https://www.algoriddim.com/); and to song retrieval or query-by-humming applications such as Midomi (https://www.midomi.com/) and SoundHound (https://soundhound.com/). Typically, these solutions include specialized MIR tasks, such as pitch tracking for the karaoke software and beat and key detection for the DJ software. In these applications, the MIR tasks are focused on high-level musical information (as opposed to low-level feature extraction) and thus inspire this research. There are several collaborations and conversations between industry and research looking at real-time MIR applied to composition, editing, and performance (Bernardini et al. 2007; Serra et al. 2013). To our knowledge, there has been little research on the synergies between MIR in musical live coding and MIR in commercial software. The present article aims to help fill that gap.

Real-Time Feature Extraction Tools for Live Coding Environments

In this section, we present a largely chronological and functional analysis of existing MIR tools for live coding and interactive programming, to understand the characteristics and properties of the features supported, identify the core set of features, and discuss whether the existing tools satisfy the requirements of live coders.

Nearly all MIR systems extract a certain intermediate feature representation from the input audio and use this representation for inference, for example, classification or regression. Although still an open debate, there is some common ground in the MIR literature about the categorization of audio features as low-, mid-, and high-level. In an earlier publication, the second author referred to low-level features as instantaneous features, extracted from small blocks of audio and describing various, mostly technical properties, such as the spectral shape, intensity, or a simple characterization of pitched content (Lerch 2012). Serra et al. (2013) refer to tonal and temporal descriptors as midlevel features. High-level features usually describe semantic characteristics, e.g., emotion, mood, musical form, and style (Lerch 2012), or overall musical, cultural, and psychological characteristics, such as genre, harmony, mood, rhythm, and tonality (Serra et al. 2013). The distinction between low-, mid-, and high-level audio features is used here because it provides a sufficient level of granularity to be useful to the live coder.

Various audio feature extraction tools for live coding have existed for some time. In this section, we present an overview of existing tools for the three environments ChucK (Wang 2008), Max (Puckette 2002), and SuperCollider (McCartney 2002), because of their popularity, extensive functionality, and maturity. The findings and conclusions, however, are generalizable to other live coding environments. An illustrative selection of MIR tools for analysis is listed in Table 1.
Table 1. An Illustrative Selection of MIR Tools for Live Coding Environments

Availability   Tool                                   Authors                               Environment
Included       Unit Analyzer (UAna)                   Wang, Fiebrink, and Cook (2007)       ChucK
Included       fiddle~, bonk~, and sigmund~ objects   Puckette, Apel, and Zicarelli (1998)  Max
Included       Built-in UGens for machine listening   SuperCollider community               SuperCollider
Third-party    SMIRK                                  Fiebrink, Wang, and Cook (2008)       ChucK
Third-party    LibXtract                              Bullock (2007)                        Max, SuperCollider
Third-party    Zsa.Descriptors                        Malt and Jourdan (2008)               Max
Third-party    analyzer~ object                       Tristan Jehan                         Max
Third-party    MuBu                                   Schnell et al. (2009)                 Max
Third-party    SCMIR                                  Collins (2011)                        SuperCollider
Third-party    Freesound quark                        Gerard Roma                           SuperCollider
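As a brief illustration of the kind of analysis these tools expose, the following sketch is ours (not from any of the surveyed libraries); it relies only on standard built-in SuperCollider machine-listening UGens (SoundIn, FFT, SpecCentroid, Loudness, Pitch), and the parameter values are arbitrary.

(
// Minimal sketch: poll a few low- and midlevel features of the
// sound card input using built-in machine-listening UGens.
{
    var in, chain, centroid, loud, freq, hasFreq;
    in = SoundIn.ar(0);
    chain = FFT(LocalBuf(1024), in);   // spectral frame for the FFT-based analyzers
    centroid = SpecCentroid.kr(chain); // low-level: spectral centroid (Hz)
    loud = Loudness.kr(chain);         // intensity: perceptual loudness (sones)
    # freq, hasFreq = Pitch.kr(in);    // midlevel: fundamental frequency estimate
    centroid.poll(2, "centroid");      // print each feature twice per second
    loud.poll(2, "loudness");
    freq.poll(2, "pitch");
    hasFreq.poll(2, "voiced");
    Silent.ar(1)                       // analysis only, no audible output
}.play;
)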
Figure 1 shows a timeline of the analyzed MIR tools from the 1990s to the present, which provides a historical and technological perspective of the evolution of the programming environments and dependent libraries. As shown in Figure 2, the live coding environment that includes the most audio features is SuperCollider, and the third-party libraries with the largest number of features are LibXtract and the Freesound quark, the latter as a wrapper of the Essentia framework. Figure 3 shows the distribution of the audio features over six categories (statistical, envelope, intensity, timbre, temporal, and tonal), which are inspired by Essentia's organization (Bogdanov et al. 2013) and Lerch's (2012) feature classification.

[Figure 1. Timeline of live coding programming languages and dependent MIR tools. A color version of the figure is available online at www.mitpressjournals.org/doi/suppl/10.1162/COMJ_a_00484.]
[Figure 2. Functional analysis of MIR tools for live coding: distribution of audio features by number against low-level and midlevel audio features.]

The timbre audio features are the greatest in number (43 percent of the total). The most popular spectral features are spectral centroid, spectral rolloff, spectral spread, and Mel-frequency cepstral coefficients (MFCCs), followed by the less popular but still prominent spectral flatness and spectral flux. The second category with stronger representation (16 percent of the total) consists of statistical audio features, with zero-crossing rate the most frequently implemented. The most commonly supported intensity audio features are loudness and root mean square (RMS). Among temporal audio features, beat tracking and onset detection are the most present. Finally, there is a considerable number of tonal audio features (15 percent of the total), where pitch (F0) is the only feature supported extensively.

Considering these most popular features, it is noticeable that the majority are low-level (instantaneous) features (e.g., RMS or spectral centroid). These features have been widely used in audio content analysis, for example, to classify audio signals. In live coding, they can be helpful to determine audiovisual changes in real time, yet the amount of data can be overwhelming, and most of the extracted features are not musically meaningful (e.g., it is unclear how to interpret the continuous floating-point values of MFCCs). Midlevel features, such as onsets and tonality, can be interpreted more easily and can be used to characterize rhythmic and tonal properties, allowing the live coder to make content-driven decisions (e.g., a threshold value can indicate whether or not to trigger a percussive sound depending on the onset values).
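To make such a content-driven decision concrete, here is a small sketch of our own (not code from the surveyed tools): onsets detected in the live input, gated by a sensitivity threshold, trigger a short percussive ping.

(
// Sketch: a midlevel feature (onset detection) drives a musical decision.
// The threshold argument sets how salient an onset must be to trigger.
SynthDef(\onsetPing, {
    var in, chain, trig, ping;
    in = SoundIn.ar(0);
    chain = FFT(LocalBuf(512), in);
    trig = Onsets.kr(chain, threshold: 0.4);
    ping = SinOsc.ar(880) * EnvGen.kr(Env.perc(0.001, 0.1), trig);
    Out.ar(0, ping ! 2);
}).add;
)
Synth(\onsetPing);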
It is noteworthy that the real-time MIR tools presented here provide the coder with only a few midlevel features and show a lack of high-level features (Figure 2). As discussed later in this article, the use of high-level features (e.g., mood or genre) requires a longer time scope, and systems extracting high-level features are often prone to error (e.g., see the semantic-gap problem between low-level descriptors and semantic descriptors described by Schedl et al. [2014]). This may affect the performance negatively, compared with the immediacy and ease of use of instantaneous low-level features. At the same time, if the technical issues were solved, high-level information could increase the richness and musicality of the live coding session, as the live coder could make decisions from a musical perspective of, for example, the overall musical form.
Conceptual Framework on MIR in Live Coding

The conceptual framework presented here structures MIR approaches for live coding into three nonexclusive categories: (1) audio repurposing, by which we mean the analysis, retrieval, and manipulation of audio clips from a sound or music database; (2) audio rewiring, by which we mean the real-time analysis of an audio input signal, which can be either an external source or the output of the system itself; and (3) audio remixing, by which we mean a system supporting musically meaningful remixing decisions semiautonomously.

For each of the three categories, we present a prototype. The aim is to provide a conceptual foundation that helps us to discuss future possibilities. The three prototypes have been developed as three separate modules that constitute Music Information Retrieval for Live Coding (MIRLC, available at http://github.com/axambo/MIRLC). This is an application programming interface (API) library written in SuperCollider that is designed to provide a musical approach to using MIR techniques in live coding.

Audio Repurposing

Since the advent of the social Web, sound sharing and retrieval from online databases (e.g., ccMixter, Freesound, Looperman) have become increasingly popular (Font, Roma, and Serra 2017). Sounds are identified and retrieved by parsing either user-annotated data (e.g., tags, annotations, descriptions) or automatically extracted data. Font, Roma, and Serra identify two types of feature-based queries in audio retrieval: (1) queries by range of a property, that is, retrieval of sounds that match a given interval or threshold filtered by one or multiple audio features, and (2) queries by similarity, i.e., retrieval of similar sounds from a given example. An instance of the former is giving a range of bpm values to find specific rhythmic sounds. The latter has been notably explored for sound-synthesis techniques such as musical mosaicking (Zils and Pachet 2001), concatenative sound synthesis (Schwarz 2007), or mashups (Serra et al. 2013) by retrieving short sounds similar to a reference. Other useful queries are by category, e.g., instrument or genre.
[Figure 4. Block diagram of audio repurposing.]

Figure 4 illustrates a block diagram of the audio repurposing approach in live coding, where the live coder can retrieve sounds from either online or local databases. The retrieved sounds will be processed and integrated into the performance using live coding techniques. For efficient audio retrieval in quasi-real time, a common practice is to analyze the sound database in advance and store its feature representation, thus combining the time-consuming offline analysis with ad hoc access to precomputed features and metadata from the audio clips (Serra et al. 2013). Typically, the offline analysis uses a feature-extractor program and stores the information of each audio file as a single vector containing the aggregated audio features (Font, Roma, and Serra 2017). The live coding language deals with low-level operations of real-time audio processing, such as loading sounds into buffers, which implies processing blocks of audio samples in real time.
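As a toy, language-side illustration of these two query types over precomputed vectors (our own sketch; the file names and feature values are invented), consider a dictionary mapping each clip to an aggregated [bpm, spectral centroid] vector:

(
// Sketch: query precomputed feature vectors by property range and by
// similarity. Clips and feature values are invented for illustration.
~features = Dictionary.newFrom([
    "beat1.wav", [120, 1500],   // [bpm, spectral centroid in Hz]
    "beat2.wav", [95, 900],
    "pad1.wav", [60, 400]
]);

// Query by range of a property: clips whose bpm lies in a given interval.
~byBpm = { |lo, hi|
    ~features.select({ |vec| (vec[0] >= lo) and: { vec[0] <= hi } }).keys
};

// Query by similarity: the clip nearest to an example vector (Euclidean).
~nearest = { |example|
    ~features.keys.asArray.minItem({ |key|
        (~features[key] - example).squared.sum.sqrt
    })
};

~byBpm.(90, 125).postln;        // beat1.wav and beat2.wav
~nearest.([100, 1000]).postln;  // beat2.wav
)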
Examples of Audio Repurposing

Current examples of audio repurposing in live coding include the retrieval of audio clips, usually by textual queries from personal or online databases, as shown in live coding sessions with
[Figure 5. The first author live coding with MIRLC at the Noiselets microfestival, 8 January 2017, Barcelona. (Photo by Helena Coll.)]
way. The live coder has a range of options to retrieve sounds, including mid- and high-level content-based queries (e.g., duration, bpm, pitch, key, or scale) and text-based queries (i.e., tags). Sounds can be retrieved in groups, which facilitates the creation of conceptually related sound groups based on similarity, rhythm, or pitch, among other criteria. Each sound is played in a loop, and the groups can be played either simultaneously or sequentially. This provides different levels of musical granularity. The main functionalities include asynchronous management of multiple sounds by a single query or operation; human-like queries by content, similarity, tag, filter, sound ID, and sounds chosen at random; and an architecture to play with the groups of sounds either in sequence or in parallel. Figure 6 shows an example of the code.
// Instantiation (excerpt from the Figure 6 listing)
a = MIRLCRep.new    // a group object: each retrieved sound plays in a loop
b = MIRLCRep.new    // a second, independent group
// ... (queries omitted in this excerpt) ...
r.play              // 'r' is presumably defined in the omitted part of the listing
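To give a flavor of the interaction, a hypothetical continuation of this excerpt is shown below. The method names mirror the functionality just described (queries by tag, similarity, and random selection; sequential and parallel playback), but they are our paraphrase; consult the MIRLC documentation for the exact API.

// Hypothetical continuation (our paraphrase of the documented
// functionality; see the MIRLC documentation for the exact API)
a.tag("drumbeat")   // text-based query: one sound tagged "drumbeat"
a.similar           // content-based query: a sound similar to the previous one
b.random(2)         // two sounds chosen at random
a.sequence          // play the sounds of group 'a' one after another
b.parallel          // play the sounds of group 'b' simultaneously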
In this case study, the role of live coding is to query, browse, and control the audio clips in real time. Live coding sends out high-level textual queries (e.g., tags) and content-based queries (e.g., pitch, bpm, key, or scale), the latter based on MIR information, with the intent of crafting a coherent sound palette. Using MIR techniques removes the requirement of individually knowing each sound to create a homogeneous and coherent sound pool. The combination of metadata with audio content analysis provides flexibility and variation to the performance. It is possible to search for sounds based on different criteria, such as rhythm, melody, duration, and harmony. The use of the similarity descriptor can give musically consistent, yet unpredictable, results, which demonstrates that defining musical similarity is a nontrivial task (Lerch 2012). As future work, the MIR processes that are taking place could be made more visible to both the audience and the live coder using textual feedback, following the notion of "showing the screen" in live coding. Next steps of interest include the use of content-based features at even higher levels (e.g., mood or genre), to be combined with text-based queries. In the current version, the use of tags has been an effective workaround (e.g., "happy," "sad," "angry," "excited," "techno," "drumbeat," or "dancehall").

Audio Rewiring

The use of an audio stream as an input is a common practice in interactive computer music and machine listening (Chadabe 1984; Rowe 1993, 2001). The advent of sufficiently fast computers and suitable software has made feature analysis of the audio input signal possible in real time. Mapping the analysis results to sound processing parameters opens a range of creative possibilities for both studio and live performance.
[Figure 7. Block diagram of audio rewiring.]
[Figure 9. Code example of the audio-rewiring prototype.]
// Instantiation (excerpt from the Figure 9 listing): create an
// audio-rewiring object that analyzes the given audio file in real time
a = MIRLCRew.new( 1, "afro-beat-6-8-toms.wav" )
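As a generic sketch of the rewiring idea (ours, using only standard SuperCollider UGens rather than the MIRLCRew API), real-time features of an input signal can be mapped onto the parameters of an effect processing that same signal:

(
// Sketch: map real-time analysis of the input onto effect parameters.
// Use headphones; routing the microphone to the output can feed back.
{
    var in, amp, freq, hasFreq, wet;
    in = SoundIn.ar(0);
    amp = Amplitude.kr(in);          // intensity follower
    # freq, hasFreq = Pitch.kr(in);  // fundamental frequency estimate
    // Pitch tunes a resonant low-pass filter; amplitude sets the dry/wet mix.
    wet = RLPF.ar(in, (freq * 2).clip(100, 8000), 0.2);
    XFade2.ar(in, wet, amp.linlin(0, 0.5, -1, 1)) ! 2
}.play;
)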
[Figure 11. Geolocation diagram of a conjoint performance between Female Laptop Orchestra (FLO) and Women in Music Tech (WiMT).]
References

Bullock, J. 2007. "LibXtract: A Lightweight Library for Audio Feature Extraction." In Proceedings of the International Computer Music Conference, vol. 2, pp. 25–28.
Chadabe, J. 1984. "Interactive Composing: An Overview." Computer Music Journal 8(1):22–27.
Chafe, C., J.-P. Cáceres, and M. Gurevich. 2010. "Effect of Temporal Separation on Synchronization in Rhythmic Performance." Perception 39(7):982–992.
Collins, N. 2002. "The BBCut Library." In Proceedings of the International Computer Music Conference, pp. 313–316.
Collins, N. 2006. "BBCut2: Integrating Beat Tracking and On-the-Fly Event Analysis." Journal of New Music Research 35(1):63–70.
Collins, N. 2011. "SCMIR: A SuperCollider Music Information Retrieval Library." In Proceedings of the International Computer Music Conference, pp. 499–502.
Collins, N. 2015. "Live Coding and Machine Listening." In Proceedings of the International Conference on Live Coding, pp. 4–11.
Collins, N., et al. 2007. "Live Coding Techniques for Laptop Performance." Organised Sound 8(3):321–330.
Davies, M. E. P., et al. 2014. "AutoMashUpper: Automatic Creation of Multi-Song Music Mashups." IEEE/ACM Transactions on Audio, Speech, and Language Processing 22(12):1726–1737.
Essl, G. 2010. "UrSound: Live Patching of Audio and Multimedia Using a Multi-Rate Normed Single-Stream Data-Flow Engine." In Proceedings of the International Computer Music Conference, pp. 534–537.
Fiebrink, R., G. Wang, and P. Cook. 2008. "Support for MIR Prototyping and Real-Time Applications in the ChucK Programming Language." In Proceedings of the 9th International Conference on Music Information Retrieval, pp. 153–158.
Font, F., G. Roma, and X. Serra. 2017. "Sound Sharing and Retrieval." In T. Virtanen, M. D. Plumbley, and D. Ellis, eds. Computational Analysis of Sound Scenes and Events. Cham, Switzerland: Springer International Publishing, pp. 279–301.
Freeman, J., and A. V. Troyer. 2011. "Collaborative Textual Improvisation in a Laptop Ensemble." Computer Music Journal 35(2):8–21.
Kirkbride, R. 2016. "FoxDot: Live Coding with Python and SuperCollider." In Proceedings of the International Conference on Live Interfaces, pp. 193–198.
Knotts, S. 2016. "Algorithmic Interfaces for Collaborative Improvisation." In Proceedings of the International Conference on Live Interfaces, pp. 232–237.
Latina, C. 2016. "Machine Listening Eurorack Module." Master's thesis, Georgia Institute of Technology.
Lee, S. W., J. Bang, and G. Essl. 2017. "Live Coding YouTube: Organizing Streaming Media for an Audiovisual Performance." In Proceedings of the International Conference on New Interfaces for Musical Expression, pp. 261–266.
Lee, Y. S., et al. 2018. "Demo Hour." Interactions 25(5):10–13.
Lerch, A. 2012. An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics. Hoboken, NJ: Wiley-IEEE Press.
Malt, M., and E. Jourdan. 2008. "Zsa.Descriptors: A Library for Real-Time Descriptors Analysis." In Proceedings of the 5th Sound and Music Computing Conference, pp. 134–137.
McCartney, J. 2002. "Rethinking the Computer Music Language: SuperCollider." Computer Music Journal 26(4):61–68.
McLean, A., and K. Sicchio. 2014. "Sound Choreography < > Body Code." In Proceedings of the Conference on Computation, Communication, Aesthetics, and X, pp. 355–362.
McLean, A., and G. Wiggins. 2010. "Tidal: Pattern Language for the Live Coding of Music." In Proceedings of the Sound and Music Computing Conference. Available online at zenodo.org/record/849841/files/smc_2010_067.pdf. Accessed December 2018.
Olowe, I., et al. 2016. "FEATUR.UX: Exploiting Multitrack Information for Artistic Visualization." In Proceedings of the International Conference on Technologies for Music Notation and Representation, pp. 157–166.
Ordiales, H., and M. L. Bruno. 2017. "Sound Recycling from Public Databases: Another BigData Approach to Sound Collections." In Proceedings of the International Audio Mostly Conference, paper 48.
Puckette, M. 2002. "Max at Seventeen." Computer Music Journal 26(4):31–43.
Puckette, M. S., T. Apel, and D. D. Zicarelli. 1998. "Real-Time Audio Analysis Tools for Pd and MSP." In Proceedings of the International Computer Music Conference, pp. 109–112.
Resnick, M., et al. 2009. "Scratch: Programming for All." Communications of the ACM 52(11):60–67.
Roberts, C., and J. Kuchera-Morin. 2012. "Gibber: Live Coding Audio in the Browser." In Proceedings of the International Computer Music Conference, pp. 64–69.
Roberts, C., and G. Wakefield. 2017. "gibberwocky: New Live-Coding Instruments for Musical Performance." In Proceedings of the 17th International Conference on New Interfaces for Musical Expression, pp. 121–126.