
A Real-Time Way to Turn Urban Environments into Music

Noah Vawter

Thesis Proposal for Degree of Master of Science, Fall 2005

__________________________
Thesis Advisor
Chris Csikszentmihalyi
Professor of Media Arts and Sciences
MIT Media Laboratory
__________________________
Thesis Reader
Barry Vercoe
Professor of Media Arts & Sciences
MIT Media Laboratory
__________________________
Thesis Reader
Douglas Repetto
Director of Research
Computer Music Center, Columbia University

Table of Contents
Abstract
Introduction, Motivation and Inspiration
Prior Explorations
Related Psychoacoustical Research
Overview and Physical Description
Analysis, Processing and Synthesis
Schedule
Resources
Deliverables
Bibliography

Abstract
As human civilization devises ever more powerful machines, living among them may
become more difficult. We may find ourselves surrounded by incidentally created sounds and
noises, discordant and out of step with our momentary needs. At present, legislating against
noise pollution is the only articulated solution, and it is clearly not very effective.
Our impression of sound, however, can be mediated and manipulated, transformed into
something less jarring. So far, Walkmans and sound-canceling headphones have done this,
isolating us from noise but also from one another. In their place, a next-generation
headphone system is proposed which integrates environmental sound into a personal
soundscape. It allows one to synthesize music from environmental sound using a number of
digital signal processing (DSP) algorithms, creating a sonic space in which the listener
remains connected with his or her surroundings, is cushioned from the harshest and most
arrhythmic incursions, and may be drawn to appreciate the more subtle and elegant ones.

Introduction, Motivation and Inspiration


The idea for this project came after experiencing a high-pitched squealing and
extended grating of the bus brakes in the city where I live. The squeal generated by the
friction between the brake pads and the rotors had a particular character. Similar in origin to
a bowed violin string, it held its pitch as the speed of the bus slowed, but resounded through
the sonically hideous frame and panels of the city bus instead of the critically-designed body
of a violin. Instead of the gingerly-spaced overtones of a stringed instrument gently
communicating a pitch to my mind, the inharmonic tones of the manmade vehicle posed a
question at the highest level of urgency: What is going on? I wanted to hear a response to
the blaring sound. Perhaps something soothing, like the exclamation of "Excuse me!"
following a disgusting belch.
The bus sounded as if a dozen children with tin whistles had each picked a random note
and piped away at it. It was highly dissonant, but composers and conductors are able to lead
instrumental sound into and out of dissonance with terrific dynamic range. Louis Curran,
Professor Emeritus of Music Theory and Music History at WPI, organized compositional
styles by the way they resolve dissonance. Extrapolating from this, it should be possible to
create an intelligent system that analyzes soundscapes and makes them more harmonious.
To do this, one must inquire: What partials are in the squeal? What harmony does it
resemble? What chord would resolve it? Other sounds among the urban cacophony may be
considered as well. Many have a tonal component; others are rhythmic. I propose that the
city may be seen not in terms of the relationships of the people forming it or the treasures
within it, but in terms of the vibrations of all its physical objects, forming a continuous and
haphazard concatenation of disordered pitches. This is neither composed nor improvised,
but may in some cases be musical.
While nearly every sound means something useful to the person closest to it (if not in
physical proximity then in operation or design), in a densely populated environment, one is
destined by proximity to perceive meaningless emanations. We may have become
accustomed to this routine cacophony, but this thesis proposes to explore what it would be
like to order the sound: not to regulate every sound-causing movement within the city, but to
develop digital signal processing (DSP) techniques that simulate human perception of a
busy scene and reconfigure it.
To have complete control, however, would mean isolation: never needing to negotiate,
and never being pleasantly surprised. Since the essence of city life is exposure to and
sharing a variety of ideas, even if some of them are not immediately palatable, elitism
behind a Walkman or sound-canceling headphones is unacceptable. Techniques that
cushion the harshest sounds and add variety to the monotonous ones are more enticing.
For example, why must every note sounded from the speaker of a reversing truck have the
same pitch? Why not transform every fourth one into a leading tone? Perhaps a bass line
could be added. The goal of this project is to reconfigure urban audio space into a personally
inspiring and soothing environment.
This thesis may lead to many things. It may inspire people to design more systems
that integrate ambient sound with musical compositions, recalling some of the original
impetus for making live music before recording was possible. It may also encourage
designers of anything sound-emitting to consider more carefully the interactions among
sound emitters. For example, the keypad beeps of adjacent automated teller machines
(ATMs) could be altered to form a chord. Instead of varying continuously, elevator motors
could run at RPMs that form a simple motif. Quieter shipping containers could be designed.
And quite simply, it may make people more considerate of others.


Longer term, it would be nice if this thesis were to lead to a rise in flâneurs, urban
wanderers seeking novelty, because the remixed sound will encourage people to
re-experience familiar areas through the ears of the device and lure them into exploring
areas where they have never been. It may even influence the layout of modern cities. For
example, it might lead one to consider different, more consonant designs for audible systems
like crosswalk signals, automobile mufflers and subway tunnels. Once designers and
planners receive a vision of a more consonant-sounding city, it might encourage them to
locate noisy artifices like chemical plants away from residences. It might persuade city
planners to consider not only volume levels but also pitches in local sound ordinances. It
might inspire designers of large machines like construction equipment to consider auditory
harmony and rhythm in their mechanical operation.

Prior Explorations
In their paper "Smart Headphones," Sumit Basu and Alex Pentland at the MIT Media
Lab describe a reality-mediation project based on headphones, microphones and signal
processing [Basu 2001]. In the context of my project, their work is interesting because it
demonstrates a system with external cognition that shapes the perception of a wearer's
sonic environment. Their paper begins, "Though our ears are wonderful instruments, there
are times when they simply cannot handle everything we need them to," which is close to the
basis of my argument: in the millions of years of evolution that shaped the human hearing
system, the inharmonic sounds of arbitrary metal shapes have figured in only the last
thousand years or so, and then only to the tiniest degree. It is therefore a strain for the
human mind to interpret some of these new sounds.
However, Basu and Pentland's project resulted in a different system. It admits only
human speech from the outside world, superimposing it over prerecorded music. This is an
improvement in certain situations, but in rich environments it censors too much interesting
information. It treads on the ideals of the flâneur, who roams the streets in search of "bustle,
gossip and beauty" [Levi 2004]. Along with the honking horns and crossing signals, one
would miss the ringing of bells, the clamoring sirens and the warbling birds. It would overlook
cultural differences, such as the distinction between the American Republic's
sine-wave-modulated police sirens and the European tritonic version.
Artists have also addressed some of these ideas. For example, Luigi Russolo wrote a
manifesto titled The Art of Noises in 1913 [Russolo 1913]. This brief document circumscribes
the sonic environment from "ancient life" until 1913, with prescriptions for the future. It
describes sound's evolution as ever-growing in complexity, passing from mystery to ecstasy
to tedium. He writes, "For many years Beethoven and Wagner shook our nerves and hearts.
Now we are satiated and we find far more enjoyment in the combination of the noises of
trams, backfiring motors, carriages and bawling crowds." He beseeches composers to break
out of the monotony of the music of their time by recasting the sounds around them into
compositions with hand-built noise-generating instruments. He writes, "We are therefore
certain that by selecting, coordinating and dominating all noises we will enrich men with a
new and unexpected sensual pleasure."
I agree completely with his sentiment that taking control of the environment around
oneself and ordering it can be used to stimulate emotion. However, I choose to help the
noise become music, rather than perform concerts using those noises, as the Futurists did.
Another artist who examined the sounds of the city is Iori Nakai [Nakai 2003]. In 2003,
he demonstrated "Streetscape" in Linz. It is an interactive look at urban acoustics, but no
processing is involved. In this art piece, map representations of Tokyo and Linz are
presented to visitors along with a stylus. As the visitor moves the stylus over various parts of
the city, recorded sounds from that region play back. Thus the piece is an inversion of this
thesis. It bends the goals of the flâneur in the direction of the voyeur, and therefore of
isolation. It is relevant, however, for its presentation of sound as an exploration and a choice.
Similarly, my device will encourage one to experience portions of the city outside one's vital
paths, and possibly to alter behavior to tend toward particularly exciting areas. Unlike
Streetscape, however, my device will not enable one to do this anonymously, nor as rapidly.
Coincidentally, the concepts of anonymity and rapidity of access are keys to understanding
the modern debates over public photography and privacy.
Another artist who experimented with the mobile headphone/microphone combination is
Akitsugu Maebayashi [Maebayashi 2000]. In 2000, he exhibited a piece which sought to
process the environment, although with very modest intentions. Rather than respond to the
environment in a manner based on human perception, it simply stepped through a fixed
sequence of echoes and reverberation, generally causing an even more disconcerting effect
than being in the city. Nevertheless, this piece is important for two reasons. First, it confirms
the appeal of a lively processed environment. Most importantly, it contributes the idea that
one can compose a sequence of processing parameters, that the transformation of an
environment can have varying modes. This superimposes the idea of a "song" onto the
outside world, giving it a more recognizable and repeatable construction. One appealing
consequence is that a flâneur can, as with any good composition, learn to anticipate changes
in the sound, and relocate to a place where the processing will be especially appropriate.

In 2005, I demonstrated a project called "Sonic Authority" [Vawter 2005]. In Sonic
Authority, manmade machines with periodic waveforms, such as air conditioners, electric
power transformers and unidentified telephone pole equipment, were analyzed to determine
the dominant perceived tone. Permanent, official-looking tags were printed and affixed on or
near the devices, declaring the machines' contribution to the audible scene.
Finally, I was inspired by a segment I heard on a radio show in which a man walked
around New York City with a cardboard tube. He had calculated the resonant pitch of the
tube to be B-flat and was letting people listen to the sounds of the city as they were
sustained by the tube.

Related Psychoacoustical Research


The proposed device intends to transform images of the surrounding sonic
environment into tonal music. To do this, it will measure the sampled sound's features, then
process the sound numerically so that it more closely follows the pre-composed musical
structure stored inside. Not every musical characteristic humans perceive can be reliably
calculated by computer, but this is an active area of research. The most important musical
characteristics in this context are volume, key, dissonance, and tempo.
Although it may seem simple, the impression of volume has some important nuances.
This is evident in the well-known Fletcher-Munson curves, and masking makes loudness still
harder to predict [Dowling 1986]. Detection of key through integration of overtones is nicely
explained in Wei Chai's paper "Automated Analysis of Musical Structure" [Chai 2005]. In
1722, Jean-Philippe Rameau offered a very early look at dissonance, as derived from the
ratios of harmonic instruments (without regard to tonality) [Rameau 1722]. The topic of
dissonance and its role in composition has been important throughout Western music and is
the subject of numerous textbooks, such as Harmony [Piston 1941]. In the second half of the
20th century, computation of dissonance for arbitrary groups of nonharmonic tones was
examined in the often-referenced paper "Consonance Theory Part II: Consonance of
Complex Tones and Its Calculation Method" [Kameoka 1969]. Autodetection of tempo is
notably examined in "Tempo and Beat Analysis of Acoustic Musical Signals" [Scheirer 1998].
Finally, in addition to musical features, the relationship between dissonance, spectra and the
construction of unevenly spaced scales receives intense scrutiny in Tuning, Timbre,
Spectrum, Scale [Sethares 2005].
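To make the dissonance computation concrete, the sketch below sums a parameterized Plomp-Levelt roughness curve over all pairs of spectral partials, which is the general shape shared by the Kameoka-Kuriyagawa measure and Sethares' later formulation. It is a minimal C++ sketch, not the thesis's implementation: the constants follow Sethares' published curve fit, the overall scaling is arbitrary, and the type and function names are placeholders of my own.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // One spectral partial: frequency in Hz, linear amplitude.
    struct Partial { double freq; double amp; };

    // Roughness contributed by one pair of partials, using the parameterized
    // Plomp-Levelt curve from Sethares' dissonance measure.
    double pairDissonance(const Partial& p, const Partial& q) {
        const double dstar = 0.24, s1 = 0.0207, s2 = 18.96;  // curve placement
        const double b1 = 3.51, b2 = 5.75;                   // curve shape
        double s  = dstar / (s1 * std::min(p.freq, q.freq) + s2);
        double df = std::fabs(p.freq - q.freq);
        return std::min(p.amp, q.amp)
             * (std::exp(-b1 * s * df) - std::exp(-b2 * s * df));
    }

    // K&K-style total: sum the pairwise roughness over all unordered pairs.
    double totalDissonance(const std::vector<Partial>& partials) {
        double d = 0.0;
        for (size_t i = 0; i < partials.size(); ++i)
            for (size_t j = i + 1; j < partials.size(); ++j)
                d += pairDissonance(partials[i], partials[j]);
        return d;
    }

Feeding this function the peaks of a measured spectrum yields a single dissonance score, which is the kind of quantity the device's sequencer would act upon.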

Overview and Physical Description


The hardware produced will be a small package which fits in the pocket of the flâneur's
clothing. It will listen to the environment surrounding the flâneur and transform it according to
the schemas of a short album of songs. It will have an on/off switch, a volume control and a
"tuning adjustment." A pair of over-the-ear headphones on a single cable will be attached.
In addition to the headphones' speakers, one small microphone will be mounted on the
outside of each ear.
The microphones' purpose is to produce an image of the audible world surrounding the
listener. The tuning adjustment is used to select modes or songs. The progression through
the album will be similar to an Extended Play record, yet implemented as an analog
radio-style tuning knob with simulated static, in order to underscore the concept of a graded
integration with the environment, as opposed to the digital in/out of CD player tracks.
Furthermore, the fake static sustains the impression of the music and sound coming from
"out there" rather than "in here."
The hardware will most likely be implemented using a development board based on the
Analog Devices (ADI) Blackfin Digital Signal Processor (DSP). This is desirable because it
has a large amount of processing power (1.5 billion multiply and accumulate operations per
second is considered large for a portable system by today's standards), can be
programmed in C++ and has the Linux environment available for it. At present the
VicCore53X development board from Voice Interconnect, a German company, is desired.
The hardware will also have a stereo codec and 1/8" jacks for headphone output and
microphone input. Custom modification will be necessary to implement the tuning knob.
Based on unfortunate experiences with Chiclet, the DSP Musicbox, a protective case will be
designed to cover the printed circuit board.
One goal of this project is to subtly mix fantasy and reality. This will be done by carefully
considering which signal processing algorithms to apply. For example, when it is necessary
to produce as realistic-sounding an environment as possible, algorithms such as
equalization, linear filtering (FIRs and IIRs), pitch-scaling and sampling will be used. In
lesser measure, unnatural-sounding algorithms such as waveshaping distortion, bit
reduction, ring modulation, aliasing and wavetable/additive synthesis will be used.
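As an illustration of the two families (a sketch only, not the project's final module set), here is one "realistic" and one "unnatural" per-sample transform in C++:

    #include <cmath>

    // "Realistic" processing: a one-pole IIR lowpass, which merely re-colors
    // the environmental sound the way distance or a wall might.
    float lowpass(float in, float& state, float coeff) {  // coeff in (0, 1]
        state += coeff * (in - state);
        return state;
    }

    // "Unnatural" processing: ring modulation, which multiplies the input by
    // a sine carrier and relocates its partials to inharmonic positions.
    float ringmod(float in, double& phase, double carrierHz, double sampleRate) {
        const double twoPi = 6.283185307179586;
        phase += twoPi * carrierHz / sampleRate;
        if (phase > twoPi) phase -= twoPi;
        return in * static_cast<float>(std::sin(phase));
    }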
Another goal of this project is to ensure that it is not overly singular: in the near future,
someone else should be able to pick it up and create their own algorithms with it. Therefore,
an extension language will be created for operating on the environmental noise. It will be
similar to Csound, but with much higher-level primitives. The extension language is intended
to outlive the physical project.

Analysis, Processing and Synthesis


The analysis routines will get their input data from the Analog to Digital Converters
(ADCs). All signal paths will carry stereo, fixed-point data at a 44.1 kHz sample rate for high
quality. Data will be operated on in windows whose size will be chosen through
experimentation, since there is a system of tradeoffs among immediacy/delay, processing
efficiency and processing quality. For example, a 4096-sample window at 44.1 kHz implies
roughly 93 ms of buffering delay and a frequency resolution of about 10.8 Hz per FFT bin;
longer windows sharpen the analysis but increase the delay.
Analysis and synthesis are operations which require computational resources, typically
measured as a percentage of CPU usage or in MIPS (Millions of Instructions Per Second).
Given the limited number of MIPS, allocation decisions must be made. Since the device will
produce a variety of different outputs, it will utilize a number of synthesis techniques, each of
which will require a varying number of MIPS. For simplicity, the analysis and synthesis
routines will each be required to use less than 50% of the available MIPS. The analysis
routines are expected to always occupy the same number of MIPS.
The basic flow of signal processing will traverse a simple network of analysis,
processing and synthesis modules. See Figure 1.
Figure 1: Block Diagram of Signal Processing Flow

The analysis routines form a mostly sequential processing chain, with the outputs at
each link available to the main sequencer. In the first step of the chain, the sound will be
filtered using the inner/outer ear transform as in "Skeleton" [Jehan 2004]. This equalization
stage is applied so that the processed microphone input more closely resembles the audio a
human ear would hear; it may also be tweaked to account for the transfer function of the
microphones. Following the EQ, one stream will be sent to a beat detection module, which
will supply the main sequencer with tempo and rhythm data. The equalized sound stream will
also be continuously supplied to a Fast Fourier Transform routine. The frequency-domain
data will be supplied to Kameoka and Kuriyagawa's dissonance measurement algorithm
[Kameoka 1969], and also to the Dominant Pitch Analysis module, which in turn forwards its
data to the Chromagram computation module.
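In outline, one pass of this chain could be structured as in the following skeleton. Every module is stubbed out; the names are placeholders for the routines described above, not settled interfaces, and the real implementations would replace the stubs on the Blackfin.

    #include <vector>

    // Stub analysis modules; each stands in for a routine named above.
    static std::vector<float>  earEq(const std::vector<float>& x)       { return x; }
    static std::vector<double> fftMagnitudes(const std::vector<float>&) { return std::vector<double>(2048, 0.0); }
    static double detectTempo(const std::vector<float>&)                { return 120.0; }
    static double kkDissonance(const std::vector<double>&)              { return 0.0; }
    static int    dominantPitch(const std::vector<double>&)             { return 0; }
    static std::vector<double> chromagram(const std::vector<double>&)   { return std::vector<double>(12, 0.0); }

    // Everything the main sequencer receives from one analysis window.
    struct Features {
        double tempoBpm;               // beat detection branch
        double dissonance;             // Kameoka-Kuriyagawa measure
        int    pitch;                  // dominant pitch index
        std::vector<double> chroma;    // chromagram bins
    };

    // One pass over a window of microphone samples: equalize first, then
    // branch into the time-domain and frequency-domain paths of Figure 1.
    Features analyzeWindow(const std::vector<float>& micWindow) {
        std::vector<float>  eq       = earEq(micWindow);
        std::vector<double> spectrum = fftMagnitudes(eq);
        return Features{ detectTempo(eq), kkDissonance(spectrum),
                         dominantPitch(spectrum), chromagram(spectrum) };
    }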
In the previously mentioned work Sonic Authority, the analysis began with samples of
each device. For noise immunity, long, 30-second windows were used. Next, the Fast
Fourier Transform (FFT) was computed. This resulted in a spectrum with about 1.3 million
bins (44,100 samples per second * 30 seconds). To transform this into a dominant
frequency, the bins were used to compute 121 sums, one for each step of the audible
10-octave chromatic scale. Each sum indicates the relative dominance of one note. For
example, to find the dominance of note A4, the total of every bin whose frequency lies within
25% of an integer multiple of the 440 Hz fundamental is computed. The dominance levels
are then compared, and the most dominant note is reported. This method is similar in spirit
to computing the chromagram [Chai 2005] and the Constant Q Transform [Brown 1991]. For
example output, see Figure 2.
Figure 2: Dominant Pitches in Unidentified Telephone Pole Equipment
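A compact reconstruction of that procedure appears below. It is only a sketch: the 25% tolerance is read here as a fraction of the spacing between harmonics, and the 121-step scale is assumed to start at C0 (about 16.35 Hz), both interpretive choices that the description above leaves open.

    #include <cmath>
    #include <vector>

    // Dominance of one candidate note: sum every FFT magnitude bin lying
    // within tol * f0 of an integer multiple of the note's fundamental f0.
    double noteDominance(const std::vector<double>& mag, double binHz,
                         double f0, double tol) {
        double total = 0.0;
        for (size_t i = 1; i < mag.size(); ++i) {
            double f = i * binHz;
            double k = std::round(f / f0);        // nearest harmonic number
            if (k >= 1.0 && std::fabs(f - k * f0) <= tol * f0)
                total += mag[i];
        }
        return total;
    }

    // Score all 121 chromatic steps of a ten-octave scale and report the
    // most dominant one, as in Sonic Authority. binHz = sampleRate / fftSize.
    int dominantNote(const std::vector<double>& mag, double binHz) {
        const double c0 = 16.352;                 // assumed lowest step (C0)
        int best = 0;
        double bestScore = -1.0;
        for (int n = 0; n < 121; ++n) {
            double f0 = c0 * std::pow(2.0, n / 12.0);
            double score = noteDominance(mag, binHz, f0, 0.25);
            if (score > bestScore) { bestScore = score; best = n; }
        }
        return best;                              // semitone index above C0
    }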

Once computed, the dominant frequency spectrum is of great usefulness. Its outputs
can be readily applied to computing the key of the piece, which in turn informs how to
harmonize. In practice, the precision of the spectral dominance algorithm varied with the
sampled location. Some sounds resulted in quite narrow bands, and it was possible to name
the dominant pitch by finding the maximum value on the 121-value graph. Other sounds
produced small clusters of dominance, from 3 to 9 semitones wide, whose amplitudes were
within 5% of each other. Such clusters are highly dissonant, and it is the goal of this project
to turn such dissonance into music and to improve the quality of the algorithm. There are
many ways to interpret such results, and one of the goals is to explore them.
Techniques for the system could come from many places; they will be both discovered
and inspired by other musicians. For example, jazz musician Thelonious Monk would play a
cluster of semitones, then release all but one key, creating a very dissonant attack on an
otherwise normal note. To mimic this effect inside the listener's environment, two filtering
stages would be employed. First, the cluster of notes would be attenuated with an array of
comb or notch filters, virtually eliminating the dissonant sound from the environment. Then,
to sustain the connection with the listener's environment, a second filter or filter bank would
isolate one note of the cluster at a time and remix it in. Furthermore, the reintroduced note
could be varied with time, creating a melodic line.
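One plausible realization with standard building blocks (a sketch, not the final filter design): biquad notches tuned to each note of the detected cluster cushion it, while a single biquad bandpass on the dry signal reintroduces one chosen note. The coefficient formulas follow the widely used RBJ audio-EQ cookbook.

    #include <cmath>
    #include <vector>

    // Direct Form I biquad with RBJ-cookbook notch and bandpass coefficients.
    struct Biquad {
        double b0, b1, b2, a1, a2;              // normalized so a0 == 1
        double x1 = 0, x2 = 0, y1 = 0, y2 = 0;  // filter state

        static Biquad notch(double freq, double q, double sr) {
            const double w0 = 2.0 * 3.14159265358979 * freq / sr;
            const double alpha = std::sin(w0) / (2.0 * q);
            const double a0 = 1.0 + alpha;
            return { 1.0 / a0, -2.0 * std::cos(w0) / a0, 1.0 / a0,
                     -2.0 * std::cos(w0) / a0, (1.0 - alpha) / a0 };
        }
        static Biquad bandpass(double freq, double q, double sr) {
            const double w0 = 2.0 * 3.14159265358979 * freq / sr;
            const double alpha = std::sin(w0) / (2.0 * q);
            const double a0 = 1.0 + alpha;
            return { alpha / a0, 0.0, -alpha / a0,
                     -2.0 * std::cos(w0) / a0, (1.0 - alpha) / a0 };
        }
        float process(float x) {
            double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
            x2 = x1; x1 = x; y2 = y1; y1 = y;
            return static_cast<float>(y);
        }
    };

    // Monk-style cluster treatment: notch out every note of the detected
    // cluster, then reintroduce one note via a bandpass on the dry input.
    float monkCluster(float in, std::vector<Biquad>& notches, Biquad& reintro) {
        float cushioned = in;
        for (Biquad& n : notches) cushioned = n.process(cushioned);
        return cushioned + reintro.process(in);
    }

Retuning the reintroduction bandpass from window to window is what would turn the surviving note into a melodic line.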
Another response to the dissonant sound would be to harmonize with it. This is the
response offered by Kelly Dobson in "Machine Therapy" [Dobson 2002]. In her project, the
human listener harmonizes with a machine's movements and audible vibrations. Computer
musicians have taken many approaches toward autoharmony. One area to explore is when
to mix in realistic versus fantastic instruments. A realistic instrument would have a harmonic
spectrum similar to the original. Wei Chai's paper, for example, describes comparing odd
and even harmonic levels to get an image of timbre.


Another set of techniques is based on William Sethares' ideas [Sethares 2005]. He
examines the peculiar spectra of naturally-occurring rocks, describing how Kameoka and
Kuriyagawa's dissonance algorithm informs a particular musical scale. It would be possible
to automate his methodology in order to create scales in real time. After calculating the
non-chromatic scale, an additive synthesis method would be used to play melodies using the
new instrument. This is an important reference point for analyzing manmade noise, because
the physical design of machines such as vehicle transmissions often creates groups of
sounds whose fundamental frequencies scale together yet are inharmonic.
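Automating that methodology could look like the sketch below, which reuses the Partial type and totalDissonance function from the earlier dissonance sketch: sweep a transposition ratio across one octave against the measured spectrum, and keep the local minima of the resulting dissonance curve as the degrees of a new scale. The step count and octave range are arbitrary choices for illustration.

    #include <vector>

    // Sweep interval ratios over one octave and return the ratios at local
    // minima of the dissonance curve: candidate degrees of a new scale.
    std::vector<double> scaleFromSpectrum(const std::vector<Partial>& spectrum) {
        const int steps = 1000;
        std::vector<double> curve(steps);
        for (int i = 0; i < steps; ++i) {
            double r = 1.0 + i / double(steps);   // ratio in [1, 2)
            std::vector<Partial> both = spectrum;
            for (const Partial& p : spectrum)     // add a transposed copy
                both.push_back({ p.freq * r, p.amp });
            curve[i] = totalDissonance(both);
        }
        std::vector<double> degrees;
        for (int i = 1; i + 1 < steps; ++i)       // keep local minima
            if (curve[i] < curve[i - 1] && curve[i] < curve[i + 1])
                degrees.push_back(1.0 + i / double(steps));
        return degrees;
    }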


Schedule
December 2005 - Initial development activities such as porting Linux to the VicCore
development board will take place.
January 2006 - The hardware, including headphones, microphones, tuning knob, case
and battery system will be constructed.
February 2006 - The initial DSP modules and extension language will be developed.
Demos of each algorithm will be presented to readers for critique.
March 2006 - The system will be tested extensively in several cities. An online audio
journal will be kept for readers to critique.
April 2006 - Writing the thesis will begin.
May 2006 - Writing the thesis will continue.

Resources
VicCore 53x-OEM Blackfin DSP Development Board
IGLOO Parallel Port ICE (In Circuit Emulator)

Deliverables
A reformulated Walkman-like device that transforms the sonic environment into music.
New algorithms to transform disordered manmade noise into music.
An evaluation of which algorithms are best suited to the goals.


Bibliography
[Basu 2001] Basu, S. and Pentland, A. (2001) "Smart Headphones." Proceedings of CHI 2001, Seattle, WA.

[Levi 2004] Levi, Lawrence. (2004) Flaneur magazine. Brooklyn, New York. http://www.flaneur.org/flanifesto.html

[Russolo 1913] Russolo, L. (1913) The Art of Noises. Translated by Robert Filliou, 1967. Great Bear Pamphlet, Something Else Press.

[Nakai 2003] Nakai, Iori. (2003) Streetscape. Ars Electronica, Linz.

[Maebayashi 2000] Maebayashi, Akitsugu. (2000) "Sonic Interface," exhibited in Tokyo.

[Vawter 2005] Vawter, Noah. (2005) "Sonic Authority," exhibited in Cambridge.

[Dowling 1986] Dowling, W. J. and Harwood, D. (1986) Music Cognition. Academic Press, Orlando. pp. 46-49.

[Chai 2005] Chai, Wei. (2005) "Automated Analysis of Musical Structure."

[Rameau 1722] Rameau, Jean-Philippe. (1722) Treatise on Harmony. Translated from the French. Dover, New York. p. 29.

[Piston 1941] Piston, Walter. (1941) Harmony. Norton, New York.

[Kameoka 1969] Kameoka, A. and Kuriyagawa, M. (1969) "Consonance Theory Part II: Consonance of Complex Tones and Its Calculation Method." Journal of the Acoustical Society of America, Vol. 45(6), pp. 1460-1469.

[Scheirer 1998] Scheirer, Eric. (1998) "Tempo and Beat Analysis of Acoustic Musical Signals." J. Acoust. Soc. Am. 103:1 (Jan 1998), pp. 588-601.

[Sethares 2005] Sethares, W. (2005) Tuning, Timbre, Spectrum, Scale, 2nd edition. Springer-Verlag, London. pp. 139-144.

[Jehan 2004] Jehan, Tristan. (2004) Skeleton. Computer software.

[Brown 1991] Brown, J.C. (1991) "Calculation of a Constant Q Spectral Transform." J. Acoust. Soc. Am. 89, pp. 425-434.

[Dobson 2002] Dobson, Kelly. (2002) Machine Therapy Session.
