A Way To Turn Urban Environments Into Music
Noah Vawter
__________________________
Thesis Advisor
Chris Csikszentmihalyi
Professor of Media Arts and Sciences
MIT Media Laboratory
__________________________
Thesis Reader
Barry Vercoe
Professor of Media Arts & Sciences
MIT Media Laboratory
__________________________
Thesis Reader
Douglas Repetto
Director of Research
Columbia University, Computer Music Center.
Table of Contents
Abstract
Introduction, Motivation and Inspiration
Prior Explorations
Overview and Physical Description
Analysis, Processing and Synthesis
Schedule
Resources
Deliverables
Bibliography
Abstract
As human civilization devises ever more powerful machines, living among them may
become more difficult. We may find ourselves surrounded by incidentally created sounds and
noises, discordant and out of step with our momentary needs. Currently, legislating against
noise pollution is the only articulated solution, and it is clearly not very effective.
Our impression of sound, however, may be mediated and manipulated, transformed into
something less jarring. So far, Walkmans and sound-canceling headphones have done this,
isolating us from noise but also from one another. In their place, a next-generation
headphone system is proposed which integrates environmental sound into a personal
soundscape. It allows one to synthesize music from environmental sound using a number of
digital signal processing (DSP) algorithms, creating a sonic space in which the listener
remains connected with his or her surroundings, is cushioned from the harshest and most
arrhythmic incursions, and may be drawn to appreciate the more subtle and elegant
ones.
Prior Explorations
In their paper "Smart Headphones," Sumit Basu and Alex Pentland at the MIT Media
Lab describe a reality-mediation project based on headphones, microphones and signal
processing [Basu 2001]. In the context of my project, their work is interesting because it
demonstrates a system with external cognition that shapes the perception of a wearer's
sonic environment. Their paper begins "Though our ears are wonderful instruments, there
are times when they simply cannot handle everything we need them to," which is similar to
the basis of my argument: In the millions of years of evolution leading to the construction of
my hearing system, the inharmonic sounds of arbitrary metal shapes have probably only
influenced the last 1000 years to even the tiniest degree. It is therefore a strain for the
human mind to interpret some of the new sounds.
However, Basu and Pentland's project resulted in a different system. It only lets in
human speech from the outside world, superimposing it over prerecorded music. This is an
improvement in certain situations, but in rich environments it censors too much interesting
information. It treads on the ideals of the flâneur who roams the streets in search of "bustle,
gossip and beauty." [Levi 2004] Along with the honking horns and crossing signals, one
would miss the ringing of bells, the clamoring sirens and warbling birds. It would overlook
cultural differences such as the distinction between the American Republic's sine-wave-modulated
police sirens and the European tritonic version.
Artists have also addressed some of these ideas. For example, Luigi Russolo wrote a
manifesto titled The Art of Noises in 1913 [Russolo 1913]. This brief document
circumscribes the sonic environment from "ancient life" until 1913, with prescriptions for the
future. It describes sound's evolution as ever-growing in complexity and from mystery to
ecstasy to tedium. He writes "For many years Beethoven and Wagner shook our nerves
and hearts. Now we are satiated and we find far more enjoyment in the combination of the
noises of trams, backfiring motors, carriages and bawling crowds." He beseeches
composers to break out of the monotony of the music of their time by recasting the sounds
around them into a composition with hand-built noise-generating instruments. He writes "We
are therefore certain that by selecting, coordinating and dominating all noises we will enrich
men with a new and unexpected sensual pleasure."
I agree with his sentiment completely - that taking control of the environment around
oneself and ordering it can be used to stimulate emotion. However, I choose to help the
noise become music, rather than perform concerts using those noises, as the Futurists did.
Another artist who examined the sounds of the city is Iori Nakai [Nakai 2003]. In 2003,
he demonstrated "Streetscape" in Linz. This is an interactive look at urban acoustics, but
no processing is involved. In this art piece, map representations of Tokyo and Linz are
presented to visitors along with a stylus. As the visitor moves the stylus over various parts
of the city, recorded sounds from that region play back. Thus the piece is an inversion of
this thesis. It bends the goals of the flâneur in the direction of the voyeur, and therefore
toward isolation. It is relevant, however, for its presentation of sound as an exploration and a
choice. Similarly, my device will encourage one to experience portions of the city outside
one's vital paths and possibly to alter behavior to tend toward particularly exciting areas. In
contrast, my device will not enable one to do this anonymously, nor as rapidly.
Coincidentally, the concepts of anonymity and rapidity of access are keys to understanding
the modern debates over public photography and privacy.
Another artist who experimented with the mobile headphone/microphone combination is
The analysis routines form a mostly sequential processing chain, with outputs taken at
each link available to the main sequencer. In the first step in the chain, the sound will be
filtered using the Inner/Outer ear transform as in "Skeleton" [Jehan 2004]. This equalization
stage is done when processing microphone input to more closely resemble the audio a
human ear would hear. It may also be tweaked to account for the transfer function of the
microphones. Following the E.Q., one stream will be sent to a beat detection module, which
will supply the main sequencer with tempo and rhythm data. The equalized sound stream will
also be continuously supplied to the Fast Fourier Transform routine. The frequency domain
data will be supplied to Kameoka and Kuriyagawa's dissonance measurement algorithm
[Kameoka 1969]. The frequency domain data will also be supplied to the Dominant Pitch
Analysis module, which in turn forwards its data to the Chromagram computation module.
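To make the flow of the chain concrete, the sketch below walks one analysis frame through these stages in Python with NumPy. It is an expository sketch only: the real modules will run on the Blackfin DSP, and the frame size, EQ curve and feature set shown here are placeholder assumptions rather than the actual module designs.

import numpy as np

SAMPLE_RATE = 44100   # assumed sample rate
FRAME = 1024          # assumed analysis frame length

def inner_outer_ear_eq(frame, sr=SAMPLE_RATE):
    # Stand-in for the inner/outer ear transform of [Jehan 2004]:
    # a crude fixed spectral weighting, for illustration only.
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    gain = 1.0 + 0.5 * ((freqs > 1000) & (freqs < 5000))
    return np.fft.irfft(spectrum * gain, n=len(frame))

def analyze_frame(frame, sr=SAMPLE_RATE):
    # Step 1: equalize the microphone input.
    eq = inner_outer_ear_eq(frame, sr)

    # Step 2a: time-domain branch for beat detection, represented here
    # only by the frame energy an onset detector might track.
    energy = float(np.sum(eq ** 2))

    # Step 2b: one FFT shared by the dissonance measurement, dominant
    # pitch analysis and chromagram stages.
    mag = np.abs(np.fft.rfft(eq * np.hanning(len(eq))))
    freqs = np.fft.rfftfreq(len(eq), 1.0 / sr)
    dominant_hz = float(freqs[np.argmax(mag)])   # crude dominant-pitch stand-in

    # Everything returned here would be available to the main sequencer.
    return {"energy": energy, "spectrum": mag, "dominant_hz": dominant_hz}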
In the previously mentioned work Sonic Authority, the analysis began with samples of
each device. For noise immunity, long, 30-second windows were used. Next, the Fast Fourier
Transform (FFT) was computed. This resulted in a spectrum with about 1.3 million
bins (44,100 samples per second × 30 seconds). To transform this into a dominant frequency,
the bins were used to compute 121 sums, one for each step of the audible 10-octave
chromatic scale. Each sum indicates the relative dominance of one note. For example, to
find the dominance of note A-4, the total of every bin whose frequency is within 25% of an
integer multiple of the 440 Hz fundamental is summed. The dominance levels are then
compared, and the most dominant is reported. This method is similar in spirit to computing
the Chromagram [Chai 2005] and computing the Constant Q Transform [Brown 1991]. For
example output, see Figure 2.
Figure 2: Dominant Pitches in Unidentified Telephone Pole Equipment
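The dominance computation described above can be restated as a short Python sketch. The 25% tolerance comes directly from the description; the starting pitch of the 121-step scale (assumed here to be C0, roughly 16.35 Hz) is not specified in the text, so it is an assumption.

import numpy as np

def dominance_spectrum(samples, sr=44100, tol=0.25, f_base=16.35, n_notes=121):
    # Per-note dominance sums as described above.  The 25% tolerance is
    # taken from the text; f_base (C0, about 16.35 Hz) and n_notes are one
    # reading of "the audible 10-octave chromatic scale".
    mag = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), 1.0 / sr)

    # Fundamental frequency of each chromatic step.
    fundamentals = f_base * 2.0 ** (np.arange(n_notes) / 12.0)

    dominance = np.zeros(n_notes)
    for i, f0 in enumerate(fundamentals):
        k = np.maximum(np.round(freqs / f0), 1.0)      # nearest harmonic number
        harmonic = k * f0
        near = np.abs(freqs - harmonic) <= tol * harmonic
        dominance[i] = mag[near].sum()
    return fundamentals, dominance

# Usage on a 30-second window (about 1.3 million samples at 44.1 kHz):
#   fund, dom = dominance_spectrum(recording[:30 * 44100])
#   print(fund[np.argmax(dom)], "Hz is the most dominant pitch")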
Once computed, the dominant frequency spectrum is highly useful. Its outputs
can be readily applied to computing the key of the piece. This is a useful piece of information
because it can inform how to harmonize. In practice, the precision of the spectral dominance
algorithm varied with the sampled location. Some sounds resulted in quite narrow bands,
and it was possible to name the dominant pitch by finding the maximum value on the 121-value graph. Other sounds produced small clusters of dominance, from 3 to 9 semitones wide,
whose amplitudes were within 5% of each other. Such clusters are highly dissonant, and it is
the goal of this project to turn such dissonance into music and improve the quality of the
algorithm. There are many ways to interpret such results and one of the goals is to explore
them.
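One plausible way to read the dominance spectrum as a key, sketched below, is to fold the 121 values into 12 pitch classes and correlate them with a standard key template, here the Krumhansl-Kessler major profile. This is a common template-matching approach offered only as an illustration, not necessarily the method the system will use; mapping the first dominance value to pitch class C is likewise an assumption.

import numpy as np

# Krumhansl-Kessler major-key profile (C-major ordering): a standard
# template for key finding, chosen here for illustration only.
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

def estimate_major_key(dominance, base_pitch_class=0):
    # Fold the 121 dominance values into 12 pitch classes.
    # base_pitch_class gives the pitch class of dominance[0]
    # (0 = C, an assumption consistent with a C0-based scale).
    chroma = np.zeros(12)
    for i, value in enumerate(dominance):
        chroma[(base_pitch_class + i) % 12] += value

    # Correlate against the profile rotated into each of the 12 major keys.
    scores = [np.corrcoef(np.roll(MAJOR_PROFILE, k), chroma)[0, 1]
              for k in range(12)]
    return int(np.argmax(scores))   # 0 = C major, 1 = C-sharp major, ...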
Techniques for the system could come from many places. They will be both
discovered independently and inspired by other musicians. For example, jazz musician Thelonious Monk
would play a cluster of semitones, then release all but one key, creating a very dissonant
attack on an otherwise normal note. To mimic this effect inside the listener's environment,
two enveloping filters would be employed. First, the cluster of notes would be attenuated with
either an array of comb or notch filters. This would virtually eliminate the dissonant sound
from the environment. Then, to sustain connection with the listener's environment, a second
filter or filter bank would be used to isolate only one note from the cluster at a time and remix
it in. Furthermore, the reintroduced note could be varied with time, creating a melodic line.
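A rough sketch of this two-filter idea is shown below using SciPy's stock notch and band-pass designs. The particular filter types, Q value and remix level are illustrative assumptions rather than the filters planned for the device.

from scipy import signal

def monk_cluster_effect(audio, cluster_hz, keep_hz, sr=44100, q=30.0):
    # First enveloping stage: notch out every note in the dissonant cluster.
    cleaned = audio
    for f0 in cluster_hz:
        b, a = signal.iirnotch(f0, q, fs=sr)
        cleaned = signal.lfilter(b, a, cleaned)

    # Second stage: isolate one note of the cluster from the original
    # signal with a narrow band-pass and remix it at a reduced level,
    # keeping a thread of connection to the environment.
    low = keep_hz * (1.0 - 1.0 / (2.0 * q))
    high = keep_hz * (1.0 + 1.0 / (2.0 * q))
    b, a = signal.butter(2, [low, high], btype="bandpass", fs=sr)
    isolated = signal.lfilter(b, a, audio)

    return cleaned + 0.5 * isolated

# Varying keep_hz from moment to moment (e.g. once per detected beat)
# would trace out the melodic line described above.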
Another response to the dissonant sound would be to harmonize with it. This is the
response offered by Kelly Dobson in "Machine Therapy" [Dobson 2002]. In her project, the
human listener harmonizes with a machine's movements and audible vibrations. Computer
musicians have taken all kinds of approaches toward autoharmony. One area to explore is
when to mix in realistic vs. fantastic instruments. A realistic instrument would have a similar
harmonic spectrum to the original. Wei Chai's paper, for example, describes the comparison
Schedule
December 2005 - Initial development activities such as porting Linux to the VicCore
development board will take place.
January 2006 - The hardware, including headphones, microphones, tuning knob, case
and battery system will be constructed.
February 2006 - The initial DSP modules and extension language will be developed.
Demos of each algorithm will be presented to readers for critique.
March 2006 - The system will be tested extensively in several cities. An online audio
journal will be kept for readers to critique.
April 2006 - Writing the thesis will begin.
May 2006 - Writing the thesis will continue.
Resources
VicCore 53x-OEM Blackfin DSP Development Board
IGLOO Parallel Port ICE (In Circuit Emulator)
Deliverables
A reformulated Walkman-like device that transforms the sonic environment into music.
New algorithms to transform disordered manmade noise into music.
An evaluation of which algorithms are best suited to the goals.
Bibliography
[Basu 2001]
[Levi 2004]
[Russolo 1913]
[Nakai 2003]
[Dowling 1986]
[Chai 2005]
[Kameoka 1969]
[Scheirer 1998]
[Sethares 2005]
[Jehan 2004]
[Brown 1991]
[Dobson 2002]