Technical Report No. 2010/23
19 September, 2010
Department of Computing
Faculty of Mathematics, Computing and Technology
The Open University
http://computing.open.ac.uk
Sound Spheres
A non-contact virtual musical instrument
played using finger tracking
Craig Hughes
(T8078171)
Firstly, I would like to thank my tutor, Mr. Michel Wermelinger, whose support and
guidance throughout this project have been invaluable. My thanks also go to the
A huge thank you must also go to my family, especially my wife Wendy for her
I would also like to thank many of my friends and family for generously and
enthusiastically giving up their personal time to help evaluate the Sound Spheres VMI.
Finally, I would like to acknowledge that this project has been greatly inspired by the
work of Johnny Lee (2008). In particular, his study "Hacking the Nintendo Wii
Remote" and his presentation at the Technology, Entertainment and Design (TED) Conference in 2008 have been of
great influence.
Table of Contents
Preface ........................................................................................................................ i
Chapter 1 Introduction.................................................................................................... 1
3.2.8 Other Research Methods .................................................................................. 41
3.3 Preliminary Analysis of Research Data ................................................................ 42
Chapter 5 Results.......................................................................................................... 63
5.7.5 Application of Control Parameters ................................................................... 77
Chapter 6 Conclusions.................................................................................................. 78
References ..................................................................................................................... 83
Index ..................................................................................................................... 86
List of Figures
List of Tables
Abstract
the direct physical interaction between the performer and a musical instrument. The
advent of electronics and computing has given rise to many new electronic musical
instruments and interfaces. Recent advances in these areas have seen an emerging
trend into the design of virtual musical interfaces in which audio is synthesized and
The research described in this dissertation concerns the design and construction of a
new non-contact virtual musical instrument (called Sound Spheres) that uses a finger
parameters and key factors that are considered important for the design of such
instruments and provides research into whether these can be successfully achieved in
Results show that implementation of the control parameters of pressure, speed, and
inconclusive. Furthermore, the results present evidence that the finger tracking
Chapter 1 Introduction
The cool computer interface technique of finger tracking and the fascinating world of
attention! This perhaps would not be surprising if it were not for the fact that I am a
spare time.
The ability to control a software application purely by moving one's fingers freely in
the air (i.e. finger tracking) has always been, for me at least, the subject of science
fiction. This perception changed quite by chance after stumbling upon a fascinating
presentation on the Technology,
Entertainment and Design (TED) Conference website. The presentation showed how
to implement a cost-effective finger tracking software application utilizing the built-in
infrared camera and simple Bluetooth connectivity of the Nintendo Wii Remote
Games Controller. This opened a world of possibilities and I began to think about the
various types of software application to which I could apply the finger tracking
technique. Could it be
used in the production of music and, more precisely, could it be used to play a virtual
musical instrument (VMI)? The non-contact nature of the finger tracking method
provoked a further question. How might the player of such an instrument exercise
control in order to affect its musical outcomes? This project involves the design and
construction of such an instrument, together with
research into its playability with respect to its control affordances and effectiveness.
1.1 Definition of Terms
As a central theme of the project the term interaction is used specifically to describe
the relationship between a musical instrument and an entity that manipulates the
The terms effective and effectiveness are used throughout this project. They refer
circa 1890) and Hornbostel-Sachs (Hornbostel and Sachs, 1914) systems, clearly
more modern variants (e.g. electric guitar, electric keyboard, electric flute, etc),
Not all physical interaction with musical instruments requires a performer, however.
Take for example the Aeolian Harp, which is an instrument that is played entirely by
the wind as it blows across the harp's strings. The wind exerts a force on the harp
interaction is typically physical and involves contact with the instrument either
directly through touch or indirectly through the use of equipment or implements such
as sticks or bows. However, the Theremin and Terpsitone are two notable
instruments that, whilst requiring a performer to play them, are controlled through
body gestures and are played without any physical contact with the instrument (i.e. a
non-physical interaction). One might argue that even for the Theremin and
Terpsitone the interaction is of a physical nature, for the performer must still provide
might suggest that the manipulation of electromagnetic waves is also physical (albeit
invisible). However, using the definition adopted in section 1.1, we can say that
interaction by a performer that does not directly or indirectly contact the musical
instrument is best described (for the sake of clarity) as a non-physical interaction and
Electronics and computers, along with music-related software, have enabled many
new possibilities for
the creation of music for musicians and non-musicians alike. The aspiration to create
new musical devices and interfaces has given rise to a number of interesting studies.
One such study by Crevoisier et al. (2006), showed how a simple everyday object, a
table, could be transformed into a musical and visual instrument using sound
produced by touch. Kiefer (2010) explored three new input devices for the intuitive
control of composition and editing for digital music. Paine et al. (2007) also sought
to develop a new musical interface through an
evaluation of the relationship between the musician and musical interface. A more
artistic approach to the creation of music was taken in the design of the Sounds of
influenced by musical input using a computer mouse. In each of these examples the
physical interaction through contact with a musical instrument or device was central
to the study.
Interfaces for Real-time Electronic Music Performance carried out at the Virtual,
taxonomy for new interfaces for real-time electronic music performance under the
working title TIEM (Taxonomy for real-time Interfaces for Electronic Music
and website where, after completing a survey, people can submit details of new
currently lists over 70 new and unique electronic musical instruments and interfaces.
contact.
Software applications along with various types of controllers (or gestural interfaces
which convert body movement and hand gestures to computer commands) have
processed, synthesized and played back to the musician using computer software.
computer screen.
A wide variety of controllers have been developed to facilitate the creation of music
for VMIs. Some of these are included in the TIEM taxonomy and many more have
been described or been the focus of research into gestural interfaces. Mulder (2000)
for example provides descriptions of different types of VMI controller. Based on the
evidence of these and other sources, the majority of these VMI controllers rely on
physical interaction with a device such as the T-Stick, researched, designed and built
by Malloch (2007). The T-Stick can sense where and how much of it is touched,
There is, however, a small number of VMIs that are controlled without contact (non-
physical interaction) and instead rely on the proximity or movement of parts of the
body, such as the AirStick developed by Franco (2005).
Physical contact with any instrument (including VMIs) provides the performer with
varying degrees of touch sensation (or tactile feedback), where the performer can feel
aspects of the device such as resistance, vibration, texture or forces (such as weight
Perhaps the reason why the majority of VMIs are played through physical interaction
is to do with the wide belief that tactile feedback provides a greater degree of control
and hence the musician/instrument can be more expressive. It has been argued that
tactile feedback plays a central role in musical performance, and that audio and
vision are of a lesser importance, and merely act as monitoring senses. This is
certainly the conclusion reached by Castagne et al. (2004) and Lecuyer et al. (2005).
However, there is another viewpoint. Whilst one might acknowledge that tactile
feedback is important, it is
not an entirely necessary factor as the Theremin and other non-contact instruments
demonstrate, and non-contact
control allows for new musical possibilities. The following viewpoint can also be
considered. Instruments are generally played for musical performance of some kind.
The musical performance may be recorded for later playback or carried out in real-
time. It is during this real-time musical performance that it could be argued audio and
visual feedback become equally (if not more) important than the physical control of
musical performance and from an audience's perspective audio and visual is the only
instruments can be equally compelling to watch and moving to listen to. One only
has to watch a performance of Ivan Franco playing the AirStick at STEIM, Netherlands
Typically, non-contact VMIs take their input from body proximity, movement and/or
VMIs. In Vlaming's (2008) thesis a wide range of motion capture techniques and
systems are identified. Finger tracking is one such motion capture technique.
Finger tracking systems recognize and follow the position and gestures of fingers as
they are moved freely in the air, to enable control of software and user interfaces.
This technique has many applications. Kiefer (2010) demonstrated with
the Phalanger gesture recognition system that it is possible to use hand tracking to
control music software. But could finger tracking be used effectively to play
a VMI? The answer to this would very much depend on what is meant by effective.
The Thummer Mapping Project study (Paine et al., 2007) identified four common
physical instrument variables (pressure, speed, angle and position) that control
instrument dynamics, pitch, vibrato and articulation. In a later study Paine (2009)
reiterated these control parameters as important factors for the design of new musical
finger-tracking VMI then one might conclude that key aspects likely to contribute to
Jorda (2004) describes other factors that are perhaps also important to the
(learning curve), control and predictability are all important factors. He also
suggests that the balance between challenge, frustration and boredom must be met.
factor for digital musical instruments. They suggest that musical instruments that
challenge, frustration and boredom, and reproducibility) can be achieved then one
Until recently, hardware to support finger tracking has been expensive and confined
to specialist use (such as the motion capture systems from Vicon). However, in the
fascinating study “Hacking the Nintendo Wii Remote”, Lee (2008) showed an
accessible and affordable finger tracking technique utilizing the Nintendo Wii
Remote controller (Wiimote) for the Nintendo Wii game console. He cleverly
exploited the Wiimote's built-in infrared camera and simple Bluetooth connectivity,
It is this accessibility to an affordable finger tracking system that has been the
catalyst for this project. It provides a means by which to develop a new non-contact
virtual musical instrument that utilizes finger tracking, and to investigate its musical
The aim of this project is to develop a non-contact virtual musical instrument (VMI)
that is played using a finger tracking method (movement of fingers freely in the air).
The study has primarily focused on the control aspects of the VMI, specifically the
ability for the VMI to provide control parameters of position, speed, pressure and
angle to vary the audio feedback of the VMI. However, the VMI has also been
assessed for its playability in a more general sense. For this I have considered the
To facilitate finger tracking I have utilized the infrared camera and Bluetooth
connectivity of the Wiimote. As
detailed in section 2.5, with this approach there are two possible marker-based
implementation strategies: passive markers and active markers. This project utilizes
passive markers in the form of highly reflective tape stuck to a lightweight cap
placed over the finger tip. Consideration was given to
the most appropriate and effective method of affixing the markers (e.g. directly to the
finger tip or to an object that is placed over the finger tip). An infrared illuminator
has been used as a light source which is directed from the position of the Wiimote in
the direction of the player's fingers. The passive markers reflect the infrared light
back to the camera of the Wiimote, which, in turn, sends data (via Bluetooth
connectivity) to the software application. The software application is
responsible for both the audio (sound synthesis and playback) and visual feedback
of the VMI.
I have termed the VMI Sound Spheres and this terminology will be used throughout this dissertation.
In this research I make several references to work carried out by Paine (2007, 2009),
who has made a significant
contribution in this area. His assertion that pressure, speed, angle and position could
act as a design consideration for future music interface development has provoked
the question of whether these controls can be achieved for a VMI
played using finger tracking (i.e. non-contact). This dissertation explores this idea.
As described in section 1.1 the Taxonomy for real-time Interfaces for Electronic
Music performance (TIEM) database currently lists over 70 new and unique
electronic musical instruments and interfaces. Few of the instruments in the
taxonomy are played without contact and none of these uses the finger tracking
method, indicating scope for further research and
development into this area. I therefore believe that the addition of the Sound Spheres
VMI to the taxonomy would be a worthwhile contribution. As
discussed, all those who submit to the taxonomy must take part in the TIEM Survey.
In the survey, participants are asked
about the qualities of movement needed to play their instrument/interface. They are
also asked to rank their relative importance. I believe that results gathered from the
user study of the Sound Spheres VMI will enable me also to contribute to this
This project essentially involves the design of a new non-contact virtual musical
instrument and a study on the effectiveness of the finger tracking method and
its control parameters. Chapter 2 presents a literature review which
highlights areas of knowledge that have influenced both the design and study of the
instrument.
Research methods and the data collection process used within the project are
discussed in Chapter 3 and include methods for a pilot study and a user study of the VMI.
The functional design of the Sound Spheres VMI is outlined in Chapter 4 and in this
Chapter 5 presents results and analysis of data collected during the Sound Spheres user study.
The dissertation ends with Chapter 6 where conclusions are drawn and discussed in
relation to the research aims and objectives. A project review and recommendations for future work are also included.
Chapter 2 Literature Review
Research for this project has centered around four core themes: new electronic
Research into new electronic musical interfaces revealed a wide variety of different
types of interfaces and classifications. While the focus of this research is principally
on designing a VMI that uses a finger tracking method, useful insights can also be
gained from research studies into the design of other types of electronic musical
interface.
The Sound Rose project (Crevoisier et al., 2006) is an audio interactive system based
around a touch sensitive interface that enables its user to create music and images
through the tapping and dragging of fingers over the surface of a simple table top
(the touch table). Their project was of interest as it primarily describes the design of
the hardware and software components of the system and it was useful to appreciate
how these are combined into a single system and how the sequence of events/inputs
were processed. The choice of layout and construction of the touch table (designed
with usability and ergonomic considerations) is also described and justified. The user
interface, graphics and music components of the system are described. It was
enlightening to see how the x-y coordinates of the position of touch were mapped to
The Sound Rose project shares two similarities with the Sound Spheres project and
hence has been useful in developing initial ideas and design of the Sound Spheres
system. The first similarity is that both projects take input from finger positions (x
and y coordinates) and process these to render graphics and generate sound to
provide both visual and audio feedback to the user. The simple process flow chart
(and accompanying text) presented in the Sound Rose project helped clarify how
similar processes could be sequenced for the Sound Spheres project. Unfortunately
the Sound Rose paper does not provide an in-depth view or discussion on the sound
and graphics processing and hence it did not help to identify any associated
problems. The second similarity between the projects is that they both use finger
positions as a mapping to the sound generated. Whilst the Sound Spheres system will
consider the parameters of speed, pressure and angle in addition to position, the Sound
Rose paper does at least highlight an interesting example of a mapping which will be
considered for the Sound Spheres system and that is the spatialization and panning of
musical interface is detailed in the Thummer Mapping Project (ThuMP) (Paine et al.,
2007). This study sought to develop a new electronic musical instrument (the
Thummer) through an evaluation of the relationship between the
musician and musical interface. The methods used to design and evaluate the new
interface are described in detail and have provided insights during the definition of
my own research methodology. The first stage of the project sought to quantify and
subsequent analysis, the parameters of instrument control exercised were noted, and
these included pitch, dynamics, articulation and vibrato. Further analysis revealed
that the four most common physical instrument controls used to manipulate these
parameters were pressure, speed, angle and position. The second stage of the project
The pressure, speed, angle and position parameters are acknowledged to be applied
in different ways for different instruments. Take for example the control parameter of
pressure. The pressure of a bow on a violin's strings will vary the tone and dynamics
of the sound produced, whereas for a wind instrument an increase in pressure results
when more air is directed into the instrument, producing a change of sound.
The pressure of a finger on a guitar string will also change its sound,
perhaps from a muted note when low pressure is applied to a clean sound when high
pressure is applied.
The importance of the pressure, speed, angle and position controls was further
specific interfaces, the Wacom Graphics Tablet and The Nintendo Wii Remote
(Wiimote). Paine (2007, 2009) describes in detail how the various gestural
possibilities of the Wiimote can be mapped to provide pressure, speed, angle and
using the pitch, roll and yaw controls of the Wiimote, whilst the pressure
The significance of pressure, speed, angle and position controls is further supported
by the TIEM (Taxonomy for real-time Interfaces for Electronic Music performance)
research project. This ongoing project seeks to develop a taxonomy for new interfaces
preliminary outcomes and future plans are detailed by Paine and Drummond (2009).
The project's primary method of defining the taxonomy is through the online TIEM
questionnaire. As
part of this questionnaire, participants were asked to select and rank the qualities of
movement needed to play their instrument/interface. The results were in line with the ThuMP findings:
1. Position (81.13%)
2. Speed (71.70%)
3. Pressure (58.49%)
4. Angle (49.06%)
The studies have been of great interest as they stress the importance of these physical
pressure, speed, angle and position controls and have played a central role in the
These controls are not the only important factors in the design of new musical
interfaces however. Jorda (2004a) also explored the dynamic relationship between
musician and the instrument. His project starts with the assertion that considering
surprising how few professional musicians use them and whether they can be
considered to support a virtuoso performance (i.e. a performance where the musician
must apply great skill). To this end he presents factors that contribute to what makes
is too simple may not provide a rich experience, and one that is too difficult to master
may alienate the user before they are able to progress. This leads onto a discussion
of this balance. Jorda
(2004b) goes on to look at other factors that would perhaps allow a player of an
experience for an instrument's players. One might consider that these factors are at odds
with those outlined by Paine. However, they might also be seen as complementary.
For example, variability can only be achieved if there are multiple control parameters
from which to choose. This study is of interest for the design of new musical interfaces
and hence has been useful in formulating design ideas. Whilst no real method is
Wanderley (2000) looks at how to design and perform new computer-based musical
instruments from yet another perspective. The focus of his study is on gestural
control and hence aligns well with the design of the Sound Spheres VMI. His paper
provides a number of useful insights. He presents a four-part approach to the design
It is comforting to see that Wanderley's approach is directly in line with the design
approach started for the Sound Spheres VMI and hence reaffirms its validity. His
paper also discusses various types of gestural control and looks at characteristics of
the sensors used to capture the performer's actions. Again, the use of strategies for
supports the work carried out by Garth Paine on the Thummer Mapping Project.
least two features. Firstly, any gestures or body movements can be used to control
the sound synthesis process. Secondly, the mapping of these gestures is entirely
programmable and hence limited only by the sound synthesis model. He suggests
however that many designs of new musical controllers and interfaces are more
auditory perception. Hence many VMI designs are not adopted for musical
This notion is re-affirmed by Dobrian (2003) who states that the design of new
interfaces for music mostly focuses on technical issues and engineering challenges.
He discusses the relationship between the performer and instrument and suggests that
plays his instrument. One's knowledge of the instrument being played enhances
appreciation of the skill of the performer. However Dobrian recognizes that with new
virtual space and resulting sounds or music and hence audience appreciation can be
However, one could argue that his guidelines are too biased towards audience
appreciation rather than the playability by a performer. Take his simplicity guideline
for example. This guideline states that mappings of gesture to sound must be simple
and direct in order for the audience to perceive the cause and effect relationship. This
(i.e. a specific gesture would always generate the same sound). As argued in section
2.3 on mapping strategies, mappings that are not one-to-one are more engaging for
users. He does go some way towards dealing with this issue in his multiple
may soon seem simplistic to an audience, but two or more simple simultaneous
Aesthetic considerations for VMI design are further highlighted by Barbosa (2001).
He gives clear illustrations of the complex sensory feedback system formed by the
performer and the instrument and argues that the feedback between the output of a
virtual device and the user must be in real time if the system is to be classified as an
instrument. Indeed it is hard to see how a VMI could be playable in any other
context. He also gives a list of considerations for live musical performances of VMIs,
although two specific points are open to criticism. Firstly, Barbosa argues that
'although the consequences of the performer's actions should be very clear they
should not be predictable and in this sense the interaction process should not be
totally understood by the audience'. This is in contrast to the view held by Dobrian.
An opposing view to Barbosa's argument is that knowing exactly
how a musician plays any instrument does not take away from the pleasure an
audience can get from watching the musician perform. Secondly, Barbosa also
argues that 'if the interaction process is too obvious and the audience is not surprised
then it is nothing more than a technology demonstration'. One might
suggest that Barbosa's point is in fact back to front. A musical performance is more
likely to be just a technology demonstration if the audience has no idea what role the
Software is at the core of a VMI and its mode of implementation can influence the
performer experience and hence the musical outcome. One particular study of
interest (Johnstone, et al. 2008) investigated three different modes of how software
can exert control of the VMI. These modes are described as instrumental,
ornamental and conversational. In instrumental mode, the player directly controls
the musical outcome and plays the virtual instrument in a similar fashion to a
traditional physical instrument, with the software responding directly to the
intended input from the musician. In ornamental mode, the player surrenders control
for the generated sound and visuals of the VMI to the software itself. The player in
this mode merely influences the musical outcome but cannot control it. For example,
the musician may initiate a sound sequence through some input, but the specific
sounds produced are determined by the software. In conversational mode,
the player and the software share control of the musical outcome. For example, part
of the musical outcome (like the playing of a melody) may be controlled through the
musician's direct input and part may be automatically generated by the software.
In a qualitative study of expert musicians using the VMI in different modes of software
control it was (not surprisingly) discovered that the instrumental mode was the clear
preference. This has obvious implications for the interaction design of VMIs, as the
instrument's software component should ideally leave the balance of power and
control to the player. This position has been taken during the design of Sound
VMI design considerations for balance of power and control to the performer, the
these in the design of a VMI necessitates a well-conceived mapping strategy between
the performer's input and the instrument's audiovisual output. The Sound Spheres
project focuses on the control aspects of the VMI, specifically the ability for it to
provide control parameters of position, speed, pressure and angle to vary the audio
feedback of the VMI. It therefore follows that these control parameters should be
mapped to the specific effect on the audio feedback that each will modify. This has
synthesis parameters.
parameters of position, speed, pressure and angle are controlled simultaneously and
The study by Hunt and Wanderley (2002) looked at various strategies for mapping
real-time musical instrument control system and use these attributes to define a mode
of operation they call Performance Mode, which essentially describes how a player
of a VMI discovers how to control it by exploring the different input control options
and their combinations. In this mode the player may appear to be merely 'playing
around' but in fact they are actually discovering hidden relationships between the
various system parameters. They state that this Performance Mode is usually the
player's first mode of operation when they try a new instrument for the first time.
This reflects my own experience of picking up and trying to play a wide variety of
control multiple parameters in order to play. This preference for performance mode
is considered in the design of the user study of my project, where the session starts with a period of free play.
Hunt and Wanderley also discuss the concept that multi-parameters should be
coupled together and that this is key to the design and development of richer
interfaces. They first describe two types of mapping, convergent and divergent.
Convergent mapping is where multiple control parameters can control a single sound
synthesis control (i.e. a many-to-one mapping). They illustrate this well by asking
'where is the volume control on a violin' and go on to explain that there is no single
control that performs this function. Divergent
mapping is a one-to-many mapping where one control parameter can control multiple
sound synthesis parameters (e.g. pitch, reverb, volume, etc). Again they use the
violin for illustrative purposes by asking the question “which sonic parameter does
the bow control”. It actually influences many aspects of the sound (e.g. volume,
timbre, articulation, and pitch). Initially only a one-to-one mapping strategy was
considered for the Sound Spheres VMI; however, Hunt and Wanderley's illustration
of these two mapping types has presented an alternative view. Surprisingly they do
not go on to express the point that in actual fact both convergent and divergent
mappings can be combined within the same instrument.
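To make the distinction concrete, the following sketch shows a one-to-one, a divergent (one-to-many) and a convergent (many-to-one) mapping from finger-derived control values to notional synthesis parameters. This is an illustrative Python sketch only: the Sound Spheres software itself is written in Visual Basic .NET (see Chapter 4), and the parameter names and scaling factors used here are assumptions rather than the actual mappings.

```python
# Illustrative mapping strategies (assumed parameter names and scalings;
# not the actual Sound Spheres implementation).

def one_to_one(speed):
    """One-to-one: a single control drives a single synthesis parameter."""
    return {"volume": min(1.0, speed)}

def divergent(speed):
    """Divergent (one-to-many): one control influences several parameters,
    much as a violin bow influences volume, timbre and articulation."""
    s = min(1.0, speed)
    return {
        "volume": s,
        "brightness": 0.3 + 0.7 * s,                 # faster strike -> brighter tone
        "attack_time": max(0.005, 0.05 - 0.04 * s),  # faster strike -> sharper attack
    }

def convergent(speed, pressure):
    """Convergent (many-to-one): several controls combine to set one
    parameter, as several physical factors combine to set a violin's loudness."""
    return {"volume": min(1.0, 0.5 * speed + 0.5 * pressure)}

if __name__ == "__main__":
    print(one_to_one(0.8))
    print(divergent(0.8))
    print(convergent(0.8, 0.4))
```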
Hunt and Wanderley do however give details of a study that they conducted to
compare different interfaces for controlling computer-based
music. Three types were considered: a set of on-screen sliders controlled by a mouse,
a set of physical sliders moved by the fingers, and a multi-parametric interface which
uses parametric coupling. The same set of four sound synthesis parameters (pitch,
volume, timbre and panning) was used in each case. Whilst their study gave
technical and set-up details for each interface used, it was their detailed conclusions
drawn from the study which were most interesting and they can be summarised as:-
Although the study focused on controls for computer-based music interfaces, one
might argue that their findings are equally relevant to any VMI and especially the
Sound Spheres VMI whose interface is also computer-based. It would have been
would have added further value to design decisions for the Sound Spheres mapping
strategy.
So far, focus has been placed on the need for a mapping strategy between input
control parameters and audio feedback. What about mapping for visual feedback? A
view was previously expressed that for a non-contact VMI both audio and visual
feedback is necessary (for both the performer and audience), and the Sound Spheres
VMI will very much be an audiovisual experience. Logically therefore the Sound
Spheres mapping strategy must (and does) also consider visual feedback.
Furthermore, the synchresis (i.e. a term coined by film theorist Chion (1994),
meaning the forging between something one sees and something one hears) of both
the audio and visual feedback has also been an influence in system design
considerations.
Both mapping strategies and synchresis for audiovisual instruments are topics
details how both sound and visuals may be linked together in a musical instrument.
mapping strategies for digital and virtual musical instrument design and makes
reference to the work of Hunt and Wanderley. His section on synchresis was
enlightening and he clearly illustrated its importance by explaining how audio and
visuals are combined in films. Typically film audio and visuals are recorded from
different sources (e.g. sound effects are added after the film visuals have been
recorded); however, they are perceived by the audience as a single, fused audiovisual
event. Moody's hypothesis is that where synchresis is involved, it is motion, and the
domain in which the motion occurs, that forms the connection between audio and
visuals. Moody does however acknowledge that film based synchresis is simpler to
concept of interactivity in watching films. A VMI on the other hand must also
the audio and visual feedback of the instrument. Moody partially addresses this issue
by developing 8 small experimental audiovisual instruments with differing control
parameters and evaluating them against a set of criteria he believed were key factors
consolidated to develop the Ashitaka audiovisual instrument which was then also
evaluated against the key factors. Whilst his thesis has enabled the incorporation of
synchresis ideas into the design of Sound Spheres, Moody's conclusion as to how
evaluation of his experimental and final Ashitaka instruments was carried out only
by himself.
less technologies, as well as examples of the types of systems that use them. The
detection, gesture control, attention detection, speech recognition. One could note
that the use of infrared illuminators and/or infrared cameras is used in three of these
categories (i.e. presence detection, gesture control and attention detection), and
interestingly the Wiimote was not featured. Whether this omission was intended or
not, it does indicate that perhaps awareness that the Wiimote can be used for device-
There are relatively few musical interfaces that use non-contact methods of input and
those that exist often tend to be more akin to an art form than a controllable musical
instrument. Take for instance the Sound Sculpture created by Hegarty and Fernstrom
(2008) which uses electric field sensing (similar to the Theremin) to detect the
proximity and activity of people. Designed to be used in public places, the system
maps people's proximity and activity to various sound files and audio effects and the
Sculpture itself.
However, Franco (2005) presents a good example of a true non-contact VMI. The
AirStick draws
parallels to the Theremin in that it is played "in the air". It is basically composed of a
series of infrared proximity sensors. These sensors map the position of objects
(typically the player's hands) that are placed above them. The x-y coordinates of the
player's hands are mapped to sounds using real-time synthesis algorithms. The paper
gives technical details of the AirStick and describes two different approaches that
were used to trigger sounds. The first approach is described as sustained events,
where the note or sound being generated is sustained until the hand is removed. The
second is described as percussive events, where each new sound is triggered and then
decayed regardless of other new hand movements. I see the Sound Spheres VMI as
percussive in nature and hence sound generation takes the percussive events
approach.
Also of interest was the account of the initial user experience of the AirStick. Franco
tells us that players new to such an abstract environment tend to quickly find some
level of sensibility with playing the instrument, and in this sense the controller may
be seen as too easy, almost to the point of it being nothing more than a novelty.
However, he is quick to point out that through a level of persistence musicians find
qualities that would enable a possible virtuosity to it. This indicates that this type of
contact by body movement of the player. Their paper describes the design and
evaluation of the synthesizer. The particular technology used for the non-contact
interface was the processing of video input which processed the movement of the
hands and head into audio parameters. The paper details a number of audio
processing elements that can be controlled by the player. Although the elements of
tone, cutoff and decay are of specific interest and could have been important design
factors for the Sound Spheres VMI, they have not (due to the fixed project schedule) been incorporated into its design.
Finger tracking is a specific type of motion capture technique used to follow the
movement and/or the gestures of fingers in the air. In Vlaming's (2008) thesis a wide
range of motion capture techniques and systems are identified, which he classifies as
optical, inertial, mechanical and magnetic systems. For finger tracking we see it is
optical systems that are most prevalent. Vlaming describes in some detail different
types of optical system (passive markers, active markers, and marker-less). For
passive marker finger tracking, reflective markers are placed on the fingers which are
illuminated by a light source which is reflected by the markers back into a camera.
Basically no direct light is sent from the fingers. This has the advantage that the user
of the system does not need to wear any electronic device. With active marker finger
tracking the opposite is true, and direct light must be emitted from the fingers.
Marker-less systems use video capture systems to recognize the movement and
gestures of the fingers directly. Vlaming describes
his experience and study in using the Wiimote for finger tracking, and in doing so he
does highlight several minor problems he had with this approach, such as the need to
ensure both the horizontal and vertical alignment of the Wiimote are positioned
correctly for optimal results, and the issues found with the different type of reflective
materials he tried for the markers. Another common problem highlighted by Vlaming
is that of visual obstruction,
where a gesture wholly or partially visually obstructs another. Take for example a
system that is operated through hand gestures. Without care it would be quite
possible for the system operator to apply a gesture on one hand that obstructs or
overlaps a gesture on the other hand. I have outlined how the design of the Sound
2.6 Wiimote
The use of the Wiimote for finger tracking is clearly important for my project;
however, it is not the focus of the study itself. Its importance is twofold. Firstly it
must be acknowledged that inspiration for my project came from Johnny Lee's
(2008) study "Hacking the Nintendo Wii Remote" and his compelling presentation at
Research into the Wiimote technology and implementations is still important.
Secondly, reviewing
other Wiimote finger tracking implementations gives the opportunity to learn from
others' efforts and perhaps mistakes. More importantly it was necessary to assess
whether the accuracy of the finger tracking with the Wiimote would be sufficient for
Softic (2009) gives a general understanding of the Wiimote technology and its set-up
for finger tracking and provides very good illustrations of how to set up a finger
tracking system. He builds on
Lee's paper by providing more technical details, especially on the specification of the
LEDs used in the LED array. He also provides a detailed account on how the infrared
illuminator is built. However, his paper really only provides a general description of
each of the components of a Wiimote based finger tracking system. There seems to
be no specific aim to the study and hence there was little that could be taken from his
work to be of use for the design of the Sound Spheres VMI interface.
A study by Vuong et al. (2009) evaluates the accuracy of the tracking algorithm
used to position the Wiimote in 3D space. Using its built-in camera, the position and
infrared LED positions and then relating these to the reported position of the
Wiimote's camera. The positions of the infrared LEDs reside in one coordinate
system and the position of the Wiimote's camera focal point resides in another. It is
therefore necessary to transform
from their coordinate system to that of the Wiimote's coordinate space, and
subsequently the 3D position of the Wiimote. Whilst not stated in the study this
describes the mathematics for 3D geometric rotation (used for the tracking
determines the residual distances between the reported positions of the Wiimote and
Phasespace. The study concludes that the overall level of accuracy of the Wiimote
makes it suitable for most tele-immersion systems. It also concludes that there are a
number of parameters that prevent the Wiimote from being highly accurate. Firstly
the Wiimote's camera resolution is relatively small (1024 x 768) which contributes
to loss of precision. Secondly, the relative positions of the infrared beacons may not
one might suggest that this may well be the case. The LEDs appear to be held in
position by polystyrene and tape which may possibly allow movement of the LEDs
after initial measurement of the positions. Finally not all four infrared LEDs are
detected during the entire motion of the Wiimote. This may have been down to the
choice of the LEDs which differ in the beam angle of light radiance. Perhaps this
point could have been overcome by Vuong if he had investigated further into
different LED technologies and tried different types of LED in the study.
Wang and Huang (2008) also use a triangulation method to explore the potential of
throughput. Their study differs from Vuong's in two aspects. Firstly two Wiimotes
are used for stereo triangulation. Secondly, it is the LEDs that are moved in 3D space
instead of the Wiimote itself. Again the mathematics of their tracking algorithm is
well explained. The study does not however give any specific accuracy level, and
The studies by both Vuong, and Wang and Huang raised concerns regarding whether
the proposed set-up and use of infrared LEDs and the Wiimote for the Sound
Spheres VMI would be accurate enough to provide the necessary control when playing
the VMI. As with Lee's demonstration of Wiimote motion tracking, the Sound
Spheres system will utilize a single Wiimote and will track up to four markers
reflecting infrared light. At any point in time the system will be tracking from zero to
four markers (depending on the performer's gestures), and each marker effectively
moves independently of the others. Positioning is therefore really only carried out
between a single marker and one Wiimote, so triangulation cannot take place and
therefore it is only possible to determine the x and y coordinates of each marker. This is not a
significant problem though, due to the fact that the Sound Spheres VMI only tracks
the x and y coordinates of each marker anyway. However, the system may
be affected if, for
example, the player's hands are set at different distances from the Wiimote. Early
prototyping confirmed that the motion tracking of the Wiimote seems to be adequate
for this purpose.
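As a rough illustration of the tracking data involved, the sketch below converts a single raw point reported by the Wiimote's 1024 x 768 infrared camera into on-screen x and y coordinates. It is a Python sketch for illustration only; the axis flips, the target screen size and the exact scaling used by the actual Sound Spheres software (written in Visual Basic .NET) are assumptions.

```python
# Map a raw Wiimote IR camera point to screen coordinates.
# The 1024 x 768 camera resolution is taken from the literature reviewed above;
# the axis flips and target screen size are assumptions for illustration.

CAM_W, CAM_H = 1024, 768
SCREEN_W, SCREEN_H = 800, 600

def camera_to_screen(raw_x, raw_y):
    """Normalize a raw IR point and scale it to screen space.

    The horizontal axis is mirrored because the camera faces the player,
    and the vertical axis is flipped because screen y grows downwards
    (both flips are assumptions about the physical set-up).
    """
    nx = raw_x / (CAM_W - 1)
    ny = raw_y / (CAM_H - 1)
    screen_x = (1.0 - nx) * SCREEN_W   # mirror left/right
    screen_y = (1.0 - ny) * SCREEN_H   # camera origin assumed bottom-left
    return screen_x, screen_y

if __name__ == "__main__":
    print(camera_to_screen(512, 384))  # centre of the camera image
```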
The use of two Wiimotes for finger tracking has been studied by Martin (2010). His
project is very similar to the Sound Spheres project in that he also implements finger
tracking, allowing the user to navigate a
sound feature space by means of open hand gestures. There are a number of
differences, however. Firstly, he tracks the fingers in
a 3D space and hence uses triangulation and two Wiimotes. Secondly he has opted
not to use finger markers but to instead place the infrared LEDs on the tips of the
fingers (held in place by a glove). He cites that this approach worked better than the
use of reflective markers on the fingers. Interestingly he also acknowledges the fact
that as fingers are bent, sometimes the LEDs are no longer tracked. This is because
the beam angle of the LEDs is not sufficiently wide for them to be continuously picked up
by the Wiimote camera.
Martin's work provided important input for the design of the Sound Spheres VMI. It
was considered that bending the fingers while interacting with the Sound Spheres
VMI would be a key movement in the way that performers played the instrument and
perhaps the use of reflective markers at the fingertips would present similar issues to
those faced by Martin during finger tracking. To overcome this, the reflective marker was
stuck to a convex-shaped cover for the finger tip, which allows for a more continuous
reflection of the transmitted infrared light back to the Wiimote camera as the finger
is bent.
A key theme presented in both the project background and literature review is the
importance of the control parameters position, speed, pressure and angle for new
The literature review also revealed other important factors (i.e. playability,
instrument and hence finger tracking could be considered an effective method for
Can implementation of the control parameters of position, speed, pressure and angle
be successfully achieved for a non-contact VMI, and is finger
tracking an effective non-contact technique with which to play a virtual musical
instrument?
2.8 Summary
The Sound Spheres project essentially involves the design and construction of a new
non-contact virtual musical instrument, Sound Spheres, and a study on its playability.
general, new electronic musical interfaces and more specifically VMIs. It also
The non-contact element of the Sound Spheres VMI will be implemented using
finger tracking and will utilize the Wiimote game controller to provide the finger
tracking mechanism. For further design consideration the literature review identifies
finger tracking techniques and challenges, and discusses technical issues associated with the use of the Wiimote.
Chapter 3 Research Methods
This project essentially involves both the design of a new virtual musical
instrument and a study of its effectiveness. Different research
methods were needed for each of these project goals. This section describes
the research methods used at each stage.
Four stages to the project's research methodology have been identified, with each
having a number of research methods applied. These stages are: literature review,
pilot study, user study and data interpretation.
This methodology is very similar to that used by Paine (2005) in the development
of the Thummer electronic musical instrument, which used
both prototyping and a user study during its development. Much of the development
was focused on the mapping of the instrument's various control parameters to sound
synthesis. Prototyping also played an important role here. The mapping of control
parameters is equally central to the Sound Spheres VMI and prototyping of the
mapping will also be included in the project. For user testing of the resultant control
mappings, participants
would be observed using the instrument and their feedback was recorded using
cognitive interviews.
The design, construction and testing of the Sound Spheres system took place in the
literature review and pilot study phases. The system was then evaluated in the user study phase.
An overview of the research stages, the applicable research methods and the type of data collected is summarised in the figure below.
[Figure – research stages: Literature Review, Pilot Study, User Study, Data Interpretation; research methods: Problem Definition, Literature Search, Design / Construct / Test, Data Analysis, Presentation of Results; data collected: Research Data, Prototype Review Notes, Participant Comments.]
This is a key research method and was used as input to the system's design and to
identify appropriate methods for its assessment. There are many aspects to the
system's design that have benefited from literature review, such as understanding the
workings, algorithms and limitations of the Wiimote. It has enabled best practice and
of similar systems.
The system was initially designed using knowledge obtained from the problem
domain and literature review.
The system was then constructed (both hardware and software) and the first
prototype of the system produced.
The system was then modified to incorporate selected ideas and to fix any defects
discovered from the first prototype review. A second prototype system was
produced.
A second prototype review session was conducted with the same three people
who participated in the first review session and it followed the same format. The
primary aims of this session were to:-
Finally the system was modified to incorporate participant comments and to fix
any further defects discovered from the second prototype review.
A user study of the Sound Spheres VMI was conducted and it catered for both the
qualitative and quantitative research methods. Video recording was
used to record the proceedings for later playback during the data interpretation phase.
Eight participants were selected to take part in the user study, including the three
participants who took part in the prototype review. This enabled an assessment of
whether the skills of the prototype participants had improved. Tauber et al.
(2005) conclude that 3 to 8 participants yield the most useful results. Participants
were from a range of age groups and both musicians and non-musicians were
included. The profile of these participants is shown in table 1.
Participant   Age   Participated in Prototype Reviews?   Musician?
1             15    No                                    Yes
2             17    No                                    Yes
3             47    No                                    No
4             43    No                                    Yes
5             40    Yes                                   Yes
6              9    No                                    Yes
7             42    Yes                                   No
8             25    Yes                                   No
Table 1 – User Study Participant Profile
A user study session was conducted for each participant in isolation. Each session
lasted for approximately 100 minutes (just over 1½ hours) and was structured
into stages, each of a set period of time, as shown in figure 2.
The session started with basic introduction to the system and how to use it. Each
participant was given the same introduction and instructions.
The participant was then given a period of free play to see how they initially
interacted with the system and to observe their path of discovery.
Further instruction was given to formally show the participant how the position,
speed, pressure and angle controls can affect musical outcomes (they may have
already worked this out for themselves in the free play section of course).
Next the participant was given an additional period of free play, to assess how
they then performed with knowledge they had gained from the structured-play
and to see how they might make use of the combination of control parameters.
Finally a reproducibility test was conducted where the participant was asked to
compose a simple tune, using the controls to add expression as desired. They
were asked to repeatedly reproduce the tune with the same expression.
The user study research methods used at each of these stages, along with the planned
duration of each stage, are summarised below:
Stage – Duration (minutes) – Method
1. Basic Instruction – 5
3. Interview 1 – 10 – Interview
4. Control Instruction – 5
9. Interview 2 – 10 – Interview
3.2.4 Observation
This qualitative method (commonly used in user studies) focuses on observing how
people adapt to and perform with the system. This approach has recently been used
participants freedom to use the software in any way they wished and to make music
with it to explore its full potential. Their user study involved observing participants
participants generally interact with the system, it would not specifically address a
more focussed review of how they interacted with the control parameters of position,
speed, pressure and angle. In this regard periods of structured-play have been
included in which participants were asked to explore each of the control parameters
in turn, so that their interaction with them could be observed in isolation. This means
that questions relating to the control parameters in the structured interview should
constrained to specific tasks to provide some basis for comparison. The same think-
The studies by Jorda (2004b) and Ferguson and Wanderley (2009) highlight
the importance of reproducibility, and it was
previously suggested that this may be one way of assessing the Sound Spheres VMI.
Thus, a simple reproducibility test (as described in section 3.2.3) has been
incorporated into the user session. The observation technique was applied as they
played. To focus concentration on the task the participants were asked not to
3.2.6 Interviews
Conducting cognitive interviews with users after using the system enabled the
made whilst they were using the system. This research method is commonly used
when performing user studies, and has recently been used by Kiefer (2010) when
evaluating the Phalanger gesture recognition system. As participants were of
different ages (maturity levels), musical abilities and experience with interactive
systems, the terms of reference and understanding between participants were not
consistent. The cognitive interview process therefore catered for this difference in
domain knowledge. Paine (2005) also cites this as a reason for introducing cognitive
interviews during the user studies of the Thummer electronic musical instrument.
The strengths and weaknesses of interviews reside in the interaction between the
interviewer and respondent. Considering the difference in ages, musical ability and
experience with interactive systems of the planned participants the type of interaction
with them is also likely to differ. There is therefore a potential for bias or distortion
in the interview responses. This bias was considered carefully during data analysis
and to some extent is offset by the fact that other research methods (such as a
The output of the interview process was a set of interview notes and a video recording.
3.2.7 Questionnaires
The use of questionnaires as a research method was recently adopted for a project
that developed several Theremin-based 3D interfaces (Geiger et al., 2008). For
this project a questionnaire was designed
consisting of 49 questions that captured data on both the playability of the system
and its control parameters. The questionnaire allowed responses from the participants
Walonick (2004) presents a very informative paper on designing and using
questionnaires, and his guidance has been followed when designing
and wording questions. For example, the advice to group related questions and to
ensure that each question asks for an answer on only one dimension has been
followed.
The questionnaire (and interview) questions were designed to collect data in the
following areas:-
Of the 49 questions, 39 asked the participant to respond using a 5-point Likert rating
scale (strongly disagree, disagree, neither agree nor disagree, agree, and strongly
agree), thus providing quantitative data to which statistical analysis could be applied
(e.g. 75% of participants thought that the mapping of sound synthesis to the position
control parameter was appropriate). One could consider that the wording of the responses
implies symmetry around the middle response (neither agree nor disagree) and hence
one might consider the data as interval-level data also. It is widely believed that the
distances between Likert responses cannot be assumed equal, for example between agree and
strongly agree or between disagree and strongly disagree. Therefore, this project has considered the
Likert scale data as ordinal. The scoring used a bipolar scale of -2, -1, 0, 1, 2 instead
of 1 to 5. The
questions were worded such that positive responses (agree and strongly agree) were
in support of the hypothesis to the research question. Three questions were in reverse
of this, where the responses “disagree” and “strongly disagree” are viewed as
positive.
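The scoring scheme described above can be summarised in a short sketch: each response is mapped onto the bipolar -2 to +2 scale, the three reverse-worded questions are negated so that a positive score always supports the hypothesis, and each score is also bucketed as positive or non-positive. This is a Python sketch for illustration; the question identifiers are hypothetical.

```python
# Score 5-point Likert responses on a bipolar -2..+2 scale. Reverse-worded
# questions are negated so that a positive score always supports the
# research hypothesis. Question identifiers are hypothetical.

SCALE = {
    "strongly disagree": -2,
    "disagree": -1,
    "neither agree nor disagree": 0,
    "agree": 1,
    "strongly agree": 2,
}

REVERSE_WORDED = {"Q12", "Q27", "Q33"}   # hypothetical question IDs

def score(question_id, response):
    value = SCALE[response.lower()]
    if question_id in REVERSE_WORDED:
        value = -value                   # "disagree" counts as positive here
    return value

def is_positive(value):
    """Nominal grouping used alongside the ordinal scores."""
    return value > 0

if __name__ == "__main__":
    print(score("Q03", "agree"))              #  1
    print(score("Q12", "strongly disagree"))  #  2 (reverse-worded question)
    print(is_positive(score("Q05", "neither agree nor disagree")))  # False
```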
The remaining questions served to identify the participant and their ability to play
and read music, rank control parameters based on ease of use and importance to
musical outcomes, and finally to ask for general comments about what participants
The quantitative data captured from the questionnaire was used to support or dispel the hypothesis to the research question.
One approach that was used for evaluating expressive musical interfaces by Stowell
et al. (2008) was discourse analysis. They report that the discourse analysis method
can derive detailed information about how musicians interact with a new musical
interface. However, considering the fixed project timescale and the fact that I am
the discourse analysis method has not been used as it is time intensive and would
Both card sorting and participatory design research methods had been considered
but these options were ruled out. The requirements for the Sound Spheres application
have not been generated by a known set of users, and hence I believe capturing user
An agile/iterative approach to the prototyping stage of development was also
considered, perhaps by prototyping and reviewing how the system will implement
each of the control parameters (position, speed, pressure and angle) in turn. There
were two concerns with this approach. Firstly, the project has a fixed timescale and
the project plan allowed a relatively short time for the design and development of the
system, making it difficult to incorporate four prototype reviews into the schedule.
Secondly, reviewing the implementation of these controls in isolation does not help
identify any issues of usability when two or more of them are combined.
were not structured entirely on the control parameters. Other aspects of the system
were equally important to assess during the prototype review (such as layout, visual
Much of the data collected is qualitative and as such a grounded theory methodology
has been used in preparing this data for analysis which is carried out in two stages.
Firstly the data has been thoroughly read through and, in the case of the video
recordings, transcripts of participants' comments were made (along with notes to put
comments into context). Secondly, the key points from each data source were then
coded, conceptualized and categorized which allowed comparison analysis across the
data to discover similarities and differences between the data sources. This
methodology has also been used for the non-Likert scale questions from the user
study questionnaires.
Several approaches were used to analyse the Likert scale questions from the user
study questionnaires. Responses to the questionnaire were collated and scored for
statistical analysis. A series of bar charts were also produced to graphically show the
results. The median and mode were determined for each question to measure central
tendency. The mean and standard deviation were not considered due to the ordinal
nature of the data. The responses for each question were also nominally grouped into
positive and non-positive categories. Summation of the responses was not seen as
correlation. In this case the Spearman’s rank correlation method was used to
A chi-square test for the questions that specifically relate to control parameters was
considered, to test whether these controls can be successfully
achieved for a non-contact VMI. This test was not used as the sample data was not
sufficiently large.
A non-parametric method was used for statistical hypothesis tests; because the samples were small, the Mann-Whitney U Test was chosen. A comparison was made between musicians and non-musicians, and between those who did and did not participate in the prototype reviews. Perhaps, for example, those who participated in the prototype reviews found control of the Sound Spheres VMI easier, suggesting that prior exposure to the instrument improves playability.
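Purely as an illustrative sketch with hypothetical scores (not the project's actual results), such a comparison between two small independent groups could be computed as follows.

# Sketch of the Mann-Whitney U comparison described above, using hypothetical data.
from scipy.stats import mannwhitneyu

musicians     = [2, 1, 2, 1]      # hypothetical Likert scores from musicians
non_musicians = [1, 0, 1, 2]      # hypothetical Likert scores from non-musicians

# Two-sided test; a p-value below 0.05 would indicate a significant difference
u_stat, p_value = mannwhitneyu(musicians, non_musicians, alternative="two-sided")
print(f"U={u_stat}, p={p_value:.3f}")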
Two questions on the user study questionnaire asked participants to express a ranking of the four control parameters. An overall ranking was determined by scoring each response and totalling the scores for each control parameter. The control parameter with the highest score was ranked first.
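The points awarded per rank are not stated here, so the scoring scheme in the sketch below is an assumption for illustration only; it simply shows how individual rankings could be totalled into an overall ranking.

# Sketch of the ranking aggregation: each rank is scored (assumed 1st=4 ... 4th=1),
# scores are totalled per control parameter, and the highest total is ranked first.
rankings = [                               # hypothetical responses, one dict per participant
    {"Pressure": 1, "Speed": 2, "Position": 3, "Angle": 4},
    {"Speed": 1, "Pressure": 2, "Position": 3, "Angle": 4},
]
scores = {"Pressure": 0, "Speed": 0, "Position": 0, "Angle": 0}
for response in rankings:
    for parameter, rank in response.items():
        scores[parameter] += 5 - rank      # rank 1 scores 4 points, rank 4 scores 1
overall = sorted(scores, key=scores.get, reverse=True)
print(overall)                             # e.g. ['Pressure', 'Speed', 'Position', 'Angle']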
Chapter 4 System Design
4.1 Overview
This section presents the functional design of the Sound Spheres VMI. The design was constrained by the project's fixed timescale; in this regard functionality was limited to directly address the research question. The final version of the system is described in Section 4.11.
An object-oriented approach was taken for the software component of the VMI. The system's software has been developed using Microsoft's Visual Basic .Net object-oriented programming language. For handling and interpreting data from the Wiimote the software application uses a .Net managed library developed by Peek (2009).
A series of videos have been created to demonstrate the Sound Spheres VMI user interface and how it is played (see Appendix M).
Unlike some finger tracking applications, complex hand or finger gestures to interact with the software have been avoided. Instead the finger tips are used as passive reflective markers. Only four points can be simultaneously tracked with the Wiimote and hence at most four finger tips can be tracked at any one time.
Each of these sound spheres is assigned a unique sounding musical note. The
collision of the tracking spheres with the sound spheres plays back the assigned
sound, and hence the user is able to play the VMI rather like a percussive instrument (e.g. a drum or xylophone). Three different sound types were created for the prototype reviews, one of which was eventually selected for the usability study. A sound with unique qualities was sought to avoid being recognizable as another instrument (a piano for example). A midi controller and synthesiser were used to create distinct musical sounds for the individual notes to be assigned to each sound sphere.
The presentation (or layout) of the sound spheres on the screen is an important design consideration. Both the tracking spheres and sound spheres have been aligned on the x-y plane (i.e. the z-axis position is fixed), as illustrated in Figure 3.
Figure 3 – Axis Alignment in Perspective View
Note however, the actual display on the screen is an orthogonal view as shown in
Figure 4.
In this layout one might question why spheres are being used and not circles (i.e. 3-dimensional rather than 2-dimensional shapes). The reason for this is to provide visual feedback: the spin of a sphere upon collision is clearly visible, whereas the spin of a flat circle would not be.
The sound spheres are arranged in two rows, each comprising the 12 notes of an
octave as shown in Figure 5. To differentiate the natural notes from the sharp notes,
different size sound spheres are used. This type of visual differentiation is used in
many traditional musical instruments. For example, a piano's natural and sharp notes
are differentiated using black and white keys, as well as differences in key size.
Similarly a glockenspiel uses size and position of its bars for natural and sharp note
differentiation.
The two rows of sound spheres also correspond to two different octaves, one octave
apart.
4.3 Implementation of Control Parameters
This section describes how each of the control parameters of position, speed,
pressure and angle has been implemented for the Sound Spheres VMI.
4.3.1 Position
When a tracking sphere collides with a sound sphere, the horizontal distance between the point of collision and the central line of the sound sphere is determined (as illustrated in Figure 6). The sound generated at the point of collision is then adjusted dependent on this distance.
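As a minimal sketch of this idea (and not the actual Visual Basic .Net implementation), the horizontal offset of the collision could be normalized into a stereo pan value as follows; the function and parameter names are assumptions.

# Sketch of the position control parameter: the horizontal offset of the collision
# from the sound sphere's centre line is normalized to a pan value.
def position_pan(collision_x: float, sphere_centre_x: float, sphere_radius: float) -> float:
    """Return a pan value in [-1.0, 1.0]: -1 = fully left, 0 = centre, +1 = fully right."""
    offset = collision_x - sphere_centre_x           # signed horizontal distance
    return max(-1.0, min(1.0, offset / sphere_radius))

print(position_pan(collision_x=104.0, sphere_centre_x=100.0, sphere_radius=10.0))  # 0.4 (panned right)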
4.3.2 Angle
When a tracking sphere collides with a sound sphere, the angle between three points is determined. Point 1 is the center of the tracking sphere at the start of its movement towards the sound sphere. Point 2 is the center of the tracking sphere at the point of collision with the sound sphere. Point 3 is a point anywhere on the horizontal line through the point of collision. The sound generated is then adjusted dependent on this angle.
This type of action can often be seen when a drummer strikes a cymbal for example.
A change in the angle in which the drummer chooses to strike the cymbal with the
drumstick will produce a different sound. Sometimes the player will use a very large
angle and appear to brush the drumstick over the surface of the cymbal and
sometimes a more direct hit is executed with widely different sounds being
generated.
Note: a tracking sphere colliding with a sound sphere at the same position can yield a difference in sound depending on the starting position of the tracking sphere, because the angle of approach differs.
Determining the starting point of the tracking sphere was a challenge. The transition
of a tracking sphere between the playing of one note and the next will frequently involve continuous motion, and hence the point at which the movement to play the next note starts is not obvious. Considering a percussive instrument such as the xylophone, or even a stringed instrument like the piano, the movement of the striking object (be
it a mallet, stick or fingers) from one note to the next is rarely linear. A player
generally lifts the object from one striking position before they start the movement to
make another strike. With this in mind, the tracking sphere‟s starting position was
determined by the point at which the movement changes from a positive direction in
the y-plane to a negative one (i.e. the point at which a downward movement starts, following an upward movement).
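The following sketch illustrates how the angle between the three points described above could be computed; it is illustrative only, with hypothetical coordinates, and is not the project's actual code.

# Sketch of the angle control parameter: the angle at the collision point (point 2)
# between the tracking sphere's start position (point 1) and a point on the
# horizontal line through the collision (point 3).
import math

def collision_angle(start, collision):
    """Angle in degrees between the approach vector and the horizontal at the collision point."""
    p3 = (collision[0] + 1.0, collision[1])            # any point on the horizontal line
    v1 = (start[0] - collision[0], start[1] - collision[1])
    v2 = (p3[0] - collision[0], p3[1] - collision[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    mag = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(dot / mag))

print(collision_angle(start=(80.0, 140.0), collision=(100.0, 100.0)))   # ~117 degrees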
4.3.3 Speed
When a tracking sphere collides with a sound sphere the average speed of the tracking sphere is calculated as:
average speed = distance between start position and collision position / (t2 – t1)
…where t1 is the time at the starting position of the tracking sphere and t2 is the time at the point of collision.
The sound generated at the point of collision is then adjusted dependent on this average speed.
One might ask the question how it is possible to control a low speed when having to
quickly move a tracking sphere from one sound sphere to another to play in time to a
musical piece. This problem, however, exists in many traditional instruments and
does not prohibit the control parameter being of importance. Again, consider the
xylophone. The volume of sound generated is very much dependent on the speed at which the player strikes the mallets onto the xylophone bars. The mallets may well
have to be moved quickly from note to another in order to play in time to a piece of
music, but the player is still able to exert a degree of control in the resulting volume.
The issue of control therefore becomes more to do with the skill of the player and the
choice of how the next note in the piece is executed. With the Sound Spheres VMI
the player has a choice of playing any subsequent note with any of the four tracking
spheres and hence perhaps the distance they need to travel can be minimized.
One must also bear in mind that in reality a small movement of the fingers produces a large movement of the tracking spheres (sensitivity) and hence the fingers do not need to move quickly over large distances.
A more difficult aspect to the speed concept as presented above is to determine the
starting position of the tracking sphere (as described for the angle control in section
4.3.2).
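A minimal sketch of the speed calculation and one possible speed-to-volume mapping is shown below; the mapping constant is an assumption for illustration, not a value taken from the actual system.

# Sketch of the speed control parameter: average speed between the start of the
# downward movement and the collision, mapped to a playback volume.
import math

def average_speed(start, collision, t1, t2):
    distance = math.dist(start, collision)          # straight-line distance in pixels
    return distance / (t2 - t1)                     # pixels per second

def speed_to_volume(speed, max_speed=2000.0):
    return min(1.0, speed / max_speed)              # 0.0 (silent) .. 1.0 (full volume)

s = average_speed(start=(80.0, 140.0), collision=(100.0, 100.0), t1=0.00, t2=0.05)
print(round(s), speed_to_volume(s))                 # ~894 px/s, volume ~0.45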
4.3.4 Pressure
Implementing the pressure control parameter without any physical contact presented a particular challenge. For this project pressure was considered from a perspective of momentum.
Consider two objects with different masses travelling at the same velocity. In this
case their momentum differs. If they were both to collide against the same surface
then the one with the larger mass will exert more pressure on the surface.
In the virtual world tracking spheres obviously have no mass. However they do have
size! As each tracking sphere is the same type of object we can reasonably conclude
that tracking spheres with the same size would also have the same mass. We could
then conclude that a larger tracking sphere would exert a greater pressure on a sound sphere than a smaller one would.
In other words, by varying the size of a tracking sphere we vary the pressure being applied upon collision.
To implement the ability to dynamically and quickly change the size of the tracking
spheres the user interface displays a visual component that I call a pressure control.
A pressure control has been placed on either side of the user interface so it can be
quickly accessed by tracking spheres controlled by either the player's right or left hand. The pressure control has two circular surfaces, one containing an upwards facing arrow representing increasing pressure and one a downward facing arrow representing decreasing pressure. The system increases or decreases the size (and hence the implied pressure) of all tracking spheres while the center point of one of the tracking spheres is positioned over one of the pressure control's surfaces.
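The following sketch illustrates the behaviour described above; the step size, surface radius and size limits are assumptions for illustration, not the values used in the actual system.

# Sketch of the pressure control: while the centre of a tracking sphere sits over
# one of the two pressure-control surfaces, the size of all tracking spheres changes.
import math

def over_surface(centre, surface_centre, surface_radius):
    return math.dist(centre, surface_centre) <= surface_radius

def update_pressure(tracking_radius, centre, up_surface, down_surface,
                    surface_radius=20.0, step=0.5, min_r=4.0, max_r=30.0):
    if over_surface(centre, up_surface, surface_radius):
        tracking_radius += step        # larger sphere -> greater implied pressure
    elif over_surface(centre, down_surface, surface_radius):
        tracking_radius -= step        # smaller sphere -> lower implied pressure
    return max(min_r, min(max_r, tracking_radius))

print(update_pressure(10.0, centre=(25.0, 300.0), up_surface=(20.0, 300.0), down_surface=(20.0, 360.0)))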
Figure 9 shows the sensory feedback loops for both contact and non-contact VMIs. It
can be seen that the non-contact sensory loop can only provide feedback to the user
using audio and vision. Thus visual feedback is an important factor in the design of non-contact VMIs.
Figure 9 - Sensory Feedback Loops
The Sound Spheres VMI provides visual feedback to the player when a tracking sphere collides with a sound sphere. This is done in three ways.
Firstly, graphics are displayed at the point of each collision. A graphics particle
engine was implemented to display a set of flying sparks at the point of collision.
The direction and dispersal of the sparks is dependent on the position of the colliding tracking sphere.
Secondly, when a tracking sphere collides with a sound sphere, the sound sphere vibrates as if it were mounted on a spring. The vibration diminishes over time until the vibration stops. The direction of the vibration is always up and down. Consideration was given as to whether the direction of vibration should also be dependent on the angle of collision; however, this was not implemented because the sound spheres are placed close together and an angled vibration could cause neighbouring spheres to overlap.
Thirdly, when a tracking sphere collides with a sound sphere, the sound sphere spins
around its horizontal axis. The initial speed of spin is dependent on the speed of the
colliding tracking sphere, and the speed of rotation diminishes over time until the
spinning stops. So that the speed of spin is readily apparent to the user, graphical markings are displayed on the surface of each sound sphere.
The vibration and spinning of the sound spheres are not related to any specific control parameter and denote tracking sphere collision only. It was felt that two elements of visual feedback provide a stronger reference and better compensate for the absence of haptic feedback.
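As an illustration of how such feedback might diminish over time (the actual decay rates used in the system are not stated here), a simple per-frame decay could look as follows.

# Sketch of the decaying visual feedback: the spin (and vibration) imparted by a
# collision diminishes each frame until it stops. The decay factor is an assumption.
def update_spin(spin_speed, decay=0.95, cutoff=0.01):
    """Call once per frame; returns the reduced spin speed, or 0 once it is negligible."""
    spin_speed *= decay
    return 0.0 if abs(spin_speed) < cutoff else spin_speed

spin = 5.0                     # initial spin speed set from the tracking sphere's speed
for _ in range(3):
    spin = update_spin(spin)
print(round(spin, 3))          # 4.287 after three frames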
For ease and speed of implementation, the sounds generated by the Sound Spheres VMI use a set of pre-recorded sound wave files, and Microsoft's DirectSound technology is used to load and play these back. This technology also provides the ability to apply effects to sounds as they are played back. Each effect has parameters that can be varied.
Four specific effects have been selected and a mapping between these and the control parameters (position, speed, pressure and angle) has been applied as shown in
Table 2.
Control Parameter – Effect
Position – Stereo Panning: the sound is increasingly panned to the left or right speaker dependent on the position of collision.
Speed – Volume: a greater speed results in a higher volume.
Pressure – Parametric EQ: a greater pressure results in a tone where the higher frequencies are boosted.
Angle – Chorus: an acute angle results in a chorus effect with a greater degree of modulation than a less acute angle.
The Sound Spheres VMI comprises the following software and hardware
components. Specific details of each component can be found in Appendix J.
Wiimote controller.
Cover for the infrared LED array (to shield light from the player's eyes).
The components are setup on a two-tiered desk with the top tier used as a surface on
which to stand the speakers and computer monitor and the lower tier used as a
surface for placement of the Wiimote and LED arrays. Separate tiers enable the
Wiimote and LED arrays to be positioned horizontally central to the monitor and
speakers without obstructing the player's view of the monitor. The Wiimote and LED arrays can be adjusted up or down to suit the height and natural playing position of each player, and an adjustable chair also allows players to raise or lower their playing position. The reference speakers are positioned either side of the monitor so that stereo effects are apparent to the player.
The initial system setup included two LED arrays. However, one of the LED arrays
failed and could not be repaired. This put the project at risk as there was no time in
the project to procure and construct another. The eventual solution to the problem
vastly improved the playability of the system. Details of this problem and solution
can be found in Appendix J. The final system setup only used one infrared LED
array, and the Wiimote was positioned so that its camera was partially obscured by
the LED array. A cover was placed over the LED array and Wiimote to shield the participant's eyes from the infrared light. This can be seen in Figure 12.
Figure 12 - LED Array, Wiimote and Cover
Each reflective finger marker consists of a lightweight plastic cap with a convex surface onto which highly reflective tape has
been fixed. As described in section 2.6 based on work by Martin (2010), the convex
surface improves continued reflection when the fingers are slightly bent. The cap is
placed over the tip of the finger and is held into position by gaffer tape inserted
inside the cap. The gaffer tape can be renewed when necessary to ensure good
adhesion. The plastic caps used were taken from Playmobil© toy figures, as the hair
attachment was discovered to be the perfect shape and size, and very lightweight.
The physical setup and nature of the Sound Spheres VMI naturally reduce occlusion
problems from occurring. This is because there is nothing between the Wiimote
camera and the player's finger tips, and whilst occlusion can occur if fingers on one hand obstruct fingers on the other hand, occlusion of fingers on the same hand would require quite an awkward gesture, and such a gesture is unlikely when playing the Sound Spheres VMI in a percussive manner where only the finger tips are used.
Occlusion is considered a limitation of the VMI that players will need to work around, and this differs little from playing a traditional instrument. For example, the physical
constraints of a guitar and a guitarist's hands and fingers mean that not all desired
playing positions are possible. Knowing these constraints and through a learning
process a guitarist has to work out the most effective way to play a desired piece of
music on the guitar. Similarly, the occlusion problem associated with the Sound
Spheres VMI could be considered simply as a constraint that the player must learn to work around.
A central component to the Sound Spheres research project is a user study. Whilst
the participants are volunteers it was important that they are made fully aware of any
potential risks to their health and safety before agreeing to participate. In this regard
three potential risks have been identified and were disclosed to all participants, who were asked to sign an acknowledgement form prior to the study to confirm that they were informed about and understand the risks involved. These risks are summarized below.
The Sound Spheres system involves the use of an infrared (IR) illuminator. Without appropriate precautions this poses a small risk to the participant's health and safety, namely a potential risk of eye damage. This risk has been reduced in the design of the Sound Spheres VMI through a combination of height adjustment and a cover that shields the illuminator.
Ergonomics were also a consideration. Playing requires participants to sit with their
arms extended and pointing one or more fingers towards the computer display.
Movement of both the arms and fingers is necessary. Lee (2008) suggests that the
onset of fatigue happens quite quickly for this type of activity. I believe it is ethical
to warn participants of this fact, especially as exercising muscles that are not
generally used may result in stiffness or soreness sometime after the exercise has
stopped.
People with the medical condition photo-sensitive epilepsy may find that moving or flickering light can cause problems, and this can include computer screens. Whilst this condition is quite rare it is ethical to warn participants of this risk and to ensure that anybody with this known condition can choose not to take part.
The pilot study‟s prototype review sessions were very much part of the design
process rather than a study into the VMI‟s playability. For this reason the results or
primary outcomes of the prototype review are included here in this chapter. The
initial prototype review was used to discuss and make a number of design decisions,
and the second review proved invaluable for refining audio and visual feedback of
the VMI.
A number of unexpected events occurred during the first prototype review session
which inadvertently provided valuable feedback. Firstly, the user interface randomly
displayed additional tracking spheres. This turned out to be reflective materials that
the participants were wearing, such as wedding rings, a t-shirt covered in sequins,
and gloss painted finger nails. Secondly the tracking sphere movement seemed to
randomly stop. This turned out to be the invocation of auto-backup and virus
scanning software on the laptop that were affecting the performance of the graphics
display.
The following list summarizes key design influences from the prototype reviews:-
Two methods for controlling the pressure parameter were initially identified and
presented to the participants. Whilst one option was more analogous to pressure,
the chosen option is more intuitive to the users and was easier to implement. The
selected option was preferred by two of the three participants. However, after
using the pressure control during the second review session they thought that the
speed of increase or decrease of tracking sphere size was too fast and hence this was slowed down for the final version.
Three different sound types for the VMI were presented to the participants. The preferred sound was selected for the final version.
One participant identified that the decay and attack of each note could be
adjusted dependent on the speed and pressure control parameters so that the
synchresis was more obvious. The greater the speed the quicker the attack and
the greater the pressure the longer the decay. This suggestion has not been implemented; however it has been noted for future research.
The vibration of the sound spheres upon collision from a tracking sphere was
thought to appear too 'springy', that is the variable distance of vibration was too
great and the time taken to return to its non-vibrating state was too long.
Participants thought that the speed of sound sphere spin upon collision from a
tracking sphere correlated well with the speed of the tracking sphere. Two
improvements were highlighted however. One participant suggested that the
speed of spin should be dependent on both the speed and pressure of the tracking
sphere. This suggestion has not been implemented; however it has been noted for
future research. Another suggested that the sound spheres should not spin at all
until a collision took place (initially all spheres were in a state of spin and
collision just made the spin greater). This suggestion was trialed but upon seeing
the results the participants were in agreement that it should not be implemented.
Without spin the sound spheres appeared on the screen more like circles rather
than spheres and the participants preferred the 3D nature of the interface.
Different playing positions were compared for playability and comfort, including
comparison between sitting and standing, and the amount of bend in the elbows.
The seated position was discovered to be preferable and enabled a higher
degree of control.
Chapter 5 Results
This chapter presents results and analysis of data collected during the user study
deriving from the research methods outlined in Chapter 3. Detailed tables of results can be found in the appendices.
5.1 Playability
The summary of results from the general playability section of the user study
questionnaire, as presented in Table 3, clearly show that playing the Sound Spheres VMI was a pleasurable experience and that, whilst challenging, it did not generally cause frustration or fatigue. The Sound Spheres VMI was on the whole thought to facilitate the creation of music well, and the majority of participants saw a positive improvement in their playing over time.
The results above are in line with observations and interview responses. However a
few differences are worth noting. Firstly, signs of frustration were observed in four of the participants, and for three of these, frustration was observed early in the
session, (two participants had problems with keeping the finger markers in place, and
one participant's initial playing technique was not providing enough control). However they were able to resolve these issues quickly and hence they were no longer
feeling frustration by the end of the session. Secondly, while the majority of
participants responded in the questionnaire that their playing improved over time, the
observed result was that this was only for general control (i.e. moving the tracking
spheres and producing sound) and using the control parameters in isolation.
Observation and interview responses also showed that participants were not entirely content with the layout of the sound spheres. Having
the top position of the sharp notes lower than the top position of the natural notes
presented problems when using the angle control parameter as the natural notes
obstructed access to the sharp notes with some angles, sometimes resulting in the
wrong note being played. Furthermore, the placement of the two octaves on separate
rows also caused participants an element of difficulty playing even simple tunes. The
transition of tracking spheres from the top octave to the bottom invariably caused hesitation.
Whilst participants had the opportunity to play with up to four reflective markers on
the fingers, all participants opted to use only two, with one marker on each of their
index fingers. Only one participant attempted to use four markers, and whilst they enjoyed the possibility, they found it relatively difficult to control. One participant had a tendency
to use just one marker despite wearing two; however this appeared to be unintentional.
A summary of results from the Audio Visual Feedback section of the user study questionnaire is presented in Table 4.
The majority of participants liked the sounds generated by the Sound Spheres VMI, although some suggested improvements in the sound generation. Similarly the look and feel of the Sound
Spheres VMI also received positive feedback. Despite this, four participants did
express views during the user study on how the user interface might be changed.
The general opinions on the control aspects of the Sound Spheres VMI are shown in
the results of questions 15, 16 and 18 in Table 4. Controlling the movement of the
tracking spheres was found easy by most participants. In contrast the consistency of
movement was viewed less positively. Furthermore only half of the participants agreed that they could accurately position the tracking spheres.
All participants agreed that visual feedback was used appropriately with the majority
agreeing that both the spinning of the sound spheres and the flying sparks on collision were appropriate forms of visual feedback.
5.3 Reproducibility
Table 5 shows a summary of results from the reproducibility section of the user
study questionnaire. All but one participant agreed that they were able to repeatedly
play a simple piece of music using the Sound Spheres VMI. Observation showed that this was generally the case.
The user study questionnaire indicates that the control parameters were not used
(intentionally at least) during the reproducibility test. Results also indicate that the
ability to play a piece of music in perfect time was only achieved by one participant.
Observation of participants did show that in fact two participants appeared able to play in good time.
5.4 Control Parameters
This section focuses on results that relate specifically to the assessment of the four control parameters (position, angle, pressure and speed) based on ease of control, audio and visual feedback, and perceived importance to musical outcomes.
Three factors were considered for audio feedback during the application of each
control parameter. Firstly, did participants think the change in sound was apparent
(i.e. could the participant clearly hear the effect of the control parameter)? Secondly,
was the change in sound consistent (i.e. repeated use of the same application of a
control parameter produced the same change in sound)? Thirdly, was the change in
sound appropriate (i.e. was it well suited to the application of the control
parameter)? Only the apparent use of visual feedback was considered for each control parameter.
The questionnaire responses (see Tables 6, 7 and 9) show that the position, speed and
pressure control parameters all received largely positive feedback from participants.
As can be seen in Table 8, the questionnaire responses clearly show the angle control
parameter received less positive feedback when compared with other control
parameters. The ease of control of this parameter and the consistency of its effect on
the audio feedback were factors that the majority of participants gave negative
feedback to.
Table 6 - Questionnaire Result Summary - Position Control
Table 9 - Questionnaire Result Summary - Pressure Control
On the user study questionnaire participants were asked to rank the control
parameters from 1 to 4 in order of ease of control (1 being the easiest and 4 being the
hardest). The detailed results of these rankings can be found in Appendix D and are
The overall ranking of the control parameters shows that the Pressure control
parameter was the easiest to control followed in order by Speed, Position and Angle.
These results correlate with the observed ease of control and with the Likert question
responses in the questionnaire. Furthermore, when ranked by those who participated
in the prototype review only, and then only by musicians, the control parameters were ranked in the same order.
Participants were also asked to rank the four control parameters from 1 to 4 in order
of importance to musical outcomes (i.e. which control could be used best for
affecting the musical outcome, 1 being the most important and 4 being the least). The detailed results of these rankings can be found in Appendix D.
The overall ranking of the control parameters shows that the Speed control parameter was considered the most important, followed in order by Pressure, Position and Angle. When ranked only by musicians, the control parameters were
ranked in the same order. However, when ranked by those that participated in the
prototype review it was found that Angle and Position rankings were in reverse.
5.5 Spearman’s Rank Correlation Results
The results of the Spearman's rank correlation method can be found in Appendix E.
Very few of the P-values are lower than 0.05, and for those that are, further analysis casts doubt on their reliability. Table 10 shows a summary of the correlation results where the P-value < 0.05.
The issue of reliability can be seen from the result between question 11 and 15 where
there exists a strong correlation between them with a high significance (P-value =
0.01). However, the negative correlation coefficient value (of -0.834) suggests that
the Sound Spheres VMI facilitates the creation of music better as the control of the
tracking spheres gets harder. This is the reverse of what would perhaps be expected, with the majority of participants thinking that the VMI facilitates the creation of music well and 75% thinking that the movement of the tracking
spheres was easy. Five out of the eight results in Table 10 could be considered
unreliable based on similar explanations as above. The remaining three results appear
more reliable, however considering other results their reliability could be called into
question also.
Correlation suggests that accuracy in positioning the tracking spheres increases as the
correlation also suggests that tracking sphere positioning is considered easier with an
A strong correlation exists between the improvement of ability to play the Sound
Spheres VMI over time and the ability to distinguish the application of more than
one control parameter at a time. This suggests that progression of ability or skill of play extends over time to the combined use of control parameters.
87.5% of participants thought that the Sound Spheres VMI was challenging and
correlation was identified. Similarly, 87.5% of participants thought that the Sound
All participants were in agreement that visual feedback was used appropriately. One
might therefore expect that sound sphere spin and the use of flying sparks (the two
forms of visual feedback that received predominately positive responses) would
correlate strongly with the appropriate use of visual feedback. However, they did not.
feedback.
The speed, position and pressure control parameters all showed predominantly
positive questionnaire responses in relation to the audio and visual feedback used for
each control. However, there was no strong correlation between audio and visual
feedback for these control parameters, suggesting that synchresis of the Sound Spheres VMI was not fully achieved.
The Mann-Whitney U Test was used for statistical hypothesis tests. A comparison
was made between musicians and non-musicians, and between those who did and did not participate in the prototype reviews. No significant differences were found between musicians and non-musicians, suggesting that non-musicians are not disadvantaged in using the finger tracking
method for music playing. However, there were five results that identified notable
differences between the responses of those who participated in the prototype review
sessions and first time users of the Sound Spheres VMI. Table 11 shows an extract
from the Mann-Whitney summary where the results are significantly different with a
confidence level of 0.05. These results indicate that participants of the prototype
review sessions were more able to consistently control the movement and position of
the tracking spheres. They also used the control parameters to add expression during
play more than first time participants. Participants of the prototype review sessions
more strongly agreed with the change in sound being apparent and consistent when
using the pressure control. This suggests that, like traditional instruments, awareness and control improve with practice and familiarity.
5.7 Validation
The literature review identified a number of key factors considered important for a non-contact VMI using a finger tracking method. I make the assertion that validation
of the positive application of these factors would imply that finger tracking would be
considered an effective method to play a non-contact VMI. The user study of the
Sound Spheres VMI confirms that several of these factors are afforded by the finger tracking method.
5.7.1 Playability
Playing a non-contact VMI using finger tracking was an enjoyable experience for all,
with all participants wishing to play the Sound Spheres VMI again. The positive
results regarding the control of tracking sphere movement and the ability of the
Sound Spheres VMI to facilitate the creation of music well implies that the finger
tracking technique affords successful playability. The degree of success is less clear in two respects.
Firstly, seven out of eight participants did not agree that they could play in perfect
time. This suggests that the Sound Spheres VMI does not facilitate the creation of music in good time; however, this could be attributed to the fact that the Sound Spheres VMI was a totally new experience to participants. Observation of participants did show that in fact two participants appeared able to play in good time.
As these also participated in the prototype reviews, it indicates that good timing is achievable with time and practice.
Secondly, none of the participants attempted to apply the control parameters during
the reproducibility test. On reflection this is entirely natural. When learning any new
musical instrument the ability to produce music and apply good control only comes
with practice and expecting participants to do this with only 1½ hours of playing a
new instrument for the first time is overly ambitious. It is therefore inconclusive
whether the finger tracking method can facilitate control during play. However, as
discussed in section 5.1 the Sound Spheres VMI does facilitate control parameters
well and hence there is good reason to assume that over time a player could develop the ability to apply them during play.
5.7.2 Progression
All but one participant agreed that their ability to play the Sound Spheres VMI
improved over time and this was also observed, especially during the structured play
stages of the user study sessions. Observation also showed that two participants
improved in their ability to use more than one control parameter at a time (during
free-play session stages). One can therefore conclude that the progression factor can be afforded by the finger tracking method.
5.7.3 Predictability
Predictability relates to consistency of control and feedback (both audio and visual). Results from the user study show conflict
in consistency. The speed, pressure and position control parameters were generally
considered easy to control and provided consistent audio feedback. However, only
two participants agreed that they could consistently control the movement of the tracking spheres. Both of these had participated in the prototype review sessions, and this would again indicate that consistency in control could be
accomplished with time and practice. Whilst 75% of participants thought that the
change in sound for different angles was apparent, only 37.5% thought it consistent.
The difficulty in controlling the angle parameter (as discussed in section 5.7.5) is a likely cause of this inconsistency.
All participants found the Sound Spheres VMI challenging to play and the finger
tracking element to this was central to this challenge. However, despite initial
frustration observed from some participants only one agreed that it was frustrating to
play. No participant thought the Sound Spheres VMI was boring to play. Comments
made (such as "so cool", "wow", "amazing", "really fun", "addictive") would indicate this. I conclude therefore that the balance between challenge, frustration and boredom has been well struck.
5.7.5 Application of Control Parameters
Results from the user study would indicate that implementation of the position,
speed, and pressure control parameters has been successfully achieved. Each of these control parameters could be used to add expression and affect musical outcomes.
Whilst two participants agreed that the angle control parameter was easy to control,
75% of participants found angle control difficult. This was evident from rankings
which placed angle as 4th for both ease of control and importance to musical
outcomes. Observation and interview responses strongly indicate that the sound
spheres layout (as described in section 5.1.1) is a key factor for the difficulty in using
the angle control parameter. Another factor however could be that visual feedback
was not implemented for the angle control parameter and suggests that synchresis
plays an important role for successful control of non-contact VMIs and without it
control is perhaps impaired. A point to consider is that the results of the TIEM
questionnaire (as highlighted in section 2.1) also rank the angle control parameter 4th
in terms of the qualities of movement needed for control. It is quite likely therefore
that even if control, layout and synchresis were improved, the angle control parameter would remain the most difficult to use.
Chapter 6 Conclusions
This research project concludes that the implementation of the position, speed, and pressure control parameters can be successfully achieved in a non-contact VMI, whilst it remains inconclusive whether the angle control parameter can be successfully achieved. Software is at the core of the Sound Spheres
VMI and hence finger tracking control, user interface, sound synthesis and visual
feedback (key factors that contribute to control parameter success) are entirely software based. In addition to the control parameters applied to the Sound Spheres VMI, this research project also sought to determine whether finger tracking is an effective method with which to play a virtual musical instrument. This research concludes that factors of playability,
progression, control, and balance between challenge, frustration and boredom can be
applied to the finger tracking method. Whether the factors of predictability and reproducibility can be afforded by the finger tracking method remains inconclusive. However, the
differences between first time users and those who had played Sound Spheres before,
during the prototype review sessions, presents evidence suggesting these factors
could be achieved by the finger tracking method if participants had more time and practice.
Results indicate that non-musicians are not disadvantaged in using the finger tracking
method for music playing. The Sound Spheres VMI thus provides a fun and novel way to create music for musicians and non-musicians alike.
This project has been successful in that a non-contact VMI using the finger tracking
method has been conceived, designed, constructed, prototyped, tested and studied
from the ground up. It has given birth to a new and novel non-contact VMI, the
Sound Spheres. The TIEM database (as described in section 1.2) provides evidence
to the originality of the Sound Spheres VMI, with only 4 instruments in the database using a comparable finger tracking technique. The user study has enabled many factors relating to finger tracking
methods, control parameters, audio and visual feedback and system design decisions
to be assessed. It has also identified areas for future research and ideas for further development.
Whilst the research methods of the user study were appropriate and provided a good
set of data for analysis, in hindsight a different approach to the user study would
have been more suited. I have surmised that where results are inconclusive it is down
to lack of time and practice with the Sound Spheres VMI. Had user study sessions
been more numerous and/or longer in duration participants would have had more
time to adapt to the Sound Spheres VMI and develop better skills and control.
Similarly the project would have benefited in more prototype reviews as not all
design issues were identified during this stage. The poor sound sphere layout that
contributed to the unsuccessful implementation of the angle control is one such issue.
In hindsight a more neutral sample of participants should have been selected. The
participants chosen were predominantly friends and family and this non-neutrality
may have biased the results of the prototype review and user study, with participants perhaps being more positive than a neutral sample would have been.
The Likert scale used in the user study questionnaire used a five point scale, and the central response (neither agree nor disagree) was chosen in 20% of responses. Much of the analysis was based on whether responses showed a negative or positive trend, and therefore it may have been more appropriate to use a four point scale where the central response is removed.
Results from the Spearman‟s Rank Correlation were disappointing with few
statistically significant or reliable results found. It is possible that the small number of participants is a contributing factor. Although qualitative data was an important aspect of this research, a greater number of participants would have improved the reliability of the statistical analysis.
The project has not been without its challenges. The design and construction of the
system was far more complex and time consuming than first envisaged. The learning
curve of new technologies, graphics programming and statistics has been steep.
Personal challenges have also made it difficult to accommodate the demands of the
project. A detailed and evolving project schedule has enabled the system and study to be delivered on time.
6.2 Future Research
This research highlights a number of topics that, whilst falling outside the project scope, warrant further research.
This project acknowledges the importance of synchresis in the design of VMIs, and
this has been incorporated into the design of the Sound Spheres VMI. Evidence from
the user study suggests that without both audio and visual feedback the control of a parameter may be impaired. Further research could build on previous research by Moody (2009) to determine whether this is true for a non-contact VMI, to what extent synchresis affects musical outcomes, or how best to implement it.
This research assessed the Sound Spheres VMI from the player's perspective only. However, as stressed by both Mulder (2000) and Dobrian (2003), much of one's appreciation of a musical performance comes from the audience's perspective; the importance of audience participation is discussed in sections 2.2 and 2.3. Further research could determine
the alternative or optimal mapping strategies for musical performance and audience
The Sound Spheres VMI used the Wiimote and passive markers to implement finger
tracking. Perhaps other finger tracking techniques could be investigated. If the Sound
Spheres VMI could support multiple techniques then a comparison could be made
and perhaps the most suitable technique found. Furthermore, research into whether
other gestural interfaces could be adopted for the Sound Spheres VMI could be
considered. One such interface is Microsoft's Kinect, which was launched after this project had begun.
A comparison study with other possible mappings could be undertaken to identify the
most appropriate sound synthesis model for each of the control parameters.
For the Sound Spheres VMI a very simple user interface was designed. Other user interface designs could be explored and evaluated.
Functional design of the system was limited to address the research question only, to ensure the system was delivered on time according to the project schedule. However, a number of potential enhancements have been identified (see Appendix K).
References
Crevoiser, A., Bornand, C., Guichard, A., Matsumura S., Arakawa, C. (2006). Sound
Rose: Creating Music and Images with a Touch Table. Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), Paris, France.
Crowley, J., Berard, F., Coutaz, J. (1995). Finger Tracking as an Input Device for
Augmented Reality. Proceedings of the International Workshop on Face and
Gesture Recognition, Zurich.1995.
von Hornbostel, E. M., Sachs, C. (1914). Systematik der Musikinstrumente: Ein
Versuch. Translated as "Classification of Musical Instruments" by Anthony
Baines and Klaus Wachsmann, Galpin Society Journal (1961), 14: 3-29.
Geiger, C., Reckter, H., Paschke, D., Schulz, F. (2008). Evolution of a Theremin-
Based 3D-interface for Music Synthesis. IEEE Symposium on 3D User
Interfaces 2008 8-9 March, Reno, Nevada, USA.
Jorda, S. (2004b). Digital Instruments and Players: Part II – Diversity, Freedom and
Control. Proceedings of the International Computer Music Conference,
Miami, 2004.
Kiefer, C. (2010). Input Devices and Mapping Techniques for the Intuitive Control
and Composition and Editing for Digital Music. ACM, New York, NY, USA.
Lee, J. (2008). Hacking the Nintendo Wii Remote. IEEE Pervasive Computing, Vol.
7, No. 3, 39–45.
Paine, G. (2009). Towards unified design guidelines for new interfaces for musical
expression. Organised Sound, 14(2), pp 143-156.
Paine, G., Stevenson, I., Pearce, A. (2007). The Thummer Mapping Project (ThuMP).
International Conference on New Interfaces for Musical Expression
(NIME07), New York City, NY.
Softic, S. (2009). Using Nintendo Wii Remote Controller for Finger Tracking,
Gesture Detection and a HCI Device. Institute of Information Systems &
Computer Media, Graz University of Technology.
Tauber, E., Stanford, J., Klein, L. (2005). How many users are enough? User
Experience; Vol 4. No. 4.
Vuong, P., Kurillo, G., Bajcsy, R. (2009). Wiimote Tracking Algorithm and its Limitations.
Walonick, D. (2004). Excerpts from Survival Statistics. StatPac, Inc., 8609 Lyndale
Ave. S. #209A, Bloomington, MN 55420
Wang, D., Huang, D. (2008). Low-Cost Motion Capturing Using Nintendo Wii
Remote Controllers. CSC2228 Project Report, Department of Electrical and
Computer Engineering, University of Toronto, Ontario.
Wong, E., Yuen, W., Choy, C. (2008). Designing Wii Controller – A Powerful
Musical Instrument In An Interactive Music Performance System.
Proceedings of MoMM, Linz, Austria
Index
AirStick, 24
bluetooth, 8
instrumental, 18
interaction, 2
ornamental, 18
passive markers, 8
playability, 15
progression, 15
Theremin, 3
TIEM, 4
Wiimote, 26
The position of the finger tips is represented on the user interface as small spheres (tracking spheres), whose movement is used to trigger sounds through collision with a set of fixed larger spheres (the sound spheres), which are organized in two rows, each comprising the 12 notes of an octave. Before sounds are played back, the software synthesizes the sound depending on how the musician has used the control parameters of position, speed, pressure and angle.

Figure 1. Playing the Sound Spheres

Much of the design effort focused on how each of the control parameters could be implemented without the sense of touch and with only audio and visual feedback. For example, the angle control parameter was based on the angle generated at the point of collision by the tracking sphere's starting position and collision point (as illustrated in Figure 2). Having in mind how instruments like the piano and the xylophone are played, the tracking sphere's starting position was determined by the position at which the movement changes from a positive direction in the y-plane to a negative one, i.e. the point at which a downward movement starts, following an upward movement.

Each control parameter was mapped to a different audio effect. For example, a varying amount of modulation is applied to the generated sound dependent on the degree of angle. Control parameters were also mapped to different visual effects, which included flying sparks and spinning of the sound spheres.

Control Parameter – Audio Effect – Visual Effect
Position – Stereo Panning: the sound is increasingly panned to the left or right speaker dependent on the position of collision. – Flying Sparks: the direction of sparks is dependent on the position of tracking sphere collision.
Speed – Volume: a greater speed results in a higher volume. – Spin: the greater the speed of the tracking sphere, the faster the sound spheres spin on collision.
Pressure – Parametric EQ: a greater pressure results in a tone where the higher frequencies are boosted. – Size: the greater the pressure, the larger the tracking sphere.
Angle – Chorus: an acute angle results in a chorus effect with a greater degree of modulation than a less acute angle. – None.
Table 1. Control Parameter Mapping
Session research stages (continued): 4. Control Instruction (5); 5. Structured Play – Position (5); 6. Structured Play – Speed (5); 7. Structured Play – Pressure (5). Research method: Observation.

Statistical analysis of the questionnaire responses showed very positive feedback to many factors relating to the Sound Spheres VMI. For example, 87.5% of
Participant Details
General Playability
For each of the statements below select one response only.
(Response options: Strongly Agree / Agree / Neither Agree nor Disagree / Disagree / Strongly Disagree)
6. Playing the Sound Spheres VMI was enjoyable.
7. Playing the Sound Spheres VMI was
challenging.
Audio and Visual Feedback
For each of the statements below select one response only.
(Response options: Strongly Agree / Agree / Neither Agree nor Disagree / Disagree / Strongly Disagree)
13. I liked the sounds generated by the Sound
Spheres VMI.
Position Control
For each of the statements below select one response only.
(Response options: Strongly Agree / Agree / Neither Agree nor Disagree / Disagree / Strongly Disagree)
Speed Control
For each of the statements below select one response only.
(Response options: Strongly Agree / Agree / Neither Agree nor Disagree / Disagree / Strongly Disagree)
27. It was easy to control the speed of the tracking
spheres when colliding with the sound spheres.
Angle Control
For each of the statements below select one response only.
(Response options: Strongly Agree / Agree / Neither Agree nor Disagree / Disagree / Strongly Disagree)
32. It was easy to control the angle of the tracking
spheres when colliding with the sound spheres.
Pressure Control
For each of the statements below select one response only.
(Response options: Strongly Agree / Agree / Neither Agree nor Disagree / Disagree / Strongly Disagree)
36. It was easy to control the pressure of the
tracking spheres when colliding with the sound
spheres.
42. Rank each of the control parameters from 1 to 4 in order of ease of control (1 being the easiest and 4 being the hardest).
Angle: ___   Position: ___   Pressure: ___   Speed: ___
Note: No control must have the same rank.
43. Rank each of the control parameters from 1 to 4 in order of importance to musical outcomes (1 being the most important and 4 being the least important).
Angle: ___   Position: ___   Pressure: ___   Speed: ___
Note: No control must have the same rank.
Reproducibility
For each of the statements below select one response only.
(Response options: Strongly Agree / Agree / Neither Agree nor Disagree / Disagree / Strongly Disagree)
General Comments
47. What did you like most about the Sound Spheres VMI?
48. What did you like least about the Sound Spheres VMI?
49. Would you like to play the Sound Spheres again? No Yes (tick one only)
Appendix C – User Study Questionnaire Results - Likert Scale
The table below shows the responses to the Likert scale questions on the user study
questionnaire given by each participant. The mode and median are shown for the
responses to each question. The Likert scale used a bipolar scoring as follows: Strongly Agree = 2, Agree = 1, Neither Agree nor Disagree = 0, Disagree = -1, Strongly Disagree = -2.
A score of 1 and 2 (Agree and Strongly Agree) was considered a positive response.
There were three exceptions to this (questions 8, 9 and 10, shown with an asterisk suffix) where a favourable response was considered to be in the negative, and in these cases a score of -2 and -1 (Strongly Disagree and Disagree) was considered a
positive response.
Appendix D – User Study Questionnaire Results - Control
Parameter Rankings
The tables below show the responses to the ranking questions (42 and 43) on the user study questionnaire given by each participant. Participants were asked to rank each control parameter from 1 to 4 based on the ease of using the control and its importance to musical outcomes. The overall ranking was determined by scoring each response and totalling the scores for each control parameter. The control parameter with the highest score was ranked first.
Appendix E – Spearman’s Rank Correlation Results
Results for the Spearman's rank correlation coefficient (Rho) for selected Likert
question pairings are shown in the table below. The P-Value and Significance
percentage are also shown. Typically one would reject the null hypothesis when the
P-value < 0.05 (a statistical significance of 95%), and results that attain this level of
significance have been highlighted in red. The percentage of positive and non-
positive responses has been included for each question to aid analysis of the result.
Appendix F – Mann-Whitney U-Test Results
The U1 and U2 values for the Mann-Whitney U-Tests were calculated as shown
below:
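The equation itself is not reproduced in this extract; the standard Mann-Whitney formulation, which the values described below follow, is:

U1 = n1 × n2 + n1(n1 + 1)/2 − R1
U2 = n1 × n2 + n2(n2 + 1)/2 − R2

…where n1 and n2 are the number of participants in each group and R1 and R2 are the sums of the ranks assigned to each group.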
The following tables show the n1, n2, R1, R2, U1 and U2 values. The U1 or U2 values
were compared with Mann-Whitney U distribution critical values at a significance
level of 0.05. A red flag is shown in the table against any U1 or U2 value that falls below the
corresponding critical value, indicating that the null hypothesis can be rejected and
that a significant difference is found.
Results for Prototype Review Participants and First Time Participants
Appendix G – User Study Interview Responses
Appendix H – Categorization of Qualitative Results
Appendix I – Ethical Issues
A number of ethical issues were considered for the Sound Spheres research project.
A central component to the Sound Spheres research project is a user study of a new
system involving a number of participants. Whilst the participants are volunteers (i.e.
willing participants) it was important that they are made fully aware of any potential
risks to their health and safety before agreeing to participate in the user study. In this
regard I identified three specific potential risks that I feel ethically bound to disclose.
Firstly, the Sound Spheres system involves the use of an infrared (IR) illuminator.
One must therefore ask the question "is this a safe technology to which to subject participants?" My research into this area (as outlined later in this section) provides convincing evidence that this technology is safe when appropriate precautions (such as shielding and limited durations of use) are taken. Without adhering to the precautions there is a small risk
that the participant‟s health and safety could be compromised, namely a potential risk
of eye damage. These risks and evidence of safe use have been fully disclosed to
For the Sound Spheres VMI the natural playing position of the hands is at torso
height with the arms bent at the elbows. Therefore the IR illuminator will also be
positioned at torso height and raised up or down dependent on the height of the
player. This variation in height will be most prominent between a child and an adult.
Therefore in addition to adjusting the height, a shield has been used to cover the IR
illuminator so that infrared light cannot be directed towards the player‟s eyes.
Secondly, there are ergonomic considerations. Playing any musical instrument for
the first time requires individual adaptation and playing the Sound Spheres virtual
instrument will be no exception. Playing required the participant to sit with their
arms extended and pointing one or more fingers towards the computer display.
Movement of both the arms and fingers is necessary. Lee (2008) suggests that the
onset of fatigue happens quite quickly for this type of activity. I believe it is ethical
to warn participants of this fact, especially as exercising muscles that are not
generally used may result in stiffness or soreness sometime after the exercise has
stopped.
Lastly, consideration has been made to the specific medical condition called 'photo-sensitive epilepsy'. People with this condition may find that moving or flickering
light can cause problems, and this can include computer screens. Whilst this
condition is quite rare (only 3-5% of people with epilepsy are in fact photo-sensitive)
it is ethical to warn participants of this risk and to ensure that anybody with this known condition can choose not to take part.
I have discussed all of these issues with participants prior to selection. Additionally I
have asked all participants to sign an acknowledgement form (prior to taking part in
the user study) confirming that they were told about and understand the risks
involved. This form also acted as a parental consent form for any participants that are
considered minors. In Abu Dhabi (the location of this research) a minor is considered
Discrimination
The Sound Spheres virtual instrument makes no specific provision for users with disabilities, and it can therefore be argued that it discriminates against such users. Users of the system must have use of their upper limbs and must not have significant visual or hearing impairment. Is it ethical to develop a system that is not accessible to all? Is it ethical to develop a musical instrument that is not playable by all?
Whilst issues of accessibility by those with physical disabilities have seen much
ethical debate, I believe that in the case of Sound Spheres there is no real ethical
debate to be had. Looking at the Sound Spheres system from a musical instrument
perspective would support this view. As stated in the project overview, the instrument is played through physical movement of the fingers and arms, and thus will logically discriminate against those with certain physical disabilities.
Looking at the Sound Spheres system from a software application perspective would
also give support to this notion. I would suggest that the majority of software based
systems make no specific provision for those with disabilities. Also, the software engineering code of conduct of the British Computer Society simply states that one must only "consider issues of physical disabilities" where appropriate, but does not go as far as to suggest that this should be mandatory.
Appendix J – Sound Spheres Setup and Installation
System Components
The Sound Spheres VMI comprises the following software and hardware
components.
Sony Vaio Z-Series laptop computer running the Windows 7 operating system,
with an Intel Core i7 CPU @2.67 GHz processor, 6GB of memory, and a
NVIDIA GeForce GT 330M graphics card.
Bluetooth adapter.
M-Audio Black Box. This is essentially a computer interface for home recording.
For the purposes of the Sound Spheres VMI I am using it for its 24-bit sound
card which is needed to implement some of the effects (such as stereo panning)
of the Microsoft DirectSound software library. There are additional features of
this device that make for fun and interesting playing of the Sound Spheres VMI,
such as a 100 pre-set drum beats and various effects processors. These features
however are outside of the scope of this project. All that is essential is the 24-bit
sound card.
Wiimote controller.
Infrared LED arrays with 32 LEDs with a wavelength of 850nm.
Cover for the infrared led array and the Wiimote, shielding light from the
player‟s eyes.
(Figure: system components – laptop computer, Bluetooth adapter, M-Audio Black Box, Wiimote and LED arrays.)
The components are essentially setup on a two-tiered desk. The top tier is used as a
surface on which to stand the speakers and computer monitor. The lower tier is used
as a surface on which the Wiimote and LED arrays are placed. The separate tiers
enable the Wiimote and LED arrays to be positioned horizontally central to the
monitor and speakers, but without obstructing the player's view of the monitor. The
1 Images have been taken from the corporate websites of Samsung, M-Audio, Sony and Nintendo.
lower tier is set at 70 cm above the ground and represents the minimum height that
the Wiimote can be placed. The Wiimote and LED arrays can be adjusted up or
down as desired to suit the height and natural playing position of each player. The
top tier is set 20 cm above the lower tier and hence allows the height of the Wiimote and LED arrays to be adjusted within that range.
The players were also sat on an adjustable chair which also allowed them to raise or
lower their playing position. The positioning of the laptop and M-Audio Black Box
device are not important as long as they do not obstruct the player's view of the
monitor or the LED arrays. The reference speakers are positioned either side of the monitor so that stereo effects are apparent to the player.
System Failure
An event occurred during the first user study session that, whilst it initially put the
research project at risk, provided a valuable insight worth further study. One of the
infrared LED arrays stopped working and could not be repaired. With no time to
procure and construct another, an attempt to get the system working with just one
LED array was made. With this configuration the movement of the tracking spheres
was possible but they would only move in a limited area of the user interface. A
different algorithm for calculating coordinates was tried to overcome this but was
unsuccessful. Through desperate trial and error it was discovered that when the
Wiimote was placed at a slight angle to the LED array the movement improved,
however it was still stilted and inconsistent. A breakthrough came when the Wiimote
camera was partially obscured (accidentally) by the back of the LED array and the
movement of the tracking spheres became very consistent, fluid, and more
responsive than when two LED arrays were working. The Sound Spheres playability
vastly improved! Due to project time constraints the technical reasons why this configuration works so well have not been investigated.
Software Installation
This section outlines installation steps for the Sound Spheres VMI. The Sound Spheres software can be downloaded as a compressed folder from:
http://dl.dropbox.com/u/18234532/SoundSpheres.zip
Note: The system has been developed and tested using a laptop computer running the
Windows 7 operating system, with an Intel Core i7 CPU @2.67 GHz processor, 6GB
of memory, and a NVIDIA GeForce GT 330M graphics card. The Sound Spheres VMI should run on any computer with a similar or higher specification.
2. Download the compressed folder (SoundSpheres.zip) from its online location (see above) and extract the contents to a desired folder (the installation folder).
3. Follow the steps outlined in the Installation.txt file delivered in the installation folder.
The alternative is to download the Sound Spheres VMI source code (as detailed in Appendix L) and build the application from it.
Starting the Sound Spheres VMI
Note: Before the Sound Spheres VMI software can be started (executed), it is necessary to establish a Bluetooth connection between the Wiimote controller and the laptop (or workstation). This is perhaps the most challenging part of the setup; a code-level sketch of the connection step is given at the end of this section. Several Bluetooth stacks and drivers were tried without success; even the Bluetooth drivers delivered by Microsoft failed. Searching on the internet revealed that this is a common problem. The Bluetooth driver that seemed to have the best success rate
3. Position the IR LED array(s) and the Wiimote as shown in Section 4.6, and turn them on.
4. Start the Sound Spheres application. The Sound Spheres splash screen (as shown below) will be displayed while the system is initializing.
5. If the application is unable to establish a connection with the Wiimote controller, a warning message is displayed.
6. The Sound Spheres user interface is then displayed, regardless of whether the Wiimote can be initialized. This was done to facilitate testing, as the sound spheres can still be exercised without a working Wiimote connection.
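As noted above, the following is a minimal sketch, in Visual Basic .NET, of what connecting to the Wiimote and enabling infrared reporting might look like in code. It assumes Brian Peek's WiimoteLib library; class and member names may differ between library versions, and the Sound Spheres source may structure its connection and event handling differently.

Imports WiimoteLib

Module WiimoteStartup
    Private ReadOnly wiimote As New Wiimote()

    Public Sub ConnectWiimote()
        ' Requires the Wiimote to already be paired with the PC over Bluetooth.
        AddHandler wiimote.WiimoteChanged, AddressOf OnWiimoteChanged
        wiimote.Connect()
        ' Ask the Wiimote to report infrared and accelerometer data continuously.
        wiimote.SetReportType(InputReport.IRAccel, True)
    End Sub

    Private Sub OnWiimoteChanged(ByVal sender As Object, ByVal e As WiimoteChangedEventArgs)
        ' Each visible infrared point (e.g. a reflective finger marker) is reported
        ' with a normalised position between 0 and 1 on each axis.
        Dim sensor = e.WiimoteState.IRState.IRSensors(0)
        If sensor.Found Then
            Console.WriteLine("IR point at {0}, {1}", sensor.Position.X, sensor.Position.Y)
        End If
    End Sub
End Module

If Connect() throws an exception, the Bluetooth pairing described in the note above has usually not been established correctly.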
Appendix K – Potential Enhancements
The Sound Spheres VMI limits movement of the tracking and sound spheres to a two-dimensional space. This design was primarily to do with the fact that a single Wiimote was used, and hence the position of the finger tracking markers in 3D space is not known. The use of additional Wiimotes would allow the 3D position of the finger tracking markers to be determined, and would enable a richer (and perhaps more appealing) graphical interface where both the tracking and sound spheres could be positioned along the z-axis as well as the x-axis and y-axis. This would also allow the interaction with the instrument to be improved. Furthermore, the use of multiple Wiimotes would reduce (or possibly eliminate) the limitations of tracking with a single camera.
The Sound Spheres VMI uses pre-recorded sound wave files for audio feedback. The sounds generated by the Sound Spheres VMI could instead be implemented using true MIDI output (a sketch is given at the end of this appendix). This would enable the VMI to be used in conjunction with other MIDI devices for variable sound generation, and hence the player could assign different sounds to the spheres. The implementation of tone, cutoff and decay control parameters would enable richer audio feedback and perhaps enhance musical outcomes and improve appeal.
The Sound Spheres VMI is limited to two fixed octaves. The ability for the player to change the octave range could be implemented.
Additional visual feedback could be provided for each sound sphere. This would enhance visual feedback for both the player and audience. Similarly, the spinning speed of the sound spheres is currently affected by the speed at which a sphere is struck.
The control-to-sound mappings and their parameters have been essentially hard-coded for this project. One improvement would be to make these mappings and their parameters configurable, so that the performer can tailor the Sound Spheres VMI to their own preferences.
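For the MIDI enhancement suggested above, the following is a minimal sketch, in Visual Basic .NET, of sending a note-on and note-off message to the default Windows MIDI output device through the Win32 multimedia API (winmm.dll). It is illustrative only and is not part of the Sound Spheres source; a real implementation would need error handling and a considered mapping from spheres and control parameters to MIDI notes and velocities.

Module MidiSketch
    ' Win32 multimedia API declarations for MIDI output (winmm.dll).
    Private Declare Function midiOutOpen Lib "winmm.dll" (ByRef handle As IntPtr, ByVal deviceId As UInteger, ByVal callback As IntPtr, ByVal instance As IntPtr, ByVal flags As UInteger) As UInteger
    Private Declare Function midiOutShortMsg Lib "winmm.dll" (ByVal handle As IntPtr, ByVal message As UInteger) As UInteger
    Private Declare Function midiOutClose Lib "winmm.dll" (ByVal handle As IntPtr) As UInteger

    Public Sub PlayNote(ByVal note As Byte, ByVal velocity As Byte)
        Dim handle As IntPtr = IntPtr.Zero
        midiOutOpen(handle, 0UI, IntPtr.Zero, IntPtr.Zero, 0UI)   ' open the first MIDI output device
        ' A short MIDI message packs status, note number and velocity into one value.
        ' Status &H90 = note-on, channel 1; &H80 = note-off, channel 1.
        Dim noteOn As UInteger = &H90UI Or (CUInt(note) << 8) Or (CUInt(velocity) << 16)
        midiOutShortMsg(handle, noteOn)
        System.Threading.Thread.Sleep(500)                        ' let the note sound briefly
        Dim noteOff As UInteger = &H80UI Or (CUInt(note) << 8)
        midiOutShortMsg(handle, noteOff)
        midiOutClose(handle)
    End Sub
End Module

For example, PlayNote(60, 100) would sound middle C; in the VMI, the velocity could be driven by the pressure or speed control parameters.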
Appendix L – Sound Spheres Source Code
The full source code for the Sound Spheres VMI is available online as a compressed folder at the following location:
http://dl.dropbox.com/u/18234532/Sound%20Spheres%20Source%20Code.zip
Note: The source code will also be made available on request by email to Craig.G.Hughes@gmail.com
The Sound Spheres VMI has been developed with Microsoft's Visual Studio Professional 2008 using the Visual Basic .NET programming language.
Sound Spheres – The main solution file for this project; it can be accessed from the root of the extracted source folder.
Software pre-requisites
Appendix M – Sound Spheres Videos
In order to show various aspects of the Sound Spheres VMI, a series of short videos has been created and made available online.
The following videos show one of the participants of the user study playing the
Sound Spheres VMI during the first discovery and free play session.
http://dl.dropbox.com/u/18234532/Sound%20Spheres%20Video%201.MOV
http://dl.dropbox.com/u/18234532/Sound%20Spheres%20Video%202.mov
The following short videos attempt to show the change in sound as each of the control parameters is used:
The following video shows one of the participants of the user study playing a simple
melody during the reproducibility test.
http://dl.dropbox.com/u/18234532/SoundSpheres%20-%20SimpleSong.AVI