
GENERATIVE FOOTSTEPS: SOUNDS FOR FILM POST-PRODUCTION
by Julián Téllez Méndez
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF BACHELOR OF SCIENCE
in the School of Audio Engineering
MIDDLESEX UNIVERSITY

JUNE 2013

ABSTRACT.

This dissertation adds to the research in post-production practices by using generative audio to digitally re-construct Foley stages. The rationale for combining generative audio with Foley processes is to analyse the possible implementation of new technology that could bring the benefits of Foley practice to low-budget films. This research project also sits at the intersection of sound synthesis, signal analysis and user interaction: a behavioural analysis based on ground reaction forces was prototyped.

ACKNOWLEDGEMENT.
I would like to dedicate this dissertation to Andy J. Farnell, whose expertise helped me immensely in writing it, and to Gillian McIver, Helena Hollis and Philippa Embley, who never ceased helping me right until the very end. To God, for His divine intervention in this academic achievement. To my mother Zoraida Méndez and my grandmother Bertha Daza, for making me a better person. To my aunts Martha and Bellky Méndez, for their unconditional support; without you none of this would have been possible.


TABLE OF CONTENTS.
CHAPTER 1 INTRODUCTION.
1.1 SIGNIFICANCE OF THIS STUDY.
1.2 PROBLEM STATEMENT.
1.3 LAYOUT OF DISSERTATION.
CHAPTER 2 GENERATIVE FOOTSTEP SOUNDS.
2.1 LITERATURE REVIEW.
2.1.1 INTRODUCTION.
2.1.2 SOUND TEXTURES.
2.1.3 DEFINITIONS AND PRINCIPLES OF GRANULAR SYNTHESIS.
2.1.4 STOCHASTIC ANALYSIS.
2.1.5 PROCEDURAL AUDIO IN RESPONSE TO FOOTSTEP MODELLING.
2.1.6 SUMMARY.
2.2 METHODOLOGY.
2.2.1 INTRODUCTION.
2.2.2 OBJECTIVES.
2.2.3 PARAMETERS.
2.2.3.1 The Grain Envelope Analysis.
2.2.3.2 The Grain Dynamics.
2.2.3.3 Footstep-modelling.
2.2.3.4 The Ground Reaction Force.
2.2.4 PROCEDURES.
2.2.4.1 Pure Data.
2.2.4.2 Arduino.
2.2.5 ARCHITECTURE.
2.2.6 SUMMARY.


CHAPTER 3 EVALUATION.
3.1 INTRODUCTION.
3.2 QUANTITATIVE DATA.
3.2.1 THE DATA COLLECTION METHOD.
3.2.2 RESEARCH FINDINGS.
3.2.2.1 STATISTICAL ANALYSIS.
3.2.2.2 T-Test.
3.2.2.3 Chi-Square.
3.2.3 THE RESULTS AND EVALUATION.
3.3 QUALITATIVE DATA.
3.3.1 DATA COLLECTION METHOD.
3.3.2 RESEARCH FINDINGS.
3.3.2.1 One-to-One Interview.
3.3.2.2 e-Interviewing.
CHAPTER 4 CONCLUSION.
APPENDICES.
APPENDIX A.
APPENDIX B.
APPENDIX C.
APPENDIX D.
APPENDIX E.
APPENDIX F.
APPENDIX G.
REFERENCES.
BIBLIOGRAPHY.


LIST OF TABLES.
Table 1: Average Quality.
Table 2: Chi-Square.
Table 3: Expected Values.

LIST OF FIGURES.
Figure 1: Sound Texture Extraction.
Figure 2: Gaussian Window.
Figure 3: Output List.
Figure 4: Transient Detector.
Figure 5: Grain Dynamics.
Figure 6: GRF Exemplified.
Figure 7: The Gait Phase.
Figure 8: GRF Distribution in Pure Data.
Figure 9: PD Environment.
Figure 10: The Cloud.
Figure 11: Code in Arduino.
Figure 12: Architecture.
Figure 13: Prototype.
Figure 14: Polynomial Curves.
Figure 15: Question 2.
Figure 16: Question 3.
Figure 17: T-Test in Excel.


CHAPTER 1 INTRODUCTION.

This project will focus on the use of granular synthesis techniques for dynamically generated audio, with a main emphasis on film post-production. In particular, footstep modelling will be studied extensively, and the results will be compared with those obtained from previously recorded content, including Foley and several location recordings. Dynamically generated audio, otherwise known as Procedural Audio (PA), is created with programmable sound structures: the user manipulates audio by establishing the input, internal and output parameters to ultimately develop a non-repetitive and meaningful sound (Farnell, 2010). Different types of technology have called on a number of methods in an attempt to provide quick and efficient solutions for audio, especially in interactive applications such as video games. Many of these sources and methods are discussed below; however, it is beyond the scope of this study to resolve these issues once and for all, and they will undoubtedly cause controversy and debate for many years to come. This work aims instead to contribute to the existing evidence and thereby to a better understanding of generated audio, and it will highlight the need for continued research and development of technology that supports generative audio.

1.1 Significance of this Study.


This study will aim to gather the existing theories in an effort to expand on, clarify and support them. Various books and academic papers have been examined in order to frame a perspective that justifies this study, which aims to answer three specific questions: Why are generative audio and sound modelling so important? How can they be applied, and what methods have been developed? What benefits can generative audio bring to the post-production industry? To define the scope of this study, I have chosen to investigate and analyse the process of modelling sound for Foley footsteps. The purpose is to study existing footstep models, especially those exhibited in Andy Farnell's book Designing Sound. Based on the studies carried out by authors such as Roberto Bresin and Perry R. Cook, I will attempt to formulate a footstep modelling method. This study is an effort to promote structured sound models in the post-production industry. It will also be beneficial to sound design professionals and students, as it provides information on, and performance evaluations of, methods previously used for footstep modelling. Moreover, it will be helpful to the post-production industry and independent sound professionals, as it will inform them about the state of generative audio.

1.2 Problem Statement.


The human ear can only attend to a limited number of sounds. This selective attention, otherwise known as the cocktail party effect, focuses on a particular stimulus while filtering out a range of other stimuli (Moray, 1959). Observations have shown that roughly one in seven people can recall information from irrelevant sources (Wood and Cowan, 1995). Distinct sonic events need to occur for a listener to differentiate between a single event and a continuous stream of events (Strobl, Eckel and Rocchesso, 2006). For this reason, sensible decisions about which sounds should be heard at particular times are imperative. Sonic content has the power to enhance the narrative of a film, but it can also distract one's attention and create discomfort. Over the years, the re-creation and re-recording of human-made sounds in a film has been refined into an art named Foley. The process involves several distinct steps and specialised practitioners, and it can be used to soften the audio as well as to heighten scenes. According to Vanessa Ament, a former Foley artist, many films contain so many different sounds that the listener's ears can easily become overwhelmed (Ament, 2009). Foley stages are unique in that they are built with various surfaces: concrete, wood, carpet, tile, linoleum, dirt, sand and even water. One of the reasons why low-budget films sound amateurish is the lack of recording facilities, and particularly the lack of Foley stages. This dissertation will add to research in Foley practices by using generative audio to digitally re-construct these stages. The rationale for combining generative audio with Foley processes is to analyse the possible implementation of new technology that

could bring the benefits of Foley practice to low-budget films. Throughout the last thirty years, customised libraries have been an essential part of post-production work. Recorded assets have become a commodity; a single library can easily compile over ten thousand individual samples, and according to David Lewis Yewdall it can literally take years to get to know a library (Yewdall, 2007). Having thousands of sounds collected has simplified sound design; however, sound libraries on their own are nothing but an agglomeration of samples. Excellent editors can create very realistic and convincing sounds from them, but these will never sound as authentic as custom-recorded ones.

1.3 Layout of Dissertation.


This study will be structured in the following way. Chapter 2 consists of a literature review and the methodology. The literature review will study the background framework of sound textures and highlight different approaches to creating them; it will then introduce granular synthesis and explain how it serves to structure generative audio, before analysing the evolution of dynamically generated audio in response to sound modelling. The methodology will then be discussed, and both the anatomy and the actions of the foot will be examined in detail to gain a greater understanding of the modes and dynamics of gait movement. This research project will follow a post-positivist approach in which cause-and-effect thinking is reinforced. Chapter 3 will evaluate and analyse all data collected in an attempt to convey a structure for footstep modelling, which will be summarised and concluded in Chapter 4.

CHAPTER 2 GENERATIVE FOOTSTEP SOUNDS.

2.1 Literature Review.


2.1.1 Introduction.
This chapter provides a review of the literature and secondary data related to sound textures, granular synthesis and footstep modelling. It will initially discuss the principle of sound texture, presenting some examples and observations on the subject. It will then proceed to define granular synthesis, followed by a description and analysis of the evolution and development of procedural audio in response to sound modelling. The concept of procedural audio (PA) is closely linked to programming, as it uses routines, subroutines and methods to create, reshape and synthesise sound in real time; there will therefore also be an analysis of how the two relate to one another. Finally, there will be a critical analysis of the benefits and challenges of implementing procedural audio in post-production, as well as a consideration of the failures of its implementation.

2.1.2 Sound Textures.


Studies in everyday listening led by Gaver (Gaver, 1993) have served as a foundation for understanding sound and hearing, particularly for the analysis and synthesis of sounds with procedural audio. By separating contacted objects from any interaction, Gaver described individual impacts as a continuous waveform characterising the force they introduce to the object, suggesting that there may be information for interactions that is invariant over objects; in this particular case, it is the force exerted when a person's body is in contact with the ground (see Figure 1). This topic will be examined extensively in the following subsections. In the virtual world, interaction is represented in terms of energy passed through a filter, allowing objects to be modelled independently. With regard to footstep modelling, Farnell, who reflects on the generation and control of complex signals, has also extensively researched the behaviour and intention of sound: "Reflecting on the complexity of walking, you will understand why film artists still dig Foley pits to produce the nuance of footsteps, and why sampled audio is an inflexible choice" (Farnell, 2010).

Figure 1: Sound Texture Extraction (Gaver, 1993, p. 293).

Despite new contributions to this concept being largely theoretical, a few implementations, such as the FoleyAutomatic developed by Kees van den Doel, Paul G. Kry and Dinesh K. Pai, have proven to deliver high-quality synthetic sound. The FoleyAutomatic is composed of a dynamics simulator, a graphics renderer and an audio modeller. Interactive audio depends upon world events whose order and timing are not usually pre-determined. According to Farnell, the common principle that makes audio interactive is the need for user input; in an attempt to represent emotional qualities, sounds need to adapt to pull the mood of the user (Farnell, 2007). This project is based on Gaver's foundational analysis and synthesis of sounds, an iterative process of analysing recorded material and synthesising a duplicate on the basis of the analysis. As described by Gaver, the criterion for a sound texture is that it conveys information about a given aspect of the event, as opposed to being perceptibly identical to the original sound (Gaver, 1993). Nicolas Saint-Arnaud defined sound texture in terms of constant long-term characteristics and attention span: "A sound texture should exhibit similar characteristics over time. It can have local structure and randomness but the characteristics of the fine structure must remain constant on the large scale... A sound texture is characterized by its sustain... Attention span is the maximum [time] between events before they become distinct. High level characteristics must be exposed within the attention span of a few seconds" (Saint-Arnaud, 1995). Different studies have broadly approached the question of how to perform a sound segmentation in order to create a sonic event

that resembles the original. However, no up-to-date applications for producing sound textures are available, and their creation is still based on manually editing recorded sound material. An increasing number of analysis and synthesis methods for sound textures have been formulated in the past few years, at a notable intersection of fields such as signal analysis, sound synthesis, modelling, information retrieval and computer graphics (Strobl, Eckel and Rocchesso, 2006). In the context of footstep modelling, granular synthesis presents arguably the best approach; this research therefore studies the principles of granular synthesis in an attempt to collect information that could lead to a better-structured and more concise sound model.

2.1.3 Definitions and Principles of Granular Synthesis.


The concept behind granular synthesis has existed for many years, anticipated by the fletcher's paradox stated by Zeno, which divides time into points as opposed to segments: "if everything when it occupies an equal space is at rest, and if that which is in locomotion is always occupying such space at any moment, the flying arrow is therefore motionless" (Aristotle, 239). Albert Einstein also predicted that ultrasonic vibration could occur at the quantum level of atomic structure, which led to the concept of acoustical quanta (Roads, 2001). Consequently, various descriptions and definitions of granular synthesis are in existence. The British scientist Dennis Gabor proposed that all sounds can be decomposed into a family of functions obtained by time and frequency shifts of a single Gaussian particle; any sound can be decomposed into an appropriate combination of thousands of elementary grains (Gabor, 1946). Such a statement was significant in the

development of time-frequency analysis, and set the starting point for granular synthesis. Roads, who implemented granular sound processing in the digital domain, has also made several contributions. In his book Microsound, he stated that sound can be considered as a succession of frames passing by at a rate too fast to be heard as discrete events; sounds can be broken down into a succession of events on a smaller time scale (Roads, 2001). For the purpose of this research project, the description provided by Gabor will be adopted, with a slight variation on the pure Gaussian curve (see Figure 2) (Farnell, 2010). A Tukey envelope, also known as the cosine-tapered window, will be used; this envelope smoothly sets the waveform to zero at the boundaries, evolving from a rectangle to a Hanning envelope as its taper increases (Harris, 1978). It is useful to briefly consider the principles of granular synthesis and how they affect audio. According to Roads (2001), a micro-acoustic event contains a waveform, typically between one thousandth and one tenth of a second long, shaped by an amplitude envelope. The components of any grain of sound approach the minimum perceivable durations, frequencies and amplitudes, carrying both time- and frequency-domain information. By combining grains over time, sonic atmospheres are created. However, granular synthesis requires a large amount of control data, which the user usually specifies in global terms, leaving the synthesis algorithm to fill in the details.

Figure 2: Gaussian Window (Roads, 2001, p. 87).

Gabor (1946) observed that any signal can be expanded in terms of elementary acoustical quanta by a process that includes time analysis. Grain envelopes and durations vary in a frequency-dependent manner; however, it is the waveform within the grain that is the most important parameter, as it can vary from grain to grain or be a fixed wave throughout the grain's duration. Early implementations pointed out the biggest flaw of time granulation: a constant level mismatch at the beginning and end of every sampled grain, creating a micro-transient between grains and thus a periodic clicking sound. Later work showed that overlapping grain envelopes creates a seamless cross-fade between them (Jones and Parks, 1988). A great deal of generative audio content has been created and extensively developed using the principles of acoustical quanta, allowing sound designers to easily sample, synthesise and shape audio content, producing complex but controllable sounds with relatively small Central Processing Unit (CPU) usage. According to Curtis Roads, a grain generator is a basic digital synthesis instrument consisting of a wavetable whose amplitude is controlled by a Gaussian envelope. In this project, the global organisation of the grains will follow an asynchronous scheme: the grains will be encapsulated in regions, or clouds, controlled by a stochastic or chaotic algorithm.
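To make the adopted envelope concrete, the sketch below computes a cosine-tapered (Tukey) window in C++. It is a minimal illustration written for this discussion, not code from the prototype patch, and the function and parameter names are assumptions; a taper ratio r = 0 recovers the rectangle, and r = 1 the Hanning envelope described by Harris.

    #include <cmath>
    #include <vector>

    // Tukey (cosine-tapered) window of length N with taper ratio r in [0, 1].
    std::vector<double> tukeyWindow(int N, double r) {
        const double PI = 3.14159265358979323846;
        std::vector<double> w(N, 1.0);
        int taper = static_cast<int>(r * (N - 1) / 2.0);
        if (taper < 1) return w;                 // r = 0: rectangular window
        for (int n = 0; n <= taper; ++n) {
            // Raised-cosine ramp from 0 at the boundary up to 1.
            double ramp = 0.5 * (1.0 + std::cos(PI * (double(n) / taper - 1.0)));
            w[n] = ramp;                         // fade-in
            w[N - 1 - n] = ramp;                 // mirrored fade-out
        }
        return w;
    }

Multiplying a wavetable sample by sample by such a window yields one grain whose level reaches zero at both boundaries, avoiding the micro-transients discussed above.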

2.1.4 Stochastic Analysis.


Stochastic event modelling is a process involving random variables, X = {X(t) : 0 ≤ t < ∞}; on the synthesis level, the aim is to generate a signal that can vary continuously


according to various parameters. Dynamic stochastic synthesis is a concept that has existed for the last fifty years; composers such as Xenakis have speculated about the possibility of synthesising completely new sonic waveforms on the basis of probability (Harley, 2004). Xenakis's proposals for departing from the usual methods of sound synthesis take the form of five different strategies (Roads, 1996):
1. The direct use of probability distributions such as Gaussian and exponential.
2. Combining probability functions through multiplication.
3. Combining probability functions through addition (over time).
4. Using random variables of amplitude and time as functions of other variables.
5. Going to and fro between events using variables.
Roads describes how the user could control a grain cloud by adjusting certain parameters (Roads, 2001):
1. The start-time and duration of the cloud.
2. The grains' duration.
3. The density of grains per second.
4. The frequency band of the cloud.
5. The amplitude envelope of the cloud.
6. The spatial dispersion of the grains.
All these considerations will be tested and further explained in the following subsections, where the effects of different grain durations, densities and irregularities will be examined in more detail; a sketch of how such cloud parameters might be organised follows below.
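By way of illustration, the sketch below shows how parameters 1 to 4 of Roads' list might drive asynchronous grain scheduling; the amplitude envelope (5) and spatial dispersion (6) are omitted for brevity, and all names are hypothetical rather than objects taken from the PD patch.

    #include <random>
    #include <vector>

    // Parameters (1)-(4) of the cloud, with illustrative names.
    struct Cloud {
        double startTime;      // start-time of the cloud, seconds
        double duration;       // duration of the cloud, seconds
        double grainDuration;  // duration of each grain, seconds
        double density;        // average grains per second (> 0)
        double loFreq, hiFreq; // frequency band of the cloud, Hz
    };

    // Asynchronous scheduling: onsets are drawn with exponentially
    // distributed gaps, so the average rate equals `density` while the
    // exact timing stays irregular, as in a stochastic cloud.
    std::vector<double> scheduleOnsets(const Cloud& c, unsigned seed = 1u) {
        std::mt19937 rng(seed);
        std::exponential_distribution<double> gap(c.density);
        std::vector<double> onsets;
        for (double t = c.startTime + gap(rng);
             t < c.startTime + c.duration; t += gap(rng)) {
            onsets.push_back(t);
        }
        return onsets;
    }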


2.1.5 Procedural Audio in Response to Footstep Modelling.


Having examined the principles of granular synthesis and determined a suitable definition for this study, it is vital to understand the evolution and growth of procedural audio and how the development of technology and interactive demand has shaped its expansion. Traditionally in films, pre-recorded samples are used to simulate diegetic sounds such as footsteps. All sonic material needs to be gathered in order to represent what is being shown on the screen; to achieve a high level of fidelity, sound libraries and directly recorded sounds are implemented (Mott, 1990). However, this approach has several disadvantages: sampled sounds are repetitive, and location recording is not always the best or easiest option. Synthetic sound models have recently seen an increase in interest, with several algorithms making it possible to create sounding objects by implementing physical principles (Cook, 2002). Despite the recognised advantages and benefits of procedural audio, a review of the literature in this area reveals that its adoption is relatively low. This is one of the areas of challenge that this research project seeks to address; however, before this can be measured and an accurate research instrument created, it is also useful to understand the state of procedural audio in the sonic industry. It appears evident from the literature that procedural audio and programming are inextricably linked (Javerlein, 2000). Generally the latter measures the success of the outcome; however, the concept of procedural audio has been further redefined as a combination of linear, recorded, interactive, adaptive, sequenced, synthetic, generative and artificial intelligence (AI)


audio, which suggests that what matters most in procedural audio is the meaning we give to the input, internal states and output of the systems. Taking all this into account, if programming is only a means by which one creates meaningful sound, where does the conflict lie? One problem with procedural audio is that there are no ready-made sets containing the sound of a specific object, and where such sets exist, there is no way of searching through them. Farnell strongly believes that a better approach to producing sound requires more traditional mathematical approaches based on engineering and physics (Farnell, 2007). However, dynamically generated sound is not the answer to all these problems; there are plenty of areas, such as dialogue and music scores, where it fails to replace recorded sound. Even though methods for research and development have been established, practical issues continue to affect the realism of dynamically generated sound. Sound designers who have adapted their skills and learned new tools are in the process of finding an equilibrium between data and procedural models, which is neither a fast nor a complete process. Perhaps one of the greatest disadvantages of generated audio is that it still cannot encapsulate the significant sounds of life. Post-production sound effects fall into the psychological rather than the technical category; in most cases, they reveal through sound the acoustic landscape in which we live. Associations of everyday sounds play a decisive part in the language of sound imagery, but they can easily be confused. One of the reasons for this is that we often see without hearing (Balazs, 1949). According to Béla Balázs, the Hungarian-Jewish film critic, there is a very considerable difference between our visual and our acoustic education. We are far more used to visual forms than to sound


forms; this is because we have become accustomed to seeing first and hearing second, making it rather difficult to draw conclusions about a concrete object just by listening to it. The relationship between visuals and sound will be further explored in section 3.2.2.1 (Statistical Analysis). Sample-based audio has proven successful because its principle is to represent our acoustic world; however, it is an impractical method, as it fails to change in accordance with the visible source. On the other hand, a single procedural structure could accurately replace an entire sound library; the problem lies not in its principles but in the fact that it attempts to represent motifs associated with various situations in film rather than our acoustic world. Having generated a great deal of sample-based audio, production companies have drastically changed our perception of sound through film, associating melodies and sounds with specific objects or situations and making it particularly difficult for new content to take over.

2.1.6 Summary.
This literature review has studied the background of dynamically generated audio and analysed its evolution in parallel to developments in technology. It is clear that whilst procedural audio has many obvious advantages, its acceptance has been lower than expected, and various reasons have been suggested to explain why this might be the case. This section has also briefly discussed footstep modelling, followed by a critical analysis of the benefits and challenges of implementing procedural audio in post-production. The following section will present the methodology used in this study.


2.2 Methodology.

2.2.1 Introduction.
In this chapter, the objectives, parameters and procedures used in this research project are explained, especially those involved in developing dynamically generated sound; the process for creating a footstep-modelling analysis is also described.

2.2.2 Objectives.
The general objectives of this research project are:
1. To review the existing knowledge on sound textures and footstep modelling.
2. To develop a method for the creation of dynamic sound textures.
3. To incorporate that method into footstep sound modelling.

2.2.3 Parameters.
According to Yonathan Bard, models are designed to explain the relationships between quantities that can be measured independently (Bard, 1974). To understand these relationships, a set of parameters needs to be introduced.


2.2.3.1 The Grain Envelope Analysis.


The system architecture of this model extracts and analyses the signal with an envelope follower, which outputs the signal's root mean square (RMS). All significant peaks are located once the threshold has been set; if no threshold is selected, all peaks above 50 dB will be segmented into individual events. In order to ensure that peaks are tracked accurately, a Hanning window, sized in samples (1024 by default), has been set. Once the envelope follower has marked all the significant peaks, the DSP will output and list all the events. Figure 3 shows a simple example of the envelope follower's listing process.

Figure 3: Output list.

The numbers shown in Figure 3 are expressed in milliseconds and mark the cut-off points between events. Significant sub-events can sometimes be found within the events; for this reason the sample is normalised, which makes the peak-to-peak transient recognition much more effective. This process, however, is strictly for event recognition and is not used as part of any playback, so the signal-to-noise ratio is never raised. Each particle noise event can be pitch-shifted, reversed, stretched and smoothed. In his analysis of walking sounds, Cook suggested that in order to gel the sonic events, a short, exponentially decaying noise burst should be added, which has proven to be an exceptional addition to this algorithm. According to Roads, time appears to be reversible at the quantum level, meaning that grains or events can be reversed in time; moreover, if the grain envelope is symmetrical, the reversed event should sound exactly the same. In Pure Data (PD), this was easily achieved by simply reversing the output list, which turned out to be a success as it gave the sound texture a time-reversible feature. However, as the overall amplitudes of the synthesised samples were not symmetric, it was impossible to demonstrate that the waveform of a grain and its reversed form were identical. Figure 4 shows the envelope analysis process.

Figure 4: Transient Detector.
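As a language-neutral illustration of this analysis stage, the sketch below pairs a sliding RMS follower with a threshold-crossing segmenter that reports event boundaries in milliseconds, as in the output list of Figure 3. It is a simplified restatement of the process rather than the PD patch itself; in particular, a plain rectangular average stands in for the 1024-sample Hanning window used by the prototype.

    #include <cmath>
    #include <vector>

    // Sliding RMS envelope over a window of `win` samples (1024 by default).
    std::vector<double> rmsEnvelope(const std::vector<double>& x, int win = 1024) {
        std::vector<double> env(x.size(), 0.0);
        double sumSq = 0.0;
        for (std::size_t n = 0; n < x.size(); ++n) {
            sumSq += x[n] * x[n];
            if (n >= std::size_t(win)) sumSq -= x[n - win] * x[n - win];
            env[n] = std::sqrt(sumSq / win);
        }
        return env;
    }

    // Report threshold crossings in milliseconds: each pair of values
    // brackets one detected event, marking its cut-off points.
    std::vector<double> segmentEvents(const std::vector<double>& env,
                                      double threshold, double sampleRate) {
        std::vector<double> cutsMs;
        bool inside = false;
        for (std::size_t n = 0; n < env.size(); ++n) {
            bool above = env[n] > threshold;
            if (above != inside) {               // rising or falling edge
                cutsMs.push_back(1000.0 * n / sampleRate);
                inside = above;
            }
        }
        return cutsMs;
    }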


2.2.3.2 The Grain Dynamics.


In order to provide a more comprehensive interaction with the sampled signal, several grain dynamics, such as density, duration and pitch, were implemented. The grain density was easily achieved by dividing the number of grains by a thousand; the duration and pitch, on the other hand, required more precise adjustment. Identifying the pitch of a sound texture is extremely difficult, as textures do not possess harmonic spectra; however, if properly arranged, it is possible to hear one sound texture as higher or lower than another. Frequency and time are inversely proportional at the micro level (Gabor, 1947): expanding or shortening a grain has inverse repercussions on its frequency bandwidth, which results in an evident change of timbral character. In order to achieve an accurate timbral change, a two-octave bandwidth was introduced. Figure 5 shows the patch implemented to transform the pitch of a selected event.

Figure 5: Grain Dynamics.
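The time-frequency trade-off described above can be sketched as a simple resampling operation: playing a grain back at a rate drawn from the two-octave range [0.5, 2.0] shortens or lengthens it while shifting its spectrum inversely. The code below is purely illustrative, uses linear interpolation, and its names are assumptions rather than elements of the patch.

    #include <vector>

    // Resample a grain by `rate`: 2.0 halves the duration and shifts the
    // spectrum one octave up; 0.5 doubles the duration and shifts it down.
    std::vector<double> resampleGrain(const std::vector<double>& grain,
                                      double rate) {
        std::vector<double> out;
        for (double pos = 0.0; pos + 1.0 < grain.size(); pos += rate) {
            std::size_t i = static_cast<std::size_t>(pos);
            double frac = pos - i;
            // Linear interpolation between neighbouring samples.
            out.push_back(grain[i] * (1.0 - frac) + grain[i + 1] * frac);
        }
        return out;
    }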


2.2.3.3 Footstep-modelling.
This section describes how particles are extracted based on Physically Informed Stochastic Event Modelling (PhISEM). According to Cook, who has extensively researched this area, the parameterisation of walking sounds should involve interaction, preferably provoked by friction or pressure from the feet. A stochastic approach, a non-deterministic sequence of random variables, models the probability that particles will make noise; the sound probability is constant at each time step (Cook, 2002). Studies have shown the human ability to perceive the source characteristics of a natural auditory event, and analyses of walking sounds have found a relationship between auditory events and acoustic structure. This study considered the sounds of walking and running footstep sequences on different textures. Textures such as gravel, snow and grass were chosen, motivated by the assumption that a noisy and rich sound spectrum will still be perceived by the ear as a natural sound. Studies carried out by Roberto Bresin, who has extensively studied new models for sound control, have shown how a double support is created when both feet are on the ground at the same time, suggesting that there are no silent intervals between two adjacent steps. However, not specifying a time constraint between two particular events will blend them into a unison texture; an attention span therefore has to be created between steps in order to perceive them as separate events (Saint-Arnaud, 1995). According to Bresin, legato and staccato can be associated with walking and running respectively, and some of his recent work has reported a strong connection between motion and music performance. A minimal sketch of the stochastic particle idea follows below.
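In this sketch, written for this discussion rather than taken from Cook's implementation, a particle makes noise with constant probability at each time step, and every collision re-excites an exponentially decaying envelope that modulates white noise, providing the gelling burst mentioned in section 2.2.3.1; the constants are illustrative.

    #include <cstdlib>
    #include <vector>

    std::vector<double> particleNoise(int numSamples, double soundProbability,
                                      double decayPerSample = 0.999) {
        std::vector<double> out(numSamples, 0.0);
        double energy = 0.0;
        for (int n = 0; n < numSamples; ++n) {
            // Constant sound probability at each time step (Cook, 2002).
            if (double(std::rand()) / RAND_MAX < soundProbability)
                energy += 1.0;                       // a particle collision
            energy *= decayPerSample;                // exponential decay
            double noise = 2.0 * std::rand() / RAND_MAX - 1.0; // white noise
            out[n] = energy * noise;
        }
        return out;
    }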


Having stated several parameters that directly influence walking sounds, it is evident that even large libraries of pre-recorded sounds do not contain every possible scenario, which greatly compromises the sonic result.

2.2.3.4 The Ground Reaction Force.


A footstep sound is a combination of multiple impact sounds between the foot (the exciter) and the floor (the resonator). This model separates the two components and considers the exciter as an input to different types of resonator: by extracting the pressure exerted by one's foot, different modes can be derived and implemented to recreate the sounds of different kinds of floor. In the field of mechanics, the pressure exerted by one's body is called the Ground Reaction Force (GRF), which derives from Newton's third law: "To every action there is always opposed an equal reaction: or the mutual actions of two bodies upon each other are always equal, and directed to contrary parts" (Newton, Motte and Machin, 2010). The architecture of this model uses ground reaction force principles to find and analyse the forces that intervene in the creation of the multiple impact sounds that constitute a footstep. It then applies the analysed forces to the different resonators, creating an opposed equal reaction that is later translated into sound (see Figure 6). In order to analyse the forces involved in the foot's motion, it is important to understand how they are distributed. A normal gait is composed of two phases, a stance phase (60%) and a swing phase (40%). The stance phase is composed of five stages: initial contact, loading response, mid-stance, terminal stance and pre-swing.


Figure 6: GRF Exemplified. (http://epicmartialarts.wordpress.com/tag/ground-reaction-force/)

The swing phase consists of an initial swing, a mid-swing and a terminal swing (Porter, 2007). All these phases exert different forces, making it incredibly hard to translate every micro-movement into sound. Farnell has proposed analysing the gait phases not as individual events but as a distribution of forces. As a result, three phases become apparent, as shown in Figure 7 (Farnell, 2010):
1. The Contact Phase: the heel makes contact with the ground and the ankle rotates the foot.
2. The Mid-stance Phase: the body's weight is shifted onto the outer tarsal.
3. The Propulsive Phase: the foot rolls along the ground, ending up on its toes.


Figure 7: The Gait Phase.

(http://naturalrunningcenter.com/2012/06/21/walking-vs-running-gaits/)
Ideally, each gait cycle would generate identical GRF distributions; however, they can change significantly as the walking pace and the level of the ground change. If this were not the case, two complete footsteps would be sufficient to generate a walking pattern. This introduces another variable, the movement of the body, which fluctuates above and below the sum of the left and right feet's GRF. Andy J. Farnell explains in his book Designing Sound the three different modes of movement (Farnell, 2010):
1. Creeping: minimises pressure changes, which diminishes the sound.
2. Walking: maximises locomotion while minimising energy expenditure.
3. Running: accelerates locomotion.
Figure 8 exemplifies the ground reaction force distribution of a gait phase, where the body's weight is transferred onto the heel. Sometimes, before the weight is completely transferred, there is a transient force experienced just before the load response; surprisingly, this force exceeds the normal standing force. The weight distribution between the heel coming down and the toe pushing off evens out just before the propulsive phase, where the body's weight is entirely on the foot.


Figure 8: GRF Distribution in Pure Data.

2.2.4 Procedures.
This section describes the instruments and architecture involved in the creation of this research project. It aims to establish an efficient workflow that could later be applied to future work. It will also explain how diverse theories and models were tested and how relevant data was collected.

2.2.4.1 Pure Data.


The demonstration prototype that accompanies this research project has been built on this platform; Pure Data 0.44.0 is required to run the patch. Pure Data (PD) is an open-source visual programming language developed by Miller Puckette: a real-time graphical programming environment for audio, video and graphical processing. PD was chosen partly because it is designed for real-time processing and partly because it allows fast modification of parameters, making it extremely interactive and user-friendly (see Figure 9).


Figure 9: PD Environment.

Figure 10: The Cloud.


2.2.4.2 Arduino.
In order to establish more interactive communication between the user and the patch, a piezo-resistive force sensor was implemented (see Figure 13). The prototyping platform Arduino UNO creates a link between PD and the force sensor: when pressure is applied to the sensor, the Arduino reads the analogue pin, whose value ranges from 0 to 1023, and transmits the value over serial to the comport 9600 object in PD. Figure 11 illustrates this process.

Figure 11: Code in Arduino.
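A minimal Arduino sketch consistent with the process just described would look as follows; the pin assignment and transmission rate are assumptions, and the original code shown in Figure 11 may differ in detail.

    const int SENSOR_PIN = A0;     // force sensor assumed on analogue pin A0

    void setup() {
      Serial.begin(9600);          // matches the comport 9600 object in PD
    }

    void loop() {
      int force = analogRead(SENSOR_PIN);  // 0-1023 from the 10-bit ADC
      Serial.println(force);               // transmit the value on to PD
      delay(10);                           // modest rate so PD is not flooded
    }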


2.2.5 Architecture.
This footstep model has been inspired by Perry Cook's and Andy J. Farnell's approaches to walking sounds; their investigations into parametrised synthesis, especially granularity, have been of great help. Figure 12 illustrates the signal flow of this prototype. Based on Roads' idea of user control, this patch routes all the information to a common cloud (see Figure 10), where the user can easily modify the dynamics of the grain as well as the sensitivity of the foot sensors. All six parameters mentioned in section 2.1.4 were taken into account when designing this patch. The sensors define the start-time and duration of the process (1). The grain duration is specified by the option smooth, which divides its input into a 100 ms window and adds it to the transient's size (2). Similarly, the density of grains per second (grains/1000 ms) is specified by the option grains (3). Two band-pass filters determine the frequency band of the cloud (4). An amplitude envelope and a freeverb~ (a reverb object in PD) have also been incorporated, giving the user the option of custom-shaping the signal before it reaches the output (5 & 6).

Figure 12: Architecture.


Figure 13: Prototype.

In order to accurately transcribe and digitise the sensors' information, a split-phase and a polynomial curve have been incorporated. The split-phase converts the input given by the sensors into a signal that can later be scanned by the phasor~ object in PD. It combines both feet and creates a time constraint between one and the other, defining an attention span that leads the ear to perceive the two inputs as separate events (Saint-Arnaud, 1995). The polynomial curve is defined by the equation (Farnell, 2010):

f(x) = -1.5n (x^3-x) (1-x)


where 0 ≤ n < 1. Figure 14 illustrates the envelope of the polynomial curve for the minimum and maximum values of n. These curves create a small envelope for each of the three gait phases mentioned above (see Figure 8). As mentioned in section 2.2.3.1, a burst of noise has also been added to the patch, which contributes to the randomness of the stochastic analysis and helps to mask any imperfections in the grain selection. A low-pass filter has been attached to the noise generator so that high frequencies can be added or filtered out. This white noise is triggered directly by the sensor pad.
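Evaluating the polynomial directly shows the envelope's behaviour: it is zero at x = 0 and x = 1, with a single hump between them whose height scales with n. The short program below is written purely to illustrate the stated equation, printing the curve at a few sample points for the minimum, middle and near-maximum values of n.

    #include <cstdio>

    // f(x) = -1.5 n (x^3 - x)(1 - x), with 0 <= n < 1.
    double gaitEnvelope(double x, double n) {
        return -1.5 * n * (x * x * x - x) * (1.0 - x);
    }

    int main() {
        for (double n : {0.0, 0.5, 0.99}) {
            std::printf("n = %.2f:", n);
            for (double x = 0.0; x <= 1.0; x += 0.25)
                std::printf("  f(%.2f) = %.3f", x, gaitEnvelope(x, n));
            std::printf("\n");
        }
        return 0;
    }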


In order to evaluate the accuracy and precision of these methods, external feedback was collected; this is explained further in the following chapter.

Figure 14: Polynomial Curves.

2.2.6 Summary.
This methodology has analysed the existing knowledge on sound textures and granular synthesis in order to develop a method for the creation of dynamically generated textures (section 2.2.3.1). It then integrated that model into a footstep model created from the behavioural analysis conducted in section 2.2.3.3. It has also described the architecture of the prototype designed as part of this research. The following chapter presents the evaluation process used for this project.


CHAPTER 3 EVALUATION.

3.1 Introduction.
The evaluation process presented in this study uses a mixed-methods design. According to John W. Creswell, analysing both quantitative and qualitative data helps one to understand the research problem thoroughly (Creswell, 2002). A mixed-methods design is based upon pragmatic statements, which accept truth as a normative argument. Interesting opinions have been expressed regarding mixed methods; however, the issue of distinguishing between aesthetic assumptions has not yet been addressed (Sale, Lohfeld and Brazil, 2002). This research project uses a sequential explanatory mixed-methods design; according to Creswell (2002), this is the most straightforward of the six major mixed-methods approaches, which is an advantage as it organises data more efficiently. The method first collects and analyses quantitative data and then goes on to collect and analyse qualitative data. Kenneth R. Howe, an educational researcher, stated that researchers should forge ahead only with what works. Following this statement, this study introduced three questions concerning priority, implementation and integration in order to structure the design of this research project (Creswell, Plano Clark, Guttman and Hanson, 2003):


a) Which of these methods, quantitative or qualitative, will be emphasised in this study?
b) Will data collection come in sequence or in chronological stages?
c) How will this data be integrated?

Special priority will be given to quantitative data, leaving the qualitative results to support those obtained in the quantitative stage. For the purposes of efficiency, data was collected and integrated in chronological stages, which offered a more comprehensive and broader landscape of the gleaned information.

3.2 Quantitative Data.


Michel Chion explained, in his book Audio-Vision, how sounds can objectively evoke impressions without necessarily relating to their source (Chion, 1990). A combination of synchronism and synthesis, coined by Chion as synchresis, describes the mental fusion between sounds and visuals when they occur simultaneously. According to Chion (1990, p. 115), when a precise expectation of sound is set up, synchresis predisposes the spectator to accept the sound he or she hears. With regard to footsteps, Chion refers to synchresis as unstoppable, stating that "We can therefore use just about any sound effects for these footsteps that we might desire" (Chion, 1990, p. 64). As an example, he referred to the comedy Mon Oncle by the French filmmaker Jacques Tati, in which a variety of noises, involving Ping-Pong balls and


glass objects, were used for human footsteps. One of the purposes of this survey was to demonstrate how synchronised sound textures can fool the ear into thinking that real footsteps are being played. In order to achieve this, a total of ten clips containing a mixture of Foley, location recordings and generated sounds were played to an audience. A non-probability sampling approach was used for this research project, as the purpose of this study is not to infer from the sample to the general population but to add to the knowledge of this study.

3.2.1 The Data Collection Method.


Data collection mostly consisted of observations in which several audio samples were compared with those created by the aforementioned model. A self-developed survey was structured, containing items of different formats such as multiple-choice and dichotomous questions. Colin Robson describes surveys as a very effective method for collecting data from a specific population, or from a sample of that population (Robson, 2002); similarly, they are widely accepted as a key tool in conducting and applying research methods (Rossi, Wright and Anderson, 1983). The survey consisted of five questions divided into two sections. The first section asked questions related to the participants' status (audio or film student). The second section measured the participants' ability to differentiate between recorded and dynamically generated sounds (see Appendix A); this part sought to understand the individual's perception of diegetic sounds. The quantitative data was collected on the 29th and 30th of May 2013 at the School of Audio Engineering (S.A.E.) house in east London, U.K. The survey was distributed to a specific population of


students (audio and film students); in total, thirty individuals were given surveys. Based on Howe's statement (see section 3.1), the goal of the survey was to identify which sound textures participants believed to be real. Two independent variables were introduced, recorded and generated sounds, which were played in random order to the participants. As mentioned above, a total of ten short clips containing five different sound textures were prepared for this survey. The first group of participants surveyed were mostly audio students; a brief explanation of attention span and the layout of the audio was given prior to the survey. At first, participants were asked to listen to just the audio of the short clips. A fifteen-second gap between clips was given, not only for them to draw their own conclusions (as in an informal conversational interview) but also to allow their short-term memory to let go of the sonic information they had gathered; according to George A. Miller, the duration of short-term memory seems to be between fifteen and thirty seconds (Miller, 1956). This way, the average audio-visual span disappears from one's mind, allowing new data to be processed clearly. The second part of the survey combined both picture and sound. The structure of the survey (see Appendix A) contained three basic questions aimed at investigating the participants' relation to sound libraries. The questions on how they would rate the content of sound libraries and what they would look for in them were an excellent start, and led to an open debate conducted after the survey, which offered even more data for discussion and research.


3.2.2 Research Findings.


This section describes the results of the survey, initially assessing the descriptive statistics in order to specify the different variables and characteristics that were measured; an analysis of the remaining variables and aspects of the survey will also be presented. As described in the previous section, the research population comprised thirty participants selected using a non-probability sampling approach. The quantitative variables of this project were collected on two different days, as it was very difficult to bring audio and film students together. In order to measure both departments adequately, this research project surveyed a total of fifteen audio students, one audio specialist, thirteen film students and two film specialists. The first part of the survey (see Appendix A) established how many participants had used sound libraries for their own projects. As seen in Figure 15, when asked about the content of such libraries in question two, 40% of both audio and film students thought their quality was very poor.

Figure 15: Question 2. (Audio students: poor 40%, HQ 20%, good 20%, varies 20%. Film students: poor 40%, varies 60%.)


However, "poor" is a very vague term: are the contents of these libraries poor in sonic quality, or are they poor because they do not meet the users' needs? In order to clarify this, a follow-up question was introduced: "What do you look for in sound libraries?" As shown in Figure 16, audio and film students look for very different and specific material: 67% of the audio students surveyed specifically looked for ambience sounds, whereas 57% of the film students surveyed looked for Foley sounds. Many hypotheses can be drawn from this. The perception of sound in film goes far beyond the pure physics of the sonic spectrum. Throughout history, film producers have chosen to artificially construct the sound of their films (Gorbman, 1976). Advances in technology have expanded the creative possibilities of filmmakers and sound designers; the difference lies in how these sonic experiences are created. Based on the data collected, one could assume that film students have an internal approach to sound (Chion, 1990). Physiological sounds such as breathing and moans, or more subjective sounds such as a memory or a mental voice, can easily be achieved by using Foley and ADR (Automated Dialogue Replacement) practices, which might explain why their main concern when browsing through a sound library is Foley sounds. Audio students, on the other hand, seek to describe the soundscape of the picture, either by recreating the sonic characteristics of the environment or by artificially creating a completely new sonic environment.


Figure 16: Question 3. (Audio students: ambience 67%, Foley 16%, other 17%. Film students: Foley 57%, ambience 29%, FX 14%.)

Another question arises from these two hypotheses: how is the quality of such libraries perceived if their contents are listened to as part of a group of sounds? This is a very important question, as it strives to understand our perception of artificially constructed sound. Being able to create generative audio means nothing if it does not work in the context it was designed for. In order to investigate this, the aforementioned footstep sounds (see section 2.2.3.3) were played along with ambience sounds as well as different sound effects and dynamics. The results are analysed in the next section.

3.2.2.1 Statistical Analysis.


This section examines the results of the statistical analysis of the second part of the survey; it tries to understand how generative audio can be implemented in post-production processes. It should be noted at the outset that this research followed a convenience sampling technique. According to researchers such as Frederick J. Gravetter, convenience sampling is probably the most adequate method to use when the


population to be investigated is too large; participants were therefore selected based on their accessibility and proximity to the researcher. Although convenience sampling does not offer any guarantee of a representative sample, it collects basic data that can later be analysed or used as a pilot study (Gravetter, 2011, p. 151). In order to ensure that each variable was evaluated properly, the variables were examined one at a time, and a series of visual displays was created to help explain the relationships between them. A total of ten short clips was presented to the participants to answer the question "Where do you think unrealistic sounds have been placed?", and participants were given a scale from one to five on which to rate each clip's realism. The films used for this experiment were Terminator 2: Judgment Day (1991), mixed by the American sound designer Gary Rydstrom; Pulp Fiction (1994), mixed by David Bartlett; Mon Oncle (1958), produced by Jacques Tati; and Here (2013), produced as part of my portfolio. Clips one, three, four and five were re-mixed in order to introduce the footsteps generated by the patch developed here. The purpose of this experiment was to determine what combination of sounds seemed the most realistic to the participants; the results are shown in Appendix B. This research study conducted a t-test and a chi-square test. The aim was to understand whether there was a significant difference between how participants rated the clips with generated sounds and how they rated the clips with recorded sounds. As noted in section 1.2, this dissertation aims to add to the research in Foley practices by using generative audio; it is not a comparative analysis between recorded and generative audio. A combination of generated footsteps was presented to the participants in clips 1,


3, 4 and 5. Table 1 shows the average quality rating that the participants gave to generated and recorded audio respectively.

PARTICIPANT    GENERATED AUDIO    RECORDED AUDIO
1              3.25               2.16
2              4                  2.5
3              2.5                3.5
4              3.75               3.5
5              1                  2.16
6              4.5                3.5
7              3.25               3.16
8              3.75               2.83
9              2.5                2.5
10             2.5                2.3
11             2.5                2.5
12             3                  2
13             3.25               2.5
14             2.75               4
15             4                  2.75
16             2.3                3
17             2.16               4.5
18             3.5                3.25
19             2.16               3.75
20             2.3                2.75
21             3                  2.5
22             2                  2.3
23             3.25               2.16
24             3                  3.5
25             2.16               3
26             3.5                3.25
27             3                  2.75
28             3                  3
29             3.25               2
30             4                  3
AVERAGE        2.969333333        2.885666667
STDEV.         0.75013991         0.619625489

Table 1: Average Quality.


As seen in Table 1, it is possible to conclude that there is no statistical difference between the perceived quality of generated and recorded audio; this conclusion is based on their standard deviations, which show that the average values of the two groups overlap. In order to assess these values critically, a t-test was conducted, aimed at understanding how likely the observed difference was to be reliable.

3.2.2.2 T-Test.
Null hypothesis H0 (GA = RA): there is no discernible sonic difference between recorded audio and generated audio.
Alternative hypothesis H1 (GA < RA): recorded audio possesses better sonic qualities; there is a significant difference between recorded audio and generated audio.
Alternative hypothesis H2 (GA > RA): generated audio possesses better sonic qualities; there is a significant difference between recorded audio and generated audio.
All data was computed using Microsoft Excel (Figure 17). Additionally, this set of results was compared with those obtained at www.graphpad.com (see Appendix C), from which this research concluded that the two-tailed probability (p) value of the data equalled 0.639. This probability value does not provide enough evidence to reject the null hypothesis (H0), as there is no evidence of a significant difference between recorded and generated audio; however, this does not mean that the null hypothesis is true. A couple of conclusions can be drawn from this test:


1. The population surveyed could not discern between recorded and generated audio.
2. An average rating of 3 ("good quality") was given to the clips containing generative audio (see Appendix A).

Figure 17: T-Test in Excel.
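For transparency, the t statistic can be re-computed directly from the summary values in Table 1. The sketch below is an independent re-calculation rather than the Excel worksheet of Figure 17; it yields t of roughly 0.47, consistent with the reported two-tailed p value of 0.639 at 58 degrees of freedom.

    #include <cmath>
    #include <cstdio>

    int main() {
        const double n = 30.0;                              // per group
        const double meanGen = 2.969333, sdGen = 0.750140;  // Table 1
        const double meanRec = 2.885667, sdRec = 0.619625;  // Table 1

        // Standard error of the difference between the two means.
        double se = std::sqrt(sdGen * sdGen / n + sdRec * sdRec / n);
        double t  = (meanGen - meanRec) / se;
        std::printf("t = %.3f\n", t);   // ~0.47; p (two-tailed) ~ 0.639
        return 0;
    }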

3.2.2.3 Chi-Square.
Null hypothesis H0 (As = Fs): there is no difference between how audio and film students perceive audio quality.

DEPARTMENT     GENERATED AUDIO    RECORDED AUDIO    GRAND TOTAL
AUDIO          3.1                2.790666667       5.890666667
FILM           2.838666667        2.980666667       5.819333333
GRAND TOTAL    5.938666667        5.771333333       11.71

Table 2: Chi-Square.


EXPECTED VALUES:
DEPARTMENT     GENERATED AUDIO    RECORDED AUDIO    GRAND TOTAL
AUDIO          2.987421501        2.903245166       5.890666667
FILM           2.951245166        2.868088168       5.819333333
GRAND TOTAL    5.938666667        5.771333333       11.71

Table 3: Expected Values.
The p value obtained from Excel was 0.895, which means that this project cannot reject the null hypothesis; therefore, there is no evidence of a difference in how audio and film students perceive sound. Moreover, the independent chi-square contributions for audio and film students were 0.008607859 and 0.008713374 respectively, both very small values, which further underlines why this hypothesis cannot be rejected. A sketch of the expected-value computation follows below.
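To make the mechanics of Table 3 explicit: each expected value is the product of its row and column marginals from Table 2 divided by the grand total, E[i][j] = (row total x column total) / grand total. The sketch below is an illustrative re-calculation rather than the original Excel worksheet; it reproduces the tabulated values.

    #include <cstdio>

    int main() {
        // Observed averages from Table 2
        // (rows: Audio, Film; columns: Generated, Recorded).
        double obs[2][2] = { {3.100000, 2.790667},
                             {2.838667, 2.980667} };
        double rowTot[2], colTot[2] = {0.0, 0.0}, grand = 0.0;
        for (int i = 0; i < 2; ++i) {
            rowTot[i] = obs[i][0] + obs[i][1];
            colTot[0] += obs[i][0];
            colTot[1] += obs[i][1];
            grand += rowTot[i];
        }
        // Expected value: row total x column total / grand total.
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                std::printf("E[%d][%d] = %.9f\n", i, j,
                            rowTot[i] * colTot[j] / grand);
        return 0;
    }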

3.2.3 The Results and Evaluation.


As mentioned in section 3.2, the main purpose of collecting quantitative data was to demonstrate how synchresis can trick the human ear into thinking that real footsteps are shown on the screen. From the information collated, it is easy to conclude that, whether or not there is a significant difference between recorded and generated audio, the outcome of the latter has the potential to be equally as good as recorded audio. The last part of this section sought to understand whether film students had a more internal approach to sound, and just how different this approach was from that of the audio students. It later became apparent that there is no actual difference between how audio and film students perceive sound, probably due to a level of subjectivity that is always present; it was therefore not possible to substantiate such statements.


3.3 Qualitative Data.


Qualitative research is often criticised for lacking rigour, as the terms "reliable" and "valid" are usually associated with data obtained by quantitative methods. However, in this mixed-methods design the qualitative data collected is oriented towards supporting the findings of the quantitative phase. The qualitative research was divided into two sections: a one-to-one interview with Andy Farnell, and a post-survey discussion supported by an e-mail interview with Gillian McIver. The Norwegian psychologist Steinar Kvale expressed in his book Doing Interviews (Kvale, 2008) that in order to conduct an interview successfully, pilot testing must be implemented. This pilot testing was conducted informally with audio students as a conversational interview, where new and interesting follow-up questions helped to refine the topics discussed in both interviews. Researchers such as Creswell, Goodchild and Turner have broadly studied mixed-methods designs. According to Creswell, the advantages of a mixed design are its easy implementation and its in-depth exploration of quantitative data (Creswell, 2002). However, quantitative results may show no significant differences, and the whole process can be slow, as it requires a lengthy amount of time to complete.

3.3.1 Data Collection Method.


According to Monique Hennink, the author of Qualitative Research Methods, an in-depth interview is a one-to-one method of data collection that involves an interviewer and an interviewee discussing specific topics in depth (Hennink, 2011). I


had the opportunity to arrange a face-to-face interview with Andy Farnell. This fifteen-minute in-depth interview is available to listen to online at www.juliantellez.com/interactiveaudio/Farnell.wav. The purpose of the interview was to gain further knowledge of the efficiency, design and implementation of generative audio. Five conversational questions were put to Farnell; not only did he give clear insight into all the aforementioned topics, but he also shared his perspective on the needs of audio and film. To support the results obtained by the survey, a couple of further interviews were conducted. These standardised, open-ended interviews comprised five questions whose content was grounded in the results of the statistical analysis extracted from the survey. The participants, A. J. Farnell and Gillian McIver, a Canadian filmmaker, writer and visual artist, were interviewed using a standardised approach to ensure that the same general areas of information were collected from both of them. Additionally, Paul Groom, Alessandro Ugo and Daria Fissoun (film specialists) were also contacted. As described by Sharan B. Merriam (Merriam, 1998) with regard to qualitative data, collection and analysis occurred simultaneously. According to McNamara (McNamara, 2008), there is potentially a lack of consistency in the way interview questions are posed, meaning that respondents may or may not be answering the same questions. For this reason, the second interview was conducted via e-mail, not only to ensure consistency but also to make it easier for the participants to analyse the questions, allowing them to contribute as much detailed information as they desired.


3.3.2 Research Findings.


This section presents the conclusions drawn from the data collected. The first part describes in detail the design of the interview with Farnell; the second part expands upon the conclusions drawn on completion of that interview.

3.3.2.1 One-to-One Interview.


In answer to the question "what do you think are the possibilities of module-based DSPs such as PD becoming a prominent audio engine solution?", Andy expressed that DSPs are intended to fill the gap between the user's level of expertise and the high-level user interaction offered by DAWs (Digital Audio Workstations). As far as the possibilities go, however, their flexibility has earned them a place in the audio industry. A follow-up question was introduced, in which the interviewee was asked how this flexibility is perceived when linked to generated audio. Farnell described an apparent hierarchical stack that constitutes generated sounds: behaviour, model and implementation. When asked which of these was most important, he emphasised that design (behaviour plus model) matters more than implementation, adding: "when you have a great model, then you can use various kinds of methods… the outcomes will be equally good", because the behavioural analysis, which encapsulates model and method, facilitates the implementation. Some proof of this, he noted, is the work done by Dan Stowell from Queen Mary University's research group, the Centre for Digital Music. Stowell has re-written most of the examples from Farnell's textbook using SuperCollider instead of PD. Farnell stressed that although the implementation is exchangeable, there is still a huge gap between the design and the user's implementation. Physically controlled implementation, as proposed by Farnell, is the best way to research this issue.

In answer to the question "do you think generative sound could potentially meet the needs of the film industry?", Farnell introduced a very interesting analogy, in which he described generative sound as the beginning of a more sophisticated approach to audio: "I think in the next ten years you will have CGA (Computer Generated Audio) in Hollywood… CGA is much more powerful than CGI (Computer Generated Imagery) because there is a spectrum where they can be mixed with traditional techniques… Most people won't know the difference between generated and recorded audio" (Farnell, 2013). Personally, I found this interview, and especially the aforementioned analogy, very inspiring. I believe that it is possible to restructure the post-production workflow by analysing and designing the sound of a particular location stage, so that the sounds created by performers could be used at any location.

3.3.2.2 e-Interviewing.
E-mail interviewing turned out to be more flexible, convenient and less obtrusive than a conventional interview. However, as it took a lot longer than the previous discussions and interviews, only the information provided by Gillian McIver will be analysed (Appendix D). The questions were introduced generically in order to obtain more objective answers. The rationale for this stems from a short discussion I had with some film students, in which they expressed discontent with audio, especially sound libraries. In answer to the question "why is it that film audio is secondary in the film industry?", McIver outlined that the problem does not lie in the industry but in education, mentioning that there is a clear division between the two departments. If the problem lies in education, how can both parties overcome difficulties such as correct audio replacement and authentic sonic representation? Just as a DSP fills the gap between expertise and interaction, I believe there is a gap where the expertise of signal processing can meet production needs by means of interaction. When asked about the emphasis the film industry puts on the creation of sound technology, McIver replied: "Most do not think about it". Judging by this answer, one could conclude that if any sound technology aimed at the film industry were to be developed in the near future, it would have to be embedded and, more importantly, interactive and user-friendly.


CHAPTER 4 CONCLUSION.

The techniques used for the generation and control of grain signals were studied extensively throughout this research project. A special emphasis was placed on structuring a footstep model that enabled instant interaction between the user and the DSP. It encompassed some of the studies carried out by A. J. Farnell, P. R. Cook and R. Bresin. Drawing on these studies, a process of evaluation and testing was also conducted alongside the footstep method formulated in this research project. It was a compelling effort to promote generative audio in the post-production industry.

The analysis of sound synthesis with procedural audio was reviewed in great detail, and different approaches to the creation of sound textures were highlighted. This traced the evolution and development of generated audio in response to sound modelling, achieved by structuring the associations between everyday sounds and sound imagery. The criteria used for this project convey information that characterises an individual sound by the force that the body exerts upon it (Gaver, 1993). From the evidence given by the aforementioned authors, a study of the background and evolution of dynamically generated audio was collected, outlining its advantages and drawbacks. A complete separation between contact objects and interaction was achieved. The main findings create an intersection between sound synthesis (see section 2.1.3, p 8), signal analysis (see section 2.1.5, p 12) and user interaction (see section 2.2.5, p 26) (Strobl, Eckel and Rocchesso, 2006). Additionally, an evaluation phase was introduced, in which several statistical tests were conducted in order to corroborate the information stated (see section 3.2.2.1, p 35).

As noted at the end of sections 3.3.2.1 and 3.3.2.2 (pp 43-44), sound technology has enormous potential, which will most certainly be explored in years to come. Recent advances have placed sound technology in a very prominent position, allowing for efficient interaction and productivity. As far as footstep modelling goes, there are endless possibilities (in terms of sound textures) where further studies can be conducted. I have emphasised the importance of user interactivity extensively throughout this research project. By adding GRF recognition, this study has addressed this issue, allowing the patch to identify the user's gait characteristics (see section 2.2.3.4, p 20). However, it is still a prototype and some adjustments will be made in the near future. The principles of GRF apply to every mass on Earth; it would certainly be interesting to recreate any sound by simply extracting sound textures from the environment (Gaver, 1993).

This piece of work is intended to promote the use of generated audio in the film industry. As discovered, there are numerous applications for these methods within the post-production sector. However, further research and study are necessary in order to make generated audio a standard practice.
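As a closing illustration of the approach summarised above, the sketch below expresses the core of the footstep model outside PD: heel strikes are detected in a ground reaction force signal, and each strike triggers a burst of Gaussian-windowed grains whose amplitude follows the measured force. This is a minimal, hedged sketch in Python; the synthetic force curve, the noise texture and all parameter values are placeholder assumptions for illustration, not the values used in the actual patch.

    # Minimal sketch (not the PD patch): GRF-triggered granular footsteps.
    # All signals and parameters below are illustrative assumptions.
    import numpy as np

    sr = 44100
    rng = np.random.default_rng(0)

    def gaussian_grain(source, start, length, amp):
        # Cut `length` samples from `source` and shape them with a Gaussian window.
        n = np.arange(length)
        env = np.exp(-0.5 * ((n - length / 2.0) / (length / 6.0)) ** 2)
        start = max(0, min(start, len(source) - length))
        return amp * env * source[start:start + length]

    # Placeholder "gravel" texture and a synthetic GRF curve with two force peaks.
    texture = rng.uniform(-1.0, 1.0, sr)
    t = np.linspace(0.0, 1.0, sr)
    grf = np.maximum(0.0, np.sin(2.0 * np.pi * 2.0 * t))

    # Heel-strike detection: rising edges of the force signal through a threshold.
    above = grf > 0.9
    strikes = np.flatnonzero(above & ~np.roll(above, 1))

    out = np.zeros(sr)
    for s in strikes:
        force = grf[s:s + sr // 100].max()   # peak force around the strike
        for k in range(20):                  # burst of short overlapping grains
            pos = int(s + k * 64 + rng.integers(0, 32))
            if pos + 512 > len(out):
                break
            grain = gaussian_grain(texture, int(rng.integers(0, sr)), 512, force)
            out[pos:pos + 512] += grain

In the prototype itself, this role is played by the sensor input handled in PD (see sections 2.2.3.4 and 2.2.4); the sketch only makes the chain of GRF detection, force scaling and grain scheduling explicit in code form.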


APPENDICES.

Appendix A.
Survey.
29th May 2013, London, U.K.

Footstep synthesis.

Please take a moment to analyse the clips. When you're done, please answer the following questions:

ABOUT YOU.

Audio Student. Film Student. None.

Have you ever used sound libraries? Yes. No.

How would you rate the content of these libraries? Consistent high quality. Generally good. Quality varies. Poor quality.

What do you look for in sound libraries? Foley sounds. Ambience sounds. Fx. Other: _______________

ABOUT THE CLIPS. Please rate the clips on a scale from 1 to 5: (1) poor, (2) fair, (3) good, (4) very good, (5) outstanding.

CLIP 1 2 3 4 5 6 7 8 9 10

Thank you for your participation!


Appendix B.
Survey results, Clips 1-10: for each clip, the distribution of ratings (Poor, Fair, Good, Very Good, Outstanding) is charted separately for audio students and for film students. (Bar charts not reproduced in this text version.)

Appendix C.


Appendix D.

Appendix E.
Transients representation patch:

Appendix F.

Appendix G.

REFERENCES.
Ament, V. (2009). The Foley Grail: The Art of Performing Sound for Film, Games, and Animation. Oxford: Focal Press.
Balazs, B. (1949). Theory of Film: Sound. London: Dennis Dobson Ltd.
Bard, Y. (1974). Nonlinear Parameter Estimation. New York: Academic Press.
Chion, M. (1990). Audio-Vision. New York: Columbia University Press.
Cook, P. (2002). Real Sound Synthesis for Interactive Applications. Massachusetts: A K Peters, Ltd.
Creswell, J.W. (2002). Research Design: Qualitative, Quantitative and Mixed Methods Approaches. New York: SAGE Publications Ltd.
Creswell, J.W., Plano Clark, V. & Hanson, W. (2003). Advanced Mixed Methods Research Design. Thousand Oaks: SAGE Publications Ltd.
Farnell, A. (2007). Marching Onwards: Procedural Synthetic Footsteps for Video Games and Animation. Proceedings of the Pure Data Convention.
Farnell, A. (2010). Designing Sound. London: MIT Press.
Gabor, D. (1946). Theory of Communication. Journal of the Institution of Electrical Engineers, 93 (III), 429-457.
Gabor, D. (1947). Acoustical Quanta and the Theory of Hearing. Nature, 591-594.
Gaver, W. (1993). How Do We Hear in the World?: Explorations in Ecological Acoustics. Ecological Psychology, 5 (4), 292-297.
Gorbman, C. (1976). Teaching the Soundtrack. Quarterly Review of Film and Video.
Gravetter, F. J. & Wallnau, L. B. (2011). Essentials of Statistics for the Behavioral Sciences (7th ed.). Belmont, CA: Thomson/Wadsworth.

Harris, F. (1978). On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform. Proceedings of the IEEE.
Harley, J. (2004). Xenakis: His Life in Music. New York: Routledge, 215-218.
Hennink, M., Hutter, I. & Bailey, A. (2011). Qualitative Research Methods. New Jersey: SAGE Publications Ltd.
Järveläinen, H. (2000). Algorithmic Musical Composition. Helsinki University of Technology, TiK-111080 Seminar on Content Creation.
Jones, D. & Parks, T. (1988). Generation and Combination of Grains for Music Synthesis. Computer Music Journal, 12 (2), 27-34.
McNamara, C. (2008). General Guidelines for Conducting Interviews. Available: http://managementhelp.org/businessresearch/interviews.htm. Last accessed 8th May 2013.
Merriam, S. B. (1998). Qualitative Research and Case Study Applications in Education. San Francisco: Jossey-Bass.
Miller, G.A. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information. The Psychological Review.
Moray, N. (1959). Attention in Dichotic Listening: Affective Cues and the Influence of Instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Mott, R. (1990). Sound Effects: Radio, TV and Film. Boston: Focal Press.
Newton, Sir I., Motte, A. & Machin, J. (2010). The Mathematical Principles of Natural Philosophy, Volume 1. Charleston, South Carolina: Nabu Press.
Porter, D. & Schon, L. (2007). Baxter's The Foot and Ankle in Sport (2nd ed.). Missouri: Mosby.
Roads, C. (2001). Microsound. London: MIT Press, 85-118.
Saint-Arnaud, N. (1991). Classification of Sound Textures. Master of Science in Telecommunications. Université Laval, Quebec.
Sale, J., Lohfeld, L. & Brazil, K. (2002). Revisiting the Quantitative-Qualitative Debate: Implications for Mixed-Methods Research. Netherlands: Kluwer Academic Publishers.
Strobl, G., Eckel, G. & Rocchesso, D. (2006). Sound Texture Modelling: A Survey. Proceedings of the Sound and Music Computing Conference.

Yewdall, D. (2011). The Practical Art of Motion Picture Sound (4th ed.). Oxford: Focal Press.
Wood, N. & Cowan, N. (1995). The Cocktail Party Phenomenon Revisited: How Frequent Are Attention Shifts to One's Name in an Irrelevant Auditory Channel? Journal of Experimental Psychology: Learning, Memory and Cognition, 21 (1), 225-260.

BIBLIOGRAPHY.
Ament, V. (2009). The Foley Grail: The Art of Performing Sound for Film, Games, and Animation. Oxford: Focal Press.
Balazs, B. (1949). Theory of Film: Sound. London: Dennis Dobson Ltd.
Bard, Y. (1974). Nonlinear Parameter Estimation. New York: Academic Press.
Bresin, R., Friberg, A. & Dahl, S. (2001). Toward a New Model for Sound Control. Proceedings of the COST G-6 Conference on Digital Audio Effects.
Bresin, R. & Fontana, F. (2003). Physics-Based Sound Synthesis and Control: Crushing, Walking and Running by Crumpling Sounds. Proceedings of the XIV Colloquium on Musical Informatics.
Chion, M. (1990). Audio-Vision. New York: Columbia University Press.
Cook, P. (1999). Toward Physically-Informed Parametric Synthesis of Sound Effects. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
Cook, P. (2002). Real Sound Synthesis for Interactive Applications. Massachusetts: A K Peters, Ltd.
Cook, P. (2002). Modeling Bill's Gait: Analysis and Parametric Synthesis of Walking Sounds. Audio Engineering Society 22nd Conference, 1-3.
Creswell, J.W. (2002). Research Design: Qualitative, Quantitative and Mixed Methods Approaches. New York: SAGE Publications Ltd.
Creswell, J.W., Plano Clark, V. & Hanson, W. (2003). Advanced Mixed Methods Research Design. Thousand Oaks: SAGE Publications Ltd.
Dahl, S. (2000). The Playing of an Accent: Preliminary Observations from Temporal and Kinematic Analysis of Percussionists. Journal of New Music Research, 29 (3), 225-234.
Dannenberg, R. & Derenyi, I. (1998). Combining Instrument and Performance Models for High-Quality Music Synthesis. Carnegie Mellon University, Pennsylvania.
Farnell, A. (2007). Marching Onwards: Procedural Synthetic Footsteps for Video Games and Animation. Proceedings of the Pure Data Convention.
Farnell, A. (2010). Designing Sound. London: MIT Press.
Forrester, M. (2006). Auditory Perception and Sound as Event: Theorising Sound Imagery in Psychology. Available: http://www.kent.ac.uk/arts/sound-journal/index.html. Last accessed 8th May 2013.
Gabor, D. (1946). Theory of Communication. Journal of the Institution of Electrical Engineers, 93 (III), 429-457.
Gabor, D. (1947). Acoustical Quanta and the Theory of Hearing. Nature, 591-594.
Gaver, W. (1993). How Do We Hear in the World?: Explorations in Ecological Acoustics. Ecological Psychology, 5 (4), 292-297.
Gorbman, C. (1976). Teaching the Soundtrack. Quarterly Review of Film and Video.
Gravetter, F. J. & Wallnau, L. B. (2011). Essentials of Statistics for the Behavioral Sciences (7th ed.). Belmont, CA: Thomson/Wadsworth.
Hahn, J., Geigel, J., Gritz, L., Takala, T. & Mishra, S. (1995). An Integrated Approach to Audio and Motion. Journal of Visualization and Computer Animation, 6 (2), 109-129.
Harris, F. (1978). On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform. Proceedings of the IEEE.
Harley, J. (2004). Xenakis: His Life in Music. New York: Routledge, 215-218.
Hennink, M., Hutter, I. & Bailey, A. (2011). Qualitative Research Methods. New Jersey: SAGE Publications Ltd.
Howe, K.R. (1988). Against the Quantitative-Qualitative Incompatibility Thesis or Dogmas Die Hard. Educational Researcher.
Järveläinen, H. (2000). Algorithmic Musical Composition. Helsinki University of Technology, TiK-111080 Seminar on Content Creation.

Jenkins, J. & Ellis, C. (2007). Using Ground Reaction Forces from Gait Analysis: Body Mass as a Weak Biometric. Fifth International Conference on Pervasive Computing.
Jones, D. & Parks, T. (1988). Generation and Combination of Grains for Music Synthesis. Computer Music Journal, 12 (2), 27-34.
Lostchocolatelab. (2010). Audio Implementation Greats No. 8: Procedural Audio Now. Available: http://designingsound.org/2010/09/audio-implementation-greats-8-procedural-audio-now/. Last accessed 8th May 2013.
McNamara, C. (2008). General Guidelines for Conducting Interviews. Available: http://managementhelp.org/businessresearch/interviews.htm. Last accessed 8th May 2013.
Merriam, S. B. (1998). Qualitative Research and Case Study Applications in Education. San Francisco: Jossey-Bass.
Miller, G.A. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information. The Psychological Review.
Milicevic, M. (2008). Film Sound Beyond Reality: Subjective Sound in Narrative Cinema. Available: http://filmsound.org/articles/beyond.htm#pet5. Last accessed 8th May 2013.
Moray, N. (1959). Attention in Dichotic Listening: Affective Cues and the Influence of Instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Mott, R. (1990). Sound Effects: Radio, TV and Film. Boston: Focal Press.
Newton, Sir I., Motte, A. & Machin, J. (2010). The Mathematical Principles of Natural Philosophy, Volume 1. Charleston, South Carolina: Nabu Press.
Nordahl, R., Serafin, S. & Turchet, L. (2009). Extraction of Ground Reaction Forces for Real-Time Synthesis of Walking Sounds. Proceedings of the Audio Mostly Conference.
Nordahl, R., Serafin, S. & Turchet, L. (2010). Sound Synthesis and Evaluation of Interactive Footsteps for Virtual Reality Applications.
Porter, D. & Schon, L. (2007). Baxter's The Foot and Ankle in Sport (2nd ed.). Missouri: Mosby.
O'Brien, J., Cook, P. & Essl, G. (2001). Synthesising Sounds from Physically Based Motion. Computer Graphics Proceedings, Annual Conference Series.

Roads, C. (1996). The Computer Music Tutorial. Massachusetts: MIT Press, 338-342.
Roads, C. (1988). Introduction to Granular Synthesis. Computer Music Journal, 12 (2), 11-13.
Roads, C. (2001). Microsound. London: MIT Press, 85-118.
Robson, C. (2002). Real World Research: A Resource for Social Scientists and Practitioner-Researchers (2nd ed.). New Jersey: Wiley.
Rowe, R. (1993). Interactive Music Systems: Machine Listening and Composing. Cambridge: MIT Press.
Rowe, R. (1999). The Aesthetics of Interactive Music Systems. Contemporary Music Review, 18 (3), 83-87.
Saint-Arnaud, N. (1991). Classification of Sound Textures. Master of Science in Telecommunications. Université Laval, Quebec.
Sale, J., Lohfeld, L. & Brazil, K. (2002). Revisiting the Quantitative-Qualitative Debate: Implications for Mixed-Methods Research. Netherlands: Kluwer Academic Publishers.
Strobl, G., Eckel, G. & Rocchesso, D. (2006). Sound Texture Modelling: A Survey. Proceedings of the Sound and Music Computing Conference.
Strobl, G. (2007). Parametric Sound Texture Generator. Graz University, Styria.
Turchet, L. & Serafin, S. (2011). A Preliminary Study on Sound Delivery Methods for Footstep Sounds. Proceedings of the 14th International Conference on Digital Audio Effects.
Turner, D. (2010). Qualitative Interview Design: A Practical Guide for Novice Investigators. The Qualitative Report, 15, 754-760.
Truax, B. (1993). Time-Shifting and Transposition of Sampled Sound with a Real-Time Granulation Technique. Proceedings of the International Computer Music Conference.

Wood, N. & Cowan, N. (1995). The Cocktail Party Phenomenon Revisited: How Frequent Are Attention Shifts to One's Name in an Irrelevant Auditory Channel? Journal of Experimental Psychology: Learning, Memory and Cognition, 21 (1), 225-260.
