


A thesis presented to The National University of Ireland in fulfilment of the requirements for the degree of



Supervisor of Research: An tOllamh A.M. de Paor

Head of Department: Professor T. Brazil


When people become disabled as a result of a road traffic accident, stroke or another condition, they may often lose their ability to control their environment and communicate with others by conventional means. This thesis investigates methods of harnessing vestigial body signals as channels of control and communication for people with very severe disabilities, using advanced signal acquisition and processing techniques. Bioelectrical, acoustic and movement signals are among the signals investigated.

Some applications are presented that have been developed to assist environmental control and communication. These applications rely on a variety of control signals for operation. Some may be controlled by a simple binary switching action whereas others require user selection from a wider range of possible options. A mechanical switch or adjustable knob may be used to interact with these applications but this may not be an option for people who are very severely disabled. The remainder of the thesis focuses on alternative methods of enabling user interaction with these and other applications.

If a person who is physically disabled is able to modify some body signal in such a way that two states can be distinguished reliably and repeatedly, then this can be used to actuate a switching action. Reliable detection of more than two states is necessary for multiple-level switching control. As users' abilities, requirements and personal preferences vary greatly, a wide range of body signals has been explored.

Bio-signals investigated include the electrooculogram (EOG), the electromyogram (EMG), the mechanomyogram (MMG) and the conductance of the skin. The EOG is the electrical signal measurable around the eyes and can be used to detect eye movements with careful signal processing. The EMG and the MMG are the electrical and mechanical signals observable as a result of muscle contraction. The conductance of the skin varies as a person relaxes or tenses and with practice it can be consciously controlled. These signals were all explored as methods of communication and control. Investigation of the underlying physical processes that generate these signals also led to the development of a number of mathematical models, which are presented here.

Small movements may be harnessed using computer vision techniques, which have the advantage of being non-contact. Often people who have become disabled will still be capable of making flickers of movement, e.g. with a finger or a toe. While these movements may be too weak to operate a mechanical switch, if they are repeatable they may be used to provide a switching action in software through detection with a video camera.

Phoneme recognition is explored as an alternative to speech recognition. Physically disabled persons who have lost the ability to produce speech may still be capable of making simple sounds such as single-phoneme utterances. If these sounds are consistently repeatable then they may be used as the basis of a communication or control device. Phoneme recognition offers another advantage over speech recognition in that it may provide a method of controlling a continuously varying parameter through varying the length of the phoneme or the pitch of a vowel sound. Temporal and spectral features that characterise different phonemes are explored to enable phoneme distinction. Phoneme recognition devices developed in both hardware and software are described.


I would firstly like to thank Harry, my supervisor, for all his support, encouragement and advice and for sacrificing his August bank holiday Monday to help me get this thesis in on time!

Thanks also to all the postgrads who have been in the lab in the NRH with me over the past three years - Deirdre, Claire, Catherine, Kieran, Ciaran and Jane. Special thanks to Ted for all his assistance, support and friendship. Thanks also to Emer for generating some of the graphs for this thesis.

Thanks to my parents for their patience and financial help and to my sisters Tamara and Jill for keeping the house (relatively) quiet to enable me to get some work done.

Thanks to all my friends for understanding my disappearance over the past few months and giving me space to get this thesis finished.

Finally, a big thanks to Conor for being so supportive and patient with me over the past few months, for giving me a quiet place to work and for helping me with the pictures for this thesis!


An Investigation into Non-Verbal Sound-Based Modes of Human-to-Computer Communication with Rehabilitation Applications, Edward Burke, Yvonne Nolan & Annraoi de Paor, Adjunct Proceedings of 10th International Conference on Human-Computer Interaction, Crete, June 22-27 2003, pp. 241-2.

The Mechanomyogram as a Tool of Communication and Control for the Disabled, Yvonne Nolan & Annraoi de Paor, 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Francisco, CA, September 1-5 2004, pp. 4928-4931.

An Electrooculogram Based System for Communication and Control Using Target Position Variation, Edward Burke, Yvonne Nolan & Annraoi de Paor, IEEE EMBSS UKRI Postgraduate Conference on Biomedical Engineering and Medical Physics, Reading, UK, July 18-20 2005, pp. 25-6.

The human eye position control system in a rehabilitation setting, Yvonne Nolan, Edward Burke, Claire Boylan & Annraoi de Paor, International Conference on Trends in Biomedical Engineering, University of Zilina, Slovakia, September 7-9 2005.

Accepted Paper: Phoneme Recognition Based Software System for Computer Interaction by Disabled People, Yvonne Nolan & Annraoi de Paor, IEEE EUROCON 2005 - International Conference on Computers as a Tool, University of Belgrade, Serbia and Montenegro, November 21-24 2005.



1 Introduction
  1.1 Assistive Technologies
  1.2 Thesis Layout

2 Assistive Technology
  2.1 Introduction
  2.2 Causes of Paralysis
    2.2.1 Neurological Damage
    2.2.2 Spinal Cord Injuries
    2.2.3 Diseases of the Nervous System
  2.3 Assistive Technology
    2.3.1 Importance of a Switching Action
    2.3.2 Switch Based Systems
    2.3.3 Brain Computer Interfaces
  2.4 Communication Device
    2.4.1 Technical Details
    2.4.2 The Natterbox Graphical User Interface
    2.4.3 Switch Interface Box
    2.4.4 Other Features
    2.4.5 Possible Future Developments of Natterbox
  2.5 Conclusions

3 Muscle Signals
  3.1 Introduction
  3.2 The Nervous System
    3.2.1 Nerves and the Nervous System
    3.2.2 Resting and Action Potentials
  3.3 Muscles
    3.3.1 Muscle Physiology
    3.3.2 Muscle Contraction
    3.3.3 Muscle Action in People with Physical Disabilities
  3.4 Electromyogram
    3.4.1 EMG Measurement
    3.4.2 EMG as a Control Signal
  3.5 Mechanomyogram
    3.5.1 MMG as a Control Signal
    3.5.2 MMG Application for Communication and Control
  3.6 Conclusions

4 Other Biosignals - Eye Movements and Skin Conductance
  4.1 Introduction
  4.2 The Electrooculogram
    4.2.1 Introduction
    4.2.2 Anatomy of the Eye
    4.2.3 Eye Tracking Methodologies
    4.2.4 The EOG as a Control Signal
    4.2.5 Target Position Variation
    4.2.6 Experimental Work
    4.2.7 TPV Based Menu Selection
    4.2.8 Limitations of Eyetracking for Cursor Control
    4.2.9 A Model of the Eye
  4.3 Electrodermal Activity as a Control Signal
    4.3.1 Introduction
    4.3.2 Anatomy and Physiology of the Skin
    4.3.3 Electrodermal Activity
    4.3.4 Skin Conductance as a Control Signal
    4.3.5 Non-invasive Measurement of the Sympathetic System Firing Rate
  4.4 Conclusions

5 Visual Techniques
  5.1 Introduction
  5.2 Visual Based Communication and Control Systems
    5.2.1 The Camera Mouse
    5.2.2 Reflected Laser Speckle Pattern
  5.3 Visual Technique for Switching Action
    5.3.1 Introduction
    5.3.2 Technical Details
    5.3.3 Frame Comparison Method
    5.3.4 Path Description Method
  5.4 Conclusions

6 Acoustic Body Signals
  6.1 Introduction
  6.2 Speech Recognition
    6.2.1 Speech Recognition: Techniques
    6.2.2 Speech Recognition: Limitations
  6.3 Anatomy, Physiology and Physics of Speech Production
    6.3.1 Respiration
    6.3.2 Phonation
    6.3.3 Resonance
    6.3.4 Articulation
  6.4 Types of Speech Sounds
    6.4.1 The Phoneme
    6.4.2 Types of Excitation
    6.4.3 Characteristics of Speech Sounds
    6.4.4 Proposal of a Phoneme Recognition Based System for Communication and Control
  6.5 Hardware Application
    6.5.1 Analogue Circuit
    6.5.2 Microcontroller Circuit
  6.6 Software Application
    6.6.1 Application for Linux
    6.6.2 Application for Windows
  6.7 Conclusions

7 Conclusions
  7.1 Introduction
  7.2 Resolution of the Aims of this Thesis
    7.2.1 Overview of Current Communication and Control Methods
    7.2.2 Identification of Signals
    7.2.3 Measurement Techniques
    7.2.4 Signal Processing Techniques and Working Systems Developed
    7.2.5 Patient Testing
    7.2.6 Biological Studies
  7.3 Future Work
    7.3.1 The Mechanomyogram
    7.3.2 Target Position Variation
    7.3.3 Visual Methods for Mouse Cursor Control
    7.3.4 Communication System Speed
    7.3.5 Multi-Modal Control Signals
    7.3.6 Other Vestigial Signals

A MMG Circuit

B Simulink Models

C MATLAB Code for TPV Fit Function

D Optimum Stability

E Circuit Diagram for Measuring Skin Conductance

F Phoneme Detection Circuit Diagrams and Circuit Analysis
  F.1 Analogue Circuit
    F.1.1 Pre-Amplifier
    F.1.2 Filtering
    F.1.3 Amplifier
    F.1.4 Rectifier
    F.1.5 Threshold
    F.1.6 Delay and Comparator
    F.1.7 Relays
  F.2 Microcontroller Circuit
    F.2.1 Microphone
    F.2.2 Amplifier
    F.2.3 Infinite Clipper
    F.2.4 Microcontroller
    F.2.5 Debouncing Circuit
    F.2.6 Current Amplifier and Relay Coils

G PIC 16F84 External Components and Pinout

H Phoneme Recognition Microcontroller Code and Flowchart

I Code for Programs
  I.1 Natterbox
  I.2 USB Switch
  I.3 MMG Detection Program
  I.4 Path Description Program
  I.5 Graphical Menu
  I.6 Spelling Bee


List of Figures
2.1 The Vertebral Column
2.2 The Spinal Nerves
2.3 Dasher program
2.4 Natterbox GUI
2.5 Natterbox Phrases Menu
3.1 The Nerve Cell
3.2 Classification of Nerve Fibre Types
3.3 Nerve Fibres
3.4 An Action Potential
3.5 Muscle Anatomy
3.6 The Muscle Fibre
3.7 Sarcomere
3.8 The Neck Muscles
3.9 EMG and frequency spectrum
3.10 EMG Differential Amplifier
3.11 Electrode Position
3.12 MMG showing Muscle Contraction
3.13 MMG Prosthesis Socket
3.14 Accelerometer
3.15 MMG Processing
4.1 The Outer Eye
4.2 Cross section of the eye
4.3 Pupil and Corneal Reflections
4.4 50Hz Video Eyetracker
4.5 Scleral Search Coil
4.6 EOG Electrode Positions
4.7 EOG recordings
4.8 EOG controlled alphabet board
4.9 TPV Based Menu Selection Application
4.10 TPV Candidate Target Shapes
4.11 Results of TPV: Experiment 1
4.12 TPV Experiment 2 Screenshot
4.13 TPV Experiment 2
4.14 Fit Values
4.15 Eye feedback control loop
4.16 Step Response of Eye with Muscle Spindle Influence
4.17 Nuclear Bag Model
4.18 Unit step response and Bode magnitude diagrams of the muscle spindle controllers
4.19 Actual EOG and Simulated Saccadic Responses
4.20 Feedback Control Loop for Smooth Pursuit
4.21 Modified loop for Smooth Pursuit
4.22 Bode Plot for Gi(s)
4.23 Smooth Pursuit Model Graphs
4.24 Sweat Gland
4.25 Electrodermal response
4.26 Skin Conductance Model
4.27 Proposed Loop For Firing Rate Output
4.28 Measured and Modelled Skin Conductance
4.29 Measured Skin Conductance and Estimated Firing Rate
5.1 Camera Mouse Search Window
5.2 Speckle Pattern
5.3 Webcam
5.4 Filter Graph used for Video Data in application
5.5 Filtered Video Frames
5.6 Various Thresholding Methods
5.7 Video Frame Histogram
5.8 Path Description
5.9 Region Finding
5.10 Overlapping
6.1 The Vocal Organs
6.2 Waveform of Vowel Sounds
6.3 Spectrum of Vowel Sounds
6.4 Phoneme Waveforms and Spectra
6.5 Analogue Circuit Block Diagram
6.6 Audio signal pre-processing
6.7 AudioWidget GUI
6.8 Graphical Menu
6.9 The X10 Module
6.10 Phoneme Detection Program Signal and Spectrum
6.11 The Spelling Bee GUI
A.1 MMG Circuit
B.1 Simulink MMG Muscle Contraction Detection
B.2 Simulink Model for Eye System
B.3 Simulink Model for Smooth Pursuit
B.4 Simulink Model for Firing Rate
D.1 Root Locus Varying f0
D.2 Root Locus Varying f1
D.3 Root Locus Varying h0
D.4 Root Locus Varying h1
E.1 Skin Conductance Circuit Diagram
F.1 Circuit Diagram for Phoneme Detection
F.2 Electret Microphone Circuit
F.3 Circuit Diagram for PIC-Based Phoneme Detection
G.1 Pin-out Diagram for PIC
H.1 Microcontroller Flowchart


List of Tables
2.1 Cranial Nerve Damage
2.2 Incomplete Spinal Cord Injury Patterns
2.3 Spinal Cord Injuries Motor Classifications
2.4 Spinal Cord Injury Functional Abilities
3.1 MMG Experimental Results
4.1 Icon Parameters
4.2 TPV Experiment 2 Sequence
5.1 Program Steps
5.2 Video Capture Parameters
5.3 RGB24 format
6.1 The Phonemes of Hiberno-English
6.2 Classification of English Consonants
6.3 Spectral Peaks
6.4 Example Relative Harmonic Amplitudes
F.1 Component Values for Phoneme Detection Circuit
F.2 Component Values for PIC-Based Circuit


Chapter 1 Introduction
This thesis arises from work in the Engineering Research Laboratory in the National Rehabilitation Hospital (NRH), Rochestown Ave., Dun Laoghaire, Co. Dublin, Ireland. Typically, the patients in this hospital are people who have become disabled as a result of a stroke, disease or accident. Advances in medical research are ensuring that more and more people survive these disabling conditions. It is important that research follows which not only keeps these people alive, but also enables a fulfilling and worthwhile quality of life.

Loss of speech production abilities can be one of the most devastating elements of severe physical disability. Without the means to communicate by conventional methods, people may find themselves shut off from the outside world. Communication with other people is one of the most important actions that we as humans perform. It is important to be able to converse with loved ones, and to have a means for expressing our emotions, needs and desires. Communication with others allows us to build relationships, make requests, reach our intellectual potential and lead a stimulating and participative life.

The independence of people with severe physical disabilities is also an important consideration. Results from the 2002 census from the Central Statistics Office [1] indicate that there are 159,000 people in Ireland who provide regular unpaid help for a friend or family member with a long-term illness, health problem or disability. Frequent reliance on family and friends can be frustrating for the disabled person, both for practical reasons and because it can compromise a person's feelings of dignity. As technology advances, it is important to ensure that systems are developed which can provide disabled people with the ability to control their living environment, without needing assistance from others.


1.1 Assistive Technologies

For people who are unable to control their environment and communicate with others by conventional means, there are various systems available which provide alternative methods of performing these tasks. The term augmentative and alternative communication is often used to describe a range of alternative communication techniques, from the use of gestures, sign language and facial expressions to the use of alphabet or picture symbol boards [2]. To make use of these systems, the user must be able to interact with them in some way. Perkins and Stenning [3] state that the main objective for people who are unable to use a keyboard is to be able to identify a function or movement over which they have some control and utilise that. This could be from movement of the head, eyes, chin, arms, hands or feet, for example. These movements can be converted into such electrical signals as on or off switches, or, in the case of those with a little more control, variable voltages.

People with very severe physical disabilities may only be capable of making very small movements to indicate intent, and these movements may be difficult to harness. The focus of this thesis is on investigating advanced methods of signal acquisition and signal processing to enable these signals to be captured and used to operate communication and control devices. The principal aims of this thesis may be outlined as follows.

• Overview of current methods of providing communication and control for disabled people.
• Identification of alternative signals from the body which may be harnessed for communication and control purposes for people with very severe disabilities.
• Study of measurement techniques that may be used to acquire these vestigial signals.
• Investigation of signal processing methods to enable these signals to be correctly interpreted.
• Development of working systems that demonstrate the capabilities of these techniques.
• Testing of these techniques and systems with people with severe disabilities.
• Development of some mathematical models that evolved as a result of studying these body signals.
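As an illustration of how a body signal might be converted into the on/off switching action or the variable control signal mentioned in this section, the following minimal Python sketch applies a threshold with hysteresis to a sampled signal and maps a variable signal onto a small number of discrete levels. It is not taken from the systems developed in this work. The function names, thresholds and sample values are hypothetical and are chosen only for illustration.

```python
from typing import Iterable, List


def switch_events(samples: Iterable[float],
                  on_threshold: float,
                  off_threshold: float) -> List[int]:
    """Return the sample indices at which the switch turns on.

    Hypothetical helper: two thresholds (on_threshold > off_threshold)
    give hysteresis, so a noisy signal near the decision level does not
    toggle the switch repeatedly.
    """
    if on_threshold <= off_threshold:
        raise ValueError("on_threshold must exceed off_threshold")
    events = []
    active = False
    for index, value in enumerate(samples):
        if not active and value >= on_threshold:
            active = True           # signal rose above the upper threshold
            events.append(index)
        elif active and value <= off_threshold:
            active = False          # signal fell below the lower threshold
    return events


def quantise_level(value: float, boundaries: List[float]) -> int:
    """Map a variable signal onto one of len(boundaries) + 1 levels."""
    return sum(value >= b for b in sorted(boundaries))


# Illustrative (made-up) normalised signal samples.
signal = [0.1, 0.2, 0.9, 0.7, 0.85, 0.2, 0.1, 0.95]
print(switch_events(signal, on_threshold=0.8, off_threshold=0.3))  # [2, 7]
print(quantise_level(0.5, [0.25, 0.75]))                           # 1
```

Using two thresholds rather than one means that a signal hovering near the decision level does not cause repeated spurious activations, which matters when the available movement is weak or unsteady.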


1.2 Thesis Layout

Some of the causes of paralysis and severe disability are outlined in Chapter 2. An overview of assistive technology applications that may be of relevance to people with very severe disabilities is given and the importance of identifying a switching action is emphasised. An alphabet board based communication tool called the Natterbox was developed as part of this work. This is also described in Chapter 2.

The nervous system and the structure of muscle are described in Chapter 3, along with the mechanism of muscle contraction. Often people who are disabled will retain some ability to contract certain muscles, but not to a sufficient extent to enable a mechanical switch to be used. However, the muscle contraction may still be harnessed for communication and control purposes through other means. The electromyogram is the electrical signal observable from the surface of the skin due to action potentials which occur on contraction. The electromyogram as a control signal for prosthetics and for communication and control systems is described. An alternative method of measuring muscle contraction for communication and control purposes is proposed. This method uses the mechanomyogram, which is the mechanical signal observable on the skin surface due to muscle contraction. A mechanomyogram based system for communication and control was developed and this is presented here. Some experiments were also performed with this system to assess its efficacy in controlling an alphabet board. The results of these experiments are reported.

Two more biosignals are investigated in Chapter 4: the electrooculogram and the electrical conductance of the skin. The electrooculogram is the electrical signal observable around the eyes which can be used to measure eye movement. An overview of different eye movement measurement techniques is given and the electrooculogram is described in more detail. Some limitations of the electrooculogram as a communication and control signal are identified and a novel technique is presented that seeks to overcome these limitations to allow the electrooculogram to be used as a control signal. Study of movement of the eyes led to development of a mathematical model of the eye, which is also presented in Chapter 4. This model incorporates the effect of the muscle spindle on the eye's torque and predicts saccadic and smooth pursuit eye movements. The electrical conductance of the skin is also briefly explored as a control signal. Electrical skin conductance is related to sweat gland activity on the surface of the skin and may be modulated by tensing or relaxing, as will be discussed. Resulting from this study, a technique for measuring the firing rate of the sympathetic nervous system was developed which uses measurement of the skin conductance as its input.

Visual techniques, which use a computer camera or another light sensitive device to measure movement, are discussed in Chapter 5. Often people who have become disabled will retain the ability to make flickers of movement of a certain body part, for example a finger or a thumb. If these movements are repeatable then they may be used to indicate intent. A novel algorithm for describing specific paths of motion is presented. This algorithm is incorporated into a software program, which detects specific movements and uses them to generate a switching action. This switching action can then be used to control any communication and control application operable by one switch.

Acoustic methods of harnessing signals from the body are explored in Chapter 6. For people who have speech production abilities, there is a wide range of speech recognition technologies available that allow environmental control using the voice. For those who are unable to speak, there may still be ways of harnessing acoustic signals from the body. Often people who have lost the ability to produce speech will still remain capable of producing non-verbal utterances. If these utterances are repeatable then they may be used as the basis of a communication and control system. A number of acoustic based systems were developed as part of the work described here and these are presented in this chapter: a system for controlling a reading machine, an environmental controller and an alphabet board based communication device.

The conclusions drawn from the research presented here are given in Chapter 7. Suggestions are made for future work in the area of communication and control for disabled people.

Chapter 2 Assistive Technology



2.1 Introduction

Assistive technology is defined by Lazzaro [4] as any device that enables persons with disabilities to work, study, live, or play independently. Cook and Hussey [5] describe it as any device or technology that increases or improves the functional capabilities of individuals with disabilities. Assistive technology may offer assistance to people with a wide range of disabilities including vision, hearing, motor, speech and learning impairments. Screen magnifiers and braille are assistive technologies for blind or partially blind persons. Hearing aids and subtitled films may be classed as assistive technologies for the deaf. This thesis focuses on assistive technologies for people who, for one reason or another, require assistance to communicate with others and to control their environment. A principal aim of this thesis is to explore ways in which signals from the body may be harnessed so that people with extremely severe physical disabilities can interact with control and communication devices.

In this chapter, some of the possible causes of paralysis are first described in Section 2.2. Section 2.3 reviews some of the available assistive technology devices that may be of benefit to such people. An application called the Natterbox is presented in Section 2.4. This communication application was developed as part of this work to act as a testing board for the switching action methods described in later chapters.


Causes of Paralysis

There are many different circumstances that will lead to a person requiring the use of an assistive device to communicate with others or to control their environment. Paralysis can result from spinal injury following a road traffic accident or other trauma. It can be caused by damage to the brain due to a brain haemorrhage or a tumour. Motor neurone diseases, which cause wasting of the muscle tissue, may eventually lead to paralysis, and necessitate use of a communication and control device. Some of the reasons that may lead to a person becoming severely physically disabled are discussed in this section, although this review is by no means exhaustive. A major focus of this thesis is on exploring a range of available options, so that a suitable assistive technology system may be identified for each individual user, based on their capabilities and requirements, rather than offering one single solution that will allow all severely disabled people to use a control and communication device. Similarly, it is impossible to state here the exact group of people who might benefit from the methods described in this thesis. Some of the more common causes of paralysis will now be discussed.


Neurological Damage

Neurological damage, or damage to the brain, can occur in a number of different circumstances. One of the most common causes is a stroke. The Irish Health Website [6] estimates that 8500 people in this country suffer from a stroke annually.

Stroke is not a disease in itself, but a syndrome of neurological damage caused by cerebrovascular disease [7]. Although paralysis is the most commonly associated aspect of a stroke, the stroke syndrome consists of a number of different aspects which also include spasticity, contractures, sensory disturbances, psychological impairments, emotional and personality changes and apraxia (the loss of ability to carry out familiar purposeful movements in the absence of paralysis [8]). A stroke occurs when normal blood circulation in the brain is interrupted, either due to occlusion caused by a blood clot (an ischaemic stroke) or through sudden bursting of blood vessels (a haemorrhagic stroke). Strokes due to blood clots may be divided into two categories. Cerebral thrombosis occurs due to a clot that develops in situ and cerebral embolism is caused by a clot that forms elsewhere in the body and travels up to the brain [7]. Paralysis can result from damage to the frontal lobe and/or damage to the internal capsule fibres. The frontal lobe of the brain contains the motor area, which connects to the motor cranial nerve nuclei and the anterior horn cells. The internal capsule of the brain is the narrow pathway for all motor and sensory fibres ascending from lower levels to the cortex. Damage to one side of the motor fibres or the frontal lobe leads to loss of power in the muscles on the side of the body opposite the lesion [9], a paralysis known as hemiplegia [8]. While paralysis is the main symptom of a stroke relevant here, some other symptoms caused by damage to the cranial nerves are summarised in Table 2.1. The cranial nerves exist in pairs and damage to one of the nerves may result in the symptoms listed on the side of the lesion. Note that damage to the tenth nerve is one of the causes of total or partial loss of speech production abilities. Speech impairments will be discussed in more detail in Chapter 6.
Following a stroke, some voluntary movement may return within a few weeks of the incident. There are usually a number of reasons for this. Following cerebral infarction and particularly in the case of a cerebral haemorrhage, abnormally large amounts of fluid in the surrounding tissue can temporarily disrupt neurological function. As the pressure subsides, the neurons in this area may regain function. Motor function may also be restored due to central nervous system reorganisation where other areas of the brain take on the role of voluntary motor control [7]. This partial return of voluntary movement following a stroke may be of enormous benefit when considering methods for enabling stroke victims to interact with control and communication systems.

Table 2.1: Signs and symptoms of cranial damage, adapted from [10], pg. 100

Nerve    Name                       Signs and Symptoms of Damage
V        trigeminal                 Pain and burning on outer and inner aspect of cheek; loss of sensation over face and cheek
VI       abducens                   Diplopia, external rectus weakness, squint
VII      facial                     Weakness of face
VIII     auditory                   Vertigo, vomiting, nystagmus; deafness and tinnitus
IX, X    glossopharyngeal, vagus    Loss of taste; dysphagia; paralysis of vocal cord and palate


Spinal Cord Injuries

Spinal cord injuries usually occur as the result of a trauma, which is often caused by a road traffic accident or a domestic, sporting or work-related injury. The basic anatomical features of the spine and the innervation of the spinal cord will first be discussed and the classifications of spinal cord injury will then be described.

Structure of the Vertebral Column and the Spine

The spinal cord is protected by the vertebral column, a line of bony vertebrae that runs down the middle of the back. The structure of the vertebral column is shown in Figure 2.1. When viewed from the side, the vertebral column displays five curves: an upper and lower cervical curve, and one each thoracic, lumbar and sacral [11]. The sacral curve is not shown in Figure 2.1 but it is located at the very bottom of the vertebral column, from the lumbosacral junction to the coccyx. The coccyx is better known as the tailbone, which is made up of several fused vertebrae at the base of the spine [12]. The spinal cord terminates before the end of the vertebral column, around the top of the lumbar vertebrae in adults [13]. The lower tip of the spinal cord is called the conus medullaris [8]. The area from the conus medullaris to the coccyx is known as the cauda equina [13].

The Cervical Spine

The purpose of the cervical spine is mobility. The two curves in the cervical spine can be divided into upper and lower segments at the second cervical vertebra. The first cervical vertebra (C1) is called the atlas and the second cervical vertebra (C2) is called the axis. The upper cervical muscles move the head and neck and are principally concerned with positioning of the eyes and the line of vision, hence these muscles are highly innervated to enable these movements to be made with a fine degree of precision [11]. The axis provides a pivot about which the atlas and head rotate. The lower cervical spine (C2-C7) also contributes to movement of the head and neck.

The Thoracic Spine and Ribs

An important function of the thoracic spine and rib cage is to protect the heart, lungs and major vessels from compression. Due to this, the thoracic area is the least mobile region of the spine. The thoracic vertebrae are numbered T1-T12 and the ribs are numbered R1-R12 on each side. The diaphragm muscle fibres are attached to ribs R7-R12.


Figure 2.1: The Vertebral Column, from pg. 2 in [11]


The Lumbar Spine

The lumbar spine is made up of five vertebrae numbered L1-L5. The fifth lumbar vertebra (L5) is the largest and its ligaments assist in stabilising the lumbar spine to the pelvis.

There are 31 pairs of spinal nerves attached to the spinal column. Each pair is named according to the vertebra to which they are related. The spinal nerves are shown in Figure 2.2.

Classification of Injury

Injury of the spinal cord may produce damage that results in complete or incomplete impairment of function. A complete lesion is one where motor and sensory function are absent below the level of injury. A complete lesion may be caused by a complete severance of the spinal cord, by nerve fibre breakage due to stretching of the cord or due to a restriction of blood flow (ischaemia) to the cord. An incomplete lesion will enable certain degrees of motor and/or sensory function below the injury [14]. There are recognised patterns of incomplete spinal cord injuries, which are summarised in Table 2.2.

A spinal cord injury may produce damage to upper motor neurons, lower motor neurons or both. Upper motor neurons originate in the brain and are located within the spinal cord. An upper motor neuron injury will be located at or above T12. Upper motor neuron injury produces spasticity of limbs below the level of the lesion and spasticity of bowel and bladder functioning. Lower motor neurons originate within the spinal cord where they receive nerve impulses from the upper motor neurons. These neurons transmit motor impulses to specific muscle groups and receive sensory information which is transmitted back to the upper motor neurons. Lower motor neuron injuries may occur at the level of the upper neuron but more commonly are identified when occurring at or below T12. Lower motor neuron injuries produce flaccidity of the legs, decreased muscle tone, loss of reflexes and atonicity of bladder and bowel [14].

Figure 2.2: The Spinal Nerves, pg. 208 in [11]


Table 2.2: Patterns of incomplete spinal cord injuries, from text in [14]

Syndrome                         Damaged Area                              Common Cause                   Effects
Central Cord                     Cervical region                           Hyperextension injury          Flaccid arm weakness; good leg function
Brown-Séquard                    Hemisection of spinal cord                Stab wound                     Injured side: loss of motor function; uninjured side: loss of temperature and pain sensation
Anterior Cord                    Corticospinal and spinothalamic tracts    Ischaemia and direct trauma    Variable loss of motor function; reduced sensitivity to pain and temperature
Conus medullaris/cauda equina    Sacral cord or the cauda equina nerves    -                              Flaccid bladder and bowel; loss of leg motor function

Spinal cord injuries due to complete lesions are usually classified according to the level of injury to the spine. Table 2.3 summarises the motor classification of spinal cord injury. The word paraplegia describes lower lesion spinal cord injuries resulting in partial or total loss of the use of the legs. The words tetraplegia and quadriplegia both describe high level spinal cord injuries, usually occurring due to injury of the cervical spine. Both terms mean paralysis of four limbs and the injury causes the victim to lose total or partial use of their arms and legs [15]. The main causes of spinal cord injury may be gauged from figures from the Duke of Cornwall Spinal Treatment Centre, which are given in [16]. Of the new patient admissions with spinal injuries in the period 1993-1995, 36% were due to road traffic accidents, 6.5% were due to self harm and criminal assault, 37% were due to domestic and industrial accidents and 20.5% were due to injuries at sport. Until recently spinal cord injury was recognised as a fatal condition.


Table 2.3: Motor classification of spinal cord injury, adapted from pg. 63 in [14]

Level    Muscles             Level    Muscles
C4       Deltoids            L2       Hip Flexors
C5       Elbow Flexors       L3       Knee Extensors
C6       Wrist Extensors     L4       Ankle Dorsiflexors
C7       Elbow Extensors     L5       Long Toe Extensors
C8       Finger Flexors      S1       Ankle Plantar Flexors
T1       Finger Abductors    S4-S5    Anal Contraction

In the First World War, 90% of patients who suffered a spinal cord injury died within one year of wounding and only about 1% survived more than 20 years [16]. The chances of survival from a spinal cord injury began to increase in the 1940s with the introduction of sulfanilamides and antibiotics [14]. Nowadays, due to better understanding and management of spinal cord injury, the outlook has greatly improved for people with spinal cord injuries. There has been a gradual change in the pattern of survival, from low-lesion paraplegia in the 1950s to high-lesion paraplegia in the 1960s and low-lesion quadriplegia in the 1970s. Finally, in the 1980s, people with spinal cord injuries at or above C4, resulting in high-lesion quadriplegia, have been surviving in significant numbers. It is estimated that each year in the USA, 166 people sustain injury at C1-C3 and 540 people at C4 [14]. As medicine advances, such individuals will survive in increasing numbers and thus it is important to identify methods for interaction with communication and control systems for this group of severely disabled individuals. The functional ability of tetraplegic patients based on the level of injury is summarised in Table 2.4. In general, movements of the limbs suffer more severely than those of the head, neck and trunk. Movements of the lower face also tend to be more severely impaired than those of the upper face [10].


Table 2.4: Expected functional ability based on level of injury, constructed using information from [16].

Level of Injury             Functional Ability
Complete lesion below C3    Dependent on others for all care; chin and head movement; can use breath controlled devices
Complete lesion below C4    Dependent on others for all care; chin and head movement; shoulder shrugging possible; can type/use computer using a mouth stick
Complete lesion below C5    Shoulder movement; elbow flexion
Complete lesion below C6    Wrist extension
Complete lesion below C7    Full wrist movement; some hand function
Complete lesion below C8    All hand muscles except intrinsics preserved
Complete lesion below T1    Complete innervation of arms



Diseases of the Nervous System

The words motor neurone disease (MND) and amyotrophic lateral sclerosis (ALS) are often used interchangeably. However, amyotrophic lateral sclerosis may be described more accurately as a type of motor neurone disease, and probably the most well known. Motor neurone diseases affect the motor nerves in the brain and the spinal cord [17] and the term motor neurone disease may be used to describe all the diseases of the anterior horn cells and motor system, including ALS [18]. Motor neurone diseases may be divided into two categories: idiopathic motor neurone diseases and toxin-related motor neurone diseases. An idiopathic disease is one of spontaneous origin [8]. The idiopathic motor neurone diseases include both the familial and juvenile forms of amyotrophic lateral sclerosis. Also included under this category are progressive bulbar palsy (PBP), progressive muscular atrophy (PMA), primary lateral sclerosis (PLS), Madras motor neurone disease and monomelic motor neurone disease [18]. The toxin-related motor neurone diseases are suspected to be linked to environmental factors [18]. These include Guamanian ALS (due to a high incidence of ALS in Guam), lathyrism and Konzo. The exact figure for the number of people diagnosed with ALS varies, but it is thought to affect between 1 and 3 in every 100,000 of the population each year [17, 18]. There are an estimated 300 people living with amyotrophic lateral sclerosis at any one time in Ireland [17]. ALS is a progressive fatal disease of the nervous system and the rate of progression depends on the individual [18]. The muscles first affected by motor neurone diseases tend to be those in the hands, feet or mouth and throat. As ALS progresses, the ability to walk, use the upper limbs and feed orally is progressively reduced. In the terminal stage of the disease, none of these functions can be independently performed and respiratory functions become compromised [18].
At this stage of the disease, it is as important as ever to give the person the best quality of life possible and assistive technologies must be considered that can harness the vestigial signals left to these people. Usually, the motor function of the eye muscles is spared due to the calcium binding proteins in these nerve cells [18] and this feature could be used to provide a method of control and communication, as will be discussed in Chapter 4. Brain computer interface (BCI) technologies are also often considered at the very latest stages of the disease; these will be briefly described in Section 2.3.3.

Paralysis can also occur due to demyelinating diseases such as multiple sclerosis. A demyelinating disease causes impairment of conduction of signals in nerves as it damages the myelin sheath of neurons. More about the structure of nerves will be described in Chapter 3. Neurological damage resulting in paralysis may also occur due to viral infections such as poliomyelitis or polio [10] or due to bacterial infections such as bacterial meningitis, which affects the fluid in the spinal cord and the fluid surrounding the brain [19].


Assistive Technology

Assistive technologies can be of immense benefit to people with severe physical disabilities such as those described above. As mentioned already, this thesis focuses mainly on facilitating interaction with two types of assistive technology application: control and communication.

Communication applications are usually described in assistive technology terms as augmentative and alternative communication (AAC) systems [2]. Augmentative and alternative communication systems refer to assistive technology systems designed for people who have limited or no speech production abilities. Alternative communication systems usually consist of some sort of alphabet board or symbolic board [4]. Some alternative communication systems display text to a computer screen, others output the text to a printer and some work in conjunction with speech synthesis systems to speak out the intended message. Some are computer operated and some are handheld, such as the handheld LightWriter¹, a dual display keyboard based communication aid. Some, such as Voicemate², allow the user to record phrases for digitised playback [4].

Control applications refer to any system that can be operated automatically using a control signal. For example, a control signal could be used to handle an environmental control system to operate appliances in the user's environment, such as lights, fans or the television. The reading machine described in Chapter 6 is another example of a system that may be operated using a control signal. Control signals can also be used to operate wheelchairs or electrically powered prosthetics. The electromyogram muscle signal is often harnessed to replace muscle function to control prosthetics for amputees, as described in Chapter 3.


Importance of a Switching Action

The simplest control signal is probably the switching action, which is any action that allows the user to alternate between two possible states, on or off. There are numerous systems in use today that may be operated by pressing a single switch or multiple switches. Such systems are often called switch-activated input systems [2]. A standard computer keyboard may be described as a switch based system for interfacing with a computer. The keyboard usually has around 100 keys or switches and each key press sends a control signal to the processor which is recognised as a different letter or symbol by the computer. The combination of two or more key presses may also be used to increase the number of possible control signals [5]. There are many types of commercially available switches and a comprehensive guide to switches is given in [20].

¹ Lightwriter, Zygo Industries, Inc., P.O. Box 1008, Portland, OR 97202 USA
² Tash Inc., Unit 1, 91 Station Street, Ajax, Ont. L1S 3H2, Canada.

The standard type of switch is the paddle type switch. These mechanical switches have movement in one direction and can be activated by the user by pressing on the switch with any part of the body. For persons who do not have sufficient strength or ability to operate these switches there are a number of other types of switches available. These switches include suck-puff switches, wobble switches, leaf switches and lever switches [5, 21]. The switch chosen for a particular individual will depend on the capabilities of the user. For people who are very severely physically disabled, performing a switching action using any of these physical switches may not be an option. In these cases, other methods of harnessing signals from the body to provide a switching signal must be explored. One of the main objectives in developing alternative systems for communication and control is to be able to correctly identify two or more distinct states that a user can voluntarily elicit. If these states can be reliably distinguished, then transition from one state to another can be harnessed as a means of effecting a switching action.
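The idea of distinguishing two states and treating a transition between them as a switching action can be illustrated with a short sketch. The Python below is not taken from this work. It assumes that the body signal has already been reduced to one numeric feature per sample, and the function name and both threshold parameters are hypothetical. Using separate "on" and "off" thresholds (hysteresis) is one common way of preventing a noisy signal near the boundary from generating repeated switch events.

```python
from typing import Iterable, List


def detect_switch_events(samples: Iterable[float],
                         on_threshold: float,
                         off_threshold: float) -> List[int]:
    """Return the sample indices at which the signal moves from 'off' to 'on'.

    Assumption: larger values mean the user is producing the 'on' state.
    """
    if off_threshold > on_threshold:
        raise ValueError("off_threshold must not exceed on_threshold")

    events: List[int] = []
    is_on = False
    for index, value in enumerate(samples):
        if not is_on and value >= on_threshold:
            # Transition off -> on: this is the switching action.
            is_on = True
            events.append(index)
        elif is_on and value <= off_threshold:
            # The signal must fall well below the 'on' level before re-arming.
            is_on = False
    return events


if __name__ == "__main__":
    feature = [0.1, 0.2, 0.9, 0.7, 0.6, 0.2, 0.1, 0.8, 0.85, 0.1]
    print(detect_switch_events(feature, on_threshold=0.75, off_threshold=0.3))
```

In this sketch the samples at 0.7 and 0.6 do not re-trigger the switch, because the signal has not yet dropped below the lower threshold.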


Switch Based Systems

Switches are generally used in one of two ways: in a scanning system or in a coding system. In a coding system, the user taps out a message using some scheme such as the famous Morse code, using the switch. The Morse code software functions like a translator, converting Morse code to text in real time [4]. The coding can either be done using one switch with long switch presses for the dash and short switch presses for the dots, or using two separate switches to represent dots and dashes [2]. Morse code based systems have the disadvantage that the code must first be learnt by the user.

A more popular type of switch-activated input system uses scanning based selection. These systems are usually based on some variation of the row-scanning method described by Simpson and Koester [22]. The user is presented with a screen of options, arranged in rows and columns. The program scans through the rows and the user can select a particular row by pressing a switch. The program then scans through each item on the selected row and the user can select the desired item by pressing a switch again. Row scanning is often used in software alphabet boards and can be used to spell out messages [2].

The idea of switch based menu selection has been around for years. The personal computer became popular in the early 1980s and software based assistive technology systems soon followed. An independent living system known as the ADAPTER program was developed around 20 years ago by a team in Louisiana Tech University in the USA [23]. This program uses the row-scanning method to allow the user to select one of several tasks from a menu. The five options given are letters, words, codes, phone and environment. The program is designed to be operated with a mechanical switch and the two examples mentioned are a push-button switch and a bulb-pressure switch. If the user selects the letter option on the main menu then they will be presented with a second sub-menu with rows of letters and numbers which allows messages to be spelled out. The word option provides quick access to a list of important words e.g. light, water, bath etc. Selection of the code option allows communication through Morse code by pressing the switch for long or short periods which is then converted to text. The phone option displays a pre-programmed list of names and phone numbers which may be dialled through the computer and the environment option allows control of appliances in the user's surroundings.

Another scanning based alphabet board system developed around this time is described in [21], in which the scanning device is a hardware logic-based module that uses LEDs to highlight each character. This device can be connected to the computer as a substitute for a manually operated keyboard. The system uses two switches to scan through the characters and enter the required character into the computer.
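The cost of the row-scanning method described above can be made concrete with a small sketch. The Python below is illustrative only: the board layout, the function names and the fixed scan delay are assumptions, not details of any of the cited systems. It counts how many highlight steps the program must make before each target character can be selected, assuming that scanning always restarts at the first row and the first item.

```python
from typing import Sequence, Tuple

# Hypothetical 5 x 6 alphabet board, used purely for illustration.
BOARD = ("ABCDEF", "GHIJKL", "MNOPQR", "STUVWX", "YZ .,?")


def locate(board: Sequence[str], target: str) -> Tuple[int, int]:
    """Return the (row, column) position of a single character on the board."""
    if len(target) != 1:
        raise ValueError("target must be a single character")
    for row_index, row in enumerate(board):
        col_index = row.find(target)
        if col_index != -1:
            return row_index, col_index
    raise ValueError(f"{target!r} is not on the board")


def scan_steps(board: Sequence[str], target: str) -> int:
    """Highlight steps needed to reach the target: rows first, then items."""
    row_index, col_index = locate(board, target)
    return (row_index + 1) + (col_index + 1)


def message_time(board: Sequence[str], message: str, scan_delay_s: float) -> float:
    """Total scanning time for a message, assuming a constant delay per step."""
    return sum(scan_steps(board, ch) for ch in message) * scan_delay_s


if __name__ == "__main__":
    for ch in "HI":
        print(ch, scan_steps(BOARD, ch))
    print(message_time(BOARD, "HI", scan_delay_s=1.0))
```

Under these assumptions, characters placed near the top-left corner of the board are the cheapest to select, which is one reason layouts ordered by frequency of use are of interest.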
Damper [24] estimates that a communication rate of 6-8 words per minute is typically achieved using an alphabet board based communication system. There have been a number of different methods suggested for increasing the rate at which the user can select the letters. Perkins and Stenning [3] experimented with the idea of using two or five switches to operate an alphabet board and also tested the communication rate with different menu layouts. Both layouts tested had 57 characters: one had each letter and each number once, and the second had additional characters related to frequency of use (e.g. the letter E appears on the board five times) but no numbers. Simpson and Koester [22] have proposed a method of increasing text entry rate using an adaptive row-column scanning algorithm which increases or decreases the scan delays according to user performance.

Although it is not yet implemented as a switch based text entry system, the Dasher program by Ward [25] will briefly be described. Rates of 39 words per minute have been claimed for it when operated using a mouse and 25 words per minute when operated using eye tracking. It is a software based program which enables a person to spell out words by steering through a continuously expanding two-dimensional scene containing alphabetical listings of the letters [26]. A screenshot from this program is shown in Figure 2.3. The line in the centre of the screen is the cursor. The user is initially presented with an alphabetical list of letters and the user selects a letter by moving the cursor inside the area of the letter. As the user approaches a letter the letter grows in size. Once the letter is selected the user will again find themselves presented with another list of letters, but the relative size of each letter on the new list is based on the probability of that letter being the desired letter, given the previous letter. Dasher uses a language model to predict this, and the model is trainable on example documents in almost any language [26]. In the example shown in Figure 2.3, the user is spelling out the word "demonstration" and has already selected "demonstrat". As the user moves the cursor closer towards the letter "i", the letter grows in size until the user is inside the box. The screenshot also illustrates alternative words that could instead have been selected such as "demolished", "demonstrated that", "demoralise" and "demonstrative".

Figure 2.3: Dasher program - spelling out the word "demonstration".

A number of different methods for interfacing with the Dasher program are suggested on the Dasher website [27], including a mouse, a joystick, eye-tracking and head-tracking. Future possible developments of Dasher are described in [26], and include a suggestion for a modified method for operation using a single switch. This will allow the user to operate Dasher using a switch that changes the direction of cursor movement on activation.
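The way in which Dasher sizes each letter according to its probability can be sketched in a few lines. The Python below is not Dasher's implementation, and its language model is not specified here. As a stated assumption, the sketch uses simple previous-letter (bigram) counts with additive smoothing, trained on a short example text, and divides a fixed screen height among the letters in proportion to their estimated probability. All names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count how often each letter follows each other letter in the text."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    cleaned = [c for c in text.lower() if c in ALPHABET]
    for previous, following in zip(cleaned, cleaned[1:]):
        counts[previous][following] += 1
    return counts


def box_heights(counts: Dict[str, Counter],
                previous: str,
                total_height: float = 1.0,
                smoothing: float = 1.0) -> Dict[str, float]:
    """Share out total_height so that likely next letters get larger boxes.

    Additive smoothing keeps every letter selectable, even if it never
    followed `previous` in the training text.
    """
    row = counts.get(previous, Counter())
    denominator = sum(row.values()) + smoothing * len(ALPHABET)
    return {
        letter: total_height * (row[letter] + smoothing) / denominator
        for letter in ALPHABET
    }


if __name__ == "__main__":
    model = train_bigrams("the then there")
    heights = box_heights(model, previous="t")
    largest = max(heights, key=heights.get)
    print(largest, round(heights[largest], 3))
```

With this toy training text, the letter most often seen after "t" receives the largest box, so it takes the least cursor movement to reach.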


Brain Computer Interfaces

Brain computer interfaces (BCI) may offer another method of providing switching actions in cases of very severe disability. Brain computer interfaces are usually used in situations of very severe disability where there is no other method of communication and control possible. These methods allow the user to interact with the computer using some measurement of brain activity, such as functional magnetic resonance imaging (fMRI) or the electroencephalogram, the electrical signal measurable from the surface of the scalp. Correct interpretation of these signals can be used to convey user intention and thus actuate a switching action. The area of brain computer interfaces for the disabled is a huge research area and the interested reader is referred to the IEEE review of the first international BCI technology meeting [28] as a starting point for more information.


Communication Device

A software communication device called Natterbox was developed as part of this study, based on an alphabet board. The code for this program is included in Appendix I. Although there are many similar communication programs available commercially, this program was developed for two reasons. Firstly, it was in response to a request made by one of the occupational therapists in the hospital, who had been using a previous version of the same program, which had been developed earlier in our laboratory in the NRH. She was attempting to use the system with a male patient who had suffered from a brainstem stroke. The patient had poor visual ability and was also very photosensitive. This rendered him unable to see the letters of the alphabet board on screen. She suggested making each of the rows of the alphabet board a different colour, in accordance with the layout of physical alphabet boards used by occupational therapists. An auditory facility was then added which speaks out the colours on each of the different rows as they are highlighted. The patient was able to learn which letters corresponded to which coloured row and hence could perform a switching action when the program called out the name of the row that was desired. The program then calls out each letter in that row in turn, and the user can again select the desired letter when it is reached, thus enabling the user to spell out messages.


The second benefit gained from development of the Natterbox program is that it served as a useful testing board for different switching mechanisms developed in the work presented here. Since the Natterbox allows the user to spell out words and sentences simply by performing a single switching action, it was an invaluable tool in demonstrating translation of different body signals into communication. The Natterbox program as described here was used by a number of different patients in the hospital. For each of these patients, a reliable method of interfacing with the program had to be identified and some of the techniques used are discussed in this thesis. As the program developed, various features were added in response to therapist and patient requests. Some of these will now be briefly outlined.


Technical Details

The Natterbox program was developed with C++ using the Fast Light Tool Kit³ (FLTK) to develop the graphical user interface. The sound feature was added using tools from the Simple DirectMedia Layer⁴ (SDL), which is a multimedia library, written in C and usable from C++, designed to provide access to audio devices. The primary advantage of using FLTK and SDL is that they are both cross-platform, making the Natterbox program portable across different operating systems.


The Natterbox Graphical User Interface

The graphical user interface (GUI) of the Natterbox main menu is shown in Figure 2.4, demonstrating a message being spelled out. In Figure 2.4(a), the yellow row is highlighted. The user activates a switch to select this row and the program begins scanning the letters on that row. In Figure 2.4(b), the symbol "." is highlighted. The user again activates a switch to select this symbol. Figure 2.4(c) shows that the symbol has appeared on the message banner and also on the history panel along the right-hand side of the screen.

³ FLTK Website:
⁴ SDL Website:


Switch Interface Box

The switch input required by Natterbox was chosen to be an F2 keypress. Thus Natterbox can be used in one of three ways. Firstly, it is operable by simply pressing the physical key on the keyboard. Obviously this is not a very useful interaction method for people with very severe disabilities. Secondly, it may be used in conjunction with another program that is monitoring some signal from the body and will simulate an F2 keypress when it recognises intention. Possible methods for harnessing body signals for these purposes form much of the remainder of this thesis. Thirdly, it may be used with a switch interface box. Any arbitrary two way switch, such as those mentioned in Section 2.3.1, can be connected to this box. The switch interface box is connected into the USB port of the computer and a supplementary software application simulates an F2 key press on detection of a switching action. The supplementary program was called USB Switch and the code is given in Appendix I.


Other Features

Phrases Menu

Due to requests from the occupational therapists in the hospital, the option of a sub-menu was added to the Natterbox program. This sub-menu provides quick access to a list of commonly used phrases. This menu may be opened by selecting the last row in the main menu. The sub-menu screen is shown in Figure 2.5(a). When the user selects the phrase "Turn on or off fan" it appears in the message banner back in the main screen. This phrase could be used by the user to request that the fan is turned off if it is already on, or turned on if it is off.





Figure 2.4: The Natterbox program. (a) The program is highlighting the second (yellow) row. (b) When the user selects the second row the program begins scanning the letters on this row. The "." button is currently highlighted. (c) The user selects this symbol and it appears above on the banner.


Printing Feature An option to print the message to paper was added in response to a request from a patient who wanted a facility for writing letters to her children. This request was fulfilled by placing an option Print at the bottom of the phrases menu. Selection of this option sends all the text in the history box to an attached printer. This option could be of immense benefit to users since it allows the user to prepare lengthy messages in advance.

Cancel Feature A cancel option was added for people who are capable of actuating a second switching action. The second switch input cancels the effect of the last input. Thus if the user has accidentally selected a letter they may delete this letter from the message bar by activating the second switch. If the user has accidentally selected the wrong row and the program is scanning through each of the items on that row, the user may use the second switch to change back to row scanning.
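The interaction between row scanning, item scanning and the cancel switch can be sketched as a small state machine. This is an illustrative reconstruction rather than the Natterbox source: the class name `Scanner`, the two-row board and the choice to restart scanning from the first row after a letter is selected are all assumptions.

```python
from typing import List


class Scanner:
    """Row/column scanning board driven by a select switch and a cancel switch."""

    def __init__(self, rows: List[List[str]]) -> None:
        self.rows = rows
        self.mode = "row"   # "row": stepping through rows; "item": stepping within a row
        self.row = 0        # index of the highlighted (or selected) row
        self.item = 0       # index of the highlighted item within the selected row
        self.message: List[str] = []

    def tick(self) -> None:
        """Advance the highlight by one step (called by a timer)."""
        if self.mode == "row":
            self.row = (self.row + 1) % len(self.rows)
        else:
            self.item = (self.item + 1) % len(self.rows[self.row])

    def select(self) -> None:
        """First switch: choose the highlighted row, or the highlighted item."""
        if self.mode == "row":
            self.mode, self.item = "item", 0
        else:
            self.message.append(self.rows[self.row][self.item])
            self.mode, self.row = "row", 0  # assumed: scanning restarts at the first row

    def cancel(self) -> None:
        """Second switch: undo the effect of the last selection."""
        if self.mode == "item":
            self.mode = "row"       # wrong row chosen: return to row scanning
        elif self.message:
            self.message.pop()      # wrong letter chosen: delete it


board = Scanner([["A", "B", "C"], ["D", "E", "F"]])
board.tick()      # highlight moves to the second row
board.select()    # select it; item scanning starts at "D"
board.tick()      # highlight moves to "E"
board.select()    # "E" is added to the message
board.select()    # first row selected by mistake
board.cancel()    # back to row scanning, message unchanged
print("".join(board.message))
```

A further call to `cancel()` at this point would delete the "E", since the program is back in row-scanning mode and the last effective input was the letter selection.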

Figure 2.5: The Natterbox Phrases Menu (a) The program is highlighting the second phrase "Turn on or off the fan". (b) When the user selects this phrase it appears on the banner back in the main menu.


Three-Switch Mouse A three-switch mouse was developed for one of the patients in the hospital who was particularly successful with the Natterbox program. The patient used a push-button switch placed between his thumb and hand to operate the program. He also had head movement on both sides and so was able to operate two head switches. The Natterbox program was modified to include a mouse cursor control system using these three switching actions. The patient could exit the alphabet board program by selecting an Exit option at the end of the phrases menu. This switches the program into mouse cursor control mode. The mouse cursor is controlled by the USB Switch program. The head switches may be used to move the mouse cursor either up and down, or left and right. Switching between these two directions is performed using the hand switch. Pressing the hand switch twice in succession actuates a mouse click.
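One way the three switching actions could be mapped onto cursor control is sketched below. The thesis does not give the details, so several points are assumptions made for illustration: the class name `ThreeSwitchMouse`, the step size, the 0.5 s window that defines "twice in succession", and the policy that every hand press toggles the axis so that a double press restores the original axis and issues a click.

```python
from typing import Optional


class ThreeSwitchMouse:
    """Cursor control from two head switches and one hand switch."""

    def __init__(self, step: int = 10, double_press_window: float = 0.5) -> None:
        self.x = 0
        self.y = 0
        self.axis = "vertical"              # head switches currently move up/down
        self.step = step                    # pixels per head-switch press (assumed)
        self.window = double_press_window   # seconds between presses (assumed)
        self.last_hand_press: Optional[float] = None
        self.clicks = 0

    def head(self, side: str) -> None:
        """side is "left" or "right"; moves the cursor along the active axis."""
        delta = -self.step if side == "left" else self.step
        if self.axis == "vertical":
            self.y += delta
        else:
            self.x += delta
        self.last_hand_press = None         # a head press breaks the succession

    def hand(self, t: float) -> None:
        """Hand switch pressed at time t (seconds)."""
        is_double = (self.last_hand_press is not None
                     and t - self.last_hand_press <= self.window)
        self.axis = "horizontal" if self.axis == "vertical" else "vertical"
        if is_double:
            self.clicks += 1                # second press in succession: click
            self.last_hand_press = None
        else:
            self.last_hand_press = t


mouse = ThreeSwitchMouse()
mouse.head("right")   # moves along the vertical axis
mouse.hand(1.0)       # single press: head switches now move left/right
mouse.head("right")   # moves along the horizontal axis
mouse.hand(5.0)       # first press of a double press
mouse.hand(5.3)       # second press: axis restored and a click issued
print(mouse.x, mouse.y, mouse.axis, mouse.clicks)
```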


Possible Future Developments of Natterbox

The addition of a submenu to Natterbox containing numbers and punctuation marks could be of great benefit. In addition to adding to user dignity by making the messages look more presentable, they could also enable emoticons to be used to add more meaning to messages. Emoticons are becoming more and more popular nowadays due to emailing, instant messaging and text messaging. Emoticons (emotion icons) are a method of adding symbols to the end of messages to represent different facial expressions. These can be used to communicate more effectively what is meant by the message. For instance, the simple term "It's ok" could be interpreted in a number of different ways. It can be intended straightforwardly and this can be emphasised by placing a smiley face symbol at the end of the message, i.e. "It's ok :-)". Conversely, if the person wishes to impart some sort of satirical tone to the message, they may express this by adding the sad smiley "It's ok :-(" or the angry smiley symbol "It's ok :-@", depending on intent. These emoticon symbols are becoming more and more integrated into casual everyday written communications and could offer an immense benefit to people who are severely disabled and wish to convey their emotions more effectively when writing messages. The addition of a speech synthesiser to the complete program to allow the messages to be spoken out loud is also being considered.



This chapter has outlined some of the diseases, conditions and circumstances that may render a person severely physically disabled. A review of assistive technology applications has been given and the importance of generating a switching action has been emphasised. Now that these areas have been discussed, the aims of this thesis may be more accurately defined. This thesis aims to investigate alternative methods of harnessing vestigial signals from people who have been severely paralysed and have very little motor function, such as those with high-level lesions above C4. These people may be unable to operate a mechanical switch and thus require a more complex technique to be identified that will allow a switching action to be actuated. A large part of the remainder of this thesis focuses on methods of harnessing these vestigial signals to provide switching actions and other control signals.


Chapter 3 Muscle Signals



This chapter and Chapter 4 investigate methods of harnessing bio-signals from the body for control and communication purposes. The exact criteria required to enable a particular body signal to be described as a bio-signal are not always well defined. In the broadest sense of the word, a bio-signal may refer to any signal from the body related to biological function. Under this definition, all of the signals presented in this thesis would fall under the category of bio-signals, including the signals obtained through video capture techniques, described in Chapter 5, and speech signals obtained through audio signal processing techniques, described in Chapter 6. A narrower definition of the term bio-signal is meant here. A bio-signal as discussed in this thesis refers to any signal that is measurable directly from the surface of the skin. This includes signals such as biopotentials, which are measured voltages from certain sites on the body, but also other electrical signals, such as the electrical skin conductance, and mechanical signals, such as the mechanomyogram. This chapter discusses two bio-signals which may be used to detect muscle contraction. These are the electrical signal, the electromyogram (EMG), and the mechanical signal, the mechanomyogram (MMG). Muscle signal based switching systems may be an option for people who retain some ability to contract certain muscles but may not be able to operate a mechanical switch. This may be because the particular muscle that can be contracted is not suitable for operating a switch or because the muscle contraction is not strong enough to operate the switch. This chapter investigates how deliberate muscle contraction can be used to effect a switching action to operate control and communication systems. The anatomy and physiology of the nerves and the nervous system are first described in Section 3.2.1. Action potentials and the method of information transfer in the body are described in Section 3.2.2. The anatomy of muscle and the process of muscle contraction are discussed in Section 3.3. Some different muscles that may be suitable for use in an EMG-based or MMG-based system are identified in Section 3.3.3. The electromyogram as a control signal is discussed in Section 3.4. Finally, the possibility of using the mechanomyogram as a control signal is explored in Section 3.5.


The Nervous System

Nerves and the Nervous System

The Nerve Cell The basic building block of the human body's nervous system is the nerve cell, or neuron. The neurons in the body are interconnected to form a network which is responsible for transmitting information around the body. The spinal cord, the brain and the sensory organs (such as the eyes and ears) all consist largely of neurons. The structure of a neuron is shown in Figure 3.1. The central part of the neuron is the cell body, or soma, which contains the nucleus. The cell body has a number of branches leading from its centre, which can either be dendrites or axons. The dendrites receive information and the axons transmit information, both in the form of impulses, which will be described in more detail later. There is generally only one axon per cell. The axon links the nerve cell with other cells, which can be nerve cells, muscle cells or glandular cells. In a peripheral nerve, the axon and its supporting tissue make up the nerve fibre. A bundle of nerve fibres is known as a nerve.

Figure 3.1: The Nerve Cell, from pg. 2 in [29]

Classification of Nerve Fibres The peripheral nervous system refers to the neurons that reside outside the central nervous system (CNS) and consists of the somatic nervous system and the autonomic nervous system [30]. A nerve fibre may be classified as either an afferent nerve fibre or an efferent nerve fibre. An afferent nerve fibre transmits information to the neurons of the CNS and an efferent nerve fibre transmits information from the CNS. Afferent nerve fibres may further be divided into somatic nerve fibres and visceral nerve fibres. Visceral afferents are nerve fibres from the viscera, which are the major internal organs of the body. All other afferent nerve fibres in the body are called somatic afferents. These come from the skeletal muscle, the joints and the sensory organs such as the eyes and ears, and bring information to the CNS. Efferent nerve fibres can be categorised as either motor nerve fibres or autonomic nerve fibres. Motor efferents control skeletal muscle and autonomic efferents control the glands, smooth muscle and cardiac muscle. See Figure 3.2 for a summary of nerve fibre classifications. The visceral afferent nerve fibres and the autonomic efferent nerve fibres both belong to the autonomic nervous system. The autonomic nervous system is responsible for controlling such functions as digestion, respiration, perspiration and metabolism, which are not normally under voluntary control. The function of perspiration, controlled by the autonomic nervous system, will be described in more detail in Chapter 4.

[Figure 3.2 diagram labels: Sensory Organs, Skeletal Muscle, Joints; Central Nervous System; Skeletal Muscle; Cardiac Muscle, Smooth Muscle, Glands]

Figure 3.2: Classification of Nerve Fibre Types

Supporting Tissue Neurons are supported by a special type of tissue constructed of glial cells. These cells perform a similar role to connective tissue in other organs of the body. In a peripheral nerve, every axon lies within a sheath of cells known as Schwann cells, which are a type of glial cell. The Schwann cell and the axon together make up the nerve fibre. A nerve fibre may be either a myelinated nerve fibre or an unmyelinated nerve fibre depending on how the Schwann cells are positioned around the axon. Myelinated nerve fibres have a higher conduction velocity than unmyelinated nerve fibres. About two-thirds of the nerve fibres in the body are unmyelinated fibres, including most of the fibres in the autonomic nervous system, since these processes generally do not require a fast reaction time. In myelinated nerve fibres, the Schwann cell winds around the axon several times as shown in Figure 3.3. A lipid-protein mixture known as myelin is laid down in layers between the windings of the Schwann cell, forming a myelin sheath. This sheath insulates the nerve membrane from the conductive body fluids surrounding the exterior of the nerve fibre. The myelin sheath is discontinuous along the length of the axon. At regular intervals there are unmyelinated sections which are called the Nodes of Ranvier. These nodes are essential in enabling fast conduction in myelinated fibres [29]. As mentioned in Chapter 2, diseases such as multiple sclerosis damage the myelin sheath of neurons, that is, they demyelinate the fibres along the cerebrospinal axis [10]. Paralysis occurs due to impairment of the conduction of signals in demyelinated nerves.

Figure 3.3: (A) Myelinated Nerve Fibre (B) Unmyelinated Nerve Fibres, from pg. 8 in [29].


Resting and Action Potentials

The Membrane Potential A potential difference usually exists between the inside and outside of any cell membrane, including that of the neuron. The membrane potential of a cell usually refers to the potential of the inside of the cell relative to the outside of the cell, i.e. the extracellular fluid surrounding the cell is taken to be at zero potential. When no external triggers are acting on a cell, the cell is described as being in its resting state. A human nerve or skeletal muscle cell has a resting potential of between -55 mV and -100 mV [29]. This potential difference arises from a difference in concentration of the ions K+ and Na+ inside and outside the cell. The selectively permeable cell membrane allows K+ ions to pass through but blocks Na+ ions. A mechanism known as the ATPase pump pumps only two K+ ions into the cell for every three Na+ ions pumped out of the cell, resulting in the outside of the cell being more positive than the inside. The origin of the resting potential is explained in further detail in [29].
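The link between an ion concentration difference and a membrane voltage is conventionally quantified by the Nernst equation, which is not stated in the text above but is a standard result. The sketch below evaluates it for potassium and sodium. The concentration values are typical textbook figures assumed purely for illustration, and the function name `nernst_potential_mv` is hypothetical.

```python
import math

R = 8.314      # gas constant, J/(mol K)
F = 96485.0    # Faraday constant, C/mol


def nernst_potential_mv(conc_out_mm: float, conc_in_mm: float,
                        valence: int = 1, temp_c: float = 37.0) -> float:
    """Equilibrium potential (inside relative to outside) in millivolts."""
    temp_k = temp_c + 273.15
    return 1000.0 * (R * temp_k) / (valence * F) * math.log(conc_out_mm / conc_in_mm)


# Assumed illustrative concentrations in mmol/L (not taken from the thesis).
e_k = nernst_potential_mv(conc_out_mm=5.0, conc_in_mm=140.0)     # potassium
e_na = nernst_potential_mv(conc_out_mm=145.0, conc_in_mm=12.0)   # sodium
print(f"E_K = {e_k:.0f} mV, E_Na = {e_na:.0f} mV")
```

With these assumed concentrations the potassium equilibrium potential is strongly negative and lies within the resting range quoted above, which is consistent with a resting membrane that is mainly permeable to K+. The sodium equilibrium potential is positive.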

The Action Potential As mentioned already, the function of the nerve cell is to transmit information throughout the body. A neuron is an excitable cell which may be activated by a stimulus. The neuron's dendrites are its stimulus receptors. If the stimulus is sufficient to cause the cell membrane to be depolarised beyond the gate threshold potential, then an electrical discharge of the cell will be triggered. This produces an electrical pulse called the action potential or nerve impulse. The action potential is a sequence of depolarisation and repolarisation of the cell membrane generated by a Na+ current into the cell followed by a K+ current out of the cell. The stages of an action potential are shown in Figure 3.4.





[Figure 3.4 plot: membrane potential against time, with the resting potential marked]

Figure 3.4: An Action Potential. This graph shows the change in membrane potential as a function of time when an action potential is elicited by a stimulus. The time duration varies between fibre types.

Stage 1 - Activation When the dendrites receive an activation stimulus the Na+ channels begin to open and the Na+ concentration inside the cell increases, making the inside of the cell more positive. Once the membrane potential is raised past a threshold (typically around -50 mV), an action potential occurs.

Stage 2 - Depolarisation As more Na+ channels open, more Na+ ions enter the cell and the inside of the cell membrane rapidly loses its negative charge. This stage is also known as the rising phase of the action potential. It typically lasts 0.2-0.5 ms.

Stage 3 - Overshoot The inside of the cell eventually becomes positive relative to the outside of the cell. The positive portion of the action potential is known as the overshoot.

Stage 4 - Repolarisation The Na+ channels close and the K+ channels open. The cell membrane begins to repolarise towards the resting potential.

Stage 5 - Hyperpolarisation The membrane potential may temporarily become even more negative than the resting potential. This prevents the neuron from responding to another stimulus during this time, or at least raises the threshold for any new stimulus.

Stage 6 The membrane returns to its resting potential.

Propagation of the Action Potential An action potential in a cell membrane is triggered by an initial stimulus to the neuron. That action potential provides the stimulus for a neighbouring segment of cell membrane and so on until the neuron's axon is reached. The action potential then propagates down the axon, or nerve fibre, by successive stimulation of sections of the axon membrane. Because an action potential is an all-or-nothing reaction, once the gate threshold is reached, the amplitude of the action potential will be constant along the path of propagation. The speed, or conduction velocity, at which the action potential travels down the nerve fibre depends on a number of factors, including the initial resting potential of the cell, the nerve fibre diameter and also whether or not the nerve fibre is myelinated. Myelinated nerve fibres have a faster conduction velocity as the action potential jumps between the nodes of Ranvier. This method of conduction is known as saltatory conduction and is described in more detail in [29].


Synaptic Transmission The action potential propagates along the axon until it reaches the axonal ending. From there, the action potential is transmitted to another cell, which may be another nerve cell, a glandular cell or a muscle cell. The junction of the axonal ending with another cell is called a synapse. The action potential is usually transmitted to the next cell through a chemical process at the synapse. If the axon ends on a skeletal muscle cell then this is a specialised kind of synapse known as a neuromuscular end plate. In this case, the action potential will trigger the muscle to contract. The physical processes that must occur to enable muscle contraction will be examined in more detail later, but first the structure of the muscle is described.


Muscle Physiology

There are three types of muscle present in the human body: smooth, skeletal and cardiac. Smooth muscle is the muscle found in all hollow organs of the body except the heart, and is generally not under voluntary control. Cardiac muscle, the only type of muscle which does not experience fatigue, is the muscle found in the walls of the heart which continuously pumps blood through the heart. Skeletal muscle is the muscle attached to the skeleton, and is the type of muscle that will be described here. The main function of skeletal muscle is to generate forces which move the skeletal bones in the body. The basic structure of a skeletal muscle is shown in Figure 3.5. Muscle is a long bundle of flesh which is attached to the bones at both ends by tendons. The muscle is protected by an outer layer of tough tissue called the epimysium. Inside the epimysium are fascicles, or bundles of muscle fibre cells. The fascicles are surrounded by another layer of connective tissue called the perimysium. The individual muscle fibre is surrounded by a layer of tissue called the endomysium. The structure of the individual muscle fibre will now be described in more detail.

[Figure 3.5 diagram labels: Epimysium (outer layer of the muscle); Tendon; Perimysium (surrounds each muscle bundle); Muscle fibre (cell); Fascicle (bundle of muscle cells); Endomysium (surrounds each cell)]

Figure 3.5: Muscle Anatomy

The Muscle Fibre Each individual muscle fibre is a cell which may be as long as the entire muscle and 10 to 100 µm in diameter. The nuclei are positioned around the edge of the fibre. The inside of the muscle fibre consists of closely packed protein structures called myofibrils, which are the seat of muscle contraction. The myofibrils run along the length of the muscle fibre. These myofibrils exhibit a cross striation pattern which is shown in Figure 3.6.

The myofibrils may be seen in detail using a technique known as polarised light microscopy. Under a microscope, the myofibrils exhibit a repeating pattern of dark and light bands. The dark bands are termed A-bands or anisotropic bands and the light bands are termed I-bands or isotropic bands. Anisotropic and isotropic refer to how the bands transmit the polarised light which is shone on them as part of the microscopy process. The isotropic bands transmit incident polarised light at the same velocity regardless of the direction and so appear light coloured, while the anisotropic bands transmit the light at different velocities depending on the direction of the incident light and therefore appear dark coloured. In the middle of the I-band there is a thin dark strip known as the Z-disc. The basic contractile element of muscle is known as the sarcomere and is the region between two Z-discs. The sarcomere is about 2 µm in length. The myofibril is made up of a repeating chain of sarcomeres. A sarcomere consists of one A-band and one I-band. The structure of the sarcomere is shown in Figure 3.7(a). The Z-discs link adjacent thin myofilaments of the I-bands, which are about 5 nm in diameter. These filaments primarily consist of actin, but also contain tropomyosin and troponin [31]. The A-band in the centre of the sarcomere contains thicker myofilaments made of myosin which interlink the thin myofilaments [29]. These myosin filaments are about 11 nm in diameter [30]. When the muscle contracts the thin filaments are pulled between the thick filaments. The positions of the actin and myosin filaments are shown before contraction in Figure 3.7(a) and during contraction in Figure 3.7(b). The importance of these bands and their role in muscle contraction will be described in the next section.

[Figure 3.6 diagram labels: Muscle Fibre; Myofibril; A-Band; I-Band; Sarcomere; Z disc]

Figure 3.6: The muscle fibre and the myofibril cross striation pattern.









Figure 3.7: (a) The sarcomere before contraction occurs. The A-band, containing thick myosin filaments, and the I-band, containing the Z-disc and the thin actin filaments, are shown. (b) On contraction of the muscle, the thin actin filaments slide between the myosin filaments.


Muscle Contraction

The Motor Unit Each efferent motor nerve fibre, or motor neuron as it is also known, stimulates a number of muscle fibres. The nerve fibre, and the muscle fibres it innervates, make up the smallest functional unit of muscle contraction, known as the motor unit. Each individual muscle fibre in a motor unit will be stimulated simultaneously by the nerve fibre, so they will always contract and relax in synchronisation. The force produced by a muscle can be increased by increasing either of two parameters:

(i) The number of active motor units. The motor units are roughly arranged in parallel along the length of the muscle so by activating more motor units, more muscle force can be produced. The forces produced by individual motor units sum algebraically to give the total muscle force.

(ii) The rate at which the nerve fibres activate the muscle fibres, or fire. This rate is known as the firing frequency.

When a single motor unit receives a single stimulation, the response is a single twitch. The duration of a single twitch varies depending on whether the muscle fibres are slow-twitch (Type 1) muscle fibres or fast-twitch (Type 2) muscle fibres. A motor unit will usually be made up entirely of either fast-twitch muscle fibres or slow-twitch muscle fibres. The slow motor units have a slower speed of contraction but will take longer to fatigue. When a muscle contracts, the slow motor units are recruited first; this principle is known as the size principle of motor unit recruitment [31]. The duration of a single twitch in a slow-twitch muscle fibre is about 200 ms. The action potential causing the single twitch is only about 0.5 ms in duration so the twitch goes on for a long time once it has been initiated. If the length of a single twitch is 200 ms and the firing frequency is less than 5 Hz, then the force response will show a series of individual twitches. As the firing frequency of the motor unit increases, the second stimulus will begin to stimulate the muscle before the effects of the first stimulus have subsided. In this case the forces begin to accumulate. As the firing frequency increases, the force response becomes larger in magnitude. For relatively low frequencies (less than 20 Hz for slow motor units and less than 50 Hz for fast motor units) there will be some force relaxation between stimulation pulses. If the muscle force is oscillating, then this is known as unfused tetanic contraction. At higher firing frequencies the force will remain constant; this is known as fused tetanic contraction.
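The accumulation of force with firing frequency can be illustrated numerically by summing identical twitches. The sketch below is a toy model under stated assumptions: each twitch is given a half-sine shape lasting 200 ms (the real twitch shape is not specified in the text), twitches are assumed to add linearly, and the function names are hypothetical. The exact frequencies at which the ripple becomes negligible therefore depend on the assumed shape and should not be read as physiological values.

```python
import math

TWITCH_DURATION = 0.2  # seconds; slow-twitch value quoted in the text


def twitch(t: float) -> float:
    """Force of one twitch at time t after its stimulus (assumed half-sine shape)."""
    if 0.0 <= t <= TWITCH_DURATION:
        return math.sin(math.pi * t / TWITCH_DURATION)
    return 0.0


def force(t: float, firing_hz: float) -> float:
    """Total force at time t: the sum of the twitches from all stimuli so far."""
    interval = 1.0 / firing_hz
    n_stimuli = int(t / interval) + 1
    return sum(twitch(t - k * interval) for k in range(n_stimuli))


def peak_and_ripple(firing_hz: float, dt: float = 0.0005):
    """Peak force and relative ripple (max - min) / max once the response has settled."""
    start = 1.0  # skip the first second, then examine one further second
    samples = [force(start + i * dt, firing_hz) for i in range(int(1.0 / dt))]
    peak = max(samples)
    return peak, (peak - min(samples)) / peak


for hz in (4.0, 10.0, 40.0):
    peak, ripple = peak_and_ripple(hz)
    print(f"{hz:5.1f} Hz: peak force {peak:.2f}, ripple {ripple:.2f}")
```

Below 5 Hz the interval between stimuli exceeds the twitch duration, so the force returns to zero between twitches and the ripple is total. As the firing frequency rises the peak force grows and the ripple shrinks, which is the transition from individual twitches through unfused towards fused contraction described above.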

Types of Contraction When a muscle is stimulated by a nerve impulse, it tends to shorten, provided it can overcome the external resistance imposed on it. Shortening and force production of muscle are referred to as contraction [31]. A shortening contraction is called a concentric contraction. In certain instances the muscle is fixed so it cannot shorten and the increase in muscle contraction is then measurable as an increase in the force acting on the muscle. This type of contraction is known as an isometric contraction. Each muscle has a maximum isometric force capability, which is the maximum amount of force that can be applied to a muscle which is fixed at a certain length without forcible stretching. If the muscle is subjected to an external force greater than its maximum isometric force capability then the muscle is forcibly stretched. This is known as eccentric contraction. These contractions can be measured in vivo, i.e. while the muscle is still living in the human body. Other muscle contractions are measurable by severing a muscle at its tendons and placing it in a bath for experiments. These types of measurements are known as in vitro measurements (literally meaning in glass). In vitro experiments can be used to measure isotonic or isokinetic contractions. Isotonic contraction occurs when the muscle is subjected to a constant load and isokinetic contraction refers to contractions performed at a constant speed. An in vivo contraction is rarely fully isometric or isotonic.

Molecular Mechanism of Contraction During an isotonic contraction, it is observed that the width of the A-bands stays constant but the I-bands become narrower. However, the actin filaments in the I-band are found to stay the same length during the contraction. The I-band is thus shortened by the actin filaments sliding in between the myosin filaments. The cross-bridge theory, which was first postulated by Huxley in 1957 [32], is widely used to describe how the actin filaments slide between the myosin filaments. When a muscle begins to contract a cross-bridge is formed between the myosin and actin filaments. The head of the cross-bridge rotates, which pulls the actin filament between the myosin filaments. The bridge is then broken and reformed with the next part of the actin filament and the cycle continues. As described earlier, a muscle cell is stimulated to contract when it receives an action potential. It is thought that the depolarisation of the cell that occurs during an action potential might cause an increase in the calcium ion concentration inside the cell. The exterior of the myofibrils consists of a network of tiny sacs or vesicles, known as the sarcoplasmic reticulum. The vesicles provide calcium to the Z-discs when the cell is depolarised. The cross-bridge is formed by a binding of the actin and myosin molecules and requires calcium ions to split the ATP and release energy for contraction. When the muscle is in a relaxed state, the sarcomere contains a very low concentration of calcium ions, so there is no interaction between the actin and myosin and no ATP splitting. On activation the calcium ion concentration rises and so cross-bridges are formed between the two sets of filaments, ATP is split and sliding occurs [30].


Muscle Action in People with Physical Disabilities

Often, even people who have become severely paralysed will retain some level of ability to contract certain muscles. For example, quadriplegic patients who have been injured around the C5/C6 level usually retain the ability to move their head to some extent. In some cases, this movement is sufficient to allow the person to communicate intent by operating head-switches, which are usually affixed to their wheelchair. Unfortunately, although a person may still be able to activate a muscle voluntarily, often the contractions may be too weak to operate a conventional mechanical switch. This weakness is caused largely by a loss of functional input from higher brain centres to the spinal motor nerves, which leads to partial muscle paralysis and submaximal muscle activation [33]. In these situations, the contraction must be detected by other means. The sternocleidomastoid muscle is one of the muscles which may often still be under voluntary control in people with high-lesion quadriplegia. This muscle is one of the muscles which flex the neck. The neck muscles are shown in Figure 3.8. The sternocleidomastoid muscle receives motor supply from the spinal part of the accessory nerve (the eleventh cranial nerve). It receives sensory fibres from the anterior rami of C2 and C3 [11] and thus may still be controlled by people who still have these nerve fibres intact, which usually includes people with spinal cord injuries lower than this level. Unilateral contraction of the sternocleidomastoid laterally flexes the head on the neck, rotating it to the opposite side, and laterally flexes the cervical spine. Bilateral contraction draws the head forwards and assists in neck flexion. Differentiation between muscle contraction and muscle relaxation can be used to control a single switch system, e.g. a communication program. There are two methods considered for measuring muscle contraction. Muscle contraction may be detected non-invasively by measuring either the electrical or mechanical signal at the surface of the skin. The electrical signal is known as the electromyogram and the mechanical signal is known as the mechanomyogram. These will now be described in more detail.

Figure 3.8: The Neck Muscles, showing the sternocleidomastoid, from pg. 97 in [11]




The electromyogram or EMG is an electrical signal that can be used to observe muscle contraction. It is measured either by using surface electrodes on the skin (surface EMG) or by invasive needle electrodes which are inserted directly into the muscle fibre (the invasive, needle or indwelling EMG). As mentioned already, a muscle fibre contracts when it receives an action potential. The electromyogram observed is the sum of all the action potentials that occur around the electrode site. In almost all cases, muscle contraction causes an increase in the overall amplitude of the EMG. Thus it is possible to determine when a muscle is contracting by monitoring the EMG amplitude. The EMG is a stochastic signal with most of its usable energy in the 0-500 Hz frequency range, with its dominant energy in the 50-150 Hz range. The amplitude of the signal varies from 0-10 mV (peak-to-peak) or 0-1.5 mV (rms) [34]. An example of an EMG and its frequency spectrum is shown in Figure 3.9.
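Monitoring the EMG amplitude to decide when a muscle is contracting can be sketched as rectification, smoothing and thresholding. The code below is a minimal illustration and not the detection method developed in this thesis: the function names, the 1000 Hz sampling rate, the 100-sample window, the two threshold levels and the synthetic Gaussian test signal are all assumptions. Two thresholds (hysteresis) are used so that the switch state does not chatter while the envelope crosses a single level.

```python
import random
from typing import List


def envelope(signal: List[float], window: int) -> List[float]:
    """Moving average of the rectified signal over the last `window` samples."""
    out, running = [], 0.0
    rectified = [abs(x) for x in signal]
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]
        out.append(running / min(i + 1, window))
    return out


def detect_contraction(env: List[float], on_level: float, off_level: float) -> List[bool]:
    """Threshold the envelope with hysteresis to give a switch state per sample."""
    state, states = False, []
    for value in env:
        if not state and value > on_level:
            state = True
        elif state and value < off_level:
            state = False
        states.append(state)
    return states


# Synthetic test signal (assumed amplitudes, in mV): rest, contraction, rest.
random.seed(0)
fs = 1000  # samples per second (assumed)
rest_before = [random.gauss(0.0, 0.02) for _ in range(fs)]
burst = [random.gauss(0.0, 0.5) for _ in range(fs)]
rest_after = [random.gauss(0.0, 0.02) for _ in range(fs)]
emg = rest_before + burst + rest_after

states = detect_contraction(envelope(emg, window=100), on_level=0.2, off_level=0.1)
onsets = [i for i in range(1, len(states)) if states[i] and not states[i - 1]]
print(len(onsets), onsets[0] / fs)
```

Each onset found in `states` could then be treated as one switching action, for example by triggering the keypress that the communication program expects.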


EMG Measurement

The EMG may be measured invasively or non-invasively. Clinical electromyography almost always uses invasive needle electrodes as it is concerned with the study of individual muscle fibres [35]. It produces a higher frequency spectrum than surface electromyography and allows localised measurement of muscle fibre activity [36]. For simple detection of muscle contraction, it is usually sufficient to measure the electromyogram non-invasively, using surface electrodes. The standard measurement technique for surface electromyography uses three electrodes. A ground electrode is used to reduce extraneous noise and interference, and is placed on a neutral part of the body such as the bony part of the wrist. The two other electrodes are placed over the muscle. These two electrodes are often termed the pick-up or recording electrode (the negative electrode) and the reference electrode (the positive electrode) [35]. The signal from these two electrodes is differentially amplified to cancel the noise, as shown in Figure 3.10. The surface electrodes used are usually silver (Ag) or silver-chloride (AgCl). Saline gel or paste is placed between the electrode and the skin to improve the electrical contact [37]. Over the past 50 years it has been taught that the electrode location should be on the motor point of a muscle, at the innervation zone. According to De Luca [34], this is probably the worst location for detecting an EMG. The motor point is the point where the introduction of electrical currents causes muscle twitches. Signals from electrodes placed at this point tend to have a wider frequency spectrum [36] due to the addition and subtraction of action potentials with minor phase differences. The position now widely regarded as optimal for the electrodes is on the belly of the muscle, midway between the motor point and the tendinous insertion, with the electrodes approximately 1 cm apart [36]. The electrode position on the muscle is shown in Figure 3.11.

Figure 3.9: EMG and frequency spectrum, from [34], measured from the tibialis anterior muscle during a constant force isometric contraction at 50% of voluntary maximum.

Figure 3.10: EMG differential amplifier configuration, from [34]. The EMG is
represented by m and the noise signal by n.
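The noise cancellation performed by the differential amplifier can be illustrated numerically. The following is a minimal sketch, assuming an ideal amplifier and noise that is identical at both electrodes; the gain, the sample values and the names `m1`, `m2` and `differential_amplify` are hypothetical and are not taken from [34].

```python
GAIN = 1000.0  # hypothetical amplifier gain


def differential_amplify(pickup, reference, gain=GAIN):
    """Ideal differential amplifier: amplifies the difference of its inputs."""
    return [gain * (p - r) for p, r in zip(pickup, reference)]


# Hypothetical samples in volts. m1 and m2 are the EMG components seen by
# the two electrodes; n is interference common to both electrodes.
m1 = [0.0000, 0.0010, -0.0020, 0.0015]
m2 = [0.0000, -0.0005, 0.0010, -0.0010]
n = [0.0500, -0.0300, 0.0400, 0.0200]

pickup = [a + b for a, b in zip(m1, n)]      # electrode 1 sees m1 + n
reference = [a + b for a, b in zip(m2, n)]   # electrode 2 sees m2 + n

# The common noise n cancels, leaving gain * (m1 - m2).
output = differential_amplify(pickup, reference)
```

In practice the noise at the two electrodes is never perfectly identical, so the cancellation is only approximate.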

Figure 3.11: Preferred location for the electrodes, from [34]. The electrode shown
is a parallel bar electrode but two circular electrodes could also be used, placed approximately 1cm apart.



EMG as a Control Signal

The electromyogram has been used as a control signal for assistive technologies for a number of years. It is usually used for prosthesis control and is often described by the term myoelectric control. Scott and Parker [38] estimate that the use of the myoelectric signal to control powered prostheses had become an important clinical alternative by 1980. Myoelectric controlled prostheses are usually controlled by measuring the EMG on the muscle remnants in the residual limb. The first types of myoelectric controlled prostheses were hand prostheses for below-elbow amputees [38]. These systems were generally based on two-site two-state control. Electrodes are placed over two muscles such as the forearm extensor muscles and the forearm flexor muscles. Contraction of one muscle opens the hand and contraction of the other muscle closes the hand. This has the advantage that it seems natural to use these two muscles to control hand movement, but it may be difficult to learn to produce isolated contractions of the two muscles. For more complex prostheses, such as those that include the elbow joint, simple on-off control is not sufficient. Hudgins, Parker and Scott [39] have explored a multifunction myoelectric control strategy which extracts features of the EMG in an attempt to identify four distinct types of muscle contraction. Different modes of muscle contraction will result in different signal patterns due to different motor unit activation patterns [40]. In the experiments that they have reported [39], one electrode is placed over the biceps brachii and one over the triceps brachii to enable maximum pickup of the EMG from the muscles in the upper arm. The four movements tested were forearm supination, elbow extension, wrist flexion and forearm pronation. If these four movements can be correctly recognised, then each one can be used to generate a distinct signal to control a prosthetic limb.
The classifier for the system described is an artificial neural network [41], which was found to correctly classify between 70% and 98% of test patterns after an initial training of the neural network.

The concept of myoelectric control for prostheses has been extended to explore its suitability as a control signal for severely disabled people for other applications. Chang et al. [40] have explored electromyogram pattern recognition techniques as a control command for man-machine interfaces. The onset of muscle contraction is detected by counting the number of zero crossings in the signal. The feature extraction stage uses a fourth order autoregressive (AR) model and the classifier used is a modified maximum likelihood distance classifier [42]. The EMG was recorded from the sternocleidomastoid and upper trapezius muscles (see Figure 3.8) and discrimination of ten muscle motions of the neck and shoulders was investigated. From these ten motions, it was found that five specific motions were almost perfectly recognisable. These are head flexion, head right rotation, head left rotation, right shoulder elevation and left shoulder elevation. The mean correct recognition rate reported was 95%, indicating that a system such as this could provide a five-way control system that could be used to operate a control and communication system for severely disabled people with control of their neck muscles.

The EMG was briefly explored as part of the work presented here, as a control signal for a male patient in the NRH. This patient had sufficient hand control to use a hand switch, which he initially used to operate the Natterbox. He also had some degree of control of his neck muscles, most noticeably left and right neck rotation. This led to an investigation of possible methods of eliciting two switching actions corresponding to these two movements. The EMG was one of the methods considered. Two pairs of electrodes were placed between the sternocleidomastoid and upper trapezius muscles of the neck, one pair on either side.
Each pair of signals was then differentially amplified. Rotation of the neck to the right was found to cause an increase in amplitude of the signal between the left pair of electrodes and vice versa. This increase in amplitude could be harnessed by thresholding the signal and thus used to actuate two

switching actions. This method was tested with him for operation of the Three-Switch Mouse described in Section 2.4.4, which allows a mouse cursor to be controlled with three switches. The hand switch was used to switch between left/right and up/down movement and the two head movements were used to move the mouse either up and down or left and right depending on the mode of operation. This enabled the patient to control a number of software programs, including Windows Media Player, which he could use to select albums, play songs and adjust the volume independently. The Three-Switch Mouse program was combined with the Natterbox to allow the patient to easily switch between the two. EMG-based detection of head movements was eventually replaced with two mechanical switches placed at either side of the head. With careful placement of the switches, the patient was able to actuate these switches with slight head rotations to the left and the right. Nonetheless, this case highlights the potential of the EMG for communication and control purposes.

While the EMG offers much potential for use as a communication and control signal, the work presented here investigates the possibility of using the mechanical signal, the mechanomyogram or MMG, as an alternative method of providing communication and control using muscle contraction. There may be a number of advantages in using the MMG over the EMG. The MMG can be measured using a single small accelerometer attached to the skin, as opposed to the three electrodes required for single-channel EMG recordings. This may be more convenient and comfortable for the user. Since the MMG is a mechanical signal, no skin preparation is required, as opposed to EMG recordings which typically require that the skin be prepared with an alcohol swab before recordings to improve skin conductance. The MMG typically has a higher signal-to-noise ratio than the EMG, which means that an MMG system has the potential to detect and make

use of weaker contractions than an EMG system does. The MMG bandwidth (typically 3-100Hz) is lower than the EMG bandwidth, so a lower sampling rate can be used. Mechanical vibrations can propagate through the fluid and tissue surrounding the contracting muscles, which means that MMG sensors are capable of detecting contraction signals from virtually every muscle in the body, while the EMG is limited to the contraction of superficial muscles [43]. Less precise sensor placement is necessary than with the EMG. The EMG may vary due to changes in skin conductance caused by sweating; this is not a factor when measuring the MMG.



As well as the electrical signal, muscle contraction produces mechanical vibrations that may be detected at the surface of the skin. This mechanical signal has been observed and detected since the beginning of the 1800s [44] but it is only more recently that it is being better understood. The mechanical signal due to muscle contraction is described under a wide variety of names. As it is a pressure signal with much of its frequency spectrum close to that of sound, it can be measured using a microphone. When measured in this way it is often referred to as muscle sound, the phonomyogram, the acoustic myogram or the soundmyogram. As some of the signal is below the audible range of the human ear, Orizio [45] suggests these terms should be avoided. It is also sometimes referred to as the vibromyogram or accelerometermyogram when measured using an accelerometer, as in the system described here, but these terms reflect the nature of the measurement technique rather than the nature of the signal. Orizio [45] proposes the term mechanomyogram to more accurately reflect the nature of this signal.

The mechanomyogram (MMG) signal is observable at the surface of the muscle due to the movement of the muscle fibres underneath. Orizio [45] states that it is due to three things: (i) dimensional changes of the active fibres upon contraction; (ii) a gross lateral movement at the initiation of muscle contraction, generated by the non-simultaneous activation of muscle fibres; and (iii) smaller subsequent lateral oscillations generated at the resonant frequency of the muscle. The resonant frequency of a particular muscle is a function of several parameters including muscle mass, length, topology and stiffness. The exact relationship between the peak MMG frequency components and the muscle resonant frequencies has been investigated by Barry, and is discussed in [46].

The amplified MMG measured from the biceps brachii is shown in Figure 3.12. The measurement technique will be described in more detail shortly. The MMG is a random, noise-like signal with an approximately Gaussian amplitude distribution. The useful bandwidth of the signal is between approximately 3Hz and 100Hz, with a peak usually around 20-25Hz [47]. In the figure shown, the subject contracts their muscle at approximately t=5s, which increases the absolute amplitude of the signal. This feature can be used to detect muscle contraction, and therefore provide a means of communication and control for disabled people.


MMG as a Control Signal

Recently, the MMG has been explored by Silva, Heim and Chau as a control signal for prosthetic limbs [43]. The system developed uses three microphone-accelerometer pairs as sensors, evenly spaced 120° apart inside a socket


[Plot: Amplified MMG (contraction at t=5s); amplitude (V) against time (s).]

Figure 3.12: The amplified MMG, showing the increase in amplitude when the
muscle contracts at t ≈ 5s.


Figure 3.13: Side (left) and front (right) views of the soft silicone socket built for MMG
recording, from [43]. Note the embedded multisensor array containing three coupled MMG sensors at equidistant angles around the end of the stump.

1.5cm up from the stump, as shown in Figure 3.13. The mechanical muscle signal generated upon muscle contraction can propagate through fluid and tissue to the sensors, but will be proportionally dampened depending on the distance travelled. Thus signals from muscles far away from the sensors will be diminished compared to signals from muscles that are nearer the sensors. Hence different signal amplitudes will be observed at different sensors depending on which muscles are used in the movement, and thus different muscle activities can be classified. Results reported in [43] seem to indicate that muscle activity can be tracked with MMG sensors in a way similar to that of EMG sensors.


MMG Application for Communication and Control

As part of the work presented here, an MMG-based system was developed that could be used to control any application operable by one switching action, such as the Natterbox program described in Chapter 2. The system developed will now be described. The signal acquisition technique is first explained and then the signal processing steps necessary and the resulting system developed will be described.


Signal Acquisition

The MMG was detected using a dual-axis accelerometer, the ADXL203E (dimensions 5mm × 5mm × 2mm) from Analog Devices¹, shown in Figure 3.14. Its small size makes it an attractive method of monitoring muscle activity. This sensor was affixed to the belly of the muscle using adhesive tape, oriented so that one axis of the accelerometer measured the signal along the muscle fibres and the other axis measured the signal perpendicular to the muscle fibres, tangential to the skin surface. When the accelerometer is powered by a 5V supply, the two output signals have a DC offset of 2.5V and this needs to be removed to allow greater amplification before the signal is converted to a digital signal and read into the computer. The signal will also have some DC component due to orientation in the earth's gravitational field (1V/g). The 2.5V component is subtracted using the circuit in Appendix A. This is preferable to high-pass filtering the signal as it may be necessary to know accelerometer orientation in some applications. The bandwidth of the signal from the accelerometer is limited by its output capacitors to 200Hz. Note that the actual signal measured is acceleration. In order to get the displacement, the signal should be double integrated. In this instance, since the objective is control rather than signal analysis, that step is unnecessary.
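The relationship between the accelerometer output voltage and acceleration can be sketched as follows. This is an illustration only, using the 2.5V offset and 1V/g sensitivity quoted above and ignoring any later amplification; the function names are hypothetical, and the tilt calculation assumes a stationary sensor whose axis makes an angle θ with the horizontal, so that it reads g·sin(θ).

```python
import math

OFFSET_V = 2.5             # zero-g output with a 5V supply (from the text)
SENSITIVITY_V_PER_G = 1.0  # output sensitivity (from the text)


def volts_to_g(v_out):
    """Convert one raw accelerometer output voltage to acceleration in g."""
    return (v_out - OFFSET_V) / SENSITIVITY_V_PER_G


def tilt_degrees(v_static):
    """Tilt of a stationary axis from horizontal, inferred from gravity alone.

    Assumes the only acceleration acting on the axis is gravity.
    """
    g = max(-1.0, min(1.0, volts_to_g(v_static)))  # clamp before asin
    return math.degrees(math.asin(g))


# A static reading of 3.0V corresponds to 0.5g along the axis.
acceleration = volts_to_g(3.0)
angle = tilt_degrees(3.0)
```

Retaining the gravitational component in this way is what allows orientation to be recovered, which would not be possible if the offset were removed by high-pass filtering.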

Signal Processing

The acquired signals from the two channels of the accelerometer are read into the computer using the NIDAQ PCI-6023E data acquisition card (sampling rate 500Hz). The Real Time Workshop, which is part of Simulink for MATLAB, was used to perform initial tests on this signal to determine the steps necessary for detection of contraction from the MMG. The Simulink block diagram is given in Appendix B. Once the required steps had been identified, the code was converted to a stand-alone C++ program using the National Instruments

Analog Devices Website:


Figure 3.14: The accelerometer used to measure MMG, compared with a one euro

libraries. This program outputs a switching action on detection of muscle contraction. Simulation of the F2 key press was chosen as the switching action since this is the expected input for operation of the Natterbox. The MMG signal shown in Figure 3.15(a) is an example of the raw signal that the computer receives, for one axis. For the 100 second period shown, the subject contracts their muscle three times, which is clearly observable in the recorded signal. The DC component of the signal is due to gravity, and the magnitude of the DC component of each signal is dependent on the orientation of the accelerometer with respect to the earth's gravitational field. There is a pronounced change in the overall shape of the muscle between the relaxed and contracted states. This causes a relatively sudden change in the displacement of the accelerometer at the onset of contraction, causing distinct exaggerated peaks in each of the two accelerometer output signals at these times, as shown for one axis in Figure 3.15(a). Furthermore, small changes in the orientation of the accelerometer between the relaxed and contracted states cause changes in the gravitational offset of its output signals. The two signals were high-pass filtered (cutoff 2Hz) to remove this DC component, as shown in


Figure 3.15(b). The resulting signals were then full-wave rectified as shown in Figure 3.15(c) and smoothed using a moving-average filter (N=100) [42]. The averaged signal is shown in Figure 3.15(d). Finally, the processed signal from both channels is compared to a threshold value and a decision is made as to whether or not the muscle is contracted. The appropriate threshold value is dependent on which muscle is being observed and on the user's maximum voluntary contraction of that muscle. Therefore, provision is made for this value to be determined by the therapist. A value of 3.5V was chosen as an appropriate threshold for the signal shown in Figure 3.15(d). If the averaged signal is higher than this threshold then the output of this block is 1, as shown in Figure 3.15(e). If the output from either channel is 1 a software switching action is performed. In the final software implementation, a 0.5s debounce time was added to ensure that multiple peaks will not be translated into multiple switch actuations.
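The processing chain described above can be sketched in a few lines. This is an illustrative sketch rather than the Simulink model of Appendix B or the C++ program: the sampling rate, cutoff, window length and debounce time are taken from the text, but the first-order filter, the debounce rule (a new action is issued only after the signal has been below threshold for the full debounce time) and all function names are assumptions.

```python
import math

FS = 500.0        # sampling rate in Hz (from the text)
CUTOFF_HZ = 2.0   # high-pass cutoff in Hz (from the text)
WINDOW = 100      # moving-average length N (from the text)
DEBOUNCE_S = 0.5  # debounce time in seconds (from the text)


def high_pass(x, fs=FS, fc=CUTOFF_HZ):
    """First-order RC high-pass filter; the filter order is an assumption."""
    if not x:
        return []
    rc = 1.0 / (2.0 * math.pi * fc)
    alpha = rc / (rc + 1.0 / fs)
    y = [0.0]
    for i in range(1, len(x)):
        y.append(alpha * (y[-1] + x[i] - x[i - 1]))
    return y


def moving_average(x, n=WINDOW):
    """Causal mean of the most recent n samples (fewer during start-up)."""
    out, total = [], 0.0
    for i, v in enumerate(x):
        total += v
        if i >= n:
            total -= x[i - n]
        out.append(total / min(i + 1, n))
    return out


def contraction_flags(x, threshold, fs=FS):
    """Filter, full-wave rectify, smooth and threshold one channel."""
    rectified = [abs(v) for v in high_pass(x, fs)]
    return [v > threshold for v in moving_average(rectified)]


def switch_times(channels, threshold, fs=FS, debounce_s=DEBOUNCE_S):
    """Times (s) of switch actions; either channel may trigger one."""
    flags = [contraction_flags(ch, threshold, fs) for ch in channels]
    hold = int(debounce_s * fs)
    last_active = -hold
    events = []
    for i, active in enumerate(map(any, zip(*flags))):
        if active:
            # Fire only if the signal has been quiet for the debounce time.
            if i - last_active >= hold:
                events.append(i / fs)
            last_active = i
    return events
```

Calling `switch_times([axis_1, axis_2], threshold)` with two equal-length lists of samples returns one time stamp per detected contraction. The threshold is left as a parameter because, as noted above, it depends on the muscle and the user.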

Software Implementation

Simulink provides an option for converting a block diagram to C++ code and these files were used as the basis of a stand-alone MMG muscle contraction detection system. DirectX was used to provide a graphical user interface which allowed the user or therapist to observe the signals at each of the stages mentioned. The graphical user interface also allows the threshold to be changed. The code for this program is on the included disk (see Appendix I).

Testing

Preliminary testing of this system was performed on four able-bodied persons to test their ability to use muscle contraction to communicate using the Natterbox. The system was tested on both the biceps brachii muscle and the sternocleidomastoid muscle of each subject. The testers were asked to spell out the message "The quick brown fox jumps over the lazy dog" (9 words)

[Plots: (a) Raw MMG, (b) Filtered MMG, (c) Rectified MMG, (d) Averaged MMG, (e) Output MMG; each panel shows amplitude (V) against time (s).]

Figure 3.15: The MMG at various stages of processing, generated using the Simulink
model in Appendix B


Table 3.1: MMG experimental results. *B = biceps brachii, S = sternocleidomastoid

User  Muscle*  Time taken (min:s)  Speed (words/min)  No. of errors
1     B        6:59                1.28               2
1     S        5:40                1.58               1
2     B        5:58                1.51               0
2     S        6:20                1.42               3
3     B        5:12                1.73               1
3     S        4:56                1.82               3
4     B        5:38                1.59               0
4     S        5:42                1.58               0

by contracting their muscle when the desired row or letter is highlighted, i.e. contracting the biceps brachii by clenching their fist or moving their head to contract the sternocleidomastoid muscle. The results of the experiments on the biceps brachii muscle and the sternocleidomastoid muscle for each of the four users are shown in Table 3.1. The speed of the users was limited by the 0.5 second debounce time of the system and also by the scanning speed of the alphabet board. The alphabet board scanned at 0.5s per row/column, which appeared to be a comfortable speed for the users. Therefore, the time taken to select a single letter was between 1s and 6s depending on the position of the letter on the board. The average speed over the four users and the two muscles was 1.56 words/min with an average of 1.25 errors. Compared to natural speech, which has rates of up to a few hundred words per minute, a speed of 1.56 words/min is very slow, and may be frustrating for the user. However, for users who are completely unable to communicate by conventional means, any finite speed of communication is better than none.
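The speeds and averages quoted above follow directly from the times in Table 3.1 and the 9-word test sentence. The short sketch below recomputes them; it assumes that the table lists its values in user and muscle order (user 1 biceps, user 1 sternocleidomastoid, and so on), and the variable and function names are hypothetical.

```python
WORDS = 9  # length of the test sentence (from the text)

# (user, muscle, time taken as "min:s", number of errors), from Table 3.1
RESULTS = [
    (1, "B", "6:59", 2), (1, "S", "5:40", 1),
    (2, "B", "5:58", 0), (2, "S", "6:20", 3),
    (3, "B", "5:12", 1), (3, "S", "4:56", 3),
    (4, "B", "5:38", 0), (4, "S", "5:42", 0),
]


def words_per_minute(time_str, words=WORDS):
    """Communication speed for one trial, from a 'min:s' time string."""
    minutes, seconds = time_str.split(":")
    return words / (int(minutes) + int(seconds) / 60.0)


speeds = [words_per_minute(t) for _, _, t, _ in RESULTS]
mean_speed = sum(speeds) / len(speeds)
mean_errors = sum(errors for *_, errors in RESULTS) / len(RESULTS)

for (user, muscle, t, _), speed in zip(RESULTS, speeds):
    print(f"user {user} {muscle}: {t} -> {speed:.2f} words/min")
print(f"mean speed {mean_speed:.2f} words/min, mean errors {mean_errors:.2f}")
```

Small differences in the second decimal place relative to the tabulated speeds can arise from rounding.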


Further Considerations

Unintentional movements of the person caused the system to incorrectly detect contraction of the muscle on several occasions. Waiting a few hundred milliseconds after the initial peak due to muscle movement, and then determining if the muscle is still contracted, could prevent this, although this might affect user perception since it would introduce a delay between muscle contraction and the system response. Although the subjects were asked to trigger the sternocleidomastoid using full head movements, facial movements also caused unintentional triggering. These movements could be intentionally detected by an appropriately placed accelerometer, and used as the basis of communication. The work described here was presented at the 2004 IEEE EMBS conference in California.



The MMG appears to offer a promising alternative to EMG for control using muscle contraction. Since it is capable of detecting very small contractions, it may be useful in cases of very severe disability where the individual has very limited muscle contracting abilities. Preliminary results indicate that the MMG can provide a useful method of aiding communication by people who cannot communicate using traditional means. Further studies should be carried out on this system to assess its performance with disabled people and its ability to detect muscle contractions in these subjects. Pattern recognition techniques that allow differentiation between different muscle actions should also be investigated in more detail for the MMG, ultimately to provide a means of operating multiple-switch operated systems.


Chapter 4 Other Biosignals - Eye Movements and Skin Conductance



The previous chapter described two biosignals which relate to muscle contraction - the electromyogram and the mechanomyogram. Two further physiological signals from the body are dealt with in this chapter, both of which may be harnessed to provide control and communication for people who are severely disabled. These are the electrooculogram (EOG) and the electrical conductance of the skin. Section 4.2 describes the electrooculogram (EOG). This signal is a biopotential measurable around the eye - either between the top and bottom of the eye (the vertical EOG), or between the two sides of the eye (the horizontal EOG). The EOG amplitude varies as the eyeball rotates within the head, and thus can be used to determine horizontal and vertical eye movements. These movements can then be harnessed as a control signal for applications, as will be discussed. In Section 4.3, a method for control based on the electrical conductance of the skin is described. The conductance of the skin may be controlled by consciously relaxing or tensing the body, thus activating or relaxing the sweat glands. This method is constrained by the length of time taken to elicit a response, but it may be applicable in cases of very severe disability. A method for measurement of the firing rate of the sympathetic nervous system based on measurement of the skin conductance is also presented.


The Electrooculogram

In this section, eye tracking is discussed as a means of control and communication for disabled people. Firstly, the anatomy of the eye is briefly described in Section 4.2.2, and a review of eye tracking technologies is presented. The eye tracking method chosen to study in more detail and implement as part of the work presented here is based on measurement of a signal known as the electrooculogram (EOG). A description of the physiological origin of the EOG is given, with reference to the anatomy of the eye. The method employed to measure the EOG is described, and some of the advantages and limitations of using this signal to track eye movements are discussed. A novel method called Target Position Variation (TPV) is presented in Section 4.2.5, which was developed as part of the work described here as a way of overcoming some of the EOG's limitations. This method allows for more accurate inference of absolute eye position from the EOG, enabling more robust control in EOG-based applications.

Based on these studies of eye movement for communication and control purposes, a feedback model for eye movements was developed which is given here. This model describes rotation of the eyeball in either the horizontal or vertical plane. It accurately predicts the measured response of the eye when it makes a sudden saccadic movement, that is, when the eye's focus suddenly moves from one target to another. The response of the eye to smooth pursuit movements, where the eyes are following a moving target, is also briefly explored using this model. Development of these models led to a deeper understanding of the underlying processes involved during saccadic and smooth pursuit eye movements. This knowledge was of immense benefit when considering how eye movements could be used for communication and control purposes.

Figure 4.1: Sections visible in the outer part of the eye (from [48])


Anatomy of the Eye

The main features visible at the front of the eye are shown in Figure 4.1. The lens, directly behind the pupil, focuses light coming in through the opening in the centre of the eye, the pupil, onto the light sensitive tissue at the back of the eye, the retina. The iris is the coloured part of the eye and it controls the amount of light that can enter the eye by changing the size of the pupil, contracting the pupil in bright light and expanding the pupil in darker conditions. The pupil has very different reflectance properties than the surrounding iris and usually appears black in normal lighting conditions. Light rays entering through the pupil first pass through the cornea, the clear tissue covering the front of the eye. The cornea and vitreous fluid in the eye bend and refract this light. The conjunctiva is a membrane that lines the eyelids and covers the sclera, the white part of the eye. The boundary between the iris and the sclera is known as the limbus, and is often used in eye tracking. A horizontal section through the right eye is shown in Figure 4.2, showing


Figure 4.2: Horizontal cross section of the eye, the symbol K is reflected onto the
retina (from [48])

how an image of the letter K is projected onto the retina at the back of the eye. The crystalline lens located just behind the iris focuses the light rays onto the retina. Note that the image on the retina is inverted. The brain is able to process this image and invert it so we see the image in its original upright form. The light rays falling on the retina cause chemical changes in the photosensitive cells of the retina. These cells convert the light rays to electrical impulses which are transmitted to the brain via the optic nerve. There are two types of photosensitive cells in the retina, cones and rods [49]. The rods are extremely sensitive to light, allowing the eye to respond to light in dimly lit environments. They do not distinguish between colours, however, and have low visual acuity, or attention to detail. The cones are much less responsive to light but have a much higher visual acuity. Different cones respond to different wavelengths of light, enabling colour vision. The fovea is an area of the retina of particular importance. It is a dip in the retina directly opposite the lens and is densely packed with cone cells, allowing humans to see fine detail, such as small print. The human eye is capable of moving in a number of different manners to observe, read or examine the world in front of it. Most types of eye


movements are conjugate, where both eyes move together in the same direction. A saccadic eye movement is a type of conjugate eye movement where the eye suddenly changes fixation from one place to another voluntarily, such as in reading, where the eye jumps back to the start of the next line when it reaches the end of the previous line. Saccadic eye movements are characterised by a very high initial acceleration and deceleration. The purpose of saccadic eye movement is to fix the new target image on the fovea. During saccadic motion, the eye moves with an angular velocity in the range 30-900°/s [50]. When the eye is tracking a continuously moving object, the movement is described as a smooth pursuit movement. Smooth pursuit movements have an angular velocity in the range 1-30°/s [50]. Smooth pursuit movements cannot be produced voluntarily as they always require a moving stimulus [51]. Saccadic eye movements and smooth pursuit eye movements will be examined in more detail in Section 4.2.9, where a model for both of these movements is proposed. Compensatory eye movements are another type of smooth movement similar to pursuit movements, which act to keep the eyes fixed on a target when the head or trunk moves.
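The velocity ranges quoted from [50] suggest a simple way of labelling a recorded eye movement by its angular velocity. The sketch below is an illustration only and is not the method used in this work: the function names are hypothetical, the "fixation" label for velocities below 1°/s is an assumption, and a velocity of exactly 30°/s, which lies on the boundary of both ranges, is arbitrarily treated as saccadic.

```python
SACCADE_MIN_DEG_S = 30.0  # lower bound of the saccadic range (from [50])
PURSUIT_MIN_DEG_S = 1.0   # lower bound of the smooth pursuit range (from [50])


def angular_velocity(angles_deg, fs):
    """Finite-difference angular velocity (deg/s) from sampled eye angles."""
    return [(b - a) * fs for a, b in zip(angles_deg, angles_deg[1:])]


def classify(velocity_deg_s):
    """Label one velocity sample using the ranges quoted in the text."""
    speed = abs(velocity_deg_s)
    if speed >= SACCADE_MIN_DEG_S:
        return "saccade"
    if speed >= PURSUIT_MIN_DEG_S:
        return "smooth pursuit"
    return "fixation"  # assumed label for near-stationary gaze


# Hypothetical eye angles in degrees, sampled at 100Hz.
angles = [0.0, 0.1, 0.2, 2.2, 4.2, 4.2]
labels = [classify(v) for v in angular_velocity(angles, fs=100.0)]
```

A real recording would need smoothing before differentiation, since measurement noise is amplified by the finite difference.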


Eye Tracking Methodologies

The areas concerned with measurement of both relative eye movement and absolute eye position are generally both included under the term eye tracking, although it must be pointed out that some of the eye tracking methodologies only measure the direction of eye movement from an initial position, rather than continuously tracking the exact location of the eye. However, as it is the most generally used term, eye tracking will be used here to describe measurement of both changes in eye position and absolute eye position. Eye tracking has become an important research field due to its applicability to a range of different disciplines. It is used in market research to provide a way of assessing the effectiveness of different advertisement layouts on the viewer.


It is often used by psychologists during tests, to determine the patient's focus or interest level, for example. It is used clinically to determine illnesses such as schizophrenia from unusual eye movements, and in developmental tests on babies. For the work presented here, eye movement is important as it may represent an individual's only voluntary movement during some of the later stages of motor neurone diseases such as Amyotrophic Lateral Sclerosis (ALS). Unlike other motor neurons in the body which are affected by ALS, the motor neurons in the eye are relatively spared even at the very terminal stage of this disease, due to high levels of calcium-binding proteins in these cells [18]. If eye movement can be correctly tracked in people with diseases such as this, then it can provide a useful tool for enabling these people to communicate with others and to control their environment. There are many different systems available commercially that can be used to track eye movement. Some of the different methods commonly employed for eye tracking are discussed here by division into three categories - visual eye tracking techniques, the magnetic search coil technique and the electrooculogram.

Visual Eye Tracking Techniques

Visual eye tracking techniques, which are sometimes referred to as video-oculographic methods, are based on observation of the position of the eye, usually by using some sort of camera. Clearly, one can roughly determine the direction of a person's gaze by monitoring the location of the centre of their eyes, since the eyeball rotates to place the object that the person is looking at directly in line with the centre of the pupil, at the centre of vision. In automated visual eye tracking, computer vision techniques are employed to track one or more pertinent features of the eye. In most of these methods, light is shone directly at the eye. Infra-red light is often used, as it is invisible and does not make the subject close their eyes. When light is shone directly at


the pupil so that it enters the eye along the optic axis, it is reflected back due to the reflective properties of the retina and the pupil will appear as a bright reflection. This accounts for the red-eye effect in photography and the phenomenon is known as the bright-eye effect. The cornea also reflects light, as can be readily observed in a room with a window where the image of the window can be seen to appear on the eye's surface. A camera is used to detect these reflections, which can then be used to calculate eye position relative to some reference point. Two commonly used methods using reflections are the Pupil Centre/Corneal Reflection Technique and the Limbus Boundary Technique.

The pupil centre/corneal reflection technique, or simply corneal reflection technique as it is often called, is credited to Kenneth Mason who formalised this technique in the late 1960s [52]. His technique presents an automated procedure for eye tracking, based on observation of the eye with a camera, detection of two reflections and calculation of the eye gaze position. The two reflections used are the large reflection from the pupil due to the bright-eye effect and the smaller reflection from the corneal bulge of the eye. A photograph of the eye showing the bright pupil and the smaller corneal reflection is shown in Figure 4.3. The smaller reflection is often called the glint or first Purkinje image. The position of the eye is determined based on the relative movement of the pupil reflection with respect to the corneal reflection. The radius of curvature of the cornea is less than that of the eye, so when the eye moves the corneal reflection moves in the direction of eye movement but only about half as far as the pupil moves. This can be used to calculate eye gaze position.
Several commercial systems are available that use this technique for eye gaze tracking, for example, the 50Hz Video Eyetracker Toolbox, manufactured by Cambridge Research Systems¹, and the Eyegaze Computer System, manufactured by LC Technologies Incorporated². Another method often used tracks the limbus,

¹ Cambridge Research Systems Ltd., 80 Riverside Estate, Sir Thomas Longley Rd., Rochester, Kent ME2 4BH, England. Website:
² LC Technologies Inc., 3955 Pender Drive, Suite 120, Fairfax, Virginia 22030, USA. Website:


Figure 4.3: Photograph of the reflections when light is shone upon the eye, from LC Technologies Inc.², showing the large reflection of the pupil and the smaller corneal reflection.

which is the iris-scleral edge, as the eye moves around. This boundary can be readily detected due to large differences in colour intensity between the sclera and the iris. Once the boundary is detected, the iris can be modelled as two circles or ellipses. The position of the two eyes can be calculated based on the centre of the two shapes which will change as the eyeball rotates away from the central field of vision of the camera. This method has difficulties, particularly for tracking in the vertical direction, due to the eyelid occluding part of the limbus when the eye is looking up or looking down. The commercially available system IRIS, manufactured by Cambridge Research Systems¹, and the Model 310 Limbus Tracker, from the Applied Science Laboratories³, are both based on limbus tracking.

Many of the systems described above require that the head be kept stationary, and some use a large, constrictive head-rest to keep the user's head in place. The head-rest used with the 50Hz Video Eyetracker is shown in Figure 4.4. There are many systems available that attempt to overcome this constraint by using computer vision techniques which also track the movement of the head and incorporate this factor into calculation of the eye position. One such system is FaceLab developed by Seeing Machines and Cambridge Research Systems¹, although this system has quite a low recovery time (0.2s) from a tracking failure that may occur when the head is moved suddenly.

³ Applied Science Laboratories, 175 Middlesex Turnpike, Bedford, MA 01730, USA. Website:



Figure 4.4: 50Hz Video Eyetracker, from Cambridge Research Systems website¹.

Magnetic Search Coil Technique

The Magnetic Search Coil technique was developed in the 1960s by Robinson [53], and has been marketed as a method of eye tracking by Skalar Medical⁴ since 1975, under the name Scleral Search Coil (SSC). An induction coil, encased in a suction ring of silicone rubber, is affixed onto the eye's limbus, the boundary between the sclera and the iris. A high frequency horizontal and vertical magnetic field is generated around the subject, which induces a high frequency voltage in the induction coil. As the user moves their eye, the voltage changes, and thus the sclera position, and therefore the eye, can be tracked. A photograph of a subject wearing the scleral search coil is shown in Figure 4.5. This technique is not often used nowadays for a number of reasons. Initial preparation is cumbersome - application of a local anesthetic is required before inserting the coil into the eye. The subject can only wear the search coil continuously for 30 minutes before irritation begins to occur. The subject must also stay in the centre of the magnetic field for the duration of the recordings. Health issues with high frequency electromagnetic fields are not yet resolved.

The Electrooculogram

The EOG is probably the most commonly used non-visual method of eye tracking. The EOG is a bio-electrical skin potential measured around the eyes. As already described in Section 4.2.2, the photoreceptor cells in the retina are

⁴ Skalar Medical bv, Thorbeckestraat 18, 2613 BW DELFT, The Netherlands. Website:


Figure 4.5: User wearing the scleral search coil, from Skalar website⁴. The wire is just visible coming down the right side of the picture.

excited by light rays falling on them. This causes increased negativity of the membrane potential, due to ions being pumped out. Over time, a charge separation occurs between the cornea and retina of the eyeball, which can vary anywhere between 50-3500µV from peak to peak. In humans the cornea is positive with respect to the retina. The eyeball can be thought of as a dipole rotating in a conducting medium. The DC voltage generated by the eye radiates into adjacent tissues, which produces a measurable electric field in the vicinity of the eye, which rotates with the eye. The EOG is measured by placing two electrodes at opposite sides of the eyes and differentially amplifying the signal to obtain the DC voltage between two sides of the eyeball. This measurement enables the direction of eye gaze to be inferred since the potential field varies as the eyeball rotates towards or away from each electrode. Conventionally, the vertical and horizontal directions are used to measure eye positions (either individually or in conjunction with each other depending on the application in question). Vertical movements are detected by placing electrodes above and below the eye and horizontal movements are detected by placing the electrodes to the left and right of the eye (the outer canthi of the eye). Note that for horizontal recordings, the electrodes are generally not placed on the outer and inner sides of the same eyeball, as may be expected. This is due to practical difficulties in placing the electrode at the inner edge of the eye, and for conjugate eye movements,

Figure 4.6: EOG Electrode Positions. (a) Vertical EOG. (b) Horizontal EOG. Eye movement towards the + electrode increases the EOG amplitude, movement towards the - electrode decreases its amplitude.

placing the electrodes at the two outer canthi should give an almost identical recording. Care must be taken when placing the electrodes to choose a location that will minimise EMG interference, which may occur when the subject frowns or speaks. Typical electrode positions for the vertical and horizontal EOG are shown in Figure 4.6. The EOG recordings for saccadic eye movement and smooth pursuit eye movement are shown in Figure 4.7, recorded as part of the work presented here. These are both horizontal EOG recordings. The recording of the saccadic eye movement was made by asking the subject to focus on the centre of a square on the computer screen and then suddenly translating the square horizontally by an optical angle of 15° at t = 0s. The recording of the smooth pursuit movement was made by asking the subject to follow a square which is moving on screen in a sinusoidal fashion, with frequency 0.4Hz. These signals have been amplified by a factor of approximately 1000, sampled at 200Hz and lowpass filtered in MATLAB using a 25th order FIR equiripple filter with f_pass = 10Hz and f_stop = 30Hz. The complete EOG measurement method will be described in more detail in Section 4.2.6.
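The low-pass filtering step can be illustrated with a short sketch. The filter used for the recordings was an equiripple design produced in MATLAB; the version below swaps in a Hamming-windowed sinc design, which is a different and simpler method, so its response will not match the original exactly. The tap count (26 taps for a 25th order filter) and the 200Hz sampling rate come from the text, while the 20Hz cutoff (midway between f_pass and f_stop) and all function names are assumptions made for illustration.

```python
import math


def lowpass_fir(num_taps=26, cutoff_hz=20.0, fs_hz=200.0):
    """Hamming-windowed sinc low-pass taps, normalised to unity DC gain.

    Note: this is NOT an equiripple (Parks-McClellan) design; it is a
    simpler stand-in with a similar pass band and stop band.
    """
    fc = cutoff_hz / fs_hz              # cutoff as a fraction of the sampling rate
    mid = (num_taps - 1) / 2.0          # centre of the impulse response
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(taps, samples):
    """Direct-form FIR convolution; output has the same length as the input."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, h in enumerate(taps):
            if i - j >= 0:
                acc += h * samples[i - j]
        out.append(acc)
    return out


def gain_at(taps, freq_hz, fs_hz=200.0):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = -sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)
```

Calling `fir_filter(lowpass_fir(), eog_samples)` on a list of samples would smooth the recording, and `gain_at` can be used to check how strongly a given frequency is attenuated by whichever design is chosen.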


Figure 4.7: EOG recordings, recorded by the author. Both panels plot the filtered and amplified EOG amplitude (V) against time (s). (a) Smooth pursuit: EOG during smooth pursuit of a target moving sinusoidally with frequency 0.4Hz, over a 20s time frame. Notice the baseline has a DC offset. (b) Saccade: EOG during a saccadic movement. The target moves at t = 0s, and the eye moves to the new position with a latency of just over 0.25s.


The EOG as a Control Signal

The EOG as a communication and control tool has been explored by a number of research teams, some of whom are mentioned below. Some of the systems developed are based on detection of a small number of eye movements which may be translated into switching actions, while others attempt to recognise absolute eye position from the EOG signal. An EOG based alphabet was developed in the lab [49]; it uses the relative movements of the vertical and horizontal EOG to select letters on the alphabet board shown in Figure 4.8. The EagleEye system is another EOG based communication and control tool, developed by a team in Boston College in the 1990s [54]. Absolute eye position is used to control a cursor moving over an alphabet board based communication system, which is described by Teece [55]. This system uses the horizontal and vertical EOG to control a cursor moving over a software alphabet board. If the user's eye-gaze remains within the region of a certain letter for more than 50% of an 833ms epoch, then that letter is selected and appears below the alphabet board. The main limitation of this system is that it requires frequent manual balancing of the amplifier voltages, to compensate for baseline drift. For people with very severe disabilities, this would require the aid of a helper, and thus this restriction limits the independence of the user. The problem of baseline drift in EOG recordings is further discussed below.

Figure 4.8: EOG controlled alphabet board, from [49].

The EOG has also been investigated as a controller for a wheelchair by Barea et al. [56], as part of a larger wheelchair project from the University of Alcala in Madrid known as SIAMO [57] (a Spanish acronym for Integral System for Assisted Mobility) which uses ultrasound, infrared sensors and cameras to create information about an environment and facilitate safer wheelchair navigation. In the system described in [56], the wheelchair user is presented with a menu of different wheelchair commands on a computer laptop screen (e.g. STOP, FORWARD, BACKWARD, LEFT and RIGHT). The user looks at the desired word to select a command to operate the wheelchair. This system uses an AC coupled amplifier to overcome problems with DC drift.

The EOG was used for communication and control purposes with a patient in the hospital. The male patient had suffered a brainstem stroke and as a result was only capable of making small vertical eye movements. These movements were harnessed by recording the vertical EOG and actuating a switching action whenever a threshold was crossed. The user was thus able to operate the Natterbox program and spell out messages.
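The threshold-based switching action described for the vertical EOG can be sketched as a rising-edge detector. This is only a minimal illustration: the function name, the example threshold and the sample values are hypothetical, and a practical system would probably also need debouncing or hysteresis, which the text does not describe.

```python
def switch_events(samples, threshold):
    """Return the sample indices at which the signal rises through the threshold.

    Each upward crossing is treated as one switch press; the signal must
    fall back below the threshold before another press can be registered.
    """
    events = []
    above = False
    for i, value in enumerate(samples):
        if value > threshold and not above:
            events.append(i)
        above = value > threshold
    return events


# Hypothetical vertical EOG samples (volts) containing two upward eye movements.
example_eog = [0.0, 0.1, 0.6, 0.7, 0.2, 0.8, 0.1]
presses = switch_events(example_eog, threshold=0.5)
```

Each index in `presses` would correspond to one switching action passed on to a scanning program such as the one mentioned above.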


Advantages of the EOG over other methods

The visual systems mentioned above in this section offer robust methods of eye tracking, usually with very good accuracy. While in certain circumstances visual methods may be more appropriate, the electrooculogram offers a number of advantages. Some of the reasons for favouring the EOG over other options for measuring eye movements are presented here.

Range. The EOG typically has a larger range than visual methods, which are constrained for large vertical rotations where the cornea and iris tend to disappear behind the eyelid. Oster and Stern [50] estimate that since visualisation of the eye is not necessary for EOG recordings, angular deviations of up to 80° can be recorded along both the horizontal and vertical planes of rotation using electrooculography. Visual based systems often have a much more restricted range; for example, the 50Hz Video Eyetracker from Cambridge Research Systems¹ has a horizontal range of only 40° and a vertical range of just 20°.

Linearity. The reflective properties of ocular structures used to calculate eye position in visual methods are linear only for a restricted range, compared to the EOG where the voltage difference is essentially linearly related to the angle of gaze up to 30° and to the sine of the angle from 30° to 60° [50].

Head Movements are Permissible. The EOG has the advantage that the signal recorded is the actual eyeball position with respect to the head. Thus for systems designed to measure relative eyeball position to control switches (e.g. looking up, down, left and right could translate to four separate switch presses) head movements will not hinder accurate recording. Devices for restraining the head or sensing head movement are only necessary when the absolute eye position is required. Conversely, visual methods such as the limbus boundary technique require that the head be kept stationary so that a head movement will not be misinterpreted as a change in eye position, and even slight head movements with respect to the light source can produce disproportionately large calibration errors. Head-brackets or chin-rests are often used to keep the head in place; these are often uncomfortable and therefore impractical to use for any length of time. Even visual methods that compensate for head movements by tracking relative movement of two points in the eye (as in the pupil centre/corneal reflection technique) require that the eyes be kept within the line of sight of the camera and thus often use a head rest anyway to keep the head in position. The criterion that the head must be kept in front of a camera may not be possible to meet in circumstances where the user may not be in front of a computer screen, or in instances where the user has uncontrolled head spasms, as may be the case for users with cerebral palsy.

Non-invasive. Unlike techniques such as the magnetic search coil technique, EOG recordings do not require anything to be fixed to the eye which might cause discomfort or interfere with normal vision. EOG recording only requires three electrodes (for one channel recording), or five electrodes (for two channel recording), which are affixed externally to the skin.

Obstacles in front of the eye. In visual methods, measurements may be interfered with by scratches on the cornea or by contact lenses. Bifocal glasses and hard contact lenses seem to cause particular problems for these systems. EOG measurements are not affected by these obstacles.


Cost. EOG based recordings are typically cheaper than visual methods, as they can be made with some relatively inexpensive electrodes, some form of data acquisition card and appropriate software, unlike most of the visual systems described above, which require expensive equipment and can cost in the region of €10,000. Any method using infrared light requires an infrared transmitter and camera for operation, plus expensive software to calculate the eye position from the captured image. Software to convert EOG recordings into absolute eye position is considerably more straightforward than video based techniques that require complicated computations to analyse video frames and convert this into an estimate of eye position, and thus EOG software should be less expensive. In hospitals, the electrodes necessary to measure the EOG are usually readily available.

Lighting Conditions. Variable lighting conditions may make some of the visual systems unsuitable or at least require re-calibration when the user moves between different environments. One such scenario which could pose problems is where the eye tracking system is attached to a user's wheelchair. As the user moves between different environments the system needs to respond accordingly. This could be achieved using some sort of photosensitive device to measure lighting conditions but this would need to be interfaced with whatever system is used. Variable lighting conditions will cause baseline drift of the EOG signal. For measurement of relative eye movements this should not be a problem, since a system could be developed to respond only to sudden large changes in EOG amplitude, rather than slow changes due to varying lighting conditions.

Eye Closure is Permissible. The EOG is commonly used to record eye movement patterns when the eye is closed, for example during sleep. Visual methods require the eye to remain open to know where the eye is positioned relative to the head, whereas an attenuated version of the EOG signal is still present when the eye is closed.

Real-Time. The EOG can be used in real-time as the EOG signal responds instantaneously to a change in eye position and the eye position can be quickly inferred from the change. The EOG is linear up to 30°. The frequency response of visual methods is limited by the frame rate and the calculation time of eye position from the frames.

Obviously there are also some disadvantages and these are discussed below.

Limitations of EOG-Based Eyetracking

The EOG recording technique requires electrodes to be placed on both sides of the eyes, and this may cause some problems. Firstly, it requires that a helper is present who has been taught how to correctly position the electrodes. Secondly, electrodes placed around the eyes may draw attention to the user's disability and compromise the user's feelings of dignity. For horizontal EOG recordings, a possible solution is to use a pair of glasses or sunglasses. The two electrodes are placed on the inside of the temple arm of the glasses so that the electrodes make contact with the skin when the glasses are worn. Many people who are disabled already wear sunglasses, even indoors, due to photosensitivity.

Another large problem faced by EOG-based gaze tracking systems using DC coupled amplifiers is the problem of baseline drift. This problem may be circumvented by using an AC coupled amplifier but then the signal recorded will only reflect changes in the eye position rather than expressing the absolute eye position. If eye position is to be used for any sort of continuous control (rather than one or more switching actions) then a DC coupled amplifier is usually necessary. The measured EOG voltage varies for two reasons. Either the eye moves (which we want to record), or baseline drift occurs (which we want to ignore). Baseline drift occurs due to the following factors:

Lighting Conditions. The DC level of the EOG signal varies with lighting conditions over long periods of time. When the light entering the eye changes from dark conditions to room lighting, Oster and Stern [50] state that it can take anywhere between 29-52 minutes for the measured potential to stabilise to within 10% of the baseline, and anywhere between 17-51 minutes when the transition is from room lighting to darkness.

Electrode Contact. The baseline may vary due to the spontaneous movement of ions between the skin and the electrode used to pick up the EOG voltage. The most commonly used electrode type is silver-silver chloride (Ag-AgCl). Large DC potentials of up to 50mV can develop across a pair of Ag-AgCl electrodes in the absence of any bioelectric event, due to differences in the properties of the two electrode surfaces with respect to the electrolytic conduction gel [58]. The extent of the ion movement is related to a number of variables including the state of the electrode gel used, variables in the subject's skin and the strength of the contact between the skin and the electrode. Proper preparation of the skin is necessary to maximise conduction between the skin and the conduction gel, usually by brushing the skin with alcohol to remove facial oils.


Artifacts due to EMG or Changes in Skin Potential. The baseline signal may change due to interference from other bioelectrical signals in the body, such as the electromyogram (EMG) or the skin potential. EMG activity arises from movement of the muscles close to the eyes, for example if the subject frowns or speaks. These signals may be effectively rejected by careful positioning of the electrodes and through low pass filtering the signal. Skin potential changes due to sweating or emotional anxiety pose a more serious problem.

Age and Sex. Oster and Stern [50] report that age and sex have a significant effect on baseline voltage levels, although this should not pose a problem if a system is calibrated initially for each particular user.

Diurnal Variations. The baseline potential possibly varies throughout the day.

Manual calibration is often used to compensate for DC drift - the subject shifts his gaze between points of known visual angle and the amplifier is balanced until one achieves the desired relationship between voltage output and degree of eye rotation. With frequent re-calibration, accuracies of up to 30 can be obtained [50]. While manual calibration may be acceptable practice in clinical tests that use the EOG, this restriction hinders the EOG from being used independently as a control and communication tool by people with disabilities. A technique called Target Position Variation is proposed here, which enables the user to automatically re-calibrate their EOG whenever significant baseline drift is perceived to have occurred.
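The relationship that manual calibration establishes between voltage output and eye rotation can be written as a straight line within the linear range of the EOG. The sketch below is a software analogue of that idea rather than the amplifier balancing procedure itself: it fits a gain and an offset from two fixation points of known visual angle. The function names and the example voltages are hypothetical, and linearity (gaze angles within roughly 30°) is assumed.

```python
def calibrate(v_a, angle_a_deg, v_b, angle_b_deg):
    """Fit voltage = gain * angle + offset from two known fixation points."""
    gain = (v_b - v_a) / (angle_b_deg - angle_a_deg)   # volts per degree
    offset = v_a - gain * angle_a_deg                  # voltage at 0 degrees (baseline)
    return gain, offset


def voltage_to_angle(voltage, gain, offset):
    """Invert the linear model to estimate the gaze angle in degrees."""
    return (voltage - offset) / gain


# Hypothetical readings: -0.10 V while fixating at -15 degrees, 0.20 V at +15 degrees.
gain, offset = calibrate(-0.10, -15.0, 0.20, 15.0)
estimated_angle = voltage_to_angle(0.15, gain, offset)
```

Baseline drift shows up in this model as a slow change in `offset`, which is why the fit has to be repeated whenever the drift becomes significant.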



Target Position Variation

Target Position Variation (TPV) was developed as part of the work presented here as a way of improving EOG based communication and control. There are two possible applications for TPV given here, although there may be others. These are in menu selection or in automatic eye position re-calibration. In TPV-based menu selection the user is presented with a screen containing a number of different menu items. An example menu is shown in Figure 4.9, where four menu options are given - Lights, Radio, Television and Fan. Underneath each of the menu options are icons which are each moving sinusoidally at a unique frequency. The user chooses one of the four options by looking at the appropriate icon and following its path of movement. This type of eye movement is a smooth pursuit movement and will generate an EOG similar to the one shown in Figure 4.7(a), where the EOG voltage value varies in synchronisation with the sinusoidal movement of the icon on screen. Since each of the icons is moving at a different frequency, spectral analysis of the EOG signal can be used to determine which icon is being tracked by the user. Thus the system can identify the required menu item. In the system shown in Figure 4.9, the program could then issue a command to an environmental control module to toggle the state of the appliance chosen. Phase differences could also be used instead of frequency differences to individually recognise each icon.

Figure 4.9: TPV based menu selection application. The user selects a menu option by tracking the moving icon below the desired item.

Target Position Variation can also be used for automatic eye position re-calibration in applications where eye gaze is used for continuous control. An example of such a system would be one where eye position is used to control the position of a mouse cursor on screen. In EOG based systems for mouse cursor control, the user may find that after a certain time the position of the cursor begins to drift away from their centre of vision, due to the onset of baseline drift. Once this occurs, re-calibration is necessary. Usually this requires another person to manually balance the amplifier to compensate for drift. TPV offers a means of automatic re-calibration without requiring another person present, by positioning an icon at some corner of the screen whose position varies sinusoidally. Each time the user finds that there is a significant error between their desired cursor position and the actual cursor, they move their eye gaze towards the oscillating icon at the corner of the screen. This will generate a sinusoidal wave in their EOG which can easily be detected. Since the position of the icon on screen is known, the system can calculate the baseline offset and thus re-calibrate the absolute eye position for subsequent mouse cursor movements.
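Both uses of TPV can be sketched in a few lines. The example below is an illustration under stated assumptions, not the implementation used in this work: it measures the EOG power at each candidate icon frequency with a single-bin discrete Fourier transform and picks the largest, and it estimates the baseline offset as the mean EOG level over a whole number of oscillation cycles minus the voltage expected at the icon's centre position. All function names, the synthetic signal and the expected centre voltage are hypothetical.

```python
import math


def tone_power(samples, freq_hz, fs_hz):
    """Squared magnitude of the mean-removed signal's DFT at a single frequency."""
    mean = sum(samples) / len(samples)
    re = 0.0
    im = 0.0
    for n, value in enumerate(samples):
        angle = 2.0 * math.pi * freq_hz * n / fs_hz
        re += (value - mean) * math.cos(angle)
        im -= (value - mean) * math.sin(angle)
    return re * re + im * im


def identify_icon(samples, icon_freqs_hz, fs_hz):
    """Index of the icon whose oscillation frequency dominates the EOG."""
    powers = [tone_power(samples, f, fs_hz) for f in icon_freqs_hz]
    return powers.index(max(powers))


def baseline_offset(samples, expected_centre_v):
    """Mean EOG level minus the voltage expected at the icon's centre position.

    Assumes the window spans a whole number of oscillation cycles, so the
    sinusoidal pursuit component averages to zero.
    """
    return sum(samples) / len(samples) - expected_centre_v


# Synthetic 10 s recording at 200 Hz: pursuit of a 0.8 Hz icon on a 0.3 V baseline.
FS_HZ = 200.0
synthetic_eog = [0.3 + 0.05 * math.sin(2.0 * math.pi * 0.8 * n / FS_HZ)
                 for n in range(2000)]
chosen = identify_icon(synthetic_eog, [0.2, 0.4, 0.8, 1.6], FS_HZ)
drift = baseline_offset(synthetic_eog, expected_centre_v=0.1)
```

A real recording would also contain noise and slow drift within the analysis window, which leaks into the lowest frequency bins, so some detrending or a detection threshold would probably be needed before the decision is trusted.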


Experimental Work

A number of different experiments were conducted with able-bodied subjects to assess the suitability of TPV for human-computer interfaces, and to determine suitable parameters that could be used to form a working system. Preliminary testing was performed to select the following features used for the subsequent experiments:

Target Shape. A suitable icon shape is one with a well defined point that the subject can be instructed to focus on. This is desirable if TPV is to be used to re-calibrate eye position. Since the position of the focal point is known, a more accurate estimate of absolute eye position can be made if it can be assumed that the subject is focusing on this point. Two candidate shapes are shown in Figure 4.10. Each has a clearly defined point in the centre of the icon. The shape in Figure 4.10(a) was arbitrarily chosen for the experiments described below.


Figure 4.10: TPV candidate target shapes. (a) Square. (b) Diamond.

Target Size. If the icon is too small it will be difficult to follow; if the icon is too large the user's gaze may drift away from the centre of the icon. An icon width of 60 pixels was chosen. The screen width used for the experiments was 32cm and the horizontal pixel width was 1024, so this corresponds to an actual width of 1.875cm.

Oscillation Pattern. Two target oscillation patterns were initially tested - a triangular wave pattern, where the magnitude of the speed of the icon's motion is constant, and a sinusoidally varying pattern. Although it was observed that both patterns were reflected equally well in the recorded EOG, the sinusoidally varying pattern was chosen as it was deemed more easily recognisable spectrally. A sinusoidal pattern will correspond to a single peak in the frequency spectrum, whereas a triangular wave will also contain odd-numbered harmonics which may complicate the detection process and also might introduce the constraint that two different icons could not have frequencies that are odd-numbered multiples of each other.

Oscillation Direction. The maximum variation obtainable in the EOG when the eye is in smooth pursuit of an object occurs when the object is moving along the line of the EOG recording. Thus to obtain this maximum variation, there were two possibilities available - either the target could be made to oscillate vertically and the vertical EOG recorded, or the target could be made to oscillate horizontally and the horizontal EOG recorded. Since the vertical EOG may introduce artifacts due to blinking, the horizontal EOG was chosen for experiments and thus a horizontal oscillation direction was used.

Procedure

Gold cup electrodes filled with an electrolyte gel were used. Since the horizontal EOG was to be measured, the active and reference electrodes were fixed to the subject's temples, beside the two eyes' outer canthi, the junction where the upper and lower eyelids meet. The ground electrode was fixed to the subject's right earlobe. A custom-built DC coupled EOG amplifier with an approximate voltage gain of 1000 was used. The design is based on a classic instrumentation amplifier topology [59] that requires a single quad op-amp, along with a handful of resistors and capacitors, which need not have particularly small tolerances. The LTC1053 chopper-stabilised quad op-amp was chosen for its high input impedance and low input offset voltage. Two PCs were used for these experiments, a display PC to display the moving icons and a recording PC to record the EOG data. Data was acquired by the recording PC using a National Instruments NIDAQ PCI 6023E card. The sampling frequency was 200Hz. Two channels were acquired, one for the EOG and one to record synchronisation pulses from the corner of the display PC's screen, via a phototransistor. These pulses allow the recorded EOG data to be accurately related to the oscillation of each icon's position. The data was acquired using MATLAB's Real Time Windows Target in the Real Time Workshop.

The user was positioned opposite the centre of the display PC screen, 50cm from the screen. Since the screen width was 32cm, if the visual angle is taken to be 0° when the user is focusing on the centre of the screen, then the maximum visual angle through which the user's eye position will extend while still looking at the screen is approximately 17°. This is within the linear range of EOG measurement, which is approximately 30°, so the EOG amplitude will be directly proportional to the angle of gaze. The experimental procedure consisted of two separate parts.

Experiment 1

This part of the experiment consisted of 12 separate EOG recordings, each lasting 20s. For each recording, a single target was presented moving with a horizontal sinusoidal oscillation centred at the middle of the screen at position (x_0, y_0). A different combination of amplitude, A, and frequency, f, was used for each of the recordings. The horizontal position for the icon at any moment in time, x_n, may be calculated as:

\[ x_n = x_0 + A \sin(2\pi f t) \tag{4.1} \]

Three amplitudes of oscillation were used and for each amplitude setting, four different frequencies were tested. The three amplitudes of oscillation used were 25, 50 and 100 pixels from the centre in both directions, corresponding to a maximum visual angle from the centre line of 0.89°, 1.79° and 3.59°. The four frequencies of oscillation used were 0.2Hz, 0.4Hz, 0.8Hz and 1.6Hz. For each icon movement, the subject was instructed to keep his or her head still and follow the position of the icon on screen. The data was recorded over the 20s period for each icon movement. The twelve graphs obtained over each 20s period for one subject are shown in Figure 4.11. Note that the y-axis extends over a range of 0.25V for each of the twelve graphs, although each may have a different DC offset.
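The icon trajectory of equation (4.1) and the conversion from pixel amplitude to visual angle can be checked with a short sketch. The screen width (32cm over 1024 pixels) and the 50cm viewing distance are taken from the procedure described above; the function and constant names are assumptions, as is the centre position of 500 pixels used in the example.

```python
import math

SCREEN_WIDTH_CM = 32.0
SCREEN_WIDTH_PX = 1024
VIEW_DISTANCE_CM = 50.0


def icon_x(t, x0, amplitude_px, freq_hz):
    """Horizontal icon position in pixels at time t, following equation (4.1)."""
    return x0 + amplitude_px * math.sin(2.0 * math.pi * freq_hz * t)


def pixels_to_degrees(offset_px):
    """Visual angle subtended by a horizontal offset from the screen centre."""
    offset_cm = offset_px * SCREEN_WIDTH_CM / SCREEN_WIDTH_PX
    return math.degrees(math.atan(offset_cm / VIEW_DISTANCE_CM))


# Visual angles for the three oscillation amplitudes used in Experiment 1.
angles_deg = [pixels_to_degrees(a) for a in (25, 50, 100)]

# Position of a 0.4 Hz, 100 pixel icon a quarter of a period after the start.
quarter_period_x = icon_x(0.625, x0=500, amplitude_px=100, freq_hz=0.4)
```

Under these assumptions the three amplitudes evaluate to roughly 0.9°, 1.8° and 3.6°, and `pixels_to_degrees(512)` gives about 17.7° for the edge of the screen, in line with the values quoted in the text.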









































Figure 4.11: EOG Recordings of One Subject for TPV: Experiment 1. Panels: (a) A=25, f=0.2Hz; (b) A=50, f=0.2Hz; (c) A=100, f=0.2Hz; (d) A=25, f=0.4Hz; (e) A=50, f=0.4Hz; (f) A=100, f=0.4Hz; (g) A=25, f=0.8Hz; (h) A=50, f=0.8Hz; (i) A=100, f=0.8Hz; (j) A=25, f=1.6Hz; (k) A=50, f=1.6Hz; (l) A=100, f=1.6Hz.


Visual inspection of the graphs shown in Figure 4.11 reveals the following observations:

1. The phenomenon of baseline drift is evident, especially in Figures 4.11(a), 4.11(h), 4.11(k) and 4.11(i).

2. The EOG recordings for A=25, which corresponds to movement through an angle of 0.89°, are strongly contaminated by noise. This is especially evident for f=0.8Hz and f=1.6Hz, where the oscillation is barely discernible from the background noise. Low pass filtering could be used to reduce the effect of noise in the signal, but this may introduce a phase lag which would not be desirable if phase difference was chosen as the parameter used to recognise which icon is being followed.

3. The EOG for A=50 and A=100, which correspond to movement through an angle of 1.79° and 3.59°, both look promising; the oscillation is clearly visible for each.

4. The four different frequencies (0.2Hz, 0.4Hz, 0.8Hz and 1.6Hz) appear to all give satisfactory results, although the subjects reported that it was beginning to become difficult to follow the icon when it was moving through an amplitude of A=100 (3.58°) at 1.6Hz. The average eye speed, v_e, required to pursue a target moving with this frequency at this amplitude is:

\[ v_e = (4 \times 3.59)\,(^\circ/\text{cycle}) \times 1.6\,(\text{cycles/s}) \approx 23\,^\circ/\text{s} \tag{4.2} \]

This is approaching the limit of eye movement for smooth pursuit, which is around 30°/s.

5. For f=0.2Hz, it takes 5s for one cycle of the target oscillation to be completed in the EOG. Depending on the method used to recognise an oscillation, this may introduce a significant delay between when the user begins to follow the icon and when the system begins to recognise this pattern and respond to this action.
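The arithmetic behind equation (4.2) can be reproduced directly: over one cycle the target sweeps four times its amplitude, so the mean speed is 4Af. The sketch below also computes the peak speed of a sinusoidal target, 2πAf, which is an additional quantity not used in the analysis above and is included only for comparison. Function names are hypothetical and the angle is assumed to vary sinusoidally, which is a reasonable approximation for such small angles.

```python
import math


def mean_pursuit_speed(amplitude_deg, freq_hz):
    """Average speed in degrees/s: the eye covers 4 * A degrees per cycle."""
    return 4.0 * amplitude_deg * freq_hz


def peak_pursuit_speed(amplitude_deg, freq_hz):
    """Maximum speed in degrees/s of a sinusoid A * sin(2 * pi * f * t)."""
    return 2.0 * math.pi * amplitude_deg * freq_hz


# Values from observation 4: amplitude 3.59 degrees at 1.6 Hz.
mean_speed = mean_pursuit_speed(3.59, 1.6)
peak_speed = peak_pursuit_speed(3.59, 1.6)
```

For these values the mean speed is just under 23°/s while the peak speed is a little over 36°/s, so the target briefly exceeds the approximate 30°/s smooth pursuit limit near the centre of each sweep, which may be related to the difficulty the subjects reported.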

Table 4.1: Parameters of the four icons for each of the four parts of Experiment 2

        Fixed                 Expt 2A         Expt 2B         Expt 2C         Expt 2D
Icon    x0    y0    Colour    φ      f        φ      f        φ      f        φ      f
I1      150   350   red       0°     0.4Hz    0°     0.8Hz    0°     1.6Hz    0°     0.2Hz
I2      850   350   green     90°    0.4Hz    90°    0.8Hz    90°    1.6Hz    0°     0.4Hz
I3      500   150   blue      180°   0.4Hz    180°   0.8Hz    180°   1.6Hz    0°     0.8Hz
I4      500   550   black     270°   0.4Hz    270°   0.8Hz    270°   1.6Hz    0°     1.6Hz

These experiments were repeated on a number of different subjects and similar results were obtained from each subject. Based on these results, A=100 was used for all oscillations in the next experiment.

Experiment 2

In this experiment, 4 icons, which can be labelled I1, I2, I3 and I4, were presented to the subject. Each icon oscillates with an amplitude of 100 pixels. The experiment consisted of four parts. In Part A, all four icons oscillate at 0.4Hz, but each with a different phase (I2, I3 and I4 are 90°, 180° and 270° out of phase with I1). For Part B and Part C of the experiment, the frequency of all four icons was changed to 0.8Hz and 1.6Hz respectively and the phase differences remained the same. In Part D, all four icons oscillate at different frequencies. The horizontal position for each of the four icons at any moment in time, x_n, may be calculated as:

\[ x_n = x_0 + 100 \sin(2\pi f_n t + \phi_n) \tag{4.3} \]

The individual values of $f$ and $\phi$ used to calculate $x_1$, $x_2$, $x_3$ and $x_4$ for each of the four parts of the experiment are summarised in Table 4.1. The screen presented to the subject is shown in Figure 4.12. It consists of a box in the centre of the screen, the control box, surrounded by four moving icons. A box in the bottom left-hand corner of the screen is used for synchronisation.

Before the experiment starts, the experimental procedure is described to the subject. The subject should look at the control box at the centre of the screen, which will initially be white. Once the control box changes colour, the subject should move their gaze to the icon with the colour corresponding to the colour of the control box. The subject should follow the moving icon until they can see in their peripheral vision that the control box colour has returned to white. The subject should then move their gaze back onto the control box until it changes colour again. The subject should familiarise himself or herself with the location of the four coloured icons before the experiment starts.

The duration of each of the four experiments was 45 seconds and the centre box colour in each followed the same sequence, which is shown in Table 4.2. The resulting EOG for one subject for each of the four 45s periods is shown in Figure 4.13. As the red and green icons are to the left and right of the centre box, a vertical offset in the EOG can be seen for the periods 5-10s and 15-20s, where the user moves their eyes to follow those icons. The blue and black icons are moving directly above and below the centre box and hence do not cause a change in the DC offset when the user moves their eyes to look at these icons.

Table 4.2: Sequence in which the subject follows each of the icons in Experiment 2D.

Time      Control box colour   Action
0-5s      White
5-10s     Red                  subject should follow I1
10-15s    White
15-20s    Green                subject should follow I2
20-25s    White
25-30s    Blue                 subject should follow I3
30-35s    White
35-40s    Black                subject should follow I4
40-45s    White


Figure 4.12: TPV Experiment 2: Screenshot of the scene presented to the subject.


TPV Based Menu Selection

Different signal processing techniques were examined to identify a robust method of determining, from the recorded data, which icon the subject is looking at. The method developed here is named target identification. The aim was that the method identified could be used in a system for menu selection that would contain K different icons, each moving at a different frequency. The method should enable automatic determination of which icon, if any, the subject is looking at and hence enable the program to select the menu option corresponding to that particular icon. The initial impulse might be to perform a full N-point FFT on a chunk of the recorded EOG data, from some previous sample n − N to the current sample n. This spectrum would then be searched to find the frequency component with the maximum power, and the conclusion drawn that the icon the user is looking at is the one with the frequency closest to this maximum. However, this method is extremely computationally inefficient, and


Figure 4.13: TPV Experiment 2: (a) Expt 2A, (b) Expt 2B, (c) Expt 2C, (d) Expt 2D. The x-axis shows number of samples and the y-axis shows Volts (gain = 1000). In Expt 2A, 2B and 2C the frequencies of the icons are fixed at 0.4Hz, 0.8Hz and 1.6Hz respectively and each icon moves 90° out of phase with the last. In Expt 2D the icons move with frequencies 0.2Hz, 0.4Hz, 0.8Hz and 1.6Hz.


also introduces errors if there is some spurious frequency peak due to noise. In reality, the only frequency components that need to be examined are those at the frequencies of the icons. If the power at any of these components is greater than a threshold, then the user is assumed to be following the icon with the frequency in question; if not, no icon is being followed. Instead of performing an FFT computation, which calculates the power in the spectrum at every frequency interval, it is only necessary to calculate the components of the Fourier Series corresponding to each of the K frequencies.

The EOG signal $s(t)$ is sampled by the computer at a fixed sampling rate, $f_s$. The sampling interval is $\Delta t = f_s^{-1}$. The notation used to represent the sampled EOG signal is $s[n] = [s_i, s_{i-1}, s_{i-2}, \ldots, s_1, s_0]$, where $s_0$ is the first sample, sampled at $t = 0$s ($n = 0$), and $s_i$ is the current sample, sampled at $t = i\Delta t$ ($n = i$). There are K Fourier Series component coefficients to be calculated at each sampling instant $i$, which we can number with an index $k$, $0 < k \le K$. The current coefficient at sampling instant $i$ corresponding to the particular icon frequency, $f_k$, with period $T_k$ (counted in samples), is notated as $c_{ik}$. This coefficient can be calculated using the last $T_k$ samples of recorded data, $s_{i-T_k+1}, \ldots, s_i$:

$$c_{ik} = \frac{2}{T_k} \sum_{m=i-T_k+1}^{i} s_m \exp(j 2\pi f_k m \Delta t) \tag{4.4}$$

Based on this, a candidate reconstruction signal $r_{ik}[n]$, consisting of $T_k$ elements, can be calculated at each of the K frequencies $f_k$:

$$r_{ik}[n] = \mathrm{Re}(c_{ik}) \cos(2\pi f_k T[n]) + \mathrm{Im}(c_{ik}) \sin(2\pi f_k T[n]) \tag{4.5}$$

where

$$r_{ik}[n] = [r_{i-T_k+1}, r_{i-T_k+2}, \ldots, r_{i-1}, r_i] \tag{4.6}$$

and

$$T[n] = [\Delta t(i-T_k+1), \Delta t(i-T_k+2), \ldots, \Delta t(i-1), \Delta t(i)] \tag{4.7}$$

If the reconstructed signal over the samples $[i - T_k + 1, i]$ at any of the K frequencies is a close fit to the recorded EOG data over the preceding $T_k$ samples, then it can be assumed that the user is tracking an icon moving with that frequency. The closeness of fit is quantified as follows. The recorded EOG will have a DC offset, which needs to be subtracted from the original signal before it can be compared to the reconstructed signal. The signal average, $\bar{c}_{ik}$, was calculated in each case over the previous $T_k$ samples:

$$\bar{c}_{ik} = \frac{1}{T_k} \sum_{m=i-T_k+1}^{i} s_m \tag{4.8}$$

The error value, $e_{ik}$, between each of the K reconstructed signals and the actual recorded signal over the previous $T_k$ samples is calculated for each of the candidate frequencies as follows:

$$e_{ik} = \sum_{m=i-T_k+1}^{i} |s_m - \bar{c}_{ik} - r_m| \tag{4.9}$$

The fit value for each frequency, $FV_{ik}$, is defined as:

$$FV_{ik} = \frac{|c_{ik}|}{e_{ik}} \tag{4.10}$$
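The fit value calculation of Equations 4.4 to 4.10 can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the experiments: the function name, the 200Hz sampling rate and the synthetic test signal are all assumptions, and the period $T_k$ is assumed to be a whole number of samples.

```python
import cmath
import math


def fit_value(samples, i, f_k, fs):
    """Fit value FV_ik (Eq. 4.10) at sample index i for candidate frequency f_k in Hz."""
    dt = 1.0 / fs
    T_k = round(fs / f_k)                  # samples per period (assumed to be an integer)
    window = range(i - T_k + 1, i + 1)     # the last T_k samples, ending at i

    # Eq. 4.4: one Fourier Series coefficient instead of a full FFT.
    c = (2.0 / T_k) * sum(
        samples[m] * cmath.exp(2j * math.pi * f_k * m * dt) for m in window
    )
    # Eq. 4.8: DC offset over the same window.
    mean = sum(samples[m] for m in window) / T_k

    # Eqs. 4.5 and 4.9: rebuild the candidate sinusoid and sum the absolute error.
    error = 0.0
    for m in window:
        angle = 2.0 * math.pi * f_k * m * dt
        r_m = c.real * math.cos(angle) + c.imag * math.sin(angle)
        error += abs(samples[m] - mean - r_m)
    return abs(c) / error if error > 0.0 else float("inf")


# Synthetic test signal (hypothetical values): 5 s at 200 Hz, a 0.8 Hz pursuit
# component on a DC offset, plus a small deterministic disturbance.
FS = 200.0
signal = [
    0.1
    + 0.05 * math.sin(2.0 * math.pi * 0.8 * n / FS)
    + 0.0005 * math.sin(2.0 * math.pi * 7.3 * n / FS)
    for n in range(1000)
]

candidates = [0.2, 0.4, 0.8, 1.6]
i = len(signal) - 1
fit_values = {f: fit_value(signal, i, f, FS) for f in candidates}
best = max(fit_values, key=fit_values.get)
print(best, fit_values)
```

Only K coefficients are computed per sample instant, one per icon frequency, which is the saving over a full FFT that the text describes.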

At each point i, the user is assumed to be tracking an icon moving at frequency $f_k$ if the fit value $FV_{ik} > FV_{threshold}$. The data from Experiment 2D was used to calculate the fit value at each sample interval for each of the four frequencies, $f_1$ = 0.2Hz, $f_2$ = 0.4Hz, $f_3$ = 0.8Hz and $f_4$ = 1.6Hz, called $FV_{0.2}[n]$, $FV_{0.4}[n]$, $FV_{0.8}[n]$ and $FV_{1.6}[n]$. There are five decision choices that can be made by a system at each time interval for the data from Experiment 2D: either the user is looking at one of the four icons moving sinusoidally with frequencies of 0.2Hz, 0.4Hz, 0.8Hz and 1.6Hz, or the user is not looking at any of the four icons. The fit value at each sample is plotted for each of the four frequencies in Figure 4.14. Note that in each case the fit value is only calculated once $T_k$ samples have elapsed. Each of the fit functions shows a peak when the user is tracking the correct icon, but the peaks do not all reach the same amplitude; therefore different threshold values should be used for each of the four fit functions. Also, the function in Figure 4.14(b), $FV_{0.2}$, only peaks at the end of the oscillation. Since in this experiment the user was only tracking each icon for 5s, the function does not work very well to identify the icon at lower frequencies. This is because it takes 5s before one period of oscillation of the icon moving at 0.2Hz occurs. In the figure shown, the



Figure 4.14: (a) Original EOG recording (b) Fit values for f=0.2Hz (c) Fit values for f=0.4Hz (d) Fit values for f=0.8Hz (e) Fit values for f=1.6Hz

user is tracking the 0.2Hz icon from approximately t = 9.5s to t = 14.5s, but a noticeable peak in the function $FV_{0.2}$ can only be seen to occur around t = 14s, i.e. once the reconstructed signal covers one period of recorded data at 0.2Hz. Quicker recognition times for more slowly moving icons could perhaps be achieved by modifying the definition of the fit value function at lower frequencies, for example by only attempting to calculate a reconstructed signal that is a fraction of a period long. The best fit is for the f=0.8Hz or f=1.6Hz data. A working system using this method could calculate each of the four fit functions at regular intervals and compare each one to a suitable threshold. When one of the four fit functions crosses its threshold, the system recognises that the user is following the icon with the frequency in question, and performs the appropriate action. The theory of target position variation, the experimental work presented here and the method of target identification formed the basis of a paper presented at the conference PGBIOMED05 in Reading [60].


Limitations of Eyetracking for Cursor Control

Mouse cursor control has been suggested here as a suitable application for eyegaze tracking. However, this method does have some difficulties. The use of eye movements as a substitute for a manually controlled mouse to navigate a pointing cursor on a computer screen initially seems to be an ideal application for eye movements, but this proposal is not as straightforward to implement as it may at first appear, for a number of reasons. Firstly, even when a user perceives that they are looking steadily at a single object, the eye may make slight continuous jerks. One reason for this eye jerking is microsaccades, which occur to ensure that individual photoreceptor cells of the retina are continuously stimulated [61]. Eye jerks may also occur due to the eye drifting around various points of an icon on screen. These jerks may cause problems when using eye tracking for cursor control, since it will be hard to keep the pointer fixed on a target. The system needs to recognise such situations and respond as if the subject were looking steadily at the desired target. Just and Carpenter [62] have developed an approach which attempts to overcome this problem. If the user makes several fixations around an object on screen, connected by small saccades, then the motion is grouped together into a single gaze at the object. The second problem with using eye movements continually to control a pointing cursor or other device is what is described by Jacob [63] as the Midas Touch problem. This is described as follows:

At first, it is empowering to be able simply to look at what you want and have it happen, rather than having to look at it (as you would anyway) and then point and click it with the mouse. Before long, though, it becomes like the Midas Touch. Everywhere you look, another command is activated; you cannot look anywhere without issuing a command.



A Model of the Eye

The investigation of eye movements for communication and control purposes motivated an in-depth study of the processes which cause horizontal and vertical rotation of the eyes. The effects of the eye's muscle spindle on the eyeball's torque were also considered. Based on this study, a feedback model of the eye has been developed. This system models rotation of the eye in either the horizontal or the vertical plane. In either plane, rotation of the eye is controlled by a pair of agonist-antagonist extraocular muscles. Contraction of one muscle rotates the eye in the positive direction, and of the other in the negative direction. In the model presented here, these two muscles are condensed into a single equivalent bidirectional muscle, which can rotate the eyeball in either direction. The influence of the muscle spindle is incorporated into the model in an inner feedback loop. The model was developed initially to model saccadic eye movements and it attempts to predict the eye response to a saccade, such as the 15° jump shown in Figure 4.7(b). The original model was later modified to include prediction of smooth pursuit movements. The model of the eye for saccadic eye movements will be described first. The structure of the control loop for this model is shown in Figure 4.15. The model shown consists of two feedback loops. The outer loop consists of a non-linear element in cascade with a controller in the forward path. The inner loop, containing $C_1(s)$ and G(s), models the effect of the muscle spindle on the eye, which acts to speed up the response of the eye.

Inner Loop of Model

The eyeball muscle torque is controlled by the muscle spindle. The inner feedback loop, shown in Figure 4.15, represents the spindle feedback mechanism. The muscle spindle essentially senses the error, $e_l$, between $r_l$, the locally generated reference value for $\theta$, and the current value of $\theta$, and uses it to generate the gross rotational torque, $T_g$, on the eyeball. Drawing on Stark's work [64] on control of the hand, the transfer function relating $e_l(s)$ and $T_g(s)$ is taken to be of the form

$$\frac{T_g(s)}{e_l(s)} = C_1(s) = \frac{f_1 s + f_0}{s^2 + h_1 s + h_0} \tag{4.11}$$

G(s), the transfer function of the eye dynamics, can be calculated using the model proposed by Westheimer [65], which gives the following equation relating the eye position, $\theta$, to the net torque, $T_n$, applied to the eyeball:

$$J \frac{d^2\theta}{dt^2} = T_n - F \frac{d\theta}{dt} - K\theta \tag{4.12}$$

The gross rotational torque falls off with the velocity of contraction due to the effects of friction, usually nonlinearly. However, in the interests of obtaining a linear dynamic model, the fall-off is approximated as being linearly related to eye velocity. Thus the net torque applied to the eyeball, $T_n$, can be related to the friction factor, $f$, and the gross torque, $T_g$:

$$T_n = T_g - f \frac{d\theta}{dt} \tag{4.13}$$

Substituting Equation 4.13 into Equation 4.12 and taking the Laplace transform with zero initial conditions gives the transfer function G(s) of the eye dynamics:

$$G(s) = \frac{\theta(s)}{T_g(s)} = \frac{\frac{1}{J}}{s^2 + \frac{f + F}{J}s + \frac{K}{J}} \tag{4.14}$$

where, according to Westheimer [65], the values of the constants in SI units are:

$$J = 2.2 \times 10^{-3}, \qquad \frac{F}{J} = 168, \qquad \frac{K}{J} = 14400 = 120^2 \tag{4.15}$$

Figure 4.15: Proposed feedback control structure for the eye.

The standard form of the transfer function of a second-order linear system is

$$\frac{G_{dc}\,\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{4.16}$$

where $\omega_n$ = undamped natural frequency, $G_{dc}$ = DC gain of the system and $\zeta$ = damping ratio. Comparing this to the transfer function G(s) with no friction added ($f = 0$),

$$G(s) = \frac{454.55}{s^2 + 168s + (120)^2} \tag{4.17}$$

gives a damping ratio $\zeta = 0.7$ and undamped natural frequency $\omega_n = 120$ rad/s. The literature gives no guidance as to what value to assign to $\frac{f}{J}$. This model proposes that the value of $f$ is such that it increases the damping ratio to $\zeta = 1$, the value at which critical damping occurs, i.e. $\frac{f}{J} = 72$, giving

$$G(s) = \frac{454.55}{s^2 + 240s + (120)^2} = \frac{454.55}{(s + 120)^2} \tag{4.18}$$
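The numbers quoted above follow directly from matching Equation 4.14 to the standard form. The short sketch below reproduces them; the function and variable names are hypothetical.

```python
import math

J = 2.2e-3           # moment of inertia (Eq. 4.15)
F_OVER_J = 168.0     # F / J
K_OVER_J = 14400.0   # K / J


def second_order_params(f_over_j: float):
    """Return (omega_n, zeta, dc_gain) of G(s) in Eq. 4.14 for a given f/J."""
    omega_n = math.sqrt(K_OVER_J)
    zeta = (f_over_j + F_OVER_J) / (2.0 * omega_n)
    dc_gain = (1.0 / J) / K_OVER_J
    return omega_n, zeta, dc_gain


# Friction term needed for critical damping (zeta = 1): f/J = 2*omega_n - F/J.
f_over_j_critical = 2.0 * math.sqrt(K_OVER_J) - F_OVER_J

print(second_order_params(0.0))
print(f_over_j_critical, second_order_params(f_over_j_critical))
```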


$G_i(s)$, the overall transfer function of the inner loop relating the eye position $\theta$ to the local reference value $r_l$, can then be calculated as

$$G_i(s) = \frac{C_1(s)G(s)}{1 + C_1(s)G(s)} = \frac{\frac{1}{J}(f_1 s + f_0)}{(s^2 + h_1 s + h_0)(s + 120)^2 + \frac{1}{J}(f_1 s + f_0)} \tag{4.19}$$

The denominator here is the characteristic polynomial of the inner loop, whose roots determine the nature of its dynamics. From Equation 4.18, it can be seen that the roots of the characteristic polynomial of the transfer function for the eyeball dynamics alone, without the influence of the muscle spindle, would be at $s = -120$. Assuming that the function of the muscle spindle is to speed up the eyeball response, the roots of the overall characteristic polynomial are lower (more negative) than this value. This model proposes to place all four roots at the same location, i.e. at $s = -120b$, where $b > 1$. This gives

$$(s + 120b)^4 = (s^2 + h_1 s + h_0)(s + 120)^2 + \frac{1}{J}(f_1 s + f_0) \tag{4.20}$$

Multiplying out and comparing coefficients gives the following values:

$$\begin{aligned} h_1 &= (4b - 2)(120) \\ h_0 &= (6b^2 - 8b + 3)(120)^2 \\ f_1 &= J(4b^3 - 12b^2 + 12b - 4)(120)^3 \\ f_0 &= J(b^4 - 6b^2 + 8b - 3)(120)^4 \end{aligned} \tag{4.21}$$

Comparison with the recorded data of Figure 4.7(b) seems to indicate that a value of $b = 2$, in conjunction with tuning of the integral controller $C_2(s)$ in the outer loop, gives a very good match to the real response. Substituting these values into Equation 4.11 gives the expression

$$C_1(s) = \frac{15206.4s + 2280960}{s^2 + 720s + 158400} \tag{4.22}$$

$$= \frac{(15206.4)(s + 150)}{(s + 360 - j169.7056)(s + 360 + j169.7056)} \tag{4.23}$$
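The coefficient matching in Equations 4.20 and 4.21 can be checked numerically. The sketch below, with hypothetical helper names, evaluates the four controller coefficients for $b = 2$ and confirms that the two sides of Equation 4.20 agree term by term.

```python
J = 2.2e-3
A = 120.0   # location of the double pole of the critically damped eyeball dynamics


def spindle_coefficients(b: float):
    """Controller coefficients (h1, h0, f1, f0) from Eq. 4.21."""
    h1 = (4 * b - 2) * A
    h0 = (6 * b**2 - 8 * b + 3) * A**2
    f1 = J * (4 * b**3 - 12 * b**2 + 12 * b - 4) * A**3
    f0 = J * (b**4 - 6 * b**2 + 8 * b - 3) * A**4
    return h1, h0, f1, f0


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for k, c in enumerate(q):
            out[i + k] += a * c
    return out


b = 2.0
h1, h0, f1, f0 = spindle_coefficients(b)

# Right-hand side of Eq. 4.20.
rhs = poly_mul([1.0, h1, h0], poly_mul([1.0, A], [1.0, A]))
rhs[-2] += f1 / J
rhs[-1] += f0 / J

# Left-hand side of Eq. 4.20: (s + 120 b)^4.
lhs = [1.0]
for _ in range(4):
    lhs = poly_mul(lhs, [1.0, A * b])

print(h1, h0, f1, f0)
```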

As shown by Cogan and de Paor [66], assigning the four roots of the characteristic polynomial to the same location gives the system the interesting property of optimum stability. If all controller parameters but one are held at their nominal values then, as that one is varied through its nominal value, the right-most root is as deep in the left half plane as possible. The four parameters of this controller are $f_1$, $f_0$, $h_1$ and $h_0$. To demonstrate the principle of optimum stability, three of these parameters are held at their nominal values in turn and the fourth is varied. The root loci for each of the four cases are included in Appendix D. It can be seen that in each case, as the varied parameter passes through its nominal value, the system is optimally stable in the sense that its right-most eigenvalue is as deep in the left half plane as possible.

The effect of the muscle spindle in speeding up the eyeball response can be observed by comparing the unit step responses of the overall transfer function of the inner loop, $G_i(s) = \frac{G(s)C_1(s)}{1 + G(s)C_1(s)}$, and the transfer function for the eyeball dynamics alone, G(s). Note that these two transfer functions have different static gains.

$$G_i(s) = \frac{G(s)C_1(s)}{1 + G(s)C_1(s)} = \frac{454.55(16896s + 2304000)}{(s + 120)^2(s^2 + 800s + 160000) + 454.55(16896s + 2304000)} \tag{4.24}$$

$$G_i(0) = \frac{454.55(2304000)}{(120)^2(160000) + 454.55(2304000)} = 0.3125 \tag{4.25}$$

and

$$G(0) = \frac{454.55}{(120)^2} = 0.0316 \tag{4.26}$$
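A rough way to compare the two step responses without Simulink is to integrate both transfer functions numerically. The sketch below is an illustration only: it uses simple forward-Euler integration of a controllable canonical state-space form, and the helper names, the step size and the 95% rise criterion are assumptions rather than anything taken from the original simulations.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for a_idx, a in enumerate(p):
        for b_idx, b in enumerate(q):
            out[a_idx + b_idx] += a * b
    return out


def step_response(num, den, t_end=0.12, dt=1e-5):
    """Unit step response of num(s)/den(s), den monic, by forward-Euler integration
    of the controllable canonical state-space form."""
    n = len(den) - 1
    a = den[1:]                                  # a_{n-1}, ..., a_0
    b = [0.0] * (n - len(num)) + list(num)       # numerator padded to length n
    x = [0.0] * n                                # x[0] = z^(n-1), ..., x[n-1] = z
    ts, ys = [], []
    for k in range(int(round(t_end / dt)) + 1):
        ts.append(k * dt)
        ys.append(sum(bi * xi for bi, xi in zip(b, x)))
        highest = 1.0 - sum(ai * xi for ai, xi in zip(a, x))   # unit step input
        x = [x[0] + dt * highest] + [x[j] + dt * x[j - 1] for j in range(1, n)]
    return ts, ys


def time_to_fraction(ts, ys, fraction=0.95):
    """First time at which the response reaches the given fraction of its final value."""
    target = fraction * ys[-1]
    return next(t for t, y in zip(ts, ys) if y >= target)


GAIN = 454.55
g_den = poly_mul([1.0, 120.0], [1.0, 120.0])      # (s + 120)^2
n_loop = [GAIN * 16896.0, GAIN * 2304000.0]       # numerator of G(s)C1(s)
gi_den = poly_mul(g_den, [1.0, 800.0, 160000.0])  # first term of the Eq. 4.24 denominator
gi_den[-2] += n_loop[0]
gi_den[-1] += n_loop[1]

t_g, y_g = step_response([GAIN], g_den)
t_gi, y_gi = step_response(n_loop, gi_den)

print("final values:", y_g[-1], y_gi[-1])
print("95% times:", time_to_fraction(t_g, y_g), time_to_fraction(t_gi, y_gi))
```

The final values should approach the static gains of Equations 4.25 and 4.26, and the inner loop should reach 95% of its final value sooner than the eyeball dynamics alone. The exact times depend on the settling criterion chosen, so they need not match the figures quoted in the text.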


For comparison purposes G(0) is scaled to have the same value as $G_i(0)$. The two unit step responses are shown in Figure 4.16. The effect of the muscle spindle in speeding up the step response is evident. With the influence of the muscle spindle, the system reaches a steady state value of 0.3125 after 0.0267s, compared to 0.0701s without the muscle spindle included. As seen in Equation 4.23, the poles of the transfer function $C_1(s)$ are complex. However, work by Stark [64] strongly suggests that both poles should be real. This criterion can be demonstrated by considering the muscle spindle model in Figure 4.17. The firing rate of the muscle spindle, $f$, is controlled by $x_2$, the stretched length of the second spring in the model. B(s) is a biochemical length transducer with a transfer function of the form $\frac{k}{1 + s\tau}$, basically a low pass filter with gain $k$ and time constant $\tau$. The total stretch on the central part of the muscle spindle, which is called the nuclear bag, is the sum of the two individual spring stretches, $x_1 + x_2$.


Figure 4.16: Step response of the eye. The transfer function $G_i(s)$ includes the influence of the muscle spindle and the transfer function G(s) is of the eyeball dynamics alone, without this influence. The effect of the muscle spindle is to speed up the eyeball response.

Figure 4.17: Model of the nuclear bag, the central part of the muscle spindle. $\kappa_1$, $\kappa_2$ = spring stiffness; $x_1$, $x_2$ = stretch of springs; f = firing rate of spindle on the spindle nerve; D = damper.


In this model, $\frac{F(s)}{X_1(s) + X_2(s)}$ is equivalent to the muscle spindle controller described above, $C_1(s)$, so an expression for this must be found that will demonstrate that the poles of the transfer function are real. From examination of Figure 4.17, an expression for F(s) in terms of $X_2(s)$ can be directly written down:

$$F(s) = \frac{k}{1 + s\tau} X_2(s) \quad\Rightarrow\quad X_2(s) = \frac{1 + s\tau}{k} F(s) \tag{4.27}$$

The relationship between $X_2(s)$ and $X_1(s)$ can be found by balancing the forces at the point P, and taking the Laplace transform with zero initial conditions:

$$\kappa_1 x_1 + D\frac{dx_1}{dt} = \kappa_2 x_2$$

$$\kappa_1 X_1(s) + DsX_1(s) = \kappa_2 X_2(s) \tag{4.28}$$

$$X_1(s) = \frac{\kappa_2}{\kappa_1 + Ds} X_2(s) \tag{4.29}$$

Using this equation and Equation 4.27, an expression for $X_1(s) + X_2(s)$ in terms of F(s) can be found:

$$X_1(s) + X_2(s) = \left(\frac{\kappa_2}{\kappa_1 + Ds} + 1\right) X_2(s) = \frac{\kappa_1 + \kappa_2 + Ds}{\kappa_1 + Ds} X_2(s) = \frac{\kappa_1 + \kappa_2 + Ds}{\kappa_1 + Ds} \cdot \frac{1 + s\tau}{k} F(s) \tag{4.30}$$

$$\frac{F(s)}{X_1(s) + X_2(s)} = \frac{k(\kappa_1 + Ds)}{(\kappa_1 + \kappa_2 + Ds)(1 + s\tau)} \tag{4.31}$$

This transfer function is $C_1(s)$, the muscle spindle controller, which can be written in the following form:

$$C_1(s) = \frac{\frac{k}{\tau}\left(s + \frac{\kappa_1}{D}\right)}{\left(s + \frac{\kappa_1 + \kappa_2}{D}\right)\left(s + \frac{1}{\tau}\right)} \tag{4.32}$$

Since $\kappa_1$, $\kappa_2$, D and $\tau$ are all real-valued, this transfer function has real poles, unlike the muscle spindle transfer function estimate given in Equation 4.23.

To obey this restriction, the original transfer function was replaced by a close approximation which has real roots. The denominator of $C_1(s)$ was approximated as $s^2 + 800s + 160000 = (s + 400)^2$, which is close to the denominator in Equation 4.23 but has real roots. Then $f_0$ was scaled to preserve the static gain $C_1(0) = \frac{f_0}{h_0}$. The scaled value of $f_0$ is $f_0'$:

$$\frac{f_0'}{160000} = \frac{2280960}{158400} \quad\Rightarrow\quad f_0' = 2304000 \tag{4.33}$$

Similarly, $f_1$ is scaled to $f_1'$ to preserve the ratio $\frac{f_1}{h_1}$:

$$\frac{f_1'}{800} = \frac{15206.4}{720} \quad\Rightarrow\quad f_1' = 16896 \tag{4.34}$$

Therefore the expression for $C_1(s)$ actually used was

$$C_1(s) = \frac{16896s + 2304000}{s^2 + 800s + 160000} \tag{4.35}$$
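The scaling steps and the change in the nature of the poles can be reproduced with a few lines of arithmetic. The helper name `quadratic_roots` and the variable names are hypothetical.

```python
import cmath


def quadratic_roots(a, b, c):
    """Roots of a*s^2 + b*s + c, returned as complex numbers."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


# Original controller coefficients (Eq. 4.22).
f1, f0, h1, h0 = 15206.4, 2280960.0, 720.0, 158400.0
# Replacement denominator with a real double pole at s = -400.
h1_new, h0_new = 800.0, 160000.0

f0_new = f0 / h0 * h0_new      # preserves the static gain f0 / h0
f1_new = f1 / h1 * h1_new      # preserves the ratio f1 / h1

original_poles = quadratic_roots(1.0, h1, h0)
modified_poles = quadratic_roots(1.0, h1_new, h0_new)

print(f0_new, f1_new)
print(original_poles, modified_poles)
```

The original denominator yields a complex conjugate pair, while the replacement yields a repeated real pole, as required by the nuclear bag model.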



To give an idea of the accuracy of the approximation, a comparison of the unit step responses of the two controllers and their Bode magnitude diagrams is shown in Figure 4.18.

Outer Loop of Model

As shown in Figure 4.15, the outer loop of the model consists of a non-linear element NL(e) followed by a controller $C_2(s)$. This is based on the model proposed by Stark [64] for saccadic eye movements, with several important differences. The controller in Stark's model is a pure integrator and this has been kept in the model presented here, so $C_2(s) = \frac{k_i}{s}$. The non-linear element in Stark's model has a dead-zone, i.e. NL(e) = 0 for |e| < 0.03. This feature is preserved, since it represents the fact that no correction need be applied if the image remains focused on the fovea. However, for errors outside this

Figure 4.18: Unit step response and Bode magnitude diagrams of the muscle spindle controllers: (a) muscle spindle unit step responses, (b) Bode magnitude diagrams of the muscle spindle controllers. Blue lines: original transfer function, Equation 4.23. Red lines: modified transfer function, Equation 4.35.


region, Stark's model has a linear variation of NL(e) with e, and with this linear variation it does not appear possible to reproduce the linear transition of the saccadic response in Figure 4.7(b). Since $C_2(s)$ is an integrator, this suggests that NL(e) is saturated for |e| > 0.03, and that the input to the inner loop (the muscle spindle controller) is ramping up linearly during the transition. Thus the function NL(e) used in this model is as follows:

$$NL(e) = \begin{cases} |S| & \text{for } |e| > 0.03 \\ 0 & \text{otherwise} \end{cases} \tag{4.36}$$

From inspection of Figure 4.7(b), it is clear that the rate of change of $\theta$ is governed by the product $S k_i$. Therefore S = 1 can be arbitrarily chosen, and $k_i$ adjusted to get the slope of the response of the model to match the empirical data. A value of $k_i$ = 19 was found to give a simulated response close to the measured response. Stark's model also has a pure time delay element of 0.235s within the outer loop. However, in simulations, this delay leads to sustained oscillations of the system. This may be prevented by placing the delay element outside the outer loop, as shown in Figure 4.15. Increasing the delay to 0.267s gives the closest match between the actual EOG response recorded in Figure 4.7(b) and the simulated response. The two responses are shown superimposed on each other in Figure 4.19. The real response is scaled vertically for comparison purposes. In the simulated model, the input is 0.2618 radians (15°). The simulation was done in Simulink, part of the MATLAB environment. The Simulink model used is given in Appendix B. This model is summarised in [67].
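The reason the product $S k_i$ sets the slope of the saccade can be seen from an open-loop sketch of the outer-loop forward path. The code below implements Equation 4.36 literally and integrates its output with forward Euler. The names and the 1ms step are hypothetical, the loop is not closed, and the handling of the sign of negative errors is left exactly as the equation is printed.

```python
DEAD_ZONE = 0.03   # rad, half-width of the dead zone
S = 1.0            # saturation level (chosen arbitrarily in the text)
K_I = 19.0         # integrator gain found to match the measured response


def nl(e: float) -> float:
    """Nonlinearity of Eq. 4.36, taken literally: |S| outside the dead zone, else 0."""
    return abs(S) if abs(e) > DEAD_ZONE else 0.0


def integrate_c2(errors, dt):
    """Output of C2(s) = k_i / s driven by NL(e), using forward-Euler integration."""
    r_l, out = 0.0, []
    for e in errors:
        r_l += K_I * nl(e) * dt
        out.append(r_l)
    return out


dt = 0.001
ramp = integrate_c2([0.2618] * 10, dt)   # constant 15 degree error held for 10 ms
print(ramp[-1] / (10 * dt))              # slope of the ramp, equal to S * k_i
```

While the error stays outside the dead zone the reference to the inner loop ramps at a constant rate, which is what produces the linear transition of the saccade.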

Model for Smooth Pursuit Movements

As shown, a Simulink simulation of the model described gives a response to a 15° input which is very close to the measured EOG response to a saccadic eye movement of 15°. However, for smooth pursuit movement, the system

Figure 4.19: Actual EOG and simulated saccadic responses to a target that suddenly jumps by approximately 0.2618 rad (voltage in V against time in s).


as described does not work so well. Figure 4.7(a) shows the measured EOG response to a sinusoidally varying target. However, when a sinusoid is set as the input to the control loop in Figure 4.15, simulation gives an output with jumpy steps not apparent in the measured EOG response. Clearly, something else is also happening to give the response measured in Figure 4.7(a). In the model proposed here, when following predictable target movements such as sinewaves, there is an extra input from the brain, along the path labelled "possible signal from the brain" in Figure 4.15. This signal is synthesised in such a way as to follow the predictable reference input with essentially zero (or very small) error.

The model presented in Figure 4.15 is modified for smooth pursuit movements as shown in Figure 4.20. When the eye is moving in a saccadic fashion this model is the same as the original model in Figure 4.15. However, when the eye is in smooth pursuit of an object, this model proposes that the signal labelled $\theta_{ref}$ passes through two additional blocks. Firstly, it passes through a Predictor block which effectively cancels out the effect of the time delay. For a sinewave input this would be a phase advance network, to cancel the phase lag introduced by the delay. The signal $\theta_{ref}$ is adjusted by the brain in such a way that the signal e is confined to the deadzone of the nonlinearity NL(e), i.e. the image remains focused on the fovea. Therefore the output from the NL(e) block is always zero and hence the portion consisting of NL(e) and $C_2(s)$ is effectively disconnected while the eye is in smooth pursuit of a target. The system is therefore responding as if it were just the inner loop. This model also proposes that the signal $\theta_{ref}$ goes through another block, H(s), which is adjusted accordingly by the brain to make $\theta = \theta_{ref}$. With the outer loop disconnected the modified model is as shown in Figure 4.21. Therefore the additional block H(s) is effectively an inverse model of the inner loop $G_i(s)$.

If $\theta_{ref}$ is a sinusoid,

$$\theta_{ref} = \theta_m \sin(\omega t) \tag{4.37}$$

then at the frequency $\omega$, H(s) is behaving as if it had a transfer function

Figure 4.20: Feedback control loop, modified to include the effects of smooth pursuit movements.

$\frac{1}{G_i(\omega)}$, i.e. its gain is $\frac{1}{|G_i(\omega)|}$ and its phase is $-\angle G_i(\omega)$, which gives an output from the predictor block:

$$\frac{\theta_m}{|G_i(\omega)|} \sin(\omega t - \angle G_i(\omega)) \tag{4.38}$$
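The gain and phase of $G_i$ at the pursuit frequency can be evaluated directly from Equation 4.24 using complex arithmetic. This is an illustrative sketch with hypothetical function names, not the MATLAB analysis used in the thesis.

```python
import cmath
import math


def G(s):
    """Critically damped eyeball dynamics, Eq. 4.18."""
    return 454.55 / (s + 120.0) ** 2


def C1(s):
    """Muscle spindle controller actually used, Eq. 4.35."""
    return (16896.0 * s + 2304000.0) / (s * s + 800.0 * s + 160000.0)


def Gi(s):
    """Closed inner loop, Eq. 4.24."""
    loop = G(s) * C1(s)
    return loop / (1.0 + loop)


omega = 2 * math.pi * 0.8            # 0.8 Hz target, about 5.027 rad/s
response = Gi(1j * omega)
gain = abs(response)
phase_deg = math.degrees(cmath.phase(response))

# Gain and phase that an ideal inverse block H would need at this frequency.
h_gain = 1.0 / gain
h_phase_deg = -phase_deg
print(gain, 20 * math.log10(gain), phase_deg)
```

At 0.8Hz the gain is close to the static value and the phase lag is only a few degrees, which is consistent with approximating H(s) by a pure gain for initial tests.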

The Bode magnitude and phase of $G_i(\omega)$ were plotted in MATLAB, and can be seen in Figure 4.22. At the frequencies in question for smooth pursuit ($\omega$ < 100 rad/s), the gain of this transfer function is constant and the phase difference is approximately zero. It can be seen from Figure 4.22 that the gain of $G_i$ at low frequencies is around −10dB = $10^{-0.5}$. This is approximately equal to the static gain figure of 0.3125 calculated in Equation 4.25. For initial tests of this model, the input from the outer loop was disconnected and H(s) was modelled by a simple gain block. The Simulink model used is given in Appendix B, with H(s) replaced by a gain block of value $\frac{1}{0.3125}$. The input sinusoid was chosen to match as closely as possible the measured EOG signal in Figure 4.7(a), which is following a target oscillating at 0.8Hz or 5.027 rad/s. The results are plotted in Figure 4.23. As can be seen from Figure 4.23(b), there is a very small offset between the input and output of this model. This is due to approximating H(s) by a gain block, which neglects any phase difference introduced by this block. The Bode phase plot in Figure 4.22 shows that $G_i(s)$ introduces a slight change in phase with frequency and therefore the inverse model H(s) should also account for this. However, there is a problem with modelling H(s) as the inverse transfer function of $G_i(s)$. The inverse transfer function of $G_i(s)$ can be written by inverting Equation 4.24:

$$\frac{1}{G_i(s)} = \frac{(s + 120)^2(s^2 + 800s + 160000) + 454.55(16896s + 2304000)}{454.55(16896s + 2304000)} \tag{4.39}$$

This is an improper transfer function, i.e. its numerator is of a higher degree than the denominator. A proper rational function, which would be characteristic of a realisable function, may be approximated by placing a fourth

Figure 4.21: Modified loop for smooth pursuit.

Figure 4.22: Magnitude and phase Bode plots for $G_i(s)$.

Figure 4.23: Graphs showing the input and output of the model whose Simulink code is shown in Appendix B: (a) graph, (b) close-up. The two follow each other very closely, although there is a very slight offset present, which is observable in the close-up in (b).


order low pass filter in cascade with the inverse transfer function:

$$H(s) = \frac{1}{G_i(s)[1 + s\tau_i]^4} \tag{4.40}$$

$\tau_i$ may be chosen so that for the frequencies of interest $\omega\tau_i \ll 1$. $\tau_i = 0.0002$s was considered a sufficient value to pass eye movements due to smooth pursuit. In simulation of the full model, the predictor was also modelled as a phase lead network. The phase lead network was modelled as a predictor P(s) with a transfer function of the form

$$P(s) = \frac{k_d s\tau_d}{1 + s\tau_d} \tag{4.41}$$

At the frequency of the input the predictor should cancel out the effect of the time delay element, L, i.e. the angle of the predictor should be equal to the angle of the time delay:

$$\frac{\pi}{2} - \tan^{-1}(\omega\tau_d) = \omega L \tag{4.42}$$

With $\omega = 5.027$ rad/sec and L = 0.267s, $\tau_d = 0.0443$s. $k_d$ is then chosen so that the gain of the predictor at $\omega = 5.027$ rad/sec is 1:

$$\frac{k_d \times 5.027 \times 0.0443}{\sqrt{1 + (5.027)^2(\tau_d)^2}} = 1 \quad\Rightarrow\quad k_d = 4.6004$$
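The predictor gain can be checked by evaluating Equation 4.41 at the input frequency. The sketch below takes $\tau_d$ = 0.0443s as given in the text, computes $k_d$ from the unit-gain condition, and reports the resulting phase lead alongside the lag $\omega L$ of the delay. The names are hypothetical.

```python
import cmath
import math

OMEGA = 5.027     # rad/s, frequency of the pursuit target
TAU_D = 0.0443    # s, predictor time constant quoted in the text
L = 0.267         # s, time delay


def predictor(s, k_d, tau_d=TAU_D):
    """Phase lead network P(s) of Eq. 4.41."""
    return k_d * s * tau_d / (1.0 + s * tau_d)


# Unit gain at OMEGA: k_d * w * tau / sqrt(1 + (w * tau)^2) = 1.
k_d = math.sqrt(1.0 + (OMEGA * TAU_D) ** 2) / (OMEGA * TAU_D)

p = predictor(1j * OMEGA, k_d)
lead_rad = cmath.phase(p)    # phase advance supplied by the predictor
lag_rad = OMEGA * L          # phase lag introduced by the delay
print(k_d, abs(p), lead_rad, lag_rad)
```

With these rounded parameter values the lead and the lag agree to within about 0.01 rad.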


These values were used to model the predictor block in Simulink. It was found that the error e was not always exactly zero. However, it was always within the deadzone of the non-linear element (i.e. |e| < 0.03) and thus the output of this block was zero, and the portion consisting of NL(e) and $C_2(s)$ is effectively disconnected while the eye is in smooth pursuit, as hoped. Development of these mathematical models, which describe saccadic and smooth pursuit eye movements, led to a greater understanding of the physical processes involved during motion of the eyes. This knowledge was invaluable when considering how vestigial eye movements could be harnessed for communication and control purposes for physically disabled persons. In particular, the study of smooth pursuit eye movements was beneficial when developing the Target Position Variation algorithm described in Section 4.2.5.


Electrodermal Activity as a Control Signal


The field of electrodermal activity (EDA) includes any electrical activity measurable on the surface of the skin. Electrical signals that are commonly measured are the skin resistance, the skin conductance, the skin impedance and the skin potential. These signals are modified by activation of the sweat glands, which changes the electrical properties of the skin. Although the sweat glands are part of the autonomic nervous system, and therefore not usually considered to be under voluntary control, they can be consciously activated under certain circumstances. The main function of the sweat glands is thermoregulation, but they are also activated due to emotional stress. Thus they may be consciously activated by willing oneself into a tense state. When sweat glands are activated, the skin resistance decreases. The time taken for the skin resistance to decrease following a conscious effort to tense can be quite slow. It can take about 27s for the response to reach peak. Recovery time is considerably slower and varies greatly; it can take anywhere between 1-30s to return to 50% of baseline [68]. The length of time taken to produce a response to a voluntary action makes electrodermal activity an extremely slow method of control and communication for severely disabled people. However, in some circumstances it may be the only feasible option. Since the time taken to return to baseline is quite long, it may be more suitable for applications where many switching actions are not required to be performed quickly, such as environmental control. The anatomy and physiology of the skin are briefly discussed before electrodermal activity is discussed in more detail. The feasibility of using skin resistance as a control signal for people who are severely disabled is explored by performing some experiments where voluntary responses are elicited. Finally, a non-invasive method of measuring the firing rate of the sympathetic nervous system from measurement of the skin conductance is proposed, based on a model developed by Burke [69] for the skin conductance.


Anatomy and Physiology of the Skin

The Autonomic Nervous System

The autonomic nervous system (ANS) has already been briefly described in the last chapter and the different types of nerve fibres in the body have been discussed. The visceral afferent nerve fibres and the autonomic efferent nerve fibres are the two types of nerve fibres of the ANS. The ANS controls automatic functions of the body such as arterial pressure, sweating and body temperature and is not normally under voluntary control. The ANS is made up of three separate systems: the sympathetic system, the parasympathetic system and the enteric system. The enteric system, responsible for the gut region of the body, is usually considered separately. The sympathetic nervous system and the parasympathetic nervous system are both responsible for transmitting ANS signals through the body. The two systems operate in antagonism, e.g. the sympathetic nervous system is responsible for pupil dilation while the parasympathetic nervous system controls pupil constriction. Sweat gland activity is controlled by the sympathetic nervous system.

The Sweat Glands

Skin all over the human body contains sweat glands, which are activated by the sympathetic nervous system. There are two types of sweat glands in the body, apocrine sweat glands and eccrine sweat glands. Apocrine sweat glands are found in the underarms and on the genitals, around the nipples and the navel. The sweat produced by apocrine glands is a thick viscous fluid. The exact function of the apocrine sweat glands is not fully understood although they are thought to play a role in producing sexual scent hormones (pheromones) [70]. They do not play a role in electrodermal activity so are not of much importance here.

Eccrine sweat glands are found on the surface of the skin over most of the body, but are especially dense on the palms of the hands (palmar surfaces) and the soles of the feet (plantar surfaces). The eccrine sweat glands secrete a thinner, more watery fluid than the apocrine sweat glands. Their main function is thermoregulation. The sweat glands secrete moisture onto the surface of the skin, which is evaporated into the air. The process of evaporation requires latent heat, which is taken from the skin, thus cooling the skin's surface. Eccrine sweat glands are also activated in response to emotional arousal, such as anxiety or rage. Emotional eccrine sweating is thought to be a product of the evolutionary fight or flight response [69]. Emotional sweating improves grip, gives greater tactile sensitivity and increases the skin's resistance to cutting or abrasion. The eccrine sweat glands on the palmar and plantar surfaces respond more strongly to emotional stimuli than heat stimuli, whereas the opposite is true of the glands found on the forehead, neck and back of the hands.

Figure 4.24 shows the main parts of an eccrine sweat gland through the different layers of skin. Skin consists of three main layers, the dermis, the epidermis and the subdermis. The secretory portion of a sweat gland is located in the subdermis. When a sweat gland is activated, this portion fills up with sweat, which is a mixture of salt, water and urea. This fluid travels up through the ducts in the dermis layer and deposits moisture on the skin's surface, through the sweat pores in the epidermis.


Electrodermal Activity

The study of electrodermal activity (EDA), or the electrical responses measurable at the surface of the skin, is common in the field of psychophysiology [71].


Figure 4.24: Sweat Gland. The stratum corneum acts as a variable resistor with decreased resistance due to sweat. [Figure labels: Stratum Corneum; Eccrine Sweat Duct; Secretory Portion of Sweat Gland.]

There are two main approaches to measuring EDA, the endosomatic method and the exosomatic method [72]. Activation of the sweat glands, either by an increase in body temperature or in response to an emotional stimulus, causes both an increase in skin conductance (or a decrease in skin resistance) and a change in the skin potential. Either of these features can be used to measure the degree of sweat gland activation. The endosomatic method measures these changes in potential at the skin surface. The exosomatic method measures the skin resistance or conductance across an area of skin. The endosomatic method typically uses invasive electrodes [73] so the exosomatic method is used here. This will now be described in more detail.

The change in skin conductance due to sweat gland activity was first observed by Féré in 1888. It is now a well observed phenomenon that skin conductance increases when a person becomes emotionally aroused [69]. As already mentioned, when a person becomes anxious or angry, the fight or flight response is invoked, which activates the eccrine sweat glands. The skin conductance increases when the sweat glands are activated due to a number of factors, but largely because sweat is the equivalent of a 0.3% NaCl solution, and hence a weak electrolyte [74]. Skin is a good conductor of electricity and skin conductance can be measured by passing a small current through an area of the skin. The number of sweat glands that deposit sweat on the skin surface is approximately proportional to the number of conductive pathways on the skin's surface and thus measuring the skin conductance gives some indication of the number of sweat glands that are active [72].

Skin conductance is usually classified into two levels, the tonic skin conductance level and the phasic skin conductance level. The tonic skin conductance level is the baseline level of skin conductance, and is also called the Skin Conductance Level (SCL). The phasic skin conductance is the response of the skin conductance to an event and is also called the Skin Conductance Response (SCR). SCRs may last 10-20s before returning to the SCL. A number of different models of the process of skin conductance exist; the most widely accepted of these is the Sweat Circuit Model (also known as the Poral Valve Model). This was developed by Edelberg in 1972 [68]. According to this model, when the sweat ducts begin to fill with sweat the skin conductance increases, which causes the phasic skin conductance response. The skin conductance returns to its tonic level when the sweat is deposited on the skin or reabsorbed by the sweat glands.


Skin Conductance as a Control Signal

Some preliminary experimental data was obtained to explore the possibility of using conductance of the skin as a control signal. The circuit used is given in Appendix E. It uses two electrodes which are attached to the medial phalanx of the index and middle fingers. The voltage output of the circuit is proportional to the skin conductance. The data is read into the computer using the NIDAQ 6023E data acquisition card and sampled in MATLAB at 200Hz. For the experimental setup used and the particular position of variable resistor, the skin conductance G, in Siemens, may be calculated from the voltage e0 at the output of the circuit according to the following equation, which is obtained from analysis of the circuit in Appendix E.

$$G = \frac{1.405 - e_0}{5.252} \times 10^{-5} \ \text{Siemens} \qquad (4.44)$$

Figure 4.25: Electrodermal response. The subject is asked to tense up after each 50s interval. [Plot of skin conductance: amplitude (S) against time (s).]

The measured data is shown in Figure 4.25. The subject was asked to tense up at t=50s, t=100s, t=150s, t=200s and t=250s. It appears that at t=100s the subject did not tense up as it was perceived that the signal had not returned sufficiently to baseline at this point. At all of the other points where the subject tenses, a noticeable rise in the skin conductance is evident. This is a promising indication of the potential of skin conductance as a control signal for people who are very severely disabled.



Non-invasive Measurement of the Sympathetic System Firing Rate

Arising from the work on electrodermal activity, a non-invasive measurement technique for the firing rate of the sympathetic nervous system was developed. While this measurement technique is not directly related to communication and control for disabled people, it evolved as a side interest based on study of skin conductance and will be described here. It uses a model for the skin conductance developed by Burke [69], which is shown in Figure 4.26. This model has as its input the firing rate of the sympathetic nervous system, f, and as its output, the skin conductance, g. The output of this system, g, is observable through measurement techniques such as the one described above. However, the firing rate f is usually unobservable unless invasive instrumentation is used. A novel technique is described here which will allow the firing rate f to be observed based on the measured skin conductance g.

A block diagram of the proposed measurement technique is shown in Figure 4.27. The sympathetic nervous system firing rate f is required and the skin conductance g is directly measurable. The parameters of the controller C(s) were tuned to get gm, the output of the skin conductance model, to follow the measured skin conductance g as closely as possible. If the controller can be adjusted so that

$$e = g - g_m = 0 \qquad (4.45)$$

then it is hypothesised that the output of the system y will be equal to the unobservable parameter, f. The controller C(s) used is a PID controller whose transfer function is of the form

$$C(s) = k_p + \frac{k_i}{s} + \frac{k_d s}{1 + \tau s} \qquad (4.46)$$

To tune the controller, the loop in Figure 4.27 was set up with the real input block replaced by the skin conductance model. An artificially synthesised firing rate was input into this block. It was hypothesised that if the controller could be tuned for this artificial firing rate, which had very severe variations, then the controller should also work with the measured input. The values that gave the best match are as follows.

$$k_i = 12; \quad k_p = 90; \quad k_d = 10; \quad \tau = 0.1 \qquad (4.47)$$

Figure 4.26: Model of Skin Conductance, from [69]

Figure 4.27: Proposed loop which allows skin conductance measurements to be used to observe the sympathetic system firing rate. [Block labels: Actual Skin Conductance; gm; Skin Conductance Model; Firing Rate (Immeasurable Noninvasively); Skin Conductance (Measurable).]
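The observer loop described above can be illustrated with a short simulation. The following is only a sketch under stated assumptions: Burke's skin conductance model is not reproduced in this section, so a hypothetical first-order lag (`stand_in_skin_model`, time constant `model_T`) is used in its place, the derivative filter constant is assumed to be the 0.1 value quoted in (4.47), and the loop is integrated with a simple forward Euler step rather than in Simulink.

```python
def stand_in_skin_model(firing_rate, dt, model_T=2.0):
    """Hypothetical first-order lag standing in for the skin conductance model.

    Maps a firing-rate sequence f to a conductance sequence g (forward Euler).
    This is NOT Burke's model; it only gives the loop something to invert.
    """
    g, out = 0.0, []
    for f in firing_rate:
        out.append(g)
        g += dt * (-g + f) / model_T
    return out


def estimate_firing_rate(g_measured, dt, kp=90.0, ki=12.0, kd=10.0,
                         tau=0.1, model_T=2.0):
    """Drive the model output gm towards the measured g with a PID controller.

    The controller output y is returned as the estimate of the firing rate.
    The derivative term is filtered: kd*s / (1 + tau*s).
    """
    gm = 0.0        # output of the stand-in model inside the loop
    integral = 0.0  # integral of the error
    lagged_e = 0.0  # error passed through 1 / (1 + tau*s)
    y_out = []
    for g in g_measured:
        e = g - gm
        y = kp * e + ki * integral + kd * (e - lagged_e) / tau
        y_out.append(y)
        # Forward Euler updates of the controller and model states.
        integral += dt * e
        lagged_e += dt * (e - lagged_e) / tau
        gm += dt * (-gm + y) / model_T
    return y_out
```

In use, a synthetic firing rate can be passed through `stand_in_skin_model` and the resulting conductance fed to `estimate_firing_rate`; if the loop is well tuned, the returned sequence should settle close to the synthetic input after each change.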

Using these parameters, the complete loop was simulated in Simulink. The full Simulink block diagram is given in Figure B.4 in Appendix B. The input to the system was the measured skin conductance in Figure 4.25. The model developed by Burke [69] expresses skin conductance in microSiemens (μS), so the measured data was scaled accordingly. The results of this simulation look very promising. A graph showing the measured skin conductance g and the output of the model gm is given in Figure 4.28. After a transient the modelled value follows the measured value of skin conductance almost perfectly. The firing rate y produced at the output of the model is shown in Figure 4.29 with the measured skin conductance. The measured skin conductance was from the experiment described in Section 4.3.4 where the subject was asked to tense up at 50s intervals. The increases in firing rate at the times that the subject attempted to tense up are clearly visible.


Figure 4.28: The modelled value of skin conductance gm and the measured value of skin conductance g. [Plot title: Measured skin conductance g and modelled skin conductance gm; axes: Amplitude (V) against Time (s).]


Figure 4.29: The input to the model, g, the skin conductance, which was measured experimentally, and the output of the model, y, the sympathetic nervous system firing rate. [Plot title: Measured Skin Conductance, g and Firing Rate y; axes: Amplitude (V) against Time (s).]




This chapter has described two biosignals, the electrooculogram and the conductance of the skin. In discussion of the electrooculogram, some alternatives to the electrooculogram for measuring eye movements were described, including the magnetic search coil technique, the corneal reflection technique and limbus tracking. Some advantages of using the electrooculogram instead of these other methods to measure eye movements were then given. The EOG may have a larger range than visual methods and it is generally cheaper to implement. The EOG amplitude is linearly related to the eye angle for small angles and may be used in real-time applications. It is not affected by obstacles such as glasses in front of the eye and it permits eye closure. Depending on the application, it may permit head movements and may be used in variable lighting conditions. Some limitations of the EOG were also discussed, the main one being the problem of baseline drift. This usually requires manual re-calibration of the amplifiers when it occurs. A novel method called Target Position Variation was developed, which enables automatic software re-calibration of the eye position when baseline drift is evidenced. Target Position Variation looks like a promising approach that can be used with the EOG to provide automatic re-calibration of the eye position or to use in a menu selection program. A control model for the eye was also developed which models a saccadic and a smooth pursuit eye movement. The saccadic model fits well with experimental data.

Electrodermal activity was also briefly explored as a control signal. Conductance of the skin was chosen as the electrodermal phenomenon to measure. The user can elicit a voluntary change in their skin conductance by tensing up or imagining themselves in a state of stress or anger. The response returns to baseline very slowly, meaning it would only be a feasible option in cases of very severe disabilities where there is no preferable alternative. A method for measurement of the sympathetic nervous system firing rate was also described. Results seem to indicate that this method could be used to provide a low-cost, non-invasive tool for monitoring the firing rate, possibly for clinical applications.


Chapter 5 Visual Techniques



This chapter describes visual techniques for obtaining vestigial signals from the body. These techniques detect movements using computer cameras or other light detection devices. For various reasons, measurement of body signals by the methods already discussed may not always be suitable or possible. For example, a person requiring the use of a control or a communication system may find it uncomfortable to have electrodes affixed to the skin. This may be especially true when dealing with children. Also, disabled people often do not like to use anything that visibly draws attention to their disability. Thirdly, electrode based systems may be impractical if the person is prone to heavy perspiration, as it may be difficult to keep the electrodes in place. Finally, clothing may make it impractical to measure the movement that the person is capable of making using the EMG or MMG.

Visual techniques can offer disabled people a non-contact solution for computer interaction. Often a person who has become disabled will retain the ability to make slight movements or rotations of a finger or toe. If these movements are repeatable then they may be used to indicate intent through observation of these movements with a computer camera. This chapter first describes some video analysis techniques that have been developed by others in Section 5.2. A system developed as part of this thesis to investigate visual methods of detecting vestigial flickers of movement is then described in Section 5.3.


Visual Based Communication and Control Systems

Two applications for communication and control for disabled people using a computer camera developed by others will now be described. Both of these are based on tracking movement of a body part to control a mouse cursor on screen. The first system, the Camera Mouse, uses a template matching technique to track a particular body part. The second system tracks motion based on movement of the reflected laser speckle pattern of skin.


The Camera Mouse

The Camera Mouse is a visual based system developed by Betke, Gips and Fleming [75] to provide computer access for people with severe disabilities. The system tracks movements of a particular body feature and uses this to control movement of a mouse cursor on screen. Various body features are explored in [75], including the tip of the nose, the lips, the thumb and the eyes. The system uses two computers, a vision computer (the tracker) and a user computer (the driver). The vision computer receives the video signals, interprets the data and sends the appropriate control signal to the user computer. Initial setup of the system is performed on the vision computer. The user or a helper clicks on the body feature in the image that is to be tracked, and adjusts the camera pan and tilt angles and zoom so the desired body feature is in the centre of the image. A template of the body feature to be tracked is stored and subsequently used to determine the co-ordinates of the body feature position. These co-ordinates are sent to the user computer to control the mouse position.

A search for a template match is performed within a search window. The search area is centred around the estimated position of the feature in the previous frame, as shown in Figure 5.1. The template is shifted around the window and the correlation between the template and the test template area is calculated. The position giving the best correlation is the new estimate of the feature position, which is used to determine the mouse cursor position, and it is also used as the centre of the search window in the next frame. A mouse click is performed by dwelling on a region for a certain length of time.

The nose was found to be a very reliable feature to track, as it tends to be brighter than the rest of the face and does not become occluded when the user moves their head significantly. The eyes have a distinctive template but may be difficult to move while simultaneously viewing the screen and they also may often be blocked by the nose. The lower lip and lip cleft have a good brightness difference and hence are a good feature to use, particularly for users who may not have the ability to move their head. The thumb did not work very well, as it was difficult to focus the camera on the centre of the thumb.

Figure 5.1: Camera Mouse Search Window, from [75].
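The search-window template match described above can be sketched as follows. This is an illustrative sketch only, not the implementation of [75]: the normalised correlation measure, the function names and the use of the top-left corner (rather than the centre) to record the feature position are all assumptions made for brevity, and frames are assumed to be greyscale lists of rows.

```python
import math


def normalised_correlation(patch, template):
    """Normalised correlation between two equally sized greyscale blocks."""
    n = len(template) * len(template[0])
    mean_p = sum(map(sum, patch)) / n
    mean_t = sum(map(sum, template)) / n
    num = var_p = var_t = 0.0
    for patch_row, template_row in zip(patch, template):
        for p, t in zip(patch_row, template_row):
            num += (p - mean_p) * (t - mean_t)
            var_p += (p - mean_p) ** 2
            var_t += (t - mean_t) ** 2
    denom = math.sqrt(var_p * var_t)
    # A completely flat block has no defined correlation; score it as 0.
    return num / denom if denom > 0 else 0.0


def track_feature(frame, template, prev_pos, search_radius):
    """Search around prev_pos (top-left row, col) for the best template match."""
    t_rows, t_cols = len(template), len(template[0])
    rows, cols = len(frame), len(frame[0])
    r0, c0 = prev_pos
    best_pos, best_score = prev_pos, -2.0
    for r in range(max(0, r0 - search_radius),
                   min(rows - t_rows, r0 + search_radius) + 1):
        for c in range(max(0, c0 - search_radius),
                       min(cols - t_cols, c0 + search_radius) + 1):
            patch = [row[c:c + t_cols] for row in frame[r:r + t_rows]]
            score = normalised_correlation(patch, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score
```

The position returned for one frame would be passed back in as `prev_pos` for the next frame, so that the search window follows the feature.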



Reflected Laser Speckle Pattern

This system, developed by Reilly and O'Malley [76], is a visual-based motion detection system for communication and control based on the reflected laser speckle pattern. When a laser beam is shone on a scattering object, a speckled pattern will be reflected due to the roughness of the surface, as shown for the surface of the skin in Figure 5.2. If the surface moves, the speckle pattern will also move proportionally. Based on this principle, movement of a body part can be estimated by monitoring the skin's reflected speckle. The movement is converted into two dimensional cursor co-ordinates to move a mouse cursor on a computer screen. Two techniques of generating mouse click actions are considered with this system, intensity variation and dwell time. The system uses two laser diodes as emitters, and two linear charge-coupled device (CCD) arrays as detectors, one for each axis. Motion estimation is achieved through correlation of two consecutive frames. One of the problems with using human skin as the scattering surface for motion detection is that in addition to changes in light diffusion due to movements of the skin surface, there will also be small changes in light diffusion due to the constant flow of blood under the skin. This undesirable speckle pattern variance is referred to as sensor interference and is in the range 20-100Hz. Also tremors, which are described in more detail below, may introduce noise in the range 1-15Hz, but this noise may be removed by low-pass filtering the signal. Mouse clicks may be generated by varying the intensity of the speckle pattern, by moving towards or away from the sensor. Alternatively, dwell time may be used to generate a click by holding still for a set length of time.
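As a rough illustration of motion estimation by correlating two consecutive frames from a linear array, the sketch below searches for the pixel lag that best aligns two one-dimensional intensity profiles. The details of the correlation used in [76] are not given here, so the mean-removed correlation, the function name `estimate_shift` and the synthetic random "speckle" data are all assumptions.

```python
import random


def estimate_shift(previous, current, max_shift):
    """Return the lag (in pixels) at which current best matches previous.

    A positive lag means the pattern has moved towards higher pixel indices.
    """
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_shift, max_shift + 1):
        total, count = 0.0, 0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += (previous[i] - mean_prev) * (current[j] - mean_curr)
                count += 1
        score = total / count  # average over the overlapping pixels only
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic example: a random intensity profile moved three pixels along.
rng = random.Random(0)
previous_line = [rng.random() for _ in range(64)]
current_line = [0.5] * 3 + previous_line[:-3]
print(estimate_shift(previous_line, current_line, 8))
```

One such estimate per axis, accumulated over successive frames, would give a two-dimensional cursor displacement.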


Figure 5.2: The reflected speckle generated from skin, from [76]


Visual Technique for Switching Action


This section describes an application developed as part of the work presented here. The application actuates a switching action upon detection of movement. Two different approaches were tested in development of this application, which will be described here under the titles of frame difference method and path description method.

Program operation using the frame difference method generates a switching action each time any arbitrary movement in front of the computer camera is recognised. This method has many problems, especially when the symptoms characteristic of a typical disabled user are considered. If head injuries include damage to the central gray matter of the brain (the basal ganglia), then the person may often suffer from any of a class of movement disorders known as the dyskinesias [77]. These disorders cover a range of excessive abnormal involuntary movements. Tremor is one of the dyskinesias, characterised by a low frequency (1-15Hz) rhythmic sinusoidal oscillatory movement of a body part resulting from the involuntary alternating contraction and relaxation of opposing groups of skeletal muscles [76]. Chorea is another, consisting of irregular, unpredictable, brief jerky movements and is common in the hereditary Huntington's disease [77]. Obviously if the user suffers from any of these dyskinesias, then their movements could generate unintentional switching actions. Also any large background movement may trigger the program to incorrectly detect movement. However, the frame difference method is presented here as it may be useful in a limited number of applications, and has the advantage over the second approach in that it does not require any initialisation before use.

The second approach, the path description method, aims to actuate a switching action only when one particular defined movement is recognised. This is designed for people who may only have a slight movement such as a flicker of a thumb. An initialisation procedure is necessary to define the path description, which requires the aid of a therapist or a helper. At the beginning of each session, the user performs the voluntary action that they intend to use to actuate the switching action. The program records the movement and uses it to calculate the parameters that describe the path of motion, which will allow subsequent movements by the user to be compared to this particular action. An original algorithm is described that uses a six-dimensional space based on the centre of brightness of the red, green and blue pixels in the horizontal and vertical directions. Two distinct regions within this space are defined that are entered during the path of motion. Movement between these two regions can then be used to control a two-way switch in software, which enables operation of any application requiring a switching action, such as the Natterbox program described in Chapter 2. This method works best if the finger or moving part is placed on a dark background. While this may seem a very restrictive requirement, it is envisaged that in a working system using this method the black background could be provided as part of the necessary application equipment, attached beneath the computer camera.
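To make the six-dimensional description more concrete, the sketch below computes one horizontal and one vertical centre of brightness for each of the red, green and blue channels of a frame. The exact formula is not given in this section, so the intensity-weighted centroid used here is an assumption, as are the function name and the choice to return None for a channel that is completely dark.

```python
def centre_of_brightness(frame):
    """Return (x_r, y_r, x_g, y_g, x_b, y_b) for a frame of (r, g, b) pixels.

    frame is a list of rows; each row is a list of (r, g, b) tuples.
    Each centre is the intensity-weighted mean pixel position of one channel.
    """
    coords = []
    for channel in range(3):
        total = sum_x = sum_y = 0
        for y, row in enumerate(frame):
            for x, pixel in enumerate(row):
                value = pixel[channel]
                total += value
                sum_x += x * value
                sum_y += y * value
        if total == 0:
            # No brightness at all in this channel: the centre is undefined.
            coords.extend([None, None])
        else:
            coords.extend([sum_x / total, sum_y / total])
    return tuple(coords)
```

Each video frame then maps to a single point in this six-dimensional space, and a movement traces out a path of such points.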


Figure 5.3: Webcam used with the system described in Section 5.3


Technical Details

The computer camera or web-cam used with this system is shown in Figure 5.3. It is an iBot2 USB web-cam made by Orange Micro, which has been specifically modified for this application. The camera has been mounted on a stiff stand with a moveable arm to enable it to be easily aimed at the moving body part. This step was performed to enable the system to be easily tested; however, the program should work with any computer camera, provided it can be suitably mounted to point at the moving body part. This technique offers an inexpensive way of interfacing a user with a computer. Computer cameras are becoming increasingly low-cost and are often even integrated into modern computers. The software program was written using some of the DirectX 9.0 application programming interface (API) libraries. DirectX provides a method for software programs in Windows applications to interface directly with hardware devices connected to the computer. The graphical user interface was developed using Direct3D and video capture and rendering was achieved using DirectShow.


Figure 5.4: Filter Graph used for Video Data in application. [Blocks, in order: Capture Filter; Transform Filter; Video Renderer Filter.]

A DirectShow application is based on the concept of the filter graph. A filter graph is basically a chain of filter blocks connected together in software. Different filter blocks may be added to the filter graph depending on the individual requirements of a particular program. For the application here, the filter graph is made up of three filter blocks: a capture filter, a transform filter and a renderer filter. The capture filter is required to pass data from the computer camera to the computer. The transform filter processes and interprets this data. The renderer filter takes the data and outputs the processed video frames on screen. Each filter block has its own set of specific properties which can be modified through use of the appropriate interface. The filter graph manager handles the flow of data through the filter graph. A block diagram of the filter graph used in this application is shown in Figure 5.4. The transform filter used in this filter graph was developed specifically for this application by creating a class of filter called CMyTransformFilter, which inherits from the standard DirectShow filter type CTransformFilter. Most of the processing and analysis of the video data is done by the Transform function of this filter. The steps involved in the program are described in Table 5.1. The frame comparison method will first be described, and then the path description method.


Frame Comparison Method

Step 1: Video Capture

The capture filter used is of type IBaseFilter. The computer camera is accessed by the program by creating a list of video capture devices connected to the computer. The program retrieves the moniker to the video capture device at the top of the list, which should be the computer camera. This moniker is then used to create the capture filter and associate the camera with that filter block. The desired parameters for video capture are set through the interface IAMVideoProcAmp. The camera parameters used are shown in Table 5.2. The automatic brightness control feature is turned off by setting the camera exposure time to a constant, since the automatic brightness feature interferes with analysis of the data later. The format for the data is set to RGB24, which means each pixel is described using 24 bits of data, or 3 bytes, one each to describe the intensity value of red (R), green (G) and blue (B) in that pixel. Table 5.3 shows some sample pixel values for different colours, and their hexadecimal values.

Table 5.1: Steps involved in the application, for both the Frame Comparison Method and the Path Description Method

Step  Path Description Method            Step  Frame Comparison Method
1     Video Capture                      1     Video Capture
2     Low Pass Filtering                 2     Low Pass Filtering
3     Reduction of Data                  3     Reduction of Data
4     Centre of Brightness Calculations  4     Frame Comparison
5     Path Description                   5     Switch Actuation
6     Calculation of Regions             6     Rendering
7     Definition of Region Spaces
8     Switch Actuation
9     Rendering

Step 2: Low Pass Filtering

The interpretation of the data received is performed by the transform filter, which was written specifically for this application. The handling of the data by the transform filter can be divided into two parts: image processing and image analysis. Step 2 (filtering) and Step 3 (reduction of data) may be thought of as image processing stages. The image is analysed in Step 4 (frame comparison), and if sufficient movement has occurred, a switching action is performed. For the path description method which is discussed later, the image analysis is more complicated and is described in Steps 4-7.

Table 5.2: Video Capture Parameters

Parameter                Value
Brightness               Mid-range
Contrast                 High
Camera Exposure          Constant
Frame Resolution Width   320 pixels
Frame Resolution Height  240 pixels
Format                   RGB24

Table 5.3: The RGB24 format for some sample colours

Colour    Red  Green  Blue  Hexadecimal
Red       255  0      0     FF0000
Green     0    255    0     00FF00
Blue      0    0      255   0000FF
Yellow    255  255    0     FFFF00
Cyan      0    255    255   00FFFF
Magenta   255  0      255   FF00FF
White     255  255    255   FFFFFF
Black     0    0      0     000000
Midgrey   128  128    128   808080

Filtering of the video frames is necessary to remove noise from the captured image, which is most typically evidenced by spurious pixels with large differences in value to their neighbouring pixels. A simple averaging filter was found to give sufficiently good results. More complex filtering methods are discussed in [78]. The filter size is set by n, where the sample averaged is a (2n+1) × (2n+1) subset of the video frame centred on the pixel being filtered. This size is definable by the user since different lighting conditions and different computer cameras may necessitate different amounts of filtering. The data received may be thought of as three two-dimensional arrays of size 320 × 240 called Xr, Xg and Xb, where xr[i][j], xg[i][j] and xb[i][j] are the red, green and blue components respectively of the pixel value in the ith row and jth column. The filtered pixel values may be calculated for each colour using the following formula, where x stands for any one of the three colour components:

$$\bar{x}[i][j] = \frac{\sum_{k=i-n}^{i+n} \sum_{l=j-n}^{j+n} x[k][l]}{(2n+1)^2}$$

Some filtered video frames showing the effects of different filter sizes are shown in Figure 5.5.
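A minimal sketch of the averaging filter for one colour channel is given below. The treatment of pixels near the frame border is not stated in the text, so the sketch assumes that out-of-range indices are clamped to the nearest edge pixel; the function name `averaging_filter` is also illustrative.

```python
def averaging_filter(channel, n):
    """Average each pixel over the (2n+1) x (2n+1) block centred on it.

    channel is a list of rows of intensity values for one colour.
    Indices outside the frame are clamped to the nearest edge pixel
    (an assumption; the border handling is not specified in the text).
    """
    rows, cols = len(channel), len(channel[0])
    area = (2 * n + 1) ** 2
    filtered = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0
            for k in range(i - n, i + n + 1):
                kk = min(max(k, 0), rows - 1)
                for l in range(j - n, j + n + 1):
                    ll = min(max(l, 0), cols - 1)
                    total += channel[kk][ll]
            filtered[i][j] = total / area
    return filtered
```

With n = 0 the block is a single pixel and the frame is returned unchanged, which matches the unfiltered case; larger n gives progressively heavier smoothing. The same function would be applied separately to the red, green and blue arrays.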

Step 3: Reduction of Data

It was found that reducing the image to an 8-level image gave more reliable results and also greatly reduced the computational complexity of the later analysis. As mentioned already, each pixel of the video frame received consists of three bytes (RGB24 format), one byte each for red, green and blue. Each of these bytes normally has a value in the range 0-255. Reduction of the image to an 8-level image is an extension of the concept of bi-level images, which are discussed in Chapter 2 for black and white images by Parker [79]. Black and white images may be more accurately termed grey-level images, since they are actually made up of a spectrum of grey levels ranging from black to white. Bi-level images are produced from grey-level images by compressing the range of greys until only two levels remain, a process known as thresholding the image. This idea may be extended for colour images by translation of the original colour image to an 8-level image. As mentioned before, each pixel in the original image is represented by 3 bytes, or 24 bits, of data. This may be reduced to 3 bits of data by assigning each of the bytes either the value 1 or the value 0, resulting in 2^3 = 8 possible colour combinations. The value assigned to each byte is decided based on thresholding the data. If a pixel has its red, green or blue component above the respective threshold, then the red, green or blue value of that pixel is turned on, i.e. given the value 1; if its original value is below the threshold then that component is turned off, i.e. given the value 0. Note that the renderer used to present the video image on screen requires RGB24 format, so a pixel value of 1 was remapped to a value of 255 in the final stage, but for the purposes here it is sufficient to think of the pixel values in 3-bit binary form.

Figure 5.5: The effect of different filter sizes on a single video frame. [Panels: (a) n=0, (b) n=1, (c) n=3, (d) n=4, (e) n=6, (f) n=9.]

The biggest problem lies in choosing an appropriate threshold value for each of the three colours red, green and blue. There were a number of different methods considered for calculating this threshold, which will now be discussed.

1. The threshold could be chosen midway between the two extremes, i.e. at 128. This method is not particularly effective as it is very sensitive to lighting conditions: if the room is bright then a lot of the picture will be white, and if the room is dark then a lot of the picture will be dark. The effect of this type of thresholding is shown in Figure 5.6(b) for the example frame in Figure 5.6(a). As the original image is quite dark, only the brightest line along the centre of the finger is visible in the final image.

2. Alternatively the median value can be used, which is the level with an equal number of pixels above and below this value. This method gives better results than the first method, although it is sensitive to the relative size of the body part and the background. The effect of this type of thresholding is shown in Figure 5.6(c). In this particular image, as the finger is much smaller than the background, some of the background is artificially lightened.

3. Another option is to use the mean pixel value as the threshold for each of the three colours. The mean value of each video frame may be calculated as:

$$\bar{X} = \frac{\sum_{i=0}^{239} \sum_{j=0}^{319} x[i][j]}{240 \times 320}$$

The effect of using the mean value as the threshold is shown in Figure 5.6(d). Like the median, it is sensitive to the relative sizes of the foreground and background.






Figure 5.6: Various Thresholding Methods. (a) Original video frame showing a finger (b) Threshold is chosen at 128 (c) Threshold is chosen at the median (d) Threshold is chosen at the mean (e) Threshold is chosen based on finding peaks in a bimodal distribution


Figure 5.7: Typical histogram of one video frame showing the number of pixels with
each value from 0 to 255, for the red, green and blue channels.

4. The fourth method considered is based on calculation of three histograms of the image for each video frame received, one for each of the red, green and blue channels. Each histogram is produced by counting the number of pixels in the frame with each possible value from 0 to 255 and creating a chart containing each possible level on the horizontal axis and the number of pixels in the frame with each level on the vertical axis. A typical histogram distribution is shown in Figure 5.7 for a pale hand on a black background. The total area under each curve is 240 × 320 pixels. If the image is a bimodal image (e.g. an image with one clearly recognisable bright region and one clearly recognisable dark region, such as the finger on a dark background) then the histogram should have two distinct peaks, corresponding to a bright peak and a dark peak. The histogram shown in Figure 5.7 can roughly be considered an example of a bimodal histogram with two distinct regions. The significance of the white and black lines will be explained shortly. Theoretically, the lowest point between these two peaks is the ideal threshold point to use to reduce the data to an 8-level image, but the difficulty lies in accurately locating these peaks, as discussed in Chapter 5 of Parker [79]. The threshold can be chosen either midway between these two points or at the minimum value between these two points. For this system, the minimum value was chosen. For darker skin tones, the two peaks would be closer together, but the black background should still be much darker than the darkest possible skin tone, and the distance between peaks can be increased if necessary by increasing the camera contrast. The effect of this type of thresholding is shown in Figure 5.6(e).

The two peaks in the bimodal histogram were found using the following method, for each of the three colours. The histogram may be written as a 256-element array $Y = [y_0, y_1, \ldots, y_{254}, y_{255}]$. A deadzone, $w$, is user-defined and should represent the expected width of the peaks. The best value for this depends on the contrast of the two colours in the image, but a value of 30 was found to work well with the image of the hand on the black background above. The width is adjustable by the user or therapist as different values may work better with different cameras, scene contrasts and lighting conditions. The first peak is found by searching the data to find the level of maximum value. The first and last 10 levels are ignored since an abnormally large percentage of the pixels will have these values due to noise. The first maximum index is labelled $i_1$, where

\[ y_{i_1} = \max_{10 < j < 245} y_j \]

The area around this peak is marked as $\{p_1, p_2\} = \{(i_1 - w), (i_1 + w)\}$. The second peak $i_2$ is then found by searching the histogram data for another maximum outside the area of the first peak:

\[ y_{i_2} = \max\left( \max_{10 < j < p_1} y_j,\; \max_{p_2 < k < 245} y_k \right) \]
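The peak search described above, together with the choice of the minimum between the two peaks as the threshold, can be sketched in a few lines of Python. This is only an illustrative sketch and not the thesis implementation: the function names `find_peaks` and `valley_threshold`, the default bounds and the synthetic histogram in the usage note are assumptions made for the example.

```python
def find_peaks(hist, w=30, lo=10, hi=245):
    """Return (i1, i2), the indices of the two largest histogram peaks.

    hist   -- list of 256 pixel counts, one per level (assumed layout)
    w      -- dead zone half-width around the first peak
    lo, hi -- strict search bounds, so only levels lo < j < hi are examined
    """
    levels = range(lo + 1, hi)
    # First peak: the level with the largest count inside the search range.
    i1 = max(levels, key=lambda j: hist[j])
    p1, p2 = i1 - w, i1 + w
    # Second peak: the largest count outside the dead zone around i1.
    outside = [j for j in levels if j < p1 or j > p2]
    if not outside:
        raise ValueError("dead zone covers the whole search range")
    i2 = max(outside, key=lambda j: hist[j])
    return i1, i2


def valley_threshold(hist, i1, i2):
    """Return the level with the smallest count between the two peaks."""
    a, b = sorted((i1, i2))
    return min(range(a, b + 1), key=lambda j: hist[j])
```

For a synthetic histogram with a dark peak at level 40, a bright peak at level 200 and a dip at level 120, `find_peaks` returns the two peak levels and `valley_threshold` returns 120. A noisy spike at level 0 is ignored because it lies outside the search range.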



Due to noise, each video frame often has a number of pixels that change value sporadically from one frame to the next. This is undesirable since, when it occurs, the two maximum indices calculated, and thus the threshold level, will jump about from frame to frame. This creates flickering of the image which may be misinterpreted as actual pixel changes due to movement. This problem was overcome by weighting the two maximum values based on previous values as follows. $w_1$ may be set by the user but a value of 0.5 seems to work well.

\[ w_2 = 1 - w_1, \qquad i_{1w} = w_1 i_1 + w_2 i_{1wp}, \qquad i_{2w} = w_1 i_2 + w_2 i_{2wp} \tag{5.5} \]
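The weighting in Equation 5.5 is a first-order recursive average of each peak index. A minimal sketch follows, assuming the helper name `smooth`, which is hypothetical and does not appear in the original program.

```python
def smooth(current, previous, w1=0.5):
    """Blend the index found in this frame with the previous weighted index.

    current  -- peak index found in the current frame (i1 or i2)
    previous -- weighted index from the previous frame (i1wp or i2wp)
    w1       -- weight given to the current frame; w2 = 1 - w1
    """
    w2 = 1.0 - w1
    return w1 * current + w2 * previous
```

With $w_1 = 0.5$ the effect of the starting value halves on every frame, so an arbitrary initial index is forgotten after a handful of frames.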

$i_{1w}$ and $i_{2w}$ are the new weighted values of $i_1$ and $i_2$, and $i_{1wp}$ and $i_{2wp}$ are the weighted values of $i_1$ and $i_2$ calculated for the previous frame. The initial values of these indices are arbitrarily chosen as 50 and 200, but the influence of these two values should be negligible within a few seconds. For the frame shown in Figure 5.7, the two white lines represent the weighted values of the two peaks detected. The threshold point finally chosen is the index $T$, which is represented by the black line. This value is found by searching between the two indices $i_1$ and $i_2$ to find a minimum, and weighting this index by previous values as for the two maxima. Based on this value $T$, three new two-dimensional arrays $B_r$, $B_g$ and $B_b$ are calculated from the filtered arrays $X_r$, $X_g$ and $X_b$. These new arrays consist solely of boolean or binary numbers, i.e. numbers which can only have the value 0 or 1. Each element of the new array is calculated based on the following rules for each of the three colour channels:

If $x[i][j] < T$ then $b[i][j] = 0$
If $x[i][j] \geq T$ then $b[i][j] = 1$

Step 4: Frame Comparison
The previous frame may be described by three boolean arrays $C_r$, $C_g$ and $C_b$, where $c_r[i][j]$ is the pixel value in the $i$th row and $j$th column for $C_r$. The calculation of the frame difference $BC$ between the current frame and the previous frame may be written using the following equation:

\[ BC = \sum_{i=0}^{239} \sum_{j=0}^{311} \left( b[i][j] \veebar c[i][j] \right) \]


The symbol ⊻ comes from the Latin word aut, which means "or, but not both", and is used here to represent the binary operator XOR, or Exclusive OR. The symbol ⊕ is also commonly used to denote this operator. This operation outputs a value of 1 if exactly one of the two operands is 1, and a value of 0 if both of them are 1 or both of them are 0. This process may also be described in terms of its software implementation, i.e. the program compares the value of each pixel in the current frame with the value of the pixel in the same position in the previous frame, and each time they are different, BC is incremented.
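The thresholding rule and the pixel-by-pixel comparison can be illustrated for a single colour channel as follows. This is a sketch only: frames are assumed to be lists of rows of integers, and the function names `binarise` and `frame_difference` are hypothetical.

```python
def binarise(channel, t):
    """Reduce one colour channel to 0/1 values using threshold t.

    A value below t becomes 0; a value equal to or above t becomes 1.
    """
    return [[1 if value >= t else 0 for value in row] for row in channel]


def frame_difference(current, previous):
    """Count the pixels that differ between two binary frames (XOR sum)."""
    return sum(
        b ^ c                      # 1 only when exactly one operand is 1
        for row_b, row_c in zip(current, previous)
        for b, c in zip(row_b, row_c)
    )
```

Because each operand is 0 or 1, the integer `^` operator behaves exactly as the exclusive OR described in the text, and summing its results gives the number of changed pixels for that channel.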

Step 5: Switch Actuation
The value of BC represents the number of pixels that have changed between the current frame and the previous frame. A switching action is actuated if the value of BC is greater than a threshold. The most appropriate threshold to use depends on a number of factors, so it is user-definable. If the program is only required to respond to large movements then a higher threshold should be used. If the camera is very far away from the moving body part then perhaps a lower threshold would be more suitable. The switching action is actuated by simulation of an F2 key press. This particular key value was chosen as it is the expected input for the Natterbox program described in Chapter 2. It is possible to change the simulated key press to another keyboard value for operation with any other software program requiring a different single key press.
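The actuation rule amounts to a single comparison per frame. In the sketch below the key press is represented by a caller-supplied callback, `press_key`, which is a hypothetical stand-in; how the original program simulates the F2 key is not shown here.

```python
def actuate(bc, threshold, press_key):
    """Trigger the switch when more than `threshold` pixels have changed.

    bc        -- number of changed pixels between successive frames
    threshold -- user-definable sensitivity
    press_key -- callback that performs the simulated key press (assumed)
    Returns True if the switch was actuated for this frame.
    """
    if bc > threshold:
        press_key()
        return True
    return False
```

Passing the key press in as a callback mirrors the point made in the text that the simulated key can be swapped for whatever single key press another program expects.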

Step 6: Rendering
The renderer filter block in the filter graph is also of type IBaseFilter. This filter block is used to display the video frames on screen. The type of renderer is a Video Mixing Renderer 9. The interface IAMStreamConfig is used to set the display size to height 240 and width 320. The renderer receives the processed video frames from the transform filter, with one modification. In order to display correctly on screen in RGB24 format, the pixels with value 1 are mapped back to a value of 255. Therefore each of the three colour channels of all the pixels in the displayed image will have either the value 0 or the value 255, resulting in eight possible colour combinations in RGB24 format: red, green, blue, black, white, cyan, magenta and yellow. All of the eight-level pictures that have been included in this chapter are produced by this renderer.
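The mapping from 3-bit pixel values back to displayable colours can be written out explicitly. The following sketch is illustrative only; the names `to_rgb24` and `COLOUR_NAMES` are assumptions, and the bit order (red, green, blue) is chosen for the example.

```python
# The eight colours that a 3-bit (red, green, blue) pixel can take.
COLOUR_NAMES = {
    (0, 0, 0): "black",
    (1, 0, 0): "red",
    (0, 1, 0): "green",
    (0, 0, 1): "blue",
    (1, 1, 0): "yellow",
    (1, 0, 1): "magenta",
    (0, 1, 1): "cyan",
    (1, 1, 1): "white",
}


def to_rgb24(bits):
    """Map a 3-bit pixel to channel values: bit 1 becomes 255, bit 0 stays 0."""
    return tuple(255 * bit for bit in bits)
```

For example, a pixel whose red and green bits are on and whose blue bit is off maps to (255, 255, 0), which is displayed as yellow.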


Path Description Method

Many of the steps in the path description method are identical to those described in the frame comparison method and so will not be covered here. These steps are video capture, low pass filtering and reduction of data. However, the image analysis process for this method is different, and this will now be described.

Step 4: Centre of Brightness Calculations
The centre of brightness for each of the three colours is calculated as follows. The width of the image, $W$, is 320 pixels and the height of the frame, $H$, is 240 pixels. As before, the image data are in three binary two-dimensional arrays called $B_r$, $B_g$ and $B_b$. $b_r[i][j]$ is the value of the red pixel in the $i$th row and the $j$th column and may be either 1 or 0. The average of row $i$, $x_i$, is calculated by

\[ x_i = \frac{1}{W} \sum_{j=0}^{W} b[i][j] \]

from which the global average $X$ is calculated as

\[ X = \frac{1}{H} \sum_{i=0}^{H} x_i \]

The centre of brightness for the row $i$ is $COB_{x_i}$ and is calculated as follows.

\[ COB_{x_i} = \frac{1}{W} \sum_{j=0}^{W} \frac{b[i][j] \cdot j}{x_i} \]

The overall x-coordinate for the centre of brightness, $COB_x$, is then

\[ COB_x = \frac{1}{H} \sum_{i=0}^{H} COB_{x_i} \]

and the y-coordinate, $COB_y$, is

\[ COB_y = \frac{1}{H} \sum_{i=0}^{H} \frac{x_i \cdot i}{X} \]


The calculated centres of brightness for the little finger in Figure 5.8(a) are shown in Figure 5.8(c). Note that the actual centres of brightness are at the centres of the boxes; the boxes are enlarged for demonstration purposes. The centres of brightness of the red and the green values appear to be in exactly the same place and so the two boxes showing the centres overlap to give a yellow box (red + green = yellow).
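As a rough illustration of the idea, the sketch below computes a centre of brightness for one binary colour channel as the ordinary centroid of its "on" pixels. This is a simplifying assumption made for the example rather than a transcription of the row-by-row formulation given in the text, and the function name `centre_of_brightness` is hypothetical.

```python
def centre_of_brightness(channel):
    """Return (x, y) of the centroid of the pixels set to 1, or None.

    channel -- list of rows, each a list of 0/1 values; the column index
               gives x and the row index gives y
    """
    total = sum_x = sum_y = 0
    for i, row in enumerate(channel):
        for j, value in enumerate(row):
            if value:
                total += 1
                sum_x += j
                sum_y += i
    if total == 0:
        return None            # no bright pixels in this channel
    return sum_x / total, sum_y / total
```

Running this once per colour channel gives three (x, y) pairs, that is, six coordinates per frame.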

Step 5: Path Description
As discussed before, the path description method is based on recording the path of the movement. Now that the centres of brightness have been defined, the exact method used to describe the path may be explained. The therapist or helper must press a start button to begin recording the movement. The subject then makes their movement and the program records the path $P$ by recording the six centre of brightness coordinates for each frame. At the end of the movement the therapist presses a stop button. If the number of frames is $N$, then the size of $P$ will be 6 × N and it may be described by the matrix

\[ P = \begin{bmatrix} x_r[1] & y_r[1] & x_g[1] & y_g[1] & x_b[1] & y_b[1] \\ x_r[2] & y_r[2] & x_g[2] & y_g[2] & x_b[2] & y_b[2] \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_r[N] & y_r[N] & x_g[N] & y_g[N] & x_b[N] & y_b[N] \end{bmatrix} = \begin{bmatrix} V[1] \\ V[2] \\ \vdots \\ V[N] \end{bmatrix} \]

where $x_r[i]$ and $y_r[i]$ are the x-coordinate and y-coordinate of the centre of brightness of the colour red in frame $i$, and $V[i]$ is the 1 × 6 vector whose six points may be referred to as:

\[ V[i] = \{V[i][1], V[i][2], V[i][3], V[i][4], V[i][5], V[i][6]\} = \{x_r[i], y_r[i], x_g[i], y_g[i], x_b[i], y_b[i]\} \tag{5.14} \]

Figure 5.8: Path Description Method; see the text in Section 5.3.4 for further explanations.



This process is shown in Figure 5.8 for movement of a little finger. The starting and finishing positions of the little finger are shown in Figures 5.8(a) and 5.8(b). The calculated centres of brightness are shown in Figure 5.8(c). Figure 5.8(d) and Figure 5.8(e) show the path that is traced out as a little finger is moved.

Step 6: Calculation of Regions
The Euclidean distance between each set of six points is defined between frame $i$ and frame $j$ as $D_{ij}$:

\[ D_{ij} = D_{ji} = \sqrt{\sum_{k=1}^{6} \left( V[i][k] - V[j][k] \right)^2} \]

for $0 < i, j < N$. The Euclidean distances between each frame in the path and every other frame in the path are calculated by comparing each column in $P$. From this the indices $k$ and $l$ are recorded, corresponding to the two frames with the maximum Euclidean distance between them. $P_1 = V[k]$ and $P_2 = V[l]$ are then recorded as the two region points which are used to define the two regions corresponding to switch closed and switch open. The two six-dimensional region points calculated for the example of a little finger moving are represented by two sets of three two-dimensional boxes in Figure 5.8(f). The centres of brightness are close to each other in this example so the boxes overlap and a white box is displayed, but this may not always be the case, as discussed below.
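The search for the two region points is an exhaustive comparison of every pair of frames in the recorded path. A minimal sketch follows, assuming the path is held as a list of six-element tuples; the function names `distance` and `region_points` are hypothetical.

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length coordinate vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def region_points(path):
    """Return the two frames of the path that lie furthest apart.

    path -- list of six-element vectors, one per recorded frame
    """
    if len(path) < 2:
        raise ValueError("at least two frames are needed")
    best_d, best_k, best_l = -1.0, 0, 1
    for k in range(len(path)):
        # D is symmetric, so each pair only needs to be examined once.
        for l in range(k + 1, len(path)):
            d = distance(path[k], path[l])
            if d > best_d:
                best_d, best_k, best_l = d, k, l
    return path[best_k], path[best_l]
```

The cost grows with the square of the number of frames, which should be acceptable for the short recordings described here but is worth bearing in mind for long ones.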

Step 7: Definition of Region Spaces
Since they are six-dimensional regions it is hard to visualise how the two regions are formed, but an attempt at a description is shown in Figure 5.8(f). The regions are enclosed by three boxes around the region points, one for each of the red, green and blue channels. A six-dimensional threshold $T$ is defined with each element calculated as:

\[ T_i = \frac{P_1[i] + P_2[i]}{2} \quad \text{for } 0 < i < 6 \tag{5.16} \]

Based on these values, a two-dimensional region is defined for each of the three colours corresponding to the two different states. An example of this for one colour is shown in Figure 5.9. The two crosses inside boxes represent $P_1$ (at {60, 180}) and $P_2$ (at {240, 40}) for that colour and the two shaded areas represent the corresponding regions. The thresholds are calculated as:

\[ T_1 = \frac{60 + 240}{2} = 150, \qquad T_2 = \frac{180 + 40}{2} = 110 \]

from which the regions $R_1$ and $R_2$ may be defined as follows, each region lying on the same side of both thresholds as its own region point.

If $x < 150$ and $y > 110$ then $\{x, y\} \in R_1$
If $x > 150$ and $y < 110$ then $\{x, y\} \in R_2$
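The midpoint thresholds and the resulting region test can be sketched as below. The sketch assumes that a point belongs to a region when it lies on the same side of every threshold as that region's own point, so that with $P_1$ = {60, 180} and $P_2$ = {240, 40} the point $P_1$ itself falls in the region $x < 150$, $y > 110$. The function names are hypothetical, and the same code works for two or six dimensions.

```python
def midpoint_threshold(p1, p2):
    """Element-wise midpoint of the two region points."""
    return tuple((a + b) / 2 for a, b in zip(p1, p2))


def classify(point, p1, p2):
    """Return 1 or 2 for the region containing `point`, else None.

    A point is in region 1 if, in every dimension where p1 and p2 differ,
    it lies on the same side of the threshold as p1 (likewise for region 2).
    """
    thresholds = midpoint_threshold(p1, p2)
    in_r1 = in_r2 = True
    for x, t, a, b in zip(point, thresholds, p1, p2):
        if a == b:
            continue               # this dimension does not separate the points
        if (x - t) * (a - t) <= 0:
            in_r1 = False
        if (x - t) * (b - t) <= 0:
            in_r2 = False
    if in_r1:
        return 1
    if in_r2:
        return 2
    return None
```

Points that fall in neither region, such as {100, 50} in the example above, return `None`, so the switch state would be left unchanged until the centre of brightness enters one of the two regions.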

Figure 5.9: The two regions calculated from the two furthest points, indicated by crosses

In many cases the centres of brightness for each of the three colours will be in the same position, or close to the same position, as appears to be the case in Figure 5.8(f). In this example, each of the three pairs of region spaces appears to overlap the others closely and the six-dimensional region spaces are effectively mapped onto a two-dimensional space. In cases like this, it may seem redundant to use the three colour channels for the path and region description, and it may appear that one colour should just be chosen. However, there are certain instances where only one of the three colours will move, or the colours will move by different amounts, and using all six parameters may allow for a wider range of movements to be performed. Figure 5.10 shows an example where the three colours are not overlapping. In this example the movement performed by the user is the action of bending the thumb. However, this movement is complicated by the fact that the user is wearing red nail-polish, which shifts the coordinates for the red centre of brightness away from the other two in the initial position, as seen in Figure 5.10(a). When the thumb is bent, the nail area is not as large in the image as it is initially and the three centres of brightness overlap, as seen in Figure 5.10(b). Three two-dimensional representations of the six-dimensional regions calculated are shown in Figure 5.10(c). Figure 5.10(d) shows these more clearly. The centres of brightness surrounded by black and the corresponding lines drawn with black interleaving are all representative of the same region, i.e. $P_1$ and $R_1$; the boxes surrounded in white show $P_2$ and the lines interleaved with white are $R_2$. The box in Figure 5.10(e) is enlarged in Figure 5.10(f). Inspection of this figure reveals a few facts. Firstly, in order to map the six-dimensional region to a two-dimensional region, it would be necessary to reduce the region sizes to the two regions that are shown surrounded in pink. Although the red part of $P_1$ and $P_2$ is inside this box, the blue and the green parts of $P_1$ and $P_2$ are not. Remembering that these points are the furthest points moved to during the thumb-bending, it is unlikely that these points would ever be inside this box and therefore a two-dimensional region would not be satisfactory using this method. Secondly, in the centre of Figure 5.10(f) it can be seen that $R_1[3\text{-}6]$ and $R_2[1]$ and $R_2[2]$ overlap, i.e. the initial position of the green and blue coordinates and the final position of the red coordinates are almost in the same place. This indicates that using all six dimensions will allow for the movement to be more readily and accurately detected.

Figure 5.10: Overlapping



The generality of the application means that it can be adapted for use by people with a range of different movement abilities. For people with a large range of head movement it may be sufficient to use the frame difference method with a large difference required between successive frames. The advantage of using a system such as this is that it is easy to adapt it to the movement that the disabled person is able to perform best, i.e. the system is adaptable to suit a particular user, rather than the user having to adapt to suit the system. The centre of brightness method also offers potential for mouse cursor control. If the person has a greater range of movement, for example the ability to fully move their hand, then they can direct the centres of brightness around the screen by changing the position of their hand accordingly. While there are many important potential applications for a visual based mouse cursor control system, this idea was not explored in great detail as part of this thesis, because the patients we worked with did not have adequate hand mobility. However, it may offer a promising application for future developments. In summary, visual based methods offer a promising non-contact solution for detecting flickers of movement which may be harnessed for communication and control purposes.


Chapter 6 Acoustic Body Signals



Acoustic body signals may be defined as any sound or noise that can be produced voluntarily by the body. In this chapter, we deal only with acoustic body signals that have been created using the vocal organs of the human body and neglect sounds that could be produced using other parts of the body, for example, hand-clapping or foot-tapping. The primary acoustic signal produced by the vocal organs is speech, which is one of the most important modes of communication. People who have become disabled but retain the ability to produce speech are at an immense advantage over those without full speech production abilities. Recent advancements in speech recognition technologies and the demand for hands-free operation of everyday appliances and gadgets have made commercial speech-based environmental control systems readily available to the public. However, for people who have lost the ability to speak intelligibly, currently available speech recognition systems are generally not an option. Nonetheless, acoustic signal based systems may provide a useful channel for environmental control and communication for this group of people in certain cases. Often such people, while not able to coherently produce words or sentences, will still remain capable of voluntary and reproducible utterances, such as single phoneme utterances, grunts or whistles. This chapter investigates how these utterances may be harnessed for communication and control purposes.

Speech recognition technologies are reviewed in Section 6.2, and some of the common methods employed to recognise speech are described. In order to understand the characteristic shapes of acoustic signals that make them identifiable as speech, it is necessary to give a brief outline of how speech sounds are created by the body. The process of speech production by the vocal organs is discussed in Section 6.3. A speech signal may be thought of as a continuous stream of phonemes, which can be defined as the basic sound units of speech often used by speech therapists, linguists and speech recognition engineers. The phoneme alphabet for the dialect of English used here in Ireland, Hiberno-English, is introduced in Section 6.4. The characteristics of a speech signal may then be explored by considering the different types of phonemes and the physical processes performed by the vocal organs in making each of the different phoneme sounds. An attempt is made to define features of different phonemes that will make them distinguishable from each other. Phoneme recognition as an alternative to speech recognition technology is discussed in Section 6.4.4 and some advantages of using this method are given. Some applications of phoneme recognition that have been developed as part of this work are presented, both in hardware (Section 6.5) and in software (Section 6.6).


Speech Recognition
Speech Recognition: Techniques

There are many and varied speech recognition based applications available commercially. The speech recognition technology market is dominated by Scansoft Inc., a Belgian based computer software technology company that manufactures the speech recognition suite Dragon Naturally Speaking, a desktop dictation program with recognition rates of up to 99%. Embedded speech recognition chips are becoming increasingly popular, and are frequently included in mobile phones, enabling the user to record a person's name and associate a particular phone number with it for automated voice dialling. Speech recognition systems often have a large degree of variability in the methods used to interact with them. Some systems require that the speaker chooses one-word commands from a small vocabulary of words, while others attempt to recognise continuous speech from a large or unlimited vocabulary. Some are tailored to an individual user's voice by training the system for a particular user. Other systems are expected to work with a broad range of speakers and dialects or adapt to the speaker's voice over the time the application is in use. There are a number of different approaches used in an attempt to correctly recognise speech and three are briefly described here: the acoustic-phonetic approach, the artificial intelligence approach and the pattern matching approach.

The Acoustic-Phonetic Approach
The acoustic-phonetic approach was common in early speech recognition systems. The speech stream is analysed in an attempt to break the continuous stream of data into a series of individual units, called phonemes, within the stream. The series of phonemes is then analysed in an attempt to recognise words from the phoneme stream. There may be a large variability within each phoneme definition, due to speaker pitch and accent, and transducer variability, and thus it may be difficult to set accurate boundaries for each phoneme type. Co-articulation, where two phonemes spoken in quick succession blend into each other, may make it difficult to split the stream up accurately.



The Artificial Intelligence Approach
This approach attempts to mimic the process of speech recognition by the brain by taking phonemic, lexical, syntactic, semantic and pragmatic knowledge and using each of these to build up an artificial neural network which learns the relationships between events and thus can make an intelligent guess at which word was spoken based on this knowledge. More information on artificial neural networks can be found in [41].

The Pattern Matching Approach
The pattern matching approach of speech recognition involves two stages: pattern training and pattern comparison. The system must be trained for each word or phrase that needs to be recognised by the system. In the pattern training stage, each word or phrase is spoken one or more times by the speaker or speakers. The speech waveform is analysed and a template pattern of the word or phrase is then built from the trainer data. Template patterns are built using the feature extraction method or the statistical method.

The statistical method creates a statistical model of the behaviour of the training model and uses this as a template to compare with the received speech signal. The statistical model attempts to calculate the probability that a certain phoneme or sequence of phonemes was uttered based on features of the sounds. The most commonly used type of model in speech recognition is the Hidden Markov Model (HMM) [80, 81].

With the feature extraction method, the speech waveform is broken down into short fragments of the order of tens of milliseconds, that may or may not overlap in time. For each fragment a number of pertinent features are calculated. These features are used to build a parametric model of that speech sound. The features calculated may be spectral, temporal or both. Spectral parameters commonly used include the output from a filter bank, a Discrete Fourier Transform (DFT) or linear predictive coding (LPC) analysis. Temporal features include the locations of various zero or level crossing times in the signal.

In the pattern matching stage, the same parameters are calculated for the received sound wave and the calculated features are compared to the reference template set for each of the different possible words in the system's vocabulary. There are several different classification methods; discriminant functions are often used [70]. A classification score is recorded for each comparison and the word is recognised as being the one with the highest score. Due to variations in lengths of different segments of a word, the signal is often first time warped to get the best fit [80]. One of the techniques used to recognise phonemes in the work presented here is based on a similar approach for phoneme recognition.
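To make the feature extraction and pattern comparison stages more concrete, the sketch below computes one temporal feature (the number of zero crossings per fragment) and compares feature sequences using dynamic time warping, a standard time warping technique. It is an illustrative assumption rather than the method used in this work: the function names, the choice of non-overlapping fragments and the use of the smallest warped distance in place of the highest classification score are all choices made for the example.

```python
def zero_crossings(frame):
    """Count the sign changes between successive samples of one fragment."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def features(signal, frame_len):
    """Split the signal into non-overlapping fragments; one feature each."""
    return [
        zero_crossings(signal[k:k + frame_len])
        for k in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignment steps.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognise(feats, templates):
    """Return the name of the template closest to the observed features."""
    return min(templates, key=lambda name: dtw_distance(feats, templates[name]))
```

Time warping lets a template match an utterance whose segments are stretched or compressed: the sequence [1, 5, 5, 5] aligns with the template [1, 1, 5, 5] at zero cost even though the transition occurs at a different position.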


Speech Recognition: Limitations

A number of different factors must be taken into consideration when choosing a speech recognition system.

Vocabulary Size Required: In almost all speech recognition systems, there is a trade-off between vocabulary size and the accuracy of the system. As the vocabulary size increases, the probability of a mis-classification also rises. The time taken by the system to recognise a particular word usually increases too, since there are more words for comparison.

System Training: Some systems require each user to train the system to their individual voice. This means the user has to repeat each word in the system's vocabulary multiple times to enable the system to create an individual template for each word. This is cumbersome for the user and may prevent a system from being used by more than one user.

Gaps between Words: Speech recognition systems based on word recognition from a continuous speech stream often expect users to leave a distinct pause between words to enable accurate detection, as the system needs to recognise when a word has been spoken and separate out distinct words from each other. This can be awkward as it is not the way people speak naturally.

Users with Communication Disorders: Most speech recognition systems currently available assume that the user is capable of producing good quality, easily comprehensible speech. The speech signal produced by users with speech disorders such as speech dysarthria, sigmatisms (lisps) and stutters may generate problems in speech recognition, often rendering speech recognition systems unusable for speech-impaired people. However, these are the people who would have the most to gain from acoustic based environmental controllers. People who have suffered a stroke may often be left with speech dysarthria as well as severe motor impairment, and thus could benefit greatly from systems that enable automatic control of appliances in their surroundings.

Some of these issues are discussed in more detail in [80].


Anatomy, Physiology and Physics of Speech Production

Human speech production is a complex process involving the careful interaction of a number of different organs of the body. The formation of speech sounds can be described as a process with four stages: respiration, phonation, resonance and articulation. The organs relevant to speech production are known as the vocal organs, shown in Figure 6.1. Note the location of the important vocal organs which will be referred to now when discussing each of these four stages. The lungs are responsible for respiration, and phonation occurs in the larynx, including the cartilages and the vocal cords. Resonance occurs mainly in the vocal tract, which is the pathway from the larynx to the lips including the throat and the mouth. Articulation uses the group of organs known as the articulators, which include the tongue, the teeth, the nose, the lips and the hard and soft palate.



The first stage in the production of speech sounds is respiration, or breathing. Sound waves propagate as pressure waves produced by causing air particles to vibrate. In speech, this air originates as a continuous stream of air exhaled from the lungs. When breathing normally, this exhalation of air is inaudible. It is only when the stream of air is caused to vibrate that it can be detected as sound by the human ear (or any other device capable of detecting sound such as a microphone) and thus may be described as speech. Air which is exhaled by the lungs travels up through the trachea (windpipe) and into the larynx.



The larynx, or voice box, is the phonation mechanism of the speech production system (to phonate means to vocalise or to produce a sound). The larynx converts the stream of air from the lungs into an audible sound. The larynx is located in the neck and is basically a stack of cartilages. There are nine cartilages in the larynx in total. The thyroid cartilage is the most prominent of these and is located at the front of the neck. It is also known as the Adam's apple. (Both men and women have this cartilage but it is more prominent in men as it is larger.) The arytenoid cartilages are a pair of cartilages located at the back of the larynx. Two folds of ligament, known as the two vocal folds or vocal cords, extend from the thyroid cartilage at the front to the arytenoid cartilages at the back. The vocal cords act as an adjustable barrier across the air passage from the lungs. When not speaking, the arytenoid cartilages remain apart from each other and air is free to pass through the gap between the vocal cords (this gap is known as the glottis). When the arytenoid cartilages push together, the gap between the vocal cords closes, shutting off the air passage from the lungs. During speech, the vocal cords open and close the glottis rapidly. This chops up the continuous stream of air from the lungs into periodic puffs of air. This series of puffs is heard as a buzz. The cycle of opening and closing of the glottis is controlled by air pressure from the trachea. The process that enables the glottis to open and close is described in [83].

Figure 6.1: The Vocal Organs, from [82]

The manner in which the vocal cords vibrate is a complicated process but may be compared simplistically to a set of vibrating strings. The phenomenon of vibrating a string to produce a sound is well understood and forms the basis of sound production in many musical instruments such as a guitar or a piano. A string vibrating as a whole will vibrate at its fundamental frequency. The string can also vibrate in other modes at multiples (overtones or harmonics) of the fundamental frequency. In accordance with this model, the muscle fibres of the vocal cord muscles (the vocalis muscles) vibrate not only as a whole, but also in localised groups. Thus, the sound coming through the glottis will be made up of a number of frequency components: the fundamental frequency plus frequency components at multiples of the fundamental frequency. Generally, the pitch of the sound which is finally heard is equal to the fundamental frequency of the vocal cords. (Note that the words pitch and fundamental frequency are often used interchangeably but they may not always be equal. In telephone line transmissions where the speech signal is bandpass filtered, the pitch, which may be thought of as the perceived fundamental frequency, may not be the same as the actual fundamental frequency of the sound transmitted.) The frequency, $f$, of any wave in nature is related to its velocity, $v$, and wavelength, $\lambda$, as:

\[ f = \frac{v}{\lambda} \tag{6.1} \]

For a vibrating string, the velocity of sound propagation along it, $v$, its tension $T$ and its mass per unit length $\mu$ are related by:

\[ v = \sqrt{\frac{T}{\mu}} \tag{6.2} \]

If the string is vibrating in its fundamental mode then the wave produced has a wavelength of twice the length, $L$, so $\lambda = 2L$. The relationship between a string's fundamental frequency, $f$, length, $L$, mass per unit length, $\mu$, and tension, $T$, may be summarised by the following equation:

\[ f = \frac{\sqrt{T / \mu}}{2L} \]
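The dependence of the fundamental frequency on length, mass and tension can be checked numerically. The sketch below evaluates the string relationship for arbitrary illustrative values; the function name and the numbers are assumptions chosen for easy arithmetic and are not measurements of the vocal cords.

```python
import math


def string_fundamental(tension, mass_per_length, length):
    """Fundamental frequency of an ideal string, in Hz.

    tension         -- string tension in newtons
    mass_per_length -- mass per unit length in kg/m
    length          -- vibrating length in metres
    """
    velocity = math.sqrt(tension / mass_per_length)   # wave speed on the string
    wavelength = 2.0 * length                          # fundamental mode
    return velocity / wavelength
```

With a tension of 100 N, a mass per unit length of 0.01 kg/m and a length of 0.5 m, the wave speed is 100 m/s and the fundamental is 100 Hz. Doubling the length halves the frequency, and quadrupling the tension doubles it.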


Thus the frequency of vocal cord vibration, and therefore the pitch of the sound produced, may be determined by adjustment of these and other factors.

Length of the vocal cords: The longer the vocal cords, the more slowly they will vibrate. The portion of the vocal cords which vibrates may also be constricted to produce higher pitches. (Conversely, if the vocal cords are actively elongated they will produce higher pitches due to thinning of the vocal cords.)

Mass of the vocal cords: The chief mass of the vocal cords is due to the paired vocalis muscle. The physical massiveness of the vocal cords sets the range of frequencies achievable by any one person. In general, the male has heavier vocal cords than the female and thus has a lower range of frequencies available for speech.

Tension in the vocal cords: Tension in the vocal cords may be altered by muscle action and thus can be used to adjust the pitch of the sound produced.

Subglottic air pressure: An increase in the subglottic air pressure raises the pitch (and also the amplitude of the sound).

Elasticity of the vocal cord margins: An increase in the elasticity of the vocal cord margins raises the pitch.

Position and size of the larynx: It has been suggested, although this is controversial, that the vertical position of the larynx may influence pitch.

The pitch of the final sound produced is determined by the frequency of the vocal cord vibrations and is usually between 50 and 250 Hz for men and between 120 and 500 Hz for women. The tone produced by vocal cord vibration is known as the glottal tone. It is a dull, monotonous sound that is unlike the final speech sounds that are uttered. Speech sounds are given a more musical quality by the effects of resonance.
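As a hedged numerical illustration of the vibrating-string model used earlier in this section, the following Python sketch evaluates the fundamental frequency f = sqrt(T/μ)/(2L), where μ denotes the mass per unit length. The function name and the numerical values are assumptions chosen purely for illustration; they are not measurements of vocal cords.

```python
import math


def string_fundamental_hz(tension_n, mass_per_length_kg_m, length_m):
    """Fundamental frequency of an ideal string: f = sqrt(T / mu) / (2 L)."""
    if tension_n <= 0 or mass_per_length_kg_m <= 0 or length_m <= 0:
        raise ValueError("tension, mass per unit length and length must be positive")
    wave_speed = math.sqrt(tension_n / mass_per_length_kg_m)  # v = sqrt(T / mu)
    wavelength = 2.0 * length_m                               # fundamental mode: lambda = 2L
    return wave_speed / wavelength                            # f = v / lambda


# Illustrative (hypothetical) values only.
base = string_fundamental_hz(tension_n=100.0, mass_per_length_kg_m=0.01, length_m=0.5)
tighter = string_fundamental_hz(tension_n=400.0, mass_per_length_kg_m=0.01, length_m=0.5)
longer = string_fundamental_hz(tension_n=100.0, mass_per_length_kg_m=0.01, length_m=1.0)
heavier = string_fundamental_hz(tension_n=100.0, mass_per_length_kg_m=0.04, length_m=0.5)
```

In this idealised model, quadrupling the tension doubles the frequency, while doubling the vibrating length or quadrupling the mass per unit length halves it. This mirrors, in a very simplified way, the roles of tension, length and mass listed above.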



As stated above, the purpose of resonance is to improve the quality of the speech sound. Some resonance of the sound occurs before the sound passes through the larynx, in the trachea and thoracic (chest) cavities. The supraglottic resonators are the cavities of the larynx above the vocal cords, the pharynx (throat), the oral cavity (the mouth) and the nasal cavity (the nose). The effect of resonance is to alter the different frequency components from the glottal tone, amplifying some and weakening others. For the purposes of describing speech production, the throat and the mouth are usually grouped into one unit referred to as the vocal tract. The vocal tract extends from the output of the larynx to the lips.

The phenomenon of vocal tract resonance may best be described by approximately modelling the vocal tract as a tube that is closed at one end (at the vocal cords). Resonance may be defined as the property whereby a vibratory body will amplify an applied force having a frequency close to or equal to its own natural frequency, and it is seen to occur in a tube closed at one end. Such a tube has a series of characteristic frequencies associated with it known as the natural frequencies or resonant frequencies. In a tube of uniform cross-sectional area which is closed at one end, the lowest resonant frequency will have a wavelength, $\lambda$, of 4 times the length, $L$, of the tube ($\lambda = 4L$). This frequency is known as the fundamental frequency or first harmonic. The higher resonant frequencies are odd-numbered multiples of the lowest one, and are the higher order harmonics. A vocal tract 17 cm long, with a uniform cross-sectional area for simplification, has a fundamental frequency at 500 Hz, a third harmonic at 1500 Hz, a fifth at 2500 Hz, and so on.

In speech production, the harmonic frequencies of the vocal tract are known as formant frequencies since they tend to form the overall spectrum of the sound. There are infinitely many formants for each sound. In digital speech processing there are usually three to five left within the Nyquist band after sampling. The formant frequencies may be altered by changing the shape of the vocal tract.

When a sound wave that is made up of a number of different frequency components enters a tube closed at one end, frequencies which are close to the resonant frequencies of the tube will be amplified and frequencies which are far away from the resonant frequencies of the tube will be attenuated. The signal coming through the glottis and into the vocal tract is made up of several frequency components which have been produced by vocal cord vibration. So, the vocal tract will amplify frequency components that are close to its formant frequencies and attenuate the frequency components which are far away from its formant frequencies. Resonance usually produces significant amplification of the signal overall.

A sound which is lacking in resonance will sound unpleasant to the ear. In music, the quality of a sound is called its timbre. The timbre is a measure of the amount of harmonics in a sound. A musical note of a given pitch played on a piano is distinguishable from the same note played on a violin because of the presence of different proportions of harmonics. The two notes have the same pitch but a different timbre. A musical sound is described as having a good timbre if it has many harmonics. Likewise, in speech, a sound with many harmonics will sound more musical and pleasing to the ear than a monotone sound, which has one harmonic (the fundamental) and sounds flat [84].

The difference between the overtones of the vocal cord vibration signal and the overtones of the vocal tract, and how they influence the final speech signal, must be emphasised here. The overtones of the signal produced by the vocal cords are the harmonic frequencies of the speech sound and they determine the frequency components present in the final signal, including the pitch. The natural frequencies of the vocal tract are the formant frequencies of the speech sound and they determine how the frequency components already present in the signal entering the vocal tract are amplified. Resonance in the vocal tract cannot add any extra frequency components to the signal, other than higher order multiples of the frequency components already present in the signal, which act to make the signal sound more pleasing to the ear. Therefore the pitch of the sound can only be altered by altering the frequency of the vocal cord vibration.

Resonance is important in the production of different vowel sounds. Different ratios of harmonic amplitudes are recognised as different vowels; thus a person must change the configuration of their vocal tract to produce different vowel sounds. Conversely, the pitch of a particular vowel sound is changed by altering the speed of vibration of the vocal cords while keeping the same vocal tract position. Since resonance adds higher frequency components to the sound, the full range of speech sounds for all human voices is between about 50 and 2000 Hz. Resonance shapes the sound and gives it quality. Further sound shaping is done by articulation, which also occurs in the vocal tract.
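The closed-tube model of the vocal tract described in this section can be checked numerically. The sketch below computes the resonances of a uniform tube closed at one end, f_n = (2n - 1)c/(4L). The speed of sound, c = 340 m/s, is an assumption (it is not stated in the text) and the function name is hypothetical; with these choices a 17 cm tube gives approximately the 500 Hz, 1500 Hz and 2500 Hz values quoted above.

```python
def closed_tube_resonances_hz(length_m, count=3, speed_of_sound_m_s=340.0):
    """Resonances of a uniform tube closed at one end: f_n = (2n - 1) * c / (4 L).

    speed_of_sound_m_s defaults to 340 m/s, an assumed value for air.
    """
    if length_m <= 0 or count < 1:
        raise ValueError("length must be positive and count at least 1")
    lowest = speed_of_sound_m_s / (4.0 * length_m)  # wavelength of lowest mode is 4L
    # Only odd multiples of the lowest resonance are supported by the tube.
    return [(2 * n - 1) * lowest for n in range(1, count + 1)]


# Uniform 17 cm vocal tract, as in the text.
formants = closed_tube_resonances_hz(0.17)
# A shorter tube (hypothetical 14 cm tract) has higher resonances.
shorter_tract = closed_tube_resonances_hz(0.14)
```

A real vocal tract does not have a uniform cross-section, so measured formants deviate from these values; the sketch only illustrates why changing the tract geometry shifts the formant frequencies.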



Like resonance, the process of articulation is also a sound shaping one. Articulation shapes the sounds to make them acceptable to the listener and recognisable as speech. The articulators are valves which can stop the exhaled air completely or narrow the space through which it passes. They separate the sounds transmitted to them and are particularly important in the production of consonant sounds. The articulators include the lips, teeth, hard and soft palate, tongue, mandible (jaw), posterior pharyngeal wall and probably the inner edges of the vocal folds. The structures of the mouth articulate recognisable sounds; the tongue, the palate, the lips and the teeth all play a part in articulation.

The tongue is the most important of the articulators. It also acts as a resonator by working with the mandible to modify the shape of the mouth. Many of the consonants are produced by movements of the tongue against the gums, palate and teeth to create friction or plosion effects.

The palate consists of a bony hard palate that forms the roof of the mouth and a muscular soft palate at the back of the mouth.

The velum, the lower portion of the soft palate, is especially important in controlling the pressure within the mouth. The velum helps to dam up the air by aiding closure of the nasal passages.

The structure of human teeth, and the fact that they are even in height and width, is an important prerequisite for the production of fricative sounds, which will be defined in the next section.

The mandible, or lower jaw, is one of the primary articulators, and also performs an important role in resonance. A tight jaw adds to tonal flatness.

The lips are important in the production of the labial consonants, which are defined in the next section. They also form certain vowels and diphthongs.

The cheeks are used like the lips to articulate the labial consonants.

The tonsils occasionally grow large enough to have an effect on the air flow, and can add an adenoidal quality to the voice.

More information on the anatomy of speech production may be found in [82, 83, 85]. For intelligible speech production, there must be a clear movement between each sound formation. The clarity needed for intelligibility is provided by the consonants while the musical quality is provided by the vowels between. The consonants are shaped largely by the articulators while the vowels are primarily a product of resonance. The different types of sounds are now described in more detail.


Types of Speech Sounds

In the previous section, the organs used in the process of speech production were described. The configuration of the vocal organs for each element of a word, and the features of each particular sound in a speech stream which make words identifiable, have not yet been discussed in detail. A speech waveform may be thought of as a continuous stream of speech units known as phonemes. The set of all possible phonemes for a language covers all the possible combinations of sounds that may be necessary to create a word in that language. Before the different arrangements of the vocal organs for each phoneme and the spectral and temporal features of each phoneme can be discussed, a more complete definition of the phoneme is given, and the phoneme alphabet for Hiberno-English is defined.



The Phoneme

Definition of Phoneme

The basic unit of speech is the phoneme. The phoneme is the smallest element of a word that distinguishes it from another word. For example, the word "cat" is made up of three phonemes: a vowel sound between two consonant sounds. It is distinguishable from the word "chat" as the first phoneme is different, although the second and third phonemes are identical. Two words that differ by only one phoneme, such as these, are known as a minimal pair. The English language is composed of between 39 and 49 phonemes, depending on the dialect. The phonemes can be thought of as a set of ideal sound units which can be concatenated to produce speech as a stream of discrete codes. In reality, each particular phoneme will vary depending on accent, gender and coarticulatory effects due to rapid transition from one phoneme to the next. The different ways of pronouncing a particular phoneme are known as allophones of that phoneme, and the decision whether to class two different sounds as two allophones of one phoneme or as two separate phonemes is not always clear. For example, the "le" sound in "love" and the "el" sound in "cattle" are classified as two different phonemes by some phoneticians, and considered allophones of the same phoneme by others.

Phonetic Alphabet

A phonetic alphabet is an alphabet used to write down the correct pronunciation of words. There are several phonetic alphabets commonly used by linguists and phoneticians to define the different sounds available. The Speech Assessment Methods Phonetic Alphabet (SAMPA) and the Advanced Research Projects Agency alphaBET (ARPABET) are two commonly used alphabets, popular since they both consist solely of ASCII characters. The International Phonetic Alphabet (IPA) is a more complete alphabet developed by the International Phonetic Association to standardise transcription. It is the official language of linguists and the first version of the alphabet was developed in 1888. Most of its symbols are from the Roman alphabet, some are from the Greek alphabet and some are unrelated to any other alphabet. For this reason it may not be suitable for all computers, but it offers the greatest range of symbols for different phonemes. It also includes diacritic symbols which can be used to indicate slight phonetic variations. For example, the phoneme [p] has both aspirated and unaspirated allophones (aspirated in "pin" and unaspirated in "spin"). A superscript h is sometimes used to indicate an aspirated phoneme, i.e. [pʰ]. It is a recommendation of the International Phonetic Association to use square brackets (e.g. [word]) to enclose phonetic transcriptions that include diacritic markings (known as the narrow transcription). The broad transcription, which omits slight phonetic differences, may be enclosed in slashes (e.g. /word/). For our purposes, the broad transcription is sufficient and will be used here for all subsequent phoneme transcriptions. More on phoneme alphabets may be found in [83, 86].

As mentioned before, boundary definitions between different phoneme sounds are not always clear. For example, in Hiberno-English, the dialect of English used in Ireland, /w/ and /ʍ/ are regarded as two different phonemes (e.g. "wine" versus "whine") whereas in other dialects the initial phoneme in these two words is indistinguishable. Even within Ireland, differences between these two phonemes may be more or less apparent depending on the speaker's region and pronunciation. Regional accents have always been a feature of Hiberno-English speech and one phonetic alphabet cannot represent all the possible combinations. For a more detailed discussion of phoneme differences between dialects, refer to [87, 88]. A list of the phonemes which are commonly regarded as those that make up the Hiberno-English dialect is shown in Table 6.1. Note that the glottal stop is the sound made when the vocal cords are pressed together, as in the middle of the word "uh-oh".


Table 6.1: The Phonemes of Hiberno-English (based on phoneme definitions in [89])

IPA Symbol      Example

Long Vowels
  i:            heat
  e:            take
  A:            father
  o:            tone
  O:            call
  u:            cool

Short Vowels
  æ             had
  E             bed
  2             put
  I             hit
  6             not
  @             above

Diphthongs
  OI            boy
  aI            fine
  aU            shout
  iE            field
  uE            tour

Plosives
  p             pea
  b             bee
  t             tee
  d             dawn
  k             key
  g             go
  P             glottal stop
  t             batter

Affricates
  dʒ            just
  tʃ            church

Fricatives
  v             view
  f             fee
  T             thin
  D             then
  s             see
  z             zoo
  S             shell
  Z             measure
  h             he
  ʍ             when

Nasals
  m             me
  n             no
  N             sing

Approximants: Laterals
  l             law

Approximants: Tremulants
  r             red

Approximants: Semi-vowels
  j             you
  w             we



Types of Excitation

Speech sounds may be categorised as being either voiced or unvoiced sounds. Voiced sounds are produced by vocal cord vibration, and unvoiced sounds are produced by constricting the vocal tract at a certain point with the tongue, teeth and lips, and forcing air through the constriction, causing air turbulence. All vowel sounds are voiced sounds, as are some of the consonants. Voiced sounds have a periodic time domain waveform and a frequency spectrum that is a series of regularly spaced harmonics. Unvoiced sounds have a noisy time domain signal and a noisy spectrum.

The vowel sounds are produced by air passing through the vocal tract. Different vowel sounds are produced by varying the amplitudes of the formant frequencies, which were discussed in Section 6.3.3. Vowel sounds have waveforms which repeat periodically for the duration of the sound. The waveforms of nine different vowel sounds were recorded by the author and are shown in Figure 6.2, which exhibits the periodic nature of these phoneme sounds. The sampling rate used was 22050 Hz and the x-axis scale shows the number of samples.

A diphthong is a special type of vowel phoneme that occurs when two vowels are produced in quick succession. The vocal tract shape moves from one configuration to another in such a way as to cause the two vowels to run into each other, which produces a different phoneme sound than would be produced if the two vowel sounds were sounded separately with a pause between them.

The consonants may be classed either by their place of articulation or by their manner of articulation. The categories for place of articulation are labial (lips), labio-dental (lips and teeth), dental (teeth), alveolar (gums), palatal (roof of mouth), velar (part of soft palate) and glottal (gap between vocal cords). The categories used for manner of articulation are plosive, fricative, semi-vowel, liquid and nasal, and these are described in more detail below.


Table 6.2: Classification of English Consonants By Place of Articulation and Manner of Articulation (reproduced from [82])

Place of          Manner of Articulation
Articulation      Plosive    Fricative    Semi-vowel    Liquids    Nasal
Labial            p, b       -            w             -          m
Labio-Dental      -          f, v         -             -          -
Dental            -          T, D         -             -          -
Alveolar          t, d       s, z         j             l, r       n
Palatal           -          S, Z         -             -          -
Velar             k, g       -            -             -          N
Glottal           -          h            -             -          -

Plosives or stops are made by completely blocking the air flow somewhere in the mouth and then suddenly releasing the built up pressure. The air flow can be blocked by pressing the lips together (labial), pressing the tongue against the gums (alveolar) or pressing the tongue against the soft palate (velar). /p/, /t/ and /k/ are unvoiced plosives; /b/, /d/ and /g/ are voiced.

Fricatives are consonants made by constricting the air flow somewhere in the mouth to an extent that makes the air turbulent, producing a hissy sound. Some, such as /f/ and /s/, are unvoiced, while others, such as /v/ and /z/, are voiced.

Nasal sounds are voiced consonants made by lowering the soft palate, coupling the nasal cavities to the pharynx, and blocking the mouth somewhere along its length.

Semi-vowels are voiced consonants that are made by briefly keeping the vocal tract in a vowel-like position and then moving it rapidly to the next vowel sound in the syllable.

Liquids are voiced consonants. Laterals are a type of liquid: the voiced consonant /l/ is made by putting the tip of the tongue against the gums and allowing air to pass on either side of the tongue.



Figure 6.2: Waveform of 9 phonemes over a time interval of approx. 0.0454 s (nine panels; the horizontal axis of each panel shows the sample number, from 0 to 1000). (a) /i:/ (b) /e:/ (c) /o:/ (d) /u:/ (e) /O:/ (f) /æ/ (g) /E/ (h) /2/ (i) /I/



Characteristics of Speech Sounds

Sound Intensity

Sound intensity is a measure of the amount of power in a sound wave. It is usually expressed in decibels (dB), a logarithmic measure of the ratio between the amount of power in the sound wave and the amount of power in the conventionally defined smallest sound intensity which is just about audible ($10^{-16}$ W/cm$^2$). A 10 dB increase in sound intensity corresponds to an increase of the power in the signal by a factor of 10. In normal conversational speech, the sound intensity three feet away from the speaker is around 65 dB. There is an approximately 700-to-1 range of intensities between the weakest and strongest phoneme sounds in normal speech. The vowels produce the strongest sound intensity, but even among these, there is a 3-to-1 difference. The strongest vowel sound is /O:/ and the weakest is /i:/, which has about the same intensity as the strongest consonant, /r/. This phoneme sound is two and a half times more intense than /S/, six times more intense than /n/ and 200 times more intense than the weakest sound, /T/ [82].
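The decibel relationships quoted above can be reproduced with a short calculation. The following Python sketch assumes the conventional reference intensity of 10^-16 W/cm^2 and uses hypothetical function names; it is an illustration of the standard definition, dB = 10 log10(I / I_ref), and is not taken from the thesis software.

```python
import math

# Conventional threshold-of-hearing reference intensity, in W/cm^2.
REFERENCE_INTENSITY_W_PER_CM2 = 1e-16


def ratio_to_db(power_ratio):
    """Express a power (intensity) ratio in decibels."""
    if power_ratio <= 0:
        raise ValueError("power ratio must be positive")
    return 10.0 * math.log10(power_ratio)


def intensity_to_db(intensity_w_per_cm2, reference=REFERENCE_INTENSITY_W_PER_CM2):
    """Sound intensity level in dB relative to the reference intensity."""
    return ratio_to_db(intensity_w_per_cm2 / reference)


def db_to_ratio(level_db):
    """Inverse of ratio_to_db: the power ratio corresponding to a dB value."""
    return 10.0 ** (level_db / 10.0)


tenfold_db = ratio_to_db(10.0)            # a factor of 10 in power
phoneme_range_db = ratio_to_db(700.0)     # the 700-to-1 range between phonemes
vowel_range_db = ratio_to_db(3.0)         # the 3-to-1 range among vowels
conversation_ratio = db_to_ratio(65.0)    # 65 dB speech relative to the reference
```

On this scale the 700-to-1 range between the strongest and weakest phonemes is a spread of roughly 28 dB, and the 3-to-1 range among vowels is a little under 5 dB.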

Spectrum of Speech

As mentioned before, each vowel is individually recognisable due to different proportions of the formant frequencies for each sound. Different vowels can be recognised based on the amplitudes of different formants in the spectrum. The log frequency spectrum for the nine vowels in Figure 6.2, as computed by the author, is shown in Figure 6.3. Usually, the first three or four formant frequencies are adequate for recognition (although there is evidence that vowels can still be recognised when the first two formants are absent and higher formants are present). Even for consonant sounds, spectral features still play an important role in their classification. The sounds /s/ and /S/ can be distinguished from other fricatives as they have larger sound intensities. They are distinguished from each other by spectral differences: /s/ has a lot of its energy above 4000 Hz while /S/ has its energy concentrated in the 2000-3000 Hz region.

Number of Zero Crossings

Another popular feature often used to distinguish between phonemes in the time domain is the number of zero crossings. A positive going zero crossing (PGZC) may be defined as the point where the signal changes from negative amplitude to positive amplitude, and a negative going zero crossing (NGZC) may be defined as the point where the signal changes from positive amplitude to negative amplitude. The number of PGZCs or NGZCs is counted over a fixed duration of the signal and used to characterise the sound. A signal resembling random noise, such as the /s/ phoneme, will typically have a much higher number of zero crossings than a signal which is periodic, such as a vowel. A pure sinewave signal, such as a whistle, will have one PGZC and one NGZC per signal period.
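The zero crossing definitions above translate directly into a counting routine. The Python sketch below is a minimal illustration with a hypothetical function name; it treats a sample of exactly zero as non-negative, which is an assumption the text does not specify.

```python
import math


def count_zero_crossings(samples):
    """Return (pgzc, ngzc) for a list of samples.

    A PGZC is counted when the signal goes from negative to non-negative,
    an NGZC when it goes from non-negative to negative.
    """
    pgzc = 0
    ngzc = 0
    for previous, current in zip(samples, samples[1:]):
        if previous < 0 <= current:
            pgzc += 1
        elif previous >= 0 > current:
            ngzc += 1
    return pgzc, ngzc


# Ten periods of a sinewave in 1000 samples; the phase offset keeps
# samples from landing exactly on a zero crossing.
sine = [math.sin(2 * math.pi * 10 * n / 1000 + 0.5) for n in range(1000)]
sine_counts = count_zero_crossings(sine)

# A rapidly alternating sequence crosses zero at almost every sample.
alternating = [1.0 if n % 2 == 0 else -1.0 for n in range(1000)]
alternating_counts = count_zero_crossings(alternating)
```

The sinewave yields one PGZC and one NGZC per period, whereas the alternating sequence, standing in here for a noise-like signal, yields far more crossings over the same number of samples.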

Periodicity

A number of different features of a signal can be used to test for periodicity. Two are used here, in different applications. In the application described in Section 6.5.2, the signal is tested for periodicity in the time domain by comparing the intervals between successive positive going zero crossings. In the application described in Section 6.6.2, periodicity is determined by locating the three highest peaks in the frequency spectrum. If the signal is periodic, then these should be harmonics, and thus multiples of a common factor, the pitch. More information on characteristics of speech can be found in [82].
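The spectral test for periodicity described above relies on the highest peaks being multiples of a common factor. The following Python sketch shows one way such a check could be written once the peak frequencies have been located; it is an assumption about how the test might look, not the implementation of Section 6.6.2. The function name, the tolerance and the limit on the divisor are all hypothetical.

```python
def common_fundamental_hz(peaks_hz, max_divisor=4, tolerance=0.05):
    """Return a frequency that every peak is a near-integer multiple of, or None.

    Candidates are the lowest peak divided by 1, 2, ..., max_divisor, so the
    fundamental itself need not be among the peaks. tolerance is the allowed
    deviation of each peak from an integer multiple, in units of the candidate.
    """
    peaks = sorted(peaks_hz)
    if not peaks or peaks[0] <= 0:
        raise ValueError("peaks must be positive frequencies")
    for divisor in range(1, max_divisor + 1):
        candidate = peaks[0] / divisor
        ratios = [peak / candidate for peak in peaks]
        if all(abs(ratio - round(ratio)) <= tolerance for ratio in ratios):
            return candidate
    return None


harmonic = common_fundamental_hz([200.0, 400.0, 600.0])     # consecutive harmonics
missing = common_fundamental_hz([400.0, 600.0, 1000.0])     # fundamental not a peak
inharmonic = common_fundamental_hz([210.0, 455.0, 790.0])   # no common factor found
```

Limiting the divisor matters: with an unlimited divisor, any set of peaks is a multiple of some sufficiently small frequency, so the test would never reject a signal.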


Figure 6.3: Spectrum of 9 phonemes, shown in the time domain in Figure 6.2 (nine panels; vertical axis: log scale). (a) /i:/ (b) /e:/ (c) /o:/ (d) /u:/ (e) /O:/ (f) /æ/ (g) /E/ (h) /2/ (i) /I/. Each of the different vowel sounds has a number of spectral peaks, which are in general at multiples of the lowest peak, the fundamental frequency component. These peaks are known as harmonics. Different vowel sounds are distinguished from each other based on the relative amplitudes of these harmonics.



Proposal of a Phoneme Recognition Based System for Communication and Control

There are some instances where using a full speech recognition system for communication and control would be impossible or infeasible. Section 6.2.2 has already discussed some of the limitations of speech recognition technologies. A phoneme recognition based system is proposed as an alternative acoustic based system that may prove to be preferable under certain circumstances. Some of the factors influencing the choice of a phoneme recognition system over a speech recognition system are discussed below.

Users with Physical Speech Disorders

As mentioned before, currently available speech recognition systems may not be suitable for people with speech disorders. Current figures from the website Irish Health [6] state that 8,500 people in this country suffer a stroke annually, leaving an estimated 30,000 of the population with a residual disability, with 20% of these people unable to walk and 50% in need of day to day assistance. People who have suffered a stroke may often exhibit speech disorders such as oral-verbal apraxia or speech dysarthria.

Oral-verbal apraxia arises due to damage to the anterior portion of the dominant cerebral hemisphere. It is characterised by an inability to perform the voluntary motor sequences required for the production of speech. When a person suffering from oral-verbal apraxia attempts to speak, often they can only produce a fraction of all the phonemic utterances required for intelligible speech. While this disability usually renders speech recognition systems unusable, the phonemic utterances produced by people with this condition may be harnessed through a phoneme recognition system, if these sounds are consistently and voluntarily repeatable.

Speech dysarthria results from damage to the brainstem or damage to one or both of the motor strips located in the frontal portion of the cerebral cortex, which affects the motor aspects of speech (respiration, phonation, resonance and articulation). If the damage is unilateral then speech dysarthria is evidenced only by a slight slurring of consonants, a change in voice quality or a reduction in the rate of speech. However, if both sides of the cerebral cortex or brainstem are damaged, moderate to severe speech dysarthria usually occurs. This has a range of manifestations. If the respiratory system is affected, there may not be enough air expelled to vibrate the vocal cords. If phonation can be produced, it may be so brief that only one word can be uttered. Poor sound intensity, monopitch, monoloudness, hypernasality and a slow rate of speaking are other features common to speech dysarthria.

Users with Verbal Intellectual Functioning Disorders

As well as communication disorders arising from a physical inability to perform the necessary processes for forming speech (respiration, phonation, resonance and articulation), communication disorders due to problems with verbal intellectual functioning may also render speech recognition technologies impractical. Cerebrovascular diseases, such as those that cause strokes, may often cause damage to the left cerebral hemisphere. This area of the brain is usually where the brain's language centre is located, and damage may result in acquired language impairments such as aphasia. Aphasia may be defined as a loss of ability to understand language or to create speech [7]. Aphasic symptoms vary greatly: mild aphasics may find themselves unable to recall certain words, while those with very severe aphasia may completely lose their linguistic abilities, including their ability to recognise nonverbal symbols such as pictures and gestures. A person with aphasia may not be able to use speech recognition systems due to an inability to associate verbal commands with desired actions. However, such people may have more success with phoneme recognition systems, since these rely on a simpler vocabulary based on single phonemes. The reader is referred to [7] for more information on communication disorders.

Control of a Continuously Varying Parameter

Phoneme based systems may provide a more intuitive way of controlling a parameter which requires continuous rather than discrete control. A continuously varying parameter as discussed here includes any parameter that may ordinarily be controlled by turning a knob rather than by pressing a switch, such as the volume control on a radio. It is appreciated that, in many modern appliances, the act of turning a knob to increase a parameter in fact moves the parameter through a range of discrete values rather than allowing strictly continuous selection, where the parameter can be set to any value within a range, but such instances may still be included under the heading of a continuously varying parameter for discussion here.

Two aspects of a phoneme may be used to control a continuously varying parameter: pitch and duration. If a phoneme is a periodic sound, such as any of the vowel phonemes in Table 6.1, then it will have a fundamental frequency component, or pitch, associated with it. Variations in pitch can be used to indicate the direction of change (e.g. for the radio volume example, an increase in pitch could correspond to volume up and a decrease in pitch could correspond to volume down). Pitch variation can also be used to control the rate of change of the parameter, i.e. how fast the volume is increased or decreased. Phoneme duration can be used to indicate the extent of change (e.g. for the radio, the user makes the required sound until the volume has reached an acceptable level). Phoneme control of a continuous parameter is demonstrated in the work presented here by the sample application Spelling Bee, described in Section 6.6.2, in which a bee is moved around the computer screen by varying the pitch and duration of a periodic sound.
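The pitch and duration scheme described above can be sketched as a simple update loop. The Python example below is only one possible interpretation, written under stated assumptions: the pitch at the onset of the sound is taken as the reference, pitch above or below that reference sets the direction and rate of change, and the parameter keeps changing for as long as the sound lasts. The function name, frame length and gain are hypothetical and are not taken from the Spelling Bee application.

```python
def track_level(level, pitch_track_hz, frame_s=0.05, gain_per_hz=0.2,
                minimum=0.0, maximum=100.0):
    """Update a controlled parameter from the pitch track of one sustained sound.

    pitch_track_hz holds one pitch estimate per analysis frame. The first
    estimate is used as the reference pitch (an assumption of this sketch).
    """
    if not pitch_track_hz:
        return level
    reference = pitch_track_hz[0]
    for pitch in pitch_track_hz[1:]:
        # Direction and rate of change come from the pitch offset.
        rate_per_s = gain_per_hz * (pitch - reference)
        # Extent of change comes from how many frames the sound lasts.
        level = max(minimum, min(maximum, level + rate_per_s * frame_s))
    return level


# Sound starts at 150 Hz, then is held at a raised or lowered pitch.
volume_up = track_level(50.0, [150.0] + [170.0] * 10)
volume_up_longer = track_level(50.0, [150.0] + [170.0] * 20)
volume_down = track_level(50.0, [150.0] + [130.0] * 10)
```

Holding the raised pitch for twice as long produces twice the change, and the result is clamped so that the parameter stays within its permitted range.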


User Training

There is typically a much smaller degree of user variability when uttering a single phoneme than there would be when uttering a word or a sentence, due to a more uniform method of pronunciation when uttering single phonemes, which is largely independent of accent. The greatest source of variability for each particular phoneme is probably differences in pitch, but even this can be taken into account by measuring the ratios of the harmonics rather than their exact location, or by setting a range of pitches characteristic of a phoneme. Because of this advantage, simple phoneme recognition systems such as those described in Section 6.5 can store a set of features common to a particular phoneme to enable recognition, rather than requiring the user to train the system to recognise their individual voice. These systems may be suitable for environments where they are required to respond to the voice input of a large range of different users, such as in public buildings.

Vocabulary Size

As the number of phonemes in the English language is only somewhere between 39 and 49, depending on the dialect, the number of possible commands for a phoneme based system is relatively small. While this limitation does prevent phoneme recognition from being used in more complex applications, a small vocabulary does offer the benefits of a higher accuracy rate and a faster classification time.


Hardware Application

Two hardware based systems were developed as part of the work here. Both are based on detection of the two phonemes /o:/ and /s/. These phonemes were chosen since they have spectral and temporal features which make them readily distinguishable from each other, enabling them to be used without requiring training to a particular user's voice. The waveforms of these two phonemes and their spectra are shown in Figure 6.4.

The systems described here can be interfaced with any application requiring operation by one or two switches but have been specifically developed to be used with a reading machine, which was previously developed in the lab in the National Rehabilitation Hospital. The reading machine was designed to provide people who have been severely disabled with an alternative to page turners for reading books, since page turners have a number of problems: they provide no means for turning back to a previous page, often turn more than one page at a time and sometimes tear the pages of the book. In order to read a book using this reading machine, the book must first be transferred page by page onto a roll of acetate which is then affixed to a moving motor on the reading machine. The reading machine input is a 1/4 inch stereo phone plug which can be thought of as two switches: Switch A closes when the tip of the plug is connected to the plug's sheath, and Switch B closes when the ring of the plug is connected to the plug's sheath. When Switch A closes, the reading machine scrolls the acetate on to the next page. When Switch B closes, the reading machine turns the entire roll around by 180°, allowing the user to read the other side of the page. To go back a page, Switch A must be held closed for a fixed length of time. So to operate this machine using phonemic control, the user makes a short /o:/ sound to scroll to the next page, an /s/ sound to flip over to read the back of the page, and a long /o:/ sound to go back to the previous page.

The system was first designed using analogue circuitry based on filters. Circuitry based on a narrow-band, band-pass filter with a low frequency passband was designed to close Switch A if a low frequency signal was detected (e.g. /o:/), and circuitry based on a wide-band, band-pass filter with a higher frequency passband was designed to close Switch B if a high frequency signal was detected (e.g. /s/). This circuit performed reasonably well but was eventually replaced by a microcontroller based circuit, which replaces the spectral criterion for classification with a closely related temporal criterion based on the number of positive going zero crossings (PGZCs), and also adds a requirement that the signal is periodic before it will be recognised as the phoneme /o:/.
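The mapping from recognised phonemes to reading machine actions described above can be summarised in a few lines. The Python sketch below is illustrative only: the action names, the function name and in particular the one second threshold separating a short /o:/ from a long one are hypothetical, since the text states only that Switch A must be held closed for a fixed length of time.

```python
# Hypothetical threshold separating a "short" from a "long" /o:/ utterance.
LONG_O_SECONDS = 1.0


def reading_machine_action(phoneme, duration_s, long_threshold_s=LONG_O_SECONDS):
    """Map a recognised phoneme and its duration to a reading machine action.

    "o:" drives Switch A (next page, or previous page if held long enough);
    "s" drives Switch B (turn the roll over). Anything else is ignored.
    """
    if phoneme == "s":
        return "flip_page"
    if phoneme == "o:":
        if duration_s >= long_threshold_s:
            return "previous_page"
        return "next_page"
    return None


short_o = reading_machine_action("o:", 0.3)
long_o = reading_machine_action("o:", 1.5)
hiss = reading_machine_action("s", 0.4)
other = reading_machine_action("m", 0.4)
```

Three commands are obtained from only two phonemes by using duration as a second feature, which keeps the vocabulary that must be recognised as small as possible.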


Analogue Circuit

As the analogue circuit developed was only briefly used, just a short description of the theory of its operation will be given here. For those who are interested in the full technical details, the complete circuit diagram for the circuit built is given in Appendix F, along with the calculations of all the values given here. The operation of this circuit is described as a series of stages, which are shown in the block diagram in Figure 6.5.

Pre-Amplifier The pre-amplifier stage is necessary to boost the signal before it is passed to the filter. The gain of this stage is 10.

Filtering The output from the pre-amplifier is passed to two band-pass filters. One of these is designed to pass signals with frequency components characteristic of an utterance of the /o:/ phoneme, the other is designed to pass signals with frequency components characteristic of an utterance of the /s/ phoneme. The /o:/ utterance is a periodic signal and hence can be considered to have a narrow-band spectrum, although it will have harmonics at higher frequencies. The pitch of this vowel sound varies between users. Therefore, to pass this signal successfully, the filter used was a narrow-band, band-pass filter with adjustable centre frequency. The maximum gain of the filter is 25, and the centre frequency is adjustable between approximately 200Hz and 1.6kHz.

Figure 6.4: Phoneme Waveforms and Spectra: (a) /o:/ Waveform, (b) /s/ Waveform, (c) /o:/ Spectrum, (d) /s/ Spectrum. The /o:/ waveform is periodic and has more of its power at a lower frequency than the /s/ phoneme, which has frequency components across the spectrum.


Figure 6.5: Block diagram of Circuit Stages for Phoneme Recognition of a Low Pitched and High Pitched Sound. (Blocks shown: Preamplifier, Lowpass Filter, Smoothing, Switch 1, Bandpass Filter, Amplifier, Rectifier, Buffer, Threshold, Delay & Comparator, Switch 2.)

The /s/ utterance resembles noise with a wide-band spectrum, with most of its energy between 2kHz and 8kHz. The filter required to pass this sound successfully is a wide-band, band-pass filter. The filter chosen has a centre frequency of 5kHz, a bandwidth of 1.25kHz, and a maximum gain of 12.5. Since this phoneme is typically of a lower intensity than the /o:/ phoneme (refer to Section 6.4.3), further amplification is required in the next stage.

Amplifier This stage was only required in the part of the circuit for detection of the utterance /s/. The signal is further amplified by a factor of 3.9.

Rectifier The signal that is passed through each of the two filters is rectified using an envelope detector.

Buffer This stage increases the current gain of the circuit to enable sufficient current to be supplied to the relay coil to close the switch.

Threshold Thresholding will output a signal large enough to close the switch only if the sound intensity received is large enough. This prevents components of other noises that are at a similar frequency to the desired frequency from accidentally closing the switch.

Delay and Comparator A delay was used for subcircuit B, since accidental noises (e.g. sudden door slams) can often have a similar frequency spectrum to this phoneme. The delay stage means that a signal of the correct spectrum must be sustained for 0.27s before the switch will close. The comparator then abruptly switches on a current (and raises the voltage level) once the required delay time has elapsed.

Relays The relay coils will close a switch when 5V is dropped across them. These switches can then be connected to any system that requires two switching actions.



Microcontroller Circuit

A system with similar functionality to the one described above was implemented using a PIC microcontroller, which meant that the condition of periodicity could be added to the criteria for recognition of the phoneme /o:/. The criteria used in this case to recognise each of the two phonemes are:

/o:/ Phoneme
(i) The time interval between successive positive going zero crossings (PGZCs) must remain roughly constant.
(ii) The time interval must be within a range of thresholds.

/s/ Phoneme
The average rate of PGZCs must be greater than a threshold (typically 2000 PGZC/s).
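A minimal sketch of these two criteria is given below, operating on a list of PGZC times in seconds. It is not the microcontroller code of Appendix H. The function names, the interval range (`min_interval_s`, `max_interval_s`) and the `tolerance` used for "roughly constant" are assumptions chosen for illustration, and the /s/ criterion is read here as a PGZC rate above about 2000 per second.

```python
def pgzc_intervals(times_s):
    """Intervals between successive PGZC times (in seconds)."""
    return [b - a for a, b in zip(times_s, times_s[1:])]


def is_o_phoneme(times_s, min_interval_s=1 / 400, max_interval_s=1 / 80,
                 tolerance=0.2):
    """Criteria (i) and (ii): intervals roughly constant and within a range.

    The range and tolerance are hypothetical values, not taken from the text.
    """
    intervals = pgzc_intervals(times_s)
    if len(intervals) < 2:
        return False
    mean = sum(intervals) / len(intervals)
    steady = all(abs(iv - mean) <= tolerance * mean for iv in intervals)
    in_range = all(min_interval_s <= iv <= max_interval_s for iv in intervals)
    return steady and in_range


def is_s_phoneme(times_s, min_rate_hz=2000.0):
    """The average PGZC rate must exceed the threshold (typically 2000 PGZC/s)."""
    if len(times_s) < 2:
        return False
    duration = times_s[-1] - times_s[0]
    if duration <= 0:
        return False
    return (len(times_s) - 1) / duration > min_rate_hz
```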

The microcontroller based phoneme detection circuit is given in Appendix F. The system consists of seven stages: the input stage (microphone), an amplification stage, an infinite clipper, a microcontroller, a debouncing stage, a current amplification stage and an output stage (relay coils). The output of the amplification and infinite clipping stages is shown in Figure 6.6. The final signal shown is the signal received by the microcontroller. Two PIC16F84 microcontrollers were used, one to detect each phoneme. The pin-out of the microcontrollers and the external components used are given in Appendix G. The code is given in Appendix H. The reader is referred to [90] for a comprehensive reference to the assembly language commands used. The technique for detecting the /s/ phoneme uses the fact that it has a higher number of zero-crossings per unit time than the /o:/ phoneme. The microcontroller's timer TMR0 is configured to count from the microcontroller's external clock input pin. The TMR0 is a one-byte register which will set a flag



Figure 6.6: Pre-processing stages of the audio signal. The top graph shows the raw signal. The middle graph shows the signal amplified and shifted so it rides around 4.5V. The bottom graph shows the infinitely clipped signal that is input into the microcontroller.


when it overflows. It is initialised to 236 before starting a 10.24ms dummy loop. While this loop is running, the timer increments each time a rising edge of the input signal occurs, i.e. each time a PGZC is detected. At the end of the 10.24ms period the timer overflow flag is checked. The timer will overflow, and the overflow flag will be set, if a sufficient number of rising edges have occurred within the time period. If this is the case, an /s/ phoneme is deemed to have occurred and the appropriate action is taken. The technique for detecting the /o:/ phoneme looks for two features of this phoneme: firstly, that it has a periodic waveform with one zero-crossing per period, and secondly, that it has a lower number of zero-crossings than the /s/ phoneme. In this case, the microcontroller's timer TMR0 is set to count internally. It is pre-scaled so it will overflow if it counts uninterrupted for approximately 66ms. If it overflows the interrupt service routine is called. The signal is input on the microcontroller's external interrupt pin, which will also call the interrupt service routine when the value at this pin changes. The main program runs a dummy loop that runs continuously until one of these two interrupts calls the interrupt service routine. When this routine is called the program runs through a loop of decision processes to decide what action to take. The flow of control of the interrupt service routine is shown in the flowchart in Figure H.1 in Appendix H.
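The arithmetic behind the /s/ detection can be checked with a short simulation. This is a sketch in Python rather than the PIC assembly of Appendix H, and the function names are hypothetical. Only the preset value of 236, the one-byte width of TMR0 and the 10.24ms window are taken from the text.

```python
TMR0_PRESET = 236      # initial value loaded into the one-byte timer
WINDOW_S = 10.24e-3    # duration of the dummy loop, in seconds


def edges_to_overflow(preset=TMR0_PRESET):
    """Number of rising edges needed to roll an 8-bit counter over to zero."""
    return 256 - preset


def s_phoneme_detected(rising_edges, preset=TMR0_PRESET):
    """Simulate TMR0 counting rising edges during one window."""
    tmr0 = preset
    overflow_flag = False
    for _ in range(rising_edges):
        tmr0 = (tmr0 + 1) & 0xFF   # one-byte register wraps from 255 to 0
        if tmr0 == 0:
            overflow_flag = True
    return overflow_flag


def threshold_rate_hz(preset=TMR0_PRESET, window_s=WINDOW_S):
    """PGZC rate at which the timer just overflows within the window."""
    return edges_to_overflow(preset) / window_s
```

With these values the timer overflows after 20 rising edges in 10.24ms, which corresponds to roughly 1953 PGZC/s and is consistent with the typical threshold of about 2000 PGZC/s quoted for the /s/ phoneme.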


Software Application

Two software phoneme recognition systems and two complementary sample software applications using each of these recognition systems were developed as part of the work described here. The system was first developed based on the Linux operating system. A phoneme recognition tool named the AudioWidget was created which can be included in more complex programs to provide phoneme recognition capabilities. The AudioWidget operates in two modes. The first of these modes is pitch detection and the second is phoneme detection.

The phoneme detection mode is again based on the two phonemes /o:/ and /s/. An environmental control graphical menu system was developed incorporating the AudioWidget to provide a sample application. This system is particularly suited to use by aphasics since the menu items are pictures and the control mechanism is non-verbal, meaning the whole system can be used completely independently of words or text. The final phoneme recognition system presented here has been developed for the Windows operating system. All the methods described thus far are based on recognition of the same two phonemes and may be thought of as the phoneme recognition equivalent of the Acoustic Phonetic Approach of speech recognition described in Section 6.2.1. These systems have limited applicability, so a more generic system configurable by the user's therapist is described. This system may be thought of as the phoneme recognition equivalent of the Pattern Matching Approach discussed in Section 6.2.1. A sample application for this system was also developed, which has been called the Spelling Bee and is described below.


Application for Linux

The two software phoneme recognition systems presented here have been divided into two separate sections based on operating system because each operating system requires a different programming approach to enable data to be read into the program from the sound card. Linux arguably provides an easier programming interface to the sound card and thus was chosen as the operating system on which to develop the initial system. A separate system was later developed for the Win32 operating system. The Linux application developed uses the Open Sound System (OSS) [91] application programming interface for capturing sounds, which is defined by including the header file linux/soundcard.h in programs requiring this interface. The OSS is a device driver developed for UNIX and UNIX-compatible operating systems, such as Linux, which supports most of the common sound cards available. Any sound card will have a number of different devices on it. The Digitised Voice Device is used for recording and playback of digitised sound. A sound card may also have a Mixer Device to control various input and output levels and a Synthesiser Device for playing music and generating sound effects. The OSS supports a number of different device files which enable access to various devices on the sound card. The most important device file for the purposes here is /dev/dsp. This device file can be treated like a normal file stored on the hard drive and can be used with standard Linux file control calls such as open, close, read and write (declared in fcntl.h and unistd.h). Reading this device file returns the audio data recorded from the current input source, which is the microphone by default.
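Because the device file behaves like an ordinary file, reading one fragment of audio reduces to reading a fixed number of bytes. The sketch below is in Python rather than the C/C++ of the original program, the helper name `read_fragment` is hypothetical, and it assumes the device is already delivering 8-bit unsigned mono samples. The OSS ioctl calls that set the format are deliberately not shown.

```python
FRAGMENT_SAMPLES = 512  # one byte per sample in 8-bit unsigned mono


def read_fragment(stream, n_samples=FRAGMENT_SAMPLES):
    """Read one fragment from a binary stream as a list of ints in [0, 255].

    Returns None if the stream ends before a full fragment is available.
    """
    buf = bytearray()
    while len(buf) < n_samples:
        chunk = stream.read(n_samples - len(buf))
        if not chunk:          # end of stream
            return None
        buf.extend(chunk)
    return list(buf)


# Hypothetical usage on a system with an OSS device (not executed here):
#     with open("/dev/dsp", "rb", buffering=0) as dsp:
#         x = read_fragment(dsp)
```

Any binary stream can stand in for the device when experimenting, which makes the reading logic easy to try out without a sound card.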

AudioWidget The program AudioWidget uses OSS and the standard file control calls to read data from the microphone using /dev/dsp. The default audio data format when reading from this device file is 8kHz/8-bit unsigned/mono, but it is not safe to assume that this will always be the case, so the program explicitly sets these parameters to these values. The fragment size is set to 512 samples, which are read into a 512 element array which we will call x. This corresponds to a time of 0.064s, which is a reasonable time to take as the vocal tract configuration and excitation will usually not vary significantly during this time. A 512-sample Fast Fourier Transform (FFT) is performed on the 512-sample fragment, using code taken from Dr. Dobb's Journal [92] which is based on a radix 2 algorithm for FFT computation. This returns an array, which we will call y, with 256 elements, or N = 256. The program operates in two modes: pitch tracking mode and phoneme recognition mode.

In pitch tracking mode the program responds only to signals with a single frequency component, such as a whistle. The program continuously analyses each 512-sample fragment received to test for single frequency component signals. The amplitude $A_{max}$ of the maximum frequency component in the spectrum is identified by looking within the y array in the range cutoff < i < N, where cutoff is defined by the program. (This is usually around 10 and ensures that peaks due to low-frequency noise are eliminated from the search.) The index of this peak, $n_{max}$, is recorded. The program then looks to see if there is another peak in the array y outside of this peak area, i.e. an $m_{max}$ within the array y such that:

$$ y_{m_{max}} > \frac{A_{max}}{10.0} \quad \text{if} \quad m_{max} < (n_{max} - 5) \ \text{or} \ m_{max} > (n_{max} + 5) \qquad (6.4) $$

If such an $m_{max}$ exists, then the signal at the microphone input is not a single frequency component signal and is ignored. If only one main frequency component exists, then the pitch, and thus the fundamental frequency $f_{max}$, of the signal may be calculated using N, $n_{max}$ and the sampling frequency $f_s$:

$$ f_{max} = n_{max}\,\frac{f_s/2}{N} = n_{max}\,\frac{4000}{256} \qquad (6.5) $$
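The pitch tracking test can be illustrated with the following sketch. It is not the AudioWidget source. It uses a plain discrete Fourier transform in place of the radix 2 FFT, removes the mean before transforming, and restricts the search for a second peak to indices above the cutoff. These three choices and all function names are assumptions made for illustration.

```python
import cmath
import math


def magnitude_spectrum(x):
    """Magnitudes of the first len(x)//2 DFT bins of the mean-removed signal."""
    n = len(x)
    mean = sum(x) / n
    return [
        abs(sum((x[k] - mean) * cmath.exp(-2j * math.pi * i * k / n)
                for k in range(n)))
        for i in range(n // 2)
    ]


def single_tone_pitch(y, fs=8000, cutoff=10, guard=5, ratio=10.0):
    """Return the pitch in Hz, or None if a second large peak exists."""
    n_bins = len(y)
    # Largest component in the range cutoff < i < N.
    n_max = max(range(cutoff + 1, n_bins), key=lambda i: y[i])
    a_max = y[n_max]
    for m in range(cutoff + 1, n_bins):
        outside = m < n_max - guard or m > n_max + guard
        if outside and y[m] > a_max / ratio:
            return None  # not a single frequency component signal
    return n_max * (fs / 2) / n_bins


def tone(freqs_hz, amplitude, n=512, fs=8000):
    """Sum of sinusoids centred on 128, as in the unsigned 8-bit format."""
    return [128 + sum(amplitude * math.sin(2 * math.pi * f * k / fs)
                      for f in freqs_hz) for k in range(n)]
```

With N = 256 and a sampling frequency of 8kHz, each array index corresponds to 4000/256 = 15.625Hz, so a 1kHz whistle falls exactly on index 64.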


In phoneme tracking mode the program continuously classifies each fragment of data as one of three types:
1. NO PHONEME - neither phoneme sound was uttered
2. O PHONEME - the /o:/ phoneme was uttered
3. S PHONEME - the /s/ phoneme was uttered

Since the data in the x and y arrays are in 8 bit unsigned format, possible sample values are in the range [0, 255]. The mean values are $\bar{x}, \bar{y} = 128$. The zero crossing point for the spectrum may then be considered to be the point where:

$$ y_{i-1} \le 127 \quad \text{and} \quad y_i \ge 128 \qquad (6.6) $$

The first step in classifying the signal is to decide whether or not any sound occurred by calculating the variance $S^2$ from the sampled time signal x. In this case $\bar{x} = 128$ and N = 512.

$$ S^2 = \sum_{i=0}^{N-1} (x_i - \bar{x})^2 \qquad (6.7) $$


If $S^2$ is less than a threshold, then the fragment is classified as being of type NO PHONEME. Otherwise, the number of PGZCs, p, is counted according to the criteria in Equation 6.6. The maximum interval between PGZCs, $I_{max}$, and the minimum interval between PGZCs, $I_{min}$, are recorded. Phoneme classification is made based on the following criteria, which were set out based on experimental findings:

$$ \begin{array}{ll} \text{if } p < 8 & \text{NO PHONEME} \\ \text{if } 8 < p < 20 \text{ and } [\ldots] & \text{O PHONEME} \\ \text{if } p > 40 & \text{S PHONEME} \end{array} $$
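The classification steps can be sketched as below. Only the thresholds 8, 20 and 40 on p, the PGZC test and the mean of 128 come from the text. The assignment of the middle band to the /o:/ phoneme and the upper band to the /s/ phoneme is inferred from the earlier description of the two phonemes, and the power threshold and the condition on the interval spread $I_{max} - I_{min}$ are assumptions, since the exact condition is not recoverable from the text.

```python
import math

NO_PHONEME, O_PHONEME, S_PHONEME = "NO PHONEME", "O PHONEME", "S PHONEME"


def sum_squared_deviation(x, mean=128):
    """Sum of squared deviations of the samples from the mean."""
    return sum((v - mean) ** 2 for v in x)


def pgzc_indices(x):
    """Indices i where the signal crosses upwards between 127 and 128."""
    return [i for i in range(1, len(x)) if x[i - 1] <= 127 and x[i] >= 128]


def classify_fragment(x, power_threshold=5000, max_interval_spread=4):
    """Classify one fragment; both keyword thresholds are hypothetical."""
    if sum_squared_deviation(x) < power_threshold:
        return NO_PHONEME
    idx = pgzc_indices(x)
    p = len(idx)
    intervals = [b - a for a, b in zip(idx, idx[1:])]
    # Assumed periodicity condition: Imax - Imin must be small.
    if 8 < p < 20 and intervals and \
            max(intervals) - min(intervals) <= max_interval_spread:
        return O_PHONEME
    if p > 40:
        return S_PHONEME
    return NO_PHONEME


def tone(freq_hz, n=512, fs=8000, amplitude=100, phase=0.3):
    """Quantised test sinusoid in unsigned 8-bit format."""
    return [int(round(128 + amplitude *
                      math.sin(2 * math.pi * freq_hz * k / fs + phase)))
            for k in range(n)]
```

A 200Hz tone gives about 12 PGZCs in a 0.064s fragment and so falls in the middle band, while a 3kHz tone gives well over 40.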




The graphical user interface (GUI) for the AudioWidget is based on the cross-platform GUI toolkit for C++ called the Fast Light Toolkit² (FLTK). The AudioWidget GUI is shown in Figure 6.7. Note that the graph on the left shows the time domain signal and the graph on the right shows the corresponding frequency spectrum for that fragment. The power in the spectrum is scaled to fit inside the box, which is why there appears to be a large spectrum even when no sound is uttered; the spectrum shown just represents wide-band ambient noise from other sources. The red part of the spectrum is the part falling within the cutoff region, which is ignored by the program when looking for peaks. It can be seen that for a whistle there is only one distinct peak, for the /o:/ phoneme there is a series of evenly spaced peaks representing the harmonics, and for the /s/ phoneme the spectrum looks like wide-band noise.

Graphical Menu An environmental control program, Graphical Menu, was developed as part of the work here to give an example of an application incorporating the AudioWidget described above. This program is configurable by a therapist or helper and is designed to be operable solely by symbols and non-verbal utterances rather than words, enabling use by people regardless of their linguistic abilities. The program is a list of menu items which are each individually created by the therapist. An arbitrary command is associated with each menu item. There is a drawing facility for adding a symbolic representation to each menu item, and a textual description of the menu item may optionally be added. The graphical menu is shown in Figure 6.8 with three menu items added: Turn Radio On, Turn Light On and Turn Light Off. The user scrolls through menu items by making the sound /s/ to move to the next menu item. A menu item is selected by making the sound /o:/. A commercially available supplementary home automation module called the X10 module³ was used to control appliances. The X10 command signals are transmitted over domestic power lines. The command signals can be transmitted via the computer's power line through a building's electrical wiring to the appliance's power line. The X10 system is shown in Figure 6.9.


Application for Windows

In the Windows application developed here, the sound card was accessed using DirectSound, which is part of the DirectX suite of application programming interfaces. DirectSound enables WAV sounds to be captured from a microphone. DirectSound and the other DirectX components are based on the Windows Component Object Model (COM) technology. The program, called Phoneme Detection, creates a DirectSound device which is then used to create a capture buffer which captures the data from the microphone. In this application a 1s buffer was created with PCM wave format, one channel, an 8kHz sampling rate and 16 bits per sample. This buffer is continuously filled with data. Each time a 512-sample chunk is filled, it is read into array x, with N = 512, and analysed. The spectrum of the fragment is calculated, again by performing a 512-sample FFT based on code from Dr. Dobb's Journal [92], and is read into an array y with N = 256.

Figure 6.7: The AudioWidget responding to different signals: (a) Whistle, (b) No input, (c) /o:/ phoneme, (d) /s/ phoneme. (a) Pitch Detection Mode - the tracker on the bottom varies according to pitch. (b)(c)(d) Phoneme Detection Mode - the status bar at the bottom changes colour according to the detected phoneme.

Figure 6.8: Graphical Menu operated using the AudioWidget

Figure 6.9: The X10 module. A control signal from the computer travels down through the computer's electrical connection to the X10 receiver attached to the lamp, enabling the computer to switch the lamp on.

³ X10 Home Automation System:

Phoneme Detection Program On each fragment of audio data received the variance $S^2$ is calculated to check if a sound was uttered, using Equation 6.7 with N = 512 and $\bar{x} = 0$, since in this case the data is signed and thus has zero mean. If the variance is greater than a threshold, a number of features are calculated to enable the sound to be characterised.

Number of PGZCs The number of PGZCs, p, is incremented when

$$ y_{i-1} < 0 \quad \text{and} \quad y_i \ge 0 \qquad (6.8) $$

Spectral Peaks Spectral peaks were detected using the Slope Peak Detection Method outlined below, where w is the peak width, defined at the beginning of the program. A peak is detected at i if and only if:

$$ y_{i-w} < y_{i-w+1} < \dots < y_{i-1} < y_i \quad \text{and} \quad y_i > y_{i+1} > \dots > y_{i+w-1} > y_{i+w} $$

Each time a peak is detected its index is read into the next available location in an array peak_index. Once all the peaks have been detected, peak_index is searched and the peak indices of the three highest peaks are read into a three element array peak_index_max. These three indices are then re-arranged in ascending order of index value. If the signal is periodic, the first element of this array will usually correspond to the pitch of the received signal. Peak detection for the signal in Figure 6.10(a) is shown in Figure 6.10(b). The circles represent all the detected peaks (the red line marks the highest peak, the green line the second highest and the blue line the third highest peak). The three highest peaks are shown in Table 6.3.

Table 6.3: Spectral Peaks for Signal in Figure 6.10

Peak                 Freq (Hz)   Harmonic        Highest Peak Amplitude
peak_index_max[0]    234.375     Fundamental     2nd
peak_index_max[1]    468.750     2nd harmonic    1st
peak_index_max[2]    937.500     4th harmonic    3rd

Periodicity A truly periodic signal may be defined as a signal which exactly repeats itself after every T seconds, where T is the period of the signal. For the most part, periodic acoustic signals such as whistles and vowel sounds will not be exactly periodic once received by the computer, due to a number of factors including ambient noise, recording errors and slight movement of the vocal organs when attempting to produce a constant periodic tone. For the purposes here, we define a signal as being approximately periodic if most of the power in its frequency spectrum lies within the fundamental frequency peak plus the harmonic peaks. Since the three largest peaks in the frequency spectrum have already been identified, we can use these to define a test for approximate periodicity of any arbitrary signal. If the lowest peak index is called $P_1$ and the two other main peaks are called $P_2$ and $P_3$, then a signal is periodic if $P_2$ and $P_3$ are multiples of $P_1$, i.e. $P_1$ is the fundamental frequency component and $P_2$ and $P_3$ are its harmonics (which are the two highest harmonics of the signal other than the fundamental and will not necessarily be the 2nd and 3rd harmonics). If this is the case, then the fragment is marked as an approximately periodic signal. $P_2$ and $P_3$ are tested to assess if they are multiples of $P_1$ by calculating the two ratios $r_1 = \frac{P_3}{P_1}$ and $r_2 = \frac{P_2}{P_1}$, and from these $d_1$ and $d_2$ were calculated as the portions of $r_1$ and $r_2$, respectively, after the decimal point. From these the final variables $t_1$ and $t_2$ were calculated based on the following conditions:

$$ t = \begin{cases} d & \text{if } d \le 0.5 \\ (1 - d) & \text{if } d > 0.5 \end{cases} $$
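The peak detection and periodicity test can be sketched together as follows. This is an illustration rather than the Appendix I code. The function names are hypothetical, the default peak width `w=2` is an assumed value, and the 0.15 tolerance is the threshold on $t_1$ and $t_2$ quoted in the text.

```python
import math


def slope_peaks(y, w=2):
    """Indices i where y rises strictly over w steps up to i, then falls."""
    peaks = []
    for i in range(w, len(y) - w):
        rising = all(y[k] < y[k + 1] for k in range(i - w, i))
        falling = all(y[k] > y[k + 1] for k in range(i, i + w))
        if rising and falling:
            peaks.append(i)
    return peaks


def three_highest(y, peak_index):
    """Indices of the three highest peaks, in ascending order of index."""
    top = sorted(peak_index, key=lambda i: y[i], reverse=True)[:3]
    return sorted(top)


def distance_to_integer(ratio):
    """The variable t: distance of the ratio from the nearest integer."""
    d = ratio - math.floor(ratio)
    return d if d <= 0.5 else 1.0 - d


def is_approximately_periodic(peak_index_max, tolerance=0.15):
    """True if P2 and P3 are close to integer multiples of P1."""
    if len(peak_index_max) < 3:
        return False
    p1, p2, p3 = peak_index_max
    if p1 == 0:
        return False
    t1 = distance_to_integer(p3 / p1)
    t2 = distance_to_integer(p2 / p1)
    return t1 < tolerance and t2 < tolerance
```

For the peaks in Table 6.3, which sit at array indices 15, 30 and 60 when each index spans 15.625Hz, both ratios are whole numbers, so $t_1 = t_2 = 0$ and the fragment would be marked as approximately periodic.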



$P_2$ and $P_3$ are multiples of $P_1$, and thus the signal is marked as approximately periodic, if $t_1 < 0.15$ and $t_2 < 0.15$. See the program code in Appendix I for further details. In the example signal in Figure 6.10 and in Table 6.3, the peaks at 468.75Hz and 937.5Hz are exact multiples of the fundamental frequency peak at 234.375Hz, thus the signal is periodic. Inspection of the time signal in Figure 6.10(a) indeed shows that this appears to be the case.

Normalised Peak Values As mentioned before, different vowel sounds are identifiable based on different relative amplitudes of their harmonics. The normalised amplitudes of the three peaks in the array peak_index_max were recorded and stored in an array called peak_ratio, normalised so the maximum peak amplitude had a value of 100. Typical values for the amplitudes of different vowel phonemes for a female are shown in Table 6.4. Note that the values shown are only those recorded for one fragment of data, and the values recorded will in general fluctuate from the values shown throughout the duration of the vowel utterance, causing considerable overlap between phoneme definitions. Thus, the 8 vowel sounds will not all be readily distinguishable simultaneously, but it is hoped that for each individual user, a subset of these sounds will exhibit distinctive enough amplitudes to enable correct classification of some of these sounds at the same time.

Table 6.4: Example values for relative harmonic amplitudes of vowels calculated by the program Phoneme Detection.

Phoneme   peak_ratio[0]   peak_ratio[1]   peak_ratio[2]
/i:/      100             94              77
/e:/      100             95              85
/o:/      100             94              90
/O:/      100             97              95
//        100             91              83
/E/       100             95              91
/2/       100             91              87
/I/       100             91              90

Like the AudioWidget program, this program can operate in two modes: pitch tracking mode and phoneme detection mode. The pitch tracking mode in this program is more adaptable than in the AudioWidget described in Section 6.6.1 since it also includes a facility to simultaneously detect non-periodic phonemes while pitch tracking is running. If the audio signal received is periodic then the pitch of the signal is calculated and this can be associated with any command requiring a continuous input (such as the volume on a radio or moving a mouse up and down). Non-periodic utterances can be used to create stored template feature sets, each of which can be associated with a different command. When a non-periodic utterance is received its features are compared to the stored feature sets. If a match is found then the associated command is performed.


Figure 6.10: Signal and its spectrum, from phoneme detection program: (a) Signal, (b) Spectrum.


In pure phoneme detection mode, periodic utterances can also be used to create stored template feature sets, allowing different vowel sounds to be used to create different template feature sets. If two template feature sets are too similar then the program generates a warning and the user is advised to rerecord both sounds or to choose a different sound to associate a command with.

Spelling Bee An example communication program called the Spelling Bee was developed, incorporating the Phoneme Detection program described above. The GUI for this program is shown in Figure 6.11, showing a bitmap of a bee. When no input is received from the microphone, the bee drifts from left to right across the middle of the alphabet board, at a user configurable speed. When the bee reaches the end of the board it wraps around the alphabet board, reappears on the left side of the screen and drifts over towards the right again. The vertical direction of the bee's drift is controlled by the user by making a periodic sound, such as a vowel sound, and adjusting the pitch to move the bee upwards or downwards. To direct the bee towards the top-right corner the user needs to make a rising pitch sound; to direct the bee towards the bottom-right corner the user needs to make a falling pitch sound. Each time the program receives a periodic sound from the microphone, the pitch of the sound received is calculated, using the pitch detection feature of the Phoneme Detection program. The pitch difference, $\Delta f$, is calculated by subtracting the previous pitch from the current pitch, so that a rising pitch gives a positive difference. The vertical increment or decrement of bee position, $A_i$, is then calculated using $\Delta f$ and the previous value $A_{i-1}$, according to the following rules.

if $|\Delta f| >$ constant: $A_i = 0$. This ensures the program will only respond to pitch changes within a certain range, such as a rising or falling vowel sound, and will not respond to sudden large pitch changes due to arbitrary noises from other sources.

if $\Delta f > 0$ and $A_{i-1} > 0$: $A_i = A_{i-1} + 5\Delta f$. This enables the program to increase the bee's rate of movement upwards (accelerate) if a continuously rising periodic sound is made.

if $\Delta f < 0$ and $A_{i-1} < 0$: $A_i = A_{i-1} + 5\Delta f$. This enables the program to increase the bee's acceleration downwards if a continuously falling periodic sound is made.

Otherwise: $A_i = 7\Delta f$.

Figure 6.11: Spelling Bee. The bee is directed up and down by the user's voice. In the current screenshot the pitch difference between the last pitch (29) and the current 2nd peak (30) is +1 so the bee will move upwards.
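These update rules can be written compactly as below. This sketch takes the pitch difference `delta_f` as the current pitch index minus the previous one, so that a rising pitch is positive as in the Figure 6.11 caption. The function name and the limit `max_delta_f` (the "constant" in the first rule, whose value is not given) are assumptions for illustration.

```python
def vertical_increment(delta_f, previous_a, max_delta_f=10):
    """Return the new vertical increment A_i of the bee position.

    delta_f: current pitch index minus previous pitch index.
    previous_a: the previous increment A_(i-1).
    max_delta_f: hypothetical limit above which a pitch jump is ignored.
    """
    if abs(delta_f) > max_delta_f:
        return 0                           # sudden jump, treated as noise
    if delta_f > 0 and previous_a > 0:
        return previous_a + 5 * delta_f    # keep accelerating upwards
    if delta_f < 0 and previous_a < 0:
        return previous_a + 5 * delta_f    # keep accelerating downwards
    return 7 * delta_f                     # start, stop or change direction
```

For the screenshot case, a pitch difference of +1 from rest gives an increment of 7. A second rise of +1 then gives 12, so a continuously rising sound moves the bee upwards increasingly quickly.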


Note that in this program, the second peak (peak_index_max[1]) is chosen as the peak to label as the pitch of the sound, although the true pitch will actually be the first peak. The reasons for this choice are now explained. The program is only looking at small windows of data at a time (512 samples per fragment at 8000 samples per second gives 0.064 seconds per fragment). This small window was chosen so the assumption could be made that the vocal tract configuration remains constant over the entire fragment and also to enable the program to respond faster to user input. However, it does introduce the limitation that each element of the frequency spectrum array spectrum corresponds to a frequency step of approximately 16Hz. So, for the program to respond to an increase in pitch using the fundamental frequency component, the user would need to raise the pitch of their voice by 16Hz before any change in pitch would be detected. By using the 2nd harmonic component, the user has twice the degree of control over the pitch increment, since a change of about 8Hz in the pitch shifts the 2nd harmonic by a full 16Hz step. Using the 3rd component of the array peak_index_max would give an even greater degree of control, although in practice this was found not to work very well: the third peak detected seemed to jump position frequently between the 3rd and 4th harmonics of the spectrum. The Spelling Bee also requires that a non-periodic sound, called the Selection Indicator, is recorded at the start of the program. The features of this sound are stored and each time a non-periodic sound is made, its features are compared to the stored features. If a match is found, this is indicated by the program, which then waits for another match to certify that the expected phoneme was actually uttered. If another matching fragment is found within a short interval of time, then a selection is confirmed and the bee lands on whichever letter it is currently hovering over. The chosen letter then appears in the edit box across the bottom of the screen. Thus the user can spell out a message.




Phoneme detection has been discussed in this chapter as an alternative method to speech recognition for providing communication and control for disabled people. The advantages of phoneme recognition based systems have been discussed. For users with communication disorders such as apraxia, speech dysarthria or aphasia, speech recognition systems may not be suitable and phoneme recognition based systems could offer many of the same benefits as speech recognition systems provide, such as hands-free control of the user's environment. Phoneme recognition based systems may be preferable in situations where control of a continuously varying parameter is required, such as in volume control of a radio or television. They may be more suitable for environments where the system cannot be trained to a particular user's voice, such as where there are many users or in a public building. A number of phoneme recognition based systems have been developed and a number of applications of these systems have been described in this chapter. Phoneme recognition has been explored to assess how it may be used to control a reading machine, an environmental control menu and an alphabet board based communication tool. A system requiring training to a user's voice and a system designed to work with a broad range of different voices have been developed, and both have advantages and limitations.


Chapter 7 Conclusions



This thesis has explored a number of different signals from the body to investigate their potential as control signals for communication and control systems for the severely disabled. It is impossible to choose any one of the signals identified as the best signal for communication and control purposes. For each person requiring the use of a communication and control system, there will be a different set of variables which may determine the best method of enabling interaction with a system. Firstly, the physical abilities of the person must be identified to assess the range of options open to them. Their individual motivations and practical requirements from an assistive technology system must also be taken into account. While some people may wish to use a system that will allow more efficient communication and are willing to spend some length of time mastering a technique for interaction, others may prefer a simpler method such as a single switch operated system. Also, a person may have different opinions about how an alternative control system may draw attention to their disability. Obviously their personal preferences should always be taken into consideration when choosing a method of control. While some people may find that electrodes attached to the skin will make their disability more conspicuous, others may feel uncomfortable making utterances if they are in an environment where others will be able to overhear their commands.


Resolution of the Aims of this Thesis

The main aims of this thesis were outlined in Chapter 1 and the methods used to meet these aims will be discussed here. These were:
1. Overview of current methods of providing communication and control for disabled people.
2. Identification of alternative signals from the body which may be harnessed for communication and control purposes for people with very severe disabilities.
3. Study of measurement techniques that may be used to acquire these vestigial signals.
4. Investigation of signal processing methods to enable these signals to be correctly interpreted and development of working systems that demonstrate the capabilities of these techniques.
5. Testing of these techniques and systems with people with severe disabilities.
6. Development of some mathematical models that evolved as a result of studying these body signals.



Overview of Current Communication and Control Methods

There is a wide range of communication and control systems currently available and many different methods have been considered to enable a disabled person to interface with these systems. A number of systems which were considered of particular relevance to the thesis were described in the main body of the thesis.


Identification of Signals

This thesis sought to identify vestigial signals left to severely disabled people that could be harnessed for communication and control purposes. Four principal signals were investigated in this thesis. Muscle contraction was one of the vestigial signals explored as a method of communication and control for disabled people, in Chapters 3 and 5. If people who are disabled have the ability to contract a muscle, then this may be harnessed to provide a method of communication and control. Obviously if the muscle contraction is strong enough, then it enables the person to use a mechanical switch. If this is not the case, then the signal must be harnessed by other means. Eye movements were discussed in Chapter 4. As the muscles in the eye often remain under voluntary control even in very severe cases of disability, eye movements are an important signal to consider for communication and control purposes. Acoustic signals as a method of communication and control were discussed in Chapter 6. Speech recognition technologies may be used to provide control by people with full speech production abilities. For those who are only capable of producing a subset of the utterances necessary to create intelligible speech, other methods of harnessing these utterances must be considered. Skin conductance was briefly explored in Chapter 4 as another method of providing a switching action. Measurement of the electrical conductance of the skin may serve as a method of monitoring the activity of the sweat glands on the skin's surface. Sweat gland activity may be consciously controlled by tensing up or imagining oneself in a state of stress or anger. This causes emotional sweating to occur, which will increase the measured skin conductance. One of the drawbacks of using this signal is that it is a very slow method of control. However, in cases where there is no preferable alternative it may be the only option. It is recognised that there may be other signals from the body that have not been explored in this thesis that could be harnessed for communication and control for disabled people. Some other possible signals that could be considered for future investigations are mentioned in Section 7.3. The measurement techniques and signal processing methods used to develop working systems will now be described for each of the four signals.


Measurement Techniques

Muscle Contraction

Muscle contraction in physically disabled people will often be very weak. Three methods of detecting weak muscle contractions were considered. The electromyogram (EMG) and the mechanomyogram (MMG) were discussed in Chapter 3 and visual methods were discussed in Chapter 5.

The EMG is typically measured using three electrodes. Two recording electrodes are placed over the belly of the muscle and a ground electrode is placed on a neutral part of the body, such as a wrist. The two recorded signals are differentially amplified. Muscle contraction can be detected by harnessing this signal since the amplitude of the EMG increases in almost all cases upon contraction, due to the generation of action potentials on contraction.

In Chapter 3, the MMG was explored as an alternative to the EMG for measuring muscle contraction to enable control of communication and control systems for disabled people. The mechanomyogram may offer a number of benefits over myoelectric control. It is measurable using a single small accelerometer, as opposed to the three electrodes which are required for electromyographic recordings. The electromyogram requires skin preparation to improve the conductance of the skin, and this step is unnecessary for the mechanomyogram as it is a mechanical signal. Skin conductance may also be a problem for EMG recording since it can be affected by varying thermal conditions or emotional anxiety. The MMG is capable of detecting weaker contractions than the EMG, which makes it an attractive option as a controller for disabled people who may have very weak muscle activity. The MMG can also measure activity from deeper muscles than the surface EMG, which typically only detects activity from the surface muscles.

Visual techniques were explored in Chapter 5 to investigate the possibility of using a computer camera to measure observable flickers of movement. This may be a preferable option in cases where the person does not want to have anything attached to their skin, or where they are prone to heavy perspiration. The visual method of movement measurement that was developed as part of the work here has an added benefit in that it responds only to the particular movement that the user or therapist has chosen. Thus other movements, whether voluntary or involuntary (such as muscle tremors or spasticity), will not unintentionally trigger the program to respond.

Eye Movements

A number of different methods exist for measuring eye movement. It is often measured visually, using methods such as the corneal reflection technique [52], which shines light (usually infrared) into the eye and detects the reflected pupil and cornea. The electrooculogram was the method of eye movement measurement used in the work presented in this thesis. Two electrodes are placed at opposite sides of the eyeball and the electrical signal between the electrodes varies as the eye moves. Despite offering a number of benefits over other methods, the EOG is often overlooked as an eye movement measurement technique due to problems with baseline drift. EOG based systems often require manual re-calibration of the amplifier when baseline drift occurs, and this is impractical if the aim is to achieve user independence. Aside from this limitation, the EOG may be an attractive option for communication and control as it can provide an inexpensive method of interfacing a user with a computer. The EOG also has other benefits. It may have a wider range than visual techniques and is not subject to interference from spectacles worn in front of the eyes.

Acoustic Utterances

Speech recognition technologies may not be an option for people who have lost their speech production abilities but still remain capable of making non-verbal but repeatable utterances. Phoneme detection is explored as an alternative acoustic signal. Phoneme recognition may be suitable for users with communication disorders such as apraxia, speech dysarthria or aphasia, who are often unable to use speech recognition systems because of their speech impairment. Phoneme recognition based systems may also be preferable in situations where control of a continuously varying parameter is required, such as volume control of a radio or television.

Skin Conductance

Skin conductance was measured using the circuit in Appendix E, which outputs a voltage proportional to the conductance on the surface of the skin. Skin conductance was measured on the palmar surface of the fingers, since this area typically has a higher number of sweat glands than other skin surfaces.


Signal Processing Techniques and Working Systems Developed

Muscle Contraction

An MMG based communication and control application was developed, the code for which is given in Appendix I. A number of signal processing steps were performed which enable a switching action to be performed when the MMG amplitude increases sufficiently. The system was tested to assess the speed at which muscle contraction could be used to spell out a 9-word message on a software alphabet board. The average speed over the four users and the two muscles was 1.56 words/min with an average of 1.25 errors. While this is a very slow method of communication compared to natural speech, for people who are severely disabled it could provide an invaluable tool to enable communication where they might otherwise have none.

Two visual-based methods of measuring movements were also developed, the Frame Comparison Method and the Path Description Method. These methods enable switching actions to be performed using flickers of movement which are detected using a computer camera. The algorithms presented were incorporated into a software program which allows the person's movement to be recorded and generates a switching action on repetition of that movement. The code for this program is given in Appendix I.
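The amplitude-based switching idea described above can be illustrated with a minimal sketch. This is not the code of Appendix I: the rectify-and-smooth envelope, the hysteresis thresholds, and the names `detect_switch_events`, `window`, `on_threshold` and `off_threshold` are assumptions made for illustration, and the input is assumed to have had its DC offset removed already.

```python
def detect_switch_events(samples, window=50, on_threshold=0.2, off_threshold=0.1):
    """Return the sample indices at which a switching action is triggered.

    Illustrative sketch only: the signal is rectified, smoothed with a
    moving average over `window` samples, and compared with two thresholds
    (hysteresis) so that one contraction produces one switch event.
    """
    events = []
    rectified = []
    running_sum = 0.0
    active = False
    for n, x in enumerate(samples):
        r = abs(x)                      # full-wave rectification
        rectified.append(r)
        running_sum += r
        if n >= window:
            running_sum -= rectified[n - window]
        envelope = running_sum / min(n + 1, window)
        if not active and envelope > on_threshold:
            active = True               # contraction onset: fire the switch once
            events.append(n)
        elif active and envelope < off_threshold:
            active = False              # muscle relaxed: re-arm the switch
    return events
```

Using two thresholds rather than one is a design choice intended to stop a contraction that hovers near the threshold from producing several switch events; suitable values would have to be chosen for each user.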

Eye Movements

A novel technique of using the electrooculogram as a control signal for the disabled was presented in Chapter 4, known as Target Position Variation. Target Position Variation is based on the principle of monitoring the user's EOG to look for oscillations of known amplitude which identify when a user is looking at a particular target moving on screen. Two possible applications for Target Position Variation were described. It may be used in menu selection to detect one of a number of options by tracking the target for that option moving on screen. It may also be used as part of an eye-gaze controlled cursor program to enable automatic software re-calibration of the eye position.
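As a rough illustration of how an oscillation at a known target frequency might be looked for in a window of EOG samples, the sketch below fits an offset plus a single sinusoid and reports the fitted amplitude and the mean absolute residual. It is loosely modelled on the fit function of Appendix C but is not the thesis implementation; the function names, the `min_amplitude` parameter and the selection rule are assumptions. The window is assumed to span a whole number of periods of every candidate frequency.

```python
import math

def sinusoid_fit(samples, fs, f):
    """Fit offset + sinusoid of frequency f (Hz); return (amplitude, residual).

    Assumes len(samples) covers a whole number of periods of f at rate fs.
    """
    n = len(samples)
    w = 2.0 * math.pi * f / fs
    mean = sum(samples) / n
    a = 2.0 / n * sum(s * math.cos(w * k) for k, s in enumerate(samples))
    b = 2.0 / n * sum(s * math.sin(w * k) for k, s in enumerate(samples))
    residual = sum(
        abs(s - mean - a * math.cos(w * k) - b * math.sin(w * k))
        for k, s in enumerate(samples)
    ) / n
    return math.hypot(a, b), residual

def tracked_frequency(samples, fs, candidates, min_amplitude):
    """Return the candidate frequency that best explains the window, or None.

    A candidate is accepted only if its fitted amplitude reaches
    min_amplitude; among accepted candidates the smallest residual wins.
    """
    best = None
    for f in candidates:
        amplitude, residual = sinusoid_fit(samples, fs, f)
        if amplitude >= min_amplitude and (best is None or residual < best[1]):
            best = (f, residual)
    return None if best is None else best[0]
```

With targets oscillating at 0.2Hz, 0.4Hz, 0.8Hz and 1.6Hz and a 200Hz sampling rate, a 1000-sample window contains a whole number of periods of each target, so a call such as `tracked_frequency(window, 200, [0.2, 0.4, 0.8, 1.6], min_amplitude)` would return the frequency of the target being followed, or `None` if no oscillation of sufficient amplitude is present.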

Acoustic Utterances

A number of phoneme recognition based systems were presented in Chapter 6. The spectral and temporal features of the two phonemes /o:/ and /s/ were explored to assess how they may be distinguished by a phoneme recognition system. Systems were developed in both hardware and software based on recognition of these two phonemes to control a reading machine and an environmental control menu. The microcontroller code for the hardware application is given in Appendix H. The C++ code used for the environmental control menu is given in Appendix I.

A more flexible phoneme recognition approach was then investigated which allows arbitrary utterances to be associated with switching actions. A pitch detection algorithm was developed and used to control the vertical position of a pointer (the bee) moving over an alphabet board. This may allow a person to spell out messages using the pitch of their voice in conjunction with another non-periodic utterance. The code is given in Appendix I.
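The idea of steering a pointer with voice pitch can be sketched as follows. The pitch detection algorithm actually used in Chapter 6 is not reproduced here; the sketch substitutes a generic autocorrelation peak picker, and the names `estimate_pitch` and `pitch_to_row`, the search range and the periodicity threshold of 0.3 are all assumptions for illustration.

```python
def estimate_pitch(frame, fs, f_min=80.0, f_max=400.0, min_corr=0.3):
    """Estimate the fundamental frequency (Hz) of one frame of audio.

    Picks the lag with the largest normalised autocorrelation between
    fs/f_max and fs/f_min. Returns None for silent or weakly periodic
    frames (for example a hiss-like, non-periodic utterance).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return None
    lag_min = max(1, int(fs / f_max))
    lag_max = min(int(fs / f_min), n - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        corr = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    if best_lag is None or best_corr < min_corr:
        return None
    return fs / best_lag

def pitch_to_row(pitch, f_low, f_high, rows):
    """Map a pitch linearly onto a row index 0..rows-1, clamped at both ends."""
    frac = (pitch - f_low) / (f_high - f_low)
    frac = min(max(frac, 0.0), 1.0)
    return min(int(frac * rows), rows - 1)
```

In such a scheme the frames for which `estimate_pitch` returns `None` could be treated as the non-periodic utterance that confirms a selection, although how the two utterances are combined in practice is a design decision outside this sketch.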


Patient Testing

Many of the systems developed here were tested with patients in the NRH to assess their suitability for communication and control purposes. The Natterbox was probably the most widely used program. This was modified to run directly from a compact disc which was given to the therapists in the hospital. The therapists could then independently choose an appropriate mechanical switch which could enable each particular patient to operate this program. For patients with a more severe level of disability, some of the methods described in this thesis were used to discover the most suitable method of harnessing a body signal to provide a switching action. Two cases will be outlined here.

The first case was a male patient in his 50s who had suffered a brainstem stroke. This stroke had left him completely paralysed, so much so that he was almost completely locked-in. In fact, it took some time after he had suffered the stroke before it was realised that he had retained almost complete mental faculties. His only voluntary movement was an ability to slightly move his eyes upwards. This movement was almost imperceptible to the naked eye but it was considered that the EOG might offer a suitable means of harnessing this action. The vertical EOG was acquired using a National Instruments data acquisition card and thresholded in Simulink. By choosing a suitable threshold each time, it was possible to detect when the patient moved his eyes upward and actuate a switching action. This allowed him to use the Natterbox to spell out messages.

The second case was a male patient in his 20s who had been injured in a road traffic accident. This patient was almost completely paralysed from the neck down, although he did retain the ability to adduct and abduct his thumb and thus was able to use this action to operate a mechanical switch placed between his thumb and hand. This allowed him to use Natterbox and over time he became quite proficient at spelling out messages to his friends, family and workers in the hospital. The patient was also a great music lover and some methods of offering him some independence to choose and play different songs and albums were considered. His music albums were uploaded to Windows Media Player. The graphical user interface of this software program allows a mouse to be used to navigate through different albums and choose particular songs to play. The volume may also be controlled through this program. As this patient was incapable of using a conventional mouse, an alternative method of providing mouse cursor control was necessary. The mouse cursor may be controlled using three switches, using a program developed as part of the work here called Three-Switch Mouse, which is described in Section 2.4.4. The patient was already capable of using a mechanical switch so it was only necessary to identify two additional signals that could be used to provide the second and third switching actions. The patient had the ability to make slight neck rotations to the left and right. As briefly described in Section 3.4.2, the EMG was recorded from his neck muscles to detect when he was moving his head in either direction. Thus he was able to use the Three-Switch Mouse to control the mouse cursor.


Biological Studies

During exploration of body signals that may be harnessed by disabled people for communication and control purposes, extensive studies were undertaken on the biological functions of the human body. Two distinct results of these studies were presented. These are the control model that was developed to model saccadic and smooth pursuit eye movements and a method for indirect measurement of the firing rate of the sympathetic nervous system.

Eye Movement Model

A control model for the eye was developed which models rotation of the eye in either the horizontal or the vertical plane. In either plane, rotation of the eye is controlled by a pair of agonist-antagonist extraocular muscles. Contraction of one muscle rotates the eye in the positive direction, and contraction of the other rotates it in the negative direction. In the model that was presented, these two muscles were condensed into a single equivalent bidirectional muscle, which can rotate the eyeball in either direction. The effects of the eye's muscle spindle on the torque of the eye were also incorporated into the model in an inner feedback loop.

The model was initially explored to study saccadic eye movements, which are movements where the eye suddenly jumps from one location to another. It was found that this model could be used to simulate a saccadic movement that fits the measured EOG response very well. The model was then extended to assess its ability to correctly model smooth pursuit movements, which are the movements that occur when the eye is following a moving target, such as in the Target Position Variation method. Initial results seem to indicate that it is possible to predict smooth pursuit movements with this model.

Firing Rate Measurement Technique

An original method for measurement of the sympathetic nervous system firing rate was also described. To the best of the author's knowledge there is no existing method for observation of this variable through non-invasive techniques. The measurement technique developed uses measurement of the conductance of the skin to observe the firing rate, through use of a skin conductance model in a feedback loop under PID control. Results from this model show that the modelled and measured skin conductances follow each other almost exactly, which seems to indicate that this method could be used to provide a low-cost, non-invasive tool for firing rate measurement.
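The observer principle described above, in which a controller drives a model until the modelled conductance tracks the measured one and the controller output is read as the estimate, can be illustrated with a toy sketch. The skin conductance model of Chapter 4 is not reproduced here: the first-order lag, its time constant `tau` and gain, the controller gains, and the function names are all assumptions chosen only to make the loop structure concrete.

```python
def simulate_conductance(drive, dt, tau=2.0, gain=1.0):
    """Toy first-order model (an assumption): tau * dg/dt = -g + gain * drive."""
    g = 0.0
    out = []
    for u in drive:
        g += dt * (-g + gain * u) / tau
        out.append(g)
    return out

def estimate_drive(measured, dt, tau=2.0, gain=1.0, kp=5.0, ki=10.0, kd=0.0):
    """Estimate the input that drives the model output to follow `measured`.

    A PID controller acts on the error between the measured and modelled
    conductance. Its output feeds the model and is also returned as the
    estimate of the unobserved input (standing in for the firing rate).
    """
    g_model = measured[0]
    integral = 0.0
    prev_error = 0.0
    estimates = []
    for gm in measured:
        error = gm - g_model
        integral += error * dt
        derivative = (error - prev_error) / dt
        u_hat = kp * error + ki * integral + kd * derivative
        g_model += dt * (-g_model + gain * u_hat) / tau   # advance the model
        prev_error = error
        estimates.append(u_hat)
    return estimates
```

In this toy setting the estimate can only be as good as the assumed model: if the model matches the process that generated the measurement, the controller output settles at the true input, whereas any model mismatch appears directly as error in the estimate.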


Future Work
The Mechanomyogram

Further studies should be carried out on this signal to assess its ability to correctly detect weak muscle contractions in people with severe disabilities. Pattern recognition techniques that allow differentiation between different muscle actions should also be investigated in more detail for the MMG, ultimately to provide a means of operating multiple-switch operated systems.


Target Position Variation

Target Position Variation has been explored in principle and it is found that it is readily possible to detect when a user is looking at an on-screen target from analysis of their EOG. The next step is to integrate this method into a working EOG based system. In particular the application of TPV as an automatic re-calibration tool for an EOG based mouse cursor control system is of interest.


Visual Methods for Mouse Cursor Control

Mouse cursor control using the centre of brightness of the hand was mentioned in Chapter 5. Movement of a person's hand could be used to move the centre of brightness around and thus be translated into mouse cursor movements. It is important to identify intelligent ways of translating the centre of brightness co-ordinates into mouse cursor co-ordinates, so as to provide an intuitive means for cursor control. The path description method developed in Chapter 5 may be sensitive to gross movements by the user which can move their starting and final positions from those used in the path description definition. This problem should be addressed, perhaps by using a technique that can detect when this has occurred and instruct the user to move to the starting position and re-record the new position. From there the new path of motion could be defined, taking into account that the plane may also have rotated from its original position.
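A minimal sketch of the centre of brightness calculation and one simple way of translating it into cursor co-ordinates is given here for illustration. The intensity-weighted centroid is the usual reading of "centre of brightness", but the function names and the direct linear scaling to the screen are assumptions; a more intuitive mapping, as called for above, might instead use relative movement or smoothing.

```python
def centre_of_brightness(image):
    """Return the intensity-weighted centroid (x, y) of a greyscale image.

    `image` is a list of rows of non-negative pixel intensities.
    Returns None if the image is completely black.
    """
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            total += value
            sum_x += value * x
            sum_y += value * y
    if total == 0.0:
        return None
    return sum_x / total, sum_y / total

def to_cursor(centre, image_size, screen_size):
    """Scale image co-ordinates linearly onto screen co-ordinates (assumed mapping)."""
    (cx, cy), (w, h), (sw, sh) = centre, image_size, screen_size
    return round(cx / (w - 1) * (sw - 1)), round(cy / (h - 1) * (sh - 1))
```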



Communication System Speed

Some possible future developments of the Natterbox have already been discussed in Chapter 2. The maximum communication rates achievable using any of the techniques described in this thesis are ultimately limited by the speed of the communication system used. It is important to investigate methods of increasing the speed of a communication system to enable faster communication rates, perhaps through some type of text prediction algorithm. Modification of the Dasher program developed by Ward [25] for single switch operation could offer faster communication rates.


Multi-Modal Control Signals

All the methods of providing control signals described in this thesis are based on choosing one body signal which may be harnessed to provide a switching action or other control signal. In theory, if a person has the ability to generate two or more signals consecutively then this may provide a more accurate method of generating a control signal. For example, a system designed to actuate a switching action can monitor two or more signals from the body and be designed to respond only when it detects that the person has made both of these signals. However, many of the patients encountered are so severely disabled that it may be difficult to recognise one action that is voluntarily repeatable, never mind two. Even if two such signals can be identified, it may be difficult, if not impossible, for the patient to make both of these signals consecutively.
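The idea of responding only when both signals have been made can be sketched as a simple pairing of detections from two channels. This is an illustration under stated assumptions rather than a system from the thesis: the function name `combined_events`, the use of timestamps in seconds, and the `max_gap` tolerance between the two detections are all hypothetical.

```python
def combined_events(events_a, events_b, max_gap):
    """Pair detections from two channels and return the combined switch times.

    A switching action is reported only when a detection on channel A and a
    detection on channel B occur within `max_gap` seconds of each other.
    Each detection is used at most once, and the action is timed at the
    later detection of the pair.
    """
    a = sorted(events_a)
    b = sorted(events_b)
    i = j = 0
    actions = []
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= max_gap:
            actions.append(max(a[i], b[j]))
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1      # unmatched detection on channel A is ignored
        else:
            j += 1      # unmatched detection on channel B is ignored
    return actions
```

Requiring both detections trades speed for robustness: an involuntary event on one channel alone no longer triggers the switch, but the user must be able to produce both signals within the chosen time window.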


Other Vestigial Signals

Several vestigial signals from the body have been investigated to discover their potential for use for communication and control by disabled people. Of course, there may be other signals from the body that may be harnessed for communication and control purposes. Two suggestions of signals that may be explored in future research on this topic are tongue movements and whistling.

People with injuries below C4 level usually have control of muscles in the neck and above. This generally includes the muscles in the tongue. The tongue has a large number of muscles which enable very precise movements to be performed and this could be of use for communication and control purposes. A possible method would be to use some type of mouthguard consisting of a number of sensor pads. The user could move their tongue to press on a particular sensor pad to perform a certain switching action associated with that pad. Since tongue movements can usually be quite exact, this could potentially allow a large number of different switching actions to be performed.

Whistling is another acoustic signal that has been explored by others in the laboratory in the NRH. Whistle pitch may be used to control a continuously varying parameter, such as the mouse cursor position. For those who are unable to whistle unassisted, a whistle placed in the mouth could be used to provide a switching action.


[1] Central Statistics Office, accessed 1st August 2005.
[2] S L Glennen and D C DeCoste. Handbook of Augmentative and Alternative Communication. Singular Publishing Group, 1997.
[3] W J Perkins and B F Stenning. Control units for operation of computers by severely physically handicapped persons. Journal of Medical Engineering and Technology, 10(1):21-23, January/February 1986.
[4] J J Lazzaro. Adapting PCs for Disabilities. Addison-Wesley Publishing Company, 1996.
[5] A M Cook and S M Hussey. Assistive Technologies: Principles and Practice. Mosby, 1995.
[6] Irish Health Website, accessed 30th June 2005.
[7] J W Sharpless. Mossman's A Problem-Oriented Approach to Stroke Rehabilitation. Charles C Thomas Publisher, 2nd edition, 1982.
[8] Dorland's Illustrated Medical Dictionary. W. B. Saunders Company, 27th edition, 1988.
[9] M Johnstone. Chapter 1: Controlled Movement. In Restoration of Motor Function in the Stroke Patient. Churchill Livingstone, 2nd edition, 1983.
[10] F Walshe. Diseases of the Nervous System. E & S Livingstone, 11th edition, 1970.
[11] J Oliver and A Middleditch. Functional Anatomy of the Spine. Butterworth-Heinemann, Reed Educational and Professional Publishing Ltd, 1991.
[12] Coccyx: Wikipedia Online Encyclopedia. wiki/Coccyx, accessed 27th July 2005. Wikipedia Modification Date: 12th June 2005, 22:27.
[13] Spinal Cord: Wikipedia Online Encyclopedia. http://en.wikipedia.org/wiki/Spinal_cord, accessed 30th July 2005. Wikipedia Modification Date: 13th July 2005, 08:45.
[14] K Whalley Hammell. Spinal Cord Injury Rehabilitation. Chapman and Hall, 1995.
[15] Quadriplegia: Wikipedia Online Encyclopedia. http://en.wikipedia.org/wiki/Quadriplegia, accessed 27th July 2005. Wikipedia Modification Date: 16th July 2005, 02:33.
[16] D Grundy and A Swain. ABC of Spinal Cord Injury. BMJ Publishing Group, 3rd edition, 1996.
[17] Irish Motor Neurone Disease Association, accessed 29th July 2005.
[18] M Dunitz. Amyotrophic Lateral Sclerosis. Martin Dunitz Ltd, 2000.
[19] Directors of Health Promotion and Education. Bacterial meningitis facts, accessed 30th July 2005.
[20] A F Bergen, J Presperin and T Tallman. Positioning for function: wheelchairs and other assistive technologies. Valhalla Rehabilitation Publications, 1990.


[21] J H Wells, S W Smye and A J Wilson. A microcomputer keyboard substitute for the disabled. Journal of Medical Engineering and Technology, 10(2):58-61, March/April 1986.
[22] R C Simpson and H H Koester. Adaptive one-switch row-column scanning. IEEE Transactions on Rehabilitation Engineering, 7(4):464-473, December 1999.
[23] H S Ranu. Engineering aspects of rehabilitation for the handicapped. Journal of Medical Engineering and Technology, 10(1):16-20, January/February 1986.
[24] R Damper. Text composition by the physically disabled: a rate prediction model for scanning input. Applied Ergonomics, 15:289-296, 1984.
[25] D J Ward and D J C MacKay. Fast hands-free writing by gaze direction. Nature, 418(6900):838, 2002.
[26] David MacKay. Dasher: an efficient keyboard alternative. Interfaces, 60, Autumn 2004.
[27] Dasher website, accessed 30th July 2005.
[28] J R Wolpaw, N Birbaumer, W J Heetderks, D J McFarland, P H Peckham, G Schalk, E Donchin, L A Quatrano, C J Robinson and T M Vaughan. Brain-computer interface technology: A review of the first international meeting. IEEE Transactions on Rehabilitation Engineering, 8(2):164-173, June 2000.
[29] R F Schmidt (editor). Fundamentals of Neurophysiology. Springer-Verlag, 3rd edition, 1985.
[30] R D Keynes and D J Aidley. Nerve and Muscle. Cambridge University Press, 3rd edition, 2001.


[31] M Epstein and W Herzog. Theoretical Models of Skeletal Muscle. John Wiley and Sons, 1998.
[32] A F Huxley. Muscle structure and theories of contraction. Progress in Biophysics and Biophysical Chemistry, 7:255-318, 1957.
[33] C K Thomas, J G Broton and B Calancie. Motor unit forces and recruitment patterns after cervical spinal cord injury. Muscle and Nerve, pages 212-220, February 1997.
[34] C J De Luca. Surface Electromyography: Detection and Recording. Delsys Inc. E-book: SEMGintro.pdf, 2002.
[35] J L Echternach. Introduction to Electromyography and Nerve Conduction Testing. Slack Inc., 2nd edition, 2003.
[36] D Gordon and E Robertson. Electromyography: Recording. University of Ottawa, Canada, apa4311/emg_c.pdf, accessed 31st July 2005.
[37] Jang-Zern Tsai. Chapter 7: Nervous system. In J G Webster, editor, Bioinstrumentation. Wiley International, 2004.
[38] R N Scott and P A Parker. Myoelectric prostheses: state of the art. Journal of Medical Engineering and Technology, 12(4):143-151, July/August.
[39] B Hudgins, P Parker and R N Scott. A new strategy for multifunction myoelectric control. IEEE Transactions on Biomedical Engineering, 40(1):82-94, January 1993.
[40] G-C Chang, W-J Kang, J-J Luh, C-K Cheng, J-S Lai, J-J J Chen and T-S Kuo. Real-time implementation of electromyogram pattern recognition as a control command of man-machine interface. Medical Engineering & Physics, 18(7):529-537, October 1996.


[41] S K Rogers and M Kabrisky. An Introduction to Biological and Artificial Neural Networks for Pattern Recognition. SPIE Optical Engineering Press, 1991.
[42] M H Hayes. Statistical Digital Signal Processing and Modeling. Wiley, 1996.
[43] J Silva, W Heim and T Chau. MMG-Based classification of muscle activity for prosthesis control. In Proceedings of the 26th Annual International Conference of the IEEE EMBS, San Francisco, CA, USA, Sept 2004.
[44] G Oster. Early research on muscle sounds. In Proceedings of the 11th Annual International Conference of the Engineering in Medicine and Biology Society, volume 3, page 1039, Seattle, WA, USA, November 1989.
[45] C Orizio. Muscle sound: bases for the introduction of a mechanomyographic signal in muscle studies. Critical Reviews in Biomedical Engineering, 21(3):201-243, 1993.
[46] D T Barry. IEEE Transactions on Biomedical Engineering, 37(5):525-531, May 1990.
[47] M I A Harba and G E Chee. Muscle mechanomyographic and electromyographic signals compared with reference to action potential average propagation velocity. In Proceedings of the 19th International Conference EMBS, Chicago, IL, USA, Oct-Nov 1997.
[48] R F Schmidt, editor. Fundamentals of Sensory Physiology, pg. 129. Springer, New York, 2nd edition, 1981.
[49] C Boylan. An exploration of the electro-oculogram as a tool of communication and control for paralysed people. University College Dublin, Ireland, Final Year Project Report, 2003.
[50] P J Oster and J A Stern. Chapter 5: Measurement of Eye Movement. In Irene Martin and Peter H Venables, editors, Techniques in Psychophysiology. John Wiley and Sons, 1980.
[51] W Becker and A F Fuchs. Prediction in the oculomotor system: smooth pursuit during transient disappearance of a visual target. Experimental Brain Research, 57(3):562-575, 1985.
[52] K A Mason. Control Apparatus Sensitive to Eye Movement. US Patent #3462604, August 1969.
[53] D A Robinson. A method of measuring eye movements using a scleral search coil in a magnetic field. IEEE Transactions on Biomedical Engineering, 10:137-145, 1963.
[54] J Gips and P Olivieri. Eagle Eyes: An Eye Control System for People with Disabilities. In Proceedings of the 11th International Conference on Technology and Persons with Disabilities, March 1996.
[55] J J Teece, J Gips, C P Olivieri, L J Pok and M R Consiglio. Eye movement control of computer functions. International Journal of Psychophysiology, 29:319-325, 1998.
[56] R Barea, L Boquete, M Mazo and E López. System for assisted mobility using eye movements based on electrooculography. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 10(4):209-218, December 2002.
[57] M Mazo and the Research Group of the SIAMO Project. An integral system for assisted mobility. IEEE Robotics and Automation Magazine, pages 46-56, March 2001.
[58] Biocontrol Systems. EOG Biocontrol Technology and Applications. Accessed 14th July 2004.
[59] A S Sedra and K C Smith. Microelectronic Circuits. Oxford University Press, 3rd edition, 1982.
[60] E Burke, Y Nolan and A de Paor. An electro-oculogram based system for communication and control using target position variation. In IEEE EMBS UK and RI Postgraduate Conference on Biomedical Engineering and Medical Physics, Reading, UK, July 2005.
[61] Eyes: Wikipedia Online Encyclopedia. wiki/Eyes, accessed 14th July 2005. Wikipedia Modification Date: 14th July 2005, 11:08.
[62] M A Just and P A Carpenter. A theory of reading: From eye fixations to comprehension. Psychological Review, 87(4):329-354, 1980.
[63] R J K Jacob. Eye movement-based human-computer interaction techniques: Towards non-command interfaces. Advances in Human-Computer Interaction, 4:151-190, 1993.
[64] L Stark. Neurological Control Systems: Studies in Bioengineering. Plenum Press, New York, 1968.
[65] G Westheimer. Mechanism of saccadic eye movements. AMA Archives of Ophthalmology, 52:710-724, 1954.
[66] B Cogan and A de Paor. Optimum stability and minimum complexity as desiderata in feedback control system design. In IFAC Conference, Control Systems Design, pages 51-53, Bratislava, Slovakia, June 2000.
[67] Y Nolan, E Burke, C Boylan and A de Paor. The human eye position control system in a rehabilitation setting. In International Conference on Trends in Biomedical Engineering, University of Zilina, Slovakia, September 7-9 2005.
[68] R Edelberg. Chapter 9: Electrical activity of the skin. In Handbook of Psychophysiology. Holt, Rinehart and Winston Inc, 1972.
[69] D P Burke. Real-time Processing of Biological Signals to Provide Multimedia Biofeedback as an Aid to Relaxation Therapy. MEngSc Thesis, University College Dublin, Ireland, 1998.


[70] W S T Hays. Human pheromones: have they been demonstrated? Behavioral Ecology and Sociobiology, 54(2):89-97, 2003.
[71] L E Lajos. The Relation Between Electrodermal Activity in Sleep, Negative Effect, and Stress in Patients referred for Nocturnal Polysomnography. PhD thesis, Department of Psychology, Louisiana State University, 2002.
[72] L A Geddes and L E Baker. Principles of Applied Biomedical Instrumentation. Wiley, 3rd edition, 1989.
[73] Electrodermal Activity, accessed 31st May 2005.
[74] The Electrodermal Response. bem/bembook/27/27.htm, accessed 31st May 2005.
[75] M Betke, J Gips and P Fleming. The Camera Mouse: Visual tracking of body features to provide computer access for people with severe disabilities. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 10(1):1-10, March 2002.
[76] R B Reilly and M O'Malley. Adaptive noncontact gesture-based system for augmentative communication. IEEE Transactions on Rehabilitation Engineering, 7(2):174-182, June 1999.
[77] W J Weiner and A E Lang. Movement Disorders: A Comprehensive Survey. Futura Publishing Company, Mount Kisco, New York, 1989.
[78] D A Forsyth and J Ponce. Chapter 7: Linear Filters. In Computer Vision: A Modern Approach. Pearson Education International, Prentice Hall, 2003.
[79] J R Parker. Practical Computer Vision Using C. John Wiley & Sons, Inc., 1994.
[80] L Rabiner and B Juang. Chapter 47: Speech recognition by machine. In The Digital Signal Processing Handbook. CRC Press LLC, 1998.
[81] S Young, G Evermann, D Kershaw et al. The HTK Book. E-Book, Microsoft Corporation, 1995.
[82] P B Denes and E N Pinson. The Speech Chain: The Physics and Biology of Spoken Language. Bell Telephone Laboratories, 1963.
[83] J R Deller, J G Proakis and J H L Hansen. Discrete-Time Processing of Speech Signals. Macmillan, 1993.
[84] I Johnston. Measured tones: The interplay of physics and music. Institute of Physics Publishing, 2nd edition, 2002.
[85] H M Kaplan. Anatomy and Physiology of Speech. McGraw-Hill, 2nd edition, 1971.
[86] Summer Institute of Linguistics Website. linguistics/GlossaryOfLinguisticTerms/, accessed 12th April 2005.
[87] IPA Chart for English: Wikipedia Online Encyclopedia. http://en.\_chart\_for\_English, accessed 11th May 2005. Wikipedia Modification Date: 5th May 2005, 07:01.
[88] S Lemmetty. Chapter 3: Phonetics and Theory of Speech Production. In Review of Speech Synthesis Technology. ~slemmett/dippa/index.html, accessed 11th May 2005.
[89] A Hiberno English Archive, accessed 11th May 2005.
[90] R A Penfold. An Introduction to PIC Microcontrollers. Babani Electronics Books, 1997.
[91] Open Sound System Documentation, accessed 27th June 2005.
[92] J G G Dobbe. Algorithm alley: fast Fourier transform. In Dr. Dobb's Journal. February 1995.
[93] R A Penfold. Electronic Hobbyists Data Book. Babani Electronics Books, 1996.


Appendix A MMG Circuit

The circuit diagram for the MMG circuit in Section 3.5.2 is given in Figure A.1. The purpose of this circuit is to remove the 2.5V offset from the signal, to set the bandwidth and to amplify the signal. The bandwidth is set to 200Hz by choosing $C_x = C_y = 0.027\,\mu\mathrm{F}$, as described in the ADXL203E datasheet from Analog Devices. The gain of the circuit is

\[ \frac{2R}{R} = 2 \tag{A.1} \]

R is chosen to be 1 MΩ so as not to load the accelerometer appreciably.


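[Circuit diagram: ADXL203E accelerometer with +5V supply Vs and outputs X and Y, followed by an amplifier stage using resistors R and 2R.]

Figure A.1: MMG Circuit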

Appendix B Simulink Models

Four Simulink models are given here. The first is the model used to detect muscle contraction from the MMG. The second is the model used to simulate a saccadic jump. The third is the model used to simulate a smooth pursuit movement. The last is the model used to observe the firing rate of the sympathetic nervous system based on measurement of skin conductance.

Figure B.1: Simulink block diagram used to detect muscle contraction from the MMG

Figure B.2: Simulink model used to simulate a saccadic jump of 15° or 0.2618 rad. The output is shown in the main text in Figure 4.19

Figure B.3: Modified Simulink model used to simulate a smooth pursuit movement. The output is shown in the main text in Figure 4.23

Figure B.4: Simulink model used to observe the firing rate of the sympathetic nervous system based on measurement of the skin conductance. The estimated firing rate y is shown in Figure 4.29 and g and gm are shown in Figure 4.28.

Appendix C MATLAB Code for TPV Fit Function

This is the MATLAB code used to generate the four fit function values in Figure 4.14. The data used was sampled at 200 Hz and was 45 s long, so it contained 9000 samples overall. The name of the array with the original data is EOG.signals.values. The four frequency components that are present in this signal, and which the fit functions seek to identify, are 0.2 Hz, 0.4 Hz, 0.8 Hz and 1.6 Hz. The variable f is set to each of these values in turn and the following code is run to generate each of the fit function values.

    SE = EOG.signals.values(1:9000);    % 45 s of data sampled at 200 Hz
    SET = transpose(SE);                % row vector of samples
    t = (0:0.005:44.995);               % sample times in seconds
    p = SET.*exp(i*2*pi*f*t);           % samples times a complex exponential at f

    % window length: one period of f, in samples
    if (f==1.6) period = 125; end
    if (f==0.8) period = 250; end
    if (f==0.4) period = 500; end
    if (f==0.2) period = 1000; end

    % sinusoid coefficients and mean over the most recent period
    for n=period:9000
        c(n) = sum(p(n-period+1:n))/(0.5*period);
        avg(n) = sum(SE(n-period+1:n))/period;
    end

    % fit function: summed absolute error between the data and the fitted sinusoid
    for n=period:9000
        r(n) = sum(abs(SET(n-period+1:n) - avg(n) ...
            - real(c(n))*cos(2*pi*f*t(n-period+1:n)) ...
            - imag(c(n))*sin(2*pi*f*t(n-period+1:n))));
    end
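The behaviour of the fit function can be illustrated with a small standard-library Python sketch of the same calculation for a single window. It is not the code used in the thesis: the function name `fit_residual`, the 0-based indexing and the synthetic test signal are assumptions made for illustration. The residual should be close to zero when the window contains exactly one period of a sinusoid at the tested frequency, and clearly non-zero otherwise.

```python
import cmath
import math

FS = 200.0  # sampling rate in Hz, as stated in the text


def fit_residual(x, f, n):
    """Summed absolute error of a one-period sinusoidal fit ending at index n."""
    period = int(round(FS / f))            # window length in samples
    start = n - period + 1
    window = x[start:n + 1]
    times = [k / FS for k in range(start, n + 1)]
    # Complex coefficient: real part weights cos, imaginary part weights sin.
    c = sum(v * cmath.exp(2j * math.pi * f * t)
            for v, t in zip(window, times)) / (0.5 * period)
    avg = sum(window) / period
    return sum(
        abs(v - avg
            - c.real * math.cos(2 * math.pi * f * t)
            - c.imag * math.sin(2 * math.pi * f * t))
        for v, t in zip(window, times)
    )


# Synthetic signal (illustrative only): offset plus a 0.8 Hz component.
signal = [3.0 + 2.0 * math.cos(2 * math.pi * 0.8 * k / FS + 0.3)
          for k in range(1000)]
print(fit_residual(signal, 0.8, 999), fit_residual(signal, 1.6, 999))
```

A small residual at one candidate frequency and a large residual at the others is what allows the four components to be told apart.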


Appendix D Optimum Stability

The characteristic polynomial of the muscle spindle controller is:

$$ P(s) = (s^2 + h_1 s + h_0)(s + 120)^2 + \frac{1}{J}(f_1 s + f_0) \tag{D.1} $$

This fourth-order equation has four roots. Placing all four at s = −240 gives a good match between the overall step response of the muscle spindle loop and a real step response, where the eye suddenly jumps (a saccadic movement). This root placement gives the following values for the four free parameters:

$$ h_1 = 720, \quad h_0 = 158400, \quad f_1 = 15206.4, \quad f_0 = 2280960 \tag{D.2} $$
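The parameter values can be checked by expanding the characteristic polynomial and comparing it with $(s + 240)^4$. The sketch below does this with plain coefficient lists. The value J = 0.0022 is not stated in this appendix; it is inferred here from the ratio needed to make the coefficients match and should be read as an assumption, as should the helper name `poly_mul`.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


h1, h0, f1, f0 = 720.0, 158400.0, 15206.4, 2280960.0
J = 0.0022  # assumed: inferred from the parameter values, not given in this appendix

# (s^2 + h1 s + h0)(s + 120)^2 + (1/J)(f1 s + f0)
p = poly_mul([1.0, h1, h0], poly_mul([1.0, 120.0], [1.0, 120.0]))
p[3] += f1 / J
p[4] += f0 / J

# (s + 240)^4, i.e. all four roots at s = -240
double_root = poly_mul([1.0, 240.0], [1.0, 240.0])
target = poly_mul(double_root, double_root)

for got, want in zip(p, target):
    print(got, want)
```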

Assigning the four roots of Equation D.1 to the same value satisfies the principle of optimum stability: if all the controller parameters but one are held at their nominal values then, as that one is varied through its nominal value, the right-most root is as deep in the left half plane as possible [66]. This is now demonstrated for each of the four parameters.


[Plot: Root Locus as f0 varies. Imaginary Axis against Real Axis.]

Figure D.1: Root locus of controller with transfer function given by Equation D.1 with values h1, h0 and f1 as given in Equation D.2, and f0 varies from 0


[Plot: Root Locus as f1 varies. Imaginary Axis against Real Axis.]

Figure D.2: Root locus of controller with transfer function given by Equation D.1 with values h1, h0 and f0 as given in Equation D.2, and f1 varies from 0


[Plot: Root Locus as h0 varies. Imaginary Axis against Real Axis.]

Figure D.3: Root locus of controller with transfer function given by Equation D.1 with values h1, f1 and f0 as given in Equation D.2, and h0 varies from 0


[Plot: Root Locus as h1 varies. Imaginary Axis against Real Axis.]

Figure D.4: Root locus of controller with transfer function given by Equation D.1 with values h0, f1 and f0 as given in Equation D.2, and h1 varies from 0


Appendix E Circuit Diagram for Measuring Skin Conductance

This is the circuit diagram used to measure skin conductance. The output voltage e0 is proportional to the conductance of the skin. The two sides of the skin conductance block in the diagram correspond to the two electrodes that are used to measure skin conductance.


[Circuit diagram: 5V supply, capacitor C1, components labelled 250 and 2.4k, node voltages e1 and e2, and output e0 = 2[e1 + e2].]

Figure E.1: Circuit used to measure skin conductance

Appendix F Phoneme Detection Circuit Diagrams and Circuit Analysis

The analogue circuit and microcontroller circuit used for phoneme detection are given here.


Analogue Circuit

The analogue circuit diagram is given in Figure F.1. It consists of seven stages: pre-amplifier, filtering, amplifier, rectifier, threshold, delay/comparator and relays.


The gain of the pre-amplifier stage is:

$$ \text{Gain} = \frac{R_2}{R_1} = \frac{10 \times 10^3}{1 \times 10^3} = 10 \tag{F.1} $$




Two band-pass filters were used, one to pass the low-frequency, narrow-band signal of an /o:/ sound (which we will call Filter A) and one to pass the high-frequency, wide-band signal of an /s/ sound (Filter B).

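Filter A

The band-pass filter contains a potentiometer R4, which may be adjusted according to the user's pitch to set the centre frequency of the filter. Using the component labels of Table F.1, the transfer function of the filter is:

$$ T(s) = \frac{\left(\dfrac{1}{R_3 C_3}\right) s}{s^2 + s\,\dfrac{1}{C_3 R_5}\left(1 + \dfrac{C_3}{C_2}\right) + \dfrac{1}{R_3 R_5 C_2 C_3}\left(1 + \dfrac{R_3}{R_4}\right)} \tag{F.2} $$

For any filter with a transfer function of the form

$$ T(s) = \frac{b_0 s}{s^2 + a_1 s + a_0} $$

the maximum gain $|T(j\omega)|_{peak}$ occurs at $\omega = \sqrt{a_0}$ and is of the value $b_0 / a_1$.

In this case, choosing R3 = 2 kΩ, C2 = C3 = 0.1 µF and R5 = 100 kΩ gives a gain of:

$$ \text{Gain} = \frac{b_0}{a_1} = \frac{\dfrac{1}{R_3 C_3}}{\dfrac{1}{C_3 R_5}\left(1 + \dfrac{C_3}{C_2}\right)} = \frac{\dfrac{1}{(0.1 \times 10^{-6})(2 \times 10^3)}}{\dfrac{1}{(0.1 \times 10^{-6})(100 \times 10^3)}\left(1 + \dfrac{0.1 \times 10^{-6}}{0.1 \times 10^{-6}}\right)} = 25 $$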


The centre frequency is given by:

$$ \omega_c = \sqrt{a_0} = \sqrt{\frac{1}{R_3 R_5 C_2 C_3}\left(1 + \frac{R_3}{R_4}\right)} = \sqrt{(5 \times 10^5)\left(1 + \frac{2 \times 10^3}{R_4}\right)} \tag{F.5} $$

As Equation F.5 shows, the centre frequency can be altered by adjusting the value of R4 to suit different pitches. A 1 kΩ potentiometer gives:

fc = 1.59 kHz for R4 = 10 Ω
fc = 515.72 Hz for R4 = 100 Ω
fc = 251.64 Hz for R4 = 500 Ω
fc = 194.92 Hz for R4 = 1 kΩ

Comparing this to the spectrum of the phoneme /o:/ in Figure 6.5, these values should be sufficient to pass the fundamental and/or first overtone of the correct phoneme over a range of pitches.
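The gain and centre-frequency figures quoted for Filter A can be reproduced with a few lines of Python. This is a sketch that assumes the Table F.1 labelling (R3 = 2 kΩ input resistor, R5 = 100 kΩ, C2 = C3 = 0.1 µF, R4 the potentiometer); the function names are illustrative.

```python
import math

R3, R5 = 2e3, 100e3      # ohms, from Table F.1
C2 = C3 = 0.1e-6         # farads


def peak_gain():
    """Peak gain b0 / a1 of the band-pass transfer function."""
    b0 = 1.0 / (R3 * C3)
    a1 = (1.0 / (R5 * C3)) * (1.0 + C3 / C2)
    return b0 / a1


def centre_frequency_hz(r4_ohms):
    """Centre frequency sqrt(a0) / (2*pi) for a given potentiometer setting."""
    a0 = (1.0 / (R3 * R5 * C2 * C3)) * (1.0 + R3 / r4_ohms)
    return math.sqrt(a0) / (2.0 * math.pi)


print("peak gain:", peak_gain())
for r4 in (10.0, 100.0, 500.0, 1000.0):
    print(f"R4 = {r4:6.0f} ohm -> fc = {centre_frequency_hz(r4):8.2f} Hz")
```

Because R4 appears only in the $a_0$ term, turning the potentiometer moves the centre frequency without changing the peak gain.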

Filter B

The circuit for the band-pass filter was obtained from pg. 35 of [93]. It contains four resistors, R6, R7, R8 and R9, and two capacitors, C4 and C5. The component values are chosen to set the centre frequency according to the following formulas:

$$ f_c = \frac{0.159}{RC} \tag{F.6} $$

$$ C = C_4 = C_5 \tag{F.7} $$

$$ R = \sqrt{R_6 R_7} \tag{F.8} $$

with R8 = R9 = 10 kΩ. Choosing fc = 5 kHz and C = 22 nF gives

$$ R = \frac{0.159}{(5 \times 10^3)(22 \times 10^{-9})} = 1.445\,\text{k}\Omega $$

The values of R6 and R7 also depend on the Q value required.

For Q = 1: R6 = 2R, R7 = 0.5R
For Q = 0.5: R6 = R7 = R

That is, as Q increases, R7 decreases and R6 increases. We want a bandwidth of approximately 1.25 kHz (Q = 4). Choosing R7 = 180 Ω and R6 = 12 kΩ (the values listed in Table F.1) fits the requirements.
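The resistor choice for Filter B can be sketched numerically. The scaling R6 = 2QR and R7 = R/(2Q) used below is an assumption: it is simply the pattern that reproduces the two cases quoted in the text (Q = 1 and Q = 0.5) and keeps the geometric mean of the two resistors equal to R. It is not taken from [93]. The function name is also illustrative.

```python
import math


def filter_b_resistors(fc_hz, c_farads, q):
    """Return (R, R6, R7) in ohms for a target centre frequency and Q."""
    r = 0.159 / (fc_hz * c_farads)   # fc = 0.159 / (R C), rearranged for R
    r6 = 2.0 * q * r                 # assumed scaling, matches the Q = 1 and Q = 0.5 cases
    r7 = r / (2.0 * q)
    return r, r6, r7


r, r6, r7 = filter_b_resistors(5e3, 22e-9, 4.0)
print(f"R = {r:.1f} ohm, R6 = {r6:.1f} ohm, R7 = {r7:.1f} ohm")
print(f"Q = fc / bandwidth = {5e3 / 1.25e3}")
```

Under this assumption the ideal values for Q = 4 land close to the 12 kΩ and 180 Ω parts listed in Table F.1.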



The second amplifier stage is only needed for the /s/ circuit. This is because the /s/ phoneme is generally of a much lower intensity than the /o:/ phoneme, and also because the maximum gain of Filter B is only 12.5 (it is 25 for Filter A). The signal needs to be amplified further to provide a signal high enough to control the relay. The gain of the amplifier stage is given by:

$$ \text{Gain} = \frac{R_{12}}{R_{11}} = \frac{3.9 \times 10^3}{1 \times 10^3} = 3.9 \tag{F.9} $$



The rectifier stage of the circuit is necessary to make sure the signal stays above a threshold for a sufficient length of time (see Section F.1.5). It is essentially an envelope detector with a slow time constant (at least 10 times the maximum period of the signal). The rectifier for the /o:/ phoneme we will call Rectifier A and the rectifier for the /s/ phoneme we will call Rectifier B.

Rectifier A

The time constant of the circuit is given by:

$$ \tau_1 = R_{10} C_8 \tag{F.10} $$

Choosing R10 = 270 kΩ and C8 = 220 nF gives τ1 = 0.0594 s. The maximum period of the signal should be about 0.005 s (200 Hz).

Rectifier B

The time constant of the circuit is given by:

$$ \tau_2 = R_{13} C_9 \tag{F.11} $$

Choosing R13 = 1 MΩ and C9 = 100 nF gives τ2 = 0.1 s. The maximum period of the signal should be about 0.001 s (1000 Hz).
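The two rectifier time constants, and the rule of thumb that each should be at least 10 times the longest signal period, can be checked with a short sketch. The function name `envelope_check` is illustrative.

```python
def envelope_check(r_ohms, c_farads, max_period_s):
    """Return (time constant in seconds, True if it is >= 10 signal periods)."""
    tau = r_ohms * c_farads
    return tau, tau >= 10.0 * max_period_s


tau1, ok1 = envelope_check(270e3, 220e-9, 0.005)   # Rectifier A, /o:/ path
tau2, ok2 = envelope_check(1e6, 100e-9, 0.001)     # Rectifier B, /s/ path
print(tau1, ok1)
print(tau2, ok2)
```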



The purpose of the threshold stage of the circuit is to ensure the signal is sufficiently large to turn on the switch. This is especially important when an /s/ sound is made. Since this is a wide-band signal, it may in some cases contain a small amount of low-frequency content close to the frequency of the /o:/ phoneme. This could accidentally close Switch A as well as Switch B if the thresholding stage did not ensure that the amount of that frequency is high enough. A comparator is used for thresholding: if the signal is larger than a reference voltage the output will be high; if not, the output will be low.

Threshold A

The reference level is set using the following equation:

$$ V_{ref} = \frac{R_{14}}{R_{14} + R_{15}} V_{cc} \tag{F.12} $$

Choosing R14 = 1 kΩ, R15 = 390 Ω and Vcc = 9 V gives Vref = 6.47 V.

Threshold B

$$ V_{ref} = \frac{R_{16}}{R_{16} + R_{17}} V_{cc} \tag{F.13} $$

Choosing R16 = 4.7 kΩ, R17 = 10 kΩ and Vcc = 9 V gives Vref = 2.88 V.
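Both reference levels follow from the same potential-divider expression, which the sketch below evaluates for the two sets of resistor values. The function name and the comparator helper are illustrative.

```python
VCC = 9.0  # supply voltage in volts


def divider_vref(r_num_ohms, r_other_ohms, vcc=VCC):
    """Reference voltage r_num / (r_num + r_other) * vcc."""
    return r_num_ohms / (r_num_ohms + r_other_ohms) * vcc


def comparator_high(signal_v, vref_v):
    """Thresholding: output is high only when the signal exceeds the reference."""
    return signal_v > vref_v


vref_a = divider_vref(1e3, 390.0)    # Threshold A: R14, R15
vref_b = divider_vref(4.7e3, 10e3)   # Threshold B: R16, R17
print(round(vref_a, 2), round(vref_b, 2))
```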


Delay and Comparator

This stage is only required for the /s/ part of the circuit. The circuit is essentially an integrator whose output, while the input is held high, falls linearly from Vcc:

$$ V_{out} = V_{cc} - \frac{V_{cc}\, t}{(R_{18} + R_{19}) C_{10}} \tag{F.14} $$

The time constant of the circuit is

$$ \tau_3 = (R_{18} + R_{19}) C_{10} \tag{F.15} $$

When the input signal goes high (Vcc), the output signal is initially also at Vcc. For as long as the input stays high, the output continues to drop, and at t = τ3 the output reaches 0 V. This output is connected to the inverting terminal of a comparator whose non-inverting terminal is connected to ground. As long as the inverting input is higher than ground (0 V), the comparator output is low. When the signal reaches 0 V at t = τ3, the output of the comparator suddenly changes to Vcc, causing the switch to close. Choosing R18 = 10 kΩ, R19 = 560 kΩ and C10 = 470 nF gives τ3 = 0.2679 s.
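The delay introduced by this stage can be sketched as follows. The linear ramp from Vcc down to 0 V over τ3 is taken from the description in the text and is an idealisation; the function names are illustrative.

```python
R18, R19, C10 = 10e3, 560e3, 470e-9   # ohms, ohms, farads
VCC = 9.0                              # volts

TAU3 = (R18 + R19) * C10               # delay before the switch closes, in seconds


def ramp_output(t_s):
    """Idealised integrator output: falls linearly from VCC to 0 V over TAU3."""
    return max(0.0, VCC * (1.0 - t_s / TAU3))


def switch_closed(t_s):
    """Comparator trips once the ramp has reached 0 V."""
    return ramp_output(t_s) <= 0.0


print(TAU3, switch_closed(0.1), switch_closed(0.3))
```

An /s/ sound therefore has to be sustained for roughly a quarter of a second before the switch closes, which helps reject brief noises.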



The switching action is performed using relay coils. When sufficient voltage is dropped across the coil, the switch closes. The coil requires a voltage drop of about 5 V across it and has quite a low resistance of 83.3 Ω. It therefore requires an output capable of supplying at least 60 mA to close the switch.







[Circuit diagram: microphone input followed by the filter, amplifier, rectifier, threshold and delay stages, supplied from +9V.]

Figure F.1: Circuit Diagram for Phoneme Detection. Component Values are given in Table F.1

Table F.1: Component Values for circuit in Fig F.1

Resistor  Value        Resistor  Value     Capacitor  Value
R1        1 kΩ         R10       270 kΩ    C1         20 µF
R2        10 kΩ        R11       1 kΩ      C2         100 nF
R3        2 kΩ         R12       3.9 kΩ    C3         100 nF
R4        1 kΩ (pot)   R13       1 MΩ      C4         22 nF
R5        100 kΩ       R14       1 kΩ      C5         22 nF
R6        12 kΩ        R15       390 Ω     C6         4.7 µF
R7        180 Ω        R16       4.7 kΩ    C7         10 µF
R8        10 kΩ        R17       10 kΩ     C8         220 nF
R9        10 kΩ        R18       10 kΩ     C9         200 nF
                       R19       560 kΩ    C10        470 nF

Op-amps: 741 or 3140



Microcontroller Circuit

The microcontroller circuit diagram is given in Figure F.3. It consists of six stages: microphone, amplifier, infinite clipper, debouncing circuit, microcontroller and current amplifier/relay coils.



The microphone input stage was designed for use with an electret condenser microphone. Most computer microphones are electret microphones, so a computer microphone may be connected to the circuit using a standard 2.5mm stereo jack. Electret condenser microphones exploit the change in capacitance caused by mechanical vibration (e.g. changes in air pressure due to sound) to produce a voltage signal proportional to the sound wave. The electret microphone already has a built-in charge, but a few volts are needed to power the built-in FET buffer. A circuit diagram of an electret microphone is shown in Figure F.2. The three connections are power, signal and ground. The signal output by the electret microphone usually includes a DC bias of a few volts, so a capacitor is used to block the DC component.



The amplifier stage has two purposes: amplification, and moving the reference level so that the signal rides around 4.5 V. The reference level is set using two equal resistors R5 and a variable resistor R6, which allows the user to compensate manually for any deviation from mid-level. The reference level, Vref, is given by:

$$ V_{ref} = \frac{R_5 + R_6}{2R_5 + R_6} V_{cc} \tag{F.16} $$

With Vcc = 9 V, R5 = 150 kΩ and R6 = 0 to 10 kΩ, Vref can be adjusted between 4.5 V and 4.645 V.
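The adjustment range of the reference level can be checked numerically. The sketch assumes the divider form in which R6 is in series with the upper-side contribution to the output, i.e. Vref = (R5 + R6) / (2·R5 + R6) · Vcc, because this is the form that reproduces the 4.5 V to 4.645 V range quoted in the text. The function name is illustrative.

```python
VCC = 9.0       # volts
R5 = 150e3      # ohms, each of the two equal resistors


def reference_level(r6_ohms):
    """Assumed divider: (R5 + R6) / (2*R5 + R6) * VCC."""
    return (R5 + r6_ohms) / (2.0 * R5 + r6_ohms) * VCC


low = reference_level(0.0)      # potentiometer at 0 ohm
high = reference_level(10e3)    # potentiometer at 10 kohm
print(round(low, 3), round(high, 3))
```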

[Circuit diagram: electret microphone with connections +ve 5V, Vout and Gnd, showing R1 and C1.]

Figure F.2: Circuit Required for Electret Microphone. R1 and C1 are usually included within the microphone casing; R2 is a load resistor. Typical values: R1 = 2.2 kΩ, C1 = 10 µF and R2 = 10 kΩ.

Table F.2: Component Values for circuit in Fig F.3

Resistor  Value         Resistor  Value     Capacitor  Value     Other      Value
R1        2.2 kΩ        R10       15 kΩ     C1         10 µF     XTL        4 MHz
R2        10 kΩ         R11       2.2 kΩ    C2         22 µF     Diodes     1N4148
R3        1 MΩ (pot)    R12       15 kΩ     C3         22 nF     Op-amps    741
R4        680 kΩ                            C4         100 µF    Regulator  7806L
R5        150 kΩ
R6        10 kΩ (pot)
R7        910 Ω
R8        2.7 kΩ
R9        2 kΩ











[Circuit diagram: voltage regulator supplying +6V to two PIC microcontrollers, each with its crystal (XTL) and capacitors, together with the microphone amplifier, comparator and output stages.]

Figure F.3: Circuit Diagram for PIC-Based Phoneme Detection. Component Values are given in Table F.2

The gain of this stage can be calculated using the following equation:

$$ \text{Gain} = \frac{R_3 + R_4}{R_2} \tag{F.17} $$

R2 = 10 kΩ, to match the output impedance of the microphone. R4 = 680 kΩ and R3 = 0 to 1 MΩ, allowing the gain to be adjusted between 68 and 168.
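The gain range quoted for the amplifier stage follows directly from Equation F.17 evaluated at the two ends of the potentiometer travel, as this brief sketch shows (the function name is illustrative).

```python
R2 = 10e3     # ohms
R4 = 680e3    # ohms


def amplifier_gain(r3_ohms):
    """Gain (R3 + R4) / R2 for a given setting of the 1 Mohm potentiometer R3."""
    return (r3_ohms + R4) / R2


gain_min = amplifier_gain(0.0)
gain_max = amplifier_gain(1e6)
print(gain_min, gain_max)
```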


Infinite Clipper

This stage infinitely clips the signal using a comparator. The signal received from the output of the previous stage is compared to Vref. If the signal is higher than the reference level the comparator goes into positive saturation, and if it is lower the comparator goes into negative saturation. The comparator used is a 3140, which can be powered off 0 V and 9 V. This gives an output signal which switches between 0 V and approximately 8 V. The potential divider below is used to convert this signal to a level suitable for input to the PIC (6 V).

$$ V_{out} = \frac{R_8}{R_7 + R_8} V_{in} = \frac{2.7 \times 10^3}{910 + (2.7 \times 10^3)}(8) = 5.98\,\text{V} $$
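Infinite clipping keeps only the sign of the signal relative to the reference, so the zero-crossing times survive while the amplitude information is discarded. The sketch below models the comparator and the potential divider under idealised assumptions (a comparator that saturates at exactly 0 V or 8 V); the function names are illustrative.

```python
R7, R8 = 910.0, 2.7e3     # ohms
V_SAT_HIGH = 8.0          # approximate positive saturation level in volts


def infinite_clip(signal_v, vref_v):
    """Idealised comparator: saturate high above the reference, low otherwise."""
    return V_SAT_HIGH if signal_v > vref_v else 0.0


def to_pic_level(v_in):
    """Potential divider R8 / (R7 + R8) scaling the comparator output for the PIC."""
    return R8 / (R7 + R8) * v_in


samples = [4.2, 4.6, 5.1, 4.4, 3.9]           # example values riding around 4.5 V
clipped = [to_pic_level(infinite_clip(v, 4.5)) for v in samples]
print([round(v, 2) for v in clipped])
```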




The amplified, infinitely clipped signal is input into the microcontroller. On detection of the correct sound, the microcontroller sends its output high for as long as the correct sound is detected. The methods used to determine if the correct phoneme was uttered are given in the main body of the text.


Debouncing Circuit

The debounce circuit prevents flickering by only allowing the switch to close once the output has remained high for a set length of time. The op-amp acts as a comparator. The inverting input of the op-amp at this stage is set to a reference level of V− = 4.5 V using the two equal-valued resistors, both labelled R11, to divide the 9 V supply. When the output of the PIC, Vpic, is high (6 V), the capacitor C4 begins to charge up with time constant τ = R10C4 = 150 ms. When the capacitor is charged, the voltage V+ at the non-inverting terminal is

$$ V_+ = \frac{R_{10}}{R_{10} + R_9}(V_{pic} - V_D) = 4.68\,\text{V} $$

where VD is the voltage drop across the diode (0.7 V). Hence the output of the comparator will only turn on (6 V) after a time slightly less than τ.


Current Amplifier and Relay Coils

The maximum current available from the output of a 741 op-amp is only about 10 mA. The relay coil has a resistance RCOIL = 83.3 Ω and needs 5 V dropped across it to close the switch, so it requires an available current of about 60 mA. Hence a current amplifier was needed. A BJT-based circuit was used in common collector configuration. A suitable value for the base resistor, R12, was calculated as 12 kΩ (using BJT datasheet values β = 250 and VBE(ON) = 0.8 V).


Appendix G PIC 16F84 External Components and Pinout

The microcontroller used was a PIC16F84, powered by a 9V supply. The pin-out for this microcontroller is given in Figure G.1. The only external components necessary for this stage are a crystal and two capacitors, which set the clock rate. A 4 MHz crystal and 2 × 22 nF capacitors were used, which gives a clock speed of one command execution per µs, a quarter of the crystal speed. The two phonemes are detected independently using two separate PIC16F84 microcontrollers, each of which uses a relay at its output to close a switch upon recognition of the appropriate phoneme. The relay-switched outputs can be connected to the switch inputs of the reading machine or those of another device. The code programmed onto each of the two microcontrollers is given in Appendix H and described in Section 6.5.2. chkooo.asm is the code for detection of /o:/ and chksss.asm is the code for detection of /s/.
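The timing figures that the detection code relies on follow from the crystal frequency. The sketch below assumes the usual PIC16F84 arrangement of four oscillator periods per instruction cycle and a 1:256 prescaler on the timer, as configured in the /o:/ detection listing; the function names are illustrative.

```python
F_OSC_HZ = 4_000_000
OSC_PERIODS_PER_INSTRUCTION = 4   # one instruction cycle = four oscillator periods


def instruction_time_s(f_osc_hz=F_OSC_HZ):
    """Duration of one instruction cycle in seconds."""
    return OSC_PERIODS_PER_INSTRUCTION / f_osc_hz


def timer_tick_s(prescaler, f_osc_hz=F_OSC_HZ):
    """Duration of one timer count when clocked internally through a prescaler."""
    return prescaler * instruction_time_s(f_osc_hz)


tick = timer_tick_s(256)
print(f"instruction cycle: {instruction_time_s() * 1e6:.1f} us")
print(f"timer tick: {tick * 1e6:.0f} us, 4 ticks: {4 * tick * 1e3:.3f} ms")
print(f"8-bit timer overflow: {256 * tick * 1e3:.3f} ms")
```

With these settings four timer counts correspond to about 1 ms, which is the interval resolution used when comparing successive zero-crossing intervals.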




Figure G.1: Pin-out for PIC 16F84 (see [90]).


Appendix H Phoneme Recognition Microcontroller Code and Flowchart

Code for Detection of [o:] Phoneme

    ;chkooo.asm
    STATUS  equ 3
    PORTA   equ 5
    PORTB   equ 6
    TMR0    equ 1
    OPT     equ 1
    INTCON  equ 0BH
    #DEFINE OO_OUT PORTA,0
    #DEFINE SS_OUT PORTA,1
    #DEFINE ZERO STATUS,2
    #DEFINE PGNO STATUS,5
    CNTR1   equ 0CH
    CNTR2   equ 0DH
    INT_OLD equ 0EH
    INT_NEW equ 0FH
    OO_INTS equ 2AH
    ;start of code
            org 0
            goto init
            org 4
            goto isr
    ; Configure inputs and outputs
    init    bsf PGNO            ;Select page 1
            clrf PORTB
            clrf PORTA
            movlw 10H
            movwf PORTA         ;RA4 is input (TMR0 Clock Input)
            movlw 01H
            movwf PORTB         ;RB0 is input (external interrupt)
            bcf PGNO            ;Select page 0
    ; Initialise values
            bcf OO_OUT
            bcf SS_OUT
    ;configure timer
            bsf PGNO
            bcf OPT,5           ;use internal clock for TMR0
            bcf OPT,3           ;use prescaler with RTCC
            bsf OPT,2           ;prescaler 256
            bsf OPT,1
            bsf OPT,0
            bsf OPT,6           ;external interrupt occurs on the rising edge of signal
            bcf PGNO
            clrf INTCON
            bsf INTCON,7        ;enable interrupts
            bsf INTCON,4        ;enable external interrupt
            bsf INTCON,5        ;enable timer overflow interrupt
            clrf TMR0
            clrf INT_OLD
            clrf OO_INTS
    ; infinite loop, can be interrupted by service routine
    loop    goto loop
    ;interrupt service routine
    isr     btfsc INTCON,2      ;check if interrupt caused by timer overflow
            goto ovrflw
    ;check freq<1000Hz
            movf TMR0,0
            movwf INT_NEW
            andlw b'11111100'   ;result is zero if timer is <4 (1ms) - freq too high
            btfsc ZERO
            goto set_low
            goto compare
    ;compare two consecutive intervals
    compare movf INT_NEW,0
            subwf INT_OLD,0     ;subtracts W from INT_OLD
            btfss STATUS,0      ;If result is negative then complement
            xorlw b'11111111'   ;bitwise complement result of subtraction
            andlw b'11111100'   ;bitwise AND the result with 11111100
            btfss ZERO          ;zero if two numbers are similar (< 1ms difference)
            goto set_low
            goto chk_int
    set_low clrf OO_INTS        ;reset ooh intervals counter if intervals were different
            bcf OO_OUT
            movf INT_NEW,0
            movwf INT_OLD       ;copy INT_NEW into INT_OLD
            clrf TMR0
            bcf INTCON,1
            retfie
    chk_int incf OO_INTS,1
            movf INT_NEW,0
            movwf INT_OLD       ;copy INT_NEW into INT_OLD
    ;check if OO_INTS has reached 4
            movf OO_INTS,0
            sublw d'4'
            btfsc ZERO          ;zero if 4 consecutive intervals are similar
            goto set_oo
            clrf TMR0
            bcf INTCON,1
            retfie              ;return from interrupt routine (back to dummy loop)
    set_oo  bsf OO_OUT
            decf OO_INTS,1
            clrf TMR0
            bcf INTCON,1        ;reset external interrupt flag
            retfie
    ovrflw  bcf OO_OUT
            clrf OO_INTS
            bcf INTCON,2        ;reset timer overflow flag
            retfie
            end

Code for Detection of [s] Phoneme

    ; chksss.asm - written 16/1/2003
    STATUS  equ 3
    PORTA   equ 5
    PORTB   equ 6
    TMR0    equ 1
    OPT     equ 1
    INTCON  equ 0BH
    #DEFINE SS_OUT PORTA,1
    #DEFINE OVRFLW INTCON,2
    #DEFINE PGNO STATUS,5
    CNTR1   equ 0CH
    CNTR2   equ 0DH
    ;code starts here
            org 0
            goto init
    ; Configure inputs and outputs and timer
    init    bsf PGNO            ;Select page 1
            clrf PORTB
            clrf PORTA
            movlw 10H
            movwf PORTA         ;RA4 is input (TMR0 Clock Input)
            bsf OPT,5           ;Use external clock for TMR0
            bsf OPT,3           ;Don't use prescaler with RTCC
            bsf OPT,4
            bcf PGNO            ;Select page 0
    ; Initialise values
            bcf SS_OUT
    chk_sss clrf INTCON         ;disable interrupts and reset timer overflow flag
            clrf CNTR1
            movlw b'00001100'   ;12 decimal
            movwf CNTR2
            movlw b'11101100'   ;236 decimal
            movwf TMR0
    loop1   decfsz CNTR1,1
            goto loop1
            decfsz CNTR2,1
            goto loop1          ;10.24ms loop
            btfsc OVRFLW
            goto set_op
            bcf SS_OUT
            goto chk_sss
    set_op  bsf SS_OUT
            goto chk_sss
            end




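[Flowchart. The decisions are taken in the following order:
1. Is interrupt caused by timer overflow? If YES, clear output pin, clear interval counter and return.
2. Is interval between this interrupt and the last too small? If YES, clear output pin and clear interval counter.
3. Is interval of similar length to last interval? If NO, clear output pin and clear interval counter.
4. Has number of consecutive similar intervals reached 4? If YES, set output pin high.
After steps 2 to 4 the interval is recorded, the timer is cleared and the routine returns.]

Figure H.1: Flowchart for Interrupt Service Routine in PIC program to detect utterance of the phoneme /o:/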
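The decision logic of the interrupt service routine can be followed more easily in a high-level model. The Python sketch below mirrors the chkooo.asm listing step by step, including the 8-bit subtraction and the complement taken when the subtraction borrows. It is an illustrative model, not code that ran on the microcontroller; the class and method names are assumptions.

```python
class OohDetector:
    """Model of the /o:/ interrupt service routine (illustrative sketch)."""

    def __init__(self):
        self.int_old = 0      # INT_OLD: previous interval in timer ticks
        self.oo_ints = 0      # OO_INTS: count of consecutive similar intervals
        self.output = False   # OO_OUT pin

    def on_overflow(self):
        """Timer overflowed before the next rising edge (ovrflw branch)."""
        self.output = False
        self.oo_ints = 0
        return self.output

    def on_edge(self, tmr0):
        """Rising edge with the timer reading tmr0 ticks since the last edge."""
        int_new = tmr0 & 0xFF
        too_short = (int_new & 0xFC) == 0         # fewer than 4 ticks
        diff = (self.int_old - int_new) & 0xFF    # 8-bit subtraction
        if self.int_old < int_new:                # borrow occurred: complement
            diff ^= 0xFF
        similar = (diff & 0xFC) == 0
        self.int_old = int_new                    # both branches record the interval
        if too_short or not similar:              # set_low branch
            self.oo_ints = 0
            self.output = False
        else:                                     # chk_int branch
            self.oo_ints += 1
            if self.oo_ints == 4:                 # set_oo branch
                self.output = True
                self.oo_ints -= 1
        return self.output


detector = OohDetector()
print([detector.on_edge(t) for t in (20, 20, 21, 20, 20, 50)])
```

Decrementing the counter after the output is set means that each further similar interval immediately re-asserts the output, while a single dissimilar interval clears it.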


Appendix I Code for Programs

The code for the programs is on the included CD. The files included are as follows:



Source code: main.cpp, main.h
Executable: natter.exe


USB Switch

Source Code: main.cpp
Icon files (used to generate an iconic indicator on the system toolbar for mouse cursor control): icon1.ico, icon2.ico, resource.res



MMG Detection Program

Source Files: main.cpp, mmg3.cpp*, msgproc.cpp, setup.cpp
Header Files: main.h, setup.h, mmg3.h*, mmg3_common.h*, mmg3_export.h*, mmg3_prm.h*, mmg3_reg.h*
*Modified Real-Time Workshop code generated for Simulink model mmg3.mdl


Path Description Program

Source Files: main.cpp, creategraph.cpp, setup.cpp, samplegrabber.cpp, render.cpp
Header Files: main.h, creategraph.h, setup.h, samplegrabber.h, render.h


Graphical Menu

Source Files: main.cpp, audio_widget.cpp, draw_window.cpp, _draw_button.cpp, pct.cpp, picture.cpp
Header Files: main.h, audio_widget.h, draw_window.h, _draw_button.h, pct.h, picture.h
The makefile is also included.


Spelling Bee

Source Files: main.cpp, audio_window.cpp, pct.cpp, render.cpp, setup.cpp, sound.cpp
Header Files: main.h, audio_window.h, pct.h, render.h, setup.h, sound.h