SKINPUT Seminar Report

Submitted in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering of Cochin University of Science and Technology by BINU PV (12080025)

DIVISION OF COMPUTER SCIENCE, SCHOOL OF ENGINEERING, COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY, KOCHI-682022. AUGUST 2010

Certificate

Certified that this is a bona fide record of the seminar entitled “SKINPUT” presented by BINU PV of the VIth semester, Computer Science and Engineering, in the year 2010, in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology in Computer Science and Engineering of Cochin University of Science and Technology.

Mrs. LATHA R NAIR
SEMINAR GUIDE

Dr. DAVID PETER
HEAD OF DIVISION

ACKNOWLEDGEMENT

I thank GOD Almighty for guiding me throughout the seminar. I would like to thank all those who have contributed to the completion of the seminar and helped me with valuable suggestions for improvement. I am extremely grateful to Dr. David Peter, Head of Division, Division of Computer Science, for providing me with the best facilities and atmosphere for creative work, guidance, and encouragement. I would like to thank my coordinator and guide, Mrs. Latha R Nair, Sr. Lecturer, Division of Computer Science, for all the help and support extended to me. I thank all staff members of my college and friends for extending their cooperation during my seminar. Above all, I would like to thank my parents, without whose blessings I would not have been able to accomplish my goal.

ABSTRACT

Skinput is an input technology that uses bio-acoustic sensing to localize finger taps on the skin. When augmented with a pico-projector, the device can provide a direct-manipulation, graphical user interface on the body. The technology was developed by Chris Harrison, Desney Tan, and Dan Morris at Microsoft Research's Computational User Experiences Group. Skinput represents one way to decouple input from electronic devices, with the aim of allowing devices to become smaller without simultaneously shrinking the surface area on which input can be performed. While other systems, like SixthSense, have attempted this with computer vision, Skinput employs acoustics, which take advantage of the human body's natural sound-conductive properties (e.g., bone conduction). This allows the body to be annexed as an input surface without the need for the skin to be invasively instrumented with sensors, tracking markers, or other items.

TABLE OF CONTENTS

LIST OF FIGURES
Chapter 1 INTRODUCTION
Chapter 2 RELATED WORK
  2.1 Always-Available Input
  2.2 Bio-Sensing
  2.3 Acoustic Input
Chapter 3 SKINPUT
  3.1 Bio-Acoustics
  3.2 Armband Prototype
  3.3 Processing
Chapter 4 EXPERIMENT
  4.1 Participants
  4.2 Experimental Conditions
  4.3 Design and Setup
  4.4 Procedure
  4.5 Five Fingers
  4.6 Whole Arm
  4.7 Forearm
  4.8 BMI Effects
Chapter 5 SUPPLEMENTAL EXPERIMENTS
  5.1 Walking and Jogging
  5.2 Single-Handed Gestures
  5.3 Identification of Finger Tap Type
  5.4 Segmenting Finger Input
  5.5 Surface and Object Recognition
Chapter 6 CONCLUSION
Chapter 7 REFERENCES

LIST OF FIGURES

1. Armband
2. Transverse wave propagation
3. Longitudinal wave propagation
4. Response of tap
5. Input locations
6. Accuracy of whole arm
7. Accuracy of input location groups
8. Effect of BMI

Chapter 1 INTRODUCTION

Devices with significant computational power and capabilities can now be easily carried on our bodies. However, their small size typically leads to limited interaction space (e.g., diminutive screens, buttons, and jog wheels) and consequently diminishes their usability and functionality. Since we cannot simply make buttons and screens larger without losing the primary benefit of small size, we must consider alternative approaches that enhance interactions with small mobile systems. One option is to opportunistically appropriate surface area from the environment for interactive purposes. For example, [10] describes a technique that allows a small mobile device to turn tables on which it rests into a gestural finger input canvas. However, tables are not always present, and in a mobile context, users are unlikely to want to carry appropriated surfaces with them (at this point, one might as well just have a larger device). There is, however, one surface that has been previously overlooked as an input canvas, and one that happens to always travel with us: our skin. Appropriating the human body as an input device is appealing not only because we have roughly two square meters of external surface area, but also because much of it is easily accessible by our hands (e.g., arms, upper legs, torso). Furthermore, proprioception – our sense of how our body is configured in three-dimensional space – allows us to accurately interact with our bodies in an eyes-free manner.

For example, we can readily flick each of our fingers, touch the tip of our nose, and clap our hands together without visual assistance. Few external input devices can claim this accurate, eyes-free input characteristic and provide such a large interaction area. In this paper, we present our work on Skinput – a method that allows the body to be appropriated for finger input using a novel, non-invasive, wearable bio-acoustic sensor. The contributions of this paper are: 1) We describe the design of a novel, wearable sensor for bio-acoustic signal acquisition. 2) We describe an analysis approach that enables our system to resolve the location of finger taps on the body. 3) We assess the robustness and limitations of this system through a user study. 4) We briefly explore the combination of on-body sensing with on-body projection.

Chapter 2 RELATED WORK

2.1 Always-Available Input

The primary goal of Skinput is to provide an always-available mobile input system – that is, an input system that does not require a user to carry or pick up a device. A number of alternative approaches have been proposed that operate in this space. Techniques based on computer vision are popular (e.g., [3,26,27]; see [7] for a recent survey). These, however, are computationally expensive and error-prone in mobile scenarios (where, e.g., non-input optical flow is prevalent). Speech input (e.g., [13,15]) is a logical choice for always-available input, but is limited in its precision in unpredictable acoustic environments, and suffers from privacy and scalability issues in shared environments. Other approaches have taken the form of wearable computing. This typically involves a physical input device built in a form considered to be part of one's clothing. For example, glove-based input systems (see [25] for a review) allow users to retain most of their natural hand movements, but are cumbersome, uncomfortable, and disruptive to tactile sensation. Post and Orth [22] present a “smart fabric” system that embeds sensors and conductors into fabric, but taking this approach to always-available input necessitates embedding technology in all clothing, which would be prohibitively complex and expensive. The SixthSense project [19] proposes a mobile, always-available input/output capability by combining projected information with a color-marker-based vision tracking system. This approach is feasible, but suffers from serious occlusion and accuracy limitations. For example, determining whether a finger has tapped a button, or is merely hovering above it, is extraordinarily difficult.

2.2 Bio-Sensing

Skinput leverages the natural acoustic conduction properties of the human body to provide an input system, and is thus related to previous work in the use of biological signals for computer input.

g.g. Moreover. as is constrained to finger motions in one hand. increasing the degree of invasiveness and visibi lity. Finally. [16. Signals traditionally used for diagnostic medicine. and require levels of focus. brain signals have been h arnessed as a direct input for use by paralyzed patients (e. yields classification accuracies a round 90% for four gestures (e. Performance of fa lse positive rejection remains untested in both systems at present. but dire ct brain computer interfaces (BCIs) still lack the bandwidth required for everyd ay computing tasks. training.18]). These leverage the fact that sound frequencies relevant to human speech propagate well through bone.17. this approach typically requires expensive amplification systems and the ap plication of conductive gel for effective signal acquisition. and through an HMM. the wrist). The input technology most rel ated to our own is that of Amento et al. SOE 3 SKINPUT 2. brain sensing technologies such as electroencephalography (EEG) & functional n ear-infrared spectroscopy (fNIR) have been used by HCI researchers to ass ess cognitive and emotional state (e. where they can sense vibrations propagating f rom the mouth and larynx during speech. These features are generally subconsciously drive n and cannot be controlled with sufficient precision for direct input. which would limit the acceptability of this approach for most users. The Hambone system [6] employs a similar setup. bone conduction microphones and headphones – now common consumer technologies . Division of Computer Engineering.11.g.14]). Bone conduction headphones send sound th rough the bones of the skull and jaw directly to the inner ear. bypassing transm ission of sound through the air and outer ear. snap fingers). both techniques required the placement of sensors near the area of interaction (e. have been appropriated for assessing a user’s emot ional state (e. At present.g. [2].g. leaving an unobstructed path for environmental sounds..3 Acoustic Input Our approach is also inspired by systems that leverage acoustic transmission . this work was never formally ev aluated.g. [8. raise heels. [9. However. SOE 2 SKINPUT heart rate and skin resistance.represent an additional bio-sensing technology that is relevant to the present work. such as Division of Computer Engineering. howe ver. In contrast. this work also prima rily looked at involuntary signals. Bone conduction microphone s are typically worn near the ear. There has been less work r elating to the intersection of finger input and biological signals.20])..ological signals for computer input. who placed contact microphones on a user’s wrist to assess finger movement. Similarly . [23. Researchers have harnessed the electrical signals generated by muscle activation during norm al hand movement through electromyography (EMG) (e. and concentration tha t are incompatible with typical computer interaction.24]).

Our approach is also inspired by systems that leverage acoustic transmission through (non-body) input surfaces. Paradiso et al. [21] measured the arrival time of a sound at multiple sensors to locate hand taps on a glass window. Ishii et al. [12] use a similar approach to localize a ball hitting a table, for computer augmentation of a real-world game. Both of these systems use acoustic time-of-flight for localization, which we explored, but found to be insufficiently robust on the human body, leading to the fingerprinting approach described in this paper.

Chapter 3 SKINPUT

To expand the range of sensing modalities for always-available input systems, we introduce Skinput, a novel input technique that allows the skin to be used as a finger input surface. In our prototype system, we choose to focus on the arm (although the technique could be applied elsewhere). This is an attractive area to appropriate, as it provides considerable surface area for interaction, including a contiguous and flat area for projection (discussed subsequently). Furthermore, the forearm and hands contain a complex assemblage of bones that increases the acoustic distinctiveness of different locations. To capture this acoustic information, we developed a wearable armband that is non-invasive and easily removable (Figure 1). In this section, we discuss the mechanical phenomena that enable Skinput, with a specific focus on the mechanical properties of the arm. Then we will describe the Skinput sensor and the processing techniques we use to segment, analyze, and classify bio-acoustic signals.

Figure 1: Armband

3.1 Bio-Acoustics

When a finger taps the skin, several distinct forms of acoustic energy are produced. Some energy is radiated into the air as sound waves; this energy is not captured by the Skinput system. Among the acoustic energy transmitted through the arm, the most readily visible are transverse waves, created by the displacement of the skin from a finger impact (Figure 2). When shot with a high-speed camera, these appear as ripples, which propagate outward from the point of contact. The amplitude of these ripples is correlated to both the tapping force and to the volume and compliance of soft tissues under the impact area. In general, tapping on soft regions of the arm creates higher-amplitude transverse waves than tapping on boney areas (e.g., wrist, palm, fingers), which have negligible compliance.

Figure 2: Transverse wave propagation: Finger impacts displace the skin, creating transverse waves (ripples). The sensor is activated as the wave passes underneath it.

In addition to the energy that propagates on the surface of the arm, some energy is transmitted inward, toward the skeleton. These longitudinal (compressive) waves travel through the soft tissues of the arm, exciting the bone, which is much less deformable than the soft tissue but can respond to mechanical excitation by rotating and translating as a rigid body. This excitation vibrates soft tissues surrounding the entire length of the bone, resulting in new longitudinal waves that propagate outward to the skin. We highlight these two separate forms of conduction – transverse waves moving directly along the arm surface, and longitudinal waves moving into and out of the bone through soft tissues – because these mechanisms carry energy at different frequencies and over different distances. Roughly speaking, higher frequencies propagate more readily through bone than through soft tissue, and bone conduction carries energy over larger distances than soft tissue conduction. While we do not explicitly model the specific mechanisms of conduction, or depend on these mechanisms for our analysis, we do believe the success of our technique depends on the complex acoustic patterns that result from mixtures of these modalities. Additionally, we believe that joints play an important role in making tapped locations acoustically distinct. Bones are held together by ligaments, and joints often include additional biological structures such as fluid cavities. This makes joints behave as acoustic filters. In some cases, these may simply dampen acoustics; in other cases, they will selectively attenuate specific frequencies, creating location-specific acoustic signatures.

Figure 3: Longitudinal wave propagation: Finger impacts create longitudinal (compressive) waves that cause internal skeletal structures to vibrate. This, in turn, creates longitudinal waves that emanate outwards from the bone (along its entire length) toward the skin.

To capture the rich variety of acoustic information described in the previous section, we evaluated many sensing technologies, including bone conduction microphones, conventional microphones coupled with stethoscopes [10], piezo contact microphones [2], and accelerometers. However, these transducers were engineered for very different applications than measuring acoustics transmitted through the human body. As such, we found them to be lacking in several significant ways. Foremost, most mechanical sensors are engineered to provide relatively flat response curves over the range of frequencies that is relevant to our signal. This is a desirable property for most applications where a faithful representation of an input signal – uncolored by the properties of the transducer – is desired. However, because only a specific set of frequencies is conducted through the arm in response to tap input, a flat response curve leads to the capture of irrelevant frequencies and thus to a low signal-to-noise ratio. While bone conduction microphones might seem a suitable choice for Skinput, these devices are typically engineered for capturing human voice, and filter out energy below the range of human speech (whose lowest frequency is around 85Hz). Thus most sensors in this category were not especially sensitive to lower-frequency signals (e.g., 25Hz), which we found in our empirical pilot studies to be vital in characterizing finger taps. To overcome these challenges, we moved away from a single sensing element with a flat response curve, to an array of highly tuned vibration sensors. Specifically, we employ small, cantilevered piezo films (MiniSense100, Measurement Specialties, Inc.). By adding small weights to the end of the cantilever, we are able to alter the resonant frequency, allowing the sensing element to be responsive to a unique, narrow, low-frequency band of the acoustic spectrum. Adding more mass lowers the range of excitation to which a sensor responds; we weighted each element such that it aligned with particular frequencies that pilot studies showed to be useful in characterizing bio-acoustic input. For example, one of our sensors was tuned to a resonant frequency of 78Hz; its response curve shows a ~14dB drop-off 20Hz away from the resonant frequency.
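The effect of the added tip weights can be pictured with the idealized spring-mass relation for a cantilever's resonant frequency, f = (1/2π)·√(k/m): more mass means a lower resonant frequency. The sketch below is only a model under that assumption; the stiffness constant is illustrative, chosen so a 2g tip mass lands near the 78Hz element mentioned above, and is not a datasheet calculation for the MiniSense100.

```java
/**
 * Minimal sketch of the idealized spring-mass model behind cantilever
 * tuning: adding tip mass m to a cantilever of stiffness k lowers its
 * resonant frequency f = (1/2*pi) * sqrt(k / m).
 */
public class CantileverTuning {

    // Illustrative stiffness in N/m; real sensor parameters differ.
    static final double STIFFNESS = 480.0;

    /** Resonant frequency (Hz) for a given effective tip mass (kg). */
    static double resonantHz(double massKg) {
        return Math.sqrt(STIFFNESS / massKg) / (2 * Math.PI);
    }

    public static void main(String[] args) {
        // Heavier tip mass -> lower resonant frequency, as noted above.
        for (double grams : new double[] {0.5, 1.0, 2.0, 8.0}) {
            System.out.printf("tip mass %.1f g -> %.1f Hz%n",
                    grams, resonantHz(grams / 1000.0));
        }
    }
}
```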

Additionally, the cantilevered sensors were naturally insensitive to forces parallel to the skin (e.g., shearing motions caused by stretching). Thus, the skin stretch induced by many routine movements (e.g., reaching for a doorknob) tends to be attenuated. However, the sensors are highly responsive to motion perpendicular to the skin plane – perfect for capturing transverse surface waves (Figure 2) and longitudinal waves emanating from interior structures (Figure 3). Finally, our sensor design is relatively inexpensive and can be manufactured in a very small form factor (e.g., MEMS), rendering it suitable for inclusion in future mobile devices (e.g., an arm-mounted audio player).

3.2 Armband Prototype

Our final prototype, shown in Figure 1, features two arrays of five sensing elements, incorporated into an armband form factor. The decision to have two sensor packages was motivated by our focus on the arm for input. In particular, when placed on the upper arm (above the elbow), we hoped to collect acoustic information from the fleshy bicep area in addition to the firmer area on the underside of the arm, which offers better acoustic coupling to the humerus, the main bone that runs from shoulder to elbow. When the sensor was placed below the elbow, on the forearm, one package was located near the radius, the bone that runs from the lateral side of the elbow to the thumb side of the wrist, and the other near the ulna, which runs parallel to this on the medial side of the arm closest to the body. Each location thus provided slightly different acoustic coverage and information, helpful in disambiguating input location. Based on pilot data collection, we selected a different set of resonant frequencies for each sensor package (Table 1). We tuned the upper sensor package to be more sensitive to lower-frequency signals, as these were more prevalent in fleshier areas. Conversely, we tuned the lower sensor array to be sensitive to higher frequencies, in order to better capture signals transmitted through (denser) bones.

Table 1: Resonant frequencies of individual elements in the two sensor packages.
Upper array: 25 Hz, 27 Hz, 30 Hz, 38 Hz, 78 Hz
Lower array: 25 Hz, 27 Hz, 40 Hz, 44 Hz, 64 Hz

3.3 Processing

In our prototype system, we employ a Mackie Onyx 1200F audio interface to digitally capture data from the ten sensors (http://mackie.com). This was connected via Firewire to a conventional desktop computer, where a thin client written in C interfaced with the device using the Audio Stream Input/Output (ASIO) protocol. Each channel was sampled at 5.5kHz, a sampling rate that would be considered too low for speech or environmental audio, but which was able to represent the relevant spectrum of frequencies transmitted through the arm. This reduced sample rate (and consequently low processing bandwidth) makes our technique readily portable to embedded processors. For example, the ATmega168 processor employed by the Arduino platform can sample analog readings at 77kHz with no loss of precision, and could therefore provide the full sampling power required for Skinput (ten channels at 5.5kHz, or 55kHz total). Data was then sent from our thin client over a local socket to our primary application, written in Java.
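The report does not specify the wire format between the C thin client and the Java application, but the capture path can be pictured with a hypothetical sketch like the following, which assumes interleaved 32-bit floats on a local TCP socket (the port number is invented for illustration):

```java
import java.io.DataInputStream;
import java.net.Socket;

/**
 * Hypothetical sketch of the capture path: the C thin client streams
 * ten-channel frames over a local socket to the Java application.
 * Port and frame layout are illustrative assumptions.
 */
public class SensorStreamReader {
    public static void main(String[] args) throws Exception {
        final int channels = 10;
        try (Socket socket = new Socket("localhost", 9000);
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            double[] frame = new double[channels];
            while (true) {
                // One frame = one 5.5 kHz sample per sensor channel.
                for (int c = 0; c < channels; c++) frame[c] = in.readFloat();
                // ...pass the frame to visualization and segmentation...
            }
        }
    }
}
```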

This program performed three key functions. First, it provided a live visualization of the data from our ten sensors, which was useful in identifying acoustic features (Figure 4). Second, it segmented inputs from the data stream into independent instances (taps). Third, it classified these input instances. The audio stream was segmented into individual taps using an absolute exponential average of all ten channels (Figure 4, red waveform). When an intensity threshold was exceeded (Figure 4, upper blue line), the program recorded the timestamp as a potential start of a tap. If the intensity did not fall below a second, independent “closing” threshold (Figure 4, lower purple line) between 100ms and 700ms after the onset crossing (a duration we found to be common for finger impacts), the event was discarded. If start and end crossings were detected that satisfied these criteria, the acoustic data in that period (plus a 60ms buffer on either end) was considered an input event (Figure 4, vertical green regions). Although simple, this heuristic proved to be highly robust, mainly due to the extreme noise suppression provided by our sensing approach.

Figure 4: Ten channels of acoustic data generated by three finger taps on the forearm, followed by three taps on the wrist. The exponential average of the channels is shown in red. Segmented input windows are highlighted in green. Note how different sensing elements are actuated by the two locations.
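The segmentation heuristic just described maps naturally to a few lines of code. The following is a minimal, hypothetical Java sketch of that logic – an absolute exponential average over the ten channels with an onset threshold and an independent closing threshold – not the authors' actual implementation; the smoothing factor and threshold values are illustrative assumptions, as the report specifies only the 100-700ms duration window and the 60ms padding.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the tap-segmentation heuristic described above. */
public class TapSegmenter {
    static final double ALPHA = 0.02;            // smoothing factor (assumed)
    static final double OPEN_THRESHOLD = 0.10;   // onset threshold (assumed)
    static final double CLOSE_THRESHOLD = 0.04;  // closing threshold (assumed)
    static final int SAMPLE_RATE = 5500;         // 5.5 kHz per channel

    /** Returns [start, end] sample indices of detected taps, with 60 ms padding. */
    static List<int[]> segment(double[][] channels) {
        List<int[]> taps = new ArrayList<>();
        int n = channels[0].length;
        int pad = (int) (0.060 * SAMPLE_RATE);
        int minLen = (int) (0.100 * SAMPLE_RATE);
        int maxLen = (int) (0.700 * SAMPLE_RATE);
        double avg = 0;
        int onset = -1;
        for (int t = 0; t < n; t++) {
            // Absolute exponential average across all ten channels.
            double sum = 0;
            for (double[] ch : channels) sum += Math.abs(ch[t]);
            avg = ALPHA * sum + (1 - ALPHA) * avg;

            if (onset < 0 && avg > OPEN_THRESHOLD) {
                onset = t;                       // potential start of a tap
            } else if (onset >= 0 && avg < CLOSE_THRESHOLD) {
                int len = t - onset;
                if (len >= minLen && len <= maxLen) {
                    taps.add(new int[] {Math.max(0, onset - pad),
                                        Math.min(n - 1, t + pad)});
                }
                onset = -1;                      // too short or too long: discard
            }
        }
        return taps;
    }
}
```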

After an input has been segmented, the waveforms are analyzed. The highly discrete nature of taps (i.e., point impacts) meant acoustic signals were not particularly expressive over time (unlike gestures, e.g., clenching of the hand); signals simply diminished in intensity over time. Thus, features are computed over the entire input window and do not capture any temporal dynamics. We employ a brute-force machine learning approach, computing 186 features in total, many of which are derived combinatorially. For gross information, we include the average amplitude, standard deviation and total (absolute) energy of the waveforms in each channel (30 features). From these, we calculate all average amplitude ratios between channel pairs (45 features). We also include an average of these ratios (1 feature). We calculate a 256-point FFT for all ten channels, although only the lower ten values are used (representing the acoustic power from 0Hz to 193Hz), yielding 100 features. These are normalized by the highest-amplitude FFT value found on any channel. Finally, we include the center of mass of the power spectrum within the same 0Hz to 193Hz range for each channel, a rough estimation of the fundamental frequency of the signal displacing each sensor (10 features), for a total of 30 + 45 + 1 + 100 + 10 = 186 features. Subsequent feature selection established the all-pairs amplitude ratios and certain bands of the FFT to be the most predictive features. These 186 features are passed to a Support Vector Machine (SVM) classifier. A full description of SVMs is beyond the scope of this paper (see [4] for a tutorial). Our software uses the implementation provided in the Weka machine learning toolkit [28]. It should be noted, however, that other, more sophisticated classification techniques and features could be employed; the results presented in this paper should thus be considered a baseline. Before the SVM can classify input instances, it must first be trained to the user and the sensor position. This stage requires the collection of several examples for each input location of interest. When using Skinput to recognize live input, the same 186 acoustic features are computed on-the-fly for each segmented input and fed into the trained SVM for classification. As can be seen in our video, we readily achieve interactive speeds. We use an event model in our software: once an input is classified, an event associated with that location is instantiated, and any interactive features bound to that event are fired.
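Since the report names Weka [28] as the SVM implementation, the per-user training and live-classification loop can be sketched against Weka's API as below. This is an illustrative reconstruction, not the authors' code: the surrounding scaffolding (class and method names) is hypothetical, and feature extraction is assumed to happen elsewhere.

```java
import java.util.ArrayList;
import weka.classifiers.functions.SMO;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

/** Sketch of training and classifying 186-feature tap instances with Weka's SMO (SVM). */
public class TapClassifier {
    static final int NUM_FEATURES = 186;
    private final Instances schema;
    private final SMO svm = new SMO();

    public TapClassifier(ArrayList<String> locations) {
        ArrayList<Attribute> attrs = new ArrayList<>();
        for (int i = 0; i < NUM_FEATURES; i++) attrs.add(new Attribute("f" + i));
        attrs.add(new Attribute("location", locations)); // nominal class attribute
        schema = new Instances("taps", attrs, 0);
        schema.setClassIndex(NUM_FEATURES);
    }

    /** Training: several examples per input location, then build the SVM. */
    public void train(double[][] featureRows, String[] labels) throws Exception {
        for (int i = 0; i < featureRows.length; i++) {
            Instance inst = new DenseInstance(NUM_FEATURES + 1);
            inst.setDataset(schema);
            for (int j = 0; j < NUM_FEATURES; j++) inst.setValue(j, featureRows[i][j]);
            inst.setClassValue(labels[i]);
            schema.add(inst);
        }
        svm.buildClassifier(schema);
    }

    /** Live input: classify one segmented tap; the caller fires the bound event. */
    public String classify(double[] features) throws Exception {
        Instance inst = new DenseInstance(NUM_FEATURES + 1);
        inst.setDataset(schema);
        for (int j = 0; j < NUM_FEATURES; j++) inst.setValue(j, features[j]);
        int cls = (int) svm.classifyInstance(inst);
        return schema.classAttribute().value(cls); // e.g. "wrist"
    }
}
```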

Chapter 4 EXPERIMENT

4.1 Participants

To evaluate the performance of our system, we recruited 13 participants (7 female) from the Greater Seattle area. These participants represented a diverse cross-section of potential ages and body types. Ages ranged from 20 to 56 (mean 38.3), and computed body mass indexes (BMIs) ranged from 20.5 (normal) to 31.9 (obese).

4.2 Experimental Conditions

We selected three input groupings from the multitude of possible location combinations to test. We believe that these groupings, illustrated in Figure 5, are of particular interest with respect to interface design, and at the same time, push the limits of our sensing capability. From these three groupings, we derived five different experimental conditions, described below.

Fingers (Five Locations): One set of gestures we tested had participants tapping on the tips of each of their five fingers (Figure 5, “Fingers”). The fingers offer interesting affordances that make them compelling to appropriate for input. Foremost, they provide clearly discrete interaction points, which are even already well-named (e.g., ring finger). In addition to five finger tips, there are 14 knuckles (five major, nine minor), which, taken together, could offer 19 readily identifiable input locations on the fingers alone. Second, we have exceptional finger-to-finger dexterity, as demonstrated when we count by tapping on our fingers. Finally, the fingers are linearly ordered, which is potentially useful for interfaces like number entry, magnitude control (e.g., volume), and menu selection. At the same time, fingers are among the most uniform appendages on the body, with all but the thumb sharing a similar skeletal and muscular structure. This drastically reduces acoustic variation and makes differentiating among them difficult. Additionally, acoustic information must cross as many as five (finger and wrist) joints to reach the forearm, which further dampens signals. For this experimental condition, we thus decided to place the sensor arrays on the forearm, just below the elbow. Despite these difficulties, pilot experiments showed measurable acoustic differences among fingers, which we theorize is primarily related to finger length and thickness, interactions with the complex structure of the wrist bones, and variations in the acoustic transmission properties of the muscles extending from the fingers to the forearm.

Whole Arm (Five Locations): Another gesture set investigated the use of five input locations on the forearm and hand: arm, wrist, palm, thumb and middle finger (Figure 5, “Whole Arm”). We selected these locations for two important reasons. First, they are distinct and named parts of the body (e.g., “wrist”). This allowed participants to accurately tap these locations without training or markings. Additionally, these locations proved to be acoustically distinct during piloting, with the large spatial spread of input points offering further variation. We used these locations in three different conditions. One condition placed the sensor above the elbow, while another placed it below. This was incorporated into the experiment to measure the accuracy loss across this significant articulation point (the elbow). Additionally, participants repeated the lower placement condition in an eyes-free context: participants were told to close their eyes and face forward, both for training and testing. This condition was included to gauge how well users could target on-body input locations in an eyes-free context (e.g., driving).

Forearm (Ten Locations): In an effort to assess the upper bound of our approach's sensing resolution, our fifth and final experimental condition used ten locations on just the forearm (Figure 5, “Forearm”). Not only was this a very high density of input locations (unlike the whole-arm condition), but it also relied on an input surface (the forearm) with a high degree of physical uniformity (unlike, e.g., the hand). We expected that these factors would make acoustic sensing difficult.

Moreover, this location was compelling due to its large and flat surface area, as well as its immediate accessibility, both visually and for finger input. Simultaneously, we believe the forearm is ideal for projected interface elements, making it a natural projection surface for dynamic interfaces. Rather than naming the input locations, as was done in the previously described conditions, we employed small, colored stickers to mark input targets. This was both to reduce confusion (since locations on the forearm do not have common names) and to increase input consistency. The stickers also served as low-tech placeholders for projected buttons. To maximize the surface area for input, we placed the sensor above the elbow, leaving the entire forearm free.

Figure 5: The three input location sets evaluated in the study.

4.3 Design and Setup

We employed a within-subjects design, with each participant performing tasks in each of the five conditions in randomized order: five fingers with sensors below the elbow; five points on the whole arm with the sensors above the elbow; the same points with sensors below the elbow, both sighted and eyes-free; and ten marked points on the forearm with the sensors above the elbow. Participants were seated in a conventional office chair, in front of a desktop computer that presented stimuli. For conditions with sensors below the elbow, we placed the armband ~3cm away from the elbow, with one sensor package near the radius and the other near the ulna. For conditions with the sensors above the elbow, we placed the armband ~7cm above the elbow, such that one sensor package rested on the biceps. Right-handed participants had the armband placed on the left arm, which allowed them to use their dominant hand for finger input. For the one left-handed participant, we flipped the setup, which had no apparent effect on the operation of the system. Tightness of the armband was adjusted to be firm, but comfortable. While performing tasks, participants could place their elbow on the desk, tucked against their body, or on the chair's adjustable armrest; most chose the latter.

4.4 Procedure

For each condition, the experimenter walked through the input locations to be tested and demonstrated finger taps on each. Participants practiced duplicating these motions for approximately one minute with each gesture set. This allowed participants to familiarize themselves with our naming conventions (e.g., “pinky”, “wrist”), and to practice tapping their arm and hands with a finger of the opposite hand. It also allowed us to convey the appropriate tap force to participants, who often initially tapped unnecessarily hard. To train the system, participants were instructed to comfortably tap each location ten times, with a finger of their choosing. This constituted one training round. In total, three rounds of training data were collected per input location set (30 examples per location, 150 data points total). An exception to this procedure was in the case of the ten forearm locations, where only two rounds were collected to save time (20 examples per location, 200 data points total). Total training time for each experimental condition was approximately three minutes. We used the training data to build an SVM classifier. During the subsequent testing phase, we presented participants with simple text stimuli (e.g., “tap your wrist”), which instructed them where to tap. The order of stimuli was randomized, with each location appearing ten times in total. The system performed real-time segmentation and classification, and provided immediate feedback to the participant (e.g., “you tapped your wrist”). We provided feedback so that participants could see where the system was making errors (as they would if using a real application). If an input was not segmented (i.e., the tap was too quiet), participants could see this and would simply tap again. Overall, segmentation error rates were negligible in all conditions, and are not included in further analysis. In the following sections, we report on the classification accuracies for the test phases in the five different conditions. Overall, classification rates were high, with an average accuracy across conditions of 87.6%. Additionally, we present preliminary results exploring the correlation between classification accuracy and factors such as BMI, age, and sex.

4.5 Five Fingers

Despite multiple joint crossings and ~40cm of separation between the input targets and sensors, classification accuracy remained high for the five-finger condition, averaging 87.7% (SD=10.0%, chance=20%) across participants. Segmentation, as in other conditions, was essentially perfect. Inspection of the confusion matrices showed no systematic errors in the classification, with errors tending to be evenly distributed over the other digits. When classification was incorrect, the system believed the input to be an adjacent finger 60.5% of the time, only marginally above prior probability (40%).

This suggests there are only limited acoustic continuities between the fingers. The only potential exception to this was in the case of the pinky, where the ring finger constituted 63.3% of the misclassifications.

4.6 Whole Arm

Participants performed three conditions with the whole-arm location configuration. The below-elbow placement performed the best, posting a 95.5% (SD=5.1%, chance=20%) average accuracy. This is not surprising, as this condition placed the sensors closer to the input targets than the other conditions. Moving the sensor above the elbow reduced accuracy to 88.3% (SD=7.8%, chance=20%), a drop of 7.2%. This is almost certainly related to the acoustic loss at the elbow joint and the additional 10cm of distance between the sensor and input targets. Figure 6 shows these results. The eyes-free input condition yielded lower accuracies than the other conditions, averaging 85.0% (SD=9.4%, chance=20%). This represents a 10.5% drop from its vision-assisted, but otherwise identical, counterpart condition. It was apparent from watching participants complete this condition that targeting precision was reduced. In sighted conditions, participants appeared to be able to tap locations with perhaps a 2cm radius of error. Although not formally captured, this margin of error appeared to double or triple when the eyes were closed. We believe that additional training data, which better covers the increased input variability, would remove much of this deficit. We would also caution designers developing eyes-free, on-body interfaces to carefully consider the locations participants can tap accurately.

Figure 6: Accuracy of the three whole-arm-centric conditions. Error bars represent standard deviation.

4.7 Forearm

Classification accuracy for the ten-location forearm condition stood at 81.5% (SD=10.5%, chance=10%), a surprisingly strong result for an input set we devised to push our system's sensing limit (K=0.72, considered very strong). Following the experiment, we considered different ways to improve accuracy by collapsing the ten locations into larger input groupings; higher accuracies can be achieved this way. The goal of this exercise was to explore the tradeoff between classification accuracy and the number of input locations on the forearm, which represents a particularly valuable input surface for application designers. We grouped targets into sets based on what we believed to be logical spatial groupings (Figure 7, A-E and G). In addition to exploring classification accuracies for layouts that we considered to be intuitive, we also performed an exhaustive search over all possible groupings. For most location counts, this search confirmed that our intuitive groupings were optimal; however, it also revealed one plausible, although irregular, layout with high accuracy at six input locations (Figure 7, F), created following analysis of per-location accuracy data. Unlike in the five-fingers condition, there appeared to be shared acoustic traits that led to a higher likelihood of confusion with adjacent targets than distant ones. This effect was more prominent laterally than longitudinally; Figure 7 illustrates this, with lateral groupings consistently outperforming similarly arranged longitudinal groupings (B and C vs. D and E). This is unsurprising given the morphology of the arm, with a high degree of bilateral symmetry along the long axis.
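The grouping analysis described above can be reproduced mechanically from per-location test results. The sketch below shows only the scoring step, under an assumed representation: a confusion matrix of true versus predicted locations is collapsed under a candidate grouping, counting a prediction as correct when both fall in the same group. The exhaustive search would enumerate all set partitions of the ten locations and keep the best-scoring grouping per location count; matrix values and names here are illustrative, not experimental data.

```java
/** Sketch of scoring a candidate grouping by collapsing a confusion matrix. */
public class GroupingSearch {

    /**
     * Accuracy under a grouping: a prediction counts as correct if the
     * true and predicted locations belong to the same group.
     */
    static double collapsedAccuracy(int[][] confusion, int[] groupOf) {
        long correct = 0, total = 0;
        for (int truth = 0; truth < confusion.length; truth++) {
            for (int pred = 0; pred < confusion.length; pred++) {
                total += confusion[truth][pred];
                if (groupOf[truth] == groupOf[pred]) {
                    correct += confusion[truth][pred];
                }
            }
        }
        return (double) correct / total;
    }

    public static void main(String[] args) {
        // Toy confusion matrix for 4 locations (diagonal = correct taps).
        int[][] confusion = {
            {8, 1, 1, 0},
            {2, 7, 0, 1},
            {1, 0, 9, 0},
            {0, 1, 1, 8},
        };
        // Candidate grouping: locations {0,1} form group 0, {2,3} group 1.
        System.out.printf("4 locations: %.1f%%%n",
                100 * collapsedAccuracy(confusion, new int[] {0, 1, 2, 3}));
        System.out.printf("2 groups:    %.1f%%%n",
                100 * collapsedAccuracy(confusion, new int[] {0, 0, 1, 1}));
    }
}
```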

4.8 BMI Effects

Early on, we suspected that our acoustic approach was susceptible to variations in body composition. This included, most notably, the prevalence of fatty tissues and the density/mass of bones, which tend, respectively, to dampen or facilitate the transmission of acoustic energy in the body. To assess how these variations affected our sensing accuracy, we calculated each participant's body mass index (BMI) from self-reported weight and height. Data and observations from the experiment suggest that high BMI is correlated with decreased accuracies. The participants with the three highest BMIs (29.2, 29.6, and 31.9 – representing borderline obese to obese) produced the three lowest average accuracies. Figure 8 illustrates this significant disparity: here participants are separated into two groups, those with BMI greater and less than the US national median, age and sex adjusted [5] (F1,12=8.65, p=.013). Other factors such as age and sex, which may be correlated with BMI in specific populations, might also exhibit a correlation with classification accuracy. For example, in our participant pool, males yielded higher classification accuracies than females, but we expect that this is an artifact of BMI correlation in our sample, and probably not an effect of sex directly.
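For reference, the BMI figure behind this split is the standard weight-over-height-squared formula. A minimal sketch follows, with illustrative values rather than participant data:

```java
/** Minimal sketch of the BMI computation: BMI = kg / m^2. */
public class Bmi {
    static double bmi(double weightKg, double heightM) {
        return weightKg / (heightM * heightM);
    }

    public static void main(String[] args) {
        // e.g., 95 kg at 1.73 m -> BMI ~31.7, obese (>= 30) under the
        // NHLBI guidelines cited as [5]. Values are illustrative only.
        System.out.printf("BMI = %.1f%n", bmi(95.0, 1.73));
    }
}
```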

Chapter 5 SUPPLEMENTAL EXPERIMENTS

We conducted a series of smaller, targeted experiments to explore the feasibility of our approach for other applications. In the first additional experiment, which tested performance of the system while users walked and jogged, we recruited one male (age 23) and one female (age 26) for a single-purpose experiment. For the rest of the experiments, we recruited seven new participants (3 female, mean age 26.9) from within our institution. In all cases, the sensor armband was placed just below the elbow. Similar to the previous experiment, each additional experiment consisted of a training phase, where participants provided between 10 and 20 examples for each input type, and a testing phase, in which participants were prompted to provide a particular input (ten times per input type). In all cases, input order was randomized, and segmentation and classification were performed in real-time.

5.1 Walking and Jogging

As discussed previously, acoustically driven input techniques are often sensitive to environmental noise. In regard to bio-acoustic sensing, with sensors coupled to the body, noise created during other motions is particularly troublesome, and walking and jogging represent perhaps the most common types of whole-body motion. This experiment explored the accuracy of our system in these scenarios. Each participant trained and tested the system while walking and jogging on a treadmill. Three input locations were used to evaluate accuracy: arm, wrist, and palm. Additionally, the rate of false positives (i.e., the system believed there was input when in fact there was not) and true positives (i.e., the system was able to correctly segment an intended input) was captured. The male walked at 2.3 mph and jogged at 4.3 mph; the female at 1.9 and 3.1 mph, respectively. The testing phase took roughly three minutes to complete (four trials total: two participants, two conditions). In both walking trials, the system never produced a false-positive input. Meanwhile, true-positive accuracy was 100%. Classification accuracy for the inputs (e.g., a wrist tap was recognized as a wrist tap) was 100% for the male and 86.7% for the female (chance=33%). In the jogging trials, the system had four false-positive input events (two per participant) over six minutes of continuous jogging. True-positive accuracy, as with walking, was 100%. Considering that jogging is perhaps the hardest input filtering and segmentation test, we view this result as extremely positive. Classification accuracy, however, decreased to 83.3% and 60.0% for the male and female participants respectively (chance=33%). Although the noise generated from the jogging almost certainly degraded the signal (and in turn, lowered classification accuracy), we believe the chief cause for this decrease was the quality of the training data. Participants only provided ten examples for each of the three tested input locations. Furthermore, the training examples were collected while participants were jogging. Thus, the resulting training data was not only highly variable, but also sparse – neither of which is conducive to accurate machine learning classification. We believe that more rigorous collection of training data could yield even stronger results.

5.2 Single-Handed Gestures

In the experiments discussed thus far, we considered only bimanual gestures, where the sensor-free arm, and in particular the fingers, are used to provide input. However, there are a range of gestures that can be performed with just the fingers of one hand.

This was the focus of [2], although this work did not evaluate classification accuracy. Using a similar setup to the main experiment, we asked participants to tap their fingers against their thumb, ten times each. Our system was able to identify the four input types with an overall accuracy of 89.6% (SD=5.1%, chance=25%). We ran an identical experiment using flicks instead of taps (i.e., using the thumb as a catch, then rapidly flicking the fingers forward). This yielded an impressive 96.8% (SD=3.1%, chance=25%) accuracy. This motivated us to run a third and independent experiment that combined taps and flicks into a single gesture set. Participants re-trained the system and completed an independent testing round. Even with eight input classes in very close spatial proximity, the system was able to achieve a remarkable 87.3% (SD=4.9%, chance=12.5%) accuracy in the testing phase. This result is comparable to the aforementioned ten-location forearm experiment (which achieved 81.5% accuracy), lending credence to the possibility of having ten or more functions on the hand alone. Furthermore, proprioception of our fingers on a single hand is quite accurate, suggesting a mechanism for high-accuracy, eyes-free input.

5.3 Identification of Finger Tap Type

Users can “tap” surfaces with their fingers in several distinct ways. For example, one can use the tip of their finger (potentially even their fingernail) or the pad (flat, bottom) of their finger. The former tends to be quite boney, while the latter is more fleshy.

It is also possible to use the knuckles (both major and minor metacarpophalangeal joints). To evaluate our approach's ability to distinguish these input types, we had participants tap on a table situated in front of them in three ways (ten times each): finger tip, finger pad, and major knuckle. A classifier trained on this data yielded an average accuracy of 89.5% (SD=4.7%, chance=33%) during the testing period. This ability has several potential uses. Perhaps the most notable is the ability for interactive touch surfaces to distinguish different types of finger contacts (which are indistinguishable in, e.g., capacitive and vision-based systems). One example interaction could be that “double knocking” on an item opens it, while a “pad-tap” activates an options menu.

5.4 Segmenting Finger Input

A pragmatic concern regarding the appropriation of fingertips for input was that other routine tasks would generate false positives. For example, typing on a keyboard strikes the finger tips in a very similar manner to the finger-tip input we proposed previously. Thus, we set out to explore whether finger-to-finger input sounded sufficiently distinct such that other actions could be disregarded. As an initial assessment, we asked participants to tap their index finger 20 times with a finger on their other hand, and 20 times on the surface of a table in front of them. This data was used to train our classifier. The training phase was followed by a testing phase, which yielded a participant-wide average accuracy of 94.3% (SD=4.5%, chance=50%).

5.5 Surface and Object Recognition

During piloting, it became apparent that our system had some ability to identify the type of material on which the user was operating. This capability was never considered when designing the system, so superior acoustic features may exist. Using a similar setup to the main experiment, we asked participants to tap their index finger against 1) a finger on their other hand, 2) a paper pad approximately 80 pages thick, and 3) an LCD screen. Results show that we can identify the contacted object with about 87.1% (SD=8.3%, chance=33%) accuracy. This ability has several potential uses, including workstations or devices composed of different interactive surfaces, and recognition of different objects grasped in the environment.
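The “double knocking” and “pad-tap” interactions mentioned in Section 5.3 presuppose the event model described in Section 3.3: a classified input fires whatever interface action is bound to it. A hypothetical sketch of such a binding table follows; all labels and actions are illustrative, not taken from the report.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of binding classified input labels to interface actions. */
public class TapEventDispatcher {
    private final Map<String, Runnable> bindings = new HashMap<>();

    /** Bind an interactive feature to a classified input label. */
    public void bind(String label, Runnable action) {
        bindings.put(label, action);
    }

    /** Called once the SVM has classified a segmented input. */
    public void onClassified(String label) {
        Runnable action = bindings.get(label);
        if (action != null) action.run(); // fire any bound feature
    }

    public static void main(String[] args) {
        TapEventDispatcher dispatcher = new TapEventDispatcher();
        dispatcher.bind("knuckle-double-knock", () -> System.out.println("open item"));
        dispatcher.bind("pad-tap", () -> System.out.println("show options menu"));
        dispatcher.onClassified("pad-tap"); // prints "show options menu"
    }
}
```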

Chapter 6 CONCLUSION

In this paper, we have presented our approach to appropriating the human body as an input surface. We have described a novel, wearable bio-acoustic sensing array that we built into an armband in order to detect and localize finger taps on the forearm and hand. Results from our experiments have shown that our system performs very well for a series of gestures, even when the body is in motion. Additionally, we have presented initial results demonstrating other potential uses of our approach, which we hope to further explore in future work. These include single-handed gestures, taps with different parts of the finger, and differentiating between materials and objects. We conclude with descriptions of several prototype applications that demonstrate the rich design space we believe Skinput enables.

Chapter 7 REFERENCES

1. Ahmad, F., and Musilek, P. A Keystroke and Pointer Control Input Interface for Wearable Computers. In Proc. IEEE PERCOM '06, 2-11.
2. Amento, B., Hill, W., and Terveen, L. The Sound of One Hand: A Wrist-mounted Bio-acoustic Fingertip Gesture Interface. In CHI '02 Extended Abstracts, 724-725.
3. Argyros, A.A., and Lourakis, M.I.A. Vision-based Interpretation of Hand Gestures for Remote Control of a Computer Mouse. In Proc. ECCV 2006 Workshop on Computer Vision in HCI, LNCS 3979, 40-51.
4. Burges, C.J. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2.2, June 1998, 121-167.
5. Clinical Guidelines on the Identification, Evaluation, and Treatment of Overweight and Obesity in Adults. National Heart, Lung and Blood Institute, June 1998.
6. Deyle, T., Palinko, S., Poole, E.S., and Starner, T. Hambone: A Bio-Acoustic Gesture Interface. In Proc. ISWC '07, 1-8.
7. Erol, A., Bebis, G., Nicolescu, M., Boyle, R.D., and Twombly, X. Vision-based hand pose estimation: A review. Computer Vision and Image Understanding, 108, Oct. 2007.
8. Fabiani, G.E., McFarland, D.J., Wolpaw, J.R., and Pfurtscheller, G. Conversion of EEG activity into cursor movement by a brain-computer interface (BCI). IEEE Trans. on Neural Systems and Rehabilitation Engineering, 12.3, Sept. 2004, 331-8.
