A thesis submitted in partial fulfillment for a Bachelor of Science in Web Design and Interactive Media from the Art Institute of Seattle

March 2011

By understanding which gestures are similar between cultures and what gesture vocabulary definition methods have had the most success, humans will appropriately assimilate an understanding of the plausibility and practicality of global hand and finger gesture technology into their lives.

Hand and Finger Gesture Interfaces

An Exploration of Global Practicality and Plausibility

Ever noticed that people still gesticulate while talking on the phone, even though the person on the other end of the line cannot see their hands? Since humans first stood up on two feet, hands have been a major part of communication. Pointing at prey, pounding a chest while screaming, and touching are a few of the natural gestures that have evolved from this upward movement [Barfield, 1997]. Gestures are ingrained so deeply that we use them even when we’re on the phone talking to someone across town, although the interface does not convey them to the recipient. Gesture-understanding interfaces are becoming popular and affordable, which increases their practicality. But are the gestures chosen by HCI⁵ professionals and developers the right symbolic language for the job? Is their gesture vocabulary⁶ suited only to malleable consumers, or are these gestures practical for all cultures and for less tech-savvy people? This paper intends to evaluate how gestures have evolved across cultures in order to uncover the practicality and plausibility of a global gesture vocabulary set. If gestures fulfill these criteria, should a single individual or company decide the vocabulary to be used by all users? Or should implementation follow an open source mentality and allow users to develop their own gesture motions, thereby defining custom vocabularies? By understanding which gestures are similar between cultures and what gesture vocabulary definition methods have had the most success, humans will appropriately assimilate an understanding of the plausibility and practicality of global hand and finger gesture technology into their lives.

The urge to gesture is undeniable. It takes extra thinking and muscle control to keep gesticulation from accompanying human communication. Try giving directions down the street without any gesturing.
Nonverbal behavior has accompanied speech for so long that some actions have moved into the subconscious. For example, when someone calls out to another to see if they are OK, the first thing one may
do is display an OK gesture. They do not think about which finger or arm to display; it just shoots out from their person and is instantly understood on the receiving end. This type of gesture provides a piece of information: it signals that they do not need immediate assistance. In less than one second the gesture has communicated a few sentences, showing how quickly and effectively a gesture replaces short verbal communication. As soon as someone blasts through the finish line in first place, their arms shoot into the air in a victory gesture. They don’t think about how they are going to show their victory when they win; it’s a natural reaction to the situation. Unless they are in the National Football League, in which case a dance is required. The NFL aside, the emotional state of the individual is clearly stated simply with their arms. According to Edward Warman in 1892, “All negative gestures fall below the level of the shoulder-line; all positive gestures rise above the level of the shoulder-line. This is fully illustrated by animals, their expressive agent being the tail.” Dogs are especially known for their expressive tails. Often the tail is the first thing a human examines when confronting a dog. When a dog’s tail is tucked between its legs and its head is sunk toward the floor, humans understand and empathize with the gesture. Humans communicate with animals all the time, and it is gestures that make this verbally impossible situation possible.

This thesis does not intend to focus on subconscious nonverbal communication, unintentional gesturing, or talking with animals. It is driven by the fact that gestures are interwoven into human experience and communication, a fact that serves as the root of its argument and analysis.

Touching is as important in communication as gesturing. It can assist, accentuate, or provide pinpoint information. Touching is a naturally occurring human gesture and communication method.
It is unique in that it both sends and receives information. We process and consider the feedback of the object being touched before making our next decision. Consider the simple gesture of touching a line in a book: it communicates and draws attention to the information instantly. Touching is fast, precise, and
globally understood. Pointing, similar to touching, is another instant way to communicate vital information by directing the eyes toward the desired object. It is also a global hand gesture and, fortunately, easy to implement in an interface. Touching and pointing start to fail as global gestural input methods when they demand specifics and require additional pinpoint information beyond x and y positions. For example, using both hands to point at separate objects could be interpreted as trying to communicate length, start and end points, selection points, size, width, etc. Even so, pointing and touching are a firm foundation for the start of a gesture vocabulary.

Hands are the most used body part in human gestures. According to Maria Karam, almost 40% of all gestures are done with the hand, followed by semaphoric¹¹ gesturing at 30%, multiple hands at 20%, and the rest distributed between the body and limbs (Figure 1). Over half of all human communication gestures are supported or driven by hands. It is obvious while watching any human communicating in their native language that hands drive the majority of the gesticulation⁴. Maria Karam also notes that gesturing travels over space very well, near or far: waving across the airport at a family member, or brushing the hair of a companion. There are gesture zones which influence the type of gestures performed [Karam, 2006].

Figure 1: [Karam, 2006]

The first zone is an
intimate zone where hand gestures range from touching to 18 inches. The second zone is the personal zone, which starts at around 18 inches from our person and ends about four feet away (arm’s length). The third zone is known as the interpersonal space or social distance (four to eight feet), and the fourth zone is the public distance zone, or anything more than eight feet. The following will primarily observe and examine the first and second zones.

Ray Birdwhistell, an American anthropologist, coined the term kinesics, the study of body language, facial expressions, and gestures to interpret meaning. From his studies came an interesting set of gesture classifications. He identifies five types: emblems¹⁷, illustrators¹⁸, regulators¹⁹, adaptors²⁰, and affect displays²¹. Emblems are gestures used in place of words, like the loser ‘L’ made with the thumb and pointer finger. Illustrators are co-verbal gestures⁷, like clapping your hands together in a squishing motion while talking about an opposing team. Regulators are gestures used to control the speed and flow of the communication. Adaptors are gestures which release physical or emotional tension, like wiping your forehead with the back of your hand. Affect displays are gestures which display emotion, and emotion plus gesture usually equals dramatic or exaggerated motions.

Hand gestures could be classified using logic similar to Birdwhistell’s. This relates closely to Edward Warman, who in 1892 identified the main functions of a hand to be: define or indicate, affirm or deny, mold or detect, conceal or reveal, hold or surrender, accept or reject, inquire or acquire, support or protect, and caress or assail. The hand does so much in our first and second zones of communication. Gesticulation starts to make sense and appear similar between humans when the hand motions are classified as Warman describes.
For example, Desmond Morris’ film “The Human Animal – The Language of the Body” presents individuals from around the world conversing in public spaces. Even for an observer who knows none of the language, the gesticulation offers many clues about the conversation, especially when Warman’s and Birdwhistell’s theories are used as interpreters.
Warman says “the use of the little finger represents delicacy and refinement,” which could be the first clue of global gesture plausibility. The pinky is observed across cultures in gestures involving intricacy and pinpoint instruction. The pinky only appears as part of other gestures, though, and there is no single pinky-related gesture that is cross-cultural. Study of global gestures quickly shows that cultures have customized gestures to fit their needs. The ‘thumbs up’ hand gesture is a classic example, but so are the ‘head nod’, ‘go away’, and ‘come here’ gestures. The thumbs up, according to Desmond Morris, originated in the days of the gladiators in the Roman coliseums [Rose, 2005]. At that time, thumbs down indicated a stabbing motion, a gesture meant to indicate ending the life of the losing gladiator. Thumbs up meant to let them live, or to draw back the sword from the gladiator. This gesture pair is a classic example of cultural differentiation because thumbs up means something entirely different to people in Iran, Afghanistan, Nigeria, and parts of Italy and Greece, where it is an obscene insult [Axtell, 1991]. This being true, there is little hope for even an ‘OK’ or ‘yes’ symbol to have global concurrence.

More interesting examples are the seemingly common hand gestures for ‘go away’ and ‘come here’. It is particularly important that these motions be examined because communicating with computers is entirely about moving through information. This means pulling and pushing different views of information to the user. Swiping on the iPad is a ‘go away’ and ‘come here’ type of command, pushing the old out to let the next in. Unfortunately, the direction in which people motion with their hands to gesture ‘go away’ or ‘come here’ is inconsistent. Desmond Morris evaluated this gesture across cultures and found that some people motion their hand away from themselves to invite something in. A pushing gesture logically makes sense as a rejection gesture, but others have evidently learned through their environment to interpret it as an inviting motion. This does not change the plausibility factor of this thesis. Gesticulation is unmistakably natural for humans, certainly more natural than a keyboard and mouse, or a rectangle with buttons that all look the same.

In 1972, Warren Teitelman used experimental programming to develop a concept called DWIM (‘Do What I Mean’), which included a set of gestures to substitute for functions. He was the first individual to connect gestures to computer functions with elegance. His intent was similar to that of this thesis: defining a global vocabulary must start with a system that can efficiently read a user’s gestures and intelligently ‘do what they mean.’ Such a system could be very powerful if gesticulation were not so culturally specific.

Finding a foundation for classifying hand gestures into groups and breaking them down into smaller, more manageable classes is vital to the gesture set’s success. This will make evaluation and understanding much easier. To start, there are dynamic and static gestures [Freeman and Roth, 1995]. A thumbs up is a static gesture, and waving is a dynamic gesture². Thumbs up is a stationary position of the hand that is held and sustained until the message is received. Waving is a gesture of movement and repetition: the hand shakes back and forth trying to get attention. It is important that both of these gesture properties are supported in a global gesture vocabulary. The following are examples of single-hand gestures, each defined primarily by static or dynamic properties: crossed fingers, finger-gun, middle finger, fist pump, loser-L, money, poking, Vulcan salute, wave, chop, point, punch, etc. Two-hand gestures: air quotes, applause, x across chest, gator chomp, hand rubbing, jazz hands, victory clasp, whatever-W, chest pound, surrender. Gestures with other body parts: air kiss, bowing, choking, drinking, curtsey, one knee, hand over heart, mooning, nod, shrug, shush, throat slash, gang sign, cross, crossed arms, right arm across chest, hand over eyes (look), hands over face, rocking out.

In 1986, Jean-Luc Nespoulous introduced what is known as the Nespoulous scheme [Nespoulous, 1986].
This specified three categories of gestures: mimetic⁹, deictic¹⁰, and arbitrary. Mimetic gestures mimic the object being described (think air guitar or an invisible cell phone call). Deictic gestures are those often related to
pointing out, with their reference determined by the situation. An example is when one specifies that ‘this should go over there’ and the hand is used to direct without words. Arbitrary gestures are those that are learned, agreed upon, or customized. Though uncommon, once learned they can be used and understood instantly between those familiar with them. Maria Karam mentions semaphoric gestures in her thesis, which are gestures involving objects. It is important to recognize semaphoric gestures, because holding an object or a hand position could be a key signal for separating gesticulation from a computer command gesture. The typical example of semaphoric gesturing is what airport ground crews do with their marshaling wands when communicating with the pilot while the plane backs up.

Mimetic gestures are interesting because, as long as the referenced object is understood by both parties, understanding can be transferred with just the hands. An interface experience could potentially be tailored so that all operations animate the way they function. For example, grabbing and dragging are often indicated by a mouse pointer with a hand icon, visually supporting the action the computer is taking. Newer pointer icons even appear to grip the page as the click and drag is performed.

Edward Warman classified hand gestures into patterns and clusters. Patterns and clusters are similar to the gesture categories from Maria Karam, Nespoulous, and Freeman, but originate from a macro, rather than a micro, point of view. According to Warman, patterns are visual gestures, exaggeration gestures, and fine point gestures, which are made up of static and dynamic, mimetic, and deictic properties. Gesture clusters use sequences, combinations, symbols, and animated properties to describe unique gestures. Identifying gestures in this way is advantageous when describing to a human what the system is capable of and what is available as a gesture.
Waving is a left to right sequence which is repeated. Verbally describing this movement to a new user of this gesture vocabulary would be simple. Combinations of finger gestures alone could offer enough actions to support a large vocabulary, including: 3 taps, 2 taps (index to middle), 3 finger tap, 2 finger tap, and 3 finger tap followed by 1 finger swipe.
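
As a rough illustration of how finger combinations like these might be encoded, the Python sketch below treats each gesture as a short sequence of steps and looks the sequence up in a table. The step encoding, the function names, and the command names are hypothetical and do not come from any cited system; only the five finger combinations are taken from the list above. The ‘2 taps (index to middle)’ entry is simplified here to two consecutive single-finger taps.

```python
from typing import Dict, List, Optional, Tuple

# One raw input event: (number of fingers, motion), e.g. (1, "tap").
Event = Tuple[int, str]
# One step of a cluster: (number of fingers, motion, repetitions).
Step = Tuple[int, str, int]
Cluster = Tuple[Step, ...]

# Hypothetical command names; only the finger combinations come from the text.
VOCABULARY: Dict[Cluster, str] = {
    ((1, "tap", 3),): "command_a",                    # 3 taps
    ((1, "tap", 2),): "command_b",                    # 2 taps (simplified)
    ((3, "tap", 1),): "command_c",                    # 3 finger tap
    ((2, "tap", 1),): "command_d",                    # 2 finger tap
    ((3, "tap", 1), (1, "swipe", 1)): "command_e",    # 3 finger tap, then 1 finger swipe
}


def collapse_repeats(events: List[Event]) -> Cluster:
    """Merge consecutive identical events into one step with a repeat count."""
    steps: List[Step] = []
    for fingers, motion in events:
        if steps and steps[-1][0] == fingers and steps[-1][1] == motion:
            steps[-1] = (fingers, motion, steps[-1][2] + 1)
        else:
            steps.append((fingers, motion, 1))
    return tuple(steps)


def recognize(events: List[Event]) -> Optional[str]:
    """Return the mapped command, or None when the sequence is not in the vocabulary."""
    return VOCABULARY.get(collapse_repeats(events))


if __name__ == "__main__":
    print(recognize([(1, "tap"), (1, "tap"), (1, "tap")]))  # command_a
    print(recognize([(3, "tap"), (1, "swipe")]))            # command_e
    print(recognize([(2, "swipe")]))                        # None
```

Because repeated events collapse into a single counted step, a repeated motion such as a wave or a triple tap is stored as one entry rather than as three separate gestures.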


A gesture vocabulary developed entirely from gestures performed in patterns and clusters could be unique enough to reach arbitrary gesture status, which Nespoulous said makes the action easily recalled. Gestures which use repeated and combined motions could be easily recognized by the interface, but could also be too complex. Humans also have a natural need, when touching and gesturing, to repeat the gesture. It is repeated to ensure delivery. This is especially true of hand gestures. Humans repeatedly smash, and get frustrated with, unresponsive objects which should be providing feedback. The button to cross the street is designed for smashing, a smart design choice because it gets smashed while impatient people wait for the light to change. Buttons on a keyboard get smashed if they don’t work, and humans point at an object over and over again if it is not being recognized by the recipient of the gesture.

A global gesture library could use a unique gesture system, such as one that only uses clusters. If a gesture set were simultaneously logical and abstract, the gestures would not need to fight with natural human gestures. Apple has implemented a system of gestures that are logical yet relatively abstract. The gestures are simple to remember and simple for the interface to receive, demonstrating the power of an abstracted gesture vocabulary.

Identifying, classifying, and accumulating all of these gesture-describing terms forms the foundation for a decision on the global plausibility and practicality of gesture interfaces. The hand and fingers work together in an uncountable number of gesture combinations. This offers a more robust, natural, and efficient interface for digital communication. Anthropological studies by Desmond Morris clearly show that it is impractical to pinpoint any single gesture that remains consistent across cultures. Hands are also not meant for purely static gestures.
The keyboard and mouse have familiarized users with awkward objects for performing computer functions [Anderson, 1984]. A gesture set of themes, patterns, and clusters could bring the natural gesticulation of conversation to devices which once felt lifeless. This also switches the user’s mindset from static manipulation to dynamic manipulation. Hand gestures offer
more commands than a button interface ever could. The approach to introducing new gesture technology needs to be studied and evaluated for the most practical method to emerge. The practicality and plausibility of interface assimilation depend on it.

From the data gathered thus far, the plausibility of a gesture interface working globally is very high. There is no doubt that if a vocabulary were created, it would pave the way for a much more interactive and communication-rich environment for a user and a gesture interface. Hands are natural assistants to all human communication. Gestures are one of the few ways for humans to communicate without any words, across cultures, and to bring meaning to a situation without knowing each other’s language. Interpreting gestures is certainly a plausible interface method, supported by the natural way humans communicate or support communication. However, since gesture interfaces are still undertested and the technology is still so new, gesture interfaces are not yet practical.

Globally, the keyboard and mouse are still trying to reach new territory. How could a magical hand-reading interface do any better? Easy: imagine walking into a rural village. In a bag are two monitors and two computers: one with a keyboard and mouse, the other a multimodal¹² computer. While the two machines are being set up, villagers crowd around. Eventually, type one sentence on the keyboard, move the mouse, and click a couple of menu items. Then switch to the multimodal computer. Perform three gestures (deictic, dynamic, pattern), speak one sentence in English, motion for the villagers to try, and watch them. Watch, too, as they gesture and talk to each other, pointing at the machines and scratching their heads. Which machine will a villager naturally gravitate towards? “The advantage of having gestures read directly off your hand is that it’s more natural than groping for a mouse.
Once harnessed, you can pay more attention to the application at hand. [Popular Science 1993]” However, these applications are still expensive and unable to solve the various cultural problems, which makes assimilation difficult. People may not feel particularly drawn towards such a drastic call to change within the computer world. The
establishment of a gesture vocabulary could open doors for practicality later. It is plausible for the curious user.

Juan Wachs researched hand gesture vocabularies for his thesis on gesture-based robotic control and broke the approaches down into three types: Centrist or Authoritarian¹⁴, Consensus¹⁵, and Customized¹⁶ [Wachs, 2006]. These classifications determine this thesis’s real plausibility, because the way the world receives a gesture vocabulary is very important to its acceptance.

Apple is notorious for the Authoritarian approach: its devices require the user to conform to the input programmed by Apple. Many other items, such as TV remote controls and cars, have implemented their interfaces the same way. The iPad comes equipped with multi-touch gesture recognition, but the user must learn the gestures created by Apple. Apple is currently making an attempt to define a global gesture vocabulary using this technique [Raskin, 2000].

The Consensus approach has previously been used for smaller audiences [Munk, 2001], not for the world. Gathering an equal number of users from all cultural backgrounds to agree upon a gesture vocabulary is expensive, and from this thesis one could infer the results: none of the gestures would stand out as largely common between groups. Furthermore, after the interface was released to the end user, it would still feel Authoritarian, since that single user had no direct input on the gesture choices. However, an initial Consensus approach could inform an educated Authoritarian approach. The data from a global consensus on hand gesture interfaces would, if achieved, be unprecedented and extremely influential for future gesture interface approaches.

The Customized approach would mean that the unit is delivered as a blank gesture slate and, upon opening, needs calibration. The Customized approach requires that the user set his or her own gesture library.
This is a tricky path, because the instructions should be globally accessible and
understandable [Butow, 2007]. A blank slate is not always what people want; they like things to work out of the box with little to no customization. There would have to be some sort of base interaction for calibration, for which touch and/or pointing could serve. The setup instructions would need to be very well written and give clear examples of the capabilities of the new gesture-based interface. This approach would require less a gesture vocabulary than a system for gesture learning and customization. This system could be an application for creating custom gesture vocabularies, at which point one could assume that a library of gesture vocabularies would be available to the new user from a gesture vocabulary market. This gives cultures the ability to find a common library on their own and modify it until it is widely used. This would be more of an Android approach, meaning certain companies and individuals could create gesture vocabularies for their audiences.

Considering that the earth has such a wide variety of languages, gestures, and body language styles, it is not practical to assume that a single gesture vocabulary could suffice. Too many direct opposites exist, making shared gesture meaning across cultures unobtainable. The plausibility of a gesture interface vocabulary, on the other hand, is apparent. Humans naturally gravitate to that which they can touch and interact with on a physical level, receiving feedback from contact. Touching surfaces with hands and pointing at objects are fundamentally natural gestures for humans. These gestures are the foundation from which more advanced gestures can naturally evolve. When or if an interface can be built using a customizable method for gesturing, it could exponentially expand the possibilities of computers and humans communicating fluently together.
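
The Customized approach described above, with a base interaction of touch and pointing, shared vocabularies from a library, and per-user changes, can be sketched in a few lines of Python. This is only one possible reading of the idea under stated assumptions: the class, the gesture labels, and the action names are all hypothetical and are not drawn from Wachs or from any existing product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GestureVocabulary:
    """A named mapping from gesture labels to interface actions (all names hypothetical)."""
    name: str
    bindings: Dict[str, str] = field(default_factory=dict)

    def lookup(self, gesture: str) -> Optional[str]:
        """Return the action bound to a gesture, or None if it is undefined."""
        return self.bindings.get(gesture)

    def customized(self, name: str, overrides: Dict[str, str]) -> "GestureVocabulary":
        """Return a new vocabulary: this one plus the given additions or replacements."""
        merged = dict(self.bindings)
        merged.update(overrides)
        return GestureVocabulary(name, merged)


# Base interaction assumed to be available before any calibration: touch and point.
BASE = GestureVocabulary("base", {"touch": "select", "point": "focus"})

# A vocabulary someone might publish to a shared library or market.
SHARED = BASE.customized(
    "shared_library_a",
    {"push_away": "dismiss", "pull_toward": "invite"},
)

# A user for whom motioning away means 'come here' reverses the two bindings.
PERSONAL = SHARED.customized(
    "personal",
    {"push_away": "invite", "pull_toward": "dismiss"},
)

if __name__ == "__main__":
    for vocabulary in (BASE, SHARED, PERSONAL):
        print(vocabulary.name, vocabulary.lookup("push_away"), vocabulary.lookup("touch"))
```

Each customization returns a new vocabulary and leaves the shared one untouched, so a published library can stay stable while individuals or cultures adapt their own copies.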
“When you find you can relate to computer on an intuitive basis, you are well on your way to accepting the idea that man and computer can exist in intimate symbiosis [New York Magazine].”

1) Gesture: A form of non-verbal communication in which visible bodily actions communicate particular messages, either in place of speech or together and in parallel with spoken words.
2) Dynamic Gesture: A gesture which requires a motion pattern, like waving or stirring.
3) Static Gesture: A gesture which is posed and held, like a thumbs up.
4) Gesticulation: The act of assisting spoken words with visually supportive gestures, mainly including the hands and arms.
5) HCI: Human Computer Interaction.
6) Gesture Vocabulary: A collection of gestures that are mapped to functions or actions.
7) Co-verbal Gesture: A gesture used directly with words to convey additional meaning.
8) Pantomimes: Gestures used to imitate the actions of others.
9) Mimetic: The hand and finger motions describe an object’s main shape or representative feature.
10) Deictic: Pointing to establish the identity or spatial location of an object.
11) Semaphores: Using lights, flags, or arms as gesture tools for signaling.
12) Multimodal: An interface which recognizes more than 1 of the following: what you say, what you’re looking at, gestures, and eye tracking.
13) Electromyography: A tool for measuring electric body signals to detect medical abnormalities, activation levels, recruitment order or to analyze the biomechanics of human or animal movement.
14) Centrist or Authoritarian Approach: A single individual decides what gesture vocabulary should be used for all users.
15) Consensus Approach: A group of users, either implicitly or explicitly, decide on a common vocabulary to express a given set of commands.
16) Customized Approach: Each individual defines his/her own gesture vocabulary.
17) Emblems: Gestures used in place of words.
18) Illustrators: Gestures performed in cooperation with a word, to reinforce its meaning.
19) Regulators: Gestures used to control the speed and flow of the communication.
20) Adaptors: Gestures which release physical or emotional tension.
21) Affect displays: Gestures which display emotion; emotion plus gesture usually equals dramatic or exaggerated motions.


Books___________________________________________________________________________________________________________
     
Shneiderman, Ben, and Catherine Plaisant. Designing the user interface: strategies for effective human-computer interaction. Addison-Wesley Longman, 2009. pp. eBook.
Axtell, Roger. Gestures: the do's and taboos of body language around the world. John Wiley & Sons, 1991. pp. eBook.
Warman, Edward. Gestures and Attitudes. Boston: Lee and Shepard Publishers, 1892. 374. Print.
Anderson, Nancy S. Methods for Designing Software to Fit Human Needs and Capabilities. Washington, D.C.: National Academy Press, 1984. pp. Print.
Saffer, Dan. Designing for interaction: creating smart applications and clever devices. Peachpit Pr, 2007. 44, 148. Print.
Butow, Eric. User interface design for mere mortals. Addison-Wesley Professional, 2007. 141. Print.
Raskin, Jef. The humane interface: new directions for designing interactive systems. Addison-Wesley, 2000. 9, 24. Print.
Freeman W. and Roth M. Orientation histograms for hand gesture recognition, International Workshop on Automatic Face and Gesture Recognition. Zurich, June 1995. Print.
Nespoulous J., Perron P. and Lecours A. The biological foundation of gestures: motor and semiotic aspects. Lawrence Erlbaum Associates, Hillsdale, NJ. 1986. Print.
Munk K. Development of a gesture plug-in for natural dialogue interfaces, Gesture and Sign Languages in Human-Computer Interaction. International Gesture Workshop, GW 2001, London, UK. 2001. Print.
Barfield, T. The dictionary of anthropology. Illinois, 1997. Blackwell Publishing. Print.


Magazines______________________________________________________________________________________________________

Rose, Lacey. "Desmond Morris On Symbolic Gestures." Forbes 24 Oct 2005: Web. 5 Feb 2011. <http://www.forbes.com/2005/10/19/morris-desmond-gestures-culture-comm05cx_lr_1024morris.html>.
Antonoff, Michael. "Living In A Virtual World." Popular Science. Jun 1993: 85. Print.
O'Malley, Chris. "Computers & Software." Popular Science. Mar 1998: 31. Print.

 

Website________________________________________________________________________________________________________

"Gestures." Wikipedia. Web. 6 Feb 2011. <http://en.wikipedia.org/wiki/List_of_gestures>.
"Body Language." Wikipedia. Web. 6 Feb 2011. <http://en.wikipedia.org/wiki/Body_language>.

PDF______________________________________________________________________________________________________________

Karam, Maria. "A framework for research and design of gesture-based human-computer interactions." University of Southampton. (2006): Print. <http://eprints.ecs.soton.ac.uk/13149/1/Thesis.pdf>
Wachs, Juan. "Optimal Hand Gesture Vocabulary Design Methodology for Virtual Robotic Control." University of the Negev. (2006): Print. <http://web.ics.purdue.edu/~jpwachs/papers/PHD_JUAN_JW.pdf>

Video___________________________________________________________________________________________________________

"BBC Present: The Human Animal - The Language of the Body." Google Video. Web. 5 Feb 2011. <http://video.google.com/videoplay?docid=-3323021761394989726#>.
