Submitted in partial fulfillment of the requirements for the Master of Music in Music Technology in the Department of Music and Performing Arts Professions in The Steinhardt School New York University Advisors: Dr. Kenneth Peacock, Dr. Agnieszka Roginska 06/10/2011

Copyright © 2011 Shefali Kumar Friesen

Table of Contents

I. Background
   A. Foundations of musical communication
   B. Game-changing technology

II. Limitations and Motivations
   A. Limitations of current technology
   B. Motivations for designing a new interface

III. Factors influencing design of the user interface (UI)
   A. Emotion-based navigation and content categorization
   B. Media-rich messaging
   C. One to One Communication
   D. Mobile delivery

IV. Development
   A. Interface and message flow
   B. Application deployment
   Application Screenshots

V. Discussion and Conclusions
   A. Summary
   B. Expanding features
   C. Future scope
   D. Predictions
   E. Limitations
   F. Emotitones as a tool

Appendix
References



“Music is a fundamental channel of communication: it provides a means by which people can share emotions, intentions, and meanings.” -Hargreaves, MacDonald, and Miell

A. A few words on communication models

The word communication has many definitions to date. While previous

definitions specified valid communication channels, today we accept communication to be the “imparting or exchange of information, ideas, or feelings”, or, simplified further, “an act or instance of transmitting” (Merriam-Webster, 2011). Despite (or perhaps because of) this broad definition, and the progressive connotation of the concept, researchers in music continue to wrestle with demonstrating that music is a valid form of communication. At the core of this task, many attempts have been made to prove

that music is a language and thus a form of communication. Researchers have made efforts to apply linguistic principles such as semantics to music, claiming that for music to qualify as a language, it would have to follow the same rules (Hargreaves et al., 2005). As scholars discovered, however, music is amenable to structural semantics (notes, chords, etc.) but lacks definable meaning (Kuhl, 2008). This “fluidity” of meaning has made the ‘language’ label a difficult theory to support: while

some characteristics, such as tempo and harmonic progressions, are measurable and definable, others, such as emotional response, are personal and cultural, making them very difficult to measure by any means (Kuhl, 2008). While, at the very least, the formal structure of music, as well as its tendency to

communicate meaning, makes for an elegant metaphor between it and language, later communication models, influenced by semiotics and cognitive science, placed emphasis on the usage of language rather than language as a system (Kuhl, 2008). In other words, rather than proving that music is a language, it is more productive to examine the comparisons between the system of music and the system of language. As discussed in III, this approach supports music as a transmission of meaning, emotion, and understanding. From a historical perspective, music and language have fundamentally existed in all

human societies (Mithen, 2006). Some archaeologists assert that music and language existed even in prehistoric societies (Blacking, 1973; Sloboda, 1985). With research in favor of these findings, the musical communication of today is simply a continuation of historic behavior.

B. Information technology and music technology

“Information technologies have profoundly affected communication processes by

simplifying queries, liberating energy, reorganizing semantic axes and points of view, and reorienting the relationship between the production of meaning and our sociotechnical environment” (Tanzi, 1999). It has always been the role of communication technology to enable more meaningful human connections and to increase both the efficiency and frequency of information

exchange. Regardless of the innovator’s motivation, communication technologies alter cultural behaviors and typically make the world more accessible. Just as communication has been influenced by information technologies, musical

communication has also been influenced by music technologies. One example is the advent of the radio broadcast and its resulting behavior: the music dedication. This cultural practice began as the Long Distance Dedication on the American Top 40 radio show in 1970. The DJ, Casey Kasem, would play a mailed-in record while reading the accompanying letter dedicated to a listener. The tradition has continued for over four decades on the radio, and over mix-tapes, mix-CDs, and digitally shared playlists. Another game-changing music technology innovation is the mp3. The Moving

Picture Experts Group made a breakthrough with their implementation of lossy data compression and perceptual coding (http://mpeg.chiariglione.org/). Music files could, as a result, be relatively small without compromising auditory quality. As a result, music is everywhere, implemented in online and offline communication behaviors (discussed in III). The ringback tone, while not quite a game-changer, facilitated a more intimate and

direct form of musical communication. In 2001, engineers found a way to manipulate what was heard on the caller’s side of a ringing phone. For the first time, a mobile phone owner could designate a unique song that each contact would hear instead of the typical ring. Each selected song would communicate the dynamic of the relationship in some way. Other examples of game-changers are the digital recorder, digital audio

workstations (DAWs), mp3 players, and mobile music such as ringtones, all of which either changed or introduced new ways of communicating with music.


A. Limitations of current technology

As technology advances, so does communication, particularly in the age of social networking. Short Message Service (SMS) texts are the most prevalent form of communication worldwide, and are remarkable in their ability to reach anyone, anywhere, anytime. However, as revealed by case studies (shared in coming sections), text-based technologies often lead to misinterpretations of tone and meaning in messages. Whether the absence of intonation in an email leads to error in sizing up a relationship with a colleague, or a teenager misinterprets a “to the point” communication style to mean her loved one is laconic and uninterested, it is clear that today’s digital communication lacks the clarity of vocal communication, or better yet, face-to-face interaction. “It is too easy to misunderstand something if you can’t read the body language and the faces - a single word can easily be misunderstood” (Stald, 2008). Aside from the lack of emotional cues in today’s communication channels, the

number of words allowed to convey an idea is often limited to 140-160 characters. Even an eloquent communicator would have trouble conveying his or her mood accurately. Another limitation, explored in the next section, is that free and

novel communication platforms tend to favor the broadcasting communicator, or the contributions to the live “feed,” over the thoughtful, expressive communicator. The

rapid rate of information exchange has made communication fleeting in some ways. The result is many users who have a voice but do not know what to say, or whom to say it to. One coping strategy for these emotional limitations is the emoticon: a face

made from punctuation and/or letters to convey mood. Emoticons date back to the 1800s as a concise way of representing emotions in theater (Bierce, 1912), but digitally they have surfaced in a number of ways. For example, when text-based virtual realities called MUDs (multi-user domains) began appearing in the 1990s, they gave rise to an exclusive and universally understood language made up of “idioms, acronyms, and iconic emoticons,” created not only “to economize keystrokes, but also to help define the contexts for conversations, establishing responsiveness and attentiveness, communicating understanding, initiating play, describing actions in real life and conveying mood, feeling and emotion” (Cherny, 1995). Today, the prevalence of emoticons is significant. In instant-messenger applications

from AOL IM to Blackberry’s BBM, emoticon menus are embedded and quick to call upon to establish a mood, clarify context, and convey other shared meanings.

B. Motivations for designing a new interface

The Emotitones name derives from the concept of emoticons. Emotitones are

auditory emoticons of a sort, which, although longer and more complex, aim to provide an enriching, non-text-based tool for expression (see expanding features in Discussion). The logo for Emotitones is an emoticon with an eighth note for a mouth.


The motivation of providing an emotionally rich channel for communication is

supplemented by the motivation to help the music industry, a business which has been wounded and dysfunctional for over a decade. The past few years have shown signs of resurgence, the beginning of a comeback of sorts, but the model is still broken, and solutions are needed to restore public opinion about the general value of music. From advertising and sync licensing to ringtones and radio, music has fulfilled many commercial needs by tapping into the emotions of the end-user. In each case, however, music is consumed passively. To demonstrate the greater value inherent in a song, the listener must be actively engaged by music, or by a musical experience. This engagement should result in an action (or transaction) of some kind. From an artistic perspective, lyrics are a powerful but overlooked communication

tool. They convey common emotions and situations relatable to any given person at any given time. This is evident in dedications, musical greetings, mix-tapes, and music sharing, and was a motivation for constructing the Emotitones platform. As mentioned, musical interaction and communication are not novel concepts.

Emotitones aims to focus on the shared meaning between sender and receiver, instead of performer and listener, facilitating a shift from passive listening to active communication. Given this motivation and the limitations of emotional expressivity in today’s digital communication, this thesis proposes Emotitones, a user interface for musical communication, and describes the elements contributing to the design of the UI.



After committing to creating a solution for musical communication, several decisions went into the design of the Emotitones user interface. While prototyping the user experience felt relatively natural, significant research went into each design element. The first decision was to make the navigation and categorization emotion-based.

A. Emotion-based navigation and content categorization

Navigation is the backbone of user experience in many applications. It is a

prediction of user mindset, and must address the following questions: What will the user be thinking from page to page? What will motivate and guide them to complete the desired action? With musical communication as the end goal, it was crucial to focus on the

message; not text-based information, but audio-based emotion. To choose emotion-based navigation, it was necessary to confirm that emotional expressivity in music exists; in other words, adequate research suggesting that music is effective at conveying emotion (or concise ideas and sentiments) was required. The most compelling research addressing the emotional expressivity of music

comes from studies in cognitive neuroscience within the last five years. Researchers know that it is possible to prime a stimulus for an expected result. In other words, by presenting one concept to a subject, the expectation of another concept can be created.

The former concept is called a prime, and can be more or less effective based on how related it is to the latter concept. This phenomenon has been used to test several theories of cognitive processing. Before highlighting these studies, some brief comments on priming and the N400 are necessary. The brain has at least two systems: the first is implicit and responds quickly and

automatically; the other is an explicit system designed to interpret signals from the implicit system (Kubovy, 2008). Priming, an important factor in recent experiments measuring the cognitive processing of music, is an effect of implicit memory in which exposure to a stimulus influences the response to a later stimulus (Kolb & Whishaw, 2003). Here is an example of priming described by Michael Kubovy in the Library of Congress lecture series Music and the Brain (table 1.1):



table 1.1
   subject 1: COUCH → SOFA (primed)     TRUCK → CAR (primed)
   subject 2: DISK → SOFA (unprimed)    HAMMER → CAR (unprimed)

In table 1.1, the target word SOFA is compared under two conditions. For subject 1, SOFA is preceded by the word COUCH, and is said to have been primed for an expected result; for subject 2, SOFA is preceded by DISK, and has not been primed for the expected result. Similarly, CAR has been primed by TRUCK, but not when preceded by

HAMMER. When a stimulus has been primed effectively, it can be accessed more readily than the same unprimed stimulus, leading to more streamlined cognitive processing (Kolb & Whishaw, 2003). To measure such claims,

electroencephalography (EEG) was used to record the faint electromagnetic fields emitted by the brain; more specifically, waves below 100 Hz indicating brain activity (Niedermeyer & Silva, 2004). The voltages were picked up by electrodes, amplified, and recorded into a computer for analysis. Before measuring activity related to unexpected stimuli in experiments on cognitive processing, it is important to first measure the response to expected stimuli. Kubovy again illustrates an example: the word dog was displayed on a screen over and over while the subject’s brain waves were recorded. The recorded responses, time-locked to each presentation of the stimulus, were averaged across repetitions, yielding the event-related potential, or ERP: in a sense, an extraction of how the brain responds to a stimulus (Kubovy & Shatin, 2009). With unexpected stimuli, negative voltage deflections are recorded at various time intervals after the stimulus is introduced. For example, in the sentence “I like my coffee with cream and dog,” the unrelated and unexpected presence of dog prompts a reaction in the brain during processing. This deflection occurs 400 milliseconds after the unexpected stimulus, and is thus called the N400. The N400 is present in many of the studies at hand (Kubovy, 2006).

Koelsch and his team achieved breakthroughs in making studies of cognitive processing applicable to the processing of music. To do this, their experiments used music and language as two separate approaches to priming the same word, and compared the results.
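As a rough illustration (not part of any cited study), the trial-averaging procedure described above can be sketched in Python. The sampling rate, epoch window, and the synthetic negative deflection near 400 ms are invented for the demonstration; the point is only that averaging stimulus-locked epochs cancels trial-to-trial noise and leaves the consistent response.

```python
import numpy as np

def compute_erp(eeg, stimulus_onsets, fs, window_s=0.8):
    """Average stimulus-locked EEG epochs to extract the event-related
    potential (ERP): random noise cancels out across repetitions,
    leaving the brain's consistent response to the stimulus."""
    n = int(window_s * fs)
    # Slice one epoch starting at each stimulus onset (in samples).
    epochs = np.array([eeg[t:t + n] for t in stimulus_onsets if t + n <= len(eeg)])
    return epochs.mean(axis=0)

# Toy demonstration: a fixed "response" buried in noise on every trial.
fs = 250                                     # hypothetical sample rate (Hz)
rng = np.random.default_rng(0)
t = np.arange(int(0.8 * fs)) / fs
# Synthetic negative dip centered near 400 ms, standing in for an N400.
response = -2.0 * np.exp(-((t - 0.4) ** 2) / 0.002)
onsets = [i * 300 for i in range(200)]       # 200 repeated presentations
eeg = rng.normal(0, 5, size=onsets[-1] + 300)
for o in onsets:
    eeg[o:o + len(response)] += response     # embed the response each trial

erp = compute_erp(eeg, onsets, fs)
peak_ms = 1000 * t[np.argmin(erp)]
print(f"most negative deflection at ~{peak_ms:.0f} ms")  # near 400 ms
```

Any single trial here is dominated by noise; only the average across the 200 presentations recovers the dip, which is the sense in which the ERP is an "extraction" of the brain's response.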

In one language priming study, for example, the word “wide” or “wideness” (translated from German) was primed by one sentence, “the gaze wandered off into the distance,” while a second sentence was constructed with no association to the word wide: “The manacles (handcuffs) allow only a little movement.” The result was as expected: a higher N400 for wide/wideness following the second (unrelated) sentence. To test this in the musical realm, Koelsch et al. used a musical excerpt by Richard

Strauss (Salome, Op. 54) to evoke the feeling of the word “wide” or “wideness.” The second musical excerpt was from a more dissonant and closed-feeling piece by Valpola (the E-minor piece for accordion). Just as the related prime in the language experiment yielded a lower N400 than the unrelated prime, the musical prime by Strauss resulted in a lower N400 in subjects than the musical prime by Valpola. This was a result of the characteristics of each piece: because the Strauss exhibits a multi-instrumental, consonant, and seemingly grand and expansive sonic quality, it evokes a feeling of wideness more than the Valpola piece, which is dissonant, sparser in instrumentation, and seemingly closed in feeling. A smaller N400 peak indicates less cognitive processing of a concept. The illustration of the N400 responses (the peak on each chart as indicated on the

C2 chart) in figure 1.2 shows the cognitive responses to the concept of wideness using language-based and music-based primes. The dotted lines represent the ERPs (event-related potentials) in response to the unrelated primes, and the solid lines represent the ERPs in response to the related primes. Examining the two charts comparing the unrelated and related N400s for both music and language, it can be concluded that the musical primes are just as effective at conveying the idea of wideness as the language

primes are. This is indicated specifically by the differences in values of the N400 peaks comparing unrelated and related data points. The space between the two ERPs both for music and language primed-stimuli show the readiness of the brain to receive effectively primed concepts.

Figure 1.2: Event-related potentials (priming using the Strauss piece; priming using the Valpola piece)

Koelsch and Steinbeis recently did a similar study, but instead of using excerpts of related and unrelated music to prime words, musical chords varying in consonance and dissonance, major and minor keys, and timbre were used as primes for emotionally congruous and incongruous words. N400s were again measured, and it was found that the emotionally congruous words yielded a lower N400 in both musically trained and untrained subjects (Steinbeis & Koelsch, 2010). This was a study in “affective priming effects of musical sounds on the processing of word meaning” and resulted in a

powerful contribution to the discussion of musical communication. In a similar study on duration, it was shown that this type of meaning can be conveyed to a subject within 250 milliseconds of hearing music (Bigand et al., 2005); in other studies, meaning is conveyed in the change of a single semitone or timbral characteristic (Sloboda et al., 2007). The findings suggest that “musical mode can effect the processing of language on affective level.” On the subject of ideas or affect conveyed by music, it is worth mentioning that

what all the musical structures used in these studies have in common is that they represent concepts in one of three ways: 1) by imitation, 2) by association, or 3) by a sense of embodiment (Kubovy & Shatin, 2009). For example, when trying to prime a subject to choose a circle over a square, a researcher may play a short musical excerpt conveying something “smooth” sounding versus something “angular.” Because the brain recognizes the embodiment of a circle, it can be primed by characteristics embodying the same concept. This phenomenon of mixed metaphor has been studied in great detail by experts in synesthesia, who would assert that sounds and music can represent a concept so powerfully that they convey meaning in other senses entirely (Cytowic, 2009). The notion that music is representative of concepts and emotions suggests that allowing a user to choose his or her own music to convey an idea, based on emotion and categorization, would be effective. Many musical representations in today’s society are most likely a result of cultural

learning (such as a listener’s association of fanfare with royalty). “By communicating an emotion, however basic, music can refer to a variety of different affective states, which are more or less unanimously understood by listeners familiar with the musical idiom” (Juslin, 2003). Musical representation by association is likely to be more

culturally dependent than other musical representations, as it depends on the listener drawing on allusions (Kubovy, 2006). With a rich cultural memory, listeners have implicit knowledge of the music of their culture. Even beyond cultural associations, recent studies have demonstrated that certain

emotions in music are recognized by both Western and non-Western listeners. Subjects were able to classify Western pieces as happy, sad, or scary without any familiarity with the pieces (Fritz et al., 2009). This idea of universal emotions also supports emotion-based navigation. Unlike the past decade, in which the emotional expressiveness of music was

limited to theories and proofs dealing with structural comparisons between music and language, there is now clearly enough support to base navigation and content categorization on emotion. The details of what this means will be explored in the development section of this paper.

B. Media-rich messaging

The second UI decision to explore is the enabling of media-rich messaging: in this case, attaching a musical excerpt to the message being communicated, or making the musical excerpt the message itself. Research on this subject is led by the communication sciences, but is becoming more cross-disciplinary due to the rise of social-networking technologies. Research by Weber & Mitchell provides a good starting point for this discussion of the effect multimedia has on communication efficiency. Observably, digital users adopt multimedia into their communication on a regular

basis. Whether the media comes in the form of an embedded video link or photo attachments, the content becomes a crucial part of the message. Taken a step further,

with ever-evolving social platforms and enabling technologies, users (particularly young users) find the means to modify these media artifacts to make them their own. “Young people’s own digital productions facilitate a blending of media, genres, experimentations, modifications, and reiterations, which Mizuko “Mimi” Ito describes as a media-mix” (Weber & Mitchell, 2008). This customization of media can be seen in the remixed audio and video files pervasive across the internet. Common examples include film footage overdubbed with audio created by the user or taken from another source; the many renditions of unlicensed cover songs; and the personalized computer animations used as digital greetings. This consumable nature of ‘media-mix’ is described by Henry Jenkins as “production”: not production for commercial purposes, but for “interactive consumption,” in which a user consumes media, including images, audio, and video, to create his or her own media productions (Weber & Mitchell, 2008). Best paraphrased: “users merge digital technologies with commercial media narratives in the context

of specific communities, in effect fusing and remaking both the narrative and the tool. From early scrap-booking practices in Studio-era Hollywood to the audio mix tapes of the 1970s, to the fan fiction and textual poaching explored by cultural studies researchers, we know that viewers and readers have long “re-mixed” or poached commercial culture” (McPherson, 2008). The limitations of plain text have become apparent to digital culture, leaving

media-less messages best suited for informational purposes only. While the production aspect of communication is exercised predominantly by youth demographics, the interactive-consumption aspect of today’s digital communication has reached ubiquity. This can be observed in social-networking sites such as Facebook and Twitter.

communication. MUDs, as mentioned earlier, led to the creation of an exclusive language made of idioms, acronyms, and emoticons. Another, more complex example is Machinima culture. Machinima is the result of creating animated movies in real time through video

game technology. Or, more elaborately, visual narratives “created by recording events and performances (filmmaking) with artistically created characters moved over time (animation) within an adjustable virtual environment (3D game technology platform or engine)” (Lowood, 2005). While the example seems to impose an esoteric knowledge requirement on the user, the digital youth of today have a similarly fluent and complex handle on multimedia implementation and, like machinima users, view interactive capabilities with peers as equally valuable. Machinima users were able to exploit a technology platform to express themselves while simultaneously creating a subculture. This aspect of subculture is also important in supporting specifically music-rich messaging. Media-rich messaging also supports the school of thought that multi-sensory

messages lead to more emotionally rich communication and experiences. “When modeling a communication experience, designers tend to limit user interaction to visual cues, occasionally accompanied by sound. But reality is actually multi-sensory and packed with an array of complex emotional cues...” (Metros, 1999), and “...the more modalities a medium uses, for example images and sounds, the more senses are activated and the more effective is the feeling of presence” (Stald, 2008). Support for the assertion that delivering information through more than one sensory experience is effective is found in the creation and implementation of earcons, an auditory tool that has been used to convey information for decades. Lemmens et al. define earcons as “audio messages used in human-computer

interfaces to provide information and feedback.” While typically short (often less than 500 ms), they create strong associations, acting as cues for specific tasks that a user carries out. Both Windows and Apple computers have a history of using earcons that tell users when they have carried out specific functions such as booting up the computer, opening files, saving files, and putting files in the trash. Earcons can confirm that a task has been carried out successfully; inform when an error or something unexpected has occurred; warn when something is failing or needs attention; and occasionally act as bells and whistles for an otherwise mundane task. A surprising number of studies have been done on earcons, including methods of creating them; how musical elements contribute to their efficiency; the resulting associations formed by users; the psychological impact of positive and negative earcons; and how they relate to their visual counterparts. Most compelling in the discussion of music being used to create emotionally rich communication through multi-sensory experiences are the elements that earcon designers consider when approaching each audio cue. For example, in the study by Lemmens et al., it was asserted that “the difference in affective appreciation of the major and minor modes can be incorporated in the set of transformations for earcons. The major/minor transformation can then be used specifically to create affectively-charged earcons for use in affective human-computer interfaces” (p. 2018). On the flip side, a 2010 study examined the potential hazards of using dissonant warnings for technical errors, possibly creating too strong a negative visceral response in the end-user.
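The major/minor transformation described above can be illustrated with a small, hypothetical synthesis sketch. Nothing here comes from the cited studies: the note durations, arpeggio shape, root frequency, and file names are illustrative assumptions. The only substantive point, taken from the text, is that swapping a major third (4 semitones) for a minor third (3 semitones) is enough to change the affective character of an otherwise identical earcon.

```python
import math
import struct
import wave

FS = 44100  # sample rate (Hz)

def tone(freq, dur=0.12, fs=FS):
    """A short sine-wave note with a linear fade-out to avoid clicks."""
    n = int(dur * fs)
    return [math.sin(2 * math.pi * freq * i / fs) * (1 - i / n) for i in range(n)]

def earcon(root_hz, mode="major", fs=FS):
    """A three-note arpeggio earcon; only the middle interval carries the
    affect: 4 semitones (major third) versus 3 (minor third)."""
    third = 4 if mode == "major" else 3
    samples = []
    for semis in (0, third, 7):  # root, third, fifth
        samples += tone(root_hz * 2 ** (semis / 12), fs=fs)
    return samples

def write_wav(path, samples, fs=FS):
    """Write mono 16-bit PCM using only the standard library."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767 * 0.8))
                               for s in samples))

# Hypothetical usage: a "confirm" cue in major, an "attention" cue in minor.
write_wav("confirm.wav", earcon(440, "major"))
write_wav("warning.wav", earcon(440, "minor"))
```

The two resulting files are identical except for one semitone in the middle note, which is the kind of minimal, affectively charged variation the earcon literature discussed above is concerned with.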

In another study, “Designing Earcons with Musical Grammars,” Hankinson

and Edwards recall early earcon designers who stayed away from compositions using more than four notes so as to avoid musical associations and affect. Their study, on the contrary, asserts that if used correctly, musical gestures and associated grammars applied to earcons can provide the user with rich information. This notion is confirmed by other researchers who have pinpointed the capability of conveying affect through pitch, rhythm, and timbre. The concise nature and observable impact of earcons have made them an

intriguing subject in the examination of audio-visual tools. Methods in cognitive processing (similar to those mentioned earlier) allow researchers to observe how earcons (varying in sonic quality) affect the brain’s ability to process information, as well as to form affective responses. Congruency also plays a role: how closely an earcon matches the concept it is trying to convey. Researchers use stimulus-response compatibility (SRC) to describe efficient implementations which result in improved user performance (strong stimulus-response mappings) (Lemmens et al., p. 2018). This is part of affective-computing research. Despite the wide range of research, all agree that earcons effectively convey information, thus improving human-computer interaction. Best said in the context of Emotitones: “Earcons could be used, in any program employing emoticons, to more easily differentiate between positively and negatively valenced emotions” (Lemmens et al., p. 2024). In a culture where digital users receive information, form impressions, and share

perspectives through consumption of multimedia, Emotitones facilitates this communication behavior further by allowing users to express themselves through the

multimedia content itself; more specifically, through musical content. Decisions regarding the transmission of these messages begin with consideration of who the receiving audience is.

C. One to One Communication

This section of the user interface discussion concerns the communication channel specifically, and the decision to enable a one-to-one channel versus a broadcast. The strength of the research favoring peer-to-peer communication over the type of

communication demonstrated in blogging, status-update, and tweeting cultures rests in the assumption that a communicator is more invested in a message directed at one user than in a broadcast to an undefined group of people. The main difference between the two approaches is that in direct communication, the presence of the receiver/listener is crucial and must be considered by the sender/communicator. While communication theories defining these roles are progressing as technology evolves, they stem from traditional models of communication and are adapted as needed. The information-transmission model by Shannon and Weaver was widely favored

in the mid-20th century. In this model, the communicator chooses a specific channel to deliver a message to a targeted receiver (Hargreaves et al, 2005). Many musical communication researchers consider this an oversimplification, arguing that

communication (musical or otherwise) involves creativity and interaction between the performer (sender) and listener (receiver): the communication is "much more interactive and re-creative than is suggested by the idea of information being passed from one

person (e.g. the performer) to another (the listener)” (Hargreaves et al., 2005). The listener, they assert, has a role in defining or interpreting the message (or piece of music), and therefore cannot be regarded as passive. Modern theories of musical communication address this shortcoming; however, they lack consensus on what roles the communicator and receiver play, and on whether or not musical messages have coded meanings (Kendall & Carterette, 1990). The distinction between composer and performer was accepted, however, and added as an extra step in the communication chain.

figure 1.3

This meant that the performer had to first decode, then interpret musical meaning, then re-encode the message before sending it to the listener, wherein “each of these processes is dependent on the shared implicit and explicit knowledge of all three participants in the chain, and is influenced by the context and environment within which the process takes place” (Hargreaves et al., 2005).

Figure 1.3 shows the complex but elegant musical communication model by Juslin,

who addressed the uncertainty between the listener’s perception and his or her affective response, and defined the composer’s role as a “causal” influence on the listener (Juslin, 2003). His studies also examined the translation of intention (the composer’s and the performer’s) and the resulting affective response in the listener. Because the composer’s intention is translated by the performer’s intention, the performance takes on acoustic features that affect and shape the listener’s perception. The patterns that the listener then recognizes and internalizes formulate a response, possibly emotional, and thus lead to a new mental state or experience (Hargreaves et al., 2005). In the case of Emotitones, it is reasonable to say that the sender is, in effect, a second performer, interpreting the original message of the composer and singer and re-encoding the piece of music just before sending it to a receiver, who will be influenced by four considerations: the original composition/writing, the original performance, the sender’s added comments/impressions, and, finally, the receiver’s own associations with the piece. Like Juslin, other scholars have been integral in addressing some of the subtleties

of communication chains. Speaking to the importance of the receiver, Johnson-Laird’s model asserts that when a communicator codes a message to the receiver, the message becomes symbolic: a representation of what the sender wishes to convey to the receiver. The receiver must then decode the message, and therefore must share an understanding of what the symbolic coding means (Hargreaves et al, 2005). While a performer on stage may opt to take artistic liberties at the expense of direct, clear communication of a specific concept or idea, a user wishing to communicate an idea or emotion to another person will be unsuccessful should he/she opt to send a vague,

coded message with no regard for mutual understanding. This is the difference between expression and communication. It is expected that designing the Emotitones UI around one-to-one communication (or communication within a small group) will prompt the sender to consider mutual understanding, and thus result in a more successful and fulfilling communication exchange. On a side note, it has been astutely suggested by Tanzi that the first

communication decision belongs to the composer: “The composer must decide whether to hold on to sonic memories” or to let “algorithms dispose him of them. Music is thus ultimately cognitive and anthropological, not merely musical” (Tanzi, 1999). While progress in neuroscience has put musical communication models in the

context of information processing in the cognitive system (as discussed in III), and others have focused on modeling musical communication after language, using semantics and semiotics (also covered in III), still other studies examine communication models as shaped by digital technologies in the age of social networking, where communication has become highly expressive. The concept of expression as a form of everyday communication is a new

phenomenon, and one that indicates that a one-to-one channel for expressive communication is a logical next step for a new technology interface. Today, the logistics of message delivery (such as email and SMS platforms) are taken for granted, as users are immersed in ubiquitous, technologically driven communication channels. In other words, if a user sends a text message, he or she does not feel uncertain about whether the message will reach the recipient. Studies today focus on other layers of complexity; for instance, instead of cognitive effort being spent on how an intended message gets from sender to receiver, users must consider the construction

of the message itself, and what channel to use for the delivery of the message. These are elements of new communication behaviors surfacing in the digital realm, and are being studied by researchers in several disciplines. With the tools for media-rich communication readily available, and the wide

choice in channels for message delivery, individual expressivity plays a much greater role. A communication exchange cannot happen without a series of individual decisions, each reflecting the communicator’s preferences and identity. The following studies speak to the influence of expressivity and identity in communication.

As new behaviors in the digital age emerge, theories and observations regarding self-perception and the formation of identity are arrived at by applying traditional schools of thought to modern, practical situations. Erving Goffman is still quoted and studied today by digital theory researchers.

His “impression management” speaks to the tendency of individuals to monitor and guide others’ impressions by altering their own settings, physical appearance, and manners (Goffman, 1959). In today’s context, the performance of self “applies not only to face-to-face interaction, but also to asynchronous and real-time interaction on the internet. While Goffman could not have predicted the dynamics of computer-mediated interaction, his model works because users, socialized in face-to-face interaction are often conscious of applying the rules of such interaction to the cyber world” (Westlake, 2008). This is reminiscent of facebook user behavior: posting, tagging, and updating one’s status are actions typically broadcast to all other “friends” on a user’s profile, with each post carefully deliberated. Goffman labelled social interaction “dramaturgical,” in that it is like a theater performance. His metaphorical “front” stage and “back” stage

distinguished between people acting or conforming to social rituals at gatherings, and people behaving freely as themselves when not playing a role, respectively (Buckingham, 2008). “While certain elements that Goffman defined as part of the ‘front stage’ performance are absent in the computer-mediated interaction (visual cues such as clothing and facial expression and aural cues such as tone), they are replaced in chat and on websites by more “staged” elements such as font, photographs, music, and graphics” (Westlake, 2008). These staged elements become the characteristics of a digital individual, who can “tell stories of sorts (often non-linear and multi-voiced) and leave a digital trail, fingerprint, or photograph” (Weber & Mitchell, 2008). The “production” and “interactive consumption” discussed earlier are also

identity forming. Weber and Mitchell credit reflexivity as one explanation of how consumption and production contribute to identity formation: “Firstly their own media production (both through its processes and its outcomes) forces young people to look at themselves, sometimes through new eyes, providing feedback for further modification of their self-representations. Secondly, the source materials and modes of young people’s media production are often evident or transparent; the choices and processes that they use reveal and identify them in ways that they themselves might not even realize” (Weber & Mitchell, 2008). Digital artifacts used in remixing and in expression over social media channels

range in media type, duration, and format. Music-based examples are the most relevant when considering a musical communication tool. “Music is one of the most widespread and significant cultural objects that enhance

dimensions of people’s everyday life, and thus has become a significant component in


the domains of cognitive, emotional, and social functionality” (Hargreaves & North, 1999). The concept of music as a one-to-one interaction already exists through music sharing. Aside from the examples mentioned briefly in the introduction, peer-to-peer sharing applications, mp3 websites, and social-networking sites that allow profile music all enable music sharing. While in some cases these websites are used to display music in the public domain, music preferences or choices are often shared between peers. “Music represents a remarkable meeting point of the private and public realms, providing encounters of self-identity with collective identity” (Hesmondhalgh, 2008). Valcheva calls this sophisticated, deliberate method of sharing music in a meaningful way “Playlistism.” In making playlists, people characterize themselves and express their personality

while capturing the emotional state they are in (Dijik, 2006). Ebane et al go so far as to say playlists are a “reliable personality barometer and a locus for negotiations of meaning, identity, and online presence” (Ebane, Slaney, & White, 2004). Anecdotally, most would agree this is true: music preferences have strong associations with subculture. Frith has studied this phenomenon extensively and concluded that music functions as a “badge” for social beings. This badge-like quality of music “is claimed to communicate value, attitude, and opinion to others and thus a means of identity representation and self-expression” (Valcheva, 2009). Frith’s findings also assert that an individual’s musical selection highlights some of that person’s unconscious personality traits. Several studies have examined the effects and functionality of music sharing

technologies (using playlists) including: iTunes (Voida et al, 2005), Napster (Brown et al,

2001), last.fm (Fitzpatrick, 2008), Webjay (acquired and shut down by Yahoo), Push!Music, and TunA (Bassoli et al, 2006) (Valcheva, 2009). In the case of last.fm, the platform allows users to share playlists, construct visualizations of musical taste, and express their identity through musical subculture. While TunA allows users to stream other users’ playlists in an “eavesdropping” manner, Push!Music is a novel system which allows users to “push” songs while mobile, in an effort to share music preferences and make personal recommendations. This peer-to-peer interaction increases the value of musical interaction by placing importance on the receiver: if a song is being sent as a recommendation, the sender has taken the receiver into consideration. Making emotitones deliverable to individuals goes one step further; the sender

must consider whether the message in the song itself is what should be communicated, not just the receiver’s likely affective response to that style of music.

D. Mobile delivery

Thus far, the research discussed supports a user interface which hosts emotion-based navigation and content classification, media-rich messaging, and peer-to-peer communication. The next UI element to consider is the method of delivery. A survey of the current reigning information technologies made it clear that Emotitones would have to consider delivery over the mobile platform. “...seen in this very broad evolutionary perspective, the significance of the mobile

phone lies in empowering people to engage in communication, which is at the same time free from the constraints of physical proximity and spatial immobility” (Geser, 2008).


One simple but powerful aspect favoring mobile devices is their worldwide

dominance: their ubiquity. The economic research illustrating this worldwide dominance of mobile devices and mobile internet within the last two years alone is more than enough to justify this delivery method (there are over 4.6 billion mobile users in the world); however, the design of the Emotitones user interface is based on neurological, technological, and sociological analyses, not on economics. On the subject of the ever-present nature of mobile phones, Stald’s account is telling: “it is

ubiquitous in youth cultural contexts as a medium for constant updating, coordinating, information access, and documentation. At the same time, the mobile is an important medium for social networking, the enhancing of group and group identity, and for the exchange between friends which is needed in the reflexive process of identity construction.” The mobile is “the ideal tool to deal with the pace of information exchange, the management of countless loose, close or intimate relations, the coordination of ever-changing daily activities, and the insecurity of every day life” (Stald, 2008). Stald’s findings were based on quantitative and qualitative studies of Danes, from their teenage years (fifteen) through their mid-twenties, and their mobile habits. The mobile phone is first and foremost a communicative device; however, due to

the increasing number of capabilities and functions it is responsible for (email, GPS, entertainment, news/reference, timekeeping, etc.), it is becoming an object of necessity, one that is crucial for functioning in today’s society. Rich Ling asserts that mobile devices change the way daily life is organized and coordinated (Ling, 2004). Where time has traditionally been the meter for the coordination of daily life, Ling suggests: “Instead of relying on a mediating system, mobile telephony allows for

direct contact that is in many cases more interactive and more flexible than time-based coordination” (Ling, 2004). Aside from the urgent and necessary functions, the phone is also viewed as a

personal log for day-to-day experiences (Stald, 2008). Media-capturing functionalities allow users to document experiences through photos, notes, calendars, and sound samples/voice memos. As Stald found, the memories created and shared on mobile devices inevitably lead to emotional connections with the phone’s backlog of digital files. The emotive nature of the mobile phone in its ability to connect loved ones; to

function as a personal log; and to capture moments of communication and experience, evokes the imagery of Marshall McLuhan’s “extension of man.” Mobile users, particularly youth, have found several ways to personalize their devices, further indicating an unarticulated emotional attachment between device and user. Some of these personalizations include background screen images, cell phone cases, ringtones, alarm tones, gaming, photo IDs, and so on; “through its basic appearance, the decorative adaptations, the choice of ringtones, and other alerts, and through screen background, the mobile itself provides signals about the user’s identity or at least their self-perception. The use of language, spelling, their actual way of interacting in dialogues, and the use of additional communicative elements and services also reveal things about the user’s personal settings” (Stald, 2008). The emotional accounts of young mobile users across studies range from keeping

in touch through MMS messages and sharing moods and everyday events, to taking video of crowning moments and engaging in full conversations over instant messenger. These accounts inevitably strengthen relationships and identity. This kind of emotional

expressivity of mobile devices supports the case for mobile delivery, but perhaps a more compelling case is the emergence of phatic communication. The mobile phone (via social media) has enabled communication functions which

traditionally were only present in verbal communication. As observed in interpersonal communication and linguistics, phatic communication, commonly referred to as “small talk,” occurs when an exchange exists merely to confirm that a channel exists and is functional. This function of language, formalized by Russian linguist Roman Jakobson, is not meant to convey any specific information or meaning; instead, it acts merely to utilize a channel, to check that the channel is working, or to make a comment about that channel (Jakobson, 1959). These exchanges carry an understood meaning that rests not on the words themselves, but on the delivery and intention of the phraseology. As pointed out by Zegarac and Clark, despite the meaningless nature of the words comprising a phatic message, the interpretation of these messages has social effects (Nicolle & Clark, 1998). While there are many studies in linguistics and the communication sciences examining the content and intent of phatic messages, Wang et al go further, defining “phatic technologies” whose primary purpose is to “establish, develop, and maintain human relationships”. While much phatic communication can seem thoughtless, Ling describes “grooming” messages (a type of phatic communication), in which a communicator lets another know that they are “there” for them and actively listening; this exchange serves to nurture the relationship. The constant messages sent in youth culture for the purpose of “being thoughtful” (regardless of the lack of information in the message) have been compared to phatic communication in linguistics. The behaviors of SMS users frequently follow

phatic communication patterns, enabling small talk more than conveying meaningful information (Ling, 2004). Behaviors such as “poking” on facebook or pinging through instant messenger also demonstrate the digital application of phatic communication. Additional research on the subject has been on the rise within the last decade as

technology forms new communication behaviors, making devices such as the mobile phone crucial to understand. As previously mentioned, phatic communication on the mobile phone is observed as a social and emotive interaction that conveys no specific information, as with the texts “hey how are you?” or “what’s up?” (Bilandzic et al, 2009). Another type of mobile phatic communication has been observed in European countries, as well as in Africa, North America, Latin America, and India (this is not an exhaustive list); it utilizes the ringing feature on mobile devices, or other sonic alerts, to communicate a shared meaning with another user instead of the typical voice or text (Kasesniemi et al, 2003). In the study on Danish youth behavior, mobile users exhibited what is called “pilaris,” using the number of times a phone rings to convey specific meaning (Stald, 2008). Mobile users observed in Donner’s study in Rwanda used “beeping” (missed calls and SMS/text messages) to communicate specific, previously determined meanings. According to the observations, there were three kinds of beeps: callback, pre-negotiated instrumental, and relational (Donner, 2007). Examples given for “pre-negotiated instrumental” include “I’m thinking of you” or “Come pick me up”. The behavior has spread so much that an application was prototyped to “support phatic communication in the hybrid space” (Bilandzic et al, 2009). This behavior of using sonic alerts to communicate (only a small deviation from

the idea of communicating through musical clips), the emotional connection mobile

users feel to their devices, and the ubiquity and necessity of mobile devices, make the case for incorporating mobile delivery in the Emotitones user interface.
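The beeping conventions described above amount to a small, pre-negotiated code between two users. A minimal sketch of such a code follows; the mappings are invented and purely illustrative, since in practice the meanings are negotiated between the users themselves:

```javascript
// Illustrative sketch of a pre-negotiated "beeping" code: the meaning of a
// burst of missed calls depends entirely on what two users have agreed on.
// These counts and meanings are invented examples, not data from the cited studies.
const beepCodes = {
  1: { type: "callback", meaning: "Please call me back" },
  2: { type: "pre-negotiated instrumental", meaning: "Come pick me up" },
  3: { type: "relational", meaning: "I'm thinking of you" },
};

// Interpret a received burst of missed calls from a known contact.
function interpretBeeps(missedCallCount) {
  const code = beepCodes[missedCallCount];
  return code ? code.meaning : "unknown signal";
}

// e.g. interpretBeeps(3) → "I'm thinking of you"
```

The point of the sketch is that the channel itself (ring count) carries the message, with no textual or vocal content at all, which is only a small step away from letting a musical clip carry the message.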



IV. DEVELOPMENT

A. Interface and message flow

The primary purpose of Emotitones is to enable an emotionally rich platform for communication. Observing that the effects of music are highly visceral in most cases (especially where affect is concerned), and with adequate current supporting research (discussed in III), emotion-based navigation was implemented. Emotion-based navigation revolves around the motivations of the expected

Emotitones user. The premise for sending an emotitone is that the user desires a form of expression beyond simple text communication, which typically constrains emotional expressivity. This user has a pre-determined emotion or sentiment in mind when visiting the platform; thus, Emotitones navigation should reflect their emotional motivations, informing the user on how best to express a given emotion. In other words, from the moment users log in to the time they send an emotitone, they will be prompted to make functional decisions based on their emotions. Part of this navigational design is making it easy for the user to find a suitable

piece of media content to represent their sentiment. In the UI, this is facilitated by content categorization (database tagging) upon song clip ingestion, and by a multi-parametric search. When dealing with content ingestion, or uploading content to the Emotitones

database, song clips are chosen based on their ability to convey succinct ideas or emotions. Ordinarily, this happens through eloquent songwriting, in which the writer creates relatable, empathic lyrics; or through effective composition, in which the

composer creates music evoking highly visceral responses in listeners. The beta catalog of emotitones consists primarily of vocal music, for which it is requisite that the lyrics be concise, well-enunciated, and well-articulated. While the eventual database will include all types of music and sound conveying various emotions and ideas, the initial collection of song clips is somewhat literal, for the purpose of developing a successful proof of concept. Once selected for the database, each emotitone is categorized and tagged according to the emotion or sentiment describing the over-arching theme being conveyed. This enables effective search for an appropriate emotitone. The emotional categories for the emotitones beta define what is thought to be the

most inclusive categories describing common and universal human sentiments. They are: romance/encouragement/controversy/friendship/humor/spiritual/occasions/

musings/all. The key difference between these sentiment categories and those typically used in studies on music and emotion (such as happy, sad, angry, and scared) is that the sentiments must take into account shared meaning with the receiver, a consideration absent in many studies, which focus on the emotional reaction of only one listener. For example, if happy were used instead of romance, it would be very difficult to find music and lyrics appropriate for the relationship dynamic at hand. Conversely, it is hard to think of a romantic song clip that would not be appropriate for a sender wishing to be romantic with the receiver (apart from surface-level characteristics such as gender, and other subtleties to be discussed later). Other than the occasional browsing, it is hypothesized that users will send emotitones with a specific purpose and person in mind.
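As an illustration (not the production code), the tagged records and the wildcard logic behind the search described in this section, which also draws on the genre and gender parameters discussed below, might be sketched as follows; the clip titles and tags here are invented examples:

```javascript
// Illustrative sketch of tagged emotitone records and the three-parameter
// search (sentiment/occasion, genre, gender). Titles and tags are invented.
const catalog = [
  { title: "Clip A", sentiment: "romance", genre: "pop", gender: "female" },
  { title: "Clip B", sentiment: "romance", genre: "rock", gender: "male" },
  { title: "Clip C", sentiment: "humor", genre: "hip hop", gender: "male" },
];

// Each parameter accepts "all" as a wildcard, mirroring the UI choices.
function searchEmotitones(sentiment, genre, gender) {
  const matches = (value, choice) => choice === "all" || value === choice;
  return catalog.filter(
    (clip) =>
      matches(clip.sentiment, sentiment) &&
      matches(clip.genre, genre) &&
      matches(clip.gender, gender)
  );
}

// e.g. searchEmotitones("romance", "all", "female") returns only Clip A
```

The "all" wildcard in each position lets a sender with a firm sentiment but no genre or gender preference still reach a useful result set.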


To facilitate the finding of appropriate emotitones, a three-parametric search was

implemented, with the emotion-based “sentiment/occasion” category described above as the first parameter. The second parameter for the sender to decide on is the genre of music, which is also part of the emotion-based search for an appropriate emotitone. It is hypothesized that the sender will have a genre preference based on his or her own musical tastes. As explored earlier, these musical preferences stem directly from affect: from the visceral effects of listening to a specific genre of music over time. The beta phase genres include: rock / pop / hip hop / country / classics / world / other / all. While resources were consulted (charts, mp3 stores, etc.), these genre categories

were chosen based on their inclusive, encompassing nature (of sub-genres) and on the strong presence of associated subcultures. The third emotion-based search parameter that the sender must decide on is

gender, the choices being: male / female / all. This parameter was implemented in anticipation of senders having specific messages in mind for specific people, and thus a gender preference for the first-person voice. This is an emotional consideration, with the hypothesis being that a song sent in the first-person voice of the sender’s own gender is more emotionally effective than one in the voice of the opposite gender. Support for this could be found by surveying ringtone users as to which gender is preferred; this, of course, is only a starting point. Continuing with the navigation flow, after users go through the multi-parametric search, they are invited to preview the resulting clips if desired, or they

can proceed to sending the clip (or buying the full-length song). On the send-clip page, the sender is given the opportunity to customize the message by adding text (and, in the future, photos or video). This is the last part of the implemented emotion-based navigation. The second design element of the UI discussed in III is media-rich messaging. While music was always intended to be the content through which users could communicate, decisions had to be made on catalog, duration, and file type. The emotitones beta is limited in terms of its categories and content. Conceptually,

the catalog will house audio and visual clips representing the largest catalogs in the world. Only by giving users exhaustive options will they be able to communicate fully using the platform. In terms of clip length, a decision was made to cap duration at 30 seconds. Full

length clips were not considered, as they are computationally expensive in terms of file size and delivery time, and in consideration of the ever-decreasing attention spans of digital users. Fifteen to thirty seconds is the typical length of a song chorus. The ringtone edit of a song is usually this length, and is most often the most emotive part of the song, as well as the most concise in terms of idea or concept. Logistically, requesting ringtone edits from content providers is easy to accommodate, as no further editing is required. In the cases where new edits need to be made, this is handled in-house using Audacity. The desired file type for music clips is mp3 at 128 kbps. In the application, the

smaller the file size the better; and since the output speakers are likely to be of low quality, the improvement from higher-quality files would be imperceptible.
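As a sanity check on this choice, the size of a clip follows directly from bitrate and duration. The sketch below (not part of the application code) computes the approximate payload:

```javascript
// Back-of-the-envelope size of an emotitone: bitrate (in kbps) times
// duration, converted from bits to bytes. Ignores mp3 container overhead.
function mp3SizeBytes(bitrateKbps, durationSeconds) {
  return (bitrateKbps * 1000 * durationSeconds) / 8;
}

// A 30-second clip at 128 kbps:
console.log(mp3SizeBytes(128, 30)); // 480000 bytes, i.e. roughly 480 kB
```

At 128 kbps, a 30-second emotitone is therefore roughly 480 kB before MMS encapsulation, an order of magnitude smaller than a full-length track at the same bitrate.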

The conclusion that one-to-one communication was desired over broadcast (such

as Twitter) led to the design of the “send tone” interface, the last page in the emotion-based navigation. As mentioned before, once senders have selected a clip, they are given the opportunity to enter the recipient’s mobile number along with a customized text. While in beta the mobile number entry is manual, later versions of the application will interface with the user’s native contact list. There is also a prompt for the recipient’s email, so that receivers are notified that they have been sent an emotitone and asked to check their device settings if they do not receive it. To encourage a dialog between sender and receiver, the receiver is given the

opportunity to ‘reply with an emotitone’. The hope is that in the app version of the platform, a musical dialog can take place. The last element of the interface discussed in III was the decision to integrate

mobile delivery. The app version of emotitones will be mobile-based and self-contained within the app, but the beta exists as a web-to-mobile platform. It is apparent why mobile delivery makes sense (discussed previously); the decision to make the sender’s experience web-based, however, was a matter of ease of use and adaptability. In other words, browser-based search and navigation is easier, and will most likely lead to more time spent on the site and more users. In terms of file type, emotitones are delivered as MMS messages. MMS was the

only option for mobile-specific, media-rich delivery starting from the web and not self-contained within an app.


B. Application deployment

The development of the Emotitones platform revolved around three core issues: 1) Where will the content be stored? 2) How will it be accessed? and 3) How will it be delivered? The first part of the storage issue refers to hosting. All sites must have a hosting solution, and in the last few years many have migrated to cloud-based computing. The Emotitones demo was originally hosted on Amazon EC2; however, due to better support and more flexibility, Rackspace Cloud was chosen for the beta, with the Emotitones server running on a Linux-based Debian box. Storage within the application is another issue, relating to database development. The emotitones database has to handle several functions: storage of mp3 files and corresponding tags/metadata; multi-user access; multi-parametric search; and sending, retrieval, and editing of data and files. The selected database system for Emotitones is MySQL (used by Facebook, Google, and Wikipedia), as it can handle these requirements plus large-scale content ingestion. Access of content is enabled through the post-login, online interface which

communicates with the Emotitones MySQL database. Because the application is a browser-based interface with dynamic content, JavaScript was selected as the development tool, with AJAX techniques used to integrate with MySQL. JavaScript is well established for browser-based applications, and AJAX is a powerful approach to asynchronous server integration. In the cases where user access involves uploading content through forms, XML, a

widely used tool for data transmission, is used for ingesting information in machine-readable form, while AJAX accesses the database repository. Emotitones has several of

these user-uploaded forms, some of which deal with music files, while others handle simple text. The safekeeping of tags, metadata, and other information is dependent on the XML coding. The delivery of emotitones is reliant upon integration with a third-party API

allowing for MMS delivery over all major carriers in North America. The Hook Mobile API uses M.A.X. 2.0, a Mobile API EXtension mobile utility platform. M.A.X. runs on a REST-based (Representational State Transfer) interface, an architecture running over HTTP. The delivery of content from database to mobile phone also involves short codes, which give access to carrier delivery over the SMS and MMS platforms. Provided that a receiver’s phone is MMS-enabled, emotitones can reach any user

using this API integration. In the receiver’s MMS inbox, the subject displays “John Doe (username) has sent you an Emotitone.” After opening the MMS, the receiver can view the customized text and press the play button to hear the emotitone. The receiver is then given the option to reply with an emotitone, in which case they are directed to the web-based interface. For security, the previews and emotitones are forward-locked. In review, the architecture includes a remote server, a requesting source, a

receiving source, and a database repository. The server houses a database of music clips which have been edited, meta-tagged, and categorized by sentiment and/or occasion, genre, and gender. Emotitones integrates with an API allowing for successful delivery and receipt of MMS messages. Any web-enabled device is able to send emotitones, and any North American MMS-enabled device is able to receive emotitones. The Emotitones beta allows a user to do the following: create a login, browse clips, preview clips, select

and customize chosen clip with text, and send the clip via MMS to a receiver’s mobile phone.
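To make the delivery step concrete, the sketch below shows the general shape of a REST-style MMS request the platform might assemble before handing delivery to the third-party API. The endpoint, field names, and short code are invented placeholders and do not reflect the actual Hook Mobile M.A.X. 2.0 API:

```javascript
// Hypothetical sketch of assembling an MMS delivery request. The endpoint,
// body fields, and short code below are invented placeholders, NOT the real
// Hook Mobile M.A.X. 2.0 API.
function buildMmsRequest(senderName, recipientNumber, clipUrl, customText) {
  return {
    method: "POST",
    url: "https://api.example.com/mms/send", // placeholder endpoint
    body: {
      shortCode: "00000",                    // placeholder short code
      to: recipientNumber,
      subject: `${senderName} (username) has sent you an Emotitone.`,
      text: customText,                      // the sender's customized message
      mediaUrl: clipUrl,                     // 30-second, forward-locked mp3
    },
  };
}
```

Whatever the real request format, the essential pieces are the same: a short code granting carrier access, the recipient's number, the customized text, and a reference to the clip itself.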

Other functions of the site include:

1) A daily analysis of logs in the database repository to display information such as the “Top 20 Emotitones” chart and “Today’s Top 5”; a back-end log of when emotitones have been sent; and safekeeping of user information for login functionality.

2) A submissions form to allow users (artists or labels with copyright permissions) to upload edited emotitones to the database, pending approval. The users are prompted to tag and classify each clip such that the emotitone will display in the results of the multi-parametric search. While full-length uploads are accepted, they are edited before being added to the database.

3) A suggestions page. Any user can fill out the online suggestions form if they would like to request a song for the Emotitones service. They are prompted for categorization information, but not permitted to upload the file itself.

4) Aside from the multi-parametric search, users can search the emotitones database using a keyword search. Each song clip has been tagged with 12 keywords. In most cases the keywords include song title, artist name, chorus/hook phrasing, mood, corresponding emotion, genre, and genre of vocalist.

5) Buy links. In most places where a user can listen to an emotitone, he or she also has the option of purchasing the full-length download. This was implemented from an emotional perspective: if a receiver feels moved by the message in an emotitone that someone has sent, he or she may want the full-length version of the song, which now has a new meaning attached to it.
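A minimal sketch of how the keyword search might match a query against each clip's tag list follows; the clips and keywords here are invented examples, not entries from the actual catalog:

```javascript
// Illustrative sketch of the keyword search: each clip carries a tag list
// (title, artist, hook phrasing, mood, emotion, genre, ...). The clips and
// keywords below are invented examples.
const clips = [
  { title: "Clip A", keywords: ["clip a", "jane doe", "love", "romance", "pop"] },
  { title: "Clip B", keywords: ["clip b", "john doe", "funny", "humor", "rock"] },
];

// Case-insensitive substring match against every keyword of every clip.
function keywordSearch(query) {
  const q = query.trim().toLowerCase();
  return clips.filter((clip) =>
    clip.keywords.some((keyword) => keyword.includes(q))
  );
}

// e.g. keywordSearch("humor") matches Clip B via its emotion tag
```

Because each tag list mixes titles, artists, moods, and emotions, a single search box can serve users who remember a song by name as well as those who only know the feeling they want to send.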







V. DISCUSSION AND CONCLUSIONS

A. Summary

Many novel systems, especially in the information technology and social-networking spaces, pre-date research that fully supports the concepts they exhibit. However, when approaching the Emotitones application not as a whole, but as a series of UI decisions, it was evident that multi-disciplinary support existed. From a practical standpoint, behaviors demonstrated by users of social, mobile, and communication technologies already include the integration of multimedia for emotive purposes. While platforms focusing on this phenomenon are in the early stages of emergence, digital culture and its “interactive consumption” have existed for over two decades, and the mechanisms by which a communicator can express him/herself through digital media are already integrated in existing platforms such as facebook, myspace, twitter, and foursquare. The development of Emotitones has been an uphill battle of cross-platform

development, licensing negotiations, and issues with multi-territory delivery (as well as developmental cost considerations). These battles are worth fighting for the promise of a new communication platform; one that empowers users with multimedia content, namely music, to fulfill the emotional expressivity lacking in so many other platforms. The main difference, as articulated previously between these platforms and

Emotitones, is the added value placed on the listener, or end user. In a society where shameless plugs, spam, mass marketing, and junk mail are easily transmitted over

every platform, the listener, or receiver is taken for granted. Even peer to peer textual interaction lacks the empathy that face-to-face interaction between two strangers requires. A person can get away with terse, laconic text-based communication, while successful face-to-face communication must follow the rules of interpersonal communication. When a user sends an emotitone, he or she must have the receiver in mind. The main value of Emotitones is in the communication exchange. Because it had less bearing on the interface design and development, the subject of

how Emotitones can help artists has not been discussed. The artist perspective has always been a motivation for Emotitones. Music after all, is only possible with artistic effort and follow through. While the platform enables emotionally-rich communication, it is also a tool for artists to share and promote excerpts of work. A new release can be sent as an emotitone, with an adjoining text such as “I wrote this song for my father who has just passed”, and a link to the full length version of the song. Provided the artist does not communicate in a way that is construed as spam, it could be an effective way to reach fans on a more direct and visceral level than the typical release promotion.

B. Expanding features

The beta phase of Emotitones is only a small representation of the features that will make it a powerful platform for communication. The following additions are planned for the next phase:

1) Photo and video attachment capabilities: research supporting media-rich messaging suggests that a multi-sensory experience is much stronger at evoking an emotional response.

2) Worldwide territories: limitations of the Hook Mobile API do not allow for delivery outside North America; however, some of the strongest mobile markets are international, such as Japan, China, Korea, Brazil, and the UK.

3) Karaoke (customizable voice option): in countries such as Korea, singing over instrumental versions of songs is prevalent. Emotitones aims to give users the option to record their own voice over a song clip and send it. This may prove to be extremely powerful for emotional connectivity.

4) Sound sampling: while enabling users to upload any audio file could result in copyright infringement, users will be allowed to upload and send sounds recorded on their mobile device or PC.

5) Foreign-language song selection: making emotitones delivery available to countries outside North America is more valuable once the database contains song clips local to each region.

6) Editing tool: when artists and labels submit content, songs must be pre-edited to the correct length. Implementing a drag-based tool for editing clip duration would make submission easier and more manageable.

7) Exhaustive emotional categories and genres: beta-phase development allowed for fewer than a dozen emotional categories and genres. Many more emotion-based categories and genres will be added in order to properly tag and classify music.

8) Community and user identity: Emotitones can spark many conversations on subjects such as song meaning, artists' careers, and song feedback in general. It is important to give the community comment platforms and to allow for more information identifying who each user is. The research on subculture and identity suggests that musical identity is very important to digital users.

9) Commerce: the business model for Emotitones was not discussed here, but it exists and revolves around premium content, royalties from full-length music and other products, and virtual gifting. It will be implemented in the next phase.

10) Unlocking song selection: gaming has seen huge growth in the age of social networking, and in certain applications it is a camouflage for reward programs. In Emotitones, the ability to "unlock" song selection will be treated as a game, rewarding users for loyalty or for their musical interest and knowledge.

11) "Short-hand" emotitones: as mentioned in the opening section, the word emotitone derives from emoticon. It is possible to make a "short-hand" version of audio excerpts such that they convey mood without lyrics or lengthy passages. A short-hand emotitone would be one second in duration or less, and would be added as a menu in instant-messenger applications and the Emotitones community chat. The research on earcons, as well as on the cognitive processing of musical gestures of equal duration, suggests that short-hand emotitones will be effective at communicating affect in the context of social interaction.

C. Future scope

While the current focus of Emotitones is to enable musical communication, the vision extends to multimedia communication in general. The future scope of the platform includes being able to send any digital artifact that communicates an idea, emotion, or sentiment, whether it be a fragment of a political speech, a strong literary quote coupled with a related painting, or a humorous video clip from a movie. Because people today are digital consumers and, in most cases, digital producers (unknowingly at times), they must be enabled to share in a way that gives credit to the communication embedded in each piece of media they have collected or created. These multimedia pieces are, in most cases, not meant to be consumed passively; they were created to convey meaning, and thus represent meaning.

Such a platform revolves around the database itself, namely content ingestion (having as much to choose from as possible) and optimized search functionality (making it easy for users to find what they want). The required search engine is a significant undertaking and is part of the future scope of Emotitones.

As an improvement to musical communication, a lyrics database is needed. Users will want the option of sending the lyrics to an emotitone along with the audio file. Whatever can be done to facilitate bricolage (see Appendix) will make the service more compelling.

D. Predictions

As with any newly developed platform, there is always a possibility that the technology will be used in a way other than that for which it was created. This happened with the application Chatroulette, which became a tool for sexual exploitation even though its intention was to facilitate impromptu worldwide video conversations (with the motivation of making the world more accessible to people). Typically this kind of misuse emerges when a service adds some form of user-generated functionality. It is hard to foresee how Emotitones could be misused; in any case, the user-generated aspects of the platform will not be enabled for beta.

It is predicted that user growth will rely on the growth of the content database. If a user tries the platform but is unable to find a music clip suitable for the desired emotion or sentiment, he or she will most likely not return until there is a wider selection. Some users, however, will send an emotitone regardless because of its novelty. It is exciting to receive an emotitone, even if the words are not exactly right. This is akin to greeting cards, which historically are vague and generic; it is up to the person giving the card to "customize" it with a personal message.

E. Limitations

The Emotitones beta is limited in many ways. As mentioned, content selection must reach critical mass before the platform truly enables musical communication. In addition, some users could see MMS delivery as a downside. While SMS and MMS have achieved relative ubiquity in today's mobile market, users who are charged to receive MMS may find it frustrating to receive emotitones. Senders are warned several times, however, about standard messaging fees, and savvy mobile device owners know how to disable MMS delivery to their phones. This should not be a significant limitation, as there is no difference in cost between receiving an emotitone and receiving a photo via MMS.

Another major limitation is that login and navigation are web-based only. While optimized for most mobile browsers, sending an emotitone from a mobile device is not a satisfying user experience. Mobile apps were created as solutions to this problem, and until Emotitones exists in app form, the proper user experience of finding and sending emotitones will be limited to computer-based web browsers.

F. Emotitones as a research tool

In the effort to support the emotional expressiveness of music as compared to language, Emotitones can itself be used as a research tool. The Emotitones database logs a great deal of information, including which emotitones people send most frequently; which emotional categories, genres, and genders are being sent; at what time of day and over which region; and how often an emotitone is reciprocated with another emotitone. As the user base grows, these behavioral tendencies will be valuable, possibly informing researchers of the communication patterns of today's digital users, and of digital music users in particular.

For specific experiments, controls would have to be implemented. An example experiment might compare communication efficacy between emotitones and text messages. One way of doing this is to recruit 20 subjects (10 pairs who have a relationship of some kind), separate each pair into adjacent rooms, and give each subject an MMS-enabled mobile device. Subject 1 of each pair would be instructed to search the Emotitones database and find five musical clips that best express the emotions or sentiments that he or she wishes to communicate to Subject 2. Subject 1 would then be asked to compose five text messages, one corresponding to each selected clip, expressing the same emotions or sentiments. Subject 2 would be sent the musical clips (as emotitones) as well as the five text messages, one by one, in random order. After receiving each clip or text, Subject 2 would be asked to write down an interpretation of what Subject 1 intended to communicate. The interpretations would then be presented back to Subject 1 in pairs, without indication of whether each interpretation was based on the text version or the music version of the emotion, and Subject 1 would be asked to select the interpretation best matching the intended emotion or sentiment. If the music-based interpretations convey the intended emotions more accurately than the text-based interpretations, it can be suggested that the musical clips were more effective at communicating emotion.

Other studies could be done on genre preferences, communication patterns, ethnomusicology, and gender studies as related to musical communication.
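Two of the behavioral measures mentioned in this section — send counts (as in the "Top 20" chart) and reciprocation — could be derived from the send log along the following lines. The log format and field names here are assumptions for illustration, not the actual Emotitones schema:

```python
from collections import Counter

# Hypothetical send-log entries; field names are illustrative only.
send_log = [
    {"song_id": "s1", "emotion": "gratitude", "sender": "a", "receiver": "b"},
    {"song_id": "s2", "emotion": "joy",       "sender": "b", "receiver": "a"},
    {"song_id": "s1", "emotion": "gratitude", "sender": "c", "receiver": "d"},
]

def top_n(log, n=20):
    """Rank song clips by number of sends, as in a 'Top 20 Emotitones' chart."""
    return Counter(entry["song_id"] for entry in log).most_common(n)

def reciprocation_rate(log):
    """Fraction of sends answered by a send in the opposite direction."""
    pairs = {(e["sender"], e["receiver"]) for e in log}
    replied = sum(1 for e in log if (e["receiver"], e["sender"]) in pairs)
    return replied / len(log)
```

In the sample log, clip "s1" was sent twice and only the exchange between users "a" and "b" was reciprocated, so two of the three sends count as reciprocated. Aggregations of this kind, broken down by emotion, region, or time of day, are what would make the database useful to communication researchers.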



APPENDIX

A. Patent filing

A thorough prior-art search was conducted by the author and by patent attorneys at Russ Weinzimmer and Associates. Emotitones is patent-pending, and the application can be viewed publicly on the USPTO website. Provisions were recently added to increase coverage and functionality, as well as claims to foreign territories.

B. Licensing and the Public Domain

One of the biggest obstacles to growing the Emotitones catalog at a faster rate is the licensing process. Because of the state of the music industry, the four major labels are very protective of their digital assets. In the meantime, while that case is made, Emotitones is negotiating with independent content providers.

C. Bricolage

The concept of bricolage has been used to describe the way young people play with technology and digital files without full knowledge of what is being done. This experimentation results in new creations and, consequently, in new behaviors, interactions, and subcultures.

D. Getting to the next phase

After the beta launch, Emotitones will enter fundraising mode in order to fund new features and support growth.



REFERENCES

Abrams, D. (2009). Social Identity on a National Scale: Optimal Distinctiveness and Young People's Self-Expression Through Musical Preference. Group Processes & Intergroup Relations, 12(3), 303-317. University of Kent.

Bierce, A. (1912). For Brevity and Clarity. Collected Works. New York & Washington.

Bigand, E., Viellard, S., Madurell, F., Marozeau, J., & Dacquet, A. (2005a). Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition and Emotion, 19, 1113-1139.

Bilandzic, M., Filonik, D., Gross, M., Hackel, A., Mangesius, H., & Krcmar, H. (2009). A Mobile Application to Support Phatic Communication in the Hybrid Space. Information Technology: New Generations, ITNG '09.

Blacking, J. (1973). How Musical Is Man? Seattle: University of Washington Press.

Blattner, M.M., Sumikawa, D.A., & Greenberg, R.M. (1989). Earcons and Icons: Their Structure and Common Design Principles. Human-Computer Interaction, 4, 11-44. California: Lawrence Erlbaum Associates.

Buckingham, D. (2008). Introducing Identity. In Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.

Cherny, L. (1995). The Modal Complexity of Speech Events in a Social MUD. Electronic Journal of Communications, 5(4). (Accessible at http://bhasha.stanford.edu/~cherny/papers.html)

Daltrozzo, J., & Schon, D. (2008). Conceptual Processing in Music as Revealed by N400: Effects on Words and Musical Targets. Journal of Cognitive Neuroscience, 21(10), 1882-1892. Massachusetts Institute of Technology.

Dijk, E.V., & Zeelenberg, M. (2006). The dampening effect of uncertainty on positive and negative emotions. Journal of Behavioral Decision Making, 19, 171-176.


Donner, J. (2007). The rules of beeping: Exchanging messages via intentional "missed calls" on mobile phones. Journal of Computer-Mediated Communication, 13(1), article 1.

Durkee, R. (1999). American Top 40: The Countdown of the Century. New York: Schirmer Books.

Ebane, S. (2004). Digital music and subcultures: Sharing files, sharing styles. Vol. 9, No. 2.

Fitzpatrick, C. (2008). Scrobbling Identity: Impression Management on Last.fm. Technomusicology: A Sandbox Journal, 1(2).

Garzonis, S., Jones, S., Jay, T., & O'Neill, E. (2009). Auditory Icon and Earcon Mobile Service Notifications: Intuitiveness, Learnability, Memorability and Preference. Boston: CHI.

Hankinson, J.C.K., & Edwards, A.D.N. (1999). Designing Earcons with Musical Grammars. ACM SIGCAPH, No. 65. York, England: University of York.

Juslin, P.N. (2003). Communicating emotion in music performance: Review and theoretical framework. In Music and Emotion: Theory and Research (pp. 309-337). Oxford: Oxford University Press.

Kasesniemi, E.L. (2003). Mobile Messages: Young People and a New Communication Culture. Tampere, Finland: Tampere University Press.

Koelsch, S. (2005). Investigating Emotion with Music: Neuroscientific Approaches. Leipzig, Germany: Max Planck Institute for Human Cognitive and Brain Sciences.

Koelsch, S., Gunter, T.C., Wittfoth, M., & Sammler, D. (2005). Interaction between Syntax Processing in Language and in Music: An ERP Study. Journal of Cognitive Neuroscience, 17(10), 1565-1577. Massachusetts Institute of Technology.

Kolb, B., & Whishaw, I.Q. (2003). Fundamentals of Human Neuropsychology. London: Worth Publishers.

Kubovy, M., & Shatin, J. (2009). Music and the Brain Series. Washington, D.C.: Library of Congress.

Kuhl, O. (2008). Musical Semantics. New York: Peter Lang.

Lemmens, P.M.C., De Haan, A., Van Galen, G.P., & Meulenbroek, R.G.J. (2007). Emotionally charged earcons reveal affective congruency effects. Ergonomics, 50(12), 2017-2025. The Netherlands: Taylor & Francis.

Levitin, D.J. (2006). This Is Your Brain on Music: The Science of a Human Obsession. New York, NY: Dutton.

Levitin, D.J. (2008). The World in Six Songs: How the Musical Brain Created Human Nature. New York, NY: Dutton.

Ling, R. (2004). The Mobile Connection: The Cell Phone's Impact on Society. Kindle Edition.

McDermott, M., Goldman, S., & Booker, A. (2008). Mixing the Digital, Social, and Cultural: Learning, Identity, and Agency in Youth Participation. In Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.

McNamara, T.P. (2005). Semantic priming: Perspectives from memory and word recognition. New York: Psychology Press.

McPherson, T. (2008). A Rule Set for the Future. In Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.

Metros, S.E. (1999). Making Connections: A Model for On-line Interaction. Leonardo, 32(4), 281-291. Milan, Italy.

Miell, D., MacDonald, R., & Hargreaves, D. (2005). Musical Communication. Oxford: Oxford University Press.

Mithen, S. (2006). The Singing Neanderthals: The Origins of Music, Language, Mind, and Body. Boston, MA: Harvard University Press.

Modlitba, P., & Hoglind, D. (2005). Report in Musical Communication and Music Technology: Emotional Expressions in Dance.

Mustonen, M.S. (2007). Introducing Timbre to the Design of Semi-Abstract Earcons. Master's Thesis, Information Systems Science. University of Jyväskylä, Department of Computer Science and Information Systems.


Nicolle, S., & Clark, B. (1998). Phatic Interpretations: Standardisation and Conventionalisation. Revista Alicantina de Estudios Ingleses, 11, 183-191. Middlesex University.

Niedermeyer, E., & Da Silva, F.L. (2004). Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. London: Lippincott Williams & Wilkins.

Nussbaum, C.O. (2007). The Musical Representation: Meaning, Ontology, and Emotion. Cambridge, MA: The MIT Press.

Peretz, I., & Zatorre, R.J. (2003). The Cognitive Neuroscience of Music. Oxford: Oxford University Press.

Sandvig, C. (2008). Wireless Play and Unexpected Innovation. In Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.

Sloboda, J.A. (1985). The Musical Mind: The Cognitive Psychology of Music. Oxford: Clarendon Press.

Sloboda, J.A. (2007)

Stald, G. (2008). Mobile Identity: Youth, Identity, and Mobile Communication Media. In Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.

Steinbeis, N., & Koelsch, S. (2010). Affective Priming Effects of Musical Sounds on the Processing of Word Meaning. Journal of Cognitive Neuroscience, 23(3), 604-621. Massachusetts Institute of Technology.

Tanzi, D. (1999). The Cultural Role and Communicative Properties of Scientifically Derived Compositional Theories. Leonardo Music Journal, 9, 103-106. Milan, Italy.

Wang, V., Tucker, J.V., & Haines, K.R. (2009). Phatic Technology and Modernity. Centre for Criminal Justice and Criminology & Department of Computer Science, Singleton Park: Swansea University.

Weber, S., & Mitchell, C. (2008). Imaging, Keyboarding, and Posting Identities: Young People and New Media Technologies. In Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.


Westlake, E.J. (2008). Friend Me if You Facebook: Generation Y and Performative Surveillance. The Drama Review, 52(4) (T200). New York University and the Massachusetts Institute of Technology.

Williams, J.P. (2003). The Straightedge Subculture on the Internet: A Case Study of Style-Display Online. Media International Australia incorporating Culture and Policy.

Valcheva, M. (2009). Playlistism: A Means of Identity Expression and Self-Representation. Mediatized Stories, University of Oslo.

