
11211509 MECT1

A report submitted to Dublin City University, School of Computing, for module CA652: Information Access, 2011/2012. I/We hereby certify that the work presented and the material contained herein is my/our own, except where explicitly stated references to other material are made.

Emotional pattern recognition using machine learning


Information access assignment

Module: CA652 Lecturers: Dr Alan SMEATON Dr Cathal GURRIN

Contents

Abstract
1. System objectives
   1.1 Issues and related functionalities
   1.2 System overview
   1.3 Constraints and limitations
2. Functional description of the system
   2.1 Standard definition
   2.2 The OCC Model
   2.3 Review of existing ERS
   2.4 The central neural network
   2.5 Toward a cloud-based application
3. Evaluation plan
   3.1 Training phase
   3.2 Assessment phase
   3.3 Expected results
4. How could this system form the basis of a successful business?
   4.1 Plenty of potential applications
   4.2 Competitors analysis
   4.3 Sustainable advantages
   4.4 Constraints
   4.5 Conclusion
References

Abstract
We have entered an era of pervasive computing: computers and the Internet have become ubiquitous in our everyday life. However, most human-computer interaction (HCI) interfaces are still based on the traditional model of passively responding only to users' commands [1]. Recently, the automated analysis of human affective behaviour has attracted increasing attention from researchers and companies. Indeed, equipping a machine with the ability to respond to the user's emotional state has proved successful in many fields, including tutoring systems [2], call centres [3], intelligent automobile systems [4] and the games industry [5]. In this paper, we propose an emotional pattern recognition system based on the inputs of several types of sensors. We outline the characteristics of such a system before detailing the issues and constraints that should be considered when designing it. We then describe a possible architecture and conclude on its potential for commercialisation.

1. System objectives
1.1 Issues and related functionalities
The value proposition of our system is to turn data from various sensors into an understanding of an individual's emotions. A first attempt to summarise its functional model is therefore:
Various sensor data -> System -> Emotion recognised

Figure 1. Broad functional model

When developing an emotion recognition system (ERS), the first task is to define which emotional states it will retrieve. There are three different ways of looking at affect [6]:
- Discrete categories: we can refer to classifications already proposed by philosophers (Spinoza, Prinz), scientists (Ekman [7]) or both (Descartes [8]). The most widely used is the set of six basic emotions (happiness, anger, fear, sadness, disgust and surprise), with all other emotions considered varieties of these.
- Dimensional description: an affective state is described in terms of a small number of latent dimensions such as evaluation, activation, control and power.
- Appraisal-based approach: an emotion is described through a set of stimulus evaluation checks, including novelty, intrinsic pleasantness, goal-based significance, coping potential and compatibility with standards.

The appraisal-based approach defines a set of variables that maximises the distinction between two different emotional states, making it more practical and more suitable for the Artificial Intelligence (AI) field than the basic classification model. Given its successful implementation in various projects (e.g. [9]), we choose to comply with the OCC model [10], which we detail further in Section 2.2. Even with that model, affects are often hard to distinguish. The more distinct data about the subject's internal state the system obtains, the more likely it is to recognise the relevant feeling. Gathering all types of sensor inputs (audio-visual, physiological, etc.) in the ERS is therefore critical. However, most off-the-shelf sensors do not share a common representation (metric) of a given indicator, and the same emotion recognition technique cannot be applied to every sensor type. We therefore need to specify a data formatting standard. Finally, the functionalities of our system are summarised by the following functional model:
Standardised sensor data -> System -> OCC model emotional state recognised

Figure 2. Functional model

1.2 System overview


Because our emotional state changes very fast, the sensors will produce large amounts of data. From this mass, our system has to extract high-level information time-efficiently in order to conclude on a particular emotion. Rather than proposing a computational model that controls the way emotions are triggered, we want our system to learn when each one is generated. Machine learning systems such as Support Vector Machines (SVMs) or neural networks therefore seem well suited to this application. They all accept formatted data as input and fire a decision as output: rather than computing the data directly, they learn from training experience what their output should be. Machine learning techniques are covered in more detail in Section 2.4. Moreover, as we have seen, the system needs to adapt to many different sensor types, but no single ERS can be applied with maximal efficiency to all of them. We therefore propose a multi-layered architecture with a specialised ERS for each sensor type and a central neural network which takes the final decision. We thus end up with the following architecture and challenges:

Figure 3. System architecture and Challenges

In the Challenges part of Figure 3, we summarise the issues that we address in more detail in the functional description of Section 2.
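The multi-layered flow of Figure 3 can be sketched as follows. The sensor names, the two stub ERS and the averaging fusion step are illustrative assumptions standing in for the real specialised ERS and the central neural network:

```python
# Minimal sketch of the multi-layered architecture: one specialised ERS
# per sensor type, plus a central stage that takes the final decision.

def facial_ers(frame):
    return [0.8, 0.1]          # stub confidence vector (assumed emotions)

def physio_ers(reading):
    return [0.6, 0.3]          # stub confidence from a physiological sensor

SPECIALISED_ERS = {"webcam": facial_ers, "heart_rate": physio_ers}

def recognise(sensor_data):
    """Run each sensor's specialised ERS, then fuse their decisions
    (a simple average stands in for the central neural network)."""
    vectors = [SPECIALISED_ERS[kind](data) for kind, data in sensor_data]
    fused = [sum(dim) / len(dim) for dim in zip(*vectors)]
    emotions = ["happy", "sad"]   # placeholder emotion labels
    return emotions[fused.index(max(fused))]

result = recognise([("webcam", None), ("heart_rate", None)])
```

The point of the sketch is the separation of concerns: each specialised ERS only knows its own sensor type, and the central stage only sees their intermediate decisions.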

1.3 Constraints and limitations


We identify three types of limitation for our system:
- Social and governmental. Approval by regulatory entities and market adoption may be affected by issues of data privacy and tracking; indeed, monitoring people's emotions produces highly sensitive information.
- Machine learning algorithms. The training phase requires access to a lot of data and high computational power, leading to constraints on a powerful, efficient (and therefore expensive) infrastructure and on access to large corpora or databases of emotional samples. Moreover, the results of such systems are not 100% certain but subject to fluctuation, which could also cause regulatory and adoption limitations.
- Emotion recognition itself. Feelings are difficult to distinguish, and people may not react to a particular emotion with the same internal stimuli. For example, Ekman [7] found that facial expression, speech and body gesture indicators depend both on the affective state and on the environment (cultural, demographic) in which the affective behaviour occurs. The accuracy of such a system can therefore only be pushed to a limited extent.

2. Functional description of the system


In the following, we further describe the technical issues presented in Section 1.2.

2.1 Standard definition


In order to guarantee sensor data interoperability and provide an easy language in which to manipulate the data, we choose XML for the standard data format definition. For each type of sensor (e.g. temperature, web), a common indicator and metric will be defined (e.g. internal temperature, °C). In order to integrate with our solution, a provider will have to transform its data to fit this specification and present it in an XML document, typically of the form:

Figure 4. XML Standard definition

The system will then acquire this data, check that it matches the specification and, if successful, forward it as input to the related machine learning systems.
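As a concrete illustration, the following sketch parses and checks a sensor document against such a specification. Since Figure 4 is not reproduced here, the element and attribute names (sensor, type, indicator, metric, value) are assumptions, not the actual standard:

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the standard data format; the tag names are
# illustrative assumptions in place of the schema defined in Figure 4.
SAMPLE = """
<sensor type="temperature">
  <indicator>internal_temperature</indicator>
  <metric>celsius</metric>
  <value timestamp="2012-03-22T10:15:00">37.2</value>
</sensor>
"""

REQUIRED_CHILDREN = {"indicator", "metric", "value"}

def validate(xml_text):
    """Check that a sensor document matches the (assumed) specification:
    a <sensor> root with a type attribute and the required children."""
    root = ET.fromstring(xml_text)
    if root.tag != "sensor" or "type" not in root.attrib:
        return False
    children = {child.tag for child in root}
    return REQUIRED_CHILDREN <= children
```

A conforming document passes the check and is forwarded; anything else is rejected before it reaches the machine learning layer.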

2.2 The OCC Model


The appraisal theory of Ortony et al. [10], also known as the OCC model, is the concept our system uses to represent the user's emotional state. It defines three categories of emotional situation:
- an event having consequences for one's goals;
- an agent's action that matches or deviates from standards;
- an object (idea or concept) that suits one's tastes or not.
An event can then be categorised according to whether it:
- affects the fortune of others;
- influences one's plans and prospects;
- acts on one's well-being.

This gives six different classes of emotion, which can also be differentiated into eleven distinct pairs of opposite emotions (see below).

In order to translate these emotions into comprehensible variables that will then feed the ERS, we base our proposition on the work of Trabelsi et al. [1], which defines a set of 14 indicators, including:
- Sense of reality (real/unreal)
- Praiseworthiness
- Effort
- Strength of Cognitive Unit
- Appealingness
- Realization
- Unexpectedness
- Desire-for-other
- Expect-Dev
- Desirability
- Liking
- Familiarity
These indicators take decimal values ranging from -1 (extreme low) to 1 (extreme high) according to their respective intensities, giving a vector of 14 values that describes an emotional state. These OCC model vectors are the outputs of the specialised ERS and serve as inputs to the neural network, which retrieves the user's emotion based on them.
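A minimal sketch of how such a vector could be built. The snake_case names mirror the list above (which names 12 of the 14 indicators); defaulting unspecified indicators to 0 and clamping intensities to [-1, 1] are our assumptions:

```python
# Indicator names taken from the list above (12 of the 14 are named there).
OCC_INDICATORS = [
    "sense_of_reality", "praiseworthiness", "effort",
    "strength_of_cognitive_unit", "appealingness", "realization",
    "unexpectedness", "desire_for_other", "expect_dev",
    "desirability", "liking", "familiarity",
]

def occ_vector(intensities):
    """Build an OCC model vector: unspecified indicators default to 0,
    and every intensity is clamped to the [-1, 1] range."""
    return [max(-1.0, min(1.0, intensities.get(name, 0.0)))
            for name in OCC_INDICATORS]

# A state of high desirability and mild unexpectedness:
v = occ_vector({"desirability": 0.9, "unexpectedness": 0.4})
```

Each specialised ERS would emit such a vector, giving the central neural network a fixed-length numeric input regardless of the sensor type.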

2.3 Review of existing ERS


Developing an efficient ERS for a given type of sensor is a difficult and expensive task. We therefore do not propose new specialised ERS in this study but rather focus on the integration of existing ones. For an ERS to be compliant with our system, it must accept standardised data as input and output an OCC model vector (or at least involve this theory at some point). As this model is relatively widespread, multiple implementations have already been successfully developed, tested and applied. We review some of them in the following:

Thanks to the data standardisation, other ERS can integrate with the proposed system with only slight changes: they just need to change their output to OCC model vectors. Specialised Support Vector Machines and Dynamic Bayesian Networks could therefore also be suitable targets.

2.4 The central neural network


Machine learning is the branch of artificial intelligence that studies and develops architectures and algorithms to equip an agent (a machine, usually a computer) with a certain behaviour and an ability to build internal models from empirical training data in order to solve a given task [11]. Among these techniques, we single out the neural network, which we will use for the final ERS. A neural network is a multi-category classifier. It is composed of an interconnected, multi-layered set of entities called neurons, where each neuron can be activated, outputting its activity, i.e. a level of confidence in the recognition of a pattern. Each neuron is connected to the neurons of the next layer by weighted links.

[Figure omitted: a neuron receives the outputs y1, ..., yn of the previous level through links weighted w1, ..., wn and applies the firing function φ to their weighted sum to produce its output for the next level.]

Figure 5. Neural network model

The whole concept relies on the firing function φ. When the sum of all inputs multiplied by their associated weights exceeds a certain threshold, the neuron is activated and outputs the value yj = φ(w1·y1 + ... + wn·yn), as shown in Figure 5. The decision-making algorithm is thus the combination of multiple neurons' decisions. Initially, scientists configure the network hierarchy and the firing rule of each neuron; the training algorithms then teach the network by changing the weights assigned to the links.

Training phase algorithms. For each data sample, some neurons will fire (i.e. tell the next level that they recognise the pattern) and some will not. During training, data samples are marked as belonging to one category or another; their features are extracted and serve as inputs to the neural network. The objective of the training algorithms is then to minimise the quadratic error of the output by reducing the weights of the neurons that decided wrongly and reinforcing the others, depending on the level of confidence they output.

Application to our system. The central neural network will accept the OCC model emotional vectors as inputs and will be trained with data samples extracted from the specialised ERS outputs. Its last layer will be composed of a single neuron, which takes the final decision on the recognised emotional state.

Figure 6. Neural network architecture
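The firing rule and the weight-update idea can be sketched as follows. The toy two-input task, the learning rate and the perceptron-style update are illustrative assumptions; a real training algorithm would minimise the quadratic error over labelled OCC vectors:

```python
def fire(inputs, weights, threshold=0.0):
    """Single neuron: activate when the weighted sum of the previous
    level's outputs exceeds the threshold (a step firing function)."""
    s = sum(w * y for w, y in zip(weights, inputs))
    return 1.0 if s > threshold else 0.0

def train(samples, n_inputs, lr=0.1, epochs=50):
    """Reduce the weights behind wrong decisions and reinforce the
    others, as described above (perceptron-style update)."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - fire(inputs, weights)
            weights = [w + lr * error * y for w, y in zip(weights, inputs)]
    return weights

# Toy labelled task (assumed data): recognise "first input active".
samples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0),
           ([1.0, 1.0], 1.0), ([0.0, 0.0], 0.0)]
w = train(samples, n_inputs=2)
```

The last layer of the central network would be one such neuron whose activation signals the recognised emotional state.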

2.5 Toward a cloud-based application


In Section 1.3, we reported the need for a powerful and scalable infrastructure to deploy our ERS. Moreover, the system needs a way to improve its market reach. To solve both problems, we propose hosting our system on a cloud computing infrastructure. The idea is to build a service-oriented architecture (SOA) in which each service is a specialised ERS for one sensor type. Each service fires its decision through the network to the central neural network, which retrieves the emotion and sends the result to the client. This SOA approach allows us to:
- solve the problem of interoperability between different sensor types;
- build a scalable infrastructure which provisions high computational power on demand;
- train and redeploy the ERS systems in real time;
- combine different services in order to offer highly customised services to clients;
- offer customers an easy way to interact with our system over the Internet;
- open the system to external innovation through an API, allowing clients to build their own applications and their own ERS compliant with our system.
Finally, we end up with the following system architecture:

Figure 7. Final cloud-based architecture
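As a sketch of the client-facing side of this SOA, the envelope below shows how a request naming several sensor types could be routed to the corresponding specialised ERS services. The JSON field names and the "ers-<type>" service naming scheme are assumptions, not a fixed API:

```python
import json

# Hypothetical JSON envelope a client would send to the cloud service;
# all field names are illustrative assumptions.
request_doc = {
    "client_id": "demo-client",
    "sensors": [
        {"type": "webcam", "payload": "<standardised XML here>"},
        {"type": "microphone", "payload": "<standardised XML here>"},
    ],
}

def route(request_json):
    """Map each sensor payload in the request to the name of the
    specialised ERS service that should process it."""
    doc = json.loads(request_json)
    return {s["type"]: "ers-" + s["type"] for s in doc["sensors"]}

routing = route(json.dumps(request_doc))
```

Because routing is driven purely by the declared sensor type, new sensor types (and third-party ERS registered through the API) can be added without changing the client-facing contract.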

3. Evaluation plan
3.1 Training phase
The performance and accuracy of a machine learning system are highly dependent on the databases used to train it. Having enough authentic labelled data of human affective expressions is challenging because such expressions are short-lived, rare and context-sensitive [6], and also expensive to collect, since they involve humans in the research process. Several attempts have been made to cope with these problems, such as hiring professional actors or using cinema footage, but training the system on such data affects accuracy when it comes to retrieving real-world emotions. We therefore review here some of the databases that should be considered for the different ERS types:

Once the ERS have been trained, they can generate accurate OCC model vectors which, in turn, are used to train the central neural network. Thanks to the cloud hosting, the different parts of the system can be trained and redeployed seamlessly, improving their accuracy over time without disturbing users.

3.2 Assessment phase


In order to reliably assess a customer-facing application, we cannot avoid including humans in the evaluation process. Because the demographic and cultural background of the audience can influence the results (see Ekman [7]), the subjects will first fill in a questionnaire. The sensors already integrated into the system will then be fitted to each subject, and a set of images and videos likely to elicit target emotions will be displayed. Finally, for a set of 22 emotions, we will compare to what extent (as a percentage) each is correctly recognised by the system. We will also monitor the percentage of false positives for each category. The results will be matched with the demographic data collected previously, and we will draw a conclusion on the accuracy of our ERS.
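The comparison described above can be sketched as follows. The emotion labels are placeholders for the 22 categories, and the score definition (true positives over occurrences of each elicited emotion, plus a false-positive count) is our reading of the protocol:

```python
from collections import Counter

def per_emotion_scores(true_labels, predicted_labels):
    """For each elicited emotion, return (recognition rate, number of
    false positives) by comparing ground truth with system output."""
    occurrences = Counter(true_labels)
    hits, false_pos = Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            hits[truth] += 1
        else:
            false_pos[pred] += 1       # pred was fired for the wrong emotion
    return {e: (hits[e] / occurrences[e], false_pos[e]) for e in occurrences}

# Toy session with placeholder labels (assumed data):
scores = per_emotion_scores(
    ["joy", "joy", "fear", "anger"],
    ["joy", "fear", "fear", "joy"],
)
```

The per-emotion breakdown makes it possible to cross-tabulate accuracy against the demographic data from the questionnaire.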

3.3 Expected results


As seen in Section 2.3, average results for ERS techniques range between 60 and 70%. Merging the data acquired from several sensor types (and notably the possible integration of physiological sensors) may improve this ratio. Moreover, by integrating more and more sensor types and enriching the training set of the central neural network, the system should keep improving over time. We therefore aim to reach a ratio of 70 to 75% before launching a first customer-facing application.

4. How could this system form the basis of a successful business?


4.1 Plenty of potential applications
Emotional feedback is one of the hottest topics in physiological computing. Indeed, enabling a machine to know its user's internal state could allow it to respond more accurately to requests. Recently, mood-based user interfaces have proved successful on the web: websites such as stereomood.com, which provides music according to the visitor's mood, attract large audiences, but they ask users to click on their current mood in order to acquire this information. An automatic mood-based UI, one of the most straightforward applications we could imagine, would therefore benefit greatly from our system. In the same field, the user's emotional state could be reported as a status on social networks. Knowing the emotional state could also benefit the person themselves, enabling a personal coach for emotion containment. It could equally interest external parties, such as marketers who want to know the emotional response to their latest campaign, or security guards who want to track dangerous individuals and check that they do not commit crimes. Finally, to a certain extent, company-customised apps can be developed for frustration or stress monitoring. These are just a few applications; most have yet to be discovered. Through an API, customers could make their own ideas real, enabling open innovation. The business model would then be that of a web service provider, where customers pay according to their service use and the complexity of their requests.

4.2 Competitors Analysis


The most serious actor in the field is currently Visual Recognition, a Dutch spin-off of the ISLA Laboratory at the Universiteit van Amsterdam, which recently released emotion recognition software called eMotion. Based on facial expressions, it aims at automatically retrieving the user's emotion. It has been successfully incorporated into the website GladOrSad.com, which recognises these two antagonistic feelings from an uploaded photo [12], and has also been deployed in a marketing study at Unilever to determine which foods make people happier [13]. Although this company is established in the field and has two successful products to its name, its ERS is not web-based, and so not as interoperable and easily customisable as the proposed system.

4.3 Sustainable advantages


As we intend to use technologies currently at the research stage, it is unlikely that our competitors are planning to integrate them in their products. This gives us the opportunity to secure our position through partnerships with scientists in the field, thus developing technical barriers to entry. Furthermore, by providing a scalable, highly customisable and open infrastructure, we aim to establish a strong customer base and retain it through optional personalised features, such as a dedicated neural network.

4.4 Constraints
However, the regulatory constraints identified in Section 1.3 remain a strong barrier to creating a Business-to-Consumer (B2C) start-up. Ethical issues such as tracking a person's emotions and privacy will have to be considered before launching any application. Moreover, machine learning algorithms being uncertain by nature, a system output that could possibly be wrong can raise legal issues: for example, when tracking criminals, if the system fires "anger" while the subject is completely calm, it can cause unexpected trouble. Finally, this system being the combination of several research works across laboratories, intellectual property and lobbying issues might arise on its path to commercialisation.

4.5 Conclusion
The proposed system can be the basis of a successful business because it enables several applications that could meet with great success. The architecture being hosted on the cloud, there is no need for an expensive investment in servers; with computational resources provisioned on demand, the cloud hosting cost will match the web service usage and thus the revenue. There is currently just one competitor in the field, and it is not oriented towards a web integration of its solution. The most important risk relates to the ethical and regulatory issues that can arise from the use of the web services by illegitimate applications. We therefore strongly believe that the proposed system should be studied further in the upcoming years.

References
[1] A. Trabelsi and C. Frasson, "The Emotional Machine: A Machine Learning Approach to Online Prediction of User's Emotion and Intensity," Proc. 10th IEEE International Conference on Advanced Learning Technologies, 2010, pp. 613-617.
[2] M. Ochs and C. Frasson, "Emotionally Intelligent Tutoring Systems," Proc. International Florida Artificial Intelligence Research Society Conference (FLAIRS '04), May 2004, pp. 251-256.
[3] C. M. Lee and S. S. Narayanan, "Toward detecting emotions in spoken dialogs," IEEE Trans. Speech and Audio Processing, vol. 13, Mar. 2005, pp. 293-303, doi:10.1109/TSA.2004.838534.
[4] Q. Ji, P. Lan and C. Looney, "A probabilistic framework for modelling and real-time monitoring human fatigue," IEEE Trans. Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 36, Sep. 2006, pp. 862-875, doi:10.1109/TSMCA.2005.855922.
[5] S. Slater, R. Moreton, K. Buckley and A. Bridges, "A Review of Agent Emotion Architectures," Eludamos: Journal for Computer Game Culture, vol. 2, 2008, pp. 203-214.
[6] Z. Zeng, M. Pantic, G. I. Roisman and T. S. Huang, "A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, Jan. 2009, pp. 39-58, doi:10.1109/TPAMI.2008.52.
[7] P. Ekman, ed., Emotion in the Human Face, 2nd ed. Cambridge University Press, 1982.
[8] R. Descartes, The Passions. Paris: Librairie Philosophique J. Vrin, 1983, p. 353.
[9] J. Bates, A. B. Loyall and W. S. Reilly, "An Architecture for Action, Emotion, and Social Behaviour," Proc. European Workshop on Modelling Autonomous Agents in a Multi-Agent World (MAAMAW '92), Jul. 1992, pp. 55-68.
[10] A. Ortony, G. L. Clore and A. Collins, The Cognitive Structure of Emotions. New York: Cambridge University Press, 1988.
[11] J. Martel, "Convolutional Neural Networks: A Short Introduction to Deep Learning," unpublished manuscript, 2012.
[12] Available at http://www.gladorsad.com/, last accessed 22 March 2012.
[13] Available at http://www.wired.com/science/discoveries/news/2007/07/expression_research, last accessed 22 March 2012.
[14] C. Conati and H. Maclaren, "Empirically building and evaluating a probabilistic model of user affect," User Modeling and User-Adapted Interaction, vol. 19, Aug. 2009, pp. 267-303.
[15] M. Shaikh, H. Prendinger and M. Ishizuka, "A cognitively based approach to affect sensing from text," Proc. 11th International Conference on Intelligent User Interfaces (IUI '06), 2006, pp. 303-305.
[16] A. Salway and M. Graham, "Extracting information about emotions in films," Proc. ACM Multimedia '03, Berkeley, CA, USA, 2003.
[17] S. Ioannou et al., "Emotion recognition through facial expression analysis based on a neurofuzzy method," Neural Networks, vol. 18, 2005, pp. 423-435.
[18] T. Kanade, J. Cohn and Y. Tian, "Comprehensive Database for Facial Expression Analysis," Proc. IEEE Int'l Conf. Face and Gesture Recognition (AFGR '00), 2000, pp. 46-53.
[19] L. Yin, X. Wei, Y. Sun, J. Wang and M. J. Rosato, "A 3D Facial Expression Database for Facial Behavior Research," Proc. IEEE Int'l Conf. Automatic Face and Gesture Recognition (AFGR '06), 2006, pp. 211-216.
[20] H. Gunes and M. Piccardi, "A Bimodal Face and Body Gesture Database for Automatic Analysis of Human Nonverbal Affective Behavior," Proc. 18th Int'l Conf. Pattern Recognition (ICPR '06), vol. 1, 2006, pp. 1148-1153.
[21] R. Banse and K. R. Scherer, "Acoustic Profiles in Vocal Emotion Expression," Journal of Personality and Social Psychology, vol. 70, no. 3, 1996, pp. 614-636.
[22] Available at http://cpk.auc.dk/~tb/speech/Emotions/, last accessed 4 April 2012.