DOI 10.1515/sem-2013-0082    Semiotica 2013; 197: 79 – 100

Dezheng Feng and Kay L. O’Halloran


The multimodal representation of emotion
in film: Integrating cognitive and semiotic
approaches
Abstract: This study provides a semiotic theorization of how emotion is repre-
sented in film to complement the cognitive approach, which focuses on how film
elicits emotion from viewers. Drawing upon social semiotic theories and cogni-
tive theories of emotion, we develop a multimodal framework in which filmic rep-
resentation of emotion is seen as combinations of semiotic choices derived from
cognitive components of emotion. The semiotic model is employed to investigate
how emotive meaning is realized through verbal and nonverbal resources. At the
discursive level of film, the choices available in the shot organization of eliciting
condition and expression are examined. The paper demonstrates how the social
semiotic approach, combined with cognitive theory of emotion structure, is able
to provide a comprehensive theoretical account of how various film techniques
represent emotion. It is also significant for the study of viewer emotion, which to
a large degree stems from character emotion.

Keywords: film; emotion; multimodal representation; social semiotics; cognitive
appraisal theory

Dezheng Feng: The Hong Kong Polytechnic University. E-mail: dezhengfeng@gmail.com


Kay L. O’Halloran: Curtin University, Australia. E-mail: kay.ohalloran@multimodal-analysis.com

1 Introduction

While there are a number of cognitive studies that focus on how film devices elic-
it emotion from the viewer (e.g., Carroll 2003; Smith 2003; Tan 1996), few theo-
rists provide a systematic account of how emotions are represented in film. Com-
plementary to cognitive theories that attribute the understanding of film to the
cognitive capacity of viewers, semioticians argue that films are constructed in
ways that guide interpretation prior to handing over the task of understanding to

viewer’s cognitive capacity (Bateman and Schmidt 2011: 1, emphasis added). In


light of this, the present study aims to develop a semiotic approach for under-
standing the filmic construction of emotion, continuing the efforts of Bateman
and Schmidt (2011) and Tseng (2009). Meanwhile, we accept the cognitive posi-
tion that films systematically exploit the “folk psychology” of emotions (Newman
2005: 119). Integrating the methods and findings from both cognitive appraisal
theory (e.g., Frijda 1986; Lazarus 1991) and social semiotic theory (e.g., Halliday
1994), the present study provides a comprehensive multimodal account of how
emotion is represented in film.
According to cognitive appraisal theory, the appraisal of emotion anteced-
ents drives response patterning in terms of physiological reactions, motor expression, and action
preparation (Frijda 1986; Lazarus 1991; Scherer and Ellgring 2007). For example,
anger may be produced by an act of another person, which is appraised as an
obstruction to reaching a goal, and is expressed with physiological changes (e.g.,
raised heart rate) and aggressive actions. These components thus form a scenario
or schema consisting of the appraisal of eliciting condition, subjective feeling
and reaction/expression.
Cognitive linguists have investigated linguistic expressions in relation to the
cognitive components of emotion (e.g., Kövecses 2000). Kövecses’ (2000) major
insights are that descriptive expressions of emotion are mostly metaphorical, and
that these metaphorical expressions can be systematized according to a "folk
model" of emotion consisting of five stages: Cause, Emotion, Control, Loss of Con-
trol, and Behavioral Expression. Thousands of seemingly unrelated linguistic
metaphors (e.g., I am going to explode) are instances of conceptual metaphors
(e.g., anger is heat), which are in turn instances of higher-level conceptual metaphors
from different stages of the model (e.g., Loss of Control). We argue that literal
expressions and nonverbal resources (e.g., facial expression, movement) also fall
into the cognitive structure of emotion, giving rise to a multimodal approach to
emotion representation.
The second theoretical basis is social semiotic theory, more specifically
Michael Halliday's (1994) Systemic Functional (SF) model of language. From the
SF perspective, language consists of three strata: phonology/graphology, lexico-
grammar, and discourse semantics, which are related through the concept of re-
alization. Following the principle of stratification, we assume that such semiotic
strata exist in film (Bateman and Schmidt 2011; Tseng 2009), as displayed in
Figure 1 (the realizational relation is represented by a slanted arrow). As such, emo-
tive meaning and the discursive organization in shots and syntagmas are realized
by the linguistic and nonverbal resources, which are rendered in audio and visual
tracks. This stratified semiotic model allows us to investigate how emotive mean-
ing is constructed across strata.

Fig. 1: Semiotic strata in language and film
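
For readers who want to operationalize this stratified model in an annotation workflow, the following sketch is ours and not part of the original framework; the class and field names are illustrative assumptions. It shows one way an emotive meaning at the discourse stratum could be linked explicitly to the lower-stratum resources that realize it and to the tracks in which they are rendered.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Resource:
    """A verbal or nonverbal resource rendered in an audio or visual track."""
    modality: str     # e.g., "speech", "facial expression", "gesture"
    track: str        # "audio" or "visual"
    gloss: str        # analyst's description, e.g., "smile", "high pitch"

@dataclass
class Shot:
    """A discourse-stratum unit whose emotive meaning is realized by resources."""
    shot_id: int
    emotive_function: str              # e.g., "eliciting condition", "expression"
    resources: List[Resource] = field(default_factory=list)

# Example annotation: an expression shot realized by face, voice, and speech.
shot = Shot(
    shot_id=1,
    emotive_function="expression",
    resources=[
        Resource("facial expression", "visual", "smile"),
        Resource("speech", "audio", "I am an assistant buyer"),
        Resource("voice quality", "audio", "high pitch"),
    ],
)
print(shot.emotive_function, [r.modality for r in shot.resources])
```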

Following social semiotic theory, texts consist of choices made at different strata
(Halliday 1994). In the case of film, the causes and the character’s linguistic and
nonverbal expressions of emotion are not spontaneous as in real life, but are semi-
otic discursive constructs designed by filmmakers. The semiotic approach enables
us to move beyond cognitive psychological studies to examine how emotions are
“designed” in film. The phenomenology of the causes and expressions of emotion
in real life provides resources for the semiotic choices and the psychological theo-
ries of emotion provide us with tools to categorize those resources (cf. Newman
2005).
Combining the social semiotic and cognitive approaches, we develop frame-
works for investigating the multimodal construction of emotion in film. The film-
maker’s semiotic choices are examined in relation to the cognitive structure of
emotion. In Section 2, a brief account of the cognitive components of emotion is
provided. Following this, the representation of eliciting conditions and expres-
sions of emotion is examined in Section 3, and the configuration of eliciting con-
dition and expression through film editing is explored in Section 4. We conclude
with a description of how the social semiotic approach, combined with cognitive
theories of emotion, is able to provide a theoretical account of emotion represen-
tation in film in Section 5.

2 Resources of representation: The cognitive components of emotion

2.1 The cognitive components of emotion

The main theoretical basis for investigating emotion representation is the cogni-
tive appraisal theory (e.g., Frijda 1986; Scherer and Ellgring 2007), which argues
that emotion antecedents drive response patterning in terms of physiological re-
actions and behavioral expressions. Despite slight differences, cognitive
theorists agree that emotions involve antecedents, the interpretation and evalua-
tion of antecedents, subjective feelings, physiological changes, and behavioral
reactions. These components thus form a scenario or schema consisting of ante-
cedents, appraisal, subjective feeling, and reaction/expression. We will work
with a three-stage model of emotion representation involving the eliciting condi-
tion (EC), the feeling state (FS), and expression (Ex).
In film, the eliciting condition can be represented as narrative events that are
distinct from the expression of emotion. When there is a reaction to the eliciting
condition, which can be verbal or nonverbal, the expression stage is reached. In
this sense, the internal feeling state can only be inferred based on the eliciting
condition or the expression of emotion. Language, however, is able to encode the
feeling state symbolically through lexical items (e.g., happy, angry). For example,
we can report the feeling state of others as in “he is angry.” In the expression of
one’s own emotion, however, which is our main concern, linguistic expressions
belong to the expression stage, regardless of whether language is used to recount
the eliciting condition (e.g., "I got the job") or the feeling state (e.g., "I feel
happy"). In this paper, we examine the multimodal construction of eliciting con-
dition and the character’s subsequent expression of emotion, while recognizing
that such stages of emotion representation are interactive and potentially recur-
sive in nature (i.e., the expression of emotion may become the eliciting condition
for subsequent expression of emotion).
The emotion scenario or schema describes how our knowledge of emotion is
stored in memory (Bartlett 1932). This schematic representation significantly
­facilitates our recognition of emotion because one or several of the components
are able to activate our knowledge of a specific emotion. For example, a smiling
face alone can be recognized as happiness because it activates our “happiness
schema.” Therefore, partial representation of the emotion scenario is also able to
communicate emotion. But more often, the eliciting condition and the expres-
sion are represented consecutively in film to fully communicate the emotion and
engage the viewers.
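
To make the schema idea concrete, the toy sketch below is our own illustration, not a procedure proposed in the article; the cue list is invented for the example. It shows how a partial representation, such as a smiling face alone, can activate an emotion label, just as the fuller eliciting condition plus expression pattern does.

```python
from typing import Optional

# The three stages used in this paper: eliciting condition (EC),
# feeling state (FS), and expression (Ex). Any stage may be absent on screen.
HAPPINESS_CUES = {"receives good news", "feels happy", "smile", "laughs"}

def activates_happiness(ec: Optional[str], fs: Optional[str], ex: Optional[str]) -> bool:
    """Toy schema activation: any represented component can cue the emotion."""
    represented = {c for c in (ec, fs, ex) if c is not None}
    return bool(represented & HAPPINESS_CUES)

# A smiling face alone (Ex only) already activates the happiness schema ...
print(activates_happiness(ec=None, fs=None, ex="smile"))                  # True
# ... but films more often show EC and Ex consecutively.
print(activates_happiness(ec="receives good news", fs=None, ex="smile"))  # True
```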

2.2 The appraisal of eliciting conditions

It seems safe to assume that basic emotions and their eliciting condition and
expression in films can be understood by most audiences. As Ortony et al. (1988:
3) note, “it is apparent that writers can reliably produce in readers an awareness
of a character’s affective states by characterizing a situation whose construal is

Brought to you by | Curtin University Library


Authenticated | kay.ohalloran@multimodal-analysis.com author's copy
Download Date | 11/7/13 7:44 AM
  83
The multimodal representation of emotion in film 

assumed to give rise to them.” The reason is that appraisal of an eliciting condi-
tion is generally shared amongst members and groups of a society (Bless et al.
2004). Experiments have also shown that both children and adults can report and
agree on typical antecedents of several common emotions (e.g., Smith and
Ellsworth 1985).
As a result, filmmakers are able to anticipate (correctly, most of the time)
viewers' emotional reactions based on cultural knowledge. It is thus possible for
filmmakers to “design” film emotions to optimize engagement with viewers. The
filmic and discursive strategies for designing emotion are elaborated in Sections
3 and 4.

2.3 The multimodal resources of emotion expression

Modern studies of emotion have been modality specific; that is, they focus on
language (e.g., Kövecses 2000; Martin and White 2005; Wierzbicka 1990), the face
(e.g., Ekman and Friesen 1975, 1978), the voice (e.g., Banse and Scherer 1996;
Scherer 2003) or the body (e.g., Wallbott 1998).
In terms of facial expression, it is generally accepted that certain configura-
tions of facial muscle groups are universally judged to be associated with particu-
lar emotions (Ekman and Friesen 1975). Accordingly, psychologists have devel-
oped portraits of facial patterns to account for basic emotions of happiness,
surprise, fear, anger, disgust, and sadness (e.g., Ekman and Friesen 1975, 1978;
Izard 1971). However, Carroll and Russell (1997: 165) argue that patterns of facial
expressions arise only secondarily, through the coincidental co-occurrence of two
or more different components. In Hollywood films, although professional actors’
happiness is represented by smiles in 97% of cases, surprise, anger, disgust or
sadness rarely show the predicted pattern of facial expression (found in 0 to 31%
of cases; Carroll and Russell 1997). This study challenges the position that facial
expressions are hardwired in the emotion experience and suggests the need for a
framework that accommodates comprehensive multimodal analysis of the repre-
sentation of emotion.
The evidence for emotion-specific patterns in vocal features is not as strong
as that for facial expression (Wallbott 1998: 880). These parameters are generally
considered in relation to the arousal level of emotion. The emotive meanings of
body movements, gestures and actions are even less clear, in that differential pat-
terns of bodily activity do not fall into clusters characteristic of discrete emotions
(Planalp 1998: 34). Therefore, it is reasonable to consider these resources as con-
tinuous expressions of underlying dimensions of emotion, such as arousal and
valence.

Multimodal accounts of emotion are rare, despite acknowledgement by affec-
tive scientists that emotions are almost always expressed by multimodal signs in
face, voice, gestures, and so forth (Scherer and Ellgring 2007: 158). Scherer and
Ellgring (2007) investigate how professional actors use prototypical multimodal
configurations of expressive actions to portray different emotions. The resources
they consider are the modality specific parameters of facial expressions (action
units), vocal variables (frequency, amplitude, etc.) and bodily actions (gestures).
In terms of recognition rate, they find that with only ten multimodal variables,
the accuracy rate of cross-validated prediction is much higher than monomodal
discrimination.
The finding that combinations of facial, vocal, and bodily cues can better
predict emotions than the single modalities confirms the need for multimodal
analysis. However, Scherer and Ellgring (2007) show that the coders’ recognition
rate of the multimodal expression of professional actors is only slightly more than
50%. There are two main reasons for this result. First, as they acknowledge, the
portrayal segments consist of brief standardized utterances, with often only a
single facial expression and a single gesture per segment. The more idiosyncratic
and predictable material actions, such as slamming the door in a real-life anger
scenario, are not considered. Second, the eliciting condition is not provided to
the coders. According to cognitive appraisal theory, emotion is distinguished by
the cognitive appraisal of antecedent events. If these events are excluded, the
recognition rate decreases. To fully understand the communication of emotion,
we need to take into consideration all variables, such as the situational context
and the multimodal expression of emotion. This may be impractical for psycho-
logical experiments, but systematic multimodal discourse analysis is able to shed
light on this complex issue.

3 Multimodal construction of eliciting condition and expression

In what follows, frameworks for the multimodal construction of the eliciting con-
dition (EC) and expression (Ex) of emotion are developed and are illustrated with
examples from well known films and television series. The basic organizing as-
sumption is that meaning is constructed across the semiotic strata in which the
cognitive components of emotion, organized by shots and syntagmas, are real-
ized by the audio-visual resources rendered as audio and visual tracks in film (cf.
Figure 1).

3.1 The multimodal construction of eliciting conditions


Film theorists mostly focus on (facial) expression for studying character emotion
(e.g., Coplan 2006; Plantinga 1999), and some studies include the eliciting condi-
tion as the context or criterion of emotion (e.g., Carroll 1996; Newman 2005). In
contrast, regarding the eliciting condition as part of the emotion scenario
puts us in a stronger position to theorize its structure and investigate its relation
to emotion expression, as we shall demonstrate in this section and Section 4.
There have been many attempts to categorize the complex eliciting condi-
tions of emotion, for example, Ortony et al.’s (1988) three categories of events,
persons, and objects. In theorizing the eliciting conditions, we do not attempt to
categorize the material world “out there,” but to categorize the ways in which the
outer world affects the character’s subjectivity. The system is shown in Figure 2,
where five eliciting condition effects (EC1–EC5) are identified.

Fig. 2: The representation of eliciting condition

The primary distinction is between eliciting conditions whose relations to the


emoter are represented and those which are unrepresented. If the relation is rep-
resented, the cause of emotion may be what the emoter does or says (EC1). For
example, a character may feel proud for accomplishing something or feel guilty
for saying something. The eliciting condition can also be what the character sees/
hears/feels through visual (EC2), auditory (EC3) or somatic (EC4) senses. For ex-
ample, a person may be terrified by what he/she sees, saddened by what he/she
hears, or delighted by physical sensations he/she receives. If the relation is un-
represented (EC5), the eliciting condition is presented to the viewer as a narrative
event, but viewers don’t know how the emoter accesses that event. For example,
in Ridley Scott’s Gladiator (2000), the event that the old emperor is dead (eliciting
condition) is presented to the viewers, and then in a shot the tearful face of his
daughter is featured, but how she accesses the eliciting condition is not shown.
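
The five options can be kept apart during analysis with simple labels; the sketch below is ours (the enum and helper are illustrative assumptions) and merely restates the system of Figure 2 in code form, with the Gladiator case coded as EC5.

```python
from enum import Enum

class ElicitingCondition(Enum):
    """The five eliciting-condition types of Figure 2."""
    EC1 = "what the emoter does or says"
    EC2 = "what the emoter sees (visual)"
    EC3 = "what the emoter hears (auditory)"
    EC4 = "what the emoter feels (somatic)"
    EC5 = "narrative event; the emoter's access to it is unrepresented"

def relation_represented(ec: ElicitingCondition) -> bool:
    """EC1-EC4 represent the emoter's relation to the cause; EC5 does not."""
    return ec is not ElicitingCondition.EC5

# The Gladiator example: the emperor's death is shown, but not how his
# daughter learns of it, so the eliciting condition is coded EC5.
print(ElicitingCondition.EC5.name, relation_represented(ElicitingCondition.EC5))
```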
The eliciting condition in film is represented using audio-visual resources
(i.e., language, facial expressions, gesture, etc.), where the shot is the basic unit.
The five types of eliciting conditions result in different syntagmatic organizations
of eliciting condition and expression, and these syntagmas are essential for
understanding the filmic representation of emotion, as shall be discussed in
Section 4. The eliciting condition and the expression are mostly sequentially
arranged, in which some emotion-inducing event happens first and the charac-
ter's emotional reactions follow. The emotion-inducing event is significant in
the filmic communication of emotion. First, it not only enables us to infer the char-
acter's emotion but also makes us anticipate the character's emotional reac-
tion. Second, it may provoke the viewer's own feelings. As noted in Section 2.2, the
appraisal of many events is culturally shared. Therefore, we are not only able to
infer the character’s emotional reaction, but also feel the emotion to some extent
based on our identification with the character.

3.2 The multimodal representation of emotion expressions

The film character’s linguistic and nonverbal expressions of emotion are not
spontaneous as in real life, but are semiotic discursive constructs designed by
filmmakers. Therefore, the first dimension in our multimodal framework involves
the resources of verbal and nonverbal expression. The framework also includes
discursive choices, which include the quantity of expression (simple/complex) as
well as the context of expression (individual/interactive). The dimensions with
their respective systems are displayed in Figure 3.

Fig. 3: The representation of emotion expression

The dimensions of verbal and
non-verbal expressions and discursive choices are elaborated in Section 3.2.1. The
stylistic choices, including camera positioning, music, and so forth, are not dis-
cussed separately, but are pointed out where relevant in the ensuing discussion.
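
As a compact illustration of these dimensions, the sketch below is ours; the field names and the example coding are assumptions made purely for illustration. It records one expression as a choice on each dimension of the framework.

```python
from dataclasses import dataclass
from typing import FrozenSet, Literal

@dataclass
class ExpressionChoice:
    """One expression of emotion, coded on the dimensions of Figure 3."""
    resources: FrozenSet[str]                       # from {"verbal", "nonverbal"}
    quantity: Literal["simple", "complex"]          # one synchronized unit vs. consecutive units
    context: Literal["individual", "interactive"]   # addressed to an interactant or not

# Example: a single reaction shot combining speech and a smile, not addressed
# to an interactant, hence simple and individual.
example = ExpressionChoice(
    resources=frozenset({"verbal", "nonverbal"}),
    quantity="simple",
    context="individual",
)
print(example.quantity, example.context, sorted(example.resources))
```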

3.2.1 The multimodal resources of emotion expression

The expression of emotion is mainly studied in two disciplines: linguistics, which


focuses on the verbal expression, and psychology, which focuses on nonverbal
expression. We propose the multimodal framework of verbal resources (Ex1) and
nonverbal behavior (Ex2; see Figure 3).
In terms of the Peircean trichotomy of iconic, indexical and symbolic signs,
language is symbolic, making it the most abstract (and complex) resource for
emotion expression. The multimodal framework for analyzing linguistic expres-
sions of emotion integrates the social semiotic Appraisal Theory (which should
be distinguished from cognitive appraisal theory; Martin and White 2005) and the
cognitive components of emotion.
The first distinction for linguistic expressions of emotion is signal and deno-
tation (Bednarek 2008). Kövecses (2000) makes a similar distinction, with the
categories of "expressive" and "descriptive" representations of emotion. Signals
typically include expletives such as "wow," "yuk," "oh, my god," and so forth.
They express the emotion in a more reflexive way and do not "describe" the emo-
tion. Denotations describe some elements of the emotional experience. There are
two choices for the denotation of emotion: direct and indirect. Direct denotation
is simpler and includes the literal emotion terms which “inscribe” the “feeling
state” of the emotion scenario directly. The second option is indirect expression.
Martin and White (2005) provide descriptions of several linguistic strategies such
as lexical metaphor, intensification, and so on that realize the indirect expres-
sion of emotion. However, such strategies are not clearly structured. Based on the
cognitive components of emotion, two types of indirect expressions can be dis-
tinguished: those describing the eliciting condition, and those describing the
resultant expression or action in the emotion scenario. In the utterance "I am so
angry, my boss just fired me for no reason, I smashed the door heavily,” the three
clauses describe the feeling state, the eliciting condition and the expression stage
respectively.
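
The mapping can be shown with the example utterance itself; the sketch below is ours and simply pairs each clause with the scenario component it denotes (the labels are illustrative).

```python
# Toy annotation of the example utterance: each clause denotes a different
# component of the emotion scenario.
utterance = [
    ("I am so angry",                        "feeling state (direct denotation)"),
    ("my boss just fired me for no reason",  "eliciting condition (indirect denotation)"),
    ("I smashed the door heavily",           "expression/action (indirect denotation)"),
]

for clause, component in utterance:
    print(f"{clause!r:42} -> {component}")
```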
Nonverbal behavior signifies emotion in a different way. In Peirce's trichot-
omy, nonverbal behavior indexes emotion (Forceville 2005). However, we argue
that the nonverbal expressions in film are different from those in real life because
they are not spontaneous. That is to say, films “design” the facial expression, ges-
ture and so forth based on the real life expressions. Therefore, we need to add an
iconic stage in the process of signification and consider the visually represented
behavior as icons of indexes, rather than indexes themselves. Thus, in Figure
4 in Section 3.2.2, the character mimics the real expression of happiness that
indexes the emotion. Meanwhile, this study does not aim to work out a "gram-
mar" of nonverbal behavior (see Feng and O'Halloran forthcoming; Martinec
2001 for attempts of this kind); rather, as the nonverbal expressions of emotion
can normally be unambiguously recognized in Hollywood movies, we shall
merely interpret the meanings of facial expressions or vocal features based on the
studies reviewed in Section 2.3.

3.2.2 Discursive choices of representation

Emotion expression may be as simple as a single facial expression, or as complex


as unfolding across several scenes. Simple representation depicts the synchro-
nized expression which involves maximally one unit from one or more modali-
ties, for example, the expression of one clause, accompanied with one facial ex-
pression and/or one gesture. Complex representation includes consecutive
expressions from one or more modalities. For example, the film can first represent
the facial expression, followed by linguistic expressions and a series of emotional
actions. We also distinguish between interactive and individual expressions. The
former is expressed to interactant/s, which are subsequently analyzed in relation
to the structure of the interaction, and the latter is not expressed to others. Inter-
active and individual expressions employ the same verbal and nonverbal expres-
sions and can be simple or complex.
Simple expression, whether interactive or individual, is represented by the
reaction shot, although reaction shots are able to depict complex expressions as
well. The most prominent element in the reaction shot is facial expression, which
is the exclusive focus of many film analysts (e.g., Carroll 1996; Coplan 2006;
Plantinga 1999). Facial expressions may occur alone in the reaction shot and are
featured at a close distance. More often, facial expressions are accompanied by
other verbal or nonverbal expressions. When gestural or bodily cues are repre-
sented, a medium shot is used to depict the gesture or torso. The shot from Episode
12, Season 4 of David Crane and Marta Kauffman's Friends (1998) illustrated in
Figure 4 is a relevant case in point. The character Rachel is depicted in a medium
shot which shows her smiling face, upward posture and lifted arms. In the
soundtrack, she utters the words "I am an assistant buyer"
in a high-pitched voice, which indicates excitement. The reaction shot may stand
alone, but it usually works together with the eliciting condition shot to comprise
syntagmas, such as the point of view (POV) structure and reverse shots, as dis-
cussed in Section 4.

Fig. 4: Illustration of simple individual expression

In what follows, we shall focus on interactive expressions, which are essential in
the representation of emotion. By studying emotion expression in the structure of
interaction, we make a significant move from treating emotion as a personal phe-
nomenon to treating it as an interpersonal one. In interaction, simple expressions
are those expressed in one move while complex expressions are those expressed
in several moves.
We situate the expression move in the basic unit of interaction, namely, the
exchange (Martin and Rose 2007). At the level of exchange, two types of interac-
tive expression can be distinguished: those that are reactions motivated by the
previous move and those that express a pre-existing emotion, as displayed in Fig-
ure 5. The upper part of Figure 5 shows the structure of the interaction-motivated
expression, in which one move is the eliciting condition and the other move is the
reaction. This kind of expression is discussed in eliciting condition-expression
configuration in Section 4.4. The lower part of Figure 5 shows the expression of
the pre-existing emotion. The expression move may be preceded by the initiation
of the secondary knower (K2) who asks about the primary knower’s (K1) emotion,
or otherwise the expression is the first move. For the expression to be an interac-
tive move, linguistic expression needs to be present, which may represent the
eliciting condition, the feeling state or the expression, and it is normally accom-
panied by nonverbal behavior. Following the expression, there is typically a re-
sponse in the second move.

Fig. 5: The communication of emotion in multimodal interaction

Interaction is typically represented by reverse shots, in which the two
speakers are depicted in two alternating shots as they speak. The two shots from
Episode 12, Season 4 of Friends in Figure 6 illustrate how the choices in reverse
shots are made. In the first shot (Move 1), Monica (K1) expresses her emotion to
Rachel by recounting the eliciting condition that she is offered the job of head
chef in a high-pitched, loud voice, accompanied by a smiling facial expression.
Rachel (K2) responds to Monica in the second shot (Move 2) with surprise.

Fig. 6: Emotion communication in reverse shots

However, as with all realizational relations, there is no one-to-one correspon-


dence between the interaction structure and the shot structure: two or more
turns/moves may be represented by one shot and one turn/move may be repre-
sented by several shots.
Combinations of single units from modalities such as facial expression and
gesture are considered by psychologists studying the multimodal expression of emo-
tion (e.g., Scherer and Ellgring 2007). As noted in Section 2.3, the limitation of
this approach is that such a small set of variables is unable to account for the complex-
ity of emotion expressions, which include idiosyncratic actions and emotions
that unfold over time. These complex expressions are significant in the rep-
resentation of emotion. Very often, the immediate reaction is followed by several
shots or scenes of individual expression, or the emotion is expressed in multiple-
turn interaction (interactive expression). Many complex expressions involve both
individual and interactive expressions and may extend across several shots or
even several scenes. In a scene from Tom Shadyac's Patch Adams (1998), Patch's
emotion is expressed with several resources across several shots after he is kissed by
Corinne, the girl whom he admires. Patch first makes a "wow" sound, which
shows his enjoyment, then he laughs happily, and dances as he walks away. The
expression includes facial expression, linguistic expression, and material action
and communicates the intensity of Patch’s happiness.
As a discursive choice, the expression depicted is determined by many fac-
tors, in particular the intensity of the emotion and the genre of the film. Complex
expressions of emotion over long durations of time tend to appear in female-
oriented genres like melodrama and romance, while in male-oriented genres like
action movies, emotions are often expressed over shorter time periods. In the
melodrama Patch Adams, for example, Patch’s grief after his girlfriend Corinne
was murdered is expressed over approximately nine minutes. The expressions
include his immediate facial reaction after learning about the news, crying at
Corinne’s coffin, leaving the medical school, conversing with his two classmates,
attempting to jump off the cliff, and his speech, which blames God for the murder.
Such full-fledged expression is undoubtedly motivated by his intense grief and
despair, but the filmmaker’s choice to allocate nine minutes to Patch’s display of
emotion is certainly a discursive choice. The discursive choice is quite different in
Ridley Scott’s (2000) Gladiator, which is a Roman epic and a male-oriented ac-
tion film. When Maximus sees that his wife and son have been murdered, he cries
with much anguish at the sight of their corpses. However, this is the only expres-
sion of grief and the film gives it several seconds before moving on to another
stage of the narrative. Maximus’ emotion may be no less intense than Patch’s, but
the filmmaker chooses a more compact way to depict the emotion.

4 Filmic organization of eliciting condition and expression

In Section 3, we discussed filmic choices/resources for representing eliciting
condition and expression, which are normally both represented to guarantee the
accurate depiction of emotion. A further issue to address, which is also a key as-
pect of filmic representation of emotion, is how they are co-deployed, or orga-
nized. Previous studies only explain the working mechanism of one or two filmic
resources, for example, Carroll’s (1996) theorization of the POV structure. In this
section, we provide a comprehensive account of the shot-connecting devices and
examine how causal relations between the eliciting condition and expression
are represented by formally connected shots. The framework counts as a step
towards explaining how the textual logic of film enables interpersonal meaning
(cf. Bateman and Schmidt 2011).
In our model, the eliciting condition-expression configuration is systemati-
cally organized by shots and syntagmas. However, as with previous models, there
is no one-to-one correspondence between the choices from eliciting condition-
expression configuration and the choices of their filmic organization. For exam-
ple, two interaction turns can be realized by reverse shots or a single shot. Never-
theless, patterns can be found between the semantic layer and the expression
layer.
To account for the eliciting condition-expression configuration, we draw
upon the “grande syntagmatique” (the syntagmatic categories for narrative film)
proposed by Metz (1974; see also Bateman 2007). The options for syntagma are
significantly fewer than Metz’s grande syntagmatique because the causal-­
temporal relations between the eliciting condition and expression mean that only
narrative syntagmas are relevant for emotion representation. Other syntagmas,
such as parallel syntagma, which depicts conceptual relations (e.g., classifica-
tion) between events, are not relevant. The shots and syntagmas available for
representing eliciting condition-expression configuration are shown in Figure 7.

Fig. 7: Shots and syntagmas of eliciting condition-expression configuration

We are concerned with the shot relation which connects the eliciting condition
and the immediate linguistic or kinetic response within the basic unit of
syntagma. There are cases where the eliciting condition and the expression are not
organized in one syntagma. First, the eliciting condition is presented to the viewer
as a narrative event and somehow the emoter knows it but we do not know how
he/she accesses it (the case of EC5 in Figure 2). Second, the filmmaker creates a
separate scene for the character to express his/her emotion. In an episode in
Friends (Crane and Kauffman 1998), Rachel is given the job of assistant buyer dur-
ing her conversation with her supervisor Joanna. Naturally, there are emotional
reactions immediately after she learns the news, but the film cuts to another
scene and Rachel only expresses her emotion in the scene after that. Third, as
pointed out in Section 3.2.2, complex expressions may extend across several
scenes and hence extend beyond the autonomous syntagma.

4.1 The single shot representation

The representational capacity of a single shot is virtually unlimited. It can be as simple as a single
facial expression or as complex as a whole film. The eliciting condition and ex-
pression can be represented within one shot in many ways. One special type is
when the eliciting condition is represented by linguistic recount as part of expres-
sion (cf. Figure 3). Although the eliciting condition discussed above is parallel to
the expression, a linguistic recount is undoubtedly a way of representing the eliciting
condition. In this case, the eliciting condition is related to the expression as part of it, and
they are typically represented by a reaction shot. Normally, the eliciting condition
is verbally recounted, accompanied by nonverbal expression (with or without
verbal recount of expression). A good case in point is the example of Figure 4, in
which Rachel’s speech recounts the eliciting condition that she is an assistant
buyer and the expression is simultaneously constructed by the voice, the facial
expression and the gesture.
Other types of shot will not be specified as there are so many things a shot can
depict. In terms of eliciting condition and expression, a single shot can depict the
character and the object he/she is looking at, the multiple turns of interaction, or
the action and reaction portrayed by one tracking shot. However, such configura-
tions are more typically represented by narrative syntagmas, which are discussed
below.

4.2 Projection and the POV structure

Projection depicts the character and what he/she sees and thinks. We shall focus
on the former (what the character sees), which is represented by the POV structure. The POV structure typically
portrays what the character sees and how he/she reacts to it, constituting the
EC2^Expression type (cf. Figure 2). Carroll (1996) develops a cogent theory of POV
representation of emotion. According to Carroll, the point/glance shot sets out a
global range of emotions that broadly characterize the affective states the charac-
ter could be in. The point/object shot, then, delivers the object or cause of the
emotion, thereby enabling us to focus on the particular emotion. In the approach
developed in this paper, it is only one simple device, albeit powerful, for repre-
senting emotions in film. The most celebrated example is perhaps the two shots
at the beginning of Stephen Spielberg’s Raiders of the Lost Ark (1981), shown in
Figure 8. The point/object shot shows the close-up of a skeleton, followed by the
point/glance shot which shows the terrified face of a character. Admittedly, this
structure is the most convenient and easy-to-understand technique to represent
emotion.

Fig. 8: Illustration of the POV structure

Three points need to be stressed, based on Carroll's (1996) classic theory. First, the POV
structure is only one of the many mechanisms of emotion representation, as
pointed out by Plantinga (1999) and suggested in our system. Second, there are
variations to the POV structure. One obvious variation is the order of the object
and reaction. Naming the point/object and point/glance shot A and shot B re-
spectively, we can get A^B and B^A structures. It then seems that Carroll's (1996)
treatment of the reaction as "range finder" and the object as "focuser" only applies to
the B^A structure. Another variation is that the object and the reaction can be
represented in one shot, either within the same frame or by a panning/tilting
camera. Third, the POV shots may be elaborated by subsequent shots. That is, the
object or the reaction may be portrayed by more than one shot, as they often
are. Taking A^B structure as an example, it is often reiterated by another pair of
object-glance configuration (A^B + A′^B′), showing the object from a different
angle and the character’s reaction with slight variation, as in shot 3 and shot 4
in Figure 8, which follow the first two shots immediately. Variations of this
reiteration include showing the object again without showing the character
(A^B + A′ + A′′ + . . .) and showing the character’s reaction in several shots
(A^B + B′ + B′′ + . . .). The multiple reaction shots are commonly used to highlight
the character’s emotion, together with the long duration and close distance of the
shot. This technique is used not only to guarantee our recognition of the emotion
portrayed, but also to invoke our empathy (Plantinga 1999).
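
To make the notation concrete, the sketch below is ours; the validity test is a deliberate simplification, not a rule stated by Carroll or in this paper. It writes each variation as a sequence of point/object (A) and point/glance (B) shots and checks only that both shot types occur.

```python
from typing import List

def is_pov_configuration(shots: List[str]) -> bool:
    """A sequence counts as a POV configuration here only if it contains at
    least one point/object shot (A) and one point/glance shot (B)."""
    return "A" in shots and "B" in shots

# The variations discussed above, written as shot sequences.
variations = {
    "canonical A^B":              ["A", "B"],
    "reversed B^A":               ["B", "A"],
    "reiterated A^B + A'^B'":     ["A", "B", "A", "B"],
    "object repeated A^B + A'":   ["A", "B", "A"],
    "reaction repeated A^B + B'": ["A", "B", "B"],
}
for name, shots in variations.items():
    print(f"{name:30} {'^'.join(shots):12} valid={is_pov_configuration(shots)}")
```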

4.3 Alternating syntagma and reverse shots

The alternating syntagma portrays two or more series of events or interacting part-
ners by turns. The most common example is the shot-and-reverse-shot structure
which depicts two interacting partners. The eliciting condition in the reverse shot
structure is typically verbal (i.e., EC3 in Figure 2), although it can also be the non-
verbal EC4. In an example from Patch Adams (Shadyac 1998), the first shot shows
Corinne kissing Patch and the reverse shot shows Patch's expression of excite-
ment; together they form the EC4^Expression configuration.
In interactions where the eliciting condition includes verbal information, the
reverse shots are normally consistent with the speaker turns and the examination
of its structure can show the speaker roles and exactly how the emotion is com-
municated. The general framework is illustrated in Figure 9, complementing the
interaction framework in Figure 5. The interaction structures are based on the
systemic functional studies of interaction (e.g., Martin and Rose 2007; O’Donnell
1990). As with Figure 5, the unit of analysis is exchange, and the focus of investi-
gation is the options of move, such as initiation and response.
The structure of an information-oriented exchange is typically K1^K2 (Martin
and Rose 2007; O’Donnell 1990), which represents the eliciting condition and ex-
pression respectively. One character says something in the first shot and is fol-
lowed by another character’s reaction in the reverse shot. A piece of information
can be reacted to in various ways, for example, with surprise if it is unexpected,
or with indignation if it violates moral standards.

Fig. 9: Interaction structure and eliciting condition-expression configuration

The example of Figure 6 in Section 3.2.2 is a good case in point. In the first
shot, Monica's expression of emotion that she is offered the job as head chef is
also the eliciting condition of Rachel's emotion, which is expressed in the second
shot. The reaction is unambiguously surprise, with the verbal signal “oh, my god”
in a high-pitched voice and an open mouth. The eliciting condition and expression
correspond to the speaker turns, organized by reverse shots, realized by facial
expression, language, and vocal features, and finally rendered as audio-visual
tracks. The relation between the different layers of semiosis is illustrated in Figure 10.

Fig. 10: Semiotic strata in reverse shots

In action-oriented interaction, D2 (the secondary doer) reacts to D1's (the primary doer's)
speech act, as in the K1^K2 structure. The D1^D2^D1f structure seems more com-
mon, in which D1 reacts to D2’s acceptance/undertaking or rejection/refusal. The
expected responses normally cause positive emotions and the unexpected re-
sponses cause negative emotions. The example in Figure 11 from Gladiator illus-
trates an emotional reaction to a goal-incongruent response. Commodus orders
Maximus' death and Maximus asks Quintus to look after his family (D1) in the first
shot. This request is denied (D2) in the second shot. In the third shot, Maximus
screams loudly and rushes forward to attack Quintus (D1f). This is a typical
D1^D2^D1f structure, organized as reverse shots and realized with verbal and
nonverbal resources in medium close shots. Maximus' anger toward Quintus is
unambiguously represented with the eliciting condition (Quintus' refusal) and his
aggressive behavior.

Fig. 11: D1^D2^D1f structure in reverse shots
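
The exchange structures can likewise be recorded as ordered lists of moves aligned with reverse shots; the sketch below is ours (the move fields are illustrative assumptions) and encodes the Gladiator example of Figure 11 as a D1^D2^D1f exchange.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Move:
    role: str     # "K1", "K2", "D1", "D2", or "D1f"
    shot: int     # the reverse shot that realizes the move
    gloss: str    # what happens in the move

# The Gladiator exchange, one move per reverse shot.
exchange: List[Move] = [
    Move("D1",  1, "Maximus asks Quintus to look after his family"),
    Move("D2",  2, "Quintus refuses the request (eliciting condition)"),
    Move("D1f", 3, "Maximus screams and attacks (expression of anger)"),
]

structure = "^".join(m.role for m in exchange)
print(structure)  # D1^D2^D1f
```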

4.4 Linear narrative syntagma and successive action shots

Linear narrative syntagma captures the eliciting condition and expression as two
successive actions, namely, the action and the reaction. The actions may be con-
tinuous or discontinuous in form, but two shots depict them as succeeding ac-
tions from one participant. The shots feature what the character does or says
(EC1) as eliciting condition and how he/she responds to his/her action/speech.
However, such configuration of eliciting condition and expression is less com-
mon because reaction usually does not immediately follow action. The emoter
often responds to the effect of his/her action, instead of the action, so there is
typically a shot of the result of the action before the reaction shot. For example, in
Episode 12, Season 1 of Friends, there is a scene in which Monica is playing table
football with others. The first shot shows her action of playing the ball, the second
shot shows the ball she has scored, and the third shot shows her excitement. The suc-
cessive action is interrupted by the second shot, which forms a POV structure with
the third shot.
To summarize, this section examines the discursive resources for organizing
the eliciting condition and expression. It shows that different configurations of
eliciting condition and expression are organized in different syntagmas, as shown
in Figure 12.

Fig. 12: Eliciting condition-expression configuration and syntagmas

5 Conclusion
This study provides a semiotic theorization of how emotion is represented in film,
complementing cognitive approaches which focus on how film elicits emotion
from the viewers. We develop a semiotic framework in which the filmic represen-
tation of emotion is seen as semiotic discursive choices and we apply the strati-
fied semiotic model to film discourse to investigate how emotive meaning is real-
ized through the choices of verbal/nonverbal resources and filmic devices.
Meanwhile, the framework also draws upon the cognitive components of emotion
which provide structure to the representational choices at the semantic level.
Choice systems for the representation of the two main components of elicit-
ing condition and expression are then developed. At the discursive level, the choices
available in the shot organization of the eliciting condition and expression are
examined.
The paper concludes that the social semiotic approach, combined with the
cognitive account of emotion structure, is able to explain how emotion is con-
structed in film, although not all resources are fully discussed (e.g., the use of
music, color, etc.). Such a semiotic discussion complements current studies which
focus on the film viewer's emotional response. It not only explains how various
film techniques work to represent emotion, but is also significant for the study of
viewer emotion, since character emotion is the most important source that elicits
viewer emotion.


Acknowledgments: The research for this article was supported by the Interactive
Digital Media Program Office (IDMPO) under the National Research Foundation
(NRF) in Singapore (Grant Number: NRF2007IDM-IDM002-066).

References
Banse, Rainer & Klaus Scherer. 1996. Acoustic profiles in vocal emotion expression. Journal of
Personality and Social Psychology 70(3). 614–636.
Bartlett, Frederic C. 1932. Remembering. Cambridge: Cambridge University Press.
Bateman, John. 2007. Towards a grande paradigmatique of film: Christian Metz reloaded.
Semiotica 167(1/4). 13–64.
Bateman, John & Karl-Heinrich Schmidt. 2011. Multimodal film analysis: How films mean.
London: Routledge.
Bednarek, Monica. 2008. Emotion talk across corpora. Basingstoke: Palgrave.
Bless, Herbert, Klaus Fiedler & Fritz Strack. 2004. Social cognition: How individuals construct
social reality. Hove: Psychology Press.
Carroll, Noël. 1996. Theorizing the moving image. Cambridge: Cambridge University Press.
Carroll, Noël. 2003. Engaging the moving image. New Haven: Yale University Press.
Carroll, James M. & James A. Russell. 1997. Facial expressions in Hollywood’s portrayal of
emotion. Journal of Personality and Social Psychology 72. 164–176.
Coplan, Amy. 2006. Catching characters’ emotions: emotional contagion responses to narrative
fiction film. Film Studies 8. 26–38.
Ekman, Paul & Wallace V. Friesen. 1975. Unmasking the face. Englewood Cliffs, NJ: Prentice Hall.
Ekman, Paul & Wallace V. Friesen. 1978. The facial action coding system. Palo Alto, CA:
Consulting Psychologists Press.
Feng, Dezheng & O’Halloran, Kay. L. (forthcoming). Representing emotions in visual images: A
social semiotic approach. Journal of Pragmatics.
Forceville, Charles. 2005. Visual representations of the idealized cognitive model of anger in
the Asterix album La Zizanie. Journal of Pragmatics 37. 69–88.
Frijda, Nico. 1986. The emotions. Cambridge: Cambridge University Press.
Halliday, Michael. 1994. An introduction to functional grammar. London: Arnold.
Izard, Carroll E. 1971. The face of emotion. New York: Appleton Century Crofts.
Kövecses, Zoltán. 2000. Metaphor and emotion: Language, culture, and body in human feeling.
Cambridge: Cambridge University Press.
Lazarus, Richard. 1991. Emotion and adaptation. New York: Oxford University Press.
Martin, James R. & Peter White. 2005. The language of evaluation. New York: Palgrave
Macmillan.
Martin, James R. & David Rose. 2007. Working with discourse. London: Continuum.
Martinec, Radan. 2001. Interpersonal resources in action. Semiotica 135(1/4). 117–145.
Metz, Christian. 1974. Film language: A semiotics of the cinema. Oxford: Oxford University Press.
Newman, Michael. 2005. Characterization in American independent film. Madison: University of
Wisconsin-Madison dissertation.
O’Donnell, Michael. 1990. A dynamic model of exchange. Word 41. 293–327.
Ortony, Andrew, Gerald Clore & Allen Collins. 1988. The cognitive structure of emotions. New
York: Cambridge University Press.
Planalp, Sally. 1998. Communicating emotion in everyday life. In Peter Andersen & Laura
Guerrero (eds.), Handbook of communication and emotion, 30–48. San Diego: Academic
Press.
Plantinga, Carl. 1999. The scene of empathy and the human face on film. In Carl Plantinga &
Greg Smith (eds.), Passionate views: Film, cognition, and emotion, 239–255. Baltimore:
Johns Hopkins University Press.
Scherer, Klaus R. 2003. Vocal communication of emotion: a review of research paradigms.
Speech Communication 40. 227–256.
Scherer, Klaus R. & Heiner Ellgring. 2007. Multimodal expression of emotion: Affect programs
or componential appraisal patterns. Emotion 7. 158–171.
Smith, Craig & Phoebe Ellsworth. 1985. Patterns of cognitive appraisal in emotion. Journal of
Personality and Social Psychology 48. 813–838.
Smith, Greg M. 2003. Film structure and the emotion system. Cambridge: Cambridge University
Press.
Tan, Ed S. 1996. Emotion and the structure of narrative film: Film as an emotion machine.
Mahwah, NJ: Lawrence Erlbaum.
Tseng, C. 2009. Cohesion in film and the construction of filmic thematic configuration: A
functional perspective. Bremen: University of Bremen dissertation.
Wallbott, Harald G. 1998. Bodily expression of emotion. European Journal of Social Psychology
28. 879–896.
Wierzbicka, Anna. 1990. The semantics of emotions: fear and its relatives in English. Australian
Journal of Linguistics 10(2). 359–375.

Bionotes
Dezheng Feng (b. 1983) is a Research Assistant Professor at the Hong Kong Poly-
technic University 〈dezhengfeng@gmail.com〉. His research interests include
social semiotics, multimodal discourse analysis, and cognitive linguistics. His
publications include “Visual space and ideology: A critical cognitive analysis of
spatial orientations in advertising” (2011); “Intertextual voices and engagement
in TV advertisements” (with P. Wignell, 2011); “The visual representation of meta-
phor: A social semiotic perspective” (with K. O’Halloran, 2013); and “Multimodal
engagement in television advertising discourse” (2013).

Kay L. O’Halloran (b. 1958) is an associate professor at Curtin University, Australia


〈kay.ohalloran@multimodal-analysis.com〉. Her research interests ­include multi-
modal analysis, social semiotics, and mathematics discourse. Her publications
include “Inter-semiotic expansion of experiential meaning: Hierarchical scales
and metaphor in mathematics discourse” (2008); “Multimodal analysis within an
interactive software environment: Critical discourse perspectives” (with S. Tan,
B. A. Smith & A. Podlasov, 2011); “The semantic hyperspace: Accumulating mathe­
matical knowledge across semiotic resources and modes” (2011); and “Multi­
modal discourse analysis” (2011).
