visual communication


Exploring dialogic engagement with readers in multimodal EFL textbooks in China

School of Chinese as a Second Language, Guangdong, China

This article examines, from a social semiotic perspective, how multimodal
resources in EFL textbooks are deployed to enable dialogic engagement
with readers. A range of multimodal features in the textbooks, including
illustrations and the labelling on illustrations, dialogue balloons, incomplete
jointly-constructed texts, and highlighting, are identified as enabling editor
voice to negotiate meanings with character and reader voice. Drawing upon
and interrogating the appraisal systems of engagement and graduation, it
shows how these semiotic resources function to realize various kinds and
degrees of heteroglossia. It is also found that the way in which a certain
engagement value can be scaled is strongly associated with the intrinsic
property of a given multimodal resource, such as the projective structure of
dialogue balloons, the co-constructedness of jointly-constructed texts, and
the supporting role of illustrations. It is hoped that the findings will shed light
on the understanding of the dialogic process in a pedagogic context.

Keywords: dialogic engagement; dialogue balloons; gradability of engagement values;
highlighting; illustration; labelling; jointly-constructed text; multimodal
EFL textbooks

SAGE Publications (Los Angeles, London, New Delhi, Singapore and Washington DC). Copyright The Author(s), 2010.
Vol 9(4): 485–506. DOI 10.1177/1470357210382186

School textbooks, among other teaching materials, have attracted scholarly
attention in education reforms in China (Wang, 2000). One remarkable
feature of present-day textbooks is the pervasive use of multimodal
resources that involve both verbal and visual semiotic modes. These resources
include dialogue balloons, illustrations, labelling on illustrations, incomplete
texts that involve images, and highlighted parts in multimodal texts. While
studies of textbook images have addressed a number of issues, including eye
movement in relation to the presence/absence of images (e.g. Chen and Zhang,
1997; Huang, 1999; Song, 2005; Tao and Shen, 2003), most existing work
mainly considers the 'How' (e.g. 'How to use images in classroom teaching?')
but neglects the 'Why' (e.g. 'Why the image is needed in a given context?', 'Can
we do without it?', 'Has the image achieved what it is supposed to achieve, and
why?') (Chen, 2005: 94).
In addition to the multimodal features, another relevant characteristic
of textbooks is their dialogic nature. For instance, dialogic process is
advocated throughout classroom teaching in the given pedagogic context,
and the role of textbooks is considered to be essential in this process (Chen
and Ye, 2006). Nevertheless, the ways in which multimodal resources can be
deployed to manage the dialogic/heteroglossic setting in textbooks remain
relatively unexplored. A close social semiotic reading of the visual as well as
verbal features of textbooks may suggest ways to better understand and
interpret dialogia/heteroglossia in the pedagogic context. Textbooks for teaching
English as a foreign language (henceforth EFL) are far less investigated
compared to textbooks for other school subjects, which is recognized as one
of the main drawbacks in textbook research (Zhang, 2005). The aim of this
article is to examine how the multimodal features in EFL textbooks work to
engage the readers in ways that align them to set pedagogic goals. The present
study also explores to what extent the multimodal dialogic engagement with
readers opens up or closes down space for the negotiation of meanings.
The discussion is supported by social semiotic analysis by adapting
and extending the kinds and degrees of heteroglossia (technically termed
engagement and graduation) developed as part of appraisal theory (Hood and
Martin, 2007; Martin, 2000; Martin and White, 2005; White, 2003). Along with
the pedagogic demands mentioned earlier, the theoretical landscape in which
the current research is situated also calls for further multimodal exploration
in the semantic regions of engagement and graduation. Recent advances in
appraisal research have gone beyond language to include other semiotic modes
including images. Nevertheless, most of these studies have mainly focused on
attitudinal meanings (e.g. Economou, 2006), leaving the aspects of engagement
and graduation under-theorized. This article attempts to explore new ground
by identifying and analysing multimodal meaning-making resources that
realize and grade engagement values in multimodal texts.

As indicated earlier, the focus of this study is kinds and degrees of heteroglossia,
and how they are achieved in multimodal EFL textbooks. The notion of
voice explored here derives from the fundamental works by Bakhtin (1981,
1986) on heteroglossia, i.e. the dialogic and multi-voiced nature of all verbal
communication. According to Bakhtin (1981: 281), all spoken and written texts
are intrinsically multi-voiced and should be understood against a backdrop of
other concrete utterances on the same theme. As Bakhtin (1986) states:

Each utterance is filled with echoes and reverberations of other
utterances … Each utterance refutes, affirms, supplements, and
relies on the others, presupposes them to be known and somehow
takes them into account … each utterance is filled with various kinds
of responsive reactions to other utterances of the given sphere of
speech communication. (p. 91)

The notion of heteroglossia was later developed by Kristeva (1986) into
intertextuality. Here we mainly focus on the contributions systemic semioticians
bring to the studies on voice, because it is systemic functional semiotics
(henceforth SFS) (Martin, 2008)¹ that the present study is aligned with. The
semioticians' grammatics approach that investigates multimodal phenomena
as systems of meaning (Halliday, 1993: 52), among other advantages, makes
SFS a powerful tool for comprehensively and systematically elucidating the
heteroglossic features under discussion. Starting with the semiotic system of
language, systemic linguists examine the heteroglossic nature of texts by analysing
projection resources at the level of lexicogrammar (Halliday, 1994; Halliday
and Matthiessen, 2004; Martin and Rose, 2007). The projection resources
that introduce other voices into a text include verbal and mental processes
that project locutions and ideas (e.g. I think …, He doubts …), and embedded
projections realized in nominal groups (e.g. assertions, beliefs). Along with
projection, other linguistic resources such as those of modality (e.g. probably,
possibly) and concession (e.g. although) also bring in other voices and enable
the negotiation between voices (Martin and Rose, 2007).
At the level of discourse semantics, a number of voice studies within
systemic functional linguistics (henceforth SFL) view voice in its abstract
sense, modelling it as the recurrent configuration of evaluative resources in
texts. Iedema et al. (1994) study multiple voicing in media discourse, identifying
reporter voice, correspondent voice and commentator voice in different
media genres. Coffin's (2000) work reveals the voice options of recorder,
interpreter and adjudicator in history discourse. Hood (2004) examines
voice in academic writing from two perspectives: the actual source of a given
evaluation, and the abstract evaluative syndromes technically named voice
roles (i.e. observer, investigator and critic). It is noteworthy that voice in
the abstract sense is distinct from the concept of voice in heteroglossia studies.
The former deals with the sub-potential of evaluative meanings that are
characteristic of a specific register (e.g. reporter voice in media discourse),
which can be explicated as key in the cline of instantiation concerning evaluation
(Martin and White, 2005: 164); whereas the latter is concerned with the
sources of propositions and evaluations that are heteroglossically present in
texts (e.g. the voice in Bakhtin's notion of dialogism).

Chen: Exploring dialogic engagement with readers 487

The perspective on voice drawn upon in this study centres around
voice in the sense of source, i.e. whether a given proposition or evaluation is
attributed to the author or to another source that is co-present in the text. In
particular, the current research owes most to the taxonomy of engagement
meanings (known as the semantic system of engagement) and the ways in
which engagement meanings can be graded (known as graduation) developed
as part of appraisal theory within SFL (Martin, 2000; Martin and White,
2005). In adapting and extending engagement and graduation to multimodal
discourse, the present study is intended to offer insights into the heteroglossic
features in the given multimodal pedagogic context.


Modelling interpersonal meaning at the level of discourse semantics, appraisal
theory comprises three subsystems for exploring the
attitudinal meanings encoded in a text (i.e. attitude), the ways in which space
can be opened up or closed down for different voices (i.e. engagement), and
the resources for manipulating the strength of feelings or degree of alignment
(i.e. graduation). Each of the subsystems within appraisal has its own
subcategories or options, and these options are semantic ones that transcend
diverse lexicogrammatical structures (Hood, 2004: 13–14). It can be argued
that a wide range of meaning-making resources, including those that are non-
verbal, could be brought together and considered systematically as construing
a global rhetorical and evaluative orientation. This feature of appraisal theory
enables us to extend the discussion of engagement and graduation from
monomodal communication to multimodal discourse.
The engagement system covers both monoglossic and heteroglossic
aspects. The focus of this study is how multiple voices can be mediated
through multimodal meaning-making resources, and therefore the discussion
is concentrated on the heteroglossic dimension. The engagement system
sets up networks of options for modelling the expansion and contraction
of heteroglossic space in a text. As Martin and White (2005) articulate, the
engagement network covers 'all those locutions which provide the means for
the authorial voice to position itself with respect to, and hence to "engage"
with, the other voices and alternative positions construed as being in play
in the current communicative context' (p. 94). To be specific, their taxonomy
of engagement meanings includes four main categories:

Disclaim: the textual voice positions itself as at odds with, or rejecting,
some contrary position;
Proclaim: by representing the proposition as highly warrantable (compelling,
valid, plausible, well-founded, generally agreed, reliable, etc.), the textual
voice sets itself against, suppresses or rules out alternative positions;

Entertain: by explicitly presenting the proposition as grounded in its
own contingent, individual subjectivity, the authorial voice represents
the proposition as but one of a range of possible positions – it thereby
entertains or invokes these dialogic alternatives;
Attribute: by representing proposition as grounded in the subjectivity of
an external voice, the textual voice represents the proposition as but one
of a range of possible positions – it thereby entertains or invokes these
dialogic alternatives. (pp. 97–8)

More delicately, these engagement meanings can be divided into
subcategories. For instance, disclaim is further divided into deny and counter
to specify dialogic positioning. Less delicately, the four types of engagement
meanings can be generalized into two broad categories, i.e. dialogic expansion
and dialogic contraction. Those resources that entertain alternative
voices within the text and those that attribute propositions to voices outside
the text function to open up the heteroglossic space and are classified as dialogic
expansion. On the other hand, those resources that disclaim oppositional
opinions and those that proclaim a given proposition function to close down
the heteroglossic space and are categorized as dialogic contraction.
Another semantic system that is of high relevance to the current
research is graduation, which is central to the whole appraisal system because
attitude and engagement can be regarded as domains of graduation which
differ according to the nature of the meanings being scaled (Martin and
White, 2005: 136). Details of this semantic region will be discussed later
when we examine the gradability of engagement values. Adopting a view of
multimodality, we are working towards the goal of understanding how verbal
and visual semiotic resources can be deployed to realize various kinds and
degrees of dialogic engagement. As Kress and Van Leeuwen (2001) point out,
'within a given social-cultural domain, the same meanings can often be
expressed in different semiotic modes … we move towards a view of
multimodality in which common semiotic principles operate in and across
different modes' (pp. 1–2).
The EFL textbook discourse under discussion can be viewed as a typical
configuration of semiotic choices from linguistic and visual semiotic systems.
In the ensuing section, a brief description of the multi-voiced nature of EFL
textbook discourse will be provided, which is followed by a closer examination
of the interplay between multiple voices and the management of heteroglossic
space.


Our data are the whole series of 17 EFL textbooks for primary and secondary
education published by People's Education Press (henceforth PEP) between
2002 and 2006. As observed by Chen and Qin (2007), three voices constitute
the heteroglossic backdrop of EFL textbook discourse, i.e. editor voice, reader
voice² and character voice. Editor voice is frequently found in section titles,
which often take the form of imperative clauses. These imperative clauses are
either the unmarked ones that have no Mood such as Read and write, Write
and draw, and Match and say (e.g. the section title Write and draw in Figure 1),
or the let's ones where the understood Subject is 'you and I' (Halliday, 1994: 87)
like Let's read, Let's play, Let's draw, and Let's sing. If treated like modulation,
the imperative opens up the monoglossic setting to include the reader, and hence the
editor's role as a participant in the dialogic exchange is recognized. In addition
to imperative clauses, section titles can also be nominal groups such as Pair
work, Group work, Story time, Pronunciation, Task time, Culture, and adjectival
groups like Good to know. Nominal groups as section titles often imply the
actions the reader is supposed to take in the teaching section (e.g. Group work
implies 'You must carry out the task in groups'). This kind of section title
backgrounds the proposer and conceals the must-ness of the proposal by
treating the proposal as an entity, which tends to bear some resemblance to
demodalization in administrative discourse (Iedema, 1995). Demodalization
can be achieved in various ways, ranging from nominalization (i.e. naming the
commanding action) such as Pronunciation, demodulation (i.e. masking the
proposer and the controlling nature of the command) such as Pair work and
Story time, to generalization (i.e. generalizing the demodulations as members
of a higher category) like Culture. Through ideationalizing the proposal, the
process of control and demand is disguised as a natural and non-negotiable
rule or institutional entity, and thus the editor's power is further enhanced.
Along with section titles, other resources including the labels attached
to certain objects within an image also convey editor voice. For example,
in Figure 1 the unfinished words _ce-cream, _ish, _oose and _amburger are
inserted as labels into an image that depicts a chef holding a big tray full of
dishes. This practice is termed labelling in the present discussion,³ which
will be taken up again with more details in the following section when we
identify and analyse engagement devices.

Figure 1 Multiple voices in EFL textbook discourse (adapted from PEP Primary English
Students' Book I for Year 4, 2003: 9). Reproduced with permission.

One notable feature of the EFL textbooks under discussion is the
extensive use of cartoon characters. Character voice is another source of
propositions co-present with editor voice. The presence of character voice
is often indicated through a speech process visually realized by a dialogue
balloon (Kress and Van Leeuwen, 2006: 68) with an oblique line linking the
character with his or her utterance. In Figure 1, for instance, the oblique
protruding line from the dialogue balloon connects the chef with his
utterance What are they? Do you know? Dialogue balloons will be analysed
later in this article as another engagement device widely applied in EFL
textbooks.
As Martin and White (2005: 162–4) point out, the subjectively determined
reading position is the end point of instantiation, which activates the
attitudinal positions in a text. The reader of an EFL textbook is not merely a
passive addressee. Rather, the exercise sections, termed jointly-constructed
texts in the current discussion, involve the reader's active participation. In
Figure 1, for example, the drawings of empty dishes and the unfinished words
_ce-cream, _ish, _oose and _amburger are left for the reader's completion,
and reader voice is thus actively involved. In SFL, language is regarded as a
semiotic system of meaning potential, and language behaviour is interpreted
as choice (Halliday, 1978: 39). Language is only one of the semiotic systems
that constitute a culture (p. 2). If this principle of potentiality is extended to
include other semiotic modes, it can be argued that in making a choice from
a semiotic system to finish a jointly-constructed text, what the reader actually
writes or draws (i.e. the actual semiotic choice) achieves its meaning by being
interpreted against the background of what he or she could have written or
drawn (i.e. potential semiotic choices). Jointly-constructed text as an engagement
resource will be accounted for as the analysis unfolds.


In this section, we consider each of the multimodal resources identified earlier,
namely labelling, dialogue balloons and jointly-constructed text, in terms
of the interplay of voices they present and the ways in which they function
to open up or close down space for negotiation of meanings. In the EFL
textbooks there are two additional resources for managing heteroglossic space:
illustration and highlighting. In what follows, these five types of multimodal
engagement resources will be analysed with examples from our data.

Labelling
The term labelling used in this study refers to the practice of inserting
labels into an image. In the EFL textbooks under discussion, labelling can be
frequently observed in those images depicting the scene of a story that may
provide a context for language in use (see Figure 2).

Figure 2 Labelling (adapted from PEP Primary English Students' Book II for Year 4,
2003: 31). Reproduced with permission.

The image in Figure 2 is a narrative representation (Kress and Van
Leeuwen, 2006: 45), describing the character Amy looking for her white socks
in haste in her bedroom. There is an action process, with Amy being the Actor
and clothes the Goal, while Amy's arms serve as two vectors. Watching the
messy bedroom, Amy's mother is anxious to help. Therefore, the image also
includes a reactional process, with Amy's mother as the Reactor and the scene
in the bedroom as the Phenomenon. The vector in the reactional process is
formed by the eyeline of Amy's mother. A speech process is also involved, and
the two characters' voices are conveyed through two dialogue balloons.
Editor voice loses no time in negotiating meanings with character
voice via labelling. By putting the labels of jeans, pants, shorts, socks, and shoes
onto the clothes scattered around the room, editor voice challenges or rules
out alternative positions and thus limits the range of choices. For example,
the label shoes fends off such alternatives as sneakers or sports shoes. If put into
words, the meaning of this labelling can be expressed in the clause 'I contend
that they are called shoes'. Therefore, we may argue that labelling functions
to contract the heteroglossic space, realizing proclaim, or pronouncement
to be more precise. By overtly pronouncing the names of the items portrayed
in the image, editor voice intervenes in the visual narrative representation.
In this sense, labelling can be viewed as an interpolation of the authorial
presence so as to assert or insist upon the value or warrantability of the
proposition (Martin and White, 2005: 128). Labelling is dialogic in the sense that it
acknowledges the presence of counter viewpoints in the given communicative
context, while at the same time it is also contractive due to its challenge or
resistance to possible dialogic alternatives. By closing down the heteroglossic
space in the text, the practice of labelling helps concentrate the reader's
attention on the pronounced vocabulary items, which are part of the language
goals prescribed for students to achieve.

Dialogue balloons
Dialogue balloons, with an oblique protruding line linking the speaker with the
utterance, are commonly found in the EFL textbooks. Kress and Van Leeuwen
(2006: 68) describe this type of visual structure as projective because the
utterance is not represented directly but mediated through a speaker. In EFL
textbook discourse three types of dialogue balloon are identified, based on
different functions they perform: lending support to editor voice, explaining
the rules of games by demonstration, and giving directions to the reader,
which are illustrated in Figure 3.
In bringing the external character voice into a text, some dialogue
balloons function to lend support to editor voice. For instance, the upper part
of Type 1 in Figure 3 verbally describes seasonal differences between Beijing
and Sydney. The image underneath the verbal text represents a dialogue
between Chen Jie and John. John asks Chen Jie what season it is in March
in Beijing. She says it is spring and asks him what season it is in Sydney. He
then answers it is fall. The dialogue balloons here bring in character voice,
associating the proposition advanced by editor voice in the verbal text (i.e.
In Beijing, it's spring from March to May. Summer is from June to August. Fall
is September to November. Winter is December to February the next year. But,
in Sydney, it's spring from September to November. Summer is from December
to February the next year. Fall is from March to May.) with the external source
of support from the characters who are presumably students from China and
Australia. Accordingly, the dialogue balloons contribute to the heteroglossic
alliance in the text, realizing the engagement meaning of attribute, or
acknowledging to be more precise. By introducing character voice that is
in support of the argument, editor voice presents the proposition as highly
credible, hence aligning the reader into the putative reading position.
The second type of dialogue balloon functions to explain rules of
games by demonstration, which is exemplified by Type 2 in Figure 3. Through
the use of cartoons, it shows the readers (i.e. primary school students) how
to practise the expressions of commanding and offering by playing a game
with pictorial cards. There is no verbal instruction, but the image involving
a dialogue between the characters Sarah and Wu Yifan demonstrates the way
the students may perform. The dialogue balloons bring in character voice,
which encourages the reader to play a similar game. Visual demonstrations with
dialogue balloons are frequently found in task-oriented teaching sections such
as Let's play, Task time, Pair work and Group work, where interaction between
readers is required in fulfilling the tasks. At least one way of accomplishing
the task is vividly demonstrated by characters in those teaching sections.
Here the dialogue balloons actively make allowance for character voice, opening
up the heteroglossic space and realizing the engagement meaning of attribute.

Type 1: Lending support to editor voice (adapted from PEP Primary English Students'
Book II for Year 5, 2003: 24). Reproduced with permission.
Type 2: Explaining rules of games by demonstration (adapted from PEP Primary English
Students' Book I for Year 3, 2003: 53). Reproduced with permission.
Type 3: Giving directions to the reader (adapted from PEP Primary English Students'
Book I for Year 6, 2003: 12). Reproduced with permission.
Figure 3 Three types of dialogue balloon.

In most images in the EFL textbooks, the characters do not gaze out at
the viewer. As for those few cases where characters look directly at the viewer,
eye contact is established between the two parties. In this type of contact⁴
image (Painter, 2007), directions are often given through imperative or
interrogative clauses in dialogue balloons. Type 3 in Figure 3 is a case in point.
In this image a little policeman looks at the viewer outside the picture frame.
This gaze symbolically invites the viewer to engage in an imaginary relation.
Character voice, as indicated in the dialogue balloon, gives directions to the
reader, and the utterance in the dialogue balloon (i.e. Look, read and match)
clarifies what is required from the reader in completing the exercise (i.e. to
match the traffic signs with their corresponding meanings). It can be inferred
from the analysis of Types 1, 2 and 3 in Figure 3 that dialogue balloons are an
engagement resource that functions to introduce character voice into a text,
thus realizing the meaning of attribute.

Jointly-constructed text
Jointly-constructed text refers to any text that is intentionally unfinished and
aims to involve the reader's participation in its ultimate completion.
Jointly-constructed text, which is essential in aligning the reader's comprehension, has
found wide application in EFL textbooks. It takes a great variety of forms, and
the multimodal modes of communication further enrich the ways of engaging
reader voice. Figure 4 shows two types of jointly-constructed text. One is a
jointly-constructed drawing exercise (i.e. Type 1), and the other is a
multimodal jointly-constructed herald page that appears at the very beginning of a
teaching unit (i.e. Type 2). The focus of analysis here is Type 1, while Type 2
will be analysed in the following section when gradability is discussed.
The main body of Type 1 in Figure 4 is an unfinished picture of a
human face, which is an analytical process (Kress and Van Leeuwen, 2006:
87) to be completed. In this analytical process, the human face picture is
the carrier, while the eyes, ears, nose and mouth to be drawn by the reader
are possessive attributes. The labels eye, ear, nose and mouth proclaimed by
editor voice indicate what should be drawn and where they should be drawn.
The reader is thus required to follow the labels to complete this structured
analytical process.
The strong, diagonal line of the pencil at the top left corner forms a
vector, indicating the presence of a narrative representation. According to
Kress and Van Leeuwen (2006: 59–63), in a transactional visual narrative
process the vector links the Actor from which the vector departs and the Goal
at which the vector is directed. The Actor can sometimes be fused with a
vector. For example, the salient pencil in Type 1, foregrounded with full colour
saturation, plays the dual role of both Actor and vector. The Goal in this
transactional action process is the unfinished drawing, and the vector formed
by the oblique pencil encourages the reader to participate in co-constructing
the multimodal text.

Type 1: Jointly-constructed drawing exercise (adapted from PEP Primary English
Students' Book I for Year 3, 2003: 16). Reproduced with permission.
Type 2: Multimodal jointly-constructed herald page (adapted from Go For It Students'
Book II for Year 7, 2005: 59). Reproduced with permission.
Figure 4 Two types of jointly-constructed text.

In terms of visual composition, at the top right corner there is a
smaller image depicting the character Mike holding a finished picture of a
human face. The upper right position indicates that the smaller image is the
Ideal and the New (Kress and Van Leeuwen, 2006: 179–94). Compared with
the upper smaller image, the larger unfinished human face is placed at the
lower left position, which suggests its status as the Real and the Given. In other
words, the unfinished drawing is presented as the practical, agreed-upon
starting point, whereas the finished picture is the idealized and generalized
information that is unknown to the reader. In light of the labels that indicate
the possessive attributes and the contour of a human face, the reader is
supposed to achieve the distant, ideal goal as indicated in the upper smaller
image. In this multimodal text, the practice of labelling conveys editor voice,
whereas the upper smaller image expresses the character voice (this will be
analysed later as illustration). The lack of completion or fulfilment in this
multimodal jointly-constructed text in effect opens up a space for the reader
voice to join in. Reader voice engages with editor and character voice through
co-constructing the text. Owing to the fact that the answer to this multimodal
exercise comes from the external source of reader voice, the meaning of the
unfinished jointly-constructed text can be expressed in the clause 'According
to the reader, the picture will be …'. It may be inferred that jointly-constructed
text realizes the engagement meaning of attribute, expanding the heteroglossic
space by bringing in reader voice.

Illustration
According to Barthes (1977: 40–1), text–image relations can be classified into
three types: anchorage (text elucidates image); illustration (image elucidates
text); and relay (text and image stand in a complementary relationship).
The major social purpose and function of EFL textbooks is language teaching/
learning, and there are quite a number of illustrative images clarifying or
supporting the corresponding verbal texts. In what follows, I examine the
possible engagement meanings an illustration may realize.
Let us take Type 1 in Figure 5 to describe how an image may illustrate a
verbal text by serving as the link between different parts of it. In Type 1, the
features that are supposed to be learned in the teaching section Let's say are the
letters j, k as well as the words jeep, jump, kangaroo, and key. Here two images
in comic style play an essential role in associating the otherwise unrelated
words with each other in a meaningful way. To be specific, the image on the
left links jeep with jump by describing a jeep frightening a frog so it jumped
away; whereas the one on the right associates kangaroo with key by depicting
a big key in a kangaroo's pouch. In doing so, the words to be learned are
meaningfully connected through the funny cartoons. The editor establishes
the linkage between verbal texts by resorting to character voice, and character
voice on the other hand acknowledges and lends support to editor voice. The
degree of openness in the multimodal text is therefore increased. In this sense,
the illustrations function to realize the engagement meaning of attribute.

Type 1: Illustration as the link between verbal texts (adapted from PEP Primary English
Students' Book II for Year 3, 2003: 26). Reproduced with permission.
Type 2: Illustration (adapted from PEP Primary English Students' Book I for Year 6,
2003: 68). Reproduced with permission.
Figure 5 Two types of illustration that realize attribute meaning.

Illustration may encode disclaim as well, which can be observed in
those images that describe improper behaviours (e.g. Figure 6). In Figure 6,
editor voice is explicit in the five imperative clauses (i.e. Be quiet in the library,
Dont drink or eat in the computer room, Dont walk on the grass in the garden,
Dont push in the hallway, and Dont waste food in the canteen), which advise
the reader not to violate the regulations in school. Five images are employed

498 Visual Communication 9(4)

Figure 6 Illustration of improper behaviours (adapted from PEP Primary English
Students Book II for Year 4, 2003: 12). Reproduced with permission.

to illustrate the imperative clauses by describing five instances of violation,

i.e. playing noisily in the library, drinking and eating in the computer room,
walking on the grass in the garden, pushing in the hallway, and wasting food
in the canteen. Consequently, editor voice in the verbal texts and character
voice in the five illustrative images go against each other. The illustrations
here function to close down the heteroglossic space, realizing the engagement
meaning of disclaim.

Another engagement device found in EFL textbooks is highlighting. Certain
elements in a visual display are more eye-catching than the others, hence
creating a hierarchy of visual importance. According to Kress and Van
Leeuwen (2006: 201–3), a high degree of visual importance is referred to as
salience, which can be realized through relative size, colour contrast, tonal
contrast, placement in the composition, sharpness of focus, and all other
means that attract the viewer's attention. In this section, we explore ways of
prioritizing certain visual elements through the choice of colour or typeface,
i.e. highlighting, and examine the way it contributes to the construction of a
heteroglossic setting.
In Figure 7, for instance, the whole multimodal text includes two parts.
The upper part describes a girl asking a policeman about the location of the
library. The word library is highlighted in blue in the dialogue balloon, which
indicates it is merely one of the many possible options that can be chosen
from. Other alternatives can be found in the five smaller images in the lower
part, depicting five buildings, i.e. post office, hospital, cinema, bookstore and
science museum. Each of the five alternatives can be substituted for the word
library that has already been chosen in the dialogue balloon. If put into words,
the engagement meaning realized by highlighting the word library is
'It is possible that you want to go to the library' or 'You may want to go to the
library'. It is noteworthy that these alternatives all come from the multimodal
text itself. The highlighted word library opens up the heteroglossic space to
allow for other voices coming from the text itself instead of from somewhere
outside the multimodal text. In other words, the multimodal text per se has
already determined the range of possible alternatives (i.e. the five locations
represented in the five small images). Highlighting in this multimodal text
indicates the status of the highlighted word as one of the possible options
against the wider backdrop that consists of alternatives from the multimodal
text itself, and it functions to realize the engagement meaning of entertain.

Figure 7 Highlighting (adapted from PEP Primary English Students' Book I for Year 6,
2003: 16). Reproduced with permission.


Based on the analysis in this article, we are now in a position to discuss how
the previously mentioned multimodal meaning-making resources may
up-scale and down-scale engagement values. Gradability is a general feature
of the engagement system because the engagement values are scaled to the
degree of the speaker/writer's investment in a given value position (Martin
and White, 2005: 135–6). Graduation operates along two axes, i.e. force and
focus. The former grades meanings in terms of the intensity or amount of a
scalable value, while the latter applies to the usually non-scalable categories,
grading meanings according to the prototypicality and preciseness by which
the categorical boundary is drawn. Hood and Martin (2007) further extend
the graduation network, specifying force as embracing intensity (of a
quality), quantity (of a thing) and enhancement (of a process), and focus
encompassing fulfilment. Considering the fact that the meanings scaled
within the engagement system vary from subsystem to subsystem (Martin
and White, 2005: 135), we concentrate the discussion here on the same type of
engagement device that realizes the same engagement meaning.

Dialogue balloons: role of the character voice

In this analysis, we have categorized dialogue balloons in EFL textbooks into
three types according to their functions (see Figure 3): those that support
editor voice (Type 1), those that explain rules of games by demonstration
(Type 2), and those that give directions to the reader (Type 3). In Type 1,
editor voice is present in the communicative context, represented in a separate
paragraph above the image. The support from character voice further enhances
editor voice by providing evidence for the editor's statement. The editor's
proclamation is accordingly strengthened, and hence the heteroglossic space
is in a sense contracted as compared with the two other types of dialogue
balloons. Editor voice in Type 2, on the other hand, is absent from the
demonstration of games, while character voice explains the rule of a game by
both verbal and visual demonstration. As for Type 3, the editor who should
have given directions chooses to hide away from the dialogic setting, and
character voice is solely responsible for the exercise instruction. Accordingly,
it may be justified to say that Type 3 opens up more space for character voice
than Type 2 because the responsibility for giving instructions should have been
undertaken by an editor whereas demonstrations are not necessarily given by
an editor. Type 2 in turn expands more heteroglossic space than Type 1 in
that editor voice is absent from Type 2 but present in Type 1, and character
voice in Type 1 actually enhances editor voice. In other words, graduation
operates along the axis of force, i.e. the amount of responsibility the characters
undertake in instructing the reader or the amount of space opened up to
engage character voice.

Jointly-constructed text: degree of completion

Two types of jointly-constructed text have been identified earlier (see Figure
4): jointly-constructed drawing exercise (Type 1), and multimodal jointly-constructed
herald page (Type 2). The gradable attribute values encoded
in jointly-constructed text can be approached in light of the degree of its
completion or fulfilment (Hood and Martin, 2007), namely, its prototypicality
(i.e. focus) as a co-construction that involves the reader's participation. It
is argued that the higher the degree of completeness a jointly-constructed text
possesses, the less heteroglossic space it opens up to reader voice. This is
because a jointly-constructed text will not have much room for external voices
to participate in the co-construction if it is almost complete on its own.
The image in Type 1 is far from being finished. Moreover, the reader's
participation adds new meanings to the original text. This type of jointly-constructed
text invites, or even demands, participation of the external reader
voice, and therefore may be scaled as possessing relatively high attribute
value (see Note 5). Type 2, on the contrary, is itself a complete image as far as drawing goes.
Five square boxes are inserted and left blank for the reader to fill in according
to a tape recording. In other words, the square boxes that demand the reader's
participation are later added into the complete image. The incompleteness
of Type 2 is not an inherent property but a feature added later, and hence it
encodes relatively low attribute value.

Illustration: indispensability in linking verbal texts

As analysed here, illustration may associate otherwise unrelated verbal texts in a
meaningful way. The force of attribute meaning realized in illustration can be
investigated by looking at the indispensability of the image in linking different
parts of a verbal text. We clarify this by drawing a comparison between Type 1
and Type 2 in Figure 5. Type 1 functions to associate the two semantically
unrelated words with each other through the use of a meaningful and intriguing
visual display. There would not be any semantic relevance between the two
words with the same initial letter (i.e. jeep and jump, kangaroo and key) if the
image were removed. On the contrary, the images in Type 2 are not so indispensable
to the whole multimodal text as compared with Type 1. Even if the
images were removed, the verbal text would still be coherent and consistent.
In other words, the multimodal text in which Type 1 is embedded calls for the
character voice to integrate different parts of the verbal text, hence opening up
more heteroglossic space and realizing higher attribute value.
The current discussion of gradability is mainly conducted within
the same multimodal engagement resource realizing the same engagement
meaning in EFL textbook discourse. Further studies may be conducted to
find out other multimodal engagement resources in various communicative
contexts, and a graduation continuum could be established within the same or
across different semiotic resources.

The concern of this article is two-fold. One is to identify and analyse the
multimodal resources that enable dialogic engagement in EFL textbook
discourse, and the other is to discuss how these multimodal resources scale
up or down engagement values. Five types of multimodal resources have been
identified as engagement devices. Specifically, labelling allows editor voice to
negotiate meanings with character voice by fending off alternative positions,
hence insisting upon the prescribed teaching goals and realizing proclaim.
Dialogue balloons and illustrations bring in character voice, and a given
proposition or viewpoint is thus attributed to characters. Jointly-constructed
text opens up space to introduce reader voice. Within certain illustrations editor
voice and character voice may go against one another and hence disclaim is
realized. Highlighting may function to entertain other possibilities that are
grounded within the contingent subjectivity of a multimodal text.
Efforts have also been made to explore the gradability of engagement
values in multimodal texts. A dialogue balloon may realize various degrees
of attribute value based on the amount (i.e. force) of responsibility
that characters undertake in instructing the reader. The degree of reader
involvement is closely related to the prototypicality (i.e. focus) of a jointly-
constructed text as a co-construction. The intensity (i.e. force) of attribute
value that an illustration realizes lies in its indispensability in linking different
components of a verbal text. It can be inferred from the findings that, on the
one hand, one type of engagement meaning may be realized through different
multimodal devices (e.g. attribute can be realized by dialogue balloon or
illustration). On the other hand, one type of multimodal resource may encode
different engagement meanings (e.g. illustration may function to encode
attribute or disclaim). The way in which a multimodal resource may grade
an engagement value is strongly associated with the characteristics of that
meaning-making resource. This article has sought, albeit briefly, to suggest
ways in which engagement and graduation within appraisal theory may be
explored in multimodal texts. In bringing a knowledge of linguistics and
social semiotics to an understanding of multimodal pedagogical materials, it
may allow insights into the dialogic process currently advocated in the given
pedagogic context.

The author wishes to express her heartfelt thanks to the two anonymous
reviewers for their insightful, constructive and sympathetic comments on an
earlier draft of this article. The author would also like to acknowledge the
funding provided by the China Postdoctoral Science Foundation (Project No.

1. For a useful review of the systemic functional semiotic approach to
multimodal discourse analysis see Iedema (2003) and O'Halloran
2. Considering the fact that the main purpose and function of EFL
textbooks is for language teaching, we use reader in the term reader
voice to refer to both reader of verbal texts and viewer of visual images.
Nevertheless, the term viewer is sometimes adopted in the current
research when the analysis mainly centres around visual images.
3. The labelling in Figure 1 is different from labelling in the general sense
in that it contains a missing letter that is intended to involve the reader's
participation. It could be termed jointly-constructed labelling, which
demonstrates features of both labelling and jointly-constructed text.
4. Kress and Van Leeuwen (2006: 117–18) use demand and offer to
refer to whether eye contact is created between viewer and represented
5. When examining the gradability of engagement values, this article
uses high/low degree to describe the meanings that are graded along a
clined scale rather than to imply discrete values.


Go For It Students' Book II for Year 7 (2005). Beijing: People's Education Press.
PEP Primary English Students' Book I, Book II for Year 3 (2003). Beijing: People's
Education Press.
PEP Primary English Students' Book I, Book II for Year 4 (2003). Beijing: People's
Education Press.
PEP Primary English Students' Book II for Year 5 (2003). Beijing: People's
Education Press.
PEP Primary English Students' Book I for Year 6 (2003). Beijing: People's
Education Press.

Bakhtin, M.M. (1981) The Dialogic Imagination, ed. M. Holquist, trans.
C. Emerson and M. Holquist. Austin: University of Texas Press.
Bakhtin, M.M. (1986) The Problem of Speech Genres, in C. Emerson and
M. Holquist (eds) Speech Genres and Other Late Essays, trans. V.W.
McGee, pp. 60–102. Austin: University of Texas Press.
Barthes, R. (1977) Rhetoric of the Image, in S. Heath (trans.) Image–Music–Text,
pp. 32–51. London: Fontana.
Chen, H.X. and Zhang, B.Y. (1997) The Influence of Illustration on Years 3
and 5 Students' Reading Comprehension, Psychological Science (5):
Chen, Y.M. and Qin, X.Y. (2007) Multimodal Engagement Resources and
Voice Interaction in Textbook Discourse, Foreign Languages and Their
Teaching (12): 15–18.
Chen, Y.R. (2005) Study on the Reform of Textbook Content Attributes,
unpublished PhD thesis, Shanghai, East China Normal University.
Chen, Y.Y. and Ye, L.X. (2006) Textbook: The Open Discourse in Dialogue,
Contemporary Educational Science (23): 34–36.
Coffin, C. (2000) History as Discourse: Construals of Time, Cause and
Appraisal, unpublished PhD thesis, Department of English, University
of New South Wales, Sydney.
Economou, D. (2006) The Big Picture: The Role of the Lead Image in Print
Feature Stories, in I. Lassen et al. (eds) Mediating Ideology in Text and
Image, pp. 211–33. Amsterdam: Benjamins.
Halliday, M.A.K. (1978) Language as Social Semiotic: The Social Interpretation
of Language and Meaning. London: Arnold.
Halliday, M.A.K. (1993) Language in a Changing World, Occasional Paper No.
13. Sydney: Applied Linguistics Association of Australia.
Halliday, M.A.K. (1994) An Introduction to Functional Grammar, 2nd edn.
London: Arnold.
Halliday, M.A.K. and Matthiessen, C.M.I.M. (2004) An Introduction to
Functional Grammar, 3rd edn. London: Arnold.
Hood, S. (2004) Appraising Research: Taking a Stance in Academic Writing,
unpublished PhD thesis, Faculty of Education, University of
Technology, Sydney.
Hood, S. and Martin, J.R. (2007) Invoking Attitude: The Play of Graduation
in Appraising Discourse, in R. Hasan et al. (eds) Continuing Discourse
on Language, Vol. 2, pp. 739–64. London: Equinox.
Huang, Y.H. (1999) Utilizing Illustration to Teach Moral Education, Primary
School Teaching Research (10): 33–7.
Iedema, R. (1995) Literacy of Administration (Write it Right Literacy in
Industry Research Project Stage 3). Sydney: Sydney Metropolitan East
Disadvantaged Schools Program.
Iedema, R. (2003) Multimodality, Resemiotization: Extending the Analysis
of Discourse as Multi-semiotic Practice, Visual Communication 2(1):
Iedema, R., Feez, S. and White, P.R.R. (1994) Media Literacy. Write It Right
Literacy in Industry Research Project Stage 2. Sydney: Sydney
Metropolitan East Disadvantaged Schools Program, NSW Department
of School Education.
Kress, G. and Van Leeuwen, T. (2001) Multimodal Discourse: The Modes and
Media of Contemporary Communication. London: Arnold.
Kress, G. and Van Leeuwen, T. (2006) Reading Images: The Grammar of Visual
Design, 2nd edn. London: Routledge.
Kristeva, J. (1986) Word, Dialogue and Novel, in T. Moi (ed.) The Kristeva
Reader. Oxford: Blackwell.
Martin, J.R. (2000) Beyond Exchange: Appraisal Systems in English, in
S. Hunston and G. Thompson (eds) Evaluation in Text: Authorial
Stance and the Construction of Discourse, pp. 142–75. Oxford: Oxford
University Press.
Martin, J.R. (2008) Boomer Dreaming: The Texture of Recolonisation in a
Lifestyle Magazine, in G. Forey and G. Thompson (eds) Text-type and
Texture, pp. 250–83. London: Equinox.
Martin, J.R. and Rose, D. (2007) Working with Discourse: Meaning beyond the
Clause, 2nd edn. London: Continuum.
Martin, J.R. and White, P.R.R. (2005) The Language of Evaluation: Appraisal in
English. London: Palgrave.
O'Halloran, K.L. (2008) Systemic Functional-Multimodal Discourse Analysis
(SF-MDA): Constructing Ideational Meaning Using Language and
Visual Imagery, Visual Communication 7(4): 443–75.
Painter, C. (2007) Children's Picture Book Narratives: Reading Sequences of
Images, in A. McCabe et al. (eds) Advances in Language and Education,
pp. 40–59. London: Continuum.
Song, Z.S. (2005) A Study of Illustrations in Textbooks from the Perspective
of Cognitive Psychology, Journal of Beijing Normal University (Social
Science Edition) (6): 22–6.
Tao, Y. and Shen, J.L. (2003) The Immediate Processing Study on Reading
Picture Texts for Grade 11 Students, Psychological Development and
Education (2): 43–6.
Wang, Y. (2000) A Comparative Study of the Primary and Secondary
Textbook System in China Mainland, Hongkong, Macao, and Taiwan,
Curriculum, Teaching Material and Method (9): 49–51.
White, P.R.R. (2003) Beyond Modality and Hedging: A Dialogic View of the
Language of Intersubjective Stance, Text, Special Issue on Appraisal,
23(2): 259–84.
Zhang, S.H. (2005) Review of the Studies on Primary and Secondary School
Textbooks in China, Educational Science Research (5): 9–12.

YUMIN CHEN is a lecturer and post-doctoral research fellow in the School of
Chinese as a Second Language (SCSL), Sun Yat-sen University. She obtained
her PhD degrees in linguistics from the University of Sydney and Sun Yat-sen
University and her research interests include multimodal discourse analysis,
social semiotics and pedagogic discourse. Yumin is currently working on
multimodal pedagogic materials for language teaching.

Address: School of Chinese as a Second Language (SCSL), Sun Yat-sen University,
No. 135 Xingang West Road, Guangzhou, Guangdong, China 510275.