

14-Jewitt Ch-11:14-Jewitt Ch-11 4/16/2009 7:14 PM Page 151

Unsworth, L., & Cleirigh, C. (2009). Multimodality and reading: The Construction of
meaning through image-text interaction. In C. Jewitt (Ed.), Handbook of Multimodal
Analysis (pp. 151-164). London: Routledge.

11
Multimodality and reading
The construction of meaning through image–text interaction

Len Unsworth and Chris Cléirigh

Introduction

The reconceptualization of literacy and literacy pedagogy has been increasingly advocated as a research imperative in view of the increasingly multimodal nature of paper and digital media texts (Chandler-Olcott and Mahar, 2003; Hull and Nelson, 2005; Kamil et al., 2000; Lemke, 2006; Leu et al., 2004; Richards, 2001). Many see image–text relations as central to such a reconceptualization (Andrews, 2004; Bolter, 1998; Boulter, 1999; Dresang, 1999; Jewitt, 2002, 2006; Jewitt and Kress, 2003; Kress, 2003b; Luke, 2003; New London Group, 2000). Although a good deal of recent work addresses the ways in which images construct meanings, very little has specifically addressed the intersemiotic semantic relationships between images and language to show how the visual and verbal modes interact to construct the integrated meanings of multimodal texts (Martinec and Salway, 2005; Royce, 2007). While some descriptive accounts have emerged, such as McCloud’s (1994) explication of image–language interaction in graphic novels, and educational work on science textbooks (Roth et al., 2005), research within systemic functional semiotics has provided a systematic account of a “semantic system of image–text relations that would map out how images and text interact” (Martinec and Salway, 2005: 341). Nevertheless, the development of a generalized semiotic system describing the semantics of the co-articulation of image and language remains in its infancy (Kress, 2001; Lemke, 2006; Macken-Horarik, 2003; Unsworth, 2008). Advancing understanding of how images and language interact to construct meaning seems crucial in seeking to reconceptualize literacy and literacy pedagogy from a multimodal perspective. The discussion in this chapter seeks to contribute to this kind of advancement.


Different research traditions in education have addressed image–text relations in reading comprehension but have not sought to explicate how images and language interact to construct meaning. For example, Mayer (2008) investigated a number of features of the deployment of images and language in short multimedia science explanations to determine the principles of such deployment that led to the most efficient comprehension. The resulting guidelines for learning materials included: the coherence principle (excising extraneous words, sounds and pictures); the signalling principle (highlighting essential material); and the spatial contiguity principle (placing corresponding words and pictures near each other). In other studies where the role of images in meaning making was addressed, such as the study of Brazilian school science textbooks by Roth and colleagues (2005), the image–text relations, remote from any linguistic or visual semiotic theory, were ambiguous and/or referred to relations among caption text and main text rather than between image and caption or main text (Unsworth, 2006a). Due to space constraints, this chapter focuses on the contributions of more semiotically oriented studies whose theoretical bases are more apposite to explicating the semantic nature of image–language interaction. This work mainly derives from the systemic functional linguistic (SFL) work of Michael Halliday (Halliday and Matthiessen, 2004).
Within this focus we limit our discussion to the construction of ideational or representational meaning, leaving aside issues of compositional meaning concerning layout and prominence, and issues of interactive meaning concerned with pseudo-interpersonal relations between readers/viewers and the represented participants and objects in images and language. We briefly critique systemic functional semiotic approaches to describing the ideational semantics of image–language interaction, and then outline early work emphasizing the distinctively different semiotic affordances of image and language effecting their specialized function in contributing different aspects of overall meaning in texts. From this perspective we go on to propose a semiotic framework designed to address the synergistic nature of image–language interaction in meaning construction. We illustrate this framework with reference to science books for primary school children and then demonstrate the efficacy of the framework in accounting for children’s difficulties in comprehending a science text in a state-wide mandatory reading comprehension test in Australia. In conclusion we note limitations of the present work in terms of the limited field of primary school science, the confining of the study to ideational meaning and to static and pictorial images, and also in terms of the range of semantic relations encompassed in the framework. Implications of the work for pedagogy involving reading to learn and for further research on explicating the role of image–language interaction in reconceptualizing literacy in multimodal terms are also briefly noted.

Describing how images and text interact in making meaning

A focus on representational/ideational meaning


Since our work focuses on representational or ideational meaning, only studies dealing with this dimension of image–language relations are included here. Although some related work has addressed moving images (Baldry and Thibault, 2006), we will deal only with studies involving static images and language in paper or digital media. We will also emphasize studies of science texts, although we note the pioneering work of Gill (2002) on “ideational concurrence” of image and language in picturebooks, and the ongoing work of O’Halloran (1999, 2000, 2003a, 2003b), who developed a system for analyzing mathematical formalism derived from O’Toole’s (1994) semiotics of images, and, based on Halliday’s (1994) grammar, further developed the concept of semiotic metaphor to account for the nature of transformations from language to mathematical symbolism.



Describing cohesive relations between elements of images and text


Royce (1998, 2002, 2007) has taken the concept of “cohesive ties” (Halliday and Hasan, 1976; Martin, 1992) among content words distributed across texts and applied this to links between words and image segments. He mapped relations of repetition, synonymy, antonymy, meronymy, hyponymy and collocation between visual and verbal message elements in an advertisement in the Economist magazine (1998, with the analysis extended in a later article, 2007), and also in a high school science textbook (2002). Most of the cohesive ties involved repetition, meronymy or collocation, but the meronymy and collocation indicate that the visual and verbal modes provide much more than a duplication of meanings. “Crossmodal” cohesive links were also described by Lemke (2002) to indicate the role of these image–text relations on science websites. He indicated that interpreting the contribution of the image component depended on these intermodal ties, since the construction of meaning from the image necessitated integrating information from the caption texts. While these studies indicated extensive interdependence between images and text, we sought to explicate further the ideational nature of that interdependence.

Modelling image–text relations on clause types and conjunction

Martinec and Salway (2005) have taken Halliday’s descriptions of the types of relations between clauses within a sentence (Halliday, 1985, 1994; Halliday and Matthiessen, 2004) and applied them to relations between language and image. There are two aspects to this. One is the notion of dependency – which are the independent clauses and which are the subordinate clauses? Martinec and Salway have attempted to adapt this to describing the relative status of image and language segments as equal or unequal; if equal, whether independent or complementary; and if unequal, whether the image is dependent on the language or the language is dependent on the image. However, there are a number of inconsistencies inherent in the status systems they propose, highlighted in their own data analyses, that appear to make this aspect of their proposal very difficult to work with (Unsworth and Cléirigh, 2009). But more broadly, it can be asked: what value is there in determining whether language and image are of equal or unequal status? How does it further our understanding of how language and image collaborate in the creation of a unified, coherent text? More promising in this regard, as Martinec and Salway (2005) demonstrate, is the deployment of the second aspect of Halliday’s system of interclausal relations – the nature of the semantic linkage between clauses.
The semantic linkages can expand the meaning of a clause in three ways. In elaboration one clause restates or clarifies another, in extension one clause adds information to another, and in enhancement one clause provides information such as how, when, where or why in relation to the other clause. Martinec and Salway have identified examples of these three kinds of linkages between images and text segments. However, in our view, in simply extrapolating from this aspect of clausal relations to image–text relations their account does not sufficiently attend to the reciprocity of the different affordances of image and text. This can be seen in their discussion of an image which represents people in a disaster setting with several representations of dead bodies on the floor. The caption says “Police believe a short circuit set fire to the hall’s thatch roof.” Martinec and Salway say that “Here the image enhances the text. The dead bodies lying on the floor are the result of a short circuit set fire to the hall’s thatch roof” (2005: 351). What they don’t pick up on is that the text also enhances the image. The text provides the reason for the image figures. In the account of our work in the subsequent sections, we build on this strand of the theorizing discussed by Martinec and Salway and also in the work of Matthiessen (2007), augmenting it with the notion of intermodal (mutual) identification.



Functional specialization of image and language in science texts


In studies of science texts, Kress (1997b, 2000a, 2003a, 2003b) and Lemke (1998b) explicated what they call the “functional specialization” of language and image. According to this, the resources of language are most apposite to the representation of sequential relations and the making of categorical distinctions, while the resources of images are most apposite to the representation of spatial relations and for formulating “… degree, quantity, gradation, continuous change, continuous co-variation, non-integer ratios, varying proportionality, complex topological relations of relative nearness or connectedness, or non-linear relationships and dynamical emergence …” (Lemke, 1998b: 87). Lemke provides a detailed discussion of the meaning-making resources of a scientific article that includes diagram and text, although this is not a fully articulated framework of intersemiotic relations that would explicate the kinds of linkages between images and language.

Kress (1997b, 2000a, 2003a, 2003b) has exemplified the functional specialization principle. His analysis of a textbook page dealing with “electronics” shows that:

The language is about events: relatively simple sentences (one or two clauses), which are about actions – what had been done; what is to be done; what might happen if … The diagrams represent the core information of this bit of the curriculum: what a circuit consists of, and in what relation its components stand to each other.
(Kress, 2000b: 197)

While the accounts provided by Kress in a number of papers do not develop a system of intersemiotic relations, Bezemer and Kress (2008) have discussed the contributions of image and text to meaning-making in terms of what they called epistemological commitment. They define this as an “unavoidable affordance” and provide as an example the fact that in the visual mode the artist or designer is obliged to show the distance between two characters (and other spatial relativities), which is not obligatory in a verbal description of the scene. They go on to note that: “Every mode imposes/demands such commitments as a matter of course, though each set of commitments is different” (Bezemer and Kress, 2008: 176). This means that a visual and verbal depiction of the same thing can be thought of as two different semiotic representations of the same phenomenon, so there is a simultaneous semiotic relationship of identity and differentiation. This insight is integral to the modelling of image–text relations we are proposing, and is not sufficiently addressed in existing models.

Rethinking intermodal construction of representational meaning



Reasoning towards a synergistic model of image–text relations


We know that no image or visual representation means in all and only the same ways that some text can mean (Lemke, 2002: 303–304). Some aspects of this essential incommensurability have been noted in relation to the functional specialization principle and the concept of epistemological commitment. Lemke goes on to note that it is this essential incommensurability that enables new meanings to be made from combinations of modalities in texts. So, although the meaning-making affordances of image and text are very different, they interact synergistically in the construction of meaning. The challenge is how to model the nature of this synergy while preserving the notion of the distinct affordances of the different modes.

As distinct from simply applying existing linguistic descriptions to image–text interaction, we have used one aspect of systemic functional grammar as a heuristic to reason about the nature of the image–language synergy. This grammatical concept is relational grammatical identification. It most commonly occurs with the verb “to be” or “to have”, where one participant in the clause is identified with another. For example, if we say “Gregory is the bald one”, this indicates that in this particular context Gregory is the only bald person and hence he is uniquely identified. But in this context Gregory and “the bald one” refer to the same entity, hence either “Gregory” can be used to identify “the bald one” or “the bald one” can be used to identify “Gregory”:
Who is the bald one?               Who is Gregory?

Gregory      is the bald one       Gregory      is the bald one
Identifier      Identified         Identified      Identifier

In this relationship of unique identification both participants in the clause are the same person. But the clause is not tautologous, so there must be a difference between the separate representations of the participants. Halliday and Matthiessen (2004) refer to this difference as one of “expression” and “content”. They use the term Token for the lower “expression” and Value for the more abstract “content”. So if we say “Angelika is the School Captain” we are identifying Angelika by assigning her to a Value. Angelika is the Token (expression) and “the School Captain” is the Value (content). But all such identifying clauses can “face both ways”, with either the Token (expression) as Identifier, identifying the Value (content) as Identified, or the Value (content) as Identifier, identifying the Token (expression) as Identified. If we go back to “Gregory is the bald one”, the lower “expression” is “the bald one”, so “the bald one” is the Token and the higher (content) Value is Gregory. But as shown above, either Gregory (Value) can identify who “the bald one” is, or “the bald one” (Token) can identify which one is Gregory. We refer to this kind of grammatical identification as intensive identification.
This concept of mutual identification is central to our theorizing of image–language relations, but to provide the complete basis for our modelling of image–text relations, we need to point out two more types of identifying clauses in English. The second type involves possession, but again we are fundamentally concerned with the relationship of identity between two participants. So, if we say “Anna has a harp” we are simply putting Anna into a class of people who have harps, so this is not an identifying clause. However, if we say “The harp is Anna’s”, then both “Anna’s” and “the harp” refer to that which Anna possesses, and the relationship is one of identity. Here “the harp” is Token and “Anna’s” is Value. We will refer to this kind of identification as possessive. In our discussion of image–language relations, we will suggest that this kind of identification involving possession enables us to account for the ways in which images of entities can show elements (not realized verbally) that are possessed by the linguistically realized entities. So this deals with images augmenting verbal information.
The third type of identifying clause occurs when the relationship between two entities is such that one entity is being identified by a circumstantial feature of time or place or manner etc., for example, “Tomorrow is the twelfth”, or “the best way to get there is by train”, or “this situation is apparently caused by anomalous low temperatures”. In all cases what is being related on either side of the verb “is” in each clause is essentially the same thing. We will refer to this kind of identification as circumstantial. In these cases “the Token can be quite varied in grammatical class – a nominal group, an adverbial group, a prepositional phrase or an embedded clause, whereas the Value is often a nominal group with the name of a class of circumstance as Thing” (Halliday and Matthiessen, 2004: 242). However, the Token is always the “expression” form at the lower level of abstraction, so in “her act was followed by applause” the Token is “applause”, and in “the way to get there is by train” the Token is “by train”. Similarly, in our account of image–language relations the image is always the Token – the lower “expression” form, whereas the language is the more abstract “content” form – the Value.
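For readers who find a schematic summary helpful, the three types of identifying clause just outlined can be recorded as Token–Value pairs. The following minimal Python sketch is our own illustrative construction (the class and field names are invented for this summary and are not part of the SFL formalism); it uses the chapter’s own examples and shows the “facing both ways” property, in which either element can serve as Identifier for the other.

```python
from dataclasses import dataclass
from enum import Enum


class IdentificationType(Enum):
    """The three types of relational identification described above."""
    INTENSIVE = "intensive"            # e.g. "Gregory is the bald one"
    POSSESSIVE = "possessive"          # e.g. "The harp is Anna's"
    CIRCUMSTANTIAL = "circumstantial"  # e.g. "The way to get there is by train"


@dataclass(frozen=True)
class IdentifyingClause:
    """A Token-Value pair: the Token is the lower 'expression',
    the Value the more abstract 'content'."""
    token: str
    value: str
    kind: IdentificationType

    def identify_value(self) -> str:
        # Token as Identifier, Value as Identified
        return f"{self.token!r} identifies {self.value!r}"

    def identify_token(self) -> str:
        # Value as Identifier, Token as Identified
        return f"{self.value!r} identifies {self.token!r}"


examples = [
    IdentifyingClause("the bald one", "Gregory", IdentificationType.INTENSIVE),
    IdentifyingClause("the harp", "Anna's", IdentificationType.POSSESSIVE),
    IdentifyingClause("by train", "the way to get there",
                      IdentificationType.CIRCUMSTANTIAL),
]

for ex in examples:
    # Every identifying clause "faces both ways": both directions are valid.
    print(ex.kind.value, "|", ex.identify_value(), "|", ex.identify_token())
```

The point the sketch makes concrete is that the type (intensive, possessive, circumstantial) is independent of the direction of identification: both directions are always available for every pair.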

Intermodal relations as (mutual) identification

Introducing intermodal identification


Figure 11.1 Image or language as identifier. (The figure diagrams an IDENTIFIER SELECTION system with two options: verbiage/Value as Identifier – ‘decoding’/glossing – or image/Token as Identifier – ‘encoding’/visualizing.)

The grammatical concept of Identification can be used to model the bimodal synergy of pictorial images and language (i.e. to model the image–language relations that jointly construct experiential or representational meaning). An image form, as the lower order of abstraction, functions as Token, whereas the language meaning, as the higher order of abstraction, functions as Value (Figure 11.1). Sometimes it is the language (Value) – the “meaning” or higher order of abstraction – that the reader/viewer is familiar with and can use as the “Identifier” to identify the image. For example, suppose we have an image of a Minke whale with a caption, “The Minke whale is the current target of Japanese whalers.” It may be that the reader/viewer has heard of the Minke as one among a variety of whales, but has never previously seen a picture of a Minke whale. S/he then uses the language as Identifier to gloss the image. On the other hand, suppose we have an image of the Indian Myna bird with the caption, “Indian Myna bird displaces native birds in urban habitats.” In this case the reader/viewer may be familiar with the bird visually but not previously have heard it referred to verbally. In this case s/he uses the image as the Identifier to visualize the more abstract name of the bird. So, either the language or the image can be the point of departure to identify the other. If the language is the point of departure, it is the Identifier and it glosses the image. If the image is the point of departure, it visualizes the language.
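The selection of the point of departure can be sketched as a small decision rule. This is our own illustration, under the simplifying assumption that reader familiarity alone decides the direction; the function and parameter names are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class BimodalPair:
    """An image-language pairing: the image is always Token (lower
    'expression'), the language always Value (more abstract 'content')."""
    image: str      # description of the pictorial content (Token)
    language: str   # caption or label (Value)


def identifier_selection(pair: BimodalPair, knows_language: bool,
                         knows_image: bool) -> str:
    """Which mode serves as Identifier (point of departure) for this reader?
    Illustrative heuristic only: the familiar mode identifies the other."""
    if knows_language and not knows_image:
        # Language as Identifier: it 'decodes'/glosses the image.
        return f"language glosses image: {pair.language!r} -> {pair.image!r}"
    if knows_image and not knows_language:
        # Image as Identifier: it 'encodes'/visualizes the language.
        return f"image visualizes language: {pair.image!r} -> {pair.language!r}"
    return "mutual identification: either mode can be the point of departure"


# The chapter's two examples:
minke = BimodalPair("picture of a Minke whale",
                    "The Minke whale is the current target of Japanese whalers")
myna = BimodalPair("picture of an Indian Myna bird",
                   "Indian Myna bird displaces native birds in urban habitats")

print(identifier_selection(minke, knows_language=True, knows_image=False))
print(identifier_selection(myna, knows_language=False, knows_image=True))
```

Note that the Token/Value assignment never changes with the reader: the image remains Token and the language Value; only the choice of Identifier varies.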

Images identifying language: visualizing language elements


Drawing on our outline of relational grammatical identification above, we are proposing three types of intermodal identification: intensive, possessive and circumstantial, which are discussed in this section from the perspective of images visualizing language elements. The three types are illustrated by reference to Figure 11.2, showing the life cycle of a cicada (the labels for Scenes 1–6 have been superimposed by Unsworth and Cléirigh).
In intermodal intensive identification the image visualizes the qualities (shape, colour, texture) of the identified participant. In Figure 11.2, Scene 5, the (unverbalized) qualities (shape, colour, texture) of the fully grown nymph are shown in the image. While both the image and the language refer to the same phenomenon, the image elaborates qualities due to the inherent epistemological commitment (Bezemer and Kress, 2008) of the visual mode.

In intermodal possessive identification the image visualizes additional participant(s) not explicit in the language element(s). The image visualizes the (unverbalized) additional things (parts) that compose the identified participant. In Figure 11.2, Scene 6, the (unverbalized) things (parts) that compose the adult cicada are shown.

In intermodal circumstantial identification the image visualizes the locations of language elements. The image visualizes the (unverbalized) locations of the things (parts) that compose the identified participant. In Figure 11.2, Scene 2, the image visualizes the (unverbalized) location of the ovipositor (rear underside of the cicada’s abdomen).

Images identifying language: visualizing language configurations


Figure 11.2 Life cycle of a cicada.
Source: Bird and Short, 1998: 29.

A language configuration refers to the semantic relation of process, participants and circumstances realized by the verb, noun groups and adverbial phrases. Intermodal circumstantial identification can occur when an image visualizes the locations of such language configurations. New circumstantial meanings, of Manner and Cause–Condition, can be encoded by this intermodal identification. That is, the image–language interaction constructs circumstantial meanings of Manner and Cause–Condition that are not constructed by either modality alone – consistent with Lemke’s notion of multiplying meaning (1998a, 1998b).

• In the relation between the image and the language configuration, the locational relation between image participants (i.e. their relative positions) can realize – visualize, show, demonstrate – the (unverbalized) Manner: quality of a language configuration. In Figure 11.2, Scene 1, the Manner: quality of male and female cicadas mating is shown as back-to-back, male-over-female.

• In the relation between the image and the language configuration, the locational relation between image participants (i.e. their relative positions) can realize – visualize, show, demonstrate – the (unverbalized) Causal–Conditional: result or consequence of a language configuration. In Figure 11.3, Step 2, the Cause: result of Wrap the bare end of one piece of wire tightly around one of the legs of the clothes peg and Do the same with the second wire on the other leg of the clothes peg is shown as uninsulated red wire around the right peg leg and uninsulated blue wire around the left peg leg.
Figure 11.3 Making switches.
Source: Bartholomew, 2003: 10.

• In the relation between the image and the language configuration, the locational relation between image participants (i.e. their relative positions) can realize – visualize, show, demonstrate – the (unverbalized) Causal–Conditional: reason or condition of a language configuration. (Illustrations are not included due to lack of copyright approval, but see, for example, McDonald (2001: 170–171), where an image in which a car-laden freeway leads to the billowing smoke stacks of an industrial city relates causally to the subtitle “Climatic Change: Global Warming”.)
Language identifying image: glossing of the image


A language element can identify an image participant. In Figure 11.2, Scene 2, the participant element of the label ovipositor identifies (glosses) the located compositional participant (body part) in the image. A language configuration (participant(s) + process +/- circumstance(s)) can also identify (gloss) the spatial arrangement of image participants. In Figure 11.2, Scene 3, the “doing” configuration of the caption nymph hatching identifies (glosses) the locational relation between participants (nymph and broken egg) in the image.
Images and the “main” text

The images in Figure 11.2 appear on the right-hand side of a double-page spread in the children’s information book, with the main text occupying the left-hand side. Here we will consider the main text segment corresponding to Scenes 1 and 2 in Figure 11.2. This segment is re-presented below, divided into clauses:

1. After mating
2. the female cicada lays her rows of eggs in bushes high in trees.
3. She makes slits in the twigs or branches with her strong, sharp ovipositor
4. and deposits the eggs into these slits.
5. It takes about six weeks for the eggs to hatch.
As we have indicated in discussing images and captions, the nature of the image–language relation is mutual identification. This is also the case for the relation between images and corresponding main text segments. In Figure 11.2, Scene 1 identifies (visualizes) the location of clause 1 and, as previously noted, the manner in which the mating occurs. Clause 1 identifies (glosses) Scene 1, in that “mating” decodes the representational meaning of the spatial arrangement of the images of the two cicadas positioned rear-to-rear on a branch. Similarly, Scene 2 visualizes clause 2, also indicating manner, and clause 2 glosses Scene 2. In clause 3, “makes slits” precedes what is depicted in Scene 2, but the participants in the language are visualized, as is the manner (means) of making the slits (i.e. with the ovipositor). Scene 2 visualizes clause 4, also indicating manner, and correspondingly clause 4 glosses the spatial arrangement of image participants in Scene 2. Of clause 5, only the participant “eggs” is visualized in Scene 2.

Intermodal identification and the reader


Intermodal identification accommodates the concept of functional specialization of image and language (Kress, 1997b, 2000a, 2003a, 2003b) and the synergistic construction of meanings that cannot be derived from either mode separately. It is also a way of considering options available to readers in negotiating the meanings of multimodal texts. If an image appears in a text without any language, the image is only decoded (glossed) by language that can be generated by the reader. So if an image, or any part of an image, is unfamiliar to the reader – such as a cicada body part – it remains undecoded. A pedagogical function of language provided by the author is to supplement what can be provided by the reader for decoding (glossing) the image. If language appears without an image in a text, the language is only encoded by an image that can be visualized by the reader based on his/her prior experience of the language in relation to the actual phenomenon or some prior visual representation of it. So if any part of the language is unfamiliar to the reader – such as the word “ovipositor” – it remains unencoded visually. A pedagogical function of an image provided by the author is to supplement what can be provided by the reader for visually encoding the language. So, monomodal texts demand more of the visual or verbal experience of the reader than do bimodal texts. The more exotic/technical/abstract the field, the more important the intermodal identification pedagogically.

Image–language relations in reading comprehension

The increased extent and significance of images and image–language relations in a wide range of textual material in recent years is reflected in reading comprehension pedagogy requirements in government-mandated school syllabus documents in Australia and the United Kingdom. However, group reading comprehension tests used by governments and education authorities as indicators of student literacy standards and teaching effectiveness in general rarely conceptualize reading comprehension as involving images, now such an integral part of most curriculum area reading materials and of reading materials more generally. In Australia, mandatory group reading comprehension tests for year three and year five children in government schools, previously conducted by the states, have been replaced by national tests in 2008. A number of the state tests did seek to assess children’s meaning-making from the integrative use of images and language in their colour stimulus booklets. Preliminary results from a nationally funded study1 investigating the role of image–language relations in children’s reading comprehension suggest that the nature of these relations may influence the difficulty children experience in comprehending such texts (Unsworth and Chan, 2008).
Figure 11.4 is an extract from the stimulus booklet for the 2005 Basic Skills Test for year five students in New South Wales schools. The question below is one of those based on this extract:
Order these steps to show how a Greek water clock works. Write a number from 1 to 4 in each box to show the order.

 water drips into a container
 a float rises in the rising water
 cogs moving up turn a clock hand
 the float makes cogs move upwards


Only 56 per cent of the total year five cohort of students in NSW scored the correct answer. Given that the nature and functioning of a Greek water clock is likely to be unfamiliar to most year five students, an examination of the image–language relations from the perspective of mutual identification can reveal potential obstacles to their reading comprehension.


In terms of the language elements identifying or glossing the image participants, this clearly occurs with the image figure as a whole (“a water clock (clepshydra)”; “A Greek Water Clock”) and with some of the image figures as component parts (“Cogs”; “Clock hand”; “Float”). However, many of the parts of the image are not glossed by the language (frame, funnel, pipe, reservoir, shaft, wheel), as this glossing is less relevant to understanding the movement of the clock hands, and inclusion of labels would clutter the image. The language “Water supply” and “Overflow”, although perhaps obvious to adult readers, sets up a potentially challenging intermodal misidentification for the inexperienced student reader. This language appears, falsely, to be glossing parts of the image, when it actually consists of the nominalized forms of the more congruent linguistic realization of events: “water flows in” and “excess water flows out” (and there is no “water supply” or “overflow” visualized in the image). In terms of language configurations identifying (glossing) the spatial arrangement of image participants as “doing” configurations, the language “Water trickles in…” effectively glosses the relevant part of the image, but it is semantically causally related to the second clause, “and raises a float”; however, this causal meaning is realized linguistically as simple addition – “and”. This may well impede the understanding of the inexperienced reader, since s/he has to unpack the reason–result relation.

Figure 11.4 A Greek water clock. [Image not reproduced in this file.]
Source: Chapman 1996.

Table 11.1 Image visualizing language in the main caption text for the water clock

Language: Water trickles in
Visualization in image: This “doing” configuration is visualized by a vertical series of water drops positioned in an unlabelled reservoir and extending below an unlabelled pipe attached to the unlabelled funnel.

Language: and raises a float
Visualization in image: This “doing” configuration is visualized only by a labelled figure above the water in the unlabelled reservoir. No raising of the float is visualized; readers are required to be able to visualize it from their own experience.

Language: which is attached to a clock hand.
Visualization in image: This “being” configuration is visualized by a series of connected parts:
• float connected to (unlabelled) shaft
• which has cogs which are connected to the cogs of an unlabelled wheel
• which is connected to a clock hand (ambiguously positioned against, or connected to, an unlabelled wheel).
Readers are required to fill in all the missing language that the image visualizes.

If we now look at the text from the perspective of the image identifying (visualizing) the language elements, it is clear that through intensive identification the image visualizes the unverbalized qualities (shape, texture, quality) of the identified participant(s) – the clock and its labelled components – the “Cogs”, “Clock hand” and “Float”. Through possessive identification the image also identifies (visualizes) the unverbalized parts that compose the identified participant (frame, funnel, pipe, reservoir, shaft, wheel). In addition, through circumstantial identification the image visualizes the unverbalized locations of the image elements that compose the identified participant (the clock). But if we now examine the image identifying (visualizing) language configurations, focusing on the extended main caption text, the nature of further challenges for inexperienced readers becomes apparent, as explicated in Table 11.1.

There is a particular problem with the final clause in the language column in Table 11.1 (“which is attached to a clock hand”) in that it does not adequately decode or gloss the image the children are presented with, which is likely to be very unfamiliar to the majority of them. Considering this and the nature of the other image–language relations described in this text, it is not surprising that only 56 per cent of ten-year-old children in year five at school were able to answer the sequencing question correctly. The informational material for teachers describing the test indicates that the question discussed here requires students to be able to “sequence steps by interpreting a diagram”. The kind of intermodal analysis we have described “unpacks”, into pedagogically useful information, what “interpreting” entails in terms of understanding image–language relations.

Conclusion

What is gained in our account is a more extravagant description of the synergistic construction of representational meaning. For example, in our intermodal intensive identification, the image visualizes the unverbalized qualities (shape, colour, texture) of the verbalized participant. This takes into account the relationship between the language and aspects of visual representation in images that is not accounted for by an image–language relation of exposition, where “the image and the text are of the same level of generality” (Martinec and Salway, 2005: 350). The extravagant description of intersemiotic representation in our approach is apposite to pedagogic issues involved in children’s negotiation of science texts. In such texts what is primarily at stake for the young readers is the effective negotiation of representational/ideational meanings that are frequently new and relatively complex to them. As we have shown in Table 11.1, if the text does not gloss unfamiliar image segments, or unfamiliar language is not visualized in images, inexperienced readers in the field can face significant difficulty. While this has implications for multimodal text design for such learners, it also predicts where comprehension difficulties will be located, guiding the generation of test items and pedagogically informing teachers about the need to gloss particular aspects of images with children by “talking out” (Lemke, 1989; Unsworth, 2001), and/or the need to “visualize” particular text segments through supplementation with additional images or the provision of some other form of concrete illustration.

While we propose that intermodal identification is a very useful framework, it is certainly not sufficient. There are clearly other aspects of image–language relations (both in school science materials and beyond) that have not been addressed here. For example, projection is part of the SFL system of logico-semantic relations. This system deals with the quoting or reporting of wording, or the quoting or reporting of thought. Speech or thought bubbles are noted as typical realizations in multimodal texts (Martinec and Salway, 2005; Matthiessen, 2007), and these certainly occur in science books and online science materials for children (Unsworth, 2006a). These are cases of images projecting texts, and Matthiessen suggests it is unlikely that text projects image, indicating that “this probably shades into elaboration” (Matthiessen, 2007: 35). However, there are examples in literary picturebooks for children where the main text projects an image (Unsworth, 2006b), and Martinec and Salway (2005) draw on Martin’s (1994) suggestion that, in the context of macrogenres, figures, tables, graphs and diagrams might involve the direct projection of ideas,
bypassing wording and reporting meaning directly. Clearly, there is much that remains to be done in theorizing the semiotics of image–text interaction and in the pedagogic application of such theorizing.

Note

1 Unsworth, L., Barnes, G. and O’Donnell, K. New dimensions of group literacy tests for schools: Multimodal reading comprehension in conventional and computer-based formats. Australian Research Council funded project, 2006–2008.