Information Design Journal 14(3), 207–230
© 2006 John Benjamins Publishing Company

Kenneth Kong

A taxonomy of the discourse relations between words and visuals

A document is mainly composed of words and images, but the complex relationship that binds these two completely different semiotic resources is usually taken for granted as transparent. The simple relations between words and images – ‘anchorage’ and ‘relay’, identified by Barthes almost 30 years ago – are unable to deal with the complexity of their bond, made even more complex by current printing and computer technology. This paper aims to identify the potential relations that bind texts and images together by arguing for a multilevel description of their logico-semantic relationships. The multiple, evaluative and metaphorical functions of the relations will also be discussed. The data generated from the proposed framework can form an empirical corpus for quantitative analysis. Examples from a variety of sources will be used to show how the framework can be operationalized.

‘What is the use of a book’, thought Alice, ‘without pictures and conversation?’
Alice’s Adventures in Wonderland (Lewis Carroll)

‘Of course the abstract idea must be occasionally explained – paraphrased, as it were – by the aid of pictures; but discreetly, cum grano salis’
(Arthur Schopenhauer)

language to the mixed mode of written and visual language, it is not to assume the complex phenomenon can be understood by the existing theories of language or visuals alone. As Kress (1998, p. 65) points out, ‘we seem to have a new code of writing and image [author’s italics], in which information is carried differentially by the two modes… simpler syntax does not mean that the text – the verbal and visual elements together – is less complex’. In fact, Bonsiepe made a similar point more than 30 years ago by arguing that a visual/verbal rhetorical figure is a combination of two types of signs whose effectiveness in communication depends on the tension between their semantic characteristics. The signs no longer add up, but rather operate in cumulative reciprocal relations. (1966, p. 171)

While it is possible to borrow insights from linguistics and visual design, it is too risky to take these for granted and apply them to either mode without careful consideration. While visuals have taken up some of the functions that written language used to perform, and written language is mainly used for reporting and narrating (Kress, 1998), the ‘display’ function of visuals needs to be explored, especially in relation to the narrating and reporting function of words. The most important issue in the research of multimodal documents is whether they can communicate meanings that traditional documents cannot (see Kaltenbacher, 2004). In other words, we need a more systematic and sophisticated network that can allow practitioners and analysts to understand the increasingly complicated verbal-visual connections.

Word-image connections

Visuals have been closely examined in semiotics, a diverse discipline that is ‘concerned with everything that can be taken as a sign’ (Eco, 1976, p. 6). A sign can include an image, gesture, sound and of course a word. One of the most important concepts in semiotics is the distinction between signifier (the sign itself) and signified (the referent) (Saussure, 1974), which implies that signs are arbitrary entities that vary from culture to culture. Another important distinction is between iconic and symbolic signs (Peirce, 1931-58). Iconic signs are the direct representations of the referents whereas symbolic signs make use of another image to make reference to the referent. Iconic signs are less subject to convention but may have more constraints in terms of use; on the other hand, symbolic signs are more subject to convention but have fewer constraints in terms of use and manipulation. One of the most salient manipulative functions of signs is to persuade people to take action, or, in brief, signs can have a persuasive function. Although rhetoric has been mainly directed towards the study of words, it has been argued that visuals can perform similar persuasive functions. The focus on the rhetorical/persuasive dimension of signs has been labeled ‘visual rhetoric’ (Bonsiepe, 1966; Marcus, 1979). For example, Marcus (1979) argued that many classical rhetorical devices (such as devices building in climax and devices of intensification) can be applied to images. Verbal and visual signs cannot be isolated when studying their persuasive functions, because, as Bonsiepe (1966) noted decades ago, there is a strong correlation of words and images in (for example) advertisements. This correlation has intensified in the age of the Internet.

As mentioned earlier, recent studies of the connections between words and visuals have focused on the supporting functions of language for visuals and vice versa, without fully considering the interplay between the two. A good example of studies that focus on the supportive functions of visuals to body text would be Pegg’s (2002) classification of images into ancillary, correlative or substantive types:

[Ancillary images are] usually placed in the neighbourhood of relevant text (except in the case of title-page illustrations, front-pieces, and thematic colophons). The exact relationship of the image to the text however is left for the reader to determine. (p. 170)

Correlative illustrations are usually associated with technical and scientific discourse and are characterised by keying illustrations to the text in a variety of ways. (p. 172)

In […] substantive (illustrations), there is no need for the reader to build bridges between image and text, as with ancillary illustrations, nor for a writer to construct elaborate and misunderstanding-prone bridges out of callout lines, numbering, or labels, as with correlative illustrations. If words (or numbers) used with substantive illustrations are laid out two-dimensionally on a page so as to display structurally the relationships between them, then text and image are one. (p. 174)

By focusing on how images can ‘illustrate’ the verbal content, Pegg’s study can only capture the degree of integration between words and images. While this is an important index of connection, it does not explain the semantic relations that bind the two. In the same vein, Eckkrammer (2004) identified four relations in word-image connections: transmedial relationships (such as photos in novels), multimedial discourse (such as book illustrations), mixed discourse (such as comics) and syncretic discourse (such as visual poetry). Again, with its focus on the degree of connection, this does not touch on how the connections are made possible. Similarly, Royce’s study (2002) of the intersemiotic systems of words and visuals in academic texts can only partly identify the semantic relationships between words and images. Drawing on Halliday and Hasan’s ideas of lexical cohesion (1976), Royce argued that visual and linguistic elements can be realized through similarity relations (such as synonyms in verbal codes), opposition relations (such as antonymy in verbal codes), class:sub-class relations (such as hyponymy in verbal codes) and expectancy relations (such as collocation in verbal codes). While Royce is able to identify some of the important word-image relations, the four types of cohesion are not adequate in capturing the complex relationship. Indeed, the four ‘cohesive devices’ identified by Halliday and Hasan are superficial links that connect words and clauses in verbal texts. There are more ‘subtle’ or ‘external’ links that combine units in a text, which are known as coherence in discourse analysis. For example, relations such as ‘explanation’, ‘sequence’ and ‘cause’ cannot be adequately dealt with by cohesive devices even though they may contribute to the overall coherence of a text.

Through a review of 24 previous studies, Marsh and White (2003) identified 46 relations between image and text, which they then classified according to their degree of integration or closeness as (a) functions that have little relation to the text, such as those that decorate or elicit emotion, (b) functions that have a close relation to text, such as those that reiterate or exemplify, and (c) functions that go beyond the text, such as those that interpret, develop and contrast. Although this taxonomy was based on extensive review, a large number of labels are either overlapping or difficult to apply in analysis. For example, relations such as ‘control’, ‘relate’, ‘sample’, ‘motivate’ and ‘translate’ are ambiguous terms that need to be elaborated and re-defined. However, Marsh and White highlight the importance of close integration between words and images. An image merely placed beside text cannot make the connection clear; what is needed is a framework that can help analysts or document designers to understand the complex relationship between the two.

Amongst recent studies, Horn’s idea of ‘visual language’ (1998) is closer to providing an understanding of this relationship. Although visual language can be analyzed from linguistic perspectives, it ‘has
distinct properties that make it different from natural languages of words and from purely artistic languages. It has a more complex syntax and requires more diverse and complex analysis’ (p. 13). Horn classifies the relations between words and images along three main dimensions: semantic categories, classical rhetorical devices and temporal relations. Semantic categories refer to the linking of two modes through their potential meanings and include relations such as substitution, disambiguation, labelling, example, reinforcement and completion. Classical rhetorical devices are very similar to the cohesive devices of Halliday and Hasan (1976) and include synecdoche (part-whole relation), metonymy (association relation) and metaphor (suggestion of analogy or likeness). Temporal relations are ‘different sets of relationships between verbal and visual elements as seen in … process communications [rather than] in static displays’. Although Horn’s framework offers important insights into the ways in which words and images work together to create meanings, its classifications can be refined and expanded. For example, the ‘metaphor’ relation can be an inherent characteristic of any relation and is not necessarily an individual relation. In other words, a multi-layered framework that can consider the multiple functions of relations is a better solution.

Delin et al. (2003), in an attempt to provide a multi-layered framework for studying multimodal documents, argue that a detailed analysis of a document should take into consideration at least five levels: content structure, rhetorical structure, layout structure, navigation structure and linguistic structure. The rhetorical structure, i.e. ‘how the content is argued’ (p. 56), is most relevant to the issue of verbal-visual connections, but their framework is based on the linguistic model of Mann and Thompson (1988) without modifications. Besides, the linguistic structure should ideally be incorporated into the rhetorical analysis because no information can be without rhetoric; language without rhetoric is ‘a pipe-dream’ (Bonsiepe, 1966). Multimodality and its relevance in different disciplines such as speech synthesis and automation, medicine, hypertext design, education and the entertainment industry has been comprehensively reviewed by Kaltenbacher (2004), who also argued that many such studies ‘lack empirical evidence to support many of the claims made’ (p. 202) and that a corpus (a larger compilation of data) is an ideal solution to this problem. This paper can be regarded as an important step towards a more systematic and quantitative approach to the issue.

The proposed taxonomy

Logico-semantic relations

The two text-image relations identified by Barthes, ‘anchorage’ and ‘relay’, which come very close to what Halliday (1994) calls ‘elaboration’ and ‘extension’, are two of the many ‘logico-semantic relations ... which may hold between a primary and a secondary member of a clause nexus’ (p. 219). Although the idea is based on language clauses, the framework is equally applicable to verbal-visual connections, albeit with modifications. There are two categories of relations in Halliday’s framework: expansion and projection. Expansion refers to how a unit expands the other by extension and elaboration, and projection refers to how a unit projects another unit, which can be a locution or an idea. The relation of projection is more straightforward, and is usually found where the drawing of a character (a visual element) may have a projected speech or thought (a linguistic element). The relationship is usually signaled by a connecting line between two units and the placing of the projected speech or thought inside a balloon. The expanded meaning is more complex and will receive more treatment below. A third category – decoration – will also be added to the framework.

The idea of expansion is useful when applied to word-image connections. The three types of expansion
can be compared to elaborating an existing building, extending a building and enhancing the environment.

In the case of elaboration, one unit (a clause, in Halliday’s sense) ‘elaborates on the meaning of another by further specifying or describing it’ (p. 225). The new unit ‘does not introduce a new element into the picture but rather provides a further characterization of one that is already there, restating it, clarifying it, or adding a descriptive attribute or comment’ (p. 225). There are three sub-categories under elaboration: exposition, exemplification and clarification. ‘Exposition’ is equivalent to ‘in other words’ in linguistic terms. ‘Exemplification’ is equivalent to ‘for example’ and ‘Clarification’ is equivalent to ‘to be precise’. I tend to use terms that are user-friendly, and replace ‘exposition’ and ‘clarification’ with ‘explanation’ and ‘specification’, following van Leeuwen’s terminology (2005). Moreover, there is a relation that Halliday neglects but which is extremely important in multimodal discourse. A word or text can be used to identify a particular image and vice versa. ‘Identification’ is an important function in many genres, such as academic textbooks, technical manuals and travel guidebooks. Hence, one unit can elaborate on another by explaining, exemplifying, specifying or identifying.

The second category of expansion is extension. Like an extension of a building, a unit with new meaning can be added to the original unit. Based on language data, there are two main ways of extending a unit – addition and variation (Halliday, 1994). In the case of addition, one unit is ‘adjoined to another; there is no implication of causal or temporal relationship between them’ (p. 230). In linguistic terms, this is like the use of the conjunction ‘and’. In the framework proposed here, the term ‘collection’ will be used instead, to underscore the constitutive function of semiotic resources. In the case of variation, one unit is regarded as ‘being in total or partial replacement of another’, which is similar to the function of ‘comparing and contrasting’. Although collection and variation are important in verbal texts, there are other equally important meanings present in word-image extension but missing in Halliday’s framework. The first of these is the idea of sequentiality. In Halliday’s model, sequence is classified as a relation of enhancement (which will be discussed below) of a main unit. In other words, the unit with the meaning of sequence is subordinated to another unit by quantifying it. In the word-image world, both pictures and words can convey ideas in chronological ways, without one unit being subordinated to another. A good example is comic strips, which are composed of spatially arranged panels that may contain words or pictures only. These units are self-sufficient and constitute the comic as a whole, and cannot be regarded as subordinate in any sense.

Another important meaning of extension in word-image relationships is alternation, which is different from variation in the sense that one of the entities can completely replace another without losing any meaning. Response, as proposed by Grimes (1975), is embedded in the mode of question and answer. Although derived from another unit, the ‘response’ unit also has its own information.

The third category of expansion is enhancement (Halliday, 1994), which tries to expand a main idea unit by specifying circumstances. This is based on the idea of symmetry of relations, which will be further elaborated below. Basically the enhancement unit qualifies another unit by specifying time, purpose, condition, goal etc.

These three categories of expansion are significant in linking words and images, as is a fourth connection, which is more diffuse but less constrained by convention and more subject to interpretation. This is the decorative function of images. Although words can also do the same to images, this is less likely because we seldom relate words to decorative functions when we see words and pictures together. Although this function is subjective, it is important enough to be labeled as another relation because while pictures can

[Figure: types of relation]

Types of relation
  Expansion
    Elaboration (no new information)
      Explanation (in other words)
      Exemplification (for example)
      Specification (to be precise)
      Identification (namely)
    Extension (new information)
      Collection
      Variation
      Sequence
      Response
      Alternation
    Enhancement (new information by specifying circumstances)
      Spatio-temporal
      Manner
      Cause
      Effect
      Condition
      Means
      Purpose
      Justification
      Concession
      Motivation
      Enablement
      Restatement
      Summary
  Projection (new information, usually in linguistic forms)
    Speech
    Thought
  Decoration (new but omissible information)
    Various forms and types

simply decorate accompanying messages, they can also ‘elicit emotion-laden reactions that may precede cognitive awareness and influence interpretation of messages’ (Richards and David, 2005, p. 31). Consequently, decoration, together with our existing two main relationship binders – expansion and projection – constitutes what I mean by the three logico-semantic relations of words and images.

Symmetry/Asymmetry of relations

The symmetrical relations of verbal language have attracted enormous attention in linguistics (cf. Mann and Thompson, 1988). Dealing with the schematic structure of language, studies in textual relations have always had strong implications for pedagogy (Bhatia, 1993) and computer-based text generation (Mann and Thompson, 1988). One very useful notion of textual linguistics that can be applied to word-image relations is that of unit hierarchy, namely that relations which bind units together may or may not be of equal status. There can be at least three possibilities. First, a unit can be subordinate to the main unit. This is what linguists call a hypotactic relationship or hypotaxis¹:

Although he woke up late, he could still catch the bus.

A unit subordinate to another unit is found in the logico-semantic relations having supporting functions, such as elaboration, enhancement and decoration.

Second, two units can be of equal status and linked in coordinate fashion:

Mary was a teacher and her husband was a nurse.

This is known as a paratactic relationship or parataxis, and is found in logico-semantic relations which are of equal standing, without one modifying the other. The logico-semantic relations of extension and projection are good examples of this type of relationship. Although this is a useful classification, there are occasions in which these relationships do not work. This is why Grimes (1975) proposed a third type of relationship that can take either form. Grimes argued that this type, known as the ‘neutral predicate’, was the most common in discourse. The best examples of neutral relations are collection and sequence, in which a unit may take a dominant role (with a bigger size or a more central position). In fact, I would like to go a step further by arguing that all relations can be neutral, depending on the layout, positioning (to be discussed below) and intended purpose. Symmetrical and asymmetrical relations can be illustrated by the diagrammatic analogy of nucleus (main unit) and satellite (subordinate unit) (Mann and Thompson, 1988). The two basic relations also give rise to other possible combinations (see Figure 2).

More examples of how these can be applied to word-image relations can be found in Section 3.

Spatial arrangement of relations

Kress and van Leeuwen’s concept of ‘integrated text’ is particularly useful in understanding how the arrangement of relations can be linked to their discourse functions. As they argue, verbal and visual codes should not be seen as separate; instead they should be ‘looked upon as interacting with [and] affecting one another’ (1996, p. 183). This view of integrated text allows us to see how the ‘representational’ and ‘interactive’ meanings are interwoven into three interrelated systems:

Information value: The placement of elements … endows them with the specific informational values attached to the various zones of the image: left and right, top and bottom, center and margin.

Salience: The elements are made to attract the viewer’s attention to different degrees, as realized by such factors as placement in the foreground or background, relative size, contrasts in tonal value (or colour), difference in sharpness, etc.

Framing: The presence or absence of framing devices (realized by elements which create dividing lines, or by actual frame lines) disconnects or connects elements of the image, signifying that they belong or do not belong together in some sense. (p. 183)

In other words, information value is linked to the relationship between how information is spatially arranged and its inherent meanings and implications. Salience and framing are related to how information is highlighted and how it is divided. Since these three elements are important in creating text coherence through combining and textualising semiotic resources, they are treated under the ‘textual’ meta-function in Halliday’s systemic-functional framework. Two other meta-functions are ‘ideational’ and ‘interpersonal’, which will be dealt with in detail below.

Particularly relevant to our discussion here is the idea of information value, which pays attention to how different verbal and visual elements can be arranged in such a way that specific meanings can be created.

It is certainly useful to apply these concepts in analyzing the different logico-semantic and symmetrical relations in a document. For example, the ‘identification’ relation, as a subordinate unit, is usually put on the right hand side (as NEW information²), and what is being identified is put on the left hand side (as GIVEN information).

In other words, what is designated as GIVEN on the left hand side usually coincides with the NUCLEUS element (the key element), whereas what is presented as NEW is closely related to the SATELLITE unit (the subordinate unit) on the right hand side. This is basically the same as in linguistic information:

    He arrived in Hong Kong      when he was 12.
    Nucleus                      Satellite (Enhancement: Time)
    Given/Familiar               New/Unfamiliar

As all document designers know, the arrangement of elements in a document is more than a random choice, and reflects rather complicated cultural and individual expectations. Kress and van Leeuwen touch on the issue by identifying what is regarded as ‘given’ on the left-hand side and what is regarded as ‘new’ on the right-hand side, following how people read English sentences: ‘for something to be Given means it is presented as something the viewer already knows, as a familiar and agreed-upon … For something to be New means that it is presented as something the viewer must pay special attention.’ (1996, p. 187). Other specific meanings may be created by other spatial arrangements. Centre position usually denotes a more important role of the unit compared with the marginal position, the top position means ‘ideal’, and the bottom position is usually related to more real and down-to-earth particularities. This is why attractive models (which take up most of the space) are always put in the central or ‘ideal’ position in an advertisement and disclaimers (which aim at reducing legal responsibilities) are placed at the very bottom.

There are, in fact, empirical findings that support these observations. Biber et al. (2002) argue, based on a large computerized language corpus, that most satellite clauses are placed after nucleus clauses and that only about 30% of satellite clauses are positioned at the beginning of a sentence or in the medial position. There are at least two reasons for this deviation. Firstly, satellite clauses can function as the ‘bridge’ between previous and subsequent discourse (Givon, 2000). Interestingly, this principle can be equally applicable to the relationship between verbal and visual elements. In the diagram below, the identification satellite in the middle is put on the left hand side of the diagram being illustrated, instead of the usual right position. This is mainly because the function of the identification satellite is to serve as a link between the two panels (those of the recorder and the remote control) to show the corresponding buttons. Secondly, from a linguistic point of view, in addition to their coherence function,

[Diagram label: Satellite: Identification (New/Unfamiliar Information)]

[Figure: evaluative types]

Evaluative type
  Epistemic
    Warrantability (doubt or certainty)
    Comprehensibility (observation or perception in addition to doubt or certainty)
  Attitudinal
    Desirability
    Normativity
    Usuality
    Importance
    Humorousness

and ‘obviously’ tend to express the writer’s conviction, whereas words such as ‘seemingly’, ‘apparently’ express some degree of doubt.

Other categories are expressions of more personal judgment about a statement or visual. Desirability assesses the extent to which something causes satisfaction or otherwise. Usuality expresses an attitude that something is expected or unexpected. Normativity assesses the appropriateness of an entity. Lastly, humorousness assesses the extent to which something can cause entertainment or has a humorous effect. The above figure summarizes the semantic relationships of the evaluative categories.

Metaphorical nature of relations

Like evaluation, metaphor may not be an inherent feature of all word-image relations but is important in various genres. To define it briefly, metaphor is a figurative expression that is transferred from one semantic domain to another. ‘Metaphor’ here includes all kinds of figures of speech such as similes, metonymy, synecdoche, hyperbole, and apostrophe, since the term ‘metaphor’ has been so widespread that it can be used as an umbrella term for all related terms (Chandler, 2004).

Metaphor has been studied mainly from cognitive and pragmatic perspectives, focusing on verbal data alone. The most notable exception is Marcus (1998), who studied the use of visual metaphors in user-interface design in computer documentation. The cognitive approach to metaphors argues that metaphors reflect not only superficial linguistic patterns, but also the existence of certain conceptual patterns in our minds (Lakoff and Johnson, 1980; Lakoff and Turner, 1989). This can be illustrated by the way cognitive linguists study metaphors. They identify the linguistic resources that are used to express a metaphor with small letters. For example, the metaphorical expression ‘I am quite rusty in Spanish’ is represented by small letters. This is known as a linguistic metaphor. When cognitive linguists refer to the conceptual frame that underlies the linguistic metaphor, which is known as a conceptual metaphor, they will use capital letters such as ‘MIND IS MACHINE’ (A = B). ‘A’ denotes the target domain (mind) and ‘B’ is the source domain (machine).

Metaphors have been frequently used to convince and persuade others. As Gibbs and Gerring (1989, p. 156) argue, metaphors play a crucial role in maintaining social and interpersonal relationships through the common ground of the speaker and the listener. The listener has

Again, previous studies on linguistic metaphors can be fruitfully applied to the study of visual metaphors. By focusing on printed advertisements, Forceville (1996) identified four types of pictorial metaphors: those with one pictorially present term, those with two pictorially present terms, pictorial similes and verbo-pictorial metaphors. Obviously, the last category is most relevant to the discussion here. According to Forceville, in verbo-pictorial synthesis, one of the domains (target or source) is realized verbally, and the other is realized pictorially. The removal of either element may result in the disappearance of a metaphorical relationship. Focusing on the use of metaphors in computer documentation, Marcus (1998) argues that metaphors are a significant component of user-interface designs (for example, the use of a trash bin to represent a file folder for discarded computer files) because metaphors can increase the level of familiarity of new concepts to readers and can also increase ease of learning, memorization and use. Horn (1998) points out that a metaphorical relationship between words and images can be elaborated by words at a higher level. For example, a picture that shows a plane taking off with the label ‘the Bradford project’ is a metaphorical relationship, which is anchored by a speaker’s sentence with a similar metaphorical meaning: ‘this project is really taking off’.

This figure can be illustrated diagrammatically as follows, together with the concepts of logico-semantic relations, symmetry/asymmetry and nucleus-satellite pair.

[Diagram]
  Specification
    Picture: Plane and the label
    Sentence: ‘This project is really taking off.’ (Inherently metaphorical)
  Identification (Metaphorical)
    The Bradford Project (Words)
    Plane (Visual)

In this diagram, the plane is the nucleus (the most important proposition in a message to which other more peripheral propositions are referred), not only
because it takes a central position in the picture but also
because the sentence ‘this project is really taking off’
clarifies or specifies the metaphorical relationship. It
should be noted that words and images may not have
any metaphorical relationship binding them, but as with
other classification values, metaphor can overlap with
other values. In the example above, the relationship
between the visual plane and the label (the Bradford
Project) is metaphorical and linked by the logico-semantic
relation of identification. The following chart
summarizes the different possible combinations of
text-image relations, or the ‘network’ of relations in
Halliday’s terms (1994):

Text-image relations
    Logico-semantic relations: Expansion | Projection | Decoration
    Hierarchy: Coordination (Nucleus-Nucleus) | Subordination (Nucleus-Satellite)
    Information value: Neutral | Right | Left | Centre | Top | Bottom
    Meta-functions: Ideational | Interpersonal | Textual
    Evaluative: Evaluative | Non-evaluative
    Metaphor: Metaphorical | Non-metaphorical

Figure 9. Network of relations

An example of analysis (full text in appendix)

This classification can be put into practice by examining
an extract from a travel guide. Travel guides exhibit a
range of styles, from content that is densely packed
with words without any pictures to content that is fully
illustrated with images and pictures. This reflects the
history of the genre – texts produced more than 10
years ago tend to be more word-based, whereas those
produced in the last 5 years are usually illustrated
with more pictures and images, although the degree
of integration between words and images can vary
considerably. The example selected combines words and
images in a rather integrated fashion and is what Horn
(1998) calls ‘visual language’.

The text is about a scenic spot in central India called
Fatehpur Sikri. The text, excluding the page numbers
and label at the top, can be divided into six blocks of
information. The name of the place ‘Fatehpur Sikri’, as the
smallest piece of information, is the first block (Block 1).
The second block serves as an introduction and is located
at the top left-hand corner, immediately underneath the
title (Block 2). It consists of an image of ‘Fretwork jali’
and a paragraph that introduces readers to the history
of the place. The most prominent aspect of this extract
is obviously the drawing of the place together with a
synthesis of words and photographs that branch out from
the drawing (Block 3). The fourth block, squared into a
box, is located at the top right-hand corner and is entitled
‘Visitors’ Checklist’ (Block 4). The last two blocks are
located at the bottom on the right-hand side. One is about
the sights that visitors should not miss (Block 5) and the
other is a plan of Fatehpur Sikri (Block 6). For the sake
of better organization, I will initially focus on the
logico-semantic relations, the symmetrical status and the
meta-functional distribution of these relations in the extract.
I will then move on to the spatial arrangement of these
relations, and finally the evaluative and metaphorical
nature of a number of the relations.
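
The network of relations in Figure 9 can also be written down as a small annotation schema, which is one way the framework might be operationalized for corpus work. The following is only a minimal sketch under stated assumptions: the option names are taken from Figure 9, but the identifiers (NETWORK, RelationAnnotation) and the coding of the Bradford Project example are hypothetical and are not part of the original framework.

```python
from dataclasses import dataclass
from typing import Optional

# Option names are copied from Figure 9; the dictionary keys are
# hypothetical identifiers chosen for this sketch.
NETWORK = {
    "logico_semantic": {"expansion", "projection", "decoration"},
    "hierarchy": {"coordination", "subordination"},
    "information_value": {"neutral", "right", "left", "centre", "top", "bottom"},
    "meta_function": {"ideational", "interpersonal", "textual"},
    "evaluative": {"evaluative", "non-evaluative"},
    "metaphor": {"metaphorical", "non-metaphorical"},
}


@dataclass(frozen=True)
class RelationAnnotation:
    """One word-image relation; a system left as None is simply not coded."""

    logico_semantic: Optional[str] = None
    hierarchy: Optional[str] = None
    information_value: Optional[str] = None
    meta_function: Optional[str] = None
    evaluative: Optional[str] = None
    metaphor: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject any value that is not an option of its system in Figure 9.
        for system, options in NETWORK.items():
            value = getattr(self, system)
            if value is not None and value not in options:
                raise ValueError(f"{value!r} is not an option of {system!r}")


# Assumed coding of the Bradford Project example: identification is treated
# as a kind of elaboration and therefore of expansion, and the plane is the
# nucleus, so the label is coded as subordinate to it.
bradford = RelationAnnotation(
    logico_semantic="expansion",
    hierarchy="subordination",
    metaphor="metaphorical",
)
print(bradford)
```

Because each system is a separate field, a relation can be metaphorical and elaborative at the same time, which mirrors the point that metaphor can overlap with other classification values.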
Logico-semantic relations and their hierarchical representation
and meta-functional distribution

At the macro-level, the three-dimensional picture
with its elaboration can be regarded as the nucleus of
information. All other blocks are satellites to it. But
what are their binding relationships? The first block is
the IDENTIFICATION of the nucleus by a satellite. The
second block, giving background historical information
to the main block, MOTIVATES the readers to read
the rest of the page by conveying the importance
of the scenic spot. Of course, readers can skip this
information and jump to any part of the extract, but
that is the intended function because it is put in the top
left-hand corner, orienting readers to it as the first piece
of information by following the normal reading sequence
from left to right in English. This also qualifies as
an enhancement relation because it does not elaborate
but adds new information to the nucleus. Similar to the
MOTIVATION block, the fourth block, the ‘Visitor’s
Checklist’, adds further information about the site
(instead of simply elaborating), such as exact location,
address and telephone number of the information
centre, and other important information about the site.
Hence, this qualifies as a MEANS relation, also under
the category of enhancement. The last two blocks do
not give new information, and mainly elaborate what
is already there in the nucleus. The plan of Fatehpur
Sikri EXPLAINS the spatiality of the site, whereas the
‘Star Sights’ block reinforces what is also highlighted
in the nucleus by SPECIFYING which spot should not
be missed. This also has an IDENTIFYING function
by telling the reader exactly what a star sight is. The
following diagram shows the logico-semantic relations
found in the extract:
The foregoing analysis has only identified the nucleus
and satellites of the larger blocks of information, and
the logico-semantic relations that bind them together.
Obviously each block of information has its own
internal structure that is at the same time marked by a
hierarchical layer (or layers) of relations. To explain
how this internal hierarchical structure works, I will
refer to the central block (Block 3) as an example. The
three-dimensional map with a centrifugal elaboration
of words and images can be seen as a ‘composite’ of
different sub-blocks linked by COLLECTION, as a
type of extension that adds new information. Each
element in this composite is also supported by a number
of IDENTIFICATION satellites, which label the
individual spots in the place. What about these satellites
in relation to the photographs and short paragraphs that
go along with them? The photographs can be regarded
as nuclei that invite readers to COMPARE (i.e. the
VARIATION relation) them with the corresponding
section in the map. The short paragraph that is usually
placed underneath each photograph is its anchor or
EXPLANATION. The complicated relationship of the
main block (Block 3) can be illustrated as follows.

[Diagram labels: Central Block; Extension: Collection;
Elaboration: Identification; Text; Photograph;
Corresponding part in the 3-D map]

Figure 11. Complicated relationship of the main block

The following table summarizes the distribution of all
the relations found in the extract.

Meta-functions   Logico-semantic relations     Number   Percentage
Ideational       Elaboration: Identification       18        32.7%
                 Extension: Collection             12        21.8%
                 Elaboration: Explanation           9        16.4%
                 Extension: Variation               7        12.7%
                 Elaboration: Specification         5         9.1%
                 Sub-total                         51        92.7%
Interpersonal    Decoration                         2         3.6%
                 Enhancement: Motivation            1         1.8%
                 Enhancement: Means                 1         1.8%
                 Sub-total                          4         7.3%
Textual                                             0
Total                                              55         100%

Figure 12. Distribution and percentage of logico-semantic
relations
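
The distribution reported in Figure 12 can be recomputed from the raw counts, which is the kind of quantitative step the framework is meant to support. The sketch below is illustrative only: the counts are those given in Figure 12, while the variable and function names (COUNTS, share) are assumptions made for this example, and percentages are rounded to one decimal place.

```python
# Counts of logico-semantic relations per meta-function, as given in Figure 12.
COUNTS = {
    "Ideational": {
        "Elaboration: Identification": 18,
        "Extension: Collection": 12,
        "Elaboration: Explanation": 9,
        "Extension: Variation": 7,
        "Elaboration: Specification": 5,
    },
    "Interpersonal": {
        "Decoration": 2,
        "Enhancement: Motivation": 1,
        "Enhancement: Means": 1,
    },
    "Textual": {},
}

# Total number of relations identified in the extract.
total = sum(n for relations in COUNTS.values() for n in relations.values())


def share(n: int) -> float:
    """Percentage of all relations, rounded to one decimal place."""
    return round(100 * n / total, 1)


# Sub-total of relations for each meta-function.
subtotals = {mf: sum(relations.values()) for mf, relations in COUNTS.items()}

for meta_function, relations in COUNTS.items():
    for name, n in relations.items():
        print(f"{meta_function:<15}{name:<30}{n:>3}{share(n):>7.1f}%")
    sub = subtotals[meta_function]
    print(f"{meta_function:<15}{'Sub-total':<30}{sub:>3}{share(sub):>7.1f}%")
print(f"{'Total':<45}{total:>3}")
```

Keeping the counts grouped by meta-function means the sub-totals fall out of the same structure, so the table can be regenerated whenever further extracts are annotated.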
It is clear from the table above that most of the relations
are elaborative, particularly that of IDENTIFICATION.
The second most frequent relation is COLLECTION,
under the category of extension. EXPLANATION is
the third most frequent relation. This is not surprising
at all because the main communicative purpose
of travel guides is to identify where to find certain
places and then explain why they are worth visiting.
COLLECTION is frequently used to organize aspects of
the explanation. COMPARISON is used in this extract
because photographs are used to show images of the
spot to increase the readability of the document. The
less frequent relations found are SPECIFICATION,
DECORATION, MOTIVATION and MEANS. In
other words, the types and distribution of relations are
consistent with the overall purpose of the document
in question. In terms of the meta-functions of the
relations, most of the relations are ideational, that is,
they convey core information about the place. Although
interpersonal meta-functions constitute only 7.3% of the
total relations, they are important in positioning readers
and constructing the image of a professional writer who
understands the needs of a traveller.

Spatial arrangement, and evaluative and metaphorical
functions

In terms of the information value of spatial
arrangement, the nucleus is in the central position.
To put this another way, the picture is regarded as the
nucleus of all relations mainly because it is in the central
position and takes up most of the space. What I am
trying to argue is that the information value of a relation
is both its constructive and constitutive feature. As I
explained above, the NUCLEUS-SATELLITE pattern
usually coincides with the GIVEN-NEW pattern, but
there are always other considerations – regardless of
whether they are ideational, interpersonal or textual
– when a document designer makes a decision. This
extract can illustrate some of these considerations. The
most obvious example is the picture. The specific spots
in the picture are identified by text and photographs
on every side of it, instead of its right hand side, which
is supposed to be the position for a satellite. However,
the document designer still follows what is regarded as
the usual way of an identification relation in the block
about the floor plan (Block 6). What is being labelled
is put on the left hand side and the labels are put on
the right hand side. Also note the placement of the
background information (as a MOTIVATION satellite
relation) at the top left hand corner, while the ‘Visitor’s
Checklist’ (as a satellite relation of MEANS) is in the top
right hand corner. In sum, the information value of a
document is rather complex and can only be understood
in terms of the specific layout in question, in addition
to the conventional understanding of what we mean by
information value.

Evaluation is not an inherent feature of all relations
and is usually embedded in a logico-semantic relation.
In the extract, evaluation is frequently found in the
verbal elaboration of the real pictures. Some of them are
related to how beautiful or desirable a certain feature is
(desirability):

    lie within this lavishly decorated ‘Chamber of Dreams.’

    Akbar’s queens and their attendants savoured the cool
    evening breezes.

    The fine dado panels and delicately sculpted walls of
    this ornate sandstone pavilion make the stone seem like
    wood.

Some linguistic expressions are used to anchor the visual
with highlights of its uniqueness (Usuality/unusuality,
not Normativity, which usually refers to the idea of
appropriateness):

    This hall for private audience and debate is a unique
    fusion of different architectural styles and religious
    motifs.
    It is topped with an unusual stone roof of imitation clay
    tiles.

The evaluative category of warrantability is also found in
the verbal elaboration:

    Its decorative screens were probably stolen after the city
    was abandoned.

    Sometimes identified as the treasury…

These are the main evaluative categories in the extract,
and are consistent with the communicative functions
of a travel guide. Readers expect a travel guide to
be informative, elaborate and accurate. At the same
time, writers of the text also have to convince readers
that a certain place is worth visiting, which explains
why desirability and usuality/unusuality are most
frequently used. This also highlights the persuasive
function of travel guides, in addition to the commonly
acknowledged function of giving factual information.
To summarise, evaluative meanings that are usually
embedded in a larger segment of a logico-semantic
relationship can be used to anchor pictures or words
(words anchoring pictures in the extract) by supporting
the broader communicative function of a text. This is
subtle, but can be argued to be more powerful.

The last characteristic to be discussed here is metaphor.
Similar to evaluation, metaphor is NOT an
inherent feature of a relation. It is possible to classify a
relation into a logico-semantic type or call it a nucleus
or satellite, but it may not be possible to identify a
word-image relation of any metaphorical nature. In the
sample extract, a relation that is related to metaphorical
meaning can be found in the information block about
the star sights. In fact, the heading ‘Star Sights’ is
metaphorical in itself. Star belongs to the domain of
‘universe’ and does not have any direct connection to
scenic spots. The associated meaning of stars is always
attached to importance in everyday discourse, and the
same meaning is invoked here. Star sights are places that
tourists should not miss. This inherently metaphorical
heading is used to identify the two important sights: the
Turkish Sultana’s House and Panch Mahal, which are
preceded by a star ‘*’. The relationship between the ‘star’
and the names of the places is one of identification, and
is also metaphorical.

Conclusion

Although the importance of turning visual information
into empirical data has been highlighted in the recent
literature on multimodality, there remains a gap
between what has been said and what can be done to
study multi-modal discourse. Kaltenbacher (2004, p.
203) succinctly argues for a need “to develop systematic
methods for theoretically developing, systematically
analyzing and empirically testing the semiotic unfolding
of resources and modes and their combinations
in all aspects of our daily lives.” Some recent attempts
(Bateman et al., 2004; Taylor, 2003) have been made to
transcribe or annotate multi-modal discourses so they
can be stored and further studied. However, a systemic
classification of the schematic or functional relationship
between text and images is still unexplored, as language
has its verbal, pictorial and schematic modes (Walker,
2001, p. 177).

Current studies either take an over-simplified view
of the issue or use existing text-based models (such as
that of Mann and Thompson, 1988) as their framework
without any modification, ignoring the potential
differences between words and images. Further studies
in word-image relations will not only be important in
testing assumptions about these two modes (such as
‘only words are used to anchor pictures’), but will also
be important in two other respects. First, they will offer
practical guidelines to document designers on how to
use words and visuals in a more effective way. Second,
they will offer an understanding of how new meanings
are created in this word-picture fusion. Although
increasing attention is being paid to meaning making
and negotiation in verbal language or conversational
discourse, this important – subtle, but more powerful
– meaning-making device has not received enough
consideration.

Hence, this paper identifies the potential relations
between words and images. Most importantly, it argues
for multiple levels of relations and the need for empirical
analysis. The framework, shown using different
types of data, can be applied to other types of discourse
including web-based genres and visual-based genres.
Owing to space limitations, many concepts, particularly
the various logico-semantic relations, have not been
examined here in great detail, and only a limited set of
examples has been used as illustrations. The potentially
subjective and ambiguous nature of some verbal-visual
relationships will need to be followed up in future
studies.

Acknowledgements

* I would like to thank the Research Grant Committee of
Hong Kong SAR for their generous support to the project on
which this paper is based (RGC research project # HKBU2162/03H).
Every attempt has been made to obtain permission to
reproduce copyright material. If any proper acknowledgement
has not been made, or permission not received, I would be glad
to hear from the copyright holders.

Notes

1. It should be highlighted that the idea of subordination and
coordination or hypotaxis and parataxis is also related to the
structural hierarchy of clauses. One of the clauses may have
a higher standing and can stand alone, and the other is lower
and cannot exist alone. Although this may be applicable to
some of the relations identified in this study, the hierarchical
units examined in this paper are based on meaning as the sole
criterion for distinction, i.e. whether a unit is subordinate to
another in terms of their meaning. This is more useful since
visuals do not have some of the structural characteristics of
verbal language. Moreover, in the case of verbal examples, the
units of analysis can be a word, a clause, a sentence or even a
paragraph although clauses are used as examples.

2. It should be noted, however, that the ‘new’ element in
the GIVEN-NEW pair is not the same as new information in
the classification of logico-semantic relations in which new
information is related to the presence of information new to
the nucleus. New information in terms of information value
concerns the degree of familiarity to the readers/viewers. This
is the reason why alternative phrasing such as ‘Given-Unfamiliar’
or ‘Familiar-Unfamiliar’ might be more suitable in
order to distinguish the two classification systems.

3. In systemic-functional linguistics, evaluation is usually
regarded as an interpersonal resource only, through which an
evaluator is used to consciously influence a reader towards
some ideational content. I have to thank an anonymous reviewer
for pointing this out.

4. The epistemic function of evaluation is closely related to
modality in systemic-functional linguistics and is for adjusting
a writer’s stance towards a proposition. Modal verbs such as
‘may’ fall into this category.

References

Barthes, R. (1967). Elements of Semiology. London: Cape.
Barthes, R. (1978). Image-Music-Text. New York: Hill and Wang. Reprinted in S. Sontag (Ed.) (1993). A Barthes Reader. London: Vintage.
Bateman, J., Delin, J., & Henschel, R. (2004). Multimodality and empiricism: Preparing for a corpus-based approach to the study of multimodal meaning-making. In E. Ventola, C. Charles & M. Kaltenbacher (Eds.), Perspectives on Multimodality (pp. 65–87). Amsterdam: John Benjamins.
Bhatia, V. K. (1993). Analysing Genre: Language Use in Professional Settings. London: Longman.
Biber, D., Conrad, S., & Leech, G. (2002). Longman Student Grammar of Spoken and Written English. Harlow, Essex: Longman.
Bonsiepe, G. (1966). Visual-verbal rhetoric. Dot Zero, 2, 37–38.
Chandler, D. (2004). Semiotics: The Basics. London: Routledge.
Charteris-Black, J. (2004). Corpus Approaches to Critical Metaphor Analysis. New York: Palgrave Macmillan.
Delin, J., Bateman, J., & Allen, P. (2003). A model of genre in document layout. Information Design Journal, 11(1), 54–66.
Eckkrammer, E. M. (2004). Drawing on the theories of inter-semiotic layering to analyse multimodality in medical self-counselling texts and hypertexts. In E. Ventola, C. Charles & M. Kaltenbacher (Eds.), Perspectives on Multimodality (pp. 211–226). Amsterdam: John Benjamins.
Eco, U. (1976). A Theory of Semiotics. Bloomington: Indiana University Press.
Fogg, B. J., Marshall, J., Laraki, O., Osipovich, A., Varma, C. et al. (2001). What makes websites credible? A report on a large quantitative study. SIGCHI, March 31-April 4, 2001, Seattle, WA, USA.
Forceville, C. (1996). Pictorial Metaphors in Advertising. London: Routledge.
Gibbs, R. W., & Gerrig, R. J. (1989). How context makes metaphor comprehension seem ‘special’. Metaphor and Symbolic Activity, 4(3), 145–158.
Givon, T. (2000). Syntax Vol. 2. Amsterdam: John Benjamins.
Greenbaum, S. (1969). Studies in English Adverbial Usage. London: Longman.
Grimes, J. (1975). The Thread of Discourse. The Hague: Mouton.
Halliday, M. A. K. (1994). An Introduction to Functional Grammar. London: Arnold.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London and New York: Longman.
Horn, R. E. (1998). Visual Language: Global Communication for the 21st Century. Bainbridge Island, WA: MacroVU, Inc.
Hunston, S., & Thompson, G. (2000). Evaluation in Text: Authorial Stance and the Construction of Discourse. Oxford: Oxford University Press.
Kaltenbacher, M. (2004). Perspectives on multimodality: From the early beginnings to the state of art. Information Design Journal + Document Design, 12(3), 190–207.
Kong, K. C. C. (forthcoming). Linguistic resources as evaluators in English and Chinese research articles. Multilingua.
Kress, G. (1998). Visual and verbal modes of representation in electronically mediated communication: the potentials of new forms of text. In I. Snyder (Ed.), Page to Screen: Taking Literacy into the Electronic Era. London, New York: Routledge.
Kress, G., & Van Leeuwen, T. (1996). Reading Images: The Grammar of Visual Design. London: Routledge.
Kress, G., & Van Leeuwen, T. (1998). Analysis of newspaper layout. In A. Bell & P. Garrett (Eds.), Approaches to Media Discourse. Oxford: Blackwell.
Kress, G., & Van Leeuwen, T. (2001). Multimodal Discourse: The Modes and Media of Contemporary Communication. London: Arnold.
Lakoff, G., & Johnson, M. (1980). Metaphors We Live By. Chicago: University of Chicago Press.
Lakoff, G., & Turner, M. (1989). More than Cool Reason: A Field Guide to Poetic Metaphor. Chicago: University of Chicago Press.
Lemke, J. C. (1998). Resources for attitudinal meaning: evaluative orientations in text semantics. Functions of Language, 5(1), 33–56.
Lemke, J. C. (2002). Travels in hypermodality. Visual Communication, 1(3), 299–325.
Lemke, J. C. (work in progress). Visual and Verbal Resources for Evaluative Meaning in Political Cartoons.
Lie, H. K. (1991). The Electronic Broadsheet: All the News that Fits the Display. Master’s Thesis, Boston, School of Architecture and Planning, MIT. http://www.bilkent.edu.tr/pub/WWW/People/howcome/TEB/www/hwl_th_1.html.
Mann, W., & Thompson, S. A. (1988). Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3), 242–281.
Marcus, A. (1979). Visual rhetoric in a pictographic-ideographic narrative. In T. Borbe (Ed.), Semiotics Unfolding: Proceedings of the Second Congress of the International Association for Semiotic Studies Vienna, July 1979.
Marcus, A. (1998). Metaphor design in user interfaces. Journal of Computer Documentation, 22(2), 43–57.
Marsh, E. E., & White, M. D. (2003). A taxonomy of relationships between images and text. Journal of Documentation, 59(6), 647–672.
Martin, J. R. (1992). English Text: System and Structure. Amsterdam: John Benjamins.
Martin, J. R. (2000). Beyond exchange: appraisal systems in English. In S. Hunston & G. Thompson (Eds.), Evaluation in Text: Authorial Stance and the Construction of Discourse (pp. 142–175). Oxford: Oxford University Press.
Martin, J. R., & White, P. R. R. (2005). The Language of Evaluation: Appraisal in English. New York: Palgrave Macmillan.
Oyama, R. (1999). Visual semiotics in a cross-cultural perspective: a study of visual images in Japanese and selected British advertisements. Unpublished PhD dissertation, University of London.
Pegg, B. (2002). Two dimensional features in the history of text format: how print technology has preserved linearity. In N. Allen (Ed.), Working with Words and Images. Westport, CT: Ablex Publishing.
Peirce, C. S. (1931-58). Collected Writings (Vols. 1–8). In C. Hartshorne, P. Weiss & A. W. Burks (Eds.). Cambridge, MA: Harvard University Press.
Richards, A. R., & David, C. (2005). Decorative color as a rhetorical enhancement on the world wide web. Technical Communication Quarterly, 14(1), 31–48.
Royce, T. (2002). Multimodality in the TESOL Classroom: Exploring Visual-Verbal Synergy. TESOL Quarterly, 36(2), 191–205.
Saussure, F. de ([1916] 1974). Course in General Linguistics (trans. Wade Baskin). London: Fontana/Collins.
Still, J. M. (2001). A content analysis of university library Web sites in English speaking countries. Online Information Review, 25(3), 160–164.
Stroupe, C. (2000). Visualizing English: Recognizing the hybrid literacy of visual and verbal authorship on the Web. College English, 62(5), 607–632.
Taylor, Ch. (2003). Multimodal Transcription in the Analysis, Translation and Subtitling of Italian Films. The Translator, 9(2), 191–205.
Van Leeuwen, T. (2005). Introducing Social Semiotics. New York: Routledge.
Van Leeuwen, T., & Jewitt, C. (Eds.) (2000). Handbook of Visual Analysis. London: Sage.
Walker, S. (2001). Typography and Language in Everyday Life. London: Longman.
White, P. (2001). Appraisal website. Retrieved October 10, 2001 from http:///www.grammatics.com/appraisal/Appraisal Guide

Sources of examples

Eastern Europe (2001). Lonely Planet Press, 6th edition.
Horn, Robert E. (1998). Visual Language: Global Communication for the 21st Century. Bainbridge Island, WA: MacroVU, Inc.
India Travel Guide (2002). London: Dorling Kindersley (Mark Warner © Dorling Kindersley).

About the author

Kenneth Kong is an associate professor of linguistics in the
English department of Hong Kong Baptist University. His
academic interests include discourse analysis, multimodal
analysis, intercultural pragmatics and language for specific
purposes. He has published extensively in the areas of
discourse analysis and pragmatics.

Contact

Kenneth Kong
Department of English Language and Literature
Hong Kong Baptist University
Waterloo Road
Kowloon
Hong Kong
e-mail: kkong@hkbu.edu.hk
Mark Warner© Dorling Kindersley. Reproduced with permission, except for the top left photo on the left page (© Ashok
Dilwaki) and the top right and top left photos on the right page (© DN Dube).