Educating the eye?

Kress and Van Leeuwen’s Reading Images:
The Grammar of Visual Design (1996)
Charles Forceville, Vrije Universiteit Amsterdam/
Rijksuniversiteit Leiden (OSL), The Netherlands


This review article of Kress and Van Leeuwen’s (KvL) Reading Images: The Grammar
of Visual Design (1996) begins by giving a summary of its main issues, and highlights
its innovative and bold proposals. In the following sections, some weaknesses and
controversial aspects of the book are discussed. Both are seen as following from the
semiotic and ideological approach adopted by the authors. Specifically, these affect the
proposals for the classification and interpretation of images, and the degree to which the
concepts delineated are generalizable. In the later sections, tentative suggestions are
made as to how KvL’s approach is relevant to the currently emerging ‘cognitivist’

Keywords: categorization; cognitivism; genre; ideological criticism; interpretation of
images; metaphor; relevance theory; semiotics; word & image relations

1 Introduction

Although contemporary society is flooded with images, we have few studies that
provide practical suggestions for the analysis of images and word & image ‘texts’
as distinct from more or less theoretical reflections on that topic (e.g. Arnheim,
1969; Aumont, 1997; Mitchell, 1986; Sonesson, 1988; Thompson, 1996).
Gunther Kress and Theo Van Leeuwen make a courageous attempt to help fill
the glaring gap with a book ambitiously titled Reading Images: The Grammar of
Visual Design, a revised version of their earlier Reading Images (1990). In this
review article I will first present an outline of the book’s contents. Given the wide
range of topics Kress and Van Leeuwen (henceforth KvL) address, this outline
cannot be complete, but it briefly touches upon the study’s main issues and gives
samples of their approach. Subsequently I discuss some issues in more detail,
indicating where KvL’s ideas in my view require qualification. To come clean
straight away: I find the book exciting, thought-provoking and readable, but I have
serious misgivings about a number of methodological issues, and some hesitation
about the ideological framework. These significantly affect the ‘tool-kit’ character
of the book, which thus is not quite the unproblematic textbook its title promises.
In the later sections of this article I will suggest how KvL’s views might be
embedded in a cognitivist approach – an approach which I believe will ultimately
yield a more inclusive theory of the image than KvL’s semiotically and
ideologically oriented one.

Language and Literature Copyright © 1999 SAGE Publications
(London, Thousand Oaks, CA and New Delhi), Vol 8(2): 163–178
[0963–9470 (199906) 8:2; 163–178; 008171]

2 Survey of the contents of Reading Images

In the introduction the authors explain that theirs is a ‘social semiotics’ inspired
by Hallidayan grammar. They ‘intend to provide inventories of the major
compositional structures which have become established as conventions in the
course of the history of visual semiotics, and to analyse how they are used to
produce meaning by contemporary image-makers’ (p. 1). Because of both the
ubiquity of images and the communicative and/or manipulative purposes of their
makers, KvL claim, it is important that people should be trained to interpret
images. The authors are careful to point out that their grammar is not a universal
one but purports to account only for images in western society, and acknowledge
moreover that even here there may be regional and social variation. They further
presuppose that there is no fundamental difference, but rather a continuum,
between creative, artistic uses of pictures and communicative, ordinary uses. But
for each type of image they argue that the inclusion or exclusion of details, and
the manner of execution, can have ideological implications, and emphasize their
concern with this aspect of pictures, seeing their work as ‘a tool for practical as
well as critical applications in a range of fields’ (p. 14).
The first chapter elaborates a number of issues raised in the introduction.
Barthes’s (1986/1964) famous concept of text ‘anchoring’ or ‘relaying’ the
images it accompanies is discussed and criticized. KvL think that Barthes
concentrates too much on the interdependence of word & image; by contrast they
see the visual component of a text as ‘an independently organized and structured
message – connected with the verbal text, but in no way dependent on it: and
similarly the other way around’ (p. 17). Consequently, they take the view that
‘language and visual communication both realize the same more fundamental and
far-reaching systems of meaning that constitute our cultures, but that each does so
by means of its own specific forms, and independently’ (p. 17), although ‘not
everything that can be realized in language can also be realized by means of
images, or vice versa’ (p. 17). To suggest how the visual affects the way we make
sense of the world from a very early age, the chapter extensively discusses two
illustrations in children’s books. The authors persuasively argue that while the
non-linear nature of the page in one of them, featuring a central drawing and four
smaller ones in the corners, seems to turn it into an open-ended text, the number
of possible readings is in fact quite restricted, and almost inescapably incorporates
certain binary oppositions. The growing role of pictures in contemporary
children’s books (and in virtually every other type of text), KvL claim, changes
the way information is presented and has implications for what is presented. The
chapter ends with a brief discussion of the three functions any semiotic mode has
to fulfil in order to serve its communicational and representational purposes.
These three ‘metafunctions’, adapted from Hallidayan grammar, are not limited to
a specific medium. The ‘ideational metafunction’ pertains to the ways in which
semiotic systems can refer to objects in the outside world, and the relations
between these objects, the ‘interpersonal metafunction’ deals with the relations
between sender and receiver of the sign, and the ‘textual metafunction’ accounts
for the options available to ensure that signs form complexes of signs, that is,
‘texts’. The three metafunctions structure Chapters 2 to 6 of the book.
In Chapter 2, KvL introduce the notion of ‘vector’ as the pictorial equivalent of
the action verb. Real or virtual lines between human elements in a picture
function in ways similar to verbs describing relations between what in Hallidayan
grammar are actors and goals. Since actions presuppose human or human-like
agency, vectorial patterns are called ‘narrative’, six major types being identified.
These types are to be contrasted with conceptual pictures, which represent
participants ‘in terms of their class, structure or meaning, in other words, in terms
of their generalized and more or less stable and timeless essence’ (p. 56). KvL
point out that the distinction applies not only to naturalistic pictures but also to
Three main types of conceptual representations are identified in Chapter 3. The
first is constituted by classificational processes, which relate participants to one
another in a taxonomy on the basis of some feature they share. KvL nicely
illustrate that classificational taxonomies have a tendency to equate all elements
depicted on the same level in terms of one dimension, and that this may disguise
crucial inequalities. The physical orientation of taxonomies (top–down,
bottom–up, left–right) is discussed and some implications are suggested. Even the
way in which diagrammatic lines in classificational structures are drawn is not
neutral, KvL suggest: straight and curved lines evoke connotations of cold
rationality and organicity respectively. The second type of conceptual process is
labelled ‘analytical’ and pertains to the depiction of part–whole structures. The
third type is ‘symbolic processes’. These pertain to the relation between some
element in a picture and what it symbolizes.
Chapter 4 shifts to the interaction between pictures and their viewers. A first
important difference arises from whether or not participants in pictures look
directly at the viewer. In the former case the participant appeals to the viewer, in a
so-called ‘demand’ picture; in the latter case the participant is the object rather
than the subject of the look. These latter are ‘offer’ pictures.1 KvL claim that in an
Australian primary school textbook the Aboriginal people are typically depicted
as ‘offers’, and hence as ‘objects of contemplation’ (p. 126), while white
immigrants are rendered as ‘demands’. But the authors acknowledge that
sometimes (e.g. in film and newsreading) it is simply genre conventions which
dictate the choice between offers and demands. Similar valuations may adhere to
the distance of the depicted participants, objects or events from the camera. Close-
up, medium shot and long shot suggest increasing social distance, but again,
certain frame sizes have become conventionalized in certain types of depiction. In
a later section, KvL propose a bold correlation between involvement with the
depicted participants and the horizontal angle, which can be frontal or oblique.
Discussing two photographs of Aborigines, they conclude: ‘The frontal angle
says, as it were: “what you see here is part of our world, something we are
involved with.” The oblique angle says: “what you see here is not part of our
world; it is their world, something we are not involved with.” The producers of
these two photographs have, perhaps unconsciously, aligned themselves with the
white teachers and their teaching tools, but not with the Aborigines’ (p. 143,
emphasis, as elsewhere, in original) – and the viewer has no choice but to share
their perspective.
Pictures reflect different claims to verisimilitude. Chapter 5 deals with the
degree of a picture’s commitment to ‘truthfulness’ to reality, a dimension of
pictures KvL call ‘modality’.2 In language, (epistemic) modality is expressed by
such auxiliary verbs as may, will and must, and adjectives such as possible,
probable and certain. KvL distinguish between eight dimensions that co-
determine the degree of naturalness of a picture. It is pointed out that what
constitutes naturalness, and hence also deviation from naturalness, via any of the
eight dimensions, may vary across different realms of pictures.
‘The meaning of composition’ is addressed in Chapter 6. Pictures, including
multimodal ‘texts’, give significant information through the ways in which their
elements are arranged. Three aspects are distinguished: the ‘zone’ in which an
element occurs (left/right, bottom/top, centre/margin); the ‘salience’ bestowed on
it (via foregrounding/backgrounding, relative size, colour, etc.); and ‘framing’
devices such as vectors between participants. Particularly in the discussion of
zones, KvL come up with bold proposals. They suggest that (in western society)
the left is the region of the ‘given’ and the right the region of the ‘new’; and that
the top is the region of the ‘ideal’, whereas the bottom depicts the ‘real’.
Centre–margin spatial structures obviously stress the former at the expense of the
latter. Left–right and top–down orientations often combine with centre–margin
ones, KvL argue, for instance in so-called ‘triptychs’. The last section is devoted
to a discussion of the greater flexibility in reading paths in pictorial or
multimedial texts than in (more linear) verbal texts.
Meaning inheres not simply in what is depicted, but also in how the depiction
is conveyed materially. This issue is the central concern of Chapter 7. KvL
distinguish three major categories: inscription by hand, recording technologies
and synthesizing technologies. Each of these ‘modes of inscription’ models its
own relations between the producer and receiver of an image, while the
distribution of the image, too, is affected. Types of brushstroke (vigorous, thin,
pointillist, etc.) and surface materials (canvas, marble, wax, etc.) co-determine the
overall impact and connotations that a representation is likely to realize.
In their last full chapter, KvL tentatively explore the area of the three-
dimensional image, ranging from sculptures to children’s toys. While many of the
concepts delineated with respect to the two-dimensional image are applicable to
the three-dimensional as well, there are some obvious differences. For instance,
there are often no fixed perspectives from which to look at sculptures, spatial
orientations (left–right, top–bottom, centre–margin) being more flexible; and the
physical setting in which they appear may differ from one museum or gallery to
another. Furthermore, certain three-dimensional objects (e.g. Playmobil toys)
encourage interactive behaviour.
KvL indeed cover an impressive range of aspects and types of pictures. The
choice of breadth inevitably means that a price is paid in terms of depth, so that in
some cases the analyses only whet the reader/viewer’s appetite for more detailed
analysis and more examples, while in others there remains a gap between the
analytical tools proposed and their practical use. But I discern also a number of
serious problems pertaining to methodology and perspective in KvL’s approach.
In the following sections I will outline and discuss my doubts and criticisms. I
present them in the awareness that KvL are pioneering largely unexplored
territory. My observations aim at marking some pitfalls and pointing out
distortions inherent in the type of map KvL have drawn.

3 Classifications

KvL are social semioticians, and it is typical of semiotic approaches that they
present phenomena in terms of oppositions, often in grid patterns or tree
diagrams. This, of course, is in itself a commendable way of defining similarities
and differences. Charting a new field involves categorizing, if possible
hierarchically, a hitherto undivided mass of data. KvL regularly summarize the
pertinent distinctions they have found in a tree diagram, followed by a section,
‘Realizations’, in which they briefly describe the sub-categories (some of the
either/or type, others of the and/and type) in the tree. The usefulness of these
hierarchical categories will depend on their applicability to new pictures. While
the authors describe and illustrate all their categories, not all of them are included
in the ‘Realizations’ sections. Why not? The number of levels distinguished can
amount to no fewer than six (p. 107, see Figure 1 in Appendix). We need all the
help and examples we can get to be persuaded that the schema in Figure 1 is
correct and applicable, but often we have to make do with a short description and
a single example of each slot in the schema – and the (arbitrary?) absence of
several subtypes in the ‘Realizations’ section is irritating. A more serious problem
is that according to Figure 1 ‘inclusive spatial structures’ cannot be conductive,
since ‘conductivity’ is one of the subdivisions of ‘exhaustive’, but not of
‘inclusive’. Exhaustive structures, KvL explain, depict all the elements their
carriers comprise, whereas inclusive structures do not. The latter select only a few
elements for depiction. Thus a technical drawing of a machine may (exhaustively)
depict all of its parts, or it may (inclusively) highlight only some of them. Now
‘conductors’ are said to ‘indicate a potential for dynamic interaction between the
Possessive Attributes they connect’ (p. 100). Examples of conductors are a
pipeline, a road, a railway track, but they may also be of a more abstract kind. It is
clear that an exhaustive technical drawing of a machine can be conductive, but I
cannot see why an inclusive technical drawing of a machine may not be equally
conductive. However, in the scheme this possibility is excluded.
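The structural point can be made concrete with a small sketch. Only the labels 'exhaustive', 'inclusive' and 'conductive' come from the discussion above; the label 'non-conductive', the dictionary layout, the function names and the cross-classifying alternative are assumptions introduced here for illustration, not KvL's own formalization. In a strict tree, 'conductive' can be reached only through 'exhaustive'; treating completeness and conductivity as two independent features would admit the inclusive conductive drawing.

```python
# Hypothetical encoding of a fragment of the Figure 1 taxonomy as a strict tree.
# 'non-conductive' and the dict structure are illustrative assumptions.
TREE = {
    "exhaustive": {"conductive": {}, "non-conductive": {}},
    "inclusive": {},
}


def tree_allows(path):
    """Return True if the sequence of labels is a valid path from the root."""
    node = TREE
    for label in path:
        if label not in node:
            return False
        node = node[label]
    return True


# Alternative (assumed) encoding: two independent features that cross-classify.
COMPLETENESS = {"exhaustive", "inclusive"}
CONDUCTIVITY = {"conductive", "non-conductive"}


def features_allow(completeness, conductivity):
    """Return True if each value is a legal value of its own feature."""
    return completeness in COMPLETENESS and conductivity in CONDUCTIVITY


if __name__ == "__main__":
    print(tree_allows(["exhaustive", "conductive"]))   # path exists in the tree
    print(tree_allows(["inclusive", "conductive"]))    # excluded by the tree
    print(features_allow("inclusive", "conductive"))   # admitted by cross-classification
```

Whether the cross-classifying layout is the better description of the pictures is an empirical question; the sketch shows only that the exclusion follows from the shape of the diagram rather than from any stated argument.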
Here is a comparable issue. In Chapter 5, KvL devote a section to markers of
visual modality. They define eight markers: colour saturation, colour
differentiation, colour modulation, contextualization, representation, depth,
illumination and brightness. They say sensible things about each of these markers,
but there is little discussion of how some of them (notably the first three) relate to
one another, and how they can be used in the practical analysis of specific
pictures. In the ensuing section, ‘coding orientation’, KvL argue that what
constitutes the ‘highest modality’ (that is, what is considered ‘most normal’)
depends on the kind of picture discussed. They distinguish the following four
coding orientations: scientific/technological, sensory, abstract and naturalistic.
The abstract coding orientations
are used by sociocultural elites – in ‘high’ art, in academic and scientific
contexts, and so on. In such contexts modality is higher the more an image
reduces the individual to the general, and the concrete to its essential qualities.
The ability to produce and/or read texts grounded in this coding orientation is a
mark of social distinction, of being an ‘educated person’ or a ‘serious artist’.
(p. 170)
In diagram 5.5 (p. 171) the modality values of colour saturation are given for each
of the four orientations. Highest modality in the abstract orientation is ‘black and
white’. Whereas this may be an accurate description of scientific pictures and
diagrams, I am by no means convinced that the same standard applies to ‘high
art’. KvL seem to have some doubts, too, for after discussing a number of
paintings they admit that ‘the examples in the previous section show that the
modality values in art can be complex’ and a few lines later they go even further:
‘in many other kinds of images, too, “modality markers” do not move en bloc in a
particular direction across the scales, say from the abstract to the sensory, but
behave in relatively independent ways’ (p. 176). But then, of course, one wonders
what is the relation between the different modality markers, and what conclusions
one may draw on the basis of a certain marker having high or low modality.
This is not nit-picking: these difficulties point to a more basic problem, namely,
the problem of categorization. Of course the delimitation of categories and the
development of criteria to decide membership or non-membership of an item in a
category are crucial to scholarship of any kind. But the problem is that categories
are seldom clear-cut; many categories are fuzzy, and describe a continuum
between extremes rather than a binary opposition with an either/or structure. At
the end of a discussion about classificational processes, KvL themselves draw
attention to this danger: ‘Our discussion above has, we hope, made it clear that we
see these distinctions as tools with which to describe visual structures rather than
that specific, concrete visuals can necessarily always be described exhaustively
and uniquely in terms of any one of our categories’ (p. 88). However, this caution
is usually absent when KvL present their own classifications; the typically
semiotic either/or branches3 as well as the hierarchical structure hide the fact of
fuzziness, suggest exhaustiveness and hint at stable, authoritative hierarchies. In
this respect, KvL might have benefited from Rosch’s work on ‘prototype theory’,
a notion that is central to Lakoff’s famous Women, Fire and Dangerous Things
(1987). Lakoff rejects the notion that something is either absolutely in, or outside
of, a category in favour of the notion that categories are radial structures, with
more and less prototypical members. His book appears in KvL’s bibliography, so
one would have expected the authors at least to discuss, and possibly even to
accommodate, this very different view of categorization.4
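The contrast between the two views of categorization can be sketched as follows. Everything in the block is an illustrative assumption: the feature set, the example picture and the overlap measure are invented for the sketch and are not taken from Rosch, Lakoff or KvL, and feature overlap is a deliberate simplification of what a radial category involves. The point is only that a classical category returns a yes/no answer, whereas a prototype-based one returns a degree of typicality.

```python
# Illustrative sketch only: features, items and the overlap measure are assumptions.
# A hypothetical prototype for some category of pictures.
PROTOTYPE = frozenset({"vector", "human agent", "visible goal"})


def classical_member(features):
    """Classical view: in the category only if every defining feature is present."""
    return PROTOTYPE <= set(features)


def typicality(features):
    """Prototype view: share of the prototype's features that the item displays."""
    return len(PROTOTYPE & set(features)) / len(PROTOTYPE)


if __name__ == "__main__":
    picture = {"vector", "human agent"}  # hypothetical picture without a visible goal
    print(classical_member(picture))     # all-or-nothing answer
    print(round(typicality(picture), 2)) # graded answer between 0 and 1
```

On the classical reading the hypothetical picture simply falls outside the category; on the graded reading it is a less typical member, which is closer to the fuzziness KvL themselves concede.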

4 The interpretation of individual images

KvL illustrate and underpin the pertinence of the theoretical concepts they adduce
by constant reference to the many pictures (176 black and white and 8 colour
plates) in the book. This interaction between theory and practice is an excellent
feature, because it makes the theoretical proposals concrete, lively and verifiable.
Since KvL aim at providing a ‘grammar’ of western images, the concepts they
develop should ideally be routinely applicable; more specifically we would expect
that the demonstrations of the applicability of their concepts to the pictures in the
book are convincing and unproblematic: after all, the authors were free in their
choice of examples. And indeed in many cases the analyses of the pictures in the
light of the concepts discussed are persuasive and illuminating. But alongside
these there are a substantial number of pictures whose interpretations are
debatable, or at the very least one-sided. And this means that in these cases
whatever ‘grammar’ is operative in pictures is less intersubjectively shared than
KvL suggest. That is, either KvL claim too much explanatory or even predictive
power for their concepts, or the delineation of some ‘grammatical rules’ requires
further refinement. In this section, I will focus on some problematic cases.
To begin with, KvL are at least once verifiably wrong in their description of a
picture. Explaining that pictures with only one participant exemplify ‘non-
transactional structure’, KvL state that
the action in a non-transactional structure has no ‘Goal’, is not ‘done to’ or
‘aimed at’ anyone or anything. The non-transactional action process is
therefore analogous to the intransitive verb in language. … In the picture in
figure 2.16 [a film frame] the principal Actor is formed by Ben Hur (Charlton
Heston) and his chariot. … Ben Hur ‘races’, but he does not race anything or
anyone, at least not so far as we can see in this picture. (pp. 61–2)
But this is simply not true. Anyone who has seen the film knows that Ben Hur in
this climactic scene does race against a number of opponents (and more
specifically against his former friend and present foe Messala). Curiously, figure
2.16 depicts no fewer than three other chariots racing in the stadium, so that even
the visible evidence in this isolated film frame belies KvL’s analysis. That is, the
frame exemplifies a transactional structure, structurally similar to that identified in
figure 2.1 (p. 43), which KvL render, plausibly, as ‘the British stalk the
Aborigines’. The problem recurs in the analysis of a sculpture by Kenneth
Armitage, People in the Wind (1952), shown in figure 8.2 (p. 244). KvL conclude
that the sculpture has ‘strong vectors, formed by the way the figures are bent
forwards as they struggle against the wind. But ... the action is “non-
transactional”. … The figures, it seems, “strain forwards”, but they do not “strain
towards something” ’. Perhaps this is true if one verbalizes their action as ‘to
strain towards something’, but with another verbalization there is no problem.
Figures 2.16, 8.1, and 8.2 could be rendered as ‘Ben Hur fights his opponents’,
‘Jacob fights the Angel’ (the latter a sculpture described as transactional) and ‘the
people fight the wind’, respectively. That is, given these verbalizations, all three
pictures qualify as being transactional. I dwell so long on this issue because it
might reveal that KvL compare visual structures too much with surface language
instead of with the mental processes of which both surface language and images
are the perceptible manifestations.5
In other cases, too, interpretations by KvL are controversial. Take the
following analysis:
Figure 3.38 shows an oil drilling installation in the Sahara desert. The de-
emphasizing of detail, and hence the ‘mood’ of the picture, results from the
extreme lighting conditions in which the setting sun plays the role of a low
backlight. In this way the oil drilling installation becomes a symbol for the
disappearance of the old Bedouin lifestyle. The accompanying text in fact ends
with a quote from a Bedouin, lamenting the demise of traditional ways of life –
followed by the photo. (p. 112)
An alternative interpretation of the same picture, however, could be, ‘How
beautifully man-made industry merges with nature!’. Why don’t KvL offer this
alternative interpretation besides their original one? One reason is suggested in
the quotation itself: KvL take into account the text that anchors the picture. But
that means they have not simply interpreted the picture; they have interpreted the
picture-cum-text. That is, they undermine their own (already quoted) view, contra
Barthes, that a picture and its accompanying text are mutually independent. Once
you have read the accompanying text, you can no longer look at the picture in a
disinterested manner, as KvL themselves prove here. But there is something else.
It is clear that KvL are committed to the idea that pictures reveal ideologies – a
notion that is generally shared in an age permeated by the postmodernist
awareness that no representation of reality can ever lay claim to being neutral –
and they are keen to identify and expose any suggestions of false ‘naturalness’
lurking beneath the surface. I sympathize with this aim, although I have
reservations about the ease with which they declare this holds for artistic
representations as well (p. 13). But while in a number of cases they are admirably
persuasive in showing how apparently innocent pictures slyly manipulate their
viewers, sometimes this zeal to expose hidden ideologies takes precedence over a
cool attempt to analyse what is, presumably, objectively there. Surely the whole
point of developing a visual grammar makes sense only if there is general (within-
culture) agreement about the presence and effects of at least some aspects in a
picture, and these intersubjectively establishable aspects need to be identified and
described before any valuations or interpretations are attached to them.
A similar problem surfaces in KvL’s discussions of various children’s
drawings. The interpretations are fascinating, but there is no way in which the
reader/viewer can verify them empirically. Here is an example:
Figure 4.27 is the front cover of a ‘story’ on sailing boats by a child … The
characters [in the boat] do not look at us. … The angle is frontal and eye level,
and the two figures in the boat are neither particularly distant, nor particularly
close. There is no setting, no texture, no colour, no light and shade. … But for
the two figures, simply drawn, and more or less identical, except for their size
(a father and son?), this could be a technical drawing. As such it suits the
objective, generic, title, ‘Sailing Boats’. … In most of the illustrations inside
the [visual] essay, no human figures are seen, as though the child already
understands that the ‘learning’ of technical matters should be preceded by a
‘human element’ to attract non-initiates to the subject. (p. 158)
Well, yes, possibly – but this remains rather speculative. First, as in the preceding
case, the authors invoke context in the form of (a) verbal anchoring, via the title,
and (b) other pictures in the pictorial ‘essay’. Again, the picture is interpreted, not
simply as an isolated representation but as a word & image text, and moreover as
part of a more comprehensive whole. If this is crucial to KvL’s interpretation,
however, a grammar of pictures is after all quite heavily dependent on textual
anchoring and (pictorial) context, but KvL do not pay much attention to the
interaction between pictures and (con)text. Although they profess to be aware of
the importance of context, their concepts and models do not specify how context
must be incorporated (for alternative approaches, see Cook, 1992; Forceville,
1996). And if KvL are allowed to speculate about the drawing’s meaning, so is
everybody else. Let me try: the boat sails from right to left. Given the importance
of left–right orientations in pictures in terms of given–new, it is clear that the child
has a longing for the given rather than the new. The boat sails toward the given,
the past, turning its stern to the new, the future, and the child may suffer from
regressive behaviour and fear of the future. Moreover, the title of the drawing
occurs in the top half of the drawing, that is, in the region of the ideal, while the
picture is underneath. The child is thus rooted in the pictorial, but aspires to
language. Sensible? Flippant? I am sure that KvL have hit upon important
distinctions, but they do not specify when bottom/up and left/right orientations do
apply and when they do not. In several instances, KvL are carried away by their
theoretical and ideological framework, arbitrarily or rigidly applying it to new
pictures, and this sometimes yields highly unconvincing results. A full-blown
visual grammar should predict, or at least suggest, under what conditions certain
‘rules’ operate. But acceptable or valid interpretations of a picture may reside less
fully in picture-immanent factors, and correspondingly more in pragmatic ones,
than KvL are prepared to acknowledge. This brings us naturally to the
generalizability of KvL’s concepts.

5 Generalizing across pictures

One of KvL’s exciting decisions is to tackle pictures from a wide variety of
sources, mixing paintings, diagrams, film frames, school book illustrations,
newspaper photographs, technical drawings, emblems, etc. This strategy is
refreshing, and potentially innovative: if the authors succeed in demonstrating the
general applicability of a certain concept, irrespective of the type of picture, they
genuinely help advance the theory of pictorial representation. However, many
proposals are problematic, for while grammaticality in language is relatively
stable across text-genres, the grammar of images may well be considerably more
genre-dependent. For one thing, KvL take great risks in incorporating the
occasional film frame. A film frame tolerates even less decontextualization than
static pictures. Not only does a frame acquire meaning and significance only in
combination with text (dialogues, voice-overs) and non-verbal sounds (music,
sound-effects), its meaning crucially depends on the shot and the sequence in
which the frame occurs (and ultimately on the entire film). I have already referred
to this in connection with the shot from Ben Hur, but the analyses of a frame from
Bergman’s Through a Glass Darkly (figure 6.1, p. 182) cause uneasiness as well.
Many of the distinctions signalled by KvL are not intrinsic to this particular,
isolated, frame, but can be interpreted only by taking into account information
beyond the frame itself. More specifically, the significance of a certain spatial lay-
out in a frame can be gauged only when compared with those in other frames or
shots involving the same characters, and this presupposes studying an entire film’s
montage and editing patterns (Hurst, 1996: 126). That is, expectations on the basis
of what has preceded are as important as picture-immanent structures.
Another example of KvL’s overly sweeping claims is their discussion of
sequences of three images, ‘triptychs’. They maintain that the middle one usually
‘mediates’ between the other two, and give some examples (p. 208). But as
Goodman demonstrates, many triadic structures, whether verbal, visual or
medially mixed ones, display a different structure, namely, that of a mini-
argument with the third item providing the conclusion, solution or punch-line.
Newspaper lay-outs often pictorially reinforce the ‘argument’ structure of three-
item constellations (Goodman, 1995: 150 ff.), and the prototypical three-part
cartoon has the joke’s apex in its last panel (Goodman, 1995: 159–60).
In short, KvL often too easily assume (a) that their examples are representative
and (b) that their personal interpretations have intersubjective validity. One of the
tasks of the project of developing a more refined and sophisticated visual
grammar is to be more specific about the differences between shared and non-
shared interpretations of (elements of) pictures. It seems to me that fruitful

Language and Literature 1999 8(2)


insights are to be gained from the following lines of research. In the first place,
the notion of text-external context (to be distinguished from text-internal context,
see Forceville, 1996: Ch. 4, et passim), in the widest sense of the word, needs to
be studied and theorized. Contexts not only affect the interpretation of images (as
they affect the interpretation of any type of text); they often crucially co-
determine it. Irony, for instance, can only be detected against the background
of extra-textual assumptions. More generally, what is needed is an
awareness of authorial intentions that underlie pictures, and ‘genre’ is a great help
in this respect. Interpretations of a picture will be considerably constrained by the
awareness that it belongs to a certain genre. Other text-external factors
that may influence interpretation pertain to the identity of the viewer. Gender, age
and cultural background may all play a role in this respect. That is, a text-
immanent analysis of pictures needs to be systematically complemented by
pragmatic analyses (cf. Pateman, 1980). This brings me to a second line of
necessary research: empirical testing. Hypotheses about the impact of variables of
genre and of audiences upon interpretation can and must be tested. Work by
empiricists studying literary texts can help focus ideas on how this is to be
done (cf. Ibsch et al., 1991; Steen, 1994; Zwaan, 1993). Moreover, some
empirical work on the interpretation of various types of images has already been
done (Camargo, 1987; Forceville, 1995; Mick and Politi, 1989; Morley, 1983;
Petterson, 1995; see also the extensive bibliographies in Braden, 1996 and
Moriarty and Kenney, 1995).

6 A visual grammar as part of cognitive science

The fact that KvL try to adapt Hallidayan grammar to pictures reveals their
awareness that certain communicative concepts exceed the boundaries of a
specific medium. Their allegiance to ‘social semiotics’, however, leads them to
link their insights sometimes rather quickly to ideological criticism. This is
unfortunate, not only because they sometimes tailor practice to theory rather than
vice versa, but also because they risk appealing only to readers who share their
critical stance. This might obscure the fact that KvL’s book also contains much
that is of interest to those who are primarily concerned with an issue that, as far as
I am concerned, is more fundamental, namely, the relation between what goes on
in the mind and manifestations of this activity. Scholars interested in this issue,
scattered over many disciplines, tend to use the adjective ‘cognitive’ to
characterize their work, and slowly even ‘the humanities’ are beginning to latch
on. Thus, Cook (1994) shows how some version of schema theory as advocated in
AI studies is a necessary component in any theory that wants to explain how
literature is understood. Similarly, two leading film theorists, Bordwell and
Carroll, are trying to direct research interests away from ideologically oriented
film analysis (specifically psychoanalytical work) to what they call ‘cognitivism’:
‘A cognitivist analysis or explanation seeks to understand human thought,
emotion, and action by appeal to processes of mental representation, naturalistic
processes, and (some sense of) rational agency’ (Bordwell and Carroll, 1996: xvi;
cf. also Bordwell, 1985; Carroll, 1996). My own work on pictorial metaphor,
while fully acknowledging that metaphors have a strong ideological dimension,
primarily tries to outline what forms pictorial metaphors can take, and suggests
how word & image texts guide, but cannot enforce, interpretations (Forceville,
1996). Sperber and Wilson’s (1986) relevance theory provides angles for such an
approach. It is partly because their theory is not limited to verbal communication
but allows for other modes of communication as well (see Forceville, 1996: Ch. 5)
that it is so exciting and fruitful. It encourages scholars to compare ‘texts’ from
different media because their intentions and effects can be subsumed under the
same general heading of manifesting the aim to convey certain assumptions (or
ideas, moods, feelings). Moreover, Sperber and Wilson’s theoretical distinction
between aspects of a message that are unequivocally communicated (‘strong
implicatures’) and aspects that are weakly, more ambiguously, conveyed (‘weak
implicatures’) may well be particularly pertinent to pictures, especially static
ones. And their insistence that relevance is always relevance to an individual
(1986: 142 ff.) should sound comforting and stimulating to humanist scholars
who, quite rightly, are deeply suspicious of any theoretical approach that claims to
predetermine or even predict all interpretational possibilities.
Sperber and Wilson’s ‘relevance to an individual’ has a clear echo in Johnson’s
‘meaning is always meaning for some person or community’ (1987: 177). The
cognitivist programme in the Lakoffian tradition (Gibbs, 1994; Johnson, 1987,
1995; Lakoff, 1987, 1993; Lakoff and Johnson, 1980; Turner, 1991) provides
good theoretical starting points for investigations of non-verbal, or multimodal,
manifestations of the mind’s workings. Conversely, the tenability of its central
theses (such as prototype theory, the idea that our way of conceptualizing is
ultimately guided by the human physique, the pervasiveness of the figurative in
so-called ‘literal’ language and thought) will have to be tested extensively in the
realm of the visual, an area that has hitherto been completely ignored by those
working in the Lakoffian framework. KvL’s book supplies various elements that
constitute potential bridges to the cognitivist programme.6 Their concept of the
‘interordinate’ level in hierarchies (p. 81) has clear parallels in the ‘basic level’
developed by Rosch and discussed in Lakoff (1987: 31 ff.). And KvL’s discussion
of top–down, left–right, centre–margin orientations invites theoretical cross-
fertilization with Johnson’s (1987) observations about the fundamentality of a
number of image schemata underlying human conceptualizing, such as ‘path’
(including up–down orientations), ‘cycle’, ‘link’, ‘balance’, ‘centre–periphery’.
Johnson’s claim that ‘image schemata are pervasive, well-defined, and full of
sufficient internal structure to constrain our understanding and reasoning’ (1987:
126) echoes precisely the kind of programme that I take KvL to pursue when they
try to formulate the ‘rules’ describing systematic relationships between thinking
and the production/reception of pictures.

7 Concluding remarks

In this review article I began by giving an outline of KvL’s book and indicating
some of its strengths, and proceeded by voicing a number of serious criticisms of
its methodological weaknesses and ideological commitments. I ended by
embedding these views in broader concerns not necessarily shared by KvL
themselves. It might seem that my respect for, and excitement about, KvL’s book
have become somewhat buried under the criticisms. Let me therefore repeat that I
think that Reading Images is significant and innovative. KvL present a host of
concepts and tools for the analysis of pictures, many of them illuminating and
unexpected. The wealth of pictures and discussions in their attractively produced
book provides ample food for thought and further theorizing. Because of the way
they present their concepts, and the applications to specific pictures, their work
has the merit of being amply verifiable and falsifiable. As I explained, I am by no
means convinced of the general applicability of a number of their concepts, but by
making explicit claims they do open up opportunities for counterclaims, based on
other pictorial data and/or experimental research. Given its format, KvL’s study is
clearly intended to be used as a textbook, presumably at undergraduate level. In
view of its methodological deficiencies and strong ideological commitment, this
entails some dangers. Nonetheless, the gains are worth the risks, on condition that
KvL’s ideas are subjected to highly critical scrutiny: partly by juxtaposing the
book with more theoretically oriented approaches, some of which have been
suggested in this article, partly by systematic testing of the concepts against new
pictorial material.


I thank Leo Hoek, Elrud Ibsch, Lachlan Mackenzie, and Ed Tan (all Vrije
Universiteit Amsterdam) for their comments on earlier drafts of this review
article. The responsibility for its contents remains entirely mine.


Notes

1 The contrast recalls the difference between the (impertinent, colonizing) ‘gaze’ and the (dialogic)
‘glance’. First proposed by Norman Bryson, the distinction was popularized by Mieke Bal.
Surprisingly, KvL do not refer to Bal’s work on ‘gaze/glance’ here, although they include her
book in their bibliography – and curiously they cite the Dutch (Bal, 1990) rather than the more
widely accessible English version (Bal, 1991).
2 Note that what KvL subsume under the general heading of ‘modality’ here is what in Simpson
(1993: Ch. 3) is equivalent to one main type of modality out of four, namely, ‘epistemic modality’.
3 As indicated, KvL also use symbols in their diagram to indicate that certain dimensions can co-
occur in a single picture. But here, too, one wants to know under what conditions this is possible.
4 By contrast, Sonesson, who also works in a semiotic framework, is acutely aware of the
theoretical threat of prototype theory to traditional semiotic accounts, and discusses prototype
theory at considerable length (Sonesson, 1988: 66 f.).
5 Moreover, verbal transitivity, which for KvL is the paradigm upon which they model pictorial
transitivity, is by no means a simple case. It is, as Hopper and Thompson (1980) show, a matter
of degree, and hence itself subject to prototype effects.
6 KvL are, however, not likely to sympathize with, or perhaps even accept as possible, cognitivists’
attempts to distinguish between what is more or less ‘neutrally’ there and any ideological
valuations that adhere to this more or less ‘neutral’ nucleus.


References

Arnheim, R. (1969) Visual Thinking. Berkeley, Los Angeles, London: University of California Press.
Aumont, J. (1997) The Image. Trans. by Claire Pajackowska. London: British Film Institute.
Originally published as L’Image (1990).
Bal, M. (1990) Verf en Verderf: Lezen in Rembrandt. Amsterdam: Prometheus.
Bal, M. (1991) Reading Rembrandt: Beyond the Word–Image Opposition. Cambridge: Cambridge
University Press.
Barthes, R. (1986/1964) ‘Rhetoric of the Image’, in R. Barthes, The Responsibility of Forms, trans. R.
Howard, pp. 21–40. Oxford: Blackwell.
Bordwell, D. (1985) Narration in the Fiction Film. London: Methuen.
Bordwell, D. and Carroll, N., eds (1996) Post-Theory: Reconstructing Film Studies. Madison:
University of Wisconsin Press.
Braden, R.A. (1996) ‘Visual Literacy’, Journal of Visual Literacy 16 (2): 9–83.
Camargo, E.G. (1987) ‘The Measurement of Meaning: Sherlock Holmes in Pursuit of the Marlboro
Man’, in J. Umiker-Sebeok (ed.) Marketing and Semiotics: New Directions in the Study of Signs
for Sale, pp. 463–83. Berlin: Mouton de Gruyter.
Carroll, N. (1996) Theorizing the Moving Image. Cambridge: Cambridge University Press.
Cook, G. (1992) The Discourse of Advertising. London: Routledge.
Cook, G. (1994) Discourse and Literature: The Interplay of Form and Mind. Oxford: Oxford
University Press.
Forceville, C. (1995) ‘IBM is a Tuning Fork – Degrees of Freedom in the Interpretation of Pictorial
Metaphors’, Poetics 23(3): 189–218.
Forceville, C. (1996) Pictorial Metaphor in Advertising. London: Routledge.
Gibbs, R.W., Jr (1994) The Poetics of Mind: Figurative Thought, Language, and Understanding.
Cambridge: Cambridge University Press.
Goodman, S. (1995) ‘Triadic Pictures’, Chapter 4 in ‘Aesthetics and Consensus: Verbal and Visual
Poetics in Newspaper Discourse’, unpublished doctoral dissertation, University of East Anglia,
Norwich, England.
Hopper, P.J. and Thompson, S.A. (1980) ‘Transitivity in Grammar and Discourse’, Language 56(1):
Hurst, M. (1996) Erzählsituationen in Literatur und Film: Ein Modell zur Vergleichenden Analyse von
Literarischen Texten und Filmischen Adaptionen. Tübingen: Max Niemeyer.
Ibsch, E., Schram, D. and Steen, G.J., eds (1991) Empirical Studies of Literature. Amsterdam, Atlanta,
GA: Rodopi.
Johnson, M. (1987) The Body in the Mind: The Bodily Basis of Meaning, Imagination and Reason.
Chicago, IL: University of Chicago Press.
Johnson, M. (1995) ‘Introduction: Why Metaphor Matters to Philosophy’, Metaphor and Symbolic
Activity 10(3): 157–62.
Kress, G. and Van Leeuwen, T. (1990) Reading Images. Geelong: Deakin University Press.
Kress, G. and Van Leeuwen, T. (1996) Reading Images: The Grammar of Visual Design. London: Routledge.
Lakoff, G. (1987) Women, Fire and Dangerous Things: What Categories Reveal about the Mind.
Chicago, IL: University of Chicago Press.
Lakoff, G. (1993) ‘The Contemporary Theory of Metaphor’, in A. Ortony (ed.) Metaphor and
Thought, pp. 202–51. Cambridge: Cambridge University Press.
Lakoff, G. and Johnson, M. (1980) Metaphors We Live By. Chicago, IL: University of Chicago Press.
Mick, D.G. and Politi, L.G. (1989) ‘Consumers’ Interpretations of Advertising Imagery: A Visit to the
Hell of Connotation’, in E. Hirschman (ed.) Interpretive Consumer Research, pp. 85–96. Provo,
UT: Association for Consumer Research.
Mitchell, W.J.T. (1986) Iconology: Image, Text, Ideology. Chicago, IL: University of Chicago Press.
Moriarty, S.E. and Kenney, K. (1995) ‘Visual Communication: A Taxonomy and Bibliography’,
Journal of Visual Literacy 15(2): 7–156.
Morley, D. (1983) ‘Cultural Transformations: The Politics of Resistance’, in H. Davis and P. Walton
(eds) Language, Image, Media, pp. 104–19. Oxford: Blackwell.
Pateman, T. (1980) ‘How to Do Things with Images: An Essay on the Pragmatics of Advertising’, in T.
Pateman, Language, Truth and Politics, pp. 215–37. Lewes, East Sussex: Jean Stroud.
Petterson, R. (1995) ‘Associations from Pictures’, in D.G. Beauchamp, R.A. Braden and R.E. Griffin
(eds) Imagery and Visual Literacy, pp. 136–44. Blacksburg, VA: International Visual Literacy
Simpson, P. (1993) Language, Ideology and Point of View. London: Routledge.
Sonesson, G. (1988) Pictorial Concepts. Lund: Lund University Press.
Sperber, D. and Wilson, D. (1986) Relevance: Communication and Cognition. Oxford: Blackwell.
Steen, G.J. (1994) Understanding Metaphor in Literature: An Empirical Approach. London:
Thompson, J., ed. (1996) Towards a Theory of the Image. Maastricht: Jan van Eyck Akademie.
Turner, M. (1991) Reading Minds: The Study of English in the Age of Cognitive Science. Princeton,
NJ: Princeton University Press.
Zwaan, R.A. (1993) Aspects of Literary Comprehension: A Cognitive Approach. Amsterdam,
Philadelphia, PA: Benjamins.


Charles Forceville, Vrije Universiteit Amsterdam, Faculty of Arts, P.O. Box 7161, 1007 MC
Amsterdam, The Netherlands. [email:]


Figure 1 Analytic image structures. Reprinted with permission from: Gunther Kress and Theo Van Leeuwen, Reading Images:
The Grammar of Visual Design. London: Routledge, 1996, p. 107