
Multimodal transcription and text analysis

Equinox Textbooks and Surveys in Linguistics


Series Editor: Robin Fawcett, Cardiff University
Also in this series:
Language in Psychiatry by Jonathan Fine
Meaning-centred Grammar by Craig Hancock
Intonation in the Grammar of English by M. A. K. Halliday and William S. Greaves
Forthcoming titles in the series:
Text Linguistics: the how and why of meaning by Jonathan Webster
The Rhetoric of Research: a guide to writing scientific literature by Beverly Lewin
Multimodal Transcription and Text Analysis

A multimedia toolkit and coursebook

Anthony Baldry and Paul J. Thibault

Equinox
Published by

Equinox Publishing Ltd

UK: Equinox Publishing Ltd., 1 Chelsea Manor Studios, Flood Street, London
SW3 5SR
USA: DBBC, 28 Main Street, Oakville, CT 06779

www.equinoxpub.com

Multimodal transcription and text analysis by Anthony Baldry and Paul J. Thibault

First published 2006


Reprinted (with minor revision) 2010
© Anthony Baldry and Paul J. Thibault 2006

All rights reserved. No part of this publication may be reproduced or transmitted
in any form or by any means, electronic or mechanical, including photocopying,
recording or any information storage or retrieval system, without prior permission
in writing from the publishers.

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library.

ISBN 1-904768-06-7 (hardback)


ISBN 1-904768-07-5 (paperback)

Library of Congress Cataloging-in-Publication Data


Baldry, Anthony.
Multimodal transcription and text analysis / Anthony Baldry and Paul J. Thibault.
p. cm.
Includes bibliographical references and index.
ISBN 1-904768-06-7 -- ISBN 1-904768-07-5 (pb.)
1. Transcription. 2. Multimedia systems. 3. Discourse analysis. I. Thibault, Paul
J. II. Title.
P226.B35 2005
401’.41--dc22
2005047216

Printed and bound in Great Britain


TABLE OF CONTENTS
Foreword xi

Acknowledgements xiii

Preface xv

Chapter 1: Introduction: multimodal texts and genres 1


1.0 Introduction 1
1.1 Multimodal texts and the resource integration principle 4
1.1.1 Resource integration and the transcription of printed cartoons 7
1.1.2 Multimodal transcription of cartoon narratives and the question of the
metafunctions 16
1.1.3 Sources of meaning in multimodal texts 17
1.2 Cluster analysis and the transcription of static multimodal texts 21
1.2.1 Multimodal transcription and questions of genre 30
1.3 Textual properties of short printed cartoons 34
1.4 Printed advertisements and their exemplification of the metafunctions 38
1.4.1 Metafunctions in relation to genre 38
1.5 Web pages and their transcription 44
1.6 Film texts and their transcription 46
1.6.1 The soundtrack 51
1.7 Conclusion 54

Chapter 2: The printed page 57

2.0 Introduction 57
2.1 The printed page and its evolution 57
2.2 The resource integration principle in the scientific page 61
2.2.1 How can we study tables systematically? 64
2.2.2 How does the page communicate? 68
2.3 Science textbooks and multimodal meaning making 70
2.4 Visual, verbal and actional semiotic resources in a table 71
2.4.1 Visual and verbal resources 71
2.4.2 Thematic development of the page: hierarchies of textual periodicity 74
2.4.3 Actional semiotic resources 78
2.5 Blood under the microscope: multimodality in a photographic display 78
2.6 Integration of scientific photographs and verbal text 80
2.6.1 The textual metafunction 80
2.6.2 The ideational (experiential and logical) metafunctions 82
2.6.3 The interpersonal metafunction 89
2.7 The Italian texts: differences with respect to the Australian texts 91
2.7.1 Reading paths 91
2.7.2 The use of colour 92
2.8 Expertise and authority vs. comprehensibility and accessibility 93
2.8.1 Linguistic resources 96
2.8.2 Visual resources 99
2.9 Conclusion 102

Chapter 3: The web page 103

3.0 Introduction 103
3.1 Page or screen? 105
3.2 Decoupling of material support and information on the computer screen 109
3.3 The relationship between web page, website, web users and web genres 113
3.4 The home page 118
3.5 The Nasa Kids home page 120
3.6 Creating a hypertext pathway 126
3.7 The British Museum Children’s COMPASS website 130
3.7.1 Children’s COMPASS home page: description of multimodal objects 130
3.8 A multimodal hypertextual thematic formation: daily life in Asia 136
3.8.1 Thematic system analysis: preliminary observations and an example 136
3.8.2 Multimodal thematic system development along a hypertext pathway 140
3.9 The action potential of hypertext objects 146
3.9.1 Experiential meaning 147
3.9.2 Interpersonal meaning 148
3.9.3 Textual meaning 153
3.10 The virtual world of hypertext 156
3.11 Community or social network of users and practices? 161
3.12 The WWW as technological infrastructure and meaning-making resource 162
3.13 Conclusion 164

Chapter 4: Film texts and genres 165

4.0 Introduction 165
4.1 The Eskimo text: a macro-analytical approach to transcription 167
4.2 The Westpac text: an integrated approach to transcription 174
4.3 Etic and emic criteria in multimodal transcription 181
4.4 Phases, subphases and transitions 184
4.5 Column 1: Row number and time specification 186
4.6 Column 2: The visual frame 187
4.6.1 Visual frames and shots 187
4.6.2 Information structure: Given and New 189
4.6.3 Sequencing and relations of interdependency between shots 190
4.7 Column 3: The visual image 191
4.7.1 Specifying visual information 191
4.7.2 Perspective 195
4.7.3 Distance 195
4.7.4 Visual collocation 198
4.7.5 Visual salience 199
4.7.6 Colour 199
4.7.7 Coding orientation 200
4.7.8 Visual focus or gaze of participants 200
4.8 Column 4: Kinesic action 202
4.8.1 The meaning of movement 202
4.8.2 Interpersonal modification of movement 206
4.8.3 General observations on the notation of movement 209
4.9 Column 5: The soundtrack 209
4.9.1 Integrating auditory phenomena 209
4.9.2 Sound acts and sound events 210
4.9.3 Dialogic relations among sound events 211
4.9.4 A brief comment on the notation of the soundtrack 214
4.9.5 The rhythm of sound events 214
4.9.6 Accented rhythmic units 216
4.9.7 Rhythm groups 216
4.9.8 Degree of loudness 217
4.9.9 Duration of syllable, musical note, sound event 218
4.9.10 Tempo 219
4.9.11 Continuity and pausing 220
4.9.12 Dyadic relations among auditory voices: sequentiality, overlap, turntaking 220
4.9.13 Vocal register 221
4.10 Column 6: Metafunctional interpretation 222
4.10.1 Metafunctional notation in relation to Column 6 222
4.11 Display and depiction: two sides of the same semiotic coin in visual texts 223
4.11.1 Multimodal discourse analysis: the Mitsubishi Carisma advertisement
revisited 223
4.11.2 From delimited optic array to visual text: the stratification of the visual sign 223
4.11.3 Transformations in the optic array: some examples from the Mitsubishi
Carisma text 228
4.11.4 Visual transitivity frames and experiential meaning 230
4.11.5 Identity chains in visual semiosis 232
4.11.6 Dependency relations in the Mitsubishi Carisma text: implications for
visual texts 234
4.11.7 Some sources of coherence in the Mitsubishi Carisma advertisement: Phase 1 239
4.11.8 Counter-expectancy and hypertext in the Mitsubishi Carisma advertisement 242
4.12 Conclusion: the shape of things to come 248

References 251

Appendices
Appendix I I - XII
Appendix II 261

Index 265

List of Insets relating to Keypoints

Inset 1: Context of situation and Context of culture 2
Inset 2: Text 3
Inset 3: The resource integration and meaning-compression principles 18-19
Inset 4: Metafunctions 22-23
Inset 5: Clusters and cluster analysis 31
Inset 6: Bakhtin’s distinction between primary and secondary genres 43
Inset 7: Phases and their transcription 47
Inset 8: Intertextuality 55
Inset 9: Projection 101
Inset 10: The trajectory 116
Inset 11: Visual transitivity frames 122
Inset 12: Scalar levels 144
Inset 13: System and instance 172-173
Inset 14: Material object text and semiotic action text: two sides of the same
textual coin 175-177
Inset 15: Gibson’s optic array 192
Inset 16: Perspective in sound: Van Leeuwen on Figure, Ground and Field 212
Inset 17: Recontextualising social practices 213
Inset 18: Stratification 236-7
Inset 19: Negotiation 245-7

List of Figures

Figure 1.1: Codeployment of space and hand-arm movements in a car
advertisement 5
Figure 1.2: The two Marmaduke cartoons 8
Figure 1.3: Cluster transcription of narrative event structure in a cartoon 9
Figure 1.4: A London Transport leaflet (top part: cover side; bottom part:
reverse side) 25
Figure 1.5a: A macro-transcription of part of the LT leaflet 28
Figure 1.5b: A micro-transcription of the cover page of the LT leaflet 29
Figure 1.6a: Parts of a leaflet for the Chesapeake Bay tunnel-bridge complex 32
Figure 1.6b: Clusters making up the reverse cover of the leaflet 33
Figure 1.7: Lupo Alberto and attitudinal stance 36
Figure 1.8: The metafunctions .... and a bear hug from ‘Boo’ 40
Figure 1.9: A mini-genre analysis of the Boo Bear text 41
Figure 1.10: Intertextual relationships in websites: the role of frames 45
Figure 2.1: Leaf movements: in Darwin (top); in a modern textbook (bottom) 59
Figure 2.2: A typical multimodal page in Marx’s Capital; and the top part
of an equivalent page in The Economist 62
Figure 2.3: An example of the use of the table in Marx 66
Figure 2.4: A typical use of charts in The Economist 67
Figure 2.5: An example of the use of vectors 68
Figure 2.6: Table and related verbal text pages 60 and 61 in Australian Biology 72
Figure 2.7: Blood under the microscope and related verbal text (pages 62-63) 73
Figure 2.8: Relation of elaboration: attribution between visual image and verbal label 87
Figure 2.9: Visual-verbal thematic formation; white corpuscles 88
Figure 2.10: 1982 Italian text 94
Figure 2.11: 1985 Italian text 95
Figure 2.12: The cline between ideational and interpersonal sourcing 97
Figure 2.13: Sequence of clauses 98
Figure 2.14: Sequence of clauses highlighting interpersonal negotiation 99
Figure 3.1: Web page genre schema 115
Figure 3.2: Nasa Kids Home Page: cluster analysis 121
Figure 3.3: Nasa Kids home page (focus on NasaToons object illuminated) 124
Figure 3.4: A Far Out Pioneer page 126
Figure 3.5: NasaToons menu of options page 128
Figure 3.6: The British Museum’s Children's COMPASS home page:
cluster analysis 131
Figure 3.7: The British Museum: Search the Museum page 132
Figure 3.8: Daily life in Asia page 133
Figure 3.9: The web page: Women sewing, a print 134
Figure 3.10: Covariate tie between verbal and visual semiotic modalities, creating
a cross-modal thematic relation in an airline magazine text 139
Figure 3.11: Search subcluster before and during mouse rollover 150
Figure 3.12: Interpersonal meaning potential of linked objects; three simultaneous
parameters 151
Figure 3.13: Objects that are clicked on to reveal a link 153
Figure 3.14: Layered structure of textual objects 155
Figures 4.1a and 4.1b: Transitivity frames in the Eskimo advertisement 168-9
Figure 4.2: A (revised) preliminary network for gaze in visual texts;
primary delicacy only 171
Figure 4.3: Waves relating to the soundtrack in the first phase of the Westpac
advertisement 183
Figure 4.4: Camera position relative to depicted world of image and visual
kinaesthesis of viewer: main options 194
Figure 4.5: System network of basic options for gaze in video texts 196
Figure 4.6: Notational conventions used in the transcription of the soundtrack 215

List of Tables

Table 1.1: Examples of possible temporal and causal expansions of the event
sequence in the dog-chews-shoe cartoon 12
Table 1.2: Example of the temporal sequencing of events in the dog-chews-shoe
cartoon showing change (two shoes then one shoe) resulting from
the transition from one moment to the next 13
Table 1.3: Three levels of semiotic organisation in the dog-chews-shoe
cartoon 14
Table 1.4: Distribution of semiotic resources in dog-chews-shoe cartoon 15
Table 1.5: The Mitsubishi Carisma text: Summary analysis of shots,
phases and macrophases 48
Table 1.6: The Mitsubishi Carisma advertisement: thematically salient
transitivity frames 49
Table 1.7: Phonetic and prosodic features in man’s spoken voice 52
Table 1.8: Some salient meaning oppositions indexed by contrasting phonetic
and prosodic features in the speaking voices of the man and the woman 53
Table 2.1: Reconstruction of ellipted clauses relating to red corpuscles in the first
column of the table in Figure 2.6 75
Table 2.2: The multimodal thematic development of the page about red and
white blood corpuscles; integrating table and verbiage 76
Table 2.3: The co-articulation of the page into subregions showing
top-bottom and left-right organisation 82
Table 3.1: Types of web pages according to social activity 104
Table 3.2: Transcription of an unfolding hypertext pathway 129
Table 3.3: British Museum Children’s COMPASS activity sequence 135
Table 3.4: Interaction potential of objects on the Nasa Kids home page 152
Table 3.5: Nasa Kids home page: comparison of three objects 154
Table 3.6: Textual links and functions in linked objects on Nasa Kids home page 156
Table 4.1: Stratification of video texts, showing both the relationship between the
expression (display) and content strata (depiction) of visual signs 226
Table 4.2: Some transformations in the delimited optic array of Phase 1 of the
Mitsubishi Carisma advertisement 229
Table 4.3: Some visual process types and their modes of realization in the
Mitsubishi Carisma advertisement 231
Table 4.4: Visual participant chains in Phase 1 of the Mitsubishi Carisma
advertisement 233
Table 4.5: Dependency relation between Shots 1 and 2 in Phase 1 of the
Mitsubishi Carisma advertisement 238
Table 4.6: Three sources of visual-textual ideational coherence in the
Mitsubishi Carisma advertisement: Phase 1 only 242
Foreword
Multimodal Transcription and Text Analysis is a book that many of us have been looking for:
a readable “how to” manual for analyzing images, websites, video and film, cartoons, magazine
layouts, advertisements, textbooks, television programs, and computer games. Paul Thibault and
Anthony Baldry strike just the right balance between rich examples and accessible explanations
of the concepts that lie behind their practical methods. This book is, however, far more than a
“how to” manual: it is a comprehensive introduction to the field of multimedia analysis.
Why is multimedia analysis so important today? Partly because multimedia themselves,
combining language with visual images, animations, video, music, and sound effects (at least!)
are becoming the dominant forms of communication in our society, not just for commercial
purposes, but also in our daily lives and personal activities. No one doubts that this is because
computers make working with multimedia (almost) easy and (almost) cheap. But that is using
multimedia. Why analyze multimedia?
Because today we understand, as never before, the power of media to influence how we
think and what we believe. We need to know how media produce their effects, how we interpret
and make sense of them and with them, and how to design media that will both influence people
and empower them to create and express their own insights and points of view. Multimedia
analysis is the foundation for both designing and criticizing media and their messages.
Multimodal Transcription and Text Analysis develops a systematic approach to understanding
how combinations of words, images, and sounds, whether sitting on a page or flashing
past in real time, make more meanings together than any one of them can make alone. Baldry
and Thibault have been working towards this for years. They belong to a growing community
of researchers in media studies, communication, education, linguistics, semiotics, sociology,
anthropology, and political science who have been developing methods to analyze media for the
last two decades or longer. Their approach has its origins in functional linguistics, an alternative
to the very abstract and formal theories of syntax that most people still associate with linguistics.
Thibault studied with one of the great linguists of the last half-century, Michael Halliday, who
developed a method of analyzing purely verbal text in terms of the available choices we have
in putting words together, and the differences in meaning that different choices of wording
make. Thibault and Baldry, like Theo van Leeuwen, Gunther Kress, myself and many others,
have now found ways to generalize Halliday’s method to combinations of words with images,
sounds, actions, and more. That is what you will discover in this book.
Read this book and use it! Take advantage of the online web-based analysis tools that
the book prepares you to use. That is the best way to understand why transcription is not just
a boring task to be left to someone else (if you can afford to pay them), but the place where
theory meets data head-on and multimedia materials are re-framed for analysis in the way that
you decide. It is also the best way to start understanding what media are already doing to you,
and what you can do with them for your own purposes. That’s important, and so is this book.

Jay Lemke,
Professor, Educational Studies,
University of Michigan, Ann Arbor
Acknowledgements

The authors and publisher wish to thank the following individuals and agencies for
permission to use copyright material. All possible care has been taken to trace and contact
the owners and copyright holders of the materials included and to provide full
acknowledgement of the use we have made of them.
All the texts we have presented have been analysed in a way which we feel is entirely
supportive of the goals of the various authors. Despite intense efforts, we have been
unable to trace and make suitable arrangements with the copyright holders of some of the
texts reproduced in this book. The authors and publisher would like to hear from them so
that this can be rectified. Below we acknowledge our debt and thanks to those copyright
holders whom we have managed to contact.
Chapter 1 investigates a variety of texts including cartoons, leaflets and websites. The
Marmaduke cartoons in Figures 1.2 and 1.3 have been reproduced with the permission of
United Media (UFS Inc.), New York and their Italian distributors Adnkronos, while the Lupo
Alberto cartoon in Figure 1.7 has been reproduced with the permission of McKenzie
Syndicate, MCK S.r.l, Milan. The LT leaflet in Figures 1.4, 1.5a and 1.5b which dates from
January 2000 is reproduced with kind permission of Transport for London. Equally, thanks
go to the Governing Body of the Chesapeake Bay Bridge-Tunnel Commission for permission to
reproduce parts of a Chesapeake Bay Tunnel-Bridge leaflet in Figures 1.6a and 1.6b. Finally,
the British Library has given permission to reproduce a website page from their Leonardo
Notebook online presentation (see Figure 1.10).
Chapter 2 investigates the printed page, the scientific page in particular, from
various standpoints, including the evolutionary one. Services, (Special Collections) University
College London are thanked for assistance regarding provision of the originals of Darwin’s
The Power of Movements in Plants in the top part of Figure 2.1. W.H. Freeman and
Company/Worth Publishers are thanked for permission to reproduce the text in the bottom
part of Figure 2.1 which is from Helena Curtis’ Biology. Similarly, The Economist is thanked
for permissions relating to Figures 2.2 and 2.4. Although we have been unable to trace the
authors of the text reproduced in Figure 2.10, which is taken from an Italian school science
textbook, we owe special thanks to Emilio Delmastro, who supervised the book’s production
and layout, for his kindness in allowing us to reproduce the graphic layout of the page
in question. Similarly, we wish to thank Carlo Signorelli Editore for permission to reproduce
the text in Figure 2.11 taken from a 1985 Italian school science textbook. We also thank
Prof. GianLuigi Borgato of Unipress, Padua for permission to reproduce much of the
second half of Chapter 2 which was originally published as part of the CITATAL project
in a volume edited by a team from the University of Padua led by our colleague and friend
Carol Taylor Torsello, whom we thank profusely.
Chapter 3 investigates two children’s websites. We thank the British Museum for permissions
relating to Figures 3.6, 3.7, 3.8, 3.9 and 3.13. We would be glad to hear from the copyright
holders of the NASA Kids website, whom we have been unable to contact despite
repeated efforts in relation to Figures 3.2, 3.3, 3.4 and 3.5. The snapshots which appear at
the bottom of Inset 11 have been supplied on condition of anonymity. We wish to thank
the anonymous donor whose snow removal efforts have not gone unrewarded.
Chapter 4 investigates three TV advertisements: the Eskimo, Mitsubishi Carisma and
Westpac commercials. The Eskimo text was shot in Whitehorse, Yukon Territory in 1997.
The grandfather and the young boy were respectively played by Cliff Solomon and Yudii
Mercredi. We thank the DDB Agency, Milan, Italy and the McKinney Agency, Durham,
North Carolina for permission to reproduce stills from this advert and their kind assistance
in contacting the Vancouver offices of The Characters Talent Agency and Kirk Talent, in
relation to permissions given by the actors. We also thank Auto-Germa, Verona, Italy, the
distributors for Audi cars in Italy for whom the advert was commissioned. With regard to
the Westpac text, the Westpac Banking Corporation has kindly agreed to the reproduction of
frames from a 1983 advertisement that celebrates Australia’s identity as a nation and
Westpac’s role in consolidating this identity. The authors are grateful to Equinox, the publishers,
for agreeing to reproduce frames from the Westpac advertisement in Appendix I in
colour. This has enabled us to provide a detailed description of the nature and functions
of colour in what is, from every standpoint, a superbly crafted advertisement. The
Mitsubishi Carisma text in Appendix II is an entertaining linguistic, visual and musical
spoof of James Bond films. Despite many efforts, we have not been able to trace the actors
or the agency who produced this delightful commercial which we recorded from British
TV in the late 1990s, and we would be glad to hear from them.
We wish to take this opportunity to thank the many advertising agencies and companies,
in particular in Italy, who have provided us with a constant supply of TV commercials
and permissions, mainly for cars and drinks, for many years. Without this support,
this book would have been virtually impossible and research and teaching in relation to
multimodal texts would have been so much the poorer.
Finally, we wish to thank Gino Palladino, his brothers and the staff of Palladino
Editore, Molise, Italy for their permission to reproduce various parts of Multimodality and
Multimediality in the distance learning age, a thousand copies of which were printed and published
by them in 2000. This volume, now a collector’s item, was the very first book on
multimodality to appear entirely in colour at a price that defies belief. We have fond memories
of the times we spent in Palladino’s ultra-modern printing works tucked away in the
mountains of the Molise region in Central Italy. Multimodalists everywhere will be for ever
grateful to them for their courage and excellence in colour printing. We also wish to thank
Nicola Prozzo of IRRE Molise for his dedication and unfailing kindness in helping us to
further the multimodal cause, in a whole series of ways: photo calls, logistics, telephone
contacts, computer services and video recordings. To Gino and Nicola and our many
friends in the Molise region our heartfelt thanks.
Preface
The study of multimodal texts and multimodal meaning-making practices has
developed and matured considerably since the early 1990’s. Unlike the pioneering
works of an earlier generation (e.g. Bateson, Birdwhistell, Scheflen), who were concerned
above all with behavioural and paralinguistic units of various kinds (e.g.
gesture, movement, posture), with a concomitant focus on the material dimension
of the behavioural units so described, the current focus is on both the material
dimension and the meaning in a unified and semiotically informed perspective. The
focus has shifted to the multimodal text as the site of meaning-making activity.
The present book participates in this shift.
Our analyses and transcriptions in this book are concerned, for example,
with what a particular pattern of movement means as a form of action involving
participants and taking place in a particular setting, or with how a particular choice
of colour, in combination with selections of other features, indexes a particular
attitudinal or evaluative stance. The examples can easily be multiplied and we will
leave further discussion of these to the chapters that follow. The point is that
specific choices and combinations of choices – e.g. movement, colour, and so on
– realise or express meanings (e.g. actions, evaluations) in multimodal texts. The
focus is on the meaning of different kinds of units and their functions in
larger-scale patterns of discourse organisation that cannot be described in terms of
small-scale units per se. By the same token, we also explore the importance of the
material dimension of these texts as making its own distinctive and important
contribution to the overall meaning-making process.
This book brings together our research efforts and findings on multimodal
text analysis and transcription and proposes a novel and distinctive approach which
is both meaning-based and functional. We aim to present a systematic account of
what multimodality is in relation to contemporary discourse practices in a variety
of social and cultural contexts. The term multimodality does not designate a
pre-given entity or text-type. Rather, it designates a diversity of meaning-making
activities that are undergoing rapid change in the contemporary cultural context.
Moreover, the concept of multimodality is a useful yardstick for assessing the
diverse ways in which texts and their associated meaning-making practices result
from semiotic resources of various kinds working in partnership to create the
meanings that we attribute to texts. Multimodality therefore invites us to reassess
many older assumptions and prejudices at the same time that it opens up new
fields of enquiry and understanding.
The term multimodality covers a diversity of perspectives, ways of thinking
and possible approaches. It is not a single principle or approach. It is a multipurpose
toolkit, not a single tool for a single purpose. Multimodal text and discourse
analysis currently informs and is shaping work in Critical Discourse Analysis,
Ethnographically-based Discourse Analysis, Genre Analysis, Mediated Discourse
Analysis, Systemic-functional Discourse Analysis, among others. Moreover, it is
being applied in insightful ways to a wide variety of discourse genres in many
different spheres of social life. We hope that our book has something of interest
and relevance to say to all of these approaches as well as to others not mentioned
here.
We propose the present book as a contribution to the development of a
theoretical model and an analytical approach which is both functional and
meaning-based. At the same time, it demonstrates detailed analyses and
interpretations of a range of multimodal texts and genres in relation to their social
and cultural contexts. For the most part, the texts that form the basis of our study
are texts that are in some way inscribed on, or projected onto, a technologically
prepared surface such as the printed page, the computer monitor or the television
screen. We do not explore in this book discourses that occur in various kinds of
natural settings such as classrooms, the workplace, museums and so on. This
limitation is dictated both by the tyranny of space imposed by publication and our
particular desire to focus on media texts of various kinds. However, we believe that
the toolkit proposed here can be adapted to these forms of multimodal discourse
analysis without too much difficulty.
The transcription and analysis of multimodal discourse events and texts are
closely related notions. Transcription is a way of revealing both the codeployment
of semiotic resources and their dynamic unfolding in time along textually constrained
and enabled pathways or trajectories. Analysis synthesises the results of
transcription in order to ground statements about textual meaning in a principled
and replicable way. Transcription is itself a form of analysis: it is a textual record
of the attempts we make to systematize and unpack the codeployment of the
semiotic resources and their unfolding in time as the text develops. Transcription
also prepares the way for other forms of analysis which are essential for the
detailed and systematic historical or other comparison of texts, the study of their
genre features, the intertextual relations they take part in, the relations between
different analytical units and different levels of textual organisation, and the coding
of texts for the purposes of multimodal concordancing. It is in this respect that we
propose the present book as the development of a systematic, though far from
complete, model of multimodal text transcription and analysis.
A number of key concepts and the relations between them inform this work,
including some innovations that we present here for the first time. Without trying
to define these here, these concepts include clusters and cluster analysis, functions,
metafunctions, phases, meaning-making trajectories, resource systems, resource integration
and scalar levels. These and other concepts introduced in the book are our tools and
together they form our toolkit. At times we may use this or that tool for a particular
purpose, but on the whole we seek to emphasise that each of these tools is part of
an overall kit and that it is the toolkit as a whole along with the appropriate
techniques of tool use that are the tools of the text analyst’s trade. Rather than
presenting individual tools one-by-one in a step-by-step manner, we have preferred a
text-centred approach in which the toolkit as a whole is drawn on and put to use
for the purposes for which it was designed – multimodal transcription and text
analysis. This emphasis reflects our primary interest in showing how different
semiotic resources work in partnership in the meaning-making process on the
discourse level of organisation in multimodal texts.
Each of the four chapters puts these and other concepts to work in a variety
of ways and on a range of texts representing different multimodal discourse
genres. Chapter 1 sets out the basic approach in ways that should ease the reader
into the more complex and detailed analyses in the chapters that follow. This first
chapter applies the model to a variety of texts in the form of relatively focused
analyses of specific features and problems. One innovation in this chapter is the
exploration of the way the discourse stratum is developed in visual texts such as
cartoons. Chapter 2 takes the printed page as its principal focus, in particular the
scientific page. A particular concern of this chapter is the ways in which the page
is a multimodal textual unit which integrates linguistic, visual, actional and other
resources on different scalar levels of organisation. This chapter provides a
detailed metafunctional account of these processes of integration with reference
to examples from a number of different historical periods. Chapter 3 looks in detail
at a number of examples of web pages targeted at children. In this chapter, we
explore in detail the question of meaning-making trajectories and how the semiotic
and technological resources of the web afford the user’s construction of hypertext
pathways that link web pages and websites in complex ways not always predicted
by stable genre conventions. Our interest lies in the website as a form of action
potential. Chapter 4 examines some instances of television advertisements and
develops detailed transcription techniques for the further analysis of entire adver-
tisements as a form of dynamic text. In this chapter, we return to, and further
develop, some key questions about the discourse level of organisation that we first
explored in Chapter 1. Chapter 4 also develops some specific ideas concerning the
expression (material) stratum of multimodal texts. A further feature of the book is
the use of Insets throughout in order to highlight a number of key concepts in ways
we hope will encourage further reading and discussion.
We have subtitled this book A multimedia toolkit and coursebook for two main
reasons. First, we have emphasised the close integration of multimodal text
analysis, multimedia technology and the multimodal nature of transcription itself
in our attempts to find ways of talking about multimodal texts. Secondly, the book
is proposed as a series of practical applications of the analytical and descriptive
principles we develop in relation to a diverse range of texts. At all stages, we strive
for an integrated approach to textual analysis and transcription, rather than one
which pulls out and isolates single features and focuses on them. The emphasis is
on the thick textual dimension of the meaning-making process. In such an
approach, complexity quickly comes to the fore in ways which militate against
atomistic and piecemeal solutions based on simplicity. Once again, the toolkit is our
informing metaphor: in order to deal with the complexity and diversity of
multimodal texts, a set of tools is needed that are kept together for this purpose as
part of an overall kit. In this spirit, we invite our readers to take up and further
develop these techniques and principles for their own purposes, modifying and
adding to the toolkit in the process.
We wish to extend a special debt of gratitude to Malcolm Coulthard, Michael
Halliday, Ruqaiya Hasan, Jay Lemke, Ole Letnes, Eva Maagerø, Jim Martin, Kay
O’Halloran, Maria Pavesi, Carol Taylor Torsello, Chris Taylor, Elise Seip Tønnessen,
Gordon Tucker, Theo van Leeuwen and Eija Ventola for critically important
discussions as well as for their encouragement and support. Alessandra Varasi’s
unfailing professionalism and dedication to the task ensured that the very highest
standards were maintained during the final and often difficult preparation of the
manuscript with its rather special and demanding technical requirements. Grazie di
cuore, Alessandra! To the Equinox team, Janet Joyce, Val Hall and David Graddol, a very
special thanks for their commitment to this project and their willingness to provide
constant editorial support and advice whenever called upon. We also acknowledge the
generosity of the many students and colleagues, too numerous to mention here, who
volunteered their time and energy to read the manuscript and to offer many useful
suggestions and corrections that have helped us to improve the quality and readability
of the final text. A special word of thanks in this respect goes to Claire Archibald,
Patti Grunther, Sheila McVeigh and Robert Ponzini. To Maggie and Marisa, a very
special thanks for making it all possible in more ways and in more modalities than we
could possibly do justice to! Finally, texts are always the products and/or records of
activities at the same time that they constitute and organise the potential for other
activities. The present book is no exception. We hope that the fruits of our own
activities, as documented here, will encourage others to explore and to apply this
many-sided and always fascinating area of research to their own areas of interest in
the multimodal analysis of discourse in all its manifestations.
28th April 2005, Anthony Baldry and Paul J. Thibault

Preface to this reprint


This reprint of our volume reflects the growing strength of online website analysis.
In keeping with social networking principles, readers are invited to explore and
experiment with the resources available on the mcaweb.unipv.it website. These include
the colour version of the Westpac text, also published in black and white as Appendix
I of this volume, and prior to that as part of Paul J. Thibault (2000a). The website also
includes an area where readers can both examine transcriptions and analyses
produced by other readers and upload their own. The continued validity of this
volume and its supporting website over time is the basis for the Equinox English
Linguistics and English Language Teaching series of which we are the editors, further
details of which can be found on the Equinox Publishing website as well as on the
mcaweb.unipv.it site.
12th December 2009, Anthony Baldry and Paul J. Thibault
Chapter 1

Introduction: multimodal texts and genres

1.0. Introduction

What is multimodality? What is the resource integration principle? What is meaning
compression? What are clusters and phases? What are transitivity frames? What are
metafunctions? What is context of culture? Finally, what are multimodal texts and
genres? In this chapter we set out to define some of the basic terms and concepts used
in this book. Most will be completely new to readers without a background in
functional and text linguistics. Those readers who do have such a background will,
however, find that viewing texts from a multimodal perspective often means that a
traditional term or a well-established definition will come to be seen in a
different light, one that may well encourage a critical rethinking and reformulation of
the relationship between texts and society. Indeed, one of the underlying themes of
this book relates to the all-important question as to whether or not a single theoretical
framework can in fact adequately describe the very different semiotic systems (lan-
guage, gesture, music, movement etc.) that multimodal meaning making entails and
that multimodal text analysis and transcription seek to describe. We will not claim,
however, that we have anything like a definitive answer for what is, after all, a very big
and intriguing question, nor will we propose that one can be given in a book which
aspires to being only a small step on a very long road to a full understanding of
multimodal texts and genres.
Different types of multimodal texts and genres are examined more carefully
in the subsequent chapters of this book where efforts to analyse them through
detailed multimodal transcription and analysis of texts are linked to the notion of
multimodal grammar and a scalar approach to multimodal meaning making that is
designed to explore the organisation of multimodal texts in terms of different
levels. We will refer constantly to the Insets which contain essential background
information that helps to contextualise our approach by linking it up to some key
insights that have been formulated in the last hundred or so years. We believe it will
be useful for the reader to consult and review these Insets frequently.
In this chapter, we begin our journey by looking at the ways in which
resources typically combine to make meaning, characterising, in particular, what we

Inset 1: Context of Situation and Context of Culture

• Malinowski (1923: 306) coined the term context of situation in order to broaden, as
he put it, the notion of context. He pointed out that the meaning of ‘any single word
is to a very high degree dependent on its context’ (Malinowski, 1923: 306), in the
sense that its meaning is determined by the whole utterance in which it occurs. He
further pointed out that the utterance itself ‘becomes only intelligible when it is
placed within its context of situation’ (Malinowski, 1923: 306). In ‘coining’ this term,
he proposed both that ‘the conception of context must be broadened’ beyond that
of the utterance to the situation and that ‘the situation in which words are uttered
can never be passed over as irrelevant to the linguistic expression’ (Malinowski, 1923:
306). He also argued that the concept of context itself ‘must burst the bonds of
mere linguistics and be carried over into the analysis of the general conditions under
which a language is spoken’ (Malinowski, 1923: 306). The study of language must be
therefore conducted ‘in conjunction with the study of [the] culture and of [the]
environment’ of ‘people who live under conditions different from our own and pos-
sess a different culture’ (Malinowski, 1923: 306).
• Malinowski’s ethnographic and anthropological perspectives on culture led him to
propose the notion of context of culture in order to connect language to the activities
through which human needs are satisfied and the forms of cultural organisation giv-
ing rise to these activities. His definition of culture hinges on the notions of function
and organisation. He posited that there is a functional relation between a performance
or activity and a human need and that culture implies the organisation of human
behaviour so that any particular purpose can be achieved (Malinowski, 1944: 38-39).
The ‘larger significant whole’ of utterances is an integral part of the meaning of
utterances and includes both the context of situation and the context of culture.
• Firth (1957 [1934]; 1957 [1950]) turned context of situation into a construct concerned
more with typical contexts of situation and the typical functions of language in these
contexts than with the thick ethnographic description that Malinowski brought to bear.
In our approach to multimodal text analysis and transcription, Malinowski’s detailed
description of specific instances and Firth’s concentration on the typical features of
different types of context of situation are equally important. Both have important les-
sons to teach us. Malinowski (1923, 1935, 1944) shows the need to develop forms of
analysis and transcription that relate language and other semiotic modalities to each
other, to the activities they help to constitute, the meanings and functions of these in
their context of situation and how these relate to the context of culture. Firth’s empha-
sis on types of language functions in relation to types of context, on the other hand,
highlights the need to make generalisations and encourages us to connect text analysis
and transcription to questions of genre (see Inset 6, p. 43). They also show in a
principled way how different units and their relations on different scalar levels of textual
organisation (see Inset 12, p. 144) are all functional in some way to the meaning of
the whole. What both these early thinkers have in common is a clear understanding
of the contextual significance of units and functions at all levels: it is contextualisa-
tion all the way up and all the way down. Context is not extrinsic to semiotic form
and function; rather, it is an integral part of it on all levels of textual organisation.

Inset 2: Text

• In Inset 1 we went back to the early insights of Malinowski and Firth regarding
context of situation and context of culture because they still remain startlingly fresh and
relevant to our present concerns. They not only provide a historical touchstone for
our own efforts in the tradition of systemic-functional linguistics but also invite us
to ‘renew the connection’, as Firth put it, with the full range of semiotic modalities
that function in partnership with each other when we analyse texts of all kinds and
seek to relate their forms of organisation to the contexts of situation and the contexts
of culture in which they function and make their meanings. With Firth’s and
Malinowski’s thinking in mind, we now examine Halliday’s more recent definition of
text (see p. 4) from the perspective of systemic-functional linguistics.

• Halliday’s functional definition of text helps us to see that text is a constitutive part
of some meaning-making event or activity in which the text participates. As
explained in Inset 13: System and instance (pp. 172-173), texts involve many interacting
systems of different kinds on different levels of textual organisation. Halliday
also shows that the definition of text readily extends to multimodal texts and even
to texts in which there is no language whatsoever. The important point is that texts
are embedded in, and help to constitute, the contexts in which they function. Texts
are thus inseparable parts of the meaning-making activities in which they take part.
A functional and semiotic definition of text seeks to understand the ways in which
the intrinsic properties of texts and their organisation enable them to be coupled to
their contexts. As we saw in Inset 1 on the facing page, context is not something
extrinsic to text. Rather, it is created when text users’ knowledge of culture and
society interacts with the internal features of the text’s organisation during the making
and interpreting of texts.

• Texts themselves may recontextualise meanings and practices in one modality to some
other modality. For example, a film version of a novel is a recontextualisation of other
semiotic modalities in this sense. A novel is a recontextualisation of the speech
genres of everyday life and many other semiotic modalities, practices and perceptual
experiences including, for example, many non-linguistic social activity-types and
many forms of auditory, sartorial, gustatory, bodily and other experiences and
practices. Consider the following literary example: Hours later, the cart climbed the last
hill that hid Immortal Heart. I could hear the crowing of cocks, the yowling of dogs, all the
familiar sounds of our village. In this quotation from Amy Tan’s novel, The Bonesetter’s
Daughter (2001: 196), a physical event (the cart climbing the hill) and various familiar
sounds are recontextualised by the linguistic semiotic through specific choices in the
lexicogrammar. All of these events – the movement of the cart and the sounds of
the village cocks, dogs and so on – are themselves familiar types of experience that
are meaningful in the context that the writer connects them to. By the same token,
the indexical-symbolic resources of the linguistic semiotic allow for the possibility that
these sights and sounds can be indexically evoked in the mind’s eye, so to speak, of
the reader as off-line perceptual experiences that readers may undergo. Texts of all kinds
allow for this constant criss-crossing of semiotic and perceptual modalities.

term the resource integration principle (Inset 3, pp. 18-19), which lies at the heart of
multimodality. We do this mainly in relation to cluster analysis (Inset 5, p. 31) and
phasal analysis (Inset 7, p. 47). In particular, this will help us understand the
relationship between individual multimodal texts and multimodal genres, or to put
the matter in slightly different terms, between instance and type (see Inset 13, pp.
172-173). In this respect it is appropriate, as our very first step, to examine the
relationship between text and society in terms of the links between context of
situation and context of culture (Inset 1, p. 2) and text (Inset 2, p. 3), a step which
will help us subsequently to characterise the relevance of metafunctions (Inset 4, pp.
22-23) and primary and secondary genres (Inset 6, p. 43) in our approach to
multimodal text analysis and transcription.

1.1. Multimodal texts and the resource integration principle

What is a text? And what is a multimodal text? As we can see from Inset 2 on the
preceding page, in this book text is a technical term: we follow Halliday in
considering texts to be meaning-making events whose functions are defined by
their use in particular social contexts.

We can define text, in the simplest way perhaps, by saying
that it is language that is functional. By functional, we
simply mean language that is doing some job in some
context, as opposed to isolated words or sentences that I
might put up on the blackboard [...]. So any instance of
living language that is playing some part in a context of
situation, we shall call a text. It may be either spoken or
written, or indeed in any other medium of expression
that we like to think of.
(Halliday, 1989: 10)

As Halliday points out, texts are not limited to the spoken and written media
of language. Instead there are many other resources that can be used to create texts
in addition to the spoken and written word. In this book, we shall explore these
other possibilities. As a starting point, we need to point out that different semiotic
modalities adopt different organisational principles for creating meanings. Different
semiotic modalities make different meanings in different ways according to the
different media of expression they use. When studying multimodal texts, it is all too
easy, for example, to underestimate the significance of the codeployment of space
with hand-arm movements as a meaning-making resource. To see this we may
examine the highly selective type of multimodal transcription in Figure 1.1 which
is designed specifically to reconstruct the relationship between hand-arm movements
and space. The transcription reconstructs the text in terms of phases and
subphases (see below and Inset 7, p. 47 for a definition of phase). It is arranged in
such a way as to highlight the meaning-making functions of gesture and, in particular,
of space in the text (and, as we will see subsequently, in just about all the texts
transcribed in this book). The use of space here, as elsewhere, is time-based, that is, it is
constructed around, and conditioned by, a sequence of events which involves the
constant reorganisation of the participants’ occupancy of space in relation to each other.

| Action | Visual Image (+ camera position) | Movement and/or gesture | Space | Meaning |
|---|---|---|---|---|
| Phase 1: Zombies attack | | | | |
| Subphase 1.1: Circumstance/setting and first Participant, the car, introduced | Shot 1: Shot of car & trees from a distance: car seen in remote spot | Ø | Single space: car and trees alone | Threat; suspense: suggestion that the car is in an eerie, remote spot and that something is about to happen |
| Subphase 1.2: Second Participant, the loving couple, introduced | Shot 2: Close-up of couple looking into car through windscreen | Couple put arms around each other | Inside car with implicit contrast to previous space outside car (i.e. two spaces identified) | Safety: Couple’s occupancy of each other’s space seems reassuring |
| Subphase 1.3: Third Participant, the Zombie leader, introduced | Shot 3: Close-up shot of Zombie’s head & arms popping out from beneath ground | Zombie shakes body like an animal to remove loose earth | Above ground with implicit contrast to space below ground | Threat confirmed: underground creatures emerge threatening couple |
| Subphase 1.4: Final Participants, a large group of Zombies, introduced | Shot 4: Distant shot from raised position of car & trees showing Zombies encircling car | Zombies’ hands are outstretched | Space is represented as a diminishing circle with car as centre | as before |
| Subphase 1.5: Couple screams | Shot 5: Very close-up shot of couple’s mouths from outside (Zombies’ view of couple); Shot 6: From inside car over couple’s shoulders (Couple’s view of Zombies) | Ø | The car is now completely encircled: focus on space inside car followed by focus on space inside and outside car. Only one space now exists | as before |
| Phase 2: Counter-measures | | | | |
| Subphase 2.1: Man operates car door locks | Shot 7: Very close-up shot of dashboard; Shot 8: Very close-up shot of door locking | Pressing a button stops Zombies entering | Car shown as a space providing protection from outside | Safety: Couple’s safe occupancy of the car’s space can be reassured thanks to technology (the engine and ignition weren’t even on) |
| Subphase 2.2: Zombie leader fails to get inside car | Shot 9: Very close-up shot of Zombie’s hand on door handle | Zombie’s hand on door handle from outside car | Distinction between spaces is disappearing | Threat and safety: the two opposing forces are in the balance |
| Subphase 2.3: Other Zombies also fail | Shot 10: Close-up of two Zombies with raised hands attempting to touch car; Shot 11: Distant shot of heads and hands | Number of outstretched hands gradually grows until nothing else is visible | All distinction between spaces has been lost | Threat re-confirmed |
| Phase 3: Zombies give up | | | | |
| Subphase 3.1: Couple sit out attack and express boredom | Shot 12: Close-up of man; Shot 13: Close-up of woman | Man’s hand stifles yawn as woman paints her nails | Separation between Zombies’ space and car’s space gradually increases | Safety reaffirmed |
| Subphase 3.2: Zombies leave | Shot 14: Medium close-up shot of Zombies leaving | Zombies leave | as before | Drama over |
| Subphase 3.3: Zombie leader leaves after replacing windscreen wiper | Shot 15-18: Various shots of Zombie leader | Zombie leader’s replacement of wiper is a conciliatory gesture | as before | as before |
| Phase 4: End phase with logo | Logo appears superimposed on frozen Shot 18 | Ø | as before | as before |

Figure 1.1: Codeployment of space and hand-arm movements in a car advertisement

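By way of illustration only, the row-and-column organisation of the transcription in Figure 1.1 can be sketched as a small data structure. The sketch below is written in Python and is not part of the transcription method itself: the names Subphase, Phase and gesture_space_pairs are hypothetical, and the cell contents are abridged from the first three rows of the figure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Subphase:
    """One row of the transcription; None stands for an empty (Ø) cell."""
    label: str
    action: str
    shots: List[int]
    gesture: Optional[str]
    space: str
    meaning: str


@dataclass
class Phase:
    """A phase groups the subphases (rows) that belong to it."""
    label: str
    title: str
    subphases: List[Subphase] = field(default_factory=list)


# Abridged from the first three rows of Figure 1.1 (wording shortened).
phase1 = Phase("1", "Zombies attack", [
    Subphase("1.1", "Car introduced", [1], None,
             "Single space: car and trees alone", "Threat; suspense"),
    Subphase("1.2", "Loving couple introduced", [2],
             "Couple put arms around each other",
             "Inside car, in implicit contrast to space outside car",
             "Safety"),
    Subphase("1.3", "Zombie leader introduced", [3],
             "Zombie shakes body to remove loose earth",
             "Above ground, in implicit contrast to space below ground",
             "Threat confirmed"),
])


def gesture_space_pairs(phase: Phase) -> List[Tuple[str, str, str]]:
    """Return (subphase label, gesture, space) for rows that record a gesture."""
    return [(s.label, s.gesture, s.space)
            for s in phase.subphases if s.gesture is not None]


for label, gesture, space in gesture_space_pairs(phase1):
    print(f"{label}: {gesture} / {space}")
```

Reading the gesture and space cells side by side in this way mirrors the selective purpose of the transcription, which is to relate hand-arm movements to the occupancy of space, subphase by subphase.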
The transcription carefully reconstructs the partly metaphorical contrasts (in
the first phase) between space above ground and space below (from which the
Zombies emerge to terrify the car’s occupants) and space outside the car and inside
the car (the latter providing safety and protection) which are fundamental to the
creation of the text’s meaning. In the second phase, the camera focuses on the
car’s inside door lock, thereby creating the message that while the world below and
the world above may have merged, the world inside the car as opposed to the world
outside the car will remain safe and secure. Space is closely linked to the many hand-
arm gestures enacted in this text. Gestures are configurations of hand-arm move-
ments which occur in space and time. Hands and arms, the movements they make,
space and time are all codeployed resources in gestures. Space, as we have suggested
in examining this text, is not a ‘neutral’ entity but is instead ideologically
loaded (Hall, 1972 [1963]). It is part of the culturally-determined way in which we
perceive the world, the result of our collective cultural experience (see above Inset
1: Context of situation and Context of culture, p. 2).
In addition to time and space, other resources, which we will examine more
thoroughly in the following chapters, are essential to the meaning-making process,
some at first sight surprisingly so. One meaning-making resource that falls into this
category is intertextuality (see Inset 8: Intertextuality, p. 55), which we will discuss
more fully in Chapter 2.
However, we may note in passing that the text which has been partly tran-
scribed in Figure 1.1 works on the assumption that viewers know what a Zombie
is and will have previously seen books or films about them. While no resource ever
functions alone in a multimodal text, it is nevertheless the case that in most film texts
the soundtrack plays a fundamental role and, as we shall see subsequently, is articulated
into various levels of textual organisation (see Inset 16: Perspective in sound:
Van Leeuwen on Figure, Ground and Field, p. 212). In multimodal texts, not all of
the soundtrack’s meanings necessarily derive from the use of speech. In the Zombie
text much of the humour derives from the foregrounded sequence of non-linguistic
yet exquisitely human vocalisations, such as groans, screams and yawns. Above all,
the text’s transcription highlights the unpredictability of the interplay between
resources in an interactional encounter.
In the Zombie text, predictable patterns of behaviour vie with highly unex-
pected ones. The transcription given in Figure 1.1, though limited in its scope,
systematically relates gesture and other actions to space and in so doing suggests
how the resource integration principle (see Inset 3, pp. 18-19) contributes to the rapid
alternation of expected and unexpected in a very short time span (a 30-second
advertisement). Our understanding of the kissing couple’s predicament (or from
another point of view the Zombies’ predicament) is heavily dependent on our
knowledge of film genres, the TV car advertisement included (see in this respect
Inset 8: Intertextuality, p. 55). This is the same thing as saying that the Zombie text,
like all texts, is dependent on, and partly creates, a particular context of situation and
a particular context of culture.
We will return to the question of expected and unexpected patterns in
multimodal texts on many occasions in this book, for example in the second part
of Chapter 2 (see 2.4 to 2.8, pp. 71-102), and in particular, in Chapter 4 when dis-
cussing the ways in which soundtracks contribute to these patterns in film texts.
This relationship is further characterised in the associated online course (see
Preface, p. xv) as are the relationships, for example, between video tracks and
soundtracks in film texts. The exercises and text analyses in the associated course
which relate to printed media, websites and film texts are designed to further
analyse these relationships as well as to provide further contexts in which to
explore and apply the theoretical statements made throughout the book, for
example in the Insets.

1.1.1. Resource integration and the transcription of printed cartoons


In any text, there will always be some expected and easily predictable patterns and
some unexpected ones. Hence, if we meet a friend in the street and wish to present
our family to him or her we might well expect to use a combination of language
and gesture (a handshake, a pat on the back, pointing or a wave) to identify
particular individuals or to indicate leave taking. Yet even within expected patterns,
we can never be sure about what will actually happen. Whatever we make of the
rather unexpected leave-takings in the Marmaduke cartoons in Figure 1.2, they
help to clarify that meaning making cannot be construed in terms of individual
semiotic resources but instead relies on their combination.
Both of the cartoons in Figure 1.2 consist of a cartoon illustration, the
frame around the depicted scene in the illustration and direct speech in quotation
marks outside the frame at the bottom of the picture. Thus, in these cartoons,
visual resources and language are, in fact, separated physically in a very sharp way
through the presence of a framing border. This, however, does not necessarily mean
that verbal and visual resources are separated functionally. Language, frame and
visual resources make their meaning through their mutual interdependence. Hand-
arm movements and other types of movement in these, as in other cartoons, are
not ‘real movements’ but are instead represented as movements through the use of
visual resources, mostly curved lines and the relations among them, a resource
which is fundamental when attempting to produce caricatures in printed or film
cartoons. Generally speaking, the frame that surrounds a picture separates the
depicted world of the picture inside the frame from that which is outside the frame (in
a manner which is partly analogous to the division of space inside and outside the car
in the Zombie text). The frame itself is not part of the depicted world of the picture,
but stands outside of it. The frame provides some implicit indication as to how the
picture is to be viewed. In doing so, it provides a metacomment on the depicted world
of the picture or, to put the matter in slightly different terms, it specifies a metarule
concerning how the things inside the frame are to be taken (Bateson, 1973 [1972]: 159-161).

Figure 1.2: The two Marmaduke cartoons
© UFS. Inc. / Distribuzione Adnkronos.

| Stage | Visual cluster | Visual Resources | Phase in Narrative Event Structure |
|---|---|---|---|
| 1. Orientation | [image] | House front, door, window, shutters; tree and fence in background | Visual setting: location and participants |
| 2. Event | [image] | Participant: open door; Event (implicit): door was closed prior to man’s arrival | the first man tried to enter the house, but was refused entry |
| 3. Complication | [image] | Participant: dog; Action: language + vector | the dog bites his shoe off and prevents him from entering the house |
| 4. Event | [image] | Participant: dog; Action: language + vector | the dog chews up his shoe |
| 5. Event + Resolution | [image] | Participant: man; Action: walks away from house; Change of state: minus one shoe | the man, in pain, gives up and leaves the house with just one shoe |
| 6. Event | [image] | Participants: two men; Action: man 1: walks: directional vector; movement; Action: man 2: stationary: standing; Process: gaze vector connects two men; close spatial proximity; contrasting body postures and facial expressions | on the stairway, he encounters a second man who intends to enter the house |

Figure 1.3: Cluster transcription of narrative event structure in a cartoon

In the first cartoon in Figure 1.2, the Marmaduke-in-a-playful-mood-text, the
sentence of direct speech occurring outside the frame specifies what belongs to the
depicted world at the same time that the woman in the foreground of the picture
constitutes its deictic centre. In other words, the words outside the frame are to be
attributed to one of the participants inside the frame at the same time that they
characterise the point of view of the woman inside the frame rather than the stand-
point of the cartoonist or the reader/viewer, who are outside the frame. Thus, the
reference point for the direct speech is the woman and not the outside observer or
the cartoonist who created the world depicted inside the frame.
How do we know that the words outside the frame are to be attributed to the
woman? Why aren’t the words placed inside the frame, for example in a speech
bubble linked to the woman? The use of quotation marks to signal direct speech, on
the one hand, and the person deixis, mood and tense, on the other, all tell us that the
utterance represents the point of view of the woman rather than someone else out-
side the frame. This is unusual insofar as items that are placed outside the frame are
normally taken to represent the reference point of an observer of the scene rather
than one of the participants in the scene. However, by presenting the words from
the point of view of the woman, the cartoonist is able to distance himself from
judgements concerning the truth or validity of the words attributed to the woman,
especially given that the depicted scene is a fictional one. Judgements of truth and
so on can therefore be suspended or, if you like, left to the cartoon characters
themselves in their fictive world.
At the same time, the placing of the words outside the frame can indicate
that the cartoonist wishes to adopt a particular affective or other interpersonal (e.g.
evaluative) stance of, say, solidarity with the woman and her words. In this way, this
multimodal text uses the combined resources of written language and depiction to
present some aspects of the situation from the point of view of the participants in
the depicted scene and other aspects of it from the point of view of the external
observer of the scene so that the latter is drawn into a particular kind of
interpersonal relation with it or with some aspect of it. In the present case, it is the
woman’s assessment of the situation, specifically of the man’s plight after his
unhappy encounter with the dog in the background, which is salient. However, it
is the external observer who is able to view the whole depicted scene from his or
her reference point outside the frame. Such an observer will note the ironic dis-
crepancy between the woman’s assessment of the dog’s ‘playful’ mood, the man’s
plight and the immense power and energy of the dog as his owners struggle to
restrain him from overwhelming the retreating man.
As mentioned above, the frame functions as an implicit metacomment, in
Bateson’s sense, on the depicted world of the cartoon and therefore signals that it
is to be interpreted as a cartoon world and not as a feature of the world outside the
frame. By the same token, the direct speech of the woman is attributed to a
participant of the depicted world inside the frame at the same time that it is used to
frame the interpersonal evaluative stance of an observer outside the frame. The
humour of the cartoon may in part be due to the paradoxical effects which derive
from this.
As readers of the text, we establish a link between the inverted commas and
the woman as part of the process of deducing that it is the woman who is speak-
ing since, apart from Marmaduke (a dog who cannot be expected to speak) and the
possible exception of the man (who is too stunned to speak), she is the only char-
acter with an open mouth. Had there been more than one open mouth, the process
of associating a particular utterance with a particular speaker would have been
resolved in other ways, most probably through the presence of a speech bubble, with
an explicit link between a speaker and their words or thoughts. In such circum-
stances, the utterance would almost certainly have been inside the framing border,
in contrast to the external position in this text.
A speech bubble is itself a partly prefabricated multimodal unit, a cluster (see
Inset 5: Clusters and cluster analysis, p. 31), made up of various resources including
language, curved and straight lines and space, that is ready to be pressed into
service once specific context customisation has been enacted. As Figure 1.7 in 1.3
(pp. 34-38) indicates, this customisation relates to the choice of specific, contextu-
ally appropriate words and the decision to portray them as words (a link achieved
through lines) or as thoughts (a link achieved through circles). In our approach to
multimodal text analysis and transcription, clusters are groupings of resources that
form recognisable textual subunits that carry out specific functions within a
specific text. Multimodal transcription typically serves to identify the components
of each cluster and the function that each specific cluster plays within a text. A
further function of multimodal transcription is to identify the relations between
clusters in the same text and the relationship between specific multimodal clusters
and cluster types (see the discussion on primary genres in Inset 6, p. 43).
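The idea of a cluster as a partly prefabricated unit that is customised for a context can be sketched as a small data structure. The following is only an illustrative sketch: the names Cluster and make_bubble, and the sample utterance, are assumptions made for this example and are not part of the transcription method itself. It records the components of a bubble cluster and its function, with the link type (lines for words, circles for thoughts) chosen at customisation time.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """A grouping of resources that forms a recognisable textual subunit."""
    name: str
    resources: tuple   # the components a transcription would identify
    function: str      # what the cluster does in this specific text


def make_bubble(words: str, as_thought: bool = False) -> Cluster:
    """Customise the partly prefabricated bubble cluster for one context."""
    # Following the text: lines link a participant to words, circles to thoughts.
    link = "circles" if as_thought else "lines"
    kind = "thoughts" if as_thought else "words"
    return Cluster(
        name="thought bubble" if as_thought else "speech bubble",
        resources=(f"language: {words}", "curved and straight lines",
                   "space", f"link: {link}"),
        function=f"attribute {kind} to a participant",
    )


# Hypothetical utterance, used only to show the customisation step.
spoken = make_bubble("Hello there")
thought = make_bubble("Hello there", as_thought=True)

for cluster in (spoken, thought):
    print(cluster.name, "|", cluster.function, "|", cluster.resources[-1])
```

The same prefabricated unit yields two different clusters; only the words and the link type change, which is the sense of customisation described above.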
Clusters are thus a prime indication of the localised effects of the resource
integration principle (see Inset 3, pp. 18-19). The variation in the complexity of the
codeployment of resources in any specific cluster is closely linked to social evolu-
tion and technological developments. Contemporary society has unquestionably
instantiated more recorded multimodal texts, in particular dynamic film texts, than any
previous society (Baldry, 2000b: 28-38). Nevertheless, a rock painting in the
Australian desert, a 15th-century musical score, Leonardo’s Notebook (see Figure 1.10
in 1.5, pp. 44-46) and the latest feature film with special effects all share a basic
feature in that they are units of meaning which carry out a specific function in a
specific social context, deploying various resources to this end. They all typically
contain clusters of related items. As such, though otherwise very different, they are
all multimodal texts. We should not overlook, however, as mentioned above (Inset 2:
Text, p. 3), the absence or restricted use of language in many multimodal texts. A
multimodal text may well be something written, spoken or a combination of written
and oral discourse, but it may also extend beyond the linguistic semiotic to include
other meaning-making modalities and, in so doing, may not necessarily include lan-
guage. The likelihood, however, is that in some genres, language will be pared back
as much as possible but not entirely excluded. The cartoons in this chapter are good
examples of this.
The simultaneity of visual presentation in the cartoon scenes in Figure 1.2
should not distract us from the way in which they tell a story involving a sequence of
events. Nor should it distract us from the changes or transformations that are
brought about as these events unfold in time. How can a simultaneously presented
configuration of events in the depicted world of a single picture tell a story? How
is succession in time and consequent change communicated? How can the reader
infer a narrative sequence on the basis of the depicted scene? In the remainder of the
current section, we will propose answers to these questions by exploring the ways
in which a discourse level of narrative organisation can be unpacked from the visual
and other resources used in the cartoon.
Reference to the semiotic resources used to create the second cartoon in
Figure 1.2 and its meanings helps provide some answers to these questions. The
man attempts to enter the house with both shoes on but leaves with just one shoe.
The other shoe is apparently being chewed up by the dog (part of whose head can
just be seen). The static depiction nevertheless manages to convey movement, a
temporal situation involving different moments in time as well as the change which
occurs with the passage from earlier moments in time to later ones. These three
factors together constitute an event, and events, regardless of their modality of
realisation, are the hallmark of narrative. Narratives, including cartoon narratives
(Goodman, 1996: 60-69), do not merely signal a temporal succession of events.
Most importantly, they show how some aspect of a situation or a participant in a
narrative changes as a result of the transition from an earlier moment to some later
moment. Narrative therefore involves change or transformation over time. One
such event involving change in our example may be expanded as in Table 1.1.

1. the dog took the man’s shoe and then the man left the house with no shoe on his right foot

2. the dog took the man’s shoe so that the man left the house with no shoe on his right foot

Table 1.1: Examples of possible temporal and causal expansions of the event sequence in the dog-chews-shoe cartoon

There are a number of different ways in which this event could be expressed
linguistically, e.g. (1) the man lost his shoe; (2) the dog took the man’s shoe; and (3)
the man tried to enter the house with both shoes on and left with one shoe, and so on.
Each of these linguistic glosses on the meaning of the cartoon either implies
distinct temporal moments in an event sequence (i.e. 1 and 2) or spells out the
temporal-causal relations between the different moments in the sequence (i.e. 3).
Examples 1 and 2 can be expanded to draw out their temporal and causal relations
to the event sequence as a whole. For example, linguistic gloss 1 can be expanded
by supplying a reason as to why the man lost his shoe, e.g. the man lost his shoe
because the dog took it.
Two possible expansions are given by way of illustration in Table 1.1. The
first example in Table 1.1 highlights the temporal relation between events (and then);
the second example draws attention to the causal relation between the two events
(so that). The two interpretations, which are analytically separated in their respective
linguistic glosses, are both simultaneously present in the cartoon. There is therefore
a degree of indeterminacy in the picture because both the temporal and causal meanings
can coexist and mutually support each other without requiring that the reader
decide between one reading and the other. The first example foregrounds temporal succession
(one thing happened then another thing happened) and implies a causal sequence
(the man lost his shoe because the dog chewed it up). The second example foregrounds
the causal relation between the two actions and implies temporal succession. Table
1.2 reconstructs the first two elements in the event sequence.
In the cartoon in Figure 1.3, the dog-chews-shoe cartoon, the perceptual simul-
taneity of the visual invariants used to specify the actions (e.g. the dog chewing up
the shoe, the first man leaving the house), the participants (the dog, the two men),
and the surroundings (the entrance of a suburban house, the garden in the back-
ground) nevertheless communicates a narrative sequence involving change in time.

Time 1               Time 2

man has two shoes    man has one shoe

Table 1.2: Example of the temporal sequencing of events in the dog-chews-shoe cartoon showing change (two shoes then one shoe) resulting from the transition from one moment to the next

The narrative structure of the cartoon can be unpacked as a level of semiotic
organisation which is analogous to the discourse stratum in linguistic texts (Martin,
1992: 20-21; Martin, Rose, 2003: 3-7). The event structure is a level of semiotic
organisation which is highly condensed in the visual organisation of the depicted
scene. Nevertheless, the temporal succession of events can be reconstructed or reac-
tivated in ways which partially detach it from the visual forms themselves. This shows
the need to distinguish the narrative event structure as a level of meaning which is
realised by, though not reducible to, the resources of the visual grammar used in the
cartoon drawing. It also suggests that a visual image such as the dog-chews-shoe
cartoon in Figures 1.2b and 1.3 can be analysed in terms of the three levels of
semiotic organisation presented in Table 1.3, which provides a synoptic reconstruc-
tion with a focus on the experiential metafunction and which suggests some of the
kinds of meanings that are realised by choices from the visual grammar and their
combinations. Narrativity can be generated in a visual text like the shoe-chewing one
shown in Figures 1.2b and 1.3 on the basis of genre-related considerations such as
those listed below:
• contrary to appearances, the depicted scene in this cartoon is not
a single moment in time, though it can, of course, also be seen as
such. Rather, the scene implies a timeline comprised of actions and
events that take place in a given chronological order, which can be
deduced from the visual depiction;
• the chronological order of these events therefore corresponds to a
sequence of events in time;
• the participants who take part in the sequence of events maintain
their identity from one action or event to the next in the sequence;
• the transition from one action or event to another in the sequence
also entails change or transformation in some aspect of one or the
other of the participants but in a way that maintains their identity.

Expression stratum resources      Content stratum resources
                                  Visual grammar              Narrative event structure (discourse)

lines, dots, light-shade          volumes/shapes              participants
interplay; intersections and      vectors                     actions, events, movement
nestings of these resources to    change of state;            timeline
produce an arrested optical       change of feature
array of visual invariants

Table 1.3: Three levels of semiotic organisation in the dog-chews-shoe cartoon


The above points summarise the conditions for the activation of narrative
discourse in a text. Figure 1.3 shows the narrative sequence that can be
reconstructed on the basis of the visual cues provided in the cartoon drawing. The
man’s dress and the briefcase he is holding index his likely participant status as a
salesman who was hoping to gain entry to the house in order to discuss a business
transaction with the house owner. The reader can assume that prior to the moment
which is shown in the cartoon, he had knocked on the door, encountered the
savage dog, and that the dog took his shoe off. We see him retreating from the
house in obvious pain and discomfort. The reconstructed sequence of events
entails both continuity of participant roles (man and dog) over successive moments
in the sequence and some change in the man at Time 3 and Time 4, when he loses
his shoe, as shown below:

(1) Time 1: man knocks on door of house with both shoes on;
(2) Time 2: man encounters savage dog;
(3) Time 3: dog bites one of his shoes off;
(4) Time 4: man leaves house without one of his shoes.

It is this factor of change or transformation in relation to the factor of continuity
or sameness which enables narrative meaning to be generated. This is why it is
important to distinguish the level of narrative discourse (Table 1.3) from the level of
what is actually depicted by the visual and other resources which are used (Table 1.4).
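The interplay of continuity and change in the reconstructed sequence can be made explicit in a short sketch. This is an illustration under stated assumptions rather than part of the analysis: the Moment record, the choice to track the number of shoes as the changing feature, the per-moment participant sets and the function names are all hypothetical. The sketch checks the conditions discussed above: chronological order, a participant who persists across the sequence, and at least one change of feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moment:
    time: int
    event: str
    participants: frozenset   # who is involved at this moment (assumed reading)
    shoes_on_man: int         # the feature of the man that changes


# The reconstructed sequence, Time 1 to Time 4, as listed above.
SEQUENCE = [
    Moment(1, "man knocks on door of house with both shoes on", frozenset({"man"}), 2),
    Moment(2, "man encounters savage dog", frozenset({"man", "dog"}), 2),
    Moment(3, "dog bites one of his shoes off", frozenset({"man", "dog"}), 1),
    Moment(4, "man leaves house without one of his shoes", frozenset({"man"}), 1),
]


def changes(sequence):
    """Return (earlier time, later time, before, after) for each change of feature."""
    return [
        (a.time, b.time, a.shoes_on_man, b.shoes_on_man)
        for a, b in zip(sequence, sequence[1:])
        if a.shoes_on_man != b.shoes_on_man
    ]


def supports_narrative(sequence, participant):
    """Chronological order, continuity of one participant, and some change."""
    ordered = all(a.time < b.time for a, b in zip(sequence, sequence[1:]))
    continuous = all(participant in m.participants for m in sequence)
    return ordered and continuous and bool(changes(sequence))


print(changes(SEQUENCE))
print(supports_narrative(SEQUENCE, "man"))
```

If the sequence is cut off after Time 2, the man is still present throughout but nothing about him has changed, so the check fails: continuity alone does not yield narrative meaning.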

                        Semiotic Resources
                        Language              Depiction             Sound                 Movement

Participant: dog                              volume/shape: face

Action: dog eats shoe   written language:     action vectors        onomatopoeia:         action vectors
                        ‘chomp, chomp,        emanating from dog    transcoding from      emanating from dog
                        chew’                                       writing to speaking

Participant: first man                        volume/shape

Action: man leaves                            direction vector                            visual movement
house                                                                                     invariants

Attribute: first man                          facial expression:
in pain                                       open mouth

Event: transformation                         no shoe on man’s
in features of first                          right foot
man

Table 1.4: Distribution of semiotic resources in dog-chews-shoe cartoon
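A transcription table such as Table 1.4 can also be held as a simple mapping from narrative elements to the resources that realise them, which makes it easy to ask which elements draw on more than one resource. The sketch below is a hedged illustration: the variable and function names are hypothetical, and the placement of 'visual movement invariants' in the Movement column is one reading of the table layout rather than a certainty.

```python
# Each narrative element maps to the resources that realise it (after Table 1.4).
TABLE_1_4 = {
    "Participant: dog": {"Depiction": "volume/shape: face"},
    "Action: dog eats shoe": {
        "Language": "written language: 'chomp, chomp, chew'",
        "Depiction": "action vectors emanating from dog",
        "Sound": "onomatopoeia: transcoding from writing to speaking",
        "Movement": "action vectors emanating from dog",
    },
    "Participant: first man": {"Depiction": "volume/shape"},
    "Action: man leaves house": {
        "Depiction": "direction vector",
        # Assumed column placement; the source layout is ambiguous here.
        "Movement": "visual movement invariants",
    },
    "Attribute: first man in pain": {"Depiction": "facial expression: open mouth"},
    "Event: transformation in features of first man": {
        "Depiction": "no shoe on man's right foot",
    },
}


def modalities_for(element):
    """Resources codeployed to realise one narrative element."""
    return sorted(TABLE_1_4[element])


def elements_using(modality):
    """Narrative elements that draw on a given resource, in table order."""
    return [element for element, cells in TABLE_1_4.items() if modality in cells]


print(modalities_for("Action: dog eats shoe"))
print(elements_using("Language"))
```

On this reading, language realises only one element of the narrative, while depiction contributes to every row, which is consistent with the point that language is pared back but not excluded in these cartoons.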
1.1.2. Multimodal transcription of cartoon narratives and the question of the metafunctions
What resources and what kinds of meanings made by these resources contribute
to the discourse level of narrative organisation? The contribution of the different
metafunctions (see Inset 4: Metafunctions, pp. 22-23) is discussed below with this
question in mind. In this respect, we need to understand that the meaning-making
processes of a text need to be defined in terms of four different but general and
concomitant types of meaning.
1) Logical Meaning. It is on the discourse level that the reader of the cartoon acti-
vates the potential narrative meaning of the text by raising questions and providing
answers to them. For example, Why did the man want to enter the house? What was he
expecting to achieve? Who/what is he? What went wrong?, and so on. In this way,
relations of cause, time, comparison and so on between events in the sequence can
be postulated and possible answers provided. In the case of texts such as the one
above, the raising of such questions and the providing of answers to them gener-
ates narrativity by seeking to find reasons for the changes which occur in
participants during the unfolding event sequence (see 4.11.6, in particular pp. 238-239).
For example, why did the man go to the house? What happened when the
door was opened? What will happen to the second man?

2) Textual Meaning. The depicted scene presumes a narrative timeline in which
there is participant identity across successive occurrences of the same participants
in time, e.g. the same man and the same dog take part in the reconstructed sequence
of events referred to above. Moreover, the similarity in dress, physical type and the
briefcases of the two men provides a visual basis for construing a relation of similarity
between the two participants. They are participants of the same type, doing
the same kinds of things, and can be expected to be found in the same kinds of
situations, as here. Textual ties of this kind are known as covariate ties (see 3.8, 3.9,
pp. 136-155, 4.7.6, pp. 199-200) in which two or more items that are not struc-
turally related as part of the same transitivity frame (see Inset 11: Visual transitivity
frames, p. 122) are co-classified as belonging to the same general class (Lemke, 1985;
Martin, 1992: 25). The covariate visual tie between the two salesmen indicates that
the second man is going to the entrance of the house for the same reason as the
first man. The tie between the two participants not only establishes a similarity of
experiential role function (see the following paragraph) but also provides a further
basis for inferring that (1) the first man was in a similar situation to the second man
before his unfortunate encounter with the dog and (2) that the second salesman is
likely to meet the same fate as the first one.

3) Experiential Meaning. A further factor that is important here for the activation of
the narrative discourse meaning is the set of expectations that apply to the two
different participant roles (salesman and savage dog), along with the ways in which
they are expected to interact with each other. Cartoons of this kind draw upon
stereotypical representations of social roles and the expectations that are associated
with these roles. Thus, salesmen do certain kinds of things such as knocking on the
doors of houses in the hope of finding potential clients, they dress in a certain way,
they usually carry a briefcase with their wares and so on. Likewise, savage dogs in sub-
urban houses make it difficult for strangers such as salesmen to enter, they are likely
to be aggressive and imposing, they may bite or chase such intruders and so on. In
this kind of relation, the reader associates a set of features with each of the
participants which characterise a specific role, the way that individuals (dogs or
humans) in the given role can be expected to behave in particular situations and so on.

4) Interpersonal Meaning. The interpersonal meaning of the cartoon can be seen in
relation to the way in which the reader is positioned so as to take up a particular
evaluative stance with respect to the depicted world, its participants and the experi-
ences that they undergo. The depicted world of the cartoon takes place within a
frame which separates the participants and events within that world from those
outside the frame, e.g. readers in the real world (see the related discussion of Figure
1.2 above and of Figure 1.7 below). The reader is therefore invited to adopt a
particular metadiscursive stance on the depicted world, as implied by the framing
relationship. In the present case, the cartoon genre, with its emphasis on
stereotypical situations, its focus on schematic representations of participants’
actions and reactions, and the humour or playfulness derived from these, plays on
the reader’s expectations concerning the outcomes of such encounters between
salesmen and savage dogs. In this way, we can empathise with the participants in the
situation on the basis of our expectations and predictions. At the same time, we are
not part of the depicted world and can laugh at the participants and their predica-
ment. The framed world of the cartoon tells us to view its content in a different
way from what is outside the frame in the world of the reader (Bateson, 1973
[1972]: 160). Yet the reader can relate what he or she sees in the frame to his or her
experiences outside the frame. As mentioned above, therein resides the paradox of
the framing relationship – a major source of the cartoon’s humour and playfulness.

1.1.3. Sources of meaning in multimodal texts


As we have seen from the preceding discussion, careful analysis is required to
ascertain the sources of meaning in multimodal texts, one reason why multimodal
transcriptions of texts are so important. This, as we have suggested above, is a
question of understanding how different semiotic systems (in our terminology,
resources) intertwine to make meaning. We have called this the resource integration
principle (see Inset 3, pp. 18-19).

Inset 3: The resource integration and meaning-compression principles

(a) The resource integration principle views a semiotic resource as something used for the purposes
of making meaning and which accordingly functions in the texts in which these resources are used
to this end. Semiotic modalities such as language, gesture, depiction, gaze and so on, can be formalised
and described as resource systems in this sense. A semiotic resource system is thus a
system of semiotic forms that we can use for the purposes of making texts. The forms have
particular functions in the texts in which they are used. The notion of resource therefore captures
these two aspects – use and function – of the relevance of semiotic systems to the texts which
these systems make possible. This does not mean that the system pre-exists use in some abstract,
Platonic sense. More accurately, semiotic resource systems are distributed across many different
individuals in a particular context of culture of a community, which makes use of particular
semiotic resources. Different individuals winnow their own way through the culture they live in
and, in doing so, define and accumulate their own semiotic resource systems on the basis of their
own experience and participation in different social contexts in the course of their lives, encounters
with texts, educational and professional experience and so on. A system of semiotically salient
differences in some community, i.e. the differences that potentially make a difference in the
meaning-making practices of that community, is thus a resource for making meanings.

• Multimodal texts integrate selections from different semiotic resources to their principles of
organisation. For example, the printed page makes use of the resources of depiction, written language,
lexicogrammar, spatial positioning and arrangement of items, among other things. These
resources are not simply juxtaposed as separate modes of meaning making but are combined and
integrated to form a complex whole which cannot be reduced to, or explained in terms of, the
mere sum of its separate parts. The organisational principles of the whole – e.g. the page as a
visual unit – cannot be understood in terms of the different resources used, taken separately. The
resource integration principle refers to the ways in which the selections from the different semiotic
resource systems in multimodal texts relate to, and affect, each other in many complex ways across
many different levels of organisation. Multimodal texts are composite products of the combined
effects of all the resources used to create and interpret them. Lemke (1998) uses the term multiplying
effect to capture the way in which different semiotic modalities co-contextualise each other
in ways that are not predictable on the basis of the different semiotic resources seen as separate
modalities. The separation of different resources into different modalities is an analytical
abstraction. Different resources are analytically, but not constitutively, separable in actual texts.

• A semiotic resource system is thus a system of possible meanings and forms typically used to make
meanings in particular contexts. A system is always a theoretical abstraction from very many
instances (see Inset 13, pp. 172-173). An act of abstraction of this kind is an attempt to reconstruct
the possible forms and their typical patterns of combination in a given semiotic system. Language,
or rather some languages, has/have been extensively theorised as a system of semiotically salient differences
that social agents use in contextually constrained ways. The lexicogrammar of natural language
is thus to date the most studied case of a semiotic resource system from this standpoint.
However, the advent of computerised multimodal corpora will, in time, change this (Baldry,
Thibault, 2001, 2005) given that there is no reason, in principle, why this kind of thinking cannot be
extended to the complex systems of topological differentiation that characterise the ‘grammar’ of
visual semiosis (see Kress, Van Leeuwen, 1996 and the discussion of gaze in 4.1, pp. 167-173).

• According to the resource integration principle, texts are never monomodal. Monomodality is
the result of a certain way of thinking of separate, distinct semiotic resources, abstracted from
use, as existing in their own right. In practice, texts of all kinds are always multimodal, making
use of, and combining, the resources of diverse semiotic systems in ways that show both generic
(i.e. standardised) and text-specific (i.e. individual, even innovative) aspects. This is so even of the
seemingly limiting case of the telephone conversation where no visual contact between the two
speakers (i.e. no videophone) features in the conversation. On the telephone, we attend to many
aspects of the other person’s spoken voice that are not necessarily part of language – e.g. its
lexicogrammar and its phonology – in the narrow sense. Such resources include voice quality,
breath control, rate of speaking, hesitations and pauses. Speakers and listeners are not always
aware of these resources or even that they may be considered as meaning-making resources. Yet,
just like the choice of words, intonation and so on, these resources can be, and indeed often are,
modulated variously by speakers to create specific meaning effects just as listeners can attend to
the speaker’s use of them, again to varying degrees of conscious awareness, as they interpret the
speaker’s meanings in relation to what is said and how. It is no accident that in many call centres
telephone operators are trained to attend to and interpret the significance of often subtle cues
in the voices of the potential clients whom they never see. In this sense, the telephone voice is a
multimodal semiotic resource that some people learn to cultivate and use to great effect. Thus,
from the perspective of both the system and the instance, the resource integration principle is
essential when attempting to understand how meanings are created in multimodal texts. A
further example will suffice here: the joint verbal-visual thematic relations created by multimodal
displays, tables and diagrams in school science textbooks are a semiotic-cognitive resource
through which the specialised meanings of the scientific topic can be stored, accessed, activated
and further developed by users of these books in teaching and learning activities (see Chapter 2).

(b) The meaning-compression principle refers to the effect of the interaction of smaller-scale
semiotic resources on higher-scalar levels where meaning is observed and interpreted. Take the
London Transport (LT) leaflet in Figure 1.4. Familiar shapes such as the fried egg, the mushroom
and so on, and the rhythmical, patterned relations among them reduce and compress
more complex problems on larger space-time scales to a set of patterned relations between
familiar visual shapes and minimal verbal text. Visual scanning of these patterns may take mere
seconds and places no burden on processing. These patterns are, in turn, contextually integrated
with the complex task of encompassing in one’s mind the vast reality of the city of London
and the fare structure of its urban transport system. The meaning-compression principle makes
this task manageable in this text by compressing and reducing the complexity of the higher-scalar
reality which is being interpreted to a series of rhythmically patterned and interrelated
visual shapes and images on the here-now scale of visual scanning. Readers are thus able, quickly
and effortlessly, to process these visual patterns obtaining the necessary information about the
LT system and its fare structure. The meaning-compression principle is a principle of economy
whereby patterned multimodal combinations of visual and verbal resources on the small, highly
compressed scale of the leaflet provide semiotic models of the larger, more complex realities
that individuals have to engage with. In this way, a given combination of resources compresses,
in its patterned arrangements, meanings which can be unpacked and integrated into a more
specified semiotic configuration on a higher level of textual organisation.

It is all too easy to assume that one resource will typically predominate in a
particular genre, only to find that this is not the case. TV advertisements are a well-known
example of a contemporary text in which language, whether written or
spoken or both, will invariably not be the only source of meaning. Thus reports of
the rulings made in many Western countries by disciplinary councils duly authorised,
under the self-regulatory advertising codes of practice, to ban a particular
advertisement from being disseminated will often indicate that the judgement was
made on the basis of the unacceptable language used but will also indicate that the
advertisers themselves, in their defence, argued that, for them, the use of language
was of no consequence at all (cf. Cook, 2001 [1992]: 54). In many such cases, the
advertisers may well be at least partially right.
There are many printed and TV advertisements which make a conscious
effort to restrict or do without language altogether or which more subtly and
deceptively use language as the apparent primary source of meaning, whereas from
the advertiser’s standpoint, the most important meanings are made, possibly more
surreptitiously and covertly, by other modalities such as clothes (Barthes, 1960),
music and sound (Van Leeuwen, 1999), gesture (Birdwhistell, 1952, 1972 [1961];
Kendon, 1981; McNeill, 1992), movement (Birdwhistell, 1952, 1972 [1961];
Kendon, 1981), posture (Scheflen, 1972, 1973) and spatial relations (Hall, 1972
[1963]). In the Zombie text, all these resources are brought together in a special way,
namely to make the distinction between my, your and others’ space salient, a matter
which, as we have seen, is also often foregrounded in cartoons.
Quite apart from general issues of cultural awareness and social attitudes to
multimodal texts that they raise, examples such as the Marmaduke cartoons
(Figures 1.2 and 1.3) make it clear that multimodal texts combine and integrate the
meaning-making resources of various semiotic modalities – language, gesture,
movement, visual images, sound and so on – to produce text-specific meanings.
We will be at pains in this book to reject the view that a particular resource will have
automatic pre-eminence in a particular genre and will suggest that an important
skill in multimodal analysis, one that transcription helps to pinpoint, is to recognise
typical patterns of resource integration – but also the many variations within these
typical patterns (see Figures 1.1, 1.2 and 1.3).
When we assert, as we have done above, that in actual fact, no text is, strictly
speaking, monomodal (Lemke, 1992; Thibault, 1997a: 342) and that the
multimodal principle is pervasive in all texts to a greater or lesser extent, we need
to recall that this view is not necessarily the prevailing view in linguistics at the
present time. Indeed, transcription practices still focus on spoken discourse, privileging
the linguistic dimension of the text’s meaning-making resources, and
consider other resources such as gesture, phonological prosodies, gaze, movement
and so on as paralinguistic, rather than as fully-fledged semiotic resources in their
own right with which language is often codeployed. In such approaches, non-linguistic
resources are often seen as non-verbal accompaniments to language,
annotated as a running commentary in brackets alongside the verbal transcription.
In keeping with the view presented here, it may well turn out to be the case that lan-
guage cannot be adequately described and theorised as a system in its own right.
Rather, language and other semiotic resource systems, such as gesture, body move-
ment and gaze, are likely to be parts of a still larger system which may well turn out
to look very different from any of these components taken separately. Many tran-
scriptions of texts seem to have more in common with literary rather than linguistic
traditions resembling playwrights’ asides of the in-a-soft-voice-glancing-at and beckon-
ing-off-stage type. This type of stage instruction is itself an instruction for a text’s
recontextualisation in other semiotic modalities in a performance text (see Inset 14,
pp. 175-177). The transcription procedures discussed in this book seek to reveal the
multimodal basis of a text’s meaning in a systematic rather than an ad hoc way. They
are truly part of a discourse analysis, rather than a literary analysis, tradition. In this
sense, multimodality refers to the diverse ways in which a number of distinct
semiotic resource systems are both codeployed and co-contextualised in the making of
a text-specific meaning. These systems are not treated as separate communicative
channels which are ancillary to, or which in some way supplement, a primary linguistic
meaning. Instead, the guiding assumption is that the meaning of the text is the result
of the various ways in which elements from different classes of phenomena – words,
actions, objects, visual images, sounds and so on – are related to each other as parts
functioning in some larger whole.
Meaning making is the process, the activity of making and construing such
patterned relations among different classes of such elements. The term multimodal
thus recognises that, from an analytical standpoint, it is important and necessary to
distinguish different classes of meaning-making resources rather than group them
together as members of some more general class which fails to specify their individ-
ual characteristics. Such a class would be too general to be really useful. By the same
token, the term multimodal recognises that different kinds of resources are combined
to produce an overall textual meaning. As the Marmaduke cartoons show, the
meaning of the text is not the result of merely adding the meanings of one resource
– language, say – to those of another, such as the visual image. Meaning is multi-
plicative rather than additive (Bateson, 1987 [1951]: 175; Lemke, 1998). This funda-
mental property emerges clearly when we examine texts in detail through
transcription. In this respect, we may now turn our attention to leaflets designed to give
information on transport services to the public. This might be assumed to be a field
where ideological manipulations are absent. Nothing could be farther from the truth.

1.2. Cluster analysis and the transcription of static multimodal texts

Identifying multimodal clusters (see Inset 5: Clusters and cluster analysis, p. 31) is par-
ticularly useful when describing multimodal texts, not least because it helps
exemplify some of the principles we have so far described with reference to some

Inset 4: Metafunctions

• Halliday (e.g. 1979) posits that the content stratum of language, its lexicogrammar and
semantics, is internally organised in terms of a small number of very general
functional regions that are simultaneously interwoven and configured in the internal
organisation of lexicogrammatical form, corresponding, respectively, to the
experiential, interpersonal, textual and logical dimensions of linguistic meaning.

• Experiential meaning interprets the phenomena of the world as categories of experience.
In language, the clause analyses experience as a configuration of semantic functions, viz.
different classes of process (actions, events, states and so on), the participants taking part
in these and the circumstances that qualify them. Experiential meanings are realised in
the clause as particulate or part-whole structures based on the principle of constituency.
Interpersonal meaning is concerned with language as interaction (cf. speech acts, dialogic
moves), the expression of attitudinal and evaluative orientations (modality) and the
taking-up and negotiating of particular subjective positions in discourse. Typically,
interpersonal meaning is expressed by field-like prosodies and is scopal in character
(McGregor, 1997: 210-213). Textual meaning is concerned with the organisation of lan-
guage into semantically coherent text and the relation of text to its context and with the
distribution of information in text, continuity of reference and lexico-semantic cohesion.
Textual meanings are realised by wave-like periodic movements that culminate in peaks
of prominence. Logical meaning is concerned with relations of causal and temporal inter-
dependency between, say, clauses. Logical meanings are realised by recursive structures
which add one element to another so as to build up more complex chain-like structures.
These four kinds of meanings are illustrated in relation to the example on the facing page.

• In language, experiential relations are based on part-whole or constituency relations; a
given unit (e.g. a participant) has a function in a larger whole such as the experiential
structure of the clause. In the first clause in the example in the Table opposite, the
pronoun you realises the experiential function Sayer in relation to a verbal Process and
a second participant role, the Verbiage, or what is said. Interpersonal relations are
prosodic or scopal; in the example in the Table, the entire clause complex, consist-
ing of two clauses, is a declarative proposition. The choice of declarative mood holds
the entire proposition in the two clauses within its scope and modifies it so as to indi-
cate that the writer of this utterance is presenting it as a proposition which is, for
example, asserted or affirmed. The mood component of the proposition is realised
in just one part of the second clause, yet the meaning declarative extends over the
entire two-clause complex. Dependency relations are based on part-part relations. In the
present example, the first clause hypotactically expands the meaning of the second
clause in the sequence by specifying a condition on the meaning of the second
clause, which is the dominant one in this complex. Textual meaning is based on both
structural and non-structural relations which create linking relationships between the
different parts of the whole. For example, the two mentions of you create a
coreferential tie which establishes the common identity and reference of this
participant. Furthermore, you, in both clauses, is the thematic point of departure for
the further development of each of the two clauses as a message about you.
• It is becoming increasingly evident that other semiotic systems such as depiction,
gesture, sign, movement, music and so on, have metafunctional characteristics. This
does not mean that the very different characteristics and meanings of these systems
are being reduced to forms of analysis more appropriate to language. Rather, it shows
that all semiotic systems have in common some very general kinds of meanings though
the specific meanings and their modes of realisation will differ according to the
particular semiotic system.

• It also shows that there are general metafunctional principles of organisation which
provide a basis for the integration of different modalities in multimodal texts. Rather
than speaking of different semiotic channels or ‘codes’, the metafunctional basis of
semiosis suggests that different resources are integrated on the basis of the meanings
which are created through the synergy of and co-contextualising relations between
modalities. The meaning of multimodal texts is the result of the often complex ways in
which different resources work in partnership. This is more appropriate than saying
that each modality makes its own meaning, separate from the meanings made by other
modalities. In this book we use the multimodal transcription to show, in great detail,
how the metafunctional principle applies in a systematic way across modalities.

Table: metafunctional analysis of the clause complex
If you can say this, [then] you can learn to speak Chinese

Experiential
   Clause 1: you = Sayer; say = Process: Verbal; this = Verbiage
   Clause 2: you = Senser; learn = Process: mental: cognition; to speak Chinese = Phenomenon
Interpersonal
   Whole clause complex: PROPOSITION: DECLARATIVE
   Clause 1: can = Modal operator
   Clause 2: Mood element (you = Subject; can = Finite: modal operator);
             Residue: non-finite part of proposition
Textual
   Clause 1: If = Theme: Textual; you = Theme: topical; Rheme
   Clause 2: you = Theme; Rheme
   Co-reference tie: ‘you’ (Clause 1) to ‘you’ (Clause 2)
Logical (Dependency)
   Clause 1: CONDITION
   Clause 2 ([then]): RESULT/CONSEQUENCE

sample texts and some sample multimodal transcriptions. We have already consid-
ered one type of multimodal cluster, the speech and thought bubbles of cartoons,
and suggested how this type of cluster is customised according to context in keep-
ing with Halliday’s principle that texts are units of meaning in specific contexts (see
Inset 2: Text, p. 3). Bearing this in mind, we can now take a close look at the London
Transport (LT) public service leaflet shown in Figure 1.4 (top and bottom part). This
text was used to guide and assist Londoners as regards fare structures at the turn of
the millennium (see also www.tfl.gov.uk/tfl/).
How can we go about describing it in the light of what we have said so far?
Could we, for example, simply say that it takes the form of a six-page leaflet which,
when folded, presents two identical cover pages (Pages 2 and 3) functioning as the
front and back covers (see top part of Figure 1.4)? Or should we focus on the fact
that the cover page in question announces the text’s basic thematic content: a new
two-fare structure for London buses? When the leaflet is unfolded, the reader dis-
covers the details of the fare structure, which are expanded and contextualised on
the reverse side of the leaflet (Pages 4, 5 and 6 in the bottom part of Figure 1.4).
The final page on the coverside of the leaflet (Page 1 in Figure 1.4, top part) pro-
vides a second thematic expansion with the description of Saver 6, a new but
different kind of fare. In answer to our question, we may take as our starting point
the observation that this text is organised in terms of a series of multimodal spher-
ical or hemispherical clusters containing some striking combinations of visual,
verbal and spatial resources to explain and justify a new simplified fare structure
for London buses in the new millennium. The fried egg, the mushroom, the
wastepaper basket, the weight, the cup of tea are thus more than eye-catching add-
ons and cannot be eliminated without substantially changing the text’s meaning.
To see this, put your hand over the various textual objects on the cover page
and you will notice that, without the tight integration between the visual, the verbal
and the spatial, in particular the lines linking abstract numbers to concrete objects,
it would not be possible to grasp either the principle of the division of London
into two new tariff zones or accept the ‘social message’ that such a division is con-
sistent with life and travel in London. Still unconvinced that the reading process is
guided by the meaning-multiplying effects of the resource integration principle?
Then try rewriting the leaflet as a piece of written discourse in a way that matches
the concision of the fried eggs, the mushroom, the wastepaper basket as visual
metaphors for London as a physical, economic and social entity. In saying this we
are looking at the resource integration principle from a slightly different perspective,
the perspective (further discussed in Chapter 2) of meaning compression. By
meaning compression (see Inset 3 part b, p. 19) we mean the power of multimodal
texts to allow users to identify meanings from combinations of resources in
context with the utmost efficiency, and, in particular, with much greater efficiency
than would have been the case if a different set of resources had been used.

Figure 1.4: A London Transport leaflet (top part: cover side; bottom part: reverse side)

We could go on analysing the LT text’s meanings and meaning-making
processes indefinitely, pointing out the interplay between the various meaning-
making and meaning-compressing resources it deploys: the LT logo, the ellipted
visual objects, the spatial arrangement that creates harmonious links between dis-
cordant objects. However, while meaning compression is a fascinating aspect of
multimodality, we need to delay its discussion to subsequent chapters. Further
discussion at this stage would entail losing sight of a basic question, which is quite
simply this: how can we analyse the multimodal texts and genres that characterise
contemporary society in a systematic way that brings to light the characteristics and
underlying organisational principles of multimodal texts?
One question implicit in the cluster analysis approach that we have outlined
above is where, in terms of the reading process, does the text we are considering –
and more generally many other texts belonging to this and other genres – start and
end? With the title at the top of the page and with the colourful telephone number
on the last page? There are in fact two reading paths that we need to consider: a ver-
tical one and a horizontal one. So that if we replied to the questions posed by sug-
gesting a typical linear left-right, top-down reading, we would be overlooking the
fact that meaning making in many multimodal genres will often follow a more com-
plex reading path than the one we are used to when we read a novel. We can use the
term cluster hopping (see Figure 1.5a) to describe the fact that the reading process is
discontinuous, defined in terms of relationships of often overlapping clusters which
require the reader to ‘hop’ backwards and forwards. Rather than following a definite
linear sequence, the reader can jump to different clusters of items on the page in a
fixed sequence. One function of transcribing the organisation of multimodal texts
in terms of clusters is to understand that the structure of the reading process needs
to be defined in terms of principles other than linearity (left-right; top-down) and
which include such principles as periodicity, namely structures that repeat themselves
in a patterned way and that allow variation within a fixed framework.
This is manifested in the LT text in terms of circular structures, with the
result that, on the cover pages (Pages 1 and 2), the reading process is intended to
follow both a vertical and a horizontal path. All this implies that we have to find a
way whereby multimodal text analysis can adequately describe the meaning-making
processes of a multimodal text which includes the reading process, i.e. the readers’
interaction with a text. One answer to this issue lies in the development of cluster-
oriented multimodal transcriptions, an example of which is given in Figures 1.5a and
1.5b. In this interpretation, the transcription attempts to describe the multimodal
clusters that make up the LT text. Figure 1.5b gives a simple multimodal
transcription of the cover page of the LT leaflet. Apart from suggesting just how
complex the cover page is – it consists of six primary objects, or clusters, and an
overall total of some 40 components – it helps us understand some of the analytical
functions that transcriptions are designed to fulfil. A major function of this
micro-transcription is thus to treat the page as a composite of six primary objects or clusters
(Figure 1.5b) and to look at their composition, including a characterisation of the
relationships existing within these objects. The cluster-oriented macro-transcription in
Figure 1.5a, on the other hand, records the relationships between the primary objects.
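The division of labour between the two transcriptions can be pictured as two views of one data structure. The following Python sketch is offered only as an illustration and is not part of the transcription method itself. The class names Cluster and MacroTranscription, their field names and the representation of a meaning chain as a list of cluster numbers are all assumptions introduced here. The cluster names, cluster types and wordings are taken from Figures 1.5a and 1.5b, in abbreviated form.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Cluster:
    """Micro level: one primary object and the resources that compose it."""
    number: int
    name: str
    cluster_type: str  # e.g. "title", "theme-expansion", "slogo"
    resources: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class MacroTranscription:
    """Macro level: the relationships between primary objects."""
    clusters: Dict[int, Cluster]
    meaning_chain: List[int]  # cluster numbers in the order they are linked

    def links(self) -> List[Tuple[str, str]]:
        """Return consecutive pairs of cluster names along the meaning chain."""
        names = [self.clusters[n].name for n in self.meaning_chain]
        return list(zip(names, names[1:]))


# Hypothetical, abbreviated encoding of the six cover-page clusters.
cover_clusters = [
    Cluster(1, "The title", "title",
            {"Wordings": ["Thousands of places.", "Only 2 bus fares"]}),
    Cluster(2, "The cup", "theme-expansion",
            {"Overlay": ["1,589 cafés", "Only 2 bus fares"]}),
    Cluster(3, "The weight", "theme-expansion",
            {"Overlay": ["650 gyms", "Only 2 bus fares"]}),
    Cluster(4, "The basket", "theme-expansion",
            {"Overlay": ["251,176 offices", "Only 2 bus fares"]}),
    Cluster(5, "The fried egg", "theme-expansion"),
    Cluster(6, "The slogo", "slogo",
            {"Wordings": ["Making London simple"]}),
]

# Assumption: the first meaning chain runs through Clusters 1 to 6 in order.
macro = MacroTranscription(
    clusters={c.number: c for c in cover_clusters},
    meaning_chain=[1, 2, 3, 4, 5, 6],
)

for source, target in macro.links():
    print(f"{source} -> {target}")
```

In this sketch the resources field of each Cluster corresponds to the micro view, while the links() method of MacroTranscription corresponds to the macro view. Keeping the two apart reflects the point made above: the internal composition of a primary object and its relationships with other primary objects are recorded separately.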
A different approach could, in theory, have been adopted in which the notion
of cluster is eliminated. This, however, would go against a basic principle explored in
this chapter, namely that multimodal texts are typically made up of partly
prefabricated meaning-making units or ‘primary genres’ (see Inset 6: Bakhtin’s
distinction between primary genres and secondary genres, p. 43). Of these, the first
cluster, recognisable as a title, and the last, recognisable as a combination of a slogan
and a logo (a textual subunit sometimes called a slogo), are all but obligatory in a
publicity leaflet, as they respectively announce the text’s basic theme and identify a
particular company, association or institutional body, the equivalent in a novel of the
author’s name and other aspects of its identity, such as the title and name of the pub-
lisher. The remaining objects have no traditional name and are less obviously
prefabricated vis-à-vis the others, in the sense that it would be difficult to cite other
texts or types of text in which these specific combinations appear (but see Figure
1.5a). Like the title and the slogo, they are immediately recognisable as functional
units within the text, i.e. units which, though analysable in terms of subcomponents
with many potential meanings, nevertheless share the characteristic that they are a
basic unit of meaning in a specific text.
This does not mean, of course, that the components of each primary object
cannot be the source of a specific meaning in another text, i.e. function as a
primary object in their own right. Thus, in a different context, one of the lines that
links the fare to one of the objects representing London might, for example, rep-
resent a marker of a division between various sections in a chapter or in a web
page. Nor does it mean that the components are ‘meaningless’ but rather that in
this text their meaning is not primary. Instead a component is subordinated in such
a way as to function as part of a cluster of resources. It is the cluster and the rela-
tionships between clusters, rather than the individual parts of individual clusters,
that make meaning in a specific context.
The transcription given in Figure 1.5b also helps make explicit other choices
that have been made in the construction of the LT text and in the way it makes its
meaning. Most obviously, repetition: four similarly-shaped round objects have been
selected which systematically change their size as we move through the text: they
get bigger and bigger as we go down the cover page, all of which gives the reader
a visual clue as to the fact that there is a vertical reading path to follow. In other
words, when defining the relationships between the various objects, the
transcription helps cement the idea that the reading process on the cover page is
intended to follow both a vertical and a horizontal path. The lines linking the fare
structure to the concentric circles follow a zig-zag path through the text, moving

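[Diagram: three panels headed Secondary thematic (left), Primary thematic (centre) and Primary thematic (repeated) (right), with numbered boxes marking Clusters 1 to 10 (Cluster 8 divided into 8a and 8b) and dotted lines indicating the links between them]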

This macroanalysis of one part of the LT leaflet reconstructs the links between clusters and the
cluster hopping that the reading/viewing process involves when attempting to ‘decipher’ the text’s
meaning chains. The transcription thus uses numbered boxes to identify the various clusters as well
as dotted lines to indicate the links between them. In the central panel, corresponding to the cover
page, Clusters 1 and 6 are respectively the start and end of the first meaning chain. In terms of
cluster type, they are respectively a title and a slogo whereas Clusters 2 to 5 are theme-expansion
clusters, i.e. clusters which serve to develop the two-fare theme. They are also thematically
interrelated metaphors for London. Clusters 2 and 4 are, in part, repeated on the reverse side (see Figure
1.4) where their indexical relationship (they stand in for the map of London) is made explicit.
Cluster 7 in the right-hand panel is a derived cluster in the sense that, by following the meaning
chain, the reader/viewer comes to understand that the collective function of Clusters 1-6 is to spell
out the details of the two-fare structure. This derived cluster occupies a central ‘empty’ space in the
panel, a position that helps cement the fare details in the reader/viewer’s mind. Cluster 8 in the left-
hand panel introduces a second meaning chain consisting of three clusters. It is, however, discon-
tinuous, divided into two by Cluster 9, a reworking and partial repetition of Cluster 5. The function
of Cluster 9 is to link up the two meaning chains, thereby underscoring the relevance of the primary
thematic to the secondary one. Cluster 10 provides information about the production of the leaflet.
When unfolded, the three panels (or pagelets) form a macropage read both vertically (in which case
the regularity and details of the fare structure are foregrounded) and horizontally (in which case the
different types of social venues reached by buses come more into focus). The same processes are
at work on the ‘reverse’ macropage (see Figure 1.4). The text differs from the Chesapeake Bay text
(Figure 1.6) where the clusters link up to create a more traditional, top-down reading path.

Figure 1.5a: A macro-transcription of part of the LT leaflet



Cluster: Textual resources used in the clusters

1. The title
   Wordings: (1) Thousands of places. (2) Only 2 bus fares
   Font: larger than any other font in the entire text
   Spatial disposition: central
   Punctuation: full stops after each of the wordings

2. The cup
   Wordings: Callouts: (1) 70p, (2) £1
             Overlay: (1) 1,589 cafés (2) Only 2 bus fares (as above)
   Visual Image: cup of tea consisting of 3 concentric rings:
      (1) External ring: white rim of cup
      (2) Central ring: brown ring of tea containing Overlays 1 and 2
      (3) Inner ring: white/brown froth
   Viewing position: from above
   Ellipsis: about 40% of the rightmost part of the image
   Vectors: two parallel lines linking Callout 1 to central ring & Callout 2 to inner ring
   Spatial disposition: (1) rightward; (2) callout: in the centre, overlay: on the right
   Cluster size: smaller than the subsequent one

3. The weight
   Wordings: Callouts: as above
             Overlay: (1) 650 gyms (2) as above
             Markings: (1) 15.9 Kg (2) Lb
   Visual Image: weight consisting, as above, of 3 concentric rings:
      (1) External ring: black raised rim marked off by white line
      (2) Central ring: black area containing as above Overlay 1 and 2
      (3) Inner ring: a hole
   Viewing position: as above
   Ellipsis: as above but with the leftmost part deleted
   Vectors and their mutual disposition: as above
   Spatial disposition: (1) leftward; (2) callout: as above, overlay on left
   Cluster size: larger than the preceding object

4. The basket
   Wordings: Callouts: as above
             Overlay: (1) 251,176 offices (2) as above
   Visual Image: wastepaper basket consisting as above of 3 concentric rings:
      (1) External ring: partly as above but with colour differences: silver raised rim
          marked off by black border
      (2) Central ring: wire mesh containing as above Overlays 1 and 2
      (3) Inner ring: screwed-up paper
   Viewing position: as above
   Ellipsis: (1) as above but with the rightmost part deleted
             (2) slightly ellipted at the top due to overlapping (Cluster 2)
   Vectors and their mutual disposition: as above
   Spatial disposition: (1) rightward; (2) callout as above, overlay on right
   Cluster size: as above

5. The fried egg
   Wordings: Callouts: (1) 1.819 cafés (2) as above
   Visual Image: fried egg consisting of 3 concentric rings:
      (1) External ring: partly as above but with colour differences: golden brown rim
      (2) Central ring: albumen containing as above Overlays 1 and 2
      (3) Inner ring: egg yolk
   Viewing position: as above
   Ellipsis: (1) as above but leftmost part is deleted
             (2) as above slightly ellipted at the top (Cluster 3)
   Vectors and their mutual disposition: as above
   Spatial disposition: (1) leftward; (2) callout: as above, overlay on left
   Cluster size: as above

6. The slogo
   Wordings: Making London simple
   Visual Image: The LT logo
   Spatial disposition: rightward orientation of textual objects

Figure 1.5b: A micro-transcription of the cover page of the LT leaflet



first to the right of the first cluster, then to the left of the second cluster, and so
on. When we see this, we begin to realise that the cover page, however asymmetri-
cally, resembles a table in which the columns and rows are functionally, though not
formally, present. What is special about this table is that the most important column,
the central one, contains wording that seems to extend from the leftmost and right-
most parts of the page. In fact, the linguistic elements are organised in such a way
that the central part of the text provides the details of the actual fares while the top-
most, leftmost and rightmost parts provide the principle of a two-fare structure.
Thus the part of the LT text transcribed in Figure 1.5b, and to a large extent
the entire LT text, may be construed as a pseudo-table that can be read in a variety
of orders that combine the vertical and horizontal readings that a table makes avail-
able. We will discuss tables and reading paths in much greater detail in Chapter 2 in
relation to economics and science texts and in Chapter 3 in relation to the web
page. For the moment we will simply observe that readers can go down the cen-
tral column of the LT text or can read it from left to right along the rows.
Alternatively, the text can be read in a stepping-stone fashion, jumping in a zig-zag
way following a pathway that respects the default way of reading in Western cul-
ture, namely from left to right and from top to bottom, but requiring the reader to
do so in a series of jumps. It is no coincidence that the layout of the page, which
includes the fact that four round objects are grouped together in pairs (two on the
left, two on the right), is carefully arranged to encourage the reader to be aware that
the text will not make its meaning entirely through language but that other reading
skills need to be at work. Not all leaflets are inspired by the principle of tabular or
pseudo-tabular reading. Nevertheless, it is surprising just how many are.
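The pseudo-tabular reading described above can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions, not a model of how readers actually process the page. The three-column grid, the constant GRID and the function names read_column, read_rows and overlay_sides are hypothetical. The wordings are those recorded for Clusters 2 to 5 in Figure 1.5b, reproduced as printed there, with the fare callouts placed in the central column and each overlay placed on the side indicated by its spatial disposition.

```python
from typing import List, Optional

Grid = List[List[Optional[str]]]

# Hypothetical encoding: one row per cluster (cup, weight, basket, fried egg),
# columns = left, centre, right. None marks a cell with no wording.
GRID: Grid = [
    [None, "70p / £1", "1,589 cafés"],
    ["650 gyms", "70p / £1", None],
    [None, "70p / £1", "251,176 offices"],
    ["1.819 cafés", "70p / £1", None],
]


def read_column(grid: Grid, col: int) -> List[str]:
    """Vertical reading: go down one column, skipping empty cells."""
    return [row[col] for row in grid if row[col] is not None]


def read_rows(grid: Grid) -> List[str]:
    """Horizontal reading: left to right along each row, top to bottom.

    Because empty cells are skipped, the result is the stepping-stone
    order in which the reader jumps from one filled cell to the next.
    """
    return [cell for row in grid for cell in row if cell is not None]


def overlay_sides(grid: Grid) -> List[str]:
    """Report, row by row, on which side of the centre the overlay sits."""
    return ["left" if row[0] is not None else "right" for row in grid]


print(read_column(GRID, 1))   # the fare details in the central column
print(read_rows(GRID))        # the stepping-stone reading
print(overlay_sides(GRID))    # the alternation that produces the zig-zag
```

Reading down the central column foregrounds the regularity of the fare structure, whereas the row-by-row reading brings the alternating left and right overlays into view. The alternation returned by overlay_sides is one way of stating the zig-zag arrangement explicitly.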
One descriptive principle followed in this chapter is that there will be many
occasions where, because of the limitations of the page structure, the meaning-
making processes of a text cannot be captured in a single transcription and will
instead require a series of transcriptions to be made that give different ‘zooms’ of
the text. Thus, while Figure 1.5a functions as a macro-transcription of a single page,
Figure 1.5b is a micro-transcription, in that it reconstructs the micro-structure of the
same page in all its manifold detail. All this reflects the fact that the LT text, like all
the texts we analyse in this book, is a multimodal text which integrates many
meaning-making resources. It is not a linguistic text with pretty visual ‘add-ons’ but
one in which visual, spatial and linguistic elements are carefully and tightly integrated.

1.2.1. Multimodal transcription and questions of genre


In this book, multimodal transcriptions are ultimately based on the assumption that
a transcription will help us understand the relationship between a specific instance
of a genre, i.e. a text, and the genre’s typical features. Multimodal transcription
techniques can be used to compare different texts from the same genre with a view
to highlighting their different functions within the genre. In this respect, the

Inset 5: Clusters and cluster analysis

• Our use of the term cluster refers to a local grouping of items, in particular, on a
printed or web page (but also other texts such as manuscripts, paintings and films).
The items in a particular cluster may be visual, verbal and so on and are spatially prox-
imate thereby defining a specific region or subregion of the page as a whole. The
items in a cluster are functionally related both to each other and to the whole to which
they belong as parts. For example, in the Nasa Kids website in Chapter 3, the Nasa
logo and the masthead (see Figure 3.2) are two functionally related components in this
sense. The logo specifies the institutional source of the meanings of the website and
the masthead extends these meanings by connecting the institutional logo to the more
specific, children-based concerns of the Nasa Kids home page. Another example
from the same page is Cluster 17 which consists of two functionally related parts of
a larger whole, i.e. the two components of the activity sequence which is realised
when the user inserts a search item in the search engine and then clicks on ‘go’. Once
again, Cluster 17 is characterised by the spatial proximity of the two items in keeping
with the definition of cluster given above (which may differ from those given by others
e.g. Kok, 2004: 135-136). Clusters are often partly prefabricated structures and as such
frequently enact primary genres (see Inset 6: Bakhtin’s distinction between primary
genres and secondary genres, p. 43). This is the case with the multimodal search engine
cluster/primary genre implementing a Question ^ Response sequence through visual and
kinetic resources as well as linguistic ones.

• Cluster analysis helps us to see how larger-scale items and the relationships in the
visual field contain smaller-scale ones just as smaller-scale ones such as clusters are
contained within larger ones. A cluster is a locus of inclusion for a small-scale
functional arrangement of items included in some larger-scale arrangement (including
superclusters; see Chapter 3, e.g. Figure 3.1). Thus, when we use the term cluster to
define a local grouping of multimodal items which are part of a larger unit in which
they function, our use of the term presupposes that clusters are in some way func-
tionally related to each other. As mentioned above, clusters of items and objects on,
for example, a web page are small-scale arrangements of items which are nested
within larger wholes. Some clusters or some items in some clusters may move while
others may remain inanimate. Other clusters may create intertextual relationships (see
Inset 8: Intertextuality, p. 55) between texts as happens with the virtual magnifying
glass (Figure 1.10) which, by sliding over and magnifying parts of Leonardo’s Notebook,
contributes to the goal of linking various parts of a rare manuscript to an illustrative
web-based commentary. The notion of cluster thus aims to show that the items and
objects displayed on screen, the web page or on the printed page are not separate
items but are instead connected to other items. They are nested within larger struc-
tures and have relationships with some items with which they are proximally
connected more than others. The visual field that is displayed by the screen or the page
as a whole can be subdivided into various parts such as top, bottom, right and left. The
visual field displayed on the screen or the page does not consist of discrete, separate
items but is instead completely filled. Cluster analysis is one tool for understanding the
ways in which it is filled.

Chesapeake Bay leaflet (dating from 1989) shown in Figure 1.6a bears a number of
striking similarities to the LT text: in just the same way that the latter explains how
the London traveller is spared the hassle of a complex fare system, so travellers on
the East Coast of the United States are shown how they can be spared the hassle
of a long inland detour between New York and Virginia Beach (see www.cbbt.com).
Although slightly larger and consisting of 12 as opposed to 6 pagelets, it is also
folded to provide identical cover pages, or rather almost identical cover pages,
since, as the panel in Figure 1.6b shows, the photograph in the central cluster has
changed: the front cover depicts the bridge aspect of the bridge-tunnel complex
while the reverse cover depicts the tunnel. The question arises as to whether we
can use the multimodal transcription to characterise the differences and similarities
between specific instances of this subgenre, i.e. whether we can compare two
or more texts in an approach tendentially concerned with recurrent features in
textual genres. The macro approach to multimodal transcription certainly helps bring
out subsidiary thematics; in the case of the leaflets examined here, it brings out the
social aspects associated with the transport service offered. Thus, while there are
similarities in the thematics of the texts, there are, on the other hand, striking differences
in the way that these thematics are presented, most obviously the extensive
use of concrete, contextualised and illustrative maps and photographs in the
Chesapeake Bay text, while the map and the photographs are much more abstract and
decontextualised in the LT text.

Figure 1.6a: Parts of a leaflet for the Chesapeake Bay tunnel-bridge complex

Reverse cover page: clusters
Cluster 1. The title
Cluster 2. The index
Cluster 3a. The visual: tunnel terminal
Cluster 3b. The variant: bridge (see Fig. 1.6a)
Cluster 4. The slogan + logo cluster
Cluster 5. The definition

Other clusters on the reverse side
Cluster 6. Tourist advertisement
Cluster 7. Travel and tourist advertisement
Cluster 8. Facilities description
Cluster 9. Special regulations

Figure 1.6b: Clusters making up the reverse cover of the leaflet

In the LT text, the map is the final step in the progression of circular objects
(e.g. the mushroom) which continue on the reverse side of the leaflet. In this case,
the map has a primarily textual function in that it helps clarify that the dual-circle
objects (corresponding to the two rings of inner and outer London) contribute to
the text’s overall symmetry, whereas in the Chesapeake Bay text the function of the
map (not shown) is primarily interpersonal in nature: it is designed to orient the
reader on his or her journey and to persuade him or her that the route is a time-
saver. The multimodal transcriptions of the two leaflets thus show how the
organisation of different multimodal clusters and the relations between them pri-
oritise different goals. Both leaflets are designed to inform travellers but differ in
their mix of essential public service and personal entertainment.
In discussing the different goals of these two texts we have hinted at the fact
that meaning making in multimodal texts can be characterised in terms of different
overall functions (O’Toole, 1994; Kress, Van Leeuwen, 1996; Baldry, 2000;
Thibault, 2000a; Baldry, Thibault, 2001). In 1.4, pp. 38-44, we will see how texts of
all kinds can be systematically described in terms of the metafunctionally-organised
meanings that they make (see Inset 4: Metafunctions, pp. 22-23). In other
words, we explore the interplay between the metafunctions more rigorously than we
have so far, given the need in subsequent chapters to use transcription techniques that
link resources and metafunctions in a systematic and
highly detailed way. However, before we do this it is time to undertake another, more detailed
analysis of cartoons. Specifically, we will take a closer look at the conventions cartoons
use and the way their textual properties can contribute to interpersonal meaning.

1.3. Textual properties of short printed cartoons

Cartoons are a well-known example of multimodal narratives, a good demonstration,
as we have already seen, of how meaning making depends on the integration
of semiotic resources. The use of caricature, the cartoon’s main source of comic
effect, requires the cartoonist to strip down detail to its bare essentials and exag-
gerate or deform them in some way. Thus cartoonists, particularly those concerned
with the printed cartoon, have a much more limited set of resources available than,
say, a 15th century artist from Central Italy or a 16th century Dutch portrait painter,
where the very nature of the genre means that the artist can faithfully represent the
tiniest of meaningful details. This constraint means that special sets of conventions
have arisen which experienced readers of the cartoon genre have learnt to recog-
nise as having special meanings. Some of these are apparent in the Lupo Alberto
cartoon (Figure 1.7). The object above the henhouse in the first vignette, for
example, is, following Peirce (1985), both iconic in that it resembles the crescent
shape of the moon and indexical in that it implies a cause-effect relationship,
namely that if the moon is present, then the action will probably be taking place at
night. Indeed, it may be said that a general trend in cartoons is towards iconicity and
indexicality, in that one or more resources systematically ‘stand in’ for others: in
printed cartoons, ambient sounds are typically expressed through a combination of
language and visual and graphic resources, the latter instantiated not just in the shape
of specialised lettering (zigzagged, curved and inclined as opposed to straight or hor-
izontal) but above all, as the example of ‘clang’ in the third vignette shows, in terms
of amusing chain-like crescendos (Thibault, 2002).
Vectors constitute one important subset of these specialised conventions. A
vector is essentially a line which has properties such as dynamic force, directionality
and orientation, like those emerging from Lupo Alberto’s body. Some of these are
straight lines, others a series of curves; but, by transforming space and movement-in-
space into visual objects, both suggest the position that his body had occupied a
split second before; others, instead, take the form of drops of sweat suggesting not
only the direction and speed of his movements but also the emotional stress
involved. Other lines are not physically present but are instead implied. Thus a gaze
vector such as the one in the second vignette, which links the hen to Lupo Alberto,
is an invisible line; in this case the hen, who wants to go dancing, is looking at Lupo
Alberto checking to see that this is also his intention; her pupils are positioned in
the bottom part of her eyes and are dilated, suggesting the intensity of her desire
(she loves him intensely). Whether these feelings are reciprocated or not is beside
the point (the reader might well think that his real purpose, as a wolf, is to gobble
her up, though fans of this cartoon know that Lupo Alberto has in fact lost his
carnivorous instincts). Instead, the important point to note is that the combination
of the hen’s discourse, the position and dilation of the pupils of her eyes and the
intensity and direction of her gaze are indicative not just of a physical relationship
but are in fact a means through which attitudinal stance is expressed. Similarly, the
reader who does not know about Lupo Alberto’s peculiar status as a wolf might
understand that the hen’s feelings for Lupo Alberto are not reciprocated, by virtue
of the fact that his gaze is firmly fixed on his escape route and not on her. The reader
who does know all about Lupo Alberto, as a result of his or her experience of other
Lupo Alberto cartoons (see Inset 8: Intertextuality, p. 55; see also www.lupoalberto.it),
interprets the clenched teeth, the sweat pouring off his back and the forward-looking
gaze as resources that indicate that he is scared stiff that he is about to be shot.

Interpersonal aspects of the cartoon: modalisation is not just a linguistic feature since attitudinal stance can be built onto gaze. The
transcription given below of the three salient central clusters involving either the wolf and the hen or the wolf alone accounts for this.
© Silver/McK
Sneakiness: wolf’s movement is modalised as stealthy through gaze (eyes in corner)
Discrepancy: non-amorous stance of the wolf (gaze forward); amorous stance of hen (pupils dilated, gaze vector on wolf)
Orientation of Wolf: off screen: indeterminate
Orientation of Hen: Depicted World: Engaged: Object: Inside personal space

Figure 1.7: Lupo Alberto and attitudinal stance

By saying that the various parts of the wolf’s and the hen’s bodies exist in an
attitudinal, as well as a physical, relationship with each other, we are effectively say-
ing that the various visual elements in the text are modalised to indicate attitudinal
and evaluative stances (Kress, Van Leeuwen, 1996). Visual elements can be
modalised just as much as linguistic elements. While in the grammar of the English
language it is often pointed out that attitudes are expressed typically, though by no
means exclusively, through modal verbs (I do love you; I can’t go on running like
this), in visual grammar the judgements made by the protagonists are modalised,
for example, through the caricatural manipulation of body parts: Lupo Alberto’s
angled eyes in Vignette 1 change from a confident and even mischievous I’m-going-
to-sneak-up-without-anybody-seeing-me attitude to his whoops-I’m-in-the-shit-now
attitude in Vignette 3.
The visual features indicating movement produce different attitudinal
stances in cartoons – sneaky, cautious, confident (think of the way Asterix is por-
trayed) – while the modulation of the visual image produces a different feel –
sensuous in the case of a perfume advertisement, naturalistic in the case of the
maps and photographs in the Chesapeake Bay document, abstract in the case of the
map in the London Transport leaflet, hyperreal in the case of films which portray
dream sequences (see Inset 15: Gibson’s optic array, p. 192).
A study of modality thus needs to take into consideration how semiotic
resources other than language, such as movement, gaze and depiction, contribute
to the expression of attitudinal and evaluative meanings by increasing the range of
possibilities. Indeed, for the analyst, much of the pleasure in analysing cartoons lies
in understanding how the cartoonist handles the process of maximising meaning
through caricature while keeping detail to a minimum (see 4.8.2, p. 206).
A further and slightly more abstract appeal derives from how the cartoonist
is able to provide a coherent depiction, on the one hand, of the flow of time and
continuity of events (see 1.1.1, pp. 7-15) and, on the other, movements within a
place or from one place to another and the cause-and-effect relationships that this
entails. In this respect, precisely because they are narrative structures, an important
property exhibited by printed cartoons is their dynamic nature. As we have seen
from the Marmaduke cartoons, they typically display an imaginative use of visual
devices to represent the constantly changing states of:

• the protagonists’ body positions (e.g. running, falling, bending);
• their changing facial expressions (e.g. surprise, pain, glee);
• their interaction with others (e.g. speech bubbles);
• their inner thoughts (e.g. thought bubbles).
Whatever media they use – printed page, TV, cinema or computer –
cartoons are also dynamic in another sense: each vignette represents a specific and
significant stage in a story that unfolds over time (cf. the Marmaduke cartoons,
pp. 8-9). A consequence of this is that dynamic features are typically foregrounded
against a static background. In the Lupo Alberto cartoon, for example, the wolf’s
creeping up on the henhouse (Vignette 1), his hasty escape with the hen (Vignette
2) and his determination to persist in his escape (Vignette 3), even though shackled
by the traps that have been set, are all salient. In this respect, Figure 1.7 presents
a heavily simplified multimodal transcription that is still organised in terms of the
analysis of clusters – only this time we may distinguish between those which are
salient (or foregrounded) and those which are backgrounded. It is all part of the way
in which the creators of a multimodal text, such as a cartoon, attempt to orient read-
ers in terms of an assessment of the situation they are depicting.
The use of caricature in cartoons is a good illustration of the ways in which
visual and other semiotic forms can be shaped and deformed in particular ways in
order to achieve particular interactional ends, such as inviting the viewer to adopt a
particular attitudinal or evaluative stance with respect to the participants and events
that are depicted in the cartoon world. The different modalisations that we dis-
cussed above in connection with Lupo Alberto are scopal relations which operate
over a particular domain of the image and shape it in ways that are functional to the
achievement of particular interactional ends and effects. A scopal relation is a whole-
whole relation. That is, a given whole holds some other whole in its scope and oper-
ates on it and extends its influence over the whole domain that is so operated on
(McGregor, 1997: 66-67). Scopal relations spread over an entire domain and there-
fore shape and influence it accordingly. For example, the ‘stealthiness’ of Lupo
Alberto in Vignette 1 is achieved by means of the overall way in which his body and
the simulated movements of it are modified in this scene so as to create the effect
we are glossing here as ‘stealthiness’. This includes features on various scalar levels
which are nested within each other in the drawing of Lupo Alberto. They range
from small-scale features such as the corners of his eyes (see Figure 1.7), to larger-
scale features such as the lowering of his head, the way the head is thrust forward,
the movement invariants connecting his feet to the ground suggesting quick, light-
footed steps to avoid being heard as he approaches the henhouse and so on.

1.4. Printed advertisements and their exemplification of the metafunctions

1.4.1. Metafunctions in relation to genre


We have discussed the metafunctions above in relation to the Marmaduke cartoons
(see 1.1.1, pp. 7-15). We now need to take a further step and see how metafunctions
relate to the notion of multimodal genres. We may illustrate the choices available vis-
à-vis genre in relation to the Boo Bear advertisement (Figures 1.8 and 1.9), a printed
community service advertisement that encourages readers to give money to protect
an endangered species. Two different transcriptions, set out on the following pages,
are used to describe this text. While the first is concerned with analysis of the text in
terms of the metafunctions, the second provides a more detailed cluster analysis.
Taken together, they provide an analysis of the text in terms of two different per-
spectives: instance and type. These different aspects of text analysis are the object
of subsequent discussion (see Inset 6: Bakhtin’s distinction between primary genres and
secondary genres, p. 43; see also Inset 13: System and instance, pp. 172-173). They are
also important considerations here, the focus being on typical uses of resources
and how these typical uses are made possible (see Inset 3, pp. 18-19). We begin our
analysis by looking at how the metafunctions are typically enacted in visual genres.
With regard to experiential meaning, the structuring of the internal relations
between the depicted participants, things, the actions they perform and the settings
or circumstances in which they occur can, for example, be defined in terms of the
different processes (Active and Passive, Non-Transactive, Transactive, Reactive) created
by vectors (Kress, Van Leeuwen, 1996: 57). In the Reactive case, for example, the
visual process is a reaction rather than an action proper: the vector is formed by
the eyeline of the participant, by the direction of the gaze (assuming it is not
directed at the viewer). Such images can show either the person (or in this case a
humanised animal) who is looking or the person reacting to something, and possibly
also the phenomenon which causes the reaction.
Interpersonal meanings are concerned with the social relations between interac-
tants as well as the evaluative orientations that participants adopt towards each other
and to the represented world of the text. Some of the main resources that function
to orient interactants in this way are distance, gaze and perspective. In the case of
distance, visual images may simulate closeness or distance between the viewer and the
participants in the text in varying degrees. These run from very close shot, i.e. less than head
and shoulders, to very long shot, i.e. the human body occupies a small part of the
picture, via a series of gradations: close shot, i.e. head and shoulders; medium close
shot, i.e. body cut off at the waist; medium long shot, i.e. full-length figure; long shot,
i.e. body occupies approximately half the height of the image. Close shots express
intimacy and personality, as is made clear in the Boo Bear example: they allow the
viewer to relate to the person in the text as an individual whereas distance deperson-
alises and objectifies (Kress, Van Leeuwen, 1996: 130-135).
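The gradations of distance listed above form an ordered scale, which can be sketched as an ordered enumeration. This is an illustrative sketch only: the class name `Shot`, the numeric values and the helper `closer` are assumptions made here for the sake of the example, while the six categories and their glosses are those given in the text.

```python
from enum import IntEnum


class Shot(IntEnum):
    """Distance scale as listed in the text, ordered from closest to farthest.

    The numeric values are arbitrary ranks (an assumption of this sketch);
    only their order matters.
    """
    VERY_CLOSE = 1    # less than head and shoulders
    CLOSE = 2         # head and shoulders
    MEDIUM_CLOSE = 3  # body cut off at the waist
    MEDIUM_LONG = 4   # full-length figure
    LONG = 5          # body occupies about half the height of the image
    VERY_LONG = 6     # body occupies a small part of the picture


def closer(a: Shot, b: Shot) -> Shot:
    """Return whichever of two shots simulates the smaller distance."""
    return min(a, b)


# The Boo Bear photograph is described in Figure 1.8 as medium-close.
boo_bear = Shot.MEDIUM_CLOSE
```

Because the scale is ordered, comparisons such as `boo_bear < Shot.LONG` can be used to state that one image simulates greater closeness than another, without implying any exact measurement.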
Finally, the most important textual/compositional resources when expressing
the textual metafunction would appear to be: (a) horizontal structure when pre-
senting visual information as Given or New and (b) vertical structure when pre-
senting visual information as Ideal or Real (Kress, Van Leeuwen, 1996: 186-202).
In the latter case, the contrast is between elements placed at the top of the picture
and elements placed at the bottom. Elements placed towards the top are presented
as Ideal; those placed near the bottom are presented as the Real. The Ideal refers
The main body of the written text, which is a little hard to decipher, reads as follows:

This is Boo, a North American Bear cub. Boo is an orphan. He escaped with
just a punctured lung when hunters cruelly shot his mother in Ontario.
Many bear species are currently facing extinction, their body parts being sold
at enormously high prices in the Far East for use in traditional Chinese
delicacies. But it is the North American bears that the poachers are really
gunning for. Help us by adopting Boo for just £25. It’s a small price to pay
for a life. You can also adopt Tigers, Chimps, Water Voles and Dormice.

Experiential metafunction: Participant-process relation: a single Participant, the bear, and a
vector extending from the eyes of the bear, out of the frame towards the viewer. The Vector
realises the Visual Process – the act of looking at the viewer.

Interpersonal metafunction:
Gaze: there is direct gaze establishing eye contact between the bear and the viewer of the text,
realising a visual demand.
Distance: there is a medium-close distance establishing the possibility of an intimate and personal
relationship with the bear (a bear hug).
Perspective: the horizontal angle is frontal rather than oblique, thereby locating the bear within
the social world of the viewer – whereas oblique angles detach the viewer from the represented
world; the vertical angle is such that we are put on the same level as the bear, neither above, nor
below, thereby establishing a relationship of equality and solidarity – looking down would make
us more powerful than the bear, looking up less powerful.

Textual metafunction:
There is left-right organisation – the bear is on the left, presented as the Given, i.e. known,
familiar or taken for granted. The verbal text is the New presented as that which the text is
about. There is also top-bottom organisation: this has to do with the Ideal/Real distinction.
The child’s imaginary world of the teddy bear and the moral appeal to save bears in the main
heading is the Ideal. More specific concrete information is presented as the Real in the bottom
bar.

Figure 1.8: The metafunctions ... and a bear hug from ‘Boo’
(VERBAL GENRES)

Cluster 1: Main heading
1) Main heading (mainly linguistic but font size and type is salient).
The grammar in this text is typical of the grammar of the
mini genre of the LITTLE TEXT (Halliday, 1994
[1985]: 392-397) as exemplified by newspaper headlines,
telegrams, captions and headings. There are no finite
verbs, but instead truncated grammatical structures. This
little text has an appeal function (an APPEAL is a genre
in its own right).

Cluster 2: Main verbal text
2) Main verbal text. The linguistic text draws and combines
features from a number of different linguistic genres:
a) the genre of PERSONAL PRESENTATION. The text
starts off presenting a bear as an individual, giving him a
name, classifying him as an orphan, attributing personal
qualities.
b) some elements of the genre of RECOUNT in which
some specific events in Boo’s life are recounted in chronological
sequence.
c) INFORMATION REPORT giving general facts,
information not about Boo Bear but about bears in
general and their plight. It uses the universalising present
tense, all part of a concern with general rather than
specific detail.
d) an EXHORTATION genre concerned with making an
appeal and persuading people to adopt a desired course of
action. This is marked by the addressee-directed imperative
Help us, followed up by reason/motivation in the clause
it’s a small price to pay for a life, the word small being contrasted
with the priceless nature of a life.

(VISUAL GENRES)

Cluster 3: Photograph
3) Photograph. One type of visual image is the PHOTOGRAPH,
a genre in its own right distinct from, for example,
graphs, diagrams and tables. This photograph is not an
uncoded representation of the real world, but is itself semiotically
coded. What the eye perceives is closer to what the
photograph represents than a diagram which is concerned
with abstract and general tendencies in the scientific coding
orientation. This is in clear contrast to the specific concrete
detail of the photograph which adopts the naturalistic coding
orientation (Kress, Van Leeuwen, 1996: 170-71).

Cluster 4: Bottom bar
4) Bottom bar. A BOTTOM BAR is also another visual mini-genre
in which visual, linguistic and graphic resources are
used in typical ways. In this text, we see an imperative clause
concerned with very specific concrete information which is
important for the reader to know, being highlighted by the
specific font size and choice (contrasting with both the main
heading and the main verbal text).

Cluster 5: Logo
5) Logo. In advertisements, logos are typically in the bottom
right corner, whereas in letters they tend to be at the top. A
LOGO is a visual mini-genre, similar in status to the LITTLE
TEXT genre. It indexes a specific corporate or organisational
identity as the addresser of the text and ties this identity to the
meanings in the text. In other words, it has an anchoring
function which grounds the text in relation to a particular
company or cause which the reader or consumer of the text
can identify with.

Figure 1.9: A mini-genre analysis of the Boo Bear text


to the idealised or desired essence of something; the Real to more specific, concrete
information and detail. As the Boo Bear example clearly shows, advertisements
are one text genre which often makes use of the latter resource. The desirable,
the fantastic and the sensual are placed in the top part of the page; specific,
more detailed or realistic printed information appears in the lower half of the page.
In providing a brief analysis of the Boo Bear advertisement in terms of the meta-
functions, we have, together with the examples given in the previous sections, the
beginnings of a multimodal transcription for the printed page, a matter that will be
further explored in Chapter 2.
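As a first approximation, the two compositional contrasts described above (Given/New on the horizontal axis, Ideal/Real on the vertical axis) can be sketched as a small function. This is a deliberately crude sketch under stated assumptions: positions are given as fractions of page width and height measured from the top-left corner, and the page is split at its midpoint. Neither the function name `information_values` nor the midpoint threshold comes from the sources cited; real layouts call for analytical judgement rather than a fixed cut-off.

```python
from typing import Tuple


def information_values(x: float, y: float) -> Tuple[str, str]:
    """Label a position with its horizontal and vertical information value.

    x and y locate the centre of an element as fractions of page width and
    height, with (0, 0) at the top-left corner (an assumption of this sketch).
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must lie between 0 and 1")
    horizontal = "Given" if x < 0.5 else "New"   # left vs right
    vertical = "Ideal" if y < 0.5 else "Real"    # top vs bottom
    return horizontal, vertical


# Hypothetical coordinates for an element in the upper left of a page.
upper_left = information_values(0.2, 0.3)
# Hypothetical coordinates for an element in the lower right of a page.
lower_right = information_values(0.8, 0.95)
```

On this reading, an element in the upper left is labelled Given and Ideal, while an element in the lower right is labelled New and Real; the coordinates used here are invented and are not measurements of the Boo Bear advertisement.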
Figure 1.9 focuses on the notion of mini-genres, which is not dissimilar to
the notion of cluster type that we have described briefly in 1.2, pp. 21-34. As the
transcription makes clear, a cluster is essentially associated with specific instances
whereas a mini-genre is concerned with types. The two overlap insofar as a
specific instance is also a manifestation of type. The notion of mini-genres derives
from a distinction made by Bakhtin (1986) between primary and secondary genres
(see Inset 6: Bakhtin’s distinction between primary genres and secondary genres on the
next page). Primary genres, sometimes called mini-genres, are basic prefabricated
text-making resources, such as Question-Response, Command-Compliance,
Problem-Solution, Definition-Explanation. Secondary genres, of which advertisements, novels
and scientific articles are examples, select and combine the resources of these
mini-genres for their own purposes, thus forming more complex genres. Figure 1.9
clarifies how we can make a start, in terms of a rudimentary
multimodal transcription, on the work of demonstrating how primary genres are
typically combined to form secondary genres, in this case an advertisement. The
transcription distinguishes visual from verbal mini-genres in a clearcut way but clar-
ifies once more that the visual and the verbal are not simply added to each other. As
mentioned above, the multimodal integration of the two produces a multiplying
effect such that the one contextualises the other to produce an overall text meaning.
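The way a secondary genre draws on several mini-genres can be recorded as simple data. The sketch below restates the cluster, mini-genre and verbal/visual labels of Figure 1.9; the variable and function names are hypothetical, and the flat table is merely one convenient way of holding the transcription, not the format used in this book.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (cluster, mini-genre, modality) rows restating Figure 1.9.
BOO_BEAR: List[Tuple[str, str, str]] = [
    ("Main heading", "LITTLE TEXT", "verbal"),
    ("Main verbal text", "PERSONAL PRESENTATION", "verbal"),
    ("Main verbal text", "RECOUNT", "verbal"),
    ("Main verbal text", "INFORMATION REPORT", "verbal"),
    ("Main verbal text", "EXHORTATION", "verbal"),
    ("Photograph", "PHOTOGRAPH", "visual"),
    ("Bottom bar", "BOTTOM BAR", "visual"),
    ("Logo", "LOGO", "visual"),
]


def genres_by_modality(rows: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group mini-genre names under the modality heading they are listed with."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for _cluster, genre, modality in rows:
        grouped[modality].append(genre)
    return dict(grouped)


grouped = genres_by_modality(BOO_BEAR)
```

Grouping the rows in this way only sorts the inventory; it says nothing about the multiplying effect of integrating the visual and the verbal, which remains a matter for the analysis itself.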
We may give two examples: first, we see how the linguistic genre of
PERSONAL PRESENTATION at the beginning of the text indexes Boo in the visual
image at the same time that the features that we have analysed in the visual
semiotic create a synergy between the two, all to do with creating interpersonal
closeness between the bear and us, and integrating the bear into our social world.
The bear is not in the wilderness, but is turned into your familiar, cuddly teddy.
Second, we see how the use of direct gaze, realising a visual demand for goods and
services, ties in with the EXHORTATION genre in the verbal text.
The transcription makes clear how the Boo Bear text works in terms of the
meaning-compression principle (see Inset 3: The resource integration and
meaning-compression principles, pp. 18-19). It shows how we use our knowledge of genre,
both primary and secondary, to understand the relationships between the various
clusters that typically make up a printed text. Note in passing that some of the
Inset 6: Bakhtin’s distinction between primary genres and secondary genres

• Bakhtin (1986: 61-62) makes a useful distinction between primary and secondary
genres. Primary genres are elementary genres that occur in what Bakhtin calls ‘unmediated
speech communion’; they are the basic generic forms characteristic of a wide range
of social situations encountered in daily life. They include dialogic exchange units such
as: Command^Response, Statement^Response-to-Statement, Question^Answer,
Greetings^Reciprocate Greetings and so on. They also include written variants such as
Personal Letters, Memos, Explanations, Instructions, Recounts and
Arguments.
• Secondary genres are more complex and include complex scientific, artistic, legal,
journalistic, political, bureaucratic, technocratic and other complex forms of discourse.
They assimilate a wide variety of primary genres to their own purposes and principles
of organisation. Primary genres thus ‘absorbed’ and ‘digested’ (Bakhtin, 1986: 62) are
recontextualised by the secondary genre, losing any immediate contact with the everyday
situations in which they function. Their functions and modes of organisation are
mediated by the more complex secondary forms to which they are assimilated.
• Bakhtin developed this distinction as a result of his reflections on the shortcomings of
formal and objectivist accounts of language, with their emphasis on decontextualised
grammatical units such as the sentence. Generic forms are the standardised discourse
formats and text-types which language users use and adapt to specific speech situations
in order to give determinate social shape to their discourse. Genres are the typical forms
of discourse that are used according to the speech situation. Different social situation-types
require different generic forms according to the purposes of the interactants,
their social relations with each other and the social activity in which they are engaged.
Genres are stable, yet plastic, forms of discourse which can be adopted/adapted in
both standardised and creative ways to particular situations and their contingencies.
• In this book, we suggest that the distinction between primary and secondary genres
can be extended to multimodal texts. Secondary genres such as television advertisements
frequently absorb and recontextualise primary genres from a range of semiotic
modalities into their own forms and for their own purposes. Primary genres include
visual forms such as logos, animation, photographs, film clips and so on as well as a
wide variety of linguistic genres (see above) and musical genres. In multimodal texts,
pictorial, linguistic, kinesic and sound genres may all be assimilated to and recontextualised
by the more complex, more highly mediated secondary genres such as the
advertisement, the documentary film, the school textbook, the web page and so on.
• Bakhtin’s insight implies that texts include intermediate levels of analysis lying
between the microlevel lexicogrammatical, kinesic and image selections and the global
structuring of the text as a whole, its generic or macro-structure. There is, in other
words, no direct or unmediated relationship between the micro- and macro-levels of
textual organisation. Thompson and Mann (1987) have proposed a not dissimilar
notion of the rhetorical structures which lie between lexicogrammatical choices per se
and the most global level of text organisation. Our scalar model of multimodality
explores these intermediate levels when discussing, for example, clusters and phases.
various clusters in the Boo Bear text might at first sight appear to be monomodal
in terms of the resource integration principle. However, this is not the case. Even
the photograph of Boo is more than a photograph: the photograph
cluster integrates space in terms of orientation (Boo is at a 45-degree angle to the
horizontal plane) and visual salience (Boo’s body overshadows the text), all part of
the process of creating an interpersonal bond between the reader and the bear.

1.5. Web pages and their transcription

Chapter 3 of this book contains a detailed analysis, perhaps the first of its kind, of
the way web pages and websites typically make their meanings. The special
characteristics of web pages are described there. In this section, however, we will
briefly outline one of the characteristics that needs to be considered when
describing web pages, namely the ways in which intertextual relationships (see
Inset 8: Intertextuality, p. 55) are created in web pages and how we can analyse and
transcribe these relationships. We will do this with reference to a page from the
British Library’s website (www.bl.uk) relating to Leonardo’s Notebook as represented
in Figure 1.10. In the preceding sections, we discussed the nature and role of
clusters as meaning-making structures. The discussion included a description of
the frame as a resource that can be pressed into service in many texts to separate
a cluster, or, more often, a group of (sub)clusters, from another cluster or
(sub)cluster group. In a comic or a film storyboard, for example, frames allow clusters
to be built into higher-level meaning-making sequences that enact particular scenarios,
or, as we prefer to call them, phases (see Inset 7, p. 47 and 1.6, pp. 46-54).
The very notion of sequence implies a time-based, chronological ordering
of events in a narrative and/or cause-effect structuring. However, clusters can
also be related to each other in other ways. They can, for example, at least in part,
transcend a linear time-based sequence and enact constantly changing positions
relative to each other. In this sense, clusters are more like stars moving through
space in patterned and predictable ways in relation to each other, thereby consti-
tuting part of a larger, dynamic whole.
This is often the case with web-based animations. The website in Figure
1.10 is an interesting case in point (www.bl.uk/onlinegallery/ttp/digitisation3.html).
It consists of a display area (top part) in which Leonardo’s Notebook is shown and
a bottom bar which partly governs the way the Notebook is displayed. Thus,
although users can turn the pages of the Notebook by clicking on them directly,
they can also use the slider in the bottom bar. This is in itself a cluster, a grouping
of resources: three visual objects (a line/bar, a circular sliding button and two end
triangles/arrowheads) plus the possibility of moving the button with the mouse pointer,
i.e. an action potential (see in this respect Inset 14: Material object text and semiotic
action text: two sides of the same textual coin, pp. 175-177). A related cluster is the

Figure 1.10: Intertextual relationships in websites: the role of frames



page-display bar, part of the bottom bar supercluster, which displays the results of
the movement of the sliding button in terms of page numbers. The mouse pointer
is a further related cluster combining visual and actional resources. It is, of course,
essential in realising the potential of the display and bottom bar superclusters.
Part of this potential relates to overcoming the time-based consultation of
texts by allowing a greater set of selection possibilities. Indeed, with this device,
the reader can flip backwards and forwards through the pages at will, whereas
there is no such random sequencing when the Text and Audio buttons on the right
of the bottom bar are activated; that is, there are no fast-forwards: you either read or
listen to the entire text or you switch these options off. A more striking example is
the Magnify option, which allows the reader/viewer to randomly float a virtual magnifying
glass over the virtual Notebook, effectively zooming in on the individual
clusters that Leonardo wrote or drew. The Magnify option is essentially a frame with
a specialised action potential (see 3.9, p. 146). It acts as a text-access tool but with very
different functions as compared with other text-access tools such as search engines.
The page shown in Figure 1.10 relates to Leonardo’s design for a new town
in France that he had been commissioned to work on by the King of France. It is
striking in the way that it consists of an apparently random set of jottings which,
on closer inspection, prove to be a series of interconnected and often merging
visual and verbal clusters. The last option, the Mirror option, can be used with the
Magnify function; it reverses Leonardo’s mirror writing (he wrote from right to left
in many of his works) thus making the text easier to comprehend. From one stand-
point, the effect of all these textual interactions is to make a precious text available
for detailed perusal by the many who would be otherwise barred from consulta-
tion. From another standpoint, we may say that by looking at these various aspects
in which clusters interact in websites, we are in part preparing the reader for the
model of textual organisation of websites that is put forward in Chapter 3.

1.6. Film texts and their transcription

This section is concerned with film texts. It deals with their definition and the methods
by which they can be analysed. Specifically, it is concerned with outlining some of the
analytical tools – phase, transition and transitivity frames – that we will analyse in much
greater detail in Chapter 4. Examples of film media are those projected in cinemas
(films, newsreels and cartoons), broadcast on TV (films, documentaries, news and
sports broadcasts etc.), distributed as home video (DVD and VHS films) or those relating
to events such as lectures, public meetings and conferences which have been video-recorded
and reproduced in videotape or digital format for non-commercial reasons.
The definition covers a large range of film texts and genres including those relating to
the general public (cinema, DVD, TV or web-based films) and those intended for more
restricted audiences (company training films, recordings of university lectures, recordings
of children telling stories). Such texts are both multimodal and dynamic: as they
unfold in time, they display different and constantly varying constellations of sound,
image, gesture, text and language (Baldry, 2000a; Thibault, 2000a). They can be
analysed as individual texts using the multimodal transcription technique or alternatively
as collections of texts (text corpora) using multimodal concordancing techniques
(Baldry, Beltrami, 2005; Baldry, Thibault, 2005; Taylor Torsello, Baldry, 2005).

Inset 7: Phases and their transcription

• The basic unit of textual sequencing and, hence, of global or ‘macro’ level
organisation of a text is the phase. Following Gregory (1995, 2002), a phase may
be defined as a set of copatterned semiotic selections that are codeployed in a consistent
way over a given stretch of text. Phases are text-analytical units in terms of
which the text as a whole can be segmented and analysed. However, these units do
not in themselves realise or constitute relations between semiotic forms and the
meanings the forms realise. Phases are instead the enactment of the locally
foregrounded selections of options which realise the meaning which is specific to
a given phase of the text. It is the task of a multimodal text analysis to specify both
which options are selected from which semiotic modalities and how they are
combined to produce a given, phase-specific meaning.

• Phasal analysis has also been extended to multimodal action texts in the work of
Martinec (1998, 2000) and in our own work (Baldry, Thibault, 2001, 2005). In this
approach, the text is segmented into a number of phases and the points of transition
between phases. A given phase is characterised by a high level of
metafunctional consistency or homogeneity among the selections from the various
semiotic systems that comprise that particular phase in the text. In this way, the
specific selections in that phase and their modes of copatterning yield an internal
consistency which characterises a given phase and which distinguishes that phase
from other phases in the same text. The temporal unfolding of a given phase is a
wave-like pattern or, rather, a series of interacting waves. It follows that phases and
subphases refer to salient local moments in the global development of the text as
it unfolds in time. The transcription must reveal the patterns of use of choices
from different systems in the real-time unfolding of the text. The Prague school
concept of foregrounding is crucial when showing which selections from which
semiotic resource systems are relevant to the instantiation of a given phase.

• Viewers of the text have no difficulty in perceiving particular textual phases.
Crucially, this also depends on their ability to recognise the transition points or the
boundaries between phases, i.e. when one phase or subphase ends and another
begins. The points of transition between phases have their own special features that
play an important role in the ways in which observers or viewers recognise the shift
from one phase to the next. Generally speaking, transition points are perceptually
more salient in relation to the phases themselves (Thibault, 2000a; Baldry, 2004).
This is always a matter of degree and does not entail some absolute criterion of
what is salient and what is not. If rhythm is, as Mathiot (1983: 38) argues, ‘the patterning
of perceptual prominences in the behavioral flow’, then the perceptual
prominence which is accorded to a transition from one textual phase to the next
can be expected to relate to the overall rhythmic patterning of the text in significant
ways.

Through analysis a much more detailed definition and understanding of film
texts can be provided. Frame-based dissections of texts such as the multimodal
transcription of advertisements in Appendix I and Appendix II play an important
role in describing the meaning-making blocks that make up a specific film text
(Baldry, 2000a, 2004; Thibault, 2000a). They necessarily go beyond cluster analysis,
which presupposes a fixed relationship between resources, and involve dynamic
sequences of resource integration, what we have termed phases (see Inset 7:
Phases and their transcription, on the previous page), following, though also modifying,
Gregory’s notions of phase and transition (Gregory, 1995, 2002).

Shots        Phases   Description of Phases                               Macrophases

Shots 1-5    Phase 1  young woman telephones man holding young man        Macrophase 1
                      hostage
Shots 6-10   Phase 2  young woman drives to Vienna                        Macrophase 1
Shots 11-15  Phase 3  in Vienna young woman gets out of car and phones    Macrophase 1
                      man, who instructs her to drive to Prague
Shots 16-19  Phase 4  young woman drives to Prague                        Macrophase 1
Shots 20-24  Phase 5  it is revealed the whole scene is taking place on   Macrophase 2
                      a film set in a studio; the young woman gets
                      angry with the film director and walks off the set
Shots 25-26  Phase 6  the Mitsubishi Carisma car is revealed              Macrophase 2
Shot 27      Phase 7  the young man taken hostage is shown hanging upside Macrophase 2
                      down beneath a bridge in a dark subterranean setting
Shots 28-29  Phase 8  logo and slogan                                     Endphase
Blackout

Table 1.5: The Mitsubishi Carisma text: Summary analysis of shots, phases and macrophases

Film genres are varied in their nature and their discussion requires a canvas that
goes far beyond the confines of this book. We will thus limit our exemplification in
this book to the analysis of one genre – the TV advertisement genre – for which we
provide detailed transcriptions of a number of texts (see Chapter 4). As Appendix I
and Appendix II show, multimodal transcriptions can capture the activities, people,
objects and circumstances represented in a TV advertisement, in such a way that an
advertisement’s basic message can be reconstructed in terms of the individual phases,
subphases and shots that it is made up of (Baldry, 2000a, 2004; Baldry, Thibault, 2005;
Gregory, 1995, 2002; Martinec, 1998, 2000; Thibault, 2000a). Thus, in terms of
understanding the meaning-making processes of this text, the multimodal
transcription helps reconstruct the text’s phases and the subdivisions of each phase.
Phases are the basic strategic meaning-making units in a film text. As with printed
texts, a multimodal transcription of a film text reconstructs the way information is
divided into blocks and the way these blocks relate to metafunctional organisation
and the constant changes in this metafunctional organisation as the text flows in
time (see Appendix I and Appendix II). As we shall see in the course of this book,
transcription can thus help us identify many elements in a film text and suggest the
way they integrate to make meaning.
With the type of multimodal transcription illustrated in Appendix I and
Appendix II, we can reconstruct the boundaries between phases and subphases,
the role of scenes and shots and, on a limited scale, we can make a start to the work
of identifying phase types by comparing the transcribed texts to others (see Chapter
4). Texts are comprised of different phases along with transition points between
phases. A phase is an intermediate level of analysis which is characterised by a
relative semiotic homogeneity of selections and combinations of selections from
the semiotic resource systems that are used to realise a particular textual phase. A
particular phase is therefore characterised by the fact that the meanings made
within that phase exhibit a high degree of sameness, at the same time that the
meanings made in other phases of the same text are different. A phase will therefore
make use of a distinctive copatterning of meaning options in order to create
the meanings of that phase and to distinguish a particular phase from other,
different phases in the same overall text.

SHOT 1: PARTICIPANT: CAR ^ PROCESS: MOVE ^ LOCATION: TOWARDS TELEPHONE BOX
SHOT 2: PARTICIPANT: YOUNG WOMAN ^ PROCESS: ACTION: GRASP ^ PARTICIPANT: GOAL: TELEPHONE RECEIVER
SHOT 3: PARTICIPANT: SAYER: YOUNG WOMAN ^ PROCESS: VERBAL: TALKS ON TELEPHONE ^ PARTICIPANT: RECEIVER: MAN ^ LOCATION: INSIDE TELEPHONE BOX
SHOT 4: PARTICIPANT: SAYER: MAN ^ PROCESS: VERBAL: TALKS ^ PARTICIPANT: RECEIVER: WOMAN ^ LOCATION: SECRET UNDERGROUND LOCATION
SHOT 5: PARTICIPANT: GAZER: MAN ^ PROCESS: GAZE VECTOR ^ PARTICIPANT: GOAL: YOUNG MAN SUSPENDED FROM BRIDGE
SHOT 6: WOMAN ^ LEAVES TELEPHONE BOX ^ WALKS TO CAR

Table 1.6: The Mitsubishi Carisma advertisement: thematically salient transitivity frames

The transition from one phase to another is matched by a shift in the kinds
of meaning options which are selected and combined in that phase. Different
phases in a text are functional units which make their own specific contribution to
the meaning and organisation of the text as a whole. At the same time, the
different kinds of meanings made in different phases of a text are related to the
meanings made in other phases of the same text as well as to the meanings that are
made by the text as a whole. Phasal analysis is useful and revealing because it shows
how small-scale units such as the shot in a video text can be related to larger-scale
textual units such as the phase. The ways in which smaller-scale units are related to,
and are integrated with, larger-scale units is important for establishing the meaning
and function of small-scale units such as the shot in video texts. Texts are organised
on many different organisational scales; units on different scales all play their part
in creating the meaning of the text as a whole, at the same time that they make their
own specific contribution to the meaning of the whole. The small-scale meanings
of a unit such as the shot or the phase cannot simply be added together to derive
the meaning of the whole text. Different units on different scales in a text relate to
units on the same scale and on smaller and larger scales in different ways (see Inset
12: Scalar levels, p. 144). The meaning of the whole depends on the many different
ways in which units relate to each other.
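
The relative homogeneity of selections within a phase, and the shift in selections at a transition point, can be illustrated with a small computational sketch. The Python fragment below is a simplification offered for illustration only and is not the transcription method used in this book: it assumes that each shot has already been annotated with a set of feature labels (the labels shown are invented for the example), measures the overlap between adjacent shots, and proposes a phase boundary wherever that overlap falls below a threshold. The function names, the overlap measure and the threshold value are all assumptions introduced here.

```python
def jaccard(a, b):
    """Overlap between two feature sets: 1.0 = identical, 0.0 = disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def transition_points(shots, threshold=0.5):
    """Return the indices of shots that open a new phase.

    A boundary is proposed wherever the overlap between a shot and the
    preceding shot drops below `threshold` (an arbitrary, hypothetical value).
    """
    boundaries = []
    for i in range(1, len(shots)):
        if jaccard(shots[i - 1], shots[i]) < threshold:
            boundaries.append(i)
    return boundaries


# Invented annotations for five consecutive shots (illustrative only).
shots = [
    {"telephone", "voice", "slow music"},
    {"telephone", "voice", "slow music"},
    {"telephone", "voice", "slow music", "gaze"},
    {"car", "driving", "fast music"},
    {"car", "driving", "fast music", "road"},
]

print(transition_points(shots))  # index of the first shot of each new phase
```

A hard threshold of this kind cannot represent a shot that belongs to two phases at once, which is precisely what a transitional shot does; salience at transition points is, as noted in Inset 7, a matter of degree.
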
For the purposes of the present discussion, we shall focus on the following
units: shot, phase and macrophase in relation to the Mitsubishi Carisma advertisement.
The Mitsubishi Carisma text has been analysed into a total of twenty-nine shots,
eight phases and three macrophases. This analysis is summarised in Table 1.5 above.
The complete text is transcribed in Appendix II. Since this text will be further
analysed in Chapter 4, we will not undertake a complete analysis of this text as yet.
Instead, we shall make some informal observations on the overall organisation of
the text into shots, phases and macrophases. Our purpose is to illustrate the impor-
tance of phasal analysis and its role in text analysis and transcription. The first part
of the discussion will look closely at Phase 1 with these questions in mind. Phase 1
consists of five shots:
• Shot 1 establishes the time (night time) and the urban location, at
the same time that it introduces the telephone box and the car,
which is seen approaching the telephone box.
• Shot 2 cuts to the telephone receiver inside the telephone box and
shows a hand grasping the receiver.
• Shot 3 shows a young woman inside the box talking on the telephone.
• Shot 4 cuts to the middle-aged man with whom she is talking. He is
holding the telephone receiver to his ear and is talking to the woman.
• Shot 5 is a camera pan to the left which tracks the man’s gaze to
his hostage, a young man suspended above a shark-infested
waterway by a rope hung from a bridge.
• Shot 6 is a transitional shot: it marks the end of Phase 1 at the same
time that it begins Phase 2. This shot shows the woman’s car – a
Mitsubishi Carisma – viewed from inside the telephone box as the
woman leaves the telephone box and walks back to the car.
Taken together, these shots produce a thematically homogeneous phase
which is focused on the use of the telephone to bring about this first contact
between the woman and the man. Shot 6 marks the end of the first telephone conversation
between the woman and the man, at the same time that it is the beginning
of the first car-drive phase when the woman drives to Vienna. In other words,
shots and phases function on different meaning levels (see Inset 12: Scalar levels, p.
144). The visual scene combines thematic material from both Phase 1 and Phase 2,
namely the telephone box and the woman’s car.
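
The way shots, phases and macrophases nest on different scales can be made concrete with a simple lookup. The following Python fragment is a minimal sketch: the shot ranges are copied from Table 1.5 and the grouping of phases into macrophases follows the description given in this section, but the names PHASES, MACROPHASES, phase_of, macrophase_of and locate are illustrative assumptions introduced here.

```python
# Inclusive shot ranges for each phase, copied from Table 1.5.
PHASES = {
    1: (1, 5),
    2: (6, 10),
    3: (11, 15),
    4: (16, 19),
    5: (20, 24),
    6: (25, 26),
    7: (27, 27),
    8: (28, 29),
}

# Inclusive phase ranges for each macrophase.
MACROPHASES = {
    "Macrophase 1": (1, 4),
    "Macrophase 2": (5, 7),
    "Endphase": (8, 8),
}


def phase_of(shot):
    """Return the number of the phase that contains the given shot."""
    for phase, (first, last) in PHASES.items():
        if first <= shot <= last:
            return phase
    raise ValueError(f"shot {shot} is not in the table")


def macrophase_of(phase):
    """Return the name of the macrophase that contains the given phase."""
    for name, (first, last) in MACROPHASES.items():
        if first <= phase <= last:
            return name
    raise ValueError(f"phase {phase} is not in the table")


def locate(shot):
    """Place a shot on the two larger scales: (phase, macrophase)."""
    phase = phase_of(shot)
    return phase, macrophase_of(phase)


print(locate(6))   # Shot 6 is listed under Phase 2 in Table 1.5
print(locate(27))
```

Because each shot is assigned to exactly one phase, this lookup records Shot 6 only under Phase 2, as Table 1.5 does; its role in closing Phase 1 would need a separate annotation.
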
Table 1.6 presents a linguistic gloss on the thematically salient transitivity
frames (Baldry, Thibault, 2005) in Phase 1. The analysis in Table 1.6 focuses on the
participants, the actions they engage in, and other relevant circumstantial details such
as location. However, very many details of the visual text and the soundtrack remain
unaccounted for in such an analysis, which is not meant to be exhaustive in any case
(see Chapter 4 for a more detailed analysis). Instead, Table 1.6 shows some of the
resources which contribute to the thematic homogeneity of this phase: we see the
participants, the activities they engage in, and the objects that they interact with.
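
The glosses in Table 1.6 follow a regular notation: elements of a frame are separated by ^, and each element consists of a function label, an optional subtype and a value, separated by colons. As a minimal sketch, and assuming only this reading of the notation, the Python fragment below splits a gloss into its elements. The function name parse_frame and the tuple layout are illustrative assumptions, not part of the transcription scheme itself.

```python
def parse_frame(gloss):
    """Split a '^'-separated gloss into (function, subtype, value) tuples.

    Elements without a label (as in the gloss for Shot 6) are returned
    with function and subtype set to None.
    """
    elements = []
    for chunk in gloss.split("^"):
        parts = [part.strip() for part in chunk.split(":")]
        if len(parts) == 1:
            function, subtype, value = None, None, parts[0]
        elif len(parts) == 2:
            function, subtype, value = parts[0], None, parts[1]
        else:
            function, subtype = parts[0], parts[1]
            value = ": ".join(parts[2:])
        elements.append((function, subtype, value))
    return elements


shot_2 = ("PARTICIPANT: YOUNG WOMAN ^ PROCESS: ACTION: GRASP ^ "
          "PARTICIPANT: GOAL: TELEPHONE RECEIVER")

for element in parse_frame(shot_2):
    print(element)
```

Once the glosses are held in this form, frames from different shots can be compared element by element, for example to check which participants recur across a phase.
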

1.6.1. The soundtrack
At the same time, what we see is also related to what we hear. A few words on the
significance of the soundtrack are therefore in order at this point. In Phase 1, the
soundtrack is comprised of three components: (1) the orchestral music; (2) the
voices of the man and the woman; (3) the sound of the telephone box door clos-
ing when the woman returns to her car after making the telephone call to the man.
The music starts off at a relatively low volume and slow tempo. Both the volume
and the tempo gradually increase until the beginning of Phase 2, when there is a
marked quickening of the tempo as the woman begins the drive to Vienna in that
phase. The music in Phase 1 is a contextual ground (see Ground in Inset 16: Perspective
in sound: Van Leeuwen on Figure, Ground and Field, p. 212) to the scene which we
observe. The music supplies mood and ambience and is reminiscent of the music
from spy thrillers such as James Bond films. The two occurrences of the spoken
voice in this phase are the dominant sound motif, whereas the orchestral music
stays in the background.
The voice is the acoustically dominant figure in the soundtrack in Phase 1. In
other words, the acoustic focus is on the voices of the two speakers, rather than the
orchestra. However, the orchestra gains in prominence with the entry of the
trumpet solo as soon as the man finishes saying his line ‘bring it tomorrow’ at the same
time that the camera tracks the shift in the man’s gaze by panning to his hostage
(Shot 5), who is seen suspended above the shark-inhabited canal. The motif played
by the trumpet clearly associates with the man and his dark intentions; it introduces
a note of dramatic tension into the previously more melodic flow of the music.
In contrast to the woman’s standard middle class British accent and steady mod-
ulation, the man’s voice is more heavily accented as well as exhibiting a fair degree
of resonance and reverberation. The tempo of his voice is also slower than that of the
woman’s more normal talking speed. Table 1.7 shows some of the salient phonetic
and prosodic features of the man’s voice. The man’s voice and the woman’s voice
clearly contrast in ways which index significant meaning contrasts. These oppositional
contrasts are summarised in Table 1.8. The contrasting phonetic and
prosodic features of the two speaking voices index a series of oppositions in
categories such as nationality and language and, through these, the moral opposition
between the good intentions of the woman and the evil ones of the man.
Given that the woman is aligned with the car, it is hardly surprising that the viewer
is asked to align with the values of the female protagonist and later the car itself in
opposition to the values indexed by the man and the faintly parodistic and absurd
links that his voice indexes. These features of the man’s voice suggest a sinister
quality and index the morally contorted character of the villain in many spy movies,
such as those found in many of Ian Fleming’s James Bond stories and the feature
films based on these. The contrasting qualities of the two speaking voices serve, in
partnership with other modalities such as the visual scene, to position the two char-
acters in very different ways in relation to the viewer of the advertisement. Thus,
the woman is visually connected to the car, to the urban world outside, and more
generally with the values of the reader/viewer that the text seeks to target. In the
telephone box, she is shown frontally and the use of colour is naturalistic. The
woman belongs to our world and viewers are asked to identify with her. In contrast,
the man is associated with a sinister subterranean setting, with criminal activity
(extorting ransom money for the hostage he is holding), and is generally shown as
not belonging to the world and the values of the targeted viewer. His face is seen
to be tilted at an oblique angle in contrast to the frontal perspective and vertical position
used to show the woman’s face when she speaks on the telephone. Furthermore,
the woman’s face is fully illuminated, whereas the man’s face is partly obscured by
the interplay of light and shade on his face.

Phonetic Features    Prosodic Features

low pitch            decreased tempo
guttural             syllable-timed rhythm
resonant

Table 1.7: Phonetic and prosodic features in man’s spoken voice

As we have pointed out above, Shot 6 is a transitional shot: it brings Phase 1
to an end at the same time that it introduces elements of Phase 2. In this shot, the
young woman leaves the telephone box and walks back to her car. The shot also
features one of the few occurrences of an ambient sound, in this case made by the
door of the telephone box when it shuts behind her. The sound is given a fairly
high degree of prominence as well as being fairly low on the absorption scale, to
the extent that it is heard as reverberant or resonant.
A similar use of an ambient sound is made in Shot 15, Phase 3 when the
woman replaces the telephone receiver after talking to the man for the second time.
In both instances, an ambient sound takes on qualities and a potential significance
that might not normally apply to such a sound in a more naturalistic acoustic
context. In both cases, the sounds mark the end of a textual phase in the
advertisement. They therefore function textually in concert with other features to
indicate a transition from one phase to the next. Furthermore, the low degree of
absorption in both cases means that these sounds take on non-naturalistic qualities
in keeping with many other features of this text.

                   Woman                Man

Nationality        British              Foreign: middle European
Language           authentic English    inauthentic ‘foreign’ English
Moral Character    ‘good’               ‘bad’

Table 1.8: Some salient meaning oppositions indexed by contrasting phonetic
and prosodic features in the speaking voices of the man and the woman

At a still higher level, the text is comprised of three macrophases, one of which
is the end phase (Baldry, 2004), and which integrate the smaller-scale phases into their
principles of organisation. The justification for the inclusion of this level lies in large
measure in the role that the soundtrack plays in binding the text into three higher
level macrophases. The first macrophase extends from Phases 1 to 4 and is charac-
terised by the orchestral music which binds all four of these phases into a distinc-
tive larger-scale unit. Phases 1 to 4 focus on the two telephone conversations
between the woman and the man (Phases 1 and 3) and the two car-drive phases when
the woman drives to Vienna (Phase 2) and to Prague (Phase 4). The music, which
starts off slowly and quietly in Phase 1, builds up to a very quick tempo and rhythm
in Phase 2 when the woman starts the journey to Vienna. This surge in musical
intensity carries through to the end of Phase 4. With the onset of Phase 5, the music
is abruptly faded out and the voice of the off-screen male presenter is heard for the
first time, at the same time that Phases 5 to 7 switch from the crime thriller genre
that is foregrounded in Phases 1 to 4 to the movie set where an overpaid director is
shown to be responsible for making what we now see to be a hackneyed movie in
Phases 5 to 7, before the car is moved centre stage and shown to be the real star of
the show. In the short end phase, different visual and acoustic resources function to
conclude the advertisement and to connect its meanings to the maker of the car.
The discussion in the preceding paragraphs suggests some of the ways in
which a multimodal video text such as a television advertisement can be analysed
in terms of a number of different, though interrelated levels of textual organisation.
The analysis here is neither exhaustive nor technical (see Chapter 4). By introducing
the notion of phasal analysis, we have drawn attention to the ways in which
different parts of the same overall text may use different semiotic resources and dif-
ferent combinations of semiotic resources to create meanings which are specific to
a particular part of a text, which we have called a phase, following and adapting
Gregory’s original insights as regards this level of textual meaning (Gregory, 1995,
2002). At the same time, the various phases combine units on lower levels, such as
the shot, while also being integrated to higher-scalar units such as the macrophases
proposed here, or to the text as a whole.

1.7. Conclusion

In this chapter, we have introduced some of the basic aspects of the scalar model
of multimodality that we are seeking to establish in this book. This model views
texts as consisting of multiple, interacting textual levels that make their meaning
through the constant interplay of smaller and larger textual units. In this respect,
we have placed particular attention on the intermediate levels of textual composi-
tion of multimodal texts, such as clusters and phases, showing how they link up with
Inset 8: Intertextuality

• A critical aspect of multimodal text analysis is intertextuality (Lemke, 1985;
Thibault, 1986, 1990). No text is made or interpreted in isolation from other texts.
Instead, texts of all kinds are always related to other texts, at the same time that
they incorporate other texts and other textual voices into their own internal organ-
isation. Intertextuality is a system of relations that link texts on the basis of shared
criteria of meaning and formal patterning and shared principles of organisation. It
is a system of constraints which includes: (1) the typical meaning relations which
characterise at some level of abstraction the texts that are assigned to a given inter-
textual set; and (2) the extent to which a specific text instantiates the patterns of
meaning relations that are typical of the given intertextual set.
• Car advertisements, for instance, give us the opportunity to abstract certain common
meaning relations which are potentially meaningful in some community from a set of
thematically related texts. An intertextual pattern may be established on the basis of a
common pattern of movement, colour, visual transitivity relations (e.g. human-car
relations) and many other features which may function to create a system of mean-
ings which is shared across the texts belonging to the intertextual set. Intertextuality
therefore refers to the systems of meaning relations which are common to some set
of texts, however large or small, in some community (Lemke, 1985). Texts can be
related to each other in some wider intertextual set on the basis of shared thematic
relations, a common evaluative orientation, the fact that they are related to the same
type of social activity structure and on the basis of shared generic features.
Intertextuality is less abstract than the system of possible meanings but more abstract
than specific texts (see Inset 13: System and instance, pp. 172-173): it is a typical pattern
of actualised meaning potential, rather than being a mere possibility in the system.
Multimodal intertextual thematic formations (see 3.8.1, pp. 136-140), for instance, are
built up on the basis of joint verbal-visual resources which work in partnership to cre-
ate a multimodal (verbal-visual) thematic formation common to some set of texts.
The texts which belong to the same formation have in common typical cothematic
relations among selections in the visual grammar of depiction and the linguistic gram-
mar. A multimodal intertextual analysis (see 2.6-2.8 in Chapter 2, pp. 80-102) recog-
nises the role of selections in, for instance, both visual transitivity frames and expe-
riential selections in clause grammar in the formation of its thematic relations (see
Figures 4.1a and 4.1b). The identification of a common intertextual pattern across dif-
ferent texts represents an abstraction from the meaning relations in specific texts to
the features which the wider intertextual set of related texts has in common.
• The concept of intertextuality therefore shows how the resources of different semiotic
systems are codeployed in ways that belong to a common intertextual pattern,
at the same time that the actual uses of these resources in particular texts can be
compared to the more abstract intertextual pattern. Such comparisons serve to
show the extent to which any specific instance conforms to, or varies with respect
to, the features that are criterial for the intertextual pattern. In this sense, intertextuality
is an abstraction from specific instances at the same time that it is
specific to a particular intertextual set.

lower levels – the basic building blocks, as it were, of textual composition – such
as visual, acoustic, gestural and many other resources, as well as with higher levels,
such as superclusters and macrophases and, of course, with the overall text and other
texts through intertextual links (see Inset 8 on the previous page). Given our con-
cern with the need to characterise the notion of multimodal genre more fully, we
have also shown how these intermediate levels may be characterised, in part, as one-
off instances and, in part, as recurrent types. In sketching out the basic features of
our model, we have provided answers to the questions posed at the beginning of
the chapter relating to the definition of multimodality and the role of such princi-
ples as resource integration and meaning compression in the enactment of multimodal
texts. We have thus made a start on the study of the whole-part relationships within
and between multimodal texts, with context of situation, genre and context of culture.
Now that a firm basis has been secured, it will be possible to explore all these
aspects in a much more detailed way in the chapters which follow.
Chapter 2

The printed page

2.0. Introduction

How does the page communicate? How does the visual image organise the per-
sons, objects, the actions they perform, and the settings in which these occur into
a structured set of relations? How does the page give structure to the relations
between the represented world of the text and the viewer of the text? And, more
generally speaking, how does it indicate to the reader/viewer the possible ways of
reading the text and the relative information priority to be assigned to the different
component parts of the overall visual composition? How can we talk about –
analyse and theorise – these various aspects of the way visual texts communicate?
In this chapter we link some of the general principles outlined in the previous
one to the analysis and transcription of the printed page, the scientific printed page
in particular, in an approach which moves from general to increasingly specialist con-
siderations. Specifically, in the first part of the chapter we consider the evolution of
the page as a textual unit in its own right in contemporary society, characterising the
role played by such resources as tables, charts and diagrams in this process and com-
paring texts from the 19th century with their ‘counterparts’ from the end of the 20th
century. The later sections of the chapter deal with the way in which scientific mean-
ings are communicated to children in biology textbooks. In the course of the chap-
ter, we will provide a detailed account, both in terms of text analysis and multimodal
transcription, of the ways in which a metafunctional framework (see 1.4, pp. 38-44)
can help us understand many aspects of the multimodal printed page, such as the
spatial arrangement of items on the page, the relation between images and linguistic
text, the relations between reader and multimodal text, and so on.

2.1. The printed page and its evolution

The very notion of the evolution of the page may at first seem ludicrous. How can a
page evolve? Yet in modern society the page is an important textual unit and a com-
parison of virtually any page from contemporary publications (whether newspapers,
school textbooks or scientific journals) with those of previous generations will show
that this change in status is due mainly to the rise of the multimodal page in the last
fifty years (Kress, Van Leeuwen, 1996). There is, of course, no such thing as a
monomodal page: there never has been and never will be. Like any other textual
unit, a page cannot create meaning through the use of language alone but relies
instead on a combination of several meaning-making resources: linguistic, graphic
and spatial at the very least (Thibault, 1998a). While all pages are by definition
multimodal, some are more obviously multimodal than others, combining traditional
semiotic resources such as language and layout with more ‘modern’ resources such
as colour and photographs. Under the influence of technology, computer technol-
ogy in particular, these developments have accelerated to such an extent that in
recent years our conception of the page has changed significantly. What was essen-
tially a linguistic unit 100 years ago has now become primarily a visual unit. The page
is no longer, as it was predominantly in the 19th century, simply a convenient divi-
sion for the purposes of printing. In Western culture, it is increasingly looked upon
as a textual unit in its own right, a matter clearly reflected in the growing list of
expressions that identify the page in terms of different social functions: index pages,
glossy pages, financial pages, yellow pages, teletext pages and, of course, web pages.
Today’s page thus incorporates the principle of multimodal textual design and
organisation (Kress, Van Leeuwen, 1996, 2001).
An awareness of the page and its evolution would appear to be a signifi-
cant aspect of language studies and language education at all levels, particularly
when we recall that Internet pages are not just read but are instead listened to,
looked at and even watched, increasingly by young children, many of whom are tra-
ditionally assigned to the pre-reading age group. The scientific page is no exception
to the general trend towards the integration of diverse semiotic resources sketched
out above. A glance at any page from a modern textbook and one from the previ-
ous century will confirm that the conception of the page as a visual unit affects
specialist sectors of society, and the genres they use, just as much as it affects
popular genres, such as newspapers, magazines, directories, circulars and ency-
clopaedias, directed to a much wider public. Compare for example the
representation of plant movements in a twenty-four hour cycle in Figure 2.1, as
exemplified in Darwin’s The Power of Movement in Plants published in 1880
(Darwin, 1989 [1880]), with its modern counterpart taken from Curtis’s Biology
(Curtis, 1975 [1972]). You will see that many of the meaning-compressing devices
that characterise the modern multimodal text in general are also employed in the
scientific text: colour, use of spatial disposition that arranges text blocks horizon-
tally as well as vertically, lines that have evolved to acquire both a metatextual
labelling function and a dynamic function, which acts to create a relationship between
movement and change in time. Following on from what we stated in the previous
chapter, the term meaning-compressing (see Inset 3: The resource integration and
meaning-compression principles, pp. 18-19) is used here in relation to the changes that

From BIOLOGY by Helena Curtis. © 1968, 1975 by Worth Publishers. Used with permission.

Figure 2.1: Leaf movements: in Darwin (top); in a modern textbook (bottom)

have taken place in the course of time which allow the same processes to be described
in scientific texts in a shorter space; these changes are typically the result of:
(i) the greater integration of visual and verbal resources;
(ii) the often concomitant process of greater abstraction in
representation that brings about a collapsing of the different
time scales that occurred in the actual process of experimenta-
tion into a single, more hypothetical point in time usually corre-
sponding to the point in time when the experiment/research was
reconstructed in terms of a written report by the scientist.
Interpreting this in relation to Darwin’s work on the movement of plants, we
notice how there is very little abstraction in the visual, to the point where the drawings
are reproductions of the actual lines that the movement of leaves drew on a cylinder.
On the other hand, the half-page reproduction of the same experiment in a modern
biology textbook – see Figure 2.1 (bottom part) – collapses all this into two inter-
linked diagrams labelled (a) and (b). The first visual – Cluster (a) – summarises the
entire experiment in terms of a labelled and abstract cluster which pares the experi-
ment down to a visualisation of the MATERIAL process in which a leaf draws a line
on a drum by means of a thread tied to the leaf at one end and to a lever at the other
end. The second visual – Cluster (b) – suggests the second stage of the experiment,
in that it represents the now completed line at the end of the 24-hour cycle and pro-
vides a blow-up of the complete cycle of leaf movements showing the regularity of
the cycle. The two visuals thus express a cause-effect relationship requiring very little
linguistic expansion; language is in fact used in what, de facto, is Cluster (c) in the left-
hand margin of Figure 2.1. By contrast, Darwin’s original experiment – Figure 2.1
(top part) – on which this modern version is based, only expresses the first part of
this equation – the actual material drawing of the line – but stops short of visualis-
ing the consequences, relying instead on language to describe these effects. Moreover,
the drawings in Darwin’s text typically show only a low level of abstraction. The
leaves are recognisably clover leaves whereas in the modern text they are abstract
representations of leaves; Darwin’s lines are real lines drawn by the leaves themselves
with the addition of an arrow mark suggesting directionality; in the modern text, on
the other hand, the lines are vectors of a totally abstract nature with a symmetry not
backed up by the original experiment. We may add that the contemporary scientific
page, with its sophisticated representation of movement, often has a dynamic, as
opposed to static, feel to it, which enables it to express different states. This is exem-
plified by the cartoon-like sequence of drawings, typical of modern school science text-
books that indicate stages in cycles: the water-evaporation-rain cycle, the seasons cycle, the
phases-in-human-evolution sequence, and so on (cf. the notion of temporal-analytical
processes in Kress, Van Leeuwen, 1996: 95). The dynamic potential of scientific
drawings is often exploited in the animations used in scientific film media. Video
recordings can also capture the process of drawing a diagram, a special form of ani-
mation that is used, for example, in a lecture to explain a process or a phenomenon
in a stage-like manner. As we shall see later, in Chapter 3 (e.g. 3.5, pp. 120-125), this
kind of animation is often closely associated with an awareness and use of the
dynamic properties of the multimodal web page.

2.2. The resource integration principle in the scientific page

A full account of the history of the scientific page as a meaning-making unit is not
possible here. A brief comparative examination of the codeployment of visual and
linguistic resources in the scientific page will, however, contribute to our under-
standing of the multimodal scientific page. Of special interest, in this respect, is
the focus that can be placed on how the various resources contribute in their
different ways to the various dimensions of a text’s meaning and how the growing
integration between resources may be regarded as an important social achievement.
To illustrate this point, and to provide partial answers to the questions posed
above, we may examine some specimen texts from different ages.
We have already seen some texts from the field of biology, and below we
analyse biology texts in more detail. But we may also extend our enquiry to the field
of economics, again comparing texts from two distinct historical phases, in this case
taken from Volumes I and II of Marx’s Capital as presented in the first American
edition dating back to 1908 and an issue of The Economist magazine dated September
5th 1998. Albeit in different periods and within a different world, these texts essen-
tially belong to the same discourse type, namely political economy, a genre effectively
created by Marx in Capital and which marks a significant shift in the deployment of
semiotic resources vis-à-vis earlier economics texts such as Adam Smith’s The Wealth
of Nations (1776) and John Stuart Mill’s Principles of Political Economy (1848). It is
a discourse type characterised by a combination of other discourse types: mathemat-
ical, statistical, journalistic, political as well as pure economics. In Marx (but much less
frequently and less obviously in The Economist) a particular discourse type is often
associated with a particular modality. Thus, in Figure 2.2, for example, we can see that
the part of the page relating to Lincolnshire is divided into three parts:
(a) a narrative section reflecting a written modality;
(b) an economic and statistical section (after the words The following),
reflecting a combined visual/verbal modality;
(c) a journalistic section (embedded in the first section and marked
off by quotation marks) which reflects an oral modality.
Whatever definition of economics as a science is given – and today’s economics is
obviously concerned with many issues, such as inflation and the production of
services, that were not the primary focus for Marx – the concept of measurement

© The Economist Newspaper Limited, London (03/09/1998)

Figure 2.2: A typical multimodal page in Marx’s Capital; and the top part of an equivalent page in The Economist

remains an important part of all scientific texts. From this point of view, Capital and
The Economist share common ground in their contribution to the development of
semiotic devices that express measurement. Although the modern text shows a
much stronger tendency to integrate visual and linguistic resources, the text type
developed by Marx already moves in this direction in some very interesting ways,
notably in its use of the vector. Nevertheless, in reading Capital, one is constantly
aware of a struggle to make meaning emerge from a limited array of resources –
principally the table and various graphic and spatial devices. Take, for example,
Chapter X (Vol. I) which is concerned with economic and social aspects of the work-
ing day and the exploitation of the labourer by Capital. Here Marx adopts a journal-
istic standpoint which foregrounds the idea of hearing different voices – literally as
well as metaphorically: the voices of the labourer, the factory inspector and of
Capital. Marx writes not “Let us read the Factory Inspectors report” but “Let us hear
the Factory Inspectors”, “Let us listen to Factory Inspectors”. Capital is heard in the
House of Commons as is the “voice of the labourer”. The written report – invari-
ably Marx’s source of information – is thus reconstrued in terms of an oral modality
in order to make a greater impact on readers regarding the social injustice that
surrounds the worker’s plight, a journalistic recontextualisation of the original text
achieved not through sound bites but through a combination of language and such
visual means as quotation marks, different font size and different spatial dispositions
vis-à-vis the rest of the page. Like many scientists of his time, Marx was hampered,
but not defeated, by the resources he had at his disposal. He did not have colour
photographs of the conditions of men, women, children and animals in coal mines,
live broadcasts from Parliament or complex charts capable of representing
development over time that are a staple part of modern textbooks. In some cases, he
chose not to use some resources which he did have – the political cartoon, for
example. The Economist (Figure 2.2) uses all these resources and many more, some
of which, as we shall see, include dynamic properties that are the result not just of
technological innovation but also of society’s learning about how to represent
meaning in increasingly complex and often abstract combinations of the visual and
verbal.
Such learning (Halliday, 1978: 192) often translates into a greater capacity in
the modern page to condense meaning. This process of compression is apparent
in the plant movements example (Figure 2.1) and may be associated with the
greater abstraction of the modern scientific page. As suggested above, Darwin’s
use of abstraction in the visual is limited: the verbal text describes the original
naturalistic tracing at length while the movements in the diagrams are the result of
the movement of a real pen moved by real leaves. Only the representation of the
direction of the movement is abstract, in that arrowheads have been added on in
such a way as to reflect the direction of the movement as closely as possible. The
actual pen is not represented, while the tracings and the leaves are represented separately.
The modern text, on the contrary, is more abstract in its encoding of the
visual semiotic (Kress, Van Leeuwen, 1996: 170) – not real leaves, pens or pen
marks, but artists’ representations of the leaves and the devices that produce the trac-
ings. This abstraction – in the modern author’s words, a representative recording
(Figure 2.1, Cluster (c), running text) – allows the tracing process and the notion of
a cycle that repeats itself to be expressed in just half a page. This level of abstraction
is made possible by the integration of various semiotic resources; or, to see the mat-
ter from a slightly different perspective, through the rise of the multimodal page.

2.2.1. How can we study tables systematically?


We have seen that, though in principle quite separate, meaning compression and
the integration of visual and verbal resources are closely related aspects of the
scientific page. We need to understand this relationship a little better in order to be
able to sketch a preliminary answer to the question of how the scientific page
makes its meaning. Let us begin by looking at one of the typical meaning-making
resources of the scientific page: the table. As demonstrated by the examples taken
from Capital and from a recent issue of The Economist (Figure 2.2), tables show a
high degree of thematic-semantic condensation in which the principle of ellipsis is
carried to the extreme; this is a process which, as illustrated in 2.4.1 (pp. 71-74)
below and in the section in Lemke entitled ‘thematic condensation in technical
discourse’ (Lemke, 1990a: 441-6), needs to be discussed in relation to nominalisation.
The table typically integrates visual, linguistic and even mathematical semiotic
resources into its structure. The condensation is shown in the fact that the grammar
of the table usually reduces to the bare nominal groups or Head nouns which label
the various columns and rows. Thus, each row or column is assigned a thematic
meaning which can then be expanded into a full clause either by accessing the
thematics of the surrounding verbal text or some relevant intertextual thematic
system (Lemke, 1985; Thibault, 1989). Few modern texts are, however, likely to use
the convention of vertical inclination found in Marx (Figure 2.2), preferring to adopt
a horizontal format of the type that is easily and automatically formatted with a word
processor. The way that the table is used in Marx’s Capital is revealing, when we
compare it to tables in modern scientific pages. Figure 2.3 (Vol II, p. 310) is a
multimodal page in which the table and the verbal text that it specifically relates to are
set off from the prior text by a printed line. The premises or assumptions on which
the table is based are realised linguistically. In the typical style of economic discourse,
which we still find today, Marx writes: We assume once more…Let the working period
last 6 weeks… Let the time of circulation be ... and so on. But while the predomi-
nantly typological and visual resources of written language are doing one job, namely
setting the premises, the table is doing another job, more efficiently than language
could ever do, by concisely setting out continuously varying topological values by
virtue of the relationships it construes among the various rows and columns. The
column is semantically homogeneous. It is glossed by a Head noun which in the
linguistic semiotic provides a thematically condensed meaning, rather than a full
clause, for the items in that column. The meaning of this item can be recovered by
the reader either from the surrounding linguistic text or from relevant intertexts.
Importantly, the meanings of these condensations may be implicit or explicit to
varying degrees. In this case, we have Homogenous weeks, money, working period,
turnover. The rows, on the other hand, are semantically heterogeneous, thereby
enabling the reader to construe various types of relationship between the various
columns. By virtue of this, they can be interpreted comparatively. All this is a com-
bination of language plus, crucially, visual and spatial resources. If language alone
had to explain all the meanings of the table, it would go on for many pages, as was
indeed the case with Darwin’s intricate yet fascinating description of pen movements
on a glass surface. While in Darwin’s case the result is a successful piece of writing,
most modern scientific writers would shy away from this solution, given the typically
linear/typological character of language. The typological is the discrete, the digital, the
discontinuous. Language is especially good at construing phenomena typologically,
whereas the visual image is especially effective in construing the topological, the
continuous, the flowing and the merging (Thibault, 1997a; Lemke, 1998).
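
The contrast drawn above between semantically homogeneous columns and semantically heterogeneous rows can also be sketched informally as a small data structure. The following Python fragment is only an illustration under stated assumptions, not a model proposed in this chapter: the column headings are those named above, while the cell values and the names HEADINGS, ROWS, column and expand are invented for the purposes of the sketch.

```python
# Column headings (Head nouns) named in the passage above.
HEADINGS = ["weeks", "money", "working period", "turnover"]

# Cell values are invented placeholders, not figures taken from Capital.
ROWS = [
    [6, 600, "first", 1],
    [12, 1200, "second", 2],
]


def column(heading):
    """Collect the values under one Head noun: a semantically homogeneous set."""
    index = HEADINGS.index(heading)
    return [row[index] for row in ROWS]


def expand(row):
    """Pair each value in a row with its heading, roughly as a reader expands
    the condensed headings into something closer to a full clause."""
    return ", ".join(
        f"{heading} = {value}" for heading, value in zip(HEADINGS, row)
    )
```

Calling column("weeks") gathers values of a single kind, whereas expand(ROWS[0]) brings values of different kinds into relation with one another, which is the sense in which the rows can be read comparatively. What the sketch cannot show is precisely the point made above: on the page, these relationships are carried by spatial arrangement and not by language alone.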
The table is heavily rooted in language and is not capable of representing
many phenomena or scientific processes. One of the fascinating aspects of the text
reproduced in Figure 2.3 is the fact that Marx actually admits that he is struggling
to represent temporality:

The two capitals I and II remain entirely separate. But in order to represent them
thus as separate, we had to tear apart their actual interrelations and intersections, and
thus also to change the amount of turnover. For according to the above dia-
gram, the amounts turned over would be…[…]… But this is not correct, for
we shall see that the actual periods of production and circulation do not absolutely
coincide with the above diagrams (Capital p. 311, Vol II, our italics)

Since the time of Darwin and Marx, our ability to construe temporality has
evolved. We have a greater awareness of the difference between the diagram and the
table and the different ways in which they can make meanings. As may be seen in
Figure 2.3, table and diagram are synonymous in Marx’s text, whereas in modern
times the table is partly embedded in language – it integrates language and visual
semiotic resources – as opposed to a diagram which tends to be much more
abstracted from the thematics of language. This enables the diagram to do things,
i.e. make meanings, in significantly different ways from the table, as shown by an
instance from a page from The Economist (Figure 2.4) which, apart from the running
text (omitted here), contains four charts, each of which proposes graphic,
tabular, diagrammatic and linguistic information in differing proportions. The left-hand
chart in Figure 2.4 is about Slumping, that is, the danger of a worldwide economic
recession. It is divided into two parts: World GDP growth, which shows the
ups and downs of GDP over several decades, and GDP shares, which shows the dis-
tribution of GDP in the world’s economies in 1997. The top right-hand chart, A Raw
deal, is about the decline over time of commodity prices, while the bottom right-hand
chart, Drying up, is about the decline in real as opposed to expected capital flows to
emerging countries. The key factor to note is that in the top two charts in Figure 2.4
we have a vector, such that the foregrounded emphasis is on the horizontal as con-
trasted with the vertical (which is what we found in Marx), and which we also find in
the bottom two charts. A vector is, in its simplest form, an arrow (Kress, Van Leeuwen,
1996). In the grammar of visual meaning, a vector signifies activity, action, movement,
direction and dynamism in time. There is a single vector in the top left-hand chart while
two are present in the right-hand one (with a joint starting point). In contrast to the
plant movements texts shown in Figure 2.1, none of these vectors need have the form
of arrows: no such explicit representation is necessary because, owing to the inclusion
of an explicit time scale, the vector successfully carries out the function of allowing
readers to make up their own minds about potential future developments.
This is a significant shift with respect to the visual semiotic resources avail-
able to Marx and other theorists of that era, who were struggling to represent time

Figure 2.3: An example of the use of the table in Marx

in terms of static vertical structures of classification (cf. Figure 2.3) or who, alter-
natively, made use of rather primitive vectors. If we take the opening page of
Chapter X of Capital (Figure 2.5), we notice that, among its other functions, the
vector is used to measure the working day. The ‘B-C’ lines in Figure 2.5 make
their meaning through their differing length, in this case, as explained in the text,
to represent the distinction between ‘necessary working time’ and ‘surplus working
time’, the first a constant, the latter a variable. In itself, this is an interesting and
important development because it shows the beginnings of the use of the line to
represent change in time. However, even the most cursory glance at the properties
of vectors in a modern scientific text will clarify that the line has now acquired
many new functions. In computerised science texts, vectors are manifested in
many ways.
Whether as call-outs in word-processing packages or as indicators of the
presence of drop-down menus in websites, they play an important role in the
creation of pagelets appended in various ways to other pages. They are thus an
important resource in the evolution of the page from a static, printed medium to
the dynamic, multiple-dimension phenomenon of the distance communication age as
represented by the web page (see 3.7, pp. 130-136 and 3.8, pp. 136-146) and the
video text (4.11, pp. 223-248). A comparative study of the modern scientific page

© The Economist Newspaper Limited, London (03/09/1998)

Figure 2.4: A typical use of charts in The Economist

and that of a previous era helps us understand the role that various resources, such
as vectors, play in a page while at the same time providing an important arena in
which to explore a further question that we posed at the outset, namely: how does
the page communicate?

2.2.2. How does the page communicate?


An integrated grammar of the visual and the linguistic underlies the construction of
the modern scientific page, a matter that we touched on in Chapter 1. Importantly,
visual images do not simply convey a content: they are not just more or less accurate
representations of the real world, perhaps with added aesthetic or connotative value,
even though aesthetic appeal as expressed in modern journals of political economy
like The Economist is an important addition in the evolution of this genre. Rather,
they communicate through the way they select or choose visual forms and the ways
these are structured and organised to form a visual text. This does not mean that
the makers and the interpreters of visual texts need to follow rigidly prescriptive
rules of what is ‘right’ or ‘wrong’ as far as visual composition is concerned. The
word ‘grammar’ is not being used here in this traditional, prescriptive sense. Instead,
visual images, just like linguistic texts, make use of a system of possible forms and
their possible combinations to create visual texts which may be interpreted according
to particular cultural conventions and practices.

Figure 2.5: An example of the use of vectors

A grammar refers to a socially-shared system of resources – visual, linguistic
and so on – which we use in order to make meanings in ways that others can recognise
and understand. So far we have discussed this with regard to the integration
of visual and verbal resources through the use of resources such as diagrams, tables
and vectors. A further step relates to the question of intertextuality (see Inset 8, p. 55).
The dependency of texts on each other and the way a text incorporates other texts
has been explored, among others, by Lemke (1985), Thibault (1994b) and Bakhtin
(1986); the latter, as we have mentioned previously (Chapter 1), with regard to the
question of heteroglossia and the distinction between primary and secondary
genres (see Inset 6, p. 43). These writers essentially adopt a position whereby the
speaker is situating himself or herself in an entire field of social relations and the
conflicts which are necessarily generated within this field (Thibault, 1989: 202).
Hence any personal opinion or view or interpretation is inevitably connected to the
social and collective nature of meaning making.
In the case of Marx we can see clearly how Capital is a composite of texts
that function to give different perspectives and points of view on the same themat-
ics: the labourer’s, the parliamentarian’s, the mine owner’s, the factory inspector’s.
Indeed, the explicit attempt to represent a plurality of voices that we find in Marx’s
work raises the question of Marx’s intuitive awareness of the dynamic processes of
social heteroglossia and intertextuality. Thibault (1989: 183, 203) has pointed out that
all texts, whether spoken or written, have the meanings they do in relation to other
texts within a given speech community as well as historically prior texts with the re-
sult that texts do not stand in given, static or neutral relationships with other texts.
Instead, the relationships between texts are constructed by text users. Speech is
constituted out of the heteroglossic interplay of both consensus-oriented and con-
flicting discourse voices in our social meaning-making practices. Capital exemplifies
all this very well. In one sense Capital is a rearticulation of discourse voices
arranged in such a way that the reader is encouraged to perceive new meanings and
new relationships between them. Thus Marx goes beyond merely quoting written
reports and instead often invites the reader to look at them in a different light from
that which was originally intended. As this book attempts to suggest (see, for example,
the analysis of the web page in the second part of Chapter 3 and the analysis of
the Westpac text in Chapter 4), recontextualisations (Inset 17, p. 213) and realignments
typify modern discourse, whether in informal oral discourse or the more formal scien-
tific page. As Lemke writes:

The voice of a text is heard only against the background of the voices of other
texts, within some relatively stable social system of heteroglossia. A text voice
orchestrates the available social voices of heteroglossia, speaking some in its
own name, and putting others in the mouths of those it defines as opponents,
allies, or bystanders. It construes, directly or indirectly, value-orienting relations
among these voices, evaluating some favorably and others unfavorably, and
modifying, reproducing, or presupposing the prevailing relations of the voic-
es in a community.
(Lemke, 1989: 40)

An important function of intertextuality is that of creating a relationship
between texts that would not otherwise exist; and hence creating meanings that
would otherwise not exist. The numerous examples of political satire in The
Economist illustrate this principle. Further examples of intertextuality which
function to create a strong evaluative, interpersonal appeal include the use of
photographs in the scientific page. Even photographs which are proposed as objec-
tive and scientific are, on close inspection, unlikely to be entirely neutral (see 2.5, pp.
78-80, 2.6, pp. 80-90, and Figures 1.8 and 1.9).

2.3. Science textbooks and multimodal meaning making

We may briefly summarise what we have so far stated or implied. Scientific texts
have always combined and integrated language and visual images in the making of
the specialist meanings of scientific discourse (Lemke, 1998). In science texts,
words and images are typically not, and possibly never have been, strongly insulated
from each other.
Moreover, the conventions of scientific discourse do not assume that language
has necessary priority over the visual. Importantly, this distinguishes
scientific discourse from the privileged forms of verbal literacy which are derived
from the literary traditions that have informed almost all thinking about language
in the Western European tradition since the early linguistic theorising of Plato and
Aristotle. Instead, science texts exhibit a close interaction between verbal and visual
semiotic modalities. The two semiotic modalities work together – in synergy – to
create the overall meaning of the textbook page. The two modalities are seen as
closely linked according to shared principles of composition which are based on
the visual design or layout of the page as a whole.
Thus science textbooks are visual semiotic artifacts as much as they are
linguistic ones. This does not mean that writing now ceases to play an important
role. Rather than adding pictures to words, words are integrated with visual images.
Moreover, writing is itself becoming increasingly integrated with principles of
visual design as the graphological resources of the written language increasingly
exploit the possibilities afforded by other visual semiotic resources.
The codependence of visual and linguistic semiotic modalities in science
textbooks is based on multimodal conventions of meaning making which have
tended to elude the predominantly language-based forms of literacy which have,
until recently, shaped most of our thinking about educational practice and intellectual
activity. An important component in the study of such conventions is the
analysis of the semiotic principles whereby verbal and visual resources are combined
to produce the multimodal meanings that are characteristic of science (see
also Kress, 1998).

2. 4. Visual, verbal and actional semiotic resources in a table

2. 4. 1. Visual and verbal resources


An analysis of some of these principles is presented below with respect to the
multimodal construal of the thematics of blood and its transportation in the body
in an Australian and two Italian school science textbooks. The first example is
taken from Australian Biology (1957), and is addressed to junior high school pupils.
Our first analysis is of a table (see 2.2.1, pp. 64-68 and Figure 2.6). Of all the visual
genres, tables, as Lemke (1988) points out, most directly derive from written text.
As mentioned above, they are typically extremely elliptical, without full
grammatical constructions. Hence, they use visual-verbal resources and other
specialised symbolic notations to enable the required thematic meaning relations to
be recovered. However, even in the absence of full grammatical resources, there is
always an implied grammar which is recoverable either elsewhere in the same text
(i.e. intratextually), or in other texts – both linguistic and visual – that the particular
text presumes or invokes as relevant to the interpretations of its thematics (i.e.
intertextually). In the present example, this is evidenced by the ways in which there
is a redistribution of meanings between the lexicogrammatical and the visual-
graphological resources of the table, such that some of the functions of the former
are now taken over by the latter. There is, in other words, an intersemiotic
trade-off between the two sets of resources to the extent that table and main
verbal text may be construed as paratactically related to each other in a semantic
relation of equivalence. In other words, tables often involve a high degree of
thematic condensation: readers of the text are required to unpack the relevant
semantics on the basis of a high degree of experiential under-specification. Readers
achieve this on the basis of their familiarity both with the relevant semantic
relations as well as with the social practices for interpreting them. This is the underlying
logic of expert discourse (Thibault, 1991a). However, the texts considered here
are for junior apprentices in the discipline rather than experts. They therefore entail
considerable negotiation and mediation between the familiar, everyday ways of
making meaning that the students recognise and use in their daily lives and the
specialised semantic patterns, genres, and perceptual modalities of science (see 2.8,
pp. 93-102 and Inset 19: Negotiation, pp. 245-247).

Figure 2.6: Table and related verbal text: pages 60 and 61 in Australian Biology

Figure 2.7: Blood under the microscope and related verbal text (pages 62-63)

The table in Figure 2.6 makes relatively sparse use of thematic-semantic
condensation. Furthermore, its thematics are entirely linguistic: the
numerals which identify the several rows have a primarily textual-organisational
function of indicating the sequential ordering of items; they do not have any
experiential function or any mathematical significance. This table follows the typical
textual-organisational principle of vertical columns and horizontal rows, thereby
arranging both semantically homogeneous and heterogeneous items (Lemke,
1998). Reading down the column headed by the nominal group Red Corpuscles, for
example, each item is thematically subordinate to, and hence homogeneous with
respect to, the superordinate heading. The table can be read to obtain information
about an item (e.g. red corpuscles), or to contrast them, item by item, with the
information that is arranged in the second column under the heading White
Corpuscles. In this table, the items in each row in the first column, Red Corpuscles,
are elliptical with respect to their corresponding full clauses. The latter have been
reconstructed in Table 2.1. The linguistic thematic in the table in Figure 2.6 is only
minimally condensed. Overall, the foregrounded meaning of the table amounts to:

(1) contrasting/comparing red and white corpuscles on the basis of
clause level schema, as shown above;
(2) arranging, in the two columns, maximally homogeneous thematic
meanings about each of the two types of blood corpuscles in
relation to the superordinate nominal groups Red Corpuscles and
White Corpuscles, respectively.
The table is not even explicitly identified as such by means of a caption. This is an
indication of the relatively low level of ellipsis. This table is assumed to be so
highly integrated with the thematics of the surrounding linguistic text that it is
barely separable from it. There is no explicit link to the table in the surrounding
verbal text, i.e. no intratextual phoric link of the type ‘See Table 1’. We may note
in passing that we subscribe to the view that existential there, as illustrated in the
second row in Table 2.1, is a nominal item which has some ‘weak pointing force’
such that it points to a quantifiable occurrence of a general class of ‘thing’ as
realised by the Existent in existential constructions (Davidse, 1992: 126). In this
line of reasoning, the experiential function of existential there is glossed as Setting.
It is thereby assumed that existential there is not an adverbial deictic of place but
instead expresses the experiential participant function called Setting.

2. 4. 2. Thematic development of the page: hierarchies of textual periodicity


As a general principle, the page as a whole can be regarded, for our purposes, as the
highest ranking textual-compositional unit which serves to integrate lower level units
on the basis of their spatial arrangement and integration on the page. The principle
also extends to the double-page display (Figures 2.6 and 2.7) achieved in the present
case through the integration of lexicogrammatical, typological-graphological and
visual semiotic resources. On this basis, it is possible to postulate a hierarchy of
textual periodicity, using the notions of macro-Theme, hyper-Theme and Theme,
along with macro-New, hyper-New and New, as developed by Martin (1992: Chap.
6), who adapts Daneš’s term hyper-Theme (Daneš, 1974, 1989). A hyper-Theme is an
introductory clause or group of clauses which serve to predict a particular
thematic development in subsequent textual units such as the paragraph. A hyper-
Theme may also predict, as in our example, the pattern of thematic development in
a table or some part of this (see below). A macro-Theme is a clause or group of
clauses, or even a paragraph, which predicts the overall thematic development of a
whole text, chapter, and so on. The resulting hierarchy of periodicity means that
macro-Themes predict hyper-Themes, which predict clause-level Themes. Themes
at all these levels look forward and hence further anticipate thematic development in
the text. New elements, hereafter called New, are retrospective and work to accumulate
and elaborate the thematics of the text. As before, this is so at all levels in
the proposed hierarchy of periodicity. In this subsection, the terms Theme, New
and thematic development specifically pertain to the textual metafunction, as defined in
systemic-functional linguistics (Halliday, 1994 [1985]: Chap. 3; Halliday, Matthiessen,
2004). Adapting Martin’s notion of a hierarchy of textual periodicity to the multimodal
page, as in our example, the following proportionalities may be proposed:

(1) macro-Theme/macro-New: page (as textual unit for the multimodal
integration of images, verbiage, etc.);
(2) hyper-Theme/hyper-New: paragraph, table, column in table;
(3) Theme/New: clause, single image (figure, graph, photograph,
etc.).
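
The proportionalities just listed can be read as a three-level containment hierarchy in which each level predicts the one below it. The following Python fragment is a minimal sketch of that reading, offered only as an illustration: the names Unit, UNITS, PREDICTS, LABELS, predict and outline are our own assumptions and belong to no established tool, and the wordings are taken from the page in Figure 2.6.

```python
from dataclasses import dataclass, field

# Page units that can realise each level, following the proportionalities above.
UNITS = {
    "macro": {"page"},
    "hyper": {"paragraph", "table", "column in table"},
    "clause": {"clause", "single image"},
}
# Each level predicts the level immediately below it.
PREDICTS = {"macro": "hyper", "hyper": "clause"}
LABELS = {
    "macro": "macro-Theme/macro-New",
    "hyper": "hyper-Theme/hyper-New",
    "clause": "Theme/New",
}


@dataclass
class Unit:
    level: str      # "macro", "hyper" or "clause" (hypothetical level names)
    kind: str       # the page unit, e.g. "table" or "clause"
    wording: str    # the wording that realises the unit on the page
    children: list = field(default_factory=list)

    def __post_init__(self):
        if self.kind not in UNITS[self.level]:
            raise ValueError(f"{self.kind!r} is not a {self.level}-level unit")

    def predict(self, child):
        """Attach a unit whose thematic development this unit predicts."""
        if PREDICTS.get(self.level) != child.level:
            raise ValueError(f"{self.level} does not predict {child.level}")
        self.children.append(child)
        return child


def outline(unit, depth=0):
    """Return the hierarchy as indented lines, one per unit."""
    lines = [f"{'  ' * depth}{LABELS[unit.level]} ({unit.kind}): {unit.wording}"]
    for child in unit.children:
        lines.extend(outline(child, depth + 1))
    return lines


page = Unit("macro", "page", "Blood and its functions")
column = page.predict(Unit("hyper", "column in table", "Red Corpuscles"))
column.predict(Unit("clause", "clause", "They are biconcave discs"))
print("\n".join(outline(page)))
```

The sketch deliberately rejects a clause attached directly to the page, since in the hierarchy proposed here macro-Themes predict hyper-Themes, which in turn predict clause-level Themes. It does not model the retrospective, accumulating work of New.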

1  CARRIER ^ PROCESS ^ ATTRIBUTE (type-quality)     Red corpuscles have an orange-red colour
2  SETTING ^ PROCESS ^ EXISTENT ^ CIRCUMSTANCE      There are about 5,000,000 in 1 cubic millimetre (see ruler) of blood.
3  CARRIER ^ PROCESS ^ ATTRIBUTE (type-quality)     They are about 1/3200 inches wide
4  CARRIER ^ PROCESS ^ ATTRIBUTE (type-category)    They are biconcave discs
5  CARRIER ^ PROCESS ^ ATTRIBUTE: POSSESSION        They have no nucleus (unusual cells).
6  MEDIUM ^ PROCESS ^ CIRCUMSTANCE (ergative)       They are made in the liver and spleen ...
7  MEDIUM ^ PROCESS ^ CIRCUMSTANCE                  They break up after about 120 days and the remains go to the spleen.

Table 2.1: Reconstruction of ellipted clauses relating to
red corpuscles in the first column of the table in Figure 2.6

Verbiage

macro-Theme                              macro-New
Title: BLOOD                             (AND) ITS FUNCTIONS
First paragraph: Blood is made up of:    (1) A fluid, called plasma,
                                         (2) Cells, called platelets,
                                         (3) Cells, called corpuscles, red and white

hyper-Theme                              hyper-New
(1) The pale yellow plasma               makes up a little over ...
(2) The platelets                        are small, colourless cells without nuclei ...

Table

hyper-Theme    Red Corpuscles                        White Corpuscles
hyper-New      An orange-red colour.                 Colourless.
               About 5,000,000 in 1 cubic mm         About 7,000 to 10,000 per cubic mm.
               (see ruler) of blood.
               etc.
hyper-Theme    Work                                  Work
hyper-New      Red corpuscles are elastic ...        White corpuscles push and pull ...
               to form oxy-haemoglobin               help the blood to clot.

Table 2.2: The multimodal thematic development of the page about
red and white blood corpuscles; integrating table and verbiage

In our example (right-hand page in Figure 2.6), the macro-Theme is comprised
of the following structure: MAIN TITLE ^ FIRST PARAGRAPH. Each of these
two elements has a specific contribution to make to the overall functioning of this
macro-Theme. Thus, the main title of the chapter, viz. Blood and its functions,
specifies the most salient thematic items in a highly condensed form. It does so
here through the interaction of Theme and New in the following way. The noun
blood in the title establishes the point of departure for the thematic development
of the page as a whole by referring back to meanings which were made on the previous
page, where the topic of blood had already been introduced (left-hand page
in Figure 2.6). It both recapitulates in condensed form these meanings at the same
time that it anchors the further development of the meanings of the current page
in relation to meanings that have already been developed. The second part of the
nominal group complex in the title, viz. its functions, is New because it anticipates
meanings which have yet to be made in the further development of the meanings
of this (and the following) page. This dual function of the main title is also highlighted
both visually and spatially by the choice of upper case letters and the spatial
centring of this textual unit near the top of the page. As an element of the macro-Theme,
the main title of this page functions to predict the hyper-Themes which are
derived from it in both the main text and the table. A hyper-Theme may also predict
the pattern of thematic development in the table as a whole, though this is not the
case here. Themes function to predict the pattern of thematic development at the
level of the clause or the intersection of row and column in a table (see
Martin, 1992: Chap. 6 for these terms).
The first paragraph is a further component of the macro-Theme: it too predicts
the hyper-Themes developed in subsequent paragraphs. This paragraph contains
three macro-News. These are highlighted by the graphological resources of
indentation, bold type, and numbered sequencing. The first two of these macro-News
are, in turn, further developed as the hyper-Themes of the two paragraphs
which follow. However, the third of these macro-News, i.e. Cells, called corpuscles,
red and white, is not taken up by one or more paragraphs of main text as hyper-Themes
which are, in turn, further developed as hyper-News. Instead, this macro-New
is further developed in the table. In this case, the headings of the two
columns – i.e. Red Corpuscles and White Corpuscles – in the table function as hyper-Themes
which predict the parallel and simultaneous development of two sets of
hyper-Theme and/or hyper-New relations in the form of the numbered sequences
of ellipted clauses that constitute the rows in each of the two columns.
Each column of the table concludes with a further wave of periodicity in the
form of extended verbiage concerning the theme of work. This subheading may be
construed as a cohyponym of the word functions in the main title at the head of the
page. In this way, the hyper-Themes Red Corpuscles and White Corpuscles, which
establish the pattern of thematic development in the two columns of the table, are
further developed experientially in terms of the specific functions of the two types
of blood corpuscles. They thus constitute a further parallel set of hyper-News which
unpack and elaborate the highly condensed thematics of the title of the page, Blood
and its Functions. If, on the other hand, these meanings had been developed exclusively
in the verbal text, the simultaneous construal of thematically homogeneous and
thematically heterogeneous items, which the visual-graphological resources of the
table afford, would not have been possible. The method of thematic development of
the page discussed above is illustrated in Table 2.2.

2. 4. 3. Actional semiotic resources


There is also one explicit indexical or exophoric link to the technical operations
associated with the practices of using the ruler to measure phenomena (see Row 2 in
the Red Corpuscles Column in Figure 2.6). In the present case, this would appear to
be an explicit tie between the meanings made in the textbook and specific classroom
practices and activities involving the use of this technical instrument. The intertextual
tie is to an actional semiotic involving the sensori-motor activity of manipulating a
specific tool – the ruler – in relation to specific measurement practices. The ruler itself
implicates a structuring system of mathematical signs and related meaning-making
activities. In this instance, the ruler, as a linear measuring tool, is invoked as a
resource for calculating a given volume of blood – one cubic millimetre – in
relation to the number of red blood cells typically contained in this volume. It
therefore plays an important mediating role in the structuring of the field of signs
whereby children may activate an appropriate volume schema. In this way, the ruler
both mediates and structures the student’s interaction with the given material phenomenon
as well as with the conceptual representation which he or she consequently
acquires. As we shall see in the next section, there is yet another link to a
material action that is construed in relation to the full-page photographic display
reproduced above in Figure 2.7.

2. 5. Blood under the microscope: multimodality in a photographic display

The photographic display reproduced in Figure 2.7 cannot be adequately analysed
in terms of any one semiotic modality. In this full-page visual display, three semiotic
modalities work together to produce the overall meaning of the text. These are:
(1) verbal: the linguistic thematics which is built up and developed
mainly by selections at clause, group and word ranks;
(2) visual-pictorial: visual pattern recognition and associated conventions
for interpretation according to the visual grammar which is
deployed;
(3) technical-operational: the use of a microscope (top, left-hand corner
of page 63 in Figure 2.7) and, hence, the sensori-motor activity
involved in manipulating this type of instrument.

Interestingly, the verbal dimension makes extensive use of the visual-graphological
resource of bold type to assign visual prominence to thematic items – i.e. technical
terms – that the writer of the text wishes to foreground. Only nominal items
are selected for highlighting, possibly in conformity with a folk-ideological view of
language as naming and labelling. In actual fact these items also reflect the scientific
practice of naming and classifying phenomena according to criteria which do not
necessarily correspond to commonsense classifications (Martin, 1993: 86-90). In
the process, the important semantic relations between processes, participants and circumstances
in the clause are correspondingly downplayed by the linguistic folk-theory
implicitly at work here. Such intraclausal relations are an important means through
which thematic systems are developed. The use of the microscope further implies an
intertextual/indexical link to the practices of the laboratory and highlights the cross-coupling
of semiotic and material relations and processes in the making of these
scientific meanings. These practices and the expanded perceptual capacities afforded
by the microscope operate as prosthetic extensions of our perceptual systems
which expand the human Umwelt (Harré, 1990: 301). The Umwelt is that part of the
physical-material world which humans have adaptively modified as their living space.
Thus, we see that technical instruments, no less than linguistic and pictorial signs,
mediate human activity and shape the processes of abstract thought and reasoning
(see Chapter 3 for further discussion of this important concept).
The three photographs that are displayed on page 63 of Australian Biology
(Figure 2.7) do not, of course, operate according to the modality of realism. First,
they index a referent situation which is not available to ordinary human perception
on the scale of reality on which humans normally act and perceive, i.e. the everyday
material and social world that we perceive through our senses, live in and in
which we interact with others. Oddly enough, photographs are often considered as
true pictures of an objective reality. However, graphs and diagrams also function
to interpret the world as it truly is. Yet, they lack the characteristics that confer on
a photograph its sense of reality. Graphs and diagrams are abstract and general;
photographs are detailed and specific. Nevertheless, both make claims about the
way things really are. They do so in different ways, according to different
conventions as to how we relate visual images to the world. Photographs are interpreted
according to naturalistic conventions. That is, the greater the congruence
between what you perceive with the eye and what the photograph represents, the
higher the reality modality of the image. Clearly, photographs, like other visual texts
in other domains, apply this principle to varying degrees.
The photographs shown here are oriented to scientific truth and objectivity.
The scientific definition of reality and truth operates according to a different reality
modality from the naturalistic one. In this case, it aims to reveal deeper, more general
principles that go beyond the detail and characteristics of the specific instance. In the
examples analysed here, there is no background detail; colour and depth are also
absent. Features like background detail, colour and depth are signifiers of reality in the
naturalistic domain of the photograph. However, in the scientific domain of the graph
and the diagram, such details are considered unimportant. Again, these principles may
be applied to various degrees and this will depend on the goals of a particular text.
Moreover, the three images are concerned with conceptual-relational, rather than
actional, meanings (Kress, Van Leeuwen, 1996: 79-118). Thus, they function to
analyse the given phenomenon into its component parts, or to classify related
phenomena by grouping them together for the purposes of comparison. This would
be one motivation for the juxtaposition of the three photographs on this page.
How are linguistic and visual semiotic resources integrated in multimodal
texts? As discussed in detail in Chapter 1, Halliday (e.g. 1978, 1994 [1985]) holds that
language form, i.e. lexicogrammar, is simultaneously organised in terms of a number
of overlapping semantic-pragmatic functions (see Inset 4, pp. 22-23). Lexicogrammar
simultaneously:
• categorially construes experience, including naming and referring;
• enacts interpersonal relationships;
• construes relations of logical (spatial, temporal, causal, and so on)
dependency between items;
• provides the means whereby language coheres textually as
discourse which is operational in context.
As we saw in Chapter 1, similar proposals can be made about other, non-linguistic
semiotic systems (see Baldry, 2000a; Kress, Van Leeuwen, 1990, 1996; Lemke, 1998;
Martinec, 1998; Nalon, 1997, 2000; O’Halloran, 2004, 2005; O’Toole, 1994;
Thibault, 1994a, 1998b, 2000a; Ventola et al., 2004). The fact that different semiotic
modalities such as language and depiction share a common metafunctional basis
thus becomes an important principle enabling their integration in multimodal
semiosis (see Inset 4, pp. 22-23). It does not necessarily follow, however, that all the
metafunctions are equally distributed across all the semiotic modalities which are at
play in a given instance. Moreover, different weightings and distributions of the metafunctions
may come to the fore in different phases of the same text (see Inset 7, p. 47).
The full meaning of the particular page under consideration can only be
understood by integrating experiential, logical, interpersonal and textual contributions
from: the visual semiotic; the linguistic semiotic of the captions; the relations
of this page to the main verbal text; the actional semiotic of laboratory practices
and other technical procedures such as using the microscope and measuring with
the ruler. However, it is important in this respect to notice once more how the
resource integration principle (Inset 3, pp. 18-19) is at work: the visual text is not
simply an illustration of a more important verbal text. Rather, the visual text adds
dimensions of meaning that are not, and cannot be, made in the linguistic text; the
visual text is complementary to, rather than subordinate to, the linguistic text, as
2.6 explains.

2. 6. Integration of scientific photographs and verbal text

2. 6. 1. The textual metafunction


We shall now consider in more detail the metafunctional integration of verbal and
visual resources in the full-page display featuring photographs of blood cells on p.
63 of Australian Biology (Figure 2.7). The textual dimension of visual semiosis is
often closely related to the construction of a reading path (Kress, Van Leeuwen,
1996: 218-22). In general, as indicated in Chapter 1, reading paths on the page are
constructed through the interaction between choices from the following sets of
distinctions: centre and periphery, left and right, top and bottom. Some further
aspects of the choices made in relation to these need now to be discussed starting,
firstly, with the grouping of the three images together on the same page in the text
in question. This particular supercluster suggests that there is some principle of
unity at work even though there is no superordinate linguistic caption for the three
images and hence no explicit link from the linguistic text to the overall meanings
made on this page. This is further highlighted by the absence of borders or other
framing devices which separate the images off from each other on the page. The
absence of such boundaries thus enhances the potential for thematic links to be
construed between them. Secondly, the positioning of the top two images on
opposite sides of the page, equidistant from a central vertical axis, suggests in itself
symmetry and complementarity. This is subtly offset by the contrast between the circular
and the rectangular forms of these two images, indicating some degree of
contrast rather than opposition. Thus, the frog and human cases are related as instantiations
of a still wider class at the same time that they are flagged as being different.
This is further suggested by the verbal texts that are copatterned with the two
images. While the untrained reader will readily perceive a similarity of visual patterning
in the two images, he or she will not necessarily know which meaning to
assign to the perceived pattern. At this point, the untrained or apprentice reader will
need to make connections on the basis of the captions that accompany each image
and, in the case of the second image (top right corner of Figure 2.7), the labels that
index selected features in that image. In turn, these captions and labels mediate
between the images and the thematics of the main text.
The lower image of the frog’s capillaries is striking on account of its greater
size in comparison with the upper two images. In some sense, it may function to
anchor the top two images in relation to the page as a whole. Importantly, the three
images are pivoted around an ‘invisible’ centre from which they are roughly equidistant.
This functions to achieve an overall sense of balance and symmetry, as opposed
to tension or conflict. Again, the message is that the three images are all visual
cohyponyms in relation to some superordinate category which they belong to.
The combined effect of the top-bottom and left-right organisation of this
page can be analysed with reference to the distinctions that Kress and Van Leeuwen
(1996: 208) make between Ideal-Real and Given-New, as shown in Table 2.3, which
draws attention to the coarticulation of the page into four subspaces that are dually
organised around the principles of top-bottom and left-right organisation in
relation to a central horizontal axis and a central vertical axis. With reference to the
distinctions made in Table 2.3, the following analysis is proposed. In the full-page
photographic display from Australian Biology, the frog and human blood examples
are positioned in the top part of the page, on the left and right respectively. Both
images show instances of blood belonging to animals of the superordinate class
[BACKBONED ANIMALS], as distinct from lower animals such as molluscs, and so on
(see p. 60 in Figure 2.6). However, amphibia are positioned as being lower on a ‘perfection’
scale, as we shall now see, and this would motivate their positioning in the
overall visual space with respect to the more highly valued human case. The image
showing the human white corpuscles (top right) is therefore accorded maximum
visual salience, as well as being rated highest on a scale of visual importance. This
is consistent with the thematics of the verbal text, e.g. This [double circulation] reaches
perfection in mammals, so again the human will be taken as a typical example, important
to us (p. 60 in Figure 2.6). The covariate thematic tie which can be construed between
these visual and verbal elements also works to enact a joint visual-verbal axiological
orientation which positions the human case as high on both the usuality and importance
modal and evaluative scales. The centring of the image of the frog’s capillaries
in the lower part of the page in relation to its greater overall size suggests that it
is being positioned about midway along a cline between the low and median points
with respect to the Given-New and salience parameters (see Table 2.3).
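
The coarticulation of the page into four subspaces, as set out in Table 2.3, can be sketched as a simple lookup. The Python fragment below is only an illustration under stated assumptions: the values are transcribed from Table 2.3, whereas the names SUBREGIONS and subregion, and the convention of describing a position by x and y coordinates between 0 and 1 measured from the top left corner of the page, are our own.

```python
# Values transcribed from Table 2.3; the names and the coordinate
# convention (0.0-1.0, origin at the top left of the page) are assumptions.
SUBREGIONS = {
    ("top", "left"): {"information": "ideal/given", "salience": "median", "importance": "high"},
    ("top", "right"): {"information": "ideal/new", "salience": "high", "importance": "high"},
    ("bottom", "left"): {"information": "real/given", "salience": "low", "importance": "low"},
    ("bottom", "right"): {"information": "real/new", "salience": "median", "importance": "low"},
}


def subregion(x, y):
    """Return the subregion of the page containing the point (x, y)."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must lie between 0 and 1")
    vertical = "top" if y < 0.5 else "bottom"       # central horizontal axis
    horizontal = "left" if x < 0.5 else "right"     # central vertical axis
    key = (vertical, horizontal)
    return key, SUBREGIONS[key]


# An image centred in the upper right quarter of the page.
region, values = subregion(0.75, 0.2)
print(region, values)
```

A binary lookup of this kind cannot represent clines. An image centred on the vertical axis, such as that of the frog’s capillaries, falls between the left and right values, which is why the analysis above places it about midway between the low and median points rather than in a single cell.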

2. 6. 2. The ideational (experiential and logical) metafunctions


In the case of the first image (top left on p. 63 in Figure 2.7), the verbal thematics
of the caption are linked to the mention of the blood of backboned animals, which
are subdivided into two main groups, depending on whether the organism uses gills
(fish and tadpoles) or lungs (adult amphibia, reptiles, birds, mammals) in the circulation
of blood (p. 60 in Figure 2.6).

TOP LEFT                 TOP RIGHT
ideal/given              ideal/new
salience: median         salience: high
importance: high         importance: high

BOTTOM LEFT              BOTTOM RIGHT
real/given               real/new
salience: low            salience: median
importance: low          importance: low

Table 2.3: The co-articulation of the page into subregions
showing top-bottom and left-right organisation

The second image (top right) has both a caption Blood and circulation and
several labels. Whereas the thematics of this caption link back to the main text on
p. 60 (Figure 2.6), the labels are more closely tied to several different aspects of
Column 2 of the table on p. 61 (Figure 2.6). Thus, the full clause of the
[Actor ^ Process: Material ^ Goal] type, Human white cells take in bacteria, ties in with
the extended note at the foot of this column, while the nominal groups irregular
nucleus and surrounding protoplasm link to Row 5 of the same column. Moreover, all
three of these labels are connected by arrows to selected features in the image
which they indicate. More precisely, we would argue that what is being constructed
here is a joint verbal-visual thematic system whereby a number of resources and intertexts
are integrated in order to construe the superordinate thematic meaning
glossed here as [WHITE BLOOD CELL]. The use of upper case letters in square
brackets is a convention for specifying a higher-order, more abstract thematic
formation, rather than specific linguistic or visual instantiations of this (see Lemke,
1983; Thibault, 1986 for the earliest formulations and developments of these
notions). The meaning white blood cell is not simply built up and developed through
the experiential resources of language, i.e. through clause level selections, patterns
of lexicosemantic collocation and so on. No less importantly, there is also an
experiential dimension to this overall thematic formation in the visual semiotic.
The choice of the arrow which connects the various clause or nominal group labels
to selected features of the second (top right) image is, of course, a visual semiotic
resource which functions as a directional vector. As we shall see below, it functions
to construe an intersemiotic relationship between the verbal and visual modalities.
The third image (bottom part of p. 63) links back in with the thematic item
fine-walled capillaries, linking arteries and veins on p. 60 (Figure 2.6), as well as with
the caption Blood capillaries in a frog’s foot. Here we see a number of different kinds
of basic relations between the verbal and the visual semiotics. In particular, these
are: main verbal text to figure and label or caption to figure. Such relations are
quite typical of multimodal scientific texts. The meaning of the text cannot be
restricted to either of these components considered separately. Rather, the caption
or label is co-contextualised with the visual elements of the figure or some specific
aspect of this. The one cannot be reduced to the other, since the two different
semiotic systems make meanings in quite different ways. The combination of the
resources from the two systems results in a new set of meaning relations which is
not reducible to the sum of its parts. This is yet a further example of the resource
integration principle (Inset 3, pp. 18-19) that we illustrated in Chapter 1. The meaning
is the composite product of their combination, rather than the mere addition of
the one (e.g. the verbal) to the other (e.g. the visual) (Bateson, 1987 [1951]: 175;
Lemke, 1998).
For example, the three verbal labels pointing to various aspects of the visual
text Blood and Circulation cannot specify the simultaneity of the global relationships
among, say, the different light and dark areas in this figure, or of the varying shapes
of these same dark areas. Language is specialised, if not uniquely so, to construe
phenomena in terms of typological-categorial distinctions. Moreover, it is not well-
suited to the depiction of visual phenomena, as Bühler (1990 [1934]: 220-41) has
shown. The topological-continuous relations mentioned here are, on the other hand,
characteristic of the visual semiotic. The visual image Blood and Circulation is a
continuous topological field which, however, may be broken down into specific subre-
gions within this overall topological space (see Saint-Martin, 1985: 47-8).
The meaning of this multimodal biology text is a result of the composite
relations among its verbal and visual elements. However, this does not preclude the
likelihood that the meanings of the visual elements may be immediately apparent
to the trained biologist without the mediation of the linguistic dimension. Indeed,
this only enhances the fully-fledged semiotic status of the visual, as well as show-
ing that the intersemiotic links that may be construed between the verbal and the
visual can be achieved on the basis of varying reading paths according to factors
such as the specific interest or the level of expertise of the reader.
In the school science textbook, on the other hand, the verbal meanings of
the labels and the vectors (the arrows) linking these to selected aspects of the visual
text become more specified when cross-linked to a visual element. The first of
these labels, human white corpuscles take in bacteria, is a clause of the type
[Actor ^ Process: Material ^ Goal]. This is linked by an arrow to a specific relation-
ship between a relatively large dark area (white blood corpuscle) with respect to sev-
eral adjacent smaller ones (bacteria). The relationship between these two classes of
entities constitutes a subregion in the visual field. Such subregions are established by
relations such as proximity, separation and envelopment (Saint-Martin, 1985: 47). In
this particular case, the relationships of both proximity and envelopment give rise
to a subregion in which the larger entity (the white corpuscle) and the smaller ones
(the bacteria) are seen as interrelated.
Visually, the precise nature of this relationship is probably not readily inter-
pretable for the person who is unfamiliar with the meaning-making practices asso-
ciated with the perceptually-enhanced means of interpretation afforded by the
microscope. However, the verbal text makes it clear that this particular subregion in
the visual field realises a Process-Participant relation analogous to the one men-
tioned above. In this case, the verbal text gives specific experiential meaning to
visual relations which would not necessarily be so interpretable by the non-special-
ist. It does so by construing the visual pattern in terms of a particular kind of
Process-Participant configuration. In other words, a thematic tie is construed
between the linguistic clause and a selected aspect of the image. Note, too, that
both nominal participants in this clause have generic rather than specific reference.
That is, the verbal text abstracts from the specificity of this particular white corpus-
cle taking in bacteria and construes the visual image in terms of the generic mean-
ings that are typical of technicality and abstraction in the discourse of science
(Martin, 1991, 1993). In any case, the visual image itself operates in the modal
domain of generic truths abstracted from everyday perceptual experience (i.e. nat-
uralism), as indicated by the absence of depth and the use of black-and-white.
In the case of the other two labels, viz. irregular nucleus and surrounding pro-
toplasm, the vector linking these to selected aspects of the visual image establishes
a cross-modal relationship between the verbal and the visual elements which is,
semantically speaking, analogous to an attributive predication. In the first instance,
the verbal elements specify the visual elements more precisely. In other words, a
type-category, as specified by the nominal group in the verbal label, is instantiated by
a visual element. In these two cases, the arrow is analogous to the copula be of
relational be-clauses. It functions to establish an intrastratal relationship of
instantiation – i.e. schematicity – between the two units. A schematic category is both more
general and more abstract than its lower level instantiations. The relationship of
schematicity can be explained as follows: the nominal group irregular nucleus
specifies a schematic category which the image instantiates as a more detailed
instance of the category. Importantly, the relationship between the two units – one
visual and the other verbal – is, logically speaking, one-way and irreversible. That
is, the type-category which is construed in the nominal label is schematic for the
visual element, whereas the visual element cannot be schematic for the type-category
in the nominal group (Davidse, 1992: 101).
The image instantiates in a more detailed and more specific way the
schematic category which is symbolised by the nominal group. This relationship is
one-way and irreversible because a particular instance does not specify the more
schematic criteria which define a particular category. Instead, instances conform to
varying degrees to the higher-order schema. This suggests that the precise semantic
relationship is more like attribution than identification. The attributing item in each
case is the nominal group, whereas the visual element is the item to which the
attribute is assigned (cf. Carrier in the grammar of attributive clauses; Halliday,
1994 [1985]: 120-2; Halliday, Matthiessen, 2004: 219-226). This semantic non-
reversibility is also illustrated by the fact that the spatial relations between the visual
and verbal elements in this multimodal syntagm are not interchangeable, as is further
suggested by the unidirectionality of the vector linking the two elements. Thus, the
vector moves unidirectionally from the schematic category in the verbiage to its
instantiation in the image. The image semantically elaborates the verbiage. The
notion of elaboration alluded to here is not, however, of the logical type (Halliday,
1994 [1985]: 225-9; Halliday, Matthiessen, 2004: 396-405) because there is no sug-
gestion of tactic dependency between the two items in the syntagm. Rather, elabora-
tion is intended as a more schematic category of ideation which is superordinate to
both attribution and identification. There is, therefore, a topological proximity
between the following two possible construals of these multimodal syntagms: (1) a
visual Carrier instantiating a nominalised Attribute; and (2) a visual Token/Identified
being decoded as (‘signifying’) a nominalised Value/Identifier. The topological close-
ness of these two interpretations argues for a more superordinate ideational gloss
such as elaboration or exemplification as the best means for revealing the
indeterminacy between the two possibilities. The semantic relationship between the
visual and the verbal components of this syntagm may be modelled as in Figure 2.8.
The vector also has an interpersonal function, since it draws attention to a spe-
cific feature of the overall visual field and, hence, accords it a degree of importance.
In so doing, it organises and structures the visual field such that the visual feature so
designated demonstrates the linguistic category (Goodwin, 1994: 629) in ways some-
what similar to pointing with one’s finger. Thus, the spatial organisation of elements
on the page can now be seen not as a static display but, rather, as a material-semiotic
artifact which is positioned within a dynamic social-discursive field of specific profes-
sional practices and competencies concerning questions such as what is visually
salient, who is authorised to draw attention to it, to name it and to make specific
knowledge claims about it. The vector, as realised by the arrow, also has a textual or
linking (cf. phoric) function such that the arrow is the means whereby the two units –
one visual, the other verbal – are textually linked to each other. Moreover, the unidi-
rectionality of the arrow suggests an indexical function as well. That is, the arrow
points to the specific visual item which the linguistic item elaborates. Unlike vectors
in visual material processes (Kress, Van Leeuwen, 1996: 56-64; Thibault, 1997a: 330-
3), the vectors under consideration here do not have experiential meaning. Instead,
their meaning is centrally concerned with the textual metafunction. Experiential
meaning, on the other hand, is to be found in the lower level units – visual and lin-
guistic – which are linked by the vector in this multimodal syntagm.
The three images may, overall, be said to realise meanings in the semantic areas
of classification and analysis. The juxtaposition of the three images on the same page
follows the classification pattern mentioned earlier. In this case, we are given two
types of blood – frog and human – as well as an example of the means by which
blood is circulated in capillaries. Kress and Van Leeuwen (1996: 81) point out that
such visual classificational processes relate participants to each other in terms of a
taxonomy. Visual taxonomies are characterised by a decontextualising ‘objectivity’, as
evidenced by minimal or no background detail, lack of depth and the use of frontal
perspective. The elements which comprise the taxonomy tend to be arranged symmet-
rically and at a roughly equal distance from each other within the visual field of the
image. The single page arrangement of the three images in the Australian text is a
good example of a visual classification, as is the inset showing different types of white
blood cells in the 1982 Italian text (Figure 2.10).
Kress and Van Leeuwen (1996: 79-89) argue that the elements comprising a
visual classification are (experiential) participants in a relationship of the experiential
kind. In our view, the elements in a visual classification are better seen as standing in
some kind of superordinate ideational relation as discussed above. In the two exam-
ples referred to here, no experiential situation is being construed at the level of the
display as a whole. Rather, each of the visual images elaborates or exemplifies the
implicit superordinate term which relates each of the constituent images – the sub-
ordinate terms – in a relationship of schematicity. In the Australian text, the superor-
dinate term may simply be glossed as [BLOOD], of which the human and frog cases
are instances. In the inset featuring the white blood cells in the 1982 Italian text, the
superordinate term is [WHITE BLOOD CELL] and the individual images are different
subtypes of this.
Now, there is nothing natural or given about these classifications. This raises an
important question as to the role of visual images in the processes of learning
specialised scientific meanings. To the apprentice reader untrained in the specialised
scientific meanings, the images in the two displays under discussion here afford the
recognition of a number of basic visual patterns. In the Australian text, this may be
glossed, somewhat crudely, as small dark regions contrasting with a uniform grey back-
ground. In the inset of the white blood cells from the Italian text (Figure 2.10), we
may discern, among other things, the existence of a central area – a nucleus – in each
of these types, in spite of differences in shape. However, visual pattern recognition
alone is not a sufficient basis for inferring more specific scientific meanings, though
the recognition of visual patterns is an important starting point (St. Julien, 1997: 275).
As patterns which we perceive through our visual perceptual system, these
patterns are material entities arranged on a material surface (the page in a book).
However, these material entities are themselves cross-coupled to the multimodal
meaning-making practices of some discourse community such as the classroom
practices associated with the teaching and learning of science (see Lemke, 1990b)
as well as a particular culture’s conventions for making visual meanings. The visual
grammar of classification, as described above, is not therefore a perceptual given.
Instead, it serves to add structure and meaning to visual percepts. Like the grammar
of natural language, it does so for the most part implicitly and unconsciously. The
deployment of the visual grammar adds structure and meaning to otherwise vague
and unspecified percepts by organising and interpreting these according to the
conventions of a visual grammar.

Visual image             ELABORATION:             ‘irregular nucleus’
CARRIER/INSTANCE         ATTRIBUTION              ATTRIBUTE/SCHEMA

Figure 2.8: Relation of elaboration: attribution between visual image and verbal label

Overall, the three images play their part in the building up and development of
a wider thematic formation that can be glossed as [BLOOD IN BACKBONED ANIMALS
AND ITS TRANSPORTATION]. Again, this thematic formation is jointly constructed on
the basis of verbal and visual semiotic resources. This helps us to clarify the
relationship between the top two images and the bottom one, which features capillar-
ies in a frog’s foot. The relationship is not, therefore, simply one of spatial juxtaposi-
tion on the same page. Each image, along with its caption and/or labels, and the links
between these and the main verbal text, plays its part in the development of a joint
verbal-visual thematic formation. Thus, the third image – frog’s capillaries – selectively
instantiates and develops part of the thematic concerned with blood transportation in
backboned animals (including frogs and humans). In doing so, it forms a thematic link
with the specifically verbal development of this on p. 60, especially in the three final
paragraphs on this page (Figure 2.6). The frog and human instances are cohyponyms
in relation to the superordinate thematic formation, as glossed above. This is equally true
of both the verbal and the visual contributions to this thematic formation (see
Thibault, 2004b: 306-310). With respect to the photograph showing the white blood
corpuscles (Figure 2.7), this joint visual-verbal thematic formation may be modelled
as in Figure 2.9. Figure 2.9 shows how the thematic formation [WHITE CORPUSCLE] is
built up on the basis of a trade-off between verbal and visual elements such that the
two semiotic modalities jointly create these thematics. For example, the clause white
corpuscles fight and destroy germs and bacteria in the table (p. 61) is cohyponymic to the
clause human white corpuscles take in bacteria, one of the labels in the microscope slide
labelled Blood and Circulation (p. 63). The latter also stands in a relationship of elabo-
ration/exemplification to the visual image, as indicated by the vector (see above). Each
of these thematic items and the links that are created between them, both individual-
ly and jointly, instantiate some aspect of the superordinate intertextual thematics
[WHITE CORPUSCLE].

[Figure: diagram linking three images of the white corpuscle to the labels IRREGULAR NUCLEUS, NUCLEUS, PROTOPLASM and SURROUNDING PROTOPLASM, and to the clauses [WHITE CORPUSCLES ... (DESTROY) ... GERMS] and [WHITE CORPUSCLES ... (TAKE IN) ... BACTERIA], both analysed as Ac-Pr-G]
Key: Ac-Pr-G = Actor ^ Process ^ Goal; Cl-Th = Classifier-Thing; link types: [elaboration: attribution], hyponym, cohyponym

Figure 2.9: Visual-verbal thematic formation; white corpuscles

Similar remarks may also be made concerning (1) the thematic links between
the nominal group white corpuscles and the nominal groups irregular nucleus and sur-
rounding protoplasm, which label selected aspects of the visual image on p. 63, on the
one hand, and (2) white corpuscles and the nominal groups nucleus and protoplasm,
which occur in the first paragraph of the main text (p. 61), on the other. All of these
nominal groups are cohyponyms in relation to the more superordinate thematic item
[BLOOD], which is retrievable from the main text (p. 61). However, irregular nucleus
and nucleus are also hyponymic to the superordinate thematics [WHITE CORPUSCLE],
which is the focus in the present example.
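
The hyponymic links described in the preceding paragraph can also be set out as a small data structure. The Python fragment below is offered only as an illustrative sketch: the names HYPONYM_OF and cohyponyms are hypothetical and are introduced here purely for illustration, and the encoding records only those links that the paragraph above states explicitly.

```python
# Illustrative sketch only: HYPONYM_OF and cohyponyms() are hypothetical names.
# Each nominal group is mapped to the superordinate thematic items
# to which the paragraph above explicitly relates it.
HYPONYM_OF = {
    "white corpuscles": {"BLOOD"},
    "irregular nucleus": {"BLOOD", "WHITE CORPUSCLE"},
    "surrounding protoplasm": {"BLOOD"},
    "nucleus": {"BLOOD", "WHITE CORPUSCLE"},
    "protoplasm": {"BLOOD"},
}


def cohyponyms(first, second, superordinate):
    """Return True if two distinct items share the given superordinate."""
    if first == second:
        return False
    return (superordinate in HYPONYM_OF.get(first, set())
            and superordinate in HYPONYM_OF.get(second, set()))


if __name__ == "__main__":
    # Both items are related to [BLOOD] in the text.
    print(cohyponyms("nucleus", "protoplasm", "BLOOD"))
    # Only these two items are also related to [WHITE CORPUSCLE].
    print(cohyponyms("irregular nucleus", "nucleus", "WHITE CORPUSCLE"))
    print(cohyponyms("nucleus", "protoplasm", "WHITE CORPUSCLE"))
```

On this encoding, one and the same pair of items may count as cohyponyms with respect to one superordinate thematic item but not with respect to another, which is the point the paragraph makes about [BLOOD] and [WHITE CORPUSCLE].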

2. 6. 3. The interpersonal metafunction


The interpersonal dimension of the text’s meaning can be described in terms of a
number of interrelated parameters. First, the stance the text adopts towards its own
thematics; secondly, the stance it adopts towards the reader or ideal reader of the
text; thirdly, the way the text situates and orients itself in terms of a wider field of
other possible viewpoints, opinions and so on, in the relevant discourse community.
We may briefly illustrate these in the following terms:
Stance towards its own thematics. Typically, scientific texts are concerned with
evaluations of their own thematics, which fall into the modal areas of importance,
warrantability and usuality/typicality (Lemke, 1998). In the case of science texts for
junior high school readers, a fourth domain – comprehensibility – may also be added
to this list. The first three of these are briefly outlined below, the fourth being taken
up in 2.8, pp. 93-102.
• Importance: high. The fact that an entire page comprising the three images is
separated off from the verbal text and is devoted to this thematic formation and its
visual development suggests that the text grades it as high on a scale of importance.
We can also extend this observation to a further level of delicacy, as far as the inter-
nal ranking of importance among the three images is concerned. In this case, a num-
ber of factors interact. These include relative size, positioning and degree of integra-
tion with the verbal text, as discussed above.
• Warrantability: scientific truth. The appeal to technologically-enhanced means of
perception in the form of the microscope indicates that the text is oriented to the
revealing of scientific truths that penetrate beyond the mere appearance of phenome-
na as we perceive them in everyday reality. The warrantability claim thus regards crite-
ria of objective scientific truths that go beyond the appearance of things so as to get
to their essential features. Importantly, it implies something significantly different from
a given reality ‘out there’ which science reveals. Rather, interpersonal criteria of
warrantability operate according to standards and procedures that are implicit in the
practices of some interpersonal moral order. Thus, warrantability entails criteria as to
how rights and obligations concerning the making of knowledge claims are distributed
among the agents who participate in a given discourse community. As we shall show
in 2.7, pp. 91-93 and 2.8, pp. 93-102, the school science textbook raises a number of
important issues in this regard.
• Usuality: typical/expected. Finally, the juxtaposition and comparison of frog and
human blood, along with the tie to the verbal thematics of p. 60, which establishes a
link to a superordinate thematic item, [BLOOD IN BACKBONED ANIMALS], sug-
gests that the images reveal what is typical or usual about blood in such animals despite
specific differences. Even though they obviously show us specific blood samples under
laboratory conditions, the visual images are not concerned to highlight any specific or
individual features, assuming this were possible, or to show the blood as that of an indi-
vidual frog or human. Rather, the images abstract away from the specific to foreground
what is typical and expected. Moreover, the top two images in particular, by virtue of
their juxtaposition, imply a relationship of comparison which serves to highlight the
common or shared features of frog and human blood. Neither the verbal nor the visual
text explicitly mentions how these might differ. Therefore, possible differences
between the two images, which only an expert reader could appropriately assign
meaning to in any case, are not foregrounded. Thus, we see that the top two images
show a similar – not identical – visual-perceptual pattern, i.e. blood cells depicted as
small dark regions in plasma fluid, depicted as a uniform background grey.

Stance towards the user. In the present example, the users are 12-year-olds in
1950s Australia. Overall, the text works to introduce new terms and to define and
explain them. The stance towards the reader may be summed up as one of author-
ity mediating for, and in some sense condescending towards, the young school
reader, in the sense that the text has to talk down the terms, explain and define
them for its young readers. For example, the second (top right) image on p. 63
(Figure 2.7) would not require the labels and vectors linking these to selected
features of the image if the text were addressed to expert readers.
Stance towards other texts. The text defers to, and bases its knowledge claims
on, the authoritative and high-prestige discourse of biology and its intertexts at the
same time that it mediates between these and the more familiar, everyday language
of its young readers. In other words, the text situates itself in, and participates in, a
system of possible social viewpoints and evaluations regarding its own thematics as
well as those of other social discourses. This is the system of social heteroglossia as
first theorised by Mikhail Bakhtin (1981; see also Thibault, 1991b: Chaps. 5-6). In the
everyday domain, experiential meaning is mainly tacit and based on commonsense
understandings of reality, as well as on personal experience. Interpersonal interaction
is grounded in family, friends and peers and the discourse positions associated with
these interactions. In the scientific-technical domain, on the other hand, abstract and
objectified forms of knowledge are learned through specialised processes of train-
ing and apprenticeship vis-à-vis the social practices and ways of making meaning of
the specialised discipline. Interpersonally, this entails specialised interactional roles with
specific claims to knowledge and authority. It also entails an objectivating stance
towards the self and the self ’s relationship to the ideational field of knowledge in
question. These questions will be further discussed in the next section.

2. 7. The Italian texts: differences with respect to the Australian texts

2. 7. 1. Reading paths
We shall now turn our attention to the two Italian school textbooks dealing with
the same thematic system – blood and its circulation in the body. In both cases, we
have a double-page display comprising verbal and visual elements. The first thing to
strike the reader on a first reading of the relevant parts of the two texts under
discussion here – published in 1982 (Figure 2.10) and 1985 (Figure 2.11),
respectively – is the overall stability of the thematic meanings, in spite of the fact
that more than twenty years have elapsed since the publication of the Australian
text in 1957. Moreover, the fact that this stability is exhibited across two different
languages – English and Italian – indicates that the discourse of science and its
intertexts is interlinguistic, as well as being intersemiotic. The two Italian texts, in
relation to the Australian text, also show that the multimodal construal of scientific
meanings is not something new to our own historical period. What is new in the
two Italian texts with respect to the Australian text is the increased integration of the
verbal and the visual modalities in the overall design of the page. That is, the
boundaries between words and pictures are now much more weakly insulated; the
two modalities interpenetrate each other considerably more and in ways that com-
plement and add to each other’s meaning-making possibilities (Kress, 1998: 59-66).
A further difference is the use of colour, as distinct from the use of black-and-
white photographs and line drawings in the Australian textbook. While these
developments are also due to technological changes over the period from the 1950s to
the 1980s, the more important point is that new technology affords new
possibilities for multimodal meaning making.
Above all, the compositional – textual – structure of the more recent texts
foregrounds linearity to a lesser extent. This is so because in both writing and depic-
tion, it is spatial organisation which enables complex arrangements of elements in
syntagms which are often non-linear as are the reading paths they make possible (see
also Harris, 1995: 46). Let us consider once more page 61 (Figure 2.6) in the
Australian text. In this case, the table is embedded in the verbal text and, as we
showed in 2.4 (pp. 71-78), it is strongly tied to this on account of the thematic links
between the two. Here, the compositional layout of the page, which is part of the
overall verbal-visual textual metafunction, more or less compels the reader to adopt
a linear, top-to-bottom reading path. There is less motivation to view the table as a text
in its own right (which to some extent it is), which the reader can hop to if desired,
and read independently of the verbal text (see Inset 5, p. 31). In the 1957 text, the
top-bottom layout strongly compels the reader to start with the linguistic semiotic and
then to read the table as being tightly integrated with it, both visually and spatially, as
well as thematically. This is less true of the photographs on p. 63 of the Australian
text (Figure 2.7), though even here these are all placed together on a separate page and
are relatively strongly insulated from the main verbal text. In both of the Italian texts,
linearity is less foregrounded. The visual images are either more closely integrated with
the verbal text (the 1982 text) or more explicitly cross-linked by means of indexical
pointers such as Figure 2.11 and so on, in the case of the 1985 text.
In the double page presentation of the 1982 Italian text shown in Figure 2.10,
there are three insets showing, respectively, the heart on the top left, two different
visual perspectives on red blood cells in the bottom right of the first page and a
number of different subtypes of white blood cells in the large inset at the top left of the
second page. Unfortunately, it is not possible to analyse these in detail here. The visual
semiotic is a prominent feature of the spatial design and the meaning of this double
page. Furthermore, all three insets are accompanied by their own verbal texts in italics
– itself a typographical-visual contrast with the main verbal text – which function to
emphasise the relative autonomy of the insets and the combined verbal-visual
thematics that they operate. In other words, each inset functions like a quasi-
autonomous text to which the reader can refer in any order he or she prefers, without
following a single preferred reading path based on an a priori top-down and left-right
directional organisation. Two of the insets – those concerned with red and white
blood cells – are thematically tied to material that occurs several pages earlier (p. 95,
not reproduced here) in the main verbal text. This, too, says something about the
‘looser’ nature of the links between any given part of the text and some other part.
The reader is freer to hop back and forth, and this suggests that multimodal scientific
texts are a precursor form of hypertext, with multiple reading paths and multiple links
between different parts of the overall text (see 3.6, pp. 126-129 and Lemke, 1998).

2. 7. 2. The use of colour


The use of colour rather than black-and-white is, of course, due in part to techno-
logical advances in the graphic arts and photography, as well as to the economics of
using these in school textbooks. However, there are also some interpersonal aspects
of the use of colour that are worth commenting on. The 1982 text uses coloured line
drawings whereas the 1985 text uses both photographic plates taken with the aid of
the optical and electron microscope, as well as line drawings. Overall, the
interpersonal stance of these two texts is comparable to the 1957 text along the three
dimensions referred to in 2.4, pp. 71-78. For this reason, we shall not comment in
detail on this aspect of the meaning of the Italian texts.
Nonetheless, the use of colour here is significant. Canonically, scientific jour-
nals addressed to expert readers such as academics and research scientists use black-
and-white tables, graphs, and diagrams (Kress, Van Leeuwen, 1996: 149-53; Lemke,
1998). In scientific discourse, this is the unmarked modal selection for realising the
warrantability claim of objective scientific truth and for penetrating to the essence of
things (see above). This does not mean that colour is excluded in these discourses;
rather, black-and-white is the unmarked choice in scientific articles for experts. The
fact that colour, for mainly interpersonal reasons, is usually the unmarked choice in
modern science textbooks for school pupils can be explained as follows. Line
drawings, graphs and diagrams, as well as photographs obtained by technologically
enhanced means such as the microscope, are, in their different ways, abstractions
from or departures from the visual percepts that we are accustomed to in everyday
reality. This is so because the phenomena under consideration here exist on a scale of
reality that is very much smaller than that of the ecosocial scale on which individuals
interact with and act on both their material and social environments. This microscop-
ic scale is not available to normal human perceptual experience. Furthermore, the
colours used in the line drawings of the white blood cells in the 1982 text are, strictly
speaking, not accurate. This is so because, as the main text tells the reader on p. 95,
sono incolori ... e non hanno in genere una forma propria (they are colourless ... and in general
are formless). Rather than a question of the text contradicting itself, this is a case of
the use of colour belonging to the sensory modality, as distinct from a realistic or
objective scientific one (Kress, Van Leeuwen, 1996: 168-71). The use of colour therefore
functions to mediate the interpersonal relations between the scientific thematics
and the apprentice readers by appealing to the domain of sensory experience, which is
characterised by rich, saturated colours. From this point of view, the electron micro-
scope photograph of white blood cells in Picture 9 on p. 425 in Figure 2.11 more
closely corresponds to the sensory modality. In other respects, the line drawings under
discussion here are oriented to the scientific modality insofar as they seek to represent
the phenomenon according to its essential characteristics. This suggests that the visual
semiotic is negotiating between two distinct contextual domains and their respective
value orientations. The resulting modality hybrid highlights the way in which the
‘objective’ scientific attitude abstracts away from perceptual phenomena and bodily
experience in order to get at the underlying truth or essence of things, while the more
subjective sensory modality draws attention to our own embodied viewpoint from
which we observe and interact with things in all their sensual immediacy, including the
activities of the science classroom, along with the embodied reading, handling,
pointing to and touching of textbook pages.

2. 8. Expertise and authority vs. comprehensibility and accessibility

Typically, school science textbooks negotiate between notions of expertise and
authority, on the one hand, and notions of apprenticeship and accessibility, on the
other. The writer of the science textbook must, therefore, negotiate between both
technical-scientific semantic registers and the more familiar everyday registers of his or
her apprentice readership. In science textbooks for young readers, authority and
expertise are rarely enacted interpersonally as modalised knowledge claims that are
attributed to particular speakers and writers in the I-you context in which the
interpersonal enactment and negotiation of meanings take place, e.g. I believe that

Figure 2.10: 1982 Italian text


Figure 2.11: 1985 Italian text



platelets are made in red bone marrow. In this example, the modalised knowledge claim
is attributed to the I of the particular person who asserts the claim in a particular
interpersonal context such as the author of a textbook addressing the reader. A claim
of this kind is interpersonalised. Explicit interpersonal sourcing of this kind in which
the writer directly addresses the reader is rare in school science textbooks. Instead,
claims to scientific expertise and authority are abstracted away from the interpersonal
contexts of authors and readers and relocated in third-person contexts in which the
scientific meanings are attributed to third-person authorities and experts – both indi-
vidual and institutional. Here are some examples: (1) King claims that platelets are
made in red bone marrow; (2) Research findings show that platelets are made in red bone
marrow. In such cases, interpersonal meanings and contexts are transferred to
ideational contexts. The negotiation of these meanings therefore takes place
between ideationalised orders of discourse and their intertexts. These orders of
discourse and their (third-person) spokespersons, rather than particular persons on
particular occasions of speaking and writing, are construed as the sources of
scientific knowledge and the authority claims associated with this knowledge.
Prototypically, ideational negotiation makes use of the lexicogrammatical resources
of projection (see Inset 9: Projection, p. 101). Projection enables others’ texts to be
negotiated as the projected texts of third-person sources of acts of saying and
thinking (Halliday, 1994 [1985]: 250-73; Halliday, Matthiessen, 2004: 441-82;
Thibault, 1991b: 73-89; Thibault 1999a: 574-80). This strategy allows for varying
degrees of incorporation of the other’s discourse – cf. Bakhtin’s alien word – into
the context of the symbolic source (Sayer or Senser) of the text that is projected
from this source. Figure 2.12 illustrates this with respect to the clause, Platelets are
probably made in red bone marrow, in Australian Biology (p. 61 in Figure 2.6).

2. 8. 1. Linguistic resources
Both the Australian and Italian texts examined here adduce very little explicit sourc-
ing of ideational negotiation between projecting and projected discourse contexts.
Instead, there is a foregrounded predominance of unsourced semantic variation.
That is, the authoritative discourse of science – i.e. the discourse of the other in the
present case with respect to the apprentice readers – is reformulated or reconstrued
in the ideational context of the projecting context of the textbook writer. The source
of these meanings thus becomes increasingly assimilated into the projecting context
to the extent that it is nothing more than unsourced semantic variation which is not
explicitly differentiated from the projecting context of the textbook writer. This is
the semantic region where heteroglossically distinct discourse voices are ‘translated’
or reformulated as semantic variation rather than as heteroglossic diversity or differ-
ence between particular value stances held by individuals and social groups (Thibault,
1991b: 99-103; 1997a: 269-72; Fuller, 1998). Consequently, there is little or no use of
the lexicogrammatical resources of projecting, assigning and attributing. In other
words, there are few explicit Sayers, Sensers, Assigners and Attributors which
specify the source of the text of the other. These functional semantic labels derive
from Halliday (1994 [1985], see also Halliday, Matthiessen, 2004: Chap. 5). In mental
process clauses, the Senser is the participant who thinks, perceives or feels, e.g. John
believed Mary was coming (Halliday, 1994 [1985]: 114); in verbal process clauses, the
Sayer is the one who emits a verbal signal, as in John said Mary was coming. Assigners
and Attributors are additional functions which pertain to identifying and attributive
processes, respectively. Thus, in the identifying clause, The meeting elected him
President, the meeting is the semantic source, the Assigner, which is construed as
being responsible for assigning the role of President to him. Similarly, in the clause
her words made John angry, her words, as Attributor, is construed as the semantic
source responsible for attributing the type-quality angry to John. In all four cases, we
have different ideational grammatical resources for the sourcing of the meaning of
another in relation to the speaker or writer. Instead of explicitly sourcing the text of
the other, unsourced relational predications of identification and attribution play a
prominent role in the ways in which one grammatical unit in the relationship elab-
orates or further specifies the other. Frequently, this may function to construe
relations of identity or attribution – seen as instantiations of the more schematic
category elaboration (Thibault, 1997a: 314) – between semantic registers deriving
from different social domains such as the everyday and the technical-scientific. In the
discourse of school science, definitions and exemplifications, as realised by relational
(attributive and identifying) processes, frequently have this function. The examples
given in Figure 2.13, taken from the texts under discussion, are typical of this general
pattern. The examples instantiate the schematic pattern [CARRIER^ ELABORATION/
EXEMPLIFICATION^ ATTRIBUTE] (see 2.6.2, pp. 82-89).
The negotiation between the claims of authority and expertise, on the one
hand, and notions of accessibility and apprenticeship, on the other, does, however,
implicate both interpersonal and ideational resources. Whilst ideational resources
construe negotiation between different discursive domains, interpersonal resources

Explicit ideational source      Conflated ideational/interpersonal source      Implicit interpersonal source

King thinks / says platelets    I think that platelets are made in red         Platelets are probably made
are made in red bone marrow     bone marrow                                    in red bone marrow

                                It’s probable that platelets are made in
                                red bone marrow

Figure 2.12: The cline between ideational and interpersonal sourcing


enact addresser and addressee positions and relations. Interpersonal resources thus
ground the negotiation between addresser and addressee in terms of time, place,
modal orientation and the person deixis of writing and reading positions. In the
above instances, interpersonal negotiation is mainly evidenced in specific lexical
choices which realise particular axiological orientations. In example (2), the Epithet
strong implies an evaluative judgement; in (4) not enough indicates a negative judge-
ment as to the quantity of red corpuscles; in (5) the morphemic suffix -ish indi-
cates a judgement regarding the degree to which the type-quality blue may appro-
priately qualify the noun organ; in (6) some entails a judgement concerning quantity;
and in (7) the diminutive giallino [pale yellow] specifies that plasma is evaluated as
only relatively weakly conforming to the type-quality giallo [yellow]. In the
examples given in Figure 2.14, relating to clauses that follow on from those
presented in Figure 2.13, interpersonal negotiation often, though not always,
comes to the fore at a number of points where the everyday world of the reader
is the thematic focus. In such moments, the interactive relationship between writer
and reader is more focal. In example (10), the writer is not explicitly sourced in
the lexicogrammar of the clause, though she is implicit in the modalised
orientation (probably) which she adopts towards the proposition in this clause. In
the remaining examples (11-13), the use of person deixis (I, you) and the
orientation to metaphorical commands (it may be necessary ..., occorre provvedere)
enacts the interpersonal space in which the writer/reader relationship is negotiated.

1  In the backboned animals, blood is confined to vessels, called arteries (leaving the heart), veins (entering the
   heart), and fine-walled capillaries, linking arteries and veins. (p. 60 in Figure 2.6)

2  The heart itself is a strong, muscular pump. (p. 60 in Figure 2.6)

3  The platelets are small, colourless cells without nuclei. (p. 61 in Figure 2.6)

4  ANAEMIA is the condition of a person who has not enough red corpuscles. (p. 61 in Figure 2.6)

5  The spleen is a bluish-red oblong, flattened organ, below the stomach, ... (p. 62 in Figure 2.8)

6  When blood is shed some of it will set or harden on the wound. This is called a clot, ... (p. 62 in Figure
   2.8)

7  Il plasma è un liquido di colore giallino, ... [Plasma is a pale yellow liquid, ...] (p. 424 in Figure 2.13)

8  I globuli rossi sono cellule senza nucleo, tondeggianti, schiacciate al centro e rialzate ai bordi. [Red blood
   cells are cells without a nucleus, roundish, flattened in the centre and raised at the edges.] (p. 424 in
   Figure 2.13)

9  I globuli bianchi (o leucociti) sono cellule incolori, più grandi di globuli rossi e provviste di nucleo.
   [White blood cells (or leucocytes) are colourless cells, larger than red blood cells and provided with a
   nucleus.] (p. 424 in Figure 2.13)

Figure 2.13: Sequence of clauses


2. 8. 2. Visual resources
Visual semiotic resources also possess their own means for negotiating the
relations between the ideational and interpersonal sourcing of the meanings
which are negotiated in texts. In the linguistic semiotic, this is done, as we saw
above, by projection and related lexicogrammatical systems (Halliday, 1994 [1985]:
248-69). Projection functions to recontextualise the discourse or the meaning of
some other as projected text in relation to the projecting context of the text – the
writer’s text in the present case – which is projecting it. In this way, the projecting text
adopts a metasemiotic stance on the projected text and thus comments on it or re-eval-
uates it in some way (Thibault, 1991: Chaps. 2-4; 1999a: 574-80).
Likewise, the use of colour may also function to reframe the scientific
discourse and to ground it in the interpersonal perspective of the writer-reader
relationship. In so doing, it (a) enacts interpersonal negotiation between the writer
and the reader and (b) construes ideational negotiation between the scientific and
sensory coding orientations. The visual interpersonal deictic frame is thus shifted
back to the writer/reader domain, whereas the thematic content is that of the
scientific domain. The former functions to relocate the latter into its own context
and to comment on it from that context, i.e. the writer’s perspective. It is a nego-
tiation between different ideational-thematic orders of discourse at the same time
that this is implicated in the interpersonal negotiations between writer and reader.
The use of the rich, saturated colours that are typical of the sensory coding
orientation in the visual semiotic (Kress, Van Leeuwen, 1996: 168-71) thus serves
to circumscribe the interpersonal domain of the visual image. This is analogous
to the ways in which comment adjuncts such as zoologically, clinically, technically, and
so on, serve to delimit the domain of validity of the propositions which they hold
in their scope. They do so by relativising the proposition according to the values of
the interpersonal domain of validity in which it is articulated. For example, in the
clause: zoologically, the Bluetongue is a Tiliqua (see Worrell, 1963: 61), the would-
be speaker/writer highlights semantic differences across different discursive

10  Platelets are probably made in red bone marrow (p. 61 in Figure 2.6)

11  You all know that when blood is shed some of it will set or harden on the wound. This is called a clot,
    and you may read of this as the coagulation of blood (p. 62 in Figure 2.6)

12  It is foolish to rush to a running tap when you cut your finger, for this washes away the fibrin, hindering
    clotting. Certainly, it may be necessary to bathe dirt away from a wound, but do not use much water (p.
    62 in Figure 2.6)

13  Per salvare la vita di un uomo che in seguito ad un infortunio abbia perduto una parte del suo sangue,
    occorre provvedere ad una trasfusione, cioè ad immettere nei vasi sanguigni dell’infortunato il sangue di
    un altro [To save the life of a man who has lost part of his blood following an accident, it is necessary
    to provide a transfusion, that is, to introduce another person’s blood into the injured person’s blood
    vessels] (p. 97 in Figure 2.12)

Figure 2.14: Sequence of clauses highlighting interpersonal negotiation


domains and, hence, between different categories of addresser and addressee in
these domains. In the present example, the contrast between the everyday desig-
nation of the Australian lizard as Bluetongue and its technical-herpetological clas-
sification as Tiliqua illustrates the fact that different locutions have different values
in different contexts. The interpersonal semantics here are not so much concerned
with the negotiation of different modalised subjective takes or positions on the
proposition, along with their relative claims to power, knowledge and authority –
cf. the Bluetongue is certainly / probably / possibly a Tiliqua – as they are with the location
of the locution in an appropriate domain of validity – technical, herpetological,
biological, everyday and so on. It is not, then, the content of the proposition which
is being negotiated but, rather, the validity, warrantability, appropriateness, and so on,
of the specific locution within the particular interpersonal context.
The interpersonal semiotic resources for modifying a given unit are, as
McGregor (1997: 210) argues, scopal in nature. This means that the units which
comprise a given (experiential) structure are acted upon by something other than
the units which comprise that structure – i.e. they are external to that structure –
so as to modify or deform its shape in order to take up a particular interpersonal
orientation to it. As McGregor further points out, the two sets of units belong to
two different orders of semiotic reality. For example, the scopal nature of
interpersonal units contrasts with the constituency relationships which realise
experiential meaning in the linguistic semiotic. The experiential constituents
CARRIER (the Bluetongue ), PROCESS (is) and ATTRIBUTE (a Tiliqua ) in the clause: the
Bluetongue is certainly a Tiliqua, are all of the same order of reality. By contrast, the
modal operator, certainly, which models the interpersonal relations and effects,
certainly adopts a metasemiotic stance on the experiential units on which it oper-
ates. It belongs to a different order of metasemiotic reality.
Similarly, the use of rich, saturated colours acts on or operates on the
experiential units in the figure of the white blood cells (Figure 2.10) so as to modify
or deform them in some way in order to achieve a specific interactional effect or
to adopt a particular – evaluative, affective – orientation to them. This is consistent
with the fact that colour is the result of perceptual phenomena which are ‘localised
at the interfaces of matter which are in contact with air’ (Saint-Martin, 1985: 37),
rather than being an essential property of matter. In other words, colour is a result
of the spectral reflectance ratios of the incident light rays on a surface (Gibson,
1986 [1979]: 30-1). As Gibson also points out, the colours of surfaces specify the
composition of a surface – cf. the ripeness or unripeness of fruit (Gibson, 1986
[1979]) – and thereby provide important information as to how an organism may
orient to a given substance. Therefore, this fundamental fact of visual perception
can be utilised to attain particular interactional-interpersonal purposes and effects
in visual semiosis as a logical consequence of this property of colour. Moreover,
the specific modification – the use of saturated colour in this case – is said to be
Inset 9: Projection

• All languages appear to have resources for indicating some stretch of discourse as having
been spoken, written or thought by some person in a given discourse context. There are
two aspects to this relation. First, there is the context of the discourse of the person who
is quoted or reported. Secondly, there is the context in which the person who is quoting
or reporting the discourse of some other speaks or writes. The relationship between the
two contexts is called projection. The quoted or reported discourse of someone is the pro-
jected context; the projected context is projected by the projecting context of the person who
quotes or reports the other person’s words. It is also possible for the person in the pro-
jecting context to quote or report their own words or thoughts. The principle is the same in
all cases. The distinction between the projecting and projected contexts is as follows:

          Projecting Context    Projected Context

Quote     He says,              “That's when I get humbled.”
Report    Police said           the man set himself alight when he was questioned
                                by two police officers around 2am.

• There are two main ways of representing the speech or thoughts of others. First, someone’s
speech or thought may be grammatically construed as it is supposed to have been
actually said or thought in some context. Such is the case with the direct quoting of
someone’s speech, writing or thought. In such cases, the discourse of the other is
presented from the point of view of the projected context. Secondly, the other’s speech,
act of writing or thinking may be construed from the viewpoint of the person in the
present situation of utterance who interprets the discourse of the other. That is, from
the point of view of the projecting context. This is what is known as the indirect reporting
of another’s discourse. In both cases, the speaker or writer of the utterance attributes
some utterance, thought, perception or feeling to a sentient being who is construed as
being the source of the utterance, thought, perception or feeling in question. With ref-
erence to English, the basic possibilities are as follows:
            Direct                        Indirect
Speech      He said, "I am coming"        He said that he was coming
Thought     He thought, "I am coming"     He thought that he was coming

• This illustrates the basic distinction between the two ways of projecting speech and thought.
In direct speech and thought, tense and person deixis index the situation of the person
responsible for saying the utterance or thinking the thought in the projected (quoted)
clause. In indirect speech and thought, tense and person deixis shift to the discourse situation
of the speaker or writer who says or writes the utterance, i.e., towards the projecting
context. This shows that two points of view and the relationship between them are
involved in this type of linguistic structure. In direct speech and thought, the quoted
utterance or thought – the projected clause – is presented from the standpoint of the
person who says or thinks it. The relationship between the projecting (quoting) and project-
ed (quoted) clauses is one in which there is a congruence or non-discrepancy between the
two perspectives. This is the case when, for example, the present knowledge or point of
view from the speaker/writer’s perspective is in accord with the perspective of the quot-
ed utterance or thought. In the case of indirect speech and thought, on the other hand,
the semantic construal of the projected utterance or thought from the point of view of
the speaker or writer indexes a non-congruence of the two perspectives. In other words, a
contrast or discrepancy between the knowledge or point of view of the speaker or writer
in the present time of utterance and the projected speech or thought is suggested.

scopal because its scope extends over a certain region or subregion of the
topological space of the visual field, in the process interacting with other features
in the visual field and thereby creating a particular interpersonal (affective,
evaluative, etc.) orientation to them. The use of colour therefore constructs a
metasemiotic frame of reference which organises interpersonal orientation. It thus
anchors or grounds the experiential meaning of the visual text by defining the
modalised intersubjective space in which the image is to be negotiated. However,
this entails more than simply an interpersonal negotiation between writer and reader.
It also serves to locate this interpersonal negotiation in a still wider intertextual field
of heteroglossic relations among texts and reading and writing positions.

2. 9. Conclusion

Science textbooks are a type of knowledge object (Bereiter, 1997: 298) which can be
used in a variety of ways – enabling and constraining – which go beyond the
situations in which they were produced. They are not simply objects which students
and teachers passively decode. They, too, are participants in the dynamics of the
processes – both material and semiotic – which link human agents, their tools and
artifacts, and semiotically mediated activities in still more extended networks of
meaning making across diverse temporal and spatial scales. As knowledge objects,
science textbooks are hybrids, to borrow Bruno Latour’s notion: they are
simultaneously artifacts and activities, both material and semiotic, local and global,
natural and cultural. It is in this way that scientific theories and explanations, far
from being universal laws, are kept alive by the networks of classroom activities,
others’ texts, measuring instruments, perceptual devices, laboratory experiments
and so on, with which the textbook is linked in, and through, all the work which is
required to keep the whole network going (Latour, 1993 [1991]: 121).
Moreover, the activity of reading, the constructing of multiple pathways
between the verbal and visual resources that are codeployed and, hence, the con-
strual of joint verbal-visual actional systems of thematic meanings and their
associated axiological orientations, allows for the emergence of scientific mean-
ings, knowledge and associated affective investments. The recognition of visual
patterns, on the one hand, and the deployment of multimodal meaning-making
practices, on the other, are not, in the final analysis, constitutively separate activities.
The one does not follow on from, or cause, the other. Rather, they are all
simultaneously cross-coupled in the time-bound processes of building up scientific
meaning and knowledge in, and through, the social practices associated with the use
of textbooks. An understanding of these links and the role of the multimodal
science textbook page in these can help us to develop a truly multimodal literacy
which is adequate to the world in which we and our children live and make meaning.
Chapter 3

The web page


3. 0. Introduction

How can we go about describing a website? Are web pages in fact pages? Or is the idea
of the page just a metaphor? How do we account for the web page in terms of
resources and the kinds of meanings that people make? And what about the web page
as genre? Can we describe web pages in terms of different genres? How can we
describe a particular pathway through a website and how can we relate a particular path-
way to the virtual resources of a specific website as a whole? And what transcription
techniques can we develop? How do websites work? How does this connect back to
the notion of transcription? To what extent does the transcription speak for itself as a
description of the web page?
As we can see from these questions, there are many possible starting points in
the analysis of web pages. Our chosen approach is to make some general observations
about the nature of websites, which we illustrate with some examples. We then discuss
a number of websites in detail applying the technique of multimodal transcription and
text analysis to them. For reasons of space, we have restricted our discussion to the first
of the Information website types categorised below in Table 3.1, namely edutainment
websites for children. Naturally, children’s edutainment websites include textual
objects implicitly associated with, or, indeed, explicitly linked to, other types of
Information websites (e.g. museums, special interest sites). The Internet is a technology
that is especially good at merging disparate entities, at crossing and realigning the
boundaries between diverse discourse genres, social activities and domains, and, as we
shall see below (e.g. 3.7, pp. 130-136), at constantly relocating and recontextualising
agents and objects that we would expect to find in one setting or category into others.
In this respect, we are concerned with detailed analyses of specific web pages. Our
analyses take into account the nature of the different semiotic resources that are used
and the way they are used to create a particular website. This includes the reading path-
ways (see 3.6, 3.7, 3.8, pp. 126-146) that can be taken through a particular website as
the user creates and negotiates the meanings afforded by that website along a particular
meaning-making trajectory (see Inset 19: Negotiation, pp. 245-247; Inset 10: The
trajectory, p. 116). Our analysis will look in particular at home pages and the way
they are linked to subsequent pages in the website to which they relate. Other
starting points could include discussions of the cultural and social functions of
websites. Our approach is certainly compatible with such approaches, but our con-
cern is with how textual resources function in distinctive ways to create web pages
and, in the process, make it possible to distinguish a web page from a printed page
and from other kinds of multimodal texts such as films. Fundamentally, we are con-
cerned with the way a user interacts with a web page, and we base our analysis on a
detailed account of the textual resources used in particular websites, rather than
relying on the subjective accounts of particular users.
What is distinctive about the web page as a space for meaning making? A web
page is a visual-spatial unit displayed on a computer screen. It makes use of written
resources such as language and the resources of depiction, including the spatial
juxtaposition of objects. In this respect, a web page is similar to a printed page (see
Chapters 1 and 2). However, the web page goes beyond the printed page because
of its hypertextual nature and the action potential that this affords (see 3.9, pp. 146-155).
We need to see the web page with a dual focus: as a visual-spatial unit and as
action potential. The web page in this sense is a hybrid: on the one hand, it shares
features of the static page; on the other hand, it also has a dynamic potential for action:
the user can act on the page and obtain responses to his or her actions. A printed page cannot be
dynamically reorganised unless you take a pair of scissors to it, so to speak. Of
course, the reading of a printed page is itself a form of activity at the same time
that the written text enables the creation of indexical ties with activities that the
reader can perform as a consequence of reading the page.
A feature of the web page is its capacity to be reorganised. It is possible, for
example, within a children’s website, to physically move objects around. Examples of

Examples of Information websites


(1) Edutainment websites, often produced by big organisations (e.g. Nasa) which blend
entertainment with an educational function;

(2) Museums, libraries, universities, newspapers;

(3) Institutions and associations, whether international bodies (e.g. Unesco) or
government sites (e.g. relating to consular services);

(4) Personal websites: these present individual people and their work, such as musicians,
film stars, academics and so on;

(5) Special interest sites (hobbies, recipes, animals and zoos);

(6) Goods and services websites that may include shopping for a) goods: books, food
and b) services: car hire, train and plane tickets, e-banking, buying a theatre ticket;

(7) Individual company sites ranging from the global (e.g. car manufacturers) to small
businesses (e.g. restaurants in a particular town).

Table 3.1: Types of web pages according to social activity


this include the games, such as adventure games, found in a website. It is also
possible to reorganise the page by making selections either by passing the mouse
over an object (known as a rollover), by clicking an object (e.g. drop-down menus) or
by double clicking so as to select a link. Examples of such reorganisation include: (1)
a change in part of the page on the screen, for example, the activation of a video
(relating to a news report, an advertisement or a demonstration of an instrument),
or a drop-down menu giving a list of cities and their temperatures on a particular
day; (2) the transformation of the entire page, or at least most of it, as in the pro-
duction of a timetable; (3) access to a new page. This capacity for reorganisation is
implicit in the questions that we have listed above. We will provide answers to these
questions in what follows.

3. 1. Page or screen?

The printed page is a visual-spatial unit of textual organisation created by the
tracing of visual invariants onto a treated surface such as a sheet of paper. A treated
surface provides the material support for the installation of the visual and grapho-
logical patterns which are displayed on the page as potential signs which can be
interpreted in and through the activities with which they are integrated by the
reader. As we have seen in Chapters 1 and 2, the page has typical forms of
top-bottom and left-right organisation, which provide access to its potential meanings
by specifying possible reading paths. A reading path is a preferred way of
integrating the activities of visual scanning with the potential meanings proposed by the
page. In this sense, the web page clearly has pagey properties that justify the term
web page. For example, the home page of a website has forms of top-bottom and
left-right organisation that are common to very many different instances in spite of
the immense amount of variation that is seen in the design of web pages.
By the same token, the visual invariants that constitute a web page are
received as patterns of light onto a computer screen, rather than being traced onto
a treated surface. Moreover, the screen, unlike the printed page, affords possibilities
for the projection of movement in the form of objects which respond when
clicked by the mouse, for animation and for video. In this way, the web page has
properties that make it like a video screen: the screen displays moving objects which
the user can view in ways similar to the viewing of video texts on a television
screen. The web page therefore has screeny properties in addition to its pagey ones.
Importantly, as we discuss below, these screeny properties also relate to the way in
which the web page is dynamically assembled on the screen by a computer
program. This characteristic of the web page means that different texts, in diverse
semiotic modalities, both visual and aural, can be dynamically assembled as a web
page on the screen by the computer user.
The dual character of the web page – page and screen – as discussed so far also
relates to a third characteristic, i.e. the web page as a dynamic and interactive interface
between the computer user and the possibilities of the website as a whole. The web
page which is dynamically assembled and displayed on the computer screen mediates
the user’s relations with particular objects which the user can interact with. Such
objects influence the behaviour of the user, who can obtain responses from them (see 3.9,
pp. 146-155). Moreover, the web page enables a user to make links with other
objects, other web pages and other websites. In this third perspective, the screen is:
(1) a receiver of information from remote sources; (2) the means of sending information
to remote targets; (3) a field of on-screen possibilities (images, objects and so on) that
the user interacts with and which mediate the previously mentioned points (1) and
(2).
The computer user therefore enters into an active relationship with the
virtual screen world that is created by his or her interactions with the web page.
This means that the user is able to take part in forms of virtual modelling of, and
virtual participation in, the virtual hypertextual world that is so created. The
computer user becomes a participant in processes that are dynamically assembled
on the screen at the same time that the here-now events on the screen link to other
times and places in other websites. From this third perspective, Internet is a network
which enables interactions between persons and between persons and virtual
hypertextual worlds and their participants that may be widely separated in time and
space. The apparent stability and the artifactual character of the printed page, the
written text or the book which is integrated into the activities in which it
participates, is thus dynamically transformed. The web page as it appears on the
screen can be seen as being linked to other sites, other possible pages, other objects
by lines of connectivity which make it a participant, along with the user, in a network
topology of such connections. By contrast the adjacent ordering of elements on the
printed page into visual-spatial forms of organisation that are traced onto a treated
surface, puts the emphasis on a flat, two-dimensional form of organisation.
The visual patterns displayed on the screen only become a text when they are
integrated with activities which assign meaning to them. What are these activities?
How are they related to each other? Some of the relevant activities include:
Visual scanning ^ Select specific object [^ = followed by];
Point mouse at object ^ Object responds (tells me something,
changes form, lights up);
Click on object ^ Create link to thematic area, functional unit;
Click on object ^ Expand thematic area/interact with virtual
object to create activity.
The user enters into an active relationship with the screen world and its
objects. However, the virtual world of the text that is projected onto the screen for
the user to access is an abstraction away from natural objects as we perceive them
under normal perceptual conditions in the ecosocial environment in which we
typically live and move. Our interaction with the virtual world of hypertext and its
participants enables our virtual modelling of, and virtual participation in, processes
and activities that have the potential to take us further and further away from the
world that we typically encounter with our senses. This is so from the point of view
of the materiality of the objects and processes encountered and their meanings. It is
also true of conventional written genres (e.g. stories, recipes and so on) as well as
purely hypothetical or fictive visual objects, which may have little or no relation to the
objects and events that we commonly encounter in the world of ordinary perception.
There is often a high degree of self-consciousness in the design of web pages
and their modes of representation. There is a ‘set towards the message’, to use a
Jakobsonian turn of phrase (Jakobson, 1960: 356), placing the focus not just on the
aesthetic appeal of the page, but on the semiotic principles of its design and
functioning. This is very much for a purpose: the web page appeals to and requires the
user to understand its semiotic modus operandi as a normal or routine part of the
overall business to hand whenever one interacts with a web page. Finding a web
page is itself an important web-related activity, with access to a specific website
gained by the following means:
(1) by clicking on a website address in a web page including bookmarks;
(2) by knowing the website address from other sources, such as
announcements and advertisements in books, magazines, TV and
cinema films or by determining the website address through
guesswork because of its predictability and inserting it in the browser’s
address line;
(3) by using a search engine to find the address of a known company,
association, etc., or more generally a thematic search e.g. recipes or
telephone numbers.
The activity structure may be further characterised in the following way. From
the standpoint of multimodal text analysis, we have a Participant and an Action. The
Participant is the computer user. He or she uses a Search Index, which embraces
bookmarks, address lines in browsers and search engines, into which information is
typed or from which information is selected. This typing-in or selecting is the Action,
which can be represented more fully as: PARTICIPANT ^ ACTION ^ OBJECT ^ RESULT. In other
words, a Participant, namely a computer user, acts upon an Object, the Search Index
(in one of its forms), by typing in or selecting a Search Item (which is a piece of
information such as a website address or keywords which are part of a search). The
action of clicking, which is a way of confirming a selection, results in either the
appearance on screen of a particular home page of a particular website or, alternatively (in
the case of a search engine), a list of potential websites. The only exception is where
no hits are recorded (in the case of a search engine) or where a particular website no
longer exists. Even so this is obviously a Result. This is the basic pattern which exists
also within websites, though less guesswork is likely to be involved.
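
The activity structure just described can be made concrete in a short sketch. The Python below is an illustration under stated assumptions rather than part of the original analysis: the class name Activity, the function perform and the two site addresses are hypothetical, and the 'web' is reduced to a small dictionary. It shows how each form of the Search Index yields a Result, including the two limiting cases of no hits and a website that no longer exists.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical miniature "web": addresses mapped to home-page titles.
KNOWN_SITES = {
    "www.example-recipes.org": "Example Recipes home page",
    "www.example-trains.com": "Example Trains home page",
}


@dataclass
class Activity:
    participant: str               # the computer user
    action: str                    # "type in" or "select"
    obj: str                       # form of the Search Index:
                                   # "bookmark", "address line" or "search engine"
    search_item: str               # a website address or keywords
    result: Optional[str] = None   # filled in once the selection is confirmed


def perform(activity: Activity) -> Activity:
    """Fill in the Result stage of PARTICIPANT ^ ACTION ^ OBJECT ^ RESULT."""
    if activity.obj == "search engine":
        # A search engine returns a list of potential websites, or no hits.
        hits = sorted(a for a in KNOWN_SITES if activity.search_item in a)
        activity.result = f"list of sites: {hits}" if hits else "no hits"
    else:
        # Bookmarks and address lines lead to one home page, if it still exists.
        activity.result = KNOWN_SITES.get(
            activity.search_item, "site no longer exists"
        )
    return activity
```

Whatever the outcome, the result field is always filled in, which mirrors the observation that a failed search is still a Result.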
We need to recall that Internet users typically find home pages using a search
engine so that the home page is likely to contain a keyword which allows the site in
question to be easily retrieved, e.g. the word recipes. A search of this type will return
vast numbers of sites relating to recipes, some of which are commercially inspired and
others not. The latter category includes sites set up by individuals and by groups of
people forming associations. These sites vary in their overall characteristics, but most
are concerned with providing a recipe and little else – in other words information
restricted to the special-interest category (see Table 3.1). Commercial websites, on the
other hand, are typically emanations of food companies and food magazines whose
major concern is with marketing and establishing ways of online selling. Typically,
these websites will have textual features including surveys and links concerned with the
sale of kitchen equipment, cookbooks and subscriptions to magazines. A comparison
of these two types of website is instructive as regards the way virtual communities are
built up. A comparative multimodal transcription can help pin down these differences.
Although Internet has only made a big social impact in recent years, it has itself
undergone major changes. Specifically, there is a growth of websites in which the
page is produced dynamically, i.e. generated by a computer program. For example,
train and plane timetables nowadays rarely take the form of pre-existing pages.
Instead the user has to build his or her own timetable from a series of parameters
including place and time of departure, place and time of arrival, return journey,
preferred cost category, number of passengers and special requirements (e.g. seating
arrangements, food requirements and special assistance due to mobility problems).
Consequently, the nature of the web page has itself changed from one dominated by
a series of links between pregiven pages to one in which the page is dynamically
created on the screen through parameter selection, providing the illusion that the
page in question has been transformed into something else. When the user attempts
to return to the previous page, a notice will frequently appear stating that the page is
no longer available. This is because the web page does not pre-exist; rather, it is
assembled from a database by a computer program. In other words, the ongoing
transformation taking place within Internet is that the metaphor of browsing and
navigation that characterised the first stages of Internet and led to definitions in
terms of links between pages produced by pre-existing off-the-shelf, take-it-or-leave-it
items is giving way to authorship. Authorship is located within the space of the same
page, which transforms itself through parameter selection on the part of the user,
one reason why home pages have become increasingly significant in websites and
why many complex websites consist of a series of home pages. Rather than moving
from one virtual site to another, the web page is a modifiable site that changes its state
through the modification of its parameters by the computer user.
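
The idea of a page that does not pre-exist but is assembled from a database through parameter selection can be illustrated with a minimal sketch. Everything named here is an assumption made for the example: the DEPARTURES records, the place names, the prices and the function assemble_timetable are invented, and a real timetable site would hold its data on a server rather than in a list.

```python
# Hypothetical departures database, invented for illustration only.
DEPARTURES = [
    {"origin": "Pavia", "destination": "Milan", "time": "08:15", "price": 4.0},
    {"origin": "Pavia", "destination": "Milan", "time": "09:45", "price": 6.5},
    {"origin": "Pavia", "destination": "Genoa", "time": "08:30", "price": 9.0},
]


def assemble_timetable(origin, destination, not_before="00:00", max_price=None):
    """Build the text of a timetable page from the user's parameter selections."""
    rows = [
        d for d in DEPARTURES
        if d["origin"] == origin
        and d["destination"] == destination
        and d["time"] >= not_before  # "HH:MM" strings compare chronologically
        and (max_price is None or d["price"] <= max_price)
    ]
    lines = [f"Timetable: {origin} to {destination}"]
    lines += [f"{d['time']}  {d['price']:.2f}" for d in rows]
    if not rows:
        lines.append("No services match the selected parameters.")
    return "\n".join(lines)
```

Calling the function twice with different parameters returns two different pages, neither of which is stored anywhere beforehand. This is one way of seeing why a request to return to the 'previous page' may fail: there is no stored page to go back to, only a set of parameters from which it could be assembled again.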

3.2. Decoupling of material support and information on the computer screen

The printed page qua material artifact is a synoptic entity. The visual tracings on a
treated surface constitute a frozen array of visual invariants which, when integrated
with the activities of visual scanning and interpretation of the reader/viewer, can
be understood as meaningful signs. In part, the illusion of permanency of the
written page and of the written text that it materially supports is due to the way in
which the relationship between the material surface (paper, plastic and so on) and
the visual tracings that are made on this surface by means of some tool or machine
(printing, engraving, writing, drawing) is fixed for as long as the surface itself or the
tracings on it do not materially decay, fade away, become erased and so on.
The material support and the tracings on it are hard-coupled for as long as the
material relationship between them endures. On the other hand, the visual images,
the linguistic texts, the audio files and so on that are dynamically assembled on the
computer screen are not hard-coupled to their material support in the same way.
Instead, the diverse sources of stimulus information (visual, auditory, kinesic and so
on) are reduced to a single abstract form, the byte, consisting of ones and zeroes,
which cannot be picked up by our perceptual systems.
The digital processing of the information that is stored in this form by
means of computer software through processes of selecting and editing means that
the data which is stored in these bytes can be dynamically assembled in newly
contingent ways according to the choices made by the computer user. This happens
before the data that is so elaborated is projected onto the computer screen or is
saved as a permanent record on a CD, DVD or hard disk. The digital technology
of the multimedia computer allows the computer itself to carry out part of the
process of elaboration of the data so that the material object-text which we see and
hear through the screen and the audio system of the computer is linked in real-time
to the processes of elaboration (selecting, editing, assembling) of the data (the
bytes) that are carried out by computer programs rather than by the computer user.
The web page is not the product of the hard coupling of a material support (e.g. a
treated surface such as a sheet of paper) and data (e.g. visual tracings made by a
drawing implement such as pen, crayon, chalk or by mechanical means such as
printing, engraving, photographic reproduction). Instead, the web page is
characterised by the soft coupling of material support (screen, CD, DVD, and so on) and
data (digital bytes).
This has two important consequences. First, some of the processes of data
elaboration are allocated to the computer’s internal processing. Secondly, this gives
the computer user the possibility of creating a dynamic and flexible web page,
rather than interacting with a pregiven one (see 3.1, pp. 105-108). The page can be
modified and updated through the actions of the computer user when he or she
interacts with the texts and objects that are displayed on the computer screen. In
semiotic terms, we can say that the relationship between the data and its material
support is unhinged. Data is coded in digital form as bytes and is dynamically assembled
into newly contingent patterns by programs internal to the computer. These
processes occur prior to the processes which subsequently convert this data into a
form which we can perceive on the computer screen.
These observations pertain to what linguists and semioticians refer to as the
expression stratum of language and other semiotic systems (see Inset 18:
Stratification, pp. 236-237). The expression stratum is the semiotically organised
material means of embodiment of a semiotic system; it is the dimension of
semiosis that we apprehend with our perceptual systems. Consider, for example,
the relationship between the tracing activity of the writer and the visual-graphic
traces that are produced by this activity on a surface and picked up by the reader
as visual stimulus information about that tracing activity. The two phenomena –
tracing activity and visible traces – are relatively hard coupled to each other. That
is, the stimulus information provides the reader with information about
environmental objects and events such as the marks on the page and the activities involved
in putting them there. The sequence of activities involved in the elaboration of the
expression stratum of the written or printed page can be schematised as follows:

WRITER’S MANUAL OR MECHANICAL TRACING ACTIVITY ON TREATED SURFACE WITH
TRACING TOOL → RESULTING VISUAL TRACES DISPLAYED ON TREATED SURFACE →
PERCEPTUAL PICKUP OF STIMULUS INFORMATION BY READER THROUGH ACTIVITIES
OF VISUAL SCANNING

In the case of the hypertext page, the sequence of activities is as follows:

COMPUTER PROGRAM ELABORATES DIGITAL INFORMATION IN THE FORM OF
ABSTRACT COMBINATIONS OF BYTES → RECODING OF ABSTRACT DIGITAL
INFORMATION AND ITS DYNAMIC ASSEMBLING ON COMPUTER SCREEN AS VISUAL,
KINESIC AND OTHER PERCEPTUAL INFORMATION → PERCEPTUAL PICKUP OF
STIMULUS INFORMATION BY READER THROUGH ACTIVITIES OF VISUAL SCANNING

The radical difference lies in the way in which the digital processing of abstract
combinations of bytes by the computer is separated from the material means of its
support, i.e. as stimulus information displayed on the screen that can be picked up
by our perceptual systems and interpreted as signs of objects, events and so on.
Many commentators on the ‘digital age’ talk about the ‘information’ that is
elaborated, processed, stored, transferred, exchanged and so on, by means of the
computer. Bytes and combinations of bytes are information. Information, as
distinct from meaning, is defined in statistical or probabilistic terms without reference
to the categories of the observer/interpreter. Moreover, the information that is
coded in combinations of bytes entails a fixed relationship between the
combinations of bytes and the information that is contained in them; it is information that
is read by a machine and interpreted according to the fixed relations established by
a computer program. It is correct to say that this information is coded in
combinations of bytes. However, the abstract coding of information in the digital form
of bytes is not in itself meaningful to a human interpreter.
The patterns of sound and light that the user picks up and interacts with
through the multimedia resources of the computer are, on the other hand, potentially
meaningful to a human interpreter. In this case, the meanings that the user creates by
interacting with this information are not coded in the patterns of light and sound
that are perceived; rather, the user interprets them as meaningful signs by integrating
them into the semiotic categories of a system of interpretance. Human semiotic
systems are not codes. A code – e.g. Morse – is based on a fixed relationship between
the information which is coded and the means of its coding. Language, gesture,
depiction and other human systems of meaning making are not codes in this sense;
they do not exhibit fixed relationships between ‘information’ and the means by
which this information is coded. The idea that language, fashion and so on, are codes
was popular in early theories of semiotics in the 1960s and 1970s, but such models
are neither realistic nor informative for theorizing and describing semiotic systems.
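
The sense in which a code rests on a fixed relationship can be shown with a small sketch. The Python below is offered as an illustration only: the four-letter Morse table is a deliberately tiny subset, and the function names encode, decode and render are our own. In each case the mapping is a lookup that a machine can apply without reference to any observer's categories.

```python
# A small subset of the Morse alphabet, for illustration.
MORSE = {"S": "...", "O": "---", "E": ".", "T": "-"}
REVERSE = {signal: letter for letter, signal in MORSE.items()}


def encode(text):
    """Replace each letter by its fixed Morse signal."""
    return " ".join(MORSE[ch] for ch in text)


def decode(signals):
    """Recover the letters by applying the same table in reverse."""
    return "".join(REVERSE[s] for s in signals.split(" "))


def render(data: bytes) -> str:
    """Recode stored bytes as text by applying the fixed UTF-8 mapping."""
    return data.decode("utf-8")
```

Nothing in these tables specifies what a reader will take 'SOS' or 'page' to mean on a given occasion. The tables fix only the relation between one set of forms and another, which is the point of the contrast drawn above between information and meaning.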
The information coded in combinations of bytes has to be translated into a
form that is accessible to a human user and the system of interpretation that he or
she uses. The information encoded in combinations of bytes has to be reorganised
by the computer’s own operations as a new type of information on a higher-scalar
level that the human user can access through his or her perceptual systems. The
computer programs that read this information and transform it into a form which
is accessible to human interpretation are, of course, designed by humans. However,
the computer programs which have these functions perform the task of
computing this information (bytes) into a qualitatively different form on the scalar level of
the human interpreter with his or her categories, interests and systems of
interpretation. In this sense, the computer programs that perform these tasks constitute
an intermediate level of organisation in a human-computer social system of relations.
The semiotic potentiality of this hybrid human-machine system can be modelled
as a hierarchical system of relations on three levels, as follows:

L+1: System of interpretance in, and through which, meanings are
recognised and interpreted (context of culture);
L: Multimedia (screen, audio) interface with a human user;
dynamic assembling of a web page as a multimodal text (context
of situation) (actual);
L-1: Combinations of bytes and their dynamic assembling as data
for elaboration by a computer program (virtual).

In general terms, the logic of the three-level hierarchy can be explained as
follows. Level L+1 designates a system of higher-scalar constraints, generally on a
larger, slower time scale, such as those of a cultural system, including its categories
and practices. Level L, the intermediate level, is the focal level, i.e. the level of
specific focus and interest. It is the level of our human-scale interactions with the
environment that we live in. Level L-1 refers to faster, smaller lower-scalar processes
that constitute the material affordances for the processes on the next level up.
Processes on this level are enabling and/or initiating conditions for processes on
the focal Level L. Processes on Level L+1 are boundary conditions in that they
place constraints on lower-scalar processes. They include, as here, the systems of
interpretance in, and through which, Level L phenomena have meaning for human
observers (see Salthe, 1993; Lemke, 2000; Thibault, 2000a, 2004a, 2004b).
On Level L, there is the real-time of the dynamic assembling of the page qua
text on the screen and the user’s interaction with it. On the level below this, there are
the much faster processes of electrical impulses and their conversion to patterns of
optical information on the screen. Above the focal level, there are processes on longer
timescales, such as the website as a whole, its possible links to other sites and the
embedding of Internet in the context of culture(s) and social networks of its users.
Processes on the L+1 scale constrain and provide a higher-scalar context for the
activities and processes on the focal level when the computer user interacts with the
web page on his or her computer screen.
Bytes per se and their combinations on Level L-1 are neither accessible nor
directly potentially meaningful to us on our human scale. They are fast, small-scale
electronic processes below the threshold of human awareness. Rather, the computer
reorganises the information coded in combinations of bytes as a qualitatively different
kind of information on Level L, the next higher level. Level L, the intermediate level
in the proposed hierarchy of relations, is the interface between the human user and
the perceptual stimulus information that the multimedia resources of the computer
make available to his or her perceptual systems. Rather than making meaning, the
computer takes part in the processes of reorganizing one type of information on one
level as a qualitatively different type of information which can be interpreted by human
observers on the next higher level in the hierarchy of relations presented above. In
the process, the separation that this implies between Levels L-1 and L means that the
computer dynamically reorganises this information in newly contingent ways
according to the parameters chosen by the user. The computer thus produces, in the
real time of the user’s interaction with the website, personalised web pages and
personalised trajectories through websites. The relatively strong cross-coupling of material
support to the informational database in the case of the written or printed page does not
permit this possibility. Computer programs semiotically mediate the relations between
abstract combinations in the digital form of bytes and the observer’s categories and
activities through the dynamic assembling of the web page from a computer database.

3.3. The relationship between web page, website, web users and web genres
In any approach to the study of websites and their component parts, such as web pages
and the multimodal clusters of objects and images found on web pages, it is
important to understand the overriding significance of genre as an organisational
principle. Genres regulate and mediate the ways we interact with each other in
society, and websites and web pages are no exception. The website as a whole has
generic features at the same time that it comprises many more specific genres. For
example, the home page is a functional component within the larger-scale structure of
the website as a whole. The home page also has the characteristics of a superordinate
genre in its own right at the same time that many of its component parts are
themselves distinctive mini-genres – linguistic, visual, musical and so on.
In many linguistic accounts, genre refers to the most global level of
organisation of a given text-structure type or activity-structure type (Hasan, 1978;
Martin, 1985a; Ventola, 1987). A genre is defined in terms of a typical beginning-
middle-end structure, as a series or configuration of stages through which texts
belonging to the given genre typically progress. Each stage is a functioning
component both in relation to the larger whole to which it belongs and in relation
to the other component parts of that whole. Some examples include:
RECOUNT: Orientation ^ Eventₙ ^ (Coda)
ARGUMENT: Thesis ^ Argumentₙ ^ (Recommendation)
[N.B. The following notational conventions apply: ^ = “followed
by”; subscript n indicates the element is recursive, i.e. can occur
more than once; round brackets indicate the item so enclosed is
optional; unbracketed items are obligatory]
A genre in this definition is a sequence of optional and obligatory elements
through which texts progress from their beginning to their end in order to fulfil some
social or communicative purpose. Recounts have the purpose of telling a
chronological sequence of events that someone experienced. A Recount typically begins with an
Orientation. This initial stage indicates who took part in the event sequence, when,
where, and so on. The Orientation is followed by a chronological sequence of actions
and/or events that are usually told in the past tense. These two stages are obligatory.
A third stage, which is optional, is the Coda. The Coda provides some retrospective
evaluation of the events recounted, and therefore provides them with some wider
significance or value. Arguments seek to persuade or convince readers or listeners to
adopt a certain point of view and perhaps to act on it or be prepared to act on it. An
Argument begins with a Thesis, which states the position to be defended. The Thesis
is followed by a series of Arguments providing information and evidence in support
of the Thesis. Arguments often, though not always, conclude with a Recommendation
to act or behave in a certain way on the basis of the arguments presented.
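
The notational conventions for these two genres translate fairly directly into a pattern that can be checked mechanically. The following Python sketch is one possible rendering, not a tool used in this book: the dictionary GENRES and the function conforms are hypothetical names, a text is reduced to a list of stage labels, and the recursive element is read as 'one or more occurrences'.

```python
import re

GENRES = {
    # Orientation, then one or more Events, then an optional Coda
    "RECOUNT": re.compile(r"Orientation( Event)+( Coda)?"),
    # Thesis, then one or more Arguments, then an optional Recommendation
    "ARGUMENT": re.compile(r"Thesis( Argument)+( Recommendation)?"),
}


def conforms(genre, stages):
    """Return True if the sequence of stage labels fits the genre formula."""
    # "followed by" (^) is rendered as a single space between stage labels.
    return GENRES[genre].fullmatch(" ".join(stages)) is not None
```

For example, conforms("RECOUNT", ["Orientation", "Event", "Event", "Coda"]) holds, whereas a sequence that omits the obligatory Event stage does not. A check of this kind captures only the sequential ordering of stages, not the social purpose that the stages serve.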
In what way can the notion of genre be applied to the home page? Does the
notion of genre described here readily apply to the website? In this section, we
shall try to give some preliminary answers to these questions.
One significant difference with respect to linguistic genres such as those
mentioned above is that the notion of sequential organisation does not apply in the
same way. The web page is, in the first instance, a visual-spatial unit which is
displayed on the computer screen. Consequently, it is important that we attempt to
understand and describe its generic features with this in mind. Given that there is
considerable variation in the way web pages are organised, how can we go about
identifying generic features which are common to the web page in general? What
are the component parts of the web page? How do they relate to each other? What
semiotic and material resources are typically codeployed? And how?
In Figure 3.1, we have shown, in a highly schematised way, the layout of a
typical home page. The diagram suggests some of the typical elements in this type of
web page as well as a typical combination of these elements. At the same time, we
want to stress that the positioning of the elements in Figure 3.1 is not intended to
suggest that these elements always occur or that they necessarily occur in this
particular combination. We are trying to describe in a highly schematic way some of
the typical components of a web page and their relations to each other. Even a cursory
survey of web pages will show that there is considerable variation in the way that
elements are arranged on the screen. However, we feel that, notwithstanding this
diversity, it is possible to identify a number of elements and combinations of elements
that are typical of web pages. In this respect, the home pages from two
sites relating to children, Nasa Kids and British Museum Children’s COMPASS, that
we discuss later in this chapter conform to this schema to a lesser extent than
many other websites but nevertheless exhibit many of the elements featured.
The idea of genre as a staged, goal-oriented schema comprising a particular
sequencing of functional components – both optional and obligatory – is less
appropriate for talking about web page genres. This view very much puts the locus
of control in the genre schema; the writer or speaker is required to create a text
which conforms to the requirements of the schema, though this does not exclude
variability and creativity. The genre is a metadiscursive construct which functions as a
point of reference for the activity; for example, it specifies the sequential ordering
of items in a determinate beginning-middle-end type of structure and also represents
the global organisation of the particular text-type. In this view, generic structure
potential is a pedagogical device which teachers and learners in classroom writing
activities can use as a model or a template for controlling their own writing activity.
It is a device which enables both teacher and learner to look at their textual
processes and make them accessible to conscious awareness and control (Martin,
1985b; Thibault, 1989). In this view, genre is a metalevel structure which enables the
explicit manipulation of textual processes. In some accounts influenced by cognitive
science and artificial intelligence, the schemata were seen as internalised
representations of, for example, story grammars operating on and essentially
redescribing lower-level cognitive processes as symbolic representations corresponding
to schemata for particular kinds of familiar activities e.g. going to a restaurant
and text-types e.g. stories (Rumelhart, 1975; Schank, Abelson, 1977).

Top Banner. Site name; Picture.

Top Bar. This is typically a menu which is thematically-related to the site name and
which provides links to further pages, including other home pages relating to subsites.

Left panel. This often gives access to the latest information in the form of:
1. Search engines
2. A tabulated index creating links to pages
3. Other links

Top centre-right panel. This forms a grid on thematic grounds with verbal and visual
clusters. Typically, the clusters will form a repeating pattern arranged as a grid-like
structure or as a vertical list with links to other pages. The clusters form a supercluster
identified through a heading at the top. The clusters in the superclusters are hyponyms
to the heading at the same time that they are cohyponyms of each other. A cluster, like
a supercluster, is characterised by a semantic homogeneity of the component items.
Sometimes this supercluster will contain a search engine typically in the bottom
right-hand corner.

Bottom centre-right panel. This panel is organised as above but with a different set of
elements. It often happens that this panel is dedicated to a special feature.

Bottom Bars. This panel usually contains a combination of clickable and non-clickable
information. This area typically displays the website’s small print such as FAQ’s, privacy
statements, legal disclaimers and related notices, copyright, troubleshooting advice, site
map information, webmaster and contact us information. It can also contain menu bars
that are alternative or additional to the Top Bar.

Figure 3.1: Web page genre schema

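
As a purely illustrative aid, the regions of the web page genre schema in Figure 3.1 can be written down as a small data structure. The Python sketch below is one possible encoding and is not part of the original analysis: the class name Panel, the field names and the helper panels_containing are hypothetical, and the content labels are paraphrased from the figure.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Panel:
    """One region of the web page genre schema in Figure 3.1."""
    name: str
    typical_content: list[str] = field(default_factory=list)


# Regions and their typical contents, paraphrased from Figure 3.1.
WEB_PAGE_GENRE_SCHEMA = [
    Panel("Top Banner", ["site name", "picture"]),
    Panel("Top Bar", ["menu related to the site name", "links to further pages"]),
    Panel("Left panel", ["search engines", "tabulated index of links", "other links"]),
    Panel("Top centre-right panel",
          ["supercluster heading", "grid or vertical list of clusters",
           "search engine (sometimes)"]),
    Panel("Bottom centre-right panel", ["special feature"]),
    Panel("Bottom Bars",
          ["FAQs", "privacy statements", "legal disclaimers", "copyright",
           "site map", "contact information", "alternative menu bars"]),
]


def panels_containing(term: str) -> list[str]:
    """Return the names of the panels whose typical content mentions `term`."""
    return [panel.name for panel in WEB_PAGE_GENRE_SCHEMA
            if any(term in item for item in panel.typical_content)]
```

Calling panels_containing("search engine") on this encoding picks out the two regions in which, according to the schema, a search engine typically appears.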
In actual fact, the genre schemas used in the teaching of writing to children
in the classroom are reified templates which control or are used to control the writ-
ing activity. Text is emergent in activity, a product of meaning-making activities.
Our meaning-making activities produce textual products – semiotic object-texts –
that we can both perceive and manipulate in the real-time activity of producing
them. We see and hear the emergent products of our activity and we map those
perceptions back onto the activity, in the process creating semiotic structures
(texts) that we attribute meaning and value to (see Inset 14, pp. 175-177).
Inset 10: The trajectory

• The term trajectory is used in this book to refer in particular to the meaning-
making pathways that are created when users of websites create links from one
web page to another, from one website to another, and so on, as they navigate or
author their way through a website or from one website to another. A meaning-
making trajectory in this sense refers to the progressive integration over time of the
semiotic resources that are encountered as the website user progresses from one
linked object, one text, one web page, one website to another. A trajectory may
last mere seconds or minutes or it may occur over much longer periods of time,
as well as being picked up and resumed across separate occasions.
• From the analytical perspective, the notion of trajectory is used to reconstruct
users’ pathways through websites and to investigate the organisational principles
of such pathways as a form of multimodal text. In this sense, a trajectory
is also a textual record or a trace of the progressive integration over time of the
meaning-making resources that the website user encounters. It is the entextualisation
of the web user’s meaning-making activity. As such, it displays properties
of continuity and coherence qua meaning-making trajectory. The
examples given in 3.6 and 3.7 (see pp. 126-136) indicate that transcription
of trajectories will include descriptions of local resource configurations
such as clusters (see Inset 5, p. 31) and phases (see Inset 7, p. 47).
• The multimodal analysis and transcription of such trajectories can reveal the
ways in which the trajectory integrates diverse semiotic resources to itself as it
develops and unfolds in time. Possible trajectories are afforded by the
resources – both technological and semiotic – of websites. By the same token,
the recording and analysis of trajectories will provide insights into the ways in
which users experience websites and their possible meanings. It will also be
able to show the extent to which trajectories have generic and individual
characteristics in their semiotic make-up.
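
The idea of a trajectory as a textual record of a user's pathway can be made concrete with a short sketch. The Python code below is a hypothetical illustration and not the transcription system used in this book: the Step and Trajectory classes, their field names, and the example page names and timings are all assumptions introduced here.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Step:
    """One move along a trajectory: the user activates a linked object."""
    page: str            # page on which the user is located
    linked_object: str   # object (e.g. a cluster) that the user activates
    seconds: float       # time elapsed since the trajectory began


@dataclass
class Trajectory:
    """A record (trace) of a user's pathway through one or more websites."""
    steps: list[Step] = field(default_factory=list)

    def follow(self, page: str, linked_object: str, seconds: float) -> None:
        """Append one step to the record."""
        self.steps.append(Step(page, linked_object, seconds))

    def pages_visited(self) -> list[str]:
        """Pages in order of first visit, without repeats."""
        seen: list[str] = []
        for step in self.steps:
            if step.page not in seen:
                seen.append(step.page)
        return seen

    def duration(self) -> float:
        """Seconds between the first and the last recorded step."""
        if not self.steps:
            return 0.0
        return self.steps[-1].seconds - self.steps[0].seconds


# Hypothetical example: page names, objects and timings are invented.
t = Trajectory()
t.follow("home page", "menu icon", 0.0)
t.follow("menu page", "thematic node", 12.5)
t.follow("home page", "search engine", 40.0)
```

A record of this kind preserves the order and timing of the user's choices, which is the minimum needed to ask whether different users' pathways share generic characteristics.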

In this view, the genre schema is not prior to or a cause of the emergent system
of relations or of any of its components. The genre schema is not a prior cognitive
representation in the head which computes global order for texts. Instead, the genre
schema is a semiotic tool, and therefore just one of the components which interacts
with all the other components that are involved in a distributed network of activities
in time which give rise to texts. In the case of relatively stable and fixed genre
schemas such as those mentioned above, the generic structure potential that is invoked
in the writing activity acts as a locus of control over the construction of the required
text-type. Genre is a technology with which the writer interacts in the process of
creating a text and which exerts its own agency on that process. For example, the
meaning-making trajectory of a given instance of a genre such as Narrative or
Argument must pass through a certain number of stages in a determinate sequence
and it must reach some kind of semantic closure.
The genre schema thus conceived is a stable attractor space which imposes its
own constraints on both the text-producing activity and the outcome of that
activity. In a particular context, activity is assembled in response to the various con-
straints imposed by the subsystems and their components that come together and
constitute that context. A stable genre schema is robust to local change and insta-
bility; stable global configurations of textual elements in particular instances con-
form to varying degrees to particular genre schemas. Web pages and websites not
only give rise to new genres, new combinations of semiotic modalities, new inte-
grations of meaning-making activity with technologies; they also signal a loss of sta-
bility of many of the precursor genres and forms and the relations among these that
we still find in websites. Hypertext and its conventions can be seen as newly emer-
gent solutions to this instability in the system. There are no a priori hypertextual
schemas or templates which provide solutions in advance. Instead, new solutions,
and therefore new genres and new relations between old genres, are created through
the navigation of this space.
When we encounter the home page of a website, we have before us a much
more open-ended set of possibilities as compared to the stable generic forms men-
tioned above. Whereas these forms and their integration to writing activities require
the writing activity to be organised around stable global solutions in the form of
staged, goal-directed sequences of schematic structure elements that exert principles
of strong classification and strong framing (Bernstein, 1990 [1981]) over the activity, the
web page shifts the locus of control back to the computer user. The emphasis is to
a much greater extent on variability, flexibility, multitasking, the fluidity of semiotic
resources and context-sensitive local effects arising, for example, from rolling the
mouse over or clicking one object on the page rather than some other. There is no
single, determinate starting point or sequential organisation. There is no single or
privileged causal or other factor which initiates or controls the activity. Rather, there
is a redistribution of component subsystems and their related resources such that
the locus of control shifts away from a stable genre schema to the computer user as
the maker and improviser of solutions. This does not mean that hypertext and its
associated activities are unstructured. What hypertext brings to the fore – some
might say celebrates – is that there are no a priori structures that cause or guide
meaning-making activity from start to finish. Instead, there is a multiple, parallel,
open-ended, backlooping interplay of texts, genres, semiotic modalities,
technologies and the user’s perceptions and actions. It is the interaction among all
these factors that gives rise to stable solutions in time.
The writing of traditional written genres is also a form of distributed activity
in time, although, as we saw above, the locus of control is more strongly focused on
the genre schema as a stable attractor, such that regularity and purposefulness are
emphasised. Hypertext turns this emphasis on its head. In the real-time navigating of
a trajectory through a website, there is an interaction of all the factors mentioned
above; a definable meaning-making pathway emerges and takes shape. Hypertext
brings to the fore these processes in often highly self-conscious ways: local variability,
fluidity and context specificity are emphasised over global order and stability.
Many commentators on hypertext have drawn attention to the features we have
mentioned here. We are not original in mentioning them again. What is absent in most
discussions, however, is any detailed account of the meaning-making process and the
codeployment of semiotic and material resources that takes place during this process.
The process is simply taken for granted and celebrated as such. And yet the premises
for a better understanding of web pages and hypertext are often ill-defined, without
any basis in the detailed analysis of hypertext and its associated activities. Questions
such as the following remain unaddressed: How are semiotic resources integrated
along a hypertext trajectory? What kinds of organisation does this trajectory embody?
How do technologies and semiotic resources co-operate in the development of
hypertext? In this chapter, we seek to give substance to these questions and to indi-
cate analytical solutions to the analysis and transcription of hypertext.

3.4. The home page

A home page is home to the other pages in a website; it provides the links which
enable users to access the other pages in the website. The home page is the gateway,
and therefore the user’s point of entry, to a website and its meanings. For these
reasons, the designers of web pages place a lot of emphasis on the construction of
the semiotic space in which textual objects are displayed and arranged in relation to
each other, as well as in relation to the viewer. Home pages also often function to
evoke cultural institutions, places and so on. The design of the web page and the way
it presents itself to the viewer are important considerations which also relate to the
ways in which the viewer navigates the virtual space of the website in moving from
the home page to other connected pages within a website or across websites. The
visual, spatial, auditory and linguistic features that contribute to the design of a home
page and its meanings convey more than just information. They also contribute to its
interpersonal appeal, to the evocation of affective responses, to the indexing of social
values and to the creation of atmosphere. Colour, spatial perspective, the depiction of
natural landscapes, architectural sites, persons and so on, can all function to obtain
an interpersonal orientation to the website on the part of the viewer.
This emphasis on the interpersonal appeal of many websites is also a reflex of
the increasing commercialisation of the Internet, along with the shift in emphasis
from production to consumption in post-Fordist economies and the concomitant new
emphasis on the relationship with the client. In such an
environment, the semiotic engineering of meanings and texts itself becomes a primary
economic imperative in the new information age. This also means that the World Wide
Web is currently the site of a struggle. On the one hand, there are the economic and
political interests of advanced capitalism, which see in the Web a vast market to be
manipulated and exploited through the buying and selling of products online, along
with the semiotic strategies of persuasion and manipulation that this necessarily entails.
On the other hand, there are those who use the Web as a means of creating and
sustaining new forms of community, new ways of making meanings and new identities
that constitute alternatives to the vertical hierarchies of traditional media such as
television and to the patterns of consumption of goods, services and meanings demanded
by the post-Fordist economy. As suggested in Table 3.1, many websites reflect the tension between
the two tendencies in their efforts to negotiate between a concern with, on the one
hand, knowledge and learning and, on the other, the need for institutions to be
accessible to the general public in this age of mass consumption and mass entertainment.
As we shall see below, in connection with the Nasa Kids home page, the use
of contrasting bright and dark colours, the sense of opening out to new hitherto
unexplored horizons in outer space, and the grounding of the visual perspective of
the viewer in a particular physical location all function in semiotic partnership with
each other to create a sense of openness and the need to move away from the
familiar in order to explore new horizons. The semiotic design of the web page
therefore requires careful consideration of the ways in which viewers will feel wel-
comed and comfortable about the site, the institution or person it may give voice
to, not to speak of the technology of the web page for those who are first-time
users or who have had little experience in navigating their way through websites.
This last point also draws attention to the need to give website users easily
accessible (user friendly) points of access for finding and engaging with the objects,
the texts and the links that constitute a website, so that they will come to feel ‘at
home’ with its meanings and practices.
At this stage, we can ask the question as to how the Nasa Kids and British
Museum Children’s COMPASS home pages organise their potential meanings, their
relations to their targeted users, the kinds of reading pathways they afford and their
potential for particular kinds of interaction between user and page.

3.5. The Nasa Kids home page

Figure 3.2 presents the Nasa Kids home page (http://www.nasakids.com/) which
invites the viewer to enter into a virtual world that we immediately make sense of
even though it is a world relating to extraterrestrial space and space travel that is not
part of the everyday experience of most of us. Nevertheless, it is a world that we can
make sense of without feeling disoriented. We do so by making the information that
is presented by the visual scene converge into interobjectivity (Latour, 1994) on the
basis of our previous experiences of the objects presented and the relations among
them. Our previous experience and knowledge of these objects is, above all, an
intertextual one (see Inset 8, p. 55). We are familiar with, or assumed to be familiar
with, photographs and film clips of the Apollo missions to the Moon between 1968
and 1972, close-up images of planets (e.g. the planet Saturn with its rings as seen
through the eyes of space probes such as Cassini and telescopes such as Hubble),
science fiction stories, films and comic strips of journeys into outer space as well as
artists’ conceptions of proposed future manned bases on the Moon’s surface, and so
on. These are all texts – verbal and visual – which we have all encountered in other
contexts in the media, in the school science classroom, in science fiction books and
films, and so on.
On the basis of this knowledge, we are able to make the various objects and
locations that are presented converge into a coherent visual presentation of a
virtual world. The use of schematic, cartoon-like images without too much detail
and without any attempt at scientific accuracy helps here. We are given just
enough information to bring about a plausible interobjectivity; we accept the scene
as it is presented to us without feeling the need for more detail to determine the
acceptability to our senses and our intellect of the world presented. When we
view this world we do not feel disoriented even if the true scale of the
relationship between, for example, the Earth and Saturn is seriously misrepre-
sented in terms of both relative size and proximity to each other in our solar
system. The depicted scene is not concerned with presenting a scientifically true
representation of the scalar relationships between these objects. Nor is it con-
cerned with presenting these objects as we would truly see them under normal
perceptual conditions on the surface of the Earth. In other words, the visual
modality of the scene is neither scientific truth nor perceptual realism.
The cartoon-like character of the overall scene and its objects projects a fan-
tasy world in which actuality, futurology and playfulness are blended. The scene fore-
grounds the perspective of someone looking back towards the Earth from the
Moon’s surface. There is a clear visual intertextual tie to the many widely publicised
photographs from the Apollo missions to the Moon showing the Earth as seen from
the lunar surface or from lunar orbit. By the same token, there is also a
recontextualisation of the visual genre of the realistic photograph to the cartoon genre

[Figure: screenshot of the Nasa Kids home page with the clusters numbered 1 to 20]

A. Clusters responding only to mouse click:
Cluster 1: This is a Nasa Logo + Masthead cluster. The masthead is the name of the site.
   This cluster consists of the two objects, the logo and the masthead. The cluster
   anchors the meaning of the page as a whole in which the visually prominent colours
   tie the notion of a site for kids with the Nasa space agency. The Masthead is
   repeated throughout the page. The choice of bright arresting colours and animated,
   cartoon-type images (e.g. the rocket, the space creature) foregrounds the
   interpersonal dimension, rather than naturalistic representations;
Cluster 2: Nasa News: link to other NASA websites, not just those for kids;
Cluster 3: Stories by kids: as above;
Cluster 4: Solar Flare: link to NASA KIDS CLUB Art Gallery;
Cluster 5: Hey Kids...coming soon: link to Rockets page;
Cluster 6: Teacher’s Corner: link to teachers resource site.

B. Clusters responding to rollover and mouse click:
Cluster 7: Saturn: on rollover, becomes yellow with Saturn’s rings pulsating; the wording
   Space & Beyond appears in red indicating link to page with the same name; when
   clicked, goes to the Space & Beyond home page;
Cluster 8: The Rocket: on rollover, becomes yellow; the wording Rockets & Airplanes
   appears in red indicating a link to page with the same name; when clicked, goes to
   the Rockets home page;
Cluster 9: Nasa: on rollover, as above but with the wording NASAtoons; when clicked,
   goes to the NASA Toons home page, dealing with animations;
Cluster 10: (Moon base) on rollover, as above, but with the wording Astronauts, living in
   space; when clicked goes to the Pioneers home page;
Cluster 11: (Extraterrestrial) on rollover, as above, but with the wording Projects &
   Games; when clicked goes to the home page of Games website.

C. Self-activating clusters unresponsive to rollover and mouse click (compare positions
and presence with Figure 3.3):
Cluster 12: The geyser: a discontinuous cluster with three geysers, one on the left, one
   central, one on the right, alternately spouting from the Moon’s surface in a
   centre-left-right sequence;
Cluster 13: The astronaut: exits from the Moon base, moves leftwards to the foreground
   before returning to the base in a clockwise circle that shifts from the foreground to
   the middle ground;
Cluster 14: The Moon buggy: moves on the rim formed by the Moon’s surface in a
   clockwise direction becoming larger when foregrounded and smaller when
   backgrounded.

D. Self-activating clusters responsive to mouse click:
Cluster 15: The plane and its banner: moves across the screen from right to left; when
   the plane is clicked, the user is linked to the NASA KIDS CLUB Art Gallery home
   page; when the banner is clicked, the user is linked to the Connect the Stars home
   page (a game).

E. Self-activating clusters responsive to rollover and mouse click:
Cluster 16: The Earth: this cluster rotates simulating the real Earth’s rotation; like
   Clusters 7-11, on rollover, the cluster becomes yellow with red wording, in this case:
   Our Earth. Similarly when clicked, the user is linked to the Earth home page.
Note: Clusters 12-16 all interpret different cycles of movement.

F. Clusters responding to type-in and mouse click:
Cluster 17: Search engine: when a word such as Moon is typed in, and the word Go is
   clicked, a database is searched and an appropriate page is returned.

G. Inactive Clusters (dotted to show they represent a much larger area):
Cluster 18: Milky Way: dark grey with white spots representing stars;
Cluster 19: The Moon’s surface: dark yellow colour;
Cluster 20: Deep Space: represented by pure black.

Figure 3.2: Nasa Kids Home Page: cluster analysis



Inset 11: Visual transitivity frames


• The analysis of visual semiosis requires new tools of analysis. One such notion is
that of visual transitivity frame (Baldry, Thibault, 2001, 2005). Visual transitivity
refers to configurations of a process, the participant(s) in that process, and any
associated circumstances in a visual text (Kress, van Leeuwen, 1996: Chap.2). The
notion of visual transitivity frame is meaning based. Specifically, visual transitivity
frames are based on the experiential dimension of meaning in visual texts. They
are a small-scale unit of meaning which can be integrated to larger-scale units such
as the shot or the phase in video texts. This encourages us to go beyond small-
scale units per se in order to see how they relate to, and integrate with, units on
other levels of textual organisation.
• In our work, visual transitivity frames are an important analytical tool in the tagging
of visual meanings in text for the purposes of creating a multimodal corpus
made up of various kinds of units in a visual grammar (Baldry, Thibault, 2005).
The visual transitivity frame is just one such unit. On occasions, visual transitivity
frames coincide with a single shot; on other occasions, they are distributed across
more than one shot. The former are intra-shot transitivity frames (e.g. Shot 1 in the
Mitsubishi Carisma advertisement; see Appendix II); the latter are inter-shot
transitivity frames (e.g. Shots 12-13 in the same advertisement). In Shot 1, the
transitivity frame is created within the one shot by the relationship between the car
(Agent), the movement vector of the car towards the telephone box (Process), and
the stationary telephone box (Target). In Shots 3-4 and 12-13, the visual transitivity
frame is distributed across two shots. In these cases, the cut itself plays its own
role in connecting the two participants (man and woman) in a verbal process of
talking to each other on the telephone.
• Visual transitivity frames can form part of a larger narrative or other sequence on the
discourse level. We can observe how each frame in the snow-shovelling sequence
below depicts a particular action, or a phase in some overall activity, at the same time
that there is a clear beginning-middle-end type structure in which the actions of the
main participant bring about progressive change until those actions bear fruit;
hence the satisfied and restful posture of the Actor
in Frame 5 as compared to the Actor-Vector: Action-Goal type transitivity frames
signifying the actions that are evidenced in Frames 1-4. The vector formed by the
snow shovel connects the Actor to the Goal at the same time that the progression
from one frame to the next signifies the change in time as the snow is
progressively cleared away from the wooden area in question.

[Image sequence: snow-shovelling, Frames 1 to 5]

of the Nasa Kids home page, with its blend of fantasy, play and science. The use of
the cartoon genre is motivated by the following criteria: (1) it appeals to the sense of
fun and enjoyment of the young reader; (2) it can hybridise diverse visual domains such
as the realistic photos from the Apollo missions and the fantasy world of the
cartoon; (3) it can negotiate the shift away from the real world to the virtual world
that the home page projects. Some of the visual intertexts that are evoked and
recontextualised in this process include:
• Nasa photographs and films taken during the Apollo missions to the
Moon;
• the space rocket in classic science fiction stories and their film versions,
e.g. Jules Verne, Buck Rogers, and so on;
• photographs of distant planets taken by space probes and telescopes
such as Hubble;
• the stereotyped comic strip or cartoon alien (friendly green, bug-eyed
monster);
• artists’ conceptions of future Moon bases;
• the familiar sight of an airplane towing a display banner for advertising
purposes.
The viewer is thus positioned as imaginer and traveller in a virtual world. The depicted
scene, with its unrealistic scalar representations, positions the viewer as located on the
lunar surface (here you are) looking back towards the Earth (where you’ve come from)
and beyond towards Saturn and the Milky Way galaxy in the more distant background
(where you’re headed to). The initial positioning, along with these shifts in perspective,
functions to position the viewer variously as traveller on an imaginary journey, as
knowledge seeker, as imaginer of hypothetical worlds beyond the actual, as adventurer
and as someone to be entertained, who wants to have fun, along the way. As we have
mentioned above, the depicted scene raises questions about scale; for example, in
relation to the relative sizes of Earth and Saturn, their positioning in the solar system,
and their distance from each other. However, the real value of the scene lies in the
imaginary world which it projects. Moreover, this is from a particular vantage point,
i.e. the Moon’s surface. The message seems to be: our Earth is not the only vantage
point from which to view both ourselves and the Universe; therefore, we need to
expand our earthbound perspective to take in those of other worlds.
The visual semiosis of the Nasa Kids home page therefore decouples perceptual
invariants (see Inset 11: Visual transitivity frames on the facing page and Inset 15:
Gibson’s optic array, p. 192) from the stimulus flux of the perception of real-world
events under natural conditions and manipulates and rearranges them as possible or
imaginary scenes. In such cases, as here, visual deixis nevertheless functions to ground
the viewer and the viewer’s perspective – i.e. standing on and observing from the lunar
surface – in an orientational field which is based on transformed and rearranged visual
invariants. Hence, the viewer of the Nasa Kids home page can place himself or herself
in an imaginary environment – a virtual world – by means of the deictic resources of
depiction that are used to ground the viewer and his or her perspective in relation to
the depicted scene. The Nasa Kids home page places the viewer in an immediate
context of situation (here’s where you are and what you’re looking at). By the same
token, the interobjectivity of the depicted scene is based on a cultural consensus con-
cerning the products of human cultural and technological evolution at the same time
that their interobjectivity bestows on them a partially independent status with respect
to the human world with which they interact and enter into various forms of sym-
biosis. In another sense then, the individualised appeal on this home page to the
particular viewer (here YOU are) is generalised by the absence of specific human
participants to refer to the present-day state-of-the-art of human exploration of
space, past, present, and future, in which we can all participate in various ways through
our personal computer. Interpersonal appeal engages the user in an interactive
relationship with the screen world and the activities and meanings it affords.
The salient objects – the rocket ship, the Moon base, the Earth, Saturn – attract
the eye, and in so doing they direct the viewer’s attention towards specific thematic
domains that can be activated by clicking on these objects. Each of these objects both
specifies an action procedure and indexes a thematic domain. When the mouse arrow
is rolled over them, they change colour by switching from multi-coloured to yellow
and show a superimposed verbal item, such as Rockets & Airplanes in the case of the
rocket or, in the case of the highlighted object in Figure 3.3, NasaToons. In this way,
the given object specifies a superordinate thematic item in
such a way that the object both grabs the viewer’s attention and invites him or her to
click on the object to explore its thematic potential.

[Figure: screenshot of the Nasa Kids home page; callout: cluster highlighted after mouse rollover]

Figure 3.3: Nasa Kids home page (focus on NasaToons object illuminated)

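
The interaction classes A to G listed in Figure 3.2 amount to a classification of clusters by the user actions they respond to and by whether they move of their own accord. A minimal Python sketch of this classification follows. The Cluster class, its field names and the classify function are hypothetical and introduced only for illustration; the properties assigned to each example cluster are taken from Figure 3.2.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    number: int
    name: str
    self_activating: bool = False  # moves or animates without user input
    on_rollover: bool = False      # changes appearance on mouse rollover
    on_click: bool = False         # follows a link on mouse click
    on_type_in: bool = False       # accepts typed input


def classify(c: Cluster) -> str:
    """Map a cluster to the class letter (A to G) used in Figure 3.2.

    Combinations not attested in the figure fall through to the
    nearest class; this is a simplifying assumption of the sketch.
    """
    if c.on_type_in and c.on_click:
        return "F"
    if c.self_activating:
        if c.on_rollover and c.on_click:
            return "E"
        if c.on_click:
            return "D"
        return "C"
    if c.on_rollover and c.on_click:
        return "B"
    if c.on_click:
        return "A"
    return "G"


# A selection of the twenty clusters, one per class.
clusters = [
    Cluster(1, "Nasa Logo + Masthead", on_click=True),
    Cluster(7, "Saturn", on_rollover=True, on_click=True),
    Cluster(12, "The geyser", self_activating=True),
    Cluster(15, "The plane and its banner", self_activating=True, on_click=True),
    Cluster(16, "The Earth", self_activating=True, on_rollover=True, on_click=True),
    Cluster(17, "Search engine", on_type_in=True, on_click=True),
    Cluster(20, "Deep Space"),
]
classes = {c.number: classify(c) for c in clusters}
```

Encoding the clusters in this way makes it explicit that the seven classes arise from only four binary properties, which may be useful when tagging clusters in a multimodal corpus.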
The dominant colour contrast is between the bright yellow of the Moon’s
surface and the black of outer space as it recedes into the background away
from the observer’s perspective on the Moon’s surface. Outer space is sometimes
depicted as a lonely and hostile environment, e.g. in the science fiction movie Alien
(1979). In this film, the vastness and darkness of outer space are seen as a hostile
environment in which humans are exposed to great dangers. The Nasa Kids home page
instead uses this contrast, with the bright yellow of the Moon in the foreground, to evoke
a sense of emotional warmth and openness (cf. for example, the use of colour in: the
Children’s COMPASS website in 3.7, pp. 130-136; the Eskimo text in 4.1, pp. 167-173;
the Westpac text in 4.2, pp. 174-181 and Appendix I). The Moon is depicted as
a home-away-from-home, as shown by the presence of an astronaut’s living quarters,
the Moon buggy roving its surface, and the astronaut who is walking in close proximity
to the Moon base. The receding blackness of outer space is itself populated with
brightly coloured objects such as the Earth and the planet Saturn. Rather than evoking
fear and desolation, this space invites the viewer to feel safe and secure in exploring
it (see 1.1, pp. 4-21).
Interpersonally, the Nasa Kids home page codeploys visual resources such as
colour, perspective, the juxtaposing of objects and so on. In this way, it positions itself
and the viewer in a complex heteroglossic space in which the meanings and values of
diverse domains of social practice are negotiated and hybridised (see Inset 19, pp. 245-
247). In particular, these domains include: education; entertainment; creating public
interest in and recruiting people, especially young people, to Nasa’s scientific and
technological mission and values. The use of colour as a semiotic resource and a
cartoon-like genre of visual depiction in contrast to, say, a realistic one are functional
choices in this particular semiotic environment, fulfilling functions such as the
following:
• make the depicted world appealing to/enjoyable for the viewer;
• conjoin the viewer to a set of shared community values and meaning
orientations;
• inform the viewer about Nasa’s achievements and discoveries;
• evaluate Nasa’s achievements and discoveries;
• interact with the viewer by answering his or her questions, responding
to his or her search inquiries and so on;
• direct the actions of the viewer in relation to the manipulation of
textual objects, the use of web page resources (audio, video), and
how to go to other sites of interest or relevance to the viewer.

3.6. Creating a hypertext pathway

A single mouse click can initiate a semiotic cascade on account of the multiplying
potential created by the synergy among diverse multimedia genres. As the hypertext
trajectory unfolds (see Inset 10, p. 116), it expands into a much larger-scale semiotic
formation in which diverse genres, modalities, web pages and websites are pro-
gressively integrated with each other. As we shall see, this process is very different
from the linear or sequential character of the generic structure potential of many lin-
guistically realised genres such as Recount, Narrative, Argument and so on (see 3.3,
pp. 113-118). The logogenesis of a particular pathway through a website or from one
site to another can be described as an activity structure that collects the effects of its
own cascading along the duration of the pathway. Thus, a hypertext pathway collects
thematic meanings, diverse semiotic modalities and genres and integrates them to its
own activity as meaning is accumulated along its logogenetic (meaning-making)
trajectory. Let us take the Nasa Kids home page as an example. Figure 3.3 on the pre-
vious page shows the starting point of our chosen pathway, the NasaToons linked
object (marked by an arrow). The NasaToons icon is a superordinate item (see 2.6, pp.
80-90) which indexes and, if clicked, allows access to a series of animated films
about various Nasa activities such as the one shown in Figure 3.4. Clicking on the
NasaToons icon creates a pathway to another page (see Figure 3.5 on the following
page) in the form of a NasaToons menu of options.

Figure 3.4: A Far Out Pioneer page [image not reproduced; clusters are labelled 1, 2, 3, 4a, 4b and 5]

This menu of options is a set of thematic nodes, each in the form of a visual icon
and a corresponding verbal caption. This page in effect expands the NasaToons icon
on the home page into a much wider set of thematic areas and associated activities
in both the verbal and visual semiotic modalities. Each of these nodes itself provides
access to a thematic area that the given higher-order node specifies. By selecting and
clicking on the verbal-visual node A Far Out Pioneer in the top right-hand corner,
which we have blown up in the inset (Figure 3.5), access can be gained to the page,
shown in Figure 3.4, about the Pioneer 10 space probe, which was launched in March
1972. This page deploys the following genres and semiotic modalities:
• Cluster 1: The generic title of the page as a whole: New Science;
• Cluster 2: The specific title: A Far Out Pioneer + icon indicating the availability of audio;
• Cluster 3: The date of posting of the page, i.e. 25th August, 2001;
• Cluster 4a: A short verbal text in the form of a Recount which chronicles the history of
  the Pioneer 10 space probe from its launch in March 1972 till the recent
  redetection of its signal in interstellar space; this text foregrounds the temporal
  staging of key events in the thirty-year history of the probe; overall, the verbal
  Recount functions as an Orientation for the meanings of the page as a whole;
• Cluster 4b: The verbal Recount is closely integrated with a visual image depicting an artist’s
  conception of the Pioneer 10 space probe in deep space; the visual image
  can be seen as a hypotactic extension of the meanings of the verbal text,
  which are primary in this case; in other words, the visual image adds further
  dimensions of meaning in the naturalistic visual modality by showing how
  the spacecraft really looks out there in deep space;
• Cluster 5: A second verbal text, comprising the imperative clause Check out this
  NasaToon for more about Pioneer 10, both specifies a particular procedure to
  follow and indexically points to the embedded NasaToons animated
  video clip which immediately follows this clause;
• Cluster 6: If we click on the animated film clip, we can watch a seven-minute educational
  video in which narration, animation, graphics and diagrams are used both to
  explain the scientific principles underlying the redetection of Pioneer 10 in
  deep space well beyond the orbit of the planet Pluto and to provide some more
  historical detail; in any case, it is the film clip, in contrast to the verbal Recount,
  which foregrounds the scientific meanings associated with Pioneer 10;
• Cluster 7: The left-hand column of the page specifies links to other video texts about
  Pioneer 10, the Pioneer 10 Home Page of Nasa’s Ames Research Center, and so
  on.
The above analysis shows how the unfolding activity sequence involves the progres-
sive expansion and integration of meanings and actions along its trajectory. The
various stages in this sequence are schematised in Table 3.2, which transcribes the
main stages in the unfolding activity sequence. In the present example, the analysis
stops with the New Science page. However, it is possible to continue the pathway in
any number of ways; for example, by clicking on the Pioneer 10 Home Page icon in
order to go to that page, or going back to the NasaToons menu or the Nasa Kids
home page in order to select other objects which enable us to open other thematic
domains and their related activities. Nevertheless, this brief analysis is enough to show
that the meanings made along a given trajectory differ in important ways from the
sequential unfolding of the generic structure potential of verbal texts such
as Recounts, Narratives, Arguments and so on. For a start, the hypertext trajectory is
much more open-ended; it does not have a definite beginning-middle-end type of struc-
ture, and for this reason it does not feature the same kind of semantic closure that is
characteristic of these linguistic genres. Instead, there is a progressive opening up of
thematic regions and genre possibilities as the developing trajectory navigates a path-
way across genres, semiotic modalities, activities and meanings.
This page exhibits a fairly high degree of semiotic condensation, which is typical
of many web pages. This is so in two related senses. First, there is a spatial juxtaposi-
tion of different texts, different genres of text, different semiotic modalities and
different associated technologies on the same page in a screen environment which
affords the user interaction with, and manipulation of, these texts and the objects that
are embedded in them. Secondly, the processes referred to in this first point also mean
that many processes on much longer space-time scales, e.g. the thirty-plus years of
Pioneer 10’s journey in space, impinge upon, and produce effects on, the short
timescale activities of the user when he or she interacts with the page, acts on the
page, and so on (e.g. reading text, watching the video, clicking objects, printing,
downloading and so on). These two features of the web page draw attention to the
principles of weak classification and weak framing (Bernstein, 1990 [1981]) that characterise
the relationships among semiotic resources and genres and their spatial arrangement
and codeployment on the same web page or along a hypertext trajectory. The material
affordances of the computer screen make this possible and, at the same time, position
the user as an agent who can intervene in texts and create personalised
hypertextual pathways through websites and between websites. The personal computer
affords the dynamic assembling of, and intervention in, a diversity of semiotic resources
and genres that were previously strongly insulated from each other as domains of
specialised practices and competencies.

Figure 3.5: NASAtoons menu of options page

Importantly, the meaningful whole that is created by a particular web page is not
defined by the physical space alone of the page on the screen, or by the spatial arrange-
ment of the elements in this space. Instead, it is defined by semiotic-material functional
relations and networks of relations. This is also true of the static printed page and its
meanings. The difference lies in the kind and degree of the functional relations
involved. Rather than only a vertical hierarchy of elements and their functions, based on
the criterion of spatial juxtaposition and relative size with respect to the compositional
whole, there is also an increased emphasis on the horizontal multiplicity of functional
pathways. These pathways potentially relate the different texts and objects on the web
page not only to each other, but also to other pathways, other actions, other connec-
tions to other pages and websites and their texts and objects. Pathways of this kind
enact a whole network of connections and interactions that cannot be defined or
analysed in vertical hierarchical or compositional terms alone.

1. LOCATION: Nasa Kids home page;
   ACTION: Select Object: NasaToons icon^ Click object
        go to
2. LOCATION: NasaToons menu; expanded set of thematic nodes as hyponyms of the
   superordinate NasaToons icon;
   ACTION: Select Object: A Far Out Pioneer^ Click object
        go to
3. LOCATION: New Science: A Far Out Pioneer;
   ACTION 1: Read verbal heading and text; look at picture;
   ACTION 2: Select Object: NasaToons animated video^ Click object^ View video;
   ACTION 3: Select Object: Pioneer 10 Home Page^ Click object.^

Table 3.2: Transcription of an unfolding hypertext pathway
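
The stages transcribed in Table 3.2 can also be written down as a small data structure. What follows is a minimal sketch in Python and is not part of the original analysis: the names Stage, pathway and transcribe are illustrative choices of our own, and the printed layout only approximates that of the table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:
    """One stage of a hypertext pathway: a location plus the actions performed there."""
    location: str
    # Each action is a sequence of moves; the table joins the moves with "^".
    actions: List[List[str]] = field(default_factory=list)


# The pathway of Table 3.2, stage by stage.
pathway = [
    Stage("Nasa Kids home page",
          [["Select Object: NasaToons icon", "Click object"]]),
    Stage("NasaToons menu",
          [["Select Object: A Far Out Pioneer", "Click object"]]),
    Stage("New Science: A Far Out Pioneer", [
        ["Read verbal heading and text; look at picture"],
        ["Select Object: NasaToons animated video", "Click object", "View video"],
        ["Select Object: Pioneer 10 Home Page", "Click object"],
    ]),
]


def transcribe(stages: List[Stage]) -> str:
    """Render the stages in a LOCATION / ACTION / go to layout."""
    lines = []
    for number, stage in enumerate(stages, start=1):
        lines.append(f"{number}. LOCATION: {stage.location}")
        for index, moves in enumerate(stage.actions, start=1):
            # Number the actions only when a stage has more than one.
            label = "ACTION" if len(stage.actions) == 1 else f"ACTION {index}"
            lines.append(f"   {label}: " + " ^ ".join(moves))
        if number < len(stages):
            lines.append("   go to")
    return "\n".join(lines)


print(transcribe(pathway))
```

Storing each ACTION as a list of moves keeps the sequencing marked by ^ in the table explicit. Because the pathway is an ordinary list, further stages can be appended at any point, which reflects the open-ended character of the trajectory.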


3. 7. The British Museum Children’s COMPASS website

3. 7. 1. Children’s COMPASS home page: description of multimodal objects


Figure 3.6 presents the second home page (http://www.thebritishmuseum.ac.uk/childrenscompass/)
that we shall analyse in detail in this chapter: the British Museum
Children’s COMPASS home page, which relates to another website aimed at the
young user. This home page is centred on Alfred, the British Museum lion. Alfred’s
size and his positioning in the centre of the page make him the most salient object
within the visual field of this home page. Alfred is not depicted realistically.
Instead, the cartoon-type image places the emphasis on Alfred’s friendliness and
his readiness to make the would-be user of the site feel welcome. The image of
Alfred is not static: he turns his head towards the objects on the right side of the
page, looks at them, looks up again towards the viewer, winks his left eye, then
blinks both eyes and wags his tail. If the mouse arrow is rolled over Alfred, a text
box projects the following text as spoken by Alfred: I’m Alfred the British Museum
lion. Welcome to Children’s COMPASS. All of these features function to evoke a
particular interpersonal response from the viewer. Alfred is a reassuring and
welcoming figure; he makes the viewer feel positive about the Museum at the same
time that he also indexes the greatness and strength of a prestigious cultural insti-
tution. These two facets of the meanings associated with Alfred show the
processes of heteroglossic negotiation between the meanings and values of the British
Museum as a prestigious cultural and scientific institution with a mission to
research and educate, the need to appeal to a wide audience from the general
public, and the need to combine education with fun and entertainment for the ben-
efit of the children to whom the website is primarily addressed.
Alfred’s size, colour and position in the centre of this home page make him
stand out markedly against the predominantly blue background. The use of this colour
is in itself quite significant, especially with regard to the interpersonal response it is
designed to evoke in the viewer. In contrast to warm colours such as red and orange,
blue is a cool colour. Cool colours such as blue appear to recede while warm colours
tend to come towards the viewer. The blue background of this home page, coupled
with the absence of specific background detail with the exception of the pale blue
compass, conveys the impression of an undefined space that is receding away from the
viewer. The fact that blue is associated with both the vast receding expanses of the sky
and the ocean may reinforce this effect. Thus, blue creates a feeling of unconfined
spaces that expand or recede into the distance (see Westpac text in 4.7.6, pp. 199-200
and Appendix I). This feeling is further enhanced by the uniformly smooth quality of
the background surface. The relatively highly saturated background blue contrasts with
the pale, unsaturated blue of the compass, which is placed behind Alfred. The contrast
between the light blue (higher luminosity) of the compass and the darker blue (lower
luminosity) of the background gives the impression both that the darker background blue
surrounds the compass and that the higher luminosity of the compass makes it appear to
recede away from the viewer. This choice in colour and luminosity
is co-contextualised with the directional vector of the compass, which points away
from Alfred towards the top right of the visual field. The compass thus functions as a
visual metaphor of the British Museum as educator. The Museum’s researchers explore
faraway regions of knowledge and history at the same time that they make this
knowledge accessible to the general public, thanks to the friendly British Museum lion.

Cluster 1: Masthead + Subheading: The Masthead is the name of the British Museum
website as a whole (the Portal). Cluster 1 consists of the overall name of the website,
which situates this particular web page within that overall site. The use of upper case
font and its position in the top-left part of the page foreground the Masthead’s
superordinate position with respect to the specific web page. The subtext: The British
Museum: Illuminating world cultures hypotactically extends the meaning of the masthead.
The meaning of subordination is conveyed by the change in typography: the size of the
font is much smaller, underscoring the idea of a subordinate relationship to the masthead.

Cluster 2: The four items in Cluster 2 each comprise a visual icon relating to a noun,
for example, Search. An icon of two little feet is closely linked physically and
thematically with the noun Tours to form a cohesive tie.

Cluster 3: Link to the National Grid for Learning

Cluster 4: This cluster consists of the cartoon-like lion, Alfred, and part of a pale blue
compass on a deeper blue background. Alfred’s central position on the page is a good
example of bilateral symmetry, creating a sense of balance between the left and right
sides of the page. The lion gazes directly at the viewer, inviting him or her to come in.
The point of the compass is a vector which links Alfred, in Cluster 4, to the masthead,
Cluster 1. The cartoon-like representation of the lion is predominantly interpersonal,
rather than experiential, in its orientation: it suggests user-friendliness, as is
appropriate in a site designed for children. There is obviously a relationship between the
lion and the compass, indicating that the lion will be the visitor’s guide to the museum.

Cluster 5: Interaction is the main feature of this cluster. The ask-the-expert and
contact-us elements function as ways for the user to interact with the site. Each of these
elements is, as before (see Cluster 2), partly linguistic and partly visual, though in this
case we have imperative clauses instead of nouns. The lion’s head and paw print identify
the lion as the expert-cum-companion, who will guide us around this virtual museum site.
The visual aspects of this cluster, the lion’s head and paw print, create a textual tie with
the central and visually salient cluster, Cluster 4.

Cluster 6: A going-on-a-museum-tour cluster. In the top right of the page there are two
pictures entitled Object and Tour, each with a caption below it. These are both linked
objects leading to links on the next page. The two photos are naturalistic, unlike the
lion. They foreground the educational value of the site with the shift from the ludic to
the scientific domains. The superordinate item This month’s ties this month’s object to
the tour, highlighting the museum function of the website.

Cluster 7: The Competition winner’s noticeboard cluster. In this case, two similar images
relate back to the superordinate heading. The organisational principle is similar to
Cluster 6: two same-sized images stand in a hyponymic relation to a superordinate
linguistic caption.

Cluster 8: This is the masthead plus logo for the Children’s COMPASS site page. Cluster 8
and Cluster 1 stand in a relation of complementarity and contrast. They are complementary
as they are both about the British Museum at the same time that they both coexist within
the bigger website of the British Museum, the Portal. The Children’s COMPASS home page
both identifies and gives access to this specific website within the overall British
Museum portal.

Cluster 9: The bottom bar comprises four items each consisting of an icon and a verbal text.

Figure 3.6: The British Museum’s Children’s COMPASS home page: cluster analysis
[image not reproduced; clusters 1, 3, 4, 5, 6 and 8 are labelled on the page image]

If the Search icon is clicked, the user gains access to another page entitled The British
Museum. Search the Museum. This page consists of a menu of options (Figure 3.7).
As with the NasaToons example, this particular menu of options consists of
a set of thematic nodes in the form of a visual icon and a corresponding verbal
caption. In the present example, this menu requires the reader to make a selection
from two thematic areas – one geographical and/or temporal (blue), the other fea-
turing domains of daily life (pink) – before submitting the combined selection to the
Find button. As with the NasaToons example, the menu of options expands the
Search icon on the home page into a much more diverse set of thematic regions and
their possible intersections. Having made, for example, the selection Asia + Daily
Life, the user clicks on the Find button and creates a link to the page headed Daily
life in Asia, which is the more specific thematic area resulting from the combination
of the two selections from the menu of options on the previous page (Figure 3.7).

Figure 3.7: The British Museum: Search the Museum page

Figure 3.8 shows the page, Daily life in Asia, which results from this choice.
This page consists of a brief information report giving some factual information
about the topic, with particular reference to objects used by ordinary people in
their daily lives. On the left side of the page, opposite the verbal text, a number of
such objects, each with an identifying verbal caption, take the form of clickable icons
referring to specific Museum objects which can be viewed in the website. When the user
clicks on the icon entitled Women sewing, a print, he or she creates a link to a page
which focuses on this print and its meanings (Figure 3.9). This page consists of the
following texts and objects:
• a colour plate of the print, Women sewing, which is a display item in
  the Department of Japanese Antiquities;
• a short verbal text providing a brief historical orientation to the
  print in the first sentence, a brief description of each of the three
  scenes depicted in the print, and an instructional text suggesting
  other persons and objects in the depicted scenes that the viewer
  may also wish to attend to;
• below the reproduction of the print are four icons + verbal captions
  of other linked and clickable objects belonging to the same
  thematic area as the print that is featured on this page.

Figure 3.8: Daily life in Asia page

The page overall is a recontextualisation of a display item from a particular
Museum exhibition. In other words, the page recontextualises the three-dimensional
display of an arrangement of items in space accompanied by verbal text explaining
the items to the two-dimensional modalities of the web page. The Museum object
– the Japanese print – has been decontextualised from its grouping with other objects
in a British Museum display and recontextualised in another modality – the
photographic plate embedded in a web page – such that it functions as a metonym
of the Museum display. Moreover, the spatial juxtaposition of objects in a Museum
display is itself transformed by the web page into the possibility of creating links
with other objects that are thematically related (see, for example, the Linked Objects
menu in the Women sewing, a print web page shown in Figure 3.9 below the print
itself), though not necessarily coming from the same area of the Museum. Once
again, we can see how the unfolding activity sequence involves the progressive
expansion and integration of meanings and actions along its trajectory. The various
stages in this sequence are schematised in Table 3.3.
In Phase 4 of this unfolding hypertext trajectory, the reproduction of the
print, Women sewing, is surrounded by a white border in the top-left part of the
page. It is as if the print is being displayed in an autonomous ‘space’, removed from
the context of its relationships with other items in a museum display. The
recontextualisation of the print in this way tends to minimise both its historical
context and its links to other times and places beyond that depicted in the print itself.
This possibility is further emphasised by the two clickable objects, Add to folder, and
Larger picture, which are placed on the right-hand side of the print. Both of these
clickable objects specify actions which the viewer can perform on the reproduction
of the print. They allow the viewer to save the print to the computer’s hard disk and
to enlarge the size of the print so that it can be seen in greater detail. Both of these
actions enable the viewer to appropriate a copy of the print to a private sphere, and
therefore afford possibilities for further uses of the print in other contexts beyond
that of the museum in which it is located and the web page itself. The print can be
linked to other kinds of activities that would not be possible in the case of the
original print in its museum display context.

Figure 3.9: The web page: Women sewing, a print

A hypertext trajectory unfolds in time at the same time that it integrates texts
and activities in the virtual locations, in the form of web pages and their objects,
that the viewer encounters at different stages along the trajectory. The objects that
can be clicked on, and accessed, on the Daily life in Asia page are all presented in
a way which foregrounds their thematic homogeneity, in spite of the differences in
historical time, geographical provenance and their housing in different
Departments of the British Museum. For example, the object Salt bag (Figure 3.9),
in comparison with the Women sewing print, comes from Pakistan, dates from the
19th century, and is housed in the Department of Ethnography. However, the mode
of presentation of its web page is the same as that for the Japanese print. The
compositional homogeneity of the pages pertaining to these objects and the fact that
they are all construed as joint verbal-visual cohyponyms of each other at the same time
that they are hyponyms of the superordinate thematic item Daily life in Asia both
decontextualise these items as displayed museum objects and recontextualise them
as the coarticulated parts of a joint verbal-visual thematic formation in which the
diversity of times and places partly gives way to the newly contingent meanings of
this hybrid thematic formation. The original display items, now recontextualised as
visual images and verbal text in the new formation, are treated as entextualised
hypertextual objects, rather than material ones, which can be integrated into other
forms of activity in the virtual hypertextual environment of the website.

1 LOCATION: British Museum Children’s COMPASS Home Page;
  ACTION: Select Object: Search icon^ Click object
       go to
2 LOCATION: Search the Museum menu; expanded set of thematic nodes as hyponyms of
  superordinate Search the Museum icon; the blue and the pink nodes are
  cohyponyms of each other;
  ACTION: Select Objects: Asia + Daily Life^ Click Find button
       go to
3 LOCATION: Daily life in Asia
  ACTION: Read verbal heading and text: Information Report; look at clickable
  objects;
       go to
4 LOCATION: Women sewing, a print;
  ACTION 1: Select Object: Women sewing, a print^ Click object^ View print + read text:
  Information Report + Procedure: Suggestions for attending to object
  ACTION 2: View Linked Objects^ Select Object: Hsun-ok^ Click object

Table 3.3: British Museum Children’s COMPASS activity sequence

The visual presentation of the objects removes them from their contexts of use
and transforms them into aestheticised objects for contemplation and appreciation.
Whilst the verbal text in each case foregrounds meanings and activities concerned
with some aspect of the daily life of the people who used the object, the visual image
gives rise to another set of meanings that are not entirely supported by the verbal text.
The tension between the different meaning orientations of the two modalities – the
visual and the verbal – in this context suggests a complex process of negotiation
between the meanings that pertain to the scientific (historical, paleontological, etc.)
activities of the Museum qua research community and the requirement that the
Museum both educate and entertain a wider public of non-specialist visitors. The
latter can more readily appreciate the value of the objects displayed with reference to
formalist criteria of aesthetic appreciation and its appropriation to a private sphere of
consumption which typically emphasises the ‘autonomy’ of the object rather than its
relations to the material conditions of its manufacture and use. In this way, we can
see how the meanings of the web page under consideration here are themselves a
further recontextualisation of the meanings and activities of the Museum itself. We
will now consider in more detail the ways in which joint verbal-visual thematic
relations and meanings are built up during the development of a hypertext pathway.

3. 8. A multimodal hypertextual thematic formation: Daily Life in Asia

3. 8. 1. Thematic system analysis: preliminary observations and an example


The creation of a hypertextual trajectory leads to the building up of systems of
meaning relationships from text to text, from verbal text to visual image, from page
to page, and so on, as one navigates through a website or across websites.
Hypertext is a set of discourse practices which allows us to construct such rela-
tionships, in the process instantiating multimodal systems of thematic meanings
(Lemke, 1983). Lemke has shown how the ideational-grammatical relations of the
clause and the clause complex are resources in language for creating semantic
relations that can be typical of a set of texts, rather than being specific to a single
text. In a given thematic system, the use of a particular transitivity pattern – e.g.
Actor-Process: Material-Goal – in some clause in a given text may be a typical use
of this pattern in some intertextual set of texts in a community and not merely a
selection which is specific to a single text. Such a system is called an intertextual
thematic system. Thus, a particular clause may be the instantiation of a more abstract
thematic relation which the particular text shares with other texts, even if the text-
specific lexicogrammatical realisations of the common thematic pattern may vary
from one instance to another. The notion of thematic system was developed as a tool
for representing the salient and common meaning relations that are shared by any-
thing from a very restricted number of texts to a potentially indefinite number
(Lemke, 1983: 162).
To demonstrate the basic principles of this kind of analysis, let us now
consider an example of visual-verbal thematic relations between a photograph and its
verbal caption in a magazine.
Thematic relations are non-linear; they are diagrammatically represented in a
network notation consisting of nodes (thematic items) and the connections among
these (thematic relations). The specific thematic relations which occur in a text can
be seen as such or they can be connected to other texts sharing the same system of
thematic relations. An intertextual thematic relation, which is more abstract than a
text-specific one, can be represented in terms of typical transitivity patterns in the
experiential semantics of the clause (e.g. Actor-Process-Goal), of typical nominal
group patterns (e.g. Epithet-Thing-Post Qualifier) and of the clause complex
relations that link clauses into larger-scale semantic units.
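
The network notation just described can be approximated in code. The following is a minimal sketch in Python, offered under stated assumptions and not taken from Lemke's notation: a thematic relation is stored as a triple (item, relation, item), the thematic network of a text is a set of such triples, and the labels in text_a and text_b are invented placeholders, not an analysis of any actual text. The function names nodes and shared_relations are likewise our own.

```python
from typing import Set, Tuple

# A thematic relation: (thematic item, relation label, thematic item).
Relation = Tuple[str, str, str]

# Two hypothetical texts, each reduced to its set of thematic relations.
# All labels here are illustrative placeholders.
text_a: Set[Relation] = {
    ("WOMEN", "Actor-Process", "SEWING"),
    ("PRINT", "Carrier-Attribute", "JAPANESE"),
}
text_b: Set[Relation] = {
    ("WOMEN", "Actor-Process", "SEWING"),
    ("BAG", "Carrier-Attribute", "WOVEN"),
}


def nodes(network: Set[Relation]) -> Set[str]:
    """Collect the thematic items (nodes) that a network connects."""
    return {item for source, _, target in network for item in (source, target)}


def shared_relations(*networks: Set[Relation]) -> Set[Relation]:
    """Relations found in every network: candidates for an intertextual pattern."""
    return set.intersection(*map(set, networks))


print(nodes(text_a))
print(shared_relations(text_a, text_b))
```

A set of triples has no inherent order, which matches the non-linear character of thematic relations. The intersection is only a crude stand-in for an intertextual thematic relation, since it requires identical labels, whereas text-specific realisations of a common pattern may vary from one instance to another.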
Thematic relations in verbal texts are created through the lexicogrammatical
resources of the clause and the clause complex. Consider the clause Young romance
on the subway: what was out of the question for their parents’ generation is quite normal
today. This clause was the verbal caption of a photograph (not shown here) in an
inflight magazine of an airline. The experiential grammatical semantics of this clause
is of the Relational: Attribution type. In clauses of this type, an Attribute is attrib-
uted to a Carrier. The Carrier instantiates the type-quality or the type-class of quality
or thing specified by the Attribute. In the example, the Carrier-Attribute relationship
in the clause functions to construct a thematic relationship between the two items,
the Carrier, what was out of the question for their parents’ generation, and the Attribute,
quite normal today. Specifically, this grammatico-semantic relationship is used to con-
strue a thematic relation of opposition between the different values of the older and
the younger generations. The transitivity relation in this clause is the grammatical
means for relating the two items as part of a more abstract thematic pattern con-
cerned with the conflicting values of the older and the younger generations. This is
more than just a semantic relationship specific to this clause. Instead, the
lexicogrammatical resources of the clause are the means for constructing a more
abstract thematic relation that is shared with other texts.
In this and the subsequent analysis, the abstract intertextual pattern is repre-
sented in capitals as a notational convention to show that its meaning relation refers
to a more abstract and general class of thematic relation that is common to, or
shared by, some intertextual set – small or large – of texts. In the present case, this
may be represented as /GENERATION CONFLICT: OLD-UNACCEPTABLE VS. YOUNG-
ACCEPTABLE/. This shows how the locutions out of the question and quite normal are
assimilated to the meaning of an intertextual thematic pattern which is concerned
with the clash, the contrast, or the opposition between the values of parents and
their offspring, especially with regard to issues such as courtship, romantic love, sex-
uality and their display in public places such as subway trains.
The present example shows that the bringing together of the choices in both
language and depiction creates the possibility of a joint verbal-visual thematic relation
(see 2.6.2, pp. 82-89). However, it is important to go beyond the mere possibility
of a cross-modal thematic relation and show which ties are foregrounded, which
ties are typical patterns in some wider multimodal intertextual thematic relation, and
how these ties are established and developed in texts. In the analysis below of the
thematic development strategies along a hypertextual pathway in the British Museum
Children’s COMPASS website, we shall see that the action potential of Linked Objects
(as in Figure 3.9), cross-modal covariate ties, the resources of deixis and transitivity
relations in both clause grammar and depiction, are some of the means whereby a
network of hypertextual multimodal thematic relations is built up. In the present
analysis, we shall focus in particular on some of the thematic ties between verbal and
visual patterns as well as the strategies that are deployed to create these ties.
The present example occurred as the verbal caption which accompanied a
colour photograph of a young couple in affectionate embrace on a subway train in
Seoul, South Korea, in the inflight magazine of a well-known airline company. The
verbal caption and the participants and the actions they are engaged in (the visual
transitivity) in the photograph construe a multimodal thematic relation on the basis
of the ties between ideational-grammatical relations in the clause and the visual
transitivity relations in the photograph. We would say that many cases of the verbal
‘anchoring’ of the meaning of a visual image are in fact multimodal verbal-visual
thematic relations which themselves have an intertextual, and not merely a text-
specific, basis.
A young couple, seated between another young couple on the left side of the
photograph and a young woman on the right side, are depicted embracing each
other and involved in affectionate hand play. The visual scene therefore depicts a
visual transitivity relation involving two participants and the actions they perform
together (e.g. embracing, holding hands). While the scene is specific both to this
photograph and this couple on that particular occasion, it is also the instantiation of
a typical pattern that is common to very many images, not to speak of the scenes
from everyday life that these images derive from.
In the verbal caption, the nominal group young romance on the subway indexi-
cally points to the image and verbally identifies the depicted scene. It is as if the nom-
inal group could be expanded to a clause-level structure such as This is young romance
on the subway in which the demonstrative pronoun this, which exophorically points to
the picture as its discourse referent, is the Identified in an Identified-Identifier
transitivity relation in its clause. This does not mean that the picture is without
meaning until the words supply it with one. As we have said above, the visual
transitivity in this text is part of a common pattern which is shared by many images
and, on this basis, with or without language, it is capable of generating its own
meanings. Likewise, the Classifier-Thing structure in young romance uses the grammatical
resources of the nominal group to create a thematic tie between the items young and
romance which is shared by very many linguistic texts.
The combination of the photograph and the nominal group instantiates a joint
verbal-visual intertextual thematic pattern across the two semiotic systems. The point
is that the codeployment of the two systems, along with the linking together of
thematic choices in the form of the covariate tie between the visual transitivity pattern
in the photograph and the Classifier-Thing relation in the nominal group, strengthens
the thematic link between the two modalities: not only is the process-participant
relation in the visual transitivity a typical pattern, but the Classifier-Thing semantics of
the nominal group also defines its Thing as being of a certain type – young romance is
a standard cultural-semantic pattern and a typical collocation pattern. In this way, a
covariate tie links the two modalities on the basis of some shared higher-order
meaning relation. This thematic relation may be diagrammatically represented as in
Figure 3.10.

IMAGE: VISUAL TRANSITIVITY: MALE-FEMALE ^ ACTION: EMBRACE ^ LOCATION: ON SUBWAY TRAIN

        covariate tie

VERBAL CAPTION: NOMINAL GROUP: CLASSIFIER-THING-LOCATION: young romance on the subway

Figure 3.10: Covariate tie between verbal and visual semiotic modalities,
creating a cross-modal thematic relation in an airline magazine text
The relations that are internal to the visual transitivity pattern and the nomi-
nal group are structural relations (Participants^ Vector and Classifier^ Thing) that
belong to the experiential meaning relations in the two different semiotic modalities.
The relationship that is construed between them is a non-structural or covariate
relation in which the two items are linked on the basis of some meaning relation that
they are construed as sharing. It is the interplay between the two kinds of relations
(structural and non-structural), along with the deictic or indexical link already men-
tioned, that constitutes the typical strategies for building up multimodal thematic
relations. We shall now explore how this applies to an instance of a hypertextual path-
way between pages in the British Museum’s Children’s COMPASS website.
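
The distinction between structural and covariate relations can also be set out schematically. The following Python sketch is offered purely as an illustration and is not part of the analysis itself: the class names StructuralRelation and CovariateTie, the field names, and the label given to the shared meaning are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuralRelation:
    """A relation internal to one semiotic modality (hypothetical model)."""
    modality: str        # e.g. "visual" or "verbal"
    functions: tuple     # ordered functional roles, e.g. ("Classifier", "Thing")
    realisation: tuple   # what fills each role in this particular text


@dataclass(frozen=True)
class CovariateTie:
    """A non-structural link between two items construed as sharing a meaning."""
    first: StructuralRelation
    second: StructuralRelation
    shared_meaning: str

    def is_cross_modal(self) -> bool:
        # The tie is cross-modal when its two members belong to different modalities.
        return self.first.modality != self.second.modality


image = StructuralRelation(
    modality="visual",
    functions=("Participants", "Vector"),
    realisation=("male-female couple", "embrace"),
)
caption = StructuralRelation(
    modality="verbal",
    functions=("Classifier", "Thing"),
    realisation=("young", "romance"),
)
# The shared-meaning label is an illustrative gloss, not the authors' notation.
tie = CovariateTie(image, caption, shared_meaning="YOUNG ROMANCE")
print(tie.is_cross_modal())
```

In this sketch the structural relations stay inside their own modality, while the tie is the only object that refers to both, which is one way of picturing the interplay described above.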

3.8.2. Multimodal thematic system development along a hypertext pathway


Our hypertextual journey starts with the Search icon on the home page. If we roll the
mouse arrow over this icon, it becomes animated and changes colour at the same time
that it gyrates. As we also saw in relation to the Nasa Kids home page, this is a typical
resource for indicating the action potential of the object qua clickable object. We shall
begin our analysis with the page entitled Daily life in Asia (Figure 3.8).
The title of the text, Daily life in Asia, forms covariate semantic ties with the
multimodal objects VISUAL ICON: ASIA + Asia and VISUAL ICON: MAN WORKING +
Daily life. These items are themselves part of a Circumstance: Location structure in
the joint verbal-visual clause-like structure YOU SEARCHED ON VISUAL ICON: ASIA +
ASIA – VISUAL ICON: MAN WORKING + DAILY LIFE. This structure therefore creates a
joint verbal-visual thematic relation in which the computer user is construed and direct-
ly addressed as the Actor ( you searched …) who is responsible for creating this
thematic link. At the same time, the item you searched enacts both a covariate semantic
and a coactional tie with the item Search (expanded as Search the museum when rolled
over by the mouse) on the home page.
In this way, the computer user is experientially construed in this clause as
actively developing a specific verbal-visual set of thematic relations that can be related
back to the initial (and initiating) search activity as a result of clicking on the choices
described above. Clicking on Search as we saw in 3.7 (see Figure 3.6) or on Find (see
Figure 3.7) above represents the beginning of a hypertext trajectory in which the user
is required to make choices from, and to combine choices from, specific thematic
domains, e.g. ASIA + DAILY LIFE. In making this choice, the computer user initiates the
dynamic on-line assembling of the page which results from this particular combination
of choices from the menu of options on the previous page, as explained in 3.7, pp.
130-136. This also means that the joint verbal-visual thematic relations which emerge
are themselves a consequence of these choices, of the online assembling of the page
in question, as well as of the links to previous pages already traversed and future ones
yet to be created by clicking on other objects on this or other pages. Here is the verbal
text for this page:
(1) Many objects survive which tell us about the daily life of people living in Asia.
(2) These include paintings and books, pottery, glass, clothing and burial
goods which all contain important information about the past.
(3) Whilst many objects tell us about the lives of the wealthy, we can also
learn about the lives of ordinary people.
The first sentence in the verbal text, Daily life in Asia, creates a semantic link
between many objects and survive, because in this thematic formation there is a typical
semantic relation /OBJECTS-SURVIVE/ (Medium-Process) when /OBJECTS/ are from
the past or can tell us something interesting or important about the past. In the second
clause of this sentence, /OBJECTS/ is in the semantic role of Sayer in a verbal process
clause of the type Sayer-Process-Recipient-Circ:Matter: Verbiage, i.e. /OBJECTS TELL
US ABOUT THE LIFE OF PEOPLE LIVING IN ASIA/. Thus, in a larger set of texts which
belong to the same intertextual thematic formation (see Inset 8: Intertextuality, p.55),
/OBJECTS/ are Sayers which speak to us from the past. The Circumstance of Matter
(what about) further specifies the ideational content of the discourse that is attributed
to the Sayer, /OBJECTS/. Thus, in this first sentence a crucial thematic link is made
between, on the one hand, /OBJECTS-SURVIVE/ and /OBJECTS-TELL-ABOUT LIFE IN
THE PAST/, on the other. In this case, the life of the people living in Asia is assimila-
ble to the abstract thematic item /LIFE IN THE PAST/.
Moreover, the items the daily life of people living in Asia, the lives of the wealthy,
and the lives of ordinary people are assignable to the superordinate thematic pattern
/PEOPLE-LIVE-IN ASIA/ which is introduced here and further developed on the specific
pages related to the Linked Objects (see below). These three items all occur in
Circumstances of Matter in relation to verbal processes (objects tell us) in their
respective clauses. In this way, the thematic pattern /PEOPLE-LIVE-IN ASIA/ is
coarticulated with the /OBJECTS-SURVIVE/ pattern described above by means of the
clause level experiential pattern Sayer-Process: Verbal-Recipient-Verbiage.
The Linked Objects displayed to the left of the verbal text, each comprising a
visual icon and a verbal caption, are therefore assimilable to the superordinate
thematic item /OBJECTS SURVIVED FROM PAST/ at the same time that the joint verbal-
visual Linked Objects can be seen as specific instantiations of the superordinate item.
Again, we see how the thematic formation is based on the inter-semiotic
complementarity between verbal and visual resources. For example, the Linked Object
[VISUAL ICON + VERBAL CAPTION: Women sewing, a print] can be seen as a specific
thematic expansion in the two modalities of the verbal item goods in the main verbal
text. At the same time, the Linked Object also creates the possibility of a coactional
tie between verbal text and Linked Object. The user can further explore the thematic
possibilities of the Object and how it develops the thematic meanings and relations that
are built up on this introductory page in relation to the more specific pages that can
be accessed via the Linked Objects.
The first clause of the second sentence can be assimilated to the abstract
pattern /GENERALISATION: EXAMPLE(S)/, as in the clause These (i.e. objects) include art
and books, pottery, glass, clothing and goods, which is a clause of the type Carrier-Process:
Relational-Attribute: Type-category. The second clause of this sentence, i.e. (they) all
contain important information about the past, further elaborates the previous mention of
the same abstract thematic pattern /OBJECTS-TELL-ABOUT LIFE IN THE PAST/ in
Sentence 1. This is so even though the lexicogrammatical choices in this occurrence
of the thematic item are of the type Carrier-Process: Relational: Attribute-
Circumstance: Matter. Thus, /OBJECTS CONTAIN INFORMATION ABOUT THE PAST/
and /OBJECTS TELL US ABOUT THE PAST/ are lexicogrammatical variants on the same
more general thematic relation in their intertextual thematic formation. Significantly,
the final clause in this sentence also gives voice to an evaluation of the importance
of the information that these objects contain, rating this positively on a scale
IMPORTANT-UNIMPORTANT.
The thematic meaning of the third sentence in part hinges on an implied
opposition between the thematic items /WEALTHY PEOPLE/ and /NOT SO WEALTHY
PEOPLE/, where the nominal group ordinary people can be assimilated to the meaning
of the more abstract item /NOT SO WEALTHY PEOPLE/. The clause many objects tell us
about the wealthy is assimilable to the abstract thematic item /OBJECTS-TELL-ABOUT
LIFE IN THE PAST/ that was evidenced in Sentence 2 above. In the present case, the
meaning of the wealthy is therefore interpretable as a hyponym of the superordinate
thematic item /LIFE IN THE PAST/; it is seen as a specific exemplification of the
superordinate item. The second clause, we can also learn about the lives of ordinary people, is a
clause of the type Senser-Process: Mental-Phenomenon-Circumstance: Matter. The
two clauses in this sentence together imply a thematic relation which is partially
implicit. This relationship may be expanded into the full set of thematic items and
the relations among them as follows:
MANY OBJECTS TELL US ABOUT THE WEALTHY
THEREFORE
WE LEARN ABOUT THE WEALTHY
MANY OBJECTS (ALSO) TELL US ABOUT THE NOT SO WEALTHY
THEREFORE
WE LEARN ABOUT THE NOT SO WEALTHY.

The pivotal thematic relationship here is that between /TELL/ and /LEARN/, where it
is assumed that the latter is a consequence of the former: someone tells us something
therefore we learn something. We are able to supply the missing thematic items in order
to build up an expanded thematic formation, as shown above. The muted
heteroglossic opposition between wealthy and ordinary people, which in some other
thematic relation could imply a negative evaluation of the latter, is in this case evaluated
positively, such that objects pertaining to the lives of less wealthy and poor people are
seen as being just as interesting and important as those having to do with the wealthy.
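
The expanded set of thematic items given earlier follows one simple rule: each TELL item licenses a corresponding LEARN item. A minimal Python sketch of that rule is given here, assuming a hypothetical function name expand_thematic_items and a plain-string representation of thematic items. It illustrates the inference only and makes no claim about how readers actually process the text.

```python
def expand_thematic_items(topics):
    """Return the expanded formation: for each topic, a TELL item and its LEARN consequence."""
    expanded = []
    for index, topic in enumerate(topics):
        # '(ALSO)' marks every TELL item after the first, as in the expansion in the text.
        also = " (ALSO)" if index > 0 else ""
        expanded.append(f"MANY OBJECTS{also} TELL US ABOUT {topic}")
        expanded.append("THEREFORE")
        expanded.append(f"WE LEARN ABOUT {topic}")
    return expanded


for line in expand_thematic_items(["THE WEALTHY", "THE NOT SO WEALTHY"]):
    print(line)
```

The implicit items are the ones the function supplies automatically: the text states only one TELL clause and one LEARN clause, and the remaining two are recovered from the pattern.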
Appendix I: Multimodal Transcription of the Westpac advertisement (T = time in seconds)
Column 1: T | Column 2: VISUAL FRAME | Column 3: VISUAL IMAGE | Column 4: KINESIC ACTION | Column 5: SOUNDTRACK | Column 6: METAFUNCTIONAL INTERPRETATION, PHASES AND SUBPHASES
! Shot 1 CP: stationary [Herdsman starts walk- [silence] PHASE 1a
HP: frontal ing from car towards
VP: median viewer; sheep dog goes
1 D: VLS to left; Herdsman starts
VC: sheep, eucalyptus tree, utility van, sheep dog rolling up left sleeve]
VS: progressive magnification of form of herdsman (1-10) Tempo: M
CO: naturalistic

{RG}
Herdsman bends down [ ] Solo keyboard
and twice slaps thighs to EXP: Actor; action (Herdsman walks
(pp, TWO towards viewer)
2 recall dog to his side CHORDS ^
Tempo: M [sheep]: SI
Volume: p
Tempo: S
(^ Dog returns to herds-
man). Herdsman starts INT: Viewer positioned as belonging to
rolling up right sleeve depicted world and its shared values;
3
Tempo: M

[Herdsman stands {RG}


upright; Starts rolling up [ ]Drum (p):I Imperative mood of chorus:
left sleeve]^ [dog returns [ chorus]; exhortation to act addressed to
4 to his side; resumes walk- (*) roll viewer; minor dyadic exchange:
ing] Volume: pp
Tempo: S
Tempo: M

[Herdsman continues
rolling up left sleeve; Herdsman/dog; low volume, slow
dog runs ahead]. tempo of music: intimate communion
5
Multimodal Transcription and Text Analysis: Appendix I

Tempo: M
[Herdsman continues them //(#)
rolling up right sleeve;


continues walking
6 forward]

Tempo: M
[ ]
TEX: hyperthematic status of Phase 1a
functioning
7

{RG} (1) to introduce textually significant


[ chorus]; items: movement and rolling sleeves
(*)roll up;
8 Volume: f
Tempo: S

[Herdsman stops (2) establishes shot in rural Australia


walking; resumes rolling
up left sleeve; gaze
9 directed off-screen to
right]

VS: maximum magnification of visual contour of Herdsman rolls up right them //(#) (3) thematic condensation of major
herdsman sleeve themes to be developed in subse-
VC: hat, shirt typical of rural worker quent text
VF: far; off-screen Tempo: M
10
! Shot 2 CP: stationary [Draughtswoman rolls up {RG} PHASE 1b
HP: oblique left sleeve, while sitting at [ ]
VP: median
D: MCS desk] (*) roll
11 VC: indices of work (glasses, desk, lamp)
VS: draughtswoman Tempo: M
CO: naturalistic
CR: red
VF: near; hands

[Draughtswoman rolls up EXP: Actor; Action


right sleeve; chin thrust 3 work locations, their typical
forward] participants, and associated
12 activities and performance
Tempo: M indicators

! Shot 3 CP: stationary [Truck driver enters cabin INT: identification of viewer with
HP: frontal of truck, moving
VP: low depicted scenes; solidarity (smile,
towards viewer, prior to gaze); chorus: imperative exhortation
D: MCS driving truck; mouth:
13 VC: blue singlet of driver; interior of truck cabin of viewer
smiles]
VS: truck driver
CO: naturalistic
VF: median; viewer Tempo: M

them //(#) TEX: movement, smiling, sleeve rolling as


covariate cohesive ties across shots

14

! Shot 4 CP: stationary [Nurse walks briskly {RG} PHASE 1c


HP: frontal
VP: median towards viewer; rolls up left (*) ROLL them up
D: MCS sleeve; looks at and Volume: ff EXP: Actor; Action, participant in work
15 VC: background out of focus; nurse’s uniform smiles at someone out of Tempo: F location
VS: nurse
CO: naturalistic field of vision] (NO MUSICAL
VF: median; off-screen Tempo: F ACCOMPANIMENT)


INT: interpersonal solidarity with implied
[Nurse walks briskly {RG}
off-screen participant; loud volume and
towards viewer; rolls up [ chorus]:


fast tempo of music: public immersion
right sleeve; indistinct par- (*)roll
in exhortation
ticipant walks left to right in them up //(#)
16 TEX: movement, smile, sleeve rolling as
background] Volume: ff
covariate ties
Tempo:F
Tempo: F
! Shot 5 CP: stationary PHASE 2a


HP: frontal Westpac logo moves {RG} EXP: Actor;Action (logo as visual
VP: median towards the viewer [ ]: R experiential metaphor of Australia/
D: LS to MCS and (*) let’s get … Westpac moving forward in relation to
17 VC: light blue in upper frame and dark blue in Tempo: M Volume: f
lower frame converge on red logo sky and ocean)
VS: logo contrasting with two shades of blue Tempo: S INT: modality: hyperreal of desired dialogic
CO: sensory turn: female soloist as role model inter-
CR: red, blue acting with chorus and viewer; implicit
inclusive we.
!Shot 6 CP: stationary TEX: covariate tie with previous on basis of
HP: oblique [Father and son roll right (*)mov(!!)ing shared feature moving forwards logo as
VP: median sleeves up] hyponym of previous in that it includes
D: MCS all such instances; visual contrast with
18 VC: garden tools, shed co-text plus transitional function of
VS: father & son Tempo: M
CO: naturalistic logo
VF: near; father’s supervisory gaze directed to son;
son’s directed to own hands

{RG}
[ chorus]:R PHASE 2b
EXP: Actor;Action As above;
(*) roll them visual thematics: intergenerational
19 relations (father/son, old/young and
(*) up
Volume: ff paternalistic ideology
Tempo: F

(*) roll them up INT: viewer identification with depicted


Volume: ff world; public immersion in
Tempo: F exhortation
gaze vector links father and son [ ]: R TEX: rolling sleeves as covariate tie
20 Volume: f
Tempo: S PHASE 2c
{RG}
(*)let’s
!Shot 7 CP: stationary Dishwasher rolls up right see it EXP: Actor;Action;visual thematic of
HP: frontal sleeve Phase 2b further developed through
VP: median
D: MCS displacement onto dishwasher
21 VC: white uniform, dirty dishes, kitchen utensils Tempo: M (young, subordinate) and supervisor
VS: centred contour of dishwasher (older, superordinate)
CO: naturalistic
VF: near; hands
Dishwasher grasps newly {RG} INT viewer identification with depicted
arrived pile of dirty dishes (*) through (!!) world;
with both hands Female soloist as role model for
22 VF: near; dirty dishes {RG} dishwasher
Tempo: M [ chorus]:R

! Shot 8 CP: panning [Supervisor walks on job


HP: slightly oblique site, right to left; rolls up TEX: covariate ties: sleeve rolling (dish-
VP: low washer) and movement (supervisor);
D: MCS left sleeve]
23 VC: helmet, shirt and tie of supervisor; industrial plant in contrastive direction of his move-
background Tempo: M ment (right to left)
VS: contour of supervisor walking right to left
CO: naturalistic VF: horizontal right to left; off-screen

HP: frontal [Supervisor walks on job


VF: near; viewer site; mouth: smiles]

Tempo: M

24
! Shot 9 PHASE 3a
As in Shot 5 Westpac logo moves
towards the viewer
Logo as in Shot 5
Tempo: M

{RG} EXP: Actor;Action


[ ]: R visual thematics extend
intergenerational theme (schoolgirl),
25 and let’s get …


Volume: f who is thus linked to other
Tempo: S participants in common cause
!Shot 10 CP: stationary [Schoolgirl sits at desk; (*) mov(!!)ing INT: viewer identification with visual;
HP: frontal grasps pen with right //SLOW female soloist as role model: advisory
VP: median hand] speech act
D: MCS
VC: school uniform of girl, study materials, desk
VS: contour of girl
26 CO: naturalistic
VF: near; viewer
[Schoolgirl leans forward; TEX: covariate ties: smiling, sleeve rolling,
mouth: smiles] moving forward; cut from logo to
schoolgirl on moving
Tempo: M

! Shot 11 CP: stationary [Businessman sits at desk; {RG} PHASE 3b


HP: frontal
VP: median rolls up right sleeve; [ chorus]:R
D: MCS mouth smiles] Volume: f
27 VC: desk, work materials, business dress EXP: as before
Tempo: S
VS: businessman Tempo: M
CO: naturalistic
VF: near; viewer we have

! Shot 12 CP: panning [Baker stands outside bakery; a job INT: chorus: declarative clause and explicit
HP: slightly oblique
VP: median rolls up left sleeve] we provides reason/motivation for
D: MLS prior exhortation in
28 VC: shop window, bread, baker’s uniform Tempo: M dyadic response to soloist
VS: shop window, contour of baker
CO: naturalistic
VF: median; off-screen

VF: distance: median; orientation: viewer; to TEX: chorus extends from businessman to
otherwise as above baker, or linking them in a joint we as
defined above.
29

(*) do (!!)
//SLOW
30
! Shot 13 CP: stationary Bricklayer crouches {RG} PHASE 4a
HP: frontal behind wall
VP: median [ ±]: I
D: MCS Volume: n
VC: brick construction wall, bricklayer occluded Tempo: M
by wall
VS: brick wall
31 CO: naturalistic
Bricklayer places brick in this (*) country EXP: Actor;Action (Shots 13-14);
As above; bricklayer now partially visible behind wall wall discourse of male speaker and visual
VF: near; directed to brick enact joint thematics of
Tempo: M building Australia through work;
thematic condensation of this
thematic in bricklayer and dish-
washer
As above; [Bricklayer continues posi- was (NA) built
VC: blue work-singlet and trowel tioning brick in wall; raises
trowel]
32
Tempo: M

Bricklayer taps brick with on a tradition INT: viewer identification with depicted
trowel world; constant tempo and volume
of speaker’s discourse: leadership
33 Tempo: M

! Shot 14 CP: stationary; [Dishwasher stands before {RG}


HP: frontal
VP: median; dishes; rolls left sleeve of (*) rolling
D: MCS down] the sleeves
34 VC: white uniform, dirty dishes, newly arrived dirty dishes on right (NA)up
VS: contour of dishwasher, clean dishes on left, dirty dishes on Tempo: M //(#)
right
CO: naturalistic;VF: near; directed down to dirty dishes on left
As above; (Hands bring more dishes {RG} TEX: rolling sleeves as covariate tie
VF: directed down to newly arrived dirty dishes on to be washed^Dishwasher and (*) getting
right starts rolling sleeve up) on with the
35 (NA)job
Tempo: M // (#)
{RG} PHASE 4b
at
(NA) Westpac
36 //(#) /
As in Shots 5 and 9 Westpac logo moves {RG} EXP: Actor;Action (Shots 16-20); joint
!Shot 15 towards the viewer we see (*) our verbal-visual thematic of Westpac
job services and tie to prior thematic of
37 Tempo: M getting on with the job

CP: panning;
!Shot 16 HP: oblique Helicopter pilot runs {RG}
VP: low; towards helicopter as (NA)backing
D: MCS
38 VC: pilot’s jacket, helmet, helicopter Tempo: F
VS: pilot running left to right to helicopter
CO: naturalistic;
VF: off-screen
As above Helicopter pilot enters that way of INT: As above; identification of we of
VF: near; directed inside cabin of helicopter helicopter thinking Westpac with corporate symbol of
logo (Shot 15)
39 Tempo: F

as much

40 NO VF

! Shot 17 CP: stationary


HP: frontal [Nurse walks towards as we possibly TEX: covariate ties, rolling sleeves,
VP: median the viewer; mouth: (*) can smiling, movement, red
D: CS smiles] //(#)
41 VC: No background detail; face, nurse’s hat
VS: nurse’s face Tempo: M
CO: naturalistic
VF: close; viewer
CP: stationary
! Shot 18 HP: frontal Carpenter rolls up left
{RG}
VP: median sleeve; smiles
(*)backing it
D: MCS Tempo: M
42 VC: house construction site in background [Carpenter rolls up right
VS: carpenter sleeve; smiles]
CO: naturalistic Tempo: M
VF: directed to viewer
CP: panning towards teller
! Shot 19 HP: frontal Supervisor walks on job with
VP: median site
D: MCS (NA) money
DT: little background detail; shirt, tie, Tempo: M //(#)
supervisor’s helmet
VS: face of supervisor
CO: naturalistic
VF: close; viewer
43
[Supervisor walks on job
site; mouth: smiles]

Tempo: M

{RG} with
ad(NA) vice
//(#)
44

! Shot 20 CP: panning towards teller [Bank teller stands at {RG}


HP: frontal counter; leans towards with (NA)
VP: median viewer] more
D: MCS (*)branches
VC: Westpac bank counter in foreground, red tie of Tempo: M than any other
teller, Westpac logo on wall in background,
background participant moves left to right
45 VS: teller in foreground
CO: naturalistic
CR: red of tie and logo
VF: median; viewer
D: MCS [As above; mouth: smiles] (*) bank PHASE 4c
VF: close, viewer (#/)


46
!Shot 21 CP: stationary [Nun holds cricket bat; {RG} EXP: Actor;Action
HP: frontal rolls sleeves up: boys and a (*) brand thematics of bank merger made
VP: median grouped together near new explicit for first time; joint verbal-
D: MLS nun] visual thematics of competition;
DT: nun’s dress, cricket bat and wicket, white sports heteroglossic alliance of spiritual
uniform of boys (nun), youth (boys), and economic
VS: nun and boys, nun’s atypical way of holding a competition
47 cricket bat (*) spirit
CO: naturalistic [Nun holds cricket bat;
VF: median; viewer (nun) boys turn towards viewer;
one boy bowls ball]

of INT: As above
(*)
competition
48 VF: boy bowling facing viewer

{RG}
that will bring
(NA) more and
49 (*) more (*)
benefits

[Boy from opposing team (*) your way TEX: As above


runs, appears from off- // (#)
screen and runs left to
50 VC: nun occluded by lead boy, out of focus figure of right; other boys run
boy forward]
VF: directed off frame to right (lead boy) Tempo: F
CP: panning towards travel agent [Travel agent stands at
! Shot 22 HP: frontal counter; rolls up right
VP: median sleeve]
51 D: MCS
VC: minimal background indicates Westpac travel Tempo: M
agency, travel brochures on right
VS: travel agent’s face
CO: naturalistic
CR: red (tie, brochures) [Travel agent rolls up left {RG} PHASE 4d
VF: close, viewer) sleeve; mouth: smiles] at (NA) Westpac
// (#) EXP: Actor;Action;
Tempo: M verbal-visual thematics of Westpac
service (Shot 22) and leadership (Shot
23)
52
[Travel agent leans {RG} INT: As above; affirmation of both power
forward towards viewer; we’re (*) and solidarity
mouth: smiles] rolling
TEX: covariate ties as above
Tempo: M

! Shot 23 CP: panning left to right as secretary exits office [Westpac executive ushers (NA) our sleeves
HP: oblique woman out of his office; his up
VP: median arm extends to her back;
D: MCS woman moves right as she
VC: minimal background indicates (Westpac) execu- leaves]
tive office, business dress of executives Tempo: M
VS: contours of executives, secretary in shadow
53 CO: sensory, low detail
CR: red (tie) (*)too
VF: directed to secretary // (#)

[Westpac executives sit {RG} PHASE 5a


down; one on right rolls EXP: Actor;Action
up sleeve] [ chorus]:R verbal-visual thematics of merging
54 VS: two executives (secretary now out of scene) of collective we of chorus with cor-
we have porate we of Westpac
VS: single business executive rolling sleeves up [Westpac executive rolls a (*) job (!!) INT: identification of Westpac with the
VF: median; viewer
up sleeve; smiles] people
55 TEX: covariate ties as above

! Shot 24 CP: stationary to


HP: frontal [Left hand rolls up sleeve PHASE 5b
EXP: Actor;Action (hand rolls up sleeve)
VP: median of right arm; right arm


D: CS moves upwards in victory joint verbal-visual thematic of work and
56 VC: no background detail, focus on lower arm of sign] victory in arm vector;
executive as in 55 high degree of thematic condensation in
VS: arm, sleeve rolled up, arm rising Tempo: M this action of prior thematic systems
CO: abstract (work, leadership, success, strength, etc.)
(*) do (!!) INT: viewer identification with Westpac
//SLOW TEX: covariate ties: rolling sleeves, red
[±]
57

PHASE 5c

EXP: Actor;Action (we’re rolling up our


58 DT: arm almost totally faded from field of vision, sleeves)
indistinct background, written text in lower part of [ ]: fade
screen experiential visual metaphor maps
CR: red (logo) Westpac logo onto Australia

Final chord of INT: musical fade and resolution on tonic:


music harmony
59 arm now absent

[silence] EXP: red as covariate tie

60
The next stage in the pathway entails selecting and clicking on the Linked Object
called Women sewing, a print on the Daily life in Asia page. In making this choice, we
go to the page entitled Women sewing, a print. We shall now discuss how the move to
this page relates to, and further develops, the thematic meanings of the Daily life in
Asia page. The verbal text on this page is as follows (Sentences 1-4 belong to Paragraph 1,
while Sentence 5 belongs to Paragraph 2):

(1) Sewing was done by Japanese women, who often spent their time
apart from the men of the family.
(2) This print shows three women working together on their sewing on
a hot summer’s day.
(3) On the right, two of them stretch and fold a red silk sash with a tie-
dyed pattern of white starfishes.
(4) On the left, one woman is holding up a sash, maybe to check the
repair she has just made.
(5) Look for other members of the family – a teenage girl peering into
a small cage, which holds her pet insect; a little boy teasing a cat with
its reflection in a mirror; and a baby playing with its mother’s fan.
The first clause in the first sentence of the verbal text, Sewing was done by Japanese
women, is assimilable to the more general superordinate thematic pattern /PEOPLE-
LIVE-IN ASIA/ in the introductory text on the previous page. At the same time, the
new item is also a further semantic specification of the developing thematic
relations across the two texts. The Actor-Process: Material-Goal pattern of this
clause specifies the activity which these Japanese women undertook. Its
experiential semantics belong to a wider set of texts about /HUMAN ACTIVITY/, or
/PEOPLE DO THINGS/, as part of the way they live. From this point of view, this
clause is assimilable to the superordinate thematic item at the same time that
it is its further specification. The second clause develops the same basic pattern.
Thus, /JAPANESE WOMEN SPEND TIME APART FROM MEN/ is itself a further instan-
tiation of the /PEOPLE LIVE IN ASIA: PEOPLE DO THINGS/ thematic pattern.
The clause (Sentence 2), This print shows three women working with their sewing
on a hot summer’s day, indexes a cothematic tie with the visual scenes which are
depicted in the print. The new clause also further develops the /PEOPLE DO
THINGS/ thematic relation in the non-finite clause three women working with their
sewing on a hot summer’s day by extending the thematics of the text into the domain
of work, which is a subset of what people do. This clause again instantiates the
Actor-Process: Material pattern that was noted above, in the process further
developing the thematics of this paragraph as /PEOPLE DO THINGS: WORK/. In
repeating the transitivity pattern of the second clause in Sentence 1, a covariate
semantic tie is established between the two: both clauses are about what the women
did (Actor-Process: Material). The remaining clauses in this paragraph continue the
Inset 12: Scalar levels

It is useful to talk about texts in terms of different scalar levels of
organisation. A text or a discourse event is a system of scalar levels, or it can
be analytically reconstructed as such (see also Inset 13: System and instance,
pp. 172-173). In theory, the system of scalar levels could continue indefi-
nitely in any given direction, though, in practice, there are always limits on
the number of levels that the analyst needs to work with. The different levels
in a scalar hierarchy allow for the representation of different kinds of units
and relations on different levels. In the analysis of television advertisements,
for example, we have proposed a number of scalar levels, which have been
arranged below in ascending order of size (smaller to larger), as follows:
visual transitivity frame; shot; subphase; phase; macrophase; whole text.
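
As a rough illustration only, the ascending order of levels listed above can be written down as an ordered enumeration. The numeric ranks and the helper name `higher_levels` are assumptions made for this sketch and are not part of the analysis itself; a strict ordering also simplifies matters, since the levels are not autonomous.

```python
from enum import IntEnum


class ScalarLevel(IntEnum):
    """Scalar levels proposed for television advertisements, smaller to larger.

    The integer ranks are illustrative; only their relative order matters.
    """
    VISUAL_TRANSITIVITY_FRAME = 1
    SHOT = 2
    SUBPHASE = 3
    PHASE = 4
    MACROPHASE = 5
    WHOLE_TEXT = 6


def higher_levels(level):
    """Return every level that is larger in scale than the given one."""
    return [other for other in ScalarLevel if other > level]


# Levels above the subphase, in ascending order of size.
print([lv.name for lv in higher_levels(ScalarLevel.SUBPHASE)])
```

Asking for the levels above the whole text returns an empty list, which reflects the practical limit on the number of levels an analyst works with.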

In the present study, these units refer to different kinds of meaning relations
on their respective levels. Thus, there are different scales of meaning making
and their relationships in multimodal texts. The units and relations that are
described at any given level usually apply only to that level. In other words,
each level is characterised by the distinctive units and their relationships on
that level. This does not alter the fact that on occasions a lower level unit
such as a visual transitivity frame may coincide with a single shot; on other
occasions, it may be distributed across more than one shot. In such cases, we
can say that the two levels have been conflated in that text. Such observations
draw attention to the tight linkages across the different levels in the given
hierarchy of semiotic relations. Levels are not therefore autonomous. The
same point applies to the relations between subclusters, clusters and
superclusters of items to be found on, for example, printed pages and web pages
(see Inset 5: Clusters and cluster analysis, p. 31).

In one sense, as noted above, a system of scalar levels is a hierarchical
structure. The units and relations on a given level, such as the shot in video texts,
are parts of larger wholes such as the subphase or the phase on higher levels.
However, the notion of a hierarchy in which larger-scale units ‘contain’
smaller-scale ones (e.g. a shot contains a visual transitivity frame) can also be
misleading. There are two points to be made here:
(1) larger-scale units provide integrating contexts for smaller-scale ones;
(2) the different levels mutually interact with and constrain each other;
they are not for this reason completely separable.
Smaller-scale units are not simply smaller parts or building blocks in larger
wholes. Leakage across levels is part of the way in which a hierarchy of
meaningful units and relations functions in discourse.

basic Actor-Process: Material-Goal pattern in their experiential semantics. At the
same time, each clause provides more specific detail and therefore contributes to the
further thematic development of the overall pattern that is foregrounded in this para-
graph. Thus, /TWO WOMEN STRETCH AND FOLD A RED SILK SASH WITH A TIE-DYED
PATTERN OF WHITE STARFISHES/ (Actor-Process: Material-Goal) and /ONE WOMAN
HOLDS UP SASH/ (Actor-Process: Material-Goal). All of these clauses contribute to
the development of, and are assimilable to, the /PEOPLE DO THINGS: WORK/
thematic pattern.
The imperative clause Look for other members of the family in the first sen-
tence of the second paragraph creates a further cothematic and coactional tie between
the visual reproduction of the print and the verbal text. In this case, the reader’s
attention is directed to still finer details of the depicted scenes in this triptych of
images. The imperative clause specifies a particular action (look for) which orients
the computer user’s attention to the image in a particular way, and therefore creates
a coactional tie between the verbal text and the image. At the same time, this clause
introduces a new thematic focus with respect to the previous paragraph. The
participants in the image are now referred to as other members of the family. This
nominal group introduces a new thematic item in the verbal text which creates a specific
cothematic link between it and the ‘other participants’ in the image that are the new
focus here, at the same time that the previously mentioned participants in the pre-
vious paragraph are assimilated to the new thematic meaning /FAMILY/.
A new subset of the /PEOPLE DO THINGS/ thematic pattern is also created
with respect to the newly focal participants in the image, viz. /PEOPLE DO THINGS:
PLAY/. The nominal groups a teenage girl peering into a small cage, a little boy teasing
a cat with a mirror, a baby playing with its mother can all be unpacked and assimilated
to the clause-level experiential pattern Actor-Process: Material-Goal, as above.
Each of these nominal groups entails a focus on a specific feature of the depicted
scene – a particular Participant-Action combination – such that they make selected
aspects of the image relevant to their own interpretation. At the same time, the
non-specific deixis of these nominal groups instantiates each Participant-Action
cluster in the image as a non-specific instance of the type-category that is named in
their nominal groups, i.e. ‘any teenage girl’, ‘any little boy’, and so on. Thus, the deixis
here points to the image as a typical scene exemplifying a typical pattern, /DAILY LIFE
IN ASIA/, rather than specific participants in a specific time and place. For example,
/GIRL PEERS INTO SMALL CAGE/, /BOY TEASES CAT/, /BABY PLAYS WITH MOTHER’S
FAN/, all of which are instantiations of the /PEOPLE DO THINGS: PLAY/ thematic
pattern, at the same time that this pattern is coarticulated with the /FAMILY/ pattern
across the two semiotic modalities.
The discussion in this section shows how cothematic and coactional ties are
created across different semiotic modalities such as the verbal text and visual image in
hypertext. As we saw above in relation to the brief example from the airline magazine,

multimodal links between verbal and visual genres are made on the basis of joint
verbal-visual thematic relations. The building of a relation of this kind means that
both semiotic systems have resources that enable this linkage to occur. In the
discussion of the Women sewing print, we saw above how such intersemiotic
complementarity is created between visual transitivity and linguistic transitivity
relations. Of the many different things that could be focused on, this example selects
patterns of Process-Participant relations in the two semiotic systems as the basis for
the creation of a joint visual-verbal thematic system of relations which can be built up
in different ways as one moves between image and text on this page, or from this page
to some linked object such as Salt bag in the same website. There is nothing which
inherently connects these texts and images except the hypertextual pathways from
page to page and the specific arrangements and co-contextualisations of semiotic
modalities on different scalar levels of organisation ranging from a particular cluster
of items on a page, the relations among clusters, a particular juxtaposition of image
and verbal text, or the pathway taken from one page to another, and so on.

3.9. The action potential of hypertext objects

A hypertext object has an ambivalent status: it is a visual image at the same time that
it is more than that. The fact that the term object is used is in no way fortuitous. In this
section, we will explore the ambivalent character of some of the linked objects on the
Nasa Kids home page in order to better understand their dual status as picture and
object. A linked object on a web page has a potential for action. It is useful to think in
terms of the layered nature of such objects (see below). We can explore the space
between any two layers in terms of a potential pathway: every time we click on an
object, we get a new set of objects and a new set of relationships which link both
backwards and forwards to previous objects and to relationships in the past and in the
future. Some of these are determined by the author and some by the reader. On the
Nasa Kids home page, objects of this kind include the rocket and the Earth. When we
click on such an object, we activate various possibilities for action. Each new
possibility is a potential pathway which can itself be described in metafunctional
terms. Let us follow up this idea in more detail.
A pathway from one object to some other is a functional unit of meaning and
action; it can be related to higher or lower scale units as well as to other units on its
own scale. A unit of this kind is not a formal unit, but a unit of action and meaning.
Meaning is made through activity. Activity and action are
the fundamental meaning-making units; text is derived from meaning-making activity.
The creation of a link from one object to another, from one web page to another, and
so on, is a form of activity which is instigated by the interaction between computer
user and the selective use of the resources of the given web page or website that the
user makes. Activity of this kind integrates objects, texts, images, web pages, and so

on, along its trajectory. Meaning is created according to the kinds of relationships that
the computer user construes among the objects, texts, and so on, that are so integrated
along this trajectory. A hypertext pathway therefore entails the selective
recontextualisation of the resources of the website as the pathway unfolds in time. As
we shall now see, some aspects of the activities performed are in the hands of the
computer user whilst others are controlled by the computer program.
On a web page, action can take various forms. There are objects which move
on the web page in relation to the other objects in the depicted scene. Such objects
move autonomously; they have locomotion; they move from place to place and their
movement is not influenced by the computer user. Examples on the Nasa Kids home
page include the Moon buggy and the rocket pulling the display banner. Another kind
of object is that which changes its state when clicked by the user. Examples of this
type include the rocket on the Moon’s surface, the Earth and Saturn. The two kinds
of objects on the Nasa Kids home page are two different classes of objects with
distinct functions. For example, the Moon buggy is a moving object which does not
respond to mouse rollover or to mouse clicking. On the other hand, the rocket flying
overhead with the display banner in tow does respond to clicking, though it accesses
a different kind of activity (playing the game of drawing the constellations) as com-
pared to the more thematically oriented objects such as Saturn, Earth, and the rocket
on the lunar surface (see below).
The metafunctional character of the linked object Rockets & Airplanes and other
linked objects is described in the following three subsections.

3.9.1. Experiential meaning
The activity sequence can be analysed in terms of the functionally related parts which
comprise the whole. In this perspective, the activity-sequence is describable as a config-
uration of Participant roles, a Process (the action performed), and a Result, as follows:
PARTICIPANT: AGENT: COMPUTER USER^ PROCESS: ACTION: MOUSE
CLICK^ PARTICIPANT: GOAL: WEB PAGE OBJECT^ RESULT: CHANGE OF STATE VIS-À-VIS
AN OBJECT (e.g. THE ROCKET, THE EARTH, THE ASTRONAUTS LIVING IN SPACE).
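
The configuration above can be sketched as a small record type with one field per functional part. This is a minimal illustration only: the class name `ActivityStructure` and the method `notation` are hypothetical, and the field values are copied from the formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityStructure:
    """One field per functional part of the activity sequence (illustrative names)."""
    agent: str    # PARTICIPANT: AGENT
    process: str  # PROCESS: ACTION
    goal: str     # PARTICIPANT: GOAL
    result: str   # RESULT

    def notation(self) -> str:
        """Render the parts in sequence, using ^ for 'is followed by'."""
        return (
            f"PARTICIPANT: AGENT: {self.agent}^ "
            f"PROCESS: ACTION: {self.process}^ "
            f"PARTICIPANT: GOAL: {self.goal}^ "
            f"RESULT: {self.result}"
        )


click_on_object = ActivityStructure(
    agent="COMPUTER USER",
    process="MOUSE CLICK",
    goal="WEB PAGE OBJECT",
    result="CHANGE OF STATE",
)
print(click_on_object.notation())
```

Each field is a different kind of part, so no field can stand in for another.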

This activity structure comprises four functionally related parts; each part has a
functional role to play both in relation to the whole activity structure and in relation to
the other parts in the whole. The computer user-cum-mouse clicker is an agent whose
action is directed towards a particular object. When the object is clicked, the agent’s
action instigates a certain result, viz. a change of state in the object. A structure of this
kind is said to be a multivariate structure. That is, it comprises different kinds of parts
which, in combination, make their distinctive contribution to the whole in which they
function. In cases like these, the activity structure is a hybrid of real and virtual
elements. The computer user and his or her manipulation of the mouse belong to the
real world of physical actions performed by the body. This activity of the body has

its virtual extension on the screen in the form of the mouse arrow, which responds to
the actions of the computer user. Objects on the screen such as the rocket on the
Moon’s surface are virtual objects; in many respects the traditional separation between
visual image and the real object represented by that image has been replaced by, or
realigned in, a new type of functional relationship in which visual image and
manipulable object are now merged. In this new relationship, the
visual image of the rocket is not so much the representation of something else (a
rocket) that is not present, but the presentation or creation of a (virtual) object which
has specific kinds of reality effects.
The computer user can act on and manipulate this virtual object such that in
the virtual world of the computer screen the distinction between visual image-as-
representation and represented object is dissolved. The result is a new kind of hybrid
entity whose reality status lies somewhere between the two. Rather than indexically
presupposing its represented object, the image indexically creates an object which
participates in a field of action with other participants – real and virtual – such as the
computer user and the other objects that populate the screen world. The conjoining
of image and object in this virtual environment restores to the image some of its
original appeal as a direct incarnation of the world of objects before the discourse of
representation prised them apart and made of the image an object of contemplation
and distancing, rather than of action and involvement.
This may also help to explain the immense appeal and efficacy of the virtual
world of hypertext objects for so many people: the objects of this virtual world are
malleable and manipulable in varying ways and to varying degrees such that we can
say that human actors submit to them, their actions, and their possibilities for emo-
tional involvement and subjective investment. These aspects will be discussed in the
next section in relation to interpersonal meaning.

3.9.2. Interpersonal meaning
The object has the potential for dialogic engagement with the reader. We do not wish
to suggest that this is the same as dialogue between, say, two humans in conversation.
Rather, the integration of human and computer gives rise to forms of dialogic inter-
action and co-ordination of joint human-computer activity that are modelled on
human-human interaction at the same time that they constitute its further specifica-
tion along some parameters by virtue of the fact that the computer is made by humans
and is an extension of human activity in a social and cultural context. For these
reasons, the dialogue of computers or between computers and humans must in some
ways resemble its precursor forms of human-human dialogue, at the same time that
this dialogue is a qualitatively new development that has its own characteristics, not
reducible to the precursor level. The implications of this possibility are still little
understood at this stage. The reader orients to and engages with the object in the
following way:

READER^ ENGAGE OBJECT: MOUSE ROLLOVER^ OBJECT RESPONDS: CHANGE
FORM: CHANGE COLOUR: YELLOW + MOVEMENT + VERBAL TEXT: PROPOSE
LINK^ READER RESPONDS: CLICKS OBJECT^ OPENS NEXT PAGE

The modification of the object, e.g. the rocket, over the temporal duration of
the rollover both indexes the object’s interactive potential and attracts or engages the
attention of the reader (see Figure 3.13). This involves a multimodal coordination of
the following kinds of stimulus information: visual + auditory + kinesic: object:
movement: pulsating; kinesic: movement: hand-arm-eye movement of user.
McGregor (1997:79) points out that a sign can be modified or reshaped in order
to achieve interactional ends. The processes of sign modification or reshaping are con-
jugational relations: the reshaping or deformation of the sign spreads across or holds
the whole sign in its scope. Conjugational relations are thus scopal in nature in contrast
to the particulate or part-whole relations based on constituency that are characteristic
of experiential relations in the clause. Interpersonal meanings are created by the
reshaping or the modification of the signs in order to achieve interactive ends. In lan-
guage, the different mood categories (e.g. declarative, interrogative) reshape the same
proposition for different interactive purposes, i.e. asserting or interrogating the
proposition in the declarative or interrogative clause, as in semiosis is a dialogic activity
vs. is semiosis a dialogic activity? The reshaping of the sign according to the
interactive purpose iconically expresses the interpersonal meaning. In these
examples, the interpersonal meanings declarative and interrogative are not con-
stituents in their clauses; instead, they hold the entire clausal proposition in their
scope and shape the clause accordingly. We can now relate these observations to our
example. To do so, it is necessary to distinguish between a number of distinct layers
of organisation that comprise the object in question, namely the rocket.
On Layer 1 (see the following page), the rocket is depicted as a participant
in an overall scene. The viewer orients to it from this perspective. On Layer 2, on
the other hand, when mouse rollover occurs, it is still a rocket, of course, but the
modification of this image through the change of colour (uniform bright yellow)
and texture (e.g. loss of specific detail) and the appearance of the superimposed
verbal caption, require the viewer to orient to this image from a different perspec-
tive and for a different interactional purpose with respect to Layer 1. These changes
spread across the entire object and constitute a reshaping or modification of the
object for interactive-interpersonal purposes, as described above. They iconically
signal both a change in the way the viewer is required to orient to the object and a
change in the interactive purpose to be attained by engaging with the object. The
changes that are manifested on Layer 2 when mouse rollover occurs signal a
different way of orienting to the given object in order to attain a different interac-
tional purpose. The different ways of interpersonally orienting to, and interacting
with, objects such as the rocket can be summarised as follows.

Layer 1:
ORIENTATION: explore overall visual field; view object as a
component of the depicted scene;
ACTION: explore objects with mouse (some respond, others
don’t);

Layer 2:
ORIENTATION: focus on/attend to object as salient, appealing, now
foregrounded against the depicted scene on Layer 1:
view object as having action potential;
ACTION: mouse rollover^ object responds: change form;

Layer 3:
ORIENTATION: focus on action potential of object as link to a new page;
ACTION: respond to change of form on Layer 2^ left click
object^ create link to new page.

The two parameters ORIENTATION and ACTION specified here represent two
aspects of the interpersonal meaning potential of the object on each of the three
layers in the analysis. These two parameters are simultaneously present in the
interpersonal meaning of the object on each layer.
A third parameter, APPEAL, will be briefly discussed below. On each layer, the
changes in the object induced by [+mouse contact] or [-mouse contact] constitute
modifications of the object in order to indicate how the viewer is to orient to and
interact with the object in the way described above. The role of the mouse with
respect to the three layers may be described as follows:

Figure 3.11: Search subcluster before and during mouse rollover



Layer 1: [-MOUSE CONTACT]: view object in depicted scene;

Layer 2: [+MOUSE CONTACT: roll over]^ modify object: invite viewer to respond;

Layer 3: [+MOUSE CONTACT: left click]^ create link to new page.
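
One hedged way to picture the role of the mouse across the three layers is as a small state machine. The event labels (`roll_over`, `roll_out`, `left_click`) and the function names are assumptions introduced for this sketch. The return from Layer 2 to Layer 1 on `roll_out` is also an assumption, based on the observation that the modified form is present only while mouse contact lasts.

```python
from enum import Enum


class Layer(Enum):
    LAYER_1 = "view object in depicted scene"
    LAYER_2 = "modify object: invite viewer to respond"
    LAYER_3 = "create link to new page"


# (current layer, mouse event) -> next layer. Event names are hypothetical labels.
TRANSITIONS = {
    (Layer.LAYER_1, "roll_over"): Layer.LAYER_2,   # [+MOUSE CONTACT: roll over]
    (Layer.LAYER_2, "roll_out"): Layer.LAYER_1,    # assumed: contact is lost
    (Layer.LAYER_2, "left_click"): Layer.LAYER_3,  # [+MOUSE CONTACT: left click]
}


def next_layer(current, event):
    """Return the layer reached after an event; other events leave the layer unchanged."""
    return TRANSITIONS.get((current, event), current)


def follow(events, start=Layer.LAYER_1):
    """Apply a sequence of mouse events, starting from Layer 1 by default."""
    layer = start
    for event in events:
        layer = next_layer(layer, event)
    return layer


print(follow(["roll_over", "left_click"]).name)
```

In this sketch Layer 2 is the only state with two exits, which matches its description as the point at which the viewer is invited to respond.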

Similar observations can be made, for example, about objects such as Search on
the British Museum Children’s COMPASS home page (Cluster 2). The subcluster com-
prises three items:
(1) a picture of a torchlight; superimposed on...;
(2) a brown disc with a light coloured outer edge;
(3) the word Search.
Figure 3.11 shows this subcluster before and during mouse-roll in order to demon-
strate the interpersonal modification which occurs in this object in the transition
from Layer 1 to Layer 2. On mouse rollover, the disc both rotates on its axis around
the torch in its centre and simultaneously gyrates in an up-and-down movement.
During this movement, the brown coloured face of the disc which is apparent prior
to mouse rollover is alternately visible and not visible to the viewer during the rota-
tion of the disc. The same pattern characterises all four items in Cluster 2 in Figure
3.6. In this case, the combination of movement and the alternating loss of colour
detail whilst the disc rotates on its axis constitute a reshaping of this object for inter-
actional purposes. The combination of these two features achieves the same kinds of
interpersonal and interactive purposes as do those described above in relation to the
rocket on the Nasa Kids home page. In both these cases and many others like them,
visual and kinesic changes or modifications that spread across the domain of the whole
object act as interpersonal signs of the ways in which viewers orient to, and interact
with, objects of this kind.

Figure 3.12: Interpersonal meaning potential of linked objects; three
simultaneous parameters (APPEAL, ORIENTATION, ACTION-POTENTIAL)

Overall, there are three aspects to the interpersonal meaning of such objects,
as in Figure 3.12. The feature APPEAL refers to the way in which changes in the
object function to attract the viewer’s attention; ORIENTATION refers to the per-
spective or stance the viewer is required to adopt on the object, e.g. is the object
part of a depicted scene or does it afford action potential?; ACTION refers to the
specific purposes that can be achieved by performing determinate actions on the
object, e.g. object change on mouse roll indicates the object can be clicked to link to
another web page.
Interpersonally, these changes index the interactive potential of the object. The
visual and kinesic changes that occur spread across the object as a whole; they enact
a kind of visual-kinesic prosody and serve to orient the user to it as an object that the
user can do something with. Clearly, the bright colours (the uniformly
bright yellow) and catchy movement (e.g. the pulsating of Saturn’s rings) are far from
realistic. Instead, they have an appeal function; they seek to catch the user’s attention
and to orient the user to act in a certain way. The deformation of the object described
above also constitutes a remodalisation of the object, or the creation of a new point
of view on the object for the user/observer. The change in form goes hand-in-
hand with a shift in the observer’s perspective. The object is no longer part of an
overall scene, but an object to be acted upon and engaged with. Its potential for
action is now focal in this remodalisation. This example serves to illustrate a more
general point. The virtual world of hypertext is as much about the virtual points of
view of observers on the objects of this world as it is about the objects themselves.
Objects on the screen can be analysed in terms of the interaction potential
along a cline of possibilities ranging from the user having control over the object’s
action to the object acting independently of the user according to a computer
program. There are different degrees of user control and object autonomy in relation

| Action Variable | Object Description | | | |
|---|---|---|---|---|
| Locomotion | – locomotion | – locomotion: walk on Moon’s surface | + locomotion: overfly Moon’s surface | + locomotion: travel over Moon’s surface |
| Change of state | + change of state: go to Layer 2 | + change of state: go to Layer 2 | – change of state: go to linked page | – change of state |
| Example | Moon rocket | Extraterrestrial creature | Banner rocket | Moon buggy |

Table 3.4: Interaction potential of objects on the Nasa Kids home page

to the object’s potential for locomotion and/or change of state, as set out in Table
3.4. In showing the correlation between the variables of user control and change of
state, Table 3.4 also shows objects which respond to clicking and to rollover, and
which change their state accordingly. Other objects do not respond in this way. The
mouse can be used to perform the following kinds of actions on objects:
(1) point to object;
(2) roll over object;
(3) click object.
In the case of (2) and (3), we can say that the object responds to the user’s
action. There is at least the simulation of a dialogic coordination of the two actions
in a larger-scale syntagm, which can be schematised as follows: Mouse Click ^ Object
Responds: Go to new page and Mouse Roll Over ^ Object Responds: Change form/colour.
Objects of this kind therefore have a response potential. This is so in two senses.
Their presence on the screen elicits the attention of the user and possible actions
on the part of the user. At the same time, the object itself has a repertoire of
responses when rolled over or clicked. These responses can, in turn, lead to the
development of further courses of action.
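
The response potential described above can be illustrated with a short sketch. The class `ScreenObject`, its field names and the action labels are hypothetical. The two objects shown are the ones whose behaviour the discussion states explicitly: the rocket on the Moon’s surface, which responds to rollover and clicking, and the Moon buggy, which responds to neither.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScreenObject:
    """A screen object with an (illustrative) repertoire of responses."""
    name: str
    moves_autonomously: bool
    responds_to_rollover: bool
    responds_to_click: bool

    def respond(self, action: str) -> Optional[str]:
        """Return the object's response to a mouse action, or None if there is none."""
        if action == "roll over" and self.responds_to_rollover:
            return "change form/colour"
        if action == "click" and self.responds_to_click:
            return "go to new page"
        return None  # merely pointing to an object elicits no response


moon_rocket = ScreenObject("Moon rocket", False, True, True)
moon_buggy = ScreenObject("Moon buggy", True, False, False)

for action in ("point", "roll over", "click"):
    print(action, "->", moon_rocket.respond(action), "|", moon_buggy.respond(action))
```

Keeping autonomous movement separate from the two response flags mirrors the distinction between locomotion and change of state.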

3.9.3. Textual meaning
Clickable objects can be described in terms of different layers as we demonstrated in
the previous subsection; they have a multi-layered type of organisation. Different
layers of an object are nested within each other. On the Nasa Kids home page, given
objects (e.g. the rocket on the Moon’s surface, the Earth, or the Moon base in Figure
3.2) are, on the first layer of their organisation, component parts of a larger visual
scene, i.e. the visual composition on the home page. As parts in this larger visual scene,
they have relative salience/weight and position and various kinds of relations to other
objects. However, on mouse rollover, another layer of the same object is revealed.
Figure 3.13 reveals the second layer in relation to the Moon rocket and the Earth.
This second layer is textually internally homogeneous, though less integrated
with, and partially demarcated from, the compositional whole to which the first

Figure 3.13: Objects that are clicked on to reveal a link



layer belongs. The three objects mentioned above – the Earth, the rocket on the
Moon’s surface, and Saturn – show this very well. However, each does so in slightly
different ways, at the same time that the three objects also have some features in
common with each other when the mouse is rolled over them, as shown in Table 3.5.
This partial demarcation with respect to the visual field of the first layer
accords the object a newly contingent autonomy; it now stands out from, and is
partially separated from, the surrounding visual field of the first layer by virtue of
the new characteristics that it takes on when rolled over. This second layer of the
same object is a transitional layer in two related senses:
(1) it only occurs during mouse rollover;
(2) it constitutes the potential of the object to open up a pathway to
another page.
This second layer is anaphoric to the previous state of the object (it links
back to Layer 1) and cataphoric to the next page (it points forward to Layer 3). A
transitional layer of this kind is textually ambivalent because the reader can hover,
so to speak, between ‘clicking’ and ‘not clicking’ before deciding which way to go.
A transitional layer of this kind and the change of state of the object that marks its
transitional status is therefore a decision-making moment on a potential pathway.
The three layers of the rocket as linked object can be described as follows:
[LAYER 1: LOCATION: HOME PAGE; OBJECT: ROCKET: COMPONENT IN
PRIMARY VISUAL FIELD; COLOUR: PRESENT; DETAIL: PRESENT]^
[LAYER 2: OBJECT: COLOUR: UNIFORM YELLOW; DETAIL: SCHEMATIC;
+ SUPERIMPOSED VERBAL CAPTION: Rockets & Airplanes]^
[LAYER 3: Rockets menu page].

| Object | Behaviour of Object: without mouse rollover | Behaviour of Object: with mouse rollover |
|---|---|---|
| Earth | movement: rotates + colour + detail | no movement + uniform yellow + no detail + superimposed verbal caption |
| Saturn | no movement + colour + detail | movement: rings pulsate + uniform yellow + no detail + superimposed verbal caption |
| Rocket | no movement + colour + detail | no movement + uniform yellow + no detail + superimposed verbal caption |

Table 3.5: Nasa Kids home page: comparison of three objects



Figure 3.14 schematises the layered organisation of web page objects such
as the one described above. Objects like the ones mentioned above undergo
change when rolled over in the way described in Table 3.5, which illustrates the
behaviour of objects with and without mouse rollover. Experientially, they are
bleached of the detail that is significant in relation to their place in the visual com-
position of the home page as a whole (Layer 1). At the same time, the verbal
caption is a further specification of the visual image (e.g. the rocket in Figure 3.13)
qua thematic node.
Textually, these objects, on account of the changes which occur in them
when rolled over, take on a new kind of visual salience. They are no longer just com-
ponents of an overall scene; the changes in them foreground their visual salience in
contrast to the visual scene to which they belonged in Layer 1. At the same time, the
objects tend to be separated off to some extent from the visual field to which they
previously belonged (Layer 1), thereby enhancing their cataphoric potential for link-
ing forward to new thematic domains in anticipation of a hypertext pathway which
can be accessed.
The superimposed verbal caption is especially important in this regard. Some of
the changes recorded – e.g. the uniform yellow on rollover or the appearance of the
red of the verbal caption – also serve to link these objects on the basis of
features that they all have in common. That
is, the objects have covariate ties with each other in Layer 2 that are different from
those that tie them together on Layer 1. Textually, the objects exhibit the kinds of
textual linking relations shown in Table 3.6 on the following page.
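
The claim that the objects share different features on each layer can be checked informally against the entries of Table 3.5. The sketch below treats each table cell as a set of feature labels and takes their intersection. The dictionary layout and the function name `shared_features` are assumptions made for illustration, and set intersection is only a crude stand-in for a covariate tie.

```python
# Feature labels copied from Table 3.5 (without and with mouse rollover).
LAYER_1 = {
    "Earth": {"movement: rotates", "colour", "detail"},
    "Saturn": {"no movement", "colour", "detail"},
    "Rocket": {"no movement", "colour", "detail"},
}
LAYER_2 = {
    "Earth": {"no movement", "uniform yellow", "no detail",
              "superimposed verbal caption"},
    "Saturn": {"movement: rings pulsate", "uniform yellow", "no detail",
               "superimposed verbal caption"},
    "Rocket": {"no movement", "uniform yellow", "no detail",
               "superimposed verbal caption"},
}


def shared_features(layer):
    """Return the features common to every object on the given layer."""
    return set.intersection(*layer.values())


print(sorted(shared_features(LAYER_1)))
print(sorted(shared_features(LAYER_2)))
```

On these entries the three objects share colour and detail on Layer 1, whereas on Layer 2 they share the uniform yellow, the loss of detail and the superimposed verbal caption.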

Figure 3.14: Layered structure of textual objects (Layer 1; Layers 2a and 2b;
Layers 3a, 3b, 3c and 3d)



3.10. The virtual world of hypertext

Hypertext, as the above examples from the Nasa Kids and the British Museum
Children’s COMPASS websites show, is very much a hybrid of precursor genres such
as verbal text, visual images, and multimodal combinations of these, on the one
hand, and the new meaning-making possibilities of the virtual environment of
hypertext, on the other. This is by no means surprising, though it has at times been
a source of critical confusion and disorientation. Furthermore, we must also face
the fact that many existing forms of hypertext are banal, commercialised recyclings
of already existing textual forms which are simply uploaded to a website.
Hypertext is a newly evolving system of semiotic possibilities. Yet this can only
come into being by taking over, reorganising and reintegrating into its new forms of
organisation at least some of the possibilities of previously existing semiotic
formations. This is true of all evolving systems.
In particular, hypertext foregrounds the virtual field of possibilities within
which the user can navigate and create multiple lines of connectivity among diverse
texts, web pages and websites. All acts of meaning making are created against a back-
ground of unrealised possibilities. A given choice in language, in depiction, in gesture,
and so on, and its combination with other choices from the same modality and from
other modalities is made against sets of alternative choices that were not actualised in
a given instance. Choices which are actualised constitute the syntagmatic environment
of a text; the sets of unactualised alternatives – the contrast sets – constitute the
paradigmatic environment of text (Saussure, 1993: 356).
The paradigmatic dimension of meaningful choice is a virtual system or field of
possible choices. Every choice that is actualised or realised in a text is made in relation
to, and always implies, this wider field of unactualised possibilities. The term virtual
therefore refers to the entire field of possibilities, which influences the actual at
the same time that it is itself a field of action and manipulation beyond the
merely actual.

Type of Textual Tie    Textual Function

anaphoric              transformed image points back to prior state of image
                       or home page

cataphoric             transformed image + verbal caption points forward to
                       new thematic domain which can be accessed by clicking
                       object

covariate              transformed images share some features in common as
                       members of a more general class of clickable objects
                       with interactive and thematic potential

Table 3.6: Textual links and functions in linked objects on Nasa Kids home page

Saussure defined the internal constitution of the language system (la langue) in
terms of two orders of relations, i.e. syntagmatic and associative relations. Of asso-
ciative relations, he made the following observation:

The sum of the relations with the words which the mind
associates with words which are present is a virtual series,
a series formed in memory, a mnemonic series, as
opposed to enchainment, to the syntagm which is
formed by two units which are both present. It is an
effective series in opposition to the virtual series which
engenders other relations
(Saussure, 1993: 356).

Associative relations are a virtual series in memory because they are not
actualised as syntagmatic combinations of actually present linguistic units in a given
chain or sequence of units. Nevertheless, they influence the actually present
syntagmatic relation at the same time that they constitute a latent field of
possibilities which is able to engender other possible relationships. Hjelmslev (1961
[1943]: 39-40) postulated the notion of a language ‘without a text constructed in
that language’. Hjelmslev points out that such a language would be a possible
system, ‘but that no process belonging to it is present as realised. The textual process
is virtual.’ (1961 [1943]: 40). Again, we see how the virtual is contrasted with what is
actually realised as a field of possibilities.
The virtual environment of hypertext spatialises this field of possibilities as
one in which the user’s powers for creating both vertical and horizontal lines of
connection among the objects on a single web page, across pages and across
websites, and the textual objects and the multimodal combinations of texts, objects, and
actions that these connections afford, are massively enhanced. The space of
cyberspace is not, of course, the same as the three-dimensional physical space in which
we live. Rather, it is a metaphorical space projected on the screen but which extends
beyond the screen, and within which the user performs actions and navigates path-
ways from one point in this space to another. In this perspective, hypertext is a
virtual field of possibilities which affords enhanced semiotic possibilities for:
• interaction with textual objects;
• personalising texts, combinations of texts, and so on according to
individual preferences and requirements;
• participating in, and creating in real-time, social networks that cut
across traditional hierarchies and power relations at the same time that
the space-time reach of the individual is massively extended;
• extending the possibilities for creating new forms of multimodal
objects, texts, and modes of action and interaction in ways which
change our ways of conceptualizing and performing cognitive
operations;
• embodied engagement with the screen world; human-computer nexus;
• the evolution of new genres and new meaning-making practices;
• increased orientation to semiotic abstraction, decontextualisation and
recontextualisation of meanings and texts across semiotic domains,
virtual participation in, and virtual modelling of abstract multimodal
objects.
Hypertext, with its multiple, parallel and overlapping connections and inter-
linkings, is often opposed to the sequentiality and linearity of (written) text. Indeed,
some have proclaimed the end of the cultural dominance of writing and its super-
seding by hypertext. However, this way of comparing hypertext to more traditional
genres of written text ignores the fact that the linear character of linguistic forms
realises many essentially nonlinear meaning relations in texts. A text is an instanti-
ation of selections from a vast and interconnected system of linguistic relations. It
is also a system of relations and interconnections among choices on many different
levels of organisation. Hypertext is a specific evolution of the same kind of
thinking. Rather than being a total departure from this thinking, we believe that the
novelty of hypertext lies in the way that it combines aspects of both system and
structure in diverse semiotic modalities as it unfolds along its trajectory.
For example, a Linked Object such as VISUAL ICON + VERBAL CAPTION:
Women sewing, a print, which was discussed above, is both a structural item in an
unfolding hypertextual trajectory as well as a node specifying in a highly condensed
way a paradigmatic class of elements which are subsumed by the node. The node
specifies a functional class of items, though it cannot specify all the specific
characteristics of the more delicate choices that are actualised if the node is clicked
on. Moreover, a node can be expanded into a more specific set of items by click-
ing on it and thereby actualizing the paradigmatic potential that is represented by the
node qua class of paradigmatic alternatives. Nodes are both paradigmatic resources
as well as ways in which resources typically get deployed and linked in syntagmatic
structures to create a certain kind of meaning relation.
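The dual role of a node can be pictured with a small data-structure sketch. The Python fragment below is purely illustrative and is not part of the toolkit described in this book: the class names Node and Trajectory, their methods, and the two sample items placed under the print are all hypothetical. A node holds its unactualised alternatives (its paradigmatic potential), and clicking on a node adds it to the chain of actualised choices (the syntagmatic trajectory).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A linked object: a label plus the class of items it subsumes."""
    label: str
    alternatives: list["Node"] = field(default_factory=list)  # paradigmatic potential


@dataclass
class Trajectory:
    """The syntagmatic chain of nodes actualised so far by the user."""
    path: list[Node] = field(default_factory=list)

    def click(self, node: Node) -> list[str]:
        """Actualise a node and return the labels of the choices it opens up."""
        self.path.append(node)
        return [alt.label for alt in node.alternatives]

    def unactualised(self) -> list[str]:
        """Labels of alternatives offered along the path but never chosen."""
        chosen = {id(n) for n in self.path}
        return [alt.label
                for n in self.path
                for alt in n.alternatives
                if id(alt) not in chosen]


# Hypothetical items subsumed by the node; the real site's contents are not modelled.
detail = Node("detail view")
notes = Node("background notes")
print_node = Node("Women sewing, a print", [detail, notes])

trajectory = Trajectory()
trajectory.click(print_node)   # expands the node into its more specific items
trajectory.click(detail)       # one alternative is actualised
remaining = trajectory.unactualised()  # the contrast set left unrealised
```

The sketch deliberately keeps the two dimensions in separate places: alternatives belong to the node, whereas the path belongs to the user's trajectory, so the same node can figure in many different pathways.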
Hypertext foregrounds to a much greater extent than many other forms of
discourse the dually paradigmatic and syntagmatic dimensions of meaning making.
Furthermore, it does so by constantly calling the user’s attention to the dialectically
dual character of the hypertextual process: readers are constantly being made aware
of the process through often highly self-conscious metadiscursive strategies which
draw attention to the choices which can be made if a given object is selected and
clicked on. For example, in both the Nasa Kids and the British Museum Children’s
COMPASS home pages, clickable objects respond when rolled over by the mouse by
changing colour or form, becoming animated, and so on. Such devices are, to be sure,
interpersonally appealing and pleasurable. They are also metadiscursive markers in the
sense that they are models of the user’s own activity when using the potential of the
hypertextual system of relations. Hypertext evidences considerable self-reflexivity.
The open-ended character of hypertext can be understood in two ways.
First, there is potentially indefinite syntagmatic expansion as one moves from object
to object within a page, from page to page, and from website to website over the
time-span of a particular pathway. Secondly, there is paradigmatic expansion as one
opens up more and more links, explores their paradigmatic potential in the way
described above, and selectively integrates the meaning relations of the link that is
opened up to meaning relations on diverse scalar levels of organisation along the
trajectory up to this point. In this way, the user does not so much experience more
and more texts, though this is undoubtedly true; more accurately, the user creates
and experiences an expanding matrix or web of interconnected thematic regions,
texts, activities, genres, semiotic modalities, technologies and multiple, overlapping
hybrids of these on potentially very many scalar levels of semiotic organisation
ranging from the relations among objects and texts on a single page, a whole web
page, the connections across pages, the connections across websites, and so on.
Hypertext therefore challenges the strong classification and the strong framing of
traditional forms of pedagogy. It challenges and rearticulates to its own meaning-
making practices, the separation of thematic regions into specialised subject areas, as
well as the distribution of participant roles and the pacing or the timing of the
activities in, and through which, thematic areas are introduced, explained and
unpacked for apprentice learners. Hypertext is a mode of semiosis based on weak
classification and weak framing: the traditional boundaries between thematic regions
and semiotic formations give way to hybrid formations, often of a purely contingent
and idiosyncratic character, at the same time that the control of the timing and pacing
of the user’s pathway and associated activities is in the hands of the user.
Moreover, the immediacy of the objects on the screen, their response time
when clicked on, and the possibility of real-time communication with other web users
reduce the semiotic distance between experience and reflection. The user is
integrated with the computer as a functioning component of a wider social system of
relations and practices on diverse space-time scales. Moreover, this integration is based
on both sensori-motor and cognitive operations and activities: the use of the mouse
to select and click on screen objects, the responses of these objects, the pickup of
audio and visual stimulus information about these objects, the ways in which objects
index activities and participant roles for the user, the user’s taking up of these roles
and activities and the possibility of modelling new forms of experience, and of
combining and pooling experiences in new ways. Together, these interweave the user’s
embodied engagement with the screen on the here-now scale, the making of a particular
hypertext pathway over minutes, hours, days and so on, and the virtual, forever expanding
resources of the web as a whole into a newly emergent level of organisation in our ecosocial
semiotic system. This level is made possible by the technological infrastructure which
supports it, though it is neither reducible to, nor caused by, this level alone. Rather, the
web is a newly emergent level of social semiotic organisation; it lies between
previously existing levels of social, cultural and technological organisation, as shown
in the three-level hierarchy of relations presented below.

L+1: Technological infrastructure of WWW and the socio-cultural
conditions of its enablement;
L: Newly emergent meaning-making practices and genres: hypertext,
hypermedia, email;
L-1: Precursor and still existing meaning-making practices, genres and
technologies: spoken and written genres, print and electronic (radio,
TV) genres.

The middle level, L, does not simply emerge from anywhere. Instead, it brings
about the reorganisation and integration of precursor meanings, texts and genres to
its own forms of organisation and the new possibilities that these afford. At the same
time, a technological infrastructure and the social and cultural conditions which enable
this must already be in place on a still higher level of organisation (L+1) in order to
provide a higher-scalar environment for the intermediate level. The lower level (L-1),
i.e. the already existing practices and technologies, constitutes both the initiating and
enabling conditions for the new intermediate level as well as the affordances, both
material and semiotic, which make the newly emergent level possible.
It is important to distinguish between the directionality and sequentiality of a
reading path and the nonlinear character of many of the meaning relations that the
reader construes when reading a written text. The visual-spatial organisation of
written text gives structure to determinate, not necessarily linear reading paths. At the
same time, it affords the reader multiple possibilities for accessing the page and its
potential meanings in a variety of places and ways. The visual-spatial organisation of
the page, in other words, enables jumping around (cf. cluster hopping as discussed in
Chapter 1). Typically, written genres have a default beginning-middle-end type of
organisation which structures the directionality of the reading path (3.4, pp. 118-
119). Hypertext, on the other hand, has a much looser, more open-ended type of
organisation. Hypertext is a further evolution of this tendency, already evidenced in
writing, not its total replacement. There is no single reading path or sequential
organisation that can be defined as the default sequence. This is also so because,
unlike the reader of a written text, the user does not encounter an already made text when
embarking upon a hypertext trajectory. Instead, the user is creating (authoring) his or
her personalised text in the process of navigating a pathway through the web.
A written text is a synoptic product, a semiotic object-text, which can be
integrated to various kinds of social activities in, and through which, its potential
meanings are activated. The visual-spatial organisation of written text does, of
course, require the text to be integrated to temporal processes of visual scanning,
interpreting and so on, and in this sense written text is integrated to temporal
processes. A written text is a textual product which can be used in various ways; in
this sense, it is activated as a dynamic process in time. In the case of hypertext, there
is no pregiven object-text that can be picked up, so to speak, and integrated to various
types of social activity. Rather, the creation of a particular hypertextual trajectory
takes place in and through the user’s activity of navigating among objects, texts, web
pages and websites. The greater degree of freedom in the ways in which these are
linked and combined with each other as the user creates a pathway through the web
means that hypertext foregrounds the process of textual creation, rather than its
products qua object-text. To be sure, there are many already-made textual products
(e.g. written texts, visual images, objects and so on) that one encounters while
creating a hypertext pathway. Nevertheless, the creation of the pathway is itself the
process of hypertext creation rather than the various textual products that can be
combined, interlinked and integrated to the evolving hypertext trajectory during the
course of its creation by the user. Moreover, the fact that web pages are dynamically
assembled by computer programs rather than being pre-constituted pages qua object-
texts means that, unless the hypertext trajectory is recorded by the computer’s mem-
ory, or otherwise transcribed in some way, it is an ephemeral process which is dissi-
pated at the moment that the user terminates the trajectory he or she has created.

3.11. Community or social network of users and practices?

What are the ways in which websites build up a community of users? A community
is created and defined by the practices of its members, the links made between
practices, and the meanings that are made through these practices and the connec-
tions between different practices. The participants in a practice take up and per-
form various participant roles in the activities which they engage in. This combi-
nation of an activity and its associated participant roles may suffice as a first
definition of a practice. The notion of a community has often been seen in fairly ide-
alised terms as implying a homogeneous or uniform set of shared values and
meanings. To avoid this implication, we would propose the term social network in
order to explore the ways in which the website brings different people together in
a network of practices and meanings which tie these people or institutions in ways
that may be transient or stable to varying degrees over time. A social network
requires some kind of technological infrastructure which makes it possible for people
to connect with each other as members of the social network and participate in its
practices and to share in its meanings. The technological infrastructure of the web
enables the individual PC user to create links across space and time in the real-time
interaction between the user and his or her computer screen.
The web page as a site for linking participants as members of a wider social
network, however ephemeral or ongoing the commitment of individual
participants may be, can be understood in terms of the following parameters:
� the practices, activities, and participant roles that are internal to the
website and its semiotic resources;
� the ways in which the website is a recontextualisation of
practices, activities, and participant roles that derive from outside
the web page itself in other domains of social life;
� the ways in which the web page builds connections with other
texts and other media (for example, a website may guide the user
to books, magazines, TV programmes and so on, which relate to
the meanings and activities of the website itself.)
The website displays considerable heteroglossia (Bakhtin, 1981 [1975]). That is, it
gives voice to a rich diversity of meanings and social values and the ways in which
participants may orient to and deploy these in their meaning-making activity when
they engage with a website and its resources. In this respect, at the beginning of
this chapter we referred the reader to various types of information websites (Table
3.1). Using the analytical tools that we have provided above, the reader may now
care to consider the ways in which the reading pathways found in other types of
information websites can be analysed. In some cases, the transcription will highlight
the relatively fixed nature of trajectories in which the steps are preordained (e.g.
those sites whose provision of information is closely linked to specific details, such as
the publication details of a particular book or trains between London and Glasgow on
Sunday mornings) while in other cases (e.g. special interest sites) the likelihood of
this happening is much smaller.

3.12. The WWW as technological infrastructure and meaning-making resource

Internet and the World Wide Web afford the integration of texts and meanings
across space and time on an unprecedented scale. Internet and the networked
multimedia personal computer together provide the technological infrastructure
for new forms of accumulation of knowledge and experience through
the integration of individual experiences, individual meanings and actions on a
scale that no individual alone can possibly attain. In very many different domains
of human social and cultural life, the web integrates many diverse individual expe-
riences, knowledge gained, texts created of events that have occurred, geographical
journeys undertaken, as well as experience and knowledge of the web itself and the
virtual journeys which it affords in the hypertext environment. The web is part of
the complex social and cultural environment in which many humans in all parts of
the world live, work, teach, learn and play. At the same time, the web is embedded
in and is a further transformation and evolution of already existing socio-cultural
genres, practices and meanings whilst also bringing into existence newly evolving
genres, practices and meanings.
It is important to remember that the environment is not something which is
already given and to which a given species must adapt or be fitted; rather, the
environment of a species or of an individual organism is constructed by the
activities of the organism (Gibson, 1986 [1979]; Lewontin, 2001 [2000]: 48). The
environment of an organism is the set of external conditions and affordances
which allow it to have meaningful interactions with the world that surrounds it.
Organisms selectively act on, and interact with, their surroundings in such ways
that aspects of the world outside them are relevant or useful for making their envi-
ronments. Internet is an environment in this sense. This is so both from the per-
spective of the material affordances that it makes possible and from the perspec-
tive of its semiotic potential for making meanings and interacting with others through
the creation of social networks which the technological infrastructure makes
possible. The two perspectives go together: the technological infrastructure makes
possible the creation of social activities, ways of making meaning and forms of
human interaction at the same time that these meanings, activities and so on, can-
not exist and cannot take place in time without the support of the material
processes that subtend them.
The environment of cyberspace and the screen interface in, and through
which, we interact with others lead to an increasing homogenisation of texts,
images, objects, and so on, and their functions in the virtual world of cyberspace.
There is a concomitant emphasis on formal operations performed on abstract
objects of all kinds. This goes hand-in-hand with the increasing integration of the
body-brain complex of the individual with the virtual environment of cyberspace
and the displacement of bodily movements and sensations into their electronic
extensions. It also shows a drastic reduction of the space-time scales of meaning
making: the ecosocial space-time is compressed into the virtual space of the screen
world at the same time that time is remodelled in terms of the short time-spans
involved in navigating from link to link in this virtual environment (see Inset 3, pp.
18-19). Internet is a technological infrastructure which connects its users and
texts in a vast global network at the same time that it is a resource for the semiotic
integration of meanings and texts – both individual and institutional – that no
computer user on his or her own could achieve. The multimedia personal computer
is a technological artifact, itself the product of complex and historically-evolving
social and technological processes, which affords the individual user participation
in these meanings and practices through the semiotic mediation of the activities
which integrate user, personal computer and Internet. The semiotic potential of
Internet can be likened to a new form of ‘oral tradition’ in which the meanings of
individuals are shared and pooled across space and time as a new form of collective
knowledge and consciousness on an unprecedented scale.

3.13. Conclusion

So far in this book we have proceeded in a steplike progression vis-à-vis the
analytical task of isolating the meaning-making units existing on different meaning
scales within multimodal texts. Our focus has also been directed to the task of
describing their nature and inter-relationships. In this chapter, we have contributed
to these objectives by attempting to provide answers to the many questions about
web pages and websites raised in the Introduction to this chapter and to those posed
within the chapter’s various sections. If at times incomplete answers have been
given, this is because much remains to be done as regards establishing a multi-
layered model of multimodal text analysis and transcription. In recalling that a basic
objective of this book is to outline the scaffolding for a scalar model of multimodal
text analysis and to describe the tools required to achieve this goal, we need to
point out that, as in the previous chapters, we have given sample transcriptions that
provide in-depth analyses of small sections of texts rather than attempting exhaus-
tive analyses of entire websites.
As befits a coursebook and toolkit (see Preface), we are now ready, in the
concluding chapter, to take the final step of analysing and transcribing the inter-
play of hierarchically-distributed meaning-making processes by providing a
detailed micro-level analysis of complete texts (see Appendix I and Appendix II).
Even so, our primary concern will still be with demonstrating how a scalar model
helps us understand the meaning-making processes at work in specific texts. In the
course of the chapter, we will be at pains to reassert the inherently selective nature
of multimodal text analysis and transcription. Moreover, as we shall see, the
analysis of film texts and genres will require us to add a few last tools to our toolkit
and to reflect further on the nature of multimodal semiosis.
Chapter 4

Film texts and genres


4.0. Introduction

We may begin this chapter by reflecting on context and the recontextualisation of
texts across contexts (see 3.7, pp. 130-136 and Inset 17, p. 213). After the analysis and
transcription of websites, the television advertisements discussed in this chapter may
at first sight appear to be relatively rigid artifacts, whose meaning is fixed for all time.
However, even the process of making a video recording of a television advertisement
and its subsequent viewing is itself an act of selective recontextualisation of a prior
semiotic event. The text’s relations to specific social and historical events, other televi-
sion programmes, the time of day it was broadcast, the specific viewers of the text dur-
ing the historical period of its original broadcasting, may all be relevant to the under-
standing of the text in relation to its wider context of culture (see Inset 1, p. 2). These
issues may not be relevant to the immediate problems of developing suitable
transcription techniques and procedures but may need to be in some way annotated
or otherwise described as ethnographically relevant information for the purposes of
the subsequent uses of the text in what may be very different cultural and historical
circumstances.
The second of our texts, the Westpac text, was, for example, broadcast in
Australia in early 1983 when the Westpac Banking Corporation was formed as the
result of a merger of a number of other banks. The immediate context of the
Westpac advertisement is thus the ideological legitimation of the merger and the
promotion of the services provided by this new bank, formed in the wake of the
election to power in March 1983 of the Australian Labor Party (ALP), which
ushered in an extensive restructuring of the Australian economy. Thus any database
– computerised or otherwise – of multimodal texts needs to devise appropriate pro-
cedures for the documenting of relevant aspects of the context of culture of a text.
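By way of illustration only, one possible shape for such a record in a computerised database is sketched below in Python. The sketch is an assumption made for the purposes of exposition and not a description of any existing corpus tool: the class names TextRecord and ContextNote, the field names, and the aspect labels ("broadcast", "institutional", "political") are all hypothetical. The factual entries are those given in this chapter for the Westpac text.

```python
from dataclasses import dataclass, field


@dataclass
class ContextNote:
    """One ethnographically relevant fact about a text's context of culture."""
    aspect: str  # hypothetical labels, e.g. "broadcast", "institutional", "political"
    note: str


@dataclass
class TextRecord:
    """A corpus entry: the text itself plus annotations on its original context."""
    title: str
    genre: str
    year: int
    duration_seconds: int
    context: list[ContextNote] = field(default_factory=list)

    def notes_on(self, aspect: str) -> list[str]:
        """Return every annotation recorded under the given aspect."""
        return [c.note for c in self.context if c.aspect == aspect]


westpac = TextRecord(
    title="Westpac",
    genre="TV advertisement",
    year=1983,
    duration_seconds=60,
    context=[
        ContextNote("broadcast", "Broadcast in Australia in early 1983"),
        ContextNote("institutional",
                    "Westpac Banking Corporation formed by a merger of banks"),
        ContextNote("political",
                    "Australian Labor Party elected to power in March 1983"),
    ],
)

political_context = westpac.notes_on("political")
```

Keeping the contextual annotations apart from the transcription proper means that later users of the database, working in different cultural and historical circumstances, can retrieve them without having to reconstruct them from the text.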
In this respect, the TV advertisement is a constantly evolving genre whose
length is, in itself, a clue to its historical period. In his analysis of a corpus of drinks
advertisements, Turrisendo (2004), for example, shows how TV film advertisements
for drinks have progressively become briefer in the last fifty years. An improved
capacity to apply the meaning-compression principle (see Inset 3, part b on meaning
compression, p. 19) may be one reason for this. Indeed, the first of the three texts
analysed in this chapter, the Audi Eskimo advertisement dating from 1997, is only 27
seconds long but still tells a complete story. It is by far the shortest of the three texts
analysed. The second text, the Westpac television advertisement, which dates from
1983, lasts for 60 seconds. Though very different, as we shall see, in its internal
organisation, its narrative complexity is roughly comparable to the final text, the
Mitsubishi Carisma advertisement that we discussed briefly in 1.6, pp. 46-54. The
latter text, which dates from roughly the same period as the Eskimo text, is again
much shorter, with a total air time of 40 seconds.
In our analysis of the three texts we follow the practice partly adopted in the
preceding chapters of giving different types of transcriptions according to the
analytical goals being pursued. In some cases these goals presuppose a low magnifi-
cation that endeavours to capture the basic structures of the text. This requires a
macro-analytical approach to transcription, i.e. one which attempts to capture the
meaning-making processes of complete texts in terms of the links between the
various subunits that make up a text: principally clusters, phases and transitivity frames.
Thus, in the Eskimo text the transcription is concerned with reconstructing the
text’s transitivity structure (see Baldry, Thibault, 2005). It takes into account the rela-
tionships existing between two or three shots to produce a transitivity frame and in
turn relates the transitivity frame to the text’s phasal organisation.
However, the analytical goals will often require a much higher level of magni-
fication. This is the case with the analysis of the remaining two texts, where we are
concerned to provide a more exhaustive study that brings together a macro-
transcription, concerned with the interplay between the texts’ phases, and a micro-
transcription, concerned instead with a detailed description of the semiotic resources
used in the meaning-making process (cf. Figures 1.5a and 1.5b). The two types of
transcription fulfil different functions but are complementary and can be interwoven.
In other words, the distinction between various types of transcription is only a ques-
tion of methodological convenience. Rather, as our analysis, in particular of the third
text, demonstrates, the purpose of this concluding chapter is to show, albeit partially,
how multimodal text analysis and multimodal transcription can be combined in order
to develop insights concerning the ways in which meaning-making resources on the
levels of Hjelmslev’s (1961 [1943]) expression and content strata are integrated to the
discourse level of organisation in multimodal texts (see Inset 18, pp. 236-237).
A multimodal transcription of a television advertisement is an entextualized
artifact which the analyst extracts from the prior discourse practices in which the
broadcast text is embedded at the same time that the analyst embeds it in the new
discourse practices of transcription and analysis. The transcription represents an
attempt to make claims about a prior discourse at the same time that it is embedded
in, and appropriated by, new discourses. The video recording of a television
advertisement is itself an entextualized artifact that we extract and appropriate
from the ‘original’ context of broadcasting. The video text is then transferred to
and transformed by the transcription text and its associated practices. Rather than
say that texts hook up to a situational context, we see how the entextualization of dis-
courses as text artifacts, including transcriptions, affords the lifting of texts out of
one context and their appropriation by, and transference to, other (con)texts and
practices (Thibault, 1994). Transcription, which is both analytical activity and
textual record of this activity, must always keep this in mind when addressing the
question as to what it is we are studying. Practices create the context, not the formal
patterns and meaning relations in the text per se. In this view, the discourse level of
organisation is the point of intersection or the interface between practices and the
formal patterns and meaning relations that we identify as the global level of discourse
organisation (see Table 4.1, p. 226).

4.1. The Eskimo text: a macro-analytical approach to transcription

We have discussed the relevance of the systemic-functional tradition in this book (see
also Baldry, 2000b, 2004; Thibault, 2004) to describe short sequences of dynamic video
texts in terms of the relationship between phases and metafunctions (see 1.4, pp. 38-44
and 1.6, pp. 46-54). The transcription in Figures 4.1a and 4.1b of the Audi Quattro
Eskimo advertisement again shows the relevance of the metafunctions in two principal
ways. First, a given modality (e.g. gaze) can be described in metafunctional terms.
Thus, gaze has experiential, interpersonal and textual dimensions of organisation and
meaning. Figures 4.1a and 4.1b describe the experiential dimension of gaze in the
text in terms of transitivity frames (1-8). They foreground the notion of a transitivity
frame as a functional semiotic unit in which the relations between participants,
process and circumstances are realised in gaze (Baldry, Thibault, 2005). They also
illustrate ways in which options in gaze are integrated with other meaning-making
resources. However, gaze can also be used, interpersonally, to engage an
interlocutor. Textually, it typically indexes a phoric (indexical) relation between the
gazer and the object of the gaze in ways which can be interpreted by an observer
such that the gazer’s intentions can be inferred. Though not found in the present
example, this option is, nevertheless, attested in the Westpac text (see 4.4, pp. 184-186
and 4.7, pp. 191-202). In the current text, instead, all of the instances analysed
are used to align the TV viewer to the Phenomenon to which the Gazer (man or boy)
directs his gaze. Gaze can also be modulated by facial expressions, eyebrow move-
ments and other factors to signal attitudinal and affective modification of the gaze
syntagm. Gaze, as noted above, expresses textual meaning, serving, in particular, to
create phoric links to relevant objects in the perceptual purview of interlocutors,
either alone or in conjunction with other resources such as pointing (see Frame 5a
in Figure 4.1b). Second, as we saw in Chapter 1, the metafunctions can guide us in
terms of the resource integration principle (see Inset 3, pp. 18-19). They function as
an integrating principle showing how different semiotic resources are codeployed.
Figure 4.1a (Frames 1-4)

TRANSITIVITY FRAMES: (1) ENGAGE: panorama | (2) REACT TO: object | (3) ENGAGE: other person | (4) ENGAGE: object

Visual Image: [video frames]

Experiential meaning in Gaze:
  Frame 1: Phenomenon: Arctic landscape | Gazer: old man; Process: gaze vector: unfocused: non-specific: surveys landscape
  Frame 2: Phenomenon: wolf print in snow | Gazer: boy; Process: gaze vector: focused: specific: downwards to paw print
  Frame 3: Gazer: boy; Process: gaze vector: focused: specific: engage other: upwards towards man | Phenomenon: man
  Frame 4: Gazer: old man; Process: gaze vector: focused: specific object: downwards; Phenomenon: implied (off-screen) wolf print in snow

Distance:
  Frame 1: Far | close-up head and shoulders
  Frame 2: close | very close: head-and-shoulders
  Frame 3: close: head-and-shoulders | very close: head only
  Frame 4: very close: head only

Vertical/Horizontal Angle:
  Frame 1: Median/Frontal | High: looking down/oblique
  Frame 2: Oblique high (looking down)/frontal | Median/frontal
  Frame 3: Median/Oblique | Median/frontal
  Frame 4: Median/frontal

Body Movement: none | slow head turn left to right | none | boy turns head towards old man | none | none

Gaze Focus: old man | implied: wolf print in 2a. | wolf print in 2b

Language: utterance ‘marou’ + Italian subtitle ‘lupo’ [= wolf]

Transition:
  Frame 1: cut to 1b; cut aligns viewer to old man…. | ….and his panorama
  Frame 2: cut to 2b: cut aligns viewer to boy’s gaze
  Frame 3: cut to 3b: cut aligns viewer to boy’s gaze

Figure 4.1b (Frames 5-8)

TRANSITIVITY FRAME: Frame 5: ENGAGE: object | Frame 6: ENGAGE: object + body part | Frame 7: ENGAGE: object + body part | Frame 8: ENGAGE: other person

Visual Image: [video frames]

Experiential meaning in Gaze:
  5a: Gazer: man; Process: gaze vector: focused: specific: downwards to bear print
  5b: Phenomenon: off-screen, then shown as they are walking away
  6a: Gazer: man; Process: gaze vector: focused: specific: downwards
  6b: Phenomenon: hand reaching to pick up snow + Audi tyre tracks in snow
  7a: Gazer: man; Process: gaze vector: focused: specific object: held in hand: scrutinising: downwards
  7b: Phenomenon: snow in man’s hand
  8a: Gazer: man; Process: vector + head turn: engage with boy; Phenomenon: boy
  8b: Gazer: man; Process: gaze vector: engage with boy; Phenomenon: boy: off-screen
  8c: Gazer: boy; Process: gaze vector + nod: react to + recognition; Phenomenon: man (off-screen)

Textual meaning in Gaze:
  Frame 5: cut from Gazer to Phenomenon
  Frame 6: zoom in from Gazer to Phenomenon
  Frame 7: cut from Gazer to Phenomenon; Very Close Shot: close identification with man; high intensity of gaze
  Frame 8: cut from Gazer to Phenomenon; Very Close Shot: focus on key role of man as teacher

Distance: medium close: face and upper body | far: man and boy in distance walking away from bear print in foreground | medium close | very close: less than whole face | very close | close: head-and-shoulders + face | very close: head only | close: head and shoulders

Vertical/Horiz. Angle:
  5a: Median/frontal | 5b: Median/frontal
  6a: Median/oblique | 6b: high: looking down/frontal
  7a: Median/frontal | 7b: high: looking down/frontal
  8a: Median/oblique | 8b: Median/frontal | 8c: Median/frontal

Body movement:
  5a: hand point to Phenomenon of Gaze Vector | 5b: man and boy walking away from bear print
  6a: man bends down and reaches towards snow on ground | 6b: man’s hand picks up snow
  7a: none | 7b: hand holds snow in hand for inspection by man
  8a: man turns towards boy | 8b: none | 8c: boy nods in response to man’s saying ‘Audi Quattro’

Gaze Focus:
  5a: implied; bear print (shot 5b) | 5b: None
  6a: implied: Audi Quattro track in snow (6b) | 6b: hand picking up snow
  7a: implied: snow held in hand (7b) | 7b: as in 7a
  8a: man to boy + boy to man | 8b: man to boy: implied (8a + 8c) | 8c: Implied (8b): boy to man

Language: utterance: ‘hanou’; + Italian subtitle: ‘orso’ [‘bear’] | ‘Audi Quattro’: subtitle + utterance ‘Audi Quattro’

Transition:
  Frame 5: cut to 5b; cut aligns viewer to object of gaze vector in 5a
  Frame 6: cut to 6b + zoom: cut aligns viewer to man’s gaze + completes action of reaching for
  Frame 7: cut to 7b: cut aligns viewer to old man’s gaze
  Frame 8: cut to 8b: cut aligns viewer to man’s gaze. | cut to 8c: cut aligns viewer to man’s gaze.

Figures 4.1a and 4.1b: Transitivity frames in the Eskimo advertisement
170 Multimodal Transcription and Text Analysis: Chapter 4

In the Eskimo text, an adult Eskimo identifies animals for a small Eskimo boy
from paw prints left in the snow, the final ‘paw’ print being that of the Audi Quattro
car. The teacher-learner relationship between old man and young boy is
foregrounded. Selections and combinations from the resources mentioned above, as
presented in the transcription, highlight the participant roles of the man and boy as
well as functioning to mark the status of the two participants in relation to the
activities which they are presented as performing together. The subphase compris-
ing Frames 2, 3, and 4 illustrates this. The gaze transitivity frame in 2-3 has established
the boy as the discoverer of the paw print (Frame 2) in the snow. The close camera
distance and the high vertical angle project the boy as looking down on, and engaging
with, the paw print in the snow as an object of interest. In Frame 3, the close shot
of the boy’s face, shown frontally even though the boy’s gaze is directed at the paw
print and not at the viewer, creates a close interpersonal involvement between the
TV viewer and the boy, and by implication with the object with which the boy is
engaging. The viewer is asked to enter into the boy’s world, through his eyes.
In Frame 3a, the boy’s gaze is directed at the man. The oblique (not frontal)
horizontal angle and the fact that his gaze is directed upwards towards the man
together index the differential status of the two participants, i.e. the boy is marked as the
subordinate or apprentice participant who defers to the man’s superior knowledge.
The very close shot of the man in Frame 3b, the frontal horizontal angle, and his
disengagement from the viewer by virtue of the downwards direction of the gaze
vector (to the implied wolf print) and his intense engagement with the Phenomenon
of his gaze vector, as shown by the partly closed eyelids and the fixed facial expres-
sion, all function to mark his teacher status as one who knows at the same time
that he pronounces the word for wolf in Frame 4 in the process of identifying the
paw print that the boy had found.
In Frames 8a-c, the interpersonal dimension of gaze is more prominent. In
Frame 8a, the gaze transitivity combines with the man’s head turn, as the man shifts
his focus from the type of track in Frame 7b to the boy in Frame 8a. The close dis-
tance between viewer and man, the median vertical angle, and the oblique horizon-
tal angle, the foregrounding of the man and the backgrounding of the boy all work
together to indicate the interpersonal significance of this gaze frame at the same
time that the status relation is also clearly marked. In Frame 8a, this combination
of features reinforces the dyadic link between man and boy and hence the signifi-
cance of gaze in enacting the interpersonal relation between the two participants,
at the same time that the viewer is signalled as being an onlooker and not a party
to this. The cut to the very close shot of the man’s face in Frame 8b as he utters the
words Audi Quattro focuses on his role as authority figure as he passes on the
results of his finding to the boy. The boy’s role as learner and apprentice is shown,
in Frame 8c, both through the reciprocal contact established by the jointly shared
gaze vector which was initiated in Frame 8a and by the boy’s nodding in
acknowledgement of the man’s words. In Frames 8a-c, the close distance and the
frontal horizontal angle, in conjunction with the fact that the textual participants do
not directly address the viewer in any modality (e.g. gaze, language), here function
cross-modally to create a high degree of interpersonal rapport between the man and
the boy. Once again, the viewer is an onlooker and not a participant in this exchange.

GAZE
  1.1 Control
    1.1.1 Engage with
      1.1.1.1 Gazer: Agent^Process^Phenomenon
      1.1.1.2 Gazer: Affected^Process^Phenomenon
    1.1.2 React to
      1.1.2.1 Gazer^Process^Phenomenon: Agent
      1.1.2.2 Gazer^Process^Phenomenon: Affected
  1.2 Direction
    1.2.1 Up
    1.2.2 Level
    1.2.3 Down
  1.3 Distance
    1.3.1 Far
    1.3.2 Medium
    1.3.3 Close

Insert:
  Frame 1: 1.1.1.1 + 1.2.1 + 1.3.1   A
  Frame 2: 1.1.2.1 + 1.2.3 + 1.3.2   B
  Frame 3: 1.1.1.1 + 1.2.1 + 1.3.2   C
  Frame 4: 1.1.1.1 + 1.2.3 + 1.3.2   B
  Frame 5: 1.1.1.1 + 1.2.3 + 1.3.2   B
  Frame 6: 1.1.1.1 + 1.2.3 + 1.3.3   D
  Frame 7: 1.1.1.1 + 1.2.3 + 1.3.3   D
  Frame 8: 1.1.1.1 + 1.2.3 + 1.3.3   D

Notes: 4 different types of gaze combination are instantiated in this text. In Frame 4,
the phenomenon, present in Frame 2, is now off-screen and implied.

Figure 4.2: A (revised) preliminary network for gaze in visual texts; primary delicacy only
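
For readers who keep their transcriptions in electronic form, the selection expressions listed in the insert to Figure 4.2 can be recorded as a simple lookup structure. The following is only a minimal sketch in Python, offered as an illustration rather than as part of the analysis itself. The feature numbers, frame selections and type letters are copied from Figure 4.2; the names GAZE_FEATURES, FRAME_SELECTIONS, describe and frames_by_type are hypothetical.

```python
# Terminal features of the gaze network, keyed by the numbering used in Figure 4.2.
GAZE_FEATURES = {
    "1.1.1.1": "Control > Engage with > Gazer: Agent^Process^Phenomenon",
    "1.1.1.2": "Control > Engage with > Gazer: Affected^Process^Phenomenon",
    "1.1.2.1": "Control > React to > Gazer^Process^Phenomenon: Agent",
    "1.1.2.2": "Control > React to > Gazer^Process^Phenomenon: Affected",
    "1.2.1": "Direction > Up",
    "1.2.2": "Direction > Level",
    "1.2.3": "Direction > Down",
    "1.3.1": "Distance > Far",
    "1.3.2": "Distance > Medium",
    "1.3.3": "Distance > Close",
}

# Selection expression and type letter for each frame, as given in the figure's insert.
FRAME_SELECTIONS = {
    1: (("1.1.1.1", "1.2.1", "1.3.1"), "A"),
    2: (("1.1.2.1", "1.2.3", "1.3.2"), "B"),
    3: (("1.1.1.1", "1.2.1", "1.3.2"), "C"),
    4: (("1.1.1.1", "1.2.3", "1.3.2"), "B"),
    5: (("1.1.1.1", "1.2.3", "1.3.2"), "B"),
    6: (("1.1.1.1", "1.2.3", "1.3.3"), "D"),
    7: (("1.1.1.1", "1.2.3", "1.3.3"), "D"),
    8: (("1.1.1.1", "1.2.3", "1.3.3"), "D"),
}


def describe(frame):
    """Spell out the features selected in one transitivity frame."""
    features, _letter = FRAME_SELECTIONS[frame]
    return [GAZE_FEATURES[feature] for feature in features]


def frames_by_type():
    """Group frame numbers under the type letter assigned to them in the figure."""
    groups = {}
    for frame, (_features, letter) in sorted(FRAME_SELECTIONS.items()):
        groups.setdefault(letter, []).append(frame)
    return groups


if __name__ == "__main__":
    for letter, frames in frames_by_type().items():
        print(letter, frames)
    print(describe(2))
```

Grouping the frames by letter in this way makes it easy to check the figure's note that four types of gaze combination are instantiated in the text.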
As the focus on transitivity frames in this presentation suggests (see Inset 11,
p. 122), we are increasingly concerned with multimodal transcription as a text
analysis tool capable of embracing both system and instance (Baldry, Thibault,
2001; 2005). For example, a number of other options, which we have grouped
together under body movement (e.g. pointing, walking, head turning, nodding)
typically interact with gaze in relevant ways, as is the case in this advertisement.
Moreover, we have drawn attention to some of the ways in which camera position
(CP), interpersonal distance (i.e. the viewer’s distance from participants in the text),
and the means for effecting transitions between shots would typically appear to
interact with gaze. In this way, we begin to see how meaning-making resources that
belong to the production and editing of video texts (camera position and distance,
cutting between shots) interact in patterned ways with the meaning-making resources of
the human body (e.g. gaze, movement, speaking, pointing).
Figure 4.2 presents some basic distinctions in the system of gaze. It focuses in
particular on transitivity relations expressed by gaze with a view to specifying the ways
in which some basic gaze schemas are realised. Figure 4.5, which is also concerned with
options in the system of gaze, is more general in focus and sets out a range of
parameters relevant to gaze, without, however, attempting to specify the realizations that
characterise particular gaze syntagms. In 4.2 (on the next page), we propose a detailed
microlevel multimodal transcription of the Westpac advertisement (see Appendix I).

Inset 13: System and instance

• The system and instance perspectives on language and other semiotic systems are not
a matter of the two poles of a simple dichotomy. The two perspectives are not
opposed to each other in this way. Instead, they refer to two very different time-scales
and, therefore, two different perspectives from the point of view of observers
on very different time-scales (Halliday, 1992). Language-as-system evolves on the
evolutionary time-scale; change on this scale therefore accumulates and becomes
evident on time-scales that may greatly exceed that of the individual’s life time. Change
and forms of organisation on this scale are nevertheless the cumulative result of
very many interactions among components on smaller scales, such as the instance
perspective. The latter perspective is the human scale of text, the human activities in
which texts are created and used, and the participants in those activities.

• Text is the instantiation of some aspects of the system’s overall potential in a
particular context. We are, however, able to put together a picture of the system on
the basis of our samplings of very many instances used by many different people in
different contexts. Texts therefore have a dual status: they function and have
meaning for their users on the scale of the contexts in which they are made, used,
interpreted, and so on; at the same time, textual forms and functions have a history
and are the products of processes – both material and semiotic – on time-scales that
are far greater than that of the human scales in which we encounter them.

• Moreover, texts are not simply the instantiations of choices from an abstract
language system. Texts are also resources which we use to make meanings in different
contexts, to create links with other times and places, with other texts and so on.
Furthermore, the production and interpretation of texts involve a constant dialectic
between the virtual constraints of the language system and the dynamic constraints
of the text as it unfolds and develops in real time (see also Beaugrande, 1997: 11).
Constraints of the latter kind relate to the kind of social situation – the context –
which the participants understand themselves to be participating in.


• Semiotic systems are not reducible to an abstract system of pure values or differences
per se. Texts are dynamic processes rather than static entities. The material object-text,
i.e. the sound waves produced in articulation or the visual tracings arranged on a
surface such as a page, are just the tip of the iceberg, as Beaugrande (1997: 11) has
expressed it. A text is itself a complex system of relations involving many different
systems of different kinds and different levels of organisation as well as the linkages
across levels. These systems and relations include sounds, visual tracings,
lexicogrammatical units, visual transitivity frames, images, gestures, phases, discourse
structures, genre structures, social activities, users’ plans and goals and so on.

• Finally, texts themselves are not well served by a flat model of their organisation.
They are organised on a number of different scalar levels of organisation. With
respect to video texts such as television advertisements, we have proposed in this
book a number of such levels, e.g. the visual transitivity frame, the shot, the phase,
the macrophase, and the global organisation of the text as a whole, including its
generic structure. This means that selections from paradigmatic systems of alterna-
tive choices are made on many different scalar levels, and in ways which affect choices
both on their own level and on other levels.

• For example, choices in visual transitivity in video texts interact with choices in shots
and the sequencing of shots to form still larger phases as the text develops in time.
The choices made on one level, at one point in the unfolding text, affect and antici-
pate choices still to be made, just as they may alter the significance of previously
made choices. For these reasons, the relationship between system and instance is a
complex one involving many different interacting factors. Neither system nor
instance is a static entity: both involve change and dynamic processes on their
respective time scales. At the same time, texts are always linked to the larger-scale
processes of the system, its history and the culture with which the system has
coevolved and in which it is embedded.

4.2. The Westpac text: a micro-analytical approach to transcription

Appendix I presents a transcription based on six vertical columns: (1) Time; (2)
Visual Frame; (3) Visual Image; (4) Kinesic Action; (5) Soundtrack; and (6)
Metafunctional Interpretation: phases and subphases. Each of these entries corresponds
to, and heads, a specific vertical column. To each vertical column there is assigned a
particular type of element or cluster of such elements which belong to the same class
of item. The prime significance of each of these entries is discussed below.
Column 1 specifies the time in seconds of the video recording. This was
determined by the time indicator in the Windows Media Player instrument. Its accu-
racy may be checked simply by sliding the progress bar with the aid of the mouse
from one frame to the next. However, Column 1 is also important in a second, related
way. The numbers also serve to identify the horizontal row with which the time spec-
ification correlates, referred to on this basis as Row 1, and so on. The horizontal rows
so identified have important integrating and cross-referencing functions. First, the
row specifies that all of the features identified in each of the columns belonging to
that row are temporally correlated. In other words, it provides an effective method for
showing which specific semiotic modalities copattern at particular moments in the
text. For example, Row 26 specifies that the word moving, as sung by the female soloist,
coincides with the appearance of the schoolgirl in Shot 10, who leans forward towards
the viewer. Second, in referring to the rows in this way, the analyst is provided with a
useful means of cross-checking the information provided in a given column with the
particular row with which it intersects. For example, Shot 15 – the third appearance
of the Westpac logo – can be specified as occurring in Row 37, Column 2.
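
The row-and-column cross-referencing described above can be sketched as a small data structure. The example below is a hypothetical illustration in Python, not the format of Appendix I: the column names follow the six headings given above, but the class Row, its methods and the wording of the cell contents are assumptions. In particular, placing the schoolgirl's leaning forward under Kinesic Action is our guess for the purposes of the sketch.

```python
from dataclasses import dataclass, field

# The six column headings of the transcription, in order (Column 1 to Column 6).
COLUMNS = (
    "Time",
    "Visual Frame",
    "Visual Image",
    "Kinesic Action",
    "Soundtrack",
    "Metafunctional Interpretation",
)


@dataclass
class Row:
    """One horizontal row: everything in it is temporally correlated."""

    number: int  # row number, which is also the time in seconds
    cells: dict = field(default_factory=dict)

    def cell(self, column):
        """Return the content at this row and a 1-based column number."""
        return self.cells.get(COLUMNS[column - 1], "")

    def copatterning(self):
        """Return the non-empty cells, i.e. the resources that copattern here."""
        return {name: self.cells[name] for name in COLUMNS if self.cells.get(name)}


# Illustrative entries only; the cell wording is assumed, not quoted from Appendix I.
transcription = {
    26: Row(26, {
        "Time": "26",
        "Visual Frame": "Shot 10",
        "Kinesic Action": "schoolgirl leans forward towards the viewer",
        "Soundtrack": "female soloist sings 'moving'",
    }),
    37: Row(37, {
        "Time": "37",
        "Visual Frame": "Shot 15: Westpac logo (third appearance)",
    }),
}

if __name__ == "__main__":
    print(transcription[37].cell(2))          # Row 37, Column 2
    print(transcription[26].copatterning())   # what co-occurs in Row 26
```

Looking up Row 37, Column 2 in this structure corresponds to the cross-check described in the text, while copatterning() lists which modalities are temporally correlated in a given row.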
Column 2 refers to the visual frame that correlates with the time that is
indicated in the first column. Each frame was inserted into the column by copying
selected frames of the entire film text from the .tif file that was obtained from the
video text. In our experience, the main advantage of this approach lies in the disci-
pline and exactitude that it imposes on the entire transcription procedure. In the
Westpac text, nothing is left to chance, so to speak, and the very fine-grained
correlation of selections from different semiotic resources can be more precisely ref-
erenced to the unfolding visual text on a second-to-second basis if this approach is
adopted. A further advantage lies in the way that the overall development of the text
can thus be reproduced to a certain degree of accuracy. However, in the real-time of
the specific .avi video file, there are some fifteen frames per second. It therefore
follows that the reduction of this to just one frame per second represents a choice
based largely on economic and practical necessities. There is no necessary one-to-one
correlation between row and visual frame. Row 24 is an example, where two visual
frames occur, belonging, respectively, to Shots 8 and 9. In those cases where more
than one visual frame is assigned to a single row, this means that the two visual
frames occurred within the same interval of time, specified to the second.
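
The arithmetic behind this reduction can be made explicit. The sketch below, in Python, assumes a frame rate of fifteen frames per second, as stated above, and counts seconds and frames from zero, which is our assumption rather than the convention of the appendix. The function names and the shot boundaries used in the example are hypothetical; they merely show how two shots can fall within one and the same second, as in Row 24.

```python
from __future__ import annotations

FPS = 15  # "some fifteen frames per second" in the .avi file


def sampled_frames(duration_s: int, fps: int = FPS) -> list[int]:
    """Index of the first frame of each whole second (one frame kept per second)."""
    return [second * fps for second in range(duration_s)]


def shots_in_second(second: int, shot_starts: dict[int, int], fps: int = FPS) -> list[int]:
    """Shots that have at least one frame inside the given one-second interval.

    shot_starts maps a shot number to the index of its first frame
    (hypothetical values; each shot is taken to run until the next one begins).
    """
    first, last = second * fps, (second + 1) * fps - 1
    ordered = sorted(shot_starts.items(), key=lambda item: item[1])
    visible = []
    for i, (shot, start) in enumerate(ordered):
        end = ordered[i + 1][1] - 1 if i + 1 < len(ordered) else float("inf")
        if start <= last and end >= first:
            visible.append(shot)
    return visible


if __name__ == "__main__":
    print(sampled_frames(3))
    # Hypothetical boundaries: Shot 9 begins part-way through second 24.
    print(shots_in_second(24, {8: 345, 9: 367, 10: 390}))
```

With one frame kept per second, fourteen of every fifteen frames are omitted, which is why a row may need two visual frames whenever a cut falls inside its one-second interval.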

Inset 14: Material object text and semiotic action text

• A text is a material object or process as well as a semiotic one. These two dimensions
of a text are equally important for understanding how texts function and have meaning
in particular communities. We can say therefore that texts are dually material and semiotic
entities and processes; that is, the material and the semiotic dimensions are fully
integrated in the one overall contextualizing activity (Lemke, 1995: Chap. 6; Thibault,
2004: Chap. 5). For this reason, the material and the semiotic dimensions are the two
sides of the same textual coin: the one cannot be fully understood and cannot exist
without the other. In this inset, we shall explore this point in greater detail.
The visual, auditory and other patterns which are picked up by our perceptual systems
are the material basis for construing patterns of meaning relations in a text. In this per-
spective, the text-as-material-object participates in the dynamic physical and biological
processes of the community and has material relations and connections to other
material processes in the same or some other community. A textual object in this sense
can be integrated to sensori-motor activity of the body (e.g. a book can be held in the
hand of the reader and its pages visually scanned). A treated surface, such as the paper
on which written signs or visual images are traced or otherwise installed by
technological means, is a material object text in this sense. The surface supports visual-
graphic patterns which have been traced on it and which provide visual information
about something other than the surface in the form of an arrested optic array of visual
invariants (see Inset 15: Gibson’s optic array, p. 192).
In the first instance, the optic array provides information about the tracing
(articulatory) activity or the technological processes (e.g. photographic) that put the
traces on the surface. The material surface used for this purpose affords the material
installation of these tracings in ways which enhance their durability as well as their inte-
gration to social activities and objects relevant to their interpretation in socially mean-
ingful ways. The material object text can be transported, transmitted, manipulated by
different users on different occasions, stored and retrieved etc. It may be relatively per-
manent (e.g. a book) or as ephemeral as the message someone writes in the sand on
the beach before the text is washed away by the incoming tide.
• The physical-material object text is maintained by matter and energy processes that ensure
its structural integrity over some time span, long or short, at the same time that it participates
in a larger-scale system of semiotic-discursive processes. In this sense, the material object-
text constitutes an environment of matter-energy transactions and processes that afford or
enable the object text to be integrated to, and contextualised by, the meaning-making
practices of some community. Object texts are perceivable and manipulable material entities
or processes. They may be dually an extra-somatic surface of some kind and the tracings
displayed on it. The articulatory (somatic) processes of the body, e.g. vocal tract activity in
speaking, hand-arm gestures, facial expressions and other dynamic neuromuscular processes
of the body constitute the material means of installation of audible or visible patterns of
sensori-motor activity that are projected into the environment as an optic or auditory array
that can be picked up by the perceptual systems of others attending to these patterns. Again,
the auditory or visual patterns that are so detected provide information about their source
at the same time that they provide information about things other than that source.
Inset 14 (continued): two sides of the same textual coin

• A text is also a semiotic object or process; its material processes and characteristics afford
the possibility of its being integrated to, and made meaningful in, some community as a
semiotic text. It can be related to the patterns of contextualizing relations that typically
operate in some community. This means that the physical-material tracings on a surface
or the patterns of sound created by vocal tract activity can be construed as semiotically
salient patterns that are potentially meaningful for the participants in some social group.
Furthermore, these semiotic patterns are the means whereby we can interpret and assign
meaning to the phenomena we experience in the world around us, e.g. the events, things
and actions of other objects and participants in the world we live in.
• Semiotic patterns of this kind include the semantic relations in the lexicogrammar and
discourse organisation of linguistic text, transitivity relations construing participants and
the processes in which they are involved in pictures and gestures, the attitudinal and
evaluative meanings conveyed by facial expressions and so on. In this perspective, texts
qua semiotic processes can be contextualised by, and can participate in, social activities in
and through which their users recognise and interpret patterns of semiotic relations that
are meaningful in that activity. They can also be used to relate that activity to some other
activity in some other time and place and to interpret or otherwise assign meaning to other
activities. Semiotic texts are both made and interpreted in and through social activities at
the same time that they participate in these or other activities and help to constitute them.
• Take the example of the fire extinguisher illustrated below. As a physical object, it affords
its use in determinate material processes. It can be picked up, carried to the scene of the
fire and used to extinguish a fire by virtue of the capacity it affords for spraying a chemical
substance it contains onto the source of the fire. This object also provides the means for
the installation on its treated surface of an arrested optic array of visual invariants that
provide information about something other than the painted metal surface of the object.
In this sense, these visual patterns can be interpreted as semiotically salient linguistic and
visual patterns in an instructional text concerning the use of the object.
• The multimodal verbal-visual text that we interpret on the basis of these patterns can
therefore be related to the lexicogrammatical patterns in language and the conventions for
interpreting visual patterns as depicting socially meaningful actions and sequences of
action (e.g. pulling out the safety pin, aiming at the fire and so on). The combined verbal-
visual instructional text can thus be used to make sense of these already meaningful and
purposeful physical actions by connecting the user of the fire extinguisher, the fire
extinguisher, and specific circumstances (outbreak of a fire) in a particular kind of socially
recognisable activity- or event-type. Moreover, the physical object and not just the verbal-
visual text installed on its surface is itself a meaningful semiotic artifact; it has its typical
socially recognised uses and the participant roles that these entail. Its size, shape and
colour together provide cues as to how it is to be interpreted as a certain kind of social
artifact which is imbued with social significance. This significance is further enhanced by
the use of the colour red in such contexts. The choice of this colour from among a set
of other possible colour choices is a meaningful choice: it indexes, in those communities
which recognise its significance, a meaning such as danger; be alert and, therefore, the
kinds of actions and responses that are required in dangerous situations.

Inset 14: (continued)

The colour red does not have this meaning in all the situations in which it is used (e.g. my red
sweater). However, in some contexts such as the one described here, it stands in certain kinds
of typical relationships to certain types of situations, meanings, and activities and configura-
tions of these in the communities in which the colour red has the meaning mentioned above.
The relationships it has to all of these factors in combination create a set of contextualizing
relations. Thus, red has a semiotically salient relation to certain kinds of actions and situations.
It has the meaning that it does in relation to this larger whole or contextual configuration rather
than on its own. The meaning may be a highly standardised one, as in this case, but it still has
to be connected to a wider system of contextualizing relations to have the meaning it does.
Texts can also be used to make sense of other activities, and therefore to create contextualiz-
ing relations between the text and these activities. A text realised by one particular configura-
tion of modalities or semiotic resources may be used to interpret and make sense of a
semiotic action or event in some other modality. For example, the action of physically stand-
ing before an audience in an auditorium and producing a lot of vocal tract activity for the
benefit of that audience is a socially meaningful action or event. There is a contextualizing
relationship between/among the sounds the audience hears and, for example, the social role
of lecturer, the architectural configuration of the space in which the event takes place, and
the subject matter (e.g. exoplanets) to which the audience connects the audible sound patterns
and the person whom the members of the audience recognise as embodying that social role
on that particular occasion. Both the meanings related to the topic of exoplanets and the
social-participant roles of lecturer and audience are abstractions from the audible and visi-
ble features of the event; in this way, they can be connected to other similar events and to
other similar ways of talking about the topic or
related topics as well as to other individuals who
embody the same roles on other occasions.
The event as such is dually a material and a semiotic
event in the way described above. However, this
event can in turn be talked about afterwards by
other people in the texts that they create through
their conversation about the event or, for example,
in the form of a journalistic text consisting of
verbal text and photograph of the famous visiting
lecturer in the next day’s newspaper. Both of these
texts – conversation and newspaper story – make
sense of the original event. They do so both by
creating meaningful links to that event and by using
the resources of other semiotic systems (e.g. word
and gesture in conversation or written text and
photograph in the journalistic story) to make sense
of and to recontextualise the original event using
the resources of other semiotic modalities and the
meanings these afford.

[Figure: Fire extinguisher used to show its dual status as a material object text and semiotic action text]

Column 3 is headed Visual Image and constitutes a series of notational
glosses on the frame reproduced in Column 2, with which it corresponds. The
discussion of these, given below, is necessarily selective in nature. Not all of the
topological meanings presented in the visual text can be adequately represented by
a few shorthand verbal glosses. The transcription is necessarily selective and must
restrict itself to only those visual (or other) features that are relevant to the analysis.
Moreover, the gloss on the meanings of the visual frame in the form of verbal text
and other notational conventions that will be explained below is a necessary step
in the analytical integration of the various semiotic modalities that are codeployed
in this text. In verbalising the visual or other meanings in this way, the table itself
provides semiotic resources for combining and integrating the visual image with
the soundtrack and so on. This is so in the sense that the genre conventions of the
table allow us to selectively combine features from different vertical columns on the
basis of a shared glossing procedure. If the information in each of the columns
referring to, for example, the visual image, body movement and sound (speech,
music, ambient sounds) were simply presented, each with their own specialised
notation, then the possibilities for such integration would be much more opaque.
This does not mean that language is the only resource for achieving this. However,
the choice of the table as the modus operandi of the transcription makes language
the most suitable candidate for our present purposes.
Column 4, headed Kinesic Action, refers to the use of body movements of
various kinds. This column groups together a number of different kinds of behav-
ioural units, as defined by Kendon (1981), or spatiotemporal arrangements of the
agents in some discursive event. In the Westpac text, salient behavioural or kinesic
units include bodily actions such as smiling, rolling the sleeves up, gaze and
moving/walking forward. It is doubtful that such kinesic units have a fixed or
univocal meaning which can be established independently of their cross-modal
relations with other features of the text as a whole. Indeed, identical body move-
ments may have quite different significances according to their patterned relations
with other features in other (con)texts. For example, the kinesic act rolling the sleeves
up may be interpreted as a gestural emblem with the culturally fixed meaning of
starting or getting on with the task to hand. However, it does not follow that this body
movement always has this particular cultural meaning. In some other context, the
same action may be a purely physiological response to climatic conditions, with no
specific semiotic significance in a given interactional context. What, then, helps us
to motivate a specific semiotic significance for a given bodily act? Four criteria may
be invoked here as analytical starting points.
• First, bodily actions tend to focus on particular parts of the body
which have the potential for specific semiotic significance. Thus, facial
display has to do with the exchanging of affect, spatial distance or
proxemics with power and social hierarchy, posture with personal
defence, and so on.
• Secondly, bodily actions are cross-modally linked with other features
of the discourse event so that they enter into patterned relations with
other semiotic features in other modalities in the same event. It is on
the basis of such co-contextualising relations that meaning is created,
rather than on the basis of individual kinesic or other units. This is
consistent with the multimodal basis of textual meaning which is
adduced in this book.
• Thirdly, such actions are dialogic acts of semiotic exchange rather than
behavioural units per se. In this way, syntagmatic relations such as Bodily
Act ^ Response to Bodily Act link two participants in a dialogic
exchange relation. In the Westpac text, the use of smiling to link
textual participants to the television viewer in an interpersonal relation
of intimacy and solidarity is just one such feature.
• Fourthly, advertising texts such as Westpac frequently use foregrounding
strategies whereby a given semiotic feature in some modality functions
to establish a semantic commonality among the different shots
which comprise the text as a whole. For example, the act of rolling the
sleeves up is just one such feature which recurs throughout the text and
which functions to tie together the various participants – schoolgirl,
businessman, baker, etc. – on the basis of some semantic feature
which they all have in common. This is what Lemke (1985: 287-9)
defines as a covariate semantic relationship whereby formally distinct
elements in a text are linked on the basis of their belonging to a
common semantic class. In Westpac, the repetition of rolling the sleeves
up is just one such foregrounding strategy which thus links the various
textual elements that share this feature on the basis of a meaning
relation that they are all construed as having in common. In the
Westpac text, the foregrounded copatterning of features such as
smiling, rolling the sleeves up, and moving forward establishes a
strongly foregrounded set of cohesive ties which link different
categories of participants to each other as members of a common
chain of interacting cohesive elements.

Hasan’s (1980) notion of textual cohesion as constituted on the basis of
chains of identical and similar elements and the interactions among chains is the
prototype for this kind of analysis. On the basis of their copatterning, what are in
effect separate chains of elements such as smiling, moving forward, and rolling the
sleeves up all come together at some point in the text so that a given participant
often does all three at the same time. An example is the nurse in Shot 4, Row 15.
Significantly, this shot occurs at the end of Phase 1 and is the first point in the text
where all three chains come together in the video track. This in itself emphasises
the culminative character of text-linking items, which here reach a culminative peak
at the end of Phase 1, in conjunction with that of the chorus in the soundtrack.
The partial and at times full interaction of all three chains is a striking and signifi-
cant feature of this text. Each feature in its own chain of cohesive elements may
be assigned to a specific superordinate intertextual thematic relation along with its
associated evaluative orientation. However, it is the interaction of all three chains of
elements that provides grounds for saying that these constitute foregrounded
meaning relations that link the different participants on the basis of a shared
intertextual system (Lemke, 1985). In the Westpac advertisement, this may be glossed
as [CORPORATE CAPITALISM ON THE MOVE + POSITIVE EVALUATION/AFFECTIVE
IDENTIFICATION], where CORPORATE CAPITALISM ON THE MOVE refers to the wider
thematic context in, and through which, the individual elements are assigned their
meaning, and POSITIVE EVALUATION/AFFECTIVE IDENTIFICATION refers to the
axiological/affective orientation which the text adopts in relation to this thematic at the
same time that it seeks to persuade viewers to adopt a similar value stance.
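The interaction of the three chains can be made concrete with a small sketch. The Python below is illustrative only. It codes Shots 1 to 4 using just what this chapter states about them: the nurse in Shot 4 does all three things; the truck driver in Shot 3 smiles and moves forward but does not roll his sleeves up; the draughtswoman in Shot 2 rolls her sleeves up while seated; the herdsman in Shot 1 rolls his sleeves up while walking towards the camera. Treating any feature that is not mentioned as absent is an assumption of the sketch, and all names in the code are hypothetical.

```python
# Illustrative coding of Shots 1-4 (Phase 1). Features not mentioned in the
# discussion are treated as absent, which is an assumption of this sketch.
CHAINS = {
    "smiling": {3, 4},
    "rolling the sleeves up": {1, 2, 4},
    "moving forward": {1, 3, 4},
}


def chains_per_shot(chains):
    """Map each shot number to the set of chains that have an item in it."""
    shots = {}
    for name, members in chains.items():
        for shot in members:
            shots.setdefault(shot, set()).add(name)
    return shots


def interaction(chains, minimum):
    """Return the shots in which at least `minimum` chains co-occur."""
    per_shot = chains_per_shot(chains)
    return sorted(s for s, names in per_shot.items() if len(names) >= minimum)


full = interaction(CHAINS, len(CHAINS))  # all three chains together
partial = interaction(CHAINS, 2)         # two or more chains together
print("full interaction:", full)
print("partial or full interaction:", partial)
```

Representing each chain as a set of shot numbers keeps the chains separate, as in Hasan's account, while making their points of interaction a simple matter of set membership.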
Column 5, Soundtrack, refers to all aspects of the soundtrack. Here language,
music, and other sounds are considered as parts of a more unified phenomenon
and are not separated out. There are two main reasons for this. First, the
multimodal basis of the transcription and concomitant analysis presume no
necessary priority for the linguistic semiotic in the making of the text’s meaning.
Secondly, and while recognising that each has its distinctive qualities, speech, song,
music, ambient and other sounds also have many features in common which
provide a basis for their potential semiotic integration in multimodal texts (Van
Leeuwen, 1999). Again, the emphasis here is not on mathematical criteria of
acoustic physics, but on criteria which are perceptually and semiotically salient.
Significantly, the Westpac text does not present the various participants in their
work settings from the point of view of a naturalistic or realistic auditory modality
of ‘how things really sound’. The soundtrack does not, for example, make available
to us the ambient sounds of the carpenters working at the house construction site
(Shot 18, Row 42), the sounds of the street outside the baker’s shop (Shot 12, Rows
28-30), the sounds of the boys playing cricket with the nun (Shot 21, Rows
47-50), or the sound of the helicopter preparing to take off (Shot 16, Rows 38-40).
Instead, these visual images and associated body movements are variously integrated
with the sounds of a musical band, a female chorus, a female soloist and an off-
screen male speaker. The only possible exception to this relates to the sounds of the
sheep in the first scene. However, these sounds, too, play their role in the overall
meaning-making process. Without going into all the details here, the soundtrack
itself combines a number of different sound genres – broadly defined – that interact
with each other and with other semiotic modalities in the text in order to create their
own specific evaluative and affective orientations to the text’s thematics.
Column 6, Metafunctional Interpretation, represents an attempt to specify the
multifunctional basis of all acts of semiosis. The left-to-right visual organisation of
the table is not without consequences for the ways in which the makers and users
of transcriptions perceive the relationships among the various components of the
transcription and, by implication, of the text transcribed. Ochs (1979: 49) draws
attention to the left-to-right bias which derives from the Western tradition of visual
literacy. In this tradition, left is perceived as signifying both temporal and logical
priority. That is, that which is placed on the left of the transcription is – probably
unconsciously – doubly privileged on account of these organisational principles in
the grammar of visual semiosis in Western cultures. Typically, transcribers place the
verbal or linguistic component of the transcription on the left. If other semiotic
modalities are referred to at all, they tend to be placed to the right of the verbal
component. In the present transcription three interrelated strategies have been
adopted to overcome this problem. First, the leftmost column is that which dually
specifies the temporal progression of the text in seconds and the row number with
which this correlates. The numbers in Column 1 therefore have this dual function.
The important integrating and cross-referencing functions of the rows, as discussed
above, justify the priority which is accorded this column by virtue of its being placed
in the leftmost position in the table. Secondly, the verbal or linguistic dimension of
the textual transcription is located in Column 5, thus militating against any tendency
to treat it as more significant than the other columns. Thirdly, Column 5 includes all
relevant aspects of the soundtrack – speech, song, music – as different dimensions
of a single phenomenon. This does not mean that it is always appropriate to treat
speech in this way vis-à-vis other semiotic modalities based on sound. In the present
case, and probably in many other generically related texts, it does make sense to adopt
the approach undertaken here. The reasons for this are explained in 4.7, pp. 191-202.

4.3. Etic and emic criteria in multimodal transcription

Rather than saying that the advertisement is a sequence of discrete shots along with
the techniques of cutting which mark the transitions between shots or sequences
of shots, it is possible to analyse the text as a series of dialogic moves. In this view,
each dialogic move culminates in the peak of a wave, whereas the implicit response
on the part of the viewer coincides with the trough of the wave (see 4.11.7, pp. 239-42).
The question then becomes one of asking which particular copatternings of
selections co-occur in relation to either the wave peak or the wave trough at any
given stage of the text. In other words:
(1) how does the wave-like patterning contribute to the dialogic
organisation of the text as an interactive event?
(2) which dialogic moves and their responses are associated with
which particular participants and speaking and listening positions
in the discourse event?

Answers to these questions would enable the analyst to account for the ways in
which variations in the kind or degree of selections in any given semiotic modality
may impact upon the overall wave patterning. Figure 4.3 refers to Phase 1 of the
text, which lasts approximately 16 seconds. This figure, which presents the sound-
track as a visual display, was derived from the Timeline window of Adobe Premiere.
It shows the acoustic intensity of the soundtrack relative to its progression in time,
indicated in seconds. With specific reference to the chorus, which starts singing at
3.75 seconds, it shows that each speech act move of the chorus – roll them (three
times) to roll them up (twice) in Phase 1 – is represented as a clearly discernible
periodicity, i.e., a pulse or a surge of acoustic energy. Each such periodicity alter-
nates with a lull or a low point in the overall wave-cycle, corresponding to the brief
pauses between the singing of each clause. The approximate temporal scope of
each periodicity is also indicated in Figure 4.3. The five periodicities referred to are
as follows: (1) 3.75 to 6.50 seconds; (2) 8.0 to 10.0 seconds; (3) 10.5 to 14.0 sec-
onds; (4) 14.5 to 15.0 seconds; (5) 15.5 to 16.0 seconds.
These figures are approximations only due to the limitations of the visual
display and concomitant time analysis. The lulls may thus be seen to correspond to
points where a change of interlocutor can potentially occur. In other words, each
lull is a potential response point to the imperatively realised command, as sung by
the chorus. Overall, each clause as sung in Phase 1 is best seen as a submove in an
overall dialogic move which receives its text-internal response with the transition to
Phase 2, when the female soloist sings her turn in response to the chorus. However,
the soloist does not only address the chorus. She addresses both the members of
the chorus and the viewer. For this reason, we can say that each lull, as described
above, correlates with a potential dialogic response – implicit or explicit – on the part
of, for example, the viewer. Of course, in this text the viewer is present only vir-
tually. The point is that the semiotically projected ideal response of the viewer is syn-
chronised with the rhythmic alternation of wave pulse and lull. Thus, we see that
the overall relationship between text and viewer is organised into a wave cycle. This
wave cycle is not, however, a mere acoustic property of the expression plane. It also
organises waves of meaning and potential courses of action on the content stratum.
That is, the flow of acoustic energy is also a flow of meaning through the entire
system of relations. This is also emphasised by the ways in which we see a number
of participants who actually do roll their sleeves up as the chorus sings (e.g. Shot 1,
Rows 3-10; Shot 2, Rows 11-2; Shot 4, Rows 15-6).
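The alternation of pulse and lull described above can be tabulated with a short sketch. The Python below takes the five approximate periodicities given for Phase 1 and derives the lulls between them, which the discussion treats as potential response points. The function and variable names are hypothetical, and the timings are only as precise as the approximations quoted in the text.

```python
# Approximate periodicities of the chorus in Phase 1, in seconds, as listed
# in the text. They are approximations read off a visual display.
PERIODICITIES = [(3.75, 6.5), (8.0, 10.0), (10.5, 14.0), (14.5, 15.0), (15.5, 16.0)]


def lulls(pulses):
    """Return the (start, end) gaps between consecutive pulses."""
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(pulses, pulses[1:])
    ]


def wave_cycle(pulses):
    """Interleave pulses and lulls into one time-ordered list of labelled spans."""
    spans = [("pulse", start, end) for start, end in pulses]
    spans += [("lull", start, end) for start, end in lulls(pulses)]
    return sorted(spans, key=lambda span: span[1])


for kind, start, end in wave_cycle(PERIODICITIES):
    print(f"{kind:5s} {start:5.2f}-{end:5.2f} s  (duration {end - start:.2f} s)")
```

Deriving the lulls from the pulses, rather than listing them by hand, keeps the two in step if the pulse timings are later revised.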
Figure 4.3 shows that the soundtrack can be seen as a series of cumulative
waves with their respective peaks and troughs. This raises a further question
concerning criteria of measurement for the identification of salient changes in the
phase space of the text. In our view, it is not necessary to adopt totally extrinsic or
objective measurement criteria. Instead, criteria based on perceptual salience are to
be preferred (see also Mathiot, 1983: 38; Van Leeuwen, 1985: 222; Gumperz,
Berenz, 1993: 92). In other words, multimodal transcription is not concerned with
etic criteria of an objective physicalist nature as obtained by some kind of
mechanical measuring apparatus. Rather, perceptually salient units must be discovered
and determined, as Pike (1967: 37) expressed it, during the preparation of the
transcription. The analyst of multimodal texts is thus interested in how perceptually
salient features in such events contribute to the meaning-making process of that
event. In this emic point of view, the analyst is concerned with the identification of
units that are perceptually and semiotically salient for the members of the culture in
question. This is a consequence of the fact that multimodal transcription is meaning
based. Given that meaning is always relative to an observer or participant – an agent
– it follows, of course, that the meaning-making patterns in the text can be construed
in different ways by different participant-observers. However, the notion of an attrac-
tor emphasises the fact that foregrounded copatternings of selections are the result
of preferred patterns which the analysis should not overlook.
This does not mean that the preferred pattern is directed or controlled by a
single overarching plan or intention that regulates all the interacting variables in the
text. Rather, the pattern is the result of a synergetic combination of various inter-
acting variables on different scalar levels of the text’s organisation. One such factor
is certainly the aims and purposes of the makers and designers of the text at all
stages from production to postproduction. However, this does not causally explain
the patterned relations in the text. The preferred pattern – the overall sense of order
in, say, a given phase or subphase – is generated by the ways in which many inter-
acting factors seek, or are attracted to, a regime of co-operative stability. It is an
emergent property of the way variables interact in both the history of the system as
well as in the real-time of its performance and reception (Thibault, 1999b: 7).
Figure 4.3: Waves relating to the soundtrack in the first phase of the Westpac advertisement

Van Leeuwen (1985: 218-9) points out how the incidental body movements
of head, arms, torso, etc. of participants are natural (biological) movements, or
organismic variables, that are not dictated by film editors. He further points out that
the shots themselves can be ‘trimmed’ so that these natural body movements
coincide with the rhythmic accents of speech or music. An example is the sequence
showing the supervisor walking from right to left and the industrial plant in the
background (Shot 19, Rows 43-4). The camera distance is quite close, featuring head
and shoulders, which is typically synonymous with familiar interpersonal relations
(4.7.2, p. 195). The closeness of participant to viewer serves to accentuate the
natural movements of the head. At first, the head is roughly centre frame. This cor-
responds to the male speaker in the voiceover uttering the unaccented with. The
supervisor’s head then swings to the right of the frame, coinciding both with the
next word, the accented first syllable of the word money and the supervisor’s smile.
There is at this point a significant pause or juncture in the speech rhythm. In Visual
Frame/Row 44, the head swings to the left of the visual frame, a movement which
coincides with the manner circumstance with advice, again following the pattern of
accented syllables mentioned before. Thus, the salient accent here falls on the sylla-
ble -vice in the word advice. The supervisor’s smile prosodically extends across both
adverbial groups spoken by the male narrator and both visual frames, starting on
one accented syllable and ending on another before the cut to the next shot. Here
we have a microscopic slice of this synergetic co-operation among different inter-
acting variables, some originating from the natural rhythms of the participant’s body
as he walks, others synchronised in postproduction in such a way that a combination of
visual trimming and synchronisation of the rhythms of the male narrator’s off-scene
voice ensures that the centre-left-right swing of the head in walking, the accented
syllables in the male speaker’s speech, the rhythmic juncture or pause between the
two prepositional phrases – with money, with advice – in the speech of the male narra-
tor and the onset and subsequent development of the supervisor’s smile are all syn-
chronised, not on the basis of a single master plan but on the basis of these variables
fluctuating in a stable way so as to generate the local pattern described here.

4.4. Phases, subphases and transitions

As pointed out in Inset 7 (p. 47), a discoursal phase, following Gregory (1995,
2002), is a set of copatterned semiotic selections that are codeployed in a consis-
tent way over a given stretch of text. In the Westpac advertisement, there are five
main phases which exhibit an overall pattern either increasing towards a peak or
decreasing towards a trough. The text is not simply structured as a sequence of
alternating shots, but as a sequence of alternating turns between different voices,
sung, spoken and instrumental. The five phases in the Westpac text can be specified
in this way (see also 4.6.1, pp. 187-188).
The start of a given phase is indicated at the appropriate point in Column 6,
labelled Metafunctional Interpretation. Thus, the first phase is indicated by upper case
letters, and a number indicates which phase it is and its position in the text. The
subphases within any given phase are further specified by a lower case letter of the
alphabet in subscript. For example, the stretch of text which is headed by Phase 1b
refers to the second subphase of the first phase of the text, which extends from
Rows 11 to 14. The reason why the phase labelling is placed in Column 6 has to do
with the fact that any decision concerning where to draw the boundaries between one
phase, or subphase, and another is always motivated by criteria which involve all meta-
functions. As indicated in Column 1, Phase 1 extends over the first sixteen seconds of
the text. It is characterised by an overall increase towards a culminating peak. This may
be described as follows: Visual Frame/Row 1 shows the lone sheep herdsman with his
dog in the mythical vastness of the Australian outback. In this frame there is silence –
no sounds of any kind are heard during the first second of the soundtrack. The sheep
herdsman is seen as far away from the viewer. Implicit is the notion that the only dyadic
interaction is that between herdsman and dog, as then evidenced in Visual
Frames/Rows 2-4, when the herdsman beckons the dog back to his side. The solitary
life of the herdsman is then accompanied by the solo keyboard which interacts, in
contrapuntal fashion, with the sounds of the sheep in Column 5 (Rows 2-3). All of this
is initially offset by the vastness and silence of the Australian outback in this scene.
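The labelling convention described above, in which Phase 1b names the second subphase of the first phase, can be sketched as a small parser. The Python below is an illustration under stated assumptions: the label is written on one line as plain text (the subscript is not reproduced), and the function name and the pattern are hypothetical rather than part of the transcription itself.

```python
import re

# Hypothetical plain-text form of the labels: "Phase 1" or "Phase 1b".
LABEL = re.compile(r"^Phase (\d+)([a-z])?$")


def parse_phase_label(label):
    """Split a label such as 'Phase 1b' into (phase number, subphase position).

    The subphase position is 1-based ('a' -> 1, 'b' -> 2) and is None when
    the label names a whole phase rather than a subphase.
    """
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"not a phase label: {label!r}")
    phase = int(match.group(1))
    letter = match.group(2)
    subphase = ord(letter) - ord("a") + 1 if letter else None
    return phase, subphase


print(parse_phase_label("Phase 1b"))
```

A label parsed in this way can then be used to group the rows of the transcription by phase and subphase when comparing selections across columns.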
The transitions, boundaries or junctures between phases may be signalled in a
variety of ways on both the content and the expression planes (see Inset 18, pp. 236-
7). On the expression plane, a change, a break, or a pause in the rhythm of music,
speech, body movement, or cutting between shots coincides, generally speaking, with
the transition to a new phase or subphase. The same can be said of tempo, whether
visual, as in the movement of the camera, kinesic, having to do with the locomotory,
gestural, facial and other body movements of participants, or in the speech and
musical and other sounds of the soundtrack. On the content plane, there may be a cor-
responding shift in, for example, the visual or linguistic thematics, the
evaluative/interpersonal orientation (Lemke, 1988), or in the specific textual voice that
constitutes a given move in the text.
Perceptually speaking, transitions between phases are not always clear cut. This
means that it may be difficult to decide exactly where one phase ends and the other
begins as the boundaries between phases, rather than being segmental in character, are
continuous and, hence, blurred. This is in keeping with the wave-like or periodic char-
acter of the phase itself. Thus, the transition point may be characterised by a gradual
merging of features from the two phases in question as one phase decays or fades out
and the other comes into being. In the Westpac text, the transitions between phases
tend, overall, to be quite clear cut. Thus, the cut from one visual shot to another coin-
cides with a shift to a different musical or spoken voice in the turntaking sequencing
of, say, chorus, and female soloist. For example, the cut to the first appearance of the
Westpac logo moving forward at the beginning of Phase 2 (Row 17) perfectly
coincides with the first entry of the female soloist, singing and let’s get …, whereas this
imperative clause is not completed until the cut to Shot 6, Row 18, featuring the
schoolgirl at her desk. Her leaning forward copatterns with the word moving. The
transitions between subphases are not always so straightforward. At times, there is
an almost imperceptible overlap between subphases, as a fine-grained analysis of
the transitions between some of the visual frames will reveal (see, for example, Row
24, Column 2).
An example is the transition between Phase 2b and Phase 2c (Rows 19-20). In
this case, Visual Frame 20 coincides with the shift from the chorus singing roll them
up to the female soloist, who starts singing let’s …. The fact that this shift occurs at
Visual Frame/Row 20 is itself significant. At this point in the visual text, the father
and son are linked by a gaze vector. In the earlier part of the same shot (Rows 18-9)
this was not the case: the father’s gaze was directed at his son, who, in turn, directed
his gaze at his own hands while rolling his sleeves up (see 4.7.7, p. 200). The link
occurs only when they have both finished rolling their sleeves up. The gaze vector linking
the two participants may serve to establish a relationship of affiliative motivation
between them or to obtain the coparticipation of the other in some joint activity, as
shown in studies of the function of gaze in face-to-face interaction (Beattie, 1981: 302;
Goodwin, Goodwin, 1992: 90). Thus, the shift from chorus to soloist on this same
frame coincides with a shift in the visual thematics from preparing to do the job to
actually starting the job. Such moments of transition where overlap occurs are
indicated in Column 1 by a vertical double-headed arrow extending from the begin-
ning of the transition to its end, as in, for example, Rows 20 and 52-3.

4.5. Column 1: Row number and time specification

As well as providing a quick and easy means of identifying the various horizontal rows
in the transcription, Column 1 indicates the real-time progression of the text on a
second-to-second basis. Thus, the numbers in this column indicate both progression in
time as indicated in the .avi file when played in Windows Media Player and the entire
row to which this point in time corresponds. In this way, the information in the
remaining five columns can be correlated with each other simply by comparing items
along the horizontal row that corresponds to a given number in Column 1. It is
important to point out that the time indicated in Column 1 does not correspond to
the visual frame in Column 2 in a direct way. This is so because it is sometimes more
appropriate to insert two or more visual frames in a given row in order to illustrate
more clearly a specific micro-level development or a transition in the text. This is the
case with Row 24, for instance, which coincides with the transition from Phase 2c to
Phase 3a. While both of the visual frames shown occurred within the time frame of
one second, as indicated in Column 1, the insertion of the second frame in Row 24,
featuring the logo (Shot 9), indicates that this reappearance of the logo demarcates
the transition into a new subphase even though the other selections that characterise
this subphase – the female soloist, the schoolgirl and so on – do not occur until Row
26 and, therefore, after the appearance of the logo.
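The dual function of Column 1, and the fact that a row may hold more than one visual frame, can be sketched as a simple record type. The Python below is a hypothetical model of one row of the six-column table, not the format used to produce the transcription. The field names are assumptions, the first frame description for Row 24 is a placeholder, and the figure of fifteen frames per second is the approximate playback rate mentioned in 4.6.1.

```python
from dataclasses import dataclass, field

# Approximate playback rate ("some fifteen frames per second", see 4.6.1).
FRAMES_PER_SECOND = 15


@dataclass
class Row:
    """One horizontal row of the six-column transcription (hypothetical model)."""

    number: int                                 # Column 1: row number and time in seconds
    frames: list = field(default_factory=list)  # Column 2: one or more stills
    visual_image: str = ""                      # Column 3
    kinesic_action: str = ""                    # Column 4
    soundtrack: str = ""                        # Column 5
    metafunctional: str = ""                    # Column 6, including any phase label

    def sampling_ratio(self):
        """Fraction of the roughly fifteen frames in this second that are shown."""
        return len(self.frames) / FRAMES_PER_SECOND


# Row 24 holds two frames from the same second; the second shows the logo.
# The first description is a placeholder, not taken from the transcription.
row_24 = Row(
    number=24,
    frames=["frame preceding the logo", "Westpac logo (Shot 9)"],
    metafunctional="transition from Phase 2c to Phase 3a",
)
print(len(row_24.frames), "frames in Row", row_24.number)
```

Keeping the frames as a list makes explicit that the time in Column 1 indexes the row as a whole, not any single frame in Column 2.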

4.6. Column 2: The visual frame

4.6.1. Visual frames and shots


The visual frames in Column 2 serve to specify the segmentation of the video track
into visual shots and the transitions between shots. Visual frames or stills should
not be taken as coinciding with shots. Given that there are some fifteen frames per
second in the real-time of video playback, the frames that are shown in Column 2
are themselves a visual transcription of some aspects of the visual track. A shot is
defined as a filmed visual sequence in which there is no spatial displacement of the
camera; for example, forwards or backwards. In the transcription, the temporal
duration of a specific shot is ascertained by correlating the numerals in Column 1
with the visual frames in Column 2 that represent the extent of any specific shot.
In Shot 1, the camera provides a fixed point of observation in relation to which
salient changes in the depicted scene occur while other features remain invariant
(see 4.11.3, pp. 228-230). The most salient change is the movement of the sheep
herdsman from far away to close up as he rolls his sleeves up while walking towards
the stationary camera. A subordinate change is the movement of the sheep dog in
relation to the herdsman (Rows 2-7). The chief invariants are the location and its
features (trees, landscape, sheep, sky), which constitute the circumstance in which
the above change – main action and subordinate actions – occurs.
Shot 2 (Rows 11-2) constitutes a displacement of the point of observation
provided by the camera – again fixed – to the draughtswoman working at her desk
in a professional studio, in other words a different participant in a different loca-
tion. She, too, rolls her sleeves up. At first glance, the difficulty of establishing or
perceiving invariant structures that the two shots have in common may appear to
pose a problem of local coherence. If we have jumped from one set of invariant
structures to another set, then how are the two shots related? The same may also be
asked of the remaining shots – Shots 3 and 4 – in this initial subphase (Rows 13-6).
It may be argued that what appears to be a problem of local discontinuity or inco-
herence is, in actual fact, overridden at higher levels of textual organisation by
covariate semantic ties in the visual thematics (see above) that are progressively
defined in the unfolding text as cohesive chains extending over the entire text. For
example, the foregrounded copatternings of items deriving from the interacting
cohesive chains of smiling, rolling the sleeves, and moving forward function to create
global coherence in the text.
In Shot 2, the salient variant structure is the draughtswoman’s rolling her
sleeves up. Unlike the sheep herdsman, she does not move towards the camera as
she is seated at her desk. However, in Shots 3 and 4 both the truck driver and the
nurse move towards the camera at the same time that they smile. Again, there is
local variation that contrasts with global patterning, as witnessed in the fact that the
truck driver does not roll his sleeves up (he is dressed in a blue sleeveless singlet
typical of manual workers). Thus we see how the principal variant structures within
each individual shot are a text-developing strategy whereby global continuity and
coherence are enacted on the basis of local variation and change. Shot 1 also serves
as an orientational or establishment shot, even though its participants (herdsman and
dog) and spatial location are not seen in the remainder of the text. This may be
explained as follows.
On analogy with hyper-Theme in linguistic texts (see 2.4.2, pp. 74-77; see
also Daneš, 1974, 1989; Martin, 1992: Chap. 6), Shot 1 serves an important anchor-
ing function for the shots that follow it. In the visual semiotic, a textual hyper-Theme
is an introductory shot which functions to predict a particular pattern of thematic
development in successive shots in some phase or subphase or sequence of sub-
phases. This shot is hyper-Thematic in the sense that it functions to establish a
particular pattern of interaction among the other shots which realise this textual
subphase with respect to thematic choices and their development. Shot 1 thus has
an anchoring function because it serves to establish a global visual thematic meaning
which provides a textual basis for the development of the shots which follow it in
Phase 1. It is, therefore, prospective or anticipatory in character. In Subphase 1a, it
provides a thematic anchoring point whereby the shots that follow are linked to a
shared and developing network of (inter)textual thematic relations. Thus, in this
text the apparent problem of the lack of visual invariants that are common to each
successive shot is solved in a different way, namely the thematic continuity from shot
to shot is developed on the basis of each shot’s local contribution to a higher-order
visual thematic system. It is the visual salience accorded to the primary participant –
herdsman, draughtswoman, truck driver, nurse – along with performance indexes
(stereotypical work clothes, work setting, implements) that links all these shots –
Shots 1 to 4 – on the basis of a common thematic relation that may be glossed as
[TYPICAL OCCUPATIONAL ROLES]. In this respect, Shot 1 is interesting because it
instantiates in the form of the herdsman the archetype of the early pioneering hero
on which the historical myth of the Australian outback was founded.
Before concluding the current section, a brief comment on the relationship
between shot and phase seems to be in order. A shot, as defined above, is a con-
stituent in the visual semiotic. Given that the shot is specific to the visual semiotic
of video texts, it is legitimate to say that shots and the relations between shots are
intra-semiotic in character. A phase or subphase, on the other hand, is an inter-semiotic
notion. As defined in the present book, phases and their subphases are an interme-
diate level of textual organisation that integrate microlevel selections of resources
from diverse semiotic modalities in a consistent way (see 1.6, pp. 46-54, 4.4, pp.
184-186 and Inset 7, p. 47) so as to achieve a global text organisation. The notion
of shot is subordinate to that of phase, given that shots are, in the Westpac text, just
one of the semiotic resources that combine with others to produce determinate
phases.

4.6.2. Information structure: Given and New


The question of information structure and its organisation and the related ques-
tion of how the visual text is organised in terms of the functionally related vari-
ables Given-New now need to be addressed. Kress and Van Leeuwen (1996: 186-
92) argue that visual information is organised on the basis of a horizontal structure
which presents information as Given or New. In this view, left of centre is Given;
right of centre is New. Their position ultimately derives from Halliday’s (1994
[1985]: 295-9) analysis of Given and New in the linguistic clause. In Halliday’s view,
Given and New are functions which are realised by specific constituents in the
clause. This view carries over into Kress and Van Leeuwen’s analysis of the printed
visual image in terms of a horizontal structuring of the image in terms of left and
right. While there is no denying that images can be so divided, this way of formu-
lating the question would appear to remain too tied to an inappropriate extension
of the notion of constituency to the visual text.
In our view, a more convincing solution in the case of video texts lies in
reconsidering the essentially topological-continuous character of visual texts as distinct
from the predominantly typological-categorial character of the semantics of natural
language. Gibson points out that the progressive picture (e.g. the film or video text),
as distinct from the still picture, is not based on motion, as is commonly thought,
but on ‘change of structure in the optic array’ (Gibson, 1986 [1979]: 302; see Inset
15: Gibson’s optic array, p. 192).
The information in the depicted world of the text specifies the participants,
events, actions, places and so on in that world relative to a point of observation.
This information consists of both invariants and variants, or transformations, in
the optic array.
A better solution to the question of Given-New lies in this important obser-
vation of Gibson’s and its further development. If we consider Shot 1 (Rows 1-10)
in the Westpac text in this light, it is clear that left-right horizontal structure has little
or nothing to do with the organisation of this shot into Given and New. Instead, the
critical factor has to do with the fact that the sheep herdsman is first seen as quite
distant and perceptually non-salient and then he progressively moves towards the
viewer until he occupies a large proportion of the visual frame at the conclusion of
the shot. In other words, it is the dynamic transformation of the herdsman from an
inconspicuous aspect of the ground to his emergence as the dominant figure which
is most pertinent in the assigning of criteria of informational salience and newness.
Throughout the shot his positioning is fairly central rather than left or right.
What is New in this shot is not based on left-right structuring, but on what is pro-
gressively made salient or focal, along with all those other features that lie within its
scope. The progressive increase in size of the herdsman, the perceptual centring of
the herdsman in the frame, his moving toward the viewer in contrast to the lack of
movement in the overall scene, along with other actions – principally that of rolling
his sleeves up – that fall within the scope of this overall movement prosody
constitute the New in this shot. Importantly, it is this combination of features that
constitutes the principal informational variant or transformation in the delimited optic
array of this shot. In contrast, the landscape, the sky, the sheep and the trees are
invariant structures throughout the duration of the shot. They can thus be con-
strued as Given. In these terms, the New information unit is constituted by the
prosodic modulating of salient informational variants, or transformations, against a
background of informational invariants that are construable as Given (see 4.11.3, pp.
228-230).
As far as progressive pictures are concerned, left-right horizontal structuring
per se proves in any case to be too static a notion to be really useful. On the other
hand, the equating of salience with a dynamic informational variant in the visual
topology of the text provides a semiotically (and perceptually) better motivated
criterion for specifying what a given text treats or presents as New for the viewer.
It thus seems more reasonable to consider each shot as a quantum of information –
with both variant and invariant factors – which can be organised in terms of one
or more salient or focal informational units of variable prosodic scope rather than a
fixed geometry of left versus right.
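The criterion proposed here can be illustrated with a minimal Python sketch. It assumes a hypothetical and much simplified encoding in which each sampled frame of a shot is a dictionary of feature values. The feature names and values are illustrative glosses of Shot 1, not part of the transcription itself. Features whose value never changes across the frames are treated as invariants (construable as Given); features whose value changes are treated as variants (candidates for New).

```python
from typing import Dict, List, Tuple

Frame = Dict[str, str]


def split_given_new(frames: List[Frame]) -> Tuple[List[str], List[str]]:
    """Return (invariant, variant) feature names for the frames of one shot."""
    invariant, variant = [], []
    for feature in frames[0]:
        # Collect every value this feature takes over the shot.
        values = {frame[feature] for frame in frames}
        (invariant if len(values) == 1 else variant).append(feature)
    return invariant, variant


# Illustrative gloss of three frames sampled from Shot 1 (values are assumptions).
shot_1 = [
    {"landscape": "present", "sky": "present", "sheep": "present",
     "herdsman_size": "small", "sleeves": "down"},
    {"landscape": "present", "sky": "present", "sheep": "present",
     "herdsman_size": "medium", "sleeves": "being rolled up"},
    {"landscape": "present", "sky": "present", "sheep": "present",
     "herdsman_size": "large", "sleeves": "rolled up"},
]

given, new = split_given_new(shot_1)
print("Given (invariants):", given)
print("New (variants):", new)
```

The sketch deliberately ignores left-right position: only change over the duration of the shot decides what counts as New. It does not model prosodic scope or degrees of salience, which would require a richer representation.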

4.6.3. Sequencing and relations of interdependency between shots


Shots must also be linked to each other in the overall sequencing of the film. This
entails visual strategies for linking one shot to another, or one shot to a series of
shots, in various relations of interdependency. Relations of interdependency between
shots have to do with questions of temporal and logical (causal, etc.) sequence,
continuity and discontinuity, subordination and superordination. Such relations
have to do with the semiotic resources of the logical metafunction (see 4.11.4, pp.
230-232 and Inset 4, pp. 22-23). Transitions between visual shots may take various
forms such as cuts, dissolves, fade-ins, fade-outs, wipes and so on. In the Westpac
advertisement, all such transitions are in the form of visual cuts. While the visual
cuts are easily determined simply by looking down Column 2, a simple notational
convention to indicate the various types of transitions between shots may be used.
Only the first of these is used in this particular text. Thus, the sign ‘!’ indicates a
visual cut; ‘²’ a dissolve; ‘>’ a fade-in; and ‘<’ a fade-out. These signs are placed after
the last visual frame in a sequence of frames demarcating a given visual shot. In
this way, the analyst can specify, for example, that the transition between Shot 1 and
Shot 2 (Rows 10-1) is effected by a visual cut and that Shot 1 is of ten seconds’
duration. Shot 1 is, incidentally, the longest single shot in this text. This fact is itself
significant in various ways for the overall meaning of the text as well as for the
hyperthematic status of Shot 1 (see 4.6.1, pp. 187-188). It is significant that Shot 1
also corresponds to Subphase 1a, as a glance at the other rows and columns in the
first ten seconds of the text will reveal.
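As a rough illustration of how shot boundaries, durations and transitions might be recorded for later analysis, consider the following Python sketch. It assumes that each transcription row corresponds to one second of playback, which is consistent with Shot 1 occupying Rows 1-10 and lasting ten seconds. The class and field names are hypothetical, and the row range given for Shot 4 is inferred from the ranges cited for Shots 3 and 4 together.

```python
from dataclasses import dataclass
from enum import Enum


class Transition(Enum):
    CUT = "cut"
    DISSOLVE = "dissolve"
    FADE_IN = "fade-in"
    FADE_OUT = "fade-out"


@dataclass(frozen=True)
class Shot:
    number: int
    first_row: int
    last_row: int
    transition_out: Transition  # transition marked after the shot's last frame

    @property
    def duration_seconds(self) -> int:
        # Assumption: one transcription row per second of playback.
        return self.last_row - self.first_row + 1


shots = [
    Shot(1, 1, 10, Transition.CUT),
    Shot(2, 11, 12, Transition.CUT),
    Shot(3, 13, 14, Transition.CUT),
    Shot(4, 15, 16, Transition.CUT),  # row range inferred, see note above
]

longest = max(shots, key=lambda s: s.duration_seconds)
for shot in shots:
    print(f"Shot {shot.number}: {shot.duration_seconds}s, then {shot.transition_out.value}")
print("Longest of these shots: Shot", longest.number)
```

Only the first four shots are listed, so the final line reports the longest shot within Subphase 1a and its immediate continuation rather than within the whole text.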

4.7. Column 3: The visual image

4.7.1. Specifying visual information


In Column 3, a number of parameters which have to do with the ways in which
options in the visual semiotic organise the relations between the depicted world of
the visual image and the viewer will be annotated. This column will not be con-
cerned with the specifically kinesic aspects of the depicted world of the image –
the locomotory, gestural, facial and other bodily movements of the participants in
the depicted world. These will be dealt with in Column 4.
Gibson points out that what he calls the progressive picture, in contrast to the
more usual term motion picture:
provides a changing optic array of limited scope to a
point of observation in front of the picture, an array that
makes information available to a viewer at the point of
observation (Gibson, 1986 [1979]: 302).

In Gibson’s theory of visual perception, the ambient stimulus information that is
available to our sensory systems is structured information that specifies the world for
perceivers (Gibson, 1986 [1979]: 63). The term optic array refers to the structured
stimulus information which is available to perceivers in the environment in which
they live, move and orient (see Inset 15 on the following page). Such an array is
ambient in contrast to the delimited character of the optic array that the video screen
affords the viewer. The visual information in the delimited optic array of the video
screen can specify visual kinaesthesis in the viewer even though the viewer may occupy
a fixed position, seated, say, in a room with the screen occupying a specific position
in front of him or her. In this way, the viewer is provided with an impression, which
is entirely virtual, that he or she is turning and hence orienting his or her head in
relation to the depicted world of the participants, actions, events, locations and so on,
or moving closer to or further away from these. These camera movements are anal-
ogous to the head-body system that is involved in visual perception of the ambient
optic array (Gibson, 1986 [1979]: 298). There are, then, two sources of visual
information simultaneously available to the viewer in the video text. The visual
information that specifies visual kinaesthesis in the viewer and the visual information
that specifies the depicted world of the text therefore need to be distinguished both in
transcription and in analysis at the same time that the relationship between the two is
also understood. The first kind of visual information specifies the viewer’s embodied
relation to another kind of visual information which is simultaneously present in the
delimited (not ambient) optic array of the video screen (see 4.11, pp. 223-248).
The observations made here rest on a still more fundamental point, namely that
all forms of visual semiosis are recontextualisations or transformations of the head-
body system of the individual who picks up information from the ambient optic array
as he or she orients to this and moves within it. It is the ground – the earth’s surface
– which provides the viewer with his or her primary means of support and his or her
main reference point with respect to all other surfaces when orienting to and sampling
the ambient optic array (Gibson, 1986 [1979]: 16, 33; Thibault, 2004b: 26-30). Thus,
the ground may be seen as a principle of ‘congruency’ with respect to the metaphor-
ical transformations that the head-body system undergoes in the virtual visual
kinaesthesis of all forms of visual text – drawings, paintings, photographs, scientific
diagrams, films, video games, CD-Roms, flight simulators, and so on.

Inset 15: Gibson’s optic array

• In James J. Gibson’s ecological theory of visual perception, the ambient
optic array is a structured arrangement of stimulus information that speci-
fies the environment of the observers who inhabit it (Gibson, 1986 [1979]:
63). To be ambient at a point of observation that could be occupied by an
observer means to surround that point. A point of observation is a point in
the ecological space-time of the observer from which its environment can
be observed. When a position is occupied by an observer, human or other-
wise, the ambient array is modified in such a way that it provides informa-
tion about the observer as well as information about the environment that sur-
rounds the observer (Gibson, 1986 [1979]: 66). Information of the first kind
is known as proprioceptive information; information of the second kind is
referred to as exteroceptive information.
Gibson defines a picture as ‘a surface that always specifies something other
than what it is’ (1986 [1979]: 273). Whether on a large surface displayed in a
public place or on the page which one reads in the privacy of one’s study,
the optic array is arrested in time. For this reason, Gibson says that the optic
array of pictures is delimited. A delimited optic array is an arrangement of
arrested visual structures. The arrested structures of such an array do not
‘surround a position in the environment that could be occupied by an
observer’ (Gibson, 1986 [1979]: 65). For this reason, the optic array of a pic-
ture is said to be delimited, not ambient. On the other hand, the optic
array that illuminates and provides information about the surfaces, objects,
events, and so on, in the world in which we live is ambient. The ambient
optic array surrounds potential points of observation in the environment of
the observer. For this reason, it is environing in Gibson’s sense. In the case
of moving images projected onto a screen, as in television or cinema, the field
of view of the camera becomes the optic array of the viewer (Gibson, 1986
[1979]: 298). Again, the optic array which is projected onto the screen is
delimited, not ambient.

• A picture is a record of the invariants that an observer has extracted from
the ambient optic array in order that these may be ‘stored’, ‘put away’,
‘retrieved’, or ‘exchanged’ (Gibson, 1986 [1979]: 274). Gibson points out
that a picture is not a record of perception. That is, it is not a record of
‘what the picture maker was seeing at the time she made the picture at the
point of observation she then occupied’ (Gibson, 1986 [1979]: 274). A pic-
ture can be this, but, more fundamentally, it is a record of invariants in the
ambient optic array that the picture maker has selected for attention.

In the case of a television advertisement such as the Westpac text, the total
visual system is constituted by the interactions among:
(1) the delimited optic array of the video screen, which changes over time
and which is projected to a potential point of observation;
(2) the information that the surface of the screen contains about
phenomena other than the physical surface itself, comprising (i) the
depicted world of the visual image projected on the screen’s
surface; and (ii) the camera movements analogous to our head-
body movements that orient to the depicted world;
(3) the viewer who occupies a point of observation in relation to the
video (TV) screen and the information that it projects.

The notational conventions adopted in Column 3 are as follows. CP refers to
camera position. Is the camera stationary, or is it moving? If it is moving, is it
moving in relation to the movements of the camera operator, or in order to visually
track the movements of the participants in the visual frame? Camera movement in
one shot may contrast with absence of such movement in a preceding or following
shot. This can be significant in various ways. A camera movement may coincide
with a rhythmically salient moment in the text, hence contributing to the fore-
grounding of some feature of the text. A contrast between movement and absence
of movement may also signify some sort of transition point in the structuring of
the text. For example, in Shot 20 (Rows 45-6) the camera pans towards the
Westpac bank teller and concludes Subphase 4b. In Shot 21 (Rows 47-50), in
contrast, the camera is stationary. The contrast between ‘stationary’ and ‘panning’ is
used at various points in the text and doubtless functions, in relation with other
features, as a framework for an emergent textual cohesion. However, this contrast
is not an end in itself. Instead, the contrast between one part of the text and some
other, relative to the whole, constitutes a framework within which social norms of
interpretation are invoked as the text develops in time as a unit of social interaction.
If the camera is moving, the main options are as follows. Panning refers to
the movement of the camera sideways, either left or right, to create the illusion of
a panoramic view. The camera can also move backwards or forwards with respect
to the depicted events and participants. If the camera is mounted on a moving
vehicle or dolly, it can move sideways, thereby allowing for the deletion or accretion
of occluding edges during the movement of the camera with respect to the field of
vision. This is known as the dolly shot.
The main options are displayed as a system network in Figure 4.4. The system
network formalises a set of contrasting terms as a system of possible options in
meaning. A few notational conventions need to be explained. On the far left of the
network, the horizontal arrow specifies the systemic environment which is to be so
formalised. In Figure 4.4, this refers to the system Camera position, which is then
further subcategorised as a number of contrasting terms, each with their respective
semiotic values in that system. Reading from left to right, more specific or delicate
options are revealed. The use of square brackets indicates a disjunctive or either/or
entry condition. For example, one must select either stationary or moving, though not
both, before more specific selections can in turn be made. The use of curly brackets
(see Figure 4.5) specifies, on the other hand, a conjunctive or both/and entry
condition: for example, select both features x and y. The italicised terms which are
used to identify the various options in Figure 4.4 are the same as those used in
Column 3 of the transcription. In Column 3, the type of camera position will be
annotated in the following way: [CP: stationary], where CP is notational shorthand
for Camera position and stationary indicates the type of shot with reference to the
movement or otherwise of the camera (see Figure 4.4). The enclosing square brackets
simply indicate that the information contained therein is a set of features characterising
a given option to a specifiable level of delicacy or specificity. They serve to demarcate such
information from the surrounding text and to indicate the metasemiotic status of the
notation.

[Figure 4.4: Camera position relative to depicted world of image and visual kinaesthesis of viewer: main options. Terms in the network: stationary; moving; sideways; panning; dolly; sagittal/tilting; perpendicular; forwards; backwards.]
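The logic of an either/or system network can be sketched in a few lines of Python. The sketch below is illustrative only. The choice between stationary and moving follows the text, but the grouping of the more delicate options under moving is our assumed reading of Figure 4.4, and the function names are hypothetical. The sketch also assumes that the annotation records only the most delicate term selected, a convention the text illustrates only for [CP: stationary].

```python
from typing import Dict, List, Optional

# Each option maps to a more delicate sub-system, or to None if it is terminal.
# Assumed reading of Figure 4.4: the grouping under "moving" is a reconstruction.
CAMERA_POSITION: Dict[str, Optional[dict]] = {
    "stationary": None,
    "moving": {
        "sideways": {"panning": None, "dolly": None},
        "sagittal/tilting": None,
        "perpendicular": {"forwards": None, "backwards": None},
    },
}


def is_valid_path(network: Optional[dict], path: List[str]) -> bool:
    """True if each term is an option of the system entered by the previous term."""
    system = network
    for term in path:
        if system is None or term not in system:
            return False
        system = system[term]
    return True


def annotate(path: List[str]) -> str:
    """Render a selection path as a Column 3 annotation."""
    if not path or not is_valid_path(CAMERA_POSITION, path):
        raise ValueError(f"not a path through the network: {path}")
    # Assumption: only the most delicate term is written in the annotation.
    return f"[CP: {path[-1]}]"


print(annotate(["stationary"]))
print(annotate(["moving", "sideways", "panning"]))
```

Because a path names exactly one term at each step, it cannot select both stationary and moving, which is how the disjunctive entry condition is modelled here. A conjunctive (both/and) entry condition would need a different representation, for instance a set of paths.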

4.7.2. Perspective
Perspective will be transcribed in terms of two basic possibilities, viz. horizontal and
vertical angle (Kress and Van Leeuwen, 1996: 140-8). Horizontal angles have to do with
degree of involvement in, or empathy with, the participants and so on in the depicted
world. There are two main options: the viewer is positioned directly in front of the
depicted world, or obliquely, i.e. at an angle. The former possibility increases the
viewer’s empathy with, and direct involvement in, the actions, events, and participants
of the depicted world; the latter suggests detachment, lack of involvement.
Horizontal perspective will be transcribed as follows: [HP: direct], [HP:
oblique], where, for example, [HP: oblique] indicates that the viewer is positioned
so as to view the depicted world from an oblique angle, as in Shot 23, Row 53.
Vertical perspective is concerned with the power, status and solidarity relations
between the viewer and the depicted world. There are three main options. The
viewer may: a) be positioned so as to look down at the depicted world as if from on
high. In this case, the viewer may be positioned as having power over the
participants in the depicted world, or as viewing this world from a detached,
depersonalised or objectified perspective, as is the case in aerial and bird’s eye
views; b) be placed on the same level as the depicted world in a relationship of
equality or solidarity; c) view the depicted world from below such that the viewer is
placed in a position of inferiority.
There are, of course, many gradations possible between these three points,
which are, therefore, best seen as located on a graded continuum of possibilities.
Further, the references to notions such as power, status, solidarity, objectification,
empathy and so on, are themselves interpretations which cannot be assigned to
these options on a one-to-one basis. Instead, such interpretations will need to be
made and justified on the basis of the copatternings of these options with others
in the text. For this reason, vertical perspective has been transcribed in terms of
the three basic possibilities of high, median and low, rather than on the basis of
specific interpretations. Thus, [VP: low], for example, means that in the vertical
perspective the viewer is positioned so as to view the depicted world from below
(e.g. Shot 3, Rows 13-4), and so on, as described above.
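Readers who keep their transcriptions in electronic form may wish to check perspective annotations automatically. The following Python sketch shows one way this might be done. It assumes that a Column 3 cell is stored as a plain string of bracketed annotations. The function name and the sample cell are hypothetical, and only the option names for HP and VP are taken from the conventions above.

```python
import re
from typing import Dict

# Options for horizontal and vertical perspective, as listed in this section.
ALLOWED = {
    "HP": {"direct", "oblique"},
    "VP": {"high", "median", "low"},
}

# Matches annotations of the form [KEY: value].
ANNOTATION = re.compile(r"\[(\w+):\s*([^\]]+)\]")


def parse_perspective(cell: str) -> Dict[str, str]:
    """Extract and validate [HP: ...] and [VP: ...] annotations from a cell."""
    found = {}
    for key, value in ANNOTATION.findall(cell):
        value = value.strip()
        if key in ALLOWED:
            if value not in ALLOWED[key]:
                raise ValueError(f"{value!r} is not an option for {key}")
            found[key] = value
    return found


cell = "[CP: stationary] [HP: direct] [VP: low]"  # hypothetical cell contents
print(parse_perspective(cell))
```

The sketch records only the formal options. In keeping with the point made above, it attaches no interpretation such as power or solidarity to them, since such readings depend on copatternings with other features of the text.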

4.7.3. Distance
A further important visual parameter which functions to orient the perspective of
the viewer is that of the virtual or simulated distance between viewer and the
depicted world of the image. This is a further aspect of the way in which the
positioning of the camera relative to the depicted world simulates visual kinaesthesis in
relation to the head-body position of the observer who occupies a point of obser-
vation. Obviously, the viewer’s actual physical distance from the video screen and
the virtual distance as constructed by the camera in relation to the depicted world
are not to be confused here. The two extremes of maximally near and maximally far
relative to the position of the observer do not refer to the mathematical
abstractions of Cartesian geometry, where maximally far implicates the mathemat-
ical notion of infinity. Instead, these two extremes have to do with the here of the
nose and the there of the horizon relative to an observer who is located on the
ground (Gibson, 1986 [1979]: 117). Distance is an embodied notion for expressing
the relations between the nose-here and the horizon-there parameters and their trans-
formations and virtual simulations in visual texts. Distance is neither an objective
property of the physical world per se nor is it simply a matter of individual per-
ception. Instead, it is a meaning-making resource (Van Leeuwen, 1996: 90).

[Figure 4.5: System network of basic options for gaze in video texts. Terms in the network: distance (close, median, far); gaze; engaged; disengaged; other participant; eye contact; body part; clothing; object; inside personal space; outside personal space; self; depicted world; aversion; self-involvement; orientation; viewer; mental process; off-screen; indeterminate (monitoring, etc.).]

With these considerations in mind, we can postulate a cline of possibilities
from maximally close to maximally far, relative to the embodied perspective of the
observer. Visual images may simulate interpersonal closeness or distance between
viewer and the participants in the text (Kress and Van Leeuwen, 1996: 130-5). In visual
semiosis, these are transformations of the proxemic resources which regulate social-
interpersonal relations between interactants (Hall, 1972 [1963]). As we pointed out
in 1.4.1, close shots express intimacy and personality while distance depersonalises
and objectifies. A scale of degrees of closeness and distance can be postulated on
the basis of the following transcription conventions:
MAXIMALLY CLOSE
VCS = Very close shot (less than head and shoulders);
CS = Close shot (head and shoulders);
MCS = Medium close shot (human figure cut off at waist);
MLS = Medium long shot (full length of human figure);
LS = Long shot (human figure occupies approximately half the height of the image);
VLS = Very long shot (the distance is even greater);
MAXIMALLY DISTANT
In the transcription, distance relative to the nose-here perspective of the viewer, as
simulated by the camera relative to the depicted world of the text, will be annotated
using these abbreviations. For example, [D: CS] indicates a close shot (head and shoulders in the
case of a human participant, as in Shot 17, Row 41). In the Westpac text, most of
the depicted participants are human. For this reason, the basic notational
conventions as presented here should suffice. Clearly, some modifications may
have to be contemplated in the case of non-human participants (e.g. buildings,
landscapes) although the basic principles remain the same. Distance also interacts
with motion perspective as participants move towards or away from the observer.
Importantly, the Westpac text makes considerable use of the first of these
possibilities. This shows very clearly that distance and its virtual simulations in
visual semiosis are not a matter of abstract Cartesian geometrical space. Rather, they
are, in the first instance, a question of ‘the number of paces along the ground’
(Gibson, 1986 [1979]: 117) between some object, person, and so on, and the
observer, as specified by the interaction between the optical information that
specifies the camera/observer and the information that specifies the depicted world.
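The cline from maximally close to maximally distant can be treated as an ordered scale. The Python sketch below is a hypothetical encoding: the abbreviations are those of the transcription conventions, while the integer values are purely ordinal and carry no metric meaning. The function names and the sample values are assumptions made for illustration.

```python
from enum import IntEnum


class Distance(IntEnum):
    """Shot distance, ordered from maximally close to maximally distant."""
    VCS = 1  # very close shot
    CS = 2   # close shot
    MCS = 3  # medium close shot
    MLS = 4  # medium long shot
    LS = 5   # long shot
    VLS = 6  # very long shot


def annotate_distance(d: Distance) -> str:
    """Render a distance value as a Column 3 annotation."""
    return f"[D: {d.name}]"


def motion(start: Distance, end: Distance) -> str:
    """Describe how a participant's simulated distance changes within a shot."""
    if end < start:
        return "towards observer"
    if end > start:
        return "away from observer"
    return "no change"


# Hypothetical start and end values for a participant approaching the camera.
print(annotate_distance(Distance.VLS), "->", annotate_distance(Distance.MCS))
print(motion(Distance.VLS, Distance.MCS))
```

Comparing the distance at the start and end of a shot gives a crude indication of motion perspective. It says nothing about the rate of approach or about the ground over which the participant moves.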

4.7.4. Visual collocation
The Westpac advertisement devotes considerable attention to details which may in
some way collocate with or otherwise index the performance role of the
participant(s) in a given shot. The term visual collocation (VC) is intended to indicate
those secondary items which do not have participant status, but which function to
specify either the role of the participant or the activity which he or she is perform-
ing. In the topological space of the visual field, such objects form, in relation to the
main participant(s), a distributionally associated set of relations. In the Westpac text,
their use borders on the stereotypical, insofar as each shot is characterised by a num-
ber of such objects which function to index some aspect of the participant, his/her
role, or the socially relevant location in which the depicted scene takes place. A
further subcategory is the use of dress, again quite stereotypical, which serves to
index the social role, class and gender of the participant. In transcribing such
objects, tools, and other performance indicators, the aim is not to write down every-
thing which appears in a given shot, which would be pointless and self-defeating.
Rather, the aim should be to note down with a fair degree of parsimony only those
items which are strictly relevant to the purposes of the transcription and subsequent
analysis.
The transcription of visual collocation will be illustrated here with reference
to the truck driver in Shot 3 (Rows 13-14). Thus, [VC: (body) tattoos on left arm;
(dress) blue work singlet; (location) cabin of truck; (role) truck driver]. This
example may be read as follows. The items in round brackets designate particular
subcategories of VC which co-occur within the same shot. In the Westpac text,
body, dress, location, and occupational role are especially relevant. Further, it is the
collocation of such features in a given visual field which together serves to index the
relevant situation or situation-type.
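The structure of a visual collocation annotation lends itself to simple machine reading. The Python sketch below splits the example above into its subcategories. It assumes that items are separated by semicolons, that each item begins with a subcategory in round brackets, and that descriptions contain no semicolons of their own. The function name is hypothetical.

```python
import re
from typing import Dict


def parse_vc(annotation: str) -> Dict[str, str]:
    """Split a [VC: ...] annotation into {subcategory: description}."""
    match = re.fullmatch(r"\[VC:\s*(.*)\]", annotation.strip(), flags=re.DOTALL)
    if match is None:
        raise ValueError("not a VC annotation")
    items = {}
    for part in match.group(1).split(";"):
        # Each item is expected to look like "(subcategory) description".
        sub = re.fullmatch(r"\s*\((\w+)\)\s*(.+?)\s*", part, flags=re.DOTALL)
        if sub is None:
            raise ValueError(f"malformed item: {part!r}")
        items[sub.group(1)] = sub.group(2)
    return items


example = ("[VC: (body) tattoos on left arm; (dress) blue work singlet; "
           "(location) cabin of truck; (role) truck driver]")
print(parse_vc(example))
```

Storing the subcategories separately makes it easier to compare, for instance, the dress or location features of successive shots when tracing how the collocated items jointly index a situation-type.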
The notion of collocation as used in Firthian and neo-Firthian approaches to
language (see Sinclair, 1991) has thus been adapted here to suggest the ways in
which, for example, given objects, ways of dressing, occupational roles and insti-
tutional locations have their typical patterns of distribution in a visual field even
though they can also occur independently of each other, with different functions in
different contexts, or in different distributionally associated relations. Therefore, any
given feature referred to in this way may have functions other than those which are
relevant to the analysis in the current text.

4.7.5. Visual salience
In a given visual text, some features will be perceived to be more salient than others
and, hence, to have greater informational prominence in the text or some part of
it. Visual salience (VS) is related to the articulation of the relationship between fig-
ure and ground, as studied by researchers of visual perception in the Gestalt tradi-
tion. Kanizsa (1980: 41) points out that in a given visual field a figure emerges with
respect to a ground on the basis of a number of interacting factors. The most
important of these include the relative size of the parts, their topological relations,
their types of margins, as well as spatial orientation (Kanizsa, 1980: 41-3). Salient
objects tend overall to occupy a smaller proportion of the total volume of the
visual field than does the background. Furthermore, salient objects tend to be
more substantial and distinct with respect to their background, both in terms of
solidity and colour. The background, by contrast, may be relatively indistinct,
lacking in detail, and exhibiting less compactness of colouring. These are generali-
sations only, and each individual case may make use of these in different ways, or
it may make use of only some of these possibilities. In order to transcribe visually
salient items in the image, a very simple notation – [VS: draughtswoman] – may be
adopted, which simply identifies the salient feature(s), namely the draughtswoman,
with reference to Shot 2 (Rows 11-12).

4. 7. 6. Colour
In a text such as the one to hand, there is a need to transcribe specific colours
which have a special salience or significance in the text – hence the reproduction
of Appendix I in colour. In the Westpac text, the colours red and blue are particu-
larly significant. This is so because of the function of these colours in tying various
features of the text together on the basis of shared covariate ties. For example, the
occurrence of the colour red clearly associates with, and indexes, Westpac, and
serves textually to link various participants and circumstances to Westpac. This is
evident, for example, in the red Westpac logo on the helicopter pilot’s flying suit in
Shot 16 (Row 38) and in the red ties worn by the business executives in Shot 23
(Rows 53-55). However, it should be emphasised that colour is not an isolate – it is
not a question of a pure chromatic quality – but has its significance in relation to
other features of the visual field with which it is integrated. A good example in the
text is the Westpac logo, which appears three times (Shots 5, 9, 15, corresponding
to Rows 17, 24-25, and 37, respectively). Whereas the red of the W-shaped logo
exhibits qualities of surface and texture, as is typical of the surface colours of a well-
defined object located in three dimensional space, the two shades of blue which,
respectively, characterise the sky and the sea in these shots, are conspicuously less
substantial and less consistent, thus denoting something more fluid, without density
and precise contours. Respectively, they characterise colours of surfaces (e.g. the
logo) and colours of films or sheets (e.g. the sky and the ocean). A further possibility
is offered by colours of volumes whereby the colour in question fills or appears to
fill a three-dimensional space, e.g. translucent crystals (see Kanizsa, 1980: 210-3;
Gibson, 1986 [1979]: 31). The fact that this is no naturalistic representation of the
‘real’ sky and ocean further suggests its metaphorical displacement into the hyperreal
of the desired, the dream-like or the imaginary (Kress, Van Leeuwen, 1996: 166).
In the present transcription, Colour (CR) will be coded, with reference to the
Westpac logo in Shot 5 (Row 17), Shot 9 (Rows 24-25) and Shot 15 (Row 37), as follows:
[CR: red; surface; blue; film], where the terms ‘surface’ and ‘film’ designate specific
qualities of the colours in question as defined in the preceding paragraph. With ref-
erence to the logo in Shots 5, 9, and 15, the colours red and blue may be transcribed
as referring to a red which pertains to a clearly defined object (the logo) and to a
blue (the sky and ocean) which, in contrast, pertains to a more fluid-like, less clearly
defined, more diffuse phenomenon. Other factors not mentioned here may also be
included in the description of colour. It follows that the transcription can be
extended and modified accordingly.

4. 7. 7. Coding orientation
Bernstein’s (1990 [1981]) notion of coding orientation (CO) has been used by Kress
and Van Leeuwen (1996) to distinguish a number of different orientations to
‘reality’ in visual semiosis. They distinguish three main coding orientations – the
naturalistic, the sensory/sensual, and the hyperreal. These make different validity
claims with respect to the truthfulness or degree of correspondence to reality as
we normally perceive it in everyday perception. Thus, the coding orientation of a
visual text is related to the extent to which, and the way in which, it is abstracted away
from our everyday ecology of ambient visual perception. Different visual genres and
texts may exploit or even combine these possibilities in different ways.
The Westpac text is no exception to this. Generally speaking, advertisements
tend to prefer the saturated colours typical of the sensory/sensual coding orientation,
along with an appeal to the hyperreal world of our dreams, desires and fantasies. In
the latter, colours are less dense, less consistent, more misty. However, neither of
these excludes the naturalistic, and all may be used in varying ways in different parts
of the same text. In the naturalistic coding orientation, colour and other features are
deemed to correspond closely to our everyday perception of the world under nor-
mal conditions. Coding orientation is transcribed in this way: [CO: naturalistic], as
in the case of Shot 1, Rows 1-10. On the other hand, the logo (Shots 5, 9, 15) is
transcribed as follows: [CO: sensory; hyperreal]. In this case, there are elements of
both orientations.

4. 7. 8. Visual focus or gaze of participants


The basic contrast is between participants who look directly at the viewer so as to
establish direct eye contact and those who do not (Goffman, 1985 [1976]: 62;
Kress, Van Leeuwen, 1996: 121-2). In this way, participants who look directly at the
viewer simulate an interactive relation with the viewer. This may be accompanied
by other kinesic features such as a smile, a wink of the eye, a sarcastic expression
on the face, knitted eyebrows, chin thrust forward and so on (see 4.8.2, pp. 206-209).
The absence of direct eye contact correspondingly suggests the absence of
an interactive or interpersonal relation between viewer and textual participant. In
this case, textual participants are like third person participants and, hence, are seen
as not directly implicated in a dialogic relationship with the viewer. Figure 4.5 pro-
poses a system network for the system of gaze.
The Visual Focus (VF) of a given participant may also serve to establish a
gaze vector with another participant. This is the case at the end of Shot 6 (Row 20),
where father and son establish mutual contact in this way, i.e. by looking directly at
each other. The vector that links the two sets of eyes is easily discernible in this case.
Here, the function would be to support the affiliative bond of solidarity between
them. However, the primary purpose of the transcription is to establish on formal
grounds the nature of the participant’s gaze on the basis of several interacting vari-
ables. The first of these has to do with the specific focus of the participant’s gaze.
That is, to what is the gaze vector directed or extended? Gaze vectors may extend to
the eyes of another participant, as described above. Alternatively, they may focus on
some other part of the other’s body or some aspect of their clothing.
Furthermore, the participant’s gaze may be directed to some aspect of the self,
such as the hands, in order to suggest self-involvement, self-enclosure, or submission
(see Goffman, 1985 [1976]: 65). This is the case of the boy in Shot 6, Row 18. Gaze
may also be directed to some object within the immediate purview of the
participant’s personal body space or, alternatively, to some more remote object out-
side this space. A participant’s gaze may also be disengaged from the immediate scene
in order to suggest either withdrawal of one’s participation or inner cognition analo-
gous to mental process verbs in language. The gaze vector of the participant may
also extend to some indeterminate point outside the visual field of the video screen in
order to suggest a monitoring function, a sense of readiness, or expectation.
Another variable is that of distance. In this case, the basic possibilities are
close, median, and far. The two sets of variables – focus or direction of gaze vector
and distance – will need to be accounted for in the transcription.
In the transcription, the various possibilities outlined above will be
annotated as presented in Figure 4.5. Visual focus will thus be transcribed as shown
with reference to Shot 1. Thus, [VF: distance: far; orientation: off-screen]. With
specific reference to Visual Frame/Row 10, this tells us that the gaze vector of the
herdsman extends off-screen to something indeterminate in the distance and not
seen by the viewer.
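The two variables of the Visual Focus annotation can likewise be sketched in Python for those who store transcriptions electronically. This is an illustration under stated assumptions rather than a definitive implementation: the function name format_vf is hypothetical, only the three distance values named above are checked, and the orientation value is left as free text because the full set of options belongs to the system network in Figure 4.5.

```python
# The three distance values named in the text.
DISTANCES = ("close", "median", "far")


def format_vf(distance, orientation):
    """Render a Visual Focus annotation in the [VF: ...] bracket notation.

    The distance must be one of the three values given in the text; the
    orientation is passed through unchanged (assumption: free text).
    """
    if distance not in DISTANCES:
        raise ValueError(f"distance must be one of {DISTANCES}, got {distance!r}")
    return f"[VF: distance: {distance}; orientation: {orientation}]"


# The herdsman in Shot 1 (Visual Frame/Row 10), as transcribed above.
print(format_vf("far", "off-screen"))
# prints: [VF: distance: far; orientation: off-screen]
```

Rejecting an unlisted distance value at the moment of entry is one way of keeping a long transcription consistent with the system of options on which it is based.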
Two additional transcription conventions can also be noted with respect to
the visual frame (Column 2). With reference to any given shot, in some cases more
than one visual frame has been inserted in order to better illustrate a specific micro-
level development that would be inadequately presented by just one frame. Shot 21,
Visual Frames 47-50 is an example of this in that Row 47 contains two visual
frames. Finally, mentions of salient colours such as red in the Westpac text are
colour coded in Column 3 in order to indicate that this feature constitutes a significant
covariate tie in the overall texture of the text. Thus, such references to the
colour red are printed in red to highlight this aspect (see Appendix I).

4. 8. Column 4: Kinesic action

4. 8. 1. The meaning of movement


Movement, like any other semiotic system, cannot be reduced to mere physical
criteria of the kind used by physicists to talk about motion. Human movement also
obeys perceptual laws and constraints which are not inherent in the laws of
physical nature per se. Instead, movement is perceived as a phenomenal experience
in the ecosocial environment that individuals inhabit. Movement is a foregrounded
feature of the Westpac text and, indeed, of many advertising texts. Two important
points need to be made from the outset.
First, the locomotory and gestural movements of the participants in the text are
integrated into the overall rhythmic structure of the text. This is achieved by the syn-
chronisation of rhythm in movement with the soundtrack, as well as by the trimming
of shots to facilitate this process (see 4.3, pp. 181-184). Secondly, the movements of
the participants in the depicted world of the text interact with the camera movements
which simulate the head-body movements of the observer (see 4.7.1, pp. 191-195).
Movement thus has a powerful indexical function. It both realises critically important
aspects of the depicted world of the textual participants at the same time that it
indexically enacts or models an emergent interactional text (see 4.11.7, pp. 239-242).
It is by means of the latter that the nose-here perspective of the viewer is absorbed
into the horizon-there perspective of the depicted world in order to create the illu-
sion that the viewer is transported beyond the living room into the world of the text.
Movement is not the only resource for achieving this, though it is a key one in the
Westpac text. Other resources that are relevant here include those responsible for the
simulation of visual kinaesthesis, as discussed in 4.7.1 and annotated in Column 3.
In transcribing movement, it is not enough to say, for instance, that a given
participant walked or ran, waved her hand, and so on. Aside from the lack of
delicacy in such a description, this tells us nothing about the larger configuration
of semiotic relationships of which the movement is a part. A movement does not
occur sui generis, but is performed by a participant, perhaps in relation to some
other participant. The movement may be an initiating movement or a reactive one,
i.e., in response to another participant’s initiating movement. There are, therefore,
different categories of experiential participant roles involved in different types of
movement such as Actor; Action; Goal and Agent/Initiator; Action; Reactor and
so on. The first of these configurations designates a movement which is construed
as intentionally performed by an Actor – the performer of the movement – and
which is directed towards some other participant (the Goal). The second refers to
an Agent who performs a primary movement which causes or instigates a
secondary movement in the Reactor. In this ergative perspective, the focus is on
causality rather than on intentionality. This second possibility implicates a hierarchy
in which an Agent performs a higher-order movement which causally brings about
a lower-order movement in a second participant. Furthermore, the given move-
ment may entail the drawing near of one participant to another to the point where
prolonged contact and even conjunction of the participants may occur as they
form a new unity. On the other hand, a participant may move away from or distance
him- or herself from another participant. This may also entail the dissolution of
previously existing structures or, in other words, a relationship of disjunction
between previously conjoined participants. However, the analogies to experiential
clause grammar should not be carried too far for reasons that are discussed below.
The two conditions of CONJUNCTION and DISJUNCTION are best seen as the
two polar extremes of a topological region in which various combinations and
gradings of these possibilities may occur. Thus, two movements performed by two
distinct participants in the same visual-spatial field may be related to each other
along a number of different parameters:
(1) simultaneity – immediate succession – succession after interval;
(2) concord – discord (in direction and/or orientation);
(3) sameness – difference (of speed of movement);
(4) sameness – difference (of type of movement);
(5) contact – spatial separation (of movements and respective
participants).
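For the purposes of electronic annotation, the five parameters may be recorded together as a single entry for each pair of movements. The Python sketch below is one possible way of doing so and should be read with some caution. The class and field names are assumptions, and parameters (2) to (5) are reduced to two-way choices for simplicity, although the text treats them as the poles of a region in which gradings may occur. The values in the example are invented and do not describe any shot in the Westpac text.

```python
from dataclasses import dataclass
from enum import Enum


class Timing(Enum):
    """Parameter (1): temporal relation between the two movements."""
    SIMULTANEITY = "simultaneity"
    IMMEDIATE_SUCCESSION = "immediate succession"
    SUCCESSION_AFTER_INTERVAL = "succession after interval"


@dataclass(frozen=True)
class MovementRelation:
    """Relation between two movements in the same visual-spatial field."""
    timing: Timing     # parameter (1)
    concord: bool      # parameter (2): True = concord, False = discord
    same_speed: bool   # parameter (3)
    same_type: bool    # parameter (4)
    contact: bool      # parameter (5): False = spatial separation

    def summary(self):
        """Return the five selections as one semi-colon separated line."""
        return "; ".join([
            self.timing.value,
            "concord" if self.concord else "discord",
            "same speed" if self.same_speed else "different speed",
            "same type" if self.same_type else "different type",
            "contact" if self.contact else "spatial separation",
        ])


# Invented values, for illustration only.
example = MovementRelation(Timing.SIMULTANEITY, True, True, False, False)
print(example.summary())
# prints: simultaneity; concord; same speed; different type; spatial separation
```

A finer-grained record could replace the two-way choices with graded values where the analysis requires them.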
Bodily movement is not simply a passive movement in the geometric space of
classical physics. Rather, it actively assumes and appropriates both space and time in
the service of its own projects (Merleau-Ponty, 1992 [1962]: 102). In other words it,
too, is a meaning-making resource. Merleau-Ponty further shows that movement is
capable of creating an abstract space above and beyond the concrete physical space in
which the movement takes place in space and time. This space is a virtual projection
by the agent of the movement whereby the resources of the body are deployed so as
to creatively enact meanings which are not physically present in concrete physical
space in the Newtonian sense. Thus, the agent’s body is the semiotic source of mean-
ings that are directed towards the other and which seek to engage the other.
Movement syntagms are organised on the basis of both their relationship to
the ecosocial environment in which they occur and which they help to constitute and
the spatial deployment of the body and/or body parts that perform the movement.
This implies some important differences in the way experiential meanings are realised
in movement as compared to the particulate or constituency-based organisation of
experiential meanings in language. In language, the participants, process and circum-
stances in the clause are analysed as parts (constituents) which have specific functions
or roles in the overall syntagmatic structure – e.g. the clause – which functions as a
semiotic construal of reality. Language users thereby draw on the experiential
resources of grammar to analyse the phenomena of experience into the participants,
the process and the circumstances that comprise the given phenomenon. In language,
clauses are experientially organised in terms of more central process-participant
relations based on constituency and more peripheral circumstances based on relations
of interdependency (see also Tesnière, 1965: 40-6; McGregor, 1997: 168-9).
In movement, the equivalent of the case markings which ground the
experiential meaning in a certain way are the following: the body or body part per-
forming the movement and whether this is an instigator or a reactor; the move-
ment performed; the body or object, etc. which instigates, initiates or reacts to
some other body or object; the spatial location of the movement; the directionality
and the orientation of the movement; the time of occurrence of the movement;
the duration of the movement. In movement, simultaneity and spatiality rather
than linear succession in time and particulateness (constituency) are important in
the realisation of experiential event and action configurations.
For example, in Shot 19 (Rows 43-44) the process (walking) and the
participant performing the process (the supervisor) and the circumstance of spatial
location (the work site) are not linearly segmented as in the clause: the supervisor
(Actor) walked (Process) through the work site (Circumstance). Instead, process,
participant and circumstance are conflated into a single indissoluble configuration
in the visual-spatial field of simultaneous relations unfolding in time. In this
topological visual-spatial field, relations of interdependency among the different
components of the experiential configuration are established on a different basis
from that of the linguistic semiotic of clause grammar. In Shot 19 location is a sta-
ble visual invariant which remains in the background. In this sense, it is a periph-
eral circumstantial function rather than a central process-participant function. The
question of the relationship between informational variants and invariants has to
do with the ways in which the visual field is (1) segmented into distinct objects and
(2) categorised or attributed with a meaning by the perceiver (Kanizsa, 1991: 113-
5). These two logically, though not temporally, distinct operations lend support to
the stratified nature of visual semiosis (Thibault, 1997a: 229-30). The supervisor in
Shot 19 is a dynamic variant; as a participant function he perturbs the invariant
background by virtue of the fact that he moves. It is the syntagmatic bringing together
of diverse visual forms so as to constitute a determinate visual field which makes
this possible. That is, the structured relations among the various forms – variant
and invariant – in this field mean that there are connections among them. On the
basis of such structural connections, functional semantic relations can be construed.
In clause grammar, the semantic function Circumstance, along with its
structural realisations as adverbial groups or prepositional phrases, is an optional
and peripheral element. In the visual image, circumstances may be in focus or out
of focus, backgrounded or foregrounded, and so on, so as to upgrade or down-
grade their importance in relation to the central process-participant functions.
A further important aspect concerns the spatial conditions in which the
movement occurs. Three variables are especially important here: the directionality of
the movement, the orientation of the movement relative to the viewer, and the posi-
tion of the participant(s) involved in the movement in the delimited optic array of
the video screen (central, peripheral, left, right, and so on) both at the initiation of
the movement and at its conclusion. If we take the case of the herdsman in Shot 1
as an example, the directionality of the movement is forwards, towards the viewer,
as the herdsman moves from the horizon-there perspective to the nose-here perspec-
tive of the viewer. This shot is the prototype of a pattern which is a foregrounded
feature of the Westpac text. Further, the herdsman is oriented to the viewer in a
particular way. That is, he draws near the viewer in a potential relationship of con-
tact or conjunction whereby the two perspectives are integrated or at least brought
into proximity to each other. The herdsman is also positioned centrally rather than
peripherally throughout the duration of the movement sequence.
Now, it would be naive to presume that all this reduces to a question of the
indexical orientation of the viewer to the depicted world in terms of a virtual simu-
lation of his or her location in the physical reality of the depicted world, as if this
has some existence independently of the interaction that the text itself constitutes
– see Thibault (1997b, 2003) for further discussion of indexical and intertextual
meaning-making practices. Rather, the two dimensions dialectically enact and
create this reality at the same time that they invoke and bring into play wider
systems of intertextual thematic meanings and their associated value systems
(Lemke, 1988) which go beyond the immediate interaction between text and
viewer. In this case, a prototypical scene of the Australian outback, which in actual
fact rarely encroaches directly on the daily life of urban Australians, serves as a
cultural presupposable (Silverstein, 1992: 69) that is introduced into the interaction
above and beyond the requirements of depiction and of spatiotemporal orientation.
It serves to put into play a whole system of cultural values that are the ultimate
source of the text’s coherence, whatever the depicted scenes in the specific shots.
It is not clear whether there exists in movement a distinction between the
grammatical system of mood and the various speech functions or illocutionary forces
that a given movement may realise in human social meaning making. However, it is
clear that movement, like language, can be modulated or deformed so as to modify its
interactional status and hence its axiological stance on either the experiential meaning
of the movement or the referent situation to which this connects. That is, movement
has interpersonal meaning insofar as it can be deployed to interact dialogically with
other agents – human and non-human – in the social world. However, it is not only
the movement itself which is interpersonally modified. Interpersonal modification
entails a dialogic orientation to the world – that is, the organisation of the world in
terms of a self and the other (Merleau-Ponty, 1992 [1962]: 111).

4. 8. 2. Interpersonal modification of movement


In the case of movement, the principal ways in which a given sequence may be
interpersonally modified are as follows. First, movement may constitute a
proposition or a proposal about the world. Propositions can be asserted, believed, dis-
believed, denied and so on. Proposals, however, do not have this status, as they have
to do with desired or proposed changes to the world that have not yet been
actualised at the moment of their occurrence (cf. commands in the linguistic
semiotic), rather than with assertions or claims about an actual state of affairs (see
also McGregor, 1997: 211). Proposals are concerned with projected courses of
action for bringing about desired change in the world. In the Westpac
advertisement, an example of a movement proposition is the action of the bricklayer
in Shot 13 (Rows 31-3) when he stands up, places the brick in its place in the wall,
and then taps it with his trowel. From the viewer’s perspective, this movement
sequence can thus be construed as a truthful or naturalistic depiction of the
bricklayer’s activity. A movement which has the status of a proposal, on the other hand,
is given in Shot 1, where the herdsman twice slaps his thighs while bending down
in order to command or beckon his dog to return to his side. Other movements are
not so clear in status. The use of the gestural emblem rolling the sleeves up falls into
this category. In the text, it appears to be indeterminate between the two
possibilities. On the one hand, it can be seen as an affirmation of the given
participant’s readiness to get on with the job (proposition); on the other, it fits into the
pattern of exhortation insofar as the text as a whole works to persuade Australians
to roll their sleeves up and work harder for the benefit of the nation (proposal).
Movement propositions and proposals, like their linguistic counterparts, entail
a specific semiotic project in the world. When Merleau-Ponty (1992 [1962]: 112) says
that such projects polarise the world, he appears to be referring to the way they
assume a dialogic orientation to some other, and that the movement thus constitutes
an interpersonal orientation to the other in ways that are not unlike propositions and
proposals in language. Such a dialogic orientation presupposes that the world is
organised and understood as a relationship between self and non-self. Merleau-
Ponty eloquently points out that movements in the service of such projects, as
distinct from physiological processes per se, are intentional activities which function
to ‘mark out boundaries and directions in the given world, to establish lines of force,
to keep perspectives in view, in a word to organise the given world in accordance
with the projects of the present moment’ (Merleau-Ponty, 1992 [1962]: 112).
Secondly, movement may describe or designate some situation and evaluate this
situation by adopting a particular interpersonal orientation towards it, e.g. parody,
disgust, pleasure, disapproval and so on. For example, Shot 4 (Rows 15-6) shows the
nurse walking briskly towards the viewer. The interpersonal orientation may be
described as that of commitment and seriousness with respect to the task to hand (see
also Birdwhistell’s (1972 [1961]: 96) category of self-possession/self-containment).
Thirdly, movement may indicate the performer’s affective disposition not
only to the specific action that is being performed, but it may also index a more
general emotional state or ‘state of mind’. In the Westpac text, the movements of
the participants may be seen as indexing enthusiasm, willingness and confidence in
what they are doing.
Fourthly, a social agent may perform a given movement in order to represent
or otherwise recontextualise some action that was performed by another. The use
of movement in this sense is analogous to quoting and reporting speech and may
turn out to be a better way of explaining what people do when they use movement
to imitate the actions of others in their own discourse (Inset 9: Projection, p. 101).
No examples of this function of movement are attested in the Westpac text.
Fifthly, a movement may be evaluated according to whether it is performed
naturally, awkwardly, artificially, stiltedly, appropriately, gracefully and so on.
Analogous to the notion of vocal registers of singing and speaking (see 4.9.9, pp.
218-219), such variations in the kinesic dynamics of movement may be seen as
different movement registers. In the Westpac text, the participants express move-
ment registers of naturalness and appropriateness with respect to the circumstance
in which the movement occurs and/or the interpersonal orientation of the
participant to some other, such as another participant or the viewer.
Interpersonal modification of movement entails the shaping or deforming of
the movement according to the meaning it has in a specific interactional context.
Specific corporeal schemas, which are highly abstract in character and which have
a neurological basis, would appear to lie at the basis of this. Such schemas have a
predictive function which enables the individual to orient and adapt his or her bodily
movements to specific semiotic and material circumstances. The kinds of
interpersonal meanings adumbrated above cannot be reduced to the schema per se.
Rather, the schema is activated in a specific context in relation to other features –
other semiotic modalities, the addressee, selected aspects of the material world and
so on – all of which function contextually to ground the schema in meaningful
ways and hence to deform the individual’s body according to a specific
interpersonal orientation. This means that the schema is a kind of embodied move-
ment grammar of a very abstract kind that can be modified so as to produce
particular contextual meanings. The deformation and shaping of bodily movement
is far from limited to human beings, but is also shown in the different ways in
which dogs and other animals deform their bodies according to varying interactional
contexts such as aggression, courtship, receiving affection, obeying, retreating
from danger and so on (see Darwin, 1955 [1872]).
There are three main ways in which movement can be interpersonally modi-
fied. First, movements can be modified by the visual-spatial equivalent of a prosodic
contour which extends over the entire movement configuration or some part of it.
Some of the body resources which can function as interpersonal operators that
modify the movement or some part of it are the head (shaking, nodding), eyes
(object orientation, winking, blinking, closed), nose (wrinkled, bunn, nostrils flared,
nostrils compressed), cheeks (lax, puffed, drawn in, wrinkled), mouth (lax, down,
smiling, lips thinned), chin (lax, set, thrust forward), eyebrows (lax, furrowed, raised,
knit). For reasons of space we will restrict ourselves to the above list, which is
confined to the head. In Shot 2 (Rows 11-12) there is an experiential configuration of the
draughtswoman seated at her desk as she rolls her sleeves up. In this sequence, her
chin is thrust forward in a way which prosodically modifies the action sequence here.
That is, this kinesic prosody, which is the result of increased muscle tension in the
face, indexes her attitudinal stance to the activity she is about to perform. In this case,
this would be one of willingness and determination to get on with the job. In the
Westpac text, smiling is the most common facially realised kinesic prosody, and in all
cases certainly indexes the participant’s commitment to the job as well as his or her
solidarity with their clients, viewers and so on. Examples include Shot 4, Row 15 and
Shot 22, Row 51.
Secondly, movements can be modified according to the principle of force. Beat
movements and gestures may have this function. McNeill (1992: 15) describes beats
as small baton-like movements whose form remains constant whatever the
experiential content of the discourse is. Beats can interpersonally modify the
discourse by, for instance, signalling the nuclear accent as the one which is produced
with increased force or emphasis with respect to others in the same overall movement
pulse. Thirdly, movements can be interpersonally modified on the basis of what
Poynton (1985: 79-80) calls amplification. In the case of movement, a given move-
ment or part thereof can be articulated with decreased or increased speed, or by
repetition of the same movement, or by some form of embellishment of the basic
movement so as to invest it with heightened subjective commitment or intensity.
Within a given movement configuration, there are also specific modalities of
sequencing and connection of movements, involving varying types of dependency
and the nesting of one movement within another. Important here may be the degree
of the temporal interval between one movement and another. The relevant question
is the following: is there a prolonged interval between one movement and another,
or does one movement begin immediately on completion of another? The spatial
character of movement also means that aspects of the movement index or otherwise
locate in space and orient to or otherwise make relevant selective aspects of the
physical environment or even non-physical abstract objects that are indexically
created by the movement and built into its overall texture. A given movement
sequence is also structured in terms of peaks of prominence as well as various types
of boundary phenomena. There are, in this sense, onset phenomena, movement focus
phenomena, offset phenomena, and inter-movement phenomena which have a textual
function in demarcating the beginning-middle-end structure of the movement sequence
as a wave with peaks of prominence alternating with less prominent phases.

4. 8. 3. General observations on the notation of movement


In transcribing movement, the following notational conventions will be used. A
sequence of items in square brackets, [ …], designates a series of actions or move-
ments which occur simultaneously, or which are in some way nested the one within
the other. Each separate movement is distinguished from the others in the same over-
all configuration by a semi-colon. Shot 18, Row 42 illustrates this principle. Thus, in
Column 4 (Row 42 ) the first mention of the carpenter rolling his sleeves up is not
placed in square brackets because there is no simultaneous kinesic act of any rele-
vance to the transcription. In the visual frame which follows, however, the two simul-
taneous kinesic acts of rolling the sleeves up and smiling are here placed in square
brackets in accordance with the notational convention adopted. A similar observation
can be made in relation to Row 24, which, however, bridges between two shots, i.e.
Shot 8 and Shot 9. In connection with the former, the two simultaneous kinesic
acts – the supervisor’s walking and his smiling – are placed in square brackets in
Column 4. On the other hand, the reference to the movement of the logo in Shot 9 is
not bracketed as there is no other kinesic act with which it is connected in this shot.
Round brackets, viz. ( ), are used in Column 4, relative to a given visual frame,
to indicate a sequence of movements in time. Shot 14, Row 35 is an example of this.
In this case, the first kinesic act – bringing more dishes to the dishwasher – is followed
by the dishwasher’s response to this, i.e. he rolls his sleeves up. Thus, the second act
follows the first in a temporal sequence. The caret sign ‘^’ is used here to indicate that
the two movements referred to stand in a dialogic relationship to each other, the one
following the other, as in Command ^ Compliance. An example of this kind of struc-
ture is the dog’s response to the shepherd’s command – his slapping his thigh – to
return to the shepherd’s side, as shown in Shot 1 (Rows 2-3) and notated accordingly
in Column 4. A further example is Shot 14, Row 35, as described above. The use of
the caret sign in the present transcription is not restricted to the kinesic dimension
in Column 4, but is used to specify any kind of dialogic relation between units,
irrespective of semiotic modality.
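
The bracketing conventions just described lend themselves to a simple machine-readable treatment. The following is a minimal sketch in Python, not part of the transcription itself. It assumes that a Column 4 entry is available as a plain string, that movements inside round brackets are separated either by the caret sign or by semi-colons (the latter is an assumption made here for illustration), and that the function name parse_movement and the example wordings are hypothetical paraphrases rather than actual entries from the Westpac transcription.

```python
def parse_movement(entry: str) -> dict:
    """Classify one Column 4 entry and split it into its component movements."""
    text = entry.strip()
    if text.startswith("[") and text.endswith("]"):
        # Square brackets: simultaneous or nested movements, separated by semi-colons.
        parts = [p.strip() for p in text[1:-1].split(";") if p.strip()]
        return {"relation": "simultaneous", "movements": parts}
    if text.startswith("(") and text.endswith(")"):
        inner = text[1:-1]
        if "^" in inner:
            # Caret sign: the movements stand in a dialogic relation,
            # e.g. Command ^ Compliance.
            parts = [p.strip() for p in inner.split("^") if p.strip()]
            return {"relation": "dialogic sequence", "movements": parts}
        # Assumed here: a plain temporal sequence uses semi-colons as separators.
        parts = [p.strip() for p in inner.split(";") if p.strip()]
        return {"relation": "sequence", "movements": parts}
    # No brackets: a single movement with no relevant co-occurring kinesic act.
    return {"relation": "single", "movements": [text]}


# Hypothetical entries paraphrasing the examples discussed above.
simultaneous = parse_movement("[rolls sleeves up; smiles]")
dialogic = parse_movement("(slaps thigh ^ dog returns)")
single = parse_movement("rolls sleeves up")
```

A representation of this kind would allow the analyst to count, for instance, how many visual frames in a given phase contain simultaneous as opposed to sequential movements.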

4. 9. Column 5: The soundtrack

4. 9. 1. Integrating auditory phenomena


In this column, speech, music, and other sounds are brought together on the
assumption that they all have characteristics in common providing a basis for talking
about them and transcribing them in a unified way rather than as entirely separate
phenomena. The guiding assumption is that the acoustic flux of the soundtrack
is a perceptual continuum constituting a delimited auditory array, which, however, lis-
teners can analyse or parse into different components of information that tell us
about a given source. The soundtrack is delimited rather than ambient because it
derives from a specific point source coming from a particular direction – the loud-
speakers, say – rather than the ambient auditory array which surrounds us and comes
from all directions as we move through and orient to a natural or urban environment.

4. 9. 2. Sound acts and sound events


Each source and the component of the acoustic flux that corresponds to it is a
specific event in the overall array. Further, the assigning of different parts of the
acoustic wave to different informational sources entails that listeners construe
meaningful relationships among different sources (Handel, 1993 [1989]: 185-6;
Echard, 1996). The meaning that we construe in these informational sources and
the relations among them are not reducible to the kinds of phenomena studied by
acoustic physics. Aside from subjective aspects of perception, social practices and
cultural values also play their part in shaping how we perceive acoustic phenomena.
The starting point is that acoustic information can be resolved into different kinds
of auditory objects and events (Echard, 1996: 9). The principal reason for this lies
in the way in which the various sounds and types of sound which constitute the
soundtrack may stand in relations of coarticulation both among themselves as well
as in relation to the listener. The construal of acoustic information as different
classes of acoustic objects, actions and events suggests an experiential dimension.
By the same token, different sound events may be seen as interacting with or
otherwise orienting to and evaluating other events in the auditory or some other
modality. For example, in Shots 18 and 19 (Rows 42-3 ), the male speaker utters the
non-finite clause backing it with money. As a non-finite clause, it is tenseless and can-
not, therefore, function as a proposition that can be argued about with respect to its
truth status, accuracy, and so on. The present example is part of a clause complex.
In relation to the clause complex, its main function is to extend the meaning of the
prior constituent we see our job as backing that way of thinking (Column 5, Rows 37-9 ).
This is an intra-semiotic relation. However, the non-finite clause in question here is
also copatterned in relation to the visual semiotic in Shots 18 and 19. In this case, the
overall linguistic thematic is concerned with the kinds of services that Westpac pro-
vides the community, of which money is a specific component. The two scenes of,
respectively, the carpenter on a home construction site (Shot 18, Row 42 ) and the
supervisor on the industrial site (Shot 19, Row 43 ) are specific instances of this
theme, insofar as Westpac provides capital for both ordinary Australians and entre-
preneurs to realise their aspirations in the form of financial support for home con-
struction and business development. Thus, the inter-semiotic relationship which is
here established between the linguistic and the visual construes a positive evaluative
orientation of the one in relation to the other. This suggests an interpersonal
dimension to such events and serves to orient the viewer’s own evaluation of this
relationship and its implications for him or her.
Finally, acoustic events may form parts of larger wholes on the basis of
relations of foregrounding, backgrounding, spatial location, distance from the
listener, relations of dependency with other events, and so on (see Inset 16 on the
following page). They therefore have properties of textuality as part of a larger
Gestalt to which they belong. An example of this is the relationship between male
speaker and musical accompaniment in relation to Shots 13 to 23 (Rows 31-53 ). In
this long sequence, the instrumental music accompanies the male speaker. The latter
is always more prominent, whereas the former is backgrounded so as to perform a
supportive role rather than a primary or dominant one. In this sequence, the music
continuously accompanies the speaker and thus provides one important principle of
textual cohesion in this sequence. A further striking feature of the Westpac text is that
with the exception of the sounds of the sheep at the beginning, the listener does not
hear any of the ambient sounds which are typical of the various work sites, street
scenes, and so on that are depicted in the visual track. This suggests that the various
scenes which are depicted in the visual track are recontextualised (see Insets 1, p. 2 and
17, p. 213) along both the acoustic dimension as well as along the visual and linguistic
dimensions.

4. 9. 3. Dialogic relations among sound events


The notion of coarticulation tells us that different sounds do not simply occur, dis-
playing their own specific qualities. Rather, they dialogically interact with other sounds
on the soundtrack at the same time that they may constitute a dialogic relationship
with the listener. Different sounds give voice to different social meanings and social
positionings in ways that recall Mikhail Bakhtin’s (1973 [1929]) proposals concerning
the multi-voiced or polyphonic character of linguistic texts. The original musical basis of
Bakhtin’s notion is telling, and its relevance is evident in the very first part of the
Westpac text. In Shot 1, the soundtrack starts with a very low, soft dialogue
between the sounds of the distant sheep and the keyboard – a piano – playing a repet-
itive pattern. The interaction between these two auditory voices establishes the imme-
diate context, which is that of the great Australian outback, as indexed by the sounds
of the sheep, and the lone voice of the sheep herdsman, as represented by the key-
board. The subdued acoustic quality of this dialogue is offset by the vastness of the
surrounding natural environment. Thus, the acoustic dialogue here plays its part, so
to speak, in the indexing of a whole system of symbolic values deriving ultimately
from Australia’s colonial past. In collaboration with the visual track, the soundtrack
functions to invoke these values for the listener. On a beat of the drum, the female
chorus begins singing roll them. At first, the volume is very soft and the tempo very
slow, gradually increasing in volume and tempo as Phase 1 progresses. The female
chorus directly addresses the listener in a way that the dialogue between the sheep
sounds and the keyboard does not. This is reinforced both by the rising, crescendo-
like development of the chorus, as well as the imperative mood of the sung text. It
is significant that the appearance of the chorus is cued in by the drum beat.
Moreover, the crescendo-like development of the chorus, along with its rep-
etition and increasing tempo, as it sings Roll them, roll them, roll them up, ensures
that the chorus quickly becomes the dominant sound voice for the remainder of
Shot 1. The initial drum beat would appear to have two functions in relation to this.
First, the drum stands for power and dynamism, in
contrast to the lonely and introspective quality of the dialogue between sheep
sounds and keyboard. This is after all what the chorus and its sung text is all about.
Secondly, the drum beat is also the initial cue for the instrumental accompaniment
which underlies and supports the chorus throughout the remainder of Shot 1.

Inset 16: Perspective in sound: Van Leeuwen on Figure, Ground and Field

• Sound events are hierarchically related to each other. In the soundtrack of a film,
different elements in the overall event are organised into subgroups which are
coarticulated in relation to each other. Some sounds are in the foreground and are focal;
others are in the background; others are somewhere in between these possibilities. Van
Leeuwen (1999: 23) refers to the hierarchical grouping of sounds and their coarticula-
tion in terms of a three-way distinction between Figure, Ground and Field.
• The Figure is the sound which is the focus of interest. It is the sound or sound group
that is treated as the most salient and which the listener is most required to engage
with or attend to. The Figure tends to stand out against both the Ground and the
Field. The Ground functions as the setting or context; it is a minor, non-salient
component of the listener’s social world, though the listener is still able to orient to
it and to evaluate it as interpersonally significant. The Field is the physical place
where the observation takes place. The Field consists of sounds which index or in
some way characterise the soundscape of the listener, though the listener is not
expected to orient to them or to take up a particular evaluative stance on them.
• The above distinction also shows that sounds that are heard simultaneously are parsed
into groups and related to each other hierarchically as different sound events with
different locations and sources as well as different degrees of salience and relevance to
the listener. This further implies that sounds can be coarticulated in relation to each
other in some overall soundscape in, for example, the soundtrack of a television
advertisement.
• In Phase 1 of the Mitsubishi Carisma advertisement, the voices of the man and
woman are the Figure; the orchestra is the Ground. The sound of the telephone box
door closing at the end of this phase is the only example of a sound event which
functions as Field in this phase (see 1.6.1, pp. 51-54).

This sets up a relationship between the dominant sound voice of the chorus
and the non-dominant voice of the instrumental accompaniment that continues
throughout Phase 1a (Column 5, Rows 4-10). In other words, the mixing of the
sounds of the musical instrumentation and the chorus is done in such a way as to
ensure that the chorus is always the dominant voice on account of its relative loud-
ness with respect to the musical accompaniment. This is so at all stages including
the very soft and slow way in which the chorus starts, only to become considerably
louder and quicker towards the end of Phase 1a. What is the significance of this?
The quiet, almost mystical, way in which the chorus begins singing suggests a
prayer-like communion with one’s surroundings – cf. the natural landscape in the
visual track – or a mystical union with nature. Gradually, this gives way to the
quicker, more rhythmic character of the choral singing when the initial roll them, roll
them is expanded to the complete clause roll them up. The chorus is an all female one
and the individual singing voices are highly blended to produce a markedly homo-
geneous or unified sound quality. In such cases, the individual differences in pitch,
rhythm, voice dynamics and so on, of the individual members of the chorus tend
to be attracted to an average pitch whereby the individual differences are minimised
in the service of orchestral or choral unity (Schoenberg, 1975: 151).

Inset 17: Recontextualising social practices

• The purpose of this inset is to draw attention to the ways in which a text
recontextualises material social practices and activities from the social world known
to television viewers or website users. In each case, the initial practice in the social
world is inserted into, and recontextualised by, another set of practices. Most of the
scenes in the Westpac advertisement, for example, refer to typical social practices in
the working world of Australians. For instance, bakers typically make bread and sell
it to the public. However, the advertisement recontextualises the practices of the
baker and combines these with many other such recontextualisations.
• Most importantly, it does so in ways which transform the original social practices in
accordance with the goals and values of the recontextualising practices of advertis-
ing agents and their clients (e.g. the Westpac Banking Corporation). The text is not so
much concerned with what bakers, nurses, bricklayers do, but with a very different
strategy in which the often diverse and conflicting social viewpoints and values which
are represented by the many different categories of participant in a given community
(young, old, male, female, workers, bosses, rural, city, religious, secular, corporate,
individual, etc.) are removed and transformed into a single viewpoint which elimi-
nates or downplays the differences among them. In the Westpac advertisement, this
has to do with the manufacturing of consent about two main issues:

(1) ideologically justifying the new merger of banks, which in 1983 led to the
creation of the Westpac corporation in Australia;
(2) constructing a corporate national identity founded on a work ethic and on cer-
tain national myths and archetypes.

All this is significant to the meaning of the text. The quasi-mystical start to
the chorus, along with its slow crescendo-like development, suggests a gradual
emerging from one’s individuality as this is harmonised with the wider social world
in the service of a larger actional project which is meant to involve all Australians.
The chorus is the dominant sound voice here because it is that which directly enters
into a dialogic relationship with the listener, exhorting him or her to be part of this
wider project. In the transcription of the soundtrack, no attempt is made to use
musical notation. There are three reasons for this. First, aside from the problems of
accessibility for those who do not read music, there is also the important question
of finding common ground that can be adapted to all of the various components
of the soundtrack – speech, music, other sounds. Secondly, the transcription is
interested in revealing the semiotic integration of different acoustic phenomena.
Thirdly, it is important to preserve the criterion of computer retrievability.

4. 9. 4. A brief comment on the notation of the soundtrack


In order to distinguish music, speech and other sounds in a suitably retrievable
form, the notational conventions shown in Figure 4.6 will be adopted. These nota-
tions will be inserted at the beginning of the relevant stretch of the soundtrack. In
the case of song and speech, the linguistic text will be reproduced in standard
written orthography (see also Gumperz, Berenz, 1993: 96). Square brackets are
used here and elsewhere in the transcription to designate a specific configuration
or cluster of interrelated symbols. This means that the square brackets have a meta-
semiotic status in the transcription. For example, the fourth symbol relating to the
female chorus, when inserted at the appropriate point in Column 5, informs the
reader that the musical text so designated is sung by a female chorus. These and
other symbols also constitute a basis for easy computer retrievability. Thus, if the
analyst requires examples of female choruses in this or other similarly transcribed
texts in some corpus, he or she can use the search facilities provided by Microsoft
Word to locate such moments in this and other texts. Clearly, the above notational
conventions can be extended and further modified by other researchers in this area.
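
The criterion of computer retrievability can also be illustrated outside a word processor. The following is a minimal sketch in Python of how bracketed configurations might be searched for across the rows of a transcription. The symbols used in Figure 4.6 are pictographic, so plain-text labels such as [female chorus] are assumed here as stand-ins; the function name find_tag and the data layout (a list of row number and Column 5 text pairs) are likewise hypothetical, and the row contents are abbreviated placeholders rather than the actual transcription.

```python
import re

# Any configuration of symbols enclosed in square brackets.
TAG = re.compile(r"\[([^\[\]]+)\]")


def find_tag(rows, label):
    """Return the row numbers whose Column 5 text contains the bracketed label."""
    wanted = label.strip().lower()
    hits = []
    for row_number, text in rows:
        labels = [found.strip().lower() for found in TAG.findall(text)]
        if wanted in labels:
            hits.append(row_number)
    return hits


# Placeholder rows: plain-text labels stand in for the pictographic symbols.
rows = [
    (2, "[instrumental music] [sheep]"),
    (16, "[female chorus] roll them up"),
    (17, "[female soloist] ..."),
    (31, "[male speaker] ..."),
]

chorus_rows = find_tag(rows, "female chorus")
```

The same search could be run over several transcribed texts in a corpus, provided that all of them use the same labels.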

4. 9. 5. The rhythm of sound events


In the Westpac advertisement, there are no instances of synchronous dialogue
between two or more participants. As a number of researchers have shown (Auer,
1992; Couper-Kuhlen, 1992; McNeill, 1992), in spontaneous dialogue such factors
as the speech and gestural rhythms of the various participants often attain a high
degree of isochrony. The Westpac text is a very different kind of multimodal text
from spontaneous dialogue and there is a high degree of postproduction adjust-
ment to the natural rhythms of movement, and so on. However, the essential point
does not change. That is, music, speech and movement are also highly synchro-
nised in this kind of text and the transcription should endeavour to reveal this. It
is important that multimodal text transcription show how meaningful units of the
text are chunked and, hence, recognised by observers on the basis of their organi-
sation into rhythmic units, the integration of such units into still higher units, and
the transition points or the boundaries between units. In multimodal transcription,
the emphasis is not on speech or other sources of rhythm per se, but rather on the
multimodal integration of different sources of rhythm in a given text. The fact of
their integration does not change the important point that a particular rhythmic
source may be dominant. Moreover, a number of researchers have independently
shown that there is a good deal of common ground in the organisational principles
which subtend rhythm in, say, speech and gesture.
For example, McNeill (1992: 85) draws attention to the parallels between the
hierarchy of units that comprises the phonological structure of a given language and
an analogous hierarchy of units in the kinesic structure of gesture. This is hardly sur-
prising given that the basis of their synchronisation in discourse lies in the sensori-
motor activities of the body and its natural rhythms. Furthermore, both kinds of
body rhythm are experienced as movement.

 1  [ ]          = instrumental music (e.g. Row 2, Column 5)
 2  [ ]          = female soloist (e.g. Row 17, Column 5)
 3  [ ]          = male soloist (not attested here)
 4  [ chorus]    = female chorus (e.g. Row 16, Column 5)
 5  [ chorus]    = male chorus (not attested here)
 6  [ ]          = female speaker (not attested here)
 7  [ ]          = male speaker (e.g. Row 31, Column 5)
 8  [ ]          = source of spoken voice off-screen, not shown in depicted world
                   (e.g. Row 31, Column 5 ff); the symbol is used at the start and
                   end of the sequence, in this case Row 31 and Row 57, respectively
 9  [ sheep]     = other non-speech or non-musical sounds, including silence, followed
                   by a brief verbal specification of the specific sound: in the
                   present example, the sun symbol followed by the word ‘sheep’
                   designates the (non-linguistic/non-musical) sounds of the sheep,
                   as in e.g. Row 2, Column 5
10  [ silence]   = silence other than rhythmic pause or juncture in speech and/or
                   music (e.g. Row 1, Column 5)
11               = continuation of previous, as for example when sung or spoken text
                   is stretched over more than one visual frame or shot (e.g. Row 3,
                   Column 3 and Column 5). As the example here shows, this notational
                   symbol is not specific to the soundtrack and is used in other
                   columns. In all cases, it has the same significance

Figure 4.6: Notational conventions used in the transcription of the soundtrack

Abercrombie (1967: 97) talks about how both speakers and listeners enter into
reciprocally felt phonetic empathy on the basis of the speech rhythms they
experience as embodied movements. Thus, listeners extract
information about the speaker’s articulatory movements from the speech sounds
which they hear and, on this basis, are able to enter into a relation of felt rhythmic
empathy with the speaker. This empathy is an important, if largely intuitive, con-
tributing factor to the synchronisation of speaker and listener in spoken interaction
(see Gumperz, Berenz, 1993: 106). On the basis of such observations, rhythmic units,
the transitions between these and related phenomena such as rhythmic accents in
speech, music and bodily action will be transcribed on the basis of a single notation.

4. 9. 6. Accented rhythmic units


Within a given stretch of speech, music or movement, accented syllables, musical
notes or kinesic units such as gesture strokes contrast with unaccented ones and con-
tribute to the overall shape of the rhythmic unit, along with a number of other inter-
acting factors such as volume, pitch or vowel lengthening in speech and song, and
force of gesture stroke, duration of stroke and so on in the case of kinesic movement.
In the transcription, accented units are indicated by a single asterisk in round brackets,
(*), before the syllable or other unit in question; extra prominence is indicated by two
asterisks in round brackets, again before the given unit, (**). Here the conventions
proposed by Gumperz and Berenz (1993: 106) are adopted and extended to include
the kinds of non-linguistic units mentioned above. Round brackets rather than square
brackets are used here to specify that a single metasemiotic symbol is featured, as
distinct from the configurations of symbols enclosed in square brackets (see 4.9.4 , p.
214). An example of this kind of symbol occurs in Row 4, Column 5 so as to indicate
the accented rhythmic unit that occurs on the sung syllable roll.
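
The accent markers described above can be retrieved automatically. The following is a minimal sketch in Python, assuming that the sung or spoken text of Column 5 is available as a plain string in which (*) and (**) immediately precede the accented unit. The function name accented_units and the example string are hypothetical; the string merely imitates the kind of notation found in Row 4, Column 5.

```python
import re

# One or two asterisks in round brackets, followed by the accented unit.
ACCENT = re.compile(r"\((\*\*?)\)\s*([^\s()\[\]{}]+)")


def accented_units(text):
    """Return (unit, degree) pairs for every accent marker found in the text."""
    units = []
    for match in ACCENT.finditer(text):
        degree = "extra prominence" if match.group(1) == "**" else "accented"
        unit = match.group(2).rstrip(",.;:!?")  # drop trailing punctuation
        units.append((unit, degree))
    return units


# Hypothetical string imitating the notation of the sung text.
example = "(*)roll them, (*)roll them, (**)roll them up"
found = accented_units(example)
```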

4. 9. 7. Rhythm groups
Van Leeuwen (1985: 225) points out that a given sequence of accented and
unaccented units is organised into a higher-order unit on the basis of a perceived
rhythmic regularity within the sequence. When this regularity is perceived to be per-
turbed by a pause or slowing down, then the given movement sequence is felt to
come to an end. On this basis, it is possible to establish what Van Leeuwen calls
rhythm groups and the boundaries or transitions between these. In the present
transcription, such boundaries will be indicated by a double-slash followed by a spec-
ification of the type of transition (e.g. pause, change in tempo) in the following man-
ner: [//PAUSE], [//SLOW], and so on, where the double-slash indicates a boundary
between rhythm groups and where the linguistic gloss in upper case subcategorises
this according to type of boundary, i.e. pause, slowing of tempo, etc. An example of
this notation may be seen in Row 43, Column 5, where the sign [//(#)] indicates a
pause or juncture after the word money. The most prominent rhythmic unit – cf. the
nuclear accent in the tradition of phonological analysis (Crystal 1972: 111; 1982: 11)
– is the nuclear accent and constitutes the nucleus of the rhythmic group. Nuclear
accent is shown in the transcription by placing the following notation before the unit
in question. Thus: (NA) as in Rows 32, 34, 36, and 38.
In Column 5, rhythm groups are indicated by heading the group in question
with the following notational symbol: {RG}, which is placed before any other nota-
tional symbols at the beginning of the group in question. Two examples in Column
5 are to be found in Rows 25 and 54. In turn, rhythm groups are, generally speaking,
integrated into still higher-order units which tend to correspond to the subphases and
the phases of the text. In the Westpac text, the soundtrack plays a critically important
role in specifying the shifts from one subphase or phase to another in the text. For
example, Phase 1c is the culmination of a wave-like development that has gradually
developed in Phase 1 as a whole on the basis of a number of rhythm cycles or
periodicities in the soundtrack (see 4.9, pp. 209-222).
In the final subphase – Phase 1c – the full clause roll them up is sung twice; the
tempo is fast and the volume loud. With respect to the previous singing of the chorus,
this is a distinct shift in the text’s dynamics. This shift coincides with Shot 4, Rows 15-
6. In relation to the visual text, Shot 4 could be seen as a continuation and further
microlevel development of the visual thematics of workers on the job, as also shown
in Shots 2 and 3.
However, it is the marked shift in the dynamics of the soundtrack which justifies
the analytical decision to see Shot 4 as belonging to a new subphase, rather than as
continuing the previous one. This also illustrates how the codeployment of different
semiotic resources provides principles of both continuity and change in textual
dynamics. In this case, the visual thematics is based on continuity, whereas the change
in the rhythm in the soundtrack in Phase 1c corresponds to criteria of change.
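
The notations for rhythm groups, boundaries and nuclear accent can be combined in a single retrieval routine. The following is a minimal sketch in Python, assuming that a stretch of Column 5 is available as a plain string in which {RG} heads each rhythm group, a boundary is written as [//TYPE], and (NA) immediately precedes the unit carrying the nuclear accent. The function name rhythm_groups is hypothetical, and the example string is an illustrative construction: the placement of the nuclear accent in it is an assumption, not a report of the actual transcription.

```python
import re

BOUNDARY = re.compile(r"\[//([^\]]*)\]")         # e.g. [//PAUSE], [//SLOW]
NUCLEAR = re.compile(r"\(NA\)\s*(\w+)")          # unit carrying the nuclear accent
MARKERS = re.compile(r"\((?:NA|\*\*?)\)")        # (NA), (*) and (**)


def rhythm_groups(text):
    """Split a stretch of soundtrack notation into its rhythm groups."""
    groups = []
    # Everything before the first {RG} belongs to no rhythm group and is skipped.
    for chunk in text.split("{RG}")[1:]:
        boundary = BOUNDARY.search(chunk)
        nuclear = NUCLEAR.search(chunk)
        words = MARKERS.sub("", BOUNDARY.sub("", chunk))
        groups.append({
            "text": " ".join(words.split()),
            "nuclear": nuclear.group(1) if nuclear else None,
            "boundary": boundary.group(1) if boundary else None,
        })
    return groups


# Illustrative string only; accent placement is assumed.
example = "{RG} backing it with (NA)money [//PAUSE] {RG} roll them (NA)up [//SLOW]"
groups = rhythm_groups(example)
```

Grouping the output by boundary type would give a first indication of where shifts of subphase or phase are cued by the soundtrack.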

4. 9. 8. Degree of loudness
Degree of loudness has to do with what Abercrombie (1967: 95) calls, with reference
to speech, the degree of force with which air is expelled from the lungs during phona-
tion. While loudness is clearly a relative notion, in the sense that each speaker, whether
of the same language or of a different one, has a typical range of his or her own,
it is possible to notate degree of loudness in
the transcription by getting a feel for the overall volume range of a given speaker,
singer, musical performance and so on. It is also possible to postulate a continuum
of possibilities ranging, for example, from sub-vocalising through whispering, speaking
softly, speaking normally, speaking loudly, shouting and yelling to screaming. Furthermore,
volume can be controlled by electronic and mechanical means so as to produce the
desired effect in a specific context. Abercrombie’s claim that loudness has ‘little
linguistic importance’ (1967: 96) can therefore be questioned. That is, we need to
reconstitute such notions within a much more embodied notion of what linguistic
and other modalities of meaning-making are and how they function in context.
Instead of assigning fixed meanings or values to various degrees of loudness,
it seems more appropriate to consider loudness as a multifunctional variable that can
have different values under different organismic and contextual constraints. In
multimodal texts such as the present example, the relative loudness of different
acoustic modalities in the same subphase or phase is an important factor. For example,
the interaction between the male speaker and the musical accompaniment is, in part,
based on the relative loudness of the two auditory modalities. In this case, the music
remains in the background at a relatively subdued volume and does not compete with
or usurp the primary role which the speaking voice has here. In this case, it is the
speaking voice which has the foregrounded or more dominant role in the soundtrack.
The music, in accompanying the speaking voice, supports it at the same time that it
provides one basis for textual continuity whereby this phase – with the male speaker
– is linked to the earlier phases featuring the chorus and the female soloist. These ear-
lier phases were also accompanied by the same music and this is a source of textual
coherence in tying different phases to each other. Degree of loudness can be tran-
scribed as follows: (pp) = very soft; (p) = soft; (n) = normal; (f) = loud; (ff) = very loud,
on analogy with the Italian terms piano (‘soft’) and forte (‘loud’) common to musical
terminology (Gumperz, Berenz, 1993: 108). In Row 16, Column 5, for example, the
notation [Volume: ff] here indicates that the chorus is singing very loudly.
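
Because the five-point scale is ordinal, the relative loudness of different acoustic modalities can be compared directly. The following is a minimal sketch in Python, assuming the bracketed form [Volume: ff] shown above. The function names volume_rank and dominant_source are hypothetical, and the volume values given for the male speaker and the music are illustrative assumptions consistent with the discussion above, not values taken from the transcription.

```python
import re

# Ordinal ranks for the five degrees of loudness.
LOUDNESS = {"pp": 1, "p": 2, "n": 3, "f": 4, "ff": 5}
VOLUME = re.compile(r"\[Volume:\s*(pp|p|n|ff|f)\]")


def volume_rank(notation):
    """Return the ordinal rank of the first volume notation, or None if absent."""
    match = VOLUME.search(notation)
    return LOUDNESS[match.group(1)] if match else None


def dominant_source(sources):
    """Return the name of the loudest source among those with a volume notation."""
    ranked = {name: volume_rank(text) for name, text in sources.items()}
    ranked = {name: rank for name, rank in ranked.items() if rank is not None}
    return max(ranked, key=ranked.get) if ranked else None


# Assumed values: the speaking voice at normal volume, the music subdued.
sources = {"male speaker": "[Volume: n]", "music": "[Volume: p]"}
foregrounded = dominant_source(sources)
```

Relative loudness is only one of the factors that bear on foregrounding, so a comparison of this kind is best treated as a first approximation.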

4. 9. 9. Duration of syllable, musical note, sound event


Syllables, musical notes, and other sound events may be lengthened beyond the
requirements of the structural patterning in which the given element occurs. Again,
this may be treated as a question of perceptual and contextual judgement rather
than of objective criteria of measurement. Furthermore, a general notational gloss
would, on this basis, appear to be more appropriate than one which seeks to
explain, for example, lengthened musical notes on the basis of a specifically
musical explanation. Lengthening may thus be seen as a resource for indicating the
salience of a given element with respect to other elements with which it co-occurs.
In the transcription, lengthened elements will be indicated by placing a
double exclamation mark in round brackets immediately after the lengthened ele-
ment. For example, in the female soloist’s singing, the first syllable of the word
moving is lengthened, as shown here: mov(!!)ing (Row 26, Column 5). This is a very
clear case of salience, reinforced by the fact that the lengthened syllable coincides
with the appearance of the schoolgirl at her desk in Row 26. The cut to the shot of
the girl occurs very precisely on the utterance of the first syllable of moving in the
song. It is no coincidence, of course, that the girl leans forward towards the viewer
on the singing of this syllable. Thus, we see here how three different semiotic
modalities – the girl’s body movement, the lexicogrammatical meaning of the
word, and duration of the sound – all copattern to foreground the specific
meaning which this cluster of variables produces in context.
Moreover, lengthening, as in the present example, may be more than a matter
of salience per se. As an auditory gesture, the sound in question is not only
lengthened but also has high pitch and slow tempo. It is the combination of pitch,
tempo, and lengthening which probably most bears on the contribution that this ele-
ment makes to the meaning of the specific part of the text in which it occurs. Over
the whole word in question – moving – the directionality of the pitch movement is
rise-fall, i.e. rising on the first syllable and falling slightly on the second, thereby
conveying a sense of urging and definiteness in the lead singer’s voice.
As an auditory gesture, this suggests how the sound itself is a movement
which the listener, too, bodily experiences and orients to as such on the basis of
shared patterns of auditory kinaesthesis that link singer and listener in an interactive
relation. This further suggests that auditory gestures, like manual-brachial gestures,
may iconically construe some aspect of the action or event that is referred to.

4. 9. 10. Tempo
In speech, tempo refers to ‘rate of syllable-succession’ and has to do with the num-
ber of syllables per chest-pulse, also called breath-pulse or syllable-pulse (Abercrombie,
1967: 96). These pulses are periodic or wave-like in character and occur on cycles of
greater and lesser muscular activity when, in the former case, more effort than usual
is expended to expel air from the lungs, producing stressed syllables. A cycle is
defined by the alternation of phases of less effort with a greater effort in order to
produce a stressed syllable. Speakers vary the tempo of their speech considerably and
this variation in speech tempo may be seen as one index, along with others, of the
specific organismic and contextual variables that are in operation.
In the transcription, tempo will be indicated by a simple three-way distinction
between slow, median, fast, as follows: Tempo: S, M, F, as in Column 5, Rows 25 and
31. These signs will be placed immediately prior to the stretch of text or the item
in question. Tempo is also a relevant factor in body movement, and the same signs
will be used indifferently to specify tempo in both the auditory and kinesic dimen-
sions. It should also be emphasised that tempo is a relative factor which varies in
concert with other factors and has no fixed meaning of its own. In the Westpac text,
for example, the tempo of the chorus in Phase 1 starts quite slowly, to become very
quick at the end of this phase. We might say that the increase in both tempo and
volume that characterises this development constitutes one dimension of the over-
all textual work that is undertaken to exhort people to act in a certain way. In the
case of the male speaker in Phase 4 (Column 5, Rows 31-53), the tempo of his voice
is quite regular, in ways that are atypical of spontaneous conversation, where
tempo fluctuates considerably. In this case, the male speaker is the voice of Westpac
itself. The tempo is consistently moderately fast, with little variation or fluctuation,
and this contributes to the authoritative positioning of the speaker as one who
speaks with confidence, leadership and assertiveness.

4. 9. 11. Continuity and pausing


In music, song and speech alike, the sound stream may be punctuated by pauses or
silent junctures of varying duration and significance. Pauses are indicated in the
transcription by the following sign: (#), as in Row 43, Column 5. Pauses may occur
for completely contingent reasons (breathing and so on) or they may signal the end of
a melodic or intonational phrase. Such pauses may indicate either finality or open-
endedness. In the former case, a falling melody signals closure or finality as the speaker
or singer indicates his or her turn is coming to an end. Open-endedness, by contrast,
is indicated by a rising melody, signalling a willingness or intent to continue. The cor-
respondence between the two forms and the meanings mentioned here is, however,
not always so clear cut. For example, the discourse of the male speaker in the Westpac
text is punctuated by numerous pauses which coincide with a falling melody. However,
these do not suggest here that the speaker has finished. Instead, this is a discourse in
which the Westpac spokesman holds the floor, though in no way that corresponds to
intersubjective ratification among interlocutors, and lets it be known that Westpac is a
leader that can speak with definiteness and authority. The difference between falling
and rising melodies in the above sense will be indicated by the signs (/) and (\),
respectively. In the environment of a pause, the two phenomena can be specified as
(# /), meaning that the pause coincides with a falling melody, as in Row 46, Column 5.
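
The prosodic signs introduced so far lend themselves to mechanical checking of a Column 5 gloss. The following is a minimal sketch, assuming the gloss is available as a plain string; the function name `find_marks` and the second sample string are illustrative assumptions, not part of the transcription scheme itself. It recognises the lengthening sign (!!), the pause sign (#), and the melody signs (/) and (\), alone or inside a pause.

```python
import re

# Signs taken from the conventions described above.
# In this scheme (/) marks a falling melody and (\) a rising one.
MARK_PATTERN = re.compile(
    r"\(!!\)"              # lengthened element
    r"|\(#\s*([/\\])?\)"   # pause, optionally combined with a melody sign
    r"|\(([/\\])\)"        # melody sign on its own
)

MELODY = {"/": "falling", "\\": "rising"}


def find_marks(gloss):
    """Return (position, label) pairs for each prosodic sign found in a gloss."""
    marks = []
    for match in MARK_PATTERN.finditer(gloss):
        text = match.group(0)
        if text == "(!!)":
            label = "lengthening"
        elif text.startswith("(#"):
            sign = match.group(1)
            label = "pause" if sign is None else "pause + " + MELODY[sign] + " melody"
        else:
            label = MELODY[match.group(2)] + " melody"
        marks.append((match.start(), label))
    return marks


# "mov(!!)ing" is the example from Row 26; the second string is invented.
print(find_marks("mov(!!)ing"))
print(find_marks("first phrase (# /) second phrase (#)"))
```

A sketch of this kind only locates the signs; deciding what counts as lengthened or paused remains a matter of the transcriber's perceptual and contextual judgement.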

4. 9. 12. Dyadic relations among auditory voices: sequentiality, overlap, turntaking


The soundtrack of the text is organised in terms of a number of textual voices
(spoken, sung, instrumental etc.) which stand in various kinds of relationship to
each other, rather like the partners in a conversation. In this text, the chorus is the
voice of the people. The female soloist – the lead singer – is one of the people,
yet also set apart from them as an example whom they can admire at the same
time that she encourages them to get moving. Linguistically, this distinction is
revealed in the two different kinds of imperative that the two textual voices –
chorus and lead singer – use.
The chorus uses the subcategory of imperative known as jussive; it is the
voice of the people collectively exhorting themselves and each other to adopt a
certain course of action. The female soloist, on the other hand, uses the suggestive
subtype, i.e. let’s get moving, which uses the implied inclusive plural pronoun us to
distinguish between the I of the singer and the you of her addressees – in the first
instance the chorus and, by implication, the viewer. The inclusive pronoun includes
both parties in the proposal at the same time that it makes a distinction between
proposer (singer) and proposee (chorus/viewer), thereby highlighting the female
soloist as exemplary and distinctive with respect to the chorus and viewer.
These two voices engage in a kind of dialogue in which the first three dia-
logic moves of the chorus – roll them, roll them, roll them up – initiate the dialogue
through partial repetition of the same formulaic locution. Musically, the chorus
develops from the subdued, quasi-mystical tone at the beginning through to the
rousing, ecstatic quality of the final roll them up, which is loud and fast.
Intertextually, this references a tradition of religious choral music, as in the choral
works of Bach and Händel. The overall meaning is the celebration of, and the iden-
tification with, the meanings of the chorus so that both the individual members of
the chorus and the audience are collectively bound to these meanings and values in
the celebration of something more exalted.
The lead singer responds to the chorus not by way of mere reaction. Instead,
her contribution to the dialogue constitutes a further development of the meaning
of the chorus, as also highlighted by the paratactic conjunction of extension and
(Halliday, 1994 [1985]: 230-2). The conjunction construes an explicit link of the
additive type between the dialogic move of the chorus and that of the lead singer.
As befits her role as exemplar, she extends the more formulaic meaning of the
chorus at the same time that she proposes to them (and to the listener/viewer) an
exemplary role model to follow.
In the Westpac text, the dialogue between chorus and lead singer is orderly
and sequential. There is no overlap or interruption. This in itself suggests the har-
monising of their purposes rather than conflict and competition. Importantly, the
chorus is supported by an instrumental accompaniment, whereas the soloist sings
alone. Later in the text, the male speaker – the voice of Westpac, of authority – is
also accompanied by a simultaneous instrumental support (Rows 31-53, Column 5).
In the first case, the accompaniment is another textual voice which harmonises with
that of the chorus in conformity with its overall modal orientation. The exhortative
orientation of the chorus receives further support from the musical accompani-
ment, which does not compete with it. In the second case, the male speaker is clearly
dominant and assertive – the voice of power and leadership – and the instrumental
support again serves to reinforce this role by remaining in the background and in
no way creating discord with the speaker (see Van Leeuwen 1991: 76). Perhaps it is
possible to say here that the instrumental support has resolved the tension between
chorus and lead singer into a single (purely instrumental) voice which has now been
harmonised to the goals of Westpac. In other words, both voices have now been
fully subordinated to the dominant discursive voice of Westpac.
For the purposes of the transcription, the salient distinctions are as follows:
sequential (SE), simultaneous (SI), initiating (I), and responding (R). Thus, in Row 17,
Column 5, for example, the female soloist is glossed as (R) to indicate that she is
responding to the previous turn, which is sung by the chorus.
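
For transcribers who keep their tables in electronic form, the four distinctions can be held as a closed set of codes attached to each turn. The sketch below is one possible representation under that assumption; the names `Relation`, `Turn` and `gloss` are hypothetical, and only the Row 17 gloss (R) is taken from the transcription.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The four relational codes used for auditory voices in Column 5."""
    SEQUENTIAL = "SE"
    SIMULTANEOUS = "SI"
    INITIATING = "I"
    RESPONDING = "R"


@dataclass
class Turn:
    row: int            # row number in the transcription table
    voice: str          # e.g. chorus, female soloist, male speaker
    relations: tuple    # one or more Relation values


def gloss(turn):
    """Format a turn's relational codes in the bracketed style of the table."""
    codes = " ".join("(" + relation.value + ")" for relation in turn.relations)
    return f"Row {turn.row}, {turn.voice}: {codes}"


# Row 17: the female soloist responds to the preceding turn of the chorus.
print(gloss(Turn(17, "female soloist", (Relation.RESPONDING,))))
```

Using an enumeration rather than free text keeps the glosses restricted to the four codes defined here, which makes later counting or comparison across phases more reliable.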

4. 9. 13. Vocal register


The term register, which has also become a technical term in linguistics, derives in
actual fact from music. In its musical sense, register refers to ‘different qualities of
sound arising from differences in the action of phonation’ (Abercrombie, 1967:
99). Thus, in singing there are said to be upper, middle, and lower registers. It is in its
original musical sense that the term is used here. Abercrombie also suggests that
there are different registers of the speaking voice in order to express a range of
different emotions – anger, tenderness, impatience, and so on. Moreover, speakers
may switch speaking (or singing) registers as they emotionally modulate their
discourse in different ways.
The above labels are somewhat impressionistic, but they can give us some
clues as to how we might gloss changes in the register of the speaking and singing
voice when we are transcribing multimodal texts, though without necessarily going
into the articulatory details which underlie this. Thus, the chorus may be said to
move from a register of the mystical to the ecstatic in which all voices are united in
the service of a higher cause. The lead singer sings at a higher pitch level than the
chorus in a tradition of pop and rock female singers, as distinct from the allusions
to the tradition of choral music in the chorus. Thus, the register here is a folksy,
individualistic one, as distinct from the upper, yet dark, registers of the tragic
heroine (e.g. Brünnhilde or Isolde) in a Wagnerian opera. The register of the male
speaker is that of the radio or television commentator who provides an authorita-
tive interpretation of events – fast-paced, assertive and monologic in orientation,
not amenable to dialogic interrogation or interruption.
Typically, speakers and singers have a range of voice dynamics which they
variously deploy according to the context, the discourse genre, as well as more sub-
jective factors to do with their physical or psychological states.

4. 10. Column 6: Metafunctional interpretation

The assumption that the metafunctions (see Inset 4, pp. 22-23) are spread across all
the resources used constitutes a unifying principle for thinking about multimodal-
ity. The decision to include this sixth Column also draws attention to the fact that
transcription and textual notation are never theory-neutral. Rather, they always make
assumptions both about the meaning of the text and about which meanings and
their modes of expression to foreground in a given analysis. The inclusion of this
column should help transcribers both to make this explicit and to provide
shorthand glosses on each successive phase of the text’s unfolding meaning.

4. 10. 1. Metafunctional notation in relation to Column 6


As mentioned before, the purpose of Column 6 is to suggest some of the ways in
which the multimodal integration of the metafunctions is achieved. Each meta-
function is identified as follows: EXP = experiential; INT = interpersonal; TEX =
textual; LOG = logical. The metafunctional analysis in Column 6 is specified with
respect to each subphase in the unfolding text. There is no attempt to provide a
complete or detailed analysis. Instead, only the most salient features are identified
and their metafunctional significance glossed with reference to the notation referred
to in the current section. The main purpose of Column 6 is, then, to provide a brief
summary of the metafunctional salience of the particular semiotic modalities that are
codeployed in a given subphase. Column 6 has, then, an important integrating
function, though in a different way from Column 1. Importantly, Column 6 consti-
tutes a departure from the integration of columns and rows which characterises the
rest of the table (see 4.2, pp. 174-181). This means that the metafunctional analysis
in Column 6 does not correlate with a row number as specified in Column 1. For this
reason, no rows are featured in Column 6. The motivation for this choice lies in the
fact that the information in Column 6 is specific to an entire subphase. Therefore, the
analytical scope of the commentary in Column 6 necessarily extends over the
equivalent of several rows. For example, the metafunctional analysis which refers to
Phase 1a extends over the section of text that stretches from Rows 1 to 10.
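
Because Column 6 is keyed to subphases rather than to rows, an electronic version of the table needs some way of relating the two. The following minimal sketch assumes each subphase is stored with the first and last row it spans; the names `SubphaseGloss` and `subphase_for_row` are hypothetical, and the gloss strings are placeholders rather than the analysis given in the transcription. Only the span of Phase 1a (Rows 1 to 10) and the four codes come from the text above.

```python
from dataclasses import dataclass, field

# Metafunction codes as defined for Column 6.
METAFUNCTIONS = {
    "EXP": "experiential",
    "INT": "interpersonal",
    "TEX": "textual",
    "LOG": "logical",
}


@dataclass
class SubphaseGloss:
    label: str
    first_row: int
    last_row: int
    glosses: dict = field(default_factory=dict)  # code -> short gloss

    def covers(self, row):
        """True if the given row number falls inside this subphase."""
        return self.first_row <= row <= self.last_row


def subphase_for_row(subphases, row):
    """Return the subphase whose span includes the row, or None."""
    for subphase in subphases:
        if subphase.covers(row):
            return subphase
    return None


# Phase 1a spans Rows 1 to 10; the gloss text here is a placeholder.
column_6 = [SubphaseGloss("Phase 1a", 1, 10, {"INT": "placeholder gloss"})]
print(subphase_for_row(column_6, 7).label)
```

The lookup runs from row to subphase only, which mirrors the fact that the Column 6 commentary applies to a whole subphase and not to any single row within it.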

4. 11. Display and depiction: two sides of the same semiotic coin in visual texts

4. 11. 1. Multimodal discourse analysis: the Mitsubishi Carisma advertisement revisited


In this final part of the book we will suggest that display and depiction are two sides
of the same semiotic coin in visual texts. In Chapter 1, we suggested in the
discussion of the cartoon in Figure 1.3 that visual texts can be unpacked as a level
of semiotic organisation which is analogous to the discourse stratum in linguistic
texts. We also saw in Chapter 1 how the event structure is a level of semiotic
organisation which is highly condensed in the visual organisation of the depicted
scene in the cartoon. Nevertheless, the temporal succession of events can be
reconstructed or reactivated in ways which partially detach it from the visual forms
themselves. This shows the need to distinguish the narrative event structure as a level
of meaning which is realised by, but is not reducible to, the resources of the visual
grammar which are used in the cartoon drawing. Below, we further develop these
ideas in relation to some aspects of the discourse level of organisation of the
Mitsubishi Carisma television advertisement that we discussed briefly at the end of
Chapter 1 in relation to meaning-making units in film texts. As mentioned above, a
transcription of this text is provided in Appendix II. The ultimate goal of this con-
cluding section is to show, albeit partially, how multimodal text analysis and
multimodal transcription can be combined in order to develop insights concerning
the ways in which meaning-making resources on the levels of Hjelmslev’s (1961
[1943]) expression and content strata are integrated to a discourse level of
organisation in multimodal texts.

4. 11. 2. From delimited optic array to visual text: the stratification of the visual sign
The basic reality of the visual image is a delimited optic array (Gibson, 1986 [1979]:
270-273) that is projected onto the video screen by some sort of electronic device
such as a modulated scanning beam. In contrast to the ambient optic array that
affords the perceptual pickup of information about events in the environment of
the observer, the optic array that is projected onto the television screen is delimited
because it is confined to the screen rather than belonging to the environment that
surrounds the organism. The screen is a surface onto which optic information
about something other than the surface is projected. The surface of the screen dis-
plays to the viewer visual invariants and their transformations in time. That is, the
structure of the array undergoes change and transformation in time, and it is this
change and transformation which creates the effect of movement. Thus, changes
in the structure of the delimited optic array of the screen can provide stimulus
information that affords both the pickup of information about the movement of
persons, objects and so on, in the depicted world of the film and the pickup of
information about the movement of the viewer in relation to the depicted world
(Gibson, 1986 [1979]: 294). In other words, the optic array affords both visual event
perception and visual kinaesthesis.
The expression stratum of a video text consists of visual resources such as
lines, dots, the interplay of light and shade, colour, and so on. However,
information about visual invariants is not contained in the lines, the dots, or the
light and shade per se, but in the ways in which these are connected to and nested
within each other so as to create information about shapes, surfaces, textures and
many other features. Lines, dots and so on, are analogous to the phonetic
dimension of speech sounds. Hjelmslev (1961 [1943]) referred to this as the level
of expression substance. When lines, dots and so on, are connected to each other,
they display information about visual invariants and changes in these invariants in
the optic array. The many degrees of freedom of expression substance are shaped by
and entrained to the forms and categories of expression form. In spoken language,
this level is equivalent to the phonological system of a particular language. In visual
semiosis, expression form is equivalent to the stimulus information in the delimited
optic array that the viewer picks up with his or her perceptual systems. The
information that is picked up is in the form of structured ambient light which
specifies an environment for the observer (Gibson, 1986 [1979]: 51). Gibson calls
structured ambient light in the environment of the observer the ambient optic array
because it is ambient – i.e. it surrounds the observer – and affords the observer the
possibility of picking up structured stimulus information about the environment
that surrounds the observer. The notion of structure further implies that the stim-
ulus information has pattern, texture and configuration (Gibson, 1986 [1979]).
In the case of visual texts, the optic array is not ambient, but delimited, for the
reasons mentioned above. In any case, it is the information about structure and
pattern which the optic array – ambient or delimited – affords the observer and which
corresponds to the expression stratum of visual texts. An optic array therefore has
component parts – it is not homogeneous – and specifies information about objects,
events and relations between these in the environment of the observer or about
something other than the surface on which the delimited optic array of visual texts
is projected. The delimited optic array projected onto the video screen displays
information about (1) transformations, substitutions, nullifications of structure in
the optic array and (2) visual kinaesthesis in the observer. As shown in Table 4.1,
which is only concerned with visual semiosis, this information has properties of
metafunctional organisation which resonate with those on the content stratum.
Table 4.1 illustrates the relationship between the expression and content strata in
visual texts. It also shows how the two strata exhibit metafunctional forms of
organisation. The distinction between display and depiction can be explained, with ref-
erence to Table 4.1, as follows. The expression stratum of visual semiosis is based on
the display of visual invariants and their transformation on, say, a video screen; the
content stratum is based on the depiction of a visual scene consisting of actions, events,
persons, objects and so on in the depicted world. Display and depiction therefore per-
tain to the expression and content strata, respectively. They are, of course, two sides
of the same semiotic coin (see Inset 14, pp. 175-177 and Table 4.1 on the following
page). Display is concerned with getting the optic invariants and their transformations
to the viewer in a perceptible form as stimulus information. Stimulus information
takes the form of the visual invariants and their transformations that are traced or
projected onto a surface such as a sheet of paper or a video screen, as a delimited optic
array (Inset 15: Gibson’s optic array, p. 192). Depiction is concerned with the interpre-
tation of this stimulus information as a visual scene consisting of meaningful actions,
events, participants, settings and so on. Actions and events and their associated
participant roles are realised by the resources of the visual grammar (content form) in
the form of shapes or volumes and the connections (e.g. vectors) between these.
In this book, we have used the term visual transitivity frame to talk about this
aspect of visual grammar and the meanings realised by it. Visual transitivity frames
consist of participants, which are realised by volumes or shapes, and processes, which
are realised by vectors and other dynamic features such as those proposed in Table
4.2. Visual transitivity pertains to the content stratum of depiction. The observations
made here have to do, above all, with the experiential metafunction, though it is
important to keep in mind, as Table 4.1 shows, that visual texts are organised along
metafunctional lines. Interpersonal, textual and logical meanings are also involved (see
also Baldry, 2000a; Baldry, Thibault, 2005; Kress, Van Leeuwen, 1996; Lemke, 1998;
O’Toole, 1994; Martin, Rose, 2003: 255-262; Martinec, 2000; Thibault, 2000a).
In Table 4.1, the discourse stratum – cf. Hjelmslev’s content substance (see Inset
18: Stratification, pp. 236-237) – is the higher-scalar level that contextually integrates
and organises to its own level the units and their relations on the other lower-scalar
levels specified in Table 4.1. It is the global level of the text as a meaning-making
(communicative) event in a given social and cultural context. The discourse stratum
involves the entextualization of selections on the other levels of organisation as a
form of text-artifact that can be connected to a particular context, delocated from
one context, and relocated in some other context in and through the practices that
embed text-artifacts in contexts. Entextualization refers to the processes whereby
linkages are created between items as an organised field of sign-relations which the
interpreter differentiates from its surrounds as a textual artifact and attends to
accordingly (Silverstein, Urban, 1996).

Expression purport
    Perceptual pick up of stimulus information in ambient optic array about
    environmental events

Expression substance
    Delivery of delimited optic array to a surface (screen); array contains
    information about things other than that surface

Expression form
    Display of visual invariants and their transformations in time of delimited
    optic array on television screen by means of a modulated scanning beam
    Experiential: Display on screen of transformations, substitutions,
        nullifications of structure in optic array + visual kinaesthesis based
        on modes of camera movement to produce a changing optic array
    Interpersonal-orientational: Field of view and movement of camera = optic
        array of viewer + simulation of eye-head-body movement and orientation
        of stationary/seated viewer
    Textural: Deletions, accretions, slippage of texture in optic array
    Logical-transitional: Visual transitions: (1) based on camera movement
        (e.g. pan, zoom, dolly shot); or (2) based on film editing (e.g. cut,
        wipe, merge, dissolve) in post-production

Content form/visual grammar
    Depiction of events in the depicted world that the viewer sees on the screen
    Experiential: Depiction/perception of objects and events in the form of
        volumes and vectors in depicted world + movement of observer in
        depicted world
    Interpersonal-orientational: Use of colour, modalisation, camera angles to
        orient the viewer to the depicted world and to adopt an evaluative
        stance towards it; creation of social-interpersonal relations between
        viewer and depicted world
    Textural: Compositional principles of wholeness, balance, the relations of
        parts to the whole
    Logical-transitional: Shot as single run of camera with no displacement in
        time or place of depicted scene + nesting of shots in higher-order
        units; dependency relations between shots

Content substance/discourse
    Construal of visual grammar and its integration to social activities and
    practices; processes of entextualization in multimodal texts

Content purport
    Visual perception of events in the world of the viewer

Table 4.1: Stratification of video texts, showing both the relationship between
the expression (display) and content strata (depiction) of visual signs

A visual text, no less than a linguistic one, though in different ways and on the
basis of different principles of organisation, is a complex and stratified system of
interacting units and their relations on different levels of organisation. The stimulus
information that we pick up with our perceptual systems is just the most visible part
of this overall complex of interacting factors that play their role in the recognition
and interpretation of some patterned relation among items as an entextualized field
of interrelated signs that stands out from its surrounds and semiotically mediates the
relations between participants in interaction. Entextualization is the process of
attributing the quality of textuality to some phenomenon and of embedding it in
practices of sign recognition and sign interpretation.
In one sense, the fundamental reality of visual texts is the stimulus information
that we pick up in the delimited optic array that is displayed on the screen. However,
the patterned arrangements in the optic array on the expression stratum are the basis
for the construing of patterns of meaning relations on the content stratum of the
depicted world that is presented to the viewer. The optic array and its entrainment
to the practices of individuals and communities specifies in the first instance the
material reality of the text as a dynamic material process involving both stability
and change in the optic array and its variants and invariants. By the same token,
perceptible visual patterns in the array can be interpreted as semiotically salient
patterns when these patterns become more and more integrated to the social
activities that are characteristic of a given society and culture. It is in the process of
their integration to meaning-making activities that visual patterns can come to be
seen and interpreted, to varying degrees, as conventional patterns and therefore as
having the potential to create meanings that have social and cultural value in the
contexts in which they are made and interpreted.
It is for this reason that we can talk about a grammar of visual and other
semiotic forms as an abstraction from material processes that we nevertheless use,
reproduce, theorise about, value and innovate with. Once a semiotic system acquires
this level of abstraction it is possible to talk about it and to formalise it as a system
of possible forms and the possible meanings that can be made in and through these
forms in particular times and places. At the same time, the attempt in Table 4.1 to
show how visual semiosis is a stratified system should remind us that no system can
be detached from the dually material and semiotic processes of the concrete visual
images and texts that we encounter. The dialectical duality of display and depiction
should serve as a constant reminder of this fact, which should always inform our
analysis and theorisation of multimodal texts. In 4.11.3 we shall consider some
aspects of the first of the two dimensions – i.e. display – in order to illustrate some
of the ways in which visual signs are organised on the level of expression form.

4. 11. 3. Transformations in the optic array: some examples from the Mitsubishi Carisma text
Transformations in the delimited optic array that is displayed on the screen involve
the following kinds of operations on features of the optic array: magnification;
diminishment; nullification; deletion; movement; accretion; addition; slippage; and
substitution. Table 4.2 sets out some of the main possibilities in Phase 1 of the
Mitsubishi Carisma advertisement.
In Shot 1, the movement of the car is a transformation of a visual invariant
– i.e. the position and the relative prominence of the car qua visual shape – in the
optic array that is displayed in this scene. This is the salient change that occurs in
this shot against a background of other features (the telephone box and the urban
setting) that remain invariant. The transformation involves both movement and mag-
nification as the car increases in size relative to the total volume of the screen space
as it moves along the road from the background towards the centre of the screen
space. The telephone box is the salient object in Shot 1. The viewer’s visual focus
tends to be drawn towards this object on account of its visual salience: the tele-
phone box dominates the left top-to-bottom area of the screen and its bright red
colour contrasts with the subdued dark colours of the setting. This salience anchors
the shot relative to the movement of the car and suggests that there may be a
relationship between the two objects.
The cut to Shot 2 involves deletion of the car and the urban setting in Shot 1
at the same time that there is also magnification of the interior of the telephone
box. The transition from Shot 1 to Shot 2 thus involves the deletion of some invari-
ants and changes in others. In Shot 2, the salient transformation is the movement
of the woman’s hand when it picks up the telephone receiver. The telephone appa-
ratus itself remains invariant. In Shot 2, the visually salient object is the slightly out-
of-focus telephone apparatus inside the telephone box and, momentarily, the
woman’s gloved hand as it reaches towards and grasps the receiver.
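
The deletions and retentions that accompany a cut can be approximated, very roughly, as operations on sets of visible features. The sketch below illustrates this for the transition from Shot 1 to Shot 2; the function name `compare_shots` and the feature labels are simplifying assumptions made for the example, and the sketch deliberately ignores graded transformations such as magnification and movement, which cannot be expressed as set membership.

```python
def compare_shots(before, after):
    """Compare the visible features of two successive shots.

    Each shot is given as a set of feature labels. The result records
    which features are deleted, retained or added across the transition.
    """
    return {
        "deleted": before - after,
        "retained": before & after,
        "added": after - before,
    }


# Illustrative feature labels for Shots 1 and 2 (assumed, not from the table).
shot_1 = {"car", "urban setting", "telephone box"}
shot_2 = {"telephone box", "telephone apparatus", "woman's hand"}

change = compare_shots(shot_1, shot_2)
print(sorted(change["deleted"]))   # features removed by the cut
print(sorted(change["retained"]))  # features carried across the cut
print(sorted(change["added"]))     # features newly displayed
```

On these assumed labels the car and the urban setting fall out of the array while the telephone box is carried across the cut, which matches the description of the transition given above.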
In Shot 3, the invariant features are the interior of the telephone box and the
woman. The variant features are the movement transformations observable on her
face, head, and upper torso. The salient object is the woman in contrast to the non-
salient interior of the telephone box.
The transition to Shot 4 involves the deletion of all invariant features shown
in the preceding three shots and their substitution with a new set of invariant
features, viz. the head and face of the man and the out-of-focus background detail
of the subterranean setting. As in Shot 3, the variant feature here involves the head
and facial movements of the man. In this shot, the salient feature is the head and
face of the man in relation to the indistinct and non-salient background.

Optic array: for each visual frame (shot), the variant and invariant features
within the shot, salience, and the location and disposition of shapes on screen

Shot 1
    Variant and invariant features: Invariant: telephone booth + setting;
        Variant: movement transformation: car
    Salience: +salient: phone booth; –salient: setting
    Location and disposition: Background: car + setting; Foreground: phone
        booth; Angle: oblique: left

Shot 2
    Variant and invariant features: Invariant: telephone set; Variant:
        movement transformation: receiver + hand
    Salience: +salient: phone receiver; –salient: no background
    Location and disposition: Background: zero; Foreground: telephone + hand;
        Angle: oblique: left

Shot 3
    Variant and invariant features: Invariant: interior of phone booth;
        Variant: head + facial movements of woman
    Salience: +salient: woman; –salient: telephone + interior of phone booth
    Location and disposition: Background: interior of phone booth;
        Foreground: woman; Angle: oblique: left

Shot 4
    Variant and invariant features: Invariant: background setting; Variant:
        head + facial movements of man
    Salience: +salient: man; –salient: background; Angle: oblique: right
    Location and disposition: Background: indistinct; Foreground: man +
        telephone; Angle: oblique: right

Shot 5
    Variant and invariant features: Invariant: background setting; Variant:
        left pan of camera
    Salience: +salient: man; –salient: hostage + background setting; Angle:
        oblique: right
    Location and disposition: Background: suspended hostage + underground
        place; Foreground: head-shoulders of man; Angle: oblique: right

Shot 6
    Variant and invariant features: Invariant: car + phone booth door;
        Variant: movement transformation: woman walks to car
    Salience: +salient: car: illuminated; –salient: frame of phone booth
        door; obscured
    Location and disposition: Background: car; Foreground: phone booth door +
        woman; Angle: oblique: left: intersecting

Table 4.2: Some transformations in the delimited optic array
of Phase 1 of the Mitsubishi Carisma advertisement

Shot 5 features the same invariant features that are seen in Shot 4 (the man’s
head and face) at the same time that further detail is added to the background
setting (the shark, the waterway, the bridge, the man suspended from the
bridge). The principal variants here all involve
movements of various kinds. These are as follows: (1) the movement of the man’s
head as he turns his gaze to the hostage; (2) the movement of the shark’s fin in the
water below the hostage; and (3) the leftward movement of the camera pan as it
tracks the movement of the man’s head and synchronises his gaze vector with the
viewer’s. In Shot 5, the salient feature remains the man’s head and face in relation
to the less salient details of his setting.
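The shot-by-shot account of invariants given above can be sketched as a comparison of sets. The following is a minimal illustration only: the feature labels are simplified paraphrases of the prose description, and the names INVARIANTS and transition are hypothetical, introduced here for the purposes of the example.

```python
# Invariant features per shot, paraphrased from the prose description of
# Phase 1. The labels are simplified and chosen only for illustration.
INVARIANTS = {
    1: {"telephone box", "urban setting"},
    2: {"telephone apparatus"},
    3: {"telephone box interior", "woman"},
    4: {"man's head and face", "subterranean setting"},
    5: {"man's head and face", "subterranean setting"},
}


def transition(before: int, after: int) -> dict[str, set[str]]:
    """Compare the invariant sets of two shots across a cut."""
    old, new = INVARIANTS[before], INVARIANTS[after]
    return {
        "deleted": old - new,        # invariants that do not survive the cut
        "persisting": old & new,     # invariants shared by both shots
        "introduced": new - old,     # invariants that appear only after the cut
    }


if __name__ == "__main__":
    for a, b in [(1, 2), (3, 4), (4, 5)]:
        print(f"Shot {a} -> Shot {b}:", transition(a, b))
```

On this encoding, the cut to Shot 4 shares no invariants with any of Shots 1-3, whereas the transition from Shot 4 to Shot 5 deletes none, which is the contrast the description draws.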
A number of features that are best described as part of the visual texture of
Phase 1 set up contrasting patterns that relate to the woman and the
man, respectively. For example, Shots 1-3, which are associated with the woman,
feature the use of oblique lines that incline towards the left (\) as well as the illu-
mination of the woman and the car she is driving. These features contrast with the
oblique lines inclining to the right (/) and the partial occlusion of the man’s face on
account of the interplay of light and dark in Shots 4-5.
The repetition and interplay of these features and their association with
specific participant functions and locations undoubtedly contribute to the visual
texture of this phase at the same time that the contrasting patterns index the unre-
solved conflict between the values of the two positions that the man and woman rep-
resent. Shot 6, which concludes Phase 1, combines some aspects of both patterns:
(1) the illumination of the car contrasts with the darkness of the frame of the door-
way of the telephone box as viewed from its interior; and (2) the oblique character of
the intersecting vertical and horizontal lines formed by the frame of the doorway
signifies asymmetry, imbalance and lack of visual harmony.
In 4.11.4, we shall be concerned with transitivity frames in relation to the
Mitsubishi Carisma advertisement (see Appendix II).

4.11.4. Visual transitivity frames and experiential meaning


In language, a clause represents a given phenomenon which is spatially and tem-
porally grounded in some (real or imaginary) referent situation. In this perspective,
a clause can be analysed into two central components: a unit signifying a process
and one or more linguistic units signifying the participants in the process. Nominal
groups signify participants; verbal groups signify processes.
In the linguistic semiotic, what is signified (e.g. a semantic category such as a
participant role) and how it is signified (e.g. a lexicogrammatical form such as a
noun or nominal group) are both discrete. Semantic categories in natural language
are defined by their contrasts with other discrete categories. A noun may be either
singular or plural, but it cannot be continuously graded between the categories of
singularity and plurality. The difference between man and men is a categorial one.
The nouns that realize the two categories are themselves discrete grammatical units.
You have to interpret the noun either as man or as men; the very many degrees of
freedom that are possible in the articulation of the vowels in the two words do
not warrant the ad hoc creation of a new word or a new semantic category. The
articulatory space in which vowels are produced permits many degrees of
topological-continuous variation in the production of a particular vowel sound
(Thibault, 2004a: Chap. 3). However, this does not give rise to a semantic gradation
lying between the categories singular and plural. The distinction between the two
categories is a typological-categorial one. The lexicogrammar and semantics of natural
languages are almost completely typological in their mode of semiosis (Lemke, 1998).

Table 4.3: Some visual process types and their modes of realization in the Mitsubishi Carisma advertisement
[Visual Frame column: images not reproduced]

Frame 1
  Grammar feature: farness + nearness + quantity: magnification: size of car
  Process type: event: movement
Frame 2
  Grammar feature: farness; nearness; quantity: magnification: size of car
  Process type: event: movement
Frame 3
  Grammar feature: vector: directionality: movement of woman
  Process type: action
Frame 4
  Grammar feature: vector: directionality: orientation of road sign
  Process type: spatial orientation
Frame 5
  Grammar feature: vector: connection: hand-telephone
  Process type: action
Frame 6
  Grammar feature: vector: connection: mouth-loud hailer
  Process type: action
Frame 7
  Grammar feature: deformation: body surface: mouth
  Process type: verbal-behavioural
Frame 8
  Grammar feature: deformation: body surface: mouth
  Process type: verbal-behavioural
Frame 9
  Grammar feature: interpenetration of different domains
  Process type: connect actions
Frame 10
  Grammar feature: interpenetration of different domains
  Process type: mental projection
Frame 11
  Grammar feature: continuous change: right (or left) camera pan [1→2]
  Process type: orient viewer
Frame 12
  Grammar feature: continuous change: left camera pan [1→2] + head turn [→] + man’s gaze vector
  Process type: behavioural: gaze: align viewer gaze to participant gaze
Frame 13
  Grammar feature: balance + centering + vector: frontal orientation towards viewer
  Process type: engage viewer’s gaze: present object

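The contrast between topological-continuous variation in articulation and typological-categorial interpretation can be illustrated with a small sketch. It assumes, purely for the sake of the example, that a vowel token can be reduced to two formant values and that categorisation works by nearest prototype; neither assumption is made in the discussion above, and the formant figures are invented for illustration rather than measured.

```python
import math

# Hypothetical prototype positions (first and second formant, in Hz) for the
# vowels of "man" and "men". The numbers are illustrative, not measurements.
PROTOTYPES = {
    "man": (700.0, 1700.0),
    "men": (550.0, 1900.0),
}


def categorise(f1: float, f2: float) -> str:
    """Map a point in continuous formant space to the nearest discrete category."""
    return min(PROTOTYPES, key=lambda word: math.dist((f1, f2), PROTOTYPES[word]))


if __name__ == "__main__":
    # Three different articulations: the input varies continuously,
    # but the result is always exactly one of the two categories.
    for token in [(690.0, 1720.0), (640.0, 1780.0), (560.0, 1880.0)]:
        print(token, "->", categorise(*token))
```

However the input values are varied, the function can only return man or men; there is no output lying between the two, which is the sense in which the distinction is categorial.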
In visual depiction, a visual image likewise represents some phenomenon
which is spatially and temporally grounded in some (real or imaginary) referent
situation. In this perspective, a visual image can also be analysed into two central
components: a visual unit in the form of a vector signifying a process; and one or
more visual units in the form of volumes or shapes which signify the participants
in the process. Volumes signify participants; vectors signify processes.
Unlike the lexicogrammar and semantics of natural languages, the mode of
visual semiosis is mainly topological. It is concerned with continuous change and
variation and is tied to visual kinaesthesis of the observer in a way that language is
not. Visual semiosis is concerned with the topological and dynamic characteristics of
phenomena. This can be illustrated with respect to the kinds of process types that
are typical of the visual transitivity frames that we find in television advertisements
such as the Mitsubishi Carisma text.
Visual processes are realized through a wider range of resources than vec-
tors per se. A process connects participants through a range of topological relations
such as the following: increase or decrease in quantity; deformation of body
surface; continuous change; movement; nearness and farness; connectedness;
interpenetration of domains; visual kinaesthesis.
Table 4.3, above, illustrates some of the visual process types,
their associated participants and the grammatical features that realize them with
reference to the Mitsubishi Carisma advertisement.
In 4.11.5, we shall discuss some of the ways in which the visual transitivity
frames are integrated into and function in larger-scale identity chains in the Mitsubishi
Carisma advertisement. In doing so, we are drawing attention to some of the ways
in which small-scale features such as the visual transitivity frame contribute to the
discourse level organisation of multimodal texts.

4.11.5. Identity chains in visual semiosis


Table 4.4 models the visual participant chains in Phase 1 of the Mitsubishi Carisma
advertisement. This table shows the repetition of participant functions across
shots and patterns of interaction among participant chains (items in square brackets
are implicit, i.e. not visually depicted, but nevertheless present in the shot in
question). Abstracting away from the specific visual details, Table 4.4 shows that there
are five participant chains in Phase 1. The columns display the items that belong to
a particular participant chain. The rows show the patterns of interaction on a shot-
by-shot basis among the different participant chains. With the exception of the
single occurrence of the hostage, the other two participants consistently interact
with the telephone chain, which has a pivotal role in Phase 1 in connecting the two
main participants. Much of the thematic, and therefore phase-specific, homogeneity
of Phase 1 can be attributed to the consistency of patterning in the shot-by-shot
development of each individual participant chain along with the patterns
of interaction among the chains.
Chains are linked to each other by visual processes (movement and
connection vectors) in different kinds of visual transitivity frames. The thematic
continuity and development that Table 4.4 shows is not only based on the percep-
tion of invariants from one shot to the next on the expression stratum; it is also
based on the ways in which the progression from one shot to the next is integrated
into a narrative logic of transformation from one shot to the next on the discourse
stratum. Thus, the movement of the car in Shot 1 does not entail transformation in
this sense, whereas the transition from Shot 1 to Shot 2 does. In this case, the move-
ment of the car and the salience of the telephone box in Shot 1 and the hand grasp-
ing the telephone receiver inside the telephone box in Shot 2 are seen as integrated
components of a larger-scale structure of meaning.
The individual shots and the sequential relations between them presume the
transformation of some features from one shot to the next. In the present case, the
spatial relocation of the woman from car to telephone box is one such
transformation which potentially generates narrative meaning, in the process raising
questions, for example, as to the reasons for the woman’s action. On the basis of Shot
1, we can say, for instance, that the car moved. The integration of the two shots
enables us to say something along the lines of ‘the woman drove to the telephone box
to make a call’. The kind of question raised here concerning the ways in which shots are
integrated with each other to form larger-scale units leads us to consider the kinds of
meanings that are created by the relations between shots in the next section.

Table 4.4: Visual participant chains in Phase 1 of the Mitsubishi Carisma advertisement

         Woman                         Car   Telephone          Man   Location                Hostage
Shot 1   [woman as unseen driver]      car   phone box          -     street                  -
Shot 2   body part: hand               -     phone receiver 1   -     inside phone box        -
Shot 3   woman                         -     -                  -     interior of phone box   -
Shot 4   [woman as unseen addressee]   -     phone receiver 2   man   -                       -
Shot 5   -                             -     phone receiver 2   man   -                       hostage
Shot 6   woman                         car   phone box          -     -                       -
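The reading of Table 4.4 by columns (chains) and by rows (interactions among chains) can be sketched as a simple lookup structure. This is an illustration under stated assumptions: the cell assignments follow one reading of the table, and the names SHOTS, chain and co_occurrences are hypothetical, introduced only for this example.

```python
# Shot-by-shot entries as read from Table 4.4; square brackets mark
# participants that are implicit rather than visually depicted.
SHOTS = {
    1: {"Woman": "[woman as unseen driver]", "Car": "car",
        "Telephone": "phone box", "Location": "street"},
    2: {"Woman": "body part: hand", "Telephone": "phone receiver 1",
        "Location": "inside phone box"},
    3: {"Woman": "woman", "Location": "interior of phone box"},
    4: {"Woman": "[woman as unseen addressee]",
        "Telephone": "phone receiver 2", "Man": "man"},
    5: {"Telephone": "phone receiver 2", "Man": "man", "Hostage": "hostage"},
    6: {"Woman": "woman", "Car": "car", "Telephone": "phone box"},
}


def chain(name: str) -> list[tuple[int, str]]:
    """Read one column: the items that make up a single participant chain."""
    return [(shot, row[name]) for shot, row in SHOTS.items() if name in row]


def co_occurrences(a: str, b: str) -> list[int]:
    """Read across rows: the shots in which two chains are both represented."""
    return [shot for shot, row in SHOTS.items() if a in row and b in row]


if __name__ == "__main__":
    print("Telephone chain:", chain("Telephone"))
    for participant in ("Woman", "Man", "Hostage"):
        print(participant, "with Telephone:", co_occurrences(participant, "Telephone"))
```

On this reading, the man co-occurs with the telephone chain in both of his shots and the woman in four of her five, while the hostage appears only once, which is consistent with the pivotal role attributed to the telephone chain.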

4.11.6. Dependency relations in the Mitsubishi Carisma text: implications for visual texts
The relations between the shots in a video text can be viewed in terms of dependency
or part-part relations. Dependency is a fundamental kind of meaning relation which
is not specific to language. In a dependency relation there is a direct relationship
between the units involved rather than a relationship which is mediated by a shared
higher-order unit. The latter type of relationship is usually referred to as constituency
in linguistics. Constituency involves part-whole relations; the parts are defined in
terms of their relationship to the whole structure to which they belong and in which
they function. Visual transitivity frames involve part-whole relations in this sense. In
the case of visual semiosis, this may at first glance appear to contradict the fact that
visual semiosis is based on topological-continuous variation. Constituency in lan-
guage is based on discrete typological-categorial distinctions. Nevertheless, the visual
image can be analytically broken down into such elements as volumes (shapes), vectors
(lines of directionality and force), and the functional relations among these (Kress,
Van Leeuwen, 1996: 56-64). In visual transitivity frames, volumes function as
participants, while vectors function as processes.
It is on this basis that we can determine different kinds of visual transitivity
frames both within and between shots in video texts (see Baldry, Thibault, 2005).
Constituency relations are analytical: a syntagmatic relation is broken down or analysed
into its parts and the functions of the parts in the whole are determined. The whole
mediates the relations among its parts. Dependency relations are synthetic: a
syntagmatic relation is synthesised between two or more units when the relationship
between them is not dominated by some superordinate constituency relation.
In linguistics, two principal types of dependency relation are recognised.
These are parataxis and hypotaxis. A paratactic relation involves equality between
the units: the units in the dependency relation are of equal status. The relationship
between Shots 24 and 25 in the Mitsubishi Carisma advertisement is of this kind. Shot
25 paratactically extends the meaning of Shot 24 by giving an alternative to it. The
switch from the angry female actor on the film set to the focus on the car in these
two shots does precisely this. In doing so, the car replaces the director and the actors
in the hackneyed thriller plot in the film studio as the centre of attention.
Hypotaxis involves a relationship of inequality: one unit is dominated by and
in some way dependent on the other unit. One example in the Mitsubishi Carisma
advertisement is the relationship between Shots 4 and 5. In this sequence, Shot 5 is
dependent on Shot 4. Shot 4 is the dominant or more important shot here. Shot 4 is
the Head; it is the point of origin and the initiation of the dependency relation
between these two shots. Shot 5 is dependent on it because it is Shot 4 that could
stand alone whereas Shot 5 modifies Shot 4 and cannot stand by itself. Specifically,
Shot 5 modifies Shot 4 by extending its meaning, adding to the meaning of Shot 4.
The camera pan to the left in Shot 5 extends and adds to the meaning of the man’s
head turn and gaze vector initiated in Shot 4. In this way, the specific focus of his
gaze vector is shown in Shot 5 to be the hostage suspended from the bridge. It is in
this sense that we can say that Shot 5 hypotactically extends the meaning of Shot 4.
According to Halliday (1994 [1985]: 225-241; Halliday, Matthiessen, 2004: 395-
441), there are three ways in which the dependency relations between clauses in a
clause complex may be expanded. These are: elaboration, extension and enhancement.
Elaboration (=)
The elaborating relation is indicated by the equals sign (=). In a paratactic elaborat-
ing complex, an initial unit is restated, exemplified or further specified by another
unit. Shot 26 elaborates Shot 25 in this way. Shot 25 takes the viewer from the film
set to the car as it is being withdrawn from the film set. Shot 26, however, focuses the
viewer’s attention squarely on the car for the first time in the entire advertisement. It
restates and further specifies the meaning of the Mitsubishi car in Shot 25 by switch-
ing from the oblique angle of the car being directed by the man beckoning with his
hands to a frontal view of the car in which the car is now the salient object without
any human in view. In this way, the visual significance of the car and its relative
salience in Shot 25 is restated and further specified in this shot.
Extension (+)
The basic meanings of the extending relation are those of addition (including the
adversative relation) and variation. That is, one unit adds to or varies the meaning
of the other, thereby extending its meaning. The relationship between Shot 17 and
Shot 18 is an example of this type. Shot 18, which features the (woman’s) passport
being held by a hand, is superimposed on and merges with Shot 17, which is of her
car travelling on the road to Prague from Vienna. Shot 18 extends the meaning of
Shot 17, which shows the car on the road and a road sign pointing in the direction
of Prague, by adding (quite literally) a further layer of meaning to Shot 17.
Enhancement (×)
The meanings included under enhancement tend to be circumstantial. One unit
enhances the meaning of another in terms of time, manner, place, cause,
condition, result and so on. The relationship between Shots 21 and 22 is of this
kind. Shot 21, which shows the film director shouting the word cut on the film set,
provides the reason or the motivation for the woman’s anger in Shot 22. Shots 23-
24, which show the telephone box being removed from the film set, enhance the
meaning of Shot 22, in which the woman is speaking on the telephone, by indicat-
ing the changed circumstances in which her conversation is taking place.
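The dependency relations discussed in this section can be gathered into a small record structure. The sketch below is illustrative only: the class Dependency and its field names are hypothetical, and the taxis is left unset wherever the discussion does not state it. The rendering follows the usual convention of numerals for parataxis and Greek letters for hypotaxis, with the expansion symbol attached to the secondary unit.

```python
from dataclasses import dataclass
from typing import Optional

# Symbols for the three types of expansion, as given in the subheadings above.
SYMBOLS = {"elaboration": "=", "extension": "+", "enhancement": "x"}


@dataclass(frozen=True)
class Dependency:
    primary: str            # shot(s) whose meaning is expanded
    secondary: str          # shot(s) doing the expanding
    expansion: str          # "elaboration", "extension" or "enhancement"
    taxis: Optional[str]    # "parataxis", "hypotaxis", or None if not stated

    def notation(self) -> str:
        """Render the relation: numerals for parataxis, Greek letters for hypotaxis."""
        mark = SYMBOLS[self.expansion]
        if self.taxis == "parataxis":
            return f"1 {mark}2"
        if self.taxis == "hypotaxis":
            return f"α {mark}β"
        return mark


# Relations as described in the text of this section.
RELATIONS = [
    Dependency("Shot 24", "Shot 25", "extension", "parataxis"),
    Dependency("Shot 4", "Shot 5", "extension", "hypotaxis"),
    Dependency("Shot 25", "Shot 26", "elaboration", "parataxis"),
    Dependency("Shot 17", "Shot 18", "extension", None),
    Dependency("Shot 22", "Shot 21", "enhancement", None),
    Dependency("Shot 22", "Shots 23-24", "enhancement", None),
]

if __name__ == "__main__":
    for rel in RELATIONS:
        print(f"{rel.primary} <- {rel.secondary}: {rel.expansion} ({rel.notation()})")
```

Keeping taxis and expansion type as separate fields reflects the point that the two are independent choices: any of the three expansion types may in principle combine with either kind of taxis.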
It is not our intention here to go into detail about these distinctions.
However, these preliminary observations suggest that formally distinct types of
dependency relations between shots in video texts can be identified, and that it may
be possible to establish a systemic representation of the visual resources that are
used. As the examples show, these resources include aspects of both film editing
(e.g. cut) and camera movement (e.g. camera pan). Shots 4 and 5 also realise a visual
transitivity frame conforming to the schema GAZER^ GAZE VECTOR^ TARGET. The
fact that these units can also be functioning parts in a larger whole – the visual
transitivity frame – does not in any way preclude the fact that the two shots can
also enter into a dependency relation of the kind described here.

Inset 18: Stratification

• According to the Danish linguist Louis Hjelmslev, a semiotic system such as
language is a stratified system. This means that it is organised in terms of the
two interrelated levels, or strata, of expression and content. The sign is a
relationship between these two strata. Both expression and content are aspects
of the internal organisation of language. The two strata are mutually defining
in the sign-relation: there is no expression without a corresponding content
just as there is no content without its expression. The sign in this definition
consists of a content-form and an expression-form. The sign is constituted by the
functional relationship between these two levels of linguistic or other semiotic
form that are intrinsic to the organisation of signs. How do the two abstract
levels of pure form connect with the material world outside the sign?
• Each of these two levels of the sign-relation stands in a particular relationship
to what Hjelmslev calls content-purport and expression-purport, respectively.
Purport is extrinsic to the sign-relation, as defined above. Content-purport is the
topological-continuous variation of the phenomena that we experience in the
world before this has been interpreted and shaped by the categories of
content-form. Expression-purport refers to the topological-continuous variation
of, say, the speaker’s vocal tract activity before this has been interpreted and
shaped by the categories of expression-form (1961 [1943]: 57). Both kinds of
purport are extrinsic to the sign; however, both purports are shaped and
interpreted by the categories that are internal to a given language. This point
will now be explained.
• In Hjelmslev’s theory of the sign-relation, the expression stratum does not simply
realise or express a given content, though this is certainly true. Both expression-
form and content-form also interpret their respective purports as expression-
substance and content-substance on both sides of the sign-relation. In spoken
language, for example, the many degrees of freedom, i.e. the continuous
variation of both the speaker’s vocal tract activity and the acoustic stream of
speech sounds that this produces in articulation, are filtered by the phonological
categories (the expression-form) of a particular language system. Expression-form
therefore semiotically forms expression-purport as expression-substance. What we
actually hear as the recognisable sounds of a particular language has been
shaped in this way.
• The sounds in the acoustic stream of speech differ in many ways from one
occasion to the next, from one speaker to another, and so on. The
phonological system filters this variation in order that the actual sounds that
are spoken and perceived are assimilated to and made recognisable by the
phonological categories of the given language system.
• The same principle also applies to the content stratum: the phenomena of
experience – both real and imagined – are assimilated to and interpreted by the
lexicogrammatical and semantic categories on the level of content-form of a
given language. The sign therefore functions to interpret the bodily processes
of articulation and the phenomena of experience in the world and to entrain
these to the categories internal to a particular language system. The table below
gives the various levels – both intrinsic (i.e. form) and extrinsic (i.e. substance)
– of the sign-relation and how these in turn semiotically shape the body and
the world to the categories of language and other semiotic systems. Linguists
and semioticians who have assimilated Hjelmslev’s account of stratification
into their own ways of thinking about semiosis include Greimas (1966),
Halliday (1978), Lamb (1966), Martin (1991) and Thibault (2004a, 2004b).

expression-purport:    the body (e.g. the vocal tract and oral cavity)
expression-substance:  the body entrained to the phonological categories of a language system
expression-form:       the phonological categories of a spoken language
content-form:          lexicogrammar
content-substance:     experience interpreted in and through the semantic categories of a given language
content-purport:       the phenomena of experience in the world

The different levels of the sign-relation in Hjelmslev’s account and how the sign connects to and interprets body and world

The dependency relations in our two examples relate the two shots involved
in temporal or logical relations of the kinds indicated above. In narrative discourse,
the principal type of dependency relation is the type that Halliday (1994 [1985]: 230-
232; Halliday, Matthiessen, 2004: 405-410) refers to as EXTENSION, i.e. one unit
extends the meaning of another unit by adding to it, proposing an alternative, replac-
ing it, and so on. Shot 2 extends Shot 1 by adding to Shot 1’s meaning. Shot 2 follows
Shot 1 in a temporal sequence: first the event in Shot 1 occurred, then the action
performed in Shot 2 took place.
Narrative discourse tends simultaneously to express temporal and causal/con-
sequential relations between the units in a sequence and to question these. Thus, the
most general type of narrative dependency relation may be glossed as:
COMPLICATION^ RESOLUTION. This is a schematic relationship which has many
more specific kinds of instantiations in the grammar of natural languages as well as
in visual semiosis. The basic idea is as follows: in dependency relations of this kind,
temporality and causality are both affirmed (COMPLICATION) and questioned (RES-
OLUTION). In simpler terms, one thing causes, follows, or otherwise leads to another
(COMPLICATION), and this requires further explanation (RESOLUTION) (see Thibault,
1988-1989: Chap. 12). Dependency relations such as those between shots in video
texts are an important aspect of the ways in which texts are organised on the
discourse stratum (see Table 4.1, p. 226). However, dependency relations are much
more than just a set of syntagmatic relations of the part-part kind. Above all, they
are a type of meaning relation which relates to the dynamic and active process of
making meaning on the discourse stratum. In the case of narrative as a mode of
discourse whatever the semiotic modality, we see this process operating on many
different scalar levels of organisation, e.g. between clauses or between larger phases
and generic stages of linguistic narratives and between shots or between the larger
phases to which shots belong in video texts.
In the Mitsubishi Carisma text, this can be illustrated as follows: Shot 1
implicitly raises or generates questions in the viewer’s mind. In this case, questions
of the kind ‘where is the car going?’, ‘will it stop at the telephone box?’, ‘why is it
where it is?’ spring to mind. This does not mean that all of these questions or others
that might also be posed are equally relevant. The point is that Shot 1 in this case sets
in motion a process of raising questions that require resolution. Narrativity as a
meaning-making process is in evidence on this relatively small-scale level.

Table 4.5: Dependency relation between Shots 1 and 2 in Phase 1 of the Mitsubishi Carisma advertisement

Shot 1
  Affirm: a car approaches a phone box
  Question:
    a. Temporal: what will happen next?
    b. Causal: why is the car there?
Shot 2
  Affirm: inside the phone box, a hand reaches for the receiver
  Answer:
    a. Temporal: the woman driver goes into the phone box
    b. Causal: the woman wants to make a phone call
  Question: who is she ringing? why? … etc.

Shot 1 is therefore a sort of mini-complication. Shot 2 provides answers to
the questions raised in Shot 1. The car stops at the telephone box, the driver makes
a telephone call, and so on. Shot 2 therefore provides a local resolution to the kinds
of questions and problems raised by Shot 1. In doing so, Shot 2 in turn raises
further questions that will in their turn require answers, e.g. who is she ringing up?
why?, and so on. In the case of the dependency relation between Shots 1 and 2, we
can see how both temporality and causality are simultaneously affirmed and ques-
tioned, as outlined in Table 4.5.
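The way one shot raises questions that the next shot answers, while raising further questions of its own, can be sketched as follows. The example is a minimal illustration: the class Shot, the function resolve and the label "open" for the unanswered question are hypothetical, and the wording of the entries is taken from Table 4.5.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    number: int
    affirms: str
    # Questions the shot raises, keyed by the logic involved.
    questions: dict[str, str] = field(default_factory=dict)
    # Answers the shot gives to questions raised by the preceding shot.
    answers: dict[str, str] = field(default_factory=dict)


def resolve(complication: Shot, resolution: Shot) -> list[tuple[str, str, str]]:
    """Pair each question raised by one shot with the answer the next shot gives."""
    return [
        (logic, question, resolution.answers[logic])
        for logic, question in complication.questions.items()
        if logic in resolution.answers
    ]


shot1 = Shot(
    1,
    "a car approaches a phone box",
    questions={"temporal": "what will happen next?",
               "causal": "why is the car there?"},
)
shot2 = Shot(
    2,
    "inside the phone box, a hand reaches for the receiver",
    answers={"temporal": "the woman driver goes into the phone box",
             "causal": "the woman wants to make a phone call"},
    questions={"open": "who is she ringing? why?"},
)

if __name__ == "__main__":
    for logic, question, answer in resolve(shot1, shot2):
        print(f"{logic}: {question} -> {answer}")
    print("still unresolved after Shot 2:", shot2.questions)
```

Because Shot 2 carries both answers and new questions, the same object serves as a local resolution of Shot 1 and as a complication for whatever follows.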
The two logics operate simultaneously, though one or the other may be
foregrounded in any particular instance. Narrative discourse requires both tempo-
rality and causality in order that the event sequence brings about transformation or
change as the text develops. This criterion operates on many different levels and
lies at the heart of the Complication^ Resolution type of organisation which is
characteristic of narrative. It also shows that the Complication and the Resolution
are not localised constituents or segments that occur in just one part of a narrative
text and its generic structure, as the Labov and Waletzky (1967) generic schema and
its development in systemic-functional linguistics may suggest (Martin, 1992: 564-
565). Rather, these units are a pervasive feature of narrative on different scalar
levels of its discourse organisation. The raising of questions and the providing of
answers to them satisfies the most fundamental requirement of transformation in
narrative discourse. The Complication^ Resolution schema can have many more
specific instantiations, e.g. Action^ Action, Action^ Reason, Action^ Result,
Action^ Consequence, and so on. In all these cases, it is not merely the temporal
sequencing of events that is important, but the causal logic whereby features of the
text undergo change from one action or event to the next. As the discussion in the
current section has shown, there is a close connection between the logical
organisation of texts and the fundamentally dialogical principles that underlie all
forms of semiosis. We would like to suggest that the tie-up between the logical
organisation of the temporal and causal relations between textual units and the
dialogical basis of all semiosis is far from coincidental.
In 4.11.7 we offer some further suggestions concerning the ways in which
logical coherence and interactional coherence are closely related to each other.

4.11.7. Some sources of coherence in the Mitsubishi Carisma advertisement: Phase 1


Texts like the Mitsubishi Carisma advertisement are in the first instance interactive
events. In our view, this is the primary source of their coherence. Events of this kind
are colourful and noisy affairs which impact upon and appeal to our senses. They
attract our attention largely on the basis of visual and acoustic waves of rhythmic
patterning which we perceive as episodic. The coherence of a communicative event
is the result of the ways we make meaningful connections among different compo-
nents of what we interpret as the same overall event or text. In the first instance,
events of this kind announce themselves as interactive events. The television viewer
is not simply a processor of abstract visual and other information, but is placed in an
interactive relationship with the text. The viewer is positioned as an addressee who
plays an active role in the interpretation of the text.
In the visual semiotic, the division of the text into distinct, though interrelated
shots, and the ways in which the viewer is positioned to adopt certain perspectives or
viewing positions with respect to the depicted world as mediated by choices in
camera angle (horizontal and vertical), camera distance, and camera movement all
play a role in constituting the viewer as an active and responsive addressee. Choices
in these systems interact with other visual resources, such as the gaze vectors of the
participants in the depicted world as well as choices in other semiotic systems (e.g.
the off-camera presenter in Phase 2 who directly addresses the viewer), in ways that
create addressee roles and positions for the viewer and therefore possible or pre-
ferred ways of responding to the advertisement.
Many approaches to coherence in discourse start with various kinds of
cognitive operations, whereby principles of logical continuity can be derived.
However, it seems to us that coherence is founded in the first instance on principles
of interactional coherence before principles of logical continuity come into the
picture. The wavelike patterning of visual, musical, and spoken rhythms provides the
platform on which other more specified forms of interactional coherence, such as
those mentioned above, come into play. This is analogous in some ways to the
dialogic organization of turn-taking and the associated speech roles in spoken discourse.
The text’s rhythmic patterns provide a basis for synchronizing the viewer’s own body
rhythms with those of the text. The ability to attend to these patterns provides the
basis for the further recognition of interactive units whereby the viewer is able to
take up specific interactive roles in relation to the text. This in turn leads to the abil-
ity to recognise event-like or episodic units in the overall flow of the text and to con-
strue various kinds of semantic or other meaningful relations among different parts
of the text. Thus, we can construct meaningful relations among separate shots as
parts in larger, meaningful wholes, rather than seeing each shot as a discrete unit
which is unrelated to what comes before it and what goes after it.
With reference to the transcription of this advertisement, we can see how
camera position and camera movement function to position the viewer in relation to the
depicted world of the text. In Shot 1, the approaching car is seen from a distance; its
movement towards the viewer indicates a potential for some kind of interactional
involvement with the viewer. Shot 2 is a very close shot of the telephone receiver as
the woman’s hand picks it up in order to make a phone call. The close nature of this
shot in contrast to the previous shot invites the viewer to enter into the interpersonal
space that is created by the close-up shot of the woman’s action. Shot 3 cuts to the
woman speaking on the phone; the woman is shown frontally in medium close-up.
For the first time, she is individuated as a result of these choices in camera distance
and angle. The viewer is able to enter into her world and, potentially, to identify with
it. Shot 4a cuts to the man with whom she is speaking on the phone. He too is indi-
viduated by a medium close shot; however, the more oblique angle here and the inter-
play of light and shade entail a different kind of interpersonal positioning for the
viewer with respect to the man. In this case, the man is positioned as being remote
from the world of the viewer and the latter is being asked to evaluate him in this light.
In Shot 4b, the camera pan tracks the man’s gaze vector so that the viewer’s position is
aligned with that of the man as he looks at his hostage, who is seen suspended above
the shark-infested waterway. In this case, the combined effect of the camera pan and
the gaze vector enables the viewer to interpret the man’s intentions with respect to his
hostage at the same time that a negative evaluation of his intentions is implied.
The observations made in the previous paragraph in connection with camera
position and camera movement show some of the ways in which each shot is an
interactive unit which functions interpersonally to create a viewing position for the
viewer and, on that basis, a potential evaluative or other response on the part of the
viewer. Selections of choices in various interpersonal systems such as those men-
tioned here create an interactional relationship between viewer and different parts of
the text at the same time that the viewer is invited to adopt different kinds of
evaluative positions and stances vis-à-vis the participants and their actions. Each shot
therefore orients the viewer and the viewer’s potential responses to the world of the
text. It is the shift from one perspective and potential response point to another
which creates an overall sense of interactional or interpersonal coherence. In this
way, the episodic nature of the text as a series of interrelated events becomes evi-
dent. This in turn paves the way for the more explicit forms of ideational and logical
coherence which more cognitively oriented approaches tend to focus on.
Table 4.6, below, presents three sources of ideational
coherence in Phase 1 of the Mitsubishi Carisma text. Column 1 shows the number of
each of the shots in Phase 1. The other three columns each suggest a source of
coherence in the top row; the rows corresponding to the various shots in turn indi-
cate how this principle is instantiated in each shot. Thus, Column 2 shows that each
shot features a particular instantiation of the same general class of participant, i.e.
the telephone. However, we know that the different instantiations across the
various shots index two telephones – the one used by the woman and the one used
by the man as they speak to each other. In this case, the coherence is provided by
the fact that the different instantiations are of the same class of participant and
that this is a textual means of connecting the woman and the man to each other.
Column 3 shows that different appearances from shot to shot of the woman and the
man are linked by the sameness of their appearance even though these visual forms
may have different participant roles in different visual transitivity frames. For
example, the unseen woman in Shot 1 can be retroactively construed as the Actor
who drives the car in that shot. In Shot 2, she is likewise interpretable as the Actor
who moves her hand to pick up the telephone. In Shot 3, where the woman’s face is
shown for the first time, we nevertheless connect this appearance of the woman to
the previous ones, even though in this shot she is now Sayer (not Actor) in the visual
transitivity frame which is shown here. Finally, Column 4 reconstructs the principle
of temporal continuity whereby the different visual processes across the different
shots and associated visual transitivity frames are seen as related actions or events in
a chronological sequence of events. The interplay between logical coherence and
interactional coherence in video texts shows some of the ways in which video texts
such as television advertisements have hypertextual characteristics not unlike those
exhibited by websites (Chapter 3). In 4.11.8, we shall consider this issue.
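
The three sources of coherence discussed above can also be pictured as a small coded data set. What follows is only a minimal sketch, not the notation used in the transcription itself: the class name ShotCoding, the helper functions, and the simplified labels "phone1" and "phone2" are assumptions introduced for illustration, and the six rows paraphrase the entries of Table 4.6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotCoding:
    """One row of a Table 4.6-style coding (hypothetical field names)."""
    shot: int
    phone: str        # which telephone instance appears (Column 2)
    participant: str  # which participant is tracked (Column 3)
    role: str         # transitivity role in this shot (Column 3)
    process: str      # visual process depicted (Column 4)


# Simplified paraphrase of the Phase 1 rows; labels are illustrative only.
PHASE_1 = [
    ShotCoding(1, "phone1", "woman", "Actor", "car drives to phone box"),
    ShotCoding(2, "phone1", "woman", "Actor", "hand takes phone receiver"),
    ShotCoding(3, "phone1", "woman", "Sayer", "woman speaks on phone"),
    ShotCoding(4, "phone2", "man", "Sayer", "man speaks to woman"),
    ShotCoding(5, "phone2", "man", "Sayer+Gazer", "man speaks to woman"),
    ShotCoding(6, "phone1", "woman", "Actor", "woman returns to car"),
]


def chain(codings, attribute):
    """Label each shot-to-shot transition 'same' or 'different' for one attribute."""
    values = [getattr(c, attribute) for c in codings]
    return ["same" if a == b else "different" for a, b in zip(values, values[1:])]


def shots_by(codings, attribute):
    """Group shot numbers by the value of one attribute."""
    groups = {}
    for c in codings:
        groups.setdefault(getattr(c, attribute), []).append(c.shot)
    return groups
```

Calling chain(PHASE_1, "phone") traces the sameness and difference of the telephone from shot to shot, while shots_by(PHASE_1, "participant") gathers the shots in which the woman or the man appears, whatever transitivity role each has in a given shot.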

Shot | Same class of Participant: phone; different instantiations | Sameness and difference of Participant roles and their instantiations | Sequential-temporal continuity of Processes; one Process leads to the next in a temporal sequence
1 | same element: phone box1; whole: outside | same participant: woman by contiguity: car: Actor1 | Process: action: car drive to phone box
2 | same element: phone box1; part: inside | same participant: woman by contiguity: hand; Actor1 | continuity of Process: action: hand takes phone receiver
3 | same element: phone box1; part: inside: expand | same participant: woman: Sayer1; different transitivity roles; same participant | continuity of Process: verbal: woman speaks on phone
4 | different element: phone2 | different participant: man; Sayer2 | continuity of Process: man speaks to woman
5 | same element: phone2 | same participant: man: Sayer2 + Gazer | continuity of Process: man speaks to woman
6 | different element: phone1; part: inside looking out | different participant: woman: Actor1 | continuity of action: woman leaves phone box and returns to car

Table 4.6: Three sources of visual-textual ideational coherence
in the Mitsubishi Carisma advertisement: Phase 1 only

4.11.8. Counter-expectancy and hypertext in the Mitsubishi Carisma advertisement

The Mitsubishi Carisma advertisement also shows how visual resources can function to
create relations of expectation and counter-expectation concerning the development of
the text. This section shows how the strategies used and the meanings created foreground
the dialectical interplay of the linear succession of textual units (e.g. shots) and
the non-linearity of many of the meaning relations constructed in and through the
linear unfolding of the text. We shall also suggest that this is not unlike the processes of
textual development that we examined in relation to hypertext trajectories in Chapter 3.
Shots 18-20 feature the superimposition of the car on the road driving to
Prague (18) and the display of the words ‘Prague, Wednesday’ informing the viewer
of the car’s arrival in Prague (19) and the elapsed time since the previous car drive
sequence and arrival in Vienna on Tuesday. Shot 19 is in turn also superimposed with
Shot 20. Shot 20 shows the woman inside another phone box and her car parked near
the phone box. The sequential unfolding of shots in the text contributes to the
development of a textual trajectory. However, the linearity of the sequence does not
mean that the meaning relations constructed through these relations are necessarily
linear. The superimposition or overlapping of shots in the sequence of shots creates
a high degree of thematic compression in the visual semiotic. A shot is always a
choice point or a node in an unfolding textual trajectory or syntagm at the same time
that a given choice is made in relation to a paradigmatic class of possible alternative
choices in some contrast set. The superimposition of shots foregrounds to a greater
extent the paradigmatic potential of the shot as a choice point in the unfolding visual
text. Again, we see the dually paradigmatic and syntagmatic dimensions of meaning-
making being foregrounded in ways that reveal the hypertextual characteristics of
video texts such as the Mitsubishi Carisma advertisement (see Appendix II).
The superimposition of these shots is part of a paradigmatic pattern which had
been established by the similar – not identical – patterns in Shots 8 to 11 and Shots
18-19. Thus, similar situations – talking to the man from different phone boxes and
driving to Vienna then to Prague – create a paradigmatic set based on sameness with
variation. This pattern in turn conditions the viewer’s expectations concerning the
further development of the storyline. The transition from Shot 20 to Shot 21, which
shows the director on the film set holding a megaphone to his mouth, breaks with the
expectation that has been created. A different alternative, one which was not predict-
ed by the previous choices in the text, is made at this point, as the text unexpectedly
switches from the thriller plot to the director who is making the film, thereby intro-
ducing a new participant into the text who was not in any way part of the previous
storyline up to Shot 20. At this point (Shot 21), the text has jumped to a new, unexpected
storyline, which features both new participants (e.g. the film director) and new
roles for old ones (e.g. the angry woman actor who walks off the film set in protest).
In ways that are similar to the hypertextual strategies analysed in relation to the web
page in Chapter 3, the textual processes described here call attention to the dialecti-
cally dual character of the text as system and process. The television viewer is made
aware of the process through highly self-conscious metadiscursive strategies which fore-
ground the dialectic of system and process as the text unfolds along its trajectory in
ways that are similar to the examples of hypertext discussed in 3.10, pp. 156-161.
The choice of these particular patternings creates a pattern of similarity with
variation in the two car drive sequences. The choices made at these two points in the
text therefore condition the viewer’s expectations concerning later choices on various
scalar levels, such as visual transitivity frames, shots, sequences of shots, and so on.
The compression and superimposition in the two sequences create expectations as to
what will happen next when the woman gets into the phone box on arrival in Prague:
Will she meet the man and secure the release of her boyfriend? Will she be asked to
continue the car journey? And so on. This highlights the dynamic character of both
system and text. Textual meanings are made over time, never entirely according to our
expectations. Probabilities in the system are reweighted for each situation and each
text at the same time that they dynamically shift in the process of text production.
Hypertext draws attention to this process, though it is not unique to hypertext.
Expectations that are set up by a pattern of choices at some point in the text can be
confirmed or frustrated by choices made at some later stage of the text’s development.
The typical conjunction of situations that is achieved by the superimposition
of shots in the two sequences is a visual resource that functions in ways similar to
conjunctions in language. Just as conjunctive relations in language can provide indi-
cations concerning what to expect in the development of discourse (Martin, Rose,
2003: 128-133), these visual resources function in the same way to lead us to expect
a certain pattern in the further development of the event sequence (Van Leeuwen,
1990), only to see this frustrated by the unexpected twist in the plot. In the Mitsubishi
Carisma advertisement, the expectation that is created by the visual strategy of
compression and superimposition of paradigmatically related situations in the event
sequence leads the viewer to expect that the text will continue to make choices from
the same contrast set in the further development of the action. In this perspective,
the congruency of phase-specific copatternings of choices made up to this point can
be expected to function to maintain and develop the same situation, e.g. bringing the
story to a typical happy ending such as girl reunited with boy. The counter-expectation
results from the selection of a different choice from some alternative contrast set in
relation to the different situation of the film studio which occurs in Shot 21, and it
is developed from that point in the text.
Viewers’ expectations concerning the way the text develops – e.g. its plot
structure – are created by establishing a paradigmatic pattern in the syntagmatic
development of the text. In the present example, this is maximally foregrounded by
the superimposition and partial merging of different shots in the two sequences
referred to above. The visual semiotic of the film shot does not have the resources
of conjunction that are characteristic of discourse-level relations between clauses in
linguistic text. However, other specifically visual resources can be deployed to create
relations of expectation and counter-expectation in the development of a sequence
of shots and the transitions between shots. The superimposition or partial merging
of shots in the two sequences in the Mitsubishi Carisma advertisement is an example
of this. These two sequences show typical conjunctions of expected events in the
two car drive sequences. In Shots 8-11, the road to Vienna and the woman at the

Inset 19: Negotiation


• The term negotiation refers to the processes whereby the participants in meaning-making
activities variously initiate, respond to, take up, further develop and resolve the
meanings they jointly create and which mediate and help to give sense to these same
activities. Negotiation will be illustrated in relation to the following example:
(1) Q: Let me ask you this: Did you have any Vietnamese saved up for the mine field?
(2) A: No, sir, I did not.
(3) Q: Did you testify that you received any order before you left LZ-Dotti to save some of them
for the mine field?
(4) A: Yes, sir, I did.
(5) Q: Why didn’t you save some up for the mine field?
(6) A: Captain Medina rescinded that order and told me to waste them, sir.
(7) Q: When did he rescind that order?
(8) A: When he called me on the radio, when he was in the eastern part of the village, sir.
(9) Q: Did he specifically tell you to disregard the previous order?
(10) A: No, sir. He said those people were slowing me down, waste them, sir.
(11) Q: Save none for the mine field?
(12) A: No, sir.
(13) Q: So you interpreted it to mean save none for the mine field, is that right?
(14) A: The second time he told me, yes, sir.
[Source: Direct examination by George Latimer: Lt. William Calley, Witness for the Defense, My Lai
Court Martial Trial]

• In the above excerpt, the negotiation hinges on the meaning of Captain Medina’s words
those people were slowing me down, waste them as reported in (10) by Calley. In (11), the
questioner’s ellipted interrogative clause suggests one possible interpretation for the
meaning of this locution and asks Calley to provide his own perspective on the
proposition in the questioner’s interrogative clause, i.e. to verify whether these words
correspond to Calley’s own interpretation of what Captain Medina is reported in (10)
to have said or whether they correspond to what Captain Medina actually said on the
radio to Calley as they spoke to each other while in different parts of the Vietnamese
village in question. In (12), Calley’s negative reply refutes that interpretation. In (13),
the questioner then returns to the point previously made in (11) by adopting a different
discourse strategy, which is realized by the selection [declarative clause + polar inter-
rogative clause]. This choice puts the focus on Calley’s own interpretation of the
meaning of the words referred to before and therefore on his own responsibility in
making that interpretation. Rather than formulating the question so as to put the focus
on Captain Medina, e.g. what did Captain Medina mean? or what did Captain Medina say?,
the declarative clause in (13) focuses on Calley’s interpretation of the meaning of what
was previously said, viz. you interpreted it to mean … This declarative utterance begins
with the conjunction So, which connects the clause back to Calley’s negative confirma-
tion in (12) that Medina was responsible for the interpretation suggested by the
questioner in (11). (13) is a complex discourse move realized by two clauses. The first
part of (13) takes the form of a logical inference which the questioner draws at this
point on the basis of the immediately prior exchange with Calley. The selection [polar
interrogative; positive polarity] in the second part functions to seek Calley’s confirma-
tion of the meaning of the declarative clause as the questioner seeks to attribute this
meaning to Calley at this point in the exchange (i.e. in 13).
• The declarative clause So you interpreted it to mean save none for the mine field can be
interpreted as a Statement insofar as it asserts the logical inference which Calley’s questioner
makes on the basis of what was said previously. It is also interpretable in conjunction
with the interrogative clause as a Question whereby the questioner seeks confirmation
of the proposition in the declarative clause. In his response (14), Calley both responds
affirmatively to the speaker’s desire for confirmation, as indicated by the yes/no inter-
rogative clause when he says yes, sir at the same time that he also provides, in the form
of the temporal circumstance The second time he told me, a qualification of the meaning
of the questioner’s declarative proposition. In other words, Calley’s response takes up
and responds to two aspects of the meaning potential of the previous speaker’s utter-
ance, as follows: (1) he negotiates the meaning of the declarative proposition by adding
to it further semantic specification as to the precise point in time at which he inter-
preted Captain Medina’s words in the way indicated by his questioner; and (2) he nego-
tiates the polarity of the polar interrogative clause by confirming the rightness of the
questioner’s proposition when he responds with yes. Calley’s response in (14) can there-
fore be seen as a further semantic development along these two dimensions of the
meaning potential of the previous speaker’s discourse move.
• Overall, this example and the brief analysis given here serve to show that meanings are
not already fixed in linguistic forms, but are developed in the course of the negotiation
that ensues between participants. The interpretation of the locution those people were
slowing me down, waste them as meaning save none for the mine field does not inhere in the
linguistic forms used or in the ways these are combined into larger-scale units that con-
form to regular lexicogrammatical patterns in the English language. Instead, the
relationship of semantic equivalence between the two items is established in and
through the negotiation that takes place between the two speakers. Negotiation is an
emergent consequence of the jointly coordinated activity that takes place. This activity
and its emergent meanings are controlled – not by the ‘construal’ of the phenomena
of experience by the semantic categories of the linguistic system – but by the ways in
which brains and bodies coordinate events in the world as the two participants respond
to each other in the real-time unfolding of the event. The semantic equivalence of the
two terms in this context is a result of this activity-driven and semiotically mediated coor-
dination of events linking bodies, brains, and world.
• In other words, what the two participants in this case understand is based on their
coordination with the activity and the modes of understanding this gives rise to. This
understanding – e.g. the semantic equivalence referred to above – is an achievement
which is as much dependent on skills of the ‘know how’ kind (e.g. being a skilful cross-
examiner of witnesses) and on procedures (e.g. in the military court room) that are
embedded in social practices which enable the brain (e.g. of the witness, Lt. Calley) to
adjust action and understanding of events to the body’s concurrent and past experi-
ences of this here-now event and its embodied memories of relevant past experiences.
• Negotiation does not necessarily entail equality or symmetry between participants.
Generally speaking, difference and asymmetry of various kinds provide a motivation
for sustaining the negotiation of meanings. For example, the witness is required to
respond to questions that seek a closed yes/no response, rather than to semantically
more open-ended questions such as, for example, how did you interpret Captain Medina’s
words? or what did you think Captain Medina meant?.
• Moreover, the relationship between the two speakers is not one based on equality; the
two parties to the exchange are not equally free to express themselves without fear of
coercion or sanction. Instead, the institutional constraints in operation – this is a mili-
tary court martial – require that only one speaker (the examining lawyer) is entitled to
ask questions, which the witness must answer. The examining lawyer therefore sets and
controls the discourse agenda, the questions that are asked, and the kinds of answers
that are deemed acceptable. This is also reflected in the repeated use of the deference
marker sir on the part of the witness, which indexes the unequal distribution of rights
and obligations and the power relations that pertain to these in this context.
• Negotiation does not therefore necessarily equate with equality, though this is not in
principle excluded in other context-types where other norms operate (e.g. casual conver-
sation between intimates). Instead, negotiation refers to the dynamic processes of
meaning construction and interpretation as the interactants orient to and evaluate each
other, the discourse topic, and its associated referent situation. Above all, the negotiation
of meaning in discourse is driven by the semiotic and/or material ‘friction’ or difference
(Thibault, 1995) between the perspectives of the interactants in exchanges such as the
one discussed above. In the current example, there is ‘friction’ between the differing per-
spectives and interpretations of the two speakers concerning the meaning and validity in
the context of the crucial distinction between save none versus save some for the mine
field. The negotiation of this distinction leads to the creation of a local semantic equiv-
alence between the terms those people were slowing me down, waste them and save none for
the mine field, when Calley, in (14), agrees to this interpretation on the part of his
questioner. His assent therefore produces a local semantic resolution of the difference in
interpretation and perspective which the two speakers were negotiating up to this point.
Each speaker’s move is in some way functional in furthering this process of negotiation.
• Meanings do not reside in the linguistic and other semiotic forms which realize texts. The
particular semiotic choices and their combinations and copatternings on many different scalar
levels of organization provide cues and guidelines which both constrain and enable the nego-
tiation of meanings that takes place in and through these forms whenever the participants in
some occasion of discourse interact with each other. In this sense, the lexicogrammar is a
resource which participants use to make and negotiate meanings with each other.
• In this perspective, texts are the instantiation of a constrained meaning-potential rather than
containers of meanings that already exist in their forms. This potential is activated and
negotiated as a contextualized meaning in the course of discursive interaction as
participants seek to achieve their goals. Meaning is the result of the interaction of many
different constraints as the discursive event unfolds in time in the course of meaning-
making activity. Some such constraints are in the text, others derive from grammatical and
other forms of organization permitted by the various codeployed semiotic systems,
others from the experience and expectations of the participants concerning factors such
as genre, relevant intertexts, knowledge of world and society, social positioning, historical-biog-
raphical experience and so on. The textual transcription of this event, which was obtained
from a website about this case, enables the analyst to reconstruct some aspects of the nego-
tiation of meaning that took place in the original event, i.e. those aspects that are realized by
the lexicogrammatical selections referred to above and the ways in which they, in turn, con-
tribute to the realization of each speaker’s dialogically coordinated move in the unfolding
discourse. The text is a partial record of that event; it cannot substitute for the fully
embodied multimodal occasion of meaning making that took place in the courtroom.

steering wheel in her car are superimposed. In Shots 18-19, there is a similar super-
imposition of car driving on the road to Prague and passport control, as the woman
crosses the border from Austria into Hungary. The two sequences compress in this
way typical turns of events concerning the temporal development of the story line.
In each case, we see situations being merged that we would normally expect to occur
close together at that particular stage of the unfolding story. Expectation is, however,
thwarted in the present example by the change in the event line and participant roles
in the cut from Shot 20 to 21. The new features in Shot 21 replace those in Shot 20
and break the previous (and expected) continuity in the development of the narra-
tive event line. The transition between these two shots creates a discontinuity in
terms of both location (i.e. car journey from one city to another then switching to
film set in studio) and in terms of narrative event line (i.e. thriller plot then switch-
ing to the making of the film in the studio).

4.12. Conclusion: the shape of things to come

The transcriptions and text analyses presented in this book are a first step towards
the formulation of better multimodal transcription practices and the development
of computer-assisted tools for the storage, retrieval, processing and analysis of
multimodal texts. A central goal is the construction of multimodal corpora with a
view to the development of new categories of text analysis and description, a
necessary stage in the construction of the next generation of text-based corpora
(Baldry, Thibault, 2005).
In spite of the important advances made in the past thirty or so years in the
development of linguistic corpora and related techniques of analysis, a central and
unexamined theoretical problem remains, namely that the methods adopted for
collecting and coding texts isolate the linguistic semiotic from the other semiotic
modalities with which language interacts. In other words, linguistic corpora as so
far conceived remain intra-semiotic in orientation. In contrast, multimodal corpora
are, by definition, inter-semiotic in their analytical procedures and theoretical
orientations.
This entails new methods for the collecting, coding, storing and analysing of
textual data. There are, of course, many practical and technical difficulties that will
need to be overcome in order to realise this objective. A central requirement in
such an enterprise will be (1) transparency of cross-modal coding criteria whatever
the modality in question; and (2) retrievability of inter-semiotic relations such as,
for example, the copatterning of written text and visual image or spoken language
and body kinesics among others. At the present stage of computer technology,
some kind of language-based and/or visual coding systems remain the most
feasible procedure. Nevertheless, the data so coded will need to be referenced both
to specific transcriptions as well as to electronically stored databases comprising,
for example, video clips, video texts and multimodal printed pages, multimodal
transcriptions like the prototype presented in this book and so on (Baldry, 2000b).
Only in this way can we begin to quantify on a sufficiently large scale the sys-
tematic relations between language and the other semiotic modalities with which it
is co-contextualised in the making of genre- and context-specific meanings. If lan-
guage form and function are themselves shaped by the kinds of intersemiotic
relations into which language typically enters, then it may be argued that those con-
cordancing practices which ignore this fundamental fact about language will fail in
the longer run to provide entirely adequate explanations of language itself and the
ways in which language, too, is changing under pressure from the newly emergent
forms of multimodal and multimedia meaning-making practices with which lan-
guage is codeployed and with which it has always coevolved.
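
The two requirements named earlier in this section, transparent cross-modal coding criteria and retrievable inter-semiotic relations, can be pictured with a small data structure. The following is a minimal sketch under stated assumptions rather than a description of any existing tool: the Annotation record, the copatterns query, the clip identifier, and every label in the sample data are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A coded stretch of one semiotic modality (hypothetical record layout)."""
    clip_id: str   # reference to the stored video clip or page
    modality: str  # e.g. "speech", "gaze", "camera"
    start: float   # seconds from the start of the clip
    end: float
    label: str     # the coding category, stated explicitly


def overlaps(a, b):
    """True if two annotations of the same clip share some stretch of time."""
    return a.clip_id == b.clip_id and a.start < b.end and b.start < a.end


def copatterns(annotations, modality_a, modality_b):
    """Retrieve pairs of annotations from two modalities that co-occur in time."""
    first = [a for a in annotations if a.modality == modality_a]
    second = [a for a in annotations if a.modality == modality_b]
    return [(a, b) for a in first for b in second if overlaps(a, b)]


# Invented sample data, for illustration only.
CORPUS = [
    Annotation("clip-01", "speech", 0.0, 2.5, "presenter addresses viewer"),
    Annotation("clip-01", "camera", 0.0, 1.0, "long shot"),
    Annotation("clip-01", "camera", 1.0, 3.0, "close-up"),
    Annotation("clip-01", "gaze", 2.5, 4.0, "gaze at hostage"),
]
```

Because every annotation carries its modality, its coding label, and a reference to the stored clip, a query such as copatterns(CORPUS, "speech", "camera") returns the stretches in which spoken language and camera distance are codeployed, which is one simple way of making inter-semiotic relations retrievable.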
Neither the present chapter nor the current book claims to provide fully
worked out solutions to all of the problems addressed. The core of the enterprise
is to provide a dynamic account of multimodal meaning making which integrates
both micro- and macro-level processes in ways that bring about and facilitate plausi-
ble and logical accounts of the processes of the co-contextualisation of diverse
semiotic modalities and of the data that is accumulated and stored in some corpus
of multimodal texts. In spite of the pioneering work of Bateson, Birdwhistell, Hall
and others in the 1950s, the multimodal study of human social meaning-making
remains in its infancy. The main concern in this book has been to provide a
methodological and theoretical starting point on the basis of which further work
towards the goals outlined above might be undertaken. It necessarily follows that
the tentative and as yet incomplete proposals made here will undergo further
development and modification.
References
(a) Primary sources for text analyses
Bargellini, Alberto, Fratello, Mario and Monfroni, Luciana (1985). Scienze per il 2000:
Introduzione interdisciplinare allo studio delle scienze chimiche, fisiche e naturali. Vol.
2. Milan: Signorelli.
Curtis, Helena (1975 [1972]). Biology. New York: Worth Publishers Inc.
Darwin, Charles (1989 [1880]). The Power of Movement in Plants. London: Pickering.
Originally published: London: Murray.
Guglielmi, Bona and Ferrari, Ercole (1982 [1980]). Scienza. Natura. Società: Corso di
scienze chimiche, fisiche e naturali per la scuola media Vol. 2. First edition, sixth
reprinting. Turin: Paravia.
King, Beryl A. (1957). Australian Biology for High School Junior Classes. Fifth revised edi-
tion. Sydney and Brisbane: William Brooks & Co.
Marx, Karl (1909 [1908]). Capital: a critique of political economy, Vol. 1: The process of
capitalist production. Translated from the third German edition, by S. Moore and E.
Aveling. Edited by Friedrich Engels. Revised and amplified according to the 4th
German ed. by Ernest Untermann. Chicago: Charles Kerr & Co.
The Economist, Sept. 5th-11th 1998, London (Volume 348, Number 8084).
Tan, Amy (2001). The Bonesetter’s Daughter. London: Flamingo.
Worrell, Eric. (1963). Reptiles of Australia. Sydney and London: Angus and Robertson.

(b) Scientific works


Abercrombie, David (1967). Elements of General Phonetics. Edinburgh and Chicago:
Edinburgh University Press.
Auer, Peter (1992). “Introduction: John Gumperz’ approach to contextualization”. In
Peter Auer and Aldo di Luzio (eds.), The Contextualization of Language.
Amsterdam/Philadelphia: John Benjamins, pp. 1-37.
Bakhtin, Mikhail (1973 [1929]). Problems of Dostoevsky’s Poetics. R. W. Rotsel (trans.).
Ann Arbor: Ardis.
Bakhtin, Mikhail (1981 [1975]). “Discourse in the novel”. In Michael Holquist (ed.), The
Dialogic Imagination: Four essays. Austin: University of Texas Press, pp. 259-422.
Bakhtin, Mikhail (1986). “The problem of speech genres and text types”. In Caryl
Emerson and Michael Holquist (eds.). Vern W. McGee (trans.). Speech Genres
and Other Late Essays. Austin: University of Texas Press, pp. 60-102.
Baldry, Anthony (2000a). “English in a visual society: comparative and historical
dimensions in multimodality and multimediality”. In Anthony Baldry (ed.),
Multimodality and Multimediality in the Distance Learning Age. Campobasso: Palladino
Editore, pp. 41-89.
Baldry, Anthony (2000b). “Introduction”. In Anthony Baldry (ed.), Multimodality and
Multimediality in the Distance Learning Age. Campobasso: Palladino Editore, pp. 11-39.
Baldry, Anthony (2004). “Phase and transition, type and instance: patterns in media
texts as seen through a multimodal concordancer”. In Kay O’Halloran (ed.),
Multimodal Discourse Analysis: Systemic functional perspectives. London and New York:
Continuum, pp. 83-108.
Baldry, Anthony, Beltrami, Michele (2005). “The MCA Project: concepts and tools in
multimodal corpus linguistics”. In Maj Asplund Carlsson, Anne Løvland and
Gun Malmgren (eds.), Multimodality: Text, culture and use. Proceedings of the Second
International Conference on Multimodality. Kristiansand: Agder University
College/Norwegian Academic Press, pp. 79-108.
Baldry, Anthony, Thibault, Paul J. (2001). “Towards multimodal corpora”. In Guy
Aston and Lou Burnard (eds.), Corpora in the Description and Teaching of English.
Bologna: CLUEB, pp. 87-102.
Baldry, Anthony, Thibault, Paul J. (2005). “Multimodal corpus linguistics”. In Geoff
Thompson and Susan Hunston (eds.), System and Corpus: Exploring connections.
London and New York: Equinox, pp. 164-183.
Barthes, Roland (1967). Système de la Mode. Paris: Éditions du Seuil.
Bateson, Gregory (1973 [1972]). “A theory of play and fantasy”. In Gregory Bateson,
Steps to an Ecology of Mind. London and New York: Granada, pp. 150-166.
Bateson, Gregory (1987 [1951]). “Information and codification: a philosophical
approach”. In Jurgen Ruesch and Gregory Bateson, Communication: The social
matrix of psychiatry. New York and London: Norton & Co., pp. 168-211.
Beattie, Geoffrey W. (1981). “Sequential temporal patterns of speech and gaze in dia-
logue”. In Adam Kendon (ed.), Nonverbal Communication, Interaction and Gesture.
The Hague and New York: Mouton, pp. 297-320.
Beaugrande, Robert de (1997). New Foundations for a Science of Text and Discourse:
Cognition, communication and the freedom of access to knowledge and society.
Norwood, NJ: Ablex.
Bereiter, Carl (1997). “Situated cognition and how to overcome it”. In David Kirshner
and James A. Whitson (eds.), Situated Cognition: Social, semiotic, and psychological
perspectives. Mahwah, NJ and London: Lawrence Erlbaum, pp. 281-300.
Bernstein, Basil (1990 [1981]). “Codes, modalities, and the process of cultural repro-
duction: a model”. In Basil Bernstein, Class, Codes and Control, Volume IV: The
structuring of pedagogic discourse. London and New York: Routledge, pp. 13-62.
Birdwhistell, Ray L. (1952). Introduction to Kinesics: An annotation system for analysis of body
motion and gesture. Kentucky: University of Louisville, Dept. of Psychology and
Social Anthropology.
Birdwhistell, Ray L. (1972 [1961]). “Paralanguage twenty-five years after Sapir”. In John
Laver and Sandy Hutcheson (eds.), Communication in Face to Face Interaction: Selected
readings. Harmondsworth, Middlesex: Penguin, pp. 82-100.
Bühler, Karl (1990 [1934]). Theory of Language: The representational function of language.
Donald Fraser Goodwin (trans.). Amsterdam/Philadelphia: Benjamins.
Cheong, Yin Yuen (2004). “The construal of ideational meaning in print advertise-
ments”. In Kay O’Halloran (ed.), Multimodal Discourse Analysis: Systemic functional
perspectives. London and New York: Continuum, pp. 163-195.
Cook, Guy (2001 [1992]). The Discourse of Advertising. London and New York: Routledge.
Couper-Kuhlen, Elizabeth (1992). “Contextualizing discourse: the prosody of interac-
tive repair”. In Peter Auer and Aldo di Luzio (eds.), The Contextualization of
Language. Amsterdam/Philadelphia: John Benjamins, pp. 337-64.
Crystal, David (1972). “The intonation system of English”. In Dwight Bolinger (ed.),
Intonation: Selected readings. Harmondsworth, Middlesex: Penguin, pp. 110-36.
Crystal, David (1982). Profiling Linguistic Disability. London: Arnold.
Daneš, František (1974). “Functional sentence perspective and the organisation of the
text”. In František Daneš (ed.), Papers on Functional Sentence Perspective. The
Hague: Mouton, pp. 106-28.
Daneš, František (1989). “Functional sentence perspective and text connectedness”. In
Maria Elizabeth Conte, János Petöfi and Emel Sözer (eds.), Text and Discourse
Connectedness. Amsterdam/Philadelphia: John Benjamins, pp. 23-31.
Darwin, Charles (1955 [1872]). The Expression of the Emotions in Man and Animals.
New York: Philosophical Library.
Davidse, Kristin (1992). “A semiotic approach to relational clauses”. In Occasional
Papers in Systemic Linguistics 6: 99-131.
Echard, William (1996). “Working paper on the notion of style, by way of auditory stream-
ing and social semiotics”. Department of English, York University, Toronto: Mimeo.
Firth, John R. (1957 [1934]). “The use and distribution of certain English sounds: pho-
netics from a functional point of view”. In John R. Firth, Papers in Linguistics
1934-1951. London and Oxford: Oxford University Press, pp. 34-46.
Firth, John R. (1957 [1950]). “Personality in language and society”. In John R. Firth,
Papers in Linguistics 1934-1951. London and Oxford: Oxford University Press,
pp. 177-189.
Fuller, Gillian (1998). “Cultivating science: negotiating discourse in the popular texts of
Stephen Jay Gould”. In James R. Martin and Robert Veel (eds.), Reading Science:
Critical and functional perspectives on discourses of science. London and New York:
Routledge, pp. 35-62.
Gibson, James J. (1986 [1979]). The Ecological Approach to Visual Perception. Hillsdale, NJ
and London: Lawrence Erlbaum.
Goffman, Erving (1985 [1976]). Gender Advertisements. London and Basingstoke:
Macmillan.
Goodman, Sharon (1996). “Visual English”. First part of Chapter 2 in Sharon
Goodman and David Graddol (eds.), Redesigning English: New texts, new identities.
London and New York: Routledge, pp. 38-72.
Goodwin, Charles (1994). “Professional vision”. In American Anthropologist 96 (3): 606-33.
Goodwin, Charles, Goodwin, Marjorie Harness (1992). “Context, activity and partici-
pation”. In Peter Auer and Aldo di Luzio (eds.), The Contextualization of
Language. Amsterdam/Philadelphia: John Benjamins, pp. 77-99.
Gregory, Michael (1995). “Generic expectancies and discoursal surprises: John Donne’s
The Good Morrow”. In Peter H. Fries and Michael Gregory (eds.), Discourse in
Society: Systemic functional perspectives. Meaning and choice in language: Studies for Michael
Halliday. Norwood, NJ: Ablex, pp. 67-84.
Gregory, Michael (2002). “Phasal analysis within communication linguistics: two con-
trastive discourses”. In Peter Fries, Michael Cummings, David Lockwood, and
William Spruiell (eds.), Relations and Functions in Language and Discourse. London:
Continuum, pp. 316-345.
Greimas, Algirdas Julien (1966). Sémantique structurale. Paris: Larousse.
Gumperz, John J., Berenz, Norine (1993). “Transcribing conversational exchanges”. In
Jane A. Edwards and Martin D. Lampert (eds.), Talking Data: Transcription and
coding in discourse research. Hillsdale, NJ: Lawrence Erlbaum, pp. 91-121.
Hall, Edward T. (1972 [1963]). “A system for the notation of proxemic behaviour”. In
John Laver and Sandy Hutcheson (eds.), Communication in Face to Face Interaction.
Harmondsworth: Penguin, pp. 247-273.
Halliday, M.A.K. (1978). Language as Social Semiotic: The social interpretation of language and
meaning. London: Arnold.
Halliday, M.A.K. (1979). “Modes of meaning and modes of expression: types of gram-
matical structure and their determination by different semantic functions”. In David
J. Allerton, Edward Carney and David Holdcroft (eds.), Function and Context in
Linguistic Analysis: A Festschrift for William Haas. Cambridge: Cambridge
University Press, pp. 57-79.
Halliday, M.A.K. (1989). “Part A.”. In M.A.K. Halliday and Ruqaiya Hasan (eds.),
Language, Context and Text: Aspects of language in a social-semiotic perspective. Oxford:
Oxford University Press, pp. 1-49.
Halliday, M.A.K. (1992). “How do you mean?”. In Martin Davies and Louise Ravelli
(eds.), Advances in Systemic Linguistics: Recent theory and practice. London and New
York: Pinter, pp. 20-35.
Halliday, M.A.K. (1994 [1985]). An Introduction to Functional Grammar. Second edition.
London and Melbourne: Arnold.
Halliday, M.A.K., Matthiessen, Christian (2004). An Introduction to Functional Grammar.
Third edition. London: Arnold.
Handel, Stephen (1993 [1989]). Listening: An introduction to the perception of auditory
events. Cambridge, MA: The MIT Press.
Harré, Rom (1990). “Exploring the human Umwelt”. In Roy Bhaskar (ed.), Harré and
His Critics: Essays in honor of Rom Harré with his commentary on them. Oxford:
Blackwell, pp. 297-364.
Harris, Roy (1995). Signs of Writing. London and New York: Routledge.
Hasan, Ruqaiya (1978). “Text in the systemic-functional model”. In Wolfgang Dressler (ed.),
Current Trends in Textlinguistics. Berlin & New York: Walter de Gruyter, pp. 228-46.
Hasan, Ruqaiya (1980). “The texture of a text”. In M.A.K. Halliday and Ruqaiya
Hasan, Text and Context: Aspects of language in a social-semiotic perspective. Tokyo:
Sophia Linguistica, The Graduate School of Languages and Linguistics, Sophia
University, pp. 43-59.
Hjelmslev, Louis (1961 [1943]). Prolegomena to a Theory of Language. Francis J. Whitfield
(trans.). Revised English edition. Madison, Milwaukee and London: The
University of Wisconsin Press.
Jakobson, Roman (1960). “Concluding statement: linguistics and poetics”. In Thomas
A. Sebeok (ed.), Style in Language. Cambridge, MA: The MIT Press, pp. 350-377.
Johnston, Trevor (1992). “The realization of the linguistic metafunctions in a sign lan-
guage”, Social Semiotics 2, 1: 1-43.
Kanizsa, Gaetano (1980). Grammatica del Vedere: Saggi su percezione e gestalt. Bologna: Il
Mulino.
Kanizsa, Gaetano (1991). Vedere e Pensare. Bologna: Il Mulino.
Kauffman, Stuart A. (1993). The Origins of Order: Self-organization and selection in evolution.
New York and Oxford: Oxford University Press.
Kok, Arthur (2004). “Multisemiotic mediation in hypertext”. In Kay O’Halloran (ed.),
Multimodal Discourse Analysis: Systemic functional perspectives. London and New York:
Continuum, pp. 131-159.
Kendon, Adam (1981). “Clouds, camels, chalk, and cheese”. Semiotica 36, 3-4: 365-80.
Kress, Gunther (1998). “Visual and verbal modes of representation in electronically
mediated communication: the potentials of new forms of text”. In Ilana Snyder
(ed.), Page to Screen: Taking literacy into the electronic era. London and New York:
Routledge, pp. 53-79.
Kress, Gunther, Van Leeuwen, Theo (1990). Reading Images. Geelong, Victoria: Deakin
University Press.
Kress, Gunther, Van Leeuwen, Theo (1996). Reading Images: The grammar of visual
design. London and New York: Routledge.
Kress, Gunther, Van Leeuwen, Theo (2001). Multimodal Discourse: The modes and media
of contemporary communication. London: Arnold.
Lamb, Sydney M. (1966). “Epilegomena to a theory of language”. Romance Philology
XIX, 4: 531-73.
Labov, William, Waletzky, Joshua (1967). “Narrative analysis”. In June Helm (ed.),
Essays on the Verbal and Visual Arts. Seattle: University of Washington Press, pp.
12-44.
Latour, Bruno (1994). “Une sociologie sans objet? Remarques sur l’interobjectivité”,
Sociologie du Travail 4: 587-607.
Lemke, Jay L. (1983). “Thematic analysis: systems, structures, and strategies”, Semiotic
Inquiry/Recherches Sémiotiques (RSSI) 3(2): 159-87.
Lemke, Jay L. (1985). “Ideology, intertextuality, and the notion of register”. In James D.
Benson and William S. Greaves (eds.), Systemic Perspectives on Discourse, Volume
1: Selected theoretical papers from the 9th International Systemic Workshop. Norwood,
NJ: Ablex, pp. 275-94.
Lemke, Jay L. (1988). “Discourses in conflict: heteroglossia and text semantics”. In
James D. Benson and William S. Greaves (eds.), Systemic Functional Approaches to
Discourse: Selected papers from the 12th International Systemic Workshop. Norwood, NJ:
Ablex, pp. 29-50.
Lemke, Jay L. (1989). “Semantics and social values”. Word 40,1-2: 37-50 [Special issue:
James D. Benson et al. (eds.), Systems, Structures, and Discourse].
Lemke, Jay L. (1990a). “Technical discourse and technocratic ideology”. In M.A.K.
Halliday, John Gibbons and Howard Nicholas (eds.), Learning, Keeping and Using
Language, Vol II: Selected papers from the 8th World Congress of Applied Linguistics, Sydney,
16-21 August 1987. Amsterdam/Philadelphia: John Benjamins, pp. 435-60.
Lemke, Jay L. (1990b). Talking Science: Language, learning, and values. Norwood, NJ:
Ablex.
Lemke, Jay L. (1992). “Semantics, semiotics, and grammatics: an ecosocial view”. Paper
presented at the 19th International Systemic-Functional Congress, Macquarie
University, Sydney.
Lemke, Jay L. (1998). “Multiplying meaning: visual and verbal semiotics in scientific
text”. In James R. Martin and Robert Veel (eds.), Reading Science: Critical and functional
perspectives on discourses of science. London and New York: Routledge, pp. 87-113.
Lemke, Jay L. (2000). “Material sign processes and emergent ecosocial organization”.
In Peter Bøgh Andersen, Claus Emmeche, Niels Ole Finnemann, and Peder
Voetmann Christiansen (eds.), Downward Causation: Minds, bodies and matter.
Aarhus: Aarhus University Press, pp. 181-213.
Lewontin, Richard (2001 [2000]). The Triple Helix: Gene, organism and environment.
Cambridge, MA and London, England: Harvard University Press.
Malinowski, Bronislaw (1923). “Supplement I: The problem of meaning in primitive
languages”. In Charles K. Ogden and Ivor A. Richards (eds.), The Meaning of
Meaning. New York: Harcourt, Brace & World Inc., pp. 296-336.
Malinowski, Bronislaw (1935). “Part IV: An ethnographic theory of language and some
practical considerations”. Coral Gardens and their Magic. A study of the methods of
tilling the soil and of agricultural rites in the Trobriand Islands, Volume 2: The lan-
guage of magic and gardening. New York and Chicago: American Book Company.
Malinowski, Bronislaw (1944). A Scientific Theory of Culture and other Essays. Chapel
Hill: The University of North Carolina Press.
Martin, James R. (1985a). Factual Writing: Exploring and challenging social reality. Geelong,
Victoria: Deakin University Press.
Martin, James R. (1985b). “Process and text: two aspects of semiosis”. In James D.
Benson and William S. Greaves (eds.), Systemic Perspectives on Discourse, Volume 1:
Selected theoretical papers from the 9th International Systemic Workshop. Norwood, NJ:
Ablex, pp. 248-74.
Martin, James R. (1991). “Nominalization in science and humanities: distilling knowledge
and scaffolding text”. In Eija Ventola (ed.), Functional and Systemic Linguistics:
Approaches and uses. Berlin and New York: Mouton de Gruyter, pp. 307-37.
Martin, James R. (1992). English Text: System and Structure. Amsterdam/Philadelphia:
John Benjamins.
Martin, James R. (1993). “Technology, bureaucracy and schooling: discursive resources
and control”. In M.A.K. Halliday (Guest ed.). Cultural Dynamics VI, 1-2: 84-130.
Martin, James R. (1994). “Macro-genres: the Ecology of the Page”. Network 21: 29-52.
Martin, James R. (1995). “Text and Clause: Fractal Resonance”. Text 15 (1): 5-42.
Martin, James R. (1997). “Analysing genre: functional parameters”. In Frances Christie
and James R. Martin (eds.), Genre and Institutions: Social processes in the workplace
and school. London and Washington: Cassell, pp. 3-39.
Martin, James R., Christie, Frances, Rothery, Joan (1987). “Social processes in educa-
tion: A reply to Sawyer and Watson (and others)”. In Ian Reid (ed.), The Place of
Genre in Learning: Current debates. Geelong, Victoria: Deakin University Press,
pp. 46-57.
Martin, James R., Rose, David (2003). Working with Discourse: Meaning beyond the clause.
London and New York: Continuum.
Martinec, Radan (1998). “Cohesion in action”. Semiotica 120-1/2: 161-80.
Martinec, Radan (2000). “Rhythm in multimodal texts”. Leonardo 33, 4: 289-97.
Mathiot, Madeleine (1983). “Toward a meaning-based theory of face-to-face interac-
tion”. International Journal of the Sociology of Language 43: 5-56.
McGregor, William B. (1997). Semiotic Grammar. Oxford: Clarendon Press.
McNeill, David (1992). Hand and Mind: What gestures reveal about thought. Chicago and
London: The University of Chicago Press.
Merleau-Ponty, Maurice (1992 [1962]). Phenomenology of Perception. Colin Smith (trans.).
London: Routledge.
Nalon, Elena (1997). Multimodal meaning making: a visual and linguistic analysis of adver-
tising texts. Tesi di Laurea, Facoltà di Lingue e Letterature Straniere, Università
Ca’ Foscari di Venezia.
Nalon, Elena (2000). “Multimodal meaning making: perfume advertisements and the
human body”. In Anthony Baldry (ed.), Multimodality and Multimediality in the
Distance Learning Age. Campobasso: Palladino Editore, pp. 213-25.
Ochs, Elinor (1979). “Transcription as theory”. In Elinor Ochs and Bambi B.
Schieffelin (eds.), Developmental Pragmatics. New York and London: Academic
Press, pp. 43-72.
O’Connell, Daniel C., Kowal, Sabine (1995). “Transcription systems for spoken discourse”.
In Jef Verschueren, Jan-Ola Östman and Jan Blommaert (eds.), Handbook
of Pragmatics: Manual. Amsterdam/Philadelphia: John Benjamins, pp. 646-56.
O’Halloran, Kay (2004). “Visual semiosis in film”. In Kay O’Halloran (ed.), Multimodal
Discourse Analysis: Systemic functional perspectives. London and New York:
Continuum, pp. 109-130.
O’Halloran, Kay (2005). Mathematical Discourse: Language, symbolism and visual images.
London and New York: Continuum.
O’Malley, J. Michael, Chamot, Anna Uhl (1990). Learning Strategies in Second Language
Acquisition. Cambridge: Cambridge University Press.
O’Toole, Michael (1994). The Language of Displayed Art. London: Leicester University
Press.
Peirce, Charles Sanders (1985). “Logic as semiotic: The theory of signs”. In Robert E.
Innis (ed.), Semiotics: An introductory anthology. London: Hutchinson, pp. 4-23.
Pike, Kenneth L. (1967). Language in Relation to a Unified Theory of the Structure of
Human Behavior. Second, revised edition. The Hague and Paris: Mouton.
Poynton, Cate (1985). Language and Gender: Making the difference. Geelong, Victoria:
Deakin University Press.
Rumelhart, David E. (1975). “Notes on a schema for stories”. In Daniel G. Bobrow
and Allan Collins (eds.), Representation and Understanding: Studies in cognitive science.
New York: Academic Press, pp. 211-236.
Saint-Martin, Fernande (1985). Introduction to a Semiology of Visual Language. Victoria
University, Toronto: Monographs, Working Papers and Prepublications of the
Toronto Semiotic Circle, Vol. 3.
St. Julien, John (1997). “Explaining learning: the research trajectory of situated cognition
and the implications of connectionism”. In David Kirshner and James A.
Whitson (eds.), Situated Cognition: Social, semiotic, and psychological perspectives.
Mahwah, NJ and London: Lawrence Erlbaum, pp. 261-79.
Salthe, Stanley N. (1993). Development and Evolution: Complexity and change in biology.
Cambridge, MA and London: The MIT Press.
Saussure, Ferdinand de (1993). Eisuke Komatsu (ed.), Cours de Linguistique Générale:
Premier et troisième cours d’après les notes de Riedlinger et Constantin. Collection
Recherches Université Gakushuin no 24. Tokyo: Université Gakushuin.
Schank, Roger, Abelson, Robert (1977). Scripts, Plans, Goals, and Understanding.
Hillsdale, NJ: Lawrence Erlbaum.
Scheflen, Albert E. (1972). Body Language and Social Order: Communication as behavioral
control. Englewood Cliffs, NJ: Prentice-Hall.
Scheflen, Albert E. (1973). Communicational Structure: Analysis of a psychotherapy trans-
action. Bloomington, IN: Indiana University Press.
Schoenberg, Arnold (1975). Style and Idea: Selected writings of Arnold Schoenberg.
Leonard Stein (ed.), Leo Black (trans.). London: Faber & Faber.
Silverstein, Michael (1992). “The indeterminacy of contextualization: when is enough
enough?”. In Peter Auer and Aldo di Luzio (eds.), The Contextualization of
Language. Amsterdam/Philadelphia: John Benjamins, pp. 55-76.
Silverstein, Michael, Urban, Greg (eds.) (1996). Natural Histories of Discourse. Chicago:
University of Chicago Press.
Sinclair, John (1991). Corpus, Concordance, Collocation. Oxford and Singapore: Oxford
University Press.
Taylor Torsello, Carol, Baldry, Anthony (2005). “SFL in text-based, web-enhanced lan-
guage study”. In Ruqaiya Hasan, Christian Matthiessen and Jonathan Webster
(eds.), Continuing Discourse On Language: A functional perspective, Volume 1.
London: Equinox, pp. 311-42.
Tesnière, Lucien (1965). Éléments de Syntaxe structurale. Paris: Klincksieck.
Thibault, Paul J. (1986). “Thematic system analysis and the construction of knowledge
and belief in discourse: the headlines in two Italian newspaper texts”. In Text,
Discourse, and Context: A social semiotic perspective. Victoria University, Toronto:
Monographs, Working Papers and Prepublications of the Toronto Semiotic
Circle, Vol. 3, pp. 44-91.
Thibault, Paul J. (1988-1989). Grammar, Text, and Discourse Genre: An advanced introduc-
tion to the systemic-functional approach. Department of Linguistics, University of
Sydney: Mimeo.
Thibault, Paul J. (1989). “Genres, social action, and pedagogy: towards a critical social
semiotic account”. Southern Review (Australia) 22, 3: 338-62.
Thibault, Paul (1990). “Questions of genre and intertextuality in some Australian tele-
vision advertisements”. In Rema Rossini Favretti (ed.), The Televised Text.
Bologna: Pàtron, pp. 89-131.
Thibault, Paul (1991a). “Grammar, technocracy, and the noun: technocratic values and
cognitive linguistics”. In Eija Ventola (ed.), Functional and Systemic Linguistics:
Approaches and uses. Berlin and New York: Mouton de Gruyter, pp. 281-305.
Thibault, Paul (1991b). Semiotics as Social Praxis: Text, social meaning-making and
Nabokov’s “Ada”. Theory and History of Literature series, Vol. 74. Minneapolis
and Oxford: University of Minnesota Press.
Thibault, Paul J. (1994a). “Text and/or context?”. State-of-the-Art article. In The
Semiotic Review of Books (Toronto) 4, 1 (May 1994): 10-12.
Thibault, Paul J. (1994b). “Intertextuality”. In R. E. Asher and J. M. Y. Simpson (eds.),
The Encyclopedia of Language and Linguistics, Volume 4, pp. 1751-54.
Thibault, Paul J. (1997a). Re-reading Saussure: The dynamics of signs in social life. London and
New York: Routledge.
Thibault, Paul J. (1997b). “Contextualization and social meaning-making practices”.
Discussing Communication Analysis 1: 31-47.
Thibault, Paul J. (1998a). “Graphology and visual semiosis”. Dip. di Studi Linguistici e
Letterari, Europei Postcoloniali, University of Venice: Mimeo.
Thibault, Paul J. (1998b). “Multimodality”. In Paul Bouissac (ed.), The Encyclopedia of
Semiotics. New York and Oxford: Oxford University Press, pp. 427-9.
Thibault, Paul J. (1999a). “Communicating and interpreting relevance through dis-
course negotiation: an alternative to relevance theory”. Journal of Pragmatics 31,
5: 557-94.
Thibault, Paul J. (1999b). “Putting Humpty Dumpty’s theory of meaning back together
again: Can Saussure help?”. Belgian Essays on Language and Literature (BELL) 9: 7-34.
[Liège: Belgian Association of Anglicists in Higher Education].
Thibault, Paul J. (2000a). “The multimodal transcription of a television advertisement:
theory and practice”. In Anthony Baldry (ed.), Multimodality and Multimediality in
the Distance Learning Age. Campobasso: Palladino Editore, pp. 311-385.
Thibault, Paul J. (2000b). “The dialogical integration of the brain in social semiosis:
Edelman and the case for downward causation”. Mind, Culture, and Activity 7, 4:
291-311.
Thibault, Paul J. (2002). “Interpersonal meaning and the discursive construction of
action, attitudes and values: the global modal program of one text”. In Peter
Fries, Michael Cummings, David Lockwood and William Spruiell (eds.), Relations
and Functions in Language and Discourse. London: Continuum.
Thibault, Paul J. (2003). “Contextualization and social meaning-making practices”. In
Susan Eerdmans, Carlo Prevignano, Paul J. Thibault (eds.), Discussing John J.
Gumperz. Amsterdam/Philadelphia: Benjamins, pp. 41-61.
Thibault, Paul J. (2004a). Brain, Mind, and the Signifying Body: An ecosocial semiotic theory.
London and New York: Continuum.
Thibault, Paul J. (2004b). Agency and Consciousness in Discourse: Self-other dynamics as a
complex system. London and New York: Continuum.
Thompson, Sandra and Mann, William C. (1987). “Antithesis: a study in clause combin-
ing and discourse structure”. In Ross Steele and Terry Threadgold (eds.),
Language Topics: Essays in honour of Michael Halliday, Vol. 2, Amsterdam/
Philadelphia: John Benjamins, pp. 359-81.
Turrisendo, Fabio (2004). TV Drink Advertisements: A Corpus Analysis Using MCA
(Multimodal Corpus Authoring System). Unpublished degree thesis, Facoltà di
Lettere e Filosofia, Dipartimento di Lingue e Letterature Anglo-germaniche e
Slave, University of Padua.
Van Leeuwen, Theo (1985). “Rhythmic structure of the film text”. In Teun A. van Dijk
(ed.), Discourse and Communication: New approaches to the analysis of mass media
discourse and communication. Berlin and New York: Walter de Gruyter, pp. 216-32.
Van Leeuwen, Theo (1990). “Conjunctive structure in documentary film and televi-
sion”. Continuum 5(1): 76-115.
Van Leeuwen, Theo (1991). “The sociosemiotics of easy listening music”. Social
Semiotics 1, 1: 67-80.
Van Leeuwen, Theo (1996). “Moving English: The visual language of film”. Reading B
in Chapter 2 in Sharon Goodman and David Graddol (eds.), Redesigning
English: New texts, new identities. London and New York: Routledge, pp. 81-105.
Van Leeuwen, Theo (1999). Speech, Music, Sound. London: Macmillan.
Ventola, Eija (1987). The Structure of Social Interaction: A systemic approach to the semi-
otics of service encounters. London: Pinter.
Ventola, Eija, Charles, Cassily, Kaltenbacher, Martin (eds.) (2004). Perspectives on
Multimodality. Amsterdam/Philadelphia: John Benjamins.
Appendix II: Multimodal Transcription of the
Mitsubishi advertisement (T= time in seconds)
Camera Position and
T Visual Frame Visual Transitivity Soundtrack S, P, T
Movement
Pa: Agent: car; Pa: Agent: car; music: orchestra
Pr: Movement Vector: L-R; Pr: Movement Vector: L-R; Tempo: slow Phase 1
01 Pa: Goal: telephone booth Pa: Goal: telephone booth Volume: low
Shot 1

Pa: gloved hand; CD: very close;


Pr: movement vector: L-R; HA: frontal; Cut
Pa: Goal: telephone receiver VA: MH;
CM: stationary Shot 2

02

Pa: Sayer: woman; CD: close; voice: female: I've


Pr: Verbal; HA: frontal; got the money; Cut
03 VA: medium;
CM: stationary Shot 3

Pa: Sayer: man; CD: very close; voice: man: bring


Pr: Verbal; HA: frontal; it tomorrow Cut
04 VA: medium;
CM: stationary Shot 4

Pa: Gazer: man; CD: very close; then distant;


Pan to
Pr: Gaze Vector; HA: oblique;
left
Pa: Target: hostage VA: medium;
CM: left pan: track gaze vector
Shot 5
Pa: Actor: woman; sound: ambient: Cut
Pr: Movement Vector: walk; door of telephone
06 Pa: Goal: car: stationary booth closing: Phase 2
loud; resonant
Shot 6
Pa: Goal: car; CD: very close;
Pr: Movement Vector: walk: HA: frontal; Cut
07 Pa: Actor: woman: implied VA: medium;
CM: left pan Shot 7

Pa: Actor: woman; CD: very close; Left pan


Pr: Connection Vector: HA: frontal; +
grasp: steering wheel; VA: medium;
merging
Pa: Result: car drive CM: stationary: tracks move-
ment of car along road Shot 8

As before +: CD: very close: music: orchestra


Pa: Senser: woman; HA: frontal; as before
09 Pr: Mental; VA: medium; Tempo: fast
Pa: Phenomenon: projected CM: stationary: tracks move- Volume: louder
destination: Wien ment of car along road
262 Multimodal Transcription and Text Analysis: Appendix II

Camera Position and


T Visual Frame Visual Transitivity Soundtrack S, P, T
Movement
As before +: CD: distant: Fade:
Pa: Agent: car; HA: frontal;
Pr: Movement Vector: L-R VA: medium; Shot 8
+ towards viewer CM: stationary: tracks move- �
ment of car along road Shot 9

CD: close:
HA: frontal;
VA: medium;
CM: stationary: tracks move-
ment of car along road
Pa: Agent: car: implied; CD: very close;
Pr: Movement Vector: L-R; HA: frontal; Fade
11 Pa: Goal: balloons VA: medium;
CM: vector: movement Shot 10
through balloons
Pa: Actor: woman; CD: distant; Cut
Pr: Movement Vector: L-R; HA: frontal:
Pa: Goal: telephone booth: VA: medium; Phase 3
12 implied CM: stationary
Shot 11
Pa: Actor: woman: arm; CD: close;
Pr: Movement Vector: L-R; HA: oblique; Cut
14 Pa: Goal: telephone VA: medium;
receiver: stationary CM: stationary Shot 12

Pa: Sayer: woman; CD: close; voice: female:


Pr: Verbal HA: frontal; where now?
VA: medium;
CM: stationary
Pa: Sayer: man; CD: close; voice: male:
Pr: Verbal HA: frontal; Prague Cut
16 VA: medium;
CM: stationary Shot 13

Pa: Agent: shark fin; CD: distant; medium close;


Pr: Movement Vector: L-R; HA: frontal; Cut
17 Pa: Goal: suspended man VA: medium;
CM: stationary Shot 14

Pa: Goal: telephone CD: very close; ambient sound:


receiver; HA: frontal; woman hangs up Cut
Pr: Movement Vector; VA: medium; telephone
Actor: hand replaces CM: stationary receiver: loud; Shot 15
receiver resonant
Pa: Actor: woman; CD: medium distant;
Pr: Movement Vector: L-R; HA: oblique; Fade
19 Pa: Goal: car VA: medium;
CM: stationary Shot 17
263

Camera Position and


T Visual Frame Visual Transitivity Soundtrack S, P, T
Movement
Pa: Agent: car; CD: medium distant;
Fade
Pr: Movement Vector: R-L; HA: oblique;
20 Pa: Prague: implied VA: medium;
Shot 17
CM: stationary

21

CD: distant;
HA: frontal;
VA: medium;
CM: stationary
Pa: Actor: hand; CD: very close; Merge
Pr: Connection Vector: HA: frontal; Shot 18
22 hold; VA: medium; (super-
Pa: Goal: passport CM: stationary imposed on
17)

Pa: Agent: car; CD: distant; Shot 19


Pr: Movement Vector: R-L; HA: frontal; (super-
towards viewer VA: medium; imposed on
CM: stationary 18)

23 Fade

CD: very close;


HA: frontal;
VA: medium;
CM: stationary

CD: very close; voice: off screen


Phase 5
HA: frontal; male presenter: The
VA: medium; Mitsubishi Carisma
CM: downwards music: faded
Shot 20
24 Pa: Sayer: woman; doesn't need
Pr: Verbal;
Circ: In phone booth

Pa: Sayer: director; CD: very close; voice: male: film


Pr: Verbal HA: frontal; director: cut
VA: medium; Shot 21
26 CM: stationary

voice: male: off


screen presenter: an
implausible

Pa: Sayer: woman; CD: very close; voice: female: what!


Pr: Verbal HA: frontal; (angrily) Cut
27 VA: medium;
CM: stationary Shot 22

Pa: Goal: phone booth; CD: very close; voice: male: off screen
Pr: Movement Vector; HA: frontal; presenter: plot Cut
28 Agent: implied VA: medium;
CM: stationary Shot 23

Pa: Goal: phone booth; CD: distant; voice: male: off screen
Pr: Movement Vector; HA: frontal; presenter: shot by an over- Cut
Agent: men in film studio VA: medium; paid
CM: stationary Shot 24

CD: distant; sound: ambient: telephone


HA: frontal; booth being dragged aside
VA: medium; by stage hands
CM: pan out

Pa: Actor: woman; sound:


Pr: Movement Vector: synthesised indicator of
woman walks towards something going awry
viewer; waving arms
Pa: Actor: man; CD: close; voice: male: off screen Cut
Pr: Movement Vector: arms; HA: oblique; presenter: director
30 Pa: Goal: car VA: medium; Phase 6
CM: stationary Shot 25
Pa: Agent: car; voice: male: off screen
Pr: Movement Vector: car presenter: The Mitsubishi
moves back onto ramp

CD: close; Carisma


HA: frontal; Cut
32 VA: medium;
CM: stationary Shot 26

some cars have it,


some don't

Pa: Sayer: hostage; voice: male: hostage: Cut


Pr: Verbal excuse me
35 Phase 7
Shot 27
music: synthesised Cut
36 Phase 8
Shot 28

37
Shot 29

blackout
38
Index
A 182, 185, 189, 203-5, 210, 213, 217, 230,
Abercrombie, David 215, 217, 219, 221, 222 235, 245-6
accessibility 93, 97 closeness and close shot(s) 5, 24, 39, 40, 42, 70, 85, 120,
accent 52, 184, 208, 216-7 125, 170, 172, 184, 187, 196, 197, 201, 231,
acoustic focus 52 232, 239, 240, 241, 248
action potential 44, 46, 104, 138, 140, 146-7, 150, 152 clothes 20, 188
active and passive 39 cluster(s) 1, 4, 9, 11, 12, 21-34, 36, 38-44, 46, 48, 54, 60, 63,
Adobe Premiere 182 81, 115, 113, 121, 124, 127, 131, 145, 146,
advertisement(s) 5, 7, 17, 20, 33, 37, 38-43, 48-55, 105, 107, 151, 160, 166, 174, 214, 218
122, 165-248 cluster analysis 4, 11, 21, 24-31, 39, 48, 121, 131
ambient sound(s) 35, 53, 178, 180, 211 co-contextualisation 21, 83, 132, 249
animation 43, 60, 105, 127 codeployment of resources 4, 5, 6, 11, 20, 21, 47, 55, 61,
Asterix 37 102, 114, 118, 129, 139, 167, 178, 184, 217,
apprenticeship 93, 97 223, 247, 249
attitudinal and evaluative meanings and/or stances 37, 38, 176 coding orientation 41, 99, 200
attractors 117-8, 183 coherence 116, 187, 188, 205, 218, 239-42
attribution 85, 97, 137 cohesion 22, 179, 193, 211
Audi Quattro 167, 170 cohesive chains, elements and/or semantic ties 179-80, 187
audio files 109 cohyponyms 81, 88, 115, 136
auditory array 175, 210 collocation 198
authority 93 colour 29, 53, 58, 63, 76, 79, 91-3, 98, 99, 100, 102, 119, 124,
125, 130-5, 138, 140, 149, 151, 153-4, 159,
B 187, 198-202
backgrounding 38, 121, 205, 211 columns 30, 64, 65, 71, 74, 75, 77, 82, 127, 174, 178, 180-7,
Bakhtin, Mikhail 27, 31, 39, 42, 43, 69, 90, 96, 162, 211 190-4, 201, 202, 209, 210, 213, 214, 216-223,
Baldry, Anthony 12, 18, 34, 47, 48, 49, 51, 54, 80, 122, 166, 232, 241-2
167, 172, 225, 234, 248-9 comprehensibility 93
Bateson, Gregory 10, 17, 21, 83, 249 constituency relations 22, 234
body movement and/or position 21, 37, 172, 178, 180, 183, content stratum 22, 166, 182, 223-7, 237
185, 193, 202, 218, 219 content-form 236-7
Boo Bear text 38-44 content-substance 236-7
bottom bar 40-1, 44-6, 115, 131 context 1, 2, 3, 4, 6, 7, 11, 12, 18, 22, 24, 27, 53, 56, 80, 93,
British Museum 114, 119, 130-5, 138, 140, 151, 156, 159 96, 99, 100, 101, 111-2, 117-8, 124, 134-6,
broadcasting 165, 166 148, 165-7, 172, 178, 180, 207, 211, 212,
Bühler, Karl 83 217, 218, 222, 225, 227, 246-7, 249
context of culture 1-4, 6, 7, 18, 56, 111, 112, 165
C context of situation 2-7, 56, 111, 124, 165
call-outs 67 copatterning 50, 181, 183, 187, 195, 244, 247
camera: movement and/or position 172, 173, 187, 191, 193, corpora 18, 48, 248
194, 202, 240, 241 covariate ties 16, 138-140, 155, 187, 199
Capital 61-9, 210 cross-coupling 102
car advertisement(s) 5-7, 48-55, 104, 122, 170, 228-35, 238-44, cuts 50, 51, 190, 241
248
caricature 34, 37-8 D
Carrier (attributive clauses) 75, 85, 87, 97, 100, 137, 142 Daneš, František 74, 188
cartoon(s) 7-17, 20-1, 24, 34-8, 46, 60, 63, 120-1, 123, 125,
Darwin, Charles 58, 59, 60, 63, 65, 208
130, 131, 223
Davidse, Kristin 74, 85
cartoonist(s) 34, 37
deformation of body 207, 232
causal sequences and/or relations 12, 13, 22, 80, 117, 190,
deixis and/or deictic frames 10, 99
238, 239
dependency relations 22, 234-9
charts 57, 63, 65-7
depicted scene 7, 10, 11, 12, 14, 16, 120, 123, 124, 139, 145,
Chesapeake Bay text 28, 32-4, 37
147, 150, 151, 152, 187, 198, 223
children 31, 48, 57, 58, 63, 78, 102, 103, 104, 114, 116, 119,
depicted world 10, 11, 12, 17, 36, 125, 189, 191, 193, 194,
125, 130-1, 135, 138, 140, 151, 156, 159
195, 196, 197, 198, 202, 205, 224-5, 227, 240
cinema 38, 46, 107, 192
depiction 10, 12, 14, 18, 23, 37, 55, 80, 83, 91, 104, 111, 119,
classroom 77, 87, 93, 102, 114, 116, 120
124, 125, 138, 156, 205, 206, 223, 225-7, 232
classification and analysis 86
diagrams 19, 41, 57, 60, 63, 65, 68, 79, 92-3, 127, 114, 193
clause(s) and/or clause groups 22, 41, 55, 64, 71, 74-78, 82,
dialogic acts, activity, moves and/or responses 22, 149, 179,
84, 88, 96-99, 100-1, 127, 136-143, 145, 149,
181, 182, 220

direct speech 7, 10, 11, 101 formal pattern(s) 167


direction 35, 39, 61, 63, 66, 121, 170, 201, 203, 210, 235 frame(s) and framing border(s) 1, 7-11, 16-7, 40, 44-9, 51, 55,
discourse level 12, 16, 122, 166, 167, 223, 232, 238 99, 102, 122-3, 166-8, 170-4, 178, 184-7, 189,
discourse organisation 167, 176, 239 190, 193, 201, 202, 209, 225, 230-4, 237, 242,
discourse stratum 13, 223, 225, 233, 238 244
displacement 163, 187, 200 Fuller, Gillian 96
display 37, 44, 46, 48, 74, 78-81, 85, 86, 91, 123, 133, 134, functional relation(s) 129, 234
135, 136, 138, 147, 178, 182, 223-8, 232, 243 functional unit(s) 27, 50
dissolve(s) 190
distance 10, 39, 40, 67, 86, 123, 130, 159, 170, 172, 173, 178, G
184, 195, 196, 197, 198, 201, 203, 211, 240, gaze 18, 20, 21, 35-40, 42, 49, 51, 52, 167-173, 178, 186, 196,
241 200, 201, 230, 231, 235, 237, 240, 241
distinct temporal moments 13 gaze schemas 173
dog-chews-shoe cartoon 12-5 genre(s) 1, 2, 1, 3, 4, 7, 11, 12, 14, 17, 20, 26, 27, 30-9, 41,
dolly shot 194 42, 43, 48, 54, 56, 58, 61, 68, 71, 103, 107,
double-clicking 105 112-8, 120, 123, 125-9, 146, 156, 158-9, 160,
double page display 91 163-5, 173, 178, 180, 200, 222, 247, 249
drawing(s) 60, 91, 92, 93, 193 gesture(s) and gestural emblems 1, 6, 7, 18, 20, 21, 23, 48,
drop-down menu(s) 67, 105 111, 156, 173, 175, 176, 177, 178, 206, 208,
DVD 46, 109, 215, 216, 219
dyadic link(s) 170 Gibson, James 37, 100, 123, 163, 175, 189, 191-3, 197, 198,
dynamic transformation 189 200, 223-5
dynamism 66, 212 Given (information) 1, 40, 47, 61, 65, 69, 78-83, 86-9, 92,
100, 108, 109, 113, 114, 117, 120, 125, 127,
E 128, 137, 146, 149, 153, 156, 157, 159, 161,
economics 30, 61, 92 163, 167, 174, 178-90, 194, 198-9, 201-10
ecosocial scale 93 global patterning 187
elaboration 85, 87, 88, 97, 109, 110, 111, 235 grammar and grammatical structures 1, 14, 18, 37, 41, 55, 64,
end phase 5, 54 66-8, 71, 78, 85, 87, 122, 138, 181, 203-5,
enhancement 235 207, 223, 225, 227, 238
environment 2, 106, 112, 119, 124, 125, 128, 136, 148, 156, graph(s) 41, 79, 92-3
157, 160, 163, 175, 191, 192, 194, 202, 203, H
208, 210, 211, 220, 224, 225 Halliday, M.A.K. 3, 4, 22, 24, 41, 63, 75, 79, 85, 96, 97, 99,
Eskimo text 125, 166-71, 173 172, 189, 221, 235, 237, 238
evaluative orientations and/or stances 10, 11, 17, 22, 38, 39, hand-arm gestures and/or movements 4-7, 175
212 Harris, Roy 91
event sequence and/or structure 12-6, 113, 239, 244 Hasan, Ruqaiya 113, 179
exemplification 38, 39, 48, 85, 88, 97, 142 head nouns 64
exophoric tie(s) 77 higher-level meaning-making sequences 44
experiential meaning and/or metafunction 14, 16, 22, 39, 40, Hjelmslev, Louis 157, 166, 223-5, 236-7
55, 71, 74, 80, 82-7, 90, 100, 102, 122, 131, home page(s) 103, 108, 114-9, 159
137, 140, 141, 143, 145, 147, 149, 167, 202, horizontal relations and structures 26, 27, 30, 35, 39, 40, 44,
203, 204, 205, 208, 210, 222, 225, 230, 231 64, 66, 71, 81, 129, 157, 170, 172, 174, 186,
expertise and/or expert discourse 71, 93 189, 190, 194, 195, 230, 240
expression stratum 110, 224, 225, 227, 233, 236 human body 39, 173
expression-form 236-7 hyper-New 74, 75, 76, 77
expression-substance 236-7 hyperreal 37, 200
extension 127, 148, 189, 221, 235, 238 hypertext(s) 92, 106, 110, 117, 118, 126-9, 134-6, 140, 141,
external observer 10 143, 145-8, 152, 155-63, 242-4
F hypertextual element(s) and/or structure(s) 104, 106, 117,
facial expression(s) 37, 167, 175, 176 129, 136, 137, 138, 139, 140, 146, 158, 159,
fade-in(s), fade-out(s) 190 161, 242, 243
farness 231, 232 hyper-Theme 74-7, 188
film text(s) 3, 6, 7, 10, 12, 43-9, 60, 104, 120-77, 164, 165, hypotaxis 234
174, 183, 189, 190, 200, 212, 223, 224, 234-6, 243-4, 248
I
iconicity 35
film genres 7, 48
Ideal 39, 40, 81, 83, 89, 182
film storyboard 44
identification 55, 85, 97, 180, 183, 221
font size 41, 63
ideological manipulations 21
foregrounding 7, 20, 28, 38, 47, 54, 66, 74, 90, 91, 96, 121,
image(s) 14, 19, 20, 21, 29, 37, 38, 39, 41-3, 48, 57, 65, 68,
138, 145, 150, 170, 179, 180, 183, 187, 202,
70, 75, 79-91, 99, 102, 106, 109, 113, 120-1,
205, 218, 239, 243, 244
127, 130, 131, 136, 138, 139, 145-9, 155-6,

161, 163, 173-5, 178, 180, 189, 191-9, 205, 83, 87, 89, 92, 112, 136, 137, 175, 179, 180,
223, 227, 232, 234, 248 185, 205, 225, 232
incoherence 187 Leonardo’s Notebook 12, 31, 44
indeterminacy and/or indeterminate points 201 lexicogrammatical resources and/or structures 22, 43, 71, 74,
indexical functions and/or relations 3, 28, 35, 77, 78, 86, 91, 96, 99, 137, 142, 173, 176, 218, 230, 237,
104, 140, 167, 202, 205 246, 247
inequality 234 line(s) 3, 7, 11, 29, 35, 44, 52, 60, 64, 67, 74, 91-3, 107-8, 140,
informational invariants 190 248
instance 3, 4, 18, 19, 30, 39, 42, 55, 65, 78, 79, 80, 84-5, 87, linear/typological character of language 65
114, 117, 137, 140, 145, 156, 172-5, 186, linearity 26, 91, 158, 242, 243
198, 202, 208, 213, 220, 227, 234, 239, 240 linguistic meaning-making resources 61
instance and type 4, 39 local discontinuity 187
integration principle 1, 4-5, 7, 11, 17-9, 24, 44, 61-3, 80, 83, 167 local variation 187, 188
interactants 39, 43, 197, 247 location 49, 50, 51, 100, 119, 129, 135, 139, 140, 154, 187,
interaction and/or interactional 19, 22, 26, 37, 70, 75, 78, 188, 198, 204, 205, 211, 248
80, 90, 106, 112, 118-9, 128, 131, 146, 148, logical meaning and/or metafunction 16, 190
152, 157, 158, 162-3, 180, 185, 186-8, 193, logo 5, 26, 27, 29, 31, 33, 41, 121, 131, 174, 185, 186, 199,
198, 205, 211, 216, 218, 227, 232-3, 247 200, 209
interactional encounter 7 London Transport (LT) text 19, 24, 25, 26, 27, 30, 34, 37
intermediate levels 43, 54, 56 long shot 39, 197
Internet 58, 103, 106, 108, 112, 119, 162, 163, 164 Lupo Alberto text 35, 36, 37, 38
interpersonal meaning and/or metafunction 10, 11, 17, 22,
34, 36, 39, 40-2, 44, 69, 80, 85, 89, 90, 92- M
102, 119, 121, 130-1, 148-52, 167, 170-2, 179, macro-analytical approach 166, 167, 169, 171, 173
184-5, 197, 201, 206-8, 211, 222, 225, 241 macro-New(s) 74, 75, 76, 77
interpersonal bond 44 macrophase(s) 48, 50, 54, 56, 173
interpersonal closeness 42, 197 macro-Theme 74, 75, 76, 77
interpersonal negotiation 98, 99, 102 make-up 7, 116
interpersonal relations 10, 170, 179, 201 Marmaduke cartoons 7-15, 20, 21, 37, 38
interplay of light and shade 53, 224, 241 Martin, James 13, 14, 16, 74, 75, 77, 78, 83, 84, 100, 113,
interrelated levels 54, 236 114, 188, 225, 237, 239, 244
intersemiosis 71, 83, 84, 91, 146 Martinec, Radan 47, 49, 80, 225
intertexts 64, 82, 90-1, 96, 123, 247 Marx, Karl 61, 62, 63, 64, 65, 66, 69
intertextual links 56 material object text 21, 44, 109, 173, 175, 177
intertextual thematic relations and/or systems 64, 137-8, 180 material surface 87, 109, 175
intertextual/indexical links 78 McGregor, William 22, 38, 100, 149, 204, 206
intertextuality 6, 31, 35, 44, 55, 68-9, 141 meaning compression (principle) 1, 19, 24-6, 42, 56, 58, 64, 165
invariant elements and/or structures 187, 190, 204, 228-9 meaning relations 55, 71, 83, 137, 140, 158, 159, 160, 167,
inverted commas 11 175, 180, 227, 242, 243
meaning-making activities 3, 78, 116, 227, 245
J meaning-making process 6, 118, 166, 180, 183
Jakobson, Roman 107 meaning-making resources 4, 6, 19, 20, 21, 30, 58, 61, 64,
James Bond films 52 116, 162, 163, 166, 167, 173, 197, 203, 223
John Stuart Mill 61 meaning-making units 27, 49, 146, 164, 223
medium close shot 39, 197, 241
K metacomment(s) 10, 11
kinesic elements, resources and/or structures 43, 109-10, 149, metadiscursive elements, resources and/or structures 17,
151-2, 174, 178-9, 185, 191, 201, 202, 207-9, 114, 159
215-6, 219 metafunctional interpretation 174, 181, 184, 222
Kress, Gunther 18, 34, 37, 39, 41, 58, 60, 63, 66, 70, 79, 80, metafunctional organisation 49, 225
81, 86, 91, 92, 93, 99, 122, 189, 195, 197, metafunctions 1, 4, 16, 17, 22, 34, 38-42, 80-3, 85, 87, 167,
200, 201, 225, 234 185, 222
L metaphorical contrasts 6
language 1-4, 7, 10, 11, 12, 20, 58, 63-5, 70, 78, 80-3, 87, 90, metarule 10
104, 110-1, 136, 138, 139, 149, 156-7, 172, metasemiotic status 99, 100, 102, 195, 214, 216
178, 180, 189, 198, 201, 204-6, 215, 217, mini-genres 11, 42, 113
224, 230, 232, 234, 244, 248, 249 mirror writing 46
larger-scale textual features and/or units 38, 50, 54, 122, 234, Mitsubishi Carisma text 48-54, 122, 166, 212, 223-248
246 modern scientific page(s) 63, 64, 67
layout 30, 58, 70, 91, 114 mood 10, 11, 22, 51, 149, 205, 212
leaflets 21, 30, 34 mouse (pointer) 44, 46, 105-6, 117, 121, 124-6, 130, 140, 147-55,
left-right organisation 40, 81, 83, 105 159, 174
Lemke, Jay 16, 18, 20, 21, 55, 64, 65, 68, 69, 70, 71, 80, 82,

movement 1, 3, 7, 12, 20, 21, 35, 37, 58, 60, 63, 66, 105, 147- original broadcasting 165
54, 167, 172, 173, 178, 184, 185, 187, 189,
190, 193, 194, 202-8 P
movement proposition 206 page-display bar 46
movie set 54 pagelets 28, 32, 67
moving element(s), resource(s) and/or structure(s) 27, 44, panning 52, 193, 194
105, 108, 118, 147, 174, 178, 179, 185, 187, paradigmatic relations 156-9, 173, 243-4
189, 191-4, 218, 219, 220 paralinguistic elements 20
multimodal concordancing 48 parataxis 234
multimodal genre(s) 1, 4, 26, 38 participant chains 232, 233
multimodal narrative(s) 34 participant roles 15, 16, 147, 159-62, 170, 176, 177, 202, 225,
multimodal page(s) 57, 58, 62, 64, 75 242, 248
multimodal scientific text(s) 83, 92 part-part relations 22, 234, 238
multimodal syntagm 85, 86 parts functioning in some larger whole 21
multimodal textual design 58 part-whole relations 56, 149, 234
multiple pathways 102 patterned relations 19, 21, 178, 179, 183
multiplying effect 18, 42 peaks and troughs 182
multitasking 117 perceptual purview 167
music 1, 20, 23, 51-2, 54, 178, 180-1, 184-5, 209, 211, 214, perceptual realism 120
216, 218, 220-2 perceptual simultaneity 13
musical intensity 54 periodicity 26, 74-7, 182
musical score 12 person deixis 10, 98, 101
perspective 1, 24, 39, 40, 53, 64, 86, 99, 106, 119, 120, 123-5,
N 147, 149, 152, 157, 163, 195, 197, 202-6
Nalon, Elena 80 phasal analysis 4, 47, 50, 54
narrative(s) 12, 16, 17, 34, 128, 238 phasal organisation 166
narrative discourse 15, 16, 238-9 phase(s) 1, 6, 43-4, 46-51, 53, 54, 60, 61, 80, 116, 122, 134,
narrative meaning 15, 16, 233 166-7, 173-4, 180-6, 188, 209, 212-3, 217-9,
narrative organisation 12, 16 222-3, 228, 230, 232-3, 238-41, 244
narrative sequence 12-15 phonetic empathy 215
narrative structures 37 phonological prosodies 20
narrative timeline 16 photograph(s) 32, 34, 37, 41-4, 58, 63, 69, 75, 79-81, 88, 91-3,
narrativity 14, 16, 239 120, 123, 137-9, 177, 193
Nasa Kids 31, 114, 119-29, 140, 146, 147, 151-6, 159 photographic display 78, 79, 81
NasaToons 121, 124-9, 132 physical surface 193
naturalistic representations and/or tracings 37, 41, 53, 63, 79, plant movements 58, 63, 66
121, 127, 131, 180, 200, 206 playwrights’ asides 21
negotiation 96-7, 99 pointing 7, 26, 74, 83, 85, 93, 167, 172, 173, 235
New (information) 1, 24, 40, 67, 69, 74, 75, 76, 77, 81, 82, political cartoon(s) 63
83, 90, 91, 105, 111, 117, 119, 128, 136, 143, postproduction 183, 184
145, 146, 148, 150-6, 158, 160, 162, 164, posture 20, 122, 179
185, 186, 189, 190, 203 potential significance 53
nominal groups 64, 74, 82, 88, 145, 230 precursor forms 92
nominalisation 64 primary and/or secondary genres 4, 11, 27, 31, 39, 42-3, 68
non-linguistic resources 20 printed page(s) and/or text(s) 9, 13, 15, 17, 18, 19, 21, 23,
non-naturalistic qualities 53 31, 37, 38, 39, 42, 49, 53, 57, 59, 61, 63, 103,
non-salience 189, 212, 228 104, 105, 106, 109, 110, 111, 112, 113, 115,
non-verbal accompaniments 20 117, 119, 121, 123, 125, 127, 129, 133, 153,
nose-here perspective 197, 202, 205 155, 157, 267, 269, 271, 273, 275, 277, 279,
281
O progressive picture(s) 189, 191
O’Halloran, Kay 80 projection 46, 96, 99, 101, 106, 109, 157, 175, 182, 192, 193,
O’Toole, Michael 34, 80, 225 206, 223-5
observers 47, 112, 152, 172, 183, 192, 215 prominence 22, 47, 52, 53, 78, 121, 199, 209, 216, 228
optic array 37, 123, 175-6, 189-93, 205, 210, 223-9 prosodic features 52, 53
oral discourse 12, 69
oral modality 61, 63 Q
orchestral music 51, 52, 54 quotation marks 7, 10, 61, 63
organisational principles 4, 18, 26, 116, 181, 215
organisational scales 50 R
orientation 29, 35, 36, 41, 44, 55, 82, 98-102, 113, 119, 124, recontextualisation(s) 3, 21, 63, 69, 120, 133-4, 136, 147, 158,
127, 131, 133, 150-2, 158, 180, 185, 188, 162, 165, 191, 213
196, 199-208, 211, 221, 222, 248

register 221, 222 soundscape 212


relationship between texts and society 1 soundtrack 6, 7, 51-4, 174, 178, 180-5, 202, 210-20
resonance 52 spatial dispositions and/or relations 20, 63, 85
resource integration (principle) 1, 4-20, 24, 42, 44, 48, 56, 58, spatiotemporal arrangements 178
61, 63, 80, 83, 167 speaking 10, 11, 19, 20, 23, 47, 52, 53, 57, 69, 84, 85, 93, 96,
rhythm group 216, 217 173, 175, 182, 185, 200, 207, 217, 218, 222,
rollover 105, 121, 124, 147, 149-55 235, 241, 246
rows 30, 64, 65, 71, 77, 174, 180-219, 221, 223, 233, 241 specialised lettering 35
speech bubble(s) 10, 11, 37
S spoken and written word 4
Saint-Martin, Fernande 83, 84, 100 spoken discourse 20, 240
salience 11, 18, 20, 36, 38, 41, 47-53, 75, 86, 124, 130, 131, stance 10, 11, 17, 35, 36, 38, 89, 90, 92, 99, 100, 152, 180,
137, 150, 176-78, 180, 183-4, 187, 189, 190, 205, 208, 212
193, 199, 202, 212, 221-2, 227-8, 230, 235 stereotypical representations of social roles 17
Saussure, Ferdinand de 156, 157 strong classification and/or strong framing 117, 159
scalar levels, models and/or units in multimodality 1, 2, 19, subphase(s) 47, 49, 170, 174, 183-8, 190, 193, 217, 218, 222-3
38, 50-4, 146, 159, 173, 183, 225, 238-9, 244, subsystems 117
247 supercluster(s) 46, 115
scientific articles 42, 92 syntagmatic elements and/or expansion 156, 157, 158, 159,
scientific attitude 93 179, 204, 234, 238, 243, 244
scientific discourse 70, 92, 99 system 18, 23, 55, 64, 68, 69, 78, 82, 87, 90, 91, 110, 111,
scientific meanings 57, 79, 86, 87, 91, 96, 102, 127 112, 116, 117, 120, 123, 136, 137, 140-9,
scientific (printed and/or web) page 57, 58, 60-4, 67, 69 156-60, 172-5, 177, 180, 182, 183, 188, 191,
scientific processes 65 193, 194, 196, 201, 202, 205, 211, 224, 227,
scientific texts 60, 61, 70, 83, 89, 92 236, 237, 243, 244, 246
scientific truth 79, 89, 92, 120
scopal relations 38 T
screen 31, 36, 54, 104-14, 121, 128, 129, 148, 152, 153, 157- table(s) 12-5, 19, 22, 30, 41, 48-9, 52-3, 57, 63-7, 70-7, 81-3,
63, 180, 191-3, 196, 197, 201, 205, 223-8 88, 91, 92, 103-4, 108, 119, 128-9, 134-5,
search engines 46, 107, 115 152-6, 162, 167, 178, 181, 223, 225-9, 231-3,
secondary genres 4, 27, 31, 39, 42, 43, 68 237-9, 241-2
selective recontextualisation 147, 165 tagging 122
semiosis 18, 23, 80, 100, 110, 122-3, 149, 159, 164, 181, 191, technological infrastructure 160, 162, 163
197-8, 200, 204, 224-5, 227, 232, 234, 237-9 technology 58, 91, 103, 109, 117, 119, 248
semiotic modalities 2, 3, 4, 18, 20, 21, 43, 47, 70, 78, 80, 88, telephones and/or telephoning 19, 26, 49, 50, 51, 53, 54,
105, 117, 118, 126-8, 139, 140, 145-6, 158-9, 107, 122, 212, 228, 230, 233, 234, 235, 239,
174, 177-8, 181, 188, 207, 218, 223, 248-9 240, 241, 242
semiotic organisation 13, 14, 159, 160, 223 tempo 51, 52, 54, 185, 211, 212, 216, 217, 219
semiotic resources systems and/or tools 1, 7, 12, 15, 18-20, temporality and temporal elements 12, 13, 14, 60, 65, 223,
23, 34, 37, 47, 49, 54-5, 58, 61-6, 70, 73, 74, 238, 239
77, 79, 80, 83, 87, 98, 100, 103, 110-1, 116-8, temporal-causal relations 13
129, 139, 146, 160, 162, 166-7, 172-4, 177-8, tense 5, 10, 41, 101, 113
188, 190, 202, 217, 227, 236-7, 240, 247 text and society 4
sensory experience and/or modality 93 text and viewer 182, 205
sequential organisation 114, 117, 161 text-access tools 46
shared higher-order unit 234 textbooks 43, 58, 59, 60, 70, 77, 84, 89, 91, 93, 96, 102
shot(s) 5, 39, 48-51, 122, 166, 170, 173-4, 179, 180-95, 197-9, text-specific meanings 20
200-11, 217, 228, 230-44, 248 textual cohesion 179, 193, 211
sign-relation 236, 237 textual interactions 46
simultaneity of visual presentation 12 textual meaning and/or metafunction 16, 21-2, 39, 40, 54,
size 27, 29, 41, 63, 81-2, 89, 120, 129-31, 134, 176, 189, 199, 75, 80, 86, 91, 153, 155, 167, 179
228, 231 textual organisation 2, 3, 6, 19, 43, 46, 54, 105, 122, 187, 188
slider 44, 46 textual periodicity 74, 75, 77
slogan 27, 33 textual ties 16
slogo 27, 28, 29 textual (sub)units 11, 50, 54, 74, 239, 242
small-scale meanings, phases and/or units 50, 54, 122 textual/compositional resources 39
smiling 178, 179, 187, 208, 209 texture 149, 199, 202, 209, 224, 230
Smith, Adam 61 The Economist 61, 62, 63, 64, 65, 67, 68, 69
social contexts 4, 18, 93 thematic continuity 188, 233
social evolution 11 thematic formation 55, 82, 87-9, 136-9, 141, 142
sound(s) 3, 7, 20, 21, 35, 43, 48, 51, 52, 53, 63, 111, 173, thematic homogeneity 51, 135
176, 177, 178, 180, 181, 185, 209-221, 224, thematic relation(s) 19, 55, 136, 137, 138, 139, 140, 142, 143,
232, 236, 237 146, 180, 188

thematic-semantic condensation 64, 71 161, 174-5, 178, 180, 189, 191, 193, 197,
Thibault, Paul 1, 18, 20, 34, 35, 47, 48, 49, 51, 55, 58, 64, 65, 68, 205, 223, 227, 232-4, 248
69, 71, 80, 82, 86, 88, 90, 96, 97, 99, 112, 114, visual information 39, 175, 189, 191
122, 166, 167, 172, 175, 183, 193, 204, 205, 225, visual kinaesthesis 191, 193-5, 202, 224-5, 232
232, 234, 237, 238, 247, 248 visual parameter 195
thought bubbles 24, 37 visual percepts 87, 93
timeline 14, 16, 182 visual process 39, 40, 231, 232
title(s) 26, 27, 28, 29, 33, 75, 76, 77, 127, 140 visual resources 7, 55, 64, 70, 80, 98, 99, 102, 125, 141,
tools 46, 102, 122, 162, 164, 198, 248 224, 236, 240, 242, 244
top-bottom organisation 40, 81, 83, 91, 105 visual salience 44, 82, 155, 188, 199, 228
topological elements, space and/or values 18, 64, 65, 83, 85, visual scene 51, 52, 120, 138, 153, 155, 225
100, 178, 189, 198-9, 203-4, 232, 234, 236 visual semiotic 42, 63, 65, 66, 70, 74, 79-83, 87, 92, 93,
transformations 12, 189, 190, 191, 193, 197, 224-5, 228-9 98-9, 127, 139, 188, 191, 210, 240, 243-4
transition(s) 13, 14, 46, 47, 48, 49, 50, 53, 151, 173, 181, 182, visual strategies 190
184, 185, 186, 187, 190, 193, 215, 216, 228, 233, visual text(s) 14, 51, 57, 68, 79, 80, 83, 84, 90, 102, 122,
243, 244, 248 171, 174, 176, 178, 186, 189, 193, 197,
transitivity frame(s) 1, 16, 46, 49, 51, 55, 122, 123, 166-73, 225, 199, 200, 217, 223-5, 227, 235, 237, 243
230-234, 237, 242, 244 visual-graphological resources 71, 77
transitivity relations 55, 138, 146, 173, 176 visual-spatial units 104, 105, 114
TV advertisements 4, 7, 48-9, 54, 165-6, 193, 212, 223 voice(s) 19, 21, 51, 52, 53, 54, 55, 63, 69, 96, 119, 142,
typological-categorial relationships 64-5, 74, 83, 189, 232-4 162, 184, 185, 211, 212, 213, 214, 218,
219, 220, 221, 222
U
unfolding text 173, 187, 222 W
use of speech 7 wave cycle 182
utterance 2, 10, 11, 22, 101, 218, 245, 246 wave-like patterning 181
weak classification and/or framing 129, 159
V web users 112-3, 115, 117, 159
Van Leeuwen, Theo 7, 18, 20, 34, 37, 39, 41, 51, 58, 60, 63, 66, web-based animations 44
79, 80, 81, 86, 92, 93, 99, 122, 180, 183, 189, web-based films 46
195, 197, 200, 201, 212, 216, 221, 225, 234, 244 Westpac text 69, 125, 130, 165, 166, 167, 174-223
variability 114, 117-8 whole-part relationships 56
vector(s) 29, 35-6, 39, 40, 49, 63, 66-8, 83-8, 90, 122, 131, 132, whole-whole relationships 38
140, 170, 186, 201, 225, 230-7, 241 wipes 190
Ventola, Eija 80, 113 word-processing packages 67
verbal and visual resources 7, 70, 80, 102, 141
verbal genres 41 Z
verbal text 19, 40-2, 63-4, 71-4, 77, 80-4, 87, 89, 91-2, 127, 131-3, Zombie text 5-7, 10, 20
136, 140-6, 149, 156, 177-8
vertical hierarchies and/or structures 26, 27, 30, 39, 40, 53, 64,
66, 71, 81, 115, 119, 129, 157, 170, 174, 178,
186, 195, 230, 240
very close/long shot 39, 170, 197, 240
VHS films 46
video recordings and/or texts 50, 60, 105, 122, 127, 165, 166,
167, 173, 174, 188, 189, 196, 226, 234, 236, 238,
242, 243, 249
virtual magnifying glass 31, 46
virtual world 106, 120, 123, 124, 148, 152, 156, 157, 159, 163
visual and actional resources 46
visual and verbal resources 19, 46, 60, 64, 68, 70
visual collocation 198
visual cues 15
visual cuts 190
visual design 70
visual devices 37
visual focus 200, 201, 228
visual forms 14, 43, 68, 204, 223, 242
visual frame 174, 178, 184, 185, 186, 187, 189, 190, 193, 201,
202, 209
visual genres 39, 41, 71, 146, 200
visual image(s) 14, 20-1, 29, 37, 39, 41-2, 57, 65, 68, 70, 79, 83-8,
90-1, 99, 109, 127, 136, 138, 145-6, 148, 155-6,
