Concise Encyclopedia of Sociolinguistics

Published by Alexandra Florea


Published by: Alexandra Florea on Jan 13, 2012
Copyright:Attribution Non-commercial








  • Multiculturalism and Language
  • Pragmatics and Sociolinguistics
  • Prescriptive and Descriptive Grammar
  • Saussurean Tradition and Sociolinguistics
  • Sociolinguistics of Sign Language
  • Conversational Maxims
  • Cooperative Principle
  • Deaf Community: Structures and Interaction
  • Dialogism
  • Discourse
  • Discourse in Cross-linguistic and Cross-cultural Contexts
  • Doctor-Patient Language
  • Ethnography of Speaking
  • Ethnomethodology
  • Identity and Language
  • Kinesics
  • Kinship Terminology
  • Language in the Workplace
  • Narrative, Natural
  • Phatic Communion
  • Speech Accommodation
  • Speech Act Theory: An Overview
  • Business Language
  • Code, Sociolinguistic
  • Discourse Analysis and the Law
  • Formulaic Language
  • Institutional Language
  • Literary Language
  • Media Language and Communication
  • Medical Language
  • Religion and Language
  • Slang: Sociology
  • Speech and Writing
  • The Internet and Language
  • Adolescent Peer Group Language
  • Class and Language
  • Dialect and Dialectology
  • Dialect Humor
  • Ethnicity and the Crossing of Boundaries
  • Ethnicity and Language
  • Forensic Phonetics and Sociolinguistics
  • Gay Language
  • Gender and Language
  • Language Change and Language Acquisition
  • Maps: Dialect and Language
  • Social Class
  • Social Networks
  • Sociolinguistic Variation
  • Sociolinguistics and Language Change
  • Sociophonetics
  • Subcultures and Countercultures
  • Syntactic Change
  • The Atlas of North American English: Methods and Findings
  • Urban and Rural Forms of Language
  • Areal Linguistics
  • Bilingualism and Language Acquisition
  • Code-switching: Discourse Models
  • Code-switching: Overview
  • Code-switching: Sociopragmatic Models
  • Code-switching: Structural Models
  • Contact Languages
  • Endangered Languages
  • English as a Foreign Language
  • Ethnolinguistic Vitality
  • Foreigner Talk
  • Interlanguage
  • Intertwined Languages
  • Jargons
  • Koines
  • Language Enclaves
  • Language Loyalty
  • Language Maintenance, Shift, and Death
  • Language Spread
  • Language Transfer and Substrates
  • Lingua Franca
  • Migrants and Migration
  • Missionaries and Language
  • Pidgins and Creoles: An Overview
  • Pidgins and Creoles: Models
  • Pidgins and Creoles: Morphology
  • Sociolinguistic Area
  • Critical Language Awareness
  • Critical Sociolinguistics
  • Discrimination and Minority Languages
  • Language Conflict
  • Linguicide
  • Linguistic Imperialism
  • Manipulation
  • Marxist Theories of Language
  • Minority Languages
  • Politicized Language
  • Politics and Language
  • Power and Language
  • Power Differentials and Language
  • Representation
  • Semilingualism
  • Stereotype and Social Attitudes
  • Symbolic Power and Language
  • The Linguistic Marketplace
  • Academies: Dictionaries and Standards
  • Artificial Languages
  • Heritage Languages
  • International Languages
  • Language Adaptation and Modernization
  • Language Development
  • Language Diffusion Policy
  • Language Planning: Models
  • Linguistic Census
  • Linguistic Habitus
  • Multilingual States
  • National Language/Official Language
  • Nationalism and Language
  • Orthography
  • Prescription in Dictionaries
  • Reversing Language Shift
  • School Language Policies
  • Standardization
  • Statistics: Principal Languages of the World (UNESCO)
  • Verbal Hygiene
  • Applied Linguistics and Sociolinguistics
  • Bilingual Education
  • Black English in Education: UK
  • Child Language: An Overview
  • Ebonics and African American Vernacular English
  • Education and Language: Overview
  • Educational Failure
  • English Grammar in British Schools
  • Gender, Education, and Language
  • Home Language and School Language
  • Pidgins, Creoles, and Minority Dialects in Education
  • Spoken Language in the Classroom
  • Standard English and Educational Policy
  • Teaching Endangered Languages
  • Attitude Surveys: Question-Answer Process
  • Corpus Linguistics and Sociolinguistics
  • Data Collection in Linguistics
  • Field Methods: Ethnographic
  • Field Methods in Modern Dialect and Variation Studies
  • Fieldwork and Field Methods
  • Fieldwork Ethics and Community Responsibility
  • Interactional Sociolinguistic Research Methods
  • Literacy: Research, Measurement, and Innovation
  • Observing and Analyzing Classroom Talk
  • Salvage Work (Endangered Languages)
  • Endangered Languages Projects (An Inventory)
  • Internet Resources for Sociolinguistics
  • Sociolinguistics Journals: A Critical Survey
  • Bakhtin, Mikhail M. (1895-1975)
  • Cooper, Robert Leon (1931- )
  • Das Gupta, Jyotirindra (1933- )
  • Edwards, John Robert (1947- )
  • Emeneau, Murray Barnson (1904- )
  • Fairclough, Norman (1941- )
  • Ferguson, Charles A. (1921-98)
  • Fishman, Joshua A. (1926- )
  • Halliday, Michael Alexander Kirkwood (1925- )

D. Wagner

Research on the causes and consequences of literacy
and illiteracy has grown dramatically since the 1980s,
yet much more needs to be known. Since there exists
a great variety of literacy programs for an even larger
number of sociocultural contexts, it should come as
no surprise that the effectiveness of literacy programs
has come under question, not only among
policymakers and specialists, but also among the
larger public. How effective are literacy campaigns?
What is the importance of political and ideological
commitment? Should writing and reading be taught
together or separately? Should literacy programs
include numeracy as well? Is literacy retained
following a limited number of years of primary
schooling or short-term campaigns? How important
are literacy skills for the workplace? Is it important
to teach literacy in the individual's mother
tongue? These and similar questions—so central to
the core of literacy work around the world—remain
without definitive answers, in spite of the occasionally
strong rhetoric in support of one position or
another. Basic and applied research, along with
effective program evaluation, is capable of providing
critical information that will not only lead to
greater efficiency in particular literacy programs, but
will also lead to greater public support of literacy.

Research on literacy can reveal key policy areas
which need to be addressed, as well as methodologies
for assessment and monitoring which will be crucial
in the coming years. This entry summarizes some of
the major areas of literacy research and measurement,
and identifies some critical areas for future research.

1. Literacy Research in Global Perspective

There are three general domains in literacy work that
are likely to be the subject of greater attention and
will determine to a large extent whether attempts at
improving global literacy will be successful. Each of
these is reviewed below.

1.1 Defining Literacy: Operationalization for

With the multitude of experts and published books
on the topic, one would suppose that there would be
a fair amount of agreement as to how to define the
term 'literacy.' On the one hand, most specialists
would agree that the term connotes aspects of
reading and writing; on the other hand, major
debates continue to revolve around such issues as
what specific abilities or knowledge count as literacy,
and what 'levels' can and should be defined for
measurement. The term 'functional literacy' has often
been employed, as originally defined by Gray (1956:
19): 'A person is functionally literate when he has
acquired the knowledge and skills in reading and
writing which enable him to engage effectively in all
those activities in which literacy is normally assumed
in his culture or group.'
While functional literacy has a great deal of appeal
because of its implied adaptability to a given cultural
context, the term can be very awkward for research
purposes. For example, it is unclear in an industrialized
nation like the UK what level of literacy should
be required of all citizens. Does a coal miner have
different needs than a barrister? Similarly, in a Third
World country, does an illiterate woman need to
learn to read and write in order to take her prescribed
medicine correctly, or is it more functional (and more
cost-effective) to have her school-going child read the
instructions to her? The use of the term 'functional,'
based on norms of a given society, is inadequate
precisely because adequate norms are so difficult to
define.
An adequate, yet more fluid, definition of literacy
is '... a characteristic acquired by individuals in
varying degrees from just above none to an
indeterminate upper level. Some individuals are more
or less literate than others, but it is really not possible
to speak of literate and illiterate persons as two
distinct categories' (UNESCO 1957). Since there exist
dozens of orthographies for hundreds of languages in
which innumerable context-specific styles are in use
every day, it would seem ill advised to select a
universal operational definition. Attempts to use
newspaper reading skills as a baseline (as in certain
national surveys) may seriously underestimate literacy
if the emphasis is on comprehension of text
(especially if the text is a national language not well
understood by the individual). Such tests may
overestimate literacy if the individual is asked simply to


Methods in Sociolinguistics

read aloud the passage, with little or no attempt at
the measurement of comprehension. Surprisingly,
there have been relatively few attempts to design a
battery of tests from low literacy ability to high
ability which would be applicable across the complete
range of possible languages and literacies in any
society, such that a continuum of measurement
possibilities might be achieved. UNESCO, which
provides world-wide statistical comparisons of lit-
eracy, relies almost entirely on data provided by its
member countries, even though the measures are
often unreliable indicators of literacy ability. However,
there were new attempts in the 1990s to
undertake literacy surveys in a comparative and
international framework in some industrialized
nations (OECD/Statistics Canada 1995).
At least part of the controversy over the definition
of literacy lies in how people have attempted to study
literacy. The methodologies chosen, which span the
social sciences, usually reflect the disciplinary training
of the investigator. Anthropologists provide
in-depth ethnographic accounts of single communities,
while trying to understand how literacy is
woven into the fabric of community cultural life. By
contrast, psychologists and educators have typically
chosen to study measurable literacy abilities using
tests and questionnaires, usually ignoring contextual
and linguistic factors. Both of these approaches
(as well as those of history, linguistics, sociology, and
computer science) have value in achieving an
understanding of literacy. There is no easy resolution to
this problem, but it is clear that a broad-based
conception of literacy is required not only for a
valid understanding of the term, but also for
developing appropriate policy actions (Venezky et al.

Because literacy is a cultural phenomenon—
adequately defined and understood only within each
culture in which it exists—it is not surprising that
definitions of literacy may never be permanently
fixed. Whether literacy includes computer skills,
mental arithmetic, or civic responsibility will depend
on how public and political leaders of each society
define the term and its use. Researchers can help in
this effort by trying to be clear about which definition
or definitions they choose to employ in their work.
For overviews of literacy in international context, see
Wagner and Puchner (1992), Wagner (1992), Wagner
et al. (1999).

1.2 Acquisition of Literacy

The study of literacy acquisition has been greatly
influenced by research undertaken in the industrialized
world. Much of this research might be better
termed the acquisition of reading and writing skills,
with an emphasis on the relationship between
cognitive skills, such as perception and memory,
and reading skills, such as decoding and comprehension.
Further, most of this work has been carried out
with school-aged children, rather than with adolescents
or adults. Surprisingly little research on literacy
acquisition has been undertaken in the Third World,
where researchers have focused primarily on adult
acquisition rather than on children's learning to read.
This latter phenomenon appears to reflect the emphasis
on promoting adult literacy in the developing world,
while adult literacy problems in industrialized societies
were usually ignored until quite recently.
Despite these gaps in the research literature,
certain general statements are relatively well established
as to how literacy is acquired across different
societies. In 1973 Downing published Comparative Reading,
which surveyed the acquisition of reading
skills across different languages and orthographies.
He found that mastery of the spoken language is a
typical prerequisite for fluent reading comprehension
in that language, although there exist many exceptions.
Another finding is that in many alphabets
children first learn to read by sounding out words
with a memorized set of letter-sound correspondences.
It is now known that there are many
exceptions to this generalization. There are, of
course, languages which are not written in alphabets
(e.g., Chinese and Japanese). There also appear to be
large individual differences in learning styles within
literacy communities. Finally, many individuals can
read and write languages which they may not speak.
Some specialists have stressed the importance of
class structure and ethnicity/race as explanations of
differential achievement among literacy learners.
Ogbu (1978), for example, claimed that many
minority children in the United States are simply
unmotivated to learn to read and write in the cultural
structure of the school. This approach to understanding
social and cultural differences in literacy
learning has received increased attention in that it
avoids blaming the individual for specific cognitive
deficits (as still happens), while focusing intervention
strategies more on changes in the social and political
structure of schooling or the society.
Finally, it has been assumed that learning to read
in one's 'mother tongue' or first language is always
the best educational policy for literacy provision,
whether for children or adults. Based on some
important research studies undertaken in the 1960s,
it has generally been taken for granted that individuals
who have had to learn to read in a second
language are at a disadvantage relative to others who
learn in their mother tongue. While this generalization
may still be true in many of the world's
multilingual societies, more recent research has
shown that there may be important exceptions.
In one such study, it was found that Berber-speaking
children who had to learn to read in
Standard Arabic in Moroccan schools were able to


Literacy: Research, Measurement, and Innovation

read in fifth grade just as well as children who were
native speakers of Arabic (Wagner 1993). Adequate
research on nonliterate adults who learn to read in a
second versus a first language has yet to be undertaken.
In sum, considerable progress has been made in
understanding the acquisition of literacy in children
and adults, but primarily in industrialized societies.
Far less is known about literacy acquisition in a truly
global perspective, and in multilingual societies. Since
the majority of nonliterate people live in these areas
of the world, much more needs to be known if
literacy provision is to be improved in the coming years.
1.3 Retention of Literacy

The term 'educational wastage' is common in the
literature on international and comparative education,
particularly with respect to the Third World.
This term typically refers to the loss, usually by
dropping out, of children who do not finish what is
thought to be the minimum educational curriculum
of a given country (often 5 to 8 years of primary
schooling). Most specialists who work within this
area gather data on the number of children who enter
school each year, the number who progress on to the
next grade, those who repeat a given year (quite
common in many Third World countries), and those
who quit school altogether. The concept of wastage,
then, refers to those children for whom an economic
investment in educational resources has already been
made, but who do not complete the appropriate level
of studies.
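The bookkeeping behind the wastage concept can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a standard formula from the literature: the cohort figures are invented, and the field names (`entrants`, `completer_years`, and so on) are hypothetical. It treats pupil-years as the unit of educational investment and reports how much of that investment went to pupils who did not complete the cycle.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Hypothetical counts for one cohort of pupils entering primary school."""
    entrants: int          # pupils who enter the first grade
    completers: int        # pupils who finish the minimum cycle
    completer_years: int   # pupil-years spent by completers (repeated grades included)
    dropout_years: int     # pupil-years spent by pupils who left before completing


def wastage_summary(c: Cohort) -> dict[str, float]:
    """Summarize how much of a cohort's schooling went to non-completers."""
    if c.entrants <= 0 or not 0 < c.completers <= c.entrants:
        raise ValueError("need 0 < completers <= entrants")
    total_years = c.completer_years + c.dropout_years
    if total_years <= 0:
        raise ValueError("pupil-years must be positive")
    return {
        # share of entrants who leave before completing the cycle
        "dropout_rate": (c.entrants - c.completers) / c.entrants,
        # share of all pupil-years invested in pupils who did not complete
        "wasted_share": c.dropout_years / total_years,
        # average pupil-years of schooling paid for per completer
        "years_per_completer": total_years / c.completers,
    }


# Invented example: 1,000 entrants, 600 complete a 6-year cycle
# (3,600 pupil-years plus 400 repeated years); dropouts used 1,200 pupil-years.
cohort = Cohort(entrants=1000, completers=600, completer_years=4000, dropout_years=1200)
print(wastage_summary(cohort))
```

These figures count enrollment only; they say nothing about what completers actually learned or retained, which is the issue taken up below.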

The issue of literacy retention is crucial here, for it
is not actually the number of school leavers or
graduates that really matters for a society, but rather
what they learn and retain from their school years,
such as literacy skills. When students drop out of an
educational program, a society is wasting its
resources, because those individuals (children or
adults) may never reach the presumed threshold of
minimum learning needed to retain what has been
acquired. Thus, retention of learning (or literacy, in
particular) is a key goal of educational planners
around the world. There are as yet only a small
number of research studies published on this question,
and their results are highly contradictory. Some
show that there is a 'relapse' into illiteracy for those
who have not received sufficient instruction, while
others demonstrate no serious loss (Wagner 1998a).

2. Literacy Measurement

2.1 Areas of Debate in Literacy Assessment

In order to provide worldwide statistical comparisons,
UNESCO (UNESCO 1996, Wagner 1998b)
has relied almost entirely on data provided by its
member countries. These countries, in turn, typically
rely on national census information, which most
often determines literacy ability by self-assessment
questionnaires and/or by the proxy variable of years
of primary schooling. Many specialists now agree
that such measures are likely to be unreliable
indicators of literacy ability. Nonetheless, through
the 1990s, change in literacy measurement has
been slow in coming, even though some initiatives
have been undertaken.
There is considerable diversity of opinion as to the
usefulness of classifying individuals in the traditional
manner of 'literate' versus 'illiterate.' Several decades
ago, when developing countries began to enter the
United Nations, it was common to find that the vast
majority of the adult populations of these countries
had never gone to school nor learned to read and
write. It was relatively easy in those contexts to
simply define all such individuals as 'illiterate.' The
situation in the late 1990s is much more complex, as
the vast majority of families in the Third World now
have some contact with primary schooling, nonformal
education programs, and the mass media. Thus, even
though parents may be illiterate, it
is not unusual for one or more of their children to be
able to read and write to some degree. For this
reason, it would seem that simple dichotomies—still
in use by some international agencies and most
national governments—ought to be avoided, since
they tend to misrepresent the range or continuum of
literacy abilities that are common to most contemporary
societies.
As noted earlier, work on adult literacy has
frequently derived its methodologies from the study
of reading development in children. This is true of
assessment as well, where the diagnosis of individual
reading difficulties has held sway for many years, in
both children and adults. This diagnostic model of
assessment assumes that individuals who do not read
well have some type of cognitive deficit which can
(often) be remediated if properly diagnosed by a
skilled professional. There is little doubt that this
model does apply to many adults who have not
learned to read, but who have attended school.
However, the majority of the world's population of
low literates and illiterates (located primarily in
developing countries) have received little or no
schooling, making the diagnostic method far less
relevant. For this latter population, detailed diagnostic
measures are unimportant relative to the need
for better understanding of who goes to school, what
is learned, and which particular social groups are
most in need of basic skills. In such cases, as
discussed below, low-cost household surveys may
be a better assessment technique than diagnostic testing.
Most countries have formulated an explicit language
policy which states which language or
languages have official status. Often, the decision
on national or official language(s) is based on such


Methods in Sociolinguistics

factors as major linguistic groups, colonial or postcolonial
history, and the importance of a given
language to the concerns of economic development.
Official languages are also those commonly used in
primary school, though there may be differences
between languages used in beginning schooling and
those used later on. The use of mother tongue
instruction in both primary and adult education
remains a topic of continuing debate (Dutcher 1982,
Hornberger 1999). While there is usually general
agreement that official language(s) ought to be
assessed in a literacy survey, there may be disagreement
over the assessment of literacy in nonofficial
languages (where these have a recognized and
functional orthography). In many countries, there
exist numerous local languages which have varying
status with respect to the official language; how these
languages and literacies are included in such surveys
is a matter of debate. For example, in certain
sub-Saharan African countries with large Muslim
populations (e.g., Senegal or Ghana), the official language
of literacy might be French or English, while
Arabic—which is taught in Islamic schools and used
by a sizable population for certain everyday and
religious tasks—is usually excluded from official
literacy censuses. Many specialists now agree that
most (if not all) literacies should be surveyed; to
ignore such abilities is to underestimate national
human resources.
Comparability of data (across time and countries)
is a major concern for planning agencies. If
definitions, categories, and classifications vary, it
becomes difficult if not impossible to compare data
collected from different surveys. On the other hand, if
comparability is the primary goal, with little attention
to the validity of the definitions, categories, and
classifications for the sample population, then the
data become virtually meaningless. International and
national needs, definitions, and research strategies
may or may not come into conflict over the issue of
comparability, depending on the particular problem
addressed. For example, international agencies continue
to utilize literacy rates that are measured in
terms of the number of 'literates' and 'illiterates.' For
most countries, this type of classification presents few
problems at the level of census information, and it
provides international agencies with a cross-national
framework for considering literacy by geographic or
economic regions of the world. On the other hand,
national planners may want to know the effects of
completion of certain grades of primary or secondary
school, or of a literacy campaign, on levels of literacy
attainment, so that a simple dichotomy would be
insufficient. Household literacy surveys, because
more time may be devoted to in-depth questioning,
offer the opportunity to provide a much more
detailed picture of literacy and its demographic
correlates than has been previously available.

2.2 Household Literacy Surveys

Assessment surveys have employed varying approaches
to defining literacy skill levels in different
countries. For example, some assessment surveys
have focused on 'ability to read aloud' from a
newspaper in the national language; some have
included basic arithmetic (numeracy) skills; while
still others have focused on being able to write one's
name or read a bus schedule. Two main types of
literacy survey methods—self-assessment and direct
measurement—have been utilized within widely
differing contexts.
Most national literacy data collections in the world
have utilized self-assessment techniques, which are
operationalized by simply asking the individual one
or more questions of the sort: 'Can you read and
write?' Occasionally, census takers collect information
on which language or languages pertain to the
above question, but rarely have time or resources
been invested beyond this point. Analysis of the
relationship between self-assessment and direct
measurement of literacy abilities has rarely been sought,
so that the reliability of self-assessment methods is
very problematic (Lavy et al. 1995).
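One way to carry out the comparison the text calls for is to give the same respondents both the census question and a direct test, then measure how often the two classifications agree. The sketch below assumes simple yes/no results on both; the function name and data are hypothetical. Cohen's kappa is used here as a standard chance-corrected agreement statistic, not as a method drawn from the surveys cited.

```python
def compare_self_report(self_report: list[bool], test_pass: list[bool]) -> dict[str, float]:
    """Compare census-style self-reports ('Can you read and write?') with
    pass/fail results on a direct literacy test for the same respondents."""
    n = len(self_report)
    if n == 0 or n != len(test_pass):
        raise ValueError("need two non-empty lists of equal length")
    p_self = sum(self_report) / n    # literacy rate implied by self-report
    p_test = sum(test_pass) / n      # literacy rate implied by the test
    observed = sum(a == b for a, b in zip(self_report, test_pass)) / n
    # Agreement expected by chance if the two classifications were independent
    expected = p_self * p_test + (1 - p_self) * (1 - p_test)
    # Cohen's kappa is undefined when expected == 1 (both lists constant and
    # identical); this sketch reports that case as perfect agreement.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return {"self_rate": p_self, "test_rate": p_test,
            "agreement": observed, "kappa": kappa}


# Hypothetical data for ten respondents: seven say they can read and write,
# but only four pass the direct test.
said_yes = [True] * 7 + [False] * 3
passed = [True] * 4 + [False] * 6
print(compare_self_report(said_yes, passed))
```

In this invented sample the self-reported literacy rate (70 percent) is well above the tested rate (40 percent), and a kappa of about 0.44 shows that the two methods agree only partly once chance agreement is removed.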
Direct measurement of literacy typically involves
tests which are constructed with the aim of obtaining
performance or behavioral criteria for determining
literacy and/or numeracy abilities in the individual.
The large number and variety of literacy and
numeracy assessment instruments precludes a complete
discussion in this brief review. Objective
measures rely primarily on test items to elicit valid
and reliable data from the individual, with rather
strict controls on the context and structure of the test.
An example would be a multiple choice test where the
individual is presented with a short paragraph of text
and is asked to choose, among four items, the item
which best describes some particular piece of information
mentioned in the paragraph. These measures
are usually quite reliable in school settings and for
silent reading, where test-retest correlations and
cross-test correlations may be highly significant. Their use in
nonschool settings and with low-literate adults is less
well known, since these tests assume a certain
equivalence in 'test-taking skill' across individuals
tested. Such objective tests are particularly useful in
settings where the interviewer has little prior experience
in literacy assessment, since relatively little subjective
interpretation of test performance is required.
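Test-retest reliability is usually summarized as a correlation between the scores from two administrations of the same test. A minimal sketch follows; the scores are invented, and it computes an ordinary Pearson correlation rather than any particular survey's reliability procedure.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two lists of equal length with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores have no variance; correlation is undefined")
    return sxy / sqrt(sxx * syy)


# Hypothetical comprehension scores for six pupils tested twice, a few weeks apart
first_sitting = [12, 15, 9, 20, 17, 6]
second_sitting = [13, 14, 10, 19, 18, 7]
print(round(pearson_r(first_sitting, second_sitting), 3))
```

A value close to 1 means respondents keep roughly the same rank order across sittings; as the text notes, such figures from school settings may not carry over to low-literate adults tested outside school.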
The direct measurement of literacy skills using
assessment instruments provides information on
more refined categories than are available in self-assessment,
which usually provides merely a dichotomous
categorization. In industrialized countries, there have
been a number of important household literacy
surveys. The first two were completed in North
America: the Young Adult Literacy Assessment
(Kirsch and Jungeblut 1986) in the US, and the


Literacy: Research, Measurement, and Innovation

Canadian Literacy Survey (Statistics Canada 1988) in
Canada. More recently, a major international survey
was conducted in about a dozen OECD countries
(OECD/Statistics Canada), using a methodology that
overlapped rather substantially with the North
American surveys, providing in-depth individual
assessments of reading, writing, and math skills in
both abstract and functional contexts (Tuijnman
et al. 1997). While the cost of such in-depth
information may be justifiable in the context
of industrialized countries, these surveys clearly
represent the 'high end' of both detailed analysis
and cost in terms of household surveys.
In contexts such as developing countries or
low-literate ethnic communities in industrialized
countries, it may be useful to choose a categorical
breakdown which would provide just enough information
for use by policymakers, and which could
be more easily and simply constructed. This 'low-end'
method of assessment is best exemplified in the model
developed under the auspices of the United Nations
National Household Survey Capability Program, and
which has been undertaken in several countries,
including Zimbabwe and Morocco (United Nations
1989, Wagner 1990). In this model, there are four
main skill classifications which are proposed: (a) nonliterate
for a person who cannot read a text with
understanding and write a short text in a significant
national language, and who cannot recognize words
on signs and documents in everyday contexts, and
cannot perform such specific tasks as signing their
name or recognizing the meaning of public signs;
(b) low-literate for a person who cannot read a text
with understanding and write a short text in a
significant national language, but who can recognize
words on signs and documents in everyday contexts,
and can perform such specific tasks as signing their
name or recognizing the meaning of public signs; (c)
moderate-literate for a person who can, with some
difficulty (i.e., makes numerous errors), read a text
with understanding and write a short text in a
significant national language; and (d) high-literate for
a person who can, with little difficulty (i.e., makes few
errors), read a text with understanding and write a
short text in a significant national language. When
these four categories are utilized in conjunction with
other variables in the survey, it becomes possible to
arrive at answers to questions often posed by policymakers,
such as: How does literacy vary by age, grade,
geographical region, language group, and so forth?
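Applying the four classifications to survey records amounts to a decision rule followed by a cross-tabulation. The sketch below is a hypothetical encoding, not the United Nations program's own procedure: the field names are invented, the everyday-print abilities named in the scheme (recognizing words on signs and documents, and tasks such as signing one's name) are folded into a single yes/no field, and the error threshold separating "few" from "numerous" errors is an assumption, since the scheme itself gives no number.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical cutoff: the scheme distinguishes "numerous" from "few" errors
# but gives no number, so this threshold is an assumption for illustration.
FEW_ERRORS_MAX = 3


@dataclass
class Respondent:
    age_group: str
    reads_and_writes_text: bool    # reads a text with understanding and writes a short text
    handles_everyday_print: bool   # recognizes words on signs/documents, signs own name, etc.
    errors: int = 0                # errors made on the reading/writing task


def classify(r: Respondent) -> str:
    """Assign one of the four skill classifications to a respondent."""
    if r.reads_and_writes_text:
        return "high-literate" if r.errors <= FEW_ERRORS_MAX else "moderate-literate"
    if r.handles_everyday_print:
        return "low-literate"
    return "nonliterate"


def crosstab(respondents: list[Respondent]) -> dict[str, Counter]:
    """Count respondents in each classification, broken down by age group."""
    table: dict[str, Counter] = defaultdict(Counter)
    for r in respondents:
        table[r.age_group][classify(r)] += 1
    return dict(table)


sample = [  # invented survey records
    Respondent("15-24", True, True, errors=1),
    Respondent("15-24", True, True, errors=6),
    Respondent("15-24", False, True),
    Respondent("45+", False, False),
    Respondent("45+", False, True),
    Respondent("45+", True, True, errors=2),
]
for group, counts in crosstab(sample).items():
    print(group, dict(counts))
```

Replacing `age_group` with grade completed, region, or language group gives the other breakdowns listed above.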

2.3 Measuring Literacy Levels

Beyond the broad category labels of literacy levels,
there is little agreement on how actually to assign
such labels to individuals. Does scoring above 50
percent on a test of paragraph comprehension qualify
an individual as literate, nonliterate, or in-between?
To a great extent, such labeling has been and
continues to be arbitrary. In addition, while most
assessment instruments utilize school-based and
curriculum-based materials, there is increasing
awareness among specialists of the importance of
measuring 'everyday' or practical literacy abilities.
One method for dealing with literacy assessment is to
determine the intersection of both literacy skills and
domains of literacy practice (Tuijnman et al. 1997).
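The arbitrariness of such labels is easy to see numerically: the same set of test scores yields quite different "literacy rates" depending on where the cutoff is drawn. The scores below are invented for illustration, and `literacy_rate` is a hypothetical helper that counts scores strictly above the cutoff, matching the "scoring above 50 percent" wording in the text.

```python
def literacy_rate(scores: list[float], cutoff: float) -> float:
    """Share of respondents scoring strictly above `cutoff` (percent correct)."""
    if not scores:
        raise ValueError("no scores supplied")
    return sum(s > cutoff for s in scores) / len(scores)


# Hypothetical paragraph-comprehension scores (percent correct) for 12 adults
scores = [12, 35, 41, 47, 49, 50, 52, 58, 63, 71, 88, 95]
for cutoff in (40, 50, 60):
    print(f"cutoff {cutoff}%: literacy rate {literacy_rate(scores, cutoff):.0%}")
```

With identical test results, the reported rate here falls from 10 of 12 respondents to 4 of 12 as the cutoff moves from 40 to 60 percent, which is why any rate is only interpretable alongside the cutoff that produced it.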
There are a great many types of literacy tests, and a
great number of skills which specialists have thought
were important not only for the measurement of actual
literacy ability, but also in terms of the underlying
processes involved in being a competent reader or
writer. Drawing on recent survey work, as described
above, it is useful to think of literacy ability as
involving at least four basic types of skills: decoding,
comprehension, locating information, and writing.
Individuals who use literacy may perform literate
functions on a wide array of materials; in addition,
certain individuals may specialize in specific types of
literate domains (e.g., lawyers, doctors, agricultural
agents). Even individuals with low general levels of
literacy skill may be able to cope successfully with
written materials in a domain in which they have a
great deal of practice (e.g., farm workers who often
deal with insecticides). Since governments are generally
interested in providing literacy for many
categories of people, an assessment should sample
across the material domains where literate functions
are typically found, such as single words, short
phrases, tables and forms, and texts.
The estimation of literacy skills by text domains
involves the use of a matrix of the intersection of
literacy skills with the text domains in which literacy
skills can be applied. This provides a breakdown of
types of component skills in literacy. It should be
understood that there is rarely consensus on which
specific skills to test in literacy, and that any such
matrix is necessarily arbitrary. Nonetheless, a matrix
of literacy skills by domain can provide a useful
method for collecting the appropriate ('low-end')
amount of information needed for policy decisions,
and it can be considerably less expensive than some
of the comprehensive methods employed in the
North American surveys.
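As a minimal sketch of the skills-by-domains matrix described above, the Python below crosses the four skill types with the four material domains named earlier in this section. The number of test items per cell (two) and the helper names `build_blueprint` and `items_by_skill` are hypothetical; as the text notes, the choice of cells in any real matrix is arbitrary and driven by policy needs.

```python
from itertools import product

# Skill types and material domains as listed in the text.
SKILLS = ["decoding", "comprehension", "locating information", "writing"]
DOMAINS = ["single words", "short phrases", "tables and forms", "texts"]

def build_blueprint(items_per_cell=2):
    """Map every (skill, domain) cell to a planned number of test items.

    items_per_cell is a hypothetical default; a real assessment would vary it
    cell by cell, or leave some cells empty.
    """
    return {(skill, domain): items_per_cell for skill, domain in product(SKILLS, DOMAINS)}

def items_by_skill(blueprint):
    """Total the planned items for each skill across all domains."""
    totals = {skill: 0 for skill in SKILLS}
    for (skill, _domain), n in blueprint.items():
        totals[skill] += n
    return totals

blueprint = build_blueprint()
print(len(blueprint), "cells,", sum(blueprint.values()), "items")
print(items_by_skill(blueprint))
```

Keeping the matrix explicit makes it easy to see how many items a "low-end" assessment needs, and to cut cells when the budget is small.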

3. Literacy Research and Innovation

Based on the growing concern about literacy levels
across the globe, it seems clear that new domains of
research will begin to open up, such as the topics
described below.

3.1 Technology

There are new and exciting ideas about the utility of
technology for literacy provision to children and
adults. Much of this work is still in the early
development stages, such as efforts to utilize synthetic
speech to teach reading, or the use of multimedia


Methods in Sociolinguistics

displays (interactive video, audio tapes, and com-
puter displays) to provide more sophisticated
instruction than has been heretofore available.
Technological solutions to instruction—known as
computer-based education (CBE) or computer-as-
sisted instruction (CAI)—have been used, primarily
in industrialized nations, since the early 1980s, and
the presence of microcomputers in the classrooms of
schools has continued to grow at a phenomenal rate,
especially with the advent of the Internet (Anderson
1999, OECD 2000, Wagner and Hopey 1999).
Until the 1990s, the cost of educational technology
was too high even for most industrial countries, and
therefore far beyond the means of the developing
countries. But the price-to-power ratio (the relative
cost, for example, of a unit of computer memory or
the speed of processing) continues to drop at an
astounding rate. While the cost of the average micro-
computer has remained constant or declined slightly
for about a decade, the power of a typical computer in
2000 is 10-100 times greater than that of one produced
in 1990. If present trends continue, the capabilities
for CAI and CBE literacy instruction are likely, by
the year 2010, to go far beyond the elementary
approaches of the 1990s.
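The 10-100-fold gain in computing power between 1990 and 2000 can be restated as a yearly rate. The sketch below assumes steady exponential growth over the decade, which is a simplification, and its function names are illustrative only.

```python
import math

def implied_annual_growth(factor, years):
    """Constant yearly growth rate that compounds to `factor` over `years`."""
    return factor ** (1 / years) - 1

def doubling_time(factor, years):
    """Years needed for power to double, assuming steady exponential growth."""
    return years * math.log(2) / math.log(factor)

for factor in (10, 100):
    rate = implied_annual_growth(factor, 10)
    print(f"{factor}x over 1990-2000: about {rate:.0%} per year, "
          f"doubling every {doubling_time(factor, 10):.1f} years")
```

Under this assumption, a 10-fold gain corresponds to roughly 26 percent per year (doubling about every three years), and a 100-fold gain to roughly 58 percent per year (doubling about every year and a half).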

3.2 Multisectoral Approaches

Literacy skills are utilized in many life contexts outside
of academic settings. To date, most research and
development has focused primarily on school-based
settings. A major challenge rests in determining the
ways that literacy can be fostered and utilized in
everyday family and work settings. From a policy
perspective, more needs to be known about how
literacy education can be infused into the significant
development work of other sectors, such as agriculture
and health. In these two sectors, literacy is a major
vehicle for innovation and knowledge dissemination,
yet few studies have explored what levels of literacy
determine the effectiveness of such dissemination.

3.3 Design of Materials

In developing countries, increased textbook provi-
sion has been viewed by donors and ministries of
education as a key strategy for the improvement of
school instruction (Heyneman and Jamison 1980).
However, very little is known about how the design
of instructional materials influences comprehension
and learning. There are also enormous subject matter
and national variations in conventions of text design.
Some important work on the relationship between
characteristics of textbook discourse and comprehen-
sion is being carried out that has implications for
improving school textbooks, as well as materials for
other sectors. For example, there is a special need to
improve instructions for pharmaceutical and agricul-
tural chemicals, whose safe and effective use requires
performing complex cognitive tasks with procedural
information that is often difficult to comprehend
(Eisemon 1988, Wright 1999).

3.4 Mother-tongue and Second-language Issues

As previously discussed, many learners enrolled in
adult education programs are being taught literacy in
a second language. In developing countries, a
significant proportion of these students are either
illiterate in their mother tongue or receive only a few
years of mother-tongue instruction before a second,
usually foreign, language is introduced as a medium of
instruction. Poor second-language literacy proficiency
is a cause of high repetition and wastage rates, and of
low achievement in academic subjects in primary and
secondary schools, with profound consequences for
employment and other externalities of schooling.
Because of the significant debate on first- and
second-language/literacy policy (often related to
national issues of ethnicity and power), most
government agencies worldwide have been reluctant
to review such policies. However, there are a number
of important areas of work which need to be
addressed beyond the confines of this debate, such
as: (a) Under what conditions should mother tongue
literacy be a precondition for the introduction of
second-language literacy in school-based and non-
formal settings? (b) How does the implementation of
language-of-instruction policies affect literacy after
schooling? (c) What are the effects of using second-
language literacy in school on wastage and grade
repetition? (d) What are the implications of using the
second-language literacy for academic subjects like
mathematics, science, health, nutrition, and agricul-
ture? (e) What roles do orthographic similarity and
dissimilarity play in transfer between mother-tongue
and second-language literacy? These and similar questions will
need to be addressed before major progress can be
made in improving literacy levels in national and
international contexts.

4. Conclusion

The importance of research, measurement, and
innovation in literacy is that they can provide new
paths to greater efficiency in literacy provision
around the world. While no social program (includ-
ing research) is without economic costs, such
expenditures must be understood in the light of costs
involved in not knowing how to carry out literacy
programs practically and efficiently. Those who have
argued that the literacy crisis is so great that the
support of research is somehow wasteful are likely to
be proven wrong. To invest resources in implementa-
tion without developing the means to learn from such
programs is to call into question any purported gains
in literacy work.
The year 2000 was a critical moment to reinforce
literacy efforts, as global economic changes are
requiring significant changes in worker skills and the


Multidimensional Scaling

heightened role of information exchange. In spite of
the clear need for cultural sensitivities and specifi-
cities, there may be important economies of scale as
more is learned about literacy. Methodologies for
pilot programs, assessment and evaluation, and
computerized textbook preparation, as examples,
may be transferable with local adaptations to varying
cultural contexts. The need for literacy and other
basic skills has never been greater, and the gap
between literate and nonliterate lifestyles is becoming
ever larger, with parallel growth in income disparities.
Literacy and learning are a part of the culture of every
society. To produce major changes in literacy requires
both a realistic understanding of the kinds of change
which people and nations desire, and sustained
support to provide appropriate instructional services.

See also: Literacy; Oracy.

Bibliography
Anderson J 1999 Information technologies and literacy. In:
Wagner D A, Venezky R L, Street B V (eds.) Literacy: An
International Handbook. Westview Press, Boulder, CO
Downing J 1973 Comparative Reading. Macmillan, New York
Dutcher N 1982 The Use of First and Second Languages in
Primary Education: Selected Case Studies. World Bank
Staff Working Paper, No. 504. World Bank, Washington,
DC
Eisemon T E 1988 Benefiting from Basic Education, School
Quality, and Functional Literacy in Kenya. Pergamon,
New York
Gray W S 1956 The Teaching of Reading and Writing: An
International Survey. UNESCO, Paris
Heyneman S P, Jamison D T 1980 Student learning in
Uganda: Textbook availability and other factors.
Comparative Education Review 24(2): 206-20
Hornberger N 1999 Language and literacy planning. In:
Wagner D A, Venezky R L, Street B V (eds.) Literacy: An
International Handbook. Westview Press, Boulder, CO
Kirsch I, Jungeblut A 1986 Literacy: Profiles of America's
Young Adults. Final report of the National Assessment of
Educational Progress. ETS, Princeton, NJ
Lavy V, Spratt J, Leboucher N 1995 Changing patterns of
illiteracy in Morocco: Assessment methods compared.
LSMS Paper 115. World Bank, Washington, DC
OECD/Statistics Canada 1995 Literacy, Economy, and
Society. OECD, Paris
OECD 2000 Learning to Bridge the Digital Divide. OECD,
Paris
Ogbu J 1978 Minority Education and Caste: The American
System in Cross-cultural Perspective. Academic Press,
New York
Statistics Canada 1988 A National Literacy Skill Assessment
Planning Report. Statistics Canada, Ottawa, ON
Tuijnman A, Kirsch I, Wagner D A (eds.) 1997 Adult Basic
Skills: Innovations in Measurement and Policy Analysis.
Hampton Press, Cresskill, NJ
UNESCO 1957 World Illiteracy at Mid-century: A Statis-
tical Study. UNESCO, Paris
UNESCO 1996 Education For All: Mid-decade Report.
United Nations 1989 Measuring Literacy through Household
Surveys: A Technical Study on Literacy Assessment and
Related Topics through Household Surveys. National
Household Survey Capability Programme. United
Nations, New York
Venezky R L, Osin L 1991 The Intelligent Design of
Computer-assisted Instruction. Longman, New York
Venezky R, Wagner D A, Ciliberti B (eds.) 1990 Towards
Defining Literacy. International Reading Association,
Newark, DE
Wagner D A 1990 Literacy assessment in the Third World:
An overview and proposed scheme for survey use.
Comparative Education Review 34(1): 112-38
Wagner D A 1992 Literacy: Developing the Future.
International Yearbook of Education, Vol. 43.
UNESCO/International Bureau of Education, Geneva
Wagner D A 1993 Literacy, Culture, and Development:
Becoming literate in Morocco. Cambridge University
Press, New York
Wagner D A 1998a Literacy retention: Comparisons
across age, time, and culture. In: Wellman H,
Paris S G (eds.) Global Prospects for Education:
Development, Culture, and Schooling. American
Psychological Association, Washington, DC, pp. 229-51
Wagner D A 1998b Literacy Assessment for Out-of-School
Youth and Adults: Concepts, Methods, and New
Directions. ILI/UNESCO Technical Report.
International Literacy Institute, University of Pennsylvania,
Philadelphia, PA
Wagner D A 2000 Literacy and Adult Education. World
Education Forum, Dakar, Senegal
Wagner D A, Hopey C 1999 Adult literacy and the Internet:
Problems and prospects. In: Wagner D A, Venezky R L,
Street B V (eds.) Literacy: An International Handbook.
Westview Press, Boulder, CO
Wagner D A, Puchner L (eds.) 1992 World Literacy in the
Year 2000: Research and Policy Dimensions. Annals of
the American Academy of Political and Social Science,
Newbury Park, CA
Wagner D A, Venezky R L, Street B V (eds.) 1999 Literacy:
An International Handbook. Westview Press, Boulder, CO
Wright P 1999 Comprehension of printed instructions.
In: Wagner D A, Venezky R L, Street B V (eds.)
Literacy: An International Handbook. Westview Press,
Boulder, CO

Multidimensional Scaling

W. C. Rau

Graphic models have a long history in the study of
language; the century-old, frequent use of tree
diagrams to represent syntax and grammar stands
as an eloquent testimony to the need for classificatory
tools in language studies (Stewart 1976). Linguists
seem much more reluctant, however, to use
conventional statistics in their investigations, and
texts on statistical applications in linguistics
(e.g., Butler 1985) seem rather elementary from the
perspective of the social sciences. There are
exceptions, of course, with the work of Labov
(1980) and Osgood (1975) coming immediately to
mind. Nonetheless, one looks long and hard through
mainstream journals, such as Language, to find
the kind of empirical and statistical analysis that
a social scientist would recognize. Data do not
appear very often, but when they do, they may be
presented without further statistical analysis or
interpretation (e.g., Lipski 1986, Lehrer 1985). This
is unfortunate, because many new statistical tools
exist that can profitably analyze a broad range of
linguistic phenomena. One such technique is multi-
dimensional scaling (or MDS). Linguists will be
relieved to see that it presents their data in sheer
attire, accentuating rather than concealing what is
already there.

1. Basic Procedures

As a preparatory step for analysis, social scientists
gather information into a data matrix where ob-
servations are placed in the rows, and variables are
placed in the columns. For example, Cavalli-Sforza
and Wang (1986) analyzed linguistic data from
Micronesia that filled a data matrix containing fully
9,136 pieces of information. Imagine their matrix
drawn on a very wide sheet of paper. There are 16
rows of data, one for each island's linguistic data.
Each of the 571 columns represents a particular gloss
or word. Each of the 9,136 cells of this vast table
contains the morpheme employed by the particular
society (row of the table) for the particular word
(column of the table). For example, a full stomach is
described as mat on the island of Tobi, math on
Falalop and Mogmog, and mat on Woleai, Ifaluk
and Satawal. When two societies are compared, the
same morpheme for a gloss counts as a match (1) and
different morphemes as a mismatch (0).

This array of information is referred to as raw
data. The point of statistical analysis is to 'boil down'
or 'reduce' these raw data to a set of summary indices
or dimensions which can then be used to highlight
major patterns in the data. One of the more common
first steps in this process of reduction is for the
analyst to calculate some measure of association
among either the variables or observations in the
matrix. Pearson's or Spearman's correlation coeffi-
cients, one of the numerous matching coefficients, or
a distance measure, such as Euclidean distances,
represent some of the choices. These measures are
then reduced further through use of factor analysis,
multidimensional scaling, or regression analysis.
When well done, the end result is a parsimonious
rendering of the major patterns of association among
variables or observations in the original data matrix.
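A minimal Python sketch of this first reduction step, assuming the simple matching coefficient (the share of glosses for which two islands use the same morpheme). Only the "full stomach" column comes from the text; the other two columns and their morphemes are invented placeholders, and the function name `matching_coefficient` is hypothetical.

```python
from itertools import combinations

# Rows are communities (observations); columns are glosses (variables).
# Column 0 holds the 'full stomach' forms quoted in the text; columns 1-2
# are invented placeholder morphemes, for illustration only.
data = {
    "Tobi":    ["mat",  "p1", "q1"],
    "Falalop": ["math", "p1", "q2"],
    "Mogmog":  ["math", "p1", "q2"],
    "Woleai":  ["mat",  "p2", "q2"],
}

def matching_coefficient(row_a, row_b):
    """Proportion of glosses for which two communities use the same morpheme."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must cover the same glosses")
    return sum(a == b for a, b in zip(row_a, row_b)) / len(row_a)

# Upper triangle of the similarity matrix: one value per island-pair.
pairs = {(i, j): matching_coefficient(data[i], data[j]) for i, j in combinations(data, 2)}
for (i, j), m in pairs.items():
    print(f"{i:8s} {j:8s} {m:.3f}")
```

With 571 glosses instead of three, the same function yields one coefficient per island-pair, which is the kind of triangular matrix the next paragraph describes.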
The first step in this process leaves one with a
triangular matrix, such as the upper-right triangle in
Fig. 1. This table is a small excerpt of the one
published by Cavalli-Sforza and Wang, and shows
relationships linking only 6 of the 16 islands. Each
element in the matrix shows the extent to which one
observation (or variable) is similar to or different
from another observation (or variable). In Fig. 1,
Woleai and Mogmog show the highest coefficient,
0.728, and Satawan and Tobi the lowest, 0.446. In the
full data matrix, the highest coefficient found, 0.987,
is for Falalop and Mogmog, and shows these islands
to have nearly all of the 571 linguistic characteristics
in common. Tobi and Murilo, however, have the
lowest coefficient, 0.406, with less than half the
characteristics in common.
Reducing a data matrix to a set of similarity
measures is straightforward; any of a number of
computer statistical packages will reduce even the
largest data matrix in a matter of seconds. The real
issue is whether to reduce the data matrix across
either the variables or the observations. For example,
one can reduce the 16 x 571 Micronesian data matrix
to a 571 x 571 matrix of correlations among variables
(words) or a 16 x 16 matrix of correlations among
observations (islands). Which way does the linguist
want to go?
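The two directions of reduction can be seen in NumPy's `corrcoef`, whose `rowvar` flag decides whether rows or columns are treated as the variables. The sketch below uses random placeholder numbers with the same 16 x 571 shape as the Micronesian matrix; it illustrates only the shapes, not Cavalli-Sforza and Wang's data or their matching coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data: 16 islands (rows) x 571 glosses (columns).
X = rng.random((16, 571))

# Default rowvar=True: each row (island) is treated as a variable,
# giving a 16 x 16 matrix of correlations among observations.
by_observation = np.corrcoef(X)

# rowvar=False: each column (gloss) is a variable, giving 571 x 571.
by_variable = np.corrcoef(X, rowvar=False)

print(by_observation.shape, by_variable.shape)
```

The holistic, island-by-island comparison argued for below corresponds to the first, 16 x 16, reduction.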

Here, conventional practices in the social sciences
are not helpful. Most social scientists are interested in
testing hypotheses which, with nearly universal
regularity, are couched in the language of variables.
Proverbial Variable X is hypothesized to affect
Variable Y, so the analyst uses correlation analysis,
regression analysis, etc., to determine how changes in
Variable X or a set of X Variables are associated with
change in Variable Y. In short, social scientists
almost always work across the columns in the data
matrix.

Figure 1. Geographical distances (miles, lower left triangle) and matching coefficients (upper right triangle) among six Caroline Islands and languages.

Linguists would appear to be more interested in
patterns among observations. Linguistic analysis is
more holistic or relational than most quantitative
analysis in the social sciences (cf. Rau and Roncek
1987, Rau and Leonard 1990). If this is so, linguists
should reduce data across the rows of the data matrix
and not across the columns. This crucial difference
may be one reason why linguists have been reluctant
to borrow statistical tools employed by social
scientists. The latter, with their excessive fixation on
conventional approaches to hypothesis testing, use
statistical tools almost exclusively to analyze vari-
ables. Hence, the manner in which some of these
tools can be adapted to the analysis of linguistic data
has not been clearly demonstrated. It will now be
shown how MDS can assist in holistic analyses
of data by reducing measures of association among
observations.
2. An Application of the Method

One cannot imagine a better or more parsimonious
theory for demonstrating this use of MDS than that
provided by Cavalli-Sforza and Wang (1986) in their
study of lexical replacement in the Carolines, a chain
of islands and atolls in Micronesia. Directly north of
New Guinea and the Solomons, to the east of the
Philippines, and to the west of the Marshall Islands,
the 16 islands and atolls in their study stretch in a
ragged but compact band for some 1,400 miles. The
Carolines represent an ideal natural laboratory for
testing their 'stepping-stone' theory of language
change.
In a simplified version of the wave theory of
language change, they argue that linguistic diffusion
will be a negative exponential function of the distance
separating a linear array of discrete language
communities. Linguistic diffusion in traditional wave
theory is complicated by the multidirectionality of
sources of linguistic innovation. Since change can
emanate from a number of different directions, a
complex network of isoglosses can result. Choice of
the Carolines minimizes the problem of multidirec-
tionality since, as noted, the islands are somewhat
isolated and spread out in a fairly straight and
narrow band. Given these fortuitous geographic
circumstances, one can expect linguistic differences
among the islands to be a function of their respective
geographic distances.
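The negative exponential model can be sketched in a few lines. The parameters `a` (similarity at zero distance) and `b` (decay per mile) below are hypothetical values chosen for illustration, not estimates from Cavalli-Sforza and Wang; 129 and 1,284 miles are the closest and most distant island-pairs cited in Section 2.2, and 640 miles is an arbitrary intermediate value.

```python
import math

def expected_similarity(distance_miles, a=1.0, b=0.001):
    """Lexical similarity predicted by a negative exponential decay in distance.

    a and b are hypothetical illustration values, not fitted estimates.
    """
    return a * math.exp(-b * distance_miles)

near = expected_similarity(129)
far = expected_similarity(1284)
print(f"predicted similarity: {near:.3f} at 129 miles, {far:.3f} at 1,284 miles")

# Any strictly decreasing model preserves the rank order of distances (reversed).
distances = sorted([129, 1284, 640])
sims = [expected_similarity(d) for d in distances]
print(sims == sorted(sims, reverse=True))
```

Because such a model preserves rank order whatever the exact values of `a` and `b`, a rank-based scaling of the distances and of the lexical similarities should produce the same ordering of islands if the theory holds, which is the comparison made in Section 2.3.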

2.1 Creating a Map of the Carolines

The measures published by Cavalli-Sforza and Wang
(and excerpted in Fig. 1) can be used to explain the
principles in MDS and then to provide a new test of
stepping-stone theory. First, the miles between pairs
of islands are scaled, as given in the lower left triangle
in Fig. 1. MDS can scale distances as well as
correlations or matching coefficients. Several classic
demonstrations of MDS principles produce maps
from a matrix of distances among cities (see Kruskal
and Wish 1978: 7-10, Borg and Lingoes 1987: 1-5).
MDS can produce an accurate map if it is
given the miles or kilometers between any
number of locations, be they cities or atolls. A
map of the Carolines is seen in Fig. 2, with the
names of the atolls placed as close as possible to
their point locations. It was generated with
ALSCAL (Alternating Least Squares Algorithm
for Scaling), an MDS algorithm developed by
Forrest Young and others (Young 1987, Young
and Lewyckyj 1979) and carried by such popular
computer statistical packages as SPSS and SAS.
Notice one interesting feature of the plot. When
any of the MDS algorithms produces a configura-
tion, the 'compass points' or termini of the resulting
coordinates are arbitrary. In the present case, the
configuration in Fig. 2 represents a mirror inversion
of how the Carolines are presented in a geographic
atlas. What is most important in multidimensional
scaling is the structure or pattern of relationships
among objects in the configuration. Whether the
objects come out upside down is completely
irrelevant. The researcher can always stand on his
head, or simply give the configuration a spin so
that it conforms to the conventional, and arbitrary,
Eurocentric geographical coordinates. Australians,
for example, might find the present configuration
entirely sensible!
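To see how a map can be recovered from distances alone, and why its orientation is arbitrary, here is a sketch using classical (Torgerson) metric MDS. This is a swapped-in technique, simpler than ALSCAL, not the algorithm used for Fig. 2; the five "true" locations are hypothetical and serve only to build a distance matrix.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: coordinates in k dimensions from a distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]         # keep the k largest
    vals = np.clip(vals[order], 0, None)       # guard against tiny negative values
    return vecs[:, order] * np.sqrt(vals)

def pairwise(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical locations (arbitrary units), used only to build the distances.
true_xy = np.array([[0, 0], [3, 0], [0, 4], [3, 4], [6, 2]], dtype=float)
D = pairwise(true_xy)

X = classical_mds(D, k=2)
print(np.allclose(pairwise(X), D))         # the recovered map reproduces every distance

# Mirroring the map (or rotating it) leaves all interpoint distances unchanged.
mirrored = X * np.array([-1.0, 1.0])
print(np.allclose(pairwise(mirrored), D))
```

The recovered coordinates generally differ from `true_xy` by a rotation or reflection, which is exactly the arbitrariness of "compass points" described above.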

2.2 The MDS Algorithms

Before getting into substantive issues, it will be useful
to explore briefly how MDS algorithms produce
configurations. For purposes of illustration, the
strategy behind Smallest Space Analysis will be
demonstrated. This is an MDS algorithm developed
by Louis Guttman (1968) and James Lingoes (1973).
Smallest Space Analysis is a close cousin to
ALSCAL, and its principles are easier to illustrate.

Figure 2. ALSCAL plot of the Caroline configuration.

The lower left triangle of the first matrix in Fig. 3
contains the miles separating six islands, simply
copied from Fig. 1. MDS starts with comparisons
among each pair of islands, 15 island-pairs in this
case. It begins by assigning a rank number to each
pair: the islands separated by the smallest distance
are assigned a 1, the pair with the next smallest
distance a 2, and so on until 15 numbers are assigned.
The D Matrix in Fig. 3 gives the resulting integers for
the island-pairs; it gives the rank order structure of
distances among the six islands. It shows which
islands are closest and furthest away from each other
in space. Puluwat and Satawal, separated by a
distance of only 129 miles, are ranked at 1; Tobi
and Satawan, with 1,284 miles between them, get the
rank of 15.
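The ranking step that fills the D Matrix can be sketched as follows. The four islands "A" to "D" and their positions along a line are hypothetical placeholders; with six islands, the same code would produce the 15 ranks described above. Ties are broken here by input order, a simplification.

```python
from itertools import combinations

def rank_pairs(dist):
    """Give rank 1 to the closest pair, 2 to the next, and so on."""
    ordered = sorted(dist, key=dist.get)   # stable sort: ties keep input order
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}

# Hypothetical positions (miles along a line) for four placeholder islands.
position = {"A": 0, "B": 100, "C": 450, "D": 1000}
dist = {(i, j): abs(position[i] - position[j]) for i, j in combinations(position, 2)}

ranks = rank_pairs(dist)
for pair, r in sorted(ranks.items(), key=lambda item: item[1]):
    print(r, pair, dist[pair])
```

Only these ranks, not the mileages themselves, are passed on to the fitting step, which is what makes the method nonmetric.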

MDS will now produce a configuration of points
which tries to reproduce the D Matrix ranking
integers within the space or dimensionality specified
by the researcher. Since stepping-stone theory
assumes a one-dimensional array of language com-
munities, that constraint is applied even though
locations in geographic space usually require two or
three dimensions for a completely accurate mapping.
In the present instance, MDS has the added
advantage of testing whether the Carolines approx-
imate the unidimensional ordering assumed in step-
ping-stone theory. If distances among these islands
are in fact sequenced in an order approximating a line
of stepping-stones, then MDS will be able to fit them
onto a one-dimensional line.
Smallest space analysis begins with a principal
components solution. Each island gets a numeric
value giving its location as a point on the principal
components dimension. These values are then used to
calculate interpoint distances across each pair of
islands. Next, these distances are transformed into
'rank images' and stored in a D* Matrix. Corre-
sponding ranks in the two matrices are compared,
and then points in the configuration are shifted about
until the rank images in the D* Matrix correspond as
closely as possible to the ranking of integers in the D
Matrix. This iterative process is guided by a 'badness
of fit' measure, called the 'coefficient of alienation' in
Smallest Space Analysis (and 'stress' in ALSCAL),
which optimizes the fit between the rank images and
ranking integers. When and if the coefficient of
alienation equals 0, then the rank images and ranking
integers correspond perfectly. In the present instance,
this would mean that the six islands can be sequenced
into a one-dimensional array with no loss of the rank
ordering of the mileage in the original distance
matrix.
In Smallest Space Analysis, the loss function or
coefficient of alienation is given by

$$\mu = \frac{\sum_{i<j} d_{ij}\, d^{*}_{ij}}{\sum_{i<j} d_{ij}^{2}}, \qquad K = \left(1 - \mu^{2}\right)^{1/2}$$

The term d_ij is the rank ordering of the distance
between the ith and jth islands in the D Matrix, and
d*_ij is the rank image for the same island-pair. K is the
value of the loss function or the coefficient of
alienation. On the basis of a hand-estimated point
plot of a smallest space configuration in Fig. 3,
K = 0.06, which means that the six islands can be
sequenced in a one-space with almost no loss of the
information in the original distance matrix.
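The loss function can be checked by hand or in a few lines of Python. Following the simplified description above, this sketch assumes the rank images are a permutation of the ranks 1 to 15. In that case Σ d² = 1,240, and the ratio over Σ d² is equivalent to Guttman's normalized form. The rank images used here are hypothetical: two adjacent pairs are swapped, which happens to give the same cross-product sum (1,238) as the Fig. 3 calculation. They are not the actual D* Matrix.

```python
import math

def coefficient_of_alienation(d, d_star):
    """Return (mu, K) from rank integers d and rank images d_star (same pair order).

    mu = sum(d * d_star) / sum(d**2); K = sqrt(1 - mu**2).
    Assumes d_star is a permutation of d, as in the simplified illustration.
    """
    if sorted(d) != sorted(d_star):
        raise ValueError("rank images must be a permutation of the ranks")
    mu = sum(a * b for a, b in zip(d, d_star)) / sum(a * a for a in d)
    return mu, math.sqrt(max(0.0, 1 - mu * mu))

ranks = list(range(1, 16))                 # 15 island-pairs; sum of squares is 1,240
perfect = coefficient_of_alienation(ranks, ranks)

# Hypothetical rank images: pairs (1, 2) and (3, 4) swapped.
images = [2, 1, 4, 3] + list(range(5, 16))
mu, K = coefficient_of_alienation(ranks, images)
print(perfect)
print(f"mu = {mu:.4f}, K = {K:.3f}")
```

A perfect match gives K = 0, while two small rank reversals already give K of about 0.057, in line with the hand-estimated value of 0.06.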
Computers do it better, and in this case ALSCAL
can fit the 16 islands and 120 island-pairs in one
dimension with a stress value of 0.024, thus account-
ing for over 99 percent of the rank order information
in the original distance matrix. Although stress in two
dimensions is zero, the islands lie so nearly on a
straight line that one dimension is sufficient. In
short, the Carolines do present an ideal
natural laboratory for Cavalli-Sforza and Wang.
Equally comforting is the fit of the matching
coefficients presented in the matrix for all 16 islands.
As noted, MDS algorithms can also fit similarity
coefficients which transform binary data into mea-
sures ranging from 0 to 1.00. Since many kinds of
linguistic data can be represented in binary form (+,
−; yes, no; present, absent), transformation of this
information into a matrix of matching coefficients
prepares them for MDS analysis. For these matching
coefficients, the stress value in one dimension is 0.065;
hence 98.7 percent of the information in the
coefficient matrix is preserved in a one-dimensional
array of the islands.

2.3 Testing Stepping-stone Theory

The stepping-stone theory can now be tested in a
simple and direct fashion. The MDS configurations
allow the islands to be sequenced into geographical
and lexical arrays. If the theory is correct, the
position of the 16 islands in the two arrays should
be the same, or at least very close. This is in fact the
case. From one end, the geographic array begins:
Tobi, Sonsorol, Mogmog, Falalop, and Woleai. The
lexical array is almost identical, beginning: Tobi,
Sonsorol, Falalop, Mogmog, and Woleai. Ten of the
16 islands have exactly the same rank order
linguistically as they do geographically, and the rest
average only about two ranking steps different in the
two arrays. The correlation between the ordering of
the linguistic and geographic arrays is 0.926, which is
very nice indeed.
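The text does not say which correlation coefficient produced 0.926. Assuming a Spearman rank correlation, which suits two orderings of the same items, the sketch below computes it for the first five islands of each array as listed above. Because it uses only five of the 16 islands, its value is not expected to match the article's figure. The function name `spearman_rho` is illustrative.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two orderings of the same items (no ties)."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orderings must contain the same items")
    n = len(order_a)
    rank_b = {item: r for r, item in enumerate(order_b, start=1)}
    d_squared = sum((r - rank_b[item]) ** 2 for r, item in enumerate(order_a, start=1))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# First five positions of each array, as quoted in the text.
geographic = ["Tobi", "Sonsorol", "Mogmog", "Falalop", "Woleai"]
lexical = ["Tobi", "Sonsorol", "Falalop", "Mogmog", "Woleai"]

rho = spearman_rho(geographic, lexical)
print(f"rho over these five islands = {rho:.3f}")
```

The single reversal of Mogmog and Falalop costs only a little of the correlation, which is the pattern the next paragraph discusses.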
One always expects data to depart to some degree
from the pure pattern predicted by a theory. In this
case, islands that are but a stone's throw away from
each other may have reversed orders, such as
Mogmog and Falalop. In addition, Fig. 2 shows
islands in the Truk District (the Eastern Carolines)
to have as much longitudinal displacement as
latitudinal displacement. And the positions of Murilo
and Satawan differ by five and four steps, respectively,
in the linguistic and geographic rankings. Such
deviations should come as no surprise. Micronesians,
among the great maritime navigators of premodern
times, could traverse the distances among islands in
the Truk District (Gladwin 1970, Lewis 1972).
Different population sizes and rates of trading and
exchange could account for these minor deviations.

Figure 3. An illustration of the MDS fitting process. Panels: miles and rank order (D Matrix); spatial distances and rank images (D* Matrix); hand-estimated point plot; calculation of the loss function (μ = 1,238/1,240 ≈ 0.998; K = (1 − μ²)^(1/2) ≈ 0.057).

3. Pronunciation in Dialects of Spanish

Another area for which MDS is suited concerns the
study of dialects. This area already has lent itself to
imaginative map making. Maps such as those made
by Dennis Preston (1986) require a rather direct
correspondence between geographic regions and
systems of pronunciation, however, and the distribu-
tion of dialects does not always prove so convenient.
Spanish is a case in point. Moreover, the nature and
distribution of Spanish dialects is a major
educational issue as well. In the USA, Anglo students
are taught the Mexican dialect of Spanish, and they
assume that since it is taught in the schools it must
therefore be the 'standard' or 'correct' pronunciation.
When students have difficulty understanding Cubans
or Puerto Ricans, for example, they often assume
'that Puerto Rican and Cuban pronunciation is
somehow sub-standard or inferior, and therefore
the lack of comprehension is the fault of the native
speakers themselves and not the students or the
school system' (Terrell 1977: 35).

How then does Mexican or so-called 'standard
Latin-American pronunciation' compare with other
dialects? MDS can show this by plotting a figurative
map giving the similarities and differences among any
number of dialects. One of the raw data matrices
mentioned earlier, a compilation of the fieldwork of
John Lipski and others (Lipski 1986), will illustrate
this. The original data matrix presents different
ways of pronouncing the phoneme /s/ in seven cities
or areas in Spain and another 21 cities or countries in
Central and South America. It appears that in most
of these studies 10 respondents with a high school
education (typically from capital or major cities) were
interviewed for one half hour in an informal or
conversational style. Tape recordings were then
analyzed for the frequency of occurrence of different
ways of pronouncing /s/ in five different spoken
contexts, such as immediately after another conso-
nant or after a word break and a stressed vowel.
Three different pronunciations of /s/ were noted:
simple sibilant, aspiration, or deletion.

3.1 Analyzing the Spanish Data

Considering just /s/ after a consonant, three cities can
be compared to note the kinds of differences present
in the dataset. In Barcelona, Spain, 99 percent of the
time a simple sibilant was spoken, and 1 percent of
the time an aspiration was given instead. The
percentages were very similar in Madrid, 94 and 6
percent. But in Granada, a sibilant was never spoken,
while 82 percent of the time there was an aspiration,
and in the remaining 18 percent the sound was
deleted altogether. Clearly, Barcelona and Madrid
are similar, while Granada is quite different from
both of them. It is easy to quantify this difference. To
determine the linguistic 'distance' between two cities,
the Euclidean distance formula will be used: the
differences between their percentage figures are
squared and summed over all the language
characteristics in the data set, and the square root
of the total is taken.
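The Euclidean distance can be worked through with the three city profiles quoted above for /s/ after a consonant. This sketch uses only those 3 of the 15 variables (one spoken context), so the distances illustrate the calculation rather than reproduce the full matrix. Deletion is taken as 0 percent for Barcelona and Madrid, since their sibilant and aspiration figures already sum to 100.

```python
import math
from itertools import combinations

# Percentages for /s/ after a consonant: (sibilant, aspiration, deletion), from the text.
profiles = {
    "Barcelona": (99, 1, 0),
    "Madrid":    (94, 6, 0),
    "Granada":   (0, 82, 18),
}

def euclidean(p, q):
    """Square root of the summed squared differences between two percentage profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

dist = {(a, b): euclidean(profiles[a], profiles[b]) for a, b in combinations(profiles, 2)}
for (a, b), d in dist.items():
    print(f"{a:9s} {b:9s} {d:7.1f}")
```

Barcelona and Madrid come out about 7 units apart, while Granada is roughly 120 to 130 units from both, matching the verbal comparison above.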
These data have been transformed into a matrix of
Euclidean distances among the pairs of countries or
cities. Two cities with the same patterns of pronun-
ciation would have a distance of 0. They would have
the same frequency or percentage of occurrence for
the 15 variables represented as columns in the data
matrix. As distances between pairs increase, they
become less and less similar. MDS can reveal both
the range and variety of differences contained in a
matrix. Put differently, it can reveal both quantitative
and 'qualitative' differences. But first the number of
dimensions needed for an accurate point configura-
tion must be determined.
Figure 4. ALSCAL configuration for Spanish dialects. (Dimension 1 runs from west (-) to east (+); dimension 2 runs from north (+) to south (-).)

The ALSCAL stress values for the three-, two-, and one-dimensional solutions are as follows. A three-dimensional solution has a stress of 0.042, a very small departure from a perfect fit, while a two-dimensional solution does almost as well, with a stress of 0.068. A one-dimensional solution is markedly worse, at 0.206. Another way to evaluate the solutions is in terms of R², the 'variance explained,' a measure of how well the solution captures the information in the data matrix. R² for three dimensions is 0.991, almost the theoretical maximum of 1. But a two-dimensional solution is not far behind, with an R² of 0.980, while one dimension explains considerably less of the variance, with 0.871. Clearly, a two-dimensional solution is nearly as good as a three-dimensional one and far better than a one-dimensional solution, almost completely reproducing the information in the original matrix. And two dimensions are easier to visualize than three. Therefore, two dimensions can provide our figurative map of similarities and differences among Spanish pronunciations.
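A short sketch can illustrate the trade-off between the number of dimensions and goodness of fit. It makes several substitutions, all labeled here as assumptions:

- It uses classical (Torgerson) metric MDS instead of ALSCAL's alternating least squares algorithm.
- It measures fit with the metric form of Kruskal's stress-1.
- It takes "R²" to be the squared correlation between input and fitted distances.

ALSCAL's own stress and RSQ are defined somewhat differently, so the numbers are not comparable with those quoted above. The data are random points, and all function names are hypothetical.

```python
import numpy as np

def pairwise(X):
    """Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, k):
    """Classical (Torgerson) MDS: embed distance matrix D in k dimensions."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered inner products
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]         # indices of the k largest
    scale = np.sqrt(np.clip(vals[top], 0.0, None))
    return vecs[:, top] * scale

def stress1(D, X):
    """Kruskal's stress-1 (metric form): input vs. fitted distances."""
    iu = np.triu_indices(D.shape[0], k=1)
    d_in, d_fit = D[iu], pairwise(X)[iu]
    return np.sqrt(((d_in - d_fit) ** 2).sum() / (d_fit ** 2).sum())

def r_squared(D, X):
    """Squared correlation between input and fitted distances."""
    iu = np.triu_indices(D.shape[0], k=1)
    return np.corrcoef(D[iu], pairwise(X)[iu])[0, 1] ** 2

# Hypothetical data: 12 sites that genuinely vary in three dimensions.
rng = np.random.default_rng(0)
D = pairwise(rng.normal(size=(12, 3)))
results = {}
for k in (1, 2, 3):
    X = classical_mds(D, k)
    results[k] = (stress1(D, X), r_squared(D, X))
    print(k, round(results[k][0], 3), round(results[k][1], 3))
```

Because these points really are three-dimensional, the three-dimensional solution reproduces the distances almost exactly, while one dimension fits much worse. This mirrors the kind of comparison the text makes between solutions.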
Figure 4 is the resulting map. Compare the locations of Mexico, Cuba, and Puerto Rico. Their distance apart brings out the difference in their phonological systems. Mexican pronunciation is most similar to the Spanish spoken in the six or seven countries directly adjacent to it at the east-central end of the plot. Because it sits toward this end of the configuration, however, Mexican Spanish is hardly the hub of Lipski's sample of dialects. If, for the moment, the seven dialects clustered around Mexico are ignored, then Cuba and Puerto Rico, along with Murcia, become the hub or center for the remaining 17 dialects.
Note also, in the case of Spain, the diversity of pronunciations that can occur within a country. The Spanish spoken in Madrid (MAD in the plot) is most similar to that spoken in Mexico and very similar to that of Barcelona, but it is clearly not representative of the other areas in Spain. Those areas, which are capitalized in Fig. 4, form a sideways V at the west end of the plot. Whereas Madrid and Mexican Spanish are highly similar, Murcian Spanish is closer to the other dialects in Spain. With regard to the phoneme /s/, someone from Murcia would pronounce words very much like a Cuban or Puerto Rican, and not at all like someone from Madrid or Barcelona. To the extent that the first coordinate (the east-west axis) is more important than the second (the north-south axis), there is more dialect diversity within Spain proper than in the Latin
American areas featured in Lipski's sample. Sam-
pling of a number of sites within each of the other
countries might increase the overall diversity, how-
ever. As a case in point note the difference between
Guayaquil and Quito in Ecuador—and to a lesser
extent, the differences between Bogota and Carta-
gena in Colombia.

3.2 Interpretation of an MDS Map

This brings up the issue of interpreting the properties
of dialect clusters or regions in Fig. 4. Two
interrelated strategies can help to determine why
one group of observations locates in one region of the
configuration, and other groups of observations
locate elsewhere. More generally, these strategies
identify the properties of each of the dimensions
framing a configuration. Figure 4 has two dimen-
sions. What aspects of the data, then, are responsible
for observations locating at either the east or west
end of dimension 1, or the north or south end of
dimension 2? Clearly, this is not a simple case of
diffusion like that illustrated by the Micronesian data
considered above, because there is little correspon-
dence between the linguistic map in Fig. 4 and the
geographic positions of the observations. The con-
figuration must be understood in some other way.
One simple and direct approach is to choose a set
of observations which extend across one dimension
while varying little on the other dimensions. The
analyst then inspects the data matrix and finds those
variables which change in the most consistent and
dramatic fashion as one moves along the dimension.
For example, as one scans along dimension 1 from
Barcelona through Bogota, Panama, and Guayaquil
to Sevilla, the data reveal a great decrease in the use
of sibilants. But picking a route through the
observations is a bit subjective. Also, especially when
the data set contains a great variety of variables, it
can be difficult to decide which variables are most
responsible for the configuration.
A second approach is to use regression analysis
(Kruskal and Wish 1978) to establish the relationship
between the variables in the raw data matrix and
numerical scores for the dimensions in the MDS
solution. For Lipski's data, the 28 observations have
numerical scores on each of the 15 variables in the
raw data matrix plus another score on each of the
dimensions or coordinates in the MDS configuration.
Each of the 15 variables can simply be regressed onto
the values for the two dimensions. Variables with
the largest regression weights for a particular dimen-
sion are the ones that will change in the most con-
sistent and dramatic fashion for observations
located at different points along the projection

of that dimension. Thus, these two strategies are
The results for the regression strategy are in Fig. 5.
The decimal values under each dimension are
standardized regression weights or betas. The betas
for each row establish the relative contribution that
scores for each dimension make to the variance
explained for a variable. The R² value gives the total
variance explained by both dimensions. For example,
the two dimensions account for 87 percent of the
total variance in the first variable (sibilated con-
sonants), and dimension 1, with a beta of +0.93, is
responsible for nearly all of this variance. As a matter
of convention, ALSCAL assigns positive numeric
values to dimension scores to the east and north of
the center of a configuration; dimension scores to the
west and south are negative. Thus, the positive sign
for this beta means that the percentage of sibilated
consonants increases as one moves from west to east
on the first dimension. More generally, the beta
weights are roughly analogous to factor loadings in
factor analysis (for a discussion of factor analysis see
Rummel 1970). They tell us which variables are
associated with which dimensions.
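The regression strategy can be sketched as follows. The sketch assumes NumPy and a hypothetical helper, `standardized_betas`. Both the dimension scores and the variable are converted to z-scores, so the least-squares coefficients are standardized betas. If the dimension scores are uncorrelated, R² is simply the sum of the squared betas. For sibilated consonants, a beta of +0.93 alone gives 0.93² ≈ 0.865, which is why dimension 1 accounts for nearly all of the 87 percent. The configuration below is invented for illustration.

```python
import numpy as np

def standardized_betas(dims, y):
    """Regress one variable on MDS dimension scores.

    dims: (n_observations, n_dimensions) coordinates from an MDS solution
    y:    (n_observations,) raw scores on one linguistic variable
    Both sides are z-scored, so the coefficients are standardized betas.
    Returns (betas, R^2).
    """
    Z = np.asarray(dims, dtype=float)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    y = np.asarray(y, dtype=float)
    zy = (y - y.mean()) / y.std()
    # No intercept is needed because everything is centered.
    betas, *_ = np.linalg.lstsq(Z, zy, rcond=None)
    fitted = Z @ betas
    r2 = 1.0 - ((zy - fitted) ** 2).sum() / (zy ** 2).sum()
    return betas, r2

# Hypothetical two-dimensional configuration for six observations.
dims = np.array([[-2.0, 0.5], [-1.0, -1.0], [0.0, 1.0],
                 [1.0, -0.5], [2.0, 0.0], [0.5, 0.2]])
y = 10 + 5 * dims[:, 0]   # a variable driven by dimension 1 only
betas, r2 = standardized_betas(dims, y)
print(betas.round(3), round(r2, 3))
```

Under the sign convention described above, a positive beta means that the variable increases toward the east (dimension 1) or north (dimension 2) of the configuration.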
The results in Fig. 5 present the kind of distinctive statistical attire that reveals the stately beauty of human language. The clarity and strength of the patterns are striking. They recall Einstein's famous adage that God does not play dice with the universe: the Spanish certainly do not play dice with their dialects. In this table Lipski's variables have been rearranged to bring out the numerical expression of the rules of pronunciation contained in his data. If /s/ is rendered as a sibilant before a consonant, then it is also rendered as a sibilant before a pause and, across a word break, before a consonant or a stressed or unstressed vowel. To a lesser but still striking degree, the same rule holds for aspiration or deletion.

Figure 5. Standardized regression coefficients of ALSCAL dimensions for each phonetic variable. Variable numbers by realization of /s/ and context (WB = word break):

Context                Simple sibilant [s]   Aspiration [h]   Deleted [Ø]
Consonant                       1                   2               3
WB Consonant                    4                   5               6
Before Pause                    7                   8               9
WB Stressed Vowel              10                  11              12
WB Unstressed Vowel            13                  14              15
Next, the results in Fig. 5 can be used to help
identify the phonological differences responsible for
the dispersion of the 28 dialects into different regions
and clusters in the two-dimensional configuration.
The regression weights for dimension 1 show that
speakers at the east end of the configuration
pronounce /s/ as a sibilant; those at the west end
resort to either aspiration or deletion. We can further distinguish aspiration from deletion by means of dimension 2. Those at the north end of dimension 2 rely on aspiration; those at the south end, on deletion.

4. Scaling Individual Speakers

One more example may prove useful. Thus far, only aggregated data (islands and cities) have been analyzed. MDS can be just as useful in analyzing patterns among individuals, although loss-function values such as stress may run higher. For purposes of illustration, the
individual-level data provided by Mohan and Zador
(1986) are examined.
These researchers were interested in the process of
'language death' (see Endangered Languages) which
they characterize as the point at which an ethnic
community loses the ability to speak in its native
language. A major question is whether language death
is a slow, gradual process akin to Darwinian evolu-
tionary change, or whether it is marked by sharp
discontinuities among groups of speakers who possess
qualitatively different degrees of fluency in the ethnic
language. Mohan and Zador chose the (South Asian)
Indian community in rural Trinidad, West Indies, for
study of this issue. Beginning in the mid-nineteenth
century, the area received migrants from Uttar Pradesh
and western Bihar who worked as indentured laborers
on sugar plantations. Once the common language of
the Indian community, Trinidad Bhojpuri is now spoken frequently only by older, rural inhabitants of the
island. Creole and Standard English have become the
languages of choice among Indians born in Trinidad.
To determine the nature of language change in this
community, Mohan and Zador tested the language
competence of 40 speakers of Trinidad Bhojpuri who
were grouped into four age categories. Speakers' ages
ranged from 26 to 96, but the key group consisted of
speakers under age 35. While the older subjects were
representative of their age groups, Mohan and Zador
were hard pressed to find younger subjects who were
capable of speaking Trinidad Bhojpuri for any length
of time. Subjects number 31 through 40, all under age
35, were thus chosen on an ad hoc basis. Eight of these respondents made up four sibling pairs who were close in age.

Mohan and Zador then tape-recorded half-hour
segments of speech in Trinidad Bhojpuri and

calculated the average speed of speech, five different
kinds of errors, and ten incidence counts of features
of Trinidad Bhojpuri that a fluent native speaker
would possess. Taken as a whole, their 16 measures
were designed to elicit the kind of speech — fast,
correct, and articulate — that competent speakers of
any language will possess. If the loss of a native
language is a gradual process, then a gradual loss of
speed and grammatical complexity and a gradual
increase in errors would be expected. On the other
hand, if there is a relatively discontinuous loss of
language, a qualitative break in performance across
these measures could be expected.
The MDS analysis of the structure of relations
among observations is ideally suited to test issues of
this sort. If major discontinuities exist, an MDS
configuration will show large gaps resulting in
noncontinuous groups of subjects. The 40 × 16 matrix was transformed to a matrix of Euclidean distances among the 40 subjects and then analyzed with ALSCAL. A one-dimensional solution had a stress of 0.205 and an R² of 0.898, not terribly impressive. Solutions with two, three, and four dimensions had stresses of 0.132, 0.098, and 0.071, respectively. There is no hard and fast rule for determining an acceptable stress value, but as a general rule analysts like to get stress below 0.10, although this is not always possible or necessary. In this case, a three-dimensional solution meets this criterion. Inspection of the plots for both the two-dimensional and the three-dimensional solutions leads to the same conclusion.
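The stress rule of thumb mentioned above can be written as a tiny selection function. The 0.10 threshold is the heuristic the text describes, not a formal test, and the function name is hypothetical. The stress values are the ones reported in this article for the Trinidad Bhojpuri and Spanish solutions.

```python
def smallest_adequate_dimension(stress, threshold=0.10):
    """Return the fewest dimensions whose stress falls below threshold,
    or None if no solution qualifies (a rule of thumb, not a hard test)."""
    for k in sorted(stress):
        if stress[k] < threshold:
            return k
    return None

# Stress values reported in the text, keyed by number of dimensions.
bhojpuri = {1: 0.205, 2: 0.132, 3: 0.098, 4: 0.071}
spanish = {1: 0.206, 2: 0.068, 3: 0.042}

chosen = smallest_adequate_dimension(bhojpuri)         # 3 for these values
spanish_dims = smallest_adequate_dimension(spanish)    # 2 for these values
print(chosen, spanish_dims)
```

The rule picks three dimensions for the Trinidad Bhojpuri data and two for Lipski's Spanish data, which agrees with the solutions examined in this article.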
Figure 6. Plot of dimensions 1 and 2 for Trinidad Bhojpuri. (Dimension 1 runs from west (-) to east (+); dimension 2 runs from north (+) to south (-).)

Figure 6 uses the first and second dimensions of the three-dimensional solution, and it reveals the gap expected by advocates of theories of discontinuous language change. Six members of the under-35 group are located at the far east end of the configuration. Only two isolates (speakers 37 and 39), also under 35, occupy a large empty region before one encounters a much larger cluster of competent Trinidad Bhojpuri speakers in the center west region of the configuration. Only two speakers under age 35 (numbers 35 and 36) are adjacent to this cluster, so the configuration establishes a large, decisive break in the linguistic competence of the older and younger speakers.
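A crude numerical companion to this visual reading is to sort observations along one dimension and look for the widest gap between neighbors. This is a hypothetical heuristic, not Mohan and Zador's procedure and not part of ALSCAL. The coordinates below are invented. A gap many times larger than the typical spacing is the kind of break that Fig. 6 displays.

```python
import numpy as np

def largest_gap(coords, labels):
    """Sort observations along one dimension and return the widest gap
    between neighbors, its ratio to the median gap, and the labels
    falling on either side of it."""
    coords = np.asarray(coords, dtype=float)
    order = np.argsort(coords)
    gaps = np.diff(coords[order])
    i = int(np.argmax(gaps))
    left = [labels[j] for j in order[: i + 1]]
    right = [labels[j] for j in order[i + 1:]]
    return gaps[i], gaps[i] / np.median(gaps), left, right

# Invented dimension-1 scores for nine speakers (not Mohan and Zador's data).
labels = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9"]
dim1 = [-1.2, -1.0, -0.9, -0.8, -0.5, 0.4, 1.5, 1.6, 1.7]
gap, ratio, west, east = largest_gap(dim1, labels)
print(round(gap, 2), round(ratio, 1), west, east)
```

A one-dimensional scan like this can miss structure that only shows up in two or three dimensions, so it supplements inspection of the plot rather than replacing it.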

The techniques of MDS permit further refinements
in this analysis. For one thing, the solution suggests
either that some of the variables employed by the researchers measure something other than competence, or that competence itself is not unidimensional. Regression analysis can help us interpret the
dimensions. The east-west dimension, not surpris-
ingly, is strongly associated with rapidity of speech,
with the slow speakers in the east, and the fast ones in
the west. This dimension has some claim on the label
of 'general competence,' because most of the 10
features of Trinidad Bhojpuri are associated with the
fast-speaking west end of the configuration, and most
of the five errors are associated with the east end.
However, some of the features are far more strongly
associated (conjunctive markers, compound verbs,
and noun definitizers) than are others (relative
clauses and numerical classifiers). The second dimension is largely defined by the absence of relative clauses, use
of numerical classifiers and noun definitizers, and
omission of copulas.
A fresh analysis could further differentiate the
large cluster of speakers in the center west of Fig. 6,
and there is the small group in the northwest to
consider as well. Interpretation of the groupings—
and definitive interpretation of the dimensions—
would require a scholar with the understanding of
Trinidad Bhojpuri possessed by Mohan and Zador.
The important point is that Fig. 6 identifies some
noticeable differences among fluent speakers of
Trinidad Bhojpuri, as well as charting the distribu-
tion of degrees of competence in this dying language.
In short, MDS offers linguists a powerful tool for
systematically identifying both large and subtle
differences in linguistic competence.

5. Conclusion

Several ways have been shown in which multi-
dimensional scaling of observations can assist lin-
guists in their study of the structure of language.
Whether one is concerned with individuals or
communities, one period of time or several, MDS
can assist in the identification and interpretation of
patterns in data. A few linguists, such as Cavalli-
Sforza and Wang or Mohan and Zador, already use
conventional statistical tools to good effect. The MDS
strategy employed here provides additional confirma-
tion for their theories while demonstrating the
advantages of complementing conventional statistical
tests with results from MDS. Additionally, the technique could be of value to linguists who have never used statistical tools, conventional or otherwise.

See also: Sociology of Language; Scaling; Statistics in


Borg I, Lingoes J 1987 Multidimensional Similarity Structure
Analysis. Springer, New York
Butler C 1985 Statistics in Linguistics. Blackwell, Oxford, UK
Cavalli-Sforza L L, Wang W S-Y 1986 Spatial distance and
lexical replacement. Language 62: 38-55
Gladwin T 1970 East is a Big Bird: Navigation and Logic on
Puluwat Atoll. Harvard University Press, Cambridge, MA
Guttman L 1968 A general nonmetric technique for finding
the smallest coordinate space for a configuration of
points. Psychometrika 33(4): 469-506
Kruskal J B, Wish M 1978 Multidimensional Scaling. Sage,
Beverly Hills, CA
Labov W (ed.) 1980 Locating Language in Time and Space.
Academic Press, New York
Lehrer A 1985 Markedness and antonymy. Journal of
Linguistics 21: 397-429
Lewis D 1972 We, the Navigators: The Ancient Art of
Landfinding in the Pacific. University Press of Hawaii,
Honolulu, HI
Lingoes J C 1973 The Guttman-Lingoes Nonmetric Program
Series. Mathesis Press, Ann Arbor, MI
Lingoes J C, Borg I 1978 A direct approach to individual
differences scaling using increasingly complex transfor-
mations. Psychometrika 43(4): 491-519
Lipski J M 1986 Reduction of Spanish word-final /s/ and
/n/. Canadian Journal of Linguistics 31: 139-56
Mohan P, Zador P 1986 Discontinuity in a life cycle: The
death of Trinidad Bhojpuri. Language 62: 291-319
Osgood C E, May W H, Miron M S 1975 Cross-cultural
Universals of Affective Meaning. University of Illinois
Press, Urbana, IL
Preston D R 1986 Five visions of America. Language in
Society 15: 221-40
Rau W, Leonard W 1990 The evaluation of Ph.D. programs
in sociology: Theoretical, methodological, and empirical
considerations. The American Sociologist 21: 232-56
Rau W, Roncek D 1987 Industrialization and world
inequality: The transformation of the division of labor
in 59 nations, 1960-81. American Sociological Review 52: 359-69

Rummel R J 1967 Understanding factor analysis. The
Journal of Conflict Resolution 11: 444-80
Rummel R J 1970 Applied Factor Analysis. Northwestern
University Press, Evanston, IL
Shye S 1985 Multiple Scaling: The Theory and Application of
Partial Order Scalogram Analysis. North-Holland, New

Stewart A H 1976 Graphic Representation of Models in
Linguistic Theory. Indiana University Press, Blooming-
ton, IN
Terrell T D 1977 Constraints on the aspiration and deletion
of final /s/ in Cuba and Puerto Rico. Bilingual Review
4: 35-51
Young F W 1987 Multidimensional Scaling: History,
Theory, and Applications. Erlbaum, Hillsdale, NJ
Young F W, Lewyckyj R 1979 ALSCAL User's Guide. L. L.
Thurstone Psychometric Lab, Chapel Hill, NC



Observer's Paradox

A. Davies

The term 'observer's paradox' refers to the well-
known methodological problem in linguistic re-
search: how to collect samples of authentic casual
speech. Labov (1972, see Labov, William) presents it as follows:

the aim of linguistic research in the community must be
to find out how people talk when they are not being
systematically observed; yet we can only obtain these
data by systematic observation. The problem is of course
not insoluble: we must either find ways of supplementing
the formal interviews with other data, or change the
structure of the interview situation by one means or

(1972: 209)

The term 'observer's paradox' is appropriate because
it suggests that the data of speech which the observer
wishes to elicit are (or may be—there is no way of
knowing) contaminated by the presence of the
observer. The methodological problem, therefore, is
how to observe without observing. This is of
particular relevance to sociolinguistic research, given
the importance accorded by Labov to the most casual
speech (which he terms 'the vernacular') in determin-
ing the direction of language change.
As Preston points out, 'the more aware respondents
are that speech is being observed, the less natural their
performances will be. The underlying assumption is
that self-monitored speech is less casual and that less
casual speech is less systematic, and thus less revealing
of the ... vernacular' (Preston 1989: 7). Labov's
concern with the methodological imperatives of his
search places him in the anthropological tradition of
participant observation. In that tradition, the anthro-
pologist attempts to share the daily routines of the
society s/he is studying as a means of gaining a better
understanding and heightening rapport with the
people s/he is studying. In private, normally inacces-
sible, settings, there are well-known techniques, using
'poses' or 'disguises,' for gaining access which might
otherwise be denied. Such techniques include posing
as a patient in a mental hospital, a prison inmate, or a
newspaper reporter.
Labov offers a number of techniques for gaining
access to these private speech situations which are
normally inaccessible: 'various devices which divert
attention away from speech and allow the vernacular
to emerge' (1972: 209). The subject, says Labov, may
be involved in questions and topics 'which recreate
strong emotions he has felt in the past or involve him
in other contexts.' The best-known of such questions
in the Labov methodology is the 'Danger of Death'
question: 'Have you ever been in a situation where you were in danger of being killed?' 'Narratives given
in answer to this question almost always show a shift
of style away from careful speech towards the
vernacular' (Labov 1972: 209-10).
Labov seems uninterested in the ethical questions
posed by such techniques, questions which exercise
many social science researchers. In addition to the
doubtful ethicality of poses and disguises, there is the
question of their effect on research validity. Whether
or not such techniques as a pose or a disguise, or a
misinforming of the subject by pretending to ask
about some incident while really being interested in
his/her speech, or again a surreptitious recording
machine, are ethical, the fundamental philosophical
problem of perception remains. This is an enduring
problem of philosophy, associated with Bishop
Berkeley's 'esse est percipi' (to be is to be perceived)
and with the tradition of phenomenalism, according
to which the reality of an external physical object is
based on its being perceived by someone.
To such a powerful view there is no logical answer,
as Boswell drily comments in his account of Dr Johnson's famous commonsense 'refutation' of Berkeley: 'I shall never forget,' writes Boswell, 'the
alacrity with which Johnson answered, striking his
foot with mighty force against a large stone, till he
rebounded from it, "I refute it thus"' (1922: 162).
Labov is indeed correct in referring to the problem
as a paradox. In sociolinguistic research, one can
never be sure that the observer, however disguised,
however familiar, has no influence, and that 'good'
vernacular data have been elicited. After all, the only
valid vernacular data that can be observed are the
observer's own, and phenomenalism's flip side of
solipsism ensures that we can never be sure that other
people are like us when we are not observing them.
Labov's confidence that his techniques do indeed
bring about a shift of style towards the vernacular
may therefore seem to be too certain. No methodol-
ogy can completely remove the ethical and the
philosophical constraints, although one may choose,
like Dr Johnson, to suspend disbelief and assume
that it does.

See also: Fieldwork Ethics and Community Respon-
sibility; Sociolinguistic Variation; Vernacular.


Boswell J 1922 Boswell's Life of Johnson. Macmillan,
Labov W 1972 Sociolinguistic Patterns. University of
Pennsylvania Press, Philadelphia, PA
Preston D R 1989 Sociolinguistics and Second Language
Acquisition. Basil Blackwell, Oxford, UK


Observing and Analyzing Classroom Talk
