Sutrop 2001
URMAS SUTROP
Institute of the Estonian Language
The list task and its two parameters (term frequency and its mean position in the lists)
are discussed here. A new simple cognitive salience index, S = F/(N mP), which
combines the two list task parameters, is presented together with the procedure for its
calculation. The cognitive salience index is normed to vary between 1 and 0. The basic
terms in every domain are the most salient. The ideal most salient term has a
salience index of 1, and a term not mentioned at all has the value 0.
The cognitive salience index gives comparable results between different investiga-
tions, as it does not depend on the length of the individual lists. The cognitive salience
index is compared with some earlier salience indices.
In this article, I introduce a new cognitive salience index that combines two
list task parameters. Then, I discuss the earlier free-list salience indices pro-
posed by Smith (1993; Smith et al. 1995; Smith and Borgatti 1997) and an
earlier cognitive salience index (Sutrop 1998, 2000), and compare them with
my new cognitive salience index. I briefly discuss the list task and its two
parameters (term frequency and its mean position in the lists). Finally, I give
a short procedure to calculate the cognitive salience index and draw some
conclusions.
LIST TASK
Under the term list task (see note 1), I place written or oral interviews in anthropology,
linguistics, psychology, or other social sciences. The format of the list task is,
“Please list all Xs that you know.” If the interviews are conducted in the writ-
ten form, then the question is, “Please write down all Xs that you know.” You
can use this form only if the subjects are literate. In the case of oral interviews
(both with literate and illiterate subjects), you should ask, “Please name all Xs
that you know.”
This article was read at the International Conference on Cognitive Typology, Antwerp, Belgium,
12–14 April 2000. I thank professors Frans Plank, University of Konstanz, Germany; J. Jerome
Smith, University of South Florida; and Jüri Allik, University of Tartu, Estonia for their help. I
also thank three anonymous reviewers for their useful remarks.
Field Methods, Vol. 13, No. 3, August 2001 263–276
© 2001 Sage Publications
For instance, you can ask, “Please name all colors that you know,” “Please
write down all animals that you know,” or “Please name everything that you
can sense with your nose.” Such descriptive phrases as in the last example (X
= everything that you can sense with your nose) are needed when the domain
is small or the cover term denotes a cultural or cognitive domain and is, at the
same time, a member of that domain.
Let us now compare the color and smell terms. We can see that the word
color is (in many but not all languages) a cover term that denotes a concrete
domain but is not a member of that domain. In many languages, the term
smell denotes a concrete domain and is, at the same time, its member. It is not
obligatory for well-established domains to have a name. For example, it is
widely known that many Turkic languages have an extremely sophisticated
nomenclature for horses (depending on their sex, age, color, etc.), but the
generic term horse is absent in these languages.
If we restrict the interviews (both oral and written) temporally, for
instance, to two or five minutes, then we are using a restricted list task. If
there is no temporal limit (i.e., the interviewer asks, “Please name all Xs that
you know,” and after the informant stops listing Xs the interviewer asks, “Do
you know more Xs?”), we describe it as a free-list task. In most cases, this dis-
tinction between restricted and free-list tasks is irrelevant. So, we can simply
use the name list task. The list task techniques are described by Weller and
Romney (1988), Weller (1998), and Borgatti (1999).
The next important consideration concerning the list task is the purpose of
the study. If the only intention, for example, of an anthropologist, linguist, or
psychologist is to define certain cultural or cognitive domains, then we need
only one list task parameter—term frequency—to establish a domain. Here,
we need a precondition that a sufficient number of interviews are held. It is
generally accepted that twenty–thirty subjects are needed as a minimum
(Weller and Romney 1988:14; Borgatti 1999:122–30). For example, Davies
and Corbett (1994) held seventy-seven interviews to establish the basic color
terms in Russian. I conducted eighty interviews when investigating color,
smell, taste, and temperature terms in Estonian (Sutrop 1998, 2000,
forthcoming).
With a sufficient number of subjects, we can use the term frequency
parameter quite safely. The terms that are listed only by a single informant or
by very few subjects must be considered as accidental/occasional terms.
Borgatti (1999:125) gives three rules to determine a domain boundary: (1)
Include all items mentioned by more than one respondent, (2) look for a natural
break (in the distribution of term frequency) or grouping, and (3) define a
boundary arbitrarily. Actually, there are only two rules, because the first is a
special case of the third: One decides arbitrarily that all items mentioned by
more than one respondent should be included.
A rule of thumb here is: If the number of subjects is small (twenty), delete
the terms that are mentioned only by one informant, but if the number of sub-
jects is greater (e.g., fifty–eighty), delete the terms mentioned by only three
or fewer subjects.
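The rule of thumb above can be sketched in Python. This is only a minimal illustration: the function names, the `min_mentions` parameter, and the toy lists are assumptions introduced here, not part of the original study. Frequency is counted as the number of lists in which a term occurs.

```python
from collections import Counter

def term_frequencies(lists):
    """Count, for each term, the number of lists that mention it (F)."""
    counts = Counter()
    for lst in lists:
        # set() guards against a subject repeating a term within one list
        counts.update(set(lst))
    return counts

def filter_by_frequency(lists, min_mentions):
    """Keep only the terms mentioned in at least `min_mentions` lists."""
    freqs = term_frequencies(lists)
    return {term: f for term, f in freqs.items() if f >= min_mentions}

# Hypothetical toy data: three subjects listing color terms
toy_lists = [
    ["red", "blue", "green"],
    ["blue", "red", "beige"],
    ["red", "green", "blue"],
]

# Small sample: drop the terms mentioned by only one informant
kept = filter_by_frequency(toy_lists, min_mentions=2)
print(sorted(kept.items()))
```

For a larger sample (fifty to eighty subjects), the same sketch would be called with `min_mentions=4`, which deletes the terms mentioned by three or fewer subjects.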
The rationale for deleting the low-frequency terms from a domain is the
following: Only terms that are in active use (in that language) are listed with a
high frequency. The low-frequency terms are either in passive use or are used
only in some idiolects. Of course, a domain defined by deleting low-fre-
quency terms (arbitrarily or according to a natural break or grouping) con-
tains many passive terms. For that reason, the second list task parameter, the
mean position of a term, is also important for defining the terms in active use
in a domain.
The mean position of a term is calculated only from individual lists con-
taining that term. It is generally accepted that there is a good correlation
between the frequency and mean rank of a term (i.e., the most frequent terms
are named first and the terms that are named only by a few subjects are named
last) (Bousfield and Barclay 1950). For that reason, the mean rank can be
ignored in a list task if you only want to define the boundaries of a domain.
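A minimal sketch of the mean position calculation follows, assuming that each list is stored in the order of naming and that ranks start at 1. The function name and the toy temperature lists are hypothetical. Lists that do not contain the term are skipped, as described above.

```python
def mean_position(lists, term):
    """Mean 1-based rank of `term`, taken only over the lists that contain it."""
    ranks = [lst.index(term) + 1 for lst in lists if term in lst]
    if not ranks:
        # A term that is never named has no finite mean position
        return float("inf")
    return sum(ranks) / len(ranks)

# Hypothetical toy data: three subjects listing temperature terms
toy_lists = [
    ["hot", "cold", "warm"],
    ["cold", "hot"],
    ["hot", "cool", "cold"],
]

print(mean_position(toy_lists, "hot"))   # ranks 1, 2, 1
print(mean_position(toy_lists, "cool"))  # named once, in second position
```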
Actual data show that the correlation between the frequency and mean
rank of a term is not ideal. For example, in my study of the temperature terms
used in the Estonian language, the terms kõrge (high) and madal (low) were
both named at a low frequency by nine subjects out of eighty, but the mean
position of the term high was one, while it was four for the term low. In both
cases, the high mean position shows that if these terms were named at all,
they were named among the first. This anomaly could be explained by the
reflection of school physics on everyday language—all temperatures are
divided into high and low degrees (Sutrop 1998:93).
For a linguistic anthropologist, it is not enough to define only the borders
of a domain. He or she needs more detailed knowledge about the internal
structure of that domain. He or she may ask which are the basic and nonbasic
terms in that domain (see note 2). Other possible applications of the list task data
(co-occurrences of the terms, multidimensional scaling, dichotomous category
bias, clustering, etc.) are described by Borgatti (1999:126–31) and Robbins
and Nolan (1997, 2000).
If you want to establish (cognitively) the basic terms in a domain (in most
cases these are morphologically simple, short, and native words), you need
some objective criteria for ordering the terms and discriminating the salient
group of basic terms from the nonbasic ones. The frequency measure or the
mean position of a term alone is not sufficient. Both measures give different
sets of candidates for the basic status. For that reason, it seems, the frequency
and mean position of a term could be combined into a cognitive salience
index.
The tendency of a term to occur at the beginning of elicited lists corresponds to the
mean position of a term, and the occurrence of a term in all idiolects of all
subjects corresponds to the frequency of that term.
A cognitive salience index shows the psychological salience in the list
task combining the frequency and mean position of a term into one parame-
ter. Salience (S) is a fraction:
S = F/(N mP).
The dividend considers the frequency (F) with which a term is named in
the list task. The divisor N mP considers the weight of the mean position (mP)
in which the term is named, and N is the number of subjects. If all subjects
have named a term (F = N) and the mean position of that term is 1, then the
salience (S) is also 1 for that term.
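The fraction can be written as a small Python function. This is a sketch only: the function name and the explicit handling of an unmentioned term (F = 0) are choices made here for illustration.

```python
def cognitive_salience(F, N, mP):
    """S = F / (N * mP).

    F  -- number of lists in which the term is named
    N  -- number of subjects (lists)
    mP -- mean position of the term in the lists that contain it
    """
    if F == 0:
        # An unmentioned term has an infinite mean position, so S is 0
        return 0.0
    return F / (N * mP)

ideal = cognitive_salience(F=80, N=80, mP=1)             # named by all, always first
absent = cognitive_salience(F=0, N=80, mP=float("inf"))  # never named
print(ideal, absent)
```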
The cognitive salience index is normed to vary between 1 and 0. The basic
terms in every domain are the most salient. The ideally most salient term has a
salience index of 1. Terms that tend to be named last and with
a low frequency have a value declining toward 0. The term that is not men-
tioned at all has the salience 0. The cognitive salience index gives compara-
ble results between different investigations, as it does not depend on the
length of the individual lists.
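As a worked illustration, the index can be applied to the two Estonian temperature terms discussed earlier, kõrge (high) and madal (low), each named by nine of eighty subjects, with mean positions of 1 and 4. The values below are computed here from those rounded figures and are not reported in the original study, so they should be read as approximate:

```latex
\begin{align*}
S_{\text{kõrge}} &= \frac{F}{N \cdot mP} = \frac{9}{80 \cdot 1} = 0.1125 \\
S_{\text{madal}} &= \frac{F}{N \cdot mP} = \frac{9}{80 \cdot 4} = \frac{9}{320} \approx 0.028
\end{align*}
```

Both terms have the same frequency, but the term that tends to be named earlier receives the higher salience.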
Frequency, mean position, and the integral cognitive salience index are all
good criteria for discriminating basic terms from nonbasic ones. Sometimes
the discrimination must also be made between more and less basic terms. In
such cases, certain linguistic criteria can well be applied.
DISCUSSION
If the total number of lists is N, count is the length of a list Li, and sequence
is the rank of a term Rj in that list Li, then we get the following formula for the
free-list salience index of an item Sj:
Sj = (Σ(((Li – Rj)/Li)100))/N.
Smith gives the first term in a list sequence 0, the second term 1, and so
forth. An ideal term that is named by all subjects and always first has a value
Sj = 100%, because
S1 = (Σ(((Li – 0)/Li)100))/N = 100%.
The sequence of the last term in a list has the rank Rj = count – 1 = Li – 1. It
follows that the term that is always named last (j = L) has a value declining toward 0:
S_{j=L} = (Σ(((Li – (Li – 1))/Li)100))/N = (Σ((1/Li)100))/N > 0%.
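The two special cases above (a term always named first and a term always named last) can be checked with a short Python sketch of the percentage index. The function name and the toy lists are hypothetical. Following the 0-based convention described above, the first term in a list has rank 0. The sketch assumes that a list which does not contain the term contributes 0 to the sum.

```python
def percent_salience(lists, term):
    """Mean over all N lists of 100 * (L_i - R_j) / L_i, with 0-based rank R_j."""
    total = 0.0
    for lst in lists:
        if term in lst:
            length = len(lst)
            rank = lst.index(term)  # the first term has rank 0
            total += 100.0 * (length - rank) / length
        # assumption: a list without the term adds nothing to the sum
    return total / len(lists)

# Hypothetical toy data
toy_lists = [["a", "b", "c", "d"], ["a", "c"]]

print(percent_salience(toy_lists, "a"))  # always named first
print(percent_salience(toy_lists, "c"))  # third of four, then last of two
print(percent_salience(toy_lists, "d"))  # named once, last
```

In this toy data, the term "d" is named last in the only list that contains it and still receives a value above 0, in line with the inequality above.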
The free-list salience index ranges from 0 to 1, with higher figures denoting
higher salience. For a given term in a given list in which it occurs (ranged
according to its order of appearance in that list), an index score is calculated as
follows:
term index score = (list length – term rank) × (1/(list length – 1))
The term’s index score for each list in which it occurs is thus calculated. A
term’s mean score across all lists (even those in which the term does not
appear) is its free-list salience index. (P. 206)
If the total number of lists is N, the length of a list is Li, and the rank of a
term in a list Li is Rj, then we get the following formula for the free-list
salience index of an item Sj instead of the verbal formulas given by Smith
(1993):
As a term’s mean score across all lists (even those in which the term does
not appear) is its free-list salience index, the index Sj for a term is
Sj = (Σ((Li – Rj)/(Li – 1)))/N.
An ideal term that is named by all subjects and always first has the value
S1 = 1, for
S1 = Σ((Li – 1)/(Li – 1))/N = 1.
The term that is always named last (j = L) has the value S_{j=L} = 0, for
S_{j=L} = Σ((Li – Li)/(Li – 1))/N = 0.
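A minimal Python sketch of this formula, under the assumptions that ranks are 1-based and that every list has at least two items (a one-item list would make the denominator Li – 1 equal to 0). The function name and the toy lists are hypothetical.

```python
def free_list_salience(lists, term):
    """Mean over all N lists of (L_i - R_j) / (L_i - 1), with 1-based rank R_j.

    Lists that do not contain the term contribute a score of 0.
    Assumes every list has at least two items.
    """
    total = 0.0
    for lst in lists:
        if term in lst:
            length = len(lst)
            rank = lst.index(term) + 1
            total += (length - rank) / (length - 1)
    return total / len(lists)

# Hypothetical toy data
toy_lists = [["a", "b", "c"], ["a", "c"]]

print(free_list_salience(toy_lists, "a"))  # always first
print(free_list_salience(toy_lists, "b"))  # second of three, absent once
print(free_list_salience(toy_lists, "c"))  # always last
```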
As L is the length of a sample, we may give it the index i (Li) and the for-
mula changes into
S = ((Σ(Li – Rj + 1))/Li)/N.
Now we can see that the corrected formula also contains an error. First, we
must add up the individual scores and after that divide the sum by an indexed
length of a list (Li). Such a division makes no sense. For that reason, the
parentheses in the formula must be changed to make the formula work:
S = (Σ((Li – Rj + 1)/Li))/N.
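The reparenthesized formula can be sketched in Python as follows, assuming 1-based ranks (so that a term named first scores 1 in its list) and a score of 0 for lists that do not contain the term. The function name and the toy lists are hypothetical.

```python
def corrected_free_list_salience(lists, term):
    """Mean over all N lists of (L_i - R_j + 1) / L_i, with 1-based rank R_j."""
    total = 0.0
    for lst in lists:
        if term in lst:
            length = len(lst)
            rank = lst.index(term) + 1
            # Each score is divided by the length of its own list
            # before the scores are summed
            total += (length - rank + 1) / length
    return total / len(lists)

# Hypothetical toy data
toy_lists = [["a", "b", "c"], ["a", "c"]]

print(corrected_free_list_salience(toy_lists, "a"))  # always first
print(corrected_free_list_salience(toy_lists, "c"))  # always last
```

Dividing inside the sum is what makes the indexed length Li meaningful: each list contributes a score scaled by its own length.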
The earlier cognitive salience index (Sutrop 1998, 2000) is the product of two
factors: S = (F/N) × ((L – mP)/(L – 1)).
The first factor, F/N, considers the frequency (F) with which a term is
named in the list task, where N is the number of subjects. If all subjects have
named a term, then this factor is 1 for that term. The second factor, (L – mP)/
(L – 1), considers the weight of the mean position (mP) in which the term is
named; L is a parameter that takes into account the length of the lists (L is the
mean length of the individual lists). If the mean position is 1, this factor is
also 1.
The ideal basic term that is psychologically most salient has the value 1 for
both factors, so the product S would also be 1. If the mean position (mP) mea-
sure for some term is equal to the mean length of the list (L) (i.e., mP = L),
then the value for our salience index S = 0; and if the mean position of a term
is greater than the mean length of the individual lists, then the salience index
has a negative value (S < 0).
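The three cases can be illustrated with hypothetical figures, taking the product of the two factors, F/N and (L – mP)/(L – 1), with F/N = 1/2 and a mean list length of L = 5. These numbers are chosen only for illustration and do not come from the Estonian data:

```latex
\begin{align*}
mP = 1:\quad & S = \tfrac{1}{2} \cdot \frac{5 - 1}{5 - 1} = 0.5 \\
mP = 5 = L:\quad & S = \tfrac{1}{2} \cdot \frac{5 - 5}{5 - 1} = 0 \\
mP = 7 > L:\quad & S = \tfrac{1}{2} \cdot \frac{5 - 7}{5 - 1} = -0.25
\end{align*}
```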
This cognitive salience index gives the cognitively salient terms positive
figures and the nonsalient terms negative figures. The turning point between
them is a term whose mean position equals the mean length of the
lists. The cognitive salience index structures the cultural or cognitive domain
according to the salience of the terms in that domain.
I abandon this index here for two reasons. First, the index works only if we
exclude from our sample all the terms mentioned only by a few subjects.
Without doing so, the terms that are named only by one subject at the end of
some lists (mP > L) would give better results than the terms named by two or
three subjects. Second, and this applies to the indices proposed by Smith and
his colleagues as well, the results depend on the length of the lists (i.e., short
and long lists give different results). This causes difficulties in interpreting
the results of a concrete investigation or in comparing the results of different
studies done by different researchers.
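The dependence on list length can be illustrated with a small Python sketch. It compares a per-list index of the form (Li – Rj)/(Li – 1) with the index S = F/(N mP) for a term that is always named second, first in short lists and then in long lists. The function names, the helper `make_lists`, and the filler items are hypothetical and serve only this illustration.

```python
def free_list_salience(lists, term):
    """Mean over all lists of (L_i - R_j) / (L_i - 1), with 1-based rank R_j."""
    total = 0.0
    for lst in lists:
        if term in lst:
            length = len(lst)
            rank = lst.index(term) + 1
            total += (length - rank) / (length - 1)
    return total / len(lists)

def cognitive_salience(lists, term):
    """S = F / (N * mP), computed from raw lists."""
    ranks = [lst.index(term) + 1 for lst in lists if term in lst]
    if not ranks:
        return 0.0
    mean_pos = sum(ranks) / len(ranks)
    return len(ranks) / (len(lists) * mean_pos)

def make_lists(length, n_lists=3):
    """Hypothetical lists in which the target term "x" is always named second."""
    return [["first", "x"] + [f"filler{k}" for k in range(length - 2)]
            for _ in range(n_lists)]

short_lists = make_lists(5)
long_lists = make_lists(20)

# The list-based index changes with list length ...
print(free_list_salience(short_lists, "x"), free_list_salience(long_lists, "x"))
# ... whereas F / (N * mP) stays the same
print(cognitive_salience(short_lists, "x"), cognitive_salience(long_lists, "x"))
```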
The cognitive salience index introduced here is different from the free-list
salience index. The free-list salience index is calculated on the basis of indi-
vidual lists, whereas the cognitive salience index for a term is calculated from
its frequency and mean position according to the formula:
S = F/(N mP).
TABLE 1
The List Task Results of the Smell Terms of Estonian
On the other hand, it gives a higher rank of salience to the term nohu (nasal catarrh),
which was named only once as the fourth term (mP = 4), than it gives to the
term vine (stink), which was named twice (mP = 4).
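The nohu/vine comparison can be checked numerically. The sketch below makes three assumptions: that the index in question is the abandoned one, computed as the product of the factors F/N and (L – mP)/(L – 1); that N = 80, as in the Estonian interviews; and that the mean list length L is shorter than 4 (the value L = 3 is hypothetical). Only the frequencies and mean positions come from the text.

```python
def old_index(F, N, mP, L):
    """Abandoned index: (F / N) * ((L - mP) / (L - 1))."""
    return (F / N) * ((L - mP) / (L - 1))

def new_index(F, N, mP):
    """New index: F / (N * mP)."""
    return F / (N * mP)

N = 80   # assumed number of subjects
L = 3.0  # hypothetical mean list length, shorter than mP = 4

nohu = {"F": 1, "mP": 4}  # named once, as the fourth term
vine = {"F": 2, "mP": 4}  # named twice, mean position 4

old_nohu = old_index(nohu["F"], N, nohu["mP"], L)
old_vine = old_index(vine["F"], N, vine["mP"], L)
new_nohu = new_index(nohu["F"], N, nohu["mP"])
new_vine = new_index(vine["F"], N, vine["mP"])

print(old_nohu > old_vine)  # the old index ranks nohu above vine
print(new_vine > new_nohu)  # the new index ranks vine above nohu
```

Under these assumptions both old values are negative, and the more frequent term receives the more negative one, which reverses the expected order.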
As we have seen, we should prefer the new simple cognitive salience
index, which is based on two cognitively important parameters (the term frequency
and its mean position) and is free from the side effects of list length that
affect the earlier cognitive index and the free-list
salience index. The new cognitive salience index also works with small samples
or a small number of subjects. In contrast, the old formula, which depends on
the length of the individual lists does not. A sufficient number of subjects is
needed for calculating the old cognitive salience index from the list task data.
This abandoned index gives good results only when we eliminate the
low-frequency terms (i.e., those named only by a few subjects).
The domain of colors is very large in Estonian. In the list task, eighty sub-
jects named 285 different color terms (most were compounds and modified
terms; actually, 638 different color names were named in the list and
color-naming tasks). The mean length of an individual list was 18.94 terms.
In the list task, 71 color terms were named by four or more subjects. Table 2
shows the most salient color terms (the basic terms are printed in italics). The
cognitive salience index makes a clear cut between the basic terms and the
highest nonbasic terms. The basic term gray has the salience rank 11 and the
highest nonbasic term beige has the salience rank 12.
TABLE 2
The List Task Results of the Color Terms of Estonian
Term  Gloss  F  Rank (F)  mP  Rank (mP)  S  Rank (S)
PROCEDURE
1. One should register the terms exactly in the order the terms were named in the
list task.
2. One should calculate the mean position of a term (mP) from the list task
results:
F is the frequency of a term (i.e., F is the number of the lists where a term is
listed),
N is the total number of the lists (subjects), and
Rj is the rank of a term in an individual list.
The frequency of a term that does not appear at all in the lists is 0 (Fø = 0) and
the mean position of that term is infinite (mPø = ∞).
2.1. The mean position of a term can be calculated as follows:
mP = (Σ Rj)/F.
3. Finally, one can calculate the cognitive salience index for an item (S):
S = F/(N mP).
3.1. For practical reasons, one can calculate the cognitive salience index for
an item (S) also from the formula:
S = F²/(N Σ Rj).
I suggest conducting at least twenty–thirty interviews to calculate the cogni-
tive salience index from the list task data.
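The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name and the toy lists are hypothetical, ranks are taken as 1-based list positions, and only the first mention of a term within one list is counted.

```python
def salience_table(lists):
    """Return {term: (F, mP, S)} computed from lists kept in naming order."""
    N = len(lists)
    rank_sums = {}  # term -> sum of its ranks over all lists (Σ Rj)
    freqs = {}      # term -> number of lists that contain it (F)
    for lst in lists:
        seen = set()
        for rank, term in enumerate(lst, start=1):
            if term in seen:
                continue  # assumption: ignore repeats within one list
            seen.add(term)
            rank_sums[term] = rank_sums.get(term, 0) + rank
            freqs[term] = freqs.get(term, 0) + 1

    table = {}
    for term, F in freqs.items():
        mP = rank_sums[term] / F                # step 2.1: mP = (Σ Rj)/F
        S = F / (N * mP)                        # step 3:   S = F/(N mP)
        S_alt = F ** 2 / (N * rank_sums[term])  # step 3.1: S = F²/(N Σ Rj)
        assert abs(S - S_alt) < 1e-12           # both forms agree
        table[term] = (F, mP, S)
    return table

# Hypothetical toy data: three subjects listing color terms
toy_lists = [
    ["red", "blue", "green"],
    ["red", "green"],
    ["blue", "red", "pink"],
]

table = salience_table(toy_lists)
for term, (F, mP, S) in sorted(table.items(), key=lambda kv: -kv[1][2]):
    print(f"{term:6s} F={F} mP={mP:.2f} S={S:.3f}")
```

The formula in step 3.1 avoids computing mP separately, because substituting mP = (Σ Rj)/F into S = F/(N mP) gives S = F²/(N Σ Rj).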
CONCLUSIONS
List tasks can be divided into oral or written tasks and restricted (with time
limits) or unrestricted tasks. One can combine two list task parameters (the
frequency of a term and its mean position in the lists where it was named) into
a salience index. A cognitive salience index and a procedure for its calculation
are proposed:
S = F/(N mP).
The cognitive salience index takes into account two cognitively important
parameters: the term frequency and its mean position. The mean position of a
term corresponds to its tendency to occur at the beginning of elicited lists of
terms, while the frequency of that term corresponds to the occurrence of the
term in the idiolects of all subjects.
The third parameter is the number of subjects in the study. For anthropolo-
gists, linguists, and other social scientists who do not feel comfortable with
mathematical formulas, such presentation is psychologically more accept-
able because they can combine two real measures (frequency and mean posi-
tion) into an integral parameter.
The cognitive salience index is free from the side effects caused by the
length of individual lists. This makes the results of every investigation com-
parable with every other. The cognitive salience index structures the given
cultural or cognitive domain. The basic terms in every domain are the most
salient. The most salient term, the one that is always named first by all sub-
jects, has the figure 1, while the terms in active use produce greater figures
and the passive terms in the domain produce smaller figures. The less salient
terms have a value declining toward 0. The terms that do not appear at all get
the value 0 in the list task.
NOTES
1. I use the term list task (Davies and Corbett 1994, 1995; Sutrop 1998, 2000, forthcoming)
synonymously with terms such as free list (Smith 1993), free-list (Smith et al. 1995), free-recall
listing (Weller 1998), free listing (Weller and Romney 1988; Trotter and Schensul 1998), free
listing task (Robbins and Nolan 1997), free-listing (Robbins and Nolan 2000), freelist (Borgatti
1999), or Lenneberg’s Approach A in a study of the color terms (Lenneberg 1967:339–40, 345),
associative responses (Bousfield and Sedgewick 1944), and so forth.
2. I define basic term in the tradition in which Berlin and Kay (1969:5–7) defined basic color
terms:
REFERENCES
Berlin, B., and P. Kay. 1969. Basic color terms: Their universality and evolution. Berkeley: Uni-
versity of California Press. Reprinted in 1991 with an additional bibliography by L. Maffi.
Borgatti, S. P. 1999. Elicitation techniques for cultural domain analysis. In Enhanced
ethnographic methods: Audiovisual techniques, focused group interviews, and elicitation
techniques, edited by J. J. Schensul, M. D. LeCompte, B. K. Nastasi, and S. P. Borgatti,
115–51. Ethnographer’s toolkit, Vol. 3. Walnut Creek, CA: AltaMira.
Bousfield, W. A., and W. D. Barclay. 1950. The relationship between order and frequency of
occurrence of restricted associative responses. Journal of Experimental Psychology 40 (5):
643–47.
Bousfield, W. A., and C.H.W. Sedgewick. 1944. An analysis of sequences of restricted associa-
tive responses. Journal of General Psychology 30:149–65.
Davies, I., and G. Corbett. 1994. The basic color terms of Russian. Linguistics 32:65–89.
———. 1995. A practical field method for identifying basic colour terms. Languages of the
World 9 (1): 25–36.
Lenneberg, E. H. 1967. Biological foundations of language. New York: John Wiley.
Robbins, M. C., and J. M. Nolan. 1997. A measure of dichotomous category bias in free listing
tasks. Cultural Anthropology Methods 9 (3): 8–12.
———. 2000. A measure of semantic category clustering in free-listing tasks. Field Methods 12
(1): 18–28.
Smith, J. J. 1993. Using ANTHROPAC 3.5 and a spreadsheet to compute a free-list salience
index. Cultural Anthropology Methods 5 (3): 1–3.
Smith, J. J., and S. P. Borgatti. 1997. Salience counts—and so does accuracy: Correcting and
updating a measure for free-list-item salience. Journal of Linguistic Anthropology 7 (2):
208–9.
Smith, J. J., L. Furbee, K. Maynard, S. Quick, and L. Ross. 1995. Salience counts: A domain
analysis of English color terms. Journal of Linguistic Anthropology 5 (2): 203–16.
Sutrop, U. 1998. Basic temperature terms and subjective temperature scale. Lexicology 4 (1):
61–104.
———. 2000. The basic colour terms of Estonian. Trames 4 (1): 143–68.
———. Forthcoming. The vocabulary of sense perception in Estonian: Structure and history.
Trotter, R. T. II, and J. J. Schensul. 1998. Methods in applied anthropology. In Handbook of
methods in cultural anthropology, edited by H. R. Bernard, 691–735. Walnut Creek, CA:
AltaMira.
Weller, S. C. 1998. Structured interviewing and questionnaire construction. In Handbook of
methods in cultural anthropology, edited by H. R. Bernard, 365–409. Walnut Creek, CA:
AltaMira.
Weller, S. C., and A. K. Romney. 1988. Systematic data collection. Vol. 10, Qualitative
Research Methods. Newbury Park, CA: Sage.
URMAS SUTROP, director of the Institute of the Estonian Language, Tallinn, Estonia,
has a diploma in biology (University of Tartu, Estonia) and a Ph.D. in general linguistics
(1998) from the University of Konstanz, Germany. His research interests include
ethnolinguistics, lexicology, historical linguistics, theonyms, and Finno-Ugric lan-
guages. Recent publications include “Temperature Terms in the Baltic Area” (1999) in
Estonian: Typological Studies III, Tartu; “Basic Terms and Basic Vocabulary” (2000)
in Estonian: Typological Studies IV, Tartu; “From the ‘Language Family Tree’ to the
‘Tangled Web of Languages’ ” (2000) in C9IFU, Tartu; and “Umwelt—Word and Con-
cept: Two Hundred Years of Semantic Change” (2001) in a special issue of Semiotica
about Jakob von Uexküll.