Spanish intonation: Design and implementation of a machine-readable corpus
Miriam Cid Uribe and Peter Roach
Journal of the International Phonetic Association / Volume 20 / Issue 02 / December 1990, pp 1 - 8
DOI: 10.1017/S0025100300004163, Published online: 06 February 2009

Department of Linguistics and Phonetics, University of Leeds, Leeds LS2 9JT, U.K.

1. Introduction
The intonation of Spanish does not appear to have received the same degree of attention
as the Spanish segmental system. Furthermore, much of the information available and the
analyses proposed until fairly recently seem to have been largely arrived at on the basis of
subjective, impressionistic considerations. It is only in the last few years that there seems
to have been a shift of emphasis in the study of Spanish prosody and more research of an
experimental and instrumental nature is being carried out.
Spanish intonation has traditionally been described in a global rather than in an
atomistic way, in terms of intonation contours where the end of the contour is the factor
that bears the linguistic significance. In a manner similar to the structuralist treatment of
American English, Navarro Tomas (1974) and others after him (e.g. Canellada y Madsen
1987) claim that Spanish intonation is to be described in terms of three different levels at
the end of an intonation contour which are manifested in five inflexions: cadencia, a low
fall (terminacion grave), which expresses absolute finality; anticadencia, which constitutes
the end of a subordinate clause and is a high rise; semicadencia, which is a fall but less
complete than the cadencia and expresses non-finality, series of elements or uncertainty;
semianticadencia, which is a rise but less high than that of the anticadencia reflecting
oppositions and contrasts of a secondary kind; and 'level' which ends at the same level as
the body of the group, reflecting the interruption of an idea.
Thus, Navarro Tomas distinguishes five different significant 'tonemes' (pitch shapes
plus a final musical pitch value) that should account for the entire Spanish intonation
system; he claims that this can be further reduced to just two basic patterns, A and B:

These basic patterns have as their main units the phonic group ("a ... portion of
discourse between two pauses" according to Quilis and Fernandez, 1985 and Navarro
Tomas, 1974) whose individual tonal behavior provides the levels that are characteristic of

* Now at Facultad de Educación, Universidad del Bío-Bío, Campus Chillán, Chillán, Chile.



first two are really necessary as the level tone is an allotone of the fall. It needs to be
emphasised that none of the accounts referred to so far makes any provisions for
considering details of internal variations of pitch inside the 'phonic groups'.
Navarro Tomas regards the melodic unit as "the shortest portion of speech with a sense
of its own and with a definite musical form" (1968: 61); this can be measured in terms of
the number of syllables it contains, and coincides with the phonic group. He claims that
the grouping of words into melodic units is done according to the particular rhythmic nature
of the language and that there is an evident difference between the length of the melodic
units in ordinary conversation (which favors short units) and text reading (which shows
longer units). The results discussed below which were obtained from a corpus of spoken
Spanish support this claim and suggest that stylistic variation is a determinant in the length
of melodic units.
In the conventional treatment of Spanish, then, the phonic group or tonema (including
the 'phonemes' of juncture, pitch, and stress) is taken to be the most basic element of
intonation, whose functions are both grammatical and attitudinal, and whose main forms
are represented by the Falling, the Rising, and the Level tonema.


2. The corpus of spoken Spanish

The study reported in this paper began with the belief that research in speech prosody
should make use of realistic data based on real-life use of language, rather than on made-up
examples (Cid Uribe 1989). We embarked on the construction of a recorded corpus of
spoken Spanish; the advantages of a corpus-based approach to linguistic research are
outlined in Sampson (1987). The main goal of our work was to design a machine-readable
corpus in the form of a database that would enable comparisons to be made with the
Spoken English Corpus, a collection of 52,637 words of spoken English that has been
prosodically transcribed (Knowles and Lawrence 1987).
The building of the corpus necessitated the making of a number of decisions in the light
of the requirement to allow comparative study of Spanish and English prosody. There are
many variables to be considered in the selection of a sample of speakers, and to be fully
representative a corpus would have to run through all permutations of all variable values.
This would have resulted in an impossibly large amount of data for a pilot study such as
this, and therefore some variables were held constant and a few were varied. The variables
we considered were the following:
1. Gender: we have worked on the basis that the corpus should from the outset
contain equal numbers of female and male speakers.
2. Educational level: this was kept fairly constant, all informants being professional
people in Spain with at least five years of university studies.
3. Age: all informants were in their late twenties or early thirties.
4. Geographical origin: informants from different parts of Spain were used, though
not all regions are represented in the data so far.


5. Speaking style: we have used a wide range of styles and degrees of formality,
though we are particularly interested in examining unscripted, spontaneous speech.
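The scale of a fully crossed design is easy to see by enumerating its cells. The sketch below is illustrative only: the value lists for gender, education, age and region are assumptions made up for the example, and only the six speaking styles are those recorded from the informants in this study.

```python
from itertools import product

# Hypothetical value sets for the five variables. Only the six speaking
# styles come from the paper; the other lists are illustrative assumptions.
variables = {
    "gender": ["female", "male"],
    "education": ["secondary", "university"],
    "age": ["20s", "30s", "40s", "50s+"],
    "region": ["north", "centre", "south", "east", "islands"],
    "style": [
        "dialogue", "autobiography", "anecdote",
        "poetry reading", "narrative reading", "descriptive reading",
    ],
}


def design_cells(variables):
    """Return every combination of variable values, one dict per cell."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


cells = design_cells(variables)
print(len(cells))  # 2 * 2 * 4 * 5 * 6 = 480 cells
```

Even with these modest value lists, each cell would need at least one recording, which is why holding some variables constant is the practical choice for a pilot study.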
The informants who were specially recorded for the corpus were seven native speakers
of Spanish. Their spontaneous speech, which makes up 36% of the corpus, was recorded
in the studio of the Department of Linguistics and Phonetics at the University of Leeds.
The other recorded material was taken from broadcasts. The three major sources of data
were therefore the following:
1. Speech produced by the seven Spanish-speaking informants who provided
samples of six different styles: spontaneous dialogue, spontaneous autobiography,
spontaneous anecdote, poetry reading, narrative reading, and descriptive reading.
2. Video recording of TV news programs and commentary as broadcast on the
Spanish television network.
3. Video recording of a special overseas TV report.
After the recordings were made, the analysis of the data proceeded through three stages:
(i) orthographic transcription, (ii) syllabification, and (iii) prosodic transcription. In the
orthographic transcription all the punctuation marks usual for Spanish are included: this
was thought necessary so that the corpus could be used by researchers in other areas such
as lexical, grammatical, or semantic studies. The syllabification was carried out because
the syllable is such an important unit in phonology, both as a combinatory unit for
phonemic segments, and as basic unit for the assignment of prosodic features. In theory
the syllabification could have been done automatically by computer, and this may be tried in
the future. At this stage, however, we preferred to mark syllable boundaries by hand: this
meant that where an intervocalic consonant could have been assigned either to a syllable
coda or to the onset of the following syllable, the decision could be based on known word
divisions. As part of this process we also marked connected speech effects such as hiato,
sineresis, or sinalefa.
The final stage was the prosodic transcription. In this, we aimed to follow as closely as
possible the conventions used by the Spoken English Corpus in order to ensure
comparability. The approach used is essentially a conventional British 'tonetic' scheme.
This involved first marking major and minor tone unit boundaries: this was done on the
basis that major boundaries would exhibit a pause, and would occur at a major syntactic
boundary, while minor tone unit boundaries were identified by such factors as a break in
the pitch contour, a rhythmical discontinuity or a slight hesitation. Minor tone unit
boundaries are marked with a single vertical bar, major ones with a double bar. Syllable
boundaries inside the word are marked with a hyphen (syllable boundaries between words
are taken to be indicated by the space); other features of connected speech that are marked
are the following:
hesitation or incompleteness
compression at word boundary [+]
compression at syllable boundary [>]
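As a minimal sketch of how the boundary marking can be read by machine, the following splits a transcription into major tone units and their constituent minor tone units. It assumes that the bars are stored as the ASCII characters `|` and `||` and that every major boundary also closes a minor tone unit; the sample string is invented for illustration and is not taken from the corpus.

```python
def split_tone_units(transcription):
    """Split a transcription into major tone units.

    Each major tone unit is returned as a list of its minor tone units.
    Assumes '||' marks a major boundary and '|' a minor boundary.
    """
    majors = []
    for major in transcription.split("||"):
        # Whatever remains between major boundaries is divided by single bars.
        minors = [part.strip() for part in major.split("|") if part.strip()]
        if minors:
            majors.append(minors)
    return majors


# Invented sample, hand-syllabified with hyphens inside words.
sample = "bue-nas tar-des | da-mas y ca-ba-lle-ros || es-tas son las no-ti-cias ||"
units = split_tone_units(sample)

major_count = len(units)
minor_count = sum(len(minors) for minors in units)
print(major_count, minor_count, minor_count / major_count)
```

Splitting on the double bar first avoids misreading `||` as two adjacent minor boundaries.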


Once the tone units had been marked, the tone markings were inserted. At this stage it
was assumed on intuitive grounds alone that a system of tone-marking devised to represent
English prosody would be adequate for Spanish as well. At some stage it will be necessary
to justify this experimentally, but for the present we consider that the set of tones used is of
such a general nature that there can be little to tie it to a particular language's phonology.
The tone-marking convention used requires the transcriber to put one of the set of tone
marks on each stressed syllable: this approach (a rather 'phonetic' one) is somewhat
different from the alternative British approach which restricts the use of tone-marks to a
single, 'nuclear' syllable in each tone unit, corresponding to a 'sentence stress', as part of a
more complex intonational structure as proposed by O'Connor and Arnold (1973),
Halliday (1967), Crystal (1969), and others.
To avoid the problems that arise from non-standard characters being used on different
computer peripherals, we used numerical codes to represent tones, as follows:

high fall
low fall
high rise
low rise

The resulting corpus of spoken Spanish contains 25,250 words appearing in 45 texts of
different lengths, grouped into the 15 different categories shown in Table 1.
Table 1. Speech categories in the corpus.

Category                        No. of words    % of total
Spontaneous dialogue
Spontaneous autobiography
Spontaneous anecdote
Poetry reading
Descriptive reading
Narrative reading
TV report
TV news headlines
TV home news
TV news: political scandal
TV news: weather
TV news: tourism
TV news: sport
TV news: international
TV news: miscellaneous

The list of these categories should not be regarded as closed: we anticipate future research
adding such things as radio talks, children's speech, lecture-style speech, preaching,
political speeches, and so on.


It can be seen that spontaneous speech has been given considerable importance in the
construction of this corpus: the three categories of spontaneous speech form 36.46% of the
total corpus. In general, material of this sort has been lacking in previous studies of
Spanish, though Canellada y Madsen (1987) does contain some samples of spontaneous
speech, providing a welcome shift of emphasis. In our corpus, the reading style covers
3,407 words, this comprising 13.3% of the total corpus. The highly stylized speech of
poetry reading has been given a very minor percentage of the total: while it is a form of
speech whose prosody has been much studied (Navarro Tomas 1974; Quilis and Fernandez
1981), it is a style not likely to reflect everyday language.

3. Results from the Spanish corpus

The transcriptions were stored on computer as machine-readable files and this made it
possible to extract information about them automatically. This section presents some of the
preliminary findings that have been made.

3.1 Syllables
As explained above, syllable boundaries were included in the transcription, and apart
from simple boundary marking we also used a [+] symbol to indicate sinalefa, i.e. two
vowels at a word boundary being compressed into a single syllable, as in
la+hora = two syllables rather than three
Inside the word, at a syllable boundary, the symbol [>] was used to indicate the elision of a
consonant resulting in a diphthong pronounced as one syllable rather than two syllables,
cla-va>do = two syllables rather than three.
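The effect of these two symbols on the counts can be sketched as follows. The function assumes a fully hand-syllabified input in which a hyphen separates syllables inside a word, a space separates words, `+` replaces the space between two words that share a syllable, and `>` replaces the hyphen between two syllables that are merged; it also assumes that tone codes have already been stripped. The function name is hypothetical.

```python
def count_words_and_syllables(transcription):
    """Return (words, syllables) for a hand-syllabified transcription.

    Assumed conventions: '-' is a syllable boundary inside a word, a space
    is a word boundary, '+' joins two words across a shared syllable
    (sinalefa) and '>' joins two syllables inside a word. Tone-unit bars
    are treated as plain separators.
    """
    tokens = transcription.replace("|", " ").split()
    # '+' hides a word boundary, so each '+' adds one word to the token.
    words = sum(len(token.split("+")) for token in tokens)
    # Only hyphens count as syllable boundaries; '+' and '>' do not.
    syllables = sum(len(token.split("-")) for token in tokens)
    return words, syllables


print(count_words_and_syllables("la+ho-ra"))   # two words, two syllables
print(count_words_and_syllables("cla-va>do"))  # one word, two syllables
```

Because the compressed boundaries are written with their own symbols rather than deleted, the same file yields both the word count and the reduced syllable count.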
These two features affected the counting of syllables. A count of the
syllables in the corpus gave a total of 50,037. This gives an average figure of 1.96
syllables per word in our data. However, mean word length may well vary according to
style, and stylistic effects were looked for in the corpus. Considering the different levels of
formality and the styles used, the categories were divided into four main groups ranging
from informal, unscripted speech to highly formal. Group 1 includes those passages in
spontaneous, unscripted speech. Group 2 comprises a mixture of scripted and unscripted
speech. Group 3 includes those passages which contain scripted speech, i.e. passages read
from given texts. Group 4 includes the speech of all read television news texts. The
syllables-per-word figures for these groups show clear differences, as may be seen in
Table 2.
Table 2. Mean number of syllables per word, by group.

Group    No. of words    No. of syllables    Mean syl/word

The obvious explanation for the shorter mean word length in less formal styles must be
the vocabulary used (longer words tending to be more typical of formal speech), but there
are other contributory factors. These include the presence of monosyllabic hesitation
sounds (transcribed 'e@'), the frequent occurrence in spontaneous speech of monosyllables
like si and no, and the use of el and la as a form of hesitation. Sinalefa and sineresis,
which shorten words by decreasing the number of syllables, appear to be more frequent in
those passages belonging to Groups 1 and 2, and less frequent in the more formal passages
of Groups 3 and 4. In the reading of scripted texts the opportunities for natural
spontaneous repetitions, false starts, and hesitations are much fewer.

3.2 Tone units

The corpus was divided up into major and minor tone units; we followed Canellada y
Madsen's (1987) principles for tone-unit division in Spanish. In the whole corpus there
are 3,631 major and 8,610 minor tone units marked; major tone units contain an average of
2.37 minor tone units and an average of 7.02 words; minor tone units contain an average of
2.96 words.
Since it was possible that the length of a tone unit might vary according to style, a
figure was calculated for each passage. The passages were again collected into four
groups, and the results in Table 3 show clearly that the style of speech has an effect on the
composition of the tone units: the less formal the speech, the smaller the number of minor
tone units per major tone unit.
Table 3. Major and minor tone units, by group.

Group    Major TUs    Minor TUs    Mean minor per major

As the length of tone units was expected to be an important parameter to be considered
when dealing with our comparison of Spanish and English speech, a mean length in terms
of syllables was calculated for major and for minor tone units. Major tone units contain an
average of 13.78 syllables, minor tone units an average of 5.80 syllables.
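The means quoted in this section can be recomputed from the corpus totals given in the text. The sketch below is a plain arithmetic check, not the procedure actually used in the study. The syllable-based means and the minor-per-major ratio come out at or within 0.01 of the reported figures, while the word-based means come out slightly lower (about 6.95 rather than 7.02 words per major tone unit), which may reflect rounding or a word total counted on a slightly different basis.

```python
# Corpus totals as stated in the text.
WORDS = 25_250
SYLLABLES = 50_037
MAJOR_TUS = 3_631
MINOR_TUS = 8_610


def ratio(numerator, denominator):
    """Mean number of one unit per another, rounded to two decimals."""
    return round(numerator / denominator, 2)


summary = {
    "syllables per word": ratio(SYLLABLES, WORDS),
    "minor TUs per major TU": ratio(MINOR_TUS, MAJOR_TUS),
    "words per major TU": ratio(WORDS, MAJOR_TUS),
    "words per minor TU": ratio(WORDS, MINOR_TUS),
    "syllables per major TU": ratio(SYLLABLES, MAJOR_TUS),
    "syllables per minor TU": ratio(SYLLABLES, MINOR_TUS),
}

for label, value in summary.items():
    print(f"{label}: {value}")
```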

3.3 Tones
The total number of tones marked in the corpus of spoken Spanish is 19,626, with the
relative frequency of occurrences of each tone distributed as shown in Table 4.
Table 4. Relative frequency of tones.

Tone    Percentage of total


4. General conclusions

The results obtained from this preliminary computer analysis of the corpus of spoken
Spanish suggest the following conclusions:
1. Spanish seems to favor falls rather than rises, with a slight preference for low falls
as against high falls. Rises (high and low) only make up 21.3% of the total in
Spanish, while falls make up 37.76%.
2. The very high frequency of level tone occurrence in the Spanish data needs
explaining. To some extent this is a consequence of the model of intonation
adopted (for the sake of compatibility with the Spoken English Corpus): we suspect
(but have not yet been able to prove) that many or most of our level tones fall on
what would have been labelled as non-nuclear (or 'non-tonic') stressed syllables in
an analysis based on a tone unit with a single nuclear ('tonic') syllable. However,
it is also possible that level tone does occur frequently in nuclear position and that in
this case it is an 'allotone' of fall or rise.
3. Spanish seems to favor relatively short major tone units: these have an average
length of 7.02 words. The average number of minor tone units per major one is also low:
2.4. The length of tone units increases as a function of the degree of formality; in
dialogue this effect is emphasised as the need for turn-taking makes shorter
utterances desirable.
4. Since syllable boundaries were marked in the text, it has been possible to extract
some syllabic information. The overall figure for the average number of syllables
per word is 1.96.
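The tone percentages in conclusion 1 can be turned into approximate counts to show how large the remaining share is. This is a rough sketch based only on the two percentages and the total of 19,626 tones quoted in the text; the remainder covers level tones together with any other tone types not itemised here, so it is an upper bound on the level-tone share rather than the figure itself.

```python
TOTAL_TONES = 19_626
FALL_SHARE = 37.76  # percent, high and low falls combined
RISE_SHARE = 21.3   # percent, high and low rises combined


def approx_count(share_percent, total=TOTAL_TONES):
    """Approximate number of tones corresponding to a percentage share."""
    return round(total * share_percent / 100)


# Share left over once falls and rises are accounted for.
other_share = round(100.0 - (FALL_SHARE + RISE_SHARE), 2)

falls = approx_count(FALL_SHARE)
rises = approx_count(RISE_SHARE)
others = approx_count(other_share)

print(other_share, falls, rises, others)
```

On these figures the remainder is larger than the share of falls, which is the pattern that conclusion 2 sets out to explain.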

5. Future developments

It is hoped that the work begun here can continue to grow as a result of collaborative
work; our immediate concern is to complete the work of comparing the Spanish results
with the English data. Among other research areas that we intend to work on are the
following: (i) refinement of the transcription system: while we would like to retain a system
that allows comparison with other corpora, we feel that alternative analyses proposed in
various recent publications on prosodic research may allow a much richer representation of
prosodic facts; (ii) we wish to attempt acoustic analysis of the prosodic information in the
data: this will involve digitization of the material and computer analysis for pitch extraction:
we hope eventually to publish this data in CD-ROM format; (iii) as mentioned above, we
wish to broaden in a systematic way the coverage of different varieties of spoken Spanish;
finally, (iv) a great deal remains to be done to make possible the discovery of relationships
between prosody and discourse.

References



CANELLADA, M. J. and MADSEN, J. (1987). Pronunciación del Español. Madrid:
Editorial Castalia.
CID URIBE, M. E. (1989). Contrastive Analysis of English and Spanish Intonation using
Computer Corpora: A Preliminary Study. Unpublished Ph.D. dissertation, University
of Leeds.
CRYSTAL, D. (1969). Prosodic Systems and Intonation in English. Cambridge:
Cambridge University Press.
HALLIDAY, M. A. K. (1967). Intonation and Grammar in British English. The Hague:
KNOWLES, G. and LAWRENCE, L. (1987). Automatic intonation assignment. In Garside,
R., Leech, G., and Sampson, G. (editors), The Computational Analysis of English,
139-148. London: Longman.
NAVARRO, T. (1974). Manual de Entonación Española. Fourth edition. Madrid:
Ediciones Guadarrama.
O'CONNOR, J. D. and ARNOLD, G. F. (1973). The Intonation of Colloquial English.
Second edition. London: Longman.
QUILIS, A. (1981). Funciones de la entonación. In Homenaje al Ambrosio Rabanales,
Boletín de Filología (Santiago de Chile) 31, 443-460.
QUILIS, A. and FERNANDEZ, J. A. (1985). Curso de Fonética y Fonología Españolas.
Eleventh edition. Madrid: Consejo Superior de Investigaciones Científicas.
ROACH, P. J. (1983). English Phonetics and Phonology. Cambridge: Cambridge
University Press.
SAMPSON, G. (1987). Probabilistic models of analysis. In Garside, R., Leech, G., and
Sampson, G. (editors), The Computational Analysis of English, 30-41. London: