Musical Meter and Phrase: A Cognitive Theory of Temporal Structure
(音楽の拍節とフレーズ: 時間構造の認知理論)
Seiji Ootaka (大高誠二)
May 10th 2021, ver. 1.2
Part 1 (Chapters 1–4) only
PART 1
Metrical Structure as Cognitive Schema
Chapter 1 What is metrical structure?
Fig. 1-1.1 (a notated example with labels marking the time signature, the bar lines, and each one-measure span)
1 In the English-speaking world, the term "meter" was first used to refer to the metrical structure of a measure in the
early 20th century in the United States. Until then, "time" or "measure" was used for this meaning. Charles Elson wrote in
the article "Time" in his music dictionary, published in 1905: "An eminent American teacher has suggested meter as a fitting
word, but this would lead to confusion with hymns and poetical meters. As the word time is almost always used to
denote the divisions of the measure, we present the divisions under this head" (cf. Elson's Music Dictionary, p.263). Thus
we should note that the term "metrical structure" is also not a traditional term for this meaning.
Fig. 1-1.2 J. G. Sulzer, Allgemeine Theorie der schönen Künste (1st ed.), Part 2(K–Z), 1774, p.1137
Fig. 1-1.3 A. B. Marx, Allgemeine Musiklehre, 1839, p.119
However, one premise must not be overlooked here. Even if there is a
difference in the intensity of the notes, if a listener does not perceive it
as a measure, it does not function as a measure. The pattern of differences in
sound intensity itself is something that exists outside of us. The regularity of
strong beats in musical stimuli has only the potential to be understood as
measures by listeners. It is only when we assume that the difference in
sound intensity always causes the listener to recognize a specific metrical
structure that we can discuss the stimulus itself as the same as the metrical
structure. For example, in language, the sequence of letters "get" or its
pronounced sound has its effect only when the reader or listener recognizes
it as the English word "get". Therefore, we can identify the string of letters or
the sound itself as the English word "get" only on the assumption that people
will surely identify it as the English word "get". In linguistics, we don't need
to think about the image recognition of the letters, so the discussion will be
based on the assumption that the string of letters should be understood as
the verb "get" in English.
The parts of music that tend most strongly toward regularity would be
the parts that have the most to do with this cognitive framework. Therefore,
it seems that metrical structure best reflects the nature of this framework. In
fact, metrical structure is the very essence of this cognitive framework. Let's
think about this further.
1.2.1 Schema
The framework inferred in the previous discussion is very similar to
structures called frames or schemas that have been proposed in the cognitive
sciences, such as psychology and artificial intelligence. A frame or schema is a
hypothetical data structure that is assumed to be used by humans when
processing information from the outside world. Marvin Minsky describes a
frame as "a data-structure for representing a stereotyped situation, like being
in a certain kind of living room, or going to a child's birthday party." Similar
concepts are used in various fields, but there is no unified theory and the
terminology is disparate. Minsky, who had a great impact on artificial
intelligence research, used the term frame, so the concept is often referred to as a frame
in the field of artificial intelligence. The term script was also used, especially
when discussing events along a time axis. In psychology, on the other hand,
the term schema is generally used, derived from Greek philosophy and Kant's
term. In linguistics, the term frame has been popularized by Charles J.
Fillmore. Structures similar to frames and schemas are also used in
programming, where the template for creating objects is called a class or
type. Eleanor Rosch discusses a similar problem using the term category.
Here, variables are data structures with various levels of abstraction. Let us
explain these features using a schema that is assumed to be used for human
face recognition as an example. In the face schema, we can assume that there
are places to store data for variables such as eyes and a nose. Each variable is
not just data but is itself a schema, an eye schema or a nose schema, and the eye schema
contains smaller, more specific schemas that describe the eyelashes and the pupil. If
we were to use tools such as microscopes, it would not be impossible to
subdivide these into smaller and smaller schemas, but that is no longer the
data that we use to understand human faces. On the other hand, we can say
that the face schema is a part of a larger schema of the human body.
A face and its eyes or nose are in a relationship of whole and part. In
this case, the schema of a face and the schema of an eye or a nose have similar
functions, but they are completely different schemas in terms of content.
This kind of relationship can be called a "whole and part" (or "part-of"/"has-
a") relationship. In addition, there is also the "is a kind of" (or "is-a")
relationship, such as the relationship between the schema of an automobile
and the schema of a truck. In the case of the latter relationship, we can think
of a more specific schema of a truck as being created by adding sub-schemas
for expressing a truck in a nested manner to a more vague and general
schema of an automobile. In such a case, the schema of a truck is said to be a
descendant schema of the schema of an automobile and inherits the
characteristics of the schema of an automobile.
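The two relationships can be sketched in a programming notation, in line with the earlier remark that a schema resembles a class or type. This is only an illustration under assumptions: the class names, the slots, and the default values are hypothetical and are not part of the theory.

```python
from dataclasses import dataclass, field


@dataclass
class Pupil:
    """A small, specific schema nested inside the eye schema."""
    diameter_mm: float = 4.0  # illustrative default, not a measured value


@dataclass
class Eye:
    # "part-of" / "has-a": the eye schema contains a pupil schema.
    pupil: Pupil = field(default_factory=Pupil)


@dataclass
class Nose:
    length_mm: float = 50.0  # illustrative default, not a measured value


@dataclass
class Face:
    # Each variable (slot) of the face schema is itself a schema.
    left_eye: Eye = field(default_factory=Eye)
    right_eye: Eye = field(default_factory=Eye)
    nose: Nose = field(default_factory=Nose)


@dataclass
class Automobile:
    """A vague, general schema."""
    wheels: int = 4


@dataclass
class Truck(Automobile):
    # "is-a": Truck inherits the slots of Automobile and adds its own.
    payload_kg: float = 0.0


face = Face()
truck = Truck(payload_kg=2000.0)
print(isinstance(face.left_eye, Eye))  # the face has an eye (part-of)
print(isinstance(truck, Automobile))   # a truck is a kind of automobile (is-a)
print(truck.wheels)                    # slot inherited from Automobile
```

A Face is not a kind of Eye and an Eye is not a kind of Face, whereas a Truck can be used wherever an Automobile is expected. That difference is the one the text draws between the two relationships.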
The level of beat or meter is often set near the level at which humans
perceive the metrical structure most clearly. This is consistent with the idea
of the basic level in Rosch's category theory.
From the above, we can conclude that the metrical structure of music is the
human schema itself, or the reflection of that schema in various works. In
other words, the discussion of meter is nothing but a discussion of human
schemas for music cognition.
Chapter 2 Events and Time
Fig. 2-1.1
Most of the musical events we usually deal with should have a curve like
(b). With such a sharp increase in volume, people can tell where the
event starts. Therefore, from our standpoint, we cannot use the IOI (inter-onset
interval), which is commonly used in discussing rhythmic phenomena,
without the condition that "humans perceive it".
For the listener, a clear indication of where an event ends is the moment when the
sound suddenly disappears, which should be understood as the perceived
beginning of an event that is in fact silence.
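The dependence of the IOI on perceivable starting points can be illustrated with a small sketch. It assumes that a sharp rise in a sampled loudness envelope stands in for a perceived onset. The function names, the sample data, and the threshold are hypothetical, and the threshold is only a crude proxy for the listener's judgment.

```python
def detect_onsets(envelope, times, rise_threshold):
    """Return the times at which the envelope rises sharply.

    An onset is reported when the increase from one sample to the next
    exceeds rise_threshold (a hypothetical stand-in for perception).
    """
    onsets = []
    for i in range(1, len(envelope)):
        if envelope[i] - envelope[i - 1] > rise_threshold:
            onsets.append(times[i])
    return onsets


def inter_onset_intervals(onsets):
    """IOIs are the differences between successive onset times."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


times_ms = list(range(0, 1200, 100))
sharp = [0, 9, 6, 3, 0, 9, 6, 3, 0, 9, 6, 3]    # sudden attacks
gradual = [0, 2, 4, 6, 8, 6, 4, 2, 0, 2, 4, 6]  # slow swells

sharp_onsets = detect_onsets(sharp, times_ms, rise_threshold=5)
print(sharp_onsets)                         # [100, 500, 900]
print(inter_onset_intervals(sharp_onsets))  # [400, 400]
print(detect_onsets(gradual, times_ms, rise_threshold=5))  # []
```

For the slowly swelling envelope no onset is found, so no IOI can be computed at all. The interval is defined only relative to starting points that are actually detected.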
2.1.2 Accent
An accent is an attribute attached to an event; as Carl Schachter puts it,
the accent "colors the event".2 An event is a group of stimuli in time, and an accent
2 The metrical accent, therefore, always colors the event—tone, harmony, occasionally even silence—that falls on the
favored point. (Schachter, Unfoldings: Essays in Schenkerian Theory and Analysis, 1999, p.82)
is a kind of "value" attached to the event. The term "accentuate" is often used
in general, but one should be careful with it, because an accent is
not an event or an object. The word "accentuate" should not be
understood as attaching something to a note; it means changing some value
of an event.
Events that would have been identical without the difference in intensity
will be divided into different types, by differentiating the "strength" of
accents. If we use accents of two different intensities, the events will be
divided into two kinds, while events with the same strength of accent are
"identical events". This has a similar effect to differences in timbre, etc.
Giving a difference in accent is almost the same as making a difference in the
type of event.
prominent than the accents of other events. On the other hand, "no accent"
means that the accent of the event is weaker or less prominent than the
accents of other events.
On the other hand, if we use accents as a proxy for human attention, then
the phrase "no accent" may be valid. However, this is an idea that requires
caution, as it may lead to the extremely perverse idea that human attention
exists in the stimulus. In fact, it is humans who perceive the accent, and only
the possibility of humans perceiving the accent exists in the stimulus. It is
human beings who interpret accents as being in a certain position, and it is
human beings who recognize those differences as meaningful differences and
group them or pay attention to them. All of this suggests that the problem of
accent is much the same as the problem of the difference between the
starting point of an event and the type of event as interpreted by humans.
The essence of an accent is that the starting point is felt, or that the starting
point is more prominent than others.
So far, we've discussed the concept of accent and the caveats and
confusion associated with it. However, these misconceptions about accents
have actually been put to good use as a convenience. The concept of accent is
a device to materialize human responses outside of human beings. Using
accents, instead of saying that humans sense the beginning of an event, we
can say that the event has an accent, and when humans pay attention to one
of these accents, we can say that the event has a strong accent. And when an
event is categorized into two types according to the strength of the sound,
we can say that "strong accent and weak accent are alternating". In this case,
the accent is not the objective strength of the stimulus, but a concept that
takes into account the human response to the stimulus and then reassigns it
to the stimulus as if it really exists outside.
duple meter schema, then they are either accepted by another schema or
they are incomprehensible. If the composer or performer wants the piece to
be understood as duple, he or she will place the notes in such a way that
they will be acceptable to the duple meter schema. However, no matter how
acceptable the arrangement of sounds is, sounds do not become meter
without human involvement. Differences in the stimuli themselves, such as
the placement of the sounds, can only have the possibility of being interpreted in
certain ways by the listener.
The word "schema" is derived from the Greek word σχῆμα (skhêma),
which means "shape". Aristoxenos discusses the relationship between
sounds of music, syllables of language or steps of dance and the time
divisions created by such temporal events, comparing it to the relationship
between "things that can take shape" and "shape".3 In his theory, rhythm is a
particularly desirable shape among such shapes. The species of rhythm
appear in the form of feet.4 Feet, the species of rhythm, were based on
lengths of time and dominated the idea of meter in music at least until
3 Cf. Pearson, L., Aristoxenus Elementa Rhythmica, 1990, pp.3–9: We must recognize rhythm and the rhythmizable medium
(rhythmizomenon) as separate notions and separate natures, related to one another in the same kind of way as shape and
shapable material in relation to it. [...] The shape (schema) is a particular arrangement of the parts of the object. [...]
rhythm occurs when the division of chronoi takes on some particular arrangement, because not every arrangement of
chronoi is rhythmical. [...] There are three kinds of rhythmizomena, speech, melody, bodily movement.
4 Cf. Pearson, L., Aristoxenus Elementa Rhythmica, 1990, p.11: The means by which we mark rhythm and make it
perceptible to the senses is a foot, one or more than one.
the 18th century. A major change in this way of thinking was the
understanding of meter by accent rather than by shape.
hemiola. However, there is a major flaw in his meter concept that needs to
be mentioned here.
5 Clearly, Yeston holds that meter exists in its complete form in musical stimuli without human involvement (The
Stratification of Musical Rhythm, 1976, p.65ff.): In order to create some regular grouping of elements within a simple
pulse, there must be some event occurring at regular intervals within it. [...] The fundamental logical requirement for
meter is therefore that there be a constant rate within a constant rate—at least two rates of events of which one is faster
and another is slower. [...] In view of these two necessary rhythmic strata, the question must now be asked: On which
level does the meter appear—on level A or on level B?[...] There is apparently, then, no such thing as a level of meter or
a level on which meter may appear; but rather, meter is an outgrowth of the interaction of two levels—two differently-
rated strata, the faster of which provides the elements and the slower of which groups them.
Fig. 2-2.1
Chapter 3 Understanding by Schema
We illustrate this with a very simple STAIR schema. Let us simply consider
stairs that go straight ahead. In this case, the type of schema is
distinguished only by the difference in height and depth of each stair. Any
further details have little to do with going up and down the stairs. In other
words, the STAIR schema can be expressed simply by its
height and depth parameters. When a pedestrian is able to walk up and
down the stairs smoothly, the pedestrian understands the stairs.
STAIR schema. The pedestrian would have understood the stairs if he or she
could find in his or her mind a schema that would explain it. The pedestrian
can then easily prepare such a suitable schema by simply changing the
parameters of depth and height. It is important to note that it is possible to
obtain a schema for any stairs by simply changing the parameters. This is
because pedestrians cannot hold a separate schema for each of the countless
possible stairs, so a mechanism to obtain appropriate schemas with fewer procedures
and less storage space is necessary. This point will be discussed soon.
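The idea of obtaining a fitting schema by changing two parameters can be sketched as follows. This is a minimal illustration, not a model of perception: the class name, the units, the tolerance, and the averaging used to pick new parameters are all assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StairSchema:
    height_cm: float  # rise of each step
    depth_cm: float   # tread depth of each step

    def explains(self, observed_steps, tolerance_cm=1.0):
        """True if every observed (height, depth) pair fits this schema."""
        return all(
            abs(h - self.height_cm) <= tolerance_cm
            and abs(d - self.depth_cm) <= tolerance_cm
            for h, d in observed_steps
        )


def fit_schema(base, observed_steps):
    """Derive a fitting schema by changing only the two parameters."""
    heights = [h for h, _ in observed_steps]
    depths = [d for _, d in observed_steps]
    return replace(
        base,
        height_cm=sum(heights) / len(heights),
        depth_cm=sum(depths) / len(depths),
    )


default = StairSchema(height_cm=18.0, depth_cm=26.0)
steep_stairs = [(22.0, 20.0), (22.5, 20.0), (21.5, 20.0)]

print(default.explains(steep_stairs))  # False
adapted = fit_schema(default, steep_stairs)
print(adapted.explains(steep_stairs))  # True
```

Only one schema definition is stored. Every particular staircase is covered by an instance with different parameter values, which is the saving in storage that the text refers to.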
Now, let us apply this to the case of a musical object. Assume that there is
a musical object with a certain structure. When this can be interpreted as an
instance of a schema, we can say that this schema explains this musical
object. When a listener can find in his or her mental stock a schema that
explains this object, he or she has understood the object. Here, the listener
should not have a separate schema for each of the various musical shapes,
but should be able to generate a schema for each shape by some simple
mechanism, as described in the example of the stairs. This is because if we
had separate schemas, the listener would need to have a very large number
of different schemas to be able to understand a wide variety of music, and it
would not be easy to find the right one.
We agree with GTTM in this respect. However, GTTM avoids discussing the
nature of grouping in music by applying the general concept of grouping to
music, which is mainly based on the nature of visual objects, instead of
considering grouping in music. Therefore, although they admit that events
are grouped in music, they do not ask how the grouping of sounds occurs,
and end up discussing how to construct rules for determining the location of
group boundaries based on the various characteristics of musical events. This
is because GTTM ultimately prioritizes the construction of systems that can
be automatically processed by machines, rather than aiming for a human
understanding of music. In other words, GTTM theory is essentially a study
of artificial intelligence using computers, and therefore, in the end, the
groupings in GTTM do not even have to be the same as the groupings judged
by humans.
In other words, groups in music are based only on schemas. In this study, we
will call this the "principle of grouping". A schema unambiguously
determines a group as an instance. However, a musical group can be
understood by multiple schemas, depending on the interpretive possibilities
of the structure it stands for. This is usually because a group, by itself, does
not fully describe the structure represented by the schema. This may be
similar to the relationship between objects and shadows. An object casts a
certain shadow, but there are multiple possibilities for the object to be
inferred from the shadow. A shadow can have only one contour, but the
groups created by the schema include a nested hierarchical structure.
Question 1:
If the groups are based on a complexified schema of meter, wouldn't there
just be complex meter in complex groups, and the simple meter that the
listener would normally experience in music would be lost? In music,
Question 2:
If there is a schema that instantiates each shape of group, doesn't the
listener have to have a huge number of schemas beforehand in order to
understand the various groups of music? Wouldn't that be a waste of the
brain's storage capacity?
Question 3:
No matter how many schemas we can prepare, won't there always be some
groups that cannot be supported by them? If the range that can be
supported is fixed, how on earth can the range that can be supported be
determined?
Question 4:
Even if we could have such a huge number of schemas beforehand, how can
we find the right one each time in a short time?
sextuple meter schema. In this way, the listener would not have to think of
each schema as having to be prepared separately, because schemas can be
generated by such a system.
Chapter 4 Schema system of metrical structure
Starting from the schema of a measure, if you divide the measure into
two, you get duple meter, and if you divide it into three, you get triple meter.
Duple and triple meter are each a kind of measure, and inherit the properties
of a measure. The difference between the two and three divisions is just a
difference in parameters in the same mechanism. If we were to divide them
further into two or three, we would generate four, six, or nine beats. In this
way, we can think of a system of schemas that includes duple meter, triple
meter, or four, six, and nine beat meter. By tracing the phylogenetic
relationships of this system, the listener can select the schema that best fits
the subject. The schemas obtained by the basic procedure in a small number
of steps can be treated almost as if they were prepared from the beginning.
That is to say, rather than having all the schemas, the listener would have a
mechanism to generate them. In this way, the listener does not need to
prepare a large number of schemas in advance, and can avoid the problem of
overwhelming the memory capacity.
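The generating mechanism described above can be sketched in a few lines. The sketch assumes that a schema is identified by its sequence of divisions, each being two or three. The function name and this representation are illustrative choices, not the author's notation.

```python
from itertools import product


def generate_meters(max_levels):
    """Map each division sequence to its number of beats.

    A sequence such as (2, 3) means: divide the measure into two,
    then divide each part into three, giving 6 beats.
    """
    meters = {}
    for levels in range(1, max_levels + 1):
        for divisions in product((2, 3), repeat=levels):
            beats = 1
            for d in divisions:
                beats *= d
            meters[divisions] = beats
    return meters


meters = generate_meters(2)
print(sorted(set(meters.values())))  # [2, 3, 4, 6, 9]
print(meters[(2, 3)], meters[(3, 2)])  # 6 6
```

Nothing is stored except the rule "divide into two or three". The sequences (2, 3) and (3, 2) both yield six beats but remain distinct entries, since they are different structures within the same system.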
1. Isochronism
2. Division
3. Simple ratio
So we need the second meta-schema for the division. This actually comes
from the very nature of schemas, "(2) schemata can embed one within the
Of course, this does not mean that the object has to have this exact ratio.
The listener can understand the object as having this ratio roughly.
Distortions can be understood by comparison with something regular. In
other words, distortions can be understood by modifying the schema for
regular things. And in this case, subtle deviations from the exact proportions
will probably be felt as differences in impressions. We do not intend to deal
with such delicate issues here, but we believe that certain kinds of feeling,
such as "groove" in music, are deeply related to the impressions
associated with such discrepancies.
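Understanding an inexact division by comparison with a regular one can be sketched as follows. The set of candidate ratios, the function name, and the nearest-ratio rule are assumptions made for this illustration only. The text does not specify how the comparison is carried out.

```python
from fractions import Fraction

# Hypothetical candidates: where the first part of a span may end
# when the span is divided into two or three equal parts.
SIMPLE_RATIOS = (Fraction(1, 2), Fraction(1, 3), Fraction(2, 3))


def interpret_ratio(first_duration, total_duration):
    """Return the nearest simple ratio and the signed deviation from it."""
    observed = first_duration / total_duration
    nearest = min(SIMPLE_RATIOS, key=lambda r: abs(observed - float(r)))
    return nearest, observed - float(nearest)


ratio, deviation = interpret_ratio(550, 1000)  # durations in milliseconds
print(ratio)                # 1/2
print(round(deviation, 3))  # 0.05
```

The object is understood as an equal division into two, and the remaining deviation is kept as a separate value. In this sketch that value is where an impression such as "groove" would reside.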
vertical bar on the right-hand side means that we can arbitrarily choose
between two or three divisions. Note that these rewriting rules only
express the ratio of the division. The quarter note on the left side means
an arbitrary time division, and the eighth note on the right side represents
the length of the second or third equal division of the left side. Thus, for
example, this is equivalent to that in Fig. 4-1.2. Traditionally, the starting
point for this rewriting was the whole note, or semibrevis.
Fig. 4-1.1
Fig. 4-1.2
Applying these rewriting rules at any position any number of times yields all
the standard metrical structures. This is the collective definition
of a sequence of symbols that represent time segments. Therefore, we have
defined a grammar that generates all possible standard metrical structures.
Of course, this does not mean that all of the shapes produced by this
rewriting rule will be used with equal frequency. Clearly, it is easier for the
listener to understand regular, repeated shapes. The word "meter" in the
normal usage of the term can be said to indicate relatively typical patterns of
simple measure division among the various possible shapes mentioned
above.
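The rewriting rule can be sketched as an operation on a list of time segments. The sketch assumes that each segment is represented only by its duration as a fraction of the starting span. The function names are hypothetical, and the nested grouping produced by each division is not recorded here.

```python
from fractions import Fraction


def rewrite(segments, position, parts):
    """Replace the segment at `position` by 2 or 3 equal divisions."""
    if parts not in (2, 3):
        raise ValueError("the rule allows only two or three divisions")
    piece = segments[position] / parts
    return segments[:position] + [piece] * parts + segments[position + 1:]


def derivations(start, steps):
    """All segment sequences reachable in exactly `steps` rewrites."""
    current = {tuple(start)}
    for _ in range(steps):
        current = {
            tuple(rewrite(list(seq), i, parts))
            for seq in current
            for i in range(len(seq))
            for parts in (2, 3)
        }
    return current


measure = [Fraction(1)]
triple = rewrite(measure, 0, 3)  # three equal beats
mixed = rewrite(triple, 0, 2)    # only the first beat divided in two
print([str(s) for s in mixed])   # ['1/6', '1/6', '1/3', '1/3']
print(all(sum(seq) == 1 for seq in derivations(measure, 2)))  # True
```

Because the rule may be applied at any single position, the grammar also produces uneven shapes such as the one printed above. The typical patterns called "meter" are the regular, repeated subset of these.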
4.1.4 Hemiola
If we make both choices in the above rewriting rules at the same time,
instead of just one, we get the structure of hemiola. In other words, hemiola
is the juxtaposition of different partitions. However, one of them will usually
be the dominant division, and the subordinate division will give a feeling
like that of irregular syncopation.
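Making both choices at once can be sketched by computing the starting points of the two divisions of the same span. The span is normalized to length 1, and the function names are illustrative assumptions.

```python
from fractions import Fraction


def division_points(parts):
    """Starting points of `parts` equal divisions of a unit span."""
    return [Fraction(k, parts) for k in range(parts)]


def hemiola_points():
    """Apply both choices of the rule to the same span at once."""
    duple = division_points(2)
    triple = division_points(3)
    merged = sorted(set(duple) | set(triple))
    return duple, triple, merged


duple, triple, merged = hemiola_points()
print([str(p) for p in merged])          # ['0', '1/3', '1/2', '2/3']
print(set(duple) & set(triple) == {Fraction(0)})  # True
```

The two divisions share only the starting point of the span. Every other starting point of one division falls inside a segment of the other, which fits the description of the subordinate division as feeling like syncopation.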
Since the hierarchy of beat is based on division, the lower level beat frame
is essentially a division of the upper level beat frame. This gives the sense
that each frame begins to divide again, rather than a superposition of independent
layers of pulses. This is also consistent with the fact that this study considers
the higher level beat as a group. However, if we are feeling two different
divisions at the same time, as is the case with hemiola, it is possible that a
group of beat sequences at a smaller level may be felt beyond the range of
the larger beats.
Doesn't this contradict the isochronous schema? But we have already seen
in Chapter 2 that an event is understood by the location of its starting point.
Therefore, the isochronous schema can be applied even if the second event is
not long enough. This is because the quarter note after the half note can be
understood as a half note with its second half lost. This also means that the
third beat of a triple meter has the same value as the third beat of a
quadruple meter. It is for this reason that Baroque theorists considered the
third beat of a triple meter to be a strong beat.
Similarly, the ratio of half notes to dotted quarter notes is 4:3, and by
subtracting dotted quarter notes from half notes, we get a division of (3+1).
The result is nothing more than a dotted rhythm. This is the same as 6/8
stopped at the length of a half note (see Fig. 4-2.2).
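The derivation of these divisions by cutting off an isochronous sequence can be checked with a short sketch. Durations are counted in eighth notes, so a dotted quarter is 3 and a half note is 4. The function name and the list representation are assumptions made for the example.

```python
def truncate(pattern, total):
    """Cut a sequence of isochronous units off at `total`.

    Durations are in eighth notes. Each unit keeps its starting point,
    and the last one loses whatever extends beyond `total`.
    """
    result, elapsed = [], 0
    for unit in pattern:
        if elapsed >= total:
            break
        result.append(min(unit, total - elapsed))
        elapsed += unit
    return result


# Two dotted quarters (6/8) stopped at the length of a half note: (3+1).
print(truncate([3, 3], 4))  # [3, 1]
# Two half notes stopped after three quarter notes: 4 + 2 eighths, i.e. (2+1).
print(truncate([4, 4], 6))  # [4, 2]
```

In both cases the second unit begins where the isochronous schema expects it and is simply not long enough, which is why the schema still applies.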
The (2+1) and (3+1) divisions have almost the same characteristics as
meter. In other words, each of these divisions can be a basic unit of melody
or harmony, just like the beats in ordinary meter. Therefore, these divisions
can be called quasi-meter.
The two rewriting rules added by the (2+1) and (3+1) quasi-meters are
as in Fig. 4-2.3. By adding these rules, the metrical structure can be
extended. Note that the (1+2) and (1+3) divisions cannot be treated here
because they involve syncopation (see Chapter 6 for syncopation).
Fig. 4-2.3
Fig. 4-2.4
Fig. 4-2.5
Fig. 4-2.6
However, there can be more than one possible way to understand the last
2 in (3+3+2). If we simply think of it as an extension of the hemiola, then
the last 2 is an incomplete 3. On the other hand, it could be interpreted as the
result of applying an additional duple division to the 3
in the quasi-meter (3+1). In that case, the notes would be the same
(3+3+2), but the structure would be different from the case where the
sequence of 3s is broken. This phenomenon is similar to a sentence that
consists of the same sequence of words but can be interpreted in two
different grammatical ways. In music, cases in which multiple
interpretations are possible are not at all uncommon.
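The two readings can be written down as two different groupings over the same durations. This is an illustrative sketch that assumes durations in eighth notes and nested lists as a stand-in for hierarchical structure. The variable names are hypothetical.

```python
def flatten(structure):
    """Surface durations of a nested structure, in eighth notes."""
    out = []
    for item in structure:
        if isinstance(item, list):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out


# Reading A: a run of 3s whose last member is cut short (flat grouping).
reading_a = [3, 3, 2]

# Reading B: the quasi-meter (3+1) over 8 eighths gives 6 + 2, and a
# duple division of the 6 gives (3 + 3) + 2 (nested grouping).
quasi_meter = [6, 2]
reading_b = [[quasi_meter[0] // 2] * 2, quasi_meter[1]]

print(flatten(reading_a) == flatten(reading_b))  # True: same notes
print(reading_a == reading_b)                    # False: different structure
```

The surface sequence is identical, and only the grouping differs, just as with a sentence that has two parses.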
The (3+3+2) division becomes difficult to recognize when the notes are
subdivided. For example, in Scott Joplin's "The Entertainer" (Ex. 4-2.1), the
beats of the (3+3+2) division are subdivided. In such a case, it becomes
impossible to identify this quasi-meter only by the ratio of note lengths
appearing in the score. Typical patterns are extracted in Fig. 4-2.7.