
draft translation from Japanese

音楽の拍節とフレーズ
—時間構造の認知理論—
大高誠二
(May 10th 2021 ver.1.2)
(Part1 (Chapter 1–4) only)

Musical Meter
and Phrase
A Cognitive Theory of
Temporal Structure

Seiji Ootaka
2021

PART 1
Metrical Structure as Cognitive
Schema

Chapter 1 What is metrical structure?


1.1 Cognitive origins of metrical structure

1.1.1 Where is a measure?


In modern notation, each interval between the vertical lines of a staff (bar
lines) is called a measure (bar). The main note unit that makes up a measure
is the beat, and measures are classified according to the number and kind of
notes that make them up; these classes are called meters.1 Measures and
meters are the most basic temporal structures of modern so-called
"common-practice music," in which measures of the same meter are repeated
to create a regular structure.

        

[Figure: a staff in modern notation, with arrows labeling the time signature and the bar lines, and each one-measure span marked]
Fig. 1-1.1

1 In the English-speaking world, the term "meter" was first used for the metrical structure of a measure in the
early 20th century in the United States. Until then, "time" or "measure" was used for this meaning. Charles Elson wrote in
the article "Time" in his music dictionary, published in 1905: "An eminent American teacher has suggested meter as a fitting
word, but this would lead to confusion with hymns and poetical meters. As the word time is almost always used to
denote the divisions of the measure, we present the divisions under this head" (cf. Elson's Music Dictionary, p.263). Thus
we should note that the term "metrical structure" is likewise not the traditional term for this meaning.


In reality, listeners cannot directly know where in the score the vertical
lines separating the measures fall. If measures are essential to listeners,
then listeners must somehow know where they are. Therefore, composers and
performers have to build some signal into their music so that listeners can
grasp the position of the bar lines.

Dynamic accent (accentuation by intensity of sound) is an extremely
useful tool for this purpose. This is because, unlike changes of duration or
timbre, it can be applied like a marker, with little or no change to the
structure of the music itself. Try placing accents freely at various positions in
an arbitrary melody: no matter how you do it, there is little risk that the
original melody will be lost. This is not the case when you change the length
or the pitch of a note. Dynamic accents have little or no significant
effect on the essential structure of the music.

In addition, it has been thought that dynamic accents can express
differences in the hierarchical levels of beats through their intensity.
Fig. 1-1.2 and Fig. 1-1.3 are examples of representing hierarchy with dynamic
accents. You can see that Johann Georg Sulzer (Fig. 1-1.2) distinguishes four
levels, and Adolf Bernhard Marx (Fig. 1-1.3) distinguishes six levels.

Fig. 1-1.2 J. G. Sulzer, Allgemeine Theorie der schönen Künste (1st ed.), Part 2(K–Z), 1774, p.1137

[Figure: facsimile of a page from Marx's text on gradations of accent, with a notated example]
Fig. 1-1.3 A. B. Marx, Allgemeine Musiklehre, 1839, p.119

However, one premise must not be overlooked here. Even if there is a
difference in the intensity of the notes, if a listener does not perceive it as a
measure, it does not function as a measure. The pattern of differences in
sound intensity itself is something that exists outside of us. The regularity of
strong beats in musical stimuli has only the potential to be understood as
measures by listeners. It is only when we assume that the difference in
sound intensity always causes the listener to recognize a specific metrical
structure that we can discuss the stimulus itself as the same as the metrical
structure. For example, in language, the sequence of letters "get" or its
pronounced sound has its effect only when the reader or listener recognizes
it as the English word "get". Therefore, we can identify the string of letters or
the sound itself as the English word "get" only on the assumption that people
will surely identify it as the English word "get". In linguistics, we don't need
to think about the image recognition of the letters, so the discussion will be
based on the assumption that the string of letters should be understood as
the verb "get" in English.

In music, however, it is necessary to start from the level of image
recognition of these characters. For example, an object such as a chair or a
desk has certain properties that are independent of humans. However, they
cannot function as chairs or desks on their own, apart from human
perception. They are artificial objects that have been given the potential to be
used as chairs or desks by humans. Similarly, an array of musical stimuli is
just a physical entity that has no musical properties by itself apart from
human perception, with only the possibility of human interpretation.


Theorists have mistakenly assumed that a complete measure is already
present in the musical stimuli. If we can assume that an accent is the sign of
a measure, we may be able to equate the presence of the accent in the
stimulus with the presence of the measure itself. However, this is a very
behaviorist assumption: that differences in stimuli correspond directly to
differences in responses. It makes it impossible to consider the process by
which we arrive at the perception of metrical structure from such stimuli,
because the accent or its placement is understood as a kind of symbol that
indicates a strong beat or a metrical structure. For example, suppose you are
given a picture and cannot tell whether it shows a chair or a desk. If it is
described in words as a desk, you can recognize it as a desk without any
process of judging from the picture. An accent, on this view, is a kind of noun
or notice similar to this: it directly tells us the position of the strong beat
without any need to understand the measure.

The mistake of music theorists so far has been to regard the measure as
existing in the stimuli of music, and to look for the essence of the measure in
the stimuli. In other words, it has been assumed that the recognition of
measure is established simply by perceiving regular stimuli. This is precisely
the reason why this study focuses on the cognition of music. Music is music
only when it is recognized by humans, and the nature of music is not the
nature of the sounds that make up the music, but the nature of the humans
who listen to it.

1.1.2 Music as a reflection of human nature


What then is the nature of human beings who listen to music? In modern
ordinary music, the essential characteristic of the rhythmic structure of
measures and beats is that they are regular. So why do measures and beats
arise regularly? Do the sounds of music themselves make their positions
regular? Of course, this is impossible. If it were a physical substance, for
example an ice crystal, we could say that it is the physicochemical properties
of the water molecules that cause them to line up regularly, and we could
understand how they do so by studying their behavior in detail. But in
music, no matter how much we study the physical properties of sound, we
will never know the reason for the regularity, because this regularity is given
to music by human activity.

In fact, the sounds of music are placed in specific positions by composers
and performers. Why do they place them in a regular way? Because that is
what the listener wants. Of course, in this case, the composer or performer is
a member of the audience, and the piece is written based on the expectation
that the audience will want the same arrangement. The reason why the
listener demands regularity is that it is easier to understand the music if it is
regular. The reason it is easier to understand is because it is predictable.
Prediction means knowing in advance the "shape" that will come next. For
example, let's consider stairs. If the height and depth of the steps of a
staircase are different from one to another, we cannot go up and down the
stairs quickly and rhythmically, and we need to measure each step. We can
move up and down the stairs without difficulty because they are regular. We
understand music by applying this kind of "shape" to it. If the music is
regular, we know in advance what the next "shape" will be, which makes it
easier to understand. And the steps of the stairs did not become regular by
themselves. They were made regular by human designers and builders in
response to users' demands.

We can make an important hypothesis here. It is that we listen to music
by applying to it some cognitive framework that we have in our minds. The
regularity of music is the result of the interaction between the listener and
the composer or performer, based on this cognitive ability. This is exactly the
same as the regularity of a staircase.


The parts of music that tend most strongly toward regularity would be
the parts that have the most to do with this cognitive framework. Therefore,
it seems that metrical structure best reflects the nature of this framework. In
fact, metrical structure is the very essence of this cognitive framework. Let's
think about this further.

1.2 Cognitive Framework

1.2.1 Schema
The framework inferred in the previous discussion is very similar to
structures called frames or schemas that have been proposed in cognitive
sciences such as psychology and artificial intelligence. A frame or schema is a
hypothetical data structure that humans are assumed to use when
processing information from the outside world. Marvin Minsky describes a
frame as "a data-structure for representing a stereotyped situation, like being
in a certain kind of living room, or going to a child's birthday party." Similar
concepts are used in various fields, but there is no unified theory and the
terminology is inconsistent. Minsky, who had a great impact on artificial
intelligence research, used the term frame, so that term is common
in the field of artificial intelligence. The term script has also been used,
especially when discussing events along a time axis. In psychology, on the
other hand, the term schema is generally used, derived from Greek philosophy
and from Kant's terminology. In linguistics, the term frame has been
popularized by Charles J. Fillmore. Structures similar to frames and schemas
are also used in programming, where the template for creating objects is called
a class or type. Eleanor Rosch discusses a similar problem using the term
category.

In this study, we consider that the various domains mentioned above each
discuss, for their own purposes, a common general structure, and we will refer
to this common structure collectively as a schema. It should be noted here
that the objects to be explained by the concept of schema vary from relatively
simple ones such as "triangles," "the number five," and "dogs," as mentioned
by Kant, to vast and complex ones such as the structure of narratives, as
mentioned by Frederic Bartlett. However, in this discussion, there is no need
to consider complex subjects at all. The use of schemas in this study is very
similar to the case of triangles or the number five, and it is enough to imagine
a structure like the scale of a ruler. We will now go on to discuss the definition
of schema by David E. Rumelhart and Andrew Ortony, but please note that
they were trying to deal with much more complex objects than this study
deals with.

1.2.2 Schema Features


According to Rumelhart and Ortony (1977), a schema has four main
characteristics:

(1) schemata have variables;
(2) schemata can embed one within the other;
(3) schemata represent generic concepts which, taken all together, vary in
their levels of abstraction; and
(4) schemata represent knowledge, rather than definitions.

(Rumelhart and Ortony, "The Representation of Knowledge in Memory," 1977,
p.101).

Here, variables are data structures with various levels of abstraction. Let us
explain these features using as an example a schema that is assumed to be
used for human face recognition. In the face schema, we can assume that there
are places to store data for variables such as eyes and a nose. Each variable is
not just data but is itself an eye schema or a nose schema, and the eye schema
contains smaller, more specific schemas that describe the eyelashes and pupil.
If we were to use tools such as microscopes, it would not be impossible to
subdivide these into smaller and smaller schemas, but that would no longer be
data that we use to understand human faces. On the other hand, we can say
that the face schema is a part of a larger schema of the human body.

In this way, the hierarchy of schemas extends upward and downward,
with the level that humans normally use as the approximate center. The
larger the level, the more abstract it will be, and the finer the level, the more
concrete it will be. It is thought that this structure allows us to combine not
only the abstract features of a face, but also individual features into a data
structure that can be processed.

A face and its eyes or nose are in a relationship of whole and part. In
this case, the schema of the face and the schema of the eyes or nose have
similar functions, but they are completely different schemas in terms of content.
This kind of relationship can be called a "whole and part" (or "part-of"/"has-a")
relationship. In addition, there is also the "is a kind of" (or "is-a")
relationship, such as the relationship between the schema of an automobile
and the schema of a truck. In the latter case, we can think
of the more specific schema of a truck as being created by adding, in a nested
manner, sub-schemas that express a truck to the vaguer and more general
schema of an automobile. In such a case, the schema of a truck is said to be a
descendant schema of the schema of an automobile and inherits the
characteristics of the schema of an automobile.
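
The two relationships described above can be sketched in code. The following Python fragment is only an illustration under our own assumptions: the class names Schema, Automobile and Truck, the slot names, and the depth method are hypothetical and do not come from Rumelhart and Ortony. Nested slots stand for the part-of relationship, and class inheritance stands for the is-a relationship.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """A generic schema: a name plus named variables (slots)."""
    name: str
    # Part-of ("has-a") relation: each slot may hold a sub-schema.
    parts: dict = field(default_factory=dict)

    def depth(self) -> int:
        """Number of nested part-of levels, counting this schema itself."""
        nested = [p.depth() for p in self.parts.values() if isinstance(p, Schema)]
        return 1 + max(nested, default=0)


# Part-of: body > face > eye > pupil / eyelashes.
eye = Schema("eye", {"pupil": Schema("pupil"), "eyelashes": Schema("eyelashes")})
face = Schema("face", {"eyes": eye, "nose": Schema("nose")})
body = Schema("body", {"face": face})


# Is-a: a truck is a more specific kind of automobile.
@dataclass
class Automobile:
    wheels: int = 4

    def describe(self) -> str:
        return f"vehicle with {self.wheels} wheels"


@dataclass
class Truck(Automobile):
    cargo_bed: bool = True  # content added by the descendant schema

    def describe(self) -> str:
        # Reuse the parent's description, then add the truck-specific part.
        return super().describe() + " and a cargo bed"
```

In this sketch, body.depth() counts four nested levels (body, face, eye, pupil), and Truck().describe() reuses the automobile's description before adding its own detail, which mirrors the idea that a descendant schema inherits from its parent and adds content to it.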

A schema can be a format for reading data as well as a template for
creating something, as is typically the case in programming. For example,
when drawing a human face, one can often get by with the minimum
description necessary to evoke a schema. The schema of a face allows for a
rough representation and understanding of the face. In programming, we use
the inheritance mechanism mentioned earlier to create child schemas that
inherit the characteristics of the parent schema but add content to it, in
order to save and organize information.

1.2.3 Metrical structure of music is a schema for human understanding of music
We have looked at the properties of schemas from as general a point of
view as possible. The metrical structure of music has many of the
characteristics of such schemas. The characteristic that schemas have variables
corresponds to the fact that metrical structure has musical events in each of
its parts. For example, let us consider a measure that contains a single note
one measure long. If we consider the measure as a schema, it is a schema with
one variable, which, if repeated, becomes beats. This event may have additional
information such as timbre and pitch, but the first thing that matters is its
position. Let us split this note into two equal notes. This is the same as a
schema of a measure composed of two sub-schemas, and is nothing but two
beats. Let us split these two notes further. If we consider two patterns of
division, into two and into three, what we get is quadruple meter and
sextuple meter. Each of these meters is a child schema of duple meter and
inherits the characteristics of the parent schema. In other words, the rough
understanding of quadruple meter is duple meter.
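
The successive divisions described above can be written out as a small sketch. It assumes, for illustration only, that a meter schema is represented by the list of onset positions of its slots, expressed as fractions of one measure; the function names subdivide and is_refinement are hypothetical.

```python
from fractions import Fraction


def subdivide(positions, span, parts):
    """Split every slot of a measure schema into `parts` equal sub-slots.

    positions: onset positions of the current slots, as fractions of a measure
    span: length of each current slot, as a fraction of a measure
    Returns the new positions and the new slot length.
    """
    new_span = span / parts
    new_positions = [p + k * new_span for p in positions for k in range(parts)]
    return new_positions, new_span


def is_refinement(child, parent):
    """A child schema keeps every position of its parent schema."""
    return set(parent) <= set(child)


# One variable: a single measure-long note starting at position 0.
whole, whole_span = [Fraction(0)], Fraction(1)

# Two sub-schemas: duple meter.
duple, duple_span = subdivide(whole, whole_span, 2)

# Child schemas of duple meter: divide each beat into two or into three.
quadruple, _ = subdivide(duple, duple_span, 2)
sextuple, _ = subdivide(duple, duple_span, 3)
```

Under this representation, both quadruple and sextuple contain every position of duple, so is_refinement returns True for each of them against duple, while sextuple is not a refinement of quadruple. This is one way to picture the statement that the rough understanding of quadruple meter is duple meter.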

The level of beat or meter is often set near the level at which humans
perceive the metrical structure most clearly. This is consistent with the idea
of the basic level in Rosch's category theory.

From the above, we can conclude that metrical structure of music is the
human schema itself, or the reflection of that schema in various works. In
other words, the discussion of meter is nothing but a discussion of human
schemas for music cognition.

Chapter 2 Events and Time



How does the schema introduced in Chapter 1 capture musical events? It
is based on the understanding of the temporal position of events. Thus, we
can think of meter as being similar to the understanding of a kind of figure
that is composed of the positions of events.

2.1 Starting Position and Accent of Event

2.1.1 Starting Position of Event


When you play to the sound of a metronome, you try to match the
starting position of the notes you produce with the timing indicated by the
metronome. This is true whether the note you play is legato or staccato. This
means that our cognitive framework perceives a musical event according to
the temporal position of its beginning. This position, however, is not
merely the point where the physical stimulus begins. It is the
point from which we regard the event as having started. For example, consider
an event such as a very quiet sound that gradually increases in volume
and then slowly decays again, as shown in (a) in Fig. 2-1.1 below. Physical
measurements may be made to determine where the event starts, but we
would not be able to treat this as an event that a human could
synchronize with. This is similar to the detection of "edges" in
images. When capturing color, we cannot determine where the color changes
in a gradual gradient. In order to detect such an edge in sound, we need a
volume change like (b) in Fig. 2-1.1.
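
The contrast between the two kinds of curve can be illustrated with a minimal sketch. It assumes, purely for illustration, that a sound is given as a list of volume samples and that a starting point can be found wherever the volume rises by at least a fixed amount between adjacent samples. The function name find_onsets, the threshold, and the sample values are hypothetical and are not a model of human hearing.

```python
def find_onsets(envelope, min_rise):
    """Return indices where the level rises by at least `min_rise` in one step."""
    return [
        i
        for i in range(1, len(envelope))
        if envelope[i] - envelope[i - 1] >= min_rise
    ]


# (a) A sound that swells and decays gradually: no single step is steep enough.
gradual = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]

# (b) A sound with a sharp attack followed by a slow decay.
sharp = [0.0, 0.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

print(find_onsets(gradual, 0.5))  # no edge is found in the gradual curve
print(find_onsets(sharp, 0.5))    # one edge is found, at the attack
```

With a threshold of 0.5, the gradual curve yields no starting point at all, while the sharp curve yields exactly one, at the sample where the attack occurs. A physical measurement can still locate where curve (a) leaves zero, but that location is not an edge in the sense used here.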


Fig. 2-1.1

Most of the musical events we usually deal with should have a curve like
(b). With such a sharp increase in volume, people can figure out where the
start is. Therefore, from our standpoint, we cannot use the IOI (inter-onset
interval), which is commonly used in discussing rhythmic phenomena,
without the condition that "humans perceive it".

On the other hand, we do not clearly perceive or give much importance to
the end position of an event. For example, the difference between a legato
and a staccato is a difference in the duration of the event or the position of
the end of the event as a sound stimulus. But this difference has no effect on
our rhythmic sense of the timing of the event, and is interpreted almost as a
kind of difference in timbre, that is, as a difference in the attributes of the
event.

For the listener, the clear indication of where an event ends comes when the
sound suddenly disappears, and this should be understood as sensing the
beginning of a new event that is actually silence.

2.1.2 Accent
An accent is an attribute attached to an event; as Carl Schachter puts it,
the accent "colors the event".2 An event is a group of stimuli in time, and an accent

2 The metrical accent, therefore, always colors the event—tone, harmony, occasionally even silence—that falls on the
favored point. (Schachter, Unfoldings: Essays in Schenkerian Theory and Analysis, 1999, p.82)


is a kind of "value" attached to the event. The term "accentuate" is in common
use, but one should be careful with it. An accent is
not an event or an object, so the word "accentuate" should not be
understood as attaching something to a note. It means changing some value
of an event.

Where in the event is the accent? Even if an event has constant volume
throughout its duration, the accent is not understood to lie on the entire
duration of the event. The position of the accent should coincide with the
start of the event. In other words, both the event and its attribute, the
accent, are understood according to the position of the beginning. This
means, conversely, that all events that have a definite starting point also have
a definite accent of some strength. In this case, the accent means exactly the
same thing as the starting point of the event.

There is another aspect of accent, namely the difference in "intensity".
When we say "accentuate," we usually mean to increase the intensity of the
sound. Of course, as already mentioned, the accent in this case is felt to be
at the beginning of the intensified sound.

Events that would have been identical without the difference in intensity
will be divided into different types, by differentiating the "strength" of
accents. If we use accents of two different intensities, the events will be
divided into two kinds, while events with the same strength of accent are
"identical events". This has a similar effect to differences in timbre, etc.
Giving a difference in accent is almost the same as making a difference in the
type of event.
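
The idea that a difference in accent strength sorts events into types can be sketched as follows. The Event class, its field names, and the integer strength values are assumptions made only for this illustration; the text does not prescribe any particular representation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    onset: float   # starting position of the event, in beats
    strength: int  # accent value attached to that starting position


def group_by_accent(events):
    """Group event onsets by accent strength; equal strength means same type."""
    kinds = defaultdict(list)
    for event in events:
        kinds[event.strength].append(event.onset)
    return dict(kinds)


# Four otherwise identical events, given accents of two different strengths.
events = [Event(0, 2), Event(1, 1), Event(2, 2), Event(3, 1)]
print(group_by_accent(events))
```

With two accent strengths, the four events fall into two groups, one containing the onsets 0 and 2 and the other the onsets 1 and 3. If all four had the same strength, they would form a single group of "identical events".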

But we need to be more careful here. If we understand accents as
described above, then the adjective "unaccented" becomes meaningless. An
unaccented event cannot be considered silent. If an event with a starting
point always has an accent of some strength, then the state usually referred
to as "having an accent" or "strong accent" means that its accent is stronger
or more prominent than the accents of other events. Conversely, "no accent"
means that the accent of the event is weaker or less prominent than the
accents of other events.

On the other hand, if we use accents as a proxy for human attention, then
the phrase "no accent" may be valid. However, this is an idea that requires
caution, as it may lead to the extremely perverse idea that human attention
exists in the stimulus. In fact, it is humans who perceive the accent, and only
the possibility of humans perceiving the accent exists in the stimulus. It is
human beings who interpret accents as being in a certain position, and it is
human beings who recognize those differences as meaningful differences and
group them or pay attention to them. All of this suggests that the problem of
accent is much the same as the problem of the difference between the
starting point of an event and the type of event as interpreted by humans.
The essence of an accent is that the starting point is felt, or that the starting
point is more prominent than others.

However, as we have already pointed out, if the concept of accent is not
used carefully, it can be treated as if it were a substance, as if it were real in
the stimulus. It is only a difference in "value," but human identification of the
difference and understanding of its location creates the illusion that the
accent as object is placed in a specific place in the stimulus. So the object can
disappear when human attention is no longer focused on it, or conversely, it
can have a finer grade that is far beyond our ability to distinguish (remember
the six levels of distinction in Fig. 1-1.3).

This is the common error of regarding human responses as pre-existing in
the stimuli that caused them. The idea that sadness and joy are already
contained in music has existed for a long time. Accents and meter are
likewise dragged out of the human mind and taken as if they were objectively
complete in stimuli and merely perceived by humans.


So far, we've discussed the concept of accent and the caveats and
confusion associated with it. However, these misconceptions about accents
have actually been put to good use as a convenience. The concept of accent is
a device to materialize human responses outside of human beings. Using
accents, instead of saying that humans sense the beginning of an event, we
can say that the event has an accent, and when humans pay attention to one
of these accents, we can say that the event has a strong accent. And when
events are categorized into two types according to the strength of the sound,
we can say that "strong accents and weak accents are alternating." In this case,
the accent is not the objective strength of the stimulus, but a concept that
takes into account the human response to the stimulus and then reassigns it
to the stimulus as if it really exists outside.

2.2 Formation of meter

2.2.1 Accents and meter


The mere presence of an accent in itself, or the mere perception of an
accent, however, does not give rise to meter, because each accent is only an
isolated thing that exists at a single point. This situation does not change at
all if you gather several accents regularly or make one of them stronger. That
is not enough to create meter. The reason is that isolated accents cannot be
connected or related to each other just because they are lined up in a regular
manner.

What creates meter between those accents? For this, human
involvement is necessary. In other words, it is the human's schema that
creates meter. Humans make a set of accents duple meter by considering them
to be duple meter, for example. Of course, this is only possible if those
accents are acceptable to the duple meter schema. If they are unacceptable to
the duple meter schema, then they are either accepted by another schema or
they are incomprehensible. If the composer or performer wants the piece to
be understood as duple, he or she will place the notes in such a way that
they will be acceptable to the duple meter schema. However, no matter how
acceptable the arrangement of sounds is, sounds do not become meter
without human involvement. Differences in the stimuli themselves, such as
the placement of the sounds, can only have the possibility of being
interpreted in certain ways by the listener.

Earlier in this paper, we argued that accent is equivalent to the point of
event onset and the difference in event types. Therefore, the schema of meter
can be thought of as a way to recognize the positional relationship of events.
The recognition of differences in accents or types of events provides
information for identifying various types of this positional relationship. This
paper considers the schema of meter as a structure for understanding a kind
of "shape" created by the positional relations of events.

The word "schema" is derived from the Greek word σχῆμα (skhêma),
which means "shape". Aristoxenos discusses the relationship between
the sounds of music, syllables of language, or steps of dance and the time
divisions created by such temporal events, comparing it to the relationship
between "things that can take shape" and "shape".3 In his theory, rhythm is a
particularly desirable shape among such shapes. The species of rhythm
appear in the form of the foot.4 Feet, as the species of rhythm, were based on
the length of time, and dominated the idea of meter in music at least until
the 18th century. A major change in this way of thinking was the
understanding of meter by accent rather than by shape.

3 Cf. Pearson, L., Aristoxenus Elementa Rhythmica, 1990, pp.3–9: We must recognize rhythm and the rhythmizable medium
(rhythmizomenon) as separate notions and separate natures, related to one another in the same kind of way as shape and
shapable material in relation to it. [...] The shape (schema) is a particular arrangement of the parts of the object. [...]
rhythm occurs when the division of chronoi takes on some particular arrangement, because not every arrangement of
chronoi is rhythmical. [...] There are three kinds of rhythmizomena, speech, melody, bodily movement.

4 Cf. Pearson, L., Aristoxenus Elementa Rhythmica, 1990, p.11: The means by which we mark rhythm and make it
perceptible to the senses is a foot, one or more than one.

2.2.2 Misconceptions about the relationship between accents and meter
Although differences in accent are not themselves meter, it is true that such
differences can encourage the listener to interpret them as a particular metrical
structure. In this sense, accents and their differences are a kind of sign.
However, this has led many theorists to equate the accent in the stimulus
itself with the existence of meter. As we have already mentioned, the
stimulus itself, which is outside of us, is only a possibility of meter, but the
stimulus prompts us to interpret it as meter, which misleads us into thinking
that the stimulus itself is meter.

We must reject the view that the presence of accents in a stimulus outside the
human being directly amounts to the presence of meter. The difference in
accents is just one piece of information that the listener can refer to when
applying the schema. The various differences in stimuli present in a piece of
music are various forms of the possibility of metrical interpretation. The meter
of a piece of music is not something that exists objectively. This study is not
trying to investigate the objective meter of a piece of music, but rather the
possibilities of interpretation by the listener. The object of this study is the
contents of the listener's toolbox, that is, what kinds of interpretation the
listener can make of various objects.

2.2.3 Yeston's fallacy of formation of meter


Maury Yeston is a theorist who has had a great impact on modern American
music theory, and his theory of meter is the basis for later theories such as
GTTM, as well as Harald Krebs' discussion of syncopation and
hemiola. However, there is a major flaw in his concept of meter that needs to
be mentioned here.

In his theory, he claims that meter is created by the interaction between
multiple beat sequences.5 At first glance, a 2/4 meter, for example, seems to be
composed of a regular sequence of quarter notes and one of half notes, as
shown in Fig. 2-2.1 below. A sequence of quarter notes alone would not be a
meter because it does not contain any element that produces a grouping of
beats, and a sequence of half notes alone would not be a meter either, because
it has no trigger for division. In this way, it would seem that meter arises
from the interaction of two "strata" (layers). However, there is a fundamental
error in this way of thinking. It assumes that the initial state is one in which
the quarter-note layer and the half-note layer exist independently of each
other. But if that were true, they would be unrelated to each other, and
therefore the timing of the two layers would never match. There is no other
way for the two layers to be in such a relationship than for one to have been
created from the other.
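
The point about the two layers can be made concrete with a small sketch. It assumes, for illustration only, that a layer is a list of pulse times measured in quarter notes; the function name derive_slower_layer and the numbers chosen for the independent layer are hypothetical.

```python
def derive_slower_layer(pulses, group_size):
    """Create the slower layer by keeping every `group_size`-th pulse."""
    return pulses[::group_size]


# The faster layer: quarter-note pulses.
quarters = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# A half-note layer created from the quarter-note layer.
halves = derive_slower_layer(quarters, 2)

# A layer given its own starting point and period, with no reference to the
# quarter notes. Nothing forces its pulses to coincide with theirs.
independent = [0.3 + 2.1 * k for k in range(3)]

aligned = all(t in quarters for t in halves)
coincidences = [t for t in independent if t in quarters]
```

Because the derived layer is built from the faster one, every one of its pulses coincides with a quarter-note pulse by construction. The independent layer in this example shares no pulse with the quarter notes, which illustrates why two layers that merely exist side by side cannot be expected to line up.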

Moreover, it is impossible for pulses to exist from the beginning. This is
because pulses are struck one by one from a starting point, once some
period has been determined.

5 Yeston clearly regards meter as existing in its complete form in musical stimuli, without human involvement (The
Stratification of Musical Rhythm, 1976, p.65ff.): In order to create some regular grouping of elements within a simple
pulse, there must be some event occurring at regular intervals within it. [...] The fundamental logical requirement for
meter is therefore that there be a constant rate within a constant rate—at least two rates of events of which one is faster
and another is slower. [...] In view of these two necessary rhythmic strata, the question must now be asked: On which
level does the meter appear—on level A or on level B?[...] There is apparently, then, no such thing as a level of meter or
a level on which meter may appear; but rather, meter is an outgrowth of the interaction of two levels—two differently-
rated strata, the faster of which provides the elements and the slower of which groups them.


Fig. 2-2.1

Yeston has mistakenly taken the resulting regularity as a starting
condition. This is like claiming that a person's trail of footprints is made
by combining a layer of right-foot prints with a layer of left-foot prints.
Such an idea may be usable as a technique of analysis, but it misrepresents
the essence of meter.

Chapter 3 Understanding by Schema

3.1 Rumelhart & Ortony's (1977) model of understanding

How does a listener understand an event using a schema? In this regard,
Rumelhart and Ortony give the following clear formulation.

We say that a schema “accounts for” a situation whenever that situation
can be interpreted as an instance of the concept the schema represents.
(Rumelhart & Ortony, 1977, p.111)

On having found a set of schemata which appears to give a sufficient
account of the information, the person is said to have “comprehended” the
situation. (Rumelhart & Ortony, 1977, p.112)

We illustrate this with a very simple STAIR schema. Let us consider only
stairs that go straight ahead. In this case, one type of schema is
distinguished from another only by the height and depth of each step. Any
further details have little to do with going up and down the stairs. In other
words, the STAIR schema can be expressed simply by its height and depth
parameters. When a pedestrian is able to walk up and down the stairs
smoothly, the pedestrian understands the stairs.

According to the formulation by Rumelhart and Ortony, to have a stair
explained is to have that stair interpreted as an instance of the STAIR
schema. The pedestrian has understood the stairs if he or she can find in
his or her mind a schema that explains them. The pedestrian can easily
prepare such a suitable schema simply by changing the parameters of depth
and height. It is important to note that a schema for any stair can be
obtained simply by changing the parameters. Pedestrians cannot hold a
separate schema for every staircase, so a mechanism that obtains
appropriate schemas with fewer procedures and less storage space is
necessary. This point will be discussed soon.
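
The idea that a schema for any stair is obtained by changing two parameters
can be sketched in Python. This is only an illustration under our own
assumptions: the names StairSchema and fit_schema, the use of centimetres,
and the 10% tolerance are hypothetical and are not part of Rumelhart and
Ortony's formulation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StairSchema:
    """Hypothetical STAIR schema with only the two parameters named in the text."""
    height: float  # rise of one step, in cm (assumed unit)
    depth: float   # tread depth of one step, in cm (assumed unit)

    def with_parameters(self, height, depth):
        """Return a new schema of the same kind with different parameters."""
        return replace(self, height=height, depth=depth)

    def accounts_for(self, steps, tolerance=0.1):
        """True if every observed (height, depth) pair lies within the
        relative tolerance of the schema's parameters (tolerance is assumed)."""
        return all(
            abs(h - self.height) <= tolerance * self.height
            and abs(d - self.depth) <= tolerance * self.depth
            for h, d in steps
        )


def fit_schema(prototype, steps):
    """Generate a schema for an observed staircase by setting the
    prototype's parameters to the mean height and depth of the steps."""
    n = len(steps)
    mean_height = sum(h for h, _ in steps) / n
    mean_depth = sum(d for _, d in steps) / n
    return prototype.with_parameters(mean_height, mean_depth)


prototype = StairSchema(height=18.0, depth=26.0)
steep_stairs = [(22.0, 20.0), (22.5, 20.5), (21.5, 19.5)]
schema = fit_schema(prototype, steep_stairs)

print(prototype.accounts_for(steep_stairs))  # False: the stored schema does not fit
print(schema.accounts_for(steep_stairs))     # True: the generated schema does
```

Only one prototype is stored; every other stair schema is derived from it on
demand, which is the saving in storage that the text describes.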

If we think of a FACE schema, interpreting a certain arrangement of visual
stimuli as an instance of the FACE schema is to understand that
arrangement as a face. This understanding is possible because the viewer
already possesses the schema for faces, and that schema is likely to have
properties that allow it to stretch and contract to accommodate the
diversity of actual faces. Although there must be a limit to this
stretching, normal human faces should fall within the range where the
schema can be easily obtained by such stretching. It is difficult to discuss
here how this limit is determined, but it is important that the schema has
such flexibility. For example, when we recognize a handwritten figure as a
triangle, the viewer ignores the distortion of the figure caused by the
handwriting and fits the arrangement of the visual stimulus into the
TRIANGLE schema. Although this can be a source of misunderstanding of the
object, it should have the effect of reducing the number of schema types
needed.

Now, let us apply this to the case of a musical object. Assume that there is
a musical object with a certain structure. When this can be interpreted as an
instance of a schema, we can say that this schema explains this musical
object. When a listener can find in his or her mental stock a schema that
explains this object, he or she has understood the object. Here, the listener
should not have a separate schema for each of the various musical shapes,
but should be able to generate a schema for each shape by some simple
mechanism, as described in the example of the stairs. If the schemas were
separate, the listener would need a very large number of them to understand
a wide variety of music, and it would not be easy to find the right one.

The schemas (or frames) envisioned by Rumelhart and Ortony or by Minsky
seem to be more fixed and conceptual. They think of a single schema as
always corresponding to a concept or event. In this study, however, we need
to think of a continuous variety of slightly different structures rather
than of just one typical form. Restaurants and children's birthday parties
are therefore probably too complicated as examples for discussing such
continuous schemas. We believe that metrical structure is a reasonably
simple subject for such a study.

3.2 Principles of Grouping

3.2.1 The cognitive role of grouping


GTTM (A Generative Theory of Tonal Music) by Lerdahl and Jackendoff
states that grouping is a fundamental process of human cognition:

The process of grouping is common to many areas of human cognition. If
confronted with a series of elements or a sequence of events, a person
spontaneously segments or “chunks” the elements or events into groups of
some kind. The ease or difficulty with which he performs this operation
depends on how well the intrinsic organization of the input matches his
internal, unconscious principles for constructing groupings.6

6 Lerdahl & Jackendoff, A Generative Theory of Tonal Music, 1983, p.13


We agree with GTTM in this respect. However, GTTM avoids discussing the
nature of grouping in music: it applies to music a general concept of
grouping that is based mainly on the nature of visual objects, instead of
considering grouping in music itself. Therefore, although its authors admit
that events are grouped in music, they do not ask how the grouping of
sounds occurs, and end up discussing how to construct rules for determining
the location of group boundaries from the various characteristics of
musical events. This is because GTTM ultimately prioritizes the
construction of a system that can be processed automatically by machines,
rather than aiming at a human understanding of music. In other words, GTTM
is essentially a study in artificial intelligence using computers, and
therefore, in the end, the groupings in GTTM do not even have to be the same
as the groupings judged by humans.

Since this research is aimed at a practical theory of music, it should take a
different path from GTTM. This paper does not aim to build a system of
rules that can be applied automatically by machines. What it aims to do is
to elucidate the mechanism by which humans create groups of sounds.

3.2.2 Principle of Grouping


Let us propose a principle here: sounds do not form groups by themselves,
but are grouped together by the listener's schema. Obviously, sound as a
physical phenomenon does not form groups by itself. The relationship of the
events to each other only changes the possibilities of human
interpretation. It is the listener's own cognition that creates the group.

This, together with the formulation by Rumelhart and Ortony, can be
expressed as follows:

A group arises when a listener fits events into his or her schema.
Therefore, a group can be regarded as an instance of a schema. In this case,
we can say that the schema "explains" the group. And if the listener finds
a set of schemas that seems to explain the group well enough, then the
listener can be said to "understand" the group. Conversely, a listener
cannot understand as a group any set of events for which he or she cannot
find an appropriate schema.

In other words, groups in music are based only on schemas. In this study, we
will call this the "principle of grouping". A schema unambiguously
determines a group as its instance. However, a musical group can be
understood through multiple schemas, depending on the interpretive
possibilities of the structure it stands for. This is usually because a
group, by itself, does not fully describe the structure represented by the
schema. The relationship resembles that between objects and shadows: an
object casts a definite shadow, but several different objects can be
inferred from the same shadow. Unlike a shadow, which has only a single
contour, the groups created by a schema include a nested hierarchical
structure.

3.3 Schema system based on inheritance relations

The principle of grouping reveals certain properties that are required of a
schema. We will discuss these by answering the questions below.

Question 1:
If groups are based on a complexified schema of meter, wouldn't there be
only complex meter in complex groups, so that the simple meter the listener
normally experiences in music would be lost? In music, however, simple
meters are clearly perceived in complex passages as well as in calm and
simple groups. Isn't this a contradiction?

Question 2:
If there is a schema that instantiates each shape of group, doesn't the
listener have to hold a huge number of schemas beforehand in order to
understand the various groups of music? Wouldn't that be a waste of the
brain's storage capacity?

Question 3:
No matter how many schemas we prepare, won't there always be some groups
that they cannot cover? If the range that can be covered is limited, how is
that range determined?

Question 4:
Even if we could have such a huge number of schemas beforehand, how could
we find the right one each time in a short time?

Our answer to question 1 is that the listener understands complex groups
of music as something that is roughly a simple meter. This is extremely
important. For example, if a piece of music is very fast but full of fine
detail, the listener's understanding often cannot keep up with the speed of
the music. In such a case, the listener would not be able to understand
most of the music if he or she had to understand every detail of it.
Therefore, the listener must be able to be satisfied with a rough,
moderately abbreviated understanding.

Therefore, it would be better to think of it as follows: a complex group
can be understood as a modification of a simple group, and a simple group
can in turn be understood as a modification of a still simpler meter. 12/8
meter can be understood as a modification of quadruple meter, and
quadruple meter as a modification of duple meter.

Here, schemas are considered to have almost the same properties as
"classes" in object-oriented programming. A class inherits its properties
from its parent super-class and has more detailed properties added to it. In
the same way, schemas for musical groups stand in a relationship of
property inheritance. Depending on the depth of understanding required, a
group may be understood through a general schema or through a detailed
one. However, the inheritance relation considered in this paper is limited
to relations between schemas for understanding the "shape" of music. We
therefore do not assume that the super-schema can be traced back to more
abstract concepts such as "music in general".
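
The analogy with classes can be made concrete with a small Python sketch.
It is only an illustration of the inheritance relation described above; the
class names, the divisions attribute, and the helper general_view are our
own hypothetical choices, not a definitive model of the listener's schemas.

```python
class MeasureSchema:
    """Root schema: a single undivided time span (a measure)."""
    divisions = ()  # successive division factors, outermost first

    @classmethod
    def beats(cls):
        """Number of smallest units produced by the accumulated divisions."""
        count = 1
        for factor in cls.divisions:
            count *= factor
        return count

    @classmethod
    def lineage(cls):
        """Schema names from the most general to the most detailed."""
        return [c.__name__ for c in reversed(cls.__mro__) if c is not object]


class DupleMeter(MeasureSchema):
    # Inherits everything from the measure and adds one division into two.
    divisions = MeasureSchema.divisions + (2,)


class QuadrupleMeter(DupleMeter):
    # A modification of duple meter: each half is divided into two again.
    divisions = DupleMeter.divisions + (2,)


class TwelveEightMeter(QuadrupleMeter):
    # A modification of quadruple meter: each beat is divided into three.
    divisions = QuadrupleMeter.divisions + (3,)


def general_view(schema_cls, levels_up):
    """Return the ancestor schema used for a rougher understanding."""
    chain = [c for c in schema_cls.__mro__ if c is not object]
    return chain[min(levels_up, len(chain) - 1)]


print(TwelveEightMeter.beats())                   # 12
print(TwelveEightMeter.lineage())
print(general_view(TwelveEightMeter, 1).__name__)  # QuadrupleMeter
```

A passage understood in detail as 12/8 is still an instance of quadruple and
of duple meter, so a listener who cannot follow every detail can fall back
on the more general schema without changing the interpretation.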

As for question 2, Rumelhart & Ortony mention a similar problem:

If comprehension is achieved by utilizing a schema or set of schemata to
account for the input, how is the absurd conclusion that there exists a
schema for every conceivable input to be avoided? (Rumelhart & Ortony,
1977, p.112).

Their answer is that understanding of the whole is obtained by combining
simple schemas. However, the example they cite of such a combination is
"a girl hears the voice of an ice cream vendor," a rather complex situation
that is not appropriate for discussion here. In this study, the matter can
be explained a little more simply by using the schemas for understanding
shapes, based on inheritance relations, that we mentioned in answering
question 1.

For example, if quadruple meter can be understood as a modification of
duple meter, then the listener can treat the quadruple-meter schema and
the duple-meter schema as one system in which the former is generated from
the latter. This system should further be able to generate a 12-beat meter
schema or, by changing the parameters, a sextuple meter schema. The
listener therefore does not need to have each schema prepared separately,
because schemas can be generated by such a system.

If we assume such a system, we can give satisfactory answers to questions
3 and 4. It can be assumed that the listener can easily understand any
object within the range where a schema can be easily generated by such a
system. Objects for which it is difficult to generate a schema, or objects
that cannot be grasped by a schema that can be easily generated, will
therefore be difficult, if not impossible, to understand. In this system,
the schemas do not exist separately, but are arranged in a tree-like series,
which should make it easier to select an appropriate schema.

We consider that such a system must be assumed in order for schema-based
understanding to function satisfactorily, and not only for understanding
music. Among such systems, the schema system for understanding music is
probably one of the simplest and most regular. One reason is that the
understanding of music can be treated almost like the understanding of a
geometric figure, which eliminates the need to deal with complex issues
such as "meaning". Moreover, although we have compared it to a geometric
figure, the time division of music is one-dimensional information and is
more like a number line than a figure. Music therefore does not require the
complex two-dimensional representation of positional relationships that
figures do. For this reason, the theory of musical meter is an accessible
starting point for a discussion of schema systems.

Chapter 4 Schema system of metrical structure

4.1 Basic metrical structures

4.1.1 The Three Basic Meta-Schemas


In Chapter 1, we pointed out that metrical structure is not something that
exists in the musical object, but is given to the musical object by the
listener who understands the music. The arrangement of a musical object
therefore offers only the possibility of being interpreted by the listener
as a metrical structure. Such an interpretation is based on the cognitive
schema of metrical structure, as will be described in this chapter.

In Chapter 2, we argued that the essence of metrical structure lies in the
temporal arrangement of events. Although accents are a convenient means of
indicating placement to people, the essence of a metrical structure is not
in the accent but in the placement indicated by it. Already in ancient
Greece, such an arrangement, or a particularly desirable one, was called a
shape (skhēma) or rhythmos. However, the shapes that the ancient Greeks
considered good are clearly different from those of today's music.
Therefore, in this chapter, we must clarify the good shapes in today's
music on the basis of our musical experience.

As already mentioned in Chapter 3, when a listener is confronted with a
certain stimulus, to understand that stimulus is to find a schema of which
that stimulus is an instance. In order to find such schemas easily within a
limited capacity, the schemas need to be related to each other by some
systematic mechanism through which they can be generated as needed. The
mechanism assumed here is a phylogenetic relationship based on inheritance
relations.

Starting from the schema of a measure, if you divide the measure into
two, you get duple meter, and if you divide it into three, you get triple meter.
Duple and triple meter are each a kind of measure, and inherit the properties
of a measure. The difference between the two and three divisions is just a
difference in parameters in the same mechanism. If we were to divide them
further into two or three, we would generate four, six, or nine beats. In this
way, we can think of a system of schemas that includes duple meter, triple
meter, or four, six, and nine beat meter. By tracing the phylogenetic
relationships of this system, the listener can select the schema that best fits
the subject. The schemas obtained by the basic procedure in a small number
of steps can be treated almost as if they were prepared from the beginning.
That is to say, rather than holding all the schemas, the listener has a
mechanism to generate them. In this way, the listener does not need to
prepare a large number of schemas in advance, and can avoid the problem of
overwhelming the memory capacity.
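
The generation of this family of schemas from a single measure can be
sketched as follows. The representation of a schema as a "division path"
(a tuple of 2s and 3s) and the function names are our own assumptions for
illustration.

```python
def generate_schemas(max_depth):
    """Map each division path (a tuple of 2s and 3s) to its beat count,
    starting from the undivided measure, represented by the empty path."""
    schemas = {(): 1}
    frontier = [()]
    for _ in range(max_depth):
        next_frontier = []
        for path in frontier:
            for factor in (2, 3):  # the only parameter that differs
                child = path + (factor,)
                schemas[child] = schemas[path] * factor
                next_frontier.append(child)
        frontier = next_frontier
    return schemas


def lineage(path):
    """Trace a schema back to the measure through its ancestors."""
    return [path[:i] for i in range(len(path) + 1)]


schemas = generate_schemas(2)
print(sorted(set(schemas.values())))  # [1, 2, 3, 4, 6, 9]
print(sorted(p for p, n in schemas.items() if n == 6))  # [(2, 3), (3, 2)]
print(lineage((2, 3)))  # [(), (2,), (2, 3)]
```

Two steps are enough to reach the four-, six-, and nine-beat structures
named in the text. Six beats are reached by two different paths, two groups
of three and three groups of two, which are different schemas even though
the number of beats is the same.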

Such a basic procedure is a meta-schema for generating schemas. To obtain
the metrical structures of music, the following three meta-schemas are
required.

1. Isochronism
2. Division
3. Simple ratio

Isochronism generates a schema for understanding sequences of equal
intervals of time. This schema is the most basic schema in music, but it only
allows us to understand a single sequence of beats.

So we need the second meta-schema, division. This actually comes from the
very nature of schemas, "(2) schemata can embed one within the other", as
described in 1.2.2. In this case, the embedded schema is always smaller
than the original schema, so that, together with the first meta-schema,
isochronism, division gives rise to a hierarchy of beats.

In order for the subschemas obtained by division to satisfy isochronism, it
seems at first glance that the division must be into equal parts. However,
it is not easy to know the length of half or one-third of the whole from the
mere property of isochronism. Isochronism alone gives us no information
about the relationship between different lengths. For this reason, we need
to assume that people have a meta-schema for ratio relations.

Looking back at our daily experience, we can roughly judge a doubling or
tripling of length, but it is extremely difficult to perceive a relationship
such as fivefold or sevenfold. In general, the simpler the ratio, the easier
it is to grasp.

Of course, this does not mean that the object has to have exactly this
ratio. The listener can understand the object as having this ratio roughly.
Distortions can be understood by comparison with something regular. In
other words, distortions can be understood by modifying the schema for
regular things. In this case, subtle deviations from the exact proportions
will probably be felt as differences in impression. We do not intend to deal
with such delicate issues here, but we believe that certain sensations in
music, such as "groove", are deeply related to the impressions associated
with such discrepancies.

4.1.2 Defining a set of time segment sequences using rewrite rules

The entire set of schemas created from these three meta-schemas can be
expressed by the rewriting rules shown in Fig. 4-1.1, following the way a
set of symbol sequences is defined in formal language theory. The vertical
bar on the right-hand side means that we can choose freely between division
into two and division into three. Note that these rewriting rules express
only the ratio of the division. The quarter note on the left side stands for
an arbitrary time segment, and the eighth note on the right side represents
the length obtained by dividing the left side into two or three equal parts.
Thus the rule is equivalent, for example, to the one in Fig. 4-1.2.
Traditionally, the starting point for this rewriting was the whole note, or
semibrevis.

Fig. 4-1.1

Fig. 4-1.2

By applying these rewriting rules at any position any number of times, all
the standard metrical structures are obtained. This defines the set of
sequences of symbols that represent time segments. We have thus defined a
grammar that generates all possible standard metrical structures. Of
course, this does not mean that all of the shapes produced by these
rewriting rules will be used with equal frequency. Clearly, it is easier for
the listener to understand regular, repeated shapes. The word "meter" in its
normal usage can be said to indicate relatively typical patterns of simple
measure division among the various possible shapes mentioned above.
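
The effect of applying the rules at any position can be sketched in Python.
In this illustration, which rests on our own assumptions, a time segment is
represented as a fraction of the whole and only the surface sequence of
segments is kept, not the nested hierarchy; the function names rewrite and
derive are hypothetical.

```python
from fractions import Fraction


def rewrite(sequence, position, parts):
    """Apply the rule once: replace the segment at `position`
    by 2 or 3 segments of equal length."""
    if parts not in (2, 3):
        raise ValueError("the rule only allows division into 2 or 3")
    segment = sequence[position]
    return sequence[:position] + (segment / parts,) * parts + sequence[position + 1:]


def derive(steps):
    """All segment sequences reachable from one undivided segment
    in at most `steps` applications of the rule."""
    start = (Fraction(1),)
    seen = {start}
    frontier = {start}
    for _ in range(steps):
        frontier = {
            rewrite(seq, i, parts)
            for seq in frontier
            for i in range(len(seq))
            for parts in (2, 3)
        } - seen
        seen |= frontier
    return seen


shapes = derive(2)
print(len(shapes))  # 13
for shape in sorted(shapes, key=lambda s: (len(s), s)):
    print(" + ".join(str(x) for x in shape))
```

Because the rule may be applied to only one of the segments, the set also
contains uneven surface patterns such as 1/2 + 1/4 + 1/4. Every sequence
still sums to the original segment, since each rewriting only divides.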


4.1.3 Traditional note system


The system of notes in traditional notation is designed to satisfy the
above three meta-schemas automatically, without the user being aware of it:
isochronism is achieved simply by using the same notes; a note can be
replaced by several notes of smaller value; and note types are defined by
simple ratio relations to each other. It is like arithmetic notation: by
simply remembering how to manipulate the symbols, one can handle complex
calculations without even being aware of what one is doing. The musician is
completely unaware that, in using such notes, he or she is being led to
choose only isochronous sequences of segments out of the various possible
length relationships, and to use only divisions according to simple
proportions. It would not be an exaggeration to say that the development of
the traditional system of musical notation was a matter of incorporating
the mechanisms of human cognition into a system of musical symbols. It
would have been difficult for the system of musical notes to become so
widespread if it had been contrary to the human cognitive system.

4.1.4 Hemiola
If we make both choices in the above rewriting rules at the same time,
instead of just one, we get the structure of hemiola. In other words, hemiola
is the juxtaposition of different divisions. However, one of them will
usually be the dominant division, and the subordinate division will give a
feeling like that of irregular syncopation.

Since the hierarchy of beats is based on division, the lower-level beat
frame is essentially a division of the upper-level beat frame. This gives
the sense that each frame begins to divide anew, rather than a sense of
superposed independent layers of pulses. It is also consistent with the fact
that this study considers the higher-level beat as a group. However, if we
feel two different divisions at the same time, as is the case with hemiola,
it is possible that a group of beats at a smaller level may be felt to
extend beyond the range of the larger beats.

4.2 Metrical structure with unequal divisions: quasi-meter

4.2.1 Quasi-meter based on unequal division


In the section on the meta-schema of simple ratios, we dealt only with
divisions that result in equal parts. However, if division is defined by
ratio, then the whole does not necessarily have to be divided into equal
parts. For example, the ratio of a dotted half note to a half note is 3:2; if
we include this relation among the simple ratios, we can divide the dotted
half note into a half note and the remaining quarter note.

Doesn't this contradict the isochronous schema? We have already seen in
Chapter 2 that an event is understood by the location of its starting point.
Therefore, the isochronous schema can be applied even if the second event
is not long enough: the quarter note after the half note can be understood
as a half note with its second half lost. This also means that the third
beat of a triple meter has the same value as the third beat of a quadruple
meter. It is for this reason that Baroque theorists considered the third
beat of a triple meter to be a strong beat.


Fig. 4-2.1 equal division (2+2) and unequal division (2+1)

Similarly, the ratio of a half note to a dotted quarter note is 4:3, and by
subtracting a dotted quarter note from a half note, we get a division of
(3+1). The result is nothing other than a dotted rhythm. This is the same as
6/8 stopped at the length of a half note (see Fig. 4-2.2).

Fig. 4-2.2 equal division (3+3) and unequal division (3+1)

Note that it is extremely difficult to find rhythms using a ratio of 5:4 in
musical practice. This suggests that ratios involving 5 are not part of the
normal human meta-schema. Of course, the same could be said about 7.
However, composite numbers such as 4, 6, 9 and 12 can be made by combining
2 and 3, and it is thought that the listener can easily understand meters
composed of these numbers. Therefore, we can assume that the listener has
an almost inborn ability to understand the relationships created by 2 and
3.

The (2+1) and (3+1) divisions have almost the same characteristics as
meter. In other words, each part of these divisions can be a basic unit of
melody or harmony, just like the beats in ordinary meter. Therefore, these
divisions can be called quasi-meters.

The two rewriting rules added by the (2+1) and (3+1) quasi-meters are
shown in Fig. 4-2.3. By adding these rules, the metrical structure can be
extended. Note that the (1+2) and (1+3) divisions cannot be treated here
because they involve syncopation (see Chapter 6 for syncopation).

Fig. 4-2.3
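
The two added rules, and the reason they remain compatible with
isochronism, can be sketched as follows. Durations are written as fractions
of a whole note; the rule tables and the helper names divide and onsets are
our own assumptions for illustration.

```python
from fractions import Fraction

# Proportions into which one segment may be rewritten.
EQUAL_RULES = {"2": (1, 1), "3": (1, 1, 1)}
QUASI_RULES = {"2+1": (2, 1), "3+1": (3, 1)}  # (1+2) and (1+3) are excluded


def divide(segment, proportions):
    """Split one segment according to the given proportions."""
    total = sum(proportions)
    return tuple(segment * Fraction(p, total) for p in proportions)


def onsets(segments):
    """Starting points of the segments, measured from the start of the whole."""
    position, result = Fraction(0), []
    for length in segments:
        result.append(position)
        position += length
    return result


dotted_half = Fraction(3, 4)
half = Fraction(1, 2)

print(divide(dotted_half, QUASI_RULES["2+1"]))  # half note + quarter note
print(divide(half, QUASI_RULES["3+1"]))         # dotted quarter + eighth

# The (2+1) division starts its parts where an equal (2+2) division would.
unequal = divide(dotted_half, QUASI_RULES["2+1"])
equal = divide(Fraction(1), EQUAL_RULES["2"])
print(onsets(unequal) == onsets(equal))  # True
```

The last comparison expresses the point made above: judged by starting
points alone, the quarter note in (2+1) occupies the position of a half note
whose second half has been lost.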

4.2.2 Division of (3+3+2) as quasi-meter


It is clear from Fig. 4-2.4 that the (2+1) and (3+1) divisions of quasi-
meters are based on exactly the same relation as hemiola.


Fig. 4-2.4

The (3+3+2) division is extremely frequent in popular music, and as is
clear from Fig. 4-2.5 below, it can also be explained by slightly extending
the hemiola relation. Understood in this way, the (3+3+2) division can be
regarded as a quasi-meter, like (2+1) and (3+1). Of course, the three parts
that make up the (3+3+2) division work in the same way as beats in
ordinary meter. Fig. 4-2.6 shows the (3+3+2) division as a rewriting rule.


Fig. 4-2.5

Fig. 4-2.6

However, there can be more than one way to understand the last 2 in
(3+3+2). If we simply think of the division as an extension of the
hemiola, then the last 2 is an incomplete 3. On the other hand, the division
could be interpreted as one derived by applying an additional duple
division to the 3 in the quasi-meter (3+1). In that case, the notes would
be the same (3+3+2), but the structure would be different from the case
where the sequence of 3s is broken off. This phenomenon is similar to a
sentence that consists of the same sequence of words but can be parsed in
two different grammatical ways. In music, cases in which multiple
interpretations are possible are not at all uncommon.
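
The two readings can be compared in a small Python sketch. It assumes a
frame of eight smallest units and represents each reading as a nested list;
the function names and this representation are hypothetical and serve only
to show that two different structures share one surface.

```python
FRAME = 8  # length of the whole group in smallest units (an assumption)


def truncated_threes(frame=FRAME, group=3):
    """Reading A: lay down 3s from the start; the frame cuts off the last one."""
    groups, remaining = [], frame
    while remaining > 0:
        groups.append(min(group, remaining))
        remaining -= group
    return groups


def quasi_then_duple(frame=FRAME):
    """Reading B: (3+1) quasi-meter first, then a duple division of the 3."""
    if frame % 8 != 0:
        raise ValueError("this sketch needs a frame divisible by 8")
    long_part, short_part = frame * 3 // 4, frame // 4
    return [[long_part // 2, long_part // 2], short_part]


def flatten(tree):
    """Surface sequence of a nested structure."""
    flat = []
    for node in tree:
        if isinstance(node, list):
            flat.extend(flatten(node))
        else:
            flat.append(node)
    return flat


reading_a = truncated_threes()
reading_b = quasi_then_duple()
print(reading_a)  # [3, 3, 2]
print(reading_b)  # [[3, 3], 2]
print(flatten(reading_a) == flatten(reading_b))  # True
print(reading_a == reading_b)                    # False
```

In reading A the final 2 is a third 3 that has been cut short, while in
reading B it is the short part of a (3+1) division whose long part has been
halved. The notes coincide, but the groupings do not.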

The (3+3+2) division becomes difficult to recognize when the notes are
subdivided. For example, in Scott Joplin's "The Entertainer" (Ex. 4-2.1), the
beats of the (3+3+2) division are subdivided. In such a case, it becomes
impossible to identify this quasi-meter from the ratio of the note lengths
appearing in the score alone. Typical patterns are extracted in Fig. 4-2.7.


Ex. 4-2.1 S. Joplin, The Entertainer (mm.1–10)

Fig. 4-2.7 The (3+3+2) division in "The Entertainer"

The (3+3+2) division is often confused with syncopation. Since
syncopation will be dealt with in Chapter 6, we will not go into it here,
but let us just say that syncopation is a shape that cannot be reached by
divisions of the kind described in this chapter. If we take away a certain
beat unit from a certain whole beginning at its starting position, we cannot
obtain a syncopated division such as (1+2) or (1+2+1); to create
syncopation, the division must begin at a delayed position. This
displacement will be the main theme of Chapter 6, and is the key to the
expansion of metrical structure.
