
A Study of Grammatical Inference Algorithms in Automatic Music Composition and Musical Style Recognition

Pedro P. Cruz-Alcázar
Enrique Vidal-Ruiz

Departamento de Sistemas Informáticos y Computación
Universidad Politécnica de Valencia
Camino de Vera s/n. 46071 Valencia (Spain)
e-mail: pcruz@iti.upv.es, evidal@iti.upv.es

Abstract

A study of the application of Grammatical Inference (GI) in the field of Music is presented. We have studied three GI algorithms which have previously been applied successfully in other fields. In this work, these algorithms have been used to learn a stochastic grammar for each of three different musical styles from examples of melodies. Then, each of the learned grammars was used to stochastically synthesize new melodies (Composition) or to classify test melodies (Style Recognition). The results, which were systematically evaluated according to musical quality criteria and classification error rate, are promising.

Keywords: Syntactic Pattern Recognition, Grammatical Inference, Automatic Music Composition, Musical Style Recognition, Automatic Learning.

Introduction

Grammatical Inference (GI) encompasses both the theory and the methods for learning syntactic models, called grammars or automata, from training data. It is a well-established discipline that dates back to the sixties and has many applications, such as Pattern Recognition, Natural Language Modelling in Speech Recognition, Genetic Code Processing, and so on. There are still many applications of GI to explore, one of them being Music: Composition, Analysis, Harmonizing melodies, Musical Style Recognition, etc.

Early work on computer-generated music goes back to the XVIII century, when certain games for composing music "without knowledge of music" were popular. This was done by throwing dice. Mozart created one to compose 8-bar-long waltzes using a table with different kinds of bars: first bar, second bar, etc., up to the eighth bar. The bar was chosen according to the result of throwing two dice.

Since the creation of the first computers, many experiments have been carried out on how to use them advantageously in music composition, obtaining results that range from the anecdotal to true masterpieces. Research in this field, along with electronic music, is one of the capital contributions of recent decades. Many procedures have been employed in Automatic Music Composition, but the use of GI does not seem to have been explored so far. These procedures can be classified as Deterministic (using some music composition methods or altering existing music), proceeding from Mathematics (randomness with dependent or independent events, fractals, 1/f noises, chaos and non-linear dynamic systems, cellular automata, etc.), or proceeding from other fields such as Artificial Intelligence (expert systems and neural networks) and Language Theory. This work belongs to the last line of research, although GI also has important aspects related to random procedures with dependent events.

The field of Automatic Musical Style Recognition does not seem to have been studied before. It deals with giving a certain "musical culture" to a computer so that it will be able to recognize the musical style to which supplied melodies belong. We have applied GI to this new problem, which is the most innovative characteristic of this work.

Grammatical Inference

In this section, we briefly review the field of GI and, later in the article, we concisely explain the algorithms used in our study. Grammatical Inference (GI) is a well-established discipline originating from the work of Gold on "Language Identification in the Limit" (Gold 67).
Perhaps the most traditional field of application of GI has been Syntactic Pattern Recognition, but there are many other potential applications.

In general, GI aims at finding the rules of a grammar G which describes an unknown language, by means of a positive sample set R+ ⊆ { α | α ∈ L(G) } and a negative sample set (of samples that should be rejected) R− ⊆ { β | β ∈ Σ* − L(G) }. A more limited framework, but one more frequent in practice, is the use of only positive samples. The main problem lies in finding a more "abstract" or general grammar G' such that R+ ⊂ L(G'), and such that L(G') − R+ (the extra-language) contains only strings with "similar features" to the strings from R+. A grammar G is said to be identified in the limit (Gold 67) if, for sufficiently large sample sets, L(G') = L(G). Grammars and Formal Languages can also be seen from a probabilistic viewpoint, in which different rules have different probabilities of being used in the derivation of strings. The corresponding extension of GI is called Stochastic Grammatical Inference, and this is how all the algorithms used in our study work.

The Error-Correcting Grammatical Inference (ECGI) Algorithm

ECGI is a GI heuristic that was explicitly designed to capture relevant regularities of concatenation and length exhibited by substructures of unidimensional patterns. It was proposed in (Rulot and Vidal 87) and relies on error-correcting parsing to build up a stochastic regular grammar through a single incremental pass over a positive training set. Let R+ be this training sequence of strings. Initially, a trivial grammar is built from the first string of R+. Then, for every new string that cannot be parsed with the current grammar, a standard error-correcting scheme is adopted to determine a string in the language of the current grammar that is error-correcting closest to the input string. This is achieved through a Viterbi-like error-correcting parsing procedure (Forney 73), which also yields the corresponding optimal sequence of both non-error and error rules used in the parse. From these results, the current grammar is updated by adding a number of rules (and non-terminals) that permit the new string, along with other adequate generalizations, to be accepted. Similarly, the parsing results are also used to update the frequency counts from which the probabilities of both non-error and error rules are estimated.

The k-TSI Algorithm

This algorithm belongs to a family of techniques which explicitly attempt to learn which symbols or substrings tend to follow others in the target language. It is a "characterizable" method that infers k-Testable Languages in the Strict Sense (k-TSSL) in the limit. A k-TSSL is usually defined by means of a four-tuple Zk = (Σ, I, F, T), where Σ is the alphabet, I is a set of initial substrings of length smaller than k, F is a set of final substrings of length smaller than k, and T is a set of forbidden substrings of length k. The language associated with Zk is defined by the following regular expression: L(Zk) = IΣ* ∩ Σ*F − Σ*TΣ*. In other words, L(Zk) consists of the strings that begin with substrings in I, end with substrings in F, and do not contain any substring in T. Stochastic k-TSSLs are obtained by associating probabilities with the rules of the grammars, and it has been demonstrated that they are equivalent to N-grams with N = k (Segarra 93). The inference of k-TSSLs was discussed in (García and Vidal 90), where the k-TSI algorithm was proposed. This algorithm consists in building the sets Σ, I, F, T by observation of the corresponding events in the training strings. From these sets, a finite-state automaton that recognizes the associated language is straightforwardly built.

The ALERGIA Algorithm

ALERGIA infers stochastic regular grammars in the limit by means of a state-merging technique based on probabilistic criteria (Carrasco and Oncina 94). It builds the prefix tree acceptor (PTA) from the sample and, at every node, estimates the probabilities of the transitions associated with the node. The PTA is the canonical automaton for the training data (R+) and recognizes only L(R+). Next, ALERGIA tries to merge pairs of nodes, following a well-defined order. Merging is performed if the tails of the states have the same probabilities within statistical uncertainties. The process ends when further merging is not possible. By merging states in the PTA, the size of the recognized language is increased, and so the training samples are generalized. In order to cope with common statistical fluctuations of experimental data, state equivalence is accepted within a confidence interval calculated from a number between 0 and 1 which is introduced by the user and which is called the accuracy (a). This parameter is responsible for the higher or lower generalization of the language recognized by the resulting automaton.

The Synthesis Algorithm

In this work, automata are inferred for different musical styles (languages). "Composing" melodies in a specified style amounts to synthesizing strings (melodies) from the automaton that represents the desired style. To achieve this, we use an algorithm that performs random string generation from a stochastic automaton.


It randomly follows the transitions of the automaton from the initial state to some final state, according to the probabilities associated with each transition. In order to listen to the synthesized strings, we developed a program to convert them into MIDI files (Rumsey 94), so they can be listened to with any MIDI file player.

Fig. 1: How to code the length of notes in implicit notation (whole note = r, half note = b, quarter note = n, eighth note = c, sixteenth note = s, thirty-second note = f, sixty-fourth note = u).

Fig. 2: How to code the "extra-length" field (dot = p, half dot = m, double dot = l, triplet = t).

The Analysis Algorithm

As each inferred automaton represents a musical style, "recognizing" styles resides in finding the automaton which best recognizes the test melody. To attain this, we used an algorithm that performed a stochastic Error-Correcting Syntactic Analysis through an extension of the Viterbi algorithm (Forney 73). It returned a value that represents the probability of the analyzed string (melody) being generated by the automaton. On analyzing the same melody with different automata, we classified it as belonging to the musical style (language) represented by the automaton which gave the greatest probability.
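Taken together, inference, synthesis and analysis form a small pipeline. The sketch below illustrates it in Python under simplifying assumptions: it stands in for the real algorithms with a stochastic bigram model (the paper notes that a stochastic k-TSSL with k = 2 is equivalent to a bigram model), synthesizes strings by randomly following transitions, and classifies a melody by the automaton that gives it the greatest probability. The function names and the dictionary-based automaton representation are ours, not the paper's.

```python
import math
import random
from collections import defaultdict

START, END = "<s>", "</s>"

def infer_bigram_automaton(samples):
    # Stand-in for stochastic k-TSI with k = 2: transition probabilities
    # are normalized bigram counts over the training strings.
    counts = defaultdict(lambda: defaultdict(int))
    for melody in samples:
        symbols = [START] + melody.split() + [END]
        for a, b in zip(symbols, symbols[1:]):
            counts[a][b] += 1
    return {state: {sym: n / sum(succ.values()) for sym, n in succ.items()}
            for state, succ in counts.items()}

def synthesize(automaton, rng=None):
    # Synthesis: follow transitions from the initial state to the final
    # state, choosing each step according to its probability.
    rng = rng or random.Random(0)
    state, out = START, []
    while True:
        successors = automaton[state]
        state = rng.choices(list(successors), weights=list(successors.values()))[0]
        if state == END:
            return " ".join(out)
        out.append(state)

def log_probability(automaton, melody):
    # Analysis: (log) probability of the automaton generating the string;
    # an unseen transition gives probability 0, i.e. -infinity.
    symbols = [START] + melody.split() + [END]
    logp = 0.0
    for a, b in zip(symbols, symbols[1:]):
        p = automaton.get(a, {}).get(b, 0.0)
        if p == 0.0:
            return float("-inf")
        logp += math.log(p)
    return logp

def classify(automata_by_style, melody):
    # Style recognition: the style whose automaton gives the melody
    # the greatest probability.
    return max(automata_by_style,
               key=lambda style: log_probability(automata_by_style[style], melody))
```

An automaton inferred from implicit-notation strings such as "22c 26c 24n" can then be asked for new strings with synthesize, while classify compares a test melody against the automata of several styles.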
How to code music. Implicit notation

Since GI algorithms work with symbol strings, it is necessary to code music in the same way. To this end, we have converted musical notation into a simple code that preserves the main attributes of music and is formed of symbols. It is called implicit notation because the note "length", or duration, is implicit in the coding, as in a musical score. As every melody is characterized by the pitch and length of the sounds that compose it, we have coded only these two attributes. Each note is coded as a symbol or word composed of three "fields": one to code the pitch and two (one of them optional) for the length. A space character is used between notes to separate them. An example of implicit notation of a score is shown in Fig. 3.

Fig. 3: A Gregorian style score and the corresponding implicit notation code:
22c 26c 24c 24c 24n 0c 24c 27c 24c 26c 27c 24c 22c 26c 22c 24c 26c 24n

The dot increases note length by half of its value, while the double dot increases it by three-fourths of its value. Tied notes were coded with dots, and it was necessary to introduce the half dot (which does not exist in music notation) to code notes tied with notes whose length is a quarter of their value, like a half note tied to an eighth note. A triplet means that the three notes that it contains are played within the value of two notes, and adornment notes are coded just as they are played.
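As a concrete illustration, a code line like the one in Fig. 3 can be decoded mechanically. The following Python sketch maps the length letters of Fig. 1 and the extra-length letters of Fig. 2 to numeric durations; taking the whole note as the unit (1.0) is our own convention, not something fixed by the paper.

```python
# Length codes from Fig. 1, as fractions of a whole note.
LENGTH = {"r": 1.0, "b": 1/2, "n": 1/4, "c": 1/8,
          "s": 1/16, "f": 1/32, "u": 1/64}
# Extra-length modifiers from Fig. 2.
EXTRA = {"p": 1.5,   # dot: adds half of the value
         "l": 1.75,  # double dot: adds three-fourths of the value
         "m": 1.25,  # half dot: adds a quarter of the value (for ties)
         "t": 2/3}   # triplet: three notes in the time of two

def decode(melody):
    """Turn an implicit-notation string into (pitch, duration) pairs.
    Pitch is the scale number (1-43 in the paper; 0 also appears in the
    Fig. 3 code, presumably a rest -- an assumption on our part)."""
    notes = []
    for token in melody.split():
        # Pitch digits, then a length letter, then an optional extra-length letter.
        i = 0
        while token[i].isdigit():
            i += 1
        pitch = int(token[:i])
        duration = LENGTH[token[i]]
        if i + 1 < len(token):
            duration *= EXTRA[token[i + 1]]
        notes.append((pitch, duration))
    return notes
```

For example, decode("22c 26c 24n") yields [(22, 0.125), (26, 0.125), (24, 0.25)], and a dotted quarter note such as "24np" yields a duration of 0.375.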
In order to code the pitch, a number was assigned to each note of the adopted scale, beginning with the deepest sound, corresponding to number 1, and ascending progressively halftone by halftone to the highest sound, corresponding to number 43. The note length was coded using one or two characters after the note number. The second character is optional and we call it "extra-length". Thus, a note in implicit notation is coded as follows:

Note number + length + extra-length.

The way of coding the length field can be seen in figure 1. The extra-length field contains a symbol that modifies the length given by the previous field, normally increasing it. We only needed the symbols described in figure 2 to code the training samples, but there are more.

Experiments

For the present study, we chose 3 occidental musical styles from different epochs and took 100 sample melodies from each one. These samples were 10 to 40 seconds long. The first style was Gregorian, from the Middle Ages. As a second style, we used passages from arias and instrumental solos from the sacred music of J. S. Bach, the most famous composer of the Baroque period. The third style consisted of passages from Scott Joplin ragtimes for piano, which belong to the beginning of the 20th century and are, along with Negro Spirituals, forerunners of Jazz. Since we are interested only in monodies (just one voice), for the second and third styles it was necessary to find and extract the melody from the whole set of notes that form the sample musical passage. For the Gregorian style, this was not necessary because it is a monody.

Automatic Composition experiments

We inferred three automata (one per style) with each GI algorithm, trying different values of k (with k-TSI) and a (with ALERGIA) with the objective of obtaining the best synthesis. As a preliminary evaluation, the following criteria were adopted:

1.- The synthesis must make musical sense and must not be a random series of sounds.
2.- It must sound similar to the musical style that it is meant to (i.e. Gregorian, Bach, Joplin).
3.- It must have the characteristics of a proper melody (beginning, development, end).
4.- It must be different from any of the training melodies.

The 3rd criterion specifically refers to the synthesis size, because a passage with only a few notes cannot be considered a melody; furthermore, it must have a beginning, a development and an end. In order to (subjectively) assess the quality of the syntheses, a secondary evaluation was carried out. This quality evaluation was made according to the following classification: very good, good, not very good, bad.

A synthesis was considered bad if it did not satisfy all four criteria. If it did satisfy them all, then it was classified according to the quality level of the composition as very good, good or not very good. We consider a synthesis very good when it can be taken as an original piece from the current style without being a copy or containing evident fragments from the samples. Good is a synthesis that has some small imperfection, such as an inadequate tonality change. The classifications of not very good and bad explain themselves. With this study, there were two objectives: the first one was to obtain melodies which satisfy the preliminary evaluation (all four criteria), and the second one was to see if GI algorithms were able to compose melodies of very good quality, no matter how many would have to be rejected.

Results

Detailed results from the experiments with the ECGI algorithm can be seen in Table 1. We reached the first objective with every style; that is, ECGI composed synthetic melodies in the desired style, which made musical sense and which were long enough to be considered as melodies. Every synthesis in the Gregorian style accomplished this objective, but more than half of the syntheses in the Bach and Joplin styles did not (these syntheses were qualified as "bad"). The second objective (very good syntheses) was reached in the Gregorian style, but not with Bach and Joplin, as can be seen in Table 1. Examples of ECGI compositions are shown in Fig. 4 and Fig. 5.

                GREGORIAN   BACH   JOPLIN
VERY GOOD          20%       0%      0%
GOOD               25%       5%      5%
NOT VERY GOOD      35%      20%     15%
BAD                20%      75%     80%

Table 1: Qualification (in percentage) of melodies composed by the ECGI algorithm.

Fig. 4: Example of ECGI composition in Gregorian style.

Fig. 5: Example of ECGI composition in Joplin's style.

With the k-TSI and ALERGIA algorithms, results were similar, although a little worse with ALERGIA. That is why we only present the results for the Gregorian style (Table 2). Detailed results for the remaining styles can be seen in (Cruz 96). In k-TSI, the best k values for Gregorian were k=3 and k=4, and the second objective was reached with them. With k<3 syntheses were too random, and with
k>4 they were composed of evident fragments from the samples, which grew in size with k. For Bach's style, the best k values were k=4 and k=5, and for Joplin's style k=4 was the best. With these styles, we reached the first objective, but not the second one, because the best syntheses were composed of evident fragments from the samples. In k-TSI, over 20% of the syntheses were rejected for being too short (these syntheses were qualified as "bad" in Table 2).

                k-TSI     k-TSI     ALERGIA
                (k = 3)   (k = 4)   (a = 0.01)
VERY GOOD        10%       13.3%      0%
GOOD             30%       20%        6.6%
NOT VERY GOOD    16.6%     23.3%     20%
BAD              43.4%     43.4%     73.4%

Table 2: Qualification (in percentage) of melodies composed by the k-TSI and ALERGIA algorithms in Gregorian style.

An example of a k-TSI composition can be seen in Fig. 6. In ALERGIA, the best a value was a=0.01, reaching the first objective, but only in the Gregorian style. However, the second objective was not attained in any style. ALERGIA syntheses were worse than those of the previous algorithms, so we considered not using it in future studies unless a larger number of samples were available.

Fig. 6: Example of k-TSI composition in Bach's style.

Automatic Musical Style Recognition experiments

The procedure is similar to Automatic Composition. The difference is that once the automata are obtained, no new melodies are synthesized; instead, test melodies are analyzed to see which automaton can generate them with the greatest probability. Then, the style they belong to can be determined.

A Cross-Validation was made with each GI algorithm for the style classification of test melodies. Then, we studied the success and failure ratios in this classification. In the cross-validation, for each experiment we took 90 sample and 10 test melodies in such a way that, at the end, every melody was used both as sample and as test. A Confusion Matrix was built with the results of each experiment. Its rows and columns denote the style of the test melodies and the style in which they were classified, respectively. Later, we obtained an Average Confusion Matrix that summarized the results of the cross-validation and whose diagonal represented the Average Success Rate in the recognition. We also obtained the Average Classifying Error in the cross-validation.

Results

Results for ECGI were good for Gregorian, but not for Bach and Joplin's styles, as can be seen in Table 3 (with the average success rate in the diagonal). It gave a high average classifying error of 35.9%.

             GREGORIAN   BACH   JOPLIN
GREGORIAN      100%       0%      0%
BACH            51%      40%      9%
JOPLIN          43%       5%     52%

Table 3: Confusion Matrix for the ECGI Cross-Validation.

With k-TSI, the results were better, and for reasons of brevity we only present the diagonals from the confusion matrices (Table 4) and the average classifying error (Table 5) for each value of k tested. Detailed results can be seen in (Cruz 96). We obtained the best results with k=3, which also gave good results in Automatic Composition. It is worth noting that these results are quite good considering the small number of samples we had.

             k=2    k=3    k=4    k=5
GREGORIAN   100%   100%   100%   100%
BACH         87%    85%    80%    78%
JOPLIN        0%    85%    64%    20%

Table 4: Average success rate classifying with k-TSI.

  k=2     k=3     k=4     k=5
 37.6%    9.9%   18.6%    34%

Table 5: Average classifying error with k-TSI.

The ALERGIA results were similar to those of k-TSI and can be seen in Table 6 and Table 7. They are presented in the same way as the k-TSI results. We tried different a values covering its entire range ([0,1]), and the results for some of
these values are shown in the above-mentioned tables. We stopped at a=0.008 because the canonical automaton was obtained with this value. The best results were achieved with a=0.01, as in Automatic Composition. In general, Scott Joplin's style was better recognized with ALERGIA than with k-TSI, and Bach's style was better recognized with k-TSI than with ALERGIA.

             a = 0.9   a = 0.6   a = 0.1   a = 0.01   a = 0.008
GREGORIAN     100%      100%      100%      100%        99%
BACH           63%       65%       78%       80%        66%
JOPLIN         72%       77%       76%       80%        70%

Table 6: Average success rate classifying with ALERGIA.

 a = 0.9   a = 0.6   a = 0.1   a = 0.01   a = 0.008
  22.3%     19.3%     15.3%     13.3%      21.3%

Table 7: Average classifying error with ALERGIA.

Conclusions and future trends

As a main conclusion from this study, we can say that GI algorithms are appropriate for Automatic Composition and Style Recognition, but some improvements have to be made. In Automatic Composition, we obtained good results in the Gregorian style with both the ECGI and the k-TSI algorithms. These results were not as good for Bach and Joplin's styles, but they are still promising. In Automatic Musical Style Recognition we achieved good results as well. With k-TSI, the average classifying error was 9.9%, and with ALERGIA, it was 13.3%. This is the first time that these kinds of experiments have been carried out.

In order to improve these results, the following lines of study are proposed: (1) Increasing the number of training samples, because GI algorithms tend to need a large quantity of training samples to get good results. (2) Solving the "tonality" problem: since the training samples have different tonalities, this increases the diversity of the samples and tends to "confuse" GI algorithms. This can be solved with different methods: transposing every sample to the same tonality, including information about their tonality in the samples, or changing the way of coding music to a "differential code" which codes the intervals between notes instead of the absolute pitch of the notes. (3) Increasing the sample length. Once these problems are solved, we could employ entire musical pieces and not just small fragments, as was done in this study. Future studies could also be expanded to include polyphony. So far, GI algorithms compose monodies, that is, melodies without accompaniment. By changing the GI algorithms in such a way that they could accept samples formed by several strings, it might be possible to deal with samples with more than one voice.

References

Carrasco, R.C.; Oncina, J. 1994. Learning Stochastic Regular Grammars by means of a State Merging Method. In Grammatical Inference and Applications, Carrasco, R.C.; Oncina, J., eds. Springer-Verlag (Lecture Notes in Artificial Intelligence 862).

Cruz, P.P. 1996. Estudio de diversos algoritmos de Inferencia Gramatical para el Reconocimiento Automático de Estilos Musicales y la Composición de melodías en dichos estilos. Proyecto Fin de Carrera. Facultad de Informática, Universidad Politécnica de Valencia.

Forney, G.D. 1973. The Viterbi Algorithm. Proc. IEEE, Vol. 61, No. 3, pp. 268-278.

García, P.; Vidal, E. 1990. Inference of k-Testable Languages in the Strict Sense and Application to Syntactic Pattern Recognition. IEEE Trans. on PAMI, Vol. 12, No. 9, pp. 920-925.

Gold, E.M. 1967. Language Identification in the Limit. Information and Control, Vol. 10, pp. 447-474.

Rulot, H.; Vidal, E. 1987. Modelling (sub)string-length based constraints through a Grammatical Inference method. NATO ASI Series, Vol. F30: Pattern Recognition Theory and Applications, pp. 451-459. Springer-Verlag.

Rumsey, F. 1994. MIDI Systems & Control. Focal Press.

Segarra, E. 1993. Una Aproximación Inductiva a la Comprensión del Discurso Continuo. Facultad de Informática, Universidad Politécnica de Valencia.
