
Yale University Department of Music

Entropy as a Measure of Style: The Influence of Sample Length

Author(s): Leon Knopoff and William Hutchinson
Source: Journal of Music Theory, Vol. 27, No. 1 (Spring, 1983), pp. 75-97
Published by: Duke University Press on behalf of the Yale University Department of Music
Stable URL: .
Accessed: 09/07/2014 22:20


Duke University Press and Yale University Department of Music are collaborating with JSTOR to digitize,
preserve and extend access to Journal of Music Theory.

This content downloaded from on Wed, 9 Jul 2014 22:20:01 PM

All use subject to JSTOR Terms and Conditions



We can imagine the act of musical composition as the selection of
elements from several musical parameters. For example, the composer
may choose more tonic than dominant harmonies, more quarter notes
than half notes, and create a preponderance of conjunct rather than
disjunct motions. These choices will bring about distributional
characteristics that may belong to a "style." Once made, these choices
are, at any rate, identifiable characteristics of the music itself.

Elements in musical parameters are not unlike characters in common
speech alphabets. Communicative structures of substantial size are the
end result of a complex series of choices that are selections from
alphabetic pools in the case of written literature and, in the case of
music, from the pools of elements in the several parameters that
together com-

The study of the selection and distribution of alphabetic characters
is the domain of information theory. More than twenty years ago,
Youngblood proposed that the computation of information content,
the entropy of information theory, could serve as a "method to identify

The entropy of information theory is a calculation of the freedom
with which available alphabetic materials are used. Stated conversely,
it is an assessment of the constraints placed on the selection of
materials in the process of forming a communication. Entropies are
often used in comparison with the maximum entropy the alphabet will
allow. This latter property is an evaluation of the maximum freedom
with which materials might be used; for music, the maximum entropy
implies a seeming anarchy of pitches, rhythms, and other musical
elements. The ratio of the entropy of a particular sample to the
maximum entropy is called the relative entropy; the redundancy of a
text is essentially the same thing, but viewed from the inverse
perspective in which a large redundancy implies a small relative
entropy and vice versa. The terms entropy and redundancy are part of
the ordinary language of information theory; they are defined in many
references, including Youngblood's "Style as Information."
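
The three quantities just named can be sketched in a few lines. This is
only an illustration: the toy melody is hypothetical, base-2 logarithms
are assumed (the entropies reported in this paper lie below
log2(12), about 3.585, which is consistent with that choice), and
redundancy is taken as one minus the relative entropy, a common
convention rather than a definition quoted from the text.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Shannon entropy, in bits, of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def relative_entropy(counts, alphabet_size):
    """Ratio of the sample entropy to the maximum entropy log2(alphabet_size)."""
    return entropy_bits(counts) / math.log2(alphabet_size)

# Hypothetical toy melody reduced to pitch classes (0 = C, ..., 11 = B).
melody = [0, 2, 4, 5, 7, 7, 9, 7, 5, 4, 2, 0, 0, 7, 4, 0]
counts = list(Counter(melody).values())

h = entropy_bits(counts)
h_max = math.log2(12)          # equal, random use of all 12 pitch classes
rel = relative_entropy(counts, 12)
redundancy = 1.0 - rel         # assumed convention: 1 - relative entropy
```

A melody confined to a few pitch classes gives a small `rel` and a large
`redundancy`; equal use of all twelve gives `rel` equal to 1.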

Youngblood's early foray into possible utilizations of information
theory for the analysis of music showed that excerpts from Schumann's
Frauenliebe und Leben had slightly greater entropy than arias from
Mendelssohn's St. Paul. The interpretation of this quantitative
difference in entropies was that, viewed as a whole, the available
pitch vocabulary was used in a somewhat less constrained way by
Schumann than by Mendelssohn. And the examples from both Schumann and
Mendelssohn were found to have smaller entropies than selections from
Schubert's Die schöne Müllerin. Quantitative estimates of entropy were
taken to be descriptive of musical style, and these entropies were
also used for comparative purposes; that is, specific assessments of
entropy were related, not only to a maximum potential entropy, but
also to assessments of entropy from divergent styles. It is the second
of these relationships that needs further scrutiny.

Youngblood's calculation of entropy for the melodic lines of eight
Schubert songs in major keys was 3.127; for melodies from Schumann
and Mendelssohn, the entropies were 3.05 and 3.03, respectively. Are
the values of entropy for the Schubert examples significantly
different from those for Schumann? Do they represent genuine stylistic
differences between the two? Or does the closeness between the
entropies for the Schumann and Mendelssohn examples imply that these
musical styles should be thought of as indistinguishable, at least on
the basis of the usage of individual pitches? In short, for the
purposes of musical analysis and musical comparison, are these
entropies macroscopically or microscopically dissimilar?

Indeed, are they dissimilar at all? That is, do non-identical entropies
reflect real stylistic differences or are the differing entropies
merely statistical fluctuations? Would a choice of other songs by the
above composers have yielded entropies that would be the same as those
above, or would a selection of other songs have yielded entropies that
might reduce the numerical differences among the entropies or even
change the ordering Schubert-Schumann-Mendelssohn given by the
entropies above? How precise or how imprecise is an entropy value?
Can selected samples of music be used to distinguish one style from
another if these distinctions are based solely upon an
information-theoretic analysis? Does information theory indicate real
stylistic variance, or does it, on occasion, merely reflect the
presence of special combinations of the alphabetic ingredients that
happen to be found in selected excerpts of finite length?

The years since Youngblood's initial venture into the calculation of
measures of information content and their relation to musical analysis
have seen the publication of a number of calculations of entropy and
redundancy. Some analysts continue to view these calculations as one
way to describe musical style² and as a basis for the synthesis and
theoretical description of music.³ But the relative precision of the
determination of entropy with respect to the body of musical
literature it purports to represent, and with respect to its certainty
for comparative purposes, remains seemingly and surprisingly
unresolved. In addition, the precision and comparative validity of
entropies for both discrete and continuous alphabets must be
determined. The latter is of concern because a calculation of maximum
entropy for use of continuous alphabets, such as loudness, is not
obtainable. We know the entropy for an equal, random usage of the 12
pitch classes (Appendix I). But a similar calculation for all the
potentially usable levels of loudness would necessitate the
establishment of perceptual thresholds and notational devices not now
part of musical knowledge or theory.⁴

We will show that entropies can indeed be used for comparative
purposes, and for both discrete and continuous alphabets. If a value
of entropy is derived from a finite musical sample, the analyst must,
however, be prepared to calculate the likelihood that there may be a
difference between the value calculated and that of the parent musical
style it purports to represent. To the usual computation of entropy we
have therefore added a second calculation, which is the determination
of the extent to which the length of a given sample may be judged to
represent safely a homogeneous musical style. Below we discuss the
calculation of probable differences among entropies for finite
excerpts using discrete and continuous musical alphabets. In addition,
we offer as examples calculations of entropy and entropy differences
for excerpts from contrasting musical styles.

Calculation of variance. The computation of entropy involves the
evaluation of the probabilities for a random choice of symbols from a
hypothetical, infinitely large reservoir consisting of symbols of the
alphabet. The reservoir is populated with symbols according to the
percentage of their common usage in the style. Since we never have an
infinitely large reservoir available to us for analysis, but are
obliged to rely on finite samples for our computations, we try to
estimate the properties of the infinite reservoir by attribution from
the properties of a finite sample. We assume that the alphabetical
ingredients of the pieces we select on an a priori basis accurately
represent the full population of a style that we assume, for the
purposes of our analysis, has an ergodic and stationary distribution.
We can determine the quality of this assumption by studying the
influence on the end result of taking a number of finite samples of
differing lengths.

To illustrate how the finite length of the sample influences the
calculation of entropy, let us suppose that we have an infinitely
large reservoir with a biased population of symbols whose percentages
are exactly equal to those in an original finite sample. We now select
from this infinite reservoir a collection of symbols whose number is
exactly equal to the size of our original finite sample. We repeat
this process a huge number of times. We calculate probabilities of
occurrence as though each finite sample is drawn from the same
infinite reservoir and calculate the entropy of each separate finite
sample. We can now investigate the statistical distribution of this
hypothetical collection of entropies. If the length of the original
sample is made longer, the spread in the entropies becomes smaller,
until we reach the limit where we have an infinitely long sample, and
then the entropy has no uncertainty, that is, no spread at all. Thus,
for infinitely long texts, differences in entropies must be indicators
of real differences in stylistic usage.
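
The repeated-sampling procedure described above can be imitated
directly, as a rough sketch. The reservoir percentages below are
hypothetical, and a few hundred trials stand in for the "huge number"
of repetitions; the point is only that the spread of the sampled
entropies shrinks as the sample length grows.

```python
import math
import random
from collections import Counter

def entropy_bits(counts):
    """Shannon entropy, in bits, of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_spread(probabilities, sample_length, trials=400, seed=0):
    """Draw `trials` samples of the given length from the reservoir and
    return the standard deviation of their entropies."""
    rng = random.Random(seed)
    symbols = range(len(probabilities))
    values = []
    for _ in range(trials):
        draw = rng.choices(symbols, weights=probabilities, k=sample_length)
        values.append(entropy_bits(list(Counter(draw).values())))
    mean = sum(values) / trials
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (trials - 1))

# Hypothetical reservoir: usage percentages of 12 pitch classes (sum to 1).
reservoir = [0.20, 0.05, 0.15, 0.05, 0.15, 0.10,
             0.02, 0.15, 0.03, 0.05, 0.02, 0.03]

spread_short = entropy_spread(reservoir, 100)    # samples of 100 symbols
spread_long = entropy_spread(reservoir, 1600)    # samples 16 times longer
```

With samples sixteen times longer, the spread should come out roughly
four times smaller, in line with the square-root behaviour discussed
later in the text.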

We bypass the tedious procedure of repeated samplings by the use of
mathematical methods that are identified in statistics with the
multinomial nature of the selection process for a finite sample. To
illustrate this process, let us suppose that we have a language
characterized by only two alphabetical symbols, A and B. We have
available to us a text consisting of only 100 characters. A count of
the occurrences of the letters shows that A occurs 64 times and B
occurs 36 times. According to the prescription above, we imagine that
there exists an infinitely large reservoir of letters, 64% of which
are A and 36% of which are B. We extract many texts at random, each
having exactly 100 characters. Sometimes we will extract a text of
63 A's and 37 B's, sometimes 65 A's and 35 B's, and so forth. Under
the assumption that the selection has been random, we can draw a
histogram of the probabilities that a particular combination of A's
and B's will arise (Fig. 1). As expected, the most frequent
distribution is 64 A's and 36 B's, but this occurs only 8.3% of the
time. 95% of the texts selected have combinations of letters lying
between (55A, 45B) and (72A, 28B) inclusive. The frequency
distribution of usage of A's and B's among these texts is not quite
symmetric about the most probable combination, which is (64A, 36B).
The central 95% of all combinations lies within about 8A or 9A of the
most likely combination.
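
For two symbols the multinomial reduces to the binomial distribution,
so the figures quoted above can be checked with a short sketch. The
function name is illustrative only; the numbers 100, 0.64, 55 and 72
are taken from the text.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k A's in a text of n characters when each
    character is independently A with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.64
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

most_likely = max(range(n + 1), key=lambda k: pmf[k])   # count of A's
peak = pmf[most_likely]                                  # its probability
central_mass = sum(pmf[55:73])   # texts with 55 through 72 A's inclusive
```

Here `peak` should be close to the 8.3% quoted for (64A, 36B), and
`central_mass` should land near the 95% quoted for the range from
(55A, 45B) to (72A, 28B).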


Figure 1. Probabilities of selecting samples of length 100 and having
varying fractions of two symbols from two infinitely large
pools having 64%A/36%B (solid) and 75%A/25%B (dashed).
[Horizontal axis: number of A's from 40 to 100, with the
corresponding number of B's from 60 to 0.]

Figure 2. Probabilities of selecting samples of length 10,000 and
having varying fractions of two symbols from an infinitely large
pool having 64%A/36%B. The step interval, visible in the graphs
in Fig. 1, is now so small that the graph appears to be
continuous. [Horizontal axis: number of A's from 6300 to 6500,
with the corresponding number of B's from 3700 to 3500.]

We suppose that we have another sample text also of length 100
characters from a different style, and that the new single sample has
a distribution of characters which is (75A, 25B). The question at
issue is whether the two texts are sufficiently long to discriminate
between the two styles on the basis of the usage of the characters A
and B. The histogram of many texts of length 100 from an infinite
reservoir of composition 75%A, 25%B is also shown in Figure 1; the
most likely text has the composition (75A, 25B), of course, and this
appears in 9.2% of the textual samples. This time the central 95% of
the samples has compositions between (66A, 34B) and (83A, 17B), that
is, also within about ±8A of the central value of 75A. We see that, at
the 95% level of probability of each style, there was a chance that a
text with, for instance, (70A, 30B) could have been selected from
either reservoir. Because of the overlap of the two populations at the
95% level of each, we state that they cannot be distinguished, one
from the other.
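
The overlap argument can be sketched by computing a central 95% range
for each reservoir and testing whether the ranges intersect. The
quantile convention used here (2.5% cut off in each tail) is an
assumption, so the end points may differ by one count from those quoted
above; the direct product with `comb` is also only suitable for short
texts such as n = 100.

```python
from math import comb

def central_interval(n, p, level=0.95):
    """Counts of A bracketing the central `level` mass of Binomial(n, p),
    cutting (1 - level) / 2 from each tail. Intended for small n."""
    tail = (1 - level) / 2
    cdf, low, high = 0.0, None, None
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        if low is None and cdf >= tail:
            low = k
        if cdf >= 1 - tail:
            high = k
            break
    return low, high

low_a, high_a = central_interval(100, 0.64)   # reservoir with 64% A
low_b, high_b = central_interval(100, 0.75)   # reservoir with 75% A

# The two styles cannot be told apart if the central ranges intersect.
overlap = max(low_a, low_b) <= min(high_a, high_b)
```

A text with 70 A's falls inside both ranges, which is the situation the
paragraph above describes.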

In Figure 2, we consider the same two reservoirs, but this time the
selection process involves texts having 10,000 symbols each, again
drawn at random. In the case of the original reservoir, the most
frequent distribution remains the same, namely (64%A, 36%B), which is
(6400A, 3600B); 95% of the samples are found between (6305A, 3695B)
and (6495A, 3505B). Whereas the most likely case (64A, 36B) arose 8.3%
of the time for texts with length 100, in the case of texts of length
10,000, the most likely case (6400A) arises 0.83% of the time. For the
shorter texts, 95% of the texts will have A values within ±8 or 9 of
the most likely value of 64; for the longer texts, 95% of the texts
will have A values within ±95 of the most likely value. But in terms
of percentages, texts of length 100 will have 95% of their samples in
the central ±8 or 9% of all the possible combinations of A's and B's;
for the longer texts, 95% of all texts will have A values in the
central ±0.95% of all possible responses. By increasing the textual
length by a factor of 100, the probability of occurrence of the most
likely case is reduced by a factor of 10 (from 8.3% to 0.83%), and the
percentage spread of the most frequent occurrences is likewise reduced
by a factor of roughly 10. If we had increased the length of text by a
factor of N, the percentage spread would have been decreased by a
factor of $\sqrt{N}$. With the larger sample, we can state with
greater precision that the spread of distributions, and thus, after
computation, the spread of entropies, will be smaller; our single but
larger sample gives us a firmer knowledge of the properties of the
reservoir itself. The mathematical details of the calculation of the
influence of the size of the sample on the entropy are given in
Appendix I.
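
The dependence on sample length can be made explicit with the usual
normal approximation to the binomial distribution. This is a sketch
under that assumption (with z = 1.96 for the 95% level), not the
calculation of Appendix I; the function names are illustrative.

```python
import math

def half_width_percent(p, n, z=1.96):
    """Approximate 95% half-width, in percentage points, of the observed
    fraction of A's in texts of length n (normal approximation)."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

def peak_probability(p, n):
    """Approximate probability of the single most likely composition."""
    return 1 / math.sqrt(2 * math.pi * n * p * (1 - p))

short_spread = half_width_percent(0.64, 100)       # about 9.4 points
long_spread = half_width_percent(0.64, 10_000)     # about 0.94 points
short_peak = peak_probability(0.64, 100)           # about 0.083
long_peak = peak_probability(0.64, 10_000)         # about 0.0083
```

Both quantities vary as the inverse square root of the length, so a
hundredfold increase in length changes each by a factor of ten.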

We can put the above remarks in another way, using as our example
the commonplace experience of political poll-taking. If in a poll of
opinion, one sample of 100 voters yields a response of 64% yes and
36% no, 95% of other samplings of 100 voters might have yielded
responses between 55% and 72% yes. If, however, our pre-election poll
were increased to a sample of 10,000 voters and candidate A were
found to receive 64% of the votes cast in the poll, the final
election, in which many more votes would be cast, could be anticipated
to show a vote favoring A from 63.05% to 64.95% with a 95%
probability. In other words, a poll sample of only 100 voters should
not give overwhelming confidence to the candidate, but a sample of
length 10,000 would provide that confidence, unless the results of the
advance poll were very close to 50%.

We assume that there exists a relationship between sample size and
the accuracy of estimation, based on that sample, of the properties of
the infinite pool; we presume the sample was drawn from the infinite
pool by a random process. We call the infinite pool a musical style.⁵
Because of the influence of the sample size upon the uncertainty of
the estimation of entropy in the infinite reservoir, our confidence in
identifying the properties of the pool increases as the size of our
sample of the style increases. We state that two styles differ from
one another when the estimates of the entropies for the two styles
differ from one another by more than their uncertainties at the 95%
confidence level.

We may call "stylistic entropy" the distribution of probabilities for
usage of elements within the infinite pool that we conceive as
embodying the style. We try to assess the stylistic entropy by using
finite samples as guides. A measure of stylistic entropy is in direct
relation to what we recognize musically as style and is potentially
valuable to the formal analysis of music and our general, theoretical
structuring of music. Thus the calculation of equation (A.2) is
relevant to a central problem in the study of music: the
identification of stylistic properties and our capacity, through
objective analysis, to distinguish these properties for comparative
purposes.

Also pertinent to discussions of musical style is the
interrelationship among redundancy, maximum entropy and relative
entropy mentioned above. Relative entropy, as the ratio of the entropy
of a style to maximum entropy, is an indicator of where a style or
work is situated within a hypothetical continuum running between the
extremes of complete redundancy and the absence of constraint. A
referential basis of this type appears to parallel our instinctive
ordering and recognition of styles as points along a line (often
historically conceived, as for example, in aspects of dissonance) from
the most constrained, or most redundant, to the most free usage of
materials. (We do not believe the reporting of redundancy values
contributes additional information beyond that already imparted by the
entropy values, at least if the maximum entropies are identical. Thus
in the following discussion, we comment on entropies and omit
considerations of redundancies.)


Entropy differences among selected vocal styles. To illustrate the use
of entropy to describe musical style and to investigate the use of
such descriptions for comparative purposes, we have expanded the size
and diversity of data discussed in Youngblood's original study. Our
tabulations of pitches in vocal melodic lines are from the following:
(A) The complete song cycle Die schöne Müllerin by Schubert;
(B) The complete song cycle Die Winterreise by Schubert;
(C) The complete song cycle Schwanengesang by Schubert;
(D) Mozart arias and songs, K. V. 152, 307, 308, 349, 351, 390,
391, 392, 418, 433, 472, 476, 517, 518, 519, 520, 523, 524,
530, 531, 539, 579, 596, 597, 598;
(E) Johann Adolf Hasse (1699-1783): "Scrivo in te l'amato nome"
and "Per te d'amico" from Il Nome, "Bei labbri che amore" and
"Giura il nocchier che al mare" from La Gelosia; "E sicuro il di
vicino" from L'Aurora;
(F) Richard Strauss: opus 10, no. 8 (abbreviated hereafter as 10-8),
15-5, 19-2, 21-1, 21-2, 27-2, 27-3, 27-4, 29-3, 32-1, 37-1,

As in Youngblood's original study, we have performed a zero-order
Markov chain analysis, that is, we have determined the distribution of
usage of pitches without reference to pitch interrelationships, such
as interval leaps, relative durations, and so forth. All pitches were
reduced to a single octave and to C major or A minor, depending on the
mode of the sample. Transposition to these keys was always from the
written key signature and therefore no account was taken of
transitions and internal modulations. Explicit changes in key
signature, however, were treated in the tabulations as new keys and
these were then reduced separately to C major or A minor. All
repetitions were included in the samples; a strophic song with four
verses, for example, had each of its pitches included in the data
count four times. Items (D), (E), and (F) in the above list include
compositions in major keys only. In the tabulations for (A), (B), and
(C), we have separated the examples in major from those in minor keys
in order to investigate potential stylistic differences in usage
between the two modes for one composer, Schubert. The restriction to
major key examples in (D), (E), and (F) permits the identification of
variations in stylistic usage in the same mode among four composers.
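
The tabulation just described (fold every pitch into one octave, then
transpose the written key to C major or A minor) might be sketched as
follows. Representing notes as MIDI-style semitone numbers and tonics
as pitch classes 0 to 11 is an assumption made for the example, as are
the function name and the short fragment.

```python
from collections import Counter

C_PC, A_PC = 0, 9   # pitch classes of C and A

def tabulate(notes, tonic_pc, mode):
    """Count pitch classes after folding to a single octave and moving
    the written tonic to C (major) or A (minor)."""
    target = C_PC if mode == "major" else A_PC
    shift = (target - tonic_pc) % 12
    return Counter((note + shift) % 12 for note in notes)

# Hypothetical fragment written in G major (tonic pitch class 7),
# given as semitone numbers with middle C = 60.
fragment = [67, 71, 74, 79, 78, 76, 74, 67]
counts = tabulate(fragment, tonic_pc=7, mode="major")

# A strophic song with four verses is counted four times over.
four_verses = Counter()
for _ in range(4):
    four_verses.update(counts)
```

A change of key signature inside a piece would be handled by calling
`tabulate` separately on each section with its own tonic and adding the
resulting counters.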

Our raw data for the various song categories are given in Appendix
II, where they may serve as a data base for readers who wish to treat
these numbers in other ways.

One reason for expanding the size of the Schubert sample was to
illustrate the effect of the length of the textual sample upon our
certainty or uncertainty in the determination of entropy in a
comparative stylistic context.

In Table 1 we have listed the number of notes, the entropy and the
95% confidence level deviations from the entropy values; these
deviations are calculated according to formula (A3). We have included
in Table 1 the entropy values given by Youngblood in his original
paper, and for reasons discussed below, the entropy for the melodic
line from a single Mozart song, "Das Veilchen" (K. V. 476).

One might expect that there would be a strong dependence of the
spread of the entropy on the historical period of the composer. On the
contrary, we find that the interval between the 95% confidence limits
depends only upon the number of pitch samples and to a very high
accuracy is independent of historical period. The 95% confidence
limits are given by the approximate formula
where N is the number of pitches in the excerpt. The expression is
accurate to within 8 or 9% as an indicator of the 95% confidence
limits for those cases that are likely to arise in the Western
literature for the period spanning Mozart or Hasse to R. Strauss. The
spread in entropy of pitch usage does not seem to depend on the
historical period, at least for the musical epochs spanned by the
above composers.

We may apply these formulas in the following sense: Let us consider
the case of two styles that, based upon samples of a certain length,
have entropies that differ by 0.082. We ask how long samples of equal
length of the two styles should be in order that the two estimates of
H do not overlap at the 95% level. We find that if the two styles are
each sampled by about 7900 characters, the entropies are probably
significantly different at the 95% confidence level. We have chosen
the entropy difference 0.082 because of the result obtained by
Youngblood, namely the entropies of 3.130 and 3.048 which were derived
from about 1000 samples each for the Schubert and the Mendelssohn
works. We can assert that these samples were too small, by almost a
factor of 8, for us to be certain that the entropies of the two styles
of these composers were significantly different. Put another way, if
we had longer samples of each composer's style, we might have had a
shift in the values of the entropies so that the style with the
greater entropy might have become the one with the smaller entropy;
then again, the entropies might have shifted the other way. We cannot
make definitive statements about stylistic differences with these
small samples. In general the ability to define the entropy of a style
improves as the length of the sample increases, but the improvement is
rather slow, being only inversely as the square root of the length of
the sample.
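
Since the deviation is said to vary inversely as the square root of the
sample length, it can be written as c/√N for some constant c. The
sketch below backs c out of a few rows of Table 1 and then reproduces
the sample-length argument. The constant 3.64 is an assumption, chosen
because it lies within the range the table yields and gives a length
near the 7900 quoted above; it is not a value stated in the text. The
non-overlap criterion (the two half-widths together equal the entropy
difference) is likewise an interpretation.

```python
import math

# (number of notes, 95% deviation) copied from a few rows of Table 1.
rows = {
    "Hasse": (3150, 0.060),
    "Mozart": (5057, 0.048),
    "Mozart, Das Veilchen": (176, 0.286),
    "Schubert, all major": (7748, 0.042),
    "Strauss": (1220, 0.104),
}

# If deviation = c / sqrt(N), then c = deviation * sqrt(N).
constants = {name: dev * math.sqrt(n) for name, (n, dev) in rows.items()}

def required_length(difference, c):
    """Length N at which two equal-length samples have 95% bars that
    just fail to overlap: 2 * c / sqrt(N) = difference."""
    return (2 * c / difference) ** 2

c_assumed = 3.64   # assumption, see the lead-in above
n_needed = required_length(0.082, c_assumed)
```

The values in `constants` cluster between roughly 3.4 and 3.8 across
composers, which illustrates the claimed independence from historical
period, and `n_needed` comes out close to 7900.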

Table 1

Composer                  Collection           Mode           No. of Notes   Entropy   Deviation at 95% Confidence Level
Hasse                     All arias                                   3150   3.039     0.060
Mozart                    All songs                                   5057   3.009     0.048
Mozart                    "Das Veilchen"                               176   3.308     0.286
Schubert                  Die schöne Müllerin  Major                  4326   3.103     0.055
Schubert                  Schwanengesang       Major                  1783   3.207     0.089
Schubert                  Die Winterreise      Major                  1639   3.162     0.091
Schubert                  Die schöne Müllerin  Minor                  1122   3.103     0.109
Schubert                  Schwanengesang       Minor                  1305   3.143     0.106
Schubert                  Die Winterreise      Minor                  2383   3.222     0.078
Schubert                  All cycles           Major                  7748   3.163     0.042
Schubert                  All cycles           Minor                  4810   3.198     0.055
Schubert                  All cycles           Maj + Min (1)         12558   3.270     0.034
Schubert                  All cycles           Maj + Min (2)         12558   3.234     0.033
Schubert                  All cycles           Maj + Min (3)         12558   3.187     0.033
Strauss                   All songs                                   1220   3.397     0.104
Schubert (Youngblood)                                                 1025   3.130     0.114
Mendelssohn (Youngblood)                                               577   3.039     0.144
Schumann (Youngblood)                                                 1066   3.048     0.108

(1) Songs in major keys transposed to C major and in minor keys to C minor.
(2) Songs in major keys transposed to C major and in minor keys to A minor.
(3) Pitches ordered by frequency of usage.

If we enlarge the length of our text to span the compositions
described in collections (A) through (F), can we then state anything
definite about stylistic differences or similarities on the basis of
the frequency of pitch usage? The six Schubert collections have
entropies ranging from 3.103 to 3.222. The lowest and highest
entropies among the six are from the minor key examples in Die schöne
Müllerin and Winterreise, respectively. We return to the question
posed above. Are numbers 3.103 and 3.222 near to each other or far
apart? And if they are far apart, what are the stylistic ingredients
that set apart the songs in minor keys from Die schöne Müllerin and
Die Winterreise? The same questions can be asked regarding the
comparison among other entries in Table 1.

Suppose that we take a historical question as our first test of the
significance of the numerical differences in the values of entropy.
Are our samples from Hasse, Mozart, Schubert and Strauss indicative of
separable and, in some sense, sequential styles? Are these samples, at
the level of note-by-note use, indicative of identifiably separable
stylistic "reservoirs" from which the samples were drawn, at least at
the 95% level of confidence?

In Figure 3, the entropies for the songs in major keys by these four
composers are plotted as central values on a bar extending between the
95% confidence level estimates of the entropies, a result based on the
length of the sample and the measured probabilities of usage. The
styles of Mozart and Hasse, judging from the pitch distributions of
the songs, are not distinguishable. But the styles of Strauss and
Schubert are distinct from each other and both are individually
isolated from the region of entropies occupied by the Hasse and Mozart
samples. Since the entropies of the three distinguishable styles
increase with historical time, the stylistic distinctions indicate a
lessening of the constraints in the usage of pitch, certainly in
keeping with our knowledge of the increasing chromaticism of the
nineteenth century. Such a conclusion, obvious though it may be to the
musician, could not have been statistically stated had the examples of
the various composers been shorter in length.
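
The comparison among the four composers can be reproduced from the
entries of Table 1. In this sketch two styles are called
distinguishable when their 95% bars do not overlap, which is one
reading of the criterion stated earlier; the Schubert entry is the
combined major-key row.

```python
# (entropy, 95% deviation) for the major-key collections in Table 1.
styles = {
    "Hasse": (3.039, 0.060),
    "Mozart": (3.009, 0.048),
    "Schubert": (3.163, 0.042),
    "Strauss": (3.397, 0.104),
}

def distinguishable(a, b):
    """True when the 95% bars of the two styles do not overlap."""
    (h_a, d_a), (h_b, d_b) = styles[a], styles[b]
    return abs(h_a - h_b) > d_a + d_b

names = list(styles)
verdicts = {
    (a, b): distinguishable(a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
```

Only the pair (Hasse, Mozart) comes out as not distinguishable; every
other pair is separated at this level.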

We turn to another illustration of the effect of increased sample size
on the accuracy with which entropies can be calculated. Figure 4
depicts, again with the central entropy value occupying a position
amid its own 95% confidence level estimates, first the songs in the
major mode from the three song cycles of Schubert and then those in
the major mode from all three song cycles taken together. They are
listed in order from the song cycle with the smallest sample size to
the largest. In these cases, the examples cannot be distinguished by
entropy, but one sees that the greatest precision results from the
longest sample.

If we use only those examples from these same Schubert collections
in the minor mode, we find that the range of entropies between the
95% confidence limits again does not permit us to identify stylistic
differences (Figure 5). Once again, we see that the effect of reduced
sample size is to increase the size of the interval between the 95%
confidence limit extremes.

Figure 3. Entropies of notes in song collections in major keys of four
composers. In this figure and succeeding figures, bars indicate
span of 95% confidence limits. All songs are transposed to a
common key. [Horizontal axis: entropy from 2.8 to 3.7.]

Figure 4. Entropies of notes in songs in major keys from three Schubert
song cycles. [Horizontal axis: entropy from 2.8 to 3.7.]

Figure 5. Entropies of notes in songs in minor keys from three Schubert
song cycles. [Horizontal axis: entropy from 2.8 to 3.7.]

Perhaps the most dramatic illustration of the effect of increasing the
length of the sample is given in Figure 6. Here we have plotted the
entropy for the pitch distribution given by the melodic line from a
single song, "Das Veilchen" by Mozart (K. V. 476) and compared it with
the entropy for all the Mozart songs in collection (D), which includes
"Das Veilchen" itself. The entropy for "Das Veilchen" is 3.31; the
entropy for the full Mozart collection is 3.01. At first glance the
difference in entropies is so substantial that we might be persuaded
that the chromaticism of "Das Veilchen" identifies it as arising from
a different genre than the other pieces of Mozart. Such a conclusion
is not justified. "Das Veilchen" is so short that the certainty about
the entropy of the stylistic pool from which "Das Veilchen" might have
been derived is poor. The interval between the 95% confidence levels
is so broad that it overlaps the narrower interval between the
corresponding limits for the much longer sample of the full Mozart
collection. In brief, we cannot say with certainty that "Das Veilchen"
was selected from a different stylistic pool than the full collection
of Mozart songs; by the same token, neither can we say that they were
selected from the same pool. The enormous difference between the two
sets of entropy intervals at the 95% confidence level graphically
emphasizes the importance of sample size and makes it apparent that an
aspect of a composer's style is not likely to be established
statistically through an analysis of short

Joint entropy of two libraries.To investigatewhetheror not pitches

areusedin the majorand minor modes in stylistically differentmanners,
we have compared the entropies for (1) all the Schubert examples in
major keys, (2) all the Schubertexamples in minor keys, and (3) the
combination of the two. These entropies, again with bars drawnto in-
dicate certainty of the entropy estimatesat the 95%level of confidence,
are givenin Figure7. At a glance, one can observethat the collections of
songs in major and minor keys are not statistically distinguishableon
the basis of their pitch entropies. A second glance seems to imply that
there is a statisticallysignificantdifferencebetween the entropiesof the
songs in major keys and the full collection of works in both majorand
minor keys. This is, however, an artifact of juxtaposingtwo subtleties
in our analytic procedure: key transposition and the ordering of the
separatemajorand minor tabulations.
To obtain the entropy value 3.27 for the full collection of majorand
minor pieces, we have made the assumption that the two scales are
transposed so that they have the same tonic; for instance, we may
imagine that the minor scale is transposed to C minor and that the


This content downloaded from on Wed, 9 Jul 2014 22:20:01 PM

All use subject to JSTOR Terms and Conditions
Figure 6. Entropy of notes of "Das Veilchen" in comparison with entropy
of notes of 25 songs in major keys (including "Das Veilchen") by Mozart.
[Plot not reproduced: bars on a horizontal entropy axis running from
2.8 to 3.7.]

Figure 7. Entropy of notes of songs in major and minor keys in Schubert
song cycles, in comparison with entropies of the full collection. All
songs in major keys are transposed to the key of C and in (1) all songs
in minor keys have been transposed to C minor, in (2) songs in minor
keys have been transposed to A minor. In (3) songs in major and minor
keys have been transposed to their respective orders of usage.
[Plot not reproduced: bars labeled "All Minor" and "All Schubert" (1),
(2), (3) on a horizontal entropy axis running from 2.8 to 3.7.]


population entries for the minor pieces are then added to those for the
major pieces transposed to C major. In this case the arrangement of the
principal tones of the diatonic major and the harmonic minor scales (we
use the harmonic minor for illustrative purposes) is
    Major  C  D      E  F  G      A  B
    Minor  C  D  Eb     F  G  Ab     B
    Sum    C  D  Eb  E  F  G  Ab  A  B
with the "accidentals" inserted in the appropriate gaps. Since these
seven tones in each mode have greatest percentage of usage, it follows
that the composite formed by adding the two populations together will
show larger populations at the nine sites indicated than it will at the
three remaining sites (C#, F#, Bb). Hence the entropy of the sum of
the two populations, thus arranged, must be greater than the entropy
of each collection separately, which has only seven sites with large
populations.
Our original assumption was that the pieces in major and minor
modes were selected from independent pools, and there is thus no
a priori reason to align the two scales as above. Other assumptions could
be made; for example, since the key of A minor is the relative minor to
the key of C, we might align the two scales so that the third of the
minor scale corresponds to the tonic of the major scale. This
arrangement is
    Major  C  D  E  F  G      A  B
    Minor  C  D  E  F     G#  A  B
    Sum    C  D  E  F  G  G#  A  B
With only eight strongly populated entries in the sum, the entropy for
this particular fusion of the two collections should be less than that of
the first transposition and greater than either the major or minor
collections separately. The entropy in this case is 3.24 (Figure 7).
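The dependence of the composite entropy on the alignment can be checked numerically. The following is a minimal sketch, assuming the "Schubert Major" and "Schubert Minor" totals tabulated in the appendix and the Shannon entropy in bits; the helper names `entropy_bits`, `rotate`, and `fuse` are illustrative and are not taken from the original analysis.

```python
import math

# "Schubert Major" totals from the appendix table, pieces transposed to C.
# Index 0 is C, index 1 is C#, ..., index 11 is B.
MAJOR = [1205, 77, 1089, 190, 1267, 759, 185, 1417, 190, 712, 160, 497]

# "Schubert Minor" totals from the appendix table, pieces transposed to
# A minor. Index 0 is the tonic A, index 1 is Bb, ..., index 11 is G#.
MINOR_FROM_TONIC = [906, 103, 550, 564, 124, 430, 117, 1042, 343, 100, 259, 272]


def entropy_bits(counts):
    """Shannon entropy, in bits, of a list of occurrence counts."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def rotate(counts, shift):
    """Return a copy in which the entry at index i sits at (i + shift) mod 12."""
    size = len(counts)
    rotated = [0] * size
    for i, n in enumerate(counts):
        rotated[(i + shift) % size] = n
    return rotated


def fuse(major, minor_from_tonic, minor_tonic_site):
    """Add the two tabulations with the minor tonic placed at the given site."""
    shifted = rotate(minor_from_tonic, minor_tonic_site)
    return [a + b for a, b in zip(major, shifted)]


same_tonic = fuse(MAJOR, MINOR_FROM_TONIC, 0)  # minor tonic placed on C
relative = fuse(MAJOR, MINOR_FROM_TONIC, 9)    # minor tonic placed on A

h_major = entropy_bits(MAJOR)
h_minor = entropy_bits(MINOR_FROM_TONIC)
h_same_tonic = entropy_bits(same_tonic)
h_relative = entropy_bits(relative)

for label, value in [
    ("major alone", h_major),
    ("minor alone", h_minor),
    ("fused, same tonic", h_same_tonic),
    ("fused, relative minor", h_relative),
]:
    print(f"{label:<24}{value:.3f}")
```

Under these assumptions the two fused values should land close to the 3.27 and 3.24 quoted in the text, with both above the entropies of the separate collections; only the placement of the minor tabulation differs between the two calls to `fuse`.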
These illustrations suggest that any attempt to fuse two or more
independent collections of works into a single library for the purpose of
evaluating the information content of a large collection must have a
rational basis for the fusion. The end results of the above attempts to
fuse the two libraries depend on a priori assumptions; there is no
rational basis for their fusion.
We have presented two possible bases for fusion; there are far more
than twelve. In the two fusions we have outlined above, we have
assumed that the order of the pitches in the two chromatic scales for each
mode is preserved. The ordering of the discrete tones in the conventional
twelve tone scale ranging from C to B, as above, is a convenience
dictated by the convention of musical notation and by instrumental
performance techniques. We might prefer the not implausible but still
a priori assumption that the ordering of a discrete alphabet may be
made on the basis of frequency of usage rather than on common
practice. Under this assumption, we might argue that the conventional
arrangement of the letters of the alphabet
is convenient for library card catalogs and telephone directories in view
of our present historical antecedents, but that this ordering has no
relationship to usage. If the arrangement were based on frequency of usage,
we should use the familiar
sequence of the alphabet. (There appears to be no reason why this series
could not be taught to infants, except that certain nursery rhymes
would be meaningless and that the aforementioned directories would
become inappropriate.)
A rearrangement of the order of the letters has no influence on the
information content of a message or communication. We have already
remarked that the entropy of a population with 75% A and 25% B is the
same as that of another population with 25% C and 75% D. But the
problems we have been discussing arise when we attempt to fuse the
two collections; should we combine the entries in A and C together, or
should we put A and D together? Up to now, our musical analogies are
that since C appears before D in the alphabet, C should be combined
with A in the fusion process. But the consequences are significantly
different in the two cases. If we combine two equal collections, we find
the AC vs. BD fusion generates a 50%-50% partition, while the AD
vs. BC fusion generates a 75%-25% partition.
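The two-symbol example can be worked through directly. This is a small sketch, assuming two collections of equal size and entropy measured in bits; the names `first`, `second`, and `fuse` are hypothetical labels introduced only for this illustration.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of probabilities that sum to 1."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Two populations of equal size with identical entropies.
first = {"A": 0.75, "B": 0.25}
second = {"C": 0.25, "D": 0.75}


def fuse(pairs):
    """Average the two populations after pairing their symbols site by site.

    Each pair (x, y) puts symbol x of `first` and symbol y of `second`
    at the same site of the composite.
    """
    return [(first[x] + second[y]) / 2 for x, y in pairs]


alphabetical = fuse([("A", "C"), ("B", "D")])  # AC vs. BD
by_usage = fuse([("A", "D"), ("B", "C")])      # AD vs. BC

h_first = entropy_bits(first.values())
h_second = entropy_bits(second.values())
h_alphabetical = entropy_bits(alphabetical)
h_by_usage = entropy_bits(by_usage)

print("separate collections:", round(h_first, 3), round(h_second, 3))
print("AC vs. BD partition:", alphabetical, "entropy", round(h_alphabetical, 3))
print("AD vs. BC partition:", by_usage, "entropy", round(h_by_usage, 3))
```

The alphabetical pairing yields the 50%-50% partition, whose entropy is one full bit, while the pairing by usage reproduces the 75%-25% partition and therefore the entropy of either collection taken alone.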
Let us assume therefore that we dissociate scale-sensitive properties
from the ordering of the pitches of the two Schubert song collections,
and arrange the scales in the order of frequency of usage. In this case
we would arrange the notes of the two scales in different orders:
    Major  G    E    C    D    F    A    B    Ab = Eb = Gb   Bb   Db
    Minor  G    C    Eb   D    F    Ab   B    Bb   E    Gb   Db = A
Here we have transposed the collections into the keys of C major and C
minor, although, as we have argued, this particular arrangement is not
essential to our argument. This display illustrates the strong similarity
in usage between the two modes with a common tonic, the principal
difference being the interchange in the order of usage of the third and
the tonic. But this difference is significant. If we now add two
populations, vertically aligned as above, in order to obtain a composite
population, we obtain the entropy 3.187, a value intermediate to the entropies
of the two collections separately, as is to be expected. The 95%
confidence bars are smaller in these cases of composite libraries than for the
separate collections, because of the greater size of the composite.
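The fusion by order of usage can be sketched as follows, assuming the "Schubert Major" and "Schubert Minor" totals from the appendix tables and entropy in bits. Sorting each tabulation by count stands in for the vertical alignment described above; the variable names are illustrative only.

```python
import math

# Totals from the appendix tables; the pitch labels are irrelevant here
# because each collection is re-ordered by frequency of usage.
MAJOR = [1205, 77, 1089, 190, 1267, 759, 185, 1417, 190, 712, 160, 497]
MINOR = [906, 103, 550, 564, 124, 430, 117, 1042, 343, 100, 259, 272]


def entropy_bits(counts):
    """Shannon entropy, in bits, of a list of occurrence counts."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


# Rank each collection from most used to least used pitch, then add the
# two tabulations rank by rank.
major_ranked = sorted(MAJOR, reverse=True)
minor_ranked = sorted(MINOR, reverse=True)
composite = [a + b for a, b in zip(major_ranked, minor_ranked)]

h_major = entropy_bits(MAJOR)      # unchanged by the re-ordering
h_minor = entropy_bits(MINOR)      # unchanged by the re-ordering
h_composite = entropy_bits(composite)

print(f"major {h_major:.3f}  minor {h_minor:.3f}  composite {h_composite:.3f}")
```

With these counts the composite should come out close to the quoted 3.187 and between the two separate entropies, in contrast to the fusions that preserve chromatic order.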
The iconoclast might wish to carry our argument regarding careful
inspection of the assumptions of common practice to the extreme and
inquire into the rationality of sorting the individual songs into those in
major and minor keys separately. We admit that such sorting also has
attributes of a priori assumptions built into it. But we assert that the
information content in any other sorting into samples of sufficient length
that stylistic resolution is possible can also be evaluated, as long as the
basis for the sorting is made clear.

Continuous alphabets. With the exception of the discrete pitch alphabet
of usual Western notation and its interval alphabet, most of the
ingredients of Western music are defined and measured by continuous
alphabets. We have in mind such ingredients of style as dynamics,
dissonance, melodic activity, instrumental quality, and others, all of which
are perceived and measured in a non-discrete, that is, non-digitized
sense. Elsewhere we have outlined a definition and interpretation of
entropy for such systems.6 The principal conjecture in that analysis was
that the sample population has a continuous distribution of values; the
principal conceptual result is that, although no absolute entropy for
such systems can be defined, a relative entropy useful for comparative
purposes can nevertheless be established. When the scheme was applied
to the calculation of the entropy for dissonance in four-part Bach
chorales, among other results it was found that the dissonances had a
log-normal distribution. The occurrence of a recognizable distribution
function permits us to approach more nearly one of our goals
discussed above, namely to obtain a direct estimate of the source pool of
the alphabetic elements of style. Because of the availability of a model
of a distribution, we can estimate how many dissonance values are
likely to occur in the source pool even at values of dissonance not
directly sampled. We do so by interpolation between neighboring,
directly sampled values. The entropy of a log-normal distribution is
directly related to two characteristics of the distribution, the mean
value and the variance. The 95% confidence level estimates spanning
the quoted entropy value are directly related to the uncertainties in the
two properties of the distribution; our ability to estimate the mean and
the variance in the log-normal distribution also improves as the square
root of the number of sample estimates (see Appendix III). The
application of the formula has led directly to the uncertainty estimates in the
entropies we have given in our preceding paper.7 We conclude that
uncertainties in entropy estimates for continuous alphabets are subject
to the same influences of sample length as are entropy estimates for
discrete alphabets.

The hypothesis that information theory can serve as a tool in the
analysis and comparative study of musical style is well founded. "Musical
style" is commonly taken to refer to those fixed features which recur as
part of the characteristic musical language of a composer, school, era,
or geographic area.8 With respect to our perceptual response to musical
style, these fixed features have been understood in recent years to be
conditioned probability responses, that is, learned, cultural expectations
that pertain to the music of a particular era, compositional school,
geographic locale, or to separable musical entities, such as the collective
works by an individual composer.9
Discussions of questions of musical style often stress the pervasive,
enduring and repetitive aspects that characterize both the collective
membership of a musical style and our collective recognition of that
style. The individuality of each work is another matter. The
separateness of the single work, its own repetitive features, and the probability
relationships that evolve within the single work, are not defined by our
theoretical understandings of musical style; those factors which lend
discreteness (the particular configurations which give a single work its
uniqueness) play no direct role in our common theoretical
comprehension of a musical style.
The manipulation of data in Shannon's formula (A.2) for entropy
takes into account the collective features, expressed as probabilities,
that characterize the distribution of materials in any alphabet chosen
for study. To be sure, those materials appearing more equably contribute
more to the entropy; hence whenever usage occurs with fewer
constraints the entropy is higher. The important fact is that the formula
identifies as data the same information we model in our theories of
musical style, namely the distribution of musical elements viewed
collectively within a chosen body of literature.
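The remark that more equable usage gives higher entropy can be illustrated with a short sketch. Both count lists below are hypothetical and chosen only for illustration; they are not data from the study, and the entropy is taken in bits over a twelve-symbol alphabet.

```python
import math


def entropy_bits(counts):
    """Shannon entropy, in bits, of a list of occurrence counts."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


# Hypothetical usage of twelve pitch classes.
equable = [100] * 12                                           # no constraints
constrained = [600, 300, 150, 75, 40, 20, 8, 3, 2, 1, 1, 0]   # a few favorites

h_equable = entropy_bits(equable)
h_constrained = entropy_bits(constrained)
h_maximum = math.log2(12)  # upper bound for a twelve-symbol alphabet

print(f"equable     {h_equable:.3f} bits (maximum {h_maximum:.3f})")
print(f"constrained {h_constrained:.3f} bits")
```

Perfectly even usage reaches the ceiling of log2(12) bits; any preference among the twelve symbols lowers the value.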
There is of course the practical difficulty of assessing the entropy
of the long-range correlations that go to make up the more visible and
form-related aspects of musical style; to date our success in the venture
of adapting information theory to style has been confined to the
occurrence of individual alphabetic elements or, at most, to the occurrences
of paired combinations of these elements.
Our study has shown that one of the first questions to be raised
regarding the use of information theory in musical analysis, namely the
identification of the point at which numerical differences in entropies
are significantly different, can be answered. We are able to predict with
95% certainty (or with any other level of certainty for that matter)
when two texts have been drawn from different stylistic pools. The raw
selection of pitches, reduced to a common octave, may well be a
primitive measure on which to base stylistic comparison, but even at that
level, a sufficiently long sample of two texts, such as the Schubert and
Strauss examples, is capable of defining two contrasting musical styles.
Entropy values that cannot be resolved by the procedures we have
outlined above are indicative of neither the "sameness" nor the "otherness"
of two populations. Two overlapping estimates may be resolvable as
different, if the sample sizes are increased. It is also possible that increase
of sample size does not perform the resolution expected, merely because
the two texts are not chosen from different stylistic pools, at least on
the basis for analysis chosen.
A further penetration into musical style through the use of
information theory appears to necessitate that ways be developed of choosing
those musical parameters or alphabets which are better, or even the best,
discriminators of musical style. These better discriminators will allow a
reduction in the length of text needed to perform the discrimination
analysis. Undoubtedly these improved methods will include
consideration of the way in which the alphabetic elements are ordered and
otherwise interrelated in the sample texts; in the present examples, no
attention was given to the sequential arrangement of the notes in the
There would seem to be much more efficient methods of pattern
recognition, that is, the identification of one piece or another or of a
group of pieces as belonging to a given style, than those presently applied
in information theory. Indeed the case can be made that present
procedures could yield the same entropies for two different styles even
with large samples. A description of more efficient methods for pattern
recognition has been begun. For the present, we only imply that if two
entropies are different at a suitable level of discrimination such as at the
95% level of confidence, the two excerpts are taken from two different
master pools; if they are not resolvable at the given level of
discrimination, it does not follow that they are from the same pool. In the latter
case, other data with these methods, or other methods, will have to be
invoked to perform the discrimination.
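The asymmetric decision rule stated above can be sketched in code. This is a hedged illustration, assuming that "resolvable" means the 95% bars of the two entropy estimates do not overlap, that the bars follow the independent-variation formula of the first appendix, and that the counts are the Schubert totals from the appendix tables. The function names and the returned strings are inventions for this example.

```python
import math

MAJOR = [1205, 77, 1089, 190, 1267, 759, 185, 1417, 190, 712, 160, 497]
MINOR = [906, 103, 550, 564, 124, 430, 117, 1042, 343, 100, 259, 272]
# Full collection with both tabulations aligned on a common tonic.
FULL_SAME_TONIC = [a + b for a, b in zip(MAJOR, MINOR)]


def entropy_bits(counts):
    """Shannon entropy, in bits, of a list of occurrence counts."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def half_width_95(counts):
    """1.96 standard deviations of the entropy, independent-variation form."""
    total = sum(counts)
    acc = 0.0
    for n in counts:
        if n == 0:
            continue  # an unused symbol contributes no variance
        p = n / total
        acc += (p * (1.0 - p) / total) * (1.0 + math.log(p)) ** 2
    return 1.96 * math.log2(math.e) * math.sqrt(acc)


def interval_95(counts):
    """Lower and upper ends of the 95% bar around the entropy estimate."""
    h = entropy_bits(counts)
    w = half_width_95(counts)
    return h - w, h + w


def compare(counts_a, counts_b):
    """Apply the one-sided rule: disjoint bars imply different pools."""
    low_a, high_a = interval_95(counts_a)
    low_b, high_b = interval_95(counts_b)
    if high_a < low_b or high_b < low_a:
        return "different pools"
    return "unresolved"  # NOT evidence that the pools are the same


print("major vs. minor:", compare(MAJOR, MINOR))
print("major vs. full (same tonic):", compare(MAJOR, FULL_SAME_TONIC))
```

The second comparison reproduces the apparent difference that the text identifies as an artifact of the chosen alignment, which is a reminder that the rule is only as sound as the basis on which the populations were tabulated.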


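Let the probability of observing the ith symbol be p_i. Let us assume
that the ith symbol appears n_i times randomly in a text of great length
N. Then we estimate
\[ p_i = n_i / N. \]
The standard deviation \sigma_i in the quantity p_i is given by the property of
the binomial distribution, which is
\[ \sigma_i = \frac{1}{N} \left[ \frac{n_i (N - n_i)}{N} \right]^{1/2}. \tag{A.1} \]
Since the entropy is defined as
\[ H = -\sum_i p_i \log_2 p_i \tag{A.2} \]
we have
\[ \frac{\partial H}{\partial p_i} = -\log_2 e \, (1 + \log_e p_i). \]
Then the standard deviation of the entropy due to the independent
variations of the 12 characters is
\[ \sigma_H = \log_2 e \left[ \sum_{i=1}^{12} \sigma_i^2 \, (1 + \log_e p_i)^2 \right]^{1/2} \]
for small \sigma_H. At the 95% confidence level, we have
\[ 1.96 \, \sigma_H = 1.96 \log_2 e \left[ \sum_{i=1}^{12} \sigma_i^2 \, (1 + \log_e p_i)^2 \right]^{1/2}. \]
By direct substitution of (A.1) we have
\[ 1.96 \, \sigma_H = \frac{1.96 \log_2 e}{N} \left[ \sum_{i=1}^{12} \frac{n_i (N - n_i)}{N} \, (1 + \log_e p_i)^2 \right]^{1/2}. \tag{A.4} \]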
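The influence of sample length on the width of the 95% bars can be seen by applying the binomial standard deviation (A.1) and the resulting entropy uncertainty to the tabulated counts. The sketch below assumes the per-cycle counts for the pieces in major keys from the following table and treats the twelve variations as independent, as the derivation does; `half_width_95` is a hypothetical helper name.

```python
import math

# Note counts per pitch class (C, C#, D, ..., B) for the Schubert pieces in
# major keys, copied from the appendix table.
CYCLES = {
    "Die schöne Müllerin": [690, 40, 601, 64, 682, 357, 121, 865, 61, 421, 73, 351],
    "Die Winterreise": [256, 16, 260, 75, 268, 153, 19, 293, 70, 120, 31, 78],
    "Schwanengesang": [259, 21, 228, 51, 317, 249, 45, 259, 59, 171, 56, 68],
}


def entropy_bits(counts):
    """Shannon entropy (A.2), in bits, of a list of occurrence counts."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def half_width_95(counts):
    """1.96 * sigma_H, built from the binomial sigma_i of (A.1)."""
    total = sum(counts)
    acc = 0.0
    for n in counts:
        if n == 0:
            continue  # sigma_i is zero for an unused symbol
        sigma_i = math.sqrt(n * (total - n) / total) / total  # (A.1)
        acc += (sigma_i * (1.0 + math.log(n / total))) ** 2
    return 1.96 * math.log2(math.e) * math.sqrt(acc)


# Column sums give the combined "Schubert Major" tabulation.
combined = [sum(column) for column in zip(*CYCLES.values())]

for name, counts in list(CYCLES.items()) + [("All three cycles", combined)]:
    print(f"{name:<22} N={sum(counts):>5}  H={entropy_bits(counts):.3f}"
          f"  +/- {half_width_95(counts):.3f}")
```

For a fixed distribution the half-width falls as the inverse square root of N, so quadrupling every count halves the bar; the combined tabulation accordingly has a narrower bar than any single cycle.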


Tonal Frequencies for Pieces in Major Key

Note          Die schöne Müllerin  Die Winterreise  Schwanengesang  Schubert Major
C                             690              256             259            1205
C#                             40               16              21              77
D                             601              260             228            1089
Eb                             64               75              51             190
E                             682              268             317            1267
F                             357              153             249             759
F#                            121               19              45             185
G                             865              293             259            1417
Ab                             61               70              59             190
A                             421              120             171             712
Bb                             73               31              56             160
B                             351               78              68             497
Total notes                  4326             1639            1783            7748


Tonal Frequencies for Schubert Pieces in Minor Keys

Note          Die schöne Müllerin  Die Winterreise  Schwanengesang  Schubert Minor
A                             150              485             271             906
Bb                              8               67              28             103
B                             154              278             118             550
C                             128              304             132             564
C#                             34               39              51             124
D                             116              206             108             430
Eb                             11               73              33             117
E                             291              426             325            1042
F                              70              172             101             343
F#                             21               46              33             100
G                              82              117              60             259
G#                             57              170              45             272
Total notes                  1122             2383            1305            4810

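The probability density for the log-normal distribution is
\[ w(x) = \left( \frac{\alpha}{\pi} \right)^{1/2} \frac{1}{x} \exp\left\{ -\alpha \left[ \ln(x/x_0) \right]^2 \right\} \]
which is normalized; x_0 sets the mean and \alpha the variance (ln x_0 is
the mean of ln x, and 1/(2\alpha) is its variance). We integrate
-(w log w) dx from 0 to \infty and get
\[ H = \log x_0 - \tfrac{1}{2} \log(\alpha/\pi) + \tfrac{1}{2}. \]
Thus log x_0 - (1/2) log \alpha is the relative entropy of the continuous
distribution. By inspection
\[ \left[ \left( \frac{\sigma_{x_0}}{x_0} \right)^2 + \frac{\sigma_\alpha^2}{4 \alpha^2} \right]^{1/2}, \]
where \sigma_{x_0} and \sigma_\alpha are the standard deviations of the
estimates of x_0 and \alpha,
is the standard deviation of the relative entropy.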
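The closed form for the entropy of the log-normal distribution can be checked by numerical integration. The sketch below assumes natural logarithms, a density carrying the 1/x factor needed for normalization, and arbitrary illustrative parameter values; the integral is evaluated in the variable u = ln x with a plain trapezoid rule, and the function names are hypothetical.

```python
import math


def lognormal_entropy_closed_form(x0, alpha):
    """H = ln x0 - (1/2) ln(alpha / pi) + 1/2, in natural units."""
    return math.log(x0) - 0.5 * math.log(alpha / math.pi) + 0.5


def lognormal_entropy_numeric(x0, alpha, steps=20000, span=10.0):
    """Integrate -w ln w dx from 0 to infinity after substituting u = ln x.

    With x = exp(u), w(x) dx = g(u) du, where g is a Gaussian in u with
    mean ln x0 and variance 1 / (2 alpha), and ln w(x) = ln g(u) - u.
    """
    u0 = math.log(x0)
    sigma = 1.0 / math.sqrt(2.0 * alpha)
    lower = u0 - span * sigma           # tails beyond this are negligible
    du = 2.0 * span * sigma / steps
    total = 0.0
    for k in range(steps + 1):
        u = lower + k * du
        log_g = 0.5 * math.log(alpha / math.pi) - alpha * (u - u0) ** 2
        integrand = -math.exp(log_g) * (log_g - u)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end points
        total += weight * integrand * du
    return total


# Arbitrary illustrative parameters, not values from the study.
for x0, alpha in [(2.0, 3.0), (0.5, 0.8)]:
    closed = lognormal_entropy_closed_form(x0, alpha)
    numeric = lognormal_entropy_numeric(x0, alpha)
    print(f"x0={x0} alpha={alpha}: closed {closed:.6f}, numeric {numeric:.6f}")
```

Agreement between the two columns supports the reading of the density with the 1/x factor; dropping the constants leaves ln x0 minus half of ln alpha as the quantity that varies between samples.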



1. Youngblood, J. E., "Style as Information," Journal of Music Theory 2 (1958):
2. Cohen, J. E., "Information Theory and Music," Behavioral Science 7 (1962):
3. The reader should be cautioned that knowledge of entropy in a musical style
is not, in itself, sufficient for the synthesis of that musical style. Below we
show, for example, that a given value of entropy is not unique to a style and
that several styles may have the same entropy. The converse is, however, not
true; that is, two bodies of musical literature with differing entropies cannot
be the same.
4. Knopoff, L., and Hutchinson, W., "Information Theory for Musical Continua,"
Journal of Music Theory 25 (1981): 17-44.
5. The "infinite pool" is, of course, an abstraction from a dynamic process; our
purview is a reduction from the panoply of musical parameters that is com-
monly held to be a musical style. The reduction is nevertheless sufficient for
purposes of identification and comparison.
6. Knopoff and Hutchinson, "Information Theory."
7. Ibid.
8. Apel, W., "Style," in Harvard Dictionary of Music (Cambridge: Harvard
University Press, 1972), pp. 811 ff.
9. Meyer, Leonard, Emotion and Meaning in Music (Chicago: University of
Chicago Press, 1956), pp. 45 ff.

