
A Statistical Analysis of Tonal Harmony

By David Temperley
2009
------------

Overview
It is generally believed that harmony in common-practice music (i.e. 18th and 19th century Western art
music) is characterized by certain basic principles. Dominant harmonies (V and vii) go to tonics (I),
predominants (IV and ii) go to dominants, root motion by descending fifth is especially favored, and so
on. But to what extent are these principles actually followed in common-practice composition? There has
been surprisingly little empirical study of this question. [1]
This page presents a statistical analysis of harmonic progressions in a corpus of common-practice music.
The data files and programs used can be downloaded at the bottom of the page.
The data comes from the workbook accompanying Stefan Kostka and Dorothy Payne's theory textbook
Tonal Harmony, 3rd edition (McGraw-Hill, 1995). The workbook contains a number of excerpts of
common-practice pieces, to be analyzed by the student; an accompanying instructor's manual contains
"correct" analyses done by the textbook authors, in conventional Roman numeral notation. The analyses
also show modulations, and represent each chord in relation to the local key.
I created a corpus consisting of all of the analyzed excerpts in the workbook of 8 measures or more in
length; there were 46 such excerpts. I call this the "Kostka-Payne corpus." (A list of the excerpts is shown
here.) I created midifiles and "notefiles" (textfiles listing the notes with pitches and on/off times) of all the
excerpts. (This was done in connection with the testing of the Melisma music analysis system; the
notefiles and midifiles are available at the Melisma ftp site.) The harmonic analyses of the excerpts were
computationally encoded by Bryan Pardo, and added to the midifiles (these midifiles are available at
Pardo's website). I then converted Pardo's analyses into another format, which I call "chord-list" format.
The beginning of a chord-list (for the opening of the Minuet in G major from the Notebook for Anna
Magdalena Bach) is shown here:
0.000   2.608   -    0   1   7   7
2.608   3.913   -    5   4   7   0
3.913   5.217   -    0   1   7   7
5.217   6.521   -   11   7   7   6

Each line represents a chord segment. The first number indicates the beginning of the segment, in
seconds. (For each excerpt, I chose a tempo that I thought was reasonable, and then generated times for
the chord segments using this tempo.) The second number represents the end time of the segment.
Following this are four integers. The first is the "chromatic relative root": the chromatic interval from the
tonic up to the root. I use the usual pitch-class notation for intervals: I = 0, bII (or #I) = 1, II = 2, etc. The
second integer indicates the "diatonic relative root" - the Roman numeral number (I = 1, bII = 2, II = 2,
etc.). The third number indicates the tonic (assuming the usual pitch-class notation: C = 0, Db/C# = 1,
etc.), and the fourth number indicates the _absolute_ root (again assuming the usual pitch-class notation).
So the first chord statement above indicates I in the key of G major - a G major chord, in absolute terms.
(Applied chords were relabeled in relation to the local key: for example, V/V was converted to II.)
Note that this format contains no information about the quality of chords (major/minor/diminished) or
extensions (e.g. sevenths, ninths). This information is available in Pardo's midifiles, but I did not encode
it. [2]
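
The chord-list format described above is simple enough to read with a few lines of code. The following Python sketch is only an illustration: the names ChordSegment, parse_chord_line and is_consistent are invented here, and skipping a bare "-" token is an assumption about the file layout rather than something the format description states. The consistency check uses the relationship implied by the field definitions: the absolute root should equal the tonic plus the chromatic relative root, modulo 12.

```python
from typing import NamedTuple


class ChordSegment(NamedTuple):
    start: float         # segment onset, in seconds
    end: float           # segment offset, in seconds
    chromatic_root: int  # semitones of the root above the tonic (I = 0); -1 if no root
    diatonic_root: int   # Roman numeral number (I = 1, II = 2, ...); -1 if no root
    tonic: int           # pitch class of the local tonic (C = 0)
    absolute_root: int   # pitch class of the chord root (C = 0); -1 if no root


def parse_chord_line(line: str) -> ChordSegment:
    """Parse one chord statement into a ChordSegment."""
    # Assumption: a bare "-" is a separator and carries no data, so it is skipped.
    tokens = [t for t in line.split() if t != "-"]
    if len(tokens) != 6:
        raise ValueError(f"expected 6 fields, got {len(tokens)}: {line!r}")
    start, end = float(tokens[0]), float(tokens[1])
    chrom, diat, tonic, root = (int(t) for t in tokens[2:])
    return ChordSegment(start, end, chrom, diat, tonic, root)


def is_consistent(seg: ChordSegment) -> bool:
    """Check that absolute root == (tonic + chromatic root) mod 12."""
    if seg.chromatic_root < 0:  # chords without a root carry -1 throughout
        return True
    return (seg.tonic + seg.chromatic_root) % 12 == seg.absolute_root


# Opening of the Minuet in G major, as listed above.
MINUET = """\
0.000 2.608 - 0 1 7 7
2.608 3.913 - 5 4 7 0
3.913 5.217 - 0 1 7 7
5.217 6.521 - 11 7 7 6
"""

segments = [parse_chord_line(l) for l in MINUET.splitlines() if l.strip()]
for seg in segments:
    print(seg, is_consistent(seg))
```

In the second statement, for instance, tonic 7 (G) plus chromatic root 5 (IV) gives 12, which is pitch class 0 (C), matching the absolute root in the listing.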

The file kp-chord-list contains the chord-lists for the complete KP corpus. The title of each excerpt (using the short names shown in the corpus list) is indicated at the beginning of the excerpt. Dotted lines "---" separate one key section from another. ("Pivot chords" - chords at key boundaries that function in both the previous key and the following one - are represented in both key sections.) I also separated the corpus into major-key and minor-key key sections; the file kp-chord-list-ma includes just the major-key ones, and kp-chord-list-mi includes just the minor-key ones.
A few chords in the corpus were given chord symbols for which there is no widely accepted root, such as "German 6th". For such chords, the label -1 is used for the chromatic, diatonic, and absolute roots.

Some Aggregate Statistics
Once I had the KP corpus in "chord-list" form, I then wrote a perl-script, tally.pl, which extracts various kinds of aggregate statistics. First I extracted the total count of each chromatic relative root, and the total amount of time spent on that root.

Root   count   proportion   proportion   total      proportion
                            excluding    time
                            tonic        (secs)
I       318      0.346        ---       553.792      0.409
bII      17      0.018       0.029       18.805      0.014
II      104      0.113       0.180      118.766      0.088
bIII     10      0.011       0.017       16.668      0.012
III      21      0.023       0.036       25.104      0.019
IV       70      0.076       0.121       91.622      0.068
#IV      17      0.018       0.029       29.652      0.022
V       214      0.233       0.370      302.102      0.223
bVI      34      0.037       0.059       44.383      0.033
VI       50      0.054       0.087       76.706      0.057
bVII      6      0.007       0.010        8.301      0.006
VII      35      0.038       0.061       37.552      0.028

(The first "proportion" column shows the count of the chord as a proportion of the total count; the second "proportion" column shows the time spent on the chord as a proportion of the total time.) The corpus contains 919 chords, and a total time of 1354.116 seconds. There were also 23 "miscellaneous" chords, not assigned any explicit root (such as augmented-sixth chords), taking a total time of 30.663 seconds. (These are assigned chromatic root of -1 in the chord list; diatonic root and absolute root are also -1.)
Then I looked at the "chord transitions" -- the number of times each chord moves to each other chord. (The data only reflects transitions within a single key section; no transition is recorded for moves from one key section to another.) "Antecedent" chords are shown on the vertical axis, "consequent" chords on the horizontal; for example, the number of occurrences of I moving to II is 31.

CHROMATIC ROOT TRANSITION COUNTS

     Cons
Ant       I   bII    II  bIII   III    IV   #IV     V   bVI    VI  bVII   VII
I         0     7    31     1     4    45     2   116    11    17     3    19
bII       3     0     8     0     0     0     1     2     0     0     0     1
II       22     3     0     1     4     1     7    45     2     8     0     6
bIII      1     1     0     0     0     0     0     4     4     0     0     0
III       1     0     2     0     0     7     0     1     0     7     0     1
IV       32     2    10     0     4     0     3    11     0     1     1     4
#IV       7     0     0     0     0     0     0     9     0     0     0     0
V       167     0     8     1     2     4     0     0     7     6     0     2
bVI       5     2     8     0     1     3     0     2     0     3     2     0
VI        4     2    28     0     1     4     2     1     0     0     0     1
bVII      0     0     0     5     0     0     0     1     0     0     0     0
VII      27     0     0     0     3     0     1     1     1     0     0     0

It is useful to represent this data in two other ways. First, we represent chromatic root transitions as a proportion of the total count for the consequent chord. The values in each column sum to 1; thus one can see, for example, that I is approached by V 62.1% of the time.

CHROMATIC ROOT TRANSITIONS AS PROPORTION OF COUNT FOR CONSEQUENT CHORD

     Cons
Ant        I    bII     II   bIII    III     IV    #IV      V    bVI     VI   bVII    VII
I      0.000  0.412  0.326  0.125  0.211  0.703  0.125  0.601  0.440  0.405  0.500  0.559
bII    0.011  0.000  0.084  0.000  0.000  0.000  0.062  0.010  0.000  0.000  0.000  0.029
II     0.082  0.176  0.000  0.125  0.211  0.016  0.438  0.233  0.080  0.190  0.000  0.176
bIII   0.004  0.059  0.000  0.000  0.000  0.000  0.000  0.021  0.160  0.000  0.000  0.000
III    0.004  0.000  0.021  0.000  0.000  0.109  0.000  0.005  0.000  0.167  0.000  0.029
IV     0.119  0.118  0.105  0.000  0.211  0.000  0.188  0.057  0.000  0.024  0.167  0.118
#IV    0.026  0.000  0.000  0.000  0.000  0.000  0.000  0.047  0.000  0.000  0.000  0.000
V      0.621  0.000  0.084  0.125  0.105  0.062  0.000  0.000  0.280  0.143  0.000  0.059
bVI    0.019  0.118  0.084  0.000  0.053  0.047  0.000  0.010  0.000  0.071  0.333  0.000
VI     0.015  0.118  0.295  0.000  0.053  0.062  0.125  0.005  0.000  0.000  0.000  0.029
bVII   0.000  0.000  0.000  0.625  0.000  0.000  0.000  0.005  0.000  0.000  0.000  0.000
VII    0.100  0.000  0.000  0.000  0.158  0.000  0.062  0.005  0.040  0.000  0.000  0.000

Now the same for the antecedent chord. Now each row sums to 1. For example, I moves to V 45.3% of the time.

CHROMATIC ROOT TRANSITIONS AS PROPORTION OF COUNT FOR ANTECEDENT CHORD

     Cons
Ant        I    bII     II   bIII    III     IV    #IV      V    bVI     VI   bVII    VII
I      0.000  0.027  0.121  0.004  0.016  0.176  0.008  0.453  0.043  0.066  0.012  0.074
bII    0.200  0.000  0.533  0.000  0.000  0.000  0.067  0.133  0.000  0.000  0.000  0.067
II     0.222  0.030  0.000  0.010  0.040  0.010  0.071  0.455  0.020  0.081  0.000  0.061
bIII   0.100  0.100  0.000  0.000  0.000  0.000  0.000  0.400  0.400  0.000  0.000  0.000
III    0.053  0.000  0.105  0.000  0.000  0.368  0.000  0.053  0.000  0.368  0.000  0.053
IV     0.471  0.029  0.147  0.000  0.059  0.000  0.044  0.162  0.000  0.015  0.015  0.059
#IV    0.438  0.000  0.000  0.000  0.000  0.000  0.000  0.562  0.000  0.000  0.000  0.000
V      0.848  0.000  0.041  0.005  0.010  0.020  0.000  0.000  0.036  0.030  0.000  0.010
bVI    0.192  0.077  0.308  0.000  0.038  0.115  0.000  0.077  0.000  0.115  0.077  0.000
VI     0.093  0.047  0.651  0.000  0.023  0.093  0.047  0.023  0.000  0.000  0.000  0.023
bVII   0.000  0.000  0.000  0.833  0.000  0.000  0.000  0.167  0.000  0.000  0.000  0.000
VII    0.818  0.000  0.000  0.000  0.091  0.000  0.030  0.030  0.030  0.000  0.000  0.000

As a final analysis, we consider the counts of different root interval motions. The left column below shows each chromatic interval (+m2 = ascending minor second, +M2 = ascending major second, etc.) along with its count. The right column groups these into diatonic intervals. (Each interval is represented by its smallest possible form, so a descending fifth is represented as an ascending fourth, +P4.)

INTERVAL COUNTS

Chromatic        Diatonic
+m2     72       +M/m2   127
+M2     55
+m3      7       +M/m3    32
+M3     25
+P4    308       +P4     308
-TT     25       TT       25
-P4    167       -P4     167
-M3     21       -M/m3    64
-m3     43
-M2     34       -M/m2    65
-m2     31

Discussion
To a considerable extent, the conventional rules of harmony are supported by this data. This is perhaps most clearly seen in the table of root transition counts. The most common root motions, in order, are V-I, I-V, ii-V, and I-IV (the last two are equally common). All of these are standard, "correct" progressions of tonal harmony. "Incorrect" progressions such as V-IV are generally less common. A few things are surprising. In particular, the frequencies of ii-I and IV-I are surprisingly high. Both of these represent "predominant-to-tonic" motions and are generally considered undesirable. IV-I progressions do occur in certain circumstances (such as plagal cadences and I-IV-I motions expanding an opening I) but their frequency here seems high. This appears to be largely due to cadential 6/4 chords; this is discussed further below.
The interval counts are also of interest. Overall, fourths are by far the most common (475), seconds (192) are much more common than thirds (96), and tritones least common of all (25). Traditional theory holds that certain intervallic root motions are preferred over others: descending fifths are most preferred (strongly favored over ascending fifths), descending thirds over ascending thirds, and ascending seconds over descending seconds. This data clearly shows all three of these preferences: descending fifths (+P4, 308) are much more common than ascending fifths (-P4, 167), descending thirds (64) are more common than ascending (32), and ascending seconds (127) are more common than descending (65).

Aggregate Statistics (with Cadential 6/4's Reanalyzed)
A close inspection of the data revealed that the oddities noted above -- the high frequency of ii-I and IV-I -- were largely due to cadential 6/4 chords. Cadential 6/4's, which are extremely common in the KP corpus (and in common-practice music generally), are analyzed in the Kostka-Payne text in a "two-level" fashion: A I6/4-V is placed inside a larger V. (This is in fact a common convention; under this convention, the cadential 6/4 is labeled as V6/4.) The encoding of the data by Pardo reflected the lower level (I6/4-V), and the data presented above reflects that as well. However, cadential 6/4's are frequently (indeed normally) preceded by II or IV; thus it seemed likely that this largely accounted for the high frequency of II-I and IV-I motions. I thought that using the "V6/4" analysis might permit the conventional principles of tonal harmony to emerge more strongly. (This is surely one reason why many people prefer the V6/4 analysis.) The data was therefore recoded, using the higher-level (V) analysis of cadential 6/4's. That is, every two chord statements representing a cadential I6/4 followed by a V were replaced by a single statement representing V. The modified chord-list is kp-chord-list-2. Consider just the transition table:

     Cons
Ant       I   bII    II  bIII   III    IV   #IV     V   bVI    VI  bVII   VII
I         0     7    31     1     4    45     2    84    11    17     3    19
bII       2     0     8     0     0     0     1     3     0     0     0     1
II        5     3     0     1     4     1     7    62     2     8     0     6
bIII      1     1     0     0     0     0     0     4     4     0     0     0
III       1     0     2     0     0     7     0     1     0     7     0     1
IV       27     2    10     0     4     0     3    16     0     1     1     4
#IV       3     0     0     0     0     0     0    13     0     0     0     0
V       166     0     8     1     2     4     0     0     7     6     0     2
bVI       3     2     8     0     1     3     0     4     0     3     2     0
VI        4     2    28     0     1     4     2     1     0     0     0     1
bVII      0     0     0     5     0     0     0     1     0     0     0     0
VII      26     0     0     0     3     0     1     2     1     0     0     0
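
The two proportion tables and the interval counts can be recomputed from the first transition-count table (the one with the original I6/4-V coding). The Python sketch below is illustrative only and is not tally.pl: the names COUNTS, by_antecedent, by_consequent and interval_counts are invented for this example, and the matrix is transcribed by hand from that table. Intervals are folded into their smallest form, so that a root moving up 7 semitones is counted as -P4 and a tritone as -TT.

```python
from collections import Counter

ROOTS = ["I", "bII", "II", "bIII", "III", "IV", "#IV", "V", "bVI", "VI", "bVII", "VII"]

# Rows are antecedent chords, columns are consequent chords, both in ROOTS order.
# Transcribed from the CHROMATIC ROOT TRANSITION COUNTS table (original coding).
COUNTS = [
    [0, 7, 31, 1, 4, 45, 2, 116, 11, 17, 3, 19],  # I
    [3, 0, 8, 0, 0, 0, 1, 2, 0, 0, 0, 1],         # bII
    [22, 3, 0, 1, 4, 1, 7, 45, 2, 8, 0, 6],       # II
    [1, 1, 0, 0, 0, 0, 0, 4, 4, 0, 0, 0],         # bIII
    [1, 0, 2, 0, 0, 7, 0, 1, 0, 7, 0, 1],         # III
    [32, 2, 10, 0, 4, 0, 3, 11, 0, 1, 1, 4],      # IV
    [7, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0],         # #IV
    [167, 0, 8, 1, 2, 4, 0, 0, 7, 6, 0, 2],       # V
    [5, 2, 8, 0, 1, 3, 0, 2, 0, 3, 2, 0],         # bVI
    [4, 2, 28, 0, 1, 4, 2, 1, 0, 0, 0, 1],        # VI
    [0, 0, 0, 5, 0, 0, 0, 1, 0, 0, 0, 0],         # bVII
    [27, 0, 0, 0, 3, 0, 1, 1, 1, 0, 0, 0],        # VII
]

INTERVAL_NAMES = {1: "+m2", 2: "+M2", 3: "+m3", 4: "+M3", 5: "+P4", -6: "-TT",
                  -5: "-P4", -4: "-M3", -3: "-m3", -2: "-M2", -1: "-m2"}


def by_antecedent(counts):
    """Divide each row by its total, so every non-empty row sums to 1."""
    return [[n / sum(row) if sum(row) else 0.0 for n in row] for row in counts]


def by_consequent(counts):
    """Divide each column by its total, so every non-empty column sums to 1."""
    col_totals = [sum(col) for col in zip(*counts)]
    return [[n / t if t else 0.0 for n, t in zip(row, col_totals)] for row in counts]


def interval_counts(counts):
    """Total the transition counts by signed root interval in smallest form."""
    totals = Counter()
    for ant, row in enumerate(counts):
        for cons, n in enumerate(row):
            step = (cons - ant) % 12
            if step == 0:      # same root: not an interval motion
                continue
            if step >= 6:      # fold 6..11 semitones up into 6..1 semitones down
                step -= 12
            totals[INTERVAL_NAMES[step]] += n
    return totals


i, v = ROOTS.index("I"), ROOTS.index("V")
print(round(by_antecedent(COUNTS)[i][v], 3))  # I moving to V: 0.453
print(round(by_consequent(COUNTS)[v][i], 3))  # I approached by V: 0.621
intervals = interval_counts(COUNTS)
print(intervals["+P4"], intervals["-P4"])     # 308 167
```

Running the same functions on the recoded ("V6/4") table only requires swapping in that matrix.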

The recoding of cadential 6/4's has a significant effect. The count of II-I is reduced from 22 to 5; the count of IV-I is reduced from 32 to 27. The top 10 transitions are now V-I, I-V, II-V, I-IV, I-II, VI-II, IV-I, VII-I, I-VII, I-VI. Once the "V6/4" analysis of cadential 6/4's is assumed, the conventional principles of tonal harmony appear to be very strongly confirmed. Not a very earth-shattering conclusion (which is why I decided to put this in a web page rather than trying to publish it!) but I think it's good to know.
A number of other comments could be made about this data. For example, compare the transitional frequency of IV-II (10) to II-IV (1); IV-II is much more common, again confirming a conventional rule. But I will leave further explorations to the reader. The reader could also use tally.pl to reproduce these statistics, and to gather further statistics from the chord lists provided -- for example, analyzing major and minor key sections separately. (In fact, the differences between the major and minor key distributions are fairly modest. Perhaps this should not surprise us, since the primary tonic/dominant/predominant harmonies - I, V, II, IV - are the same in both modes, and function similarly.)

Notes
1. A few sources deserve mention. Helen Budge's (1943) dissertation, "A Study of Chord Frequencies Based on the Music of Representative Composers of the Eighteenth and Nineteenth Centuries," presents an interesting statistical analysis of tonal harmony, systematically gathered from analyses by experts. But only data on the frequency of individual (diatonic) chords is provided; there is no data about transitions (motions from chord to chord). Allen Irvine McHose's (1947) study "The Contrapuntal Harmonic Technique of the 18th Century" offers occasional statistics about the frequency of various chords and progressions, but presents no complete data (such as tables of chord or progression frequencies). Philip Norman's 1945 study "A Quantitative Study of Harmonic Similarities in Certain Specified Works of Bach, Beethoven, and Wagner" has statistics about chord progressions, but he assumes a new chord on every note - that is, he makes no allowance for non-chord-tones; this goes against the modern practice of harmonic analysis. David Huron, in his book Sweet Anticipation (2006), presents data about chord transitions for "a sample of Baroque music" (pp. 250-1; no further information is given about the sample). Finally, Dmitri Tymoczko's paper "Root Motion, Function, Scale Degree" (Musurgia 2005, available in English at Tymoczko's website) analyzes a set of progressions from major-key Bach chorales.
2. The mftext program (available at the Melisma website) can be used to extract the chord labels from Pardo's midifiles. While I have not analyzed the labels in detail with regard to mode and inversion, I did extract a few basic statistics. There are 949 chord labels total (this is slightly greater than my count, since in Pardo's annotations, there may be two chords of the same root and key in succession). Root-position chords are just over 60% of the total; first-inversion, just over 23%; second inversion, just over 12%; third inversion, just over 3%. Chords built on major triads (including seventh chords that contain major triads, e.g. dominant sevenths) are just over 68% of the total; those built on minor triads, just over 21%; those built on diminished triads, just over 9%.

Downloads
List of excerpts in the Kostka-Payne corpus
kp-nbck: This directory contains "note-beat-chord-key" files for all excerpts in the corpus: a list of notes ("Note [ontime] [offtime] [pitch]"), beats ("Beat [time] [level]"), chords ("Chord [ontime] [offtime] [root]") and key sections ("Key [start time] [end time] [tonic] [mode:ma=0,mi=1]"). These files bring together the "beat list" and "note list" formats that I used with the Melisma system (see the Melisma website for explanation) with the harmonic and key information from the Kostka-Payne analyses. I made these as an intermediate step towards making the "chord-lists" below.
Chord list (list of chord statements) for the KP corpus
Chord list for the KP corpus, major key sections only
Chord list for the KP corpus, minor key sections only
Chord list for the KP corpus with the "V6/4" analysis of cadential 6/4 chords
The "V6/4" chord-list, major-key sections only
The "V6/4" chord-list, minor-key sections only
tally.pl, a perl script for extracting aggregate data from chord lists. (The tables presented above are all outputs of tally.pl.)
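
For readers who would rather not run the Perl script, the kind of tally it produces (root counts, time per root, and transitions within key sections) can be approximated in Python. This is a rough sketch under stated assumptions, not a port of tally.pl: the function name tally_roots and the input layout (one list of (start, end, chromatic root) tuples per key section) are invented here, and the choice to leave rootless (-1) chords out of the transition counts is an assumption about how such chords might reasonably be handled. The second key section in the example data is made up purely to show that no transition is recorded across a key boundary.

```python
from collections import Counter, defaultdict


def tally_roots(key_sections):
    """Tally root counts, seconds per root, and root-to-root transitions.

    key_sections is a list of key sections; each key section is a list of
    (start_seconds, end_seconds, chromatic_root) tuples in time order.
    """
    counts = Counter()
    seconds = defaultdict(float)
    transitions = Counter()
    for section in key_sections:
        previous = None  # reset here so no transition crosses a key boundary
        for start, end, root in section:
            counts[root] += 1
            seconds[root] += end - start
            # Assumption: chords without a root (-1) take no part in transitions.
            if previous is not None and previous >= 0 and root >= 0:
                transitions[(previous, root)] += 1
            previous = root
    return counts, dict(seconds), transitions


sections = [
    # Opening of the Minuet in G major (chromatic roots I, IV, I, VII).
    [(0.000, 2.608, 0), (2.608, 3.913, 5), (3.913, 5.217, 0), (5.217, 6.521, 11)],
    # A made-up second key section (V, I), for illustration only.
    [(6.521, 7.825, 7), (7.825, 9.129, 0)],
]

counts, seconds, transitions = tally_roots(sections)
total = sum(counts.values())
for root in sorted(counts):
    # root, count, proportion of count, seconds spent on that root
    print(root, counts[root], round(counts[root] / total, 3), round(seconds[root], 3))
print(dict(transitions))
```

The transition dictionary produced this way could be arranged into a 12 x 12 antecedent/consequent grid like the tables shown earlier.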