Giantmidi-Piano: A Large-Scale Midi Dataset For Classical Piano Music

Anonymous (2022).
GiantMIDI-Piano: A large-scale MIDI Dataset

for Classical Piano Music, Transactions of the International So-
ciety for Music Information Retrieval, V(N), pp. xx–xx, DOI:
https://doi.org/xx.xxxx/xxxx.xx
ARTICLE TYPE
GiantMIDI-Piano: A large-scale MIDI Dataset for

Classical Piano Music
Qiuqiang Kong*, Bochen Li*, Jitong Chen*, Yuxuan Wang*
Abstract
Symbolic music datasets are important for music information retrieval and musical analysis.
However, there is a lack of large-scale symbolic datasets for classical piano music. In this ar-
arXiv:2010.07061v3 [cs.IR] 21 Apr 2022
ticle, we create a GiantMIDI-Piano (GP) dataset containing 38,700,838 transcribed notes and
10,855 unique solo piano works composed by 2,786 composers. We extract the names of music
works and the names of composers from the International Music Score Library Project (IMSLP).
We search and download their corresponding audio recordings from the internet. We further
create a curated subset containing 7,236 works composed by 1,787 composers by constrain-
ing the titles of downloaded audio recordings containing the surnames of composers. We ap-
ply a convolutional neural network to detect solo piano works. Then, we transcribe those solo
piano recordings into Musical Instrument Digital Interface (MIDI) files using a high-resolution
piano transcription system. Each transcribed MIDI file contains the onset, offset, pitch, and ve-
locity attributes of piano notes and pedals. GiantMIDI-Piano includes 90% live performance
MIDI files and 10% sequence input MIDI files. We analyse the statistics of GiantMIDI-Piano and
show pitch class, interval, trichord, and tetrachord frequencies of six composers from different
eras to show that GiantMIDI-Piano can be used for musical analysis. We evaluate the quality
of GiantMIDI-Piano in terms of solo piano detection F1 scores, metadata accuracy, and tran-
scription error rates. We release the source code for acquiring the GiantMIDI-Piano dataset at
https://github.com/bytedance/GiantMIDI-Piano
Keywords: GiantMIDI-Piano, dataset, piano transcription
1. Introduction ano rolls are continuous rolls of paper with perfora-

Symbolic music datasets are important for music infor- tions punched into them. In 1981, Musical Instrument
mation retrieval (MIR) and musical analysis. In the Digital Interface (MIDI) (Smith and Wood, 1981) was
Western music tradition, musicians use musical nota- proposed as a technical standard to represent music
tion to write music. This notation includes pitches, and can be read by a computer. MIDI files use event
rhythms, and chords of music. Musicologists used to messages to specify the instructions of music, includ-
analyse music works by reading music notation. Re- ing pitch, onset, offset, and velocity of notes. MIDI
cently, computers have been used to process and anal- files also carry rich information of music events such
yse large-scale data and have been widely used in MIR. as sustain pedals. The MIDI format has been popular
However, there is a lack of large-scale symbolic music for music production in recent years.
datasets covering a wide range of solo piano works.
In this work, we focus on building a large-scale
One difficulty of computer-based MIR is that musi-
MIDI dataset for classical solo piano music. There
cal notation such as staffs is not directly readable by
are several previous piano MIDI datasets including
a computer. Therefore, converting music notation into
the piano-midi.de (Krueger, 1996) dataset, the MAE-
computer-readable formats is important. Early works
STRO dataset (Hawthorne et al., 2019), the classical
of converting music into symbolic representations can
archives (Classical Archives, 2000) dataset, and the
be traced back to the 1900s, when piano rolls (Bryner,
Kunstderfuge dataset (Kunstderfuge, 2002). However,
2002; Shi et al., 2019) were developed to record mu-
those datasets are limited to hundreds of composers
sic that could be played on a musical instrument. Pi-
and hundreds of hours of unique works (Kunstderfuge,
* ByteDance 2002). MusicXML (Good et al., 2001) is another sym-
2 Anonymous: GiantMIDI-Piano: A large-scale MIDI Dataset for Classical Piano Music
Table 1: Piano MIDI datasets. GP is the abbreviation

bolic format of music, while there are fewer MusicXML for GiantMIDI-Piano.
datasets than MIDI datasets. Other machine-readable
formats include the music encoding initiative (MEI) Dataset Composers Works Hours Type
(Roland, 2002), Humdrum Sapp (2005), and LilyPond piano-midi.de 26 571 37 Seq.
(Nienhuys and Nieuwenhuizen (2003)). Optical mu- Classical archives 133 856 46 Seq.
sic recognition (OMR) (Rebelo et al., 2012; Bainbridge Kunstderfuge 598 - - Seq.
KerbScores - - - Seq.
and Bell, 2001) is a technique to transcribe image SUPRA 111 410 - Perf.
scores into symbolic formats. However, the perfor- ASAP 16 222 - Perf.
mance of OMR systems is limited to score qualities. MAESTRO 62 529 84 Perf.
MAPS - 270 19 Perf.
In this article, we collect and transcribe a large- GiantMIDI-Piano 2,786 10,855 1,237 90% Perf.
scale classical piano MIDI dataset called GiantMIDI- Curated GP 1,787 7,236 875 89% Perf.
Piano. To our knowledge, GiantMIDI-Piano is the
largest piano MIDI dataset so far. GiantMIDI-Piano is
collected as follows: 1) We parse the names of com-
posers and the names of music works from the Inter-
national Music Score Library Project (IMSLP)1 ; 2) We cal Archives, 2000) contains a large number of MIDI
search and download audio recordings of all matching files of classical music, including both piano and non-
music works from YouTube; 3) We build a solo piano piano works. There are 133 composers with a total
detection system to detect solo piano recordings; 4) duration of 46.3 hours of MIDI files in this dataset.
We transcribe solo piano recordings into MIDI files us- The KernScores dataset (Sapp, 2005) contains classi-
ing a high-resolution piano transcription system (Kong cal music with a Humdrum format and is obtained
et al., 2021). In this article, we analyse the statistics by an optical music recognition system. The Kunst-
of GiantMIDI-Piano, including the number of works, derfuge dataset (Kunstderfuge, 2002) contains solo pi-
durations of works, and nationalities of composers. In ano and non-solo piano works of 598 composers. All of
addition, we analyse the statistics of note, interval, and the piano-midi.de, classical archives, and Kunstderfuge
chord distribution of six composers from different eras datasets are entered using a MIDI sequencer and are
to show that GiantMIDI-Piano can be used for musical not played by pianists.
analysis.
1.1 Applications The MAPS dataset (Emiya et al., 2010) used MIDI
The GiantMIDI-Piano dataset can be used in many files from Piano-midi.de to render real recordings by
research areas, including 1) Computer-based musi- playing back the MIDI files on a Yamaha Disklavier. The
cal analysis (Volk et al., 2011; Meredith, 2016) such MAESTRO dataset (Hawthorne et al., 2019) contains
as using computers to analyse the structure, chords, over 200 hours of fine alignment MIDI files and audio
and melody of music works. 2) Symbolic music gen- recordings. In MAESTRO, virtuoso pianists performed
eration (Yang et al., 2017; Hawthorne et al., 2019) on Yamaha Disklaviers with an integrated MIDI capture
such as generating symbolic music in symbolic for- system. MAESTRO contains music works from 62 com-
mat. 3) Computer-based music information retrieval posers. There are several duplicated works in MAE-
(Casey et al., 2008; Choi et al., 2017) such as music STRO. For example, there are 11 versions of Scherzo
transcription and music tagging. 4) Expressive perfor- No. 2 in B-flat Minor, Op. 31 composed by Chopin.
mance analysis (Cancino-Chacón et al., 2018) such as All duplicated works are removed when calculating the
analysing the performance of different pianists. number and duration of works.
This paper is organised as follows: Section 2 sur-
veys piano MIDI datasets; Section 3 introduces the col- Table 1 shows the number of composers, the num-
lection of the GiantMIDI-Piano dataset; Section 4 in- ber of unique works, total durations, and data types of
vestigates the statistics of the GiantMIDI-Piano dataset; different MIDI datasets. Data types include sequenced
Section 5 evaluates the quality of the GiantMIDI-Piano (Seq.) MIDI files input by MIDI sequencers and per-
dataset. formed (Perf.) MIDI files played by pianists. There are
other MIDI datasets including the Lakh dataset (Raf-
2. Dataset Survey fel, 2016), the Bach Doodle dataset (Huang et al.,
We introduce several piano MIDI datasets as follows. 2019), the Bach Chorales dataset (Conklin and Wit-
The Piano-midi.de dataset (Krueger, 1996) contains ten, 1995), the URMP dataset (Li et al., 2018),
classical solo piano works entered by a MIDI sequencer. the Bach10 dataset (Duan et al., 2010), the Crest-
Piano-midi.de contains 571 works composed by 26 MusePEDB dataset (Hashida et al., 2008), the SUPRA
composers with a total duration of 36.7 hours till dataset (Shi et al., 2019), and the ASAP dataset (Fos-
Feb. 2020. The classical archives collection (Classi- carin et al., 2020). (Huang et al., 2018) collected
10,000 hours of piano recordings for music generation,
1 https://imslp.org but the dataset is not publicly available.
3. GiantMIDI-Piano Dataset Higher J indicates that X and Y have larger similarity,

3.1 Metadata from IMSLP and lower J indicates that X and Y have less similarity.
To begin with, we acquire the names of composers and We empirically set a similarity threshold to 0.6 to bal-
the names of music works by parsing the webpages of ance the precision and recall of searched results. If J
the International Music Score Library Project (IMSLP, is strictly larger than this threshold, then we say X are
2006), the largest publicly available music library in Y are matched, and vice versa. In total, we retrieve
the world. In IMSLP, each composer has a webpage and download 60,724 audio recordings out of 143,701
containing the list of their work names. We acquire music works.
143,701 music works composed by 18,067 composers
by parsing those web pages. For each composer, if 3.3 Solo Piano Detection
there exists a biography link on the composer page, we We detect solo piano works from IMSLP to build the
access that biography link to search for their birth year, GiantMIDI-Piano dataset. Filtering music works with
death year, and nationality. We set the birth year, death keywords containing “piano” may lead to incorrect re-
year, and nationality to “unknown” if a composer does sults. For example, a “Piano Concerto” is an ensemble
not have such a biography link. We obtain the national- of piano and orchestra which is not solo piano. On the
ities of 4,274 composers and births of 5,981 composers other hand, the keyword Chopin, Frédéric, Nocturnes,
out of 18,067 composers by automatically parsing the Op.62 does not contain the word “piano”, but it is in-
biography links. deed a solo piano. To address this problem, we train an
As the automatically parsed meta information of audio-based solo piano detection system using a con-
composers from the internet is incomplete, we man- volutional neural network (CNN) (Kong et al., 2020).
ually check the nationalities, births, and deaths for The piano detection system takes 1-second segments
2,786 composers. We label 2,291 birth years, 2,254 as input and extracts log mel spectrograms as input to
death years, and 2,115 nationalities by searching the the CNN.
information of composers on the internet. We label not
found birth years, death years, and nationalities as “un- The CNN consists of four convolutional layers. Each
known”. We create metadata files containing the infor- convolutional layer consists of a linear convolutional
mation of composers and music works, respectively. operation with a kernel size of 3 × 3, a batch normal-
ization (Ioffe and Szegedy, 2015), and a ReLU nonlin-
3.2 Search Audio earity (Glorot et al., 2011). The output of the CNN
We search audio recordings on YouTube by using a key- predicts the solo piano probability of a segment. Bi-
word of first name, surname, music work name from nary cross-entropy is used as a loss function to train
the metadata. For each keyword, we select the first the CNN. We collect solo piano recordings as positive
returned result on YouTube. However, there can be samples and collect music and other sounds as negative
cases where the returned YouTube title may not ex- samples. In addition, the mixtures of piano and other
actly match the keyword. For example, for a key- sounds are also used as negative samples. In inference,
word Frédéric Chopin, Scherzo No.2 Op.31, the top re- we average the predictions of all 1-second segments of
turned result can be Chopin - Scherzo No. 2, Op. 31 a recording to calculate its solo piano probability. We
(Rubinstein). Although the keyword and the returned regard an audio recording as solo piano if the proba-
YouTube title are different, they indicate the same mu- bility is strictly larger than 0.5 and vice versa. In total,
sic work. We denote the set of words in a search- we obtain 10,855 solo pianos composed by 2,786 com-
ing keyword as X and the set of words in a returned posers out of 60,724 downloaded audio recordings.
YouTube title as Y . We propose a modified Jaccard These 10,855 audio files are transcribed into MIDI files
similarity (Niwattanakul et al., 2013) to evaluate how which constitute the full GiantMIDI-Piano dataset.
much a keyword and a returned result are matched.
The original Jaccard similarity is defined as J = 3.4 Constrain Composer Surnames
|X ∩Y |/(|X |∪|Y |). The drawback of this original Jaccard Among the detected 10,855 solo piano works, there
similarity is that the length of the searched YouTube ti- are several music works composed by not well-known
tle |Y | can be long, so that J will be small. This is often composers but are attached to famous composers. For
the case because searched YouTube titles usually con- example, there are 273 searched music works assigned
tain extra words such as the names of performers and to Chopin, but only 102 of them are actually com-
the dates of performances. Our aim is to define a met- posed by Chopin, while other music works are com-
ric where the denominator only depends on the search- posed by other composers. To alleviate this problem,
ing keyword |X | and is independent of the length of the we create a curated subset by constraining the titles of
searched YouTube title |Y |. We propose a modified Jac- downloaded audio recordings containing the surnames
card similarity (Niwattanakul et al., 2013) between X of composers. After this constraint, we obtain a cu-
and Y as: rated GiantMIDI-Piano dataset containing 7,236 music
J = |X ∩ Y |/|X |. (1) works composed by 1,787 composers.
4
Number of composers Duration (h) Number of works
0
5
10
15
20
25
30
0
25
50
75
100
125
150
175
200
0
100
200
300
400
500
Liszt, Franz 25 40 Liszt, Franz 141 242
Unknown 671 Beethoven, Ludwig van 21 78
German 364 Schubert, Franz 20 87 Scarlatti, Domenico 140 228
French 322 Bach, Johann Sebastian 129 793
American 267 Bach, Johann Sebastian 17 158 Schubert, Franz 96 703
British 179 Chopin, Frédéric 16 19 Chopin, Frédéric 96 109
Italian 169 Carbajo, Víctor 14 15
Russian 135 Czerny, Carl 13 19 Carbajo, Víctor 77 81
Austrian 90 Schumann, Robert 11 39 Beethoven, Ludwig van 76 286
Polish 74 Rebikov, Vladimir 11 12 Mozart, Wolfgang Amadeu 72 613
Spanish 66 Simpson, Daniel Léo 69 198
Belgian 47 Mozart, Wolfgang Amadeu 11 140 Scriabin, Aleksandr 67 83
Norwegian 32 Haydn, Joseph 10 143 Czerny, Carl 58 83
Czech 29 Rachmaninoff, Sergei 10 19
Swedish 28 Medtner, Nikolay 10 17 Handel, George Frideric 57 224
Hungarian 25 Simpson, Daniel Léo 9 25 Rebikov, Vladimir 54 57
Brazilian 25 Scarlatti, Domenico 9 16 Haydn, Joseph 54 557
Dutch 23 Satie, Erik 52 76
Danish 22 Alkan, Charles-Valentin 9 12 Chaminade, Cécile 45 115
Australian 21 Brahms, Johannes 9 36 Alkan, Charles-Valentin 39 61
Ukrainian 16 Scriabin, Aleksandr 9 13 38 45
3.5 Piano Transcription

Canadian 16 Tchaikovsky, Pyotr 8 53 Lack, Théodore
Finnish 15 Rzewski, Frederic 7 14 Turina, Joaquín 37 61
Mexican 14 Raff, Joachim 37 73
Argentine 13 Scharwenka, Xaver 7 14 Zhang, Shuwen 36 39
Swiss 13 Turina, Joaquín 6 12 Hummel, Johann Nepomuk 36 86
Bohemian 12 Hummel, Johann Nepomuk 6 21 Heller, Stephen 35 106
Irish 10 Prokofiev, Sergey 6 27
Portuguese 9 Grieg, Edvard 6 15 Tinel, Jef 34 73
Japanese 9 Violette, Andrew 6 28 Medtner, Nikolay 34 60
Armenian 7 Schumann, Robert 33 116
alities of the full GP dataset.

Romanian 7 Godowsky, Leopold 5 6
Nationalities
Chinese 6 Debussy, Claude 5 19 Moszkowski, Moritz 32 48
Cuban 6 Heller, Stephen 5 39 Johnson, Charles Leslie 32 39
Croatian 5 Zhang, Shuwen 5 5 Scharwenka, Xaver 31 48
Chilean 5 Szymanowski, Karol 5 12 Joplin, Scott 31 35
Venezuelan 4 Teilman, Christian 30 30
Filipino 3 Busoni, Ferruccio 5 9 Meyer-Helmund, Erik 30 31
Uruguayan 3 Satie, Erik 4 8 Fauré, Gabriel 30 87
Colombian 3 Mendelssohn, Felix 4 40 Balakirev, Mily 30 46
Belarusian 2 Clementi, Muzio 4 6
Bulgarian 2 Moszkowski, Moritz 4 10 Gottschalk, Louis Morea 29 72
Azerbaijani 2 4 5 Godard, Benjamin 28 51
Indian 2 Albéniz, Isaac Ravina, Jean Henri 27 34
Iranian 2 Fauré, Gabriel 4 13 Debussy, Claude 27 87
Icelandic 2 Lyapunov, Sergey 4 9 Tchaikovsky, Pyotr 26 123
Turkish 1 Weber, Carl Maria von 4 12 Prokofiev, Sergey 26 72
2 https://github.com/bytedance/piano_transcription
Peruvian 1 Villa-Lobos, Heitor 4 19 Poulenc, Francis 26 77
Asian
Lithuanian 1
Afican
Egyptian 1 Grünfeld, Alfred 4 5 Mendelssohn, Felix 24 119
Balakirev, Mily 4 8
Unknown
Paraguayan 1
European
Grieg, Edvard 24 62
Latvian 1 Hindemith, Paul 4 39 Wachs, Paul 23 27
Greek 1 Arensky, Anton 4 10 Villa-Lobos, Heitor 22 75
Ecuadorean 1 4 35
North American
Dvo ák, Antonín
South American
Nigerian 1 Rzewski, Frederic 22 48
predicts all of the pitch, onset, offset, and velocity at-

MIDI files, respectively. The piano transcription system
161.3 and 20.5 hours of aligned piano recordings and
et al., 2019). The training and testing subset contain
set of the MAESTRO dataset version 2.0.0 (Hawthorne
ano transcription system is trained on the training sub-
tems (Kim and Bello, 2019; Kwon et al., 2020). The pi-
system (Hawthorne et al., 2018, 2019) and other sys-
ment over the onsets and frames piano transcription
transcription system2 (Kong et al., 2021), an improve-
MIDI files using an open-sourced high-resolution piano
We transcribe all 10,855 solo piano recordings into
Figure 3: Number of composers with different nation-
Bartók, Béla 4 15 Nazareth, Ernesto 22 27
Raff, Joachim 4 21 Brahms, Johannes 22 104
Vo í ek, Jan Václav 3 4 Rachmaninoff, Sergei 21 44
Lack, Théodore 3 4 Hitz, Franz 21 21
Granados, Enrique 3 5 Busoni, Ferruccio 21 36
Chaminade, Cécile 3 8 Victoria, Tomás Luis de 20 129
Gurlitt, Cornelius 3 3 Scott, James 20 21
Suk, Josef 3 8 Scott, Cyril 20 33
Koechlin, Charles 3 7 Lange, Gustav 20 24
Reger, Max 3 23 Gurlitt, Cornelius 20 23
Wieniawski, Józef 3 4 Stanchinsky, Aleksey 19 21
Schmitt, Florent 3 17 Schütt, Eduard 19 20
Blumenfeld, Felix 3 4 Krzy anowski, Ignacy 19 19
Méreaux, Jean-Amédée Le 3 4 Grünfeld, Alfred 19 32
Glazunov, Aleksandr 3 23 Godowsky, Leopold 19 26
St. Clair, Richard 3 12 Ferrari, Carlotta 19 389
Myaskovsky, Nikolay 3 18 Ornstein, Leo 18 29
4. Statistics
Dupont, Gabriel 3 6 Lyapunov, Sergey 18 29
Bortkiewicz, Sergei 3 6 Harrington, Jeffrey Mic 18 89
Schumann, Clara 3 5 Agnew, Roy 18 18
Poulenc, Francis 3 10 Granados, Enrique 17 21
Bertini, Henri 22 Di Gesu, Massimo 17 26
Stanchinsky, Aleksey 2 3 Clementi, Muzio 17 25
Handel, George Frideric 2 108 Wieniawski, Józef 16 21
Hahn, Reynaldo 2 8 Weber, Carl Maria von 16 48
Beach, Amy Marcy 2 5 Ravel, Maurice 16 51
Séverac, Déodat de 2 3 Myaskovsky, Nikolay 16 48
notes on a modern piano.

Gottschalk, Louis Morea 2 13 Mourey, Colette 16 43
Rubinstein, Anton 2 25 Merino Martínez, Aitor 16 35
Ravel, Maurice 2 12 Marmontel, Antoine Fran 16 16
Meyer-Helmund, Erik 2 2 Lyadov, Anatoly 16 27
Casella, Alfredo 2 8 Hiller, Ferdinand 16 40
Ornstein, Leo 2 4 Glinka, Mikhail 16 34
Cervantes, Ignacio 2 2 Glazunov, Aleksandr 16 81
Ferrari, Carlotta 2 36 Casella, Alfredo 16 32
Novák, Vít zslav 2 7 Blumenfeld, Felix 16 19
MacDowell, Edward 2 5 Bertini, Henri 16 16
Cui, César 2 7 Violette, Andrew 15 58
Sgambati, Giovanni 2 5 St. Clair, Richard 15 51
Joplin, Scott 2 4 Shirley, Nathan 15 26
Dussek, Jan Ladislav 2 9 Schumann, Clara 15 26
Martin , Bohuslav 2 10 Rowley, Alec 15 37
Figure 1: Number of solo piano works of the curated GP dataset. Top 100 are shown.
Figure 2: Duration of solo piano works of the curated GP dataset. Top 100 are shown.
Wagner, Richard 2 6 Nichifor, Serban 15 155

Ravina, Jean Henri 2 2 Macchi, Claudio 15 57
Scott, Cyril 2 3 Kuhlau, Friedrich 15 38
outputs are post-processed into MIDI events.

Teilman, Christian 2 2 Henselt, Adolf von 15 20
Schütt, Eduard 2 2 Hahn, Reynaldo 15 63
Kosenko, Viktor 2 3 Cui, César 15 33
Harrington, Jeffrey Mic 2 12 Bortkiewicz, Sergei 15 22
Franck, César 2 16 Bendel, Franz 15 17
Marmontel, Antoine Fran 2 2 Beach, Amy Marcy 15 29
Godard, Benjamin 2 8 Bartók, Béla 15 47
2 2 15 33
Complete works
Complete works
Bendel, Franz Arensky, Anton

Solo piano works
Solo piano works
Anonymous: GiantMIDI-Piano: A large-scale MIDI Dataset for Classical Piano Music
Mourey, Colette 2 6 Albéniz, Isaac 15 23
the number and duration of music works composed by

We analyse the statistics of GiantMIDI-piano, including
ties of pitch, onset, offset, and velocity. Finally, those
transcription system outputs the predicted probabili-
trogram as input feature (Hawthorne et al., 2019). The
banks with 229 bins are used to extract log mel spec-
so there are 100 frames in a second. Then, mel filter
size 2048 and a hop size 160 to extract spectrograms,
time Fourier transform (STFT) with a Hann window
mono with a sampling rate of 16 kHz. We use a short-
In inference, all piano recordings are converted into
indicating the onset or offset probabilities of pedals.
there is only one output after the CRNN sub-module
tecture as the note transcription system, except that
The pedal transcription system has the same archi-
ule has a dimension of 88, equivalent to the number of
current units (GRU) layers. The output of each mod-
convolutional layers and two bi-directional gated re-
lutional recurrent neural network (CRNN) with eight
sub-module. Each sub-module is modeled by a convo-
gression, an offset regression, and a velocity regression
tem consists of a frame-wise classification, an onset re-
sustain pedals. For piano note transcription, our sys-
tributes of notes. The transcribed results also include
1,200,000
Number of notes
1,000,000
800,000
600,000
400,000
200,000
0
A0 A 0 B0 C1 C 1 D1 D 1 E1 F1 F 1 G1 G 1 A1 A 1 B1 C2 C 2 D2 D 2 E2 F2 F 2 G2 G 2 A2 A 2 B2 C3 C 3 D3 D 3 E3 F3 F 3 G3 G 3 A3 A 3 B3 C4 C 4 D4 D 4 E4 F4 F 4 G4 G 4 A4 A 4 B4 C5 C 5 D5 D 5 E5 F5 F 5 G5 G 5 A5 A 5 B5 C6 C 6 D6 D 6 E6 F6 F 6 G6 G 6 A6 A 6 B6 C7 C 7 D7 D 7 E7 F7 F 7 G7 G 7 A7 A 7 B7 C8
Figure 4: Note histogram of the curated GP dataset.

Bach, Johann Sebastian
30,000
Number of notes
20,000
10,000
0 AA B
0 0 0 C1 C 1 D1 D 1 E1 F1 F 1 G1 G 1 A1 A 1 B1 C2 C 2 D2 D 2 E2 F2 F 2 G2 G 2 A2 A 2 B2 C3 C 3 D3 D 3 E3 F3 F 3 G3 G 3 A3 A 3 B3 C4 C 4 D4 D 4 E4 F4 F 4 G4 G 4 A4 A 4 B4 C5 C 5 D5 D 5 E5 F5 F 5 G5 G 5 A5 A 5 B5 C6 C 6 D6 D 6 E6 F6 F 6 G6 G 6 A6 A 6 B6 C7 C 7 D7 D 7 E7 F7 F 7 G7 G 7 A7 A 7 B7 C8
Beethoven, Ludwig van

30,000
Number of notes
20,000
10,000
0 AA B
Liszt, Franz
30,000
Number of notes
20,000
10,000
0 AA B
Figure 5: Notes histogram of J.S. Bach, Beethoven, and Liszt of the curated GP dataset.
different composers, the nationality of composers, and bert at 20 hours. Some composers composed more
the distribution of notes by composers. Then, we in- non-piano works than solo piano works. For example,
vestigate the statistics of six composers from different there are 108 hours of complete works composed by
eras by calculating their pitch class, interval, trichord, Handel in the dataset, while only 2 hours of them are
and tetrachord distributions. All of Fig. 1 to Fig. 11 ex- played on a modern piano. The rank of composers in
cept Fig. 3 are plotted with the statistics of the curated Fig. 2 is different from Fig. 1, indicating that the aver-
GiantMIDI-Piano dataset. Fig. 3 shows the manually- age duration of solo piano works composed by differ-
checked nationalities of 2,786 composers in the full ent composers are different.
GiantMIDI-Piano dataset.
4.3 Nationalities of Composers
4.1 The Number of Solo Piano Works Fig. 3 shows the number of composers with differ-
Fig. 1 shows the numbers of piano works composed ent nationalities sorted in descending order of the full
by different composers sorted in descending order of GiantMIDI-Piano dataset. The nationality of 2,786
the curated GiantMIDI-Piano dataset. Fig. 1 shows the composers are initially obtained from Wikipedia and
statistics of top 100 composers out of 2,786 composers. are later manually checked3 . Fig. 3 shows that
Blue bars show the number of solo piano works. Pink there are 671 composers with unknown nationality.
bars show the number of complete works, including There are 364 German composers, followed by 322
both solo piano and non-solo piano works. Fig. 1 French composers and 267 American composers. We
shows that there are 141 solo piano works composed color-code the continent of nationalities from “Un-
by Liszt, followed by 140 and 129 solo piano works known”, “European”, “North American”, “South Amer-
composed by Scarlatti and J. S. Bach. Some composers, ican”, “Asian”, to “African”. In GiantMIDI-Piano, the
such as Chopin composed more solo piano works than nationalities of most composers are European. The
non-solo piano works. For example, there are 96 solo numbers of composers with nationalities from South
piano works out of 109 complete works composed by American, Asian, and African are fewer.
Chopin in the curated GiantMIDI-Piano dataset. Fig. 1
shows that the number of solo piano works of different 4.4 Note Histogram
composers has a long tail distribution. Fig. 4 shows the note histogram of the curated
GiantMIDI-Piano dataset. There are 24,253,495 tran-
4.2 The Duration of Solo Piano Works scribed notes. The horizontal axis shows the scientific
pitch notations, which covers 88 notes on a modern
Fig. 2 shows the duration of solo piano works com-
piano from A0 to C8 . Middle C is denoted as C4 . We
posed by different composers sorted in descending or-
der of the curated GiantMIDI-Piano dataset. The du- 3 There are debates on the nationality of some composers. We ex-
ration of works composed by Liszt is the longest at 25 tract the nationality of composers from Wikipedia and do not discuss
hours, followed by Beethoven at 21 hours and Schu- region debates in this work.
Number of notes per second Note names
6
C3
C4
C5
0
5
10
15
20
Grünfeld, Alfred 4.77 Harrington, Jeffrey Mic 59.66
Méreaux, Jean-Amédée Le 4.85 Casella, Alfredo 60.39
Marmontel, Antoine Fran 5.39 Mourey, Colette 60.71
Scharwenka, Xaver 5.51 Franck, César 61.26
Koechlin, Charles 6.14 Wagner, Richard 61.36
Lack, Théodore 6.20 Godard, Benjamin 61.38
Cervantes, Ignacio 6.53 Satie, Erik 61.51
Satie, Erik 6.70 Myaskovsky, Nikolay 61.76
Rebikov, Vladimir 7.00 Busoni, Ferruccio 61.82
Handel, George Frideric 7.08 Prokofiev, Sergey 61.90
Rzewski, Frederic 7.39 Schumann, Robert 61.91
Ferrari, Carlotta 7.53 Rubinstein, Anton 62.08
Fauré, Gabriel 7.55 Brahms, Johannes 62.23
Dupont, Gabriel 7.59 Mendelssohn, Felix 62.32
MacDowell, Edward 7.64 Grieg, Edvard 62.36
Grieg, Edvard 7.78 Ferrari, Carlotta 62.39
Bach, JohannSebastian 7.91 Gurlitt, Cornelius 62.42
Suk, Josef 7.92 Bartók, Béla 62.55
Granados, Enrique 7.93 Sgambati, Giovanni 62.64
Sgambati, Giovanni 7.96 Rzewski, Frederic 62.73
Turina, Joaquín 8.05 Scriabin, Aleksandr 62.98
Franck, César 8.06 Beethoven, Ludwigvan 62.99
Myaskovsky, Nikolay 8.13 Bortkiewicz, Sergei 63.25
Haydn, Joseph 8.26 Tchaikovsky, Pyotr 63.32
Gurlitt, Cornelius 8.29 Medtner, Nikolay 63.45
Rubinstein, Anton 8.32 Stanchinsky, Aleksey 63.51
Cui, César 8.33 Teilman, Christian 63.52
Bartók, Béla 8.34 Godowsky, Leopold 63.59
played more often than black keys.

Hahn, Reynaldo 8.34 Schumann, Clara 63.65
the whole range of a modern piano.

Meyer-Helmund, Erik 8.50 MacDowell, Edward 63.66
Debussy, Claude 8.53 Bach, JohannSebastian 63.69
Séverac, Déodat de 8.58 Cui, César 63.69
Scott, Cyril 8.65 Alkan, Charles-Valentin 63.72
Mourey, Colette 8.66 Rachmaninoff, Sergei 63.75
Stanchinsky, Aleksey 8.72 Scott, Cyril 63.78
St. Clair, Richard 8.77 Balakirev, Mily 63.81
Carbajo, Víctor 8.80 Dupont, Gabriel 63.82
4.5 Pitch Distribution of Top 100 Composers

Bendel, Franz 8.88 Vo í ek, Jan Václav 63.84
Violette, Andrew 8.90 Méreaux, Jean-Amédée Le 63.86
Schmitt, Florent 8.92 Ornstein, Leo 63.94
Scriabin, Aleksandr 8.94 Kosenko, Viktor 63.95
Ornstein, Leo 8.99 Weber, Carl Maria von 63.95
Wagner, Richard 9.00 Simpson, Daniel Léo 64.00
Beach, Amy Marcy 9.02 Chopin, Frédéric 64.01
Novák, Vít zslav 9.08 Dvo ák, Antonín 64.10
Simpson, Daniel Léo 9.11 Lyapunov, Sergey 64.11
of most composers are between C4 and C5 , where C4
erage pitch value of C4 . Carl Czerny has the highest

butions. Jeffrey Michael Harrington has the lowest av-
dicate the one standard deviation area of pitch distri-
corresponds to a MIDI pitch value 60. The shades in-
curated GiantMIDI-Piano dataset. The average pitches
ing order over the top 100 composers in Fig. 2 of the
Fig. 6 shows the pitch distribution sorted in ascend-
octaves. The note range of Liszt is the widest, covering
is mostly between F1 and C7 covering five and a half
harpsichord or organ. The note range of Beethoven
is consistent with the note range of a conventional
mostly between C2 and C6 covering four octaves, which
Beethoven, and Liszt. The note range of J. S. Bach is
composers from different eras, including J. S. Bach,
Fig. 5 visualizes the note histogram of three
the octave between C4 and C5 . White keys are being
notes far from G4 . The most played notes are within
note is G4 . There are more notes close to G4 and less
histogram has a normal distribution. The most played
modern piano, respectively. Fig. 4 shows that the note
black bars correspond to the white and black keys on a
note C]/D[ is simply denoted as C]. The white and
do not distinguish enharmonic notes, for example, a
Hindemith, Paul 9.16 Hahn, Reynaldo 64.12
Martin , Bohuslav 9.17 Schubert, Franz 64.12
Balakirev, Mily 9.20 Hindemith, Paul 64.12
Mozart, WolfgangAmadeu 9.22 Handel, George Frideric 64.12
Brahms, Johannes 9.23 Fauré, Gabriel 64.16
Prokofiev, Sergey 9.26 Poulenc, Francis 64.32
Scarlatti, Domenico 9.35 Joplin, Scott 64.36
Godowsky, Leopold 9.50 Debussy, Claude 64.38
Wieniawski, Józef 9.52 Scharwenka, Xaver 64.38
Albéniz, Isaac 9.53 Marmontel, Antoine Fran 64.42
Beethoven, Ludwigvan 9.56 Granados, Enrique 64.48
Tchaikovsky, Pyotr 9.62 Villa-Lobos, Heitor 64.50
Schumann, Clara 9.69 Ravina, Jean Henri 64.55
Reger, Max 9.70 Lack, Théodore 64.59
Ravel, Maurice 9.80 Wieniawski, Józef 64.61
Busoni, Ferruccio 9.84 Mozart, WolfgangAmadeu 64.81
Bortkiewicz, Sergei 9.92 Suk, Josef 64.83
Heller, Stephen 9.96 Hummel, Johann Nepomuk 64.89
Szymanowski, Karol 9.97 Meyer-Helmund, Erik 64.91
Hummel, Johann Nepomuk 9.97 Moszkowski, Moritz 64.93
Teilman, Christian 9.98 Martin , Bohuslav 64.94
Chopin, Frédéric 9.99 Cervantes, Ignacio 64.95
Godard, Benjamin 9.99 Turina, Joaquín 64.98
Schumann, Robert 10.03 Carbajo, Víctor 65.08
Schütt, Eduard 10.07 Schütt, Eduard 65.14
Clementi, Muzio 10.09 Séverac, Déodat de 65.15
Top 100 Composers

Casella, Alfredo 10.11 Szymanowski, Karol 65.24
Poulenc, Francis 10.14 Chaminade, Cécile 65.24
average pitch value of A4 .

Blumenfeld, Felix 10.35 Heller, Stephen 65.35
65.38
4.7 Pitch Class Distribution

Dvo ák, Antonín 10.60 Koechlin, Charles
Arensky, Anton 10.64 Reger, Max 65.39
Schubert, Franz 10.72 Dussek, Jan Ladislav 65.48
Chaminade, Cécile 10.79 Rebikov, Vladimir 65.49
Moszkowski, Moritz 10.86 Arensky, Anton 65.51
Lyapunov, Sergey 10.91 Liszt, Franz 65.69
Ravina, Jean Henri 10.94 Clementi, Muzio 65.85
Villa-Lobos, Heitor 10.94 Haydn, Joseph 65.93
Kosenko, Viktor 10.98 Blumenfeld, Felix 65.96
Figure 6: Pitch distribution of top 100 composers of the curated GP dataset.
Dussek, Jan Ladislav 11.14 Schmitt, Florent 66.06

Mendelssohn, Felix 11.25 Novák, Vít zslav 66.12
notes per second with a value of 13.61.

Glazunov, Aleksandr 11.36 Bendel, Franz 66.14
Harrington, Jeffrey Mic 11.50 Ravel, Maurice 66.32
Medtner, Nikolay 11.54 Scarlatti, Domenico 66.37
Gottschalk, Louis Morea 11.63 Beach, Amy Marcy 66.63
Rachmaninoff, Sergei 11.66 Albéniz, Isaac 66.70
Figure 7: The number of notes per second of top 100 composers of the curated GP dataset.
Weber, Carl Maria von 11.81 Bertini, Henri 66.74

Vo í ek, Jan Václav 11.84 Glazunov, Aleksandr 66.89
Joplin, Scott 11.92 Violette, Andrew 67.28
Liszt, Franz 11.94 Zhang, Shuwen 67.39
Zhang, Shuwen 12.13 Raff, Joachim 67.91
Bertini, Henri 12.35 St. Clair, Richard 68.09
Raff, Joachim 12.50 Grünfeld, Alfred 68.17
Alkan, Charles-Valentin 14.16 Gottschalk, Louis Morea 68.40
Czerny, Carl 14.74 Czerny, Carl 68.69
Anonymous: GiantMIDI-Piano: A large-scale MIDI Dataset for Classical Piano Music
Beethoven used more C, D, and G than other notes.

works and used more A]/B[ than other composers.
Mozart used C, D, F, and G most in his solo piano
Bach used D, E, G, and A most in his solo piano works.
Chopin, Liszt, and Debussy. Fig. 8 shows that J. S.
different eras including J. S. Bach, Mozart, Beethoven,
We calculate the statistics of six composers from
to B are denoted as 0 to 11 (Forte, 1973), respectively.
{C, C], D, D], E, F, F], G, G], A, A], B}. The notes from C
We denote the set of pitch names as
a value of 4.18. Carl Czerny has the largest number of
feld has the smallest number of notes per second with
number of notes per second distribution. Alfred Grün-
shades indicate the one standard deviation area of the
second of most composers are between 5 and 10. The
tion of a composer. The average numbers of notes per
by dividing all works notes number by all works dura-
dataset. The number of notes per second is calculated
composers in Fig. 2 of the curated GiantMIDI-Piano
tribution sorted in ascending order over the top 100
Fig. 7 shows the number of notes per second dis-
4.6 The Number of Notes Per Second Distribution of
Bach, Johann Sebastian Mozart, Wolfgang Amadeus Beethoven, Ludwig van

0.15 0.15 0.15
0.10 0.10 0.10

Frequency
Frequency
Frequency
0.05 0.05 0.05
0.00 0.00 0.00

CC DD E F F GG A A B CC DD E F F GG A A B CC DD E F F GG A A B
Chopin, Frédéric Liszt, Franz Debussy, Claude
0.15 0.15 0.15
0.10 0.10 0.10

Frequency
Frequency
Frequency
0.05 0.05 0.05
0.00 0.00 0.00

CC DD E F F GG A A B CC DD E F F GG A A B CC DD E F F GG A A B
Figure 8: Pitch class distribution of six composers of

the curated GP dataset. Figure 10: Trichord distribution of six composers of
the curated GP dataset. Rel. frequency is the ab-
Bach, Johann Sebastian Mozart, Wolfgang Amadeus Beethoven, Ludwig van
0.15 0.15 0.15 breviation for relative frequency.
Frequency
0.10 0.10 0.10

0.05 0.05 0.05
0.00 0.00 0.00
-11 -9 -7 -5 -3 -1 1 3 5 7 9 11 -11 -9 -7 -5 -3 -1 1 3 5 7 9 11 -11 -9 -7 -5 -3 -1 1 3 5 7 9 11
-10 -8 -6 -4 -2 0 2 4 6 8 10 -10 -8 -6 -4 -2 0 2 4 6 8 10 -10 -8 -6 -4 -2 0 2 4 6 8 10
Chopin, Frédéric Liszt, Franz Debussy, Claude
0.15 0.15 0.15
Frequency
0.10 0.10 0.10

0.05 0.05 0.05
0.00 0.00 0.00
-11 -9 -7 -5 -3 -1 1 3 5 7 9 11 -11 -9 -7 -5 -3 -1 1 3 5 7 9 11 -11 -9 -7 -5 -3 -1 1 3 5 7 9 11
-10 -8 -6 -4 -2 0 2 4 6 8 10 -10 -8 -6 -4 -2 0 2 4 6 8 10 -10 -8 -6 -4 -2 0 2 4 6 8 10
Figure 9: Interval distribution of six composers of the

curated GP dataset.
Figure 11: Tetrachord distribution of six composers of
Chopin used D]/E[ and G]/A[ most in his solo piano the curated GP dataset.
works. Liszt and Debussy used all twelve pitch classes
more uniformly in their solo piano works than other
composers. Liszt used E most, and Debussy used C]/D[ Bach and Mozart used more downward major second
most. As expected, most of Baroque and classical solo than the upward major second. In the works of J. S.
piano works were in keys close to C, whereas Romantic Bach, the dip in the interval 0 indicates that repeated
and later composers explored distant keys and tended notes are less commonly used than non-repeated notes.
to use all twelve pitch classes more uniformly. Other composers used more repeated notes than J. S.
Bach. Major seventh and tritone are least used by all
4.8 Interval Distribution composers. Some Romantic and later composers, in-
An interval is the pitch difference between two notes. cluding Chopin, Liszt, and Debussy used all intervals
Intervals can be either melodic intervals or harmonic more uniformly than J. S. Bach from Baroque era.
intervals. A harmonic interval is the pitch difference of
two notes being played at the same time. A melodic 4.9 Trichord Distribution
interval is the pitch difference between two succes- We adopt the set musical theory (Forte, 1973) to anal-
sive notes. We consider both harmonic intervals and yse the chord distribution in GiantMIDI-Piano. A tri-
melodic intervals as intervals. We calculate the distri- chord is a set of any three pitch-classes (Forte, 1973).
bution of intervals of six composers. Notes are repre- Since GiantMIDI-Piano is transcribed from real record-
sented as a list of events in a MIDI format. We calculate ings, notes of a chord are usually not played simulta-
an interval as: neously. We consider notes within a window of 50 ms
∆ = y n − y n−1 , (2) as a chord. The windows are non-overlapped. Each
note only belongs to one chord. For a special case of
where y n is the MIDI pitch of a note and n is the index a set of onsets at 0, 25, 50, 75, and 100 ms, our sys-
of the note. tem first searches chords in a window starting from 0
We calculate ordered intervals including both posi- ms and returns {0, 25, 50}. Then, the system searches
tive intervals and negative intervals. For example, the chords in a window starting from 75 ms and returns
interval ∆ for an upward progress from C4 to D4 is 2, {75, 100}. We discard the pitch sets with more or
and the interval ∆ for a downward progress from C4 less than three notes within a 50 ms window to ensure
to A3 is −3. We only consider intervals between -11 chords are trichords. The sliding window for count-
to 11 (included) and discard the intervals outside this ing pitch sets will ensure that there are no overlapped
region. For example, the value 11 indicates a major when counting trichords. A major triad can be writ-
seventh interval. Fig. 9 shows the interval distribution ten as {0, 4, 7}, where the interval between 0 and 4 is a
of six composers. All composers used major second and major third, and the interval between 4 and 7 is a mi-
minor third most in their works. The interval distribu- nor third. We transpose all chords to chords with lower
tion is not symmetric to the origin. For example, J. S. notes C. For example, a chord {2, 6, 9} is transposed into
Table 2: Accuracy of retrieved music works of six com-

1.0
posers.
0.8
J. S. Bach Mozart Beethoven Chopin Liszt Debussy
Correct 147 85 82 102 197 29
0.6
Scores
Incorrect 102 35 70 171 22 9

Accuracy 59% 71% 54% 37% 90% 76%
0.4
Table 3: Accuracy of retrieved music works of six com-
Precision
0.2 Recall
posers after surname constraint.
F1
0.0 J. S. Bach Mozart Beethoven Chopin Liszt Debussy
0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 Correct 129 72 76 96 141 27
Thresholds Incorrect 44 16 5 21 6 3
Accuracy 75% 82% 94% 82% 96% 90%
Figure 12: Precision, recall, and F1 score of solo piano
detection.
86.67%, and 88.14%, respectively. In this work, we set
{0, 4, 7}. We merge chords with the same prime form. the threshold to 0.5 to balance the precision and recall
Fig. 10 shows the trichord distribution of six com- for solo piano detection.
posers. All composers used major triad {0, 4, 7} most
followed by minor triad {0, 3, 7} in their works. Liszt 5.2 Metadata Evaluation
used more augmented triad {0, 4, 8} than other com- We randomly select 200 solo piano works from the
posers. Debussy used more {0, 2, 5}, {0, 2, 4} than other full GiantMIDI-Piano dataset and manually check how
composers which distinguished him from other com- many audio recordings and metadata are matched.
posers. Fig. 10 shows that composers from different We observe that 174 out of 200 solo piano works are
eras have different preferences for using trichords. correctly matched, leading to a metadata accuracy of
87%. Most errors are caused by mismatched composer
4.10 Tetrachord Distribution names. For example, when the keyword X is Chartier,
A tetrachord is a set of any four pitch-classes (Forte, Mathieu, Nocturne No.1 composed by Chartier, the re-
1973). Similar to trichord, we consider notes within trieved YouTube title Y is Nocturne No. 1 composed by
a window of 50 ms as a chord. We discard the pitch Chopin. After surname constraint, 136 out of 140 solo
sets with more or less than four notes within a 50 ms piano works are correctly matched, leading to a preci-
window to ensure chords are tetrachords. A domi- sion of 97.14%. We also observe that there are 180 live
nant seventh chord can be denoted as {0, 4, 7, 10}. Fig. performances and 20 sequenced MIDI files out of 200
11 shows the tetrachord distribution of six composers. solo piano works.
Seventh chords such as {0, 2, 6, 9} are transposed to root Furthermore, Table 2 shows the number of matched
position seventh chords {0, 4, 7, 10}. Fig. 11 shows that music works composed by six different composers. Cor-
Bach, Beethoven and Mozart, and Chopin used dom- rect indicates that the retrieved solo piano works are
inant seventh {0, 4, 7, 10} most. Liszt used diminished indeed composed by the composer. Incorrect indicates
seventh {0, 3, 6, 9} most and Debussy used minor seventh that the retrieved music works are not composed by
{0, 3, 7, 10} most. J. S. Bach used less dominant seventh the composer but are composed by someone else and
compared to the other five composers. The tetrachord are attached to the composer. Without surname con-
distribution of Debussy is different from other com- straint, Liszt achieves the highest match accuracy of
posers. Fig. 11 shows that composers from different 90%, while Chopin achieves the lowest match accu-
eras have different preferences for using tetrachords. racy of 37%. Table 3 shows that the match accuracy
of Chopin increases from 37% to 82% after surname
5. Evaluation of GiantMIDI-Piano constraint. The accuracy of other composers also in-
5.1 Solo Piano Evaluation creases. The curated GiantMIDI-Piano dataset contains
We evaluate the solo piano detection system as follows. 7,236 MIDI files composed by 1,787 composers. We
We manually label 200 randomly selected music works use a youtube_title_contains_surname flag in the meta-
from 60,724 downloaded audio recordings. We calcu- data file to indicate whether the surname is verified.
late the precision, recall, and F1 scores of the solo pi-
ano detection system with different thresholds ranging 5.3 Piano Transcription Evaluation
from 0.1 to 0.9 and show results in Fig. 12. Horizontal The piano transcription system achieves a state-of-the-
and vertical axes show different thresholds and scores, art onset F1 score of 96.72%, onset and offset F1 score
respectively. Fig. 12 shows that higher thresholds lead of 82.47%, and an onset, offset, and velocity F1 score
to higher precision but lower recall. When we set of 80.92% on the test set of the MAESTRO dataset.
the threshold to 0.5, the solo piano detection system The sustain pedal transcription system achieves an on-
achieves a precision, recall, and F1 score of 89.66%, set F1 of 91.86%, and a sustain-pedal onset and off-
Maestro GiantMIDI-Piano Relative difference

0.30 0.30 0.30
0.25 0.25 0.25
0.20 0.20 0.20
0.15 0.15 0.15
0.10 0.10 0.10
0.05 0.05 0.05
0.00 0.00 0.00

S D I ER S D I ER S D I ER
Figure 13: From left to right: ER of 52 solo piano works in the MAESTRO dataset; ER of 52 solo piano works in
the GiantMIDI-Piano dataset; Relative ER between the MAESTRO and the GiantMIDI-Piano dataset.
Table 4: Piano transcription evaluation on the
set F1 of 86.58%. The piano transcription system GiantMIDI-Piano dataset
outperforms the previous onsets and frames system
Hawthorne et al. (2018, 2019) with an onset F1 score D I S ER
of 94.80%.
We evaluate the quality of GiantMIDI-Piano on 52 Maestro 0.009 0.024 0.018 0.061
music works that appear in all of the GiantMIDI-Piano, GiantMIDI-Piano 0.015 0.051 0.069 0.154
the MAESTRO, and the Kunsterfuge (Kunstderfuge, Relative difference 0.006 0.026 0.047 0.094
2002) datasets. Long music works such as Sonatas
are split into movements. Repeated music sections are
removed. Evaluating GiantMIDI-Piano is a challeng- come from that a pianist may accidentally miss or add
ing problem because there are no aligned ground-truth notes while performing (Repp, 1996). The transcrip-
MIDI files, so the metrics in (Hawthorne et al., 2018) tion errors e transcriptionG come from piano transcription
are not usable. In this work, we propose to use an system errors. The alignment errors e alignmentG come
alignment metric (Nakamura et al., 2017) called error from the sequence alignment algorithm (Nakamura
rate (ER) to evaluate the quality of transcribed MIDI et al., 2017).
files. This metric reflects the substitution, deletion, Audio recordings and MIDI files are perfectly
and insertion between a transcribed MIDI file and a aligned in the MAESTRO dataset, so there are no tran-
target MIDI file. For a solo piano work, we align a scription errors. The ER can be written as:
transcribed MIDI file with its sequenced MIDI version
using a hidden Markov model (HMM) tool (Nakamura E R M = e performanceM + e alignmentM , (5)
et al., 2017), where the sequenced MIDI files are from
the Kunsterfuge (Kunstderfuge, 2002) dataset. The ER where the subscript M is the abbreviation for MAE-
is defined as the summation of substitution, insertion, STRO. For a same music work, we assume an approx-
and deletion: imation e performanceG ≈ e performanceM despite the perfor-
S +D +I mance among pianists are different. Similarly, we as-
ER = , (3)
N sume an approximation e alignmentG ≈ e alignmentM despite
where N is the number of reference notes, and S , I , the alignment errors are different.
and D are the number of substitution, insertion, and Those approximations are more accurate when the
deletion, respectively. Substitution indicates that some levels of the two pianists are closer. Then, we propose
notes are replacements of ground truth notes. Inser- a relative error by subtracting (4) and (5):
tion indicates that extra notes are being played. Dele-
tion indicates that some notes are missing. Lower ER r = E R G − E R M ≈ e transcriptionG . (6)
indicates better transcription performance.
The relative error r is a rough approximation of the
The ER of music works from GiantMIDI-Piano con-
transcription errors e transcriptionG . A lower r value indi-
sists of three parts: 1) performance errors, 2) transcrip-
cates better transcription quality.
tion errors, and 3) alignment errors:
Table 4 shows the alignment performance. The
E R G = e performanceG + e transcriptionG + e alignmentG (4) median alignment S M , D M , I M and E R M on the MAE-
STRO dataset are 0.009, 0.024, 0.021 and 0.061 re-
where the subscript G is the abbreviation for spectively. The median alignment S G , D G , I G and
GiantMIDI-Piano. The performance errors e performanceG E R G on the GiantMIDI-Piano dataset are 0.015, 0.051,
0.069 and 0.154 respectively. The relative error r be- Bryner, B. (2002). The piano roll: a valuable record-
tween MAESTRO and GiantMIDI-Piano is 0.094. The ing medium of the twentieth century. PhD thesis,
first column of Fig. 13 shows the box plot metrics Department of Music, University of Utah.
of MAESTRO. Some outliers are omitted from the fig- Cancino-Chacón, C. E., Grachten, M., Goebl, W., and
ures for better visualization. Some outliers are mostly Widmer, G. (2018). Computational models of ex-
caused by different interpretations of trills and tremo- pressive music performance: A comprehensive and
los. The second column of Fig. 13 shows the box plot critical review. Frontiers in Digital Humanities,
metrics of GiantMIDI-Piano. In GiantMIDI-Piano, Key- 5:25.
board Sonata in E-Flat Major, Hob. XVI/49 composed Casey, M. A., Veltkamp, R., Goto, M., Leman, M.,
by Haydn achieves the lowest ER of 0.037, while Pre- Rhodes, C., and Slaney, M. (2008). Content-
lude and Fugue in A-flat major, BWV 862 composed by based music information retrieval: Current direc-
Bach achieves the highest ER of 0.679 (outlier beyond tions and future challenges. Proceedings of the
the plot range). This underperformance is due to the IEEE, 96(4):668–696.
piano is not tuned to a standard pitch with A4 of 440 Choi, K., Fazekas, G., Cho, K., and Sandler, M. (2017).
Hz. The third column of Fig. 13 shows the relative ER A tutorial on deep learning for music information
between MAESTRO and GiantMIDI-Piano. The relative retrieval. arXiv preprint arXiv:1709.04396.
median scores of S, D, I and ER are 0.006, 0.026, 0.047 Classical Archives (2000). Classical Archives. www.
and 0.094 respectively. Fig. 13 also shows that there classicalarchives.com.
are fewer deletions than insertions. Conklin, D. and Witten, I. H. (1995). Multiple view-
point systems for music prediction. Journal of New
6. Conclusion Music Research, 24(1):51–73.
We collect and transcribe a large-scale GiantMIDI- Duan, Z., Pardo, B., and Zhang, C. (2010). Multi-
Piano dataset containing 38,700,838 transcribed pi- ple fundamental frequency estimation by modeling
ano notes from 10,855 unique classical piano works spectral peaks and non-peak regions. IEEE Trans-
composed by 2,786 composers. The total duration actions on Audio, Speech, and Language Processing,
of GiantMIDI-Piano is 1,237 hours. The curated sub- 18(8):2121–2133.
set contains 24,253,495 piano notes from 7,236 works Emiya, V., Bertin, N., David, B., and Badeau, R. (2010).
composed by 1,787 composers. GiantMIDI-Piano is Maps-a piano database for multipitch estimation
transcribed from YouTube audio recordings searched and automatic transcription of music. In Research
by meta-information from IMSLP. Report, INRIA-00544155f.
The solo piano detection system used in GiantMIDI- Forte, A. (1973). The structure of atonal music, volume
Piano achieves an F1 score of 88.14%, and the piano 304. Yale University Press.
transcription system achieves a relative error rate of Foscarin, F., Mcleod, A., Rigaux, P., Jacquemard, F.,
0.094. The limitations of GiantMIDI-Piano include: and Sakai, M. (2020). ASAP: a dataset of aligned
1) There are no pitch spellings to distinguish enhar- scores and performances for piano transcription.
monic notes. 2) GiantMIDI-Piano does not provide In International Society for Music Information Re-
beats, time signatures, key signatures, and scores. 2) trieval (ISMIR).
GiantMIDI-Piano does not disentangle the music score Glorot, X., Bordes, A., and Bengio, Y. (2011). Deep
and the expressive performance of pianists. sparse rectifier neural networks. In Proceedings of
We have released the source code for acquiring the Conference on Artificial Intelligence and Statis-
GiantMIDI-Piano. In the future, GiantMIDI-Piano can tics, pages 315–323.
be used in many research areas, including but not lim- Good, M. et al. (2001). MusicXML: An internet-friendly
ited to musical analysis, music generation, music infor- format for sheet music. In XML conference and expo,
mation retrieval, and expressive performance analysis. pages 03–04. Citeseer.
Hashida, M., Matsui, T., and Katayose, H. (2008).
7. Acknowledgement A new music database describing deviation infor-
We thank all anonymous reviewers, editors, and copy mation of performance expressions. In Interna-
editors for their substantial reviews of this manuscript. tional Society for Music Information Retrieval (IS-
We thank Prof. Xiaofeng Zhang for passing his compo- MIR), pages 489–494.
sition knowledge to Qiuqiang Kong during his under- Hawthorne, C., Elsen, E., Song, J., Roberts, A., Si-
graduate study at the South China University of Tech- mon, I., Raffel, C., Engel, J., Oore, S., and Eck,
nology. D. (2018). Onsets and frames: Dual-objective pi-
ano transcription. In International Society for Music
References Information Retrieval (ISMIR).
Bainbridge, D. and Bell, T. (2001). The challenge of Hawthorne, C., Stasyuk, A., Roberts, A., Simon, I.,
optical music recognition. Computers and the Hu- Huang, C. A., Dieleman, S., Elsen, E., Engel, J.,
manities, 35(2):95–121. and Eck, D. (2019). Enabling factorized piano
music modeling and generation with the maestro Proceedings of the XIV Colloquium on Musical Infor-
dataset. International Conference on Learning Rep- matics, volume 1, pages 167–171. Citeseer.
resentations (ICLR). Niwattanakul, S., Singthongchai, J., Naenudorn, E.,
Huang, C.-Z. A., Hawthorne, C., Roberts, A., Din- and Wanapu, S. (2013). Using of Jaccard coef-
culescu, M., Wexler, J., Hong, L., and Howcroft, ficient for keywords similarity. In Proceedings of
J. (2019). The Bach Doodle: Approachable music the International MultiConference of Engineers and
composition with machine learning at scale. In In- Computer Scientists (IMECS), pages 380–384.
ternational Society for Music Information Retrieval Raffel, C. (2016). Learning-based methods for com-
(ISMIR). paring sequences, with applications to audio-to-midi
Huang, C.-Z. A., Vaswani, A., Uszkoreit, J., Simon, I., alignment and matching. PhD thesis, Columbia
Hawthorne, C., Shazeer, N., Dai, A. M., Hoffman, University.
M. D., Dinculescu, M., and Eck, D. (2018). Mu- Rebelo, A., Fujinaga, I., Paszkiewicz, F., Marcal, A. R.,
sic transformer: Generating music with long-term Guedes, C., and Cardoso, J. S. (2012). Optical mu-
structure. In International Conference on Learning sic recognition: state-of-the-art and open issues.
Representations (ICLR). International Journal of Multimedia Information Re-
IMSLP (2006). International Music Score Library trieval, 1(3):173–190.
Project. imslp.org. Repp, B. H. (1996). The art of inaccuracy: Why pi-
Ioffe, S. and Szegedy, C. (2015). Batch normalization: anists’ errors are difficult to hear. Music Perception,
Accelerating deep network training by reducing in- 14(2):161–183.
ternal covariate shift. Proceedings of the Interna- Roland, P. (2002). The music encoding initiative
tional Conference on Machine Learning (ICML). (MEI). In Proceedings of the First International Con-
Kim, J. W. and Bello, J. P. (2019). Adversarial learning ference on Musical Applications Using XML, volume
for improved onsets and frames music transcrip- 1060, pages 55–59.
tion. In International Society for Music Information Sapp, C. S. (2005). Online database of scores in the
Retrieval (ISMIR). humdrum file format. In International Society for
Kong, Q., Cao, Y., Iqbal, T., Wang, Y., Wang, W., Music Information Retrieval (ISMIR), pages 664–
and Plumbley, M. D. (2020). PANNs: Large-scale 665.
pretrained audio neural networks for audio pat- Shi, Z., Sapp, C. S., Arul, K., McBride, J., and Smith III,
tern recognition. IEEE/ACM Transactions on Audio, J. O. (2019). Supra: Digitizing the stanford uni-
Speech, and Language Processing, 28:2880–2894. versity piano roll archive. In International Soci-
Kong, Q., Li, B., Song, X., Wan, Y., and Wang, Y. ety for Music Information Retrieval (ISMIR), pages
(2021). High-resolution piano transcription with 517–523.
pedals by regressing onsets and offsets times. Smith, D. and Wood, C. (1981). The ‘USI’, or Universal
IEEE/ACM Transactions on Audio, Speech, and Lan- Synthesizer Interface. In Audio Engineering Society
guage Processing, 29:3707–3717. Convention 70.
Volk, A., Wiering, F., and K., P. K. (2011). Unfolding
Krueger, B. (1996). Classical Piano MIDI Page. http:
the potential of computational musicology. In In-
//www.piano-midi.de.
ternational Conference on Informatics and Semiotics
Kunstderfuge (2002). Kunstderfuge. http://www.
in Organisations (ICISO), pages 137–144.
kunstderfuge.com.
Yang, L., Chou, S., and Yang, Y. (2017). MidiNet: A
Kwon, T., Jeong, D., and Nam, J. (2020). Poly-
convolutional generative adversarial network for
phonic piano transcription using autoregressive
symbolic-domain music generation. In Interna-
multi-state note model. In International Society for
tional Society for Music Information Retrieval (IS-
Music Information Retrieval (ISMIR).
MIR), pages 324–331.
Li, B., Liu, X., Dinesh, K., Duan, Z., and Sharma, G.
(2018). Creating a multitrack classical music per-
formance dataset for multimodal music analysis:
Challenges, insights, and applications. IEEE Trans-
actions on Multimedia, 21(2):522–535.
Meredith, D. (2016). Computational music analysis,
volume 62. Springer.
Nakamura, E., Yoshii, K., and Katayose, H. (2017). Per-
formance error detection and post-processing for
fast and accurate symbolic music alignment. In In-
ternational Society for Music Information Retrieval
(ISMIR), pages 347–353.
Nienhuys, H.-W. and Nieuwenhuizen, J. (2003). Lily-
pond, a system for automated music engraving. In

Giantmidi-Piano: A Large-Scale Midi Dataset For Classical Piano Music

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Giantmidi-Piano: A Large-Scale Midi Dataset For Classical Piano Music

Uploaded by

Copyright:

Available Formats

Anonymous (2022).

GiantMIDI-Piano: A large-scale MIDI Dataset

GiantMIDI-Piano: A large-scale MIDI Dataset for

Keywords: GiantMIDI-Piano, dataset, piano transcription

1. Introduction ano rolls are continuous rolls of paper with perfora-

Table 1: Piano MIDI datasets. GP is the abbreviation

3. GiantMIDI-Piano Dataset Higher J indicates that X and Y have larger similarity,

3.5 Piano Transcription

alities of the full GP dataset.

predicts all of the pitch, onset, offset, and velocity at-

notes on a modern piano.

Wagner, Richard 2 6 Nichifor, Serban 15 155

outputs are post-processed into MIDI events.

Bendel, Franz Arensky, Anton

Mourey, Colette 2 6 Albéniz, Isaac 15 23

the number and duration of music works composed by

Figure 4: Note histogram of the curated GP dataset.

Beethoven, Ludwig van

played more often than black keys.

the whole range of a modern piano.

4.5 Pitch Distribution of Top 100 Composers

of most composers are between C4 and C5 , where C4

erage pitch value of C4 . Carl Czerny has the highest

Top 100 Composers

average pitch value of A4 .

4.7 Pitch Class Distribution

Dussek, Jan Ladislav 11.14 Schmitt, Florent 66.06

notes per second with a value of 13.61.

Weber, Carl Maria von 11.81 Bertini, Henri 66.74

Beethoven used more C, D, and G than other notes.

Bach, Johann Sebastian Mozart, Wolfgang Amadeus Beethoven, Ludwig van

0.10 0.10 0.10

0.00 0.00 0.00

0.10 0.10 0.10

0.00 0.00 0.00

Figure 8: Pitch class distribution of six composers of

0.10 0.10 0.10

0.10 0.10 0.10

Figure 9: Interval distribution of six composers of the

Table 2: Accuracy of retrieved music works of six com-

Incorrect 102 35 70 171 22 9

Maestro GiantMIDI-Piano Relative difference

0.25 0.25 0.25

0.20 0.20 0.20

0.15 0.15 0.15

0.10 0.10 0.10

0.05 0.05 0.05

0.00 0.00 0.00

You might also like