Professional Documents
Culture Documents
ARTICLE TYPE
Abstract
Symbolic music datasets are important for music information retrieval and musical analysis.
However, there is a lack of large-scale symbolic datasets for classical piano music. In this ar-
arXiv:2010.07061v3 [cs.IR] 21 Apr 2022
ticle, we create a GiantMIDI-Piano (GP) dataset containing 38,700,838 transcribed notes and
10,855 unique solo piano works composed by 2,786 composers. We extract the names of music
works and the names of composers from the International Music Score Library Project (IMSLP).
We search and download their corresponding audio recordings from the internet. We further
create a curated subset containing 7,236 works composed by 1,787 composers by constrain-
ing the titles of downloaded audio recordings containing the surnames of composers. We ap-
ply a convolutional neural network to detect solo piano works. Then, we transcribe those solo
piano recordings into Musical Instrument Digital Interface (MIDI) files using a high-resolution
piano transcription system. Each transcribed MIDI file contains the onset, offset, pitch, and ve-
locity attributes of piano notes and pedals. GiantMIDI-Piano includes 90% live performance
MIDI files and 10% sequence input MIDI files. We analyse the statistics of GiantMIDI-Piano and
show pitch class, interval, trichord, and tetrachord frequencies of six composers from different
eras to show that GiantMIDI-Piano can be used for musical analysis. We evaluate the quality
of GiantMIDI-Piano in terms of solo piano detection F1 scores, metadata accuracy, and tran-
scription error rates. We release the source code for acquiring the GiantMIDI-Piano dataset at
https://github.com/bytedance/GiantMIDI-Piano
1.1 Applications The MAPS dataset (Emiya et al., 2010) used MIDI
The GiantMIDI-Piano dataset can be used in many files from Piano-midi.de to render real recordings by
research areas, including 1) Computer-based musi- playing back the MIDI files on a Yamaha Disklavier. The
cal analysis (Volk et al., 2011; Meredith, 2016) such MAESTRO dataset (Hawthorne et al., 2019) contains
as using computers to analyse the structure, chords, over 200 hours of fine alignment MIDI files and audio
and melody of music works. 2) Symbolic music gen- recordings. In MAESTRO, virtuoso pianists performed
eration (Yang et al., 2017; Hawthorne et al., 2019) on Yamaha Disklaviers with an integrated MIDI capture
such as generating symbolic music in symbolic for- system. MAESTRO contains music works from 62 com-
mat. 3) Computer-based music information retrieval posers. There are several duplicated works in MAE-
(Casey et al., 2008; Choi et al., 2017) such as music STRO. For example, there are 11 versions of Scherzo
transcription and music tagging. 4) Expressive perfor- No. 2 in B-flat Minor, Op. 31 composed by Chopin.
mance analysis (Cancino-Chacón et al., 2018) such as All duplicated works are removed when calculating the
analysing the performance of different pianists. number and duration of works.
This paper is organised as follows: Section 2 sur-
veys piano MIDI datasets; Section 3 introduces the col- Table 1 shows the number of composers, the num-
lection of the GiantMIDI-Piano dataset; Section 4 in- ber of unique works, total durations, and data types of
vestigates the statistics of the GiantMIDI-Piano dataset; different MIDI datasets. Data types include sequenced
Section 5 evaluates the quality of the GiantMIDI-Piano (Seq.) MIDI files input by MIDI sequencers and per-
dataset. formed (Perf.) MIDI files played by pianists. There are
other MIDI datasets including the Lakh dataset (Raf-
2. Dataset Survey fel, 2016), the Bach Doodle dataset (Huang et al.,
We introduce several piano MIDI datasets as follows. 2019), the Bach Chorales dataset (Conklin and Wit-
The Piano-midi.de dataset (Krueger, 1996) contains ten, 1995), the URMP dataset (Li et al., 2018),
classical solo piano works entered by a MIDI sequencer. the Bach10 dataset (Duan et al., 2010), the Crest-
Piano-midi.de contains 571 works composed by 26 MusePEDB dataset (Hashida et al., 2008), the SUPRA
composers with a total duration of 36.7 hours till dataset (Shi et al., 2019), and the ASAP dataset (Fos-
Feb. 2020. The classical archives collection (Classi- carin et al., 2020). (Huang et al., 2018) collected
10,000 hours of piano recordings for music generation,
1 https://imslp.org but the dataset is not publicly available.
3 Anonymous: GiantMIDI-Piano: A large-scale MIDI Dataset for Classical Piano Music
0
5
10
15
20
25
30
0
25
50
75
100
125
150
175
200
0
100
200
300
400
500
Liszt, Franz 25 40 Liszt, Franz 141 242
Unknown 671 Beethoven, Ludwig van 21 78
German 364 Schubert, Franz 20 87 Scarlatti, Domenico 140 228
French 322 Bach, Johann Sebastian 129 793
American 267 Bach, Johann Sebastian 17 158 Schubert, Franz 96 703
British 179 Chopin, Frédéric 16 19 Chopin, Frédéric 96 109
Italian 169 Carbajo, Víctor 14 15
Russian 135 Czerny, Carl 13 19 Carbajo, Víctor 77 81
Austrian 90 Schumann, Robert 11 39 Beethoven, Ludwig van 76 286
Polish 74 Rebikov, Vladimir 11 12 Mozart, Wolfgang Amadeu 72 613
Spanish 66 Simpson, Daniel Léo 69 198
Belgian 47 Mozart, Wolfgang Amadeu 11 140 Scriabin, Aleksandr 67 83
Norwegian 32 Haydn, Joseph 10 143 Czerny, Carl 58 83
Czech 29 Rachmaninoff, Sergei 10 19
Swedish 28 Medtner, Nikolay 10 17 Handel, George Frideric 57 224
Hungarian 25 Simpson, Daniel Léo 9 25 Rebikov, Vladimir 54 57
Brazilian 25 Scarlatti, Domenico 9 16 Haydn, Joseph 54 557
Dutch 23 Satie, Erik 52 76
Danish 22 Alkan, Charles-Valentin 9 12 Chaminade, Cécile 45 115
Australian 21 Brahms, Johannes 9 36 Alkan, Charles-Valentin 39 61
Ukrainian 16 Scriabin, Aleksandr 9 13 38 45
Nationalities
Chinese 6 Debussy, Claude 5 19 Moszkowski, Moritz 32 48
Cuban 6 Heller, Stephen 5 39 Johnson, Charles Leslie 32 39
Croatian 5 Zhang, Shuwen 5 5 Scharwenka, Xaver 31 48
Chilean 5 Szymanowski, Karol 5 12 Joplin, Scott 31 35
Venezuelan 4 Teilman, Christian 30 30
Filipino 3 Busoni, Ferruccio 5 9 Meyer-Helmund, Erik 30 31
Uruguayan 3 Satie, Erik 4 8 Fauré, Gabriel 30 87
Colombian 3 Mendelssohn, Felix 4 40 Balakirev, Mily 30 46
Belarusian 2 Clementi, Muzio 4 6
Bulgarian 2 Moszkowski, Moritz 4 10 Gottschalk, Louis Morea 29 72
Azerbaijani 2 4 5 Godard, Benjamin 28 51
Indian 2 Albéniz, Isaac Ravina, Jean Henri 27 34
Iranian 2 Fauré, Gabriel 4 13 Debussy, Claude 27 87
Icelandic 2 Lyapunov, Sergey 4 9 Tchaikovsky, Pyotr 26 123
Turkish 1 Weber, Carl Maria von 4 12 Prokofiev, Sergey 26 72
2 https://github.com/bytedance/piano_transcription
Peruvian 1 Villa-Lobos, Heitor 4 19 Poulenc, Francis 26 77
Asian
Lithuanian 1
Afican
Egyptian 1 Grünfeld, Alfred 4 5 Mendelssohn, Felix 24 119
Balakirev, Mily 4 8
Unknown
Paraguayan 1
European
Grieg, Edvard 24 62
Latvian 1 Hindemith, Paul 4 39 Wachs, Paul 23 27
Greek 1 Arensky, Anton 4 10 Villa-Lobos, Heitor 22 75
Ecuadorean 1 4 35
North American
Dvo ák, Antonín
South American
Nigerian 1 Rzewski, Frederic 22 48
4. Statistics
Dupont, Gabriel 3 6 Lyapunov, Sergey 18 29
Bortkiewicz, Sergei 3 6 Harrington, Jeffrey Mic 18 89
Schumann, Clara 3 5 Agnew, Roy 18 18
Poulenc, Francis 3 10 Granados, Enrique 17 21
Bertini, Henri 22 Di Gesu, Massimo 17 26
Stanchinsky, Aleksey 2 3 Clementi, Muzio 17 25
Handel, George Frideric 2 108 Wieniawski, Józef 16 21
Hahn, Reynaldo 2 8 Weber, Carl Maria von 16 48
Beach, Amy Marcy 2 5 Ravel, Maurice 16 51
Séverac, Déodat de 2 3 Myaskovsky, Nikolay 16 48
Figure 2: Duration of solo piano works of the curated GP dataset. Top 100 are shown.
1,200,000
Number of notes
1,000,000
800,000
600,000
400,000
200,000
0
A0 A 0 B0 C1 C 1 D1 D 1 E1 F1 F 1 G1 G 1 A1 A 1 B1 C2 C 2 D2 D 2 E2 F2 F 2 G2 G 2 A2 A 2 B2 C3 C 3 D3 D 3 E3 F3 F 3 G3 G 3 A3 A 3 B3 C4 C 4 D4 D 4 E4 F4 F 4 G4 G 4 A4 A 4 B4 C5 C 5 D5 D 5 E5 F5 F 5 G5 G 5 A5 A 5 B5 C6 C 6 D6 D 6 E6 F6 F 6 G6 G 6 A6 A 6 B6 C7 C 7 D7 D 7 E7 F7 F 7 G7 G 7 A7 A 7 B7 C8
20,000
10,000
0 AA B
0 0 0 C1 C 1 D1 D 1 E1 F1 F 1 G1 G 1 A1 A 1 B1 C2 C 2 D2 D 2 E2 F2 F 2 G2 G 2 A2 A 2 B2 C3 C 3 D3 D 3 E3 F3 F 3 G3 G 3 A3 A 3 B3 C4 C 4 D4 D 4 E4 F4 F 4 G4 G 4 A4 A 4 B4 C5 C 5 D5 D 5 E5 F5 F 5 G5 G 5 A5 A 5 B5 C6 C 6 D6 D 6 E6 F6 F 6 G6 G 6 A6 A 6 B6 C7 C 7 D7 D 7 E7 F7 F 7 G7 G 7 A7 A 7 B7 C8
20,000
10,000
0 AA B
0 0 0 C1 C 1 D1 D 1 E1 F1 F 1 G1 G 1 A1 A 1 B1 C2 C 2 D2 D 2 E2 F2 F 2 G2 G 2 A2 A 2 B2 C3 C 3 D3 D 3 E3 F3 F 3 G3 G 3 A3 A 3 B3 C4 C 4 D4 D 4 E4 F4 F 4 G4 G 4 A4 A 4 B4 C5 C 5 D5 D 5 E5 F5 F 5 G5 G 5 A5 A 5 B5 C6 C 6 D6 D 6 E6 F6 F 6 G6 G 6 A6 A 6 B6 C7 C 7 D7 D 7 E7 F7 F 7 G7 G 7 A7 A 7 B7 C8
Liszt, Franz
30,000
Number of notes
20,000
10,000
0 AA B
0 0 0 C1 C 1 D1 D 1 E1 F1 F 1 G1 G 1 A1 A 1 B1 C2 C 2 D2 D 2 E2 F2 F 2 G2 G 2 A2 A 2 B2 C3 C 3 D3 D 3 E3 F3 F 3 G3 G 3 A3 A 3 B3 C4 C 4 D4 D 4 E4 F4 F 4 G4 G 4 A4 A 4 B4 C5 C 5 D5 D 5 E5 F5 F 5 G5 G 5 A5 A 5 B5 C6 C 6 D6 D 6 E6 F6 F 6 G6 G 6 A6 A 6 B6 C7 C 7 D7 D 7 E7 F7 F 7 G7 G 7 A7 A 7 B7 C8
Figure 5: Notes histogram of J.S. Bach, Beethoven, and Liszt of the curated GP dataset.
different composers, the nationality of composers, and bert at 20 hours. Some composers composed more
the distribution of notes by composers. Then, we in- non-piano works than solo piano works. For example,
vestigate the statistics of six composers from different there are 108 hours of complete works composed by
eras by calculating their pitch class, interval, trichord, Handel in the dataset, while only 2 hours of them are
and tetrachord distributions. All of Fig. 1 to Fig. 11 ex- played on a modern piano. The rank of composers in
cept Fig. 3 are plotted with the statistics of the curated Fig. 2 is different from Fig. 1, indicating that the aver-
GiantMIDI-Piano dataset. Fig. 3 shows the manually- age duration of solo piano works composed by differ-
checked nationalities of 2,786 composers in the full ent composers are different.
GiantMIDI-Piano dataset.
4.3 Nationalities of Composers
4.1 The Number of Solo Piano Works Fig. 3 shows the number of composers with differ-
Fig. 1 shows the numbers of piano works composed ent nationalities sorted in descending order of the full
by different composers sorted in descending order of GiantMIDI-Piano dataset. The nationality of 2,786
the curated GiantMIDI-Piano dataset. Fig. 1 shows the composers are initially obtained from Wikipedia and
statistics of top 100 composers out of 2,786 composers. are later manually checked3 . Fig. 3 shows that
Blue bars show the number of solo piano works. Pink there are 671 composers with unknown nationality.
bars show the number of complete works, including There are 364 German composers, followed by 322
both solo piano and non-solo piano works. Fig. 1 French composers and 267 American composers. We
shows that there are 141 solo piano works composed color-code the continent of nationalities from “Un-
by Liszt, followed by 140 and 129 solo piano works known”, “European”, “North American”, “South Amer-
composed by Scarlatti and J. S. Bach. Some composers, ican”, “Asian”, to “African”. In GiantMIDI-Piano, the
such as Chopin composed more solo piano works than nationalities of most composers are European. The
non-solo piano works. For example, there are 96 solo numbers of composers with nationalities from South
piano works out of 109 complete works composed by American, Asian, and African are fewer.
Chopin in the curated GiantMIDI-Piano dataset. Fig. 1
shows that the number of solo piano works of different 4.4 Note Histogram
composers has a long tail distribution. Fig. 4 shows the note histogram of the curated
GiantMIDI-Piano dataset. There are 24,253,495 tran-
4.2 The Duration of Solo Piano Works scribed notes. The horizontal axis shows the scientific
pitch notations, which covers 88 notes on a modern
Fig. 2 shows the duration of solo piano works com-
piano from A0 to C8 . Middle C is denoted as C4 . We
posed by different composers sorted in descending or-
der of the curated GiantMIDI-Piano dataset. The du- 3 There are debates on the nationality of some composers. We ex-
ration of works composed by Liszt is the longest at 25 tract the nationality of composers from Wikipedia and do not discuss
hours, followed by Beethoven at 21 hours and Schu- region debates in this work.
Number of notes per second Note names
6
C3
C4
C5
0
5
10
15
20
Grünfeld, Alfred 4.77 Harrington, Jeffrey Mic 59.66
Méreaux, Jean-Amédée Le 4.85 Casella, Alfredo 60.39
Marmontel, Antoine Fran 5.39 Mourey, Colette 60.71
Scharwenka, Xaver 5.51 Franck, César 61.26
Koechlin, Charles 6.14 Wagner, Richard 61.36
Lack, Théodore 6.20 Godard, Benjamin 61.38
Cervantes, Ignacio 6.53 Satie, Erik 61.51
Satie, Erik 6.70 Myaskovsky, Nikolay 61.76
Rebikov, Vladimir 7.00 Busoni, Ferruccio 61.82
Handel, George Frideric 7.08 Prokofiev, Sergey 61.90
Rzewski, Frederic 7.39 Schumann, Robert 61.91
Ferrari, Carlotta 7.53 Rubinstein, Anton 62.08
Fauré, Gabriel 7.55 Brahms, Johannes 62.23
Dupont, Gabriel 7.59 Mendelssohn, Felix 62.32
MacDowell, Edward 7.64 Grieg, Edvard 62.36
Grieg, Edvard 7.78 Ferrari, Carlotta 62.39
Bach, JohannSebastian 7.91 Gurlitt, Cornelius 62.42
Suk, Josef 7.92 Bartók, Béla 62.55
Granados, Enrique 7.93 Sgambati, Giovanni 62.64
Sgambati, Giovanni 7.96 Rzewski, Frederic 62.73
Turina, Joaquín 8.05 Scriabin, Aleksandr 62.98
Franck, César 8.06 Beethoven, Ludwigvan 62.99
Myaskovsky, Nikolay 8.13 Bortkiewicz, Sergei 63.25
Haydn, Joseph 8.26 Tchaikovsky, Pyotr 63.32
Gurlitt, Cornelius 8.29 Medtner, Nikolay 63.45
Rubinstein, Anton 8.32 Stanchinsky, Aleksey 63.51
Cui, César 8.33 Teilman, Christian 63.52
Bartók, Béla 8.34 Godowsky, Leopold 63.59
Frequency
Frequency
0.05 0.05 0.05
Frequency
Frequency
0.05 0.05 0.05
Figure 13: From left to right: ER of 52 solo piano works in the MAESTRO dataset; ER of 52 solo piano works in
the GiantMIDI-Piano dataset; Relative ER between the MAESTRO and the GiantMIDI-Piano dataset.
Table 4: Piano transcription evaluation on the
set F1 of 86.58%. The piano transcription system GiantMIDI-Piano dataset
outperforms the previous onsets and frames system
Hawthorne et al. (2018, 2019) with an onset F1 score D I S ER
of 94.80%.
We evaluate the quality of GiantMIDI-Piano on 52 Maestro 0.009 0.024 0.018 0.061
music works that appear in all of the GiantMIDI-Piano, GiantMIDI-Piano 0.015 0.051 0.069 0.154
the MAESTRO, and the Kunsterfuge (Kunstderfuge, Relative difference 0.006 0.026 0.047 0.094
2002) datasets. Long music works such as Sonatas
are split into movements. Repeated music sections are
removed. Evaluating GiantMIDI-Piano is a challeng- come from that a pianist may accidentally miss or add
ing problem because there are no aligned ground-truth notes while performing (Repp, 1996). The transcrip-
MIDI files, so the metrics in (Hawthorne et al., 2018) tion errors e transcriptionG come from piano transcription
are not usable. In this work, we propose to use an system errors. The alignment errors e alignmentG come
alignment metric (Nakamura et al., 2017) called error from the sequence alignment algorithm (Nakamura
rate (ER) to evaluate the quality of transcribed MIDI et al., 2017).
files. This metric reflects the substitution, deletion, Audio recordings and MIDI files are perfectly
and insertion between a transcribed MIDI file and a aligned in the MAESTRO dataset, so there are no tran-
target MIDI file. For a solo piano work, we align a scription errors. The ER can be written as:
transcribed MIDI file with its sequenced MIDI version
using a hidden Markov model (HMM) tool (Nakamura E R M = e performanceM + e alignmentM , (5)
et al., 2017), where the sequenced MIDI files are from
the Kunsterfuge (Kunstderfuge, 2002) dataset. The ER where the subscript M is the abbreviation for MAE-
is defined as the summation of substitution, insertion, STRO. For a same music work, we assume an approx-
and deletion: imation e performanceG ≈ e performanceM despite the perfor-
S +D +I mance among pianists are different. Similarly, we as-
ER = , (3)
N sume an approximation e alignmentG ≈ e alignmentM despite
where N is the number of reference notes, and S , I , the alignment errors are different.
and D are the number of substitution, insertion, and Those approximations are more accurate when the
deletion, respectively. Substitution indicates that some levels of the two pianists are closer. Then, we propose
notes are replacements of ground truth notes. Inser- a relative error by subtracting (4) and (5):
tion indicates that extra notes are being played. Dele-
tion indicates that some notes are missing. Lower ER r = E R G − E R M ≈ e transcriptionG . (6)
indicates better transcription performance.
The relative error r is a rough approximation of the
The ER of music works from GiantMIDI-Piano con-
transcription errors e transcriptionG . A lower r value indi-
sists of three parts: 1) performance errors, 2) transcrip-
cates better transcription quality.
tion errors, and 3) alignment errors:
Table 4 shows the alignment performance. The
E R G = e performanceG + e transcriptionG + e alignmentG (4) median alignment S M , D M , I M and E R M on the MAE-
STRO dataset are 0.009, 0.024, 0.021 and 0.061 re-
where the subscript G is the abbreviation for spectively. The median alignment S G , D G , I G and
GiantMIDI-Piano. The performance errors e performanceG E R G on the GiantMIDI-Piano dataset are 0.015, 0.051,
10 Anonymous: GiantMIDI-Piano: A large-scale MIDI Dataset for Classical Piano Music
0.069 and 0.154 respectively. The relative error r be- Bryner, B. (2002). The piano roll: a valuable record-
tween MAESTRO and GiantMIDI-Piano is 0.094. The ing medium of the twentieth century. PhD thesis,
first column of Fig. 13 shows the box plot metrics Department of Music, University of Utah.
of MAESTRO. Some outliers are omitted from the fig- Cancino-Chacón, C. E., Grachten, M., Goebl, W., and
ures for better visualization. Some outliers are mostly Widmer, G. (2018). Computational models of ex-
caused by different interpretations of trills and tremo- pressive music performance: A comprehensive and
los. The second column of Fig. 13 shows the box plot critical review. Frontiers in Digital Humanities,
metrics of GiantMIDI-Piano. In GiantMIDI-Piano, Key- 5:25.
board Sonata in E-Flat Major, Hob. XVI/49 composed Casey, M. A., Veltkamp, R., Goto, M., Leman, M.,
by Haydn achieves the lowest ER of 0.037, while Pre- Rhodes, C., and Slaney, M. (2008). Content-
lude and Fugue in A-flat major, BWV 862 composed by based music information retrieval: Current direc-
Bach achieves the highest ER of 0.679 (outlier beyond tions and future challenges. Proceedings of the
the plot range). This underperformance is due to the IEEE, 96(4):668–696.
piano is not tuned to a standard pitch with A4 of 440 Choi, K., Fazekas, G., Cho, K., and Sandler, M. (2017).
Hz. The third column of Fig. 13 shows the relative ER A tutorial on deep learning for music information
between MAESTRO and GiantMIDI-Piano. The relative retrieval. arXiv preprint arXiv:1709.04396.
median scores of S, D, I and ER are 0.006, 0.026, 0.047 Classical Archives (2000). Classical Archives. www.
and 0.094 respectively. Fig. 13 also shows that there classicalarchives.com.
are fewer deletions than insertions. Conklin, D. and Witten, I. H. (1995). Multiple view-
point systems for music prediction. Journal of New
6. Conclusion Music Research, 24(1):51–73.
We collect and transcribe a large-scale GiantMIDI- Duan, Z., Pardo, B., and Zhang, C. (2010). Multi-
Piano dataset containing 38,700,838 transcribed pi- ple fundamental frequency estimation by modeling
ano notes from 10,855 unique classical piano works spectral peaks and non-peak regions. IEEE Trans-
composed by 2,786 composers. The total duration actions on Audio, Speech, and Language Processing,
of GiantMIDI-Piano is 1,237 hours. The curated sub- 18(8):2121–2133.
set contains 24,253,495 piano notes from 7,236 works Emiya, V., Bertin, N., David, B., and Badeau, R. (2010).
composed by 1,787 composers. GiantMIDI-Piano is Maps-a piano database for multipitch estimation
transcribed from YouTube audio recordings searched and automatic transcription of music. In Research
by meta-information from IMSLP. Report, INRIA-00544155f.
The solo piano detection system used in GiantMIDI- Forte, A. (1973). The structure of atonal music, volume
Piano achieves an F1 score of 88.14%, and the piano 304. Yale University Press.
transcription system achieves a relative error rate of Foscarin, F., Mcleod, A., Rigaux, P., Jacquemard, F.,
0.094. The limitations of GiantMIDI-Piano include: and Sakai, M. (2020). ASAP: a dataset of aligned
1) There are no pitch spellings to distinguish enhar- scores and performances for piano transcription.
monic notes. 2) GiantMIDI-Piano does not provide In International Society for Music Information Re-
beats, time signatures, key signatures, and scores. 2) trieval (ISMIR).
GiantMIDI-Piano does not disentangle the music score Glorot, X., Bordes, A., and Bengio, Y. (2011). Deep
and the expressive performance of pianists. sparse rectifier neural networks. In Proceedings of
We have released the source code for acquiring the Conference on Artificial Intelligence and Statis-
GiantMIDI-Piano. In the future, GiantMIDI-Piano can tics, pages 315–323.
be used in many research areas, including but not lim- Good, M. et al. (2001). MusicXML: An internet-friendly
ited to musical analysis, music generation, music infor- format for sheet music. In XML conference and expo,
mation retrieval, and expressive performance analysis. pages 03–04. Citeseer.
Hashida, M., Matsui, T., and Katayose, H. (2008).
7. Acknowledgement A new music database describing deviation infor-
We thank all anonymous reviewers, editors, and copy mation of performance expressions. In Interna-
editors for their substantial reviews of this manuscript. tional Society for Music Information Retrieval (IS-
We thank Prof. Xiaofeng Zhang for passing his compo- MIR), pages 489–494.
sition knowledge to Qiuqiang Kong during his under- Hawthorne, C., Elsen, E., Song, J., Roberts, A., Si-
graduate study at the South China University of Tech- mon, I., Raffel, C., Engel, J., Oore, S., and Eck,
nology. D. (2018). Onsets and frames: Dual-objective pi-
ano transcription. In International Society for Music
References Information Retrieval (ISMIR).
Bainbridge, D. and Bell, T. (2001). The challenge of Hawthorne, C., Stasyuk, A., Roberts, A., Simon, I.,
optical music recognition. Computers and the Hu- Huang, C. A., Dieleman, S., Elsen, E., Engel, J.,
manities, 35(2):95–121. and Eck, D. (2019). Enabling factorized piano
11 Anonymous: GiantMIDI-Piano: A large-scale MIDI Dataset for Classical Piano Music
music modeling and generation with the maestro Proceedings of the XIV Colloquium on Musical Infor-
dataset. International Conference on Learning Rep- matics, volume 1, pages 167–171. Citeseer.
resentations (ICLR). Niwattanakul, S., Singthongchai, J., Naenudorn, E.,
Huang, C.-Z. A., Hawthorne, C., Roberts, A., Din- and Wanapu, S. (2013). Using of Jaccard coef-
culescu, M., Wexler, J., Hong, L., and Howcroft, ficient for keywords similarity. In Proceedings of
J. (2019). The Bach Doodle: Approachable music the International MultiConference of Engineers and
composition with machine learning at scale. In In- Computer Scientists (IMECS), pages 380–384.
ternational Society for Music Information Retrieval Raffel, C. (2016). Learning-based methods for com-
(ISMIR). paring sequences, with applications to audio-to-midi
Huang, C.-Z. A., Vaswani, A., Uszkoreit, J., Simon, I., alignment and matching. PhD thesis, Columbia
Hawthorne, C., Shazeer, N., Dai, A. M., Hoffman, University.
M. D., Dinculescu, M., and Eck, D. (2018). Mu- Rebelo, A., Fujinaga, I., Paszkiewicz, F., Marcal, A. R.,
sic transformer: Generating music with long-term Guedes, C., and Cardoso, J. S. (2012). Optical mu-
structure. In International Conference on Learning sic recognition: state-of-the-art and open issues.
Representations (ICLR). International Journal of Multimedia Information Re-
IMSLP (2006). International Music Score Library trieval, 1(3):173–190.
Project. imslp.org. Repp, B. H. (1996). The art of inaccuracy: Why pi-
Ioffe, S. and Szegedy, C. (2015). Batch normalization: anists’ errors are difficult to hear. Music Perception,
Accelerating deep network training by reducing in- 14(2):161–183.
ternal covariate shift. Proceedings of the Interna- Roland, P. (2002). The music encoding initiative
tional Conference on Machine Learning (ICML). (MEI). In Proceedings of the First International Con-
Kim, J. W. and Bello, J. P. (2019). Adversarial learning ference on Musical Applications Using XML, volume
for improved onsets and frames music transcrip- 1060, pages 55–59.
tion. In International Society for Music Information Sapp, C. S. (2005). Online database of scores in the
Retrieval (ISMIR). humdrum file format. In International Society for
Kong, Q., Cao, Y., Iqbal, T., Wang, Y., Wang, W., Music Information Retrieval (ISMIR), pages 664–
and Plumbley, M. D. (2020). PANNs: Large-scale 665.
pretrained audio neural networks for audio pat- Shi, Z., Sapp, C. S., Arul, K., McBride, J., and Smith III,
tern recognition. IEEE/ACM Transactions on Audio, J. O. (2019). Supra: Digitizing the stanford uni-
Speech, and Language Processing, 28:2880–2894. versity piano roll archive. In International Soci-
Kong, Q., Li, B., Song, X., Wan, Y., and Wang, Y. ety for Music Information Retrieval (ISMIR), pages
(2021). High-resolution piano transcription with 517–523.
pedals by regressing onsets and offsets times. Smith, D. and Wood, C. (1981). The ‘USI’, or Universal
IEEE/ACM Transactions on Audio, Speech, and Lan- Synthesizer Interface. In Audio Engineering Society
guage Processing, 29:3707–3717. Convention 70.
Volk, A., Wiering, F., and K., P. K. (2011). Unfolding
Krueger, B. (1996). Classical Piano MIDI Page. http:
the potential of computational musicology. In In-
//www.piano-midi.de.
ternational Conference on Informatics and Semiotics
Kunstderfuge (2002). Kunstderfuge. http://www.
in Organisations (ICISO), pages 137–144.
kunstderfuge.com.
Yang, L., Chou, S., and Yang, Y. (2017). MidiNet: A
Kwon, T., Jeong, D., and Nam, J. (2020). Poly-
convolutional generative adversarial network for
phonic piano transcription using autoregressive
symbolic-domain music generation. In Interna-
multi-state note model. In International Society for
tional Society for Music Information Retrieval (IS-
Music Information Retrieval (ISMIR).
MIR), pages 324–331.
Li, B., Liu, X., Dinesh, K., Duan, Z., and Sharma, G.
(2018). Creating a multitrack classical music per-
formance dataset for multimodal music analysis:
Challenges, insights, and applications. IEEE Trans-
actions on Multimedia, 21(2):522–535.
Meredith, D. (2016). Computational music analysis,
volume 62. Springer.
Nakamura, E., Yoshii, K., and Katayose, H. (2017). Per-
formance error detection and post-processing for
fast and accurate symbolic music alignment. In In-
ternational Society for Music Information Retrieval
(ISMIR), pages 347–353.
Nienhuys, H.-W. and Nieuwenhuizen, J. (2003). Lily-
pond, a system for automated music engraving. In