Automatic Transcription of Piano Music - Presentation at ICME 2011

ICME 2011 Oral Presentation
2011/07/14
Automatic Transcription of Piano Music by Sparse Representation of Magnitude Spectra

Cheng-Te Lee, Yi-Hsuan Yang, and Homer Chen
National Taiwan University
Speaker: Cheng-Te Lee
Outline
I. Introduction
II. Proposed System
III. Performance Analysis & Demo

I. Introduction
Automatic Transcription
Goal: convert a music signal (WAVE format) into a musical score (MIDI format)
Main drawbacks of previous work:
- Training data is difficult to generate
- The spectral shape of each note is assumed to be constant
Spectral Shape of Piano Sound

Spectra of note C4 (MIDI number 60) produced by 6 pianos
ADSR Model
- Attack, Decay, Sustain, Release (ADSR)
- The spectral shape of a note varies with time
[Figure: note C4 in the time domain and its spectra over time (x-axis: frame), segmented into the ADSR phases]
Design Consideration
Exploit an online repository of piano notes as the database, so that the transcription system:
- works without generating training data
- adapts to a new piano easily
- adopts the ADSR model
[Figure: keyboard, database of individual piano notes, input signal, and synthesized mixture]
II. Proposed System
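System Overview
Main path: WAVE file → volume normalization → frame decomposition → FFT analysis → note candidate selection → sparse representation computation → noise elimination → HMM postprocessing → MIDI file
Database path: piano sound database → tuning factor estimation and database tuning → tuned piano sound database, which supplies the note spectra used by note candidate selection and sparse representation computation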
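The front-end blocks of the main path (volume normalization, frame decomposition, FFT analysis) can be sketched as below. This is a minimal illustration, not the authors' code: the 100 ms frame and 10 ms hop are borrowed from the frame-level evaluation setup later in the talk, while peak normalization and the Hann window are our own assumptions.

```python
import numpy as np

def magnitude_spectra(signal, sr, frame_ms=100.0, hop_ms=10.0):
    """Return a (num_frames, frame_len // 2 + 1) array of magnitude spectra.

    Assumptions (not stated on the slides): the volume normalization step is
    peak normalization, and each frame is Hann-windowed before the FFT.
    """
    x = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak  # volume normalization: scale the peak amplitude to 1
    frame_len = int(round(sr * frame_ms / 1000.0))
    hop = int(round(sr * hop_ms / 1000.0))
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))  # guarantee at least one frame
    num_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    # Frame decomposition: overlapping windowed segments, one per row
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(num_frames)])
    # FFT analysis: keep only the magnitude of the non-negative frequencies
    return np.abs(np.fft.rfft(frames, axis=1))
```

For a one-second signal at 8 kHz this yields 91 frames of 401 bins each, with a 10 Hz bin spacing.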
Note Candidate Selection

- Notes an octave apart are easily mistaken for each other because their spectra are similar
- Note candidate selection avoids such octave errors by leveraging the harmonic structure of piano sounds
[Figure: spectra of note C4 (MIDI number 60) from two pianos, one with a strong fundamental and one with a weak fundamental]
Illustration of Candidate Selection
[Figure: candidate selection for a note with a strong fundamental and for a note with a weak fundamental]
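The slides do not give the exact selection rule, so the sketch below is only one hypothetical way to use harmonic structure: each of the 88 keys is scored by summing the frame's magnitude spectrum at its first few partials, which lets a note with a weak fundamental still score well through its overtones. The function names, the harmonic count, and the `keep` cut-off are illustrative assumptions, and ideal integer harmonics are assumed even though real piano partials are slightly stretched.

```python
import numpy as np

def midi_to_hz(m):
    """Equal-tempered frequency with A4 (MIDI 69) = 440 Hz; C4 (MIDI 60) is about 261.63 Hz."""
    return 440.0 * 2.0 ** ((np.asarray(m, dtype=float) - 69.0) / 12.0)

def select_candidates(spectrum, sr, n_fft, num_harmonics=5, keep=10):
    """Rank the 88 piano keys (MIDI 21-108) by a harmonic-sum score.

    Hypothetical heuristic: a note's score is the sum of the spectrum
    magnitudes at the FFT bins nearest its first `num_harmonics` partials.
    Returns the `keep` highest-scoring notes and their scores.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    notes = np.arange(21, 109)
    scores = np.zeros(len(notes))
    for i, f0 in enumerate(midi_to_hz(notes)):
        for h in range(1, num_harmonics + 1):
            k = int(round(h * f0 * n_fft / sr))  # nearest FFT bin of partial h
            if k < len(spectrum):
                scores[i] += spectrum[k]
    order = np.argsort(scores)[::-1][:keep]
    return notes[order], scores[order]
```

With a synthetic C4 spectrum whose fundamental is ten times weaker than its overtones, C4 still ranks first, while C3 (which shares only some of those partials) scores much lower.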
Sparse Representation Computation
Sparsity of Played Notes
- A piano has 88 keys, but the keys played at any one time form a sparse subset of them
- On average, only about 4 notes sound at the same time
Sparse Representation
Problem formulation:
$$x^* = \arg\min_{x} \|x\|_0 \quad \text{subject to} \quad y = Ax \qquad (1)$$
- y: magnitude spectrum of a frame (vector)
- A: matrix of bases; each column of A is the magnitude spectrum of a note candidate
- x*: vector of sparse representation coefficients
Illustration of Sparse Representation
[Figure: equation (1) drawn as y (frame spectrum) = A (spectra of note candidates) · x* (coefficient vector)]
- Solving (1) is NP-hard in general
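Sparse Representation (cont'd)
- If the solution of (1) is sparse enough, it is close to the solution of the l1-regularized problem
$$x^* = \arg\min_{x} \|y - Ax\|_2^2 + \lambda \|x\|_1 \qquad (2)$$
  where λ > 0 weights sparsity against reconstruction error
- Problem (2) is convex and can be solved in polynomial time, O(n^1.2)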
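The slides do not name the solver for the l1-regularized problem min ||y - Ax||_2^2 + λ||x||_1. As a stand-in, the sketch below uses plain iterative shrinkage-thresholding (ISTA), a simple first-order method for exactly this objective; it is not the polynomial-time solver whose complexity is quoted above, and the value of λ and the iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, num_iters=500):
    """Approximately minimize ||y - A x||_2^2 + lam * ||x||_1 with ISTA."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    # The gradient of ||y - A x||_2^2 is 2 A^T (A x - y); its Lipschitz
    # constant is 2 * sigma_max(A)^2, so 1 / L is a safe step size.
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = 2.0 * A.T @ (A @ x - y)                # gradient step on the data term
        x = soft_threshold(x - grad / L, lam / L)     # shrinkage step on the l1 term
    return x
```

In the transcription setting, y would be one frame's magnitude spectrum and the columns of A the spectra of the note candidates; since magnitude spectra are non-negative, one could also clip x at zero after each step, which is a further assumption rather than something stated on the slides.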
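HMM Post-Processing
- Model each note with a two-state (on/off) HMM (88 HMMs for the 88 keys on a piano)
- Given a frame sequence X = x_1 x_2 ... x_n and a state sequence Q = q_1 q_2 ... q_n with q_t ∈ {on, off}, t ∈ [1, n], we maximize
$$P(Q \mid X) \propto \prod_{t=1}^{n} P(x_t \mid q_t)\, P(q_t \mid q_{t-1})$$
- Because
$$P(x_t \mid q_t) \propto \frac{P(q_t \mid x_t)}{P(q_t)},$$
  where P(q_t | x_t) is estimated from the sparse representation coefficient, we maximize
$$\prod_{t=1}^{n} \frac{P(q_t \mid x_t)}{P(q_t)}\, P(q_t \mid q_{t-1}),$$
  where P(q_t) and P(q_t | q_{t-1}) are learnt from MIDI files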
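The on/off decoding for one key can be sketched with the Viterbi algorithm. This sketch assumes each frame is scored with the scaled likelihood P(q_t | x_t) / P(q_t), a common choice in hybrid HMM post-processing; how the sparse representation coefficient is turned into P(on | x_t), and the prior and transition values used below, are hypothetical placeholders for quantities the talk says are estimated from the coefficients or learnt from MIDI files.

```python
import numpy as np

def viterbi_note(p_on, prior_on, trans):
    """Most likely off/on sequence (0 = off, 1 = on) for one piano key.

    p_on[t]  : posterior P(on | x_t) for frame t (hypothetically mapped from
               the note's sparse representation coefficient)
    prior_on : prior P(on)
    trans    : 2x2 array, trans[i, j] = P(q_t = j | q_{t-1} = i)
    """
    p_on = np.clip(np.asarray(p_on, dtype=float), 1e-6, 1.0 - 1e-6)
    log_post = np.log(np.stack([1.0 - p_on, p_on], axis=1))   # shape (n, 2)
    log_prior = np.log(np.array([1.0 - prior_on, prior_on]))
    log_obs = log_post - log_prior                              # scaled likelihoods
    log_trans = np.log(np.asarray(trans, dtype=float))
    n = len(p_on)
    delta = log_prior + log_obs[0]          # start from the prior state distribution
    back = np.zeros((n, 2), dtype=int)
    for t in range(1, n):
        scores = delta[:, None] + log_trans     # scores[i, j]: best path ending i -> j
        back[t] = np.argmax(scores, axis=0)     # best predecessor for each state j
        delta = scores[back[t], [0, 1]] + log_obs[t]
    states = np.zeros(n, dtype=int)
    states[-1] = int(np.argmax(delta))
    for t in range(n - 1, 0, -1):           # backtrack through the stored pointers
        states[t - 1] = back[t, states[t]]
    return states
```

With sticky transitions such as trans = [[0.95, 0.05], [0.10, 0.90]] and P(on) = 0.1, an isolated one-frame activation far from other activity is suppressed and a one-frame dropout inside a sustained note is filled in, which is the kind of cleanup the before/after figure illustrates.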
Result of HMM Post-Processing

[Figure: transcription results with true positives, false positives, false negatives, and true negatives marked; (a) before HMM post-processing, (b) after HMM post-processing]
III. Performance Analysis & Demo
Frame-Level Evaluation
- 70.2% F-measure
- Data: 10 one-minute classical music recordings; each frame is 100 ms long with a 10 ms hop (59,910 frames, 211,082 notes, average polyphony of 3.54)
- Significant improvement over two state-of-the-art systems (one-tailed t-test, p < 0.05)

             Proposed system   Klapuri's system [2]   Marolt's system [1]
F-measure    70.2%             62.2%                  66.1%
Precision    74.4%             72.4%                  78.6%
Recall       66.5%             54.6%                  57.1%
[1] M. Marolt, "A connectionist approach to automatic transcription of polyphonic piano music," IEEE Trans. Multimedia, vol. 6, no. 3, pp. 439–449, 2004.
[2] A. Klapuri, "Multiple fundamental frequency estimation by summing harmonic amplitudes," in Proc. ISMIR, Victoria, Canada, pp. 216–221, Oct. 2006.
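Frame-level scores of this kind are usually computed by counting (frame, key) pairs in boolean piano rolls. The sketch below assumes that standard definition (the slides do not spell it out) and includes a helper that checks the table: the reported precision of 74.4% and recall of 66.5% give an F-measure of about 70.2%.

```python
import numpy as np

def frame_level_scores(est, ref):
    """Precision, recall, and F-measure over (frame, key) pairs.

    est, ref: boolean piano rolls of shape (num_frames, num_keys);
    True means the key is sounding in that frame.
    """
    est = np.asarray(est, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = np.sum(est & ref)    # active in both estimate and ground truth
    fp = np.sum(est & ~ref)   # active only in the estimate
    fn = np.sum(~est & ref)   # active only in the ground truth
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f)

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```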
Note-Level Evaluation
- 73.0% F-measure
- Only note onsets are considered; an onset is correct if it lies within 100 ms of the ground-truth onset (4,937 notes)
- Significant improvement over the best system in MIREX 2010 F0 tracking [3]

             Proposed system   Yeh's system [3]
F-measure    73.0%             67.1%
Precision    74.6%             57.2%
Recall       71.6%             81.1%
[3] C. Yeh and A. Roebel, "Multiple-F0 estimation for MIREX 2010," Music Information Retrieval Evaluation eXchange (MIREX), 2010. [Online]. Available: http://www.music-ir.org/mirex/abstracts/2010/AR1.pdf
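Onset-based note scoring with a 100 ms tolerance can be sketched as follows. The matching rules here are assumptions for illustration: an estimated note must have the same pitch as the reference note, and each reference note is matched at most once, greedily, to the closest unmatched onset.

```python
def match_onsets(est, ref, tol=0.1):
    """Note-level precision, recall, and F-measure using onsets only.

    est, ref: lists of (onset_seconds, midi_pitch). An estimated note is
    correct if an unmatched reference note of the same pitch starts within
    `tol` seconds of it; each reference note can be matched only once.
    """
    used = [False] * len(ref)
    tp = 0
    for onset, pitch in sorted(est):
        best, best_dist = None, None
        for j, (r_onset, r_pitch) in enumerate(ref):
            dist = abs(onset - r_onset)
            if not used[j] and r_pitch == pitch and dist <= tol:
                if best is None or dist < best_dist:
                    best, best_dist = j, dist   # keep the closest eligible onset
        if best is not None:
            used[best] = True
            tp += 1
    precision = tp / len(est) if est else 0.0
    recall = tp / len(ref) if ref else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f
```

For example, with reference notes (0.0 s, 60), (0.5 s, 64), (1.0 s, 67) and estimates (0.05 s, 60), (0.52 s, 64), (1.3 s, 67), (0.5 s, 65), two estimates match, giving precision 0.5 and recall 2/3.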
Analysis of System Components
Number of Base Elements
- Because we adopt the ADSR model, each note has more than one base element
- The F-measure improves from 64.6% (88 base elements, one per key) to 70.2% (646 base elements)
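With several base elements per note, the coefficients of all columns that come from the same key have to be combined before per-key decisions such as HMM post-processing. The slides do not say how this is done; the sketch below simply sums them, and both the function name and the summing rule are hypothetical (taking the maximum would be an equally plausible choice).

```python
import numpy as np

def note_activations(coefs, column_notes, num_keys=88, lowest_midi=21):
    """Sum the coefficients of all base elements that belong to the same key.

    coefs[k]        : sparse coefficient of dictionary column k
    column_notes[k] : MIDI number of the note that column k was taken from
                      (several columns per note, e.g. one per ADSR segment)
    """
    act = np.zeros(num_keys)
    idx = np.asarray(column_notes, dtype=int) - lowest_midi  # key index 0..87
    np.add.at(act, idx, np.asarray(coefs, dtype=float))      # handles repeated indices
    return act
```

`np.add.at` is used instead of `act[idx] += coefs` because the latter would keep only one contribution when several columns map to the same key.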
Conclusion
We have presented an automatic transcription system that:
- exploits the sparsity of played keys
- adapts to a new piano easily
- adopts the ADSR model to improve accuracy
Significant improvement over state-of-the-art systems
Live Demo
Song                                                  Composer    F-measure
Prelude and Fugue No. 2 in C minor                    Bach        78.2%
Sonata No. 8 in C minor (Pathetique), 3rd movement    Beethoven   74.6%
Moments Musicaux No. 4                                Schubert    67.0%
Sonata K.333 in B♭ major, 1st movement                Mozart      78.4%
Thanks for your attention

Q&A
