Music Instrument Identification Using MFCC: Erhu as an Example
Chih-Wen Weng*, Cheng-Yuan Lin**, Jyh-Shing Roger Jang**
* Chinese Music Dept, Tainan National College of The Arts, Taiwan / Computer Science Dept, Tsing Hua University, Taiwan. Email: ivanweng@[Link]
** Computer Science Dept, Tsing Hua University, Taiwan. Email: {gavins, jang}@[Link]

Abstract

In the analysis of musical acoustics, the power spectrum is usually used to describe the difference between the timbres of two music instruments. However, according to our experiments, the power spectrum is not an effective feature for erhu instrument identification. In this paper, we use MFCC (mel-scale frequency cepstral coefficients) as features for music instrument identification with GMM (Gaussian mixture models); the results are very encouraging. MFCC and GMM are commonly and successfully used in speech/speaker recognition. This paper demonstrates that MFCC and GMM can also be used for erhu instrument identification. Immediate extensions of the current work include MFCC-based music instrument assessment and music acoustic analysis.

Keywords: Timbre, Power Spectrum, MFCC, GMM, Music Instrument Identification, Erhu Music Instrument
1. Introduction
Conventional analysis of music timbre is usually based on the power spectrum, which reveals the frequency distribution of a given music signal. In this paper, we propose the use of MFCC (mel frequency cepstral coefficients) [12][15] for timbre classification and music instrument identification. This study is motivated by the success of speech and speaker recognition using MFCC, a non-parametric feature set that models the human auditory perception system. Our experiments involve the use of GMM (Gaussian mixture models) for music instrument identification of erhu
(an instrument for traditional Chinese music), using both power spectrum and MFCC features. The initial results demonstrate the feasibility of using MFCC for this purpose. The rest of the paper is organized as follows. Section 2 explains the design of our ETR (erhu timbre recognition) system. Section 3 covers the experimental results and corresponding discussion. Section 4 gives conclusions and possible future directions of this study.
2. ETR System Design
In the following, we shall describe how to build an erhu timbre recognition (ETR for short) system. Similar to most pattern recognition systems (such as speaker recognition [1][7]), ETR involves the following steps:

1. Feature extraction: how to extract discriminant features from the given recordings.
2. Model construction: how to construct the ETR system from the features extracted in the previous step.
3. Model application: how to use the constructed model for classifying new recordings.

For most speech and speaker recognition systems, there are several methods for feature extraction, including power spectrum, formant, LPC (linear predictive coding), LSP (line spectrum pair), and MFCC (mel frequency cepstral coefficients) [12][15]. Power spectrum is mostly used to explain the difference in timbres of two instruments. On the other hand, MFCC is by far the most commonly used feature for speech and speaker recognition. Therefore, in this paper, we shall investigate the use of power spectrum and MFCC for ETR and compare their performance.

Several frequently used classifiers for model construction in pattern recognition include GMM (Gaussian mixture models) [4], VQ (vector quantization) [16], ANN (artificial neural networks) [11], SVM (support vector machines) [14], linear classifiers [13], and so on. Our past experiments in speech and speaker recognition indicate that GMM can usually achieve a better recognition rate. Therefore, we use GMM as the classifier for our ETR system. The following figure shows the flowchart of our ETR system.
Figure 1. The ETR system flowchart.

Power spectrum is frequently used to explain the timbre of music instruments. In our experiment, we use the fast Fourier transform (FFT) to extract the power spectrum of each frame from the erhu recordings. Each frame consists of 512 sample points, leading to a feature dimension of 256 for the power spectrum. This dimension is too large for most model construction methods. Therefore, we choose the LDA (linear discriminant analysis) [8] algorithm to select the most discriminant dimensions. LDA is a statistical technique for data classification and dimensionality reduction in pattern recognition. Its principal concept is to identify a new basis that maximizes the ratio of between-class variance to within-class variance. In this paper, we keep 24 discriminant dimensions of each power spectrum for the subsequent model construction and application.

MFCC (mel frequency cepstral coefficients) [9] is a very popular feature in speech and speaker recognition. It can be derived using the following steps:

1. Pre-emphasis
2. Hanning (or Hamming) windowing
3. FFT to obtain the power spectrum
4. Triangular bandpass filtering
5. Discrete cosine transform to obtain MFCC
6. Taking delta MFCC (optional)
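To make these steps concrete, the following Python sketch computes the MFCC of a single 512-sample frame. The 0.97 pre-emphasis coefficient, the 20 triangular filters, and the 12 retained coefficients are illustrative assumptions; the paper does not state the exact parameter values it uses.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr=16000, n_fft=512, n_filters=20, n_ceps=12):
    """MFCC of one 512-sample frame, following steps 1-5 above."""
    # 1. Pre-emphasis (0.97 is a common default, assumed here)
    x = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    # 2. Hamming window
    x = x * np.hamming(len(x))
    # 3. FFT -> power spectrum (n_fft/2 + 1 bins)
    power = np.abs(np.fft.rfft(x, n_fft)) ** 2
    # 4. Triangular bandpass filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    # 5. DCT of the log filter-bank energies gives the cepstral coefficients
    return dct(np.log(fbank @ power + 1e-10), type=2, norm='ortho')[:n_ceps]

def delta(coeffs):
    """6. Optional delta MFCC: first difference across frames (rows)."""
    return np.diff(coeffs, axis=0, prepend=coeffs[:1])
```

A whole recording would then be represented by stacking the per-frame vectors (optionally appending the delta coefficients from step 6).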
The use of MFCC has been justified by the satisfactory performance of speaker/speech recognition systems available in the literature. Consequently, we choose MFCC for our ETR system. GMM (Gaussian mixture model) is a classic parametric model used in many
pattern recognition applications. The GMM assumes that the probability distribution of the observed data takes the following form:
$$p(x \mid \lambda) = \sum_{i=1}^{m} w_i \, N(x; \mu_i, \Sigma_i), \qquad (1)$$

where $N(x; \mu_i, \Sigma_i)$ denotes the $p$-dimensional normal distribution with mean vector $\mu_i$ and covariance matrix $\Sigma_i$, defined by

$$N(x; \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{p/2} \, |\Sigma_i|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right). \qquad (2)$$

In (1), the terms $w_i$ are positive scalar weights ($\sum_{i=1}^{m} w_i = 1$ and $w_i \geq 0$) and $x$ is a $p$-dimensional vector. In this paper, $x$ is actually a $p$-dimensional vector
representing the features of each frame. Each erhu instrument is represented by a GMM and is referred to by its model $\lambda_i = \{w_i, \mu_i, \Sigma_i\}$, $i = 1, \ldots, 10$. GMM can have several different forms depending on the choice of covariance matrices. There are three types: one covariance matrix per Gaussian component (nodal covariance), one covariance matrix for all Gaussian components in an erhu model (grand covariance), or a single covariance matrix shared by all erhu models. In addition, the covariance matrices can be full or diagonal. In this paper, nodal, diagonal covariance matrices are primarily used for erhu modeling.

There are several methods for estimating the parameters of a GMM. So far, maximum likelihood (ML) [6] estimation is the most well-established one. The goal of ML estimation is to derive the model parameters that maximize the likelihood of the GMM. Generally, the model parameters are estimated by the Expectation-Maximization (EM) [2] algorithm. This algorithm is also used to estimate HMM (hidden Markov model) parameters, where it is known as the Baum-Welch reestimation algorithm [10].

The statistical models of $T$ erhu instruments are denoted by $\{\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_T\}$. In this paper, the value of $T$ is 10. We apply the following equation to find the erhu model with the maximum a posteriori probability for a given observation sequence $X$:

$$\hat{N} = \arg\max_{1 \le k \le T} \Pr(\lambda_k \mid X) = \arg\max_{1 \le k \le T} \frac{p(X \mid \lambda_k) \Pr(\lambda_k)}{p(X)}.$$

Assuming equally likely erhu instruments ($\Pr(\lambda_k) = 1/T$) and noting that $p(X)$ is the same for all erhu models, the classification rule simplifies to
$$\hat{N} = \arg\max_{1 \le k \le T} p(X \mid \lambda_k).$$

Using logarithms and the independence between observations, we have the following equation:

$$\hat{N} = \arg\max_{1 \le k \le T} \sum_{j=1}^{J} \log p(x_j \mid \lambda_k),$$

where $x_j$ is the feature vector of the $j$-th frame.
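As an illustration only, the sketch below trains one GMM per erhu by EM and applies the classification rule above, using scikit-learn's GaussianMixture with nodal, diagonal covariances; the toolkit, the mixture count, and the helper names are our assumptions rather than the authors' implementation.

```python
from sklearn.mixture import GaussianMixture

def train_erhu_models(train_features, n_mixtures=16):
    """Fit one GMM per erhu by EM (Eq. 1, nodal diagonal covariances).
    train_features maps an erhu id to an (n_frames, dim) array of feature vectors."""
    models = {}
    for erhu_id, feats in train_features.items():
        gmm = GaussianMixture(n_components=n_mixtures,
                              covariance_type='diag',
                              max_iter=200, random_state=0)
        models[erhu_id] = gmm.fit(feats)
    return models

def classify(models, test_frames):
    """Classification rule above: pick the model maximizing
    the summed per-frame log-likelihoods sum_j log p(x_j | lambda_k)."""
    scores = {eid: gmm.score_samples(test_frames).sum() for eid, gmm in models.items()}
    return max(scores, key=scores.get)
```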
3. Experimental Results
For the following experiment, we have collected 11 recordings from each of 10 different erhu instruments. These 110 files were recorded when one of the authors played these 10 erhu instruments with different music, speed, volume, style, and so on. For each erhu, we use 10 of its recordings as the training data and the remaining one as the test data. The recording conditions are listed next:

Duration: 12 seconds
Sampling rate: 16 kHz
Bit resolution: 16 bits
Number of channels: one (mono)

For efficiency considerations, we only try three different durations, 1, 3, and 5 seconds, for both training and test. For the experimental parameters, we have combined 4 different feature dimensions and 4 different GMM mixture counts, as shown next:

The feature dimensions of power spectrum or MFCC: 11, 12, 22, and 24. For power spectrum, the selected features are obtained from LDA. For MFCC, the selected features are:
11: Original MFCC, excluding the first term
12: Original MFCC
22: The 11 features above plus their derivatives
24: The 12 features above plus their derivatives

The mixture counts of GMM: 4, 8, 16, and 32.

The following two diagrams show the recognition rates using power spectrum and MFCC, respectively, when the duration is only 1 second. It is obvious that MFCC achieves better recognition rates than power spectrum for all 16 cases. The average recognition rates are 26.25% and 61.875% for power spectrum and MFCC, respectively. (The recognition rates referred to in the following discussion are based on the leave-one-out method.)
[Two charts: recognition rate (%) with 1 second of training data for each erhu, plotted against the feature dimension (power spectrum in one panel, MFCC in the other) and the Gaussian mixture number.]
The following two diagrams show the recognition rates when the duration is 3 seconds. Again, MFCC achieves better recognition rates than power spectrum for all 16 cases. The average recognition rates are 48.75% and 85.625% for power spectrum and MFCC, respectively.
[Two charts: recognition rate (%) with 3 seconds of training data for each erhu, plotted against the feature dimension (power spectrum in one panel, MFCC in the other) and the Gaussian mixture number.]
The following two diagrams show the recognition rates when the duration is 5 seconds. Again, MFCC achieves better recognition rates for all 16 cases. The average recognition rates are 59.375% and 93.125% for power spectrum and MFCC, respectively. In particular, when the feature dimension is 22 or 24, ETR based on MFCC achieves 100% recognition rates for all GMM mixture counts.
[Two charts: recognition rate (%) with 5 seconds of training data for each erhu, plotted against the feature dimension (power spectrum in one panel, MFCC in the other) and the Gaussian mixture number.]
From the above experimental results, it is obvious that MFCC is more effective than power spectrum. Moreover, when the duration is 5 seconds or longer, the recognition rates are close to 100%. This indicates that MFCC is a feasible feature for
erhu instrument identification.
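For completeness, the following sketch outlines the leave-one-out protocol of this section, assuming the MFCC feature matrices of all 110 recordings are already computed (e.g., with the earlier sketches); for the power-spectrum runs, an LDA reduction (e.g., scikit-learn's LinearDiscriminantAnalysis fitted on the training frames) would be applied before this step. The data layout and function names are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def leave_one_out_rate(recordings, n_mixtures=16):
    """recordings[e] is a list of 11 feature matrices (one per recording) for erhu e.
    Each recording index is held out in turn; the other 10 recordings of every erhu
    train its GMM, and each held-out recording must be assigned to the correct erhu."""
    erhu_ids = list(recordings)
    n_rec = len(next(iter(recordings.values())))   # 11 recordings per erhu
    correct, total = 0, 0
    for held_out in range(n_rec):
        # Train one GMM per erhu on its other recordings
        models = {}
        for e in erhu_ids:
            train = np.vstack([r for i, r in enumerate(recordings[e]) if i != held_out])
            models[e] = GaussianMixture(n_components=n_mixtures, covariance_type='diag',
                                        max_iter=200, random_state=0).fit(train)
        # Test every held-out recording against all models
        for e in erhu_ids:
            test = recordings[e][held_out]
            scores = {k: m.score_samples(test).sum() for k, m in models.items()}
            correct += int(max(scores, key=scores.get) == e)
            total += 1
    return correct / total
```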
4. Conclusions and Future Work
This paper provides an empirical study of using MFCC and power spectrum for erhu music instrument identification. The experiments demonstrate that MFCC is a much more effective feature for this purpose. This is justifiable since MFCC is widely used in speech/speaker recognition and is designed to model the human auditory perception system. An immediate direction for future work is to use other audio features, such as PLP [15] or the wavelet transform [3]. Moreover, different classes of instruments have different timbres. Therefore, we can apply the same algorithm to music instrument assessment, which can provide important information for music performers and instrument manufacturers.
5. References
[1] "Automatic recognition of speakers from their voices," Proc. IEEE, vol. 64, pp. 460-475, 1976.
[2] A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Stat. Soc., vol. 39, pp. 1-38, 1977.
[3] C. S. Burrus, R. A. Gopinath, and H. Guo, Introduction to Wavelets and Wavelet Transforms, Prentice-Hall International, Inc., New Jersey, 1998.
[4] D. Reynolds and R. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Trans. Speech Audio Processing, vol. 3, no. 1, pp. 72-83, 1995.
[5] P. Duhamel and M. Vetterli, "Fast Fourier transforms: a tutorial review and a state of the art," Signal Processing, vol. 19, pp. 259-299, April 1990.
[6] G. McLachlan, Mixture Models, Marcel Dekker, New York, 1998.
[7] G. R. Doddington, "Speaker recognition: identifying people by their voices," Proc. IEEE, vol. 73, pp. 1651-1664, Nov. 1985.
[8] J. Duchene and S. Leclercq, "An optimal transformation for discriminant and principal component analysis," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 10, no. 6, November 1988.
[9] A. K. Krishnamurthy and D. G. Childers, "Two-channel speech analysis," IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 34, pp. 730-743, 1986.
[10] L. Baum et al., "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," Ann. Math. Stat., vol. 41, pp. 164-171, 1970.
[11] L. Fausett, Fundamentals of Neural Networks, Prentice-Hall, New Jersey, 1994.
[12] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
[13] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification (2nd edition), Wiley, New York, 2001.
[14] C. Saunders, M. O. Stitson, J. Weston, L. Bottou, B. Schoelkopf, and A. Smola, Support Vector Machine - Reference Manual, Technical Report CSD-TR-98-03, Department of Computer Science, Royal Holloway, University of London, 1998.
[15] X. Huang, A. Acero, and H. W. Hon, Spoken Language Processing, Prentice Hall, 2001.
[16] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Transactions on Communications, vol. COM-28, pp. 84-95, 1980.
Authors' Biographies

Chih-Wen Weng has been a full-time lecturer in the Chinese Music Department at National Tainan Art University, Taiwan, since 1996. He won first prize in the erhu category of the 1987 music competition in Taiwan, and has also received several music composition prizes. His research interests include music temperament analysis and computer music synthesis. Since 2003, he has been a PhD candidate in the Department of Computer Science at National Tsing Hua University, Taiwan.

Cheng-Yuan Lin received his Master's degree from the Department of Computer Science at National Tsing Hua University, Taiwan, in 2001. Since 2003, he has been a PhD candidate in the same department. His research interests include speech/singing/music synthesis, speaker/speech/music recognition, and audio signal processing.

Jyh-Shing Roger Jang received the Ph.D. degree from the EECS Department at UC Berkeley in 1992. Since 1995, he has been with the Department of Computer Science, National Tsing Hua University, Taiwan. His research interests include melody/music/speech/speaker recognition, audio signal processing/recognition, pattern recognition, neural networks, and fuzzy logic.