
Volume 1, No. 1, July – August 2012
Jyotismita Talukdar et al., International Journal of Computing, Communications and Networking, 1(1), July – August 2012, 1-9

International Journal of Computing, Communications and Networking
Available Online at http://warse.org/pdfs/ijccn01112012.pdf

Acoustic Representation of BODO and RABHA Phonemes

Jyotismita Talukdar1, Nabankur Pathak2
1 Asia Institute of Technology, Bangkok, Thailand, E-mail: jyotismita4@gmail.com
2 Gauhati University, India, E-mail: phtassam@gmail.com

ABSTRACT
In this paper we study the spectral features of Bodo and Rabha phonemes using formant frequencies and cepstral coefficients. The analysis of the Bodo and Rabha phonemes and words shows significant variation of the cepstral coefficients among the Bodo vowels. For male Bodo speakers, the cepstral variation is maximum for the vowel /o/ and minimum for the vowel /u/. For female Bodo speakers, it is maximum for /o/ and minimum for /i/. For the Rabha vowels /o/, /a/, /i/, /e/, /u/ and /w/, the range of variation of the cepstral coefficients for male speakers is maximum for /u/ and minimum for /o/; for female speakers it is maximum for /o/ and minimum for /e/. This observation may be helpful in determining the sex of both Bodo and Rabha speakers. For male speakers, the range of variation of the cepstral coefficients lies within 1.1523 < C_Bodo < 3.8177 and 2.0579 < C_Rabha < 8.1329; for female speakers it lies within 0.9276 < C_Bodo < 1.9578 and 2.4127 < C_Rabha < 7.6546. That is, the variation of the cepstral features is smaller for the Bodo vowels (male: 2.6654; female: 1.0302) than for the Rabha vowels (male: 6.0750; female: 5.2419), so the former are more stable than the latter. The investigation has also shown that the range of formant frequency is maximum for isolated vowels; when a vowel is placed in the nucleus of a structure such as CV, VC or CVC, the formant frequency decreases.

Keywords: Acoustic Representation, Phonemes, Cepstral Features

1. INTRODUCTION
The Bodos and the Rabhas are early ethnic and linguistic communities settled in the North-Eastern part of India. The Bodos belong to a larger ethnic group called the Bodo-Kachari. Racially, they belong to a Mongoloid stock of the Indo-Mongoloids or Indo-Tibetans. Mythologically, according to Dr. Suniti Kumar Chatterjee, a well-known historian, the Bodos are “the Offspring of son of Vishnu and mother Earth”, who were termed Kiratas during the epic period. They are recognized as a plains tribe in the 6th schedule of the constitution of India. Historically, there are different views on the early migration of the Mongolians into the North-Eastern part of India. According to Grierson’s “The Linguistic Survey of India”, the Mongolians who settled in old Assam migrated from the banks of the Hoang-Ho and Yangtze rivers and scattered and dwelt along different river banks of the state. The upper courses of the Yangtze and Hoang-Ho in North-West China were the original home of the Tibeto-Burman races. The hierarchy of the Bodo community is shown in the figure below.

Hierarchy of Bodo & Rabha Languages
1 @ 2012, IJCCN All Rights Reserved


Speech Data Collection for Acoustic Representation

2. LPC ANALYSIS
Linear prediction is a method of signal source modelling that is dominant in speech signal processing and widely applied in other areas. Linear Predictive Coding (LPC) is one of the most powerful speech analysis techniques. The glottis (the space between the vocal cords) produces the sound, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat, the mouth and the nasal cavity) forms the tube, which is characterized by its resonance frequencies, called formants. The basic problem of the LPC system is to determine the formants from the speech signal. The solution is a difference equation that expresses each sample of the signal as a linear combination of previous samples. Such an equation is called a linear predictor, hence the name Linear Predictive Coding. The coefficients of the difference equation (the prediction coefficients) characterize the formants, so the LPC system needs to estimate these coefficients. The estimation is made by minimizing the mean square error between the predicted signal and the actual signal.

The basic idea behind the LPC model is that a given speech sample at time n, $s(n)$, can be approximated as a linear combination of the past p speech samples (Rabiner & Juang, 1993), such that

$$ s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p) \tag{1} $$

where the coefficients $a_1, a_2, \dots, a_p$ are assumed to be constant over the speech analysis frame. Equation (1) can be converted to an equality by including an excitation term $G\,u(n)$.

Typically, spoken language data can be classified based on:
• Mode of speech
• Medium of recording
• Language
• Dialects
• Environment

In the present study, speech data were collected from native speakers of the Bodo and Rabha languages who are fluent in speaking and writing the language. Male and female speakers aged between 15 and 30 years, with a pleasant and good voice quality, were chosen to record the data. The recordings were made one speaker at a time. The speakers were instructed to read each word or sentence naturally, without emotion or expression. They were asked to speak clearly and to keep their normal speaking rate and volume. To keep the recordings consistent, both in phonetic and prosodic terms (within the framework of symbolic prosody), an expert in acoustic phonetics supervised the recording. The average recording time was about 4 hours (3 recording sessions) for each speaker (male and female). We recorded the following data sets for the analysis of the cepstral coefficients of vowel phonemes and the formant frequencies of some selected Bodo and Rabha words:
• Bodo and Rabha vowel phonemes for cepstral analysis.
• Selected word sets of V, CV, VC and CVC structure in both languages for formant analysis.

The recording was done in the audio editing software Cool Edit Pro and the analysis was done in MATLAB 7.1. Each digitized utterance is divided (blocked) into 50 frames of 20 milliseconds (ms) duration. Every frame contains 441 samples (corresponding to a sampling rate of 22.05 kHz), and 20 cepstral coefficients were calculated for each frame. The spectral characteristics of six Bodo and Rabha vowels, for male and female speakers, were investigated. Approximately 12 samples were averaged to obtain one coefficient. First, the 10th frame of all utterances of the male and female speakers was considered for analysis. The variation of the cepstral coefficients for the Bodo and Rabha vowels for the selected speakers is shown in Table-(1) & Table-(2) and depicted in Figures-(3 & 4) and Figures-(6 & 7).
However, from continuous frame-wise analysis, it is observed that frames 2, 4, 6 and 8 for the Bodo speakers (Figure-5) and frames 9, 14, 16 and 17 for the Rabha speakers (Figure-8) show distinct variation of the cepstral coefficients between male and female speakers.
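The frame blocking described above can be sketched as follows. This is a minimal illustration in Python with NumPy rather than the MATLAB 7.1 code used in the study. The 22050 Hz sampling rate is inferred from 441 samples per 20 ms frame, non-overlapping frames are an assumption, and the function name `block_into_frames` and the test tone are hypothetical.

```python
import numpy as np

FRAME_MS = 20             # frame duration stated in the text
SAMPLES_PER_FRAME = 441   # samples per frame stated in the text
N_FRAMES = 50             # frames per utterance stated in the text

# Sampling rate implied by 441 samples in 20 ms (an inference, not stated in the paper).
SAMPLE_RATE = SAMPLES_PER_FRAME * 1000 // FRAME_MS


def block_into_frames(signal, frame_len=SAMPLES_PER_FRAME, n_frames=N_FRAMES):
    """Split a 1-D signal into n_frames non-overlapping frames of frame_len samples.

    Shorter signals are zero-padded and longer ones are truncated (an assumption).
    """
    signal = np.asarray(signal, dtype=float)
    needed = frame_len * n_frames
    padded = np.zeros(needed)
    count = min(needed, signal.size)
    padded[:count] = signal[:count]
    return padded.reshape(n_frames, frame_len)


# Hypothetical one-second test tone standing in for a recorded vowel.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
utterance = np.sin(2 * np.pi * 200 * t)

frames = block_into_frames(utterance)
tenth_frame = frames[9]   # the "10th frame" of the text (index 9 when counting from 0)
```

With these figures, 50 frames of 20 ms cover exactly one second of speech, which is why the test tone above is one second long.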

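With the excitation term included, the model of equation (1) becomes

$$ s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\,u(n) \tag{2} $$

where $u(n)$ is a normalized excitation and $G$ is the gain of the excitation. Expressing equation (2) in the z-domain, we get the relation

$$ S(z) = \sum_{k=1}^{p} a_k z^{-k} S(z) + G\,U(z) \tag{3} $$

leading to the transfer function

$$ H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}} = \frac{1}{A(z)} \tag{4} $$

This model is based on our knowledge that the actual excitation function for speech is essentially either a quasi-periodic pulse train (for voiced speech sounds) or a random noise source (for unvoiced sounds).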
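To make the relation between the predictor of equation (1) and the excited model of equation (2) concrete, the sketch below synthesizes a signal through the all-pole filter 1/A(z) and checks that subtracting the linear prediction recovers the scaled excitation G·u(n). The coefficient values, the gain and the impulse-train excitation are hypothetical and chosen only so that the filter is stable.

```python
import numpy as np


def synthesize(a, excitation, gain=1.0):
    """Run s(n) = sum_k a[k-1] * s(n-k) + gain * u(n), the all-pole filter 1/A(z)."""
    p = len(a)
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        for k in range(1, p + 1):
            if n - k >= 0:
                s[n] += a[k - 1] * s[n - k]
        s[n] += gain * excitation[n]
    return s


def predict(a, s):
    """Linear prediction of each sample from the past p samples, as in equation (1)."""
    p = len(a)
    est = np.zeros(len(s))
    for n in range(len(s)):
        for k in range(1, p + 1):
            if n - k >= 0:
                est[n] += a[k - 1] * s[n - k]
    return est


a = [1.3, -0.8]          # hypothetical 2nd-order predictor (poles inside the unit circle)
G = 0.5                  # hypothetical gain
u = np.zeros(200)
u[::50] = 1.0            # impulse train as a stand-in for voiced excitation

s = synthesize(a, u, G)
residual = s - predict(a, s)   # equals G * u(n) up to rounding
```

The residual is non-zero only where the excitation is non-zero, which is the sense in which the prediction coefficients capture the vocal tract and the residual captures the source.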

The relation between $s(n)$ and $u(n)$ is defined (based on the speech production model, Figure-1.1) as

$$ s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\,u(n) \tag{5} $$

We consider the linear combination of past speech samples as the estimate $\tilde{s}(n)$, defined as

$$ \tilde{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k) \tag{6} $$

The prediction error, $e(n)$, is defined as

$$ e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k) \tag{7} $$

and the error transfer function is

$$ A(z) = \frac{E(z)}{S(z)} = 1 - \sum_{k=1}^{p} a_k z^{-k} \tag{8} $$

The basic problem of linear prediction analysis is to determine the set of predictor coefficients $\{a_k\}$ directly from the speech signal, so that the spectral properties of the digital filter match those of the speech waveform within the analysis window. To set up the equations that must be solved to determine the predictor coefficients, we define the short-term speech and error segments at time n as

$$ s_n(m) = s(n+m) \tag{9} $$

$$ e_n(m) = e(n+m) \tag{10} $$

and seek to minimize the mean square error signal at time n,

$$ E_n = \sum_{m} e_n^2(m) \tag{11} $$

Using equations (9) and (10), we can write

$$ E_n = \sum_{m} \Big[ s_n(m) - \sum_{k=1}^{p} a_k\, s_n(m-k) \Big]^2 \tag{12} $$

To solve equation (12), we put

$$ \frac{\partial E_n}{\partial a_i} = 0, \qquad i = 1, 2, \dots, p \tag{13} $$

giving

$$ \sum_{m} s_n(m-i)\, s_n(m) = \sum_{k=1}^{p} \hat{a}_k \sum_{m} s_n(m-i)\, s_n(m-k) \tag{14} $$

The terms $\sum_{m} s_n(m-i)\, s_n(m-k)$ are related to the short-term covariance of $s_n(m)$, i.e.,

$$ \phi_n(i,k) = \sum_{m} s_n(m-i)\, s_n(m-k) \tag{15} $$

so that equation (14) can be expressed in compact notation as

$$ \phi_n(i,0) = \sum_{k=1}^{p} \hat{a}_k\, \phi_n(i,k), \qquad i = 1, 2, \dots, p \tag{16} $$

which describes a set of p equations in p unknowns. It is readily shown that the minimum mean-squared error, $\hat{E}_n$, can be expressed as

$$ \hat{E}_n = \phi_n(0,0) - \sum_{k=1}^{p} \hat{a}_k\, \phi_n(0,k) \tag{17} $$

Thus the minimum mean-squared error consists of a fixed term, $\phi_n(0,0)$, and terms that depend on the predictor coefficients. To solve equation (16) for the optimum predictor coefficients, we have to compute $\phi_n(i,k)$ for $1 \le i \le p$ and $0 \le k \le p$, and then solve the resulting set of p simultaneous equations. One method to solve these equations and compute the coefficients is the autocorrelation method.

The LPC-Cepstral Coefficient
In the present study, LPC-based cepstral coefficients and phonetically important parameters are used as feature vectors. A cepstral weighted feature vector is obtained for each frame by block processing of continuous speech signals. The analog speech waveform is first sampled and quantized by an analog-to-digital converter. To spectrally flatten the signal, the speech signal is subjected to pre-emphasis through a first-order digital filter whose transfer function is given by

$$ H(z) = 1 - a z^{-1}, \qquad \text{with } 0.9 \le a \le 1.0 \tag{19} $$

N consecutive speech samples are taken as a single frame. To reduce the undesired effect of the Gibbs phenomenon, each frame $x(n)$ is multiplied by a window function (Hamming window), which is given by (Proakis & Manolakis, 2004; Talukdar, P.H., 2010)

$$ w(n) = 0.54 - 0.46 \cos\!\left( \frac{2\pi n}{N-1} \right), \qquad 0 \le n \le N-1 $$

where N is the number of samples in a block. Each frame of the windowed signal, $\tilde{x}(n) = x(n)\,w(n)$, is next autocorrelated to give

$$ r(m) = \sum_{n=0}^{N-1-m} \tilde{x}(n)\, \tilde{x}(n+m), \qquad m = 0, 1, 2, \dots, p \tag{20} $$

where the highest autocorrelation lag, p, is the order of the LPC analysis.

a. LPC Parameter Conversion to Cepstral Coefficients
The LPC cepstral coefficients are a set of values that have been found to be a more robust and reliable feature set for speech recognition than the LPC coefficients. These coefficients are obtained recursively as follows:

$$ c_0 = \ln \sigma^2, \qquad c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p \tag{21} $$

$$ c_m = \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad m > p \tag{22} $$

where $\sigma^2$ is the gain term in the LPC model and $a_j = 0$ for $j > p$. Equation (22) gives the cepstral coefficients $c_{p+1}, c_{p+2}, \dots, c_Q$ beyond the LPC order. Generally, $Q \approx (3/2)\,p$ is taken for the cepstral representation.
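The processing chain described in this section (pre-emphasis, Hamming windowing, autocorrelation, solution of the normal equations, and conversion to cepstral coefficients) can be sketched end to end as below. This is an illustrative Python/NumPy sketch, not the authors' MATLAB code. The Levinson-Durbin recursion is used here as one standard way to carry out the autocorrelation method, and the LPC order of 12, the pre-emphasis constant 0.95 and the synthetic frame are assumptions; only the 441-sample frame and the 20 cepstral coefficients come from the text.

```python
import numpy as np


def pre_emphasize(x, a=0.95):
    """First-order pre-emphasis y(n) = x(n) - a*x(n-1), i.e. H(z) = 1 - a z^-1."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= a * x[:-1]
    return y


def hamming_window(n_samples):
    """Hamming window w(n) = 0.54 - 0.46 cos(2*pi*n / (N - 1))."""
    n = np.arange(n_samples)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (n_samples - 1))


def autocorrelate(x, order):
    """r(m) = sum_n x(n) x(n+m) for m = 0..order."""
    return np.array([np.dot(x[: len(x) - m], x[m:]) for m in range(order + 1)])


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations; returns (a_1..a_p, minimum error)."""
    a = np.zeros(order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        new_a = a.copy()
        new_a[i] = k
        new_a[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, gain_sq, n_cep):
    """Recursive LPC-to-cepstrum conversion; returns c_0..c_{n_cep}."""
    p = len(a)
    c = np.zeros(n_cep + 1)
    c[0] = np.log(gain_sq)
    for m in range(1, n_cep + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(1, m):
            if m - k <= p:           # a_j is zero beyond the LPC order
                acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c


def frame_to_cepstrum(frame, order=12, n_cep=20, pre_emph=0.95):
    """Cepstral coefficients c_0..c_{n_cep} of one frame (order and pre_emph assumed)."""
    x = pre_emphasize(frame, pre_emph) * hamming_window(len(frame))
    r = autocorrelate(x, order)
    a, err = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, err, n_cep)


# Hypothetical 441-sample frame: a 300 Hz tone plus a little noise.
rng = np.random.default_rng(0)
n = np.arange(441)
frame = np.sin(2 * np.pi * 300 * n / 22050) + 0.1 * rng.standard_normal(441)
cepstrum = frame_to_cepstrum(frame)
```

The minimum prediction error returned by the recursion is used as the gain term in the zeroth cepstral coefficient; the remaining 20 values correspond to the 20 coefficients per frame mentioned in the data collection section.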

Table 1: Range of variation of the cepstral coefficients for the male and female Bodo speakers

          Male                                  Female
Vowel     Max.      Min.      Range             Max.      Min.      Range
/o/       2.2237    -1.5940   3.8177            1.9492    -0.0086   1.9578
/a/       1.6260    -0.9615   2.5875            0.9492    -0.0641   1.0133
/i/       1.1528    -0.1253   1.2781            0.9059    -0.0217   0.9276
/e/       1.2355    -0.6532   1.8887            0.9847    -0.0578   1.0425
/u/       1.0922    -0.0601   1.1523            1.1385     0.0690   1.2075
/w/       1.1832    -0.1541   1.3373            1.1843    -0.1674   1.3517

Figure 1. Cepstral characteristics of Bodo vowels for male speaker (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Cepstral Coefficient; y-axis: Amplitude (dB))

Figure 2. Cepstral characteristics of Bodo vowels for female speaker (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Cepstral Coefficient; y-axis: Amplitude (dB))
Figure 3. Distinction between Bodo Male & Female speaker in frame no 2,4,16 & 8 (panel titles as extracted: Frame no. 2, 4, 16, 6; x-axis: Cepstral Coefficients; y-axis: Log Magnitude (dB); curves: Male, Female)

Table 2: Range of variation of the cepstral coefficients for the male and female Rabha speakers

          Male                                  Female
Vowel     Max.      Min.      Range             Max.      Min.      Range
/o/       1.0057    -1.0522   2.0579            3.9045    -3.7501   7.6546
/a/       1.4964    -1.8083   3.3047            2.0135    -2.0784   4.0919
/i/       1.4086    -1.8085   3.2171            1.9864    -1.9832   3.9696
/e/       2.1054    -2.2054   4.3108            0.9164    -1.4963   2.4127
/u/       3.4942    -4.6387   8.1329            1.0839    -1.6952   2.7791
/w/       2.4834    -1.0627   3.5461            1.7201    -0.8801   2.6002
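The "range of variation" columns and the overall spreads quoted in the abstract follow from simple arithmetic on the maximum and minimum values. The short Python sketch below reproduces it for the male speakers, using the (Max., Min.) pairs of the Bodo table (Table 1) and the Rabha table above; the helper names `ranges` and `spread` are hypothetical.

```python
# (Max., Min.) cepstral coefficient values for male speakers, copied from the tables.
bodo_male = {
    "/o/": (2.2237, -1.5940), "/a/": (1.6260, -0.9615), "/i/": (1.1528, -0.1253),
    "/e/": (1.2355, -0.6532), "/u/": (1.0922, -0.0601), "/w/": (1.1832, -0.1541),
}
rabha_male = {
    "/o/": (1.0057, -1.0522), "/a/": (1.4964, -1.8083), "/i/": (1.4086, -1.8085),
    "/e/": (2.1054, -2.2054), "/u/": (3.4942, -4.6387), "/w/": (2.4834, -1.0627),
}


def ranges(table):
    """Range of variation (max - min) per vowel, rounded to the tables' 4 decimals."""
    return {vowel: round(mx - mn, 4) for vowel, (mx, mn) in table.items()}


def spread(table):
    """Difference between the largest and smallest range of variation."""
    values = ranges(table).values()
    return round(max(values) - min(values), 4)


bodo_ranges = ranges(bodo_male)      # largest for /o/, smallest for /u/
rabha_ranges = ranges(rabha_male)    # largest for /u/, smallest for /o/
bodo_spread = spread(bodo_male)
rabha_spread = spread(rabha_male)
```

The smaller spread for the Bodo values is what the paper summarizes as the Bodo vowels being more stable than the Rabha vowels.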

Figure 4. Distinction between the male and female Rabha speaker in frame no: 9,14,16 & 17

Figure 5. Cepstral characteristics of Rabha vowels for male speaker (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Cepstral Coefficient; y-axis: Amplitude (dB))

Figure 6. Cepstral characteristics of Rabha vowels for female speaker (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Cepstral Coefficient; y-axis: Amplitude (dB))

b. Formant Estimation of BODO and RABHA Phonemes
Formant frequencies are the distinguishing frequency components of human speech. The term refers to the specific resonance frequencies of the vocal tract that have maximum energy concentration during the utterance of a vowel. Vowels can be qualitatively distinguished by these frequency components. Generally, three formant frequencies (F1, F2 and F3) are considered for the perception and discrimination of vowels by a listener (Kewley-Port, 1982, 1983). A variety of approaches, such as formant tracking, articulatory models and auditory models, have been used for the analysis and synthesis of speech. The formant tracking method, based on Linear Predictive Coding (LPC), has received considerable attention. In this digital technique, the entire frequency range is divided into a fixed number of segments, and each segment represents one formant frequency. A 2nd-order resonator is defined for each segment k with specific boundaries. The predictor polynomial, defined as the Fourier transform of the corresponding 2nd-order predictor, is given by (Welling and Ney, 1998):

$$ A_k(e^{j\omega}) = 1 - \alpha_k e^{-j\omega} - \beta_k e^{-j2\omega} \tag{23} $$

where $\alpha_k$ and $\beta_k$ are real-valued predictor coefficients. Therefore, from equation (23) we get

$$ |A_k(e^{j\omega})|^2 = \left| 1 - \alpha_k e^{-j\omega} - \beta_k e^{-j2\omega} \right|^2 \tag{24} $$

$$ = 1 + \alpha_k^2 + \beta_k^2 - 2\alpha_k (1 - \beta_k) \cos\omega - 2\beta_k \cos 2\omega \tag{25} $$

The parameter $\beta_k$, through its negative $-\beta_k$, determines the bandwidth of the resonator. The formant frequency $\varphi_k$ is given by

$$ \varphi_k = \arccos\!\left( -\frac{\alpha_k (1 - \beta_k)}{4 \beta_k} \right) \tag{26} $$

Table 3: Formant frequencies estimation of BODO words (Hz)

                                  Female                          Male
Structure  Item                   F1      F2       F3             F1      F2       F3
Vowel      /o/                    319.1   833.3    3030.4         309.3   764.0    2748.8
Vowel      /a/                    380.3   1194.5   3650.4         343.8   1172.0   2494.5
Vowel      /i/                    411.3   2409.8   2911.8         394.6   2341.6   3002.4
Vowel      /e/                    387.5   2240.8   3165.0         384.9   2178.1   3577.1
Vowel      /u/                    249.6   997.7    3044.3         244.7   837.5    3690.6
Vowel      /w/                    292.7   1527.2   3165.3         206.4   1147.1   2486.9
VC         /or/ (fire)            326.4   1623.4   3023.8         539.1   2293.9   3242.6
VC         / / (I)                326.1   1717.5   3006.2         714.0   2365.5   3199.6
VC         /ich/ (pain)           293.3   2371.3   3455.9         299.3   2932.2   3189.1
VC         /un/ (back side)       300.5   1424.7   3276.9         280.2   2240.0   2636.7
VC         /ul/ (confuse)         347.2   2353.1   2853.5         442.8   2544.9   3350.7
VC         /em/ (bed)             311.4   2452.7   2765.3         398.5   1265.7   2435.8
CV         /hw/ (to give)         320.7   1687.9   3120.24        494.4   2109.8   3216.3
CV         /bu/ (to swell)        382.1   1661.1   3077.1         690.1   2545.8   3355.9
CV         /ru/ (to boil)         311.1   1623.5   3445.5         633.0   2386.2   3298.9
CV         / / (to beat)          337.6   1853.7   2996.8         375.5   2536.2   2842.9
CV         /be/ (this)            354.6   1699.5   3001.65        283.4   2250.1   3220.0
CV         /gi/ (to fear)         334.8   1617.9   2947.7         393.0   2223.5   3287.7
CVC        /san/ (the sun)        285.5   1800.6   3286.8         838.3   1494.4   3546.54
CVC        /swb/ (smoke)          282.5   1966.4   3135.6         727.5   1421.3   3265.67
CVC        /bar/ (wind)           298.5   2657.89  3024.78        892.2   1356.9   3198.00
CVC        /lir/ (to write)       304.7   2354.87  3254.67        745.3   1293.2   3354.52
CVC        /dwn/ (to keep)        352.6   1471.0   3163.2         300.7   1238.3   3648.01
CVC        /thar/ (sure)          276.1   2491.2   3155.5         415.6   1629.4   3674.98
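The 2nd-order resonator relation used in the formant estimation section can be checked numerically. The sketch below, in Python with NumPy, builds the predictor coefficients of a resonator A(z) = 1 - αz⁻¹ - βz⁻² from a chosen centre frequency and pole radius, then recovers the formant frequency from the closed-form arccos expression and compares it with a brute-force search for the minimum of |A(e^{jω})|². The 8000 Hz sampling rate, the 1000 Hz target, the 0.95 radius and all function names are hypothetical choices for illustration only.

```python
import numpy as np


def resonator_coeffs(freq_hz, radius, fs):
    """alpha, beta of A(z) = 1 - alpha z^-1 - beta z^-2 with roots radius*exp(+-j*theta)."""
    theta = 2.0 * np.pi * freq_hz / fs
    return 2.0 * radius * np.cos(theta), -radius ** 2


def magnitude_sq(alpha, beta, omega):
    """|A(e^{j omega})|^2 in its expanded, real-valued form."""
    return (1.0 + alpha ** 2 + beta ** 2
            - 2.0 * alpha * (1.0 - beta) * np.cos(omega)
            - 2.0 * beta * np.cos(2.0 * omega))


def formant_frequency(alpha, beta, fs):
    """Frequency (Hz) at which |A|^2 is minimum: arccos(-alpha(1-beta)/(4 beta))."""
    cos_phi = -alpha * (1.0 - beta) / (4.0 * beta)
    phi = np.arccos(np.clip(cos_phi, -1.0, 1.0))
    return phi * fs / (2.0 * np.pi)


fs = 8000.0                                   # hypothetical sampling rate
alpha, beta = resonator_coeffs(1000.0, 0.95, fs)
f_closed = formant_frequency(alpha, beta, fs)

# Brute-force check: locate the minimum of |A|^2 on a dense frequency grid.
omegas = np.linspace(0.0, np.pi, 20001)
f_grid = omegas[np.argmin(magnitude_sq(alpha, beta, omegas))] * fs / (2.0 * np.pi)
```

The closed-form estimate lands within a few hertz of the resonator's nominal centre frequency; the small offset comes from the pole radius being slightly below one.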

Formant Frequencies estimation of 6 Bodo vowels for female utterances (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Frequency (Hz); y-axis: Amplitude (dB))

Formant Frequencies estimation of 6 Bodo vowels for male utterances (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Frequency (Hz); y-axis: Amplitude (dB))

F1-F2 plot shows the vowel triangle for male and female speakers of the Bodo language (plot title: F1-F2 for male & female vowel triangle; x-axis: F1 (Hz); y-axis: F2 (Hz); red: male, blue: female; labelled vertices: /i/, /a/, /u/).

Formant frequency curves show the distinction of formant variation for the V, VC, CV & CVC word structures (plot title: Change of formant with V/VC/CV/CVC; x-axis: Frequency (Hz); y-axis: Gain (dB)).

F1-F2 plot shows that the formant frequencies of the CV, VC or CVC word structures of the Bodo language mostly lie within the range of the formant frequencies of the vowels (plot title: Range of Formant Frequency; x-axis: F1 (Hz); y-axis: F2 (Hz); MV: male vowel, FV: female vowel).

Table 4: Formant frequency estimation of RABHA words (Hz)

                                  Female                          Male
Structure  Item                   F1      F2       F3             F1      F2       F3
Vowel      /o/                    640.3   2560.4   3220.8         620.2   2154.7   2876.1
Vowel      /a/                    283.5   1480.3   3600.2         243.8   1654.4   2865.8
Vowel      /i/                    280.4   2560.8   2200.6         301.3   2251.8   3985.8
Vowel      /e/                    1040.8  1384.4   3151.8         987.4   2657.9   3758.4
Vowel      /u/                    480.9   2360.1   3211.4         504.5   2857.9   3415.8
Vowel      /w/                    340.5   1080.2   2720.4         253.9   2415.7   2965.8
VC         /ora / (you are)       543.7   1748.9   3823.5         643.3   2396.9   3242.6
VC         /intcek/ (this much)   375.3   1682.9   30165.5        987.5   2401.6   3099.0
VC         /ek/ (to jump)         275.7   2769.4   3321.9         276.9   3001.8   3548.2
VC         / / (I am)             765.2   1765.6   3546.9         321.9   2394.9   2987.3
VC         /ut/ (camel)           653.9   2015.9   2976.9         431.9   2656.7   3241.9
VC         /r / (length)          392.8   2438.3   2657.9         400.3   1834.9   2865.9
CV         /to/ (hen)             465.9   1874.5   2976.4         498.3   2183.5   3216.3
CV         /tsa/ (to eat)         3428    2463.9   2981.6         690.3   2574.8   3582.7
CV         /mi/ (vegetable)       354.9   1987.7   3768.9         541.7   3286.1   3321.8
CV         /the/ (fruit)          698.8   1976.4   2885.6         367.2   2653.0   2976.2
CV         /tcu/ (thorn)          387.03  1687.5   3415.6         298.5   2261.3   3139.7
CV         /a / (shout)           565.7   176.5    2986.3         391.2   2695.3   3271.6
CVC        /tcok/ (compound)      276.4   1867.5   3341.7         845.2   1476.7   3498.6
CVC        /na (You are)          265.8   20001.8  3875.4         698.4   1501.0   3315.8
CVC        /rin) (loan)           301.6   2782.5   3054.6         875.2   1401.9   3176.0
CVC        /ben / (where)         312.4   2323.3   3198.6         684.9   1354.8   3299.7
CVC        /tbau/ (owl)           299.7   1976.4   2988.3         301.5   1222.6   3571.8
CVC        /tsara / (disease)     261.6   2434.1   3145.1         500.8   1687.8   3679.1

Formant Frequencies estimation of 6 Rabha vowels for female utterances (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Frequency (Hz); y-axis: Amplitude (dB))

Formant Frequencies estimation of 6 Rabha vowels for male utterances (x-axis: Frequency (Hz); y-axis: Amplitude (dB))

F1-F2 plot shows the vowel triangle for male and female speakers of the Rabha language (plot title: Vowel triangle for male & female speaker; x-axis: F1 (Hz); y-axis: F2 (Hz); red: male, blue: female; labelled vertices: /i/, /a/, /u/).

F1-F2 plot shows that the formant frequencies of the CV, VC or CVC word structures of the Rabha language mostly lie within the range of the formant frequencies of the vowels (plot title: Range of Formant frequency; x-axis: F1 (Hz); y-axis: F2 (Hz); MV: male vowel, FV: female vowel).

3. RESULTS AND DISCUSSION
Based on the analysis of the cepstral features and formant frequencies of the Bodo and Rabha phonemes and words, the following observations were made.

Significant variation of the cepstral coefficients is observed among the Bodo vowels, as shown in Table-1. For male speakers, the cepstral variation is maximum for the vowel /o/ and minimum for the vowel /u/. For female Bodo speakers, the variation is maximum for /o/ and minimum for /i/. For the Rabha vowels /o/, /a/, /i/, /e/, /u/ and /w/, the range of variation of the cepstral coefficients (Table-2) for male speakers is maximum for /u/ and minimum for /o/. For female speakers, the variation is maximum for /o/ and minimum for /e/.

Significantly, the cepstral coefficients of the Bodo vowels for frame nos. 2, 4, 6 & 8 show distinctive characteristics (Figure-4) for male and female speakers. The variation of the cepstral coefficients for male speakers is very irregular, in contrast to the stable variation of the female cepstral coefficients. The same phenomenon is observed for the Rabha vowels, but for different frames, i.e. frame nos. 9, 14, 16 and 17 (Figure-7). This observation may be helpful in determining the sex of both Bodo and Rabha speakers.

For male speakers, the range of variation of the cepstral coefficients lies within 1.1523 < C_Bodo < 3.8177 and 2.0579 < C_Rabha < 8.1329. For female speakers it lies within 0.9276 < C_Bodo < 1.9578 and 2.4127 < C_Rabha < 7.6546. That is, the variation of the cepstral features is smaller for the Bodo vowels (male: 2.6654; female: 1.0302) than for the Rabha vowels (male: 6.0750; female: 5.2419), so the former are more stable than the latter.

Figure 10 and Figure 15 represent the extremes of the formant locations in the F1-F2 plane for the Bodo and Rabha vowels. The formant locations of /u/ (low F1, low F2), /i/ (low F1, high F2) and /a/ (high F1, low F2) form the vertices of the vowel triangle, and the other vowels are placed relative to these vertices. Figure 12 and Figure 16 show that the formant frequencies of the selected word sets for both Bodo and Rabha lie within the range of the formant frequencies of the isolated vowels. The investigation has shown (Table-3 & 4) that the range of formant frequency is maximum for isolated vowels; when a vowel is placed in the nucleus of a structure such as CV, VC or CVC, the formant frequency decreases.

ACKNOWLEDGEMENT
We gratefully acknowledge the Ministry of Communication & Information Technology (MIT), New Delhi, Govt. of India, for providing us with relevant information while preparing the manuscript of this paper.

REFERENCES
1. Rabiner, L.R. and Juang, B.H. Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, New Jersey, 1993.
2. Noll, A.M. Cepstrum Pitch Determination, Journal of the Acoustical Society of America, Vol. 41, pp. 293-309, Feb. 1967.
3. Porat, B. A Course in Digital Signal Processing, John Wiley & Sons, 1997.
4. Proakis, J.G. and Manolakis, D.G. Digital Signal Processing: Principles, Algorithms and Applications, Pearson Education, Third Indian reprint, 2004.
5. Kewley-Port, D. Measurement of formant transitions in naturally produced stop consonant-vowel syllables, Journal of the Acoustical Society of America, 72, pp. 379-389, 1982.
6. Kewley-Port, D. Time-varying features as correlates of place of articulation of stop consonants, Journal of the Acoustical Society of America, 73, pp. 322-335, 1983.
7. Welling, L. and Ney, H. Formant Estimation for Speech Recognition, IEEE Transactions on Speech and Audio Processing, Vol. 6, pp. 36-48, 1998.
8. Talukdar, P.H. Speech Production, Analysis and Coding, Lambert Publication, Germany, 2010.
