(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 4, April 2011 (ISSN 1947-5500)
Score-Level Fusion for Efficient Multimodal Person Identification using Face and Speech
Hanaa S. Ali
Faculty of Engineering, Zagazig University, Zagazig, Egypt
hanahshaker@yahoo.com
Mahmoud I. Abdalla
Faculty of Engineering, Zagazig University, Zagazig, Egypt
mabdalla2010@gmail.com
Abstract—In this paper, a score fusion personal identification method using both face and speech is introduced to improve the rate of single biometric identification. For speaker recognition, the input speech signal is decomposed into various frequency channels using the multi-resolution property of the wavelet transform. For capturing the characteristics of the signal, the Mel frequency cepstral coefficients (MFCCs) of the wavelet channels are calculated. For the recognition stage, hidden Markov models (HMMs) are used. Comparison of the proposed approach with the conventional MFCCs method shows that the proposed method not only effectively reduces the influence of noise but also improves recognition. For face recognition, the wavelet-only scheme is used in the feature extraction stage and a nearest neighbour classifier is used in the recognition stage. The proposed method relies on fusion of the approximation and horizontal detail subbands, normalized with z-score, at the score level. After each subsystem computes its own matching score, the individual scores are finally combined into a total score using the sum rule, which is passed to the decision module. Although fusion of horizontal details with approximations gives a small improvement in face recognition using the ORL database, their fused scores prove to improve recognition accuracy when combining the face score with the voice score in a multimodal identification system. The recognition rate obtained with speech in a noisy environment is 97.08% and the rate obtained from the ORL face database is 97.92%. The overall recognition rate using the proposed method is 99.6%.
I. INTRODUCTION

A biometric is a biological measurement of any human physiological or behavioral characteristic that can be used to identify an individual. One of the applications which most people associate with biometrics is security. However, biometric identification has a much broader relevance as computer interfaces become more natural. Biometric technologies are becoming the foundation of an extensive array of highly secure identification and personal verification solutions. A biometric-based authentication system operates in two modes: enrollment and authentication. In the enrollment mode, a user's biometric data is acquired and stored in a database. The stored template is labelled with a user identity to facilitate authentication. In the authentication mode, the biometric data of a user is once again acquired and the system uses this to either identify or verify the claimed identity of the user. While verification involves comparing the acquired biometric information with only those templates corresponding to the claimed identity, identification involves comparing the acquired biometric information against templates corresponding to all users in the database [1]. In recent years, biometric authentication has seen considerable improvements in reliability and accuracy. A brief comparison of major biometric techniques that are widely used or under investigation can be found in [2]. However, each biometric technology has its strengths and limitations, and no single biometric is expected to effectively satisfy the requirements of all verification or identification applications. Biometric systems based on one biometric are often not able to meet the desired performance requirements and have to contend with a variety of problems such as insufficient accuracy caused by noisy data acquisition, interclass variations and spoof attacks [3].
For biometric applications that demand robustness and higher accuracy than that provided by a single biometric trait, multimodal biometric approaches often provide promising results. Multimodal biometric authentication is the approach of using multiple biometric traits from a single user in an effort to improve the results of the identification process and to reduce error rates. Another advantage of the multimodal approach is that it is harder to circumvent or forge [4]. Some of the more well-known multimodal biometric systems proposed thus far are outlined below.

In [5], a comparison of decision-level fusion of face and voice modalities using various classifiers is described. The authors evaluate the use of sum, majority vote, three different order statistical operators, Behavior Knowledge Space and weighted averaging of classifier outputs as potential fusion techniques. In [6], the approach of applying multiple algorithms to a single sample is introduced. In this work, decision-level fusion is performed based on sum, Support Vector Machine and Dempster-Shafer theory on multiple fingerprint matching algorithms submitted to the FVC 2004 competition, with a view to evaluating which technique to use for fusion. In [7], multiple samples of the face from the same and different sources are used to create a multimodal system using 2D and 3D face images. The approach uses 4 different 2D images and a single 3D image from each user for verification and
fusion takes place in parallel at the matching score level using the sum, product or minimum value rule. Middendorff, Bowyer and Yan in [8] detail different approaches used in combining ear and face for identification. In [9], an overview of the development of the SecurePhone mobile communication system is presented. In this system, multimodal biometric authentication gives access to the system's built-in e-signing facilities, enabling users to deal m-contracts during a mobile call in an easy yet secure and dependable way. In their work, signature data is combined with the video data of unrelated subjects into virtual subjects. This is possible because signatures can be assumed statistically independent of face and voice data. In his PhD thesis, Karthik [10] proposes a fusion strategy based on the likelihood ratio used in the Neyman-Pearson theorem for the combination of match scores. He shows that this approach achieves high recognition rates over multiple databases without any parameter tuning.

In this paper, we introduce a multimodal biometric system which integrates face and voice to make a personal identification. Most of the successful commercial biometric systems currently rely on fingerprint, face or voice. Face and speech are routinely used by all of us in our daily recognition tasks [11]. Despite the fact that there are more reliable biometric recognition techniques such as fingerprint and iris recognition, the success of these techniques depends highly on user cooperation, since the user must position his eye in front of the iris scanner or put his finger in the fingerprint device. On the other hand, face recognition has the benefit of being a passive, non-intrusive system to verify personal identity in a natural and friendly way, since it is based on images recorded by a distant camera, and can be effective even if the user is not aware of the existence of the face recognition system. The human face is the most common characteristic used by humans to recognize other people, and this is why personal identification based on facial images is considered the friendliest among all biometrics [12]. Speech is one of the basic means of communication, which is better than other methods in the sense of efficiency and convenience [13]. For these reasons, face and voice are chosen in our work to build individual face recognition and speaker identification modules. These modules are then combined to achieve a highly effective person identification system.

II. FUSION IN BIOMETRICS

Ross and Jain [3] have presented an overview of multimodal biometrics and have proposed various levels of fusion, various possible scenarios, the different modes of operation, integration strategies and design issues. The fusion levels proposed for multimodal systems are shown in Fig. 1 and described below.
A. Fusion at the Feature Extraction Level

The data obtained from each sensor is used to compute a feature vector. As the features extracted from one biometric trait are independent of those extracted from the other, it is reasonable to concatenate the two vectors into a single new vector. The primary benefit of feature-level fusion is the detection of correlated feature values generated by different feature extraction algorithms and, in the process, the identification of a salient set of features that can improve recognition accuracy [14]. The new vector has a higher dimension and represents the identity of the person in a different hyperspace. Eliciting this feature set typically requires the use of dimensionality reduction/selection methods and, therefore, feature-level fusion assumes the availability of a large amount of training data.
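As a concrete illustration, the concatenation step can be sketched as below. This is a minimal sketch only; the vector dimensions and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def fuse_features(face_vec, speech_vec):
    """Concatenate per-modality feature vectors into one joint vector."""
    return np.concatenate([face_vec, speech_vec])

# Illustrative dimensions only (not from the paper).
face_vec = np.random.rand(64)    # e.g. wavelet subband coefficients
speech_vec = np.random.rand(39)  # e.g. MFCC-based features
joint = fuse_features(face_vec, speech_vec)
print(joint.shape)  # (103,)
```

The joint vector would then typically be passed through a dimensionality reduction step, as noted above.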
B. Fusion at the Matching Score Level

Feature vectors are created independently for each sensor and are then compared to the enrollment templates, which are stored separately for each biometric trait. Each system provides a matching score indicating the proximity of the feature vector to the template vector. These individual scores are finally combined into a total score (using the maximum rule, minimum rule, sum rule, etc.) which is passed to the decision module to assert the veracity of the claimed identity. Score-level fusion is often used because matcher scores are frequently available from each vendor matcher system and, when multiple scores are fused, the resulting performance may be evaluated in the same manner as a single biometric system. The matching scores of the individual matchers may not be homogeneous. For example, one matcher may output a similarity measure while another may output a dissimilarity measure. Further, the scores of individual matchers need not be on the same numerical scale. For these reasons, score normalization is essential to transform the scores of the individual matchers into a common domain before combining them [1]. A common theoretical framework for combining classifiers using the sum, maximum and minimum rules is analyzed in [15], where it is observed that the sum rule outperforms the other classifier combination schemes.
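The z-score normalization and sum rule used in this work can be sketched as follows. The score values below are toy numbers chosen for illustration, not results from the paper.

```python
import numpy as np

def z_score(scores):
    """Map matcher scores to a common domain: (s - mean) / std."""
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / s.std()

def sum_rule(*score_sets):
    """Fuse normalized score sets by summation (one score per enrolled identity)."""
    return np.sum([z_score(s) for s in score_sets], axis=0)

# Toy similarity scores for three enrolled identities (illustrative values).
face_scores = [0.90, 0.40, 0.30]
voice_scores = [0.70, 0.60, 0.20]
fused = sum_rule(face_scores, voice_scores)
print(int(np.argmax(fused)))  # identity with the highest total fused score -> 0
```

Because each z-scored set has zero mean, the fused scores also sum to zero across identities; only their relative ordering matters for the decision module.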
C. Fusion at the Decision Level

A separate identification decision is made for each biometric trait. These decisions are then combined into a final vote. The fusion process is performed by a combination algorithm such as AND, OR, etc. Alternatively, a majority voting scheme can be used to make the final decision.
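A minimal majority-vote sketch (the identity labels are hypothetical):

```python
from collections import Counter

def majority_vote(decisions):
    """Combine per-trait identity decisions; ties resolve to the first-seen label."""
    return Counter(decisions).most_common(1)[0][0]

# Each matcher votes for an identity label (illustrative labels).
print(majority_vote(["user_7", "user_7", "user_3"]))  # user_7
```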
 
III. SPEAKER IDENTIFICATION EXPERIMENT

A. Feature Extraction Technique

Speech signals contain two types of information: time and frequency. The most meaningful features in time space are generally the sharp variations in signal amplitude. In the frequency domain, although the dominant frequency channels of speech signals are located in the middle frequency region, different speakers may have different responses in all frequency regions [16]. Thus, some useful information may be lost using traditional methods which consider only fixed frequency channels.

In this paper, the multi-resolution decomposition technique using the wavelet transform is used. Wavelets have the ability to analyze different parts of a signal at different scales. Based on this technique, one can decompose the input speech signal into different resolution levels. The characteristics of multiple frequency channels and any change in the smoothness of the signal can be detected. Then, the Mel-frequency cepstral
coefficients (MFCCs) are extracted from the wavelet channels to represent the feature characteristics.

The Mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear Mel scale of frequency. In the MFC, the frequency bands are equally spaced on the Mel scale, which approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal cepstrum. This frequency-warping property can allow for a better representation of sound [17]. In this way, the proposed wavelet-based MFCC feature extraction technique combines the advantages of both wavelets and MFCCs.
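The multi-resolution decomposition can be sketched as below. The paper uses the db8 wavelet at level 3; for brevity this illustrative sketch uses the simpler Haar wavelet, recursing on the approximation channel in the same way (in practice a library such as PyWavelets would supply db8).

```python
import numpy as np

def haar_dwt(signal):
    """One-level Haar DWT: approximation (low-pass) and detail (high-pass) channels."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:                      # pad to even length
        x = np.append(x, 0.0)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def multilevel_dwt(signal, levels=3):
    """Decompose into frequency channels by recursing on the approximation."""
    channels = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        channels.append(detail)
    channels.append(approx)             # final low-frequency channel
    return channels

chans = multilevel_dwt(np.sin(np.linspace(0, 8 * np.pi, 256)), levels=3)
print([len(c) for c in chans])  # [128, 64, 32, 32]
```

Each resulting channel covers a different frequency band of the speech signal; the MFCCs are then computed per channel, as described above.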
B. Recognition Technique

In speaker identification, the objective is to discriminate between the given speaker and all other speakers. The goal is to design a system that minimizes the probability of identification errors. This is done by computing a match score. This score is a measure of similarity between the input feature vectors and some model. In this work, hidden Markov models (HMMs) are used in the recognition stage. HMMs are stochastic models in which the pattern matching is probabilistic. The result is a measure of likelihood, or the conditional probability of the observation given the model. HMMs are used to model a stochastic process defined by a set of states and transition probabilities between those states. Each state of the HMM models a certain segment of the vector sequence of the utterance, while the dynamic changes of the vector sequence are modelled by transitions between the states. In the states of the HMM, stationary emission processes are modelled, which are assumed to correspond to stationary segments of speech. Within those segments, wide variability of the emitted vectors should be allowed [18].
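The likelihood an HMM recogniser computes can be sketched with the standard scaled forward algorithm. The sketch below uses a discrete-emission HMM for brevity, whereas the paper models emissions with Gaussian mixtures; all model parameters shown are illustrative, not from the paper.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete-emission HMM.

    pi: (N,) initial state probabilities; A: (N, N) transition matrix;
    B: (N, M) emission matrix over M discrete symbols.
    """
    alpha = pi * B[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]
        c = alpha.sum()                 # scaling factor against underflow
        log_lik += np.log(c)
        alpha = alpha / c
    return log_lik

# Toy 2-state left-to-right model (all numbers illustrative).
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(forward_log_likelihood([0, 0, 1, 1], pi, A, B))
```

For identification, one such model is trained per speaker and the input utterance is assigned to the model with the highest log-likelihood.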
C. Experiments, Results and Discussions

The database contains the speech data files of 40 speakers. These speech files consist of isolated Arabic words. Each speaker repeats each word 16 times; 10 of the utterances are used for training and 6 for testing. The data were recorded using a microphone, and all samples are stored in Microsoft wave format files with an 8000 Hz sampling rate, 16-bit PCM and mono channels.

The signals are decomposed at level 3 using the db8 wavelet. For the MFCCs, the Mel filter bank is designed with 20 frequency bands. In the calculation of all the features, the speech signal is partitioned into frames; the frame size of the analysis is 256 samples with 100 samples overlapping.

A recognition system was developed using the Hidden Markov toolbox for use with Matlab, implementing a 4-state left-to-right transition model for each speaker; the probability distribution in each state was modelled as an 8-mixture Gaussian with a diagonal covariance matrix. It is often assumed that the individual features of the feature vector are not correlated, in which case diagonal covariance matrices can be used instead of full covariance matrices. This reduces the number of parameters and the computational effort.

HMMs are used with the proposed feature extraction technique, and the results are compared to HMMs used for recognition with the MFCCs alone. Also, in order to evaluate the performance of the proposed method in a noisy environment, the test patterns of 6 utterances are corrupted by additive white Gaussian noise so that the signal-to-noise ratio (SNR) is 20 dB. The results are summarized in Table I. It is noted that the wavelet-based MFCCs give better results than MFCCs alone. Also, the performance of the system using MFCCs alone is affected significantly by the added noise, while the proposed technique demonstrates much better noise robustness with a satisfactory identification rate.
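The framing step described above (256-sample frames with 100 samples of overlap, i.e. a hop of 156 samples) can be sketched as below; `frame_signal` is an illustrative helper, not code from the paper.

```python
import numpy as np

def frame_signal(x, frame_len=256, overlap=100):
    """Partition a signal into overlapping analysis frames."""
    hop = frame_len - overlap           # 156 samples between frame starts
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

frames = frame_signal(np.zeros(8000), 256, 100)  # one second at 8 kHz
print(frames.shape)  # (50, 256)
```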
[Block diagram: two biometric streams, each passing through feature extraction, matching and decision stages, with fusion possible at the feature level, the score level or the decision level.]
Figure 1. Fusion levels in multimodal biometric fusion.