(IJCSIS) International Journal of Computer Science and Information Security,Vol. 9, No. 4, April 2011
fusion takes place in parallel at the matching score level using the sum, product or minimum rule. Middendorff, Bowyer and Yan detail different approaches to combining ear and face for identification. An overview of the development of the SecurePhone mobile communication system has also been presented. In this system, a multimodal biometric authentication gives access to the system's built-in e-signing facilities, enabling users to deal m-contracts using a mobile call in an easy yet secure and dependable way. In that work, signature data is combined with the video data of unrelated subjects into virtual subjects; this is possible because signatures can be assumed statistically independent of face and voice data. In his PhD thesis, Karthik proposes a fusion strategy based on the likelihood ratio used in the Neyman-Pearson theorem for combining match scores. He shows that this approach achieves high recognition rates over multiple databases without any parameter tuning.

In this paper, we introduce a multimodal biometric system which integrates face and voice to make a personal identification. Most of the successful commercial biometric systems currently rely on fingerprint, face or voice. Face and speech are routinely used by all of us in our daily recognition tasks. Although there are more reliable biometric recognition techniques such as fingerprint and iris recognition, their success depends highly on user cooperation, since the user must position his eye in front of the iris scanner or place his finger on the fingerprint device. Face recognition, on the other hand, has the benefit of being a passive, non-intrusive way to verify personal identity in a natural and friendly manner: it is based on images recorded by a camera at a distance, and can be effective even if the user is not aware of the existence of the face recognition system.
The human face is the most common characteristic used by humans to recognize other people, which is why personal identification based on facial images is considered the friendliest among all biometrics. Speech is one of the most basic means of communication, and is better than other methods in terms of efficiency and convenience. For these reasons, face and voice are chosen in our work to build individual face recognition and speaker identification modules. These modules are then combined to achieve a highly effective person identification system.

II. FUSION IN BIOMETRICS
Ross and Jain have presented an overview of multimodal biometrics and have proposed various levels of fusion, various possible scenarios, the different modes of operation, integration strategies and design issues. The fusion levels proposed for multimodal systems are shown in Fig. 1 and described below.
Fusion at the Feature Extraction Level
The data obtained from each sensor is used to compute a feature vector. As the features extracted from one biometric trait are independent of those extracted from the other, it is reasonable to concatenate the two vectors into a single new vector. The primary benefit of feature-level fusion is the detection of correlated feature values generated by different feature extraction algorithms and, in the process, the identification of a salient set of features that can improve recognition accuracy. The new vector has a higher dimension and represents the identity of the person in a different hyperspace. Eliciting this feature set typically requires the use of dimensionality reduction/selection methods; feature-level fusion therefore assumes the availability of a large amount of training data.
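The concatenation and reduction steps above can be sketched as follows. The feature dimensions and the SVD-based projection are illustrative assumptions, not details taken from this paper:

```python
import numpy as np

# Hypothetical feature vectors from two independent traits
# (dimensions are illustrative only).
face_features = np.random.default_rng(0).normal(size=64)   # e.g. a face-subspace projection
voice_features = np.random.default_rng(1).normal(size=32)  # e.g. cepstral coefficients

# Feature-level fusion: concatenate into one vector that represents
# the same identity in a higher-dimensional space.
fused = np.concatenate([face_features, voice_features])
assert fused.shape == (96,)

def reduce_dim(X, k):
    """Project the rows of X onto their top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                              # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)    # rows of Vt = components
    return Xc @ Vt[:k].T

# A small batch of fused training vectors (noisy copies, for illustration).
batch = np.stack([fused + np.random.default_rng(i).normal(scale=0.1, size=96)
                  for i in range(20)])
reduced = reduce_dim(batch, k=10)
assert reduced.shape == (20, 10)
```

The need for a training batch in `reduce_dim` mirrors the text's point that feature-level fusion presumes ample training data for the reduction step.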
Fusion at the Matching Score Level
Feature vectors are created independently for each sensor and are then compared to the enrollment templates, which are stored separately for each biometric trait. Each system provides a matching score indicating the proximity of the feature vector to the template vector. These individual scores are finally combined into a total score (using the maximum rule, minimum rule, sum rule, etc.) which is passed to the decision module to assert the veracity of the claimed identity. Score-level fusion is often used because matcher scores are frequently available from each vendor matcher system and, when multiple scores are fused, the resulting performance may be evaluated in the same manner as for a single biometric system. The matching scores of the individual matchers may not be homogeneous. For example, one matcher may output a similarity measure while another may output a dissimilarity measure. Further, the scores of individual matchers need not be on the same numerical scale. For these reasons, score normalization is essential to transform the scores of the individual matchers into a common domain before combining them. A common theoretical framework for combining classifiers using the sum, maximum and minimum rules has been analyzed, and it has been observed that the sum rule outperforms the other classifier combination schemes.
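A minimal sketch of the normalization and combination steps described above, using min-max normalization; the raw score ranges and the decision threshold are hypothetical values chosen for illustration:

```python
def min_max_normalize(scores, lo, hi):
    """Map raw matcher scores onto [0, 1] given the matcher's score range."""
    return [(s - lo) / (hi - lo) for s in scores]

# Hypothetical raw outputs for two test samples: the face matcher emits
# similarities in [0, 100]; the voice matcher emits distances in [0, 5].
face_sim = min_max_normalize([72.0, 15.0], 0.0, 100.0)
voice_dist = min_max_normalize([0.8, 4.1], 0.0, 5.0)
voice_sim = [1.0 - d for d in voice_dist]   # convert dissimilarity to similarity

# Fixed combination rules applied to the now-homogeneous scores.
def fuse(rule, *score_lists):
    return [rule(pair) for pair in zip(*score_lists)]

sum_scores = fuse(sum, face_sim, voice_sim)
min_scores = fuse(min, face_sim, voice_sim)
max_scores = fuse(max, face_sim, voice_sim)

# The decision module thresholds the fused score (illustrative threshold).
THRESHOLD = 1.0
decisions = [s >= THRESHOLD for s in sum_scores]
```

Without the normalization step, the face matcher's [0, 100] scores would dominate any sum, and the voice distances would pull fused scores in the wrong direction.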
Fusion at the Decision Level
A separate identification decision is made for each biometric trait. These decisions are then combined into a final vote. The fusion is performed by a combination rule such as AND, OR, etc.; a majority voting scheme can also be used to make the final decision.
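The combination rules named above reduce to simple Boolean logic over the per-trait decisions. The example below assumes three hypothetical matchers (majority voting needs an odd count to avoid ties), which goes beyond the two traits used in this paper:

```python
# Hypothetical accept/reject decisions from three independent matchers.
face_ok, voice_ok, signature_ok = True, False, True

# AND rule: accept only if every trait accepts (minimizes false accepts).
and_decision = face_ok and voice_ok and signature_ok      # False

# OR rule: accept if any trait accepts (minimizes false rejects).
or_decision = face_ok or voice_ok or signature_ok         # True

# Majority voting: accept when more than half of the matchers accept.
votes = [face_ok, voice_ok, signature_ok]
majority_decision = sum(votes) > len(votes) / 2           # True
```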
III. FEATURE EXTRACTION TECHNIQUE
Speech signals contain two types of information: time and frequency. The most meaningful features in the time domain are generally the sharp variations in signal amplitude. In the frequency domain, although the dominant frequency channels of speech signals are located in the middle frequency region, different speakers may have different responses in all frequency regions. Thus, some useful information may be lost by traditional methods that consider only fixed frequency channels.

In this paper, a multi-resolution decomposition technique using the wavelet transform is used. Wavelets have the ability to analyze different parts of a signal at different scales. Based on this technique, one can decompose the input speech signal into different resolution levels, so that the characteristics of multiple frequency channels and any change in the smoothness of the signal can be detected. Then, the Mel-frequency cepstral