

Voice Biometric: A Technology for Voice Based Authentication

Article  in  Advanced Science, Engineering and Medicine · July 2018


DOI: 10.1166/asem.2018.2219

3 authors:

Nilu Singh, Babu Banarasi Das University
Alka Agrawal, Babasaheb Bhimrao Ambedkar University
Prof. Raees Ahmad Khan, Babasaheb Bhimrao Ambedkar University

All content following this page was uploaded by Nilu Singh on 27 March 2018.



Article

Advanced Science, Engineering and Medicine
Vol. 10, 1–6, 2018
www.aspbs.com/asem
Copyright © 2018 American Scientific Publishers. All rights reserved.
Printed in the United States of America.

Voice Biometric: A Technology for Voice Based Authentication

Nilu Singh∗, Alka Agrawal, and R. A. Khan
SIST-DIT, Babasaheb Bhimrao Ambedkar University (A Central University), Lucknow, UP, India
∗Author to whom correspondence should be addressed.

Adv. Sci. Eng. Med. 2018, Vol. 10, No. xx 2164-6627/2018/10/001/006 doi:10.1166/asem.2018.2219

Recognizing a person's identity from his/her voice is known as Automatic Speaker Recognition (ASR). Speaker recognition falls in the category of biometric security systems. Biometrics relate to human characteristics or individuality. Biometric verification, or realistic authentication, is used here to recognize an individual through the individual characteristics of his/her voice. Biometrics include behavioral and physiological measurements of an individual. Behavioral biometrics are based on voice, signature, keystrokes, typing, etc., whereas physiological biometrics include iris, face, retina, fingerprints, ear, DNA, etc. Nowadays, voice biometrics is an emerging research area. This paper presents a review dealing with the process of recognizing human beings from biometric data.

Keywords: Forensic Linguistics, Biometrics and Its Types, Forensic Speaker Recognition, Speaker Recognition, Prosodic, Applications of Speaker Recognition.

1. INTRODUCTION
In today's environment, where insecurity is everywhere, security has become one of the most important issues. For providing security, voice biometrics is an emerging area, especially for the purpose of authentication [1, 2]. In voice biometrics, speaker recognition is performed with the help of the unique characteristics of the human voice, including physiological and behavioral characteristics. These characteristics provide specific and appropriate features of the voice and have the potential to recognize a person [3]. With this approach it is also possible to authenticate a person irrespective of changes of environment or channel. This approach is very useful and cost effective, as it is a voice based biometric technique and voice is easily available in this digital era [4, 5]. There exist many areas where this technique can be successfully implemented from a security and investigation perspective. A few of the application areas of speaker recognition systems are forensics, remote access control security, web services, online calling, personalization of services and customer relationship management, voice based biometric systems, voice based banking, surveillance/criminal investigation, etc. [4–6]. Figure 1 shows how biometrics relate to human characteristics.

Forensic science is a method of crime investigation by gathering and examining information about criminals [7]. Computer forensics, or computer forensic science, is an area of digital forensic science. It is used in legal verification through computers (for digital storage media). The aim of digital forensics is identifying, analysing, preserving, recovering and presenting specific information and judgment concerning digital information [8]. Digital forensics is mainly associated with the investigation of crimes committed through computers, and the results are used in civil legal proceedings. Digital forensics uses various techniques to distinguish the identity of individuals. It is a very young area of crime investigation [9]. Forensic linguistics is used for voice identification, which is also called forensic phonetics. It is performed on the basis of the acoustic qualities of the voice (if the voice is recorded anywhere, e.g., on a tape, mobile phone or any other device) [10]. Identifying a speaker with the help of forensic linguistics is called forensic speaker recognition. It is an important and challenging task. Forensic speaker recognition is an application of speaker recognition [11].

The term speaker recognition refers to speaker verification as well as speaker identification. Speaker recognition/voice recognition is the process of authenticating a person by his/her voice [5]. It is also known as a Biometric Identification Technique (BIT) [3]. In this technique human traits are used for identification and verification for
the purpose of access control. Speaker recognition technology is mainly used for three purposes: authentication, forensics, and screening and indexing. Authentication refers to verifying the identity of a user who needs physical or logical access. Voice forensics refers to comparing two voice samples to determine whether they come from the same source. Screening and indexing refers to searching for the speech of a specific speaker in a large voice database [12].

Figure 1. Types of biometric.

The rest of the paper is organized as follows: the next section describes biometric techniques and their types. Section 3 presents the concept of speaker recognition technology and its phases. In Section 4 prosodic features are presented for speaker recognition. Section 5 describes the applications of speaker recognition. Section 6 describes the performance measurement of a biometric system. Finally, the paper concludes in Section 7.

2. BIOMETRIC AND ITS TYPES
A biometric is a feature of a human being by which a person can be recognized. The following properties serve as a guide to understanding which biological features are best suited for biometrics [13, 14]:
A. Uniqueness: Something that differentiates individuals.
B. Universality: Something that everybody has.
C. Stability: Something which is constant over time for each and every person.
D. Measurability: Something which is easy to measure.
E. Acceptability: Something which is well accepted by people.
F. Performance: Something which has speed, accuracy and robustness.
G. Non Confront (resistance to circumvention): Something which cannot be easily fooled.

The above properties form the basis for deciding what features should be used as biometrics. Every biometric has an individual purpose for its use, such as security systems, crime investigation, voting systems, time accounting, etc. The selection of a biometric depends on the requirement for authentication [13, 14]. The available biometric examples include DNA matching, eyes, ear, voice, face, fingerprint, hand geometry, signature/writing, etc. Figures 2 and 3 show the physiological and behavioral characteristics of a person.
(a) DNA: Identification of an individual using the analysis of DNA segments.
(b) Eyes (Iris and Retina): Use of the features found in the eyes to identify an individual.
(c) Ear: Identification of someone by using the shape of the ear.
(d) Face: The analysis of facial features for the authentication of an individual.
(e) Fingerprint: Use of the ridges and valleys found on the surface of the tips of human fingers.
(f) Hand geometry: Use of the geometric features of the hand.
(g) Signature/writing: The authentication of an individual by the analysis of handwriting style.
(h) Voice (Speaker Recognition): The use of the voice/speech as a method of determining the identity of a speaker.

Figure 2. Types of physiological characteristics of human.

The above mentioned are some biometrics which are used for person authentication. Biometrics are valued for security and accuracy, but their disadvantages are intrusion on privacy and the cost of implementation [14]. Following are some advantages of biometrics:
• Convenience
• Increased security
• Accuracy
• Non imitative
• Non sharable
• Cannot be lost
• Reduced paper work
• Easy to access.

The above advantages make biometric systems more secure and accurate. Biometric recognition is the
continuously developing branch of science. It is a convenient and reliable technology, which allows biometrics to be used in everyday life by making the technology easier and more interesting. Almost every country in the world is using at least one biometric to recognize its nationals. The use of biometrics for security is increasing day by day. For example, in India 'Aadhaar' is the largest biometric database in the world; about 480 million 'Aadhaar' numbers had been assigned to Indian nationals up to 2013 [15].

Figure 3. Types of behavioural characteristics of human.

3. SPEAKER RECOGNITION TECHNOLOGY
Speaker recognition is a pattern recognition problem, which is a branch of machine learning. Speaker recognition and speech recognition both come under voice recognition, which is also known as voice biometrics. The two can be distinguished as follows: 'speaker recognition' asks 'who is speaking', whereas 'speech recognition' asks 'what is being said' [16, 17]. Extraction of features from the speech signal is the main task in the development of a speaker recognition system. Many techniques are available to extract speech features from the speech/voice signal, such as MFCC (mel-frequency cepstral coefficients), LPCC (linear prediction cepstral coefficients), LPC (linear predictive coding), prosodic features, etc. One class of speech features is the spectral features of the speech signal, which are used to represent a speaker's voice characteristics [18, 19]. After feature extraction, speaker models are created for each individual and stored as a voice database. To create models for speakers, various modelling techniques are mainly used, including GMM (Gaussian mixture models), HMM (hidden Markov models), pattern matching, frequency estimation, vector quantization, decision trees and neural networks [16].

An automatic speaker recognition system has two phases: an enrollment phase and a verification phase. In the enrollment phase the speaker's voice is recorded and specific features are extracted from the speech signal/voice print. In the verification phase a voice sample/utterance is compared to the stored template or voice print [3, 18]. Speaker recognition can be classified into speaker verification and speaker identification, which can be explained as follows. In the case of speaker identification, the voice sample is compared with multiple templates from the stored voice database and the best match is selected, e.g., comparing a voice sample of an assaulter against a previously recorded voice database of criminals and trying to find the best match. In this case one speaker's voice is matched against 'n' templates, so it is also called a 1:n match. In the case of verification, the speech sample is compared only with the claimed voice print. This is like presenting your identity card to a security officer: the officer compares your face to the photo on the identity card and decides whether the claimed identity is accepted or rejected, so it is a 1:1 match [3, 18, 20]. Figure 4 shows the common steps involved in the development of an automatic speaker recognition system.

Figure 4. Phases of automatic speaker recognition system.

On the basis of the processes involved in verification and identification, it can easily be inferred that verification is faster than identification, since verification needs a single comparison whereas identification needs n comparisons. During the study of this area, it was found that in most cases identification is performed first to find the best match, and only then is verification done to reach a conclusive result. For example, if a voice sample of a suspected assaulter is captured, this sample is first matched against the previously formed voice database to find its best match (identification); after that, verification is performed, which gives a conclusive true or false result on whether the best-match voice belongs to that assaulter or not.

4. PROSODIC FOR SPEAKER RECOGNITION
ASR has three common steps: data acquisition, feature extraction and modelling. To extract features from the speech signal, many feature extraction techniques are available such as MFCC, LPC,
LPCC, prosodic features, etc., and for modelling, techniques such as HMM, GMM and UBM (universal background model) are available [3, 21].

Automatic Speaker Recognition (ASR) is a procedure to recognize a person by machine from his/her spoken words/sentences. This technology is useful to maintain security in various fields such as crime investigation, access control, voice based banking and authenticity. To develop an ASR system, extraction of the characteristics of the speech signal is one of the core tasks. The primary work of a feature extraction technique is to extract voice characteristics from the speech signal. These voice characteristics are unique to each person and can be used to distinguish between people [22, 23]. Prosodic features of the speech signal are expressed in terms of stress (the relative prominence of a syllable or musical note, especially with regard to stress or pitch), rhythm (recurrence at regular intervals) and intonation (the rise and fall of the voice pitch). These convey the information required to identify the spoken language.

Speech features used for speaker recognition may be spectral (cepstral) features, phonetic features and prosodic features [21, 24, 25]. Traditional speaker recognition systems depend on 'spectral features', which are extracted from very short segments of the speech signal. This technique is adequate for clean data, but the system performance degrades if the data is noisy or there is handset variability [26]. Unfortunately, this technique is not suitable for extracting long-range speech features related to a person's speaking behaviour, such as prosodic, lexical and discourse related habits. The purpose of using such long-range features in a speaker recognition system is to increase system performance as compared to a system using only cepstral/spectral features [27].

As reported in Refs. [28, 29], using long-range speech features improves system performance. Another advantage of long-range speech features is that they reflect a person's behavioral speech characteristics. Such features have the potential to recognize speakers as well as characteristics of the speech, for example speaking rate, speaking style, pitch, etc. The main purpose of research on long-range speech features is to understand speaking behaviour [30]. It is assumed that prosody is associated with linguistic components of the voice such as syllables, and that it shows up as changes in measurable parameters, for example fundamental frequency F0, energy and duration of speech [31].

5. APPLICATIONS OF SPEAKER RECOGNITION
In this digital era speaker recognition systems have been commercialized. This technology is now accepted by both government and financial sectors for quick and secure authentication [33]. In recent years, biometric techniques for person authentication have come into use in many areas such as remote computers, financial transactions (credit card accessibility), voice based bank account accessibility, etc. [32]. Some applications which have been implemented and used in different areas are mentioned below:
In February 2016, HSBC (a UK high-street bank) and its internet-based retail bank First Direct announced that they were providing biometric banking software for customers to access online and phone accounts using their fingerprint or voice [34].
In 2015, ICICI Bank, India, made voice based password banking available to its customers.
In August 2014, Go Vivace Inc. installed a speaker identification system in the telecom industry that allowed clients to positively search for an individual among millions of speakers by using just a recording of their voice [35].
In May 2013, Barclays Wealth announced the use of passive speaker recognition to verify the identity of telephone customers. The system used was developed by the voice recognition firm Nuance [36].
Nuance Voice Biometrics solutions have been deployed in several major financial institutions across the world, including Banco Santander, Tangerine Bank, Royal Bank of Canada, and Manulife [37].
The examples discussed above are some deployed applications of speaker recognition. In addition, speaker recognition is also used in areas such as criminal investigations and many more [3].

6. PERFORMANCE MEASUREMENTS OF BIOMETRIC SYSTEMS
The performance of a speaker recognition system, or of biometric systems in general, is characterized by the following measures:
• False Acceptance Rate (FAR)
• False Rejection Rate (FRR)
• Relative Operating Characteristic (ROC)
• Equal Error Rate (EER)
• Template Capacity.

A. False Acceptance Rate (FAR)
It is the probability that the recognition system incorrectly matches the input pattern with a non-matching template in the speaker database, i.e., it measures the percentage of impostor attempts that are accepted [2, 12, 13].

B. False Rejection Rate (FRR)
It is the probability that the recognition system fails to match the input pattern with the correct template which exists in the speaker database, i.e., it measures the percentage of true speakers who are falsely rejected.

C. Relative Operating Characteristic (ROC)
Also called the receiver operating characteristic, it is a graphical characterization of the trade-off between the FAR and the FRR. A speaker recognition system decides (accept/reject) on the basis of the result of a matching algorithm. The matching algorithm gives its result relative to a threshold; the threshold determines how close the recognition value must be to a template
stored in the database. A higher threshold reduces the FAR but increases the FRR, while when the threshold is reduced, fewer false non-matches occur but more impostors are accepted [2, 12, 13].

D. Equal Error Rate (EER)
It is the rate at which the FAR and the FRR are equal. The EER is a quick way to compare the accuracy of systems that have different ROC curves. The lower the EER, the more accurate the system [4, 21].

E. Template Capacity
The template capacity of a system is defined as the maximum number of sets of data which can be stored in the system [2].

Speaker recognition performance is usually evaluated using detection error trade-off (DET) curves, the equal error rate, and a weighted cost value. To be more accurate, a speaker recognition system should have the lowest possible EER. It should also have a high storage capacity [21, 38].

7. CONCLUSION
This paper has discussed some important aspects of forensic speaker recognition. Uniqueness is considered a principle of forensic techniques. Forensic linguistics (voice biometrics) is the branch of science and engineering that analyzes the legal problem of digital evidence. This technology is a combination of science and law. Improvement in system accuracy is needed, as these systems are used in many important and security-sensitive fields including voice indexing, critical medical records, online transactions, fraud and access control, etc. This paper demonstrated the usefulness of forensic linguistics in security-critical areas. In addition, it has also discussed the techniques used to develop such systems. The review helps in understanding the developments in biometric systems and their applications.

References and Notes
1. T. Kinnunen and H. Li, Speech Communication 52, 12 (2010).
2. Available at: https://en.wikipedia.org/wiki/Biometrics (2016).
3. N. Singh and R. A. Khan, Underlying of text independent speaker recognition, IEEE Conference ID: 37465, 10th INDIACom 2016 Int. Conference on Computing for Sustainable Global Development, at BVICAM, New Delhi (2016), pp. 11–15.
4. A. Jain, L. Hong, and S. Pankanti, Communications of the ACM 43, 91 (2000).
5. Available: https://en.wikipedia.org/wiki/Speaker_recognition, Speaker recognition, March (2016).
6. P. Rose, Technical forensic speaker recognition: Evaluation, types and testing of evidence, Computer Speech and Language, Elsevier, (2006), Vol. 20, pp. 159–191.
7. Available at: TheFreeDictionary.com, Speaker recognition (2016).
8. M. G. Noblett, M. M. Pollitt, and L. A. Presley, Recovering and examining computer forensic evidence, Available at: https://www.fbi.gov/about-us/lab/forensic-science-communications/fsc/oct2000/index.htm/computer.htm (2010).
9. D. Meuwly, Science and Justice 46, 205 (2006).
10. J. Bishop, Using the concepts of forensic linguistics, bleasure and motif to enhance multimedia forensic evidence collection, The 2014 International Conference on Security and Management SAM'14, Monte Carlo Resort in Las Vegas, Nevada, USA (2014), pp. 21–24.
11. R. G. Hautamaki, T. Kinnunen, V. Hautamaki, and A.-M. Laukkanen, Speech Communication 72, 13 (2015).
12. Available: http://www.biometricsinstitute.org/pages/types-of-biometrics.html, Biometrics Institute, Australia (2016).
13. Available: http://www.biometric-security-devices.com/types-of-biometric-devices.html, Vincent Dail, Biometric-security-devices.com (2011).
14. A. Babich, Biometric Authentication, Types of Biometric Identifiers, Bachelor's Thesis, Degree Programme in Business Information Technology University of Applied Sciences (2013), pp. 1–56.
15. P. Pollack and Sumby, Experimental Phonetics, MSS Information Corporation (1974), pp. 251–258.
16. Van Lancker and Kreiman, Journal of Phonetics 19 (1984).
17. Available at: https://en.wikipedia.org/wiki/Surveillance, Surveillance, August (2016).
18. Sahoo Soyuj Kumar, S. R. Mahadeva Prasanna, Choubisa, and Tarun, IETE Technical Review 29, 54 (2012).
19. N. Singh, A. Agrawal, and R. A. Khan, Science Journal of Circuits, Systems and Signal Processing 4, 14 (2015).
20. E. Shriberg, L. Ferrer, S. Kajarekar, A. Venkataraman, and A. Stolcke, Speech Communication 46, 455 (2005).
21. N. Singh, R. A. Khan, and Raj Shree, Equal Error Rate and Audio Digitization and Sampling Rate for Speaker Recognition System, American Scientific Publishers (2014), Vol. 20, pp. 1085–1088.
22. O. O. Khalifa, S. Khan, Md. Rafiqul Islam, M. Faizal, and D. Dol, Text independent automatic speaker recognition, 3rd International Conference on Electrical and Computer Engineering ICECE, Dhaka, Bangladesh (2004), pp. 28–30.
23. J. A. Bachorowski and M. J. Owren, Journal of Acoust. Soc. Am. 106, 1054 (1999).
24. D. A. Reynolds and R. C. Rose, IEEE Trans. Speech Audio Processing 3, 72 (1995).
25. E. Shriberg, L. Ferrer, S. Kajarekar, A. Venkataraman, and A. Stolcke, Speech Communication 46, 455 (2005).
26. A. Adami, R. Mihaescu, D. A. Reynolds, and J. J. Godfrey, Modeling prosodic dynamics for speaker recognition, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong (2003), pp. 788–91.
27. S. Kajarekar, L. Ferrer, A. Venkataraman, K. Sonmez, E. Shriberg, A. Stolcke, H. Bratt, and V. R. R. Gadde, Speaker recognition using prosodic and lexical features, Proceedings of the IEEE Speech Recognition and Understanding Workshop, St. Thomas, U.S. Virgin Islands (2003), pp. 19–24.
28. E. Blaauw, Speech Communication 14, 359 (1994).
29. E. Shriberg, L. Ferrer, S. Kajareker, and A. Venkataraman, SVM modeling of SNERF-grams for speaker recognition, Proceedings of Inter Speech International Conference on Spoken Language Processing, JEJU Island, Korea (2004), pp. 1–4.
30. L. Mary and B. Yegnanarayana, Speech Communication 50, 782 (2008).
31. N. Singh and R. A. Khan, Extraction and representation of prosodic features for automatic speaker recognition technology, Advanced in Engineering and Technology, Published by: Mc Graw Hill Education (2015), pp. 1–7, ISBN-10:93-85965-79-4.
32. S. Memon, Automatic speaker recognition: Modeling, feature extraction and effects of clinical environment, School of Electrical and Computer Engineering Science, Engineering and Technology Portfolio RMIT University (2010), pp. 1–242.
33. J. Kollewe, HSBC rolls out voice and touch ID security for bank customers, Business, The Guardian, February 2016, Available at: https://www.theguardian.com/business/2016/feb/19/hsbc-rolls-out-voice-touch-id-security-bank-customers (2016).
34. Available at: http://web.archive.org/web/20140815043233/http://govivace.com:80/solutions/speaker-identification-for-forensic-sciences, Speaker recognition (2016).
35. Available at: Wealth.barclays.com, International Banking, Voice Biometric Technology in Banking Barclays (2013).
36. Voice Biometrics for fast, secure authentication in your IVR and mobile apps, Available: http://www.nuance.com/ucmprod/groups/enterprise/@web-enus/documents/collateral/nc_044985.pdf (2016).
37. J.-E. Lee, Anil K. Jain, and R. Jin, Scars, Marks and Tattoos (SMT): Soft Biometric for Suspect and Victim Identification, 978-1-4244-2567-9/08/IEEE (2008), pp. 1–8.
38. S. Furui, Speech and Speaker Recognition Evaluation, Springer (2007), pp. 1–27.
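
Illustrative note on Section 3. The difference between speaker identification (1:n match) and speaker verification (1:1 match) described in Section 3 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the method of any system cited in this paper: the three-dimensional "voice prints", the speaker names, the use of cosine similarity as the matching score and the threshold of 0.8 are all hypothetical choices made for illustration. A real system would use feature vectors such as MFCCs and a trained speaker model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(sample, database):
    """1:n match: compare the sample with every template, return the best name."""
    return max(database, key=lambda name: cosine(sample, database[name]))

def verify(sample, claimed_template, threshold=0.8):
    """1:1 match: accept only if the sample is close enough to the claimed template."""
    return cosine(sample, claimed_template) >= threshold

# Hypothetical enrolled voice prints (toy 3-dimensional feature vectors).
database = {
    "speaker_a": [1.0, 0.0, 0.2],
    "speaker_b": [0.1, 1.0, 0.3],
    "speaker_c": [0.2, 0.3, 1.0],
}

sample = [0.9, 0.1, 0.25]                   # features of an unknown utterance
best = identify(sample, database)           # identification: n comparisons
accepted = verify(sample, database[best])   # verification: one comparison
print(best, accepted)
```

The sketch also mirrors the workflow mentioned in Section 3, where identification is run first to find the best match and verification then confirms or rejects it. Identification costs one comparison per enrolled template, which is why verification is the faster operation.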

Received: 25 July 2017. Accepted: 23 December 2017.
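
Illustrative note on Section 6. The relationship between the threshold, the FAR, the FRR and the EER described in Section 6 can be made concrete with a short Python sketch. The score lists below are invented for illustration and do not come from any experiment in this paper. The sketch assumes that a higher score means a closer match and that an attempt is accepted when its score is at or above the threshold. The EER is approximated by scanning the observed scores as candidate thresholds and taking the point where FAR and FRR are closest.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) for a given acceptance threshold."""
    # FAR: fraction of impostor attempts that are (wrongly) accepted.
    far = sum(s >= threshold for s in impostor) / len(impostor)
    # FRR: fraction of genuine attempts that are (wrongly) rejected.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Approximate the EER: threshold where |FAR - FRR| is smallest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]

# Hypothetical match scores (illustrative values only).
genuine = [0.91, 0.85, 0.78, 0.66, 0.95, 0.88, 0.59, 0.72]
impostor = [0.12, 0.35, 0.61, 0.28, 0.44, 0.70, 0.19, 0.52]

print(far_frr(genuine, impostor, 0.5))   # low threshold: more impostors accepted
print(far_frr(genuine, impostor, 0.8))   # high threshold: more true speakers rejected
print(equal_error_rate(genuine, impostor))
```

With these toy scores, raising the threshold from 0.5 to 0.8 lowers the FAR from 0.375 to 0.0 while raising the FRR from 0.0 to 0.5, and the two rates meet at 0.125 for a threshold of 0.66. Plotting the (FAR, FRR) pairs over all thresholds would give the ROC/DET-style trade-off curve mentioned in Section 6.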
