All content following this page was uploaded by Nilu Singh on 27 March 2018.
Adv. Sci. Eng. Med. 2018, Vol. 10, No. xx 2164-6627/2018/10/001/006 doi:10.1166/asem.2018.2219 1
Voice Biometric: A Technology for Voice Based Authentication Singh et al.
LPCC, Prosodic etc. and for modelling HMM, GMM, UBM etc. techniques are
available.[3, 21]

Automatic Speaker Recognition (ASR) is a procedure in which a machine
recognizes a person from his/her spoken words or sentences. This technology
is useful for maintaining security in various fields such as crime
investigation, access control, voice-based banking and authentication. In
developing an ASR system, extracting the characteristics of the speech signal
is one of the core tasks. The primary work of a feature extraction technique
is to extract voice characteristics from the speech signal. These voice
characteristics are unique to each person and can be used to distinguish
speakers.[22, 23] Prosodic features of a speech signal are expressed in terms
of stress (the relative prominence of a syllable or musical note, especially
with regard to stress or pitch), rhythm (recurrence at regular intervals) and
intonation (the rise and fall of voice pitch). These convey the information
required to identify the spoken language.

Speech features used for speaker recognition may be spectral (cepstral)
features, phonetic features and prosodic features.[21, 24, 25] Traditional
speaker recognition systems depend on 'spectral features', which are
extracted from very short segments of the speech signal. This technique is
adequate for clean data, but system performance degrades if the data is noisy
or there is handset variability.[26]

Unfortunately, this technique is not suitable for extracting long-range
speech features related to a person's speaking behaviour, such as prosodic,
lexical and discourse-related habits. The purpose of using such long-range
features in a speaker recognition system is to increase system performance
compared with a system using only cepstral/spectral features.[27]

The authors of Refs. [28, 29] found that using long-range speech features
improves system performance. Another advantage of this kind of technique is
that long-range speech features reflect a person's behavioral speech
characteristics. Such features have potential for recognizing speakers as
well as for recognizing characteristics of the speech, for example speaking
rate, speaking style and pitch. The main purpose of research on long-range
speech features is to understand speaking behaviour.[30] It is assumed that
prosody is associated with linguistic components of voice such as syllables,
and that it shows up as changes in measurable parameters, for example
fundamental frequency F0, energy and duration of speech.[31]

5. APPLICATIONS OF SPEAKER RECOGNITION

In this digital era, speaker recognition systems have been commercialized.
This technology is now accepted by both government and financial sectors for
quick and secure authentication.[33] In recent years, biometric techniques
for person authentication have been used in many areas such as remote
computers, financial transactions (credit card accessibility), and voice-based
bank account accessibility.[32] Some applications that have been implemented
and used in different areas are mentioned below:

In February 2016, HSBC (a UK high-street bank) and its internet-based retail
bank First Direct announced that they were providing biometric banking
software that lets customers access online and phone accounts using their
fingerprint or voice.[34]

In 2015, ICICI Bank, India, made voice-based password banking available to
its customers.

In August 2014, Go Vivace Inc. installed a speaker identification system in
the telecom industry that allows clients to positively search for an
individual among millions of speakers using just a recording of their
voice.[35]

In May 2013, Barclays Wealth announced that it would use passive speaker
recognition to verify the identity of telephone customers. The system was
developed by the voice recognition company Nuance.[36]

Nuance Voice Biometrics solutions have been deployed in several major
financial institutions across the world, including Banco Santander, Tangerine
Bank, Royal Bank of Canada, and Manulife.[37]

The examples discussed above are some deployed applications of speaker
recognition. In addition, speaker recognition is also used in areas such as
criminal investigation.[3]

6. PERFORMANCE MEASUREMENTS OF BIOMETRIC SYSTEMS

The performance of a speaker recognition system, or of biometric systems in
general, is assessed using the following measures:
• False Acceptance Rate (FAR)
• False Rejection Rate (FRR)
• Relative Operating Characteristic (ROC)
• Equal Error Rate (EER)
• Template Capacity.

A. False Acceptance Rate (FAR)
It is the probability that the recognition system incorrectly matches the
input pattern to a non-matching template in the speaker database, i.e., it
measures the percentage of imposter attempts that are accepted.[2, 12, 13]

B. False Rejection Rate (FRR)
It is the probability that the recognition system fails to match the input
pattern to its correct template, which exists in the speaker database, i.e.,
it measures the percentage of true speakers who are falsely rejected.

C. Relative Operating Characteristic (ROC)
It is a graphical characterization of the trade-off between the FAR and the
FRR. A speaker recognition system makes its decision (accepted/rejected) on
the basis of the result of a matching algorithm. The matching algorithm
produces its result relative to a threshold; the threshold determines how
close the recognition value must be to a template
stored in the database. A higher threshold reduces the FAR but increases the
FRR, whereas a lower threshold yields fewer false non-matches but accepts
more imposters.[2, 12, 13]

D. Equal Error Rate (EER)
It is the rate at which the FAR and the FRR are equal. The EER is a quick way
to compare the accuracy of systems that have different ROC curves. The lower
the EER, the more accurate the system.[4, 21]

E. Template Capacity
The template capacity of a system is defined as the maximum number of sets of
data that can be stored in the system.[2]

Speaker recognition performance is usually evaluated using detection error
trade-off (DET) curves, the equal error rate, and a weighted cost value. To
be more accurate, a speaker recognition system should have the lowest
possible EER. It should also have a high storage capacity.[21, 38]

7. CONCLUSION

This paper has discussed some important aspects of Forensic Speaker
Recognition. Uniqueness is considered a principle of forensic techniques.
Forensic Linguistics (voice biometric) is the branch of science and
engineering that analyzes the legal problem of digital evidence. This
technology is a combination of science and law. Improvement in system
accuracy is needed, as these systems are used in many important and secure
fields including voice indexing, critical medical records, online
transactions, fraud and access control. This paper demonstrated the
usefulness of forensic linguistics in security-critical areas. In addition,
it has discussed the techniques used to develop such systems. The review
helps in understanding the developments in biometric systems and their
applications.

References and Notes

1. T. Kinnunen and H. Li, Speech Communication 52, 12 (2010).
2. Available at: https://en.wikipedia.org/wiki/Biometrics (2016).
3. N. Singh and R. A. Khan, Underlying of text independent speaker recognition, IEEE Conference ID: 37465, 10th INDIACom 2016 Int. Conference on Computing for Sustainable Global Development, at BVICAM, New Delhi (2016), pp. 11–15.
4. A. Jain, L. Hong, and S. Pankanti, Communications of the ACM 43, 91 (2000).
5. Available: https://en.wikipedia.org/wiki/Speaker_Recognition, Speaker recognition, March (2016).
6. P. Rose, Technical forensic speaker recognition: Evaluation, types and testing of evidence, Computer Speech and Language, Elsevier (2006), Vol. 20, pp. 159–191.
7. Available at: TheFreeDictionary.com, Speaker recognition (2016).
8. M. G. Noblett, M. M. Pollitt, and L. A. Presley, Recovering and examining computer forensic evidence, Available at: https://www.fbi.gov/about-us/lab/forensic-science-communications/fsc/oct2000/index.htm/computer.htm (2010).
9. D. Meuwly, Science and Justice 46, 205 (2006).
10. J. Bishop, Using the concepts of forensic linguistics, bleasure and motif to enhance multimedia forensic evidence collection, The 2014 International Conference on Security and Management SAM'14, Monte Carlo Resort in Las Vegas, Nevada, USA (2014), pp. 21–24.
11. R. G. Hautamaki, T. Kinnunen, V. Hautamaki, and A.-M. Laukkanen, Speech Communication 72, 13 (2015).
12. Available: http://www.biometricsinstitute.org/pages/types-of-biometrics.html, Biometrics Institute, Australia (2016).
13. Available: http://www.biometric-security-devices.com/types-of-biometric-devices.html, Vincent Dail, Biometric-security-devices.com (2011).
14. A. Babich, Biometric Authentication, Types of Biometric Identifiers, Bachelor's Thesis, Degree Programme in Business Information Technology, University of Applied Sciences (2013), pp. 1–56.
15. P. Pollack and Sumby, Experimental Phonetics, MSS Information Corporation (1974), pp. 251–258.
16. Van Lancker and Kreiman, Journal of Phonetics 19 (1984).
17. Available at: https://en.wikipedia.org/wiki/Surveillance, Surveillance, August (2016).
18. Sahoo Soyuj Kumar, S. R. Mahadeva Prasanna, Choubisa, and Tarun, IETE Technical Review 29, 54 (2012).
19. N. Singh, A. Agrawal, and R. A. Khan, Science Journal of Circuits, Systems and Signal Processing 4, 14 (2015).
20. E. Shriberg, L. Ferrer, S. Kajarekar, A. Venkataraman, and A. Stolcke, Speech Communication 46, 455 (2005).
21. N. Singh, R. A. Khan, and Raj Shree, Equal Error Rate and Audio Digitization and Sampling Rate for Speaker Recognition System, American Scientific Publishers (2014), Vol. 20, pp. 1085–1088.
22. O. O. Khalifa, S. Khan, Md. Rafiqul Islam, M. Faizal, and D. Dol, Text independent automatic speaker recognition, 3rd International Conference on Electrical and Computer Engineering ICECE, Dhaka, Bangladesh (2004), pp. 28–30.
23. J. A. Bachorowski and M. J. Owren, Journal of Acoust. Soc. Am. 106, 1054 (1999).
24. D. A. Reynolds and R. C. Rose, IEEE Trans. Speech Audio Processing 3, 72 (1995).
25. E. Shriberg, L. Ferrer, S. Kajarekar, A. Venkataraman, and A. Stolcke, Speech Communication 46, 455 (2005).
26. A. Adami, R. Mihaescu, D. A. Reynolds, and J. J. Godfrey, Modeling prosodic dynamics for speaker recognition, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong (2003), pp. 788–91.
27. S. Kajarekar, L. Ferrer, A. Venkataraman, K. Sonmez, E. Shriberg, A. Stolcke, H. Bratt, and V. R. R. Gadde, Speaker recognition using prosodic and lexical features, Proceedings of the IEEE Speech Recognition and Understanding Workshop, St. Thomas, U.S. Virgin Islands (2003), pp. 19–24.
28. E. Blaauw, Speech Communication 14, 359 (1994).
29. E. Shriberg, L. Ferrer, S. Kajareker, and A. Venkataraman, SVM modeling of SNERF-grams for speaker recognition, Proceedings of Inter Speech International Conference on Spoken Language Processing, JEJU Island, Korea (2004), pp. 1–4.
30. L. Mary and B. Yegnanarayana, Speech Communication 50, 782 (2008).
31. N. Singh and R. A. Khan, Extraction and representation of prosodic features for automatic speaker recognition technology, Advanced in Engineering and Technology, Published by: McGraw Hill Education (2015), pp. 1–7, ISBN-10:93-85965-79-4.
32. S. Memon, Automatic speaker recognition: Modeling, feature extraction and effects of clinical environment, School of Electrical and Computer Engineering Science, Engineering and Technology Portfolio, RMIT University (2010), pp. 1–242.
33. J. Kollewe, HSBC rolls out voice and touch ID security for bank customers, Business, The Guardian, February 2016, Available at: https://www.theguardian.com/business/2016/feb/19/hsbc-rolls-out-voice-touch-id-security-bank-customers (2016).
34. Available at: http://web.archive.org/web/20140815043233/http://govivace.com:80/solutions/speaker-identification-for-forensic-sciences, Speaker recognition (2016).
35. Available at: Wealth.barclays.com, International Banking, Voice Biometric Technology in Banking, Barclays (2013).
36. Voice Biometrics for fast, secure authentication in your IVR and mobile apps, Available: http://www.nuance.com/ucmprod/groups/enterprise/@web-enus/documents/collateral/nc_044985.pdf (2016).
37. J.-E. Lee, Anil K. Jain, and R. Jin, Scars, Marks and Tattoos (SMT): Soft Biometric for Suspect and Victim Identification, 978-1-4244-2567-9/08/IEEE (2008), pp. 1–8.
38. S. Furui, Speech and Speaker Recognition Evaluation, Springer (2007), pp. 1–27.
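
As a supplementary illustration of the measures in Section 6, the sketch below shows one way FAR, FRR and an approximate EER could be computed from matching scores. It is not the authors' method. It assumes that the matcher returns a similarity score where higher means a closer match, and that a claim is accepted when the score is at or above the threshold. The function names and the score lists are hypothetical and are not taken from the paper.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) for one threshold.

    Assumption: a score >= threshold means the claim is accepted.
    FAR = fraction of impostor scores that are accepted.
    FRR = fraction of genuine (true speaker) scores that are rejected.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def approximate_eer(genuine, impostor):
    """Scan every observed score as a candidate threshold and return
    (threshold, FAR, FRR) at the point where |FAR - FRR| is smallest."""
    def gap(t):
        far, frr = far_frr(genuine, impostor, t)
        return abs(far - frr)

    best = min(sorted(set(genuine) | set(impostor)), key=gap)
    far, frr = far_frr(genuine, impostor, best)
    return best, far, frr


# Hypothetical similarity scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
impostor_scores = [0.5, 0.3, 0.2, 0.1, 0.1]

# A low threshold accepts more imposters; a high one rejects more true speakers.
low = far_frr(genuine_scores, impostor_scores, 0.3)
high = far_frr(genuine_scores, impostor_scores, 0.7)

threshold, far, frr = approximate_eer(genuine_scores, impostor_scores)
print(f"low threshold: FAR={low[0]:.2f} FRR={low[1]:.2f}")
print(f"high threshold: FAR={high[0]:.2f} FRR={high[1]:.2f}")
print(f"approx. EER point: threshold={threshold} FAR={far:.2f} FRR={frr:.2f}")
```

With these made-up scores, raising the threshold from 0.3 to 0.7 moves the error from false acceptances to false rejections, and the scan settles on the threshold where the two rates coincide. A real evaluation would use many more trials and would typically interpolate between thresholds rather than pick an observed score.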