Voice Recognition Technology Overview

This document discusses voice recognition technologies and their market. It begins with an overview of speech recognition and its uses. The voice recognition technologies market was valued at $38.4 billion in 2010 and is expected to reach $58.4 billion by 2015, growing at a compound annual growth rate of 8.8%. The document then discusses the hardware and software components of the voice recognition market. It analyzes the objectives and scope of the report, which will cover voice recognition categories, technologies, applications, and five-year forecasts for market segments. The document also compares voice recognition to interactive voice response and voice verification technologies.

Uploaded by

Mamta Tayade
Copyright
© Attribution Non-Commercial (BY-NC)

Topics covered

  • voice recognition research,
  • voice recognition security,
  • biometric systems,
  • voice recognition growth,
  • voice recognition,
  • voice recognition user experie…,
  • IVR systems,
  • automation systems,
  • voice-enabled applications,
  • consumer expectations

2 LITERATURE SURVEY

2.1 Need of System
Speech recognition, also known as automatic speech recognition, computer speech recognition, speech to text, or just STT, converts spoken words to text. The term "voice recognition" is sometimes used to refer to recognition systems that must be trained to a particular speaker, as is the case for most desktop recognition software. Recognizing the speaker can simplify the task of translating speech. Speech recognition is a broader solution that refers to technology that can recognize speech without being targeted at a single speaker, such as a call system that can recognize arbitrary voices.

Speech recognition applications include voice user interfaces such as voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control, search (e.g., find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice Input).

2.2 Market survey
The voice recognition technologies market will grow at a compound annual growth rate (CAGR) of 8.8% between 2010 and 2015. The total market is valued at an estimated $38.4 billion in 2010 and is expected to reach $58.4 billion in 2015.

Voice recognition software technologies need hardware to transmit the signals as well as abate ambient noise. This sector of the market is worth an estimated $16.5 billion in 2010 and will grow at a 9.8% CAGR to reach $26.3 billion in 2015. Automatic speech recognition and text-to-speech software work together to voice-enable many applications. Software sales will increase at a CAGR of 6.8%, from a value of $13.6 billion in 2010 to a value of $18.9 billion in 2015.
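The growth rates quoted in the market survey can be cross-checked against the 2010 and 2015 values. The sketch below assumes the standard compound-growth definition, CAGR = (end / start)^(1 / years) - 1, over a five-year span; the function name `cagr` and the dictionary layout are illustrative choices, not part of the report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.088 means 8.8%)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the market survey, in billions of US dollars (2010 -> 2015).
segments = {
    "total market": (38.4, 58.4),
    "hardware": (16.5, 26.3),
    "software": (13.6, 18.9),
}

# Five years separate the 2010 estimate from the 2015 forecast.
rates = {name: cagr(start, end, 5) for name, (start, end) in segments.items()}

for name, rate in rates.items():
    print(f"{name}: {rate * 100:.2f}% per year")
```

Each computed rate lands within about a tenth of a percentage point of the quoted 8.8%, 9.8%, and 6.8%, so the quoted figures are internally consistent.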

2.2.1 Introduction: reasons for the study and its importance
No longer narrowly associated with assistive and customer care applications, voice recognition technologies are becoming integral parts of products and services that span a much broader array of industries. With worldwide software revenues expected to reach $18.9 billion by 2015, this maturing industry owes much of its growth to advances from the critical triad of automatic speech recognition (ASR), text-to-speech (TTS), and speaker verification (SV) technologies.

In America and Europe, voice recognition providers are partnering with manufacturers who are loading their products with voice-activated multimodal options. These applications do everything from helping drivers navigate to their destination and workers voice-pick warehouse inventory to helping doctors automate medical transcription and allowing Web users to browse by voice commands. Marketers with a watchful eye are not only training their sights on the pent-up product demand of growing Asia-Pacific populations; they are also factoring in the potential of the emerging middle class in Latin America when they develop their marketing strategies.

2.2.2 Objectives of the study
Companies in the voice recognition space face challenges similar to those in other technology markets. Converging technologies offer the promise of new products and markets, but they also invite the disruptive activities inherent in mergers and acquisitions, all occurring during the worst economic downturn of the new millennium.

Customers are applying the same measuring stick to voice-aided products and services as they do to other products: they value accuracy, speed, and efficiency. Whether obtaining stock quotes from their smartphones, getting wake-up calls from voice-enabled alarm clocks, or accessing voice-translated email, consumers are raising their expectations not only about the quality of the content but also about the quality of the experience.

Choosing voice recognition solutions represents a significant information technology (IT) investment -- a fact not lost on companies that, in better economic times, focus primarily on strategic growth. Compelled to keep discretionary spending to a minimum, companies are more inclined to purchase products and services that can show a quantitative return on investment. Traditionally, call centers, with their highly developed statistical databases and a multitude of speech-enabled processes, provide some of the most compelling evidence that properly integrated voice recognition applications can help companies realize cost savings of as much as 80%.

This report will analyze in depth the voice recognition technologies and the markets and applications they serve. It addresses such questions as: Who is using these technologies? What benefits do they accrue from using them? At what price points do they buy them? Which markets will reap the most benefits from their adoption? Which issues must be addressed to generate a successful

2.3.2 Scope of the report
This report analyzes voice recognition technologies and their markets. It recognizes the fact that software and hardware technologies act in tandem, building the momentum needed for their success. Additionally, tracking the growth of traditional and emerging voice-enabled devices is important since these media will promote and extend voice recognition's reach.

An overview of the voice recognition industry precedes later chapters that review the main voice recognition categories, discuss top supplier market share, new technologies, and the unique challenges faced by each category in the future. Five-year forecasts follow, segmented by voice recognition categories as well as expansion into end markets.

Succeeding chapters discuss enabling technologies, corporate and national research and development funding, the organizational and economic makeup of the voice recognition industry, and the legislative, political, and environmental issues facing the industry. The changing dynamics of international market share also are addressed.

The appendices contain upcoming voice recognition industry-related conferences and recent patent grants, as well as a list of related mergers and acquisitions, licensing arrangements, and partnerships.

2.4 Comparison with other systems

2.4.1 Interactive Voice Response
Interactive voice response (IVR) is a technology that allows a computer to interact with humans through the use of voice and DTMF keypad inputs. In telecommunications, IVR allows customers to interact with a company's database via a telephone keypad or by speech recognition, after which they can service their own inquiries by following the IVR dialogue. IVR systems can respond with prerecorded or dynamically generated audio to further direct users on how to proceed. IVR applications can be used to control almost any function where the interface can be broken down into a series of simple interactions. IVR systems deployed in the network are sized to handle large call volumes.

IVR technology is also being introduced into automobile systems for hands-free operation. Current deployment in automobiles revolves around satellite navigation, audio, and mobile phone systems.

It has become common in industries that have recently entered the telecommunications industry to refer to an automated attendant as an IVR. The terms, however, are distinct and mean different things to traditional telecommunications professionals, whereas emerging telephony and VoIP professionals often use the term IVR as a catch-all to signify any kind of telephony menu, even a basic automated attendant. The term voice response unit (VRU) is sometimes used as well.

2.4.2 Voice Verification
Voice biometrics works by digitizing a profile of a person's speech to produce a stored model voice print, or template. Biometric technology reduces each spoken word to segments composed of several dominant frequencies called formants. Each segment has several tones that can be captured in a digital format. The tones collectively identify the speaker's unique voice print. Voice prints are stored in databases in a manner similar to the storing of fingerprints or other biometric data.

To ensure a good-quality voice sample, a person usually recites some sort of text or pass phrase, which can be either a verbal phrase or a series of numbers. The phrase may be repeated several times before the sample is analyzed and accepted as a template in the database. When a person speaks the assigned pass phrase, certain words are extracted and compared with the stored template for that individual. When a user attempts to gain access to the system, his or her pass phrase is compared with the previously stored voice model. Some voice recognition systems do not rely on a fixed set of enrolled pass phrases to verify a person's identity. Instead, these systems are trained to recognize similarities between the voice patterns of individuals when the persons speak unfamiliar phrases and the stored templates.
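The enrollment and comparison steps described above can be illustrated with a small sketch. It assumes each utterance of the pass phrase has already been reduced to a short feature vector (here, three hypothetical formant frequencies in Hz), that the template is the average of the enrollment samples, and that a sample is accepted when its Euclidean distance to the template stays within a chosen margin. The function names, the feature values, and the 50 Hz margin are illustrative assumptions, not details of any particular product.

```python
import math


def enroll(samples):
    """Average several feature vectors of the same pass phrase into one template."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


def verify(template, sample, max_distance=50.0):
    """Accept the sample if it lies within max_distance of the stored template."""
    return math.dist(template, sample) <= max_distance


# Hypothetical feature vectors: the pass phrase repeated three times at enrollment.
enrollment_samples = [
    [710.0, 1100.0, 2450.0],
    [730.0, 1080.0, 2500.0],
    [720.0, 1120.0, 2400.0],
]
template = enroll(enrollment_samples)

genuine_attempt = [725.0, 1090.0, 2460.0]   # same speaker, slight variation
impostor_attempt = [300.0, 2300.0, 3000.0]  # a different voice

print(verify(template, genuine_attempt))
print(verify(template, impostor_attempt))
```

Real systems use far richer features and statistical models; the point here is only the shape of the process: repeat the phrase, store a template, compare later attempts against it.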

A person's speech is subject to change depending on health and emotional state. Matching a voice print requires that the person speak in the normal voice that was used when the template was created at enrollment. If the person suffers from a physical ailment, such as a cold, or is unusually excited or depressed, the voice sample submitted may be different from the template and will not match.

Other factors also affect voice recognition results. Background noise and the quality of the input device (the microphone) can create additional challenges for voice recognition systems. If authentication is being attempted remotely over the telephone, the use of a cell phone instead of a landline can affect the accuracy of the results. Voice recognition systems may be vulnerable to replay attacks: if someone records the authorized user's phrase and replays it, that person may acquire the user's privileges. More sophisticated systems may use liveness testing to determine that a recording is not being used.

Voice verification systems can be used to verify a person's claimed identity or to identify a particular person. They are often used where voice is the only available biometric identifier, such as over the telephone. Voice verification systems may require minimal hardware investment, as most personal computers already contain a microphone. The downside to the technology is that, although advances have been made in recognizing the human voice, ambient temperature, stress, disease, medications, and other physical changes can negatively impact automated recognition.

Voice verification systems are different from voice recognition systems, although the two are often confused. Voice recognition is used to translate the spoken word into a specific response, while voice verification verifies the vocal characteristics against those associated with the enrolled user. The goal of voice recognition systems is simply to understand the spoken word, not to establish the identity of the speaker.

A familiar example of voice recognition systems is that of an automated call center asking a user to "press the number one on his phone keypad or say the word 'one'." In this case, the system is not verifying the identity of the person who says the word "one"; it is merely checking that the word "one" was said instead of another option.

2.4.3 Biometric Fingerprint Recognition
Identification systems based on biometrics are capable of identifying persons on the basis of either physical or behavioural characteristics. Currently, there are over ten different techniques available to identify a person based on biometrics. The following techniques are applied within the main categories of physical and behavioural characteristics:

Behavioural characteristics
  • keystroke dynamics
  • voice recognition
  • signature dynamics

Physical characteristics
  • iris recognition
  • retina recognition
  • vein pattern recognition
  • face recognition
  • recognition of hand or finger geometry
  • fingerprint recognition

Before a system is able to verify the specific biometrics of a person, it requires something to compare them with. Therefore, a profile or template containing the biometrical properties is stored in the system. Recording the characteristics of a person is called enrolment. In order to get a profile that corresponds most closely with reality, the biometrical characteristics are scanned several times. In the case of fingerprint recognition, the finger is scanned three to four times to get a profile that is independent of variations that occur in practice, such as the angle of placement of the finger on the scanner. Since storage capacity for the profiles in these systems is usually limited (for example, if used in combination with smart cards), it is common to use data compression before storing the profile. Storing profiles in tokens requires a combination of token and biometry for verification and therefore gives a higher level of security.

When a biometrical verification is to occur, a scan of the biometrics of a person is made and compared with the characteristics that are stored in the profile. In general, a certain margin of error is allowed between the observed and stored characteristics. If this margin is too small, the system will reject a legitimate person more often, while if this margin is too large, malicious persons will be accepted by the system. The probabilities that a legitimate person will be rejected and that a malicious person will be accepted are called the False Reject Rate (FRR) and the False Accept Rate (FAR) respectively. When using a biometric system, one would want to minimise both rates, but unfortunately these are not independent. An optimum trade-off between FRR and FAR has to be found with respect to the application.
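The dependence between the two error rates can be made concrete with a small sketch. It assumes each verification attempt yields a distance between the observed and stored characteristics, and that attempts within the allowed margin are accepted. The distance values below are made up purely for illustration, and `error_rates` is a hypothetical helper name.

```python
def error_rates(genuine_distances, impostor_distances, margin):
    """Return (FRR, FAR) when attempts with distance <= margin are accepted."""
    false_rejects = sum(1 for d in genuine_distances if d > margin)
    false_accepts = sum(1 for d in impostor_distances if d <= margin)
    frr = false_rejects / len(genuine_distances)
    far = false_accepts / len(impostor_distances)
    return frr, far


# Made-up distances between a scan and the stored profile.
genuine_distances = [5, 8, 12, 15, 20, 30, 45, 60]          # legitimate users
impostor_distances = [25, 40, 70, 90, 120, 150, 200, 300]   # malicious users

for margin in (10, 35, 80):
    frr, far = error_rates(genuine_distances, impostor_distances, margin)
    print(f"margin={margin}: FRR={frr:.3f}, FAR={far:.3f}")
```

With these numbers, a tight margin of 10 rejects most legitimate users but accepts no impostors, a loose margin of 80 does the reverse, and a margin of 35 sits in between. Which point is acceptable depends on the application.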

2.5 Types of Speech Recognition
There are two types of speech recognition: speaker-dependent and speaker-independent. Speaker-dependent software is commonly used for dictation, while speaker-independent software is more commonly found in telephone applications.

Speaker-dependent software works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first "train" the software by speaking to it, so the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

Speaker-independent software is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses can't ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software. Speech recognition engines that are speaker-independent generally deal with this fact by limiting the grammars they use. By using a smaller list of recognized words, the speech engine is more likely to correctly recognize what a speaker said. This makes speaker-independent software ideal for most IVR systems, and for any application where a large number of people will be using the same system. Speaker-dependent software is used more widely in dictation, where only one person will use the system and there is a need for a large grammar.

2.6 Speech recognition based password enabled switching device
The project aims to design a system capable of switching electrical devices ON and OFF based on spoken commands. The system brings a human-machine interface into the automation system.

The modules in the project are:
  • a speech recognition system capable of recognizing the speech commands given by the user
  • switches (a relay and a triac) connected to the electrical appliances that are to be controlled

Speech is the primary and most convenient means of communication between humans. Whether driven by technological curiosity to build machines that mimic humans or by the desire to automate work with machines, research in speech recognition serves as a first step towards human-machine communication. Speech recognition is the process of recognizing the spoken word in order to take the necessary action.

The controlling device of the whole system is a microcontroller. The speech recognition module, the relay, and the triac are interfaced to the microcontroller. Whenever the user speaks a predefined command, the speech recognition module recognizes it and feeds it as input to the microcontroller. The microcontroller processes this information and drives the relay and triac switches according to the voice command.

The system also provides a password feature for security: before anything else, the user must speak the voice-command password to activate the system. The microcontroller is programmed in Embedded C. This project can be used to reduce deaths due to electric shocks in industry and to switch electrical devices on and off using the speech recognition module.
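The control flow described above (stay locked until the password is spoken, then map each recognized command to a switch) can be sketched as a small state machine. This is a host-side Python simulation for illustration only, not the project's Embedded C firmware; the class name, the password phrase, and the command words are all hypothetical, and the recognizer output is assumed to arrive as plain text.

```python
class VoiceSwitchController:
    """Simulates the microcontroller logic: password gate, then switch control."""

    def __init__(self, password, commands):
        self.password = password
        self.commands = commands  # maps a command phrase to (switch name, desired state)
        self.unlocked = False
        # Every switch starts in the OFF state.
        self.switches = {switch: False for switch, _ in commands.values()}

    def handle(self, recognized):
        """Process one phrase reported by the speech recognition module."""
        if not self.unlocked:
            if recognized == self.password:
                self.unlocked = True
                return "unlocked"
            return "locked"  # commands are ignored until the password is spoken
        if recognized in self.commands:
            switch, state = self.commands[recognized]
            self.switches[switch] = state
            return f"{switch} {'ON' if state else 'OFF'}"
        return "ignored"  # not one of the predefined commands


# Hypothetical command set: one appliance on the relay, one on the triac.
controller = VoiceSwitchController(
    password="open sesame",
    commands={
        "light on": ("relay", True),
        "light off": ("relay", False),
        "fan on": ("triac", True),
        "fan off": ("triac", False),
    },
)

spoken = ["light on", "open sesame", "light on", "fan on", "dance", "light off"]
responses = [controller.handle(phrase) for phrase in spoken]
print(responses)
```

The first "light on" is refused because the system is still locked. After the password is accepted, the same phrase switches the relay, and unknown phrases are ignored.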
