
Machine Learning: voice and speech recognition systems

Name: Cara Henning
Student ID: I6157630
E-Mail: c.henning@student.maastrichtuniversity.nl
Course: EBC2144 Commercialising Science and Technology
Tutor: Jasper Wanten
Tutorial group: 1

Table of Contents

1. Introduction
2. Literature Review
3. Market analysis: The voice and speech recognition market
4. Cases: Amazon’s Alexa and its competitors
5. Conclusion
6. References
6.1. Text Sources
6.2. Figure Sources

1. Introduction

The digital age, also known as the information or computer age, is characterised by the rapid
growth of technology and by the easy exchange of information and knowledge (History of
Technology; Cisco, 2017). With the rise of machine learning, accessing information has never
been so easy. Machine learning differs from conventional computer systems in that it does not
follow hard-coded rules that specify every step needed to solve a problem; instead, it learns
from example data and real situations. This makes machine learning a more adaptive and often
more efficient way to solve problems: like humans, the machines learn to adapt to new
situations based on past experience. Today, machine learning can be found in many industries,
such as medicine, banking, and the car industry, and its market has been growing ever since its
first appearance. We use machine learning to help with everyday tasks such as doing our grocery
shopping, asking about the weather, or looking for new movie recommendations (The Royal
Society, 2017). The main focus of this paper is voice and speech recognition systems, which have
influenced us especially over the last twenty years. Since the launch of Google Voice Search,
this market has been growing exponentially (Sonix, 2019).

This paper first explains how machine learning works. Second, it gives a short overview of the
history of voice and speech recognition, defines such systems, and provides examples of how
people use them for everyday tasks. Third, it gives insights into the voice and speech
recognition market, defining its segments and presenting the forecasts for the industry.
Finally, the paper closes with a real-life example: Amazon’s Alexa, an intelligent home
assistant that has revolutionised the voice and speech recognition industry.

2. Literature Review

"Machine learning is the technology that allows systems to learn directly from examples, data,
and experience" (The Royal Society, 2017). This field of computer science strives to develop
self-learning computer systems (Maastricht University, 2019). It is a form of artificial
intelligence that allows computers to adapt to a situation based on the knowledge they have
gained. Data collection and processing power enable computers to solve problems without
following the specific steps and hard-coded rules that conventional programs rely on. From the
collected and processed data, machine learning algorithms build statistical models. Machine
learning thus lets the computer solve problems and make decisions in a way that resembles how
humans do. As the computer receives more information and data, it can keep learning and adapt
to new situations. The collected data is called training data, since the computer learns from
this information and from real-life situations. Thanks to machine learning, information and
knowledge have become more accessible to humans (The Royal Society, 2017). Computers have been
learning for almost 50 years now, but the data and processing power available today make them
far more capable of learning and adapting than before (Langley, 2011). Nowadays, we use machine
learning on a daily basis. Everyday examples include email filtering, the movie recommendation
programs used by Netflix and other streaming platforms, and voice and speech recognition
systems such as Apple's Siri (Kaur, 2019; DeepAI, 2019).
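
The difference between hard-coded rules and learning from training data can be illustrated with
the email filtering example mentioned above. The following is only a minimal sketch and not a
description of how any real mail provider works. The toy messages, the labels "spam" and "ham",
and the function names train and classify are assumptions made for illustration, and the naive
Bayes word-count model is just one simple choice of statistical model.

```python
import math
from collections import Counter

# Hypothetical toy training data: (message, label) pairs invented for this sketch.
TRAINING_DATA = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch with the team today", "ham"),
    ("project report due monday", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(counts.values())
        for word in text.split():
            # Laplace smoothing so that unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, label_counts = train(TRAINING_DATA)
print(classify("free prize today", word_counts, label_counts))
print(classify("team meeting monday", word_counts, label_counts))
```

No rule such as "messages containing the word free are spam" is written anywhere in this sketch.
The behaviour comes entirely from the counts gathered from the training data, so adding more
labelled messages changes the decisions without changing the program.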

This paper focuses mainly on voice and speech recognition and its evolution over the past
years. A voice and speech recognition system is a computer system that can identify and decode
the human voice. Such a system can write out a spoken text message, perform commands, and
answer questions (Computer Hope, 2019). The first systems recognised numbers rather than words.
In 1952, Bell Laboratories designed a system called "Audrey" that was able to recognise digits
spoken out loud by a person. In the early 1960s, IBM introduced a system that was able to
understand and speak sixteen words in English (Sonix, 2019). At the beginning of the 21st
century, voice recognition technology did not evolve much until Google launched Google Voice
Search. Google made voice and speech recognition accessible to millions of people. Google's
English voice recognition system incorporated 230 billion words. Google Voice Search allowed
people to search by saying keywords out loud instead of typing them (Google Mobile Blog, 2011).
In 2011, Apple introduced Siri, an intelligent assistant that helped people to access
information quickly and to navigate through their iPhones (Bosker, 2013). Three years later,
Amazon released Alexa, a virtual assistant that is part of Amazon's Echo family. Alexa carries
out a person's spoken commands: it describes products that are available on Amazon, answers
questions about the weather or the news, and is able to play music. Amazon continually updates
Alexa so that it adapts to the speech patterns, vocabulary, and preferences of each user (Black,
2019). Ever since, the largest tech companies, such as IBM and Google, have been competing to
be the most accurate in voice and speech recognition technology. Since 2017, Google has been
the leader in voice and speech recognition accuracy: it reached 95% accuracy for the English
language, which matches the rate for humans and marked the first time that a machine was as
accurate as humans (Suresh, 2019).
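
To make an accuracy figure such as 95% more concrete, the sketch below computes the word error
rate, a common way to score a transcript against a reference text. Whether the figure cited
above was measured exactly this way is an assumption. The example sentences and the function
name word_error_rate are likewise invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical 20-word spoken request and a transcript with one wrong word.
REFERENCE = ("please add milk and eggs to my shopping list and then "
             "tell me the weather forecast for early tomorrow morning")
HYPOTHESIS = REFERENCE.replace("eggs", "legs")

wer = word_error_rate(REFERENCE, HYPOTHESIS)
print(f"word error rate: {wer:.2%}, word accuracy: {1 - wer:.2%}")
```

In this toy case one misrecognised word out of twenty gives a word error rate of 5%, which
corresponds to a word accuracy of 95%.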

3. Market analysis: The voice and speech recognition market


The voice and speech recognition market is growing. According to Mordor Intelligence (2019),
the global voice recognition market is projected to grow at a CAGR of 16.8% during the forecast
period 2019-2024. "The global market is expected to reach USD 21.7 billion by 2024" (Markets
and Markets, 2019). There are several key players in the industry, such as Apple Inc., Alphabet
Inc., Microsoft Corporation, Nuance Communications Inc., IBM, and Amazon.

Figure 1: Voice and speech recognition market forecast 2015-2026
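
A compound annual growth rate (CAGR) describes growth that compounds every year, so its effect
over a forecast period is larger than the yearly percentage suggests. The sketch below shows
the arithmetic for the 16.8% rate quoted above. It assumes that the 2019-2024 period counts as
five annual steps, and the function names and the base value of 100 are hypothetical.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Factor by which a value grows after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Assumption: the 2019-2024 forecast period is treated as five annual steps.
multiple = growth_multiple(0.168, 5)
print(f"Growth multiple over five years: {multiple:.2f}")

# Round trip with a hypothetical base value of 100 (index points, not USD).
recovered = implied_cagr(100.0, 100.0 * multiple, 5)
print(f"Recovered CAGR: {recovered:.3f}")
```

Under these assumptions, a market growing at 16.8% per year slightly more than doubles within
five years.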

The Asia-Pacific market is expected to be the fastest-growing market in the coming years.
Mordor Intelligence (2019) argues that China, which has the largest population, therefore has a
high adoption rate for voice and speech recognition. Although Asia might become the biggest
market in the forecast period, North America is still the most important market today. In 2018,
North America generated revenue of USD 2.9 billion, almost one third more than Europe. Usage of
voice assistants in North America is expected to increase further in the coming years (Fortune
Business Insights, 2018).

Figure 2: Geographic market growth

According to Fortune Business Insights (2018), the market has several different segments. It is
segmented by end-user, component, technology, deployment, and region, as can be seen in figure
3. The deployment segment has two subcategories: cloud-based and on-premise voice and speech
recognition. Cloud-based voice recognition is the more advanced tool for businesses; it has a
higher accuracy than on-premise voice and speech recognition.

Figure 3: Market segments of the voice and speech recognition market

The end-users are divided into different segments such as IT and telecommunications,
automotive, retail, government, and healthcare, as can be seen in figure 4. Fortune Business
Insights (2018) states that the “healthcare industry will hold the highest share in terms of
revenue generation in the global market by end-user.” In the modern healthcare system, voice
and speech recognition is increasingly used in emergency medicine, radiology, and pathology.
For example, Amazon has patented a voice assistant that can detect from a person’s voice
whether that person is sick or depressed and then sell products based on the person’s condition
(Brodkin, 2018).

Figure 4: Different categories of the end-user segment

4. Cases: Amazon’s Alexa and its competitors


Amazon's Alexa was born out of the desire to better understand Amazon's customers. Back in
2014, Amazon brought Alexa to market as the assistant that powers the Echo family of smart
speakers. Alexa is a virtual assistant that can play music, answer questions, or help with the
grocery shopping. Amazon itself describes Alexa as a cloud-based voice service and a smart home
assistant (Ratnesh, 2018). Alexa has taken voice and speech recognition technology to another
level: it not only listens and answers questions but can also help people with daily tasks.

According to Forbes (2018), most Alexa customers are male and most likely under 44 years old.
Most people keep their voice assistant in the living room, but a surprisingly large number of
people keep Alexa in the kitchen. Even though the largest customer segment is under 44 years
old, Alexa is also used by older people, since it is easy to understand and use, especially for
those who have physical or visual impairments.

Alexa's purpose is to make life easier for its customers. At a time when people feel busier
than ever, virtual assistants are popular because they help with everyday tasks. Alexa can do
the grocery shopping, add items to the Amazon shopping list, and even give tips and tricks for
cooking. Furthermore, it can help at work: people can multitask more easily with Alexa as their
assistant (Ratnesh, 2018). Additionally, people can make phone calls to others who have an
Amazon Echo device (Profis, 2019).

Ever since Amazon launched the Echo family with Alexa, it has led the US voice recognition and
smart home market (Kim, 2019). But what about Alexa's competitors, such as Apple's Siri? After
Amazon launched Alexa, Apple and Google came up with their own smart speakers for people's
homes. Apple, which acquired Siri in 2010 from a 24-person start-up company, has used this
voice and speech recognition system since it launched the iPhone 4S in 2011 (Bosker, 2017). In
early 2018, Apple tried to compete with Alexa and released the HomePod with Siri (Apple, 2019).
Compared with Amazon's Echo devices, Apple's HomePod is more expensive, but more convenient for
Apple customers. Although Apple's device is similar to Amazon's cloud-based voice and speech
recognition system, it has only 6% of the US market (Kim, 2019). Two years after the launch of
Alexa, Google introduced a similar device, Google Home, with Google Assistant. Google Home
currently has the second-largest share of the US market (Marketing Land, 2019).

Figure 5: US market share for smart speakers

5. Conclusion

Firstly, this paper has shown that machine learning has gained importance over the past
decades. In the digital age, people want easy, fast, and accessible information and knowledge,
and machine learning provides it. Self-learning computer systems help solve many problems by
using data and past experience to adapt to new situations. We find machine learning in our
daily lives, for example in the movie recommendations on Netflix or in assistance with grocery
shopping. Even though machine learning has been around for a long time, it has not yet been
fully exploited.

Secondly, this essay concentrated on one of the many categories that exist in machine learning.
Focusing on voice and speech recognition systems, we found that this technology has been around
for a long time, but that the market only started booming in the 21st century. Moreover, we
have gained a clear picture of the industry's key players, forecasts, and segments. The voice
and speech recognition market is growing exponentially, and the competition among big tech
companies such as Amazon, IBM, and Google is high. This market leaves many opportunities for
further development and improvement.

Finally, as discussed in the Alexa case, intelligent assistants can help people manage their
daily tasks and thereby make their lives easier. Amazon's Echo family, and especially Alexa,
have revolutionised the voice and speech recognition market and created the smart speaker.
Competitors such as Google and Apple are following Amazon's path.

To conclude, machine learning, and especially the voice and speech recognition and smart
speaker markets, is still growing, and many new developments in this industry are yet to come.

6. References
6.1. Text Sources

Apple. (2019). Siri does more than ever even before you ask. At
https://www.apple.com/siri/

Apple. (2019). Weitere Funktionen von Siri auf dem HomePod. Apple Support. At
https://support.apple.com/de-de/HT208336

Black, M. (2019). What is Amazon Echo? A Complete Guide. Tech Advisor. At
https://www.techadvisor.co.uk/news/audio/amazon-echo-3584881/

Brodkin, J. (2018). Amazon patents Alexa tech to tell if you’re sick, depressed and sell you
meds. Ars Technica. At https://arstechnica.com/gadgets/2018/10/amazon-patents-alexa-tech-
to-tell-if-youre-sick-depressed-and-sell-you-meds/#

Brody, B. (2019). 'Sorry, but I am too busy to talk to you right now': this is our modern
life reality. The Guardian. At https://www.theguardian.com/commentisfree/2019/aug/25/too-
busy-americans-leisure-time-work-life-balance

Bosker, B. (2016). SIRI RISING: The Inside Story Of Siri’s Origins — And Why She
Could Overshadow The iPhone. Huffpost. At https://www.huffpost.com/entry/siri-do-engine-
apple-
iphone_n_2499165?guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer
_sig=AQAAAJ2LVX0xfqkZTB5uxlm2ZPZ33k8ZyuTkXe8cRGe89wtUAU92B6GHYW2zoj
hHP7mNBWbfGQx4-1dipYvQw_PV4iknlRgBSIBySC9uGzQReY-
RbSmjOzjQZ2nShg8Ka0UozRwXQPCfqeG8sOBOeJi54odXxiiV4VINI4QFtTooLzFW&guc
counter=2

Computer Hope. (2019). Voice recognition. At
https://www.computerhope.com/jargon/v/voicreco.htm

DeepAI. (2019). Machine learning. At https://deepai.org/machine-learning-glossary-and-
terms/machine-learning

Fortune Business Insights. (2018). Speech and Voice Recognition Market Size, Share &
Industry Analysis, By Component, By Technology, By Deployment, By End-User, and
Regional Forecast, 2019 – 2026. At https://www.fortunebusinessinsights.com/industry-
reports/speech-and-voice-recognition-market-101382#

History of Technology. At https://historyoftechnologyif.weebly.com/information-age.html

Johnson, K., et al. (2012). The Innovative Success that is Apple, Inc.. Marshall
University. Marshall digital Scholar. At
https://mds.marshall.edu/cgi/viewcontent.cgi?article=1420&context=etd

Kaur, A. (2019). Top 10 real-life examples of Machine Learning. Big data made simple.
At https://bigdata-madesimple.com/top-10-real-life-examples-of-machine-learning/
Koksal, I. (2018). Who's the Amazon Alexa Target Market, Anyway?. Forbes. At
https://www.forbes.com/sites/ilkerkoksal/2018/10/10/whos-the-amazon-alexa-target-market-
anyway/#72ad413b2eb5

Langley, P. (2011). The Changing Science of Machine Learning. Springer. At
https://link.springer.com/content/pdf/10.1007%2Fs10994-011-5242-y.pdf

Maastricht University. (2019). Machine learning. At
https://www.maastrichtuniversity.nl/research/institutes/dke/research/machine-learning-ml

Markets and Markets. (2019). Speech and Voice Recognition Market by Technology,
Vertical, Deployment, and Geography - Global Forecast to 2024. At
https://www.marketsandmarkets.com/Market-Reports/speech-voice-recognition-market-
202401714.html

Mordor Intelligence. (2019). VOICE RECOGNITION MARKET - GROWTH, TRENDS,
AND FORECAST (2019 - 2024). At https://www.mordorintelligence.com/industry-
reports/voice-recognition-market

Prist, A. (2018). Alexa Skills Evolution: From the Very Beginning to Present Days. Voice
UI. At https://medium.com/voiceui/alexa-skills-evolution-from-the-very-beginning-to-
present-days-e506a3886ee6

Prowohl, T. (2018). ALLE Siri-Befehle auf einen Blick. Techbook. At
https://www.techbook.de/mobile/alle-siri-befehle-auf-einen-blick

Royal Society Working Group. (2017). Machine learning: the power and promise of
computers that learn by example. Technical report. At https://royalsociety.org/topics-
policy/projects/machine-learning/

Sentryo. (2017). New production technology for Industry 4.0. At
https://www.sentryo.net/new-production-technology-industry-4-0/

Sonix. (2019). A short history of speech recognition. At https://sonix.ai/history-of-
speech-recognition

Sterling, G. (2019). Alexa devices maintain 70% market share in U.S. according to
survey. Marketing Land. At https://marketingland.com/alexa-devices-maintain-70-market-
share-in-u-s-according-to-survey-265180

Suresh, N. (2019). Google Assistant now provides 95% Accuracy!. Let’s think easy. At
https://letsthinkeasy.com/google-assistant-provides-accuracy/

Xperience. (2017). Cloud Vs On Premise Software: Which is Best For Your Business?. At
https://www.xperience-group.com/blog/cloud-vs-on-premise-software/

6.2. Figure Sources

Figure 1: Fortune Business Insights. (2018). Speech and Voice Recognition Market Size,
Share & Industry Analysis, By Component, By Technology, By Deployment, By End-User,
and Regional Forecast, 2019 – 2026. At https://www.fortunebusinessinsights.com/industry-
reports/speech-and-voice-recognition-market-101382#

Figure 2: Mordor Intelligence. (2019). VOICE RECOGNITION MARKET - GROWTH,
TRENDS, AND FORECAST (2019 - 2024). At
https://www.mordorintelligence.com/industry-reports/voice-recognition-market

Figure 3: Fortune Business Insights. (2018). Speech and Voice Recognition Market Size,
Share & Industry Analysis, By Component, By Technology, By Deployment, By End-User,
and Regional Forecast, 2019 – 2026. At https://www.fortunebusinessinsights.com/industry-
reports/speech-and-voice-recognition-market-101382#

Figure 4: Fortune Business Insights. (2018). Speech and Voice Recognition Market Size,
Share & Industry Analysis, By Component, By Technology, By Deployment, By End-User,
and Regional Forecast, 2019 – 2026. At https://www.fortunebusinessinsights.com/industry-
reports/speech-and-voice-recognition-market-101382#

Figure 5: Sterling, G. (2019). Alexa devices maintain 70% market share in U.S. according to
survey. Marketing Land. At https://marketingland.com/alexa-devices-maintain-70-market-
share-in-u-s-according-to-survey-265180
