
Project Name

A graduate project report submitted to AKTU in partial fulfillment


of the requirement for the award of the degree of

Bachelor of Technology
In

Computer Science and Engineering

SUBMITTED BY:
Student Name1 (Roll No.)
Student Name2 (Roll No.)
Student Name3 (Roll No.)
Student Name4 (Roll No.)
Student Name5 (Roll No.)

UNDER THE GUIDANCE OF:
Name of Supervisor
(Designation)
Computer Science & Engineering

Department of Computer Science & Engineering

Ashoka Institute of Technology and Management


(A constituent institute of Dr. A.P.J. Abdul Kalam Technical University)
Varanasi -221007

August, 2021
CERTIFICATE

This is to certify that Student Name1, Student Name2, Student Name3, Student Name4 and Student Name5 have carried out the project work presented in the project report entitled “Project Name”, submitted to Ashoka Institute of Technology & Management, Varanasi in partial fulfillment of the requirement for the award of the degree of Bachelor of Technology (B.Tech.) in Computer Science & Engineering. The report is a record of the candidates’ own work, carried out by them under my supervision. The matter embodied in this report is original and has not been submitted for the award of any other degree.

(Name of Supervisor)
Designation
(Department of CSE)

Forwarded By:
Mr. Arvind Kumar
Assistant Prof. & Head of Department
(Department of CSE)

Date- ………………….

DECLARATION

We hereby declare that the project entitled “Project Name”, submitted by us in partial fulfillment of the requirement for the award of the degree of Bachelor of Technology (Computer Science & Engineering) of Dr. A.P.J. Abdul Kalam Technical University, Lucknow, is a record of our own work carried out under the supervision and guidance of Name of Supervisor.
To the best of our knowledge, this project has not been submitted to any other university or institute for the award of a degree.

Signature : Signature :
Name : Student Name1 Name : Student Name2
Roll No. : Roll No. :
Date : Date :

Signature : Signature :
Name : Student Name3 Name : Student Name4
Roll No. : Roll No. :
Date : Date :

Signature :
Name : Student Name5
Roll No. :
Date :

ACKNOWLEDGEMENT

In carrying out our project, we received help and guidance from several respected persons, who deserve our gratitude. The completion of this project gives us much pleasure.
We take this opportunity to express our deep gratitude and regard to our guide, Name of Supervisor, Designation, Department of Computer Science & Engineering, Ashoka Institute of Technology & Management, Varanasi, for his/her exemplary guidance, monitoring and encouragement throughout the course of this project, and for helping us, as our internal guide, to complete this project work successfully.

We also take the opportunity to acknowledge the contribution of Mr. Arvind Kumar,
Assistant Professor & Head of Department, Computer Science & Engineering, of College
Ashoka Institute of Technology & Management, Varanasi, for his full support and assistance
during the development of the project.
A sincere thanks to our project team; they performed very well, and their constant effort is really appreciable. Their dedication encouraged us to perform well, and it was a great experience to work with them.
We are thankful to all our faculty members for their cooperation, invaluable constructive criticism and friendly advice during the project. We are also thankful to our colleagues and classmates who helped us in the compilation of this project.
Finally, yet importantly, we would like to express our heartfelt thanks to our beloved parents for their blessings. We perceive this opportunity as a big milestone in our career development. We will strive to use the skills and knowledge gained in the best possible way, and will keep working on their improvement, in order to continue our cooperation with all of you in the future.

ABSTRACT

Speech recognition is one of the most rapidly developing fields of research at both the industrial and scientific levels. Until recently, the idea of holding a conversation with a computer seemed pure science fiction. If you asked a computer to “open the pod bay doors”, well, that only happened in the movies. But things are changing, and quickly. A growing number of people now talk to their mobile smartphones, asking them to send e-mail and text messages, search for directions, or find information on the Web. Our project aims at one such application. The project was designed keeping in mind the various categories of people who suffer from loneliness due to the absence of others to care for them, especially those undergoing cancer treatment and the elderly. The system will provide them with interaction and entertainment.

LIST OF FIGURES

FIGURE TITLE OF FIGURE PAGE NO.

FIG 1 Raspberry Pi 3 Model B 14
FIG 2 Microphone and Sound Card 16
FIG 3 Raspbian OS icon 16
FIG 4 Amazon Developer Account 25
FIG 5 Installation of Alexa services 26
FIG 6 Starting companion service and npm 27
FIG 7 Starting our application 28
FIG 8 Activating wake word engine 29
FIG 9 Improving the microphone through command line 30
FIG 10 System Working Flowchart 31
FIG 11 Information Exchange in AVS 32
FIG 12 Hardware Architecture of AVS Client 32
FIG 13 Internet Connection Problem 48
FIG 14 GSM Quad Band 800 48
FIG 15 Home automation possibilities 49
FIG 16 Car Automation 51

LIST OF ABBREVIATIONS

ABBREVIATION DESCRIPTION
AVS ALEXA VOICE SERVICE
RPi RASPBERRY PI
STT SPEECH TO TEXT
TTS TEXT TO SPEECH
SR SPEECH RECOGNITION
ASR AUTOMATIC SPEECH RECOGNITION
ADC ANALOG TO DIGITAL CONVERTER
CU CONTROL UNIT
IOT INTERNET OF THINGS
DNN DEEP NEURAL NETWORK
Wi-Fi WIRELESS FIDELITY
OS OPERATING SYSTEM

CONTENTS

1. OBJECTIVE …………………………………………………………………….. 9
2. INTRODUCTION ……………………………………………………………... 11
2.1 WHAT IS SPEECH RECOGNITION? ……………………………. ……………………… 11
2.2 WHAT IS VIRTUAL PERSONAL ASSISTANT? ………………………………………... 12
2.3 WHAT IS RASPBERRY PI? ………………………………………………………………. 13
3. SYSTEM REQUIREMENTS ...…………………………………………………14
3.1 HARDWARE REQUIREMENTS .…………………………………………………………..15
3.2 SOFTWARE REQUIREMENTS ……………………………………………………………16
4. IMPLEMENTATION ………...………………………………………………...19
4.1 ALGORITHMS ……………………………………………………………………………...19
4.2 SETTING UP RASPBERRY PI ……………….…………………………………………….21
4.3 STEPS TO SETUP ALEXA VOICE SERVICE .……………………………………………25
4.4 FLOWCHART………………………………………………………………………………..31
4.5 BLOCK DIAGRAM………………………………………………………………………….32
4.6 CODE LISTINGS ...……………………………………………………………………32
5. TESTING …………………………………………………………………………45
5.1 UNIT TESTING ……………………………………………………………………………..46
5.2 SYSTEM TESTING ………………………………………………………………………....47
5.3 INTEGRATION TESTING …………………………………………………………………47
6. FUTURE ENHANCEMENTS...………………………………………………..47
7. APPLICATIONS ..……………………………………………………………...49
8. REFERENCES ..………………………………………………………………..52

1. OBJECTIVE

The objective is to provide information and entertainment to otherwise solitary people; the system therefore acts as a personal assistant.
People with disabilities can benefit from speech recognition programs. For individuals who are deaf or hard of hearing, speech recognition software is used to automatically generate closed captioning of conversations such as discussions in conference rooms, classroom lectures, and religious services.

Speech recognition is also very useful for people who have difficulty using their hands, ranging
from mild repetitive stress injuries to involved disabilities that preclude using conventional
computer input devices. In fact, people who used the keyboard a lot and developed RSI became an
urgent early market for speech recognition. Speech recognition is used in deaf telephony, such as
voicemail to text, relay services, and captioned telephone. Individuals with learning disabilities who have problems with thought-to-paper communication (essentially, they think of an idea but it is processed incorrectly, causing it to end up differently on paper) can possibly benefit from the software, but the technology is not bug free. The whole idea of speech to text can also be hard for intellectually disabled persons, because it is rare that anyone tries to learn the technology in order to teach it to the person with the disability.

Being bedridden can be very difficult for many patients to adjust to and it can also cause other
health problems as well. It is important for family caregivers to know what to expect so that they
can manage or avoid the health risks that bedridden patients are prone to. In this article we would
like to offer some information about common health risks of the bedridden patient and some tips for
family caregivers to follow in order to try and prevent those health risks.

Depression is also a very common health risk for those that are bedridden because they are unable
to care for themselves and maintain the social life that they used to have. Many seniors begin to feel
hopeless when they become bedridden but this can be prevented with proper care. Family
caregivers should make sure that they are caring for their loved one’s social and emotional needs as
well as their physical needs. Many family caregivers focus only on the physical needs of their loved
ones and forget that they have emotional and social needs as well. Family caregivers can help their
loved ones by providing them with regular social activities and arranging times for friends and
other family members to come over so that they will not feel lonely and forgotten. Family
caregivers can also remind their loved ones that being bedridden does not necessarily mean that
they have to give up everything they used to enjoy.
But since family members won’t always be available at home, the above-mentioned problems are still prevalent in these patients; hence our interactive system will provide them with entertainment (music, movies) and voice responses to general questions. It therefore behaves as an electronic companion.

2. INTRODUCTION

Throughout this project we mainly talk about the speech recognition, raspberry pi and virtual
personal assistant. So before moving further we must have knowledge about these terms as
mentioned above.

2.1 WHAT IS SPEECH RECOGNITION?

In computer science and electrical engineering, speech recognition (SR) is the translation of spoken
words into text. It is also known as "automatic speech recognition" (ASR), "computer speech
recognition", or just "speech to text" (STT).

Some SR systems use "training" (also called "enrolment") where an individual speaker reads text or
isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to
fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do
not use training are called "speaker independent" systems. Systems that use training are called
"speaker dependent".

Speech recognition applications include voice user interfaces such as voice dialing (e.g. "Call
home"), call routing (e.g. "I would like to make a collect call"), domestic appliance control, search
(e.g. find a podcast where particular words were spoken), simple data entry (e.g., entering a credit
card number), preparation of structured documents (e.g. a radiology report), speech-to-text
processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice Input).

The term voice recognition or speaker identification refers to identifying the speaker, rather than
what they are saying. Recognizing the speaker can simplify the task of translating speech in
systems that have been trained on a specific person's voice or it can be used to authenticate or verify
the identity of a speaker as part of a security process. From the technology perspective, speech
recognition has a long history with several waves of major innovations. Most recently, the field has
benefited from advances in deep learning and big data. The advances are evidenced not only by the
surge of academic papers published in the field, but more importantly by the world-wide industry
adoption of a variety of deep learning methods in designing and deploying speech recognition
systems. These speech industry players include Microsoft, Google, IBM, Baidu (China), Apple,
Amazon, Nuance, IflyTek (China), many of which have publicized the core technology in their
speech recognition systems being based on deep learning.

Now the rapid rise of powerful mobile devices is making voice interfaces even more useful and
pervasive. Jim Glass, a senior research scientist at MIT who has been working on speech interfaces
since the 1980s, says today’s smart phones pack as much processing power as the laboratory
machines he worked with in the ’90s. Smart phones also have high-bandwidth data connections to
the cloud, where servers can do the heavy lifting involved with both voice recognition and
understanding spoken queries. “The combination of more data and more computing power means
you can do things today that you just couldn’t do before,” says Glass. “You can use more
sophisticated statistical models.”

The most prominent example of a mobile voice interface is, of course, Siri, the voice-activated
personal assistant that comes built into the latest iPhone. But voice functionality is built into
Android, the Windows Phone platform, and most other mobile systems, as well as many apps.
While these interfaces still have considerable limitations (see Social Intelligence), we are inching
closer to machine interfaces we can actually talk to.

In 1971, DARPA funded five years of speech recognition research through its Speech
Understanding Research program with ambitious end goals including a minimum vocabulary size
of 1,000 words. BBN, IBM, Carnegie Mellon and Stanford Research Institute all participated in the
program. The government funding revived speech recognition research that had been largely
abandoned in the United States after John Pierce's letter. Despite the fact that CMU's Harpy system
met the goals established at the outset of the program, many of the predictions turned out to be
nothing more than hype, disappointing DARPA administrators. This disappointment led to DARPA
not continuing the funding. Several innovations happened during this time, such as the invention of
beam search for use in CMU's Harpy system. The field also benefited from the discovery of several
algorithms in other fields such as linear predictive coding and cepstral analysis.

2.2 WHAT IS VIRTUAL PERSONAL ASSISTANT?

A virtual assistant is a software agent that can perform tasks or services for an individual.
Sometimes the term "chatbot" is used to refer to virtual assistants generally or specifically those
accessed by online chat (or in some cases online chat programs that are for entertainment and not
useful purposes).
As of 2017, the capabilities and usage of virtual assistants are expanding rapidly, with new products
entering the market. An online poll in May 2017 found the most widely used in the US
were Apple's Siri (34%), Google Assistant (19%), Amazon Alexa (6%), and Microsoft
Cortana (4%).[1] Apple and Google have large installed bases of users on smartphones. Microsoft
has a large installed base of Windows-based personal computers, smartphones and smart speakers.
Alexa has a large install base for smart speakers.
The first tool able to perform digital speech recognition was the IBM Shoebox, presented to the
general public during the 1962 Seattle World's Fair after its initial market launch in 1961. This
early computer, developed almost 20 years before the introduction of the first IBM Personal
Computer in 1981, was able to recognize 16 spoken words and the digits 0 to 9. The next milestone
in the development of voice recognition technology was achieved in the 1970s at the Carnegie
Mellon University in Pittsburgh, Pennsylvania with substantial support of the United States
Department of Defense and its DARPA agency. Their tool "Harpy" mastered about 1000 words, the
vocabulary of a three-year-old. About ten years later the same group of scientists developed a
system that could analyze not only individual words but entire word sequences enabled by a Hidden
Markov Model.
The earliest virtual assistants that applied speech recognition software were automated
attendants and medical digital dictation software. In the 1990s digital speech recognition technology
became a feature of the personal computer with Microsoft, IBM, Philips and Lernout &
Hauspie fighting for customers. Much later the market launch of the first smartphone IBM Simon in
1994 laid the foundation for smart virtual assistants as we know them today. The first modern
digital virtual assistant installed on a smartphone was Siri, which was introduced as a feature of
the iPhone 4S on October 4, 2011.

2.3 WHAT IS RASPBERRY PI?

The Raspberry Pi is a low cost, credit-card sized computer that plugs into a computer monitor or
TV, and uses a standard keyboard and mouse. It is a capable little device that enables people of all
ages to explore computing, and to learn how to program in languages like Scratch and Python. It’s
capable of doing everything you’d expect a desktop computer to do, from browsing the internet and
playing high-definition video, to making spreadsheets, word-processing, and playing games.

What’s more, the Raspberry Pi has the ability to interact with the outside world, and has been used
in a wide array of digital maker projects, from music machines and parent detectors to weather
stations and tweeting birdhouses with infra-red cameras. We want to see the Raspberry Pi being
used by kids all over the world to learn to program and understand how computers work.

The Raspberry Pi Foundation is a registered educational charity (registration number 1129409) based in the UK. The Foundation’s goal is to advance the education of adults and children, particularly in the field of computers, computer science and related subjects.

The organization behind the Raspberry Pi consists of two arms. The first two models were
developed by the Raspberry Pi Foundation. After the Pi Model B was released, the Foundation set
up Raspberry Pi Trading, with Eben Upton as CEO, to develop the third model, the B+. Raspberry
Pi Trading is responsible for developing the technology while the Foundation is an educational
charity to promote the teaching of basic computer science in schools and in developing countries.

The Foundation provides Raspbian, a Debian-based Linux distribution, for download, as well as third-party Ubuntu, Windows 10 IoT Core, RISC OS, and specialized media center distributions. It promotes Python and Scratch as the main programming languages, with support for many other languages. The default firmware is closed source, while an unofficial open-source alternative is available.

3. SYSTEM REQUIREMENTS
The project needs both hardware and software components. The hardware components include the Raspberry Pi 3 Model B, a 5 A charger, a keyboard, a mouse, earphones, a microphone with a sound card, an Ethernet cable, an HDMI screen and an HDMI cable. The software components are the Raspbian OS on an SD card, Java and Python environments, and the online Alexa Voice Service (AVS) speech API. They are described in detail below:

3.1 HARDWARE COMPONENTS

1. RASPBERRY PI 3 MODEL B

The Raspberry Pi is a series of credit card–sized single-board computers developed in the UK by the Raspberry Pi Foundation with the intention of promoting the teaching of basic computer science in schools.

The system is built around an ARM microprocessor (ARM is a registered trademark of ARM Limited). Linux now provides support for the ARM11 family of processors; it gives consumer device manufacturers a commercial-quality Linux implementation along with tools to reduce time-to-market and development costs. The Raspberry Pi is a credit-card-sized computer development platform based on a Broadcom BCM2835 system on chip, sporting an ARM11 processor, developed in the UK by the Raspberry Pi Foundation. The Raspberry Pi functions as a regular desktop computer when it is connected to a keyboard and monitor. It is very cheap and reliable, and boards have even been clustered to build a Raspberry Pi “supercomputer”. The Raspberry Pi runs a Linux kernel-based operating system.

Fig 1. Raspberry Pi 3 Model B

The Foundation provides Debian and Arch Linux ARM distributions for download. Tools are
available for Python as the main programming language, with support for BBC BASIC (via the
RISC OS image or the Brandy Basic clone for Linux), C, C++, Java, Perl and Ruby.

SPECIFICATIONS INCLUDE:

1. SoC: Broadcom BCM2835 (CPU, GPU, DSP, SDRAM, one USB port)
2. CPU: 700 MHz single-core ARM1176JZF-S
3. GPU: Broadcom Video Core IV @ 250 MHz
4. OpenGL ES 2.0 (24 GFLOPS)
5. MPEG-2 and VC-1 (with license), 1080p30 H.264/MPEG-4 AVC high-profile decoder and
encoder
6. Memory (SDRAM):512 MB (shared with GPU) as of 15 October 2012
7. USB 2.0 ports: 2 (via the on-board 3-port USB hub)
8. Video outputs: HDMI (rev 1.3 & 1.4), 14 HDMI resolutions from 640×350 to 1920×1200 plus
various PAL and NTSC standards, composite video (PAL and NTSC) via RCA jack
9. Audio outputs: Analog via 3.5 mm phone jack; digital via HDMI and, as of revision 2 boards,
I²S
10. On-board storage: SD / MMC / SDIO card slot
11. On-board network: 10/100 Mbit/s Ethernet (8P8C) USB adapter on the third/fifth port of the USB hub (SMSC LAN9514-JZX)
12. Low-level peripherals: 8× GPIO plus the following, which can also be used as GPIO: UART, I²C bus, SPI bus with two chip selects, I²S audio, +3.3 V, +5 V, ground
13. Power rating: 700 mA (3.5 W)
14. Power source: 5V via Micro USB or GPIO header
15. Size: 85.60 mm × 56.5 mm (3.370 in × 2.224 in) – not including protruding connectors
16. Weight: 45 g (1.6oz)
The main differences between the two flavors of Pi are the RAM, the number of USB 2.0 ports and
the fact that the Model A doesn’t have an Ethernet port (meaning a USB Wi-Fi adapter is required to access the internet). While that results in a lower price for the Model A, it means that a user will have to
buy a powered USB hub in order to get it to work for many projects. The Model A is aimed more at
those creating electronics projects that require programming and control directly from the command
line interface. Both Pi models use the Broadcom BCM2835 CPU, which is an ARM11-based
processor running at 700MHz.
There are overclocking modes built in for users to increase the speed as long as the core doesn’t get
too hot, at which point it is throttled back. Also included is the Broadcom VideoCore IV GPU with support for OpenGL ES 2.0, which can perform 24 GFLOPS and decode and play H.264 video at 1080p resolution. Originally the Model A was due to use 128MB RAM, but this was upgraded to
256MB RAM with the Model B going from 256MB to 512MB. The power supply to the Pi is via
the 5V micro USB socket. As the Model A has fewer powered interfaces it only requires 300mA,
compared to the 700mA that the Model B needs. The standard system of connecting the Pi models
is to use the HDMI port to connect to an HDMI socket on a TV or a DVI port on a monitor. Both
HDMI-HDMI and HDMI-DVI cables work well, delivering 1080p video, or 1920x1080. Sound is
also sent through the HDMI connection, but if using a monitor without speakers then there’s the
standard 3.5mm jack socket for audio. The RCA composite video connection was designed for use
in countries where the level of technology is lower and more basic displays such as older TVs are
used.

2. SOUND CARD WITH MICROPHONE

A sound card is used since the Raspberry Pi has no on-board ADC (analog-to-digital converter).


A sound card (also known as an audio card) is an internal computer expansion card that facilitates
economical input and output of audio signals to and from a computer under control of computer
programs. The term sound card is also applied to external audio interfaces that use software to
generate sound, as opposed to using hardware inside the PC. Typical uses of sound cards include
providing the audio component for multimedia applications such as music composition, editing
video or audio, presentation, education and entertainment (games) and video projection.

Fig 2. Microphone and Sound Card

Sound functionality can also be integrated onto the motherboard, using components similar to plug-
in cards. The best plug-in cards, which use better and more expensive components, can achieve
higher quality than integrated sound. The integrated sound system is often still referred to as a
"sound card". Sound processing hardware is also present on modern video cards with HDMI to
output sound along with the video using that connector; previously they used a SPDIF connection
to the motherboard or sound card. We are using Quantum sound card and Quantum collar mic.

3.2 SOFTWARE REQUIRED
1. RASPBIAN OS

Although the Raspberry Pi’s operating system is closer to the Mac than to Windows, it is the latter that the desktop most closely resembles.
It might seem a little alien at first glance, but using Raspbian is hardly any different from using Windows (barring Windows 8, of course). There’s a menu bar, a web browser, a file manager and no shortage of desktop shortcuts to pre-installed applications.
Raspbian is an unofficial port of Debian Wheezy arm with compilation settings adjusted to produce
optimized "hard float" code that will run on the Raspberry Pi. This provides significantly faster
performance for applications that make heavy use of floating point arithmetic operations. All other
applications will also gain some performance through the use of advanced instructions of the
ARMv6 CPU in Raspberry Pi.

Fig 3. Raspbian OS icon.

Although Raspbian is primarily the effort of Mike Thompson (mpthompson) and Peter Green (plugwash), it has also benefited greatly from the enthusiastic support of Raspberry Pi community members who wish to get the maximum performance from their device.

2. ALEXA VOICE SERVICE

The Alexa Voice Service (AVS) allows developers to voice-enable connected products with a
microphone and speaker. Once integrated, your product will have access to the built in capabilities
of Alexa (like music playback, timers and alarms, package tracking, movie listings, calendar
management, and more) and third-party skills developed using the Alexa Skills Kit.

AVS comprises interfaces that correspond to client functionality, like speech recognition, audio playback, and volume control. Each interface contains logically grouped messages called directives and events. Directives are messages sent from the cloud instructing your client to take action. Events are messages sent from your client to the cloud notifying Alexa that something has occurred.
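To illustrate this message format, the short Python sketch below assembles the kind of JSON envelope an AVS client exchanges with the cloud: a header naming the interface and message, plus a payload of message-specific fields. The specific names and values used here (the SpeechRecognizer namespace, the Recognize event, the audio profile string) follow the general header/payload pattern described above and should be read as an illustrative assumption, not as the authoritative API schema.

import json
import uuid

def build_recognize_event(dialog_request_id=None):
    # Sketch of an AVS-style event envelope: a header identifying the interface
    # and message name, plus a payload with message-specific fields.
    # Field names here are illustrative assumptions, not the official schema.
    return {
        "event": {
            "header": {
                "namespace": "SpeechRecognizer",  # interface the message belongs to
                "name": "Recognize",              # event name within that interface
                "messageId": str(uuid.uuid4()),
                "dialogRequestId": dialog_request_id or str(uuid.uuid4()),
            },
            "payload": {
                "profile": "NEAR_FIELD",          # microphone profile (assumed value)
                "format": "AUDIO_L16_RATE_16000_CHANNELS_1",
            },
        }
    }

if __name__ == "__main__":
    print(json.dumps(build_recognize_event(), indent=2))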

3. PUTTY

PuTTY is a free and open-source terminal emulator, serial console and network file transfer application. It supports several network protocols, including SCP, SSH, Telnet, rlogin, and raw socket connection. It can also connect to a serial port (since version 0.59). The name "PuTTY" has no definitive meaning. PuTTY was originally written for Microsoft Windows, but it has been ported to various other operating systems. Official ports are available for some Unix-like platforms, with work-in-progress ports to Classic Mac OS and Mac OS X, and unofficial ports have been contributed to platforms such as Symbian, Windows Mobile and Windows Phone.
PuTTY was written and is maintained primarily by Simon Tatham and is currently beta software.

4. PYTHON 2.7

Python is a widely used high-level, general-purpose, interpreted, dynamic programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java. The language provides constructs intended to enable clear programs on both a small and large scale.

Python supports multiple programming paradigms, including object-oriented, imperative and functional programming or procedural styles. It features a dynamic type system and automatic memory management and has a large and comprehensive standard library. Python interpreters are available for installation on many operating systems, allowing Python code execution on a wide variety of systems. Using third-party tools, such as py2exe or PyInstaller, Python code can be packaged into stand-alone executable programs for some of the most popular operating systems, allowing the distribution of Python-based software for use in those environments without requiring the installation of a Python interpreter.

CPython, the reference implementation of Python, is free and open-source software and has a
community-based development model, as do nearly all of its alternative implementations. CPython
is managed by the non-profit Python Software Foundation.

WHY PYTHON 2.7?

If you can do exactly what you want with Python 3.x, great! There are a few minor downsides, such
as slightly worse library support and the fact that most current Linux distributions and Macs are still
still using 2.x as default, but as a language Python 3.x is definitely ready. As long as Python 3.x is
installed on your user's computers (which ought to be easy, since many people reading this may
only be developing something for themselves or an environment they control) and you're writing
things where you know none of the Python 2.x modules are needed, it is an excellent choice. Also,
most Linux distributions have Python 3.x already installed, and all have it available for end-users.
Some are phasing out Python 2 as preinstalled default.

4. IMPLEMENTATION
Both acoustic modeling and language modeling are important parts of modern statistically based
speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems.
Language modeling is also used in many other natural language processing applications such as
document classification or statistical machine translation.
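Language modeling can be made concrete with a small example. The Python sketch below estimates a toy bigram language model from a few sentences and scores candidate transcriptions; the corpus, the add-one smoothing, and the function names are illustrative assumptions, not part of the project's code.

from collections import Counter

def train_bigram_lm(sentences):
    # Count unigrams and bigrams over a toy corpus.
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sentence_probability(sentence, unigrams, bigrams, vocab_size):
    # Product of add-one-smoothed bigram probabilities for a candidate transcription.
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        prob *= (bigrams[(prev, curr)] + 1) / (unigrams[prev] + vocab_size)
    return prob

corpus = ["play some music", "play the news", "what is the weather"]
uni, bi = train_bigram_lm(corpus)
vocab = len(uni)
# The language model prefers word sequences it has seen before:
print(sentence_probability("play some music", uni, bi, vocab))
print(sentence_probability("pay sum music", uni, bi, vocab))

In a recognizer, a score like this would be combined with the acoustic model score to choose between transcriptions that sound similar.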

4.1 ALGORITHMS

Hidden Markov Model

Modern general-purpose speech recognition systems are based on Hidden Markov Models. These
are statistical models that output a sequence of symbols or quantities. HMMs are used in speech
recognition because a speech signal can be viewed as a piecewise stationary signal or a short-time
stationary signal. In a short time-scale (e.g., 10 milliseconds), speech can be approximated as a
stationary process. Speech can be thought of as a Markov model for many stochastic purposes.

Another reason why HMMs are popular is because they can be trained automatically and are simple
and computationally feasible to use. In speech recognition, the hidden Markov model would output
a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10),
outputting one of these every 10 milliseconds. The vectors would consist of cepstral coefficients,
which are obtained by taking a Fourier transform of a short time window of speech and de-
correlating the spectrum using a cosine transform, then taking the first (most significant)
coefficients. The hidden Markov model will tend to have in each state a statistical distribution that
is a mixture of diagonal covariance Gaussians, which will give a likelihood for each observed
vector. Each word, or (for more general speech recognition systems), each phoneme, will have a
different output distribution; a hidden Markov model for a sequence of words or phonemes is made
by concatenating the individual trained hidden Markov models for the separate words and
phonemes.
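The feature-extraction step described above (a short analysis window, a Fourier transform, log compression, a cosine transform to decorrelate, and keeping the first coefficients) can be sketched in a few lines of NumPy. The frame length, hop size, coefficient count and function name below are illustrative assumptions, not the actual front end used by any particular recognizer.

import numpy as np

def cepstral_features(signal, sample_rate=16000, frame_ms=25, hop_ms=10, n_coeffs=13):
    # Window the signal, take the FFT magnitude, log-compress, apply a cosine
    # transform (DCT) to decorrelate, and keep the first n_coeffs coefficients.
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    window = np.hamming(frame_len)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = signal[start:start + frame_len] * window
        log_spectrum = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
        n = len(log_spectrum)
        k = np.arange(n_coeffs)[:, None]
        dct_basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)  # type-II DCT rows
        features.append(dct_basis @ log_spectrum)
    return np.array(features)  # one n_coeffs-dimensional vector every hop_ms milliseconds

# Example: one second of a synthetic tone yields roughly 98 feature vectors.
t = np.linspace(0, 1, 16000, endpoint=False)
print(cepstral_features(np.sin(2 * np.pi * 440 * t)).shape)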

Described above are the core elements of the most common, HMM-based approach to speech
recognition. Modern speech recognition systems use various combinations of a number of standard
techniques in order to improve results over the basic approach described above. A typical large-
vocabulary system would need context dependency for the phonemes (so phonemes with different
left and right context have different realizations as HMM states); it would use cepstral
normalization to normalize for different speaker and recording conditions; for further speaker
normalization it might use vocal tract length normalization (VTLN) for male female normalization
and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The
features would have so-called delta and delta-delta coefficients to capture speech dynamics and in
addition might use heteroscedastic linear discriminant analysis (HLDA); or might skip the delta and
delta-delta coefficients and use splicing and an LDA-based projection followed perhaps by
heteroscedastic linear discriminant analysis or a global semi-tied covariance transform (also known
as maximum likelihood linear transform, or MLLT). Many systems use so-called discriminative
training techniques that dispense with a purely statistical approach to HMM parameter estimation
and instead optimize some classification-related measure of the training data. Examples are
maximum mutual information (MMI), minimum classification error (MCE) and minimum phone
error (MPE).

Decoding of the speech (the term for what happens when the system is presented with a new
utterance and must compute the most likely source sentence) would probably use the Viterbi
algorithm to find the best path, and here there is a choice between dynamically creating a
combination hidden Markov model, which includes both the acoustic and language model
information, and combining it statically beforehand (the finite state transducer, or FST, approach).
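As a concrete illustration of the decoding step, the Python sketch below runs the Viterbi algorithm on a toy discrete HMM; the states, transition and emission probabilities are made-up values chosen only to show the dynamic-programming recurrence, not values from a real recognizer.

import numpy as np

def viterbi(obs, states, start_p, trans_p, emit_p):
    # Return the most likely state path for an observation sequence under a
    # discrete HMM; log probabilities are used to avoid numerical underflow.
    V = np.full((len(obs), len(states)), -np.inf)   # best log-prob ending in each state
    back = np.zeros((len(obs), len(states)), dtype=int)
    V[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, len(obs)):
        for j in range(len(states)):
            scores = V[t - 1] + np.log(trans_p[:, j])
            back[t, j] = np.argmax(scores)
            V[t, j] = scores[back[t, j]] + np.log(emit_p[j, obs[t]])
    path = [int(np.argmax(V[-1]))]                  # trace back the best path
    for t in range(len(obs) - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [states[i] for i in reversed(path)]

# Toy example: two hidden phone-like states, three observable feature symbols.
states = ["s1", "s2"]
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3], [0.4, 0.6]])            # trans[i, j] = P(state j | state i)
emit = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])   # emit[j, k] = P(symbol k | state j)
print(viterbi([0, 1, 2], states, start, trans, emit))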

A possible improvement to decoding is to keep a set of good candidates instead of just keeping the
best candidate, and to use a better scoring function (rescoring) to rate these good candidates so that
we may pick the best one according to this refined score. The set of candidates can be kept either as
a list (the N-best list approach) or as a subset of the models (a lattice). Rescoring is usually done by
trying to minimize the Bayes risk (or an approximation thereof): Instead of taking the source
sentence with maximal probability, we try to take the sentence that minimizes the expectancy of a
given loss function with regards to all possible transcriptions (i.e., we take the sentence that
minimizes the average distance to other possible sentences weighted by their estimated probability).

The loss function is usually the Levenshtein distance, though it can be different distances for
specific tasks; the set of possible transcriptions is, of course, pruned to maintain tractability.
Efficient algorithms have been devised to rescore lattices represented as weighted finite state
transducers with edit distances represented themselves as a finite state transducer verifying certain
assumptions.

DEEP NEURAL NETWORK

A deep neural network (DNN) is an artificial neural network with multiple hidden layers of units
between the input and output layers. Similar to shallow neural networks, DNNs can model complex
non-linear relationships. DNN architectures generate compositional models, where extra layers
enable composition of features from lower layers, giving a huge learning capacity and thus the
potential of modeling complex patterns of speech data. The DNN is the most popular type of deep
learning architectures successfully used as an acoustic model for speech recognition since 2010.
The success of DNNs in large vocabulary speech recognition occurred in 2010 by industrial
researchers, in collaboration with academic researchers, where large output layers of the DNN
based on context dependent HMM states constructed by decision trees were adopted.

One fundamental principle of deep learning is to do away with hand-crafted feature engineering and
to use raw features. This principle was first explored successfully in the architecture of deep auto
encoder on the "raw" spectrogram or linear filter-bank features, showing its superiority over the
Mel-Cepstral features which contain a few stages of fixed transformation from spectrograms. The
true "raw" features of speech, waveforms, have more recently been shown to produce excellent
larger-scale speech recognition results.

Since the initial successful debut of DNNs for speech recognition around 2009-2011, there has been huge new progress. This progress (as well as future directions) has been summarized into the following eight major areas:

Scaling up/out and speeding up DNN training and decoding;

Sequence discriminative training of DNNs;

Feature processing by deep models with solid understanding of the underlying mechanisms;
Adaptation of DNNs and of related deep models;

Multi-task and transfer learning by DNNs and related deep models;

Convolutional neural networks and how to design them to best exploit domain knowledge of speech;

Recurrent neural networks and their rich LSTM variants;

Other types of deep models including tensor-based models and integrated deep
generative/discriminative models.

Large-scale automatic speech recognition is the first and most convincing successful case of deep learning in recent history, embraced by both industry and academia across the board. Between 2010 and 2014, the two major conferences on signal processing and speech recognition, IEEE-ICASSP and Interspeech, saw near-exponential growth in the number of accepted papers on the topic of deep learning for speech recognition in their respective annual proceedings. More importantly, all major commercial speech recognition systems (e.g., Microsoft Cortana, Xbox, Skype Translator, Google Now, Apple Siri, Baidu and iFlyTek voice search, and a range of Nuance speech products) are nowadays based on deep learning methods.
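For intuition, the sketch below shows what a DNN acoustic model computes at run time: a stack of affine transforms and non-linearities that maps an acoustic feature vector to a posterior distribution over context-dependent HMM states. The layer sizes and the random weights are placeholders for illustration only, not parameters from any trained system.

import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Placeholder layer sizes: 39-dimensional input features (e.g. cepstra plus deltas),
# two hidden layers, and 1000 context-dependent HMM-state (senone) outputs.
sizes = [39, 512, 512, 1000]
weights = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(sizes[1:], sizes[:-1])]
biases = [np.zeros(m) for m in sizes[1:]]

def dnn_posteriors(feature_vector):
    # Forward pass: hidden ReLU layers followed by a softmax output over HMM states.
    h = feature_vector
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return softmax(weights[-1] @ h + biases[-1])

posteriors = dnn_posteriors(rng.standard_normal(39))
print(posteriors.shape, posteriors.sum())  # (1000,) and approximately 1.0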

4.2 STEPS TO SETUP RASPBERRY PI


4.2.1. Connecting Everything Together

1. Plug the preloaded SD Card into the RPi.


2. Plug the USB keyboard and mouse into the RPi, perhaps via a USB hub. Connect the Hub to
power if necessary.
3. Plug a video cable into the screen (TV or monitor) and into the RPi.
4. Plug your extras into the RPi (USB WiFi, Ethernet cable, external hard drive etc.). This is where
you may really need a USB hub.
5. Ensure that your USB hub (if any) and screen are working.
6. Plug the power supply into the mains socket.
7. With your screen on, plug the power supply into the RPi micro USB socket.
8. The RPi should boot up and display messages on the screen.

It is always recommended to connect the Micro USB power to the unit last (while most connections
can be made live, it is best practice to connect items such as displays with the power turned off).

4.2.2 Operating System SD Card

As the RPi has no internal mass storage or built-in operating system it requires an SD card
preloaded with a version of the Linux Operating System.

1. You can create your own preloaded card using any suitable SD card (4GBytes or above) you
have to hand. We suggest you use a new blank card to avoid arguments over lost pictures.

2. Preloaded SD cards will be available from the RPi Shop.

4.2.3. Keyboard & Mouse

Most standard USB keyboards and mice will work with the RPi. Wireless keyboard/mice should
also function, and only require a single USB port for an RF dongle. In order to use a Bluetooth
keyboard or mouse you will need a Bluetooth USB dongle, which again uses a single port.

Remember that the Model A has a single USB port and the Model B has two (typically a keyboard
and mouse will use a USB port each).

4.2.4. Display

There are two main connection options for the RPi display, HDMI (High Definition) and
Composite (Standard Definition).

1. HD TVs and many LCD monitors can be connected using a full-size 'male' HDMI cable,
and with an inexpensive adaptor if DVI is used. HDMI versions 1.3 and 1.4 are supported
and a version 1.4 cable is recommended. The RPi outputs audio and video via HMDI, but
does not support HDMI input.
2. Older TVs can be connected using Composite video (a yellow-to-yellow RCA cable) or via
SCART (using a Composite video to SCART adaptor). Both PAL and NTSC format TVs
are supported.

When using a composite video connection, audio is available from the 3.5mm jack socket, and can
be sent to your TV, headphones or an amplifier. To send audio to your TV, you will need a cable
which adapts from 3.5mm to double (red and white) RCA connectors.

Note: There is no analogue VGA output available. This is the connection required by many
computer monitors, apart from the latest ones. If you have a monitor with only a D-shaped plug
containing 15 pins, then it is unsuitable.

4.2.5. Power Supply

The unit is powered via the microUSB connector (only the power pins are connected, so it will not
transfer data over this connection). A standard modern phone charger with a microUSB connector
will do, providing it can supply at least 700mA at +5Vdc. Check your power supply's ratings
carefully. Suitable mains adaptors will be available from the RPi Shop and are recommended if you
are unsure what to use.

Note: The individual USB ports on a powered hub or a PC are usually rated to provide 500mA
maximum. If you wish to use either of these as a power source then you will need a special cable
which plugs into two ports providing a combined current capability of 1000mA.

4.2.6. Cables

You will need one or more cables to connect up your RPi system.
1. Video cable alternatives: HDMI-A cable, HDMI-A cable + DVI adapter, Composite video cable, or Composite video cable + SCART adaptor
2. Audio cable (not needed if you use the HDMI video connection to a TV)
3. Ethernet/LAN cable (Model B only).

4.2.7. Preparing your SD card for the Raspberry Pi

In order to use your Raspberry Pi, you will need to install an Operating System (OS) onto an SD
card. An Operating System is the set of basic programs and utilities that allow your computer to
run; examples include Windows on a PC or OSX on a Mac.
These instructions will guide you through installing a recovery program on your SD card that will
allow you to easily install different OS’s and to recover your card if you break it.

1. Insert an SD card that is 4GB or greater in size into your computer


2. Format the SD card so that the Pi can read it.

A. Windows

I. Download the SD Association's Formatting Tool from


https://www.sdcard.org/downloads/formatter_4/eula_windows/
II. Install and run the Formatting Tool on your machine
III. Set "FORMAT SIZE ADJUSTMENT" option to "ON" in the "Options" menu
IV. Check that the SD card you inserted matches the one selected by the Tool
V. Click the “Format” button

B. Mac

I. Download the SD Association's Formatting Tool from


https://www.sdcard.org/downloads/formatter_4/eula_mac/
II. Install and run the Formatting Tool on your machine
III. Select “Overwrite Format”
IV. Check that the SD card you inserted matches the one selected by the Tool
V. Click the “Format” button

C. Linux

I. We recommend using gparted (or the command line version parted)


II. Format the entire disk as FAT
III. Download the New out of Box Software (NOOBS) from:
downloads.raspberrypi.org/noobs
IV. Unzip the downloaded file
a. Windows: Right-click on the file and choose “Extract all”
b. Mac: Double-click on the file
c. Linux: Run unzip [downloaded filename]

3. Copy the extracted files onto the SD card that you just formatted.
4. Insert the SD card into your Pi and connect the power supply.
5. You can also alternatively download the raspbian image from https://raspberrypi.org.

Your Pi will now boot into NOOBS and should display a list of operating systems that you can
choose to install. If your display remains blank, you should select the correct output mode for
your display by pressing one of the following number keys on your keyboard;

1. HDMI mode: this is the default display mode.
2. HDMI safe mode: select this mode if you are using the HDMI connector and cannot see anything on screen when the Pi has booted.
3. Composite PAL mode: select either this mode or composite NTSC mode if you are using the composite RCA video connector.
4. Composite NTSC mode.

4.3 STEPS TO SETUP ALEXA VOICE SERVICES

Step One: Register for an Amazon Developer Account

Fig 4. Amazon Developer Account

Before you do anything, you’ll need to register for a free Amazon Developer Account, then create a
profile for your DIY Echo. This is pretty straightforward:

1. Log into your Amazon Developer Account.

2. Click on the Alexa Tab.

3. Click Register a Product Type > Device.

4. Name your device type and display name (We chose “Raspberry Pi” for both).

5. Click Next.
6. On the Security Profile screen, click “Create new profile.”

7. Under the General tab, next to “Security Profile Name” name your profile. Do the same for
the description. Click Next.

8. Make a note of the Product ID, Client ID, and Client Secret that the site generates for you.

9. Click the Web Settings tab, then click the Edit button next to the profile dropdown.

10. Next to Allowed Origins, click “Add another” and type in: https://localhost:3000.

11. Next to Allowed Return URLs, click “Add another” and type in: https://localhost:3000/authresponse. Click Next when you’re done.

12. The Device Details tab is next. It doesn’t matter much what you enter here. Pick a category,
write a description, pick an expected timeline, and enter a 0 on the form next to how many
devices you plan on using this on. Click Next.

13. Finally, you can choose to add in Amazon Music here. This does not work on the Pi
powered device, so leave it checked as “No.” Click Save.

Now you have an Amazon Developer Account and you’ve created a profile for your Pi-powered
Echo. It’s time to head over to the Raspberry Pi and get Alexa working.

Fig 5. Installation of Alexa services

Step Two: Clone and Install Alexa

Plug everything into your Pi and boot it up. You’ll need to be in the graphic user interface (now
dubbed PIXEL) for this because you eventually use a web browser to authenticate your device.

1. Open the Terminal application on the Raspberry Pi and type: cd Desktop and press Enter.

2. Type in git clone https://github.com/heramb1008/ashoka-alexa and press Enter.

3. Once that’s complete, type in: cd ~/Desktop/alexa-avs-sample-app and press Enter.

4. Type in nano automated_install.sh and press Enter.

5. This pulls up your text editor. Here, you’ll need to enter the Product ID, Client ID, and Client Secret that you noted in the step above. Use the arrow keys to navigate to each entry. Enter each detail after the = sign as noted in the image above. When you’re done, tap CTRL+X to save and exit.

6. You’re now back at the command line. It’s time to run the install script. Type in cd
~/Desktop/alexa-avs-sample-app and press Enter.

7. Type in  . automated_install.sh and press Enter.

8. When prompted, press Y for the different questions, and answer as you see fit for the rest.
This will configure your Pi and install some extra software. This can take up to 30 minutes,
so just let it do its thing. Once that finishes, it’s time to start the Alexa service.

Fig 6. Starting companion services and npm.

Step Three: Run the Alexa Web Service

Next, you’re going to run three sets of commands at once in three different Terminal windows.
You’ll create a new Terminal window for each of the following steps. Don’t close any windows!
You’ll need to do steps three (this one,) four, and five every time you reboot your Raspberry Pi.
The first one you’ll start is the Alexa Web Service:

1. Type in cd ~/Desktop/alexa-avs-sample-app/samples and press Enter.

2. Type in cd companionService && npm start and press Enter.

This starts the companion service and opens up a port to communicate with Amazon. Leave this
window open.

Step Four: Run the Sample App and Confirm Your Account

Fig 7. Starting our application.

Open up a second Terminal window (File > New Window). This next step runs a Java app and
launches a web browser that registers your Pi-powered Echo with the Alexa web service.

1. In your new Terminal window type in cd ~/Desktop/alexa-avs-sample-app/samples and


press Enter.

2. Type in cd javaclient && mvn exec:exec and press Enter.

3. A window will pop up asking you to authenticate your device. Click Yes. This opens up a
browser window. A second pop-up will appear in the Java app asking you to click Ok.
Do not click this yet.

4. Log into your Amazon account in the browser.

5. You’ll see an authentication screen for your device. Click Okay. Your browser will now
display “device tokens ready.”

6. You can now click the Ok pop-up in the Java app.

Now, your Raspberry Pi has the necessary tokens to communicate with Amazon’s server. Leave
this Terminal window open.

Step Five: Start Your Wake Word Engine

Fig 8. Activating wake word engine.

1. Type in cd ~/Desktop/alexa-avs-sample-app/samples and press Enter.

2. Type in cd wakeWordAgent/src && ./wakeWordAgent -e kitt_ai

That’s it, your DIY Echo is now running. Go ahead and try it out by saying “Alexa.” You should
hear a beep indicating that it’s listening. When you hear that beep, ask a question like, “What’s the
weather?” or “What’s the score in the Dodgers game?”

Step Six: Improve the Microphone and Make Sure Your Echo Can Hear You

Fig 9. Improving the microphone through command line.

Finally, depending on the quality of your microphone, you may notice that it has trouble hearing
you. Instead of screaming “Alexa” at the top of your lungs, let’s go to the command line one last
time.

1. From the command line, type in alsamixer and press Enter.

2. Tap F6 to select a different USB device. Use the arrow keys to select your microphone.

3. Use the arrow keys to increase the capture volume.

4. When you’re happy with the volume, tap ESC to exit.

5. Type in sudo alsactl store and press Enter to make the settings permanent.

Now, you should be able to trigger your DIY Echo by talking to it like a normal human instead of
yelling. You can also change the default volume here if you need to.

4.4 FLOWCHART

Fig 10. System Working Flowchart.

The flowchart proceeds as follows. The system starts in voice-control mode and waits for the keyword; if the exit keyword is detected, the program terminates. Otherwise the microphone actively listens (mic.activeListen()) for a voice command. If the command is not valid, an error is reported and control returns to listening; if it is valid, the corresponding sub-process is called and control then returns to the listening loop.
4.5 BLOCK DIAGRAM

In these figures we can see how the client exchanges its queries with AVS and how the AVS client hardware is organized.

Fig 11. Information Exchange in AVS

Fig 12. Hardware Architecture of AVS Client

4.6 CODE LISTINGS

In this section we present the code for the project, divided into sections. Each section provides a particular function, as illustrated below:

SECTION 1: In this section we see how the companion service is started with Node.js and npm.
#!/usr/bin/env node

console.log('This node service needs to be running to store token information memory and vend them for the AVS
app.\n');

/**
* Module dependencies.
*/
var app = require('../app');
var debug = require('debug')('companion:server');
var https = require('https');
var fs = require('fs');
var config = require("../config");

/**
* Get port from environment and store in Express.
*/
var port = normalizePort(process.env.PORT || '3000');
app.set('port', port);

var options = {
key: fs.readFileSync(config.sslKey),
cert: fs.readFileSync(config.sslCert),
ca: fs.readFileSync(config.sslCaCert),
requestCert: true,
rejectUnauthorized: false,
};

/**
* Create HTTP server.
*/
var server = https.createServer(options, app);

/**
* Listen on provided port, on all network interfaces.
*/
server.listen(port);
server.on('error', onError);
server.on('listening', onListening);

/**
* Normalize a port into a number, string, or false.
*/
function normalizePort(val) {
var port = parseInt(val, 10);

if (isNaN(port)) {
// named pipe
return val;
}

if (port >= 0) {
// port number
return port;
}

return false;
}

/**
* Event listener for HTTP server "error" event.
*/
function onError(error) {
if (error.syscall !== 'listen') {
throw error;
}

var bind = typeof port === 'string'


? 'Pipe ' + port
: 'Port ' + port;

// handle specific listen errors with friendly messages


switch (error.code) {
case 'EACCES':
console.error(bind + ' requires elevated privileges');
process.exit(1);
break;
case 'EADDRINUSE':
console.error(bind + ' is already in use');
process.exit(1);
break;
default:
throw error;
}
}

/**
* Event listener for HTTP server "listening" event.
*/
function onListening() {
var addr = server.address();
var bind = typeof addr === 'string'
? 'pipe ' + addr
: 'port ' + addr.port;
console.log('Listening on ' + bind);
}
SECTION 2: In this section we look at the code through which we obtain tokens from AVS.
var express = require('express');
var path = require('path');
var bodyParser = require('body-parser');
var auth = require('./authentication.js');

var app = express();


app.use(bodyParser.json());

/**
* The endpoint for the device to request a registration code to then show to the user.
*/
app.get('/provision/regCode', function (req, res) {
if (!req.client.authorized) {
console.error("User is not authorized to access this URL. Make sure the client certificate is set up
properly");
res.status(401);
res.send({ error: "Unauthorized", message: "You are not authorized to access this URL. Make sure your
client certificate is set up properly." });
return;
}

auth.getRegCode(req.query.productId, req.query.dsn, function (err, reply) {


if (err) {
console.error("Error retrieving registration code: " + err.name + ", " + err.message);
res.status(err.status);
res.send({ error: err.name, message: err.message });
} else {
console.log("Successfully retrieved registration code for " + req.query.productId + " / " +
req.query.dsn);
res.send(reply);
}
});
});

/**
* The endpoint for the device to request a new accessToken when the previous one expires.
*/
app.get('/provision/accessToken', function (req, res) {
if (!req.client.authorized) {
console.error("User is not authorized to access this URL. Make sure the client certificate is set up
properly");
res.status(401);
res.send({ error: "Unauthorized", message: "You are not authorized to access this URL. Make sure your
client certificate is set up properly." });
return;
}

auth.getAccessToken(req.query.sessionId, function (err, reply) {


if (err) {
console.error("Error retrieving access token: " + err.name + ", " + err.message);
res.status(err.status);
res.send({ error: err.name, message: err.message });
} else {
console.log("Successfully retrieved access token for session id: " + req.query.sessionId);
res.send(reply);
}
});
});

/**
* The endpoint for the device to revoke a token.
*/
app.get('/provision/revokeToken', function (req, res) {
if (!req.client.authorized) {
console.error("User is not authorized to access this URL. Make sure the client certificate is set up
properly");
res.status(401);
res.send({ error: "Unauthorized", message: "You are not authorized to access this URL. Make sure your
client certificate is set up properly." });
return;
}

auth.revokeToken(req.query.sessionId, function (err, reply) {


if (err) {
console.error("Error revoking token: " + err.name + ", " + err.message);
res.status(err.status);
res.send({ error: err.name, message: err.message });
} else {
console.log("Successfully revoked token for session id: " + req.query.sessionId);
res.send(reply);
}
});
});

/**
* The endpoint for the customer to visit and get redirected to LWA to login.
*/
app.get('/provision/:regCode', function (req, res, next) {
auth.register(req.params.regCode, res, function (err) {
// on success the user gets a redirect, so this won't return to a callback.
res.status(err.status);
res.send({ error: err.name, message: err.message });
next(err);
});
});

/**
* The endpoint that LWA will redirect to to include the authorization code and state code.
*/
app.get('/authresponse', function (req, res) {
auth.authresponse(req.query.code, req.query.state, function (err, reply) {
if (err) {
res.status(err.status);
res.send({ error: err.name, message: err.message });
} else {
res.send(reply);
}
});
});
// standard error handling functions.
app.use(function (req, res, next) {
// Suppress /favicon.ico errors
var favicon = "favicon.ico";
if (req.url.slice(-favicon.length) != favicon) {
var err = new Error('Not Found: ' + req.url);
err.status = 404;
next(err);
} else {
next();
}
});

app.use(function (err, req, res, next) {


console.log("error: ", err);
res.status(err.status || 500);
res.send('error: ' + err.message);
});

module.exports = app;
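
To illustrate how the device side is expected to use these endpoints, the following is a minimal sketch of a client that requests a registration code and then exchanges the resulting session id for an access token. It is written for Node.js; the host name, port, certificate file names, the productId and dsn values, and the field names of the JSON replies are assumptions made only for illustration.

var https = require('https');
var fs = require('fs');

// Hypothetical device-side settings; the host, port and certificate paths must match
// the provisioning server's configuration (req.client.authorized checks this certificate).
var baseOptions = {
hostname: 'localhost',
port: 3000,
key: fs.readFileSync('client.key'),
cert: fs.readFileSync('client.crt'),
ca: fs.readFileSync('ca.crt')
};

// Small helper: GET a provisioning endpoint and parse the JSON reply.
function getJSON(path, callback) {
var options = Object.assign({ path: path }, baseOptions);
https.get(options, function (res) {
var body = '';
res.on('data', function (chunk) { body += chunk; });
res.on('end', function () { callback(JSON.parse(body)); });
});
}

// Step 1: request a registration code to show to the user (productId and dsn are placeholders).
getJSON('/provision/regCode?productId=my_device&dsn=123456', function (reply) {
console.log('Registration reply:', reply);

// Step 2: once the user has logged in through LWA, exchange the session id for an access token.
// The sessionId field name is an assumption about the shape of the auth module's reply.
getJSON('/provision/accessToken?sessionId=' + reply.sessionId, function (tokenReply) {
console.log('Access token reply:', tokenReply);
});
});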

SECTION 3: In this section we discuss the script that installs the latest version of Java on the Raspberry Pi, after first checking that the operating system is Raspbian.
#!/bin/bash

# Ensure we are running on Raspbian


lsb_release -a 2>/dev/null | grep Raspbian
if [ "$?" -ne "0" ]; then
echo "This OS is not Raspbian. Exiting..."
exit 1
fi

# Determine which version of Raspbian we are running on


VERSION=`lsb_release -c 2>/dev/null | awk '{print $2}'`
echo "Version of Raspbian determined to be: $VERSION"

if [ "$VERSION" == "jessie" ]; then


UBUNTU_VERSION="trusty"
36
elif [ "$VERSION" == "wheezy" ]; then
UBUNTU_VERSION="precise"
Else
echo "Not running Raspbian Wheezy or Jessie. Exiting..."
exit 1;
Fi

# Remove any existing Java


sudo apt-get -y autoremove
sudo apt-get -y remove --purge oracle-java8-jdk oracle-java7-jdk openjdk-7-jre openjdk-8-jre

# Install Java from Ubuntu's PPA


# http://linuxg.net/how-to-install-the-oracle-java-8-on-debian-wheezy-and-debian-jessie-via-repository/
sudo sh -c "echo \"deb http://ppa.launchpad.net/webupd8team/java/ubuntu $UBUNTU_VERSION main\" >>
/etc/apt/sources.list"
sudo sh -c "echo \"deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu $UBUNTU_VERSION main\" >>
/etc/apt/sources.list"

KEYSERVER=(pgp.mit.edu keyserver.ubuntu.com)

GPG_SUCCESS="false"
for server in ${KEYSERVER[@]}; do
COMMAND="sudo apt-key adv --keyserver ${server} --recv-keys EEA14886"
echo $COMMAND
$COMMAND
if [ "$?" -eq "0" ]; then
GPG_SUCCESS="true"
Break
Fi
Done

if [ "$GPG_SUCCESS" == "false" ]; then


echo "ERROR: FAILED TO FETCH GPG KEY. UNABLE TO UPDATE JAVA"
Fi

sudo apt-get update


sudo apt-get -y install oracle-java8-installer
sudo apt-get -y install oracle-java8-set-default

SECTION 4: In this section we discuss the wake word agent, through which the user can invoke AVS hands-free and have their queries answered.
#include "WakeWordAgent.h"
#include "WakeWordUtils.h"
#include "Logger.h"

#include <string>
#include <unistd.h>

using namespace AlexaWakeWord::Logger;

namespace AlexaWakeWord {

WakeWordAgent::WakeWordAgent(WakeWordEngineFactory::EngineType engineType,
WakeWordIPCFactory::IPCType ipcType) :
m_isRunning{false}, m_currentState{State::UNINITIALIZED} {

setState(State::IDLE);

try {

log(Logger::DEBUG, std::string("WakeWordAgent: initalizing") +
" | wake word engine of type:" +
WakeWordEngineFactory::engineTypeToString(engineType) +
" | IPC handler of type:" +
WakeWordIPCFactory::IPCTypeToString(ipcType));

m_wakeWordEngine = WakeWordEngineFactory::createEngine(this, engineType);


m_IPCHandler = WakeWordIPCFactory::createIPCHandler(this, ipcType);

m_isRunning = true;

m_thread = make_unique<std::thread>(&WakeWordAgent::mainLoop, this);

} catch(std::bad_alloc& e) {
log(Logger::ERROR, "WakeWordAgent: could not allocate memory");
throw;
} catch (WakeWordException& e) {
log(Logger::ERROR,
std::string("WakeWordAgent: exception in constructor: ") + e.what());
throw;
}
}

WakeWordAgent::~WakeWordAgent() {
log(Logger::DEBUG, "WakeWordAgent: Joining on thread");
m_isRunning = false;
m_thread->join();
}

// A fairly simple state machine. On each state we care about, the
// transition should be self-explanatory.
void WakeWordAgent::mainLoop() {

log(Logger::INFO, "WakeWordAgent: thread started");

std::unique_lock<std::mutex> lck(m_mtx);

auto checkState = [this] {


return m_currentState == State::WAKE_WORD_DETECTED
|| m_currentState == State::WAKE_WORD_PAUSE_REQUESTED
|| m_currentState == State::WAKE_WORD_RESUME_REQUESTED;
};

while (m_isRunning) {

// Wait for a state where an action is required


m_cvStateChange.wait(lck, checkState);

try {

switch (m_currentState) {
case State::WAKE_WORD_DETECTED:
m_IPCHandler->sendCommand(Command::WAKE_WORD_DETECTED);
setState(State::SENT_WAKE_WORD_DETECTED);
break;

case State::WAKE_WORD_PAUSE_REQUESTED:
m_wakeWordEngine->pause();
m_IPCHandler->sendCommand(Command::CONFIRM);
setState(State::WAKE_WORD_PAUSED);
break;

case State::WAKE_WORD_RESUME_REQUESTED:
m_wakeWordEngine->resume();
setState(State::IDLE);
break;

default:
// no-op
break;
}

} catch (WakeWordException &e) {


log(Logger::ERROR, std::string("WakeWordAgent::mainLoop - exception:") +
e.what());
setState(State::IDLE);
}
}

log(Logger::INFO, "WakeWordAgent: thread ended");


}

// Besides setting the state, prints some pretty cool trace!


void WakeWordAgent::onWakeWordDetected() {

log(Logger::INFO, "===> WakeWordAgent: wake word detected <===");

if(State::IDLE == m_currentState ||
State::SENT_WAKE_WORD_DETECTED == m_currentState) {
std::lock_guard<std::mutex> lock(m_mtx);
setState(State::WAKE_WORD_DETECTED);
m_cvStateChange.notify_one();
}
}

// Called by our IPC handling object when a command is received.
// Updates our state machine accordingly.
// Note that the mainLoop() function above is where this change of
// state results in any action.
void WakeWordAgent::onIPCCommandReceived(IPCInterface::Command command) {

log(Logger::INFO, "WakeWordAgent: IPC Command received:" +


std::to_string(command));

switch(m_currentState) {
case State::IDLE:
case State::SENT_WAKE_WORD_DETECTED:
if (IPCInterface::PAUSE_WAKE_WORD_ENGINE == command) {
std::lock_guard<std::mutex> lock(m_mtx);
setState(State::WAKE_WORD_PAUSE_REQUESTED);
m_cvStateChange.notify_one();
}
break;
case State::WAKE_WORD_PAUSED:
if(IPCInterface::Command::RESUME_WAKE_WORD_ENGINE == command) {
std::lock_guard<std::mutex> lock(m_mtx);
setState(State::WAKE_WORD_RESUME_REQUESTED);
m_cvStateChange.notify_one();
}
break;
default:
// no-op
break;
}
}

// utility function
std::string WakeWordAgent::stateToString(State state) {

switch(state) {
case State::UNINITIALIZED:
return "UNINITIALIZED";
case State::IDLE:
return "IDLE";
case State::WAKE_WORD_DETECTED:
return "WAKE_WORD_DETECTED";
case State::SENT_WAKE_WORD_DETECTED:
return "SENT_WAKE_WORD_DETECTED";
case State::WAKE_WORD_PAUSE_REQUESTED:
return "WAKE_WORD_PAUSE_REQUESTED";
case State::WAKE_WORD_PAUSED:
return "WAKE_WORD_PAUSED";
case State::WAKE_WORD_RESUME_REQUESTED:
return "WAKE_WORD_RESUME_REQUESTED";
default:
log(Logger::ERROR, "WakeWordAgent::stateToString: unhandled switch case");
return "UNKNOWN";
}
}

// Utility function. Encapsulates tracing to aid debugging.


void WakeWordAgent::setState(State state) {

m_currentState = state;

log(Logger::INFO, "WakeWordAgent: State set to " +


stateToString(state) + "(" + std::to_string(static_cast<int>(state)) + ")");
}}

5. TESTING

5.1 SYSTEM TESTING

Testing is a set of activities that can be planned and conducted systematically. Testing begins at
the module level and works towards the integration of the entire computer-based system. Nothing
is complete without testing, as it is vital to the success of the system.

5.1.1 Testing Objectives


There are several rules that can serve as testing objectives:

1. Testing is a process of executing a program with the intent of finding an error


2. A good test case is one that has high probability of finding an undiscovered error.
3. A successful test is one that uncovers an undiscovered error.

If testing is conducted successfully according to the objectives stated above, it will uncover
errors in the software. Testing also demonstrates that the software functions appear to be
working according to the specification, and that the performance requirements appear to
have been met.

There are three ways to test a program:


1. For Correctness
2. For Implementation efficiency
3. For Computational Complexity.

Tests for correctness are supposed to verify that a program does exactly what it was designed
to do. This is much more difficult than it may at first appear, especially for large programs.
Tests for implementation efficiency attempt to find ways to make a correct program faster or
use less storage; this is a code-refining process, which re-examines the implementation phase of
algorithm development.

Tests for computational complexity amount to an experimental analysis of the complexity of
an algorithm or an experimental comparison of two or more algorithms, which solve the same
problem.

5.1.2 Testing Plans


The following ideas should be a part of any testing plan:

1. Preventive Measures
2. Spot checks
3. Testing all parts of the program
4. Test Data
5. Looking for trouble
6. Time for testing
7. Re-testing

5.2 TESTING PERFORMED

The data was entered in all forms separately, and whenever an error occurred it was corrected
immediately. A quality team deputed by the management verified all the necessary documents
and tested the software while entering the data at all levels. The entire testing process can be
divided into four phases:

1. Unit Testing

2. Integrated Testing

3. Validation Testing

4. Acceptance Testing

5.2.1 UNIT TESTING

First we performed unit testing on every module of our project. The first module is the starting
window with the application logo. We performed unit testing on it, debugged the whole code,
handled the possible exceptions and ensured that the module did not fail under any condition.
After that we performed unit testing on the second module in the same way, debugging the
whole code and handling the possible exceptions.
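
As a concrete illustration, a unit test for the normalizePort() helper used in the server code could look like the sketch below. It uses Node's built-in assert module; the function is re-stated here (following the fragment in the implementation listing) so that the test is self-contained, and the chosen inputs exercise the named-pipe, valid-port and invalid-port branches.

var assert = require('assert');

// Re-stated helper, matching the normalizePort() fragment shown in the implementation listing.
function normalizePort(val) {
var port = parseInt(val, 10);
if (isNaN(port)) {
// named pipe
return val;
}
if (port >= 0) {
// port number
return port;
}
return false;
}

// One assertion per branch of the function.
assert.strictEqual(normalizePort('3000'), 3000); // numeric string -> port number
assert.strictEqual(normalizePort('pipe'), 'pipe'); // non-numeric -> treated as a named pipe
assert.strictEqual(normalizePort('-1'), false); // negative -> rejected
console.log('All normalizePort unit tests passed');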

5.2.2 INTEGRATION TESTING

Next we performed integration testing by integrating all the modules to test the whole
functionality of the application and handle the possible exceptions.

5.2.3 VALIDATION TESTING


In this phase, all the code modules were tested individually, one after the other. The following
were tested in all the modules:
1. Loop testing
2. Boundary Value analysis
3. Equivalence Partitioning Testing

In our case all the modules were combined and given the test data. The combined modules
worked successfully without any side effects on other programs.

5.2.4 ACCEPTANCE TESTING

Finally, we performed acceptance testing to ensure that the application performs according to the
client's needs. This is the final step in testing. Here the entire system was tested as a whole, with
all forms, code, modules and class modules. This form of testing is popularly known as black-box
testing or system testing. Black-box testing methods focus on the functional requirements of the
software.

That is, black-box testing enables the software engineer to derive sets of input conditions
that will fully exercise all functional requirements for a program. Black-box testing attempts to
find errors in the following categories: incorrect functions, interface errors, errors in data
structures or external database access, performance errors, and initialization and termination
errors.
6. FURTHER ENHANCEMENTS

1. RECOGNITION WITHOUT INTERNET ACCESS


We are well aware that internet access is not available throughout our country. Currently, India is
nowhere near universal coverage for a service that is considered almost a basic necessity in many
developed countries. Without internet access this project may not function; therefore we plan to
enhance it to work even without internet access, using offline recognition toolkits such as CMU
Sphinx, as sketched below.

Fig 13. Internet Connection Problem
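
As a rough sketch of this enhancement, the assistant could fall back to the pocketsphinx_continuous command-line tool that ships with CMU Sphinx, spawning it from Node.js and reading recognized utterances from its output. The tool name and flags reflect a standard pocketsphinx installation and are assumptions here; the default acoustic and language models would also need to be installed.

var spawn = require('child_process').spawn;

// Start continuous recognition from the microphone (assumes pocketsphinx is installed).
var sphinx = spawn('pocketsphinx_continuous', ['-inmic', 'yes']);

sphinx.stdout.on('data', function (data) {
var text = data.toString().trim();
if (text.length > 0) {
// Recognized utterances arrive here; they can be handed to the assistant's command handler.
console.log('Offline recognition result: ' + text);
}
});

sphinx.stderr.on('data', function () {
// pocketsphinx prints model-loading and tuning information on stderr; it is ignored here.
});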

2. GSM Module for voice activated calling


The Raspberry Pi SIM800 GSM/GPRS Add-on V2.0 is customized for the Raspberry Pi interface and is
based on the SIM800 quad-band GSM/GPRS/BT module. AT commands can be sent via the serial port on
the Raspberry Pi, so that functions such as dialing and answering calls, sending and receiving messages
and surfing online can be realized. Moreover, the module supports powering on and resetting via
software. A rough sketch of sending a dial command to the module from Node.js is shown below.

Fig 14. GSM Quad Band 800
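
The sketch below assumes the classic API of the serialport npm package, the Pi's on-board UART at /dev/ttyAMA0 and a placeholder phone number; the actual device path and baud rate depend on how the SIM800 add-on is connected.

var SerialPort = require('serialport'); // assumed: serialport npm package (classic constructor API)

// Open the UART that the SIM800 add-on is assumed to be attached to.
var modem = new SerialPort('/dev/ttyAMA0', { baudRate: 115200 });

modem.on('open', function () {
// ATD<number>; is the standard AT command for originating a voice call.
modem.write('ATD+911234567890;\r');
});

modem.on('data', function (data) {
// The SIM800 replies with result codes such as OK or NO CARRIER.
console.log('Modem: ' + data.toString());
});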

3. HOME AUTOMATION

Fig 15. Home automation possibilities

With the right level of ingenuity, the sky's the limit on things you can automate in your home,
but here are a few basic categories of tasks that you can pursue: Automate your lights to turn on
and off on a schedule, remotely, or when certain conditions are triggered. Set your air
conditioner to keep the house temperate when you're home and save energy while you're away.
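
For example, a voice command handled by the assistant could switch a relay-driven light through one of the Raspberry Pi's GPIO pins. The sketch below assumes the onoff npm package and a relay module wired to GPIO 17; both the package and the pin number are assumptions made for illustration.

var Gpio = require('onoff').Gpio; // assumed: onoff npm package for GPIO access

var light = new Gpio(17, 'out'); // assumed wiring: relay module on GPIO 17

// Switch the light depending on the recognized voice command.
function handleLightCommand(command) {
if (command === 'turn on the light') {
light.writeSync(1);
} else if (command === 'turn off the light') {
light.writeSync(0);
}
}

handleLightCommand('turn on the light');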

7. APPLICATIONS

1. Usage in education and daily life

Speech recognition can be useful for learning a second language: it can teach proper pronunciation
and help a person develop fluency in their speaking skills. Students who are blind or have very low
vision can benefit from using the technology to convey words and then hear the computer recite
them, as well as use a computer by commanding it with their voice, instead of having to look at the
screen and keyboard.

2. Aerospace (e.g. space exploration, spacecraft, etc.)

NASA's Mars Polar Lander used speech recognition technology from Sensory, Inc. in the Mars
Microphone on the lander. Related applications include automatic subtitling with speech recognition
and automatic translation.

3. Court reporting (Real time Speech Writing)

4. Telephony and other domains

ASR is now commonplace in the field of telephony and is becoming more widespread in the fields of
computer gaming and simulation. However, despite its high level of integration with word processing
in general personal computing, ASR in the field of document production has not seen the expected
increase in use.
The improvement of mobile processor speeds made the speech-enabled Symbian and Windows
Mobile smartphones feasible. Speech is used mostly as a part of a user interface, for creating
pre-defined or custom speech commands. Leading software vendors in this field are: Google,
Microsoft Corporation (Microsoft Voice Command), Digital Syphon (Sonic Extractor),
LumenVox, Nuance Communications (Nuance Voice Control), Voice Box Technology, Speech
Technology Center, Vito Technologies (VITO Voice2Go), Speereo Software (Speereo Voice
Translator), Verbyx VRX and SVOX.

5. In Car systems

Typically a manual control input, for example by means of a finger control on the steering-
wheel, enables the speech recognition system and this is signalled to the driver by an audio
prompt. Following the audio prompt, the system has a "listening window" during which it may
accept a speech input for recognition.

Fig 16. Car Automation

Simple voice commands may be used to initiate phone calls, select radio stations or play music
from a compatible smartphone, MP3 player or music-loaded flash drive. Voice recognition
capabilities vary between car make and model. Some of the most recent car models offer natural-
language speech recognition in place of a fixed set of commands, allowing the driver to use full
sentences and common phrases. With such systems there is, therefore, no need for the user to
memorize a set of fixed command words.

6. Helicopters

The problems of achieving high recognition accuracy under stress and noise pertain strongly to
the helicopter environment as well as to the jet fighter environment. The acoustic noise problem
is actually more severe in the helicopter environment, not only because of the high noise levels
but also because the helicopter pilot, in general, does not wear a facemask, which would reduce
acoustic noise in the microphone. Substantial test and evaluation programs have been carried out
in the past decade in speech recognition systems applications in helicopters, notably by the U.S.
Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace
Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma
helicopter. There has also been much useful work in Canada. Results have been encouraging,
and voice applications have included: control of communication radios, setting of navigation
systems, and control of an automated target handover system.
As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot
effectiveness. Encouraging results are reported for the AVRADA tests, although these represent
only a feasibility demonstration in a test environment. Much remains to be done both in speech
recognition and in overall speech technology in order to consistently achieve performance
improvements in operational settings.

7. High-performance fighter aircraft

Substantial efforts have been devoted in the last decade to the test and evaluation of speech
recognition in fighter aircraft. Of particular note is the U.S. program in speech recognition for the
Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), and a program in
France installing speech recognition systems on Mirage aircraft, and also programs in the UK
dealing with a variety of aircraft platforms. In these programs, speech recognizers have been
operated successfully in fighter aircraft, with applications including: setting radio frequencies,

commanding an autopilot system, setting steer-point coordinates and weapons release
parameters, and controlling flight display.

8. REFERENCES
[1] D. Yu and L. Deng, Automatic Speech Recognition: A Deep Learning Approach, Springer, 2014.

[2] Claudio Becchetti and Lucio Prina Ricotti, Speech Recognition: Theory and C++ Implementation, 2008 edition.

[3] Reynolds, Douglas; Rose, Richard (January 1995). "Robust text-independent speaker identification using Gaussian mixture speaker models" (PDF). IEEE Transactions on Speech and Audio Processing (IEEE) 3 (1): 72-83. doi:10.1109/89.365379. ISSN 1063-6676. OCLC 26108901. Retrieved 21 February 2014.

[4] Waibel, Hanazawa, Hinton, Shikano, Lang (1989). "Phoneme recognition using time delay neural networks". IEEE Transactions on Acoustics, Speech and Signal Processing.

[5] Microsoft Research. "Speaker Identification (WhisperID)". Microsoft. Retrieved 21 February 2014.

[6] "Low Cost Home Automation Using Offline Speech Recognition", International Journal of Signal Processing Systems, vol. 2, no. 2, pp. 96-101, 2014.

[7] Juang, B. H.; Rabiner, Lawrence R. "Automatic speech recognition - a brief history of the technology development" (PDF), p. 6. Retrieved 17 January 2015.

[8] Deng, L.; Li, Xiao (2013). "Machine Learning Paradigms for Speech Recognition: An Overview". IEEE Transactions on Audio, Speech, and Language Processing.

[9] P. V. Hajar and A. Andurkar, "Review Paper on System for Voice and Facial Recognition using Raspberry Pi", International Journal of Advanced Research in Computer and Communication Engineering, vol. 4, no. 4, pp. 232-234, 2015.

