1 Introduction
1.1 Background
The Internet of Things (IoT) is a hot topic today. It is quickly making the world more
intelligent, and scenes from science-fiction movies may already be close to everyday
life. In an IoT world, sensors are everywhere, collecting large amounts of data that can
improve human life. The question of how to use these data to help people has made
Artificial Intelligence (AI) a hot topic as well. AI lets a computer learn from data, an
approach known as “machine learning,” so that the computer can take in data and work
as a human would. AI has gone through low ebbs before, but technical progress has
revived it and made it a hot topic again.
AI has many applications, such as natural language processing, speech recognition,
and image recognition. Each application uses different techniques, and the most
discussed at present is the neural network (NN). Because a single type of NN cannot
handle every kind of application well, different types of NN have been developed for
different kinds of tasks.
Each NN has to be trained to achieve better performance. “Deep learning” is also a
hot topic now. Its concept is to use a multilayer NN to improve performance, and
according to related work, a multilayer NN performs better than a single-layer NN.
1.2 Motivation
AI is a current trend, and we want to follow it by using AI to build a system that
improves our quality of life. Information security is the most crucial part of IoT: it is
dangerous if anyone can control your smart home devices. To avoid that situation, we
combine the concepts of IoT and AI to build an intelligent home security indicator that
uses AI to filter the users of smart home devices.
1.3 Purpose
In this paper, we use IoT and AI techniques to build a smart home security indicator.
In Taiwan, most people still use a traditional key lock or an electronic lock, so they
have to carry a key or a magnetic key fob, with the risk of losing it. Our aim is to build
a home access control system based on face and speaker recognition. Such a system
improves people's quality of life: there is no need to carry a key when going out, and
the risk of losing it is avoided.
2 Related Work
To confirm that our research is feasible, we surveyed related papers. Ben-Yacoub et
al. [1] proposed a method for face and speech verification that uses the face and speech
matching scores to make the decision (reject or accept). This survey informed how we
build our system. We discuss the related techniques in more detail below.
2.2 Raspberry Pi
Raspberry Pi is a small single-board computer on which an operating system can be
installed. It works as a base that can be extended with different modules. For example,
to collect image information we can install a camera (lens) module, and to obtain sound
information we install a microphone module. With the second-generation Raspberry Pi,
users have to install a wireless module to gain Internet connectivity or remote control.
The third-generation Raspberry Pi was released on Feb. 29, 2016. It already has built-in
Wi-Fi and Bluetooth, so it is more convenient than the second generation. Because the
Raspberry Pi is easy to expand and to program, it has become one of the favorite
choices for IoT devices.
Jain et al. [3] present a basic home automation application on Raspberry Pi that uses
a local network or a wireless connection to switch LEDs on and off by reading the
subject line of an e-mail.
A face recognition system can be divided into three parts: face detection, feature
extraction, and face recognition, as shown in Fig. 2 [4]. In the face detection part, the
system finds the face region of the image, which reduces the amount of non-face data
to be processed. The feature extraction part collects the feature data of the face region.
The final part is face recognition, which uses the feature data to recognize the identity
of the person in the image.
performance than other methods in most situations. We discuss the convolutional
neural network below.
3 Methodology
3.1 Architecture
In this paper, we propose a structure that combines face recognition and voice
recognition to build a home security system. A Raspberry Pi serves as the core of the
home security system, using a camera (lens) module to collect face information and a
microphone module to capture voice information. The Raspberry Pi uploads this
information to the Face API and the Speech API: the Face API is used to recognize the
face information, and the Speech API is used to identify the voice information. When
the identity returned by the Face API equals the identity returned by the Speech API,
the verification requirement is met and the system continues to the next step
Combining Voice and Image Recognition for Smart Home … 217
of controlling devices, such as an electronic lock or other devices that support remote
control. Figure 3 shows the architecture of our system.
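The verification rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a listing of our system: the function names `fuse_decisions` and `control_lock`, and the convention that `None` means that no enrolled person was matched, are hypothetical.

```python
from typing import Optional


def fuse_decisions(face_identity: Optional[str],
                   speaker_identity: Optional[str]) -> bool:
    """Accept only when both recognizers name the same enrolled person.

    None stands for "no match returned" (a convention assumed for this sketch).
    """
    if face_identity is None or speaker_identity is None:
        return False
    return face_identity == speaker_identity


def control_lock(accepted: bool) -> str:
    """Hypothetical stand-in for the device-control step that follows verification."""
    return "unlock" if accepted else "stay_locked"


if __name__ == "__main__":
    # Same person from both recognizers: the device-control step may proceed.
    print(control_lock(fuse_decisions("tester_A", "tester_A")))
    # Different identities, or a missing match, keep the lock closed.
    print(control_lock(fuse_decisions("tester_A", "tester_E")))
    print(control_lock(fuse_decisions("tester_A", None)))
```

Requiring agreement between the two modalities means that a false acceptance by only one recognizer is not enough to open the lock.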
Completing this research involves three parts. The first is building the system
environment on the Raspberry Pi; the second is collecting data, including face images
and voice files; and the last is using the Microsoft Azure platform to train the model.
Microsoft provides web pages that explain its APIs, including the various functions
of the Face API [11] and the parameters of each function. The Speech API has its own
description page as well [12].
Figure 4 shows the process of the Face API. First, we create a group in which the
person profiles are saved. The second step is creating the person profiles. In the third
step, we upload face images to each person profile, and in the fourth step we train the
person group. After completing these four steps, we can perform face recognition.
The process of the Speech API is shown in Fig. 5. As with the Face API, we first
create the personal profiles, then enroll voice samples for each profile, and then we can
perform speaker recognition.
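The two workflows above can be written down as ordered request plans. The sketch below only builds the list of requests and sends nothing over the network. The paths follow the v1.0 REST references of the Face API [11] and the Speaker Recognition API [12] as we recall them, so they should be treated as assumptions and checked against the current documentation. The function names, the `lab` group ID, and the placeholder IDs are hypothetical, and treating recognition as a detect call followed by an identify call is also an assumption.

```python
from typing import List, Tuple

Request = Tuple[str, str, str]  # (step description, HTTP method, path)

FACE = "/face/v1.0"  # assumed Face API prefix; the host depends on the Azure region
SPID = "/spid/v1.0"  # assumed Speaker Recognition API prefix


def face_api_plan(group_id: str, person_id: str) -> List[Request]:
    """Four preparation steps, followed by the recognition calls."""
    group = f"{FACE}/persongroups/{group_id}"
    return [
        ("create person group", "PUT", group),
        ("create person profile", "POST", f"{group}/persons"),
        ("add face image", "POST", f"{group}/persons/{person_id}/persistedFaces"),
        ("train person group", "POST", f"{group}/train"),
        # Recognition, assumed here to be a detect call followed by identify.
        ("detect face in new image", "POST", f"{FACE}/detect"),
        ("identify detected face", "POST", f"{FACE}/identify"),
    ]


def speaker_api_plan(profile_id: str) -> List[Request]:
    """Create a profile, enroll voice samples, then identify the speaker."""
    profiles = f"{SPID}/identificationProfiles"
    return [
        ("create profile", "POST", profiles),
        ("enroll voice sample", "POST", f"{profiles}/{profile_id}/enroll"),
        ("identify speaker", "POST",
         f"{SPID}/identify?identificationProfileIds={profile_id}"),
    ]


if __name__ == "__main__":
    plan = face_api_plan("lab", "<personId>") + speaker_api_plan("<profileId>")
    for step, method, path in plan:
        print(f"{method:4} {path}  # {step}")
```

In a real client the person and profile IDs are returned by the create calls, so the later requests can only be issued after the earlier ones succeed. That dependency is why the order of the steps matters.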
4 Preliminary Experiment
We chose our lab partners as testers, six in total, and used their face and voice data
in the experiment. We collected five face images and five voice recordings from each
person to serve as our data.
The Open Source Computer Vision Library (OpenCV) [13, 14] makes it easier to
collect face images. OpenCV provides various filters, including ones for the face, eyes,
and mouth, so we can take a photo when a human face is detected. We use both the eye
and face filters in our system: with the face filter alone the misjudgment rate increases,
and the eye filter helps ensure that the captured image really contains a human face.
Figure 6 shows an image obtained by using the eye and face filters.
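The combined face-and-eye check can be sketched as follows. The sketch assumes that the filters are OpenCV's bundled Haar cascade classifiers and that a face is kept when at least one detected eye lies inside its bounding box. Both points, along with the function names and the `min_eyes` parameter, are illustrative assumptions and not details reported in this paper.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), as returned by detectMultiScale


def contains(face: Box, eye: Box) -> bool:
    """True when the eye box lies completely inside the face box."""
    fx, fy, fw, fh = face
    ex, ey, ew, eh = eye
    return fx <= ex and fy <= ey and ex + ew <= fx + fw and ey + eh <= fy + fh


def confirmed_faces(faces: List[Box], eyes: List[Box], min_eyes: int = 1) -> List[Box]:
    """Keep only the face boxes that enclose at least `min_eyes` eye boxes."""
    return [f for f in faces if sum(contains(f, e) for e in eyes) >= min_eyes]


def detect_boxes(gray_image):
    """Run the face and eye cascades on a grayscale image (needs opencv-python)."""
    import cv2  # imported lazily so the pure helpers above work without OpenCV

    face_cc = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    eye_cc = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")
    faces = [tuple(int(v) for v in box)
             for box in face_cc.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)]
    eyes = [tuple(int(v) for v in box)
            for box in eye_cc.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)]
    return faces, eyes
```

A capture loop would call `detect_boxes` on each camera frame and save the frame only when `confirmed_faces` returns a non-empty list.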
For voice collection, we use the PyAudio [15] package to help us collect voice data.
Part of the PyAudio code is shown in Fig. 7. As it shows, we can modify recording
parameters such as chunk, format, and channels.
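A recording routine of this kind might look like the sketch below, which is not the code of Fig. 7. The parameter values (1024-frame chunks, 16-bit mono audio at 16 kHz) and the function names are assumptions chosen for illustration. `record` needs PyAudio and a microphone, while `chunks_needed` and `save_wav` rely only on the standard library.

```python
import wave

CHUNK = 1024       # frames read per buffer
CHANNELS = 1       # mono
RATE = 16000       # samples per second
SAMPLE_WIDTH = 2   # bytes per sample (16-bit PCM)


def chunks_needed(seconds: float, rate: int = RATE, chunk: int = CHUNK) -> int:
    """Number of whole buffers to read to cover `seconds` of audio."""
    return int(rate / chunk * seconds)


def record(seconds: float) -> bytes:
    """Capture raw PCM from the default input device (needs PyAudio and hardware)."""
    import pyaudio  # imported lazily so the other helpers work without PyAudio

    pa = pyaudio.PyAudio()
    try:
        stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
                         input=True, frames_per_buffer=CHUNK)
        try:
            frames = [stream.read(CHUNK) for _ in range(chunks_needed(seconds))]
        finally:
            stream.stop_stream()
            stream.close()
    finally:
        pa.terminate()
    return b"".join(frames)


def save_wav(path: str, pcm: bytes) -> None:
    """Write raw PCM bytes to a WAV file using the parameters above."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(RATE)
        wf.writeframes(pcm)
```

Calling `save_wav("sample.wav", record(5))` would store a five-second recording. The chunk size mainly trades latency against the number of read calls, while format, channels, and rate have to match what the recognition service expects.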
For both face and speaker recognition, we use three files per person to train the
model and two files to test it. Tables 1 and 2 show the results of the Face API and the
Speech API. These tables show that the APIs made some errors. A red value marks an
error; the value 1 means both test files were accepted, 0.5 means one file was accepted
and one was rejected, and 0 means both files were rejected. The API test results
surprised us, especially for the Speech API, where the test files of tester E passed
verification against every person. We will look for ways to improve the performance of
our system in future work.
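The scoring used in the tables can be expressed as a short calculation. In the sketch below, the function names are hypothetical and the sample values are illustrative; they are not taken from Tables 1 and 2. The rule that a correct cell is 1 for a tester's own profile and 0 for any other profile is an assumption about how the red values are assigned.

```python
from typing import Dict, List, Tuple


def acceptance_score(accepted: List[bool]) -> float:
    """Fraction of test files accepted: 1.0, 0.5 or 0.0 when two files are tested."""
    return sum(accepted) / len(accepted)


def error_cells(scores: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Return the (tester, profile) cells whose score differs from the expected one.

    Assumed expectation: 1.0 when the tester owns the profile, 0.0 otherwise.
    """
    errors = []
    for (tester, profile), score in scores.items():
        expected = 1.0 if tester == profile else 0.0
        if score != expected:
            errors.append((tester, profile))
    return sorted(errors)


if __name__ == "__main__":
    # Illustrative values only.
    scores = {
        ("A", "A"): acceptance_score([True, True]),    # genuine, both accepted
        ("A", "B"): acceptance_score([False, False]),  # impostor, both rejected
        ("B", "A"): acceptance_score([True, False]),   # impostor, one accepted
        ("B", "B"): acceptance_score([True, True]),    # genuine, both accepted
    }
    print(error_cells(scores))
```

Under this reading, a tester whose files are accepted against every profile produces an error in every cell of that row except the tester's own.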
220 H.-T. Lee et al.
We use a Bluetooth light as the result display device, with different light colors
letting the user know the system's decision. As listed in Table 3, blue light indicates
that the system accepted the user, and red light indicates that the system rejected the
user.
IoT technology is more common than before. Although IoT can make life more
convenient, information security becomes essential. A smart home can bring
convenience, but it can also bring risk, so we built a home security system to enhance
information security. Our research has produced preliminary results, and there is still
room for improvement, such as the accuracy of the Speech API and the fluency of our
system. We will look for ways to improve efficiency and complete the system in the
future.
References
1. Ben-Yacoub, S., Abdeljaoued, Y., Mayoraz, E.: Fusion of face and speech data for person
identity verification. IEEE Trans. Neural Netw. 10(5), 1065–1074 (1999)
2. Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of Things (IoT): a vision,
architectural elements, and future directions. Future Gener. Comput. Syst. 29(7), 1645–1660
(2013)
3. Jain, S., Vaibhav, A., Goyal, L.: Raspberry Pi based interactive home automation system
through E-mail. In: 2014 International Conference on Reliability Optimization and
Information Technology (ICROIT) (2014)
4. Chihaoui, M., Elkefi, A., Bellil, W., Amar, C.B.: A survey of 2D face recognition
techniques. Computers 5(4), 1–25 (2016)
5. Sirovich, L., Kirby, M.: Low-dimensional procedure for the characterization of human faces.
J. Opt. Soc. Am. A 4(3), 519–524 (1987)
6. Klevans, R.L., Rodman, R.D.: Voice Recognition, 1st edn. Artech House, Inc., Norwood,
MA, USA (1997)
7. Gaikwad, S.K., Gawali, B.W., Yannawar, P.: A review on speech recognition technique. Int.
J. Comput. Appl. 10(3), 16–24 (2010)
8. Muda, L., Begam, M., Elamvazuthi, I.: Voice recognition algorithms using mel frequency
cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques. J. Comput. 2(3),
138–143 (2010)
9. Lee, H.-T., Chen, R.-C., Wei, D.: Building emotion recognition control system using
Raspberry Pi. In: The 6th International Conference on Frontier Computing (FC 2017), Japan
(2017)
10. Microsoft Azure: https://azure.microsoft.com/zh-tw/. Last accessed 21 Feb 2018
11. Face API|Microsoft Azure: https://azure.microsoft.com/zh-tw/services/cognitive-services/
face/. Last accessed 21 Feb 2018
12. Speaker Recognition API|Microsoft Azure: https://azure.microsoft.com/zh-tw/services/
cognitive-services/speaker-recognition/. Last accessed 21 Feb 2018
13. OpenCV Library: https://opencv.org/. Last accessed 21 Feb 2018
14. OpenCV Tutorial: http://monkeycoding.com/?page_id=12. Last accessed 21 Feb 2018
15. PyAudio Documentation-PyAudio 0.2.11 Documentation: http://people.csail.mit.edu/hubert/
pyaudio/docs/. Last accessed 21 Feb 2018