
2016 IEEE 12th International Colloquium on Signal Processing & its Applications (CSPA2016), 4 - 6 March 2016, Melaka, Malaysia

Abnormal Sound Analytical Surveillance System Using Microcontroller
Tan Teng Teng #1, Lim Tien Sze #2, Ong Lee Yeng #3
Centre For Remote Sensing And Surveillance Technologies (CRSST),
Faculty of Engineering and Technology, Multimedia University,
Jalan Ayer Keroh Lama 75450 Bukit Beruang, Melaka, Malaysia
#1 tengteng77@yahoo.com, #2 tslim@mmu.edu.my, #3 lyong@mmu.edu.my

Abstract—Analytical surveillance can perform surveillance tasks much more efficiently than manual monitoring by an operator, which has drawn increasing market interest in recent years. Closed-circuit television (CCTV) is commonly used for security surveillance. However, CCTV output is purely visual, and these silent videos may not give a complete picture of what is happening. Sound detection can be incorporated into vision-based surveillance as an enhancement: it can detect abnormal sounds even when the event happens at a camera blind spot or the camera is intentionally blocked. In this paper, we propose a microcontroller embedded system to enhance current CCTV systems. The proposed abnormal sound embedded system carries out sound detection, audio processing and analysis. This study uses only a single microphone for sound detection. Audio amplitude and frequency range are the target features, extracted with the Fast Fourier Transform (FFT). Abnormal sounds of human screaming and glass breaking were classified using a decision tree. In experiments, the proposed abnormal sound analytical surveillance system achieved an average detection accuracy of 88%. The proposed system is simple and cost-effective for field implementation.

Index Terms—Abnormal sound, audio analytical, microcontroller.

I. INTRODUCTION

The crime index has been trending upward and has been identified as a main concern for the public and for the economy. According to statistical data, property theft accounts for 82% of overall reported crimes [1]. CCTVs have been introduced to perform surveillance tasks in streets and residential areas, but blind spots leave the CCTV surveillance system imperfect. Sound is another important environmental stimulus for surveillance alertness. A glass breaking sound can be considered a sign of an attempt to break into a premise that needs attention, and human screaming signals that someone needs help and attention. These two types of sound are the main focus of this paper. We propose an abnormal sound analytical surveillance system using a microcontroller to enhance existing CCTV. The audio-assisted vision surveillance system gains a more comprehensive capability to recognize what is happening, making the surveillance system more efficient at preventing and detecting theft.

Mel Frequency Cepstral Coefficients (MFCC) are among the most popular audio features used for recognition. MFCC has been used as one of the audio features for scream detection [2][3], and it also yields good classification. MFCC takes the sensitivity of human perception with respect to frequency into consideration. This includes mel-frequency filtering, which reflects the frequency (pitch) perceived by the human ear; human hearing is more sensitive to low frequencies. Another step in MFCC extraction is taking the logarithm, which mimics human perception of loudness. Other audio features, including pitch range [2][5], energy [2][4] and zero crossing rate (ZCR) [4], have also been exploited to recognize screams and glass breaking sounds.

Microcontrollers have been used in voice and speech recognition [7][8][9][10]. Clayder and Nils [8] showed that audio signals processed by a microcontroller give results comparable to those processed by a computer. The main consideration when using a microcontroller is its limited speed and memory capacity.

Audio also plays an important role, alongside vision, in analytic surveillance detection. Surveillance started with CCTV and is now being extended with abnormal sound detection. The proposed abnormal sound analytical surveillance system is designed to detect, analyze and recognize abnormal sounds, and to trigger further action.

II. METHODOLOGY

Since CCTV is commonly used for surveillance, we propose and develop an add-on module to enhance the current CCTV surveillance system.

We start with a theoretical and mathematical analysis of the sounds of interest. Here, we compare the characteristics of abnormal sounds and sounds with no abnormality, and select features that can distinguish between the two for classification. From this analysis, we develop our abnormal sound analytical algorithm.

Next, we model the platform using a microcontroller embedded board for the proposed abnormal sound analytical surveillance system. The proposed microcontroller embedded system is used for audio signal acquisition and for executing the proposed algorithm.

978-1-4673-8780-4/16/$31.00 ©2016 IEEE

Authorized licensed use limited to: Monash University. Downloaded on August 29,2020 at 01:40:14 UTC from IEEE Xplore. Restrictions apply.

Finally, the proposed abnormal sound analytical surveillance system will be assessed in a real environment to evaluate its effectiveness in detecting screams and glass breaking sounds.

A. Feature Selection

For audio, there are three physical properties, frequency, intensity and envelope, that can be deployed for the subsequent classification step. The selection of appropriate features determines the classification result. The human brain perceives high-pitched sounds as signalling an abnormal scenario and directs attention to them.

For surveillance, we propose a method to detect the abnormal scenario; the content of the audio (what is being said) is not the main object of analysis. The normal speech in Fig. 1 and the screaming in Fig. 2 are the same Malay phrase 'Tolong saya' ("Help me"). We clearly observe that the screaming waveform has a wider envelope than normal speech and that its amplitude is higher. Amplitude is therefore selected as one of our classification features to differentiate normal speech from screaming.

Fig. 1. Time domain plot for 'Tolong saya' normal speech scenario.

Fig. 2. Time domain plot for 'Tolong saya' screaming scenario.

Fig. 3. Frequency domain plot for 'Tolong saya' normal speech scenario.

Fig. 4. Frequency domain plot for 'Tolong saya' screaming scenario.

Fig. 5. Frequency domain plot for glass breaking.

The audio signal can be analysed in the frequency domain by performing a Fourier transform, which decomposes a time-domain audio signal into its frequency components. For a sampled signal of finite length, the discrete Fourier transform (DFT) is applied to convert the time sequence into a finite combination of complex sinusoids, each corresponding to one frequency. The Fast Fourier Transform (FFT) is an efficient algorithm for computing the DFT. In the frequency domain, the magnitude represents the amplitude for
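each frequency. For an analog signal sampled into a periodic sequence x(n) of period N, the signal can be represented by its discrete Fourier transform as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1 \tag{1}$$

where x(n) is the n-th time-domain sample and |X(k)| is the magnitude at the k-th frequency bin.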
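Equation (1) can be evaluated directly to see how a frame of samples turns into a magnitude spectrum. The sketch below is a naive O(N²) DFT in plain Python, assuming a hypothetical 8 kHz sample rate and a 64-sample frame chosen only for illustration. On the microcontroller an FFT routine would be used instead, as the paper does, since it computes the same X(k) in O(N log N).

```python
import cmath
import math


def dft_magnitude(x):
    """Return |X(k)| for k = 0..N-1 using the direct DFT sum of Eq. (1)."""
    n_samples = len(x)
    magnitudes = []
    for k in range(n_samples):
        # X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N)
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n in range(n_samples))
        magnitudes.append(abs(acc))
    return magnitudes


def bin_to_hz(k, n_samples, sample_rate):
    """Centre frequency (Hz) of DFT bin k."""
    return k * sample_rate / n_samples


if __name__ == "__main__":
    fs = 8000   # hypothetical sample rate (Hz)
    n = 64      # hypothetical frame size
    f0 = 1000   # test tone; falls exactly on bin 1000 * 64 / 8000 = 8
    frame = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]
    mags = dft_magnitude(frame)
    half = mags[: n // 2]  # real input: bins above N/2 mirror the lower half
    peak_bin = max(range(len(half)), key=half.__getitem__)
    print(peak_bin, bin_to_hz(peak_bin, n, fs))  # prints: 8 1000.0
```

Only the first N/2 bins are inspected because, for a real-valued input, |X(N-k)| equals |X(k)|.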

In the frequency domain plots of the abnormal scenarios in Fig. 4 and Fig. 5, high amplitudes mostly accumulate at higher frequencies compared with the normal speech in Fig. 3. Frequency is therefore selected as another feature for our classification algorithm.

Fig. 6, Fig. 7 and Fig. 8 show that the distance between the sound source and the sensing microphone has a significant impact on amplitude. The magnitude decreases as the source moves further away from the microphone, but the high-frequency characteristic remains usable for recognition. The influence of source distance is taken into consideration in our abnormal sound analytic algorithm: a pre-emphasis step was added to minimize it.

Fig. 8. Source distance at 600cm from microphone on frequency domain plot.
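To give a feel for the distance effect seen across the 120 cm, 280 cm and 600 cm measurements, the sketch below assumes an ideal point source in free field (inverse-square spreading, the relation given as (2) and (3) in the text) and computes how much intensity and pressure amplitude fall relative to the 120 cm position. A real office adds reflections, so measured drops will differ; this is an illustration, not the paper's model.

```python
import math


def intensity(p_avg, r):
    """Free-field intensity I = P_av / (4*pi*r^2) of a point source (W/m^2).

    Assumes spherical spreading with no reflections or absorption.
    """
    return p_avg / (4 * math.pi * r ** 2)


def level_change_db(r_ref, r):
    """Change in sound level (dB) when moving from r_ref to r (metres)."""
    return 10 * math.log10(intensity(1.0, r) / intensity(1.0, r_ref))


if __name__ == "__main__":
    for r_cm in (120, 280, 600):
        r = r_cm / 100
        # Pressure amplitude scales with sqrt(I), i.e. with 1/r.
        amp_ratio = math.sqrt(intensity(1.0, r) / intensity(1.0, 1.2))
        print(f"{r_cm} cm: {level_change_db(1.2, r):6.1f} dB, amplitude x{amp_ratio:.2f}")
```

Under this idealization, moving from 120 cm to 600 cm (five times further) cuts the amplitude to one fifth, about 14 dB, which is why a pre-emphasis step for soft frames is useful.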
The intensity I is related to the source distance by

$$I = \frac{P_{av}}{A} \tag{2}$$

$$I = \frac{P_{av}}{4\pi r^{2}} \tag{3}$$

where $P_{av}$ is the average power emitted, $A$ is the coverage area (the surface $4\pi r^{2}$ of a sphere around a point source) and $r$ is the radial distance from the source.

B. Classification

For audio classification, we propose to use a decision tree. A classification tree is a form of decision tree learning that maps observations, through decision rules, to target categories. Starting from the root, each decision node specifies an attribute test and branches either to a leaf node, which indicates the target category, or to another decision node. This process continues until a leaf node is reached. Our decision tree ends with three categories: glass breaking, human screaming and no abnormality.
The typical human voice band is approximately 300-3400 Hz, while the fundamental frequency of normal speech ranges from about 85-255 Hz. Generally, the female voice has a higher frequency than the male voice. Screaming is generally high-pitched and loud. High pitch corresponds to high frequency, and the louder the sound, the larger the amplitude.
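The two cues named here, loudness as amplitude and pitch as frequency, can both be read off a magnitude spectrum such as the output of (1). A minimal sketch, assuming a one-sided spectrum given as a list of magnitudes and a hypothetical 16-point FFT at 8 kHz; the function name and the toy spectra are illustrative only, not measured data.

```python
def spectral_features(magnitudes, n_fft, sample_rate):
    """Return (peak_amplitude, dominant_frequency_hz) of a one-sided magnitude spectrum.

    magnitudes[k] is |X(k)|; bin k is centred at k * sample_rate / n_fft Hz.
    The DC bin (k = 0) is skipped so a constant offset is not mistaken for pitch.
    """
    if len(magnitudes) < 2:
        raise ValueError("need at least two bins")
    k_peak = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])
    return magnitudes[k_peak], k_peak * sample_rate / n_fft


if __name__ == "__main__":
    # Hypothetical toy spectra: 8 bins of a 16-point FFT at 8 kHz (500 Hz per bin).
    speech = [0.0, 2.0, 5.0, 1.0, 0.5, 0.2, 0.1, 0.0]   # quieter, lower peak
    scream = [0.0, 1.0, 3.0, 6.0, 14.0, 9.0, 2.0, 1.0]  # louder, higher peak
    for name, spec in (("speech", spec_s) if False else (n, s) for n, s in (("speech", speech), ("scream", scream))):
        amp, freq = spectral_features(spec, n_fft=16, sample_rate=8000)
        print(name, amp, freq)
```

With these toy inputs the scream shows both a larger peak amplitude and a higher dominant frequency than the speech frame, matching the two features selected above.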
The classification tree starts at the root node, which recognizes softened (low-amplitude) audio. To mitigate the intensity loss, audio whose magnitude is lower than a defined amplitude threshold is emphasized with a given vector; audio with an acceptable amplitude level proceeds to the next stage without additional processing. The next step filters off spectral components with magnitude less than an amplitude filter threshold. The following node applies a higher frequency threshold to classify glass breaking sounds. Finally, the last node distinguishes human screaming from the no-abnormality scenario.

Fig. 6. Source distance at 120cm from microphone on frequency domain plot.
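The decision path described above (root amplitude check, filtering, then a glass breaking frequency test and a screaming frequency test) can be sketched as a short chain of threshold tests. This is a minimal Python illustration, not the authors' firmware: every threshold value, the scalar pre-emphasis gain standing in for the paper's emphasis vector, and the use of the highest surviving frequency bin as the "maximum frequency" feature are assumptions made for the example.

```python
# Hypothetical thresholds chosen only for illustration (not from the paper).
MIN_AMPLITUDE = 2.0       # frames whose peak is below this are pre-emphasized
PRE_EMPHASIS_GAIN = 4.0   # scalar stand-in for the paper's emphasis vector
FILTER_THRESHOLD = 3.0    # spectral components weaker than this are discarded
GLASS_FREQ_HZ = 3000.0    # higher frequency threshold -> glass breaking
SCREAM_FREQ_HZ = 1000.0   # lower frequency threshold -> screaming


def classify(magnitudes, n_fft, sample_rate):
    """Classify one frame from its one-sided magnitude spectrum."""
    mags = list(magnitudes)
    # Root node: boost soft (e.g. distant) frames before further tests.
    if max(mags) < MIN_AMPLITUDE:
        mags = [m * PRE_EMPHASIS_GAIN for m in mags]
    # Filtering: keep only bins whose magnitude reaches the filter threshold.
    kept = [k for k, m in enumerate(mags) if m >= FILTER_THRESHOLD]
    if not kept:
        return "no abnormality"
    max_freq = max(kept) * sample_rate / n_fft
    # Next node: the higher frequency threshold separates glass breaking.
    if max_freq > GLASS_FREQ_HZ:
        return "glass breaking"
    # Last node: a lower threshold separates screaming from no abnormality.
    if max_freq > SCREAM_FREQ_HZ:
        return "screaming"
    return "no abnormality"


if __name__ == "__main__":
    # Toy spectra: 16-point FFT at 8 kHz, so bin k is centred at 500*k Hz.
    examples = {
        "glass": [0, 1, 2, 3, 4, 5, 6, 8],
        "scream": [0, 1, 3, 6, 14, 9, 2, 1],
        "speech": [0.0, 2.0, 5.0, 1.0, 0.5, 0.2, 0.1, 0.0],
        "distant scream": [0, 0.2, 0.5, 1.0, 1.8, 1.2, 0.3, 0.1],
    }
    for name, spec in examples.items():
        print(name, "->", classify(spec, n_fft=16, sample_rate=8000))
```

The "distant scream" frame only reaches the screaming leaf because the root node emphasizes it first, which is the role the paper gives to pre-emphasis.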
Fig. 7. Source distance at 280cm from microphone on frequency domain plot.

III. HARDWARE AND SYSTEM IMPLEMENTATION

In a human, a sound stimulus is passed to the brain for processing and recognition, and the brain then triggers a reflex action accordingly. This series of steps can be reproduced in an automated machine using intelligent computing. The idea of this paper is to convert human recognition of abnormal sounds into an automatic computing analytic surveillance system, as shown in Fig. 9.

For implementation, two main aspects must be considered: sound acquisition and processing. For sound acquisition, there are two types of set-up: a single microphone or a microphone array [6]. The processing can be carried out on a PC or on a platform as simple as

an embedded system [6]. In this paper, we present an abnormal sound analytical surveillance system built on a microcontroller embedded system. This audio detection system can be added to a current video surveillance system with minimal software modification.

The proposed set-up of our abnormal sound analytical surveillance system consists of a single microphone and a microcontroller embedded system, as shown in Fig. 11. Only three items are required: the proposed microcontroller embedded system (Fig. 10), a microphone, and a connection cable to the CCTV processor. The PIC32MX250F128D was selected as the processor of the proposed system because this Microchip device is widely available on the market and cost-effective for implementation. The processor can easily be programmed using the free development software tools provided by Microchip [11]. A WM8940 codec was used for analog-to-digital conversion (ADC), and a Li-ion battery provides the local power supply. The microcontroller interface includes a 3.5 mm audio jack socket for connecting the sensing microphone and a micro USB port for transmitting the alert signal through UART to the main processor of the CCTV surveillance system. Fig. 10 shows the appearance of our microcontroller embedded system. The proposed embedded system performs sound capturing, feature extraction and classification, and then sends the surveillance result for alert labeling. The algorithm flow of the proposed abnormal sound analytical surveillance system is presented in Fig. 12.

Fig. 9. Conversion from human recognition to computing analytical classification on abnormal sound.

Fig. 10. Microcontroller PIC32MX250F128D embedded system.

Fig. 11. Microcontroller embedded system setup.

Fig. 12. Flow chart for abnormal sound analytical surveillance system (Microphone → Analog Audio Signal Acquisition → Analog Digital Converter (Codec WM8974) → Feature Selection/Extraction (Frequency Domain) → Is minimum amplitude threshold met? NO: Pre-emphasize; YES: Filtering → Classification → Classification Results → UART Transmission).
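Before each FFT, the codec samples are gathered into fixed-size frames (the "FOR frame size" loop in the algorithm listing), and each result is sent over UART. The sketch below illustrates both steps in Python. The frame size, the 0xAA start byte, the one-byte result codes and the function names are hypothetical choices for illustration; the paper does not specify its UART message format.

```python
from typing import Iterable, Iterator, List

# Hypothetical one-byte result codes for the UART alert message (not from the paper).
RESULT_CODES = {"no abnormality": 0x00, "screaming": 0x01, "glass breaking": 0x02}


def frames(samples: Iterable[int], frame_size: int) -> Iterator[List[int]]:
    """Yield consecutive, non-overlapping frames of frame_size samples.

    A trailing partial frame is not emitted; on hardware it would simply
    wait for more samples from the codec.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    buf: List[int] = []
    for s in samples:
        buf.append(s)
        if len(buf) == frame_size:
            yield buf
            buf = []


def uart_message(label: str) -> bytes:
    """Pack a classification label as start byte, result code and newline."""
    return bytes([0xAA, RESULT_CODES[label], 0x0A])


if __name__ == "__main__":
    stream = range(10)                # stand-in for codec samples
    print(list(frames(stream, 4)))    # samples 8 and 9 do not yet fill a frame
    print(uart_message("screaming").hex())
```

A fixed start byte makes it easy for the CCTV processor to resynchronize if a byte is lost, which is one reason such framing is common on simple serial links.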


The proposed abnormal sound analytic algorithm is as follows:

INITIALIZE the system
    SETUP serial port
    SETUP ADC
    INITIALIZE codec WM8974
START surveillance LOOP
    START codec WM8974
    FOR each sample in frame size
        Codec WM8974 reads audio input
    ENDFOR
    Arrange the input signal in an array
    Extract the frequency-domain signal using FFT
    IF Signal Amplitude < Minimum Amplitude Threshold
        Pre-emphasize (amplify) the input signal
    END IF
    Filter off spectral components with magnitude < Amplitude Filter Threshold
    IF Maximum Frequency > Glass Breaking Frequency Threshold
        Classify the signal as Glass Breaking
    ELSE IF Maximum Frequency > Screaming Frequency Threshold
        Classify the signal as Screaming
    ELSE
        Classify the signal as No Abnormality
    END IF
    Transmit classification result through UART
REPEAT surveillance LOOP

IV. EXPERIMENT AND EVALUATION RESULTS

The proposed abnormal sound analytical surveillance system targets indoor environments, so an office was selected as the experiment location. Sound samples were downloaded from a website [12][13] and played back. A total of 50 sounds, covering human screaming, glass breaking and no-abnormality scenarios, were tested on our system. The performance of the proposed system was measured by misdetection, expressed as a percentage. Misdetection is calculated as the number of abnormal sound scenarios that were not detected, as a proportion of the total test sound samples. The no-abnormality sound types were human speech, sneezing, coughing and laughter. Our initial test results are shown in TABLE I. The proposed system detects human screaming better than glass breaking. Overall, the average detection accuracy is 88%.

TABLE I. ASSESSMENT RESULT AT OFFICE

Sound Type          Misdetection Error Rate (%)
Human scream        4
Glass breaking      22
Normal condition    14

Future work remains on the proposed system: performance assessments will be held at different surveillance locations to collect more data supporting the system's detection accuracy.

V. CONCLUSION

In this study, an abnormal sound analytic surveillance system prototype using a microcontroller has been successfully developed. The proposed abnormal sound analytic system is capable of detecting human screaming and glass breaking.

ACKNOWLEDGMENT

We would like to acknowledge CREST for the project funding. We also thank our industry partner Myreka Sdn Bhd for technical support on this project.

REFERENCES

[1] http://onlineapps.epu.gov.my/rmke10/rmke10_english.html
[2] Weimin Huang, Tuan Kiang Chiew, Haizhou Li, Tian Shiang Kok and Jit Biswas, "Scream Detection for Home Applications," IEEE Industrial Electronics and Applications, pp. 2115-2120, 2010.
[3] Chuan-Yu Chang and Yi-Ping Chang, "Application of Abnormal Sound Recognition System for Indoor Environment," IEEE ICICS, pp., 2013.
[4] Cheung-Fat Chan and Eric W.M. Yu, "An Abnormal Sound Detection and Classification System for Surveillance Applications," EURASIP, pp. 1851-1855, 2010.
[5] Burak Uzkent and Buket D. Barkana, "Pitch-Range Based Feature Extraction for Audio Surveillance Systems," IEEE International Conference on Information Technology, pp. 476-480, 2011.
[6] Laurentiu Frangu, Marius Mazarel, Claudiu Chiculita and Silviu Epure, "Embedded System for Audio Source Separation," IEEE Design and Technology in Electronic Packaging, pp. 185-188, 2010.
[7] Nihat Ozturk and Ulvi Unozkan, "Microprocessor Based Voice Recognition System Realization," Application of Information and Communication Technologies, pp. 1-3, 2010.
[8] Clayder Gonzalez-Cadenillas and Nils Murrugarra-Llerene, "Isolated Words Recognition Using a Low Cost Microcontroller," Computing Systems Engineering, pp. 77-82, 2013.
[9] Nitin Kandpal, Yashodhan Mandke and Amit Patwardhan, "Implementation of Voice Recognition in Low Power Microcontroller," IACSIT Hong Kong, pp. 111-115, 2012.
[10] Wang Zhenhai, "Application for Realizing Voice Recording Using MCU," Fuzzy Systems and Knowledge Discovery, pp. 82-85, 2011.
[11] Microchip PIC32MX Family Reference Manual, 2008.
[12] The Freesound Project, http://www.freesound.org, 2015.
[13] www.freesound.org, 2015.
