BARREAU Pierrick
INSA Lyon, Telecommunications Dept.
DCU, MSc in Electronic Commerce
31 août 2012
Summary
1. Context of the study
   1.1 Practicum presentation
   1.2 SensAnalytics: Goals and motivations
   1.3 Market Research: Process and Findings
2. Product specifications
   2.1 Product definition
   2.2 Functional analysis
3. State of the art Overview
   3.1 Baby activity recognition: Characteristics and Challenges
   3.2 Current state-of-the-art
   3.3 Complete solution overview
4. Solution development & optimization
   4.1 Development environment
   4.2 The jAudio library
       4.2.1 Presentation and reliability
       4.2.2 How will we use it?
   4.3 Solution development
       4.3.1 Recording sound with Android
       4.3.2 Signal pre-processing system
       4.3.3 Feature extractors
       4.3.4 Matching function
       4.4.1 Recognition testing
       4.4.2 Performance testing
       4.4.3 Future improvements
5. Experience Feedback
1. Context of the study

1.1 Practicum presentation

Currently pursuing a Master in Electronic Commerce at Dublin City University, I carried out an innovative company-creation project (called the Practicum) over the summer with a team of five people. To fulfil my duties as an INSA Lyon engineering student at the same time, I turned this exercise into a Research & Development project meeting the requirements of both programmes. Before detailing the content of this R&D project, let us introduce the Practicum principles.

Similar to the Innovation project conducted during the fourth year of the Telecommunications department's curriculum, the Practicum's goal is to assess our understanding of the subjects (both technical and business) taught during the year. The outcome is a start-up creation project containing insights into the business aspects (business model and processes, marketing) and a technical mock-up proving the viability of the concept supported by the team. As part of an international team of three business- and two engineering-background students, we developed a start-up called SensAnalytics. I took the roles of quality manager, business analyst and developer, which gave me a complete overview of the R&D process and allowed me to carry out the technical implementation required for my engineering degree. Let us introduce the initial goals and motivations of my team.
1.2 SensAnalytics: Goals and motivations
In our everyday lives, millions of events take place around and inside our bodies. As humans, we naturally capture and interpret some of this data; however, most of it is lost or not understood. Our start-up SensAnalytics seeks to acquire these complex data and turn them into human-readable communication. Willing to establish our product in an original market not yet touched by recent high technologies, we aimed at delivering a product for the baby market. The most promising segment appeared to be baby care, because it is parents' largest expense budget after food. From there, we chose to design a baby monitor, because it is the segment's most technology-related product. Our initial idea was to build a sustainable technological advantage through the use of a heterogeneous Wireless Sensor Network (WSN) gathering complex biometric data in order to conclude on the child's status (health, sleep cycle, emotions, etc.) using machine learning algorithms. However, our idea changed as we considered the market environment.
1.3 Market Research: Process and Findings
To best answer the needs of the parenting market it aims at serving, SensAnalytics' first task was to analyse the worldwide market in order to identify and gather all the positive business drivers that would help establish its products. We then compared users' needs with competitors' offers and designed an offer that best fills the gap between the two.
Figure 1: Market Study Process
In order to put our decisions in context, let us review our key findings:
- 78% of respondents rank security guarantees (reliable communication, medical certifications, etc.) as the most important characteristic of a baby monitor.
- Price is the second determinant factor in the purchase decision, with 82% ranking it over size and simplicity of use.
- 84% of our target market owns a smartphone, a higher penetration rate than in the UK population as a whole (62%).
- 96% have already searched the Internet to check whether their child was developing normally compared to age norms. They also admit feeling stressed by their child's mental and physical development and frustrated by the poor results of a Google search.
- The baby monitor marketplace is crowded and involves big players such as Philips or Motorola. It is difficult to establish a hardware product, as the R&D process involves heavy costs and competitors have already rationalised their production-chain expenses.
- The baby monitor smartphone app market is competitive as well, but with low-quality products and no established big players.
These findings allow us to analyse the market environment and define a product that best bridges the gap between users' needs and current competitors' offers.
2. Product specifications
2.1 Product definition
After analysing the results of the survey and the market analysis, we defined our final solution and opted for a three-step product specification integrating the necessary success factors presented earlier. As illustrated below, these three steps would take place over a three-year plan to continuously develop a sustainable business.
Figure 2: Product specification plan
In the following technical study, we only consider the app-to-app baby monitor. It works with two smartphones: one placed with the child as a monitor, and the other with the parent as a receiver. The monitor detects auditory events, in particular crying and talking, and then notifies the receiver, which alerts parents about activity and allows them to listen to the monitor in real time. Unlike traditional baby-monitoring solutions, our product works over Wi-Fi and 3G, so parents can monitor their child from any location. This first development step thus consists in offering a simple smartphone baby-monitor application with the most important features, to ensure a secure, simple and efficient service that is easily accessible and open to further sophistication in future steps.

Features:
- Simple audio monitoring: the user can listen to his child at any time, anywhere.
- Two-way audio talk: the user can talk to his baby at any time, anywhere, via his smartphone; the audio is played on the other smartphone's speakers.
- Alerts when the baby is awake and crying, with sensitivity control.
- Customizable events associated with actions: if the baby cries, the user can configure the application to automatically play a song or another audio track.
- Sleep cycle analysis via auto-generated tables.
2.2 Functional analysis
External analysis

The functions resulting from the external functional analysis are drawn in the chart below. In order to translate their importance and to give concrete objectives to the development teams, each of them is given a weight along with an indicator of success and a target range the product has to comply with at the end of the development.

Service functions (FS):

N  | Service function            | Weight | Objective indicator(s)    | Range
S1 | Help configuring settings   | 2      | Learning time             | 1-5 min
S2 | Acquire baby data           | 5      | Data loss                 | < 10%
S3 | Recognize baby activity     | 4      | Positive recognition rate | > 70%
   |                             |        | False positive rate       | < 30%
S4 | Trigger actions accordingly | 3      | Nb possible actions       | 5
   |                             |        | Nb possible events        | 5
S5 | Compare with norms          | 4      | Nb milestones info        | 10
   |                             |        | Nb medical info           | 5
   |                             |        | Nb monitored info         | All possible
   |                             |        | Nb comparison indic.      | All possible
   |                             |        | Comparison time           | < 10 s

Constraint functions (FC):

Constraint function                      | Weight | Objective indicator(s) | Range
Intuitive interface                      | 4      | Learning time          | 1-5 min
                                         |        | Language supported     | ENG, FR
Interface accessible everywhere          | 3      | Access supported       | Wi-Fi, 3G, Internet
Need reliability and security guarantees | 3      | Application downtime   | < 48 h / year
                                         |        | Security guarantees    | Interference resilience, battery monitoring
ISO norms                                | 2      |                        | ISO 9001
Intensive test phases                    | 5      |                        | > 3
Smartphone size                          |        |                        | H > 6 cm, W > 6 cm
Toxic products                           |        |                        | 0%
Min acquisition range                    |        |                        | > 50 cm

Constraints related to smartphones:
- Should be hosted on smartphones with good battery life and computation power (weight 4)
- Should use few computation resources (weight 3)
- Should use little battery (weight 3)
- Should always keep top priority (weight 2)
- Should be hosted on a server with good response time and high availability (weight 4)
- Should have enough space to store data (weight 3)

Constraints related to environment:
5.1 Should adapt to background noises (weight 4)
5.2 Should be resilient to interferences (weight 5)
5.3 Should provide good acquisition range (weight 3)
6.1 Cheap app purchasing cost (weight 5)
6.2 Should only use smartphones (weight 3)

Legal constraints:
7.1 Medical data stored anonymously (weight 5)
7.2 Sensitive data are secured (weight 5)
7.3 Customers informed about stored data

Constraints related to the mobile network:
8.1 Should always be connected (weight 4) - Alerts to parents: Yes
8.2 Should ensure alerts are always forwarded (weight 5) - Message loss: < 2%
The Octopus chart resulting from the analysis is presented in Appendix 1.

Internal analysis

Having identified the market requirements and constraints, we then conducted a FAST analysis (see Appendix 2), giving us the different development parts that needed to be considered. The diagram highlights the main function of the system, performed through six service functions. Each of these service functions involves technical functions internal to the system, corresponding to physical solutions.
3.2 Current state-of-the-art
Generally, automatic classification of infant activity is a pattern recognition problem. It comprises two main stages: signal processing and pattern classification [13]. For our product, however, a preliminary stage is added, which consists in detecting infant cries in audio records. Once cry samples are detected and extracted from the audio records, the signal processing and pattern recognition steps can be applied. The signal processing step aims at normalizing, cleaning and filtering the raw signal before using suitable feature extraction techniques to build a vector of relevant values. This vector then serves as input for the classification algorithms, which compare it against norms to decide whether a given activity is recognized. Each step has its own set of technical solutions that can then be combined to form a complete baby activity recognition system. We will analyse the different techniques available at each stage and conclude on the most suitable combination for the product.

3.2.1 Signal Pre-processing
The pre-processing step is about isolating the baby's sound signal by filtering and amplifying it. The challenge here is to design a digital filter which can process the sound in real time without requiring too extensive resources. We opt for a low-pass Finite Impulse Response (FIR) filter cutting at the highest frequency of the baby cry spectrum. As this spectrum ranges from 0.1 to 10 kHz for the fundamental frequency and the formants, we opt for a FIR at 10 kHz with an attenuation of -30 to -50 dB in the stop-band and a ripple of 3 dB in the pass-band. It will filter out the high frequencies coming from mobile networks or household appliances surrounding the baby [4-5]. By computing a time-domain convolution, we end up with a filtered signal. Associated with a peak detector and an amplifier, the resulting sound is then altered to amplify only the frequencies coming from the infant.
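As a rough illustration of this design step, the coefficients of such a low-pass FIR can be generated with the windowed-sinc method and applied by time-domain convolution. The sketch below is ours, not the report's: the class name, coefficient count and sample rate are illustrative placeholders.

```java
// Illustrative sketch: windowed-sinc low-pass FIR design (Hann window)
// plus direct time-domain convolution. Not the report's implementation.
public class FirFilter {
    private final double[] h; // filter coefficients

    public FirFilter(double[] coefficients) { this.h = coefficients; }

    // Design n coefficients (n odd) of a low-pass FIR with cutoff fc (Hz)
    // at sample rate fs (Hz), using the windowed-sinc method.
    public static FirFilter lowPass(int n, double fc, double fs) {
        double[] h = new double[n];
        double omega = 2 * Math.PI * fc / fs; // normalized cutoff in rad/sample
        int mid = n / 2;
        double sum = 0;
        for (int i = 0; i < n; i++) {
            int k = i - mid;
            double sinc = (k == 0) ? omega / Math.PI : Math.sin(omega * k) / (Math.PI * k);
            double hann = 0.5 - 0.5 * Math.cos(2 * Math.PI * i / (n - 1));
            h[i] = sinc * hann;
            sum += h[i];
        }
        for (int i = 0; i < n; i++) h[i] /= sum; // normalize for unity gain at DC
        return new FirFilter(h);
    }

    // y(n) = sum_k h(k) * x(n - k): the time-domain convolution the text mentions.
    public double[] apply(double[] x) {
        double[] y = new double[x.length];
        for (int n = 0; n < x.length; n++)
            for (int k = 0; k < h.length && k <= n; k++)
                y[n] += h[k] * x[n - k];
        return y;
    }

    public double[] coefficients() { return h; }
}
```

In practice the coefficients would be generated offline (the report uses Matlab for this) and only the convolution would run on the phone.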
The audio also has to be sampled at an appropriate frequency in order to reduce the computational complexity while keeping sufficient sound quality for the subsequent cry detection and feature extraction steps. An 8 kHz sampling frequency is generally used for infant speech analysis [14], but 20 kHz sampling with 16-bit quantization has also been used with success by Robb et al. for the determination of the fundamental frequency and formants of baby cries [28]. Both will be tested during the implementation.

3.2.2 Feature extraction
Once the signal has been cleaned, we can study the most important features for baby activity recognition and their extraction techniques. Most techniques found in the literature relate to cry detection. The waking process is well described in theory but has not been addressed by scientists; we suggest our own way to detect it at the end of this section.

3.2.2.1 STE and STZC approach

Because of the physical limitations of human beings, speech analysis systems have to consider short-duration speech segments. Indeed, speech over short time intervals can be considered stationary; overlapping these 10-30 ms segments by half is a method used to reduce the amount of computation needed to analyse the infant cry signal [15]. The combination of two mathematical tools may be used to detect cry events from a pre-processed audio record: the Short-Time Energy (STE) and the Short-Time Zero Crossing (STZC).

Short-Time Energy (STE)

Short-time energy (STE) is defined as the average of the square of the sample values in a suitable window. It can be mathematically described as follows [15]:

E(n) = (1/N) * Σ_m [x(m) * w(n - m)]²
Formula 1: STE formula
where w(m) are the coefficients of a suitable window function of length N. As previously mentioned, short-time processing of speech should take place over segments between 10 and 30 ms in length. For signals sampled at 8 kHz, a window length of 128 samples (which represents a segment of 16 ms) is suitable. STE estimation is useful as a speech detector because there is a noticeable difference in average energy between voiced and unvoiced speech, and between speech and silence [15]. This technique is usually paired with short-time zero crossing for a robust detection scheme.
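A minimal sketch of the STE computation, assuming a rectangular window w(m) = 1 for simplicity (the class and method names are ours, not the report's):

```java
// Short-time energy: average of squared samples over a window.
// Rectangular window w(m) = 1 assumed for simplicity.
public class Ste {
    // Energy of the window of n samples ending at index end (exclusive).
    public static double shortTimeEnergy(double[] x, int end, int n) {
        double e = 0;
        for (int m = end - n; m < end; m++) e += x[m] * x[m];
        return e / n; // average of the squared samples
    }
}
```

With an 8 kHz signal, `n` would be 128 as discussed above.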
Short-time zero crossing (STZC)

Short-time zero crossing (STZC) is defined as the rate at which the signal changes sign. STZC estimation is useful as a speech detector because there are noticeably fewer zero crossings in voiced speech than in unvoiced speech. It can be mathematically described as follows [15]:

Z(n) = (1/2N) * Σ_m |sgn(x(m)) - sgn(x(m - 1))| * w(n - m)

where sgn(x(m)) = 1 if x(m) ≥ 0, and -1 otherwise.

Formula 2: STZC formula
Figure 1 displays the results of short-time signal detection using both STE and STZC tools. STZC makes it possible to envelop the periods when the signal changes sign at a significant rate (identified as speech events), while STE detects significant normalized energy within these envelopes, allowing us to conclude on infant cry events. In order to consistently pick up desired cry events, a desired cry was defined as a voiced segment of sufficiently long duration and sufficiently noticeable STE. We can express it with two quantifiable threshold conditions that both need to be met to constitute a desired cry:
(1) Normalized energy > 0.05: to eliminate non-voiced artefacts and cry precursors (breathing, whimpering).
(2) Signal envelope period > 0.1 seconds: to eliminate impulsive voiced artefacts such as coughing.
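These two threshold conditions, together with a zero-crossing rate estimate, can be sketched as follows. The 0.05 and 0.1 s values come from the text above; the class and method names are illustrative:

```java
// Sketch of the STE/STZC cry-decision logic. Thresholds from the text;
// everything else (names, structure) is illustrative.
public class CryDetector {
    static final double ENERGY_THRESHOLD = 0.05; // normalized energy floor
    static final double MIN_ENVELOPE_S = 0.1;    // minimum envelope period in seconds

    // Zero-crossing rate: fraction of adjacent sample pairs where the sign changes.
    public static double zeroCrossingRate(double[] x) {
        int crossings = 0;
        for (int m = 1; m < x.length; m++)
            if ((x[m] >= 0) != (x[m - 1] >= 0)) crossings++;
        return (double) crossings / (x.length - 1);
    }

    // Both thresholds must be met for a segment to count as a desired cry.
    public static boolean isCry(double normalizedEnergy, double envelopeSeconds) {
        return normalizedEnergy > ENERGY_THRESHOLD && envelopeSeconds > MIN_ENVELOPE_S;
    }
}
```

The duration test rules out coughs, the energy test rules out breathing and whimpering, matching the two examples discussed below.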
Figure 1 (a)
Figure 1 (b)
In Figure 1, each cry envelope is bounded by the STZC and the voiced portion of each cry is bounded by where the STE meets the t = 0 axis. Figure 1(a) contains two false signals where STZC suggests an infant vocalization has occurred. However, there is no significant STE to indicate the presence of a voiced infant cry until the third vocalization. Even though this third vocalization meets the normalized energy threshold of a voiced event, the duration does not meet the minimum time period. This third vocalization was actually a cough.
The STZC in Figure 1(b) suggests that five vocalizations have occurred, four of which meet the criterion for a voiced cry. However, two of these voiced vocalizations are impulsive and of too short a duration, and are thus ruled out as cries by the envelope period threshold. The final vocalization lacks the energy to be analysed as a cry event.

3.2.2.2 Frequency domain approach

Another approach to cry detection is to study the frequency domain of the signal by extracting:
- The vocal fundamental frequency (F0), which is the lowest frequency of the voice waveform.
- The formant frequencies, which indicate the acoustic resonances of the human vocal tract. They are measured as amplitude peaks in the frequency spectrum of the sound (see Figure 5).
- The Mel-Frequency Cepstral Coefficients (MFCC) features, which capture the spectral discriminant of each signal.
To study the spectral domain, a first step is to transform the signal representation from the time domain to the frequency domain using the Discrete Fourier Transform (DFT). This allows picturing the main frequencies of a signal. Our product requires a fast and computation-efficient algorithm to compute the DFT; by reviewing a benchmark of existing Fast Fourier Transform (FFT) algorithms [16], we chose the solution of Pei-Chen et al. [17], as it allows real-time FFT computing using few computational resources. Once the frequency domain of the signal is determined, the fundamental frequency and the formants can be measured using a peak detector, i.e. a function that finds maxima in the value range. To increase the reliability of the detection, some techniques aim at smoothing the signal to help the real maxima stand out. The Smoothed Spectrum Method (SSM) seems the most promising, with an efficiency of 97.99%, against 95.50% for a classical local maximum value detector and 96.86% for cepstrum analysis [23]. The idea is to use a weighted addition to smooth the spectrum and increase the detection reliability.
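As a sketch of the peak-detector step: a plain local-maximum search, preceded by a simple weighted moving average in the spirit of the SSM (the exact SSM weights are not given in the sources cited here; the 0.25/0.5/0.25 weights below are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative peak picking over a magnitude spectrum. The smoothing step
// mimics the SSM's weighted addition; its weights are placeholders.
public class SpectrumPeaks {
    // Indices of strict local maxima in the spectrum rising above a noise floor.
    public static List<Integer> findPeaks(double[] spectrum, double floor) {
        List<Integer> peaks = new ArrayList<>();
        for (int i = 1; i < spectrum.length - 1; i++)
            if (spectrum[i] > floor && spectrum[i] > spectrum[i - 1] && spectrum[i] > spectrum[i + 1])
                peaks.add(i);
        return peaks;
    }

    // Weighted moving average applied before peak picking.
    public static double[] smooth(double[] s) {
        double[] out = s.clone();
        for (int i = 1; i < s.length - 1; i++)
            out[i] = 0.25 * s[i - 1] + 0.5 * s[i] + 0.25 * s[i + 1];
        return out;
    }
}
```

On a cry spectrum, the lowest surviving peak index would correspond to F0 and the following peaks to the formants.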
To determine the MFCCs, we follow the process proposed by Vempada et al. [18]:
- Divide the cry signal into a sequence of frames with a frame size of 20 ms and a shift of 10 ms.
- Apply the Hamming window over each of the frames.
- Compute the magnitude spectrum of each windowed frame by applying the DFT.
- Compute the Mel spectrum by passing the DFT signal through a Mel filter bank.
- Apply the DCT to the log Mel frequency coefficients to derive the desired MFCCs.

The computation of these coefficients is CPU-intensive and is only supported in real time on large, optimized infrastructures. Yet it may provide interesting further developments, as new initiatives to improve the algorithm are under way, and because it allows distinguishing the cry cause among three main types (hunger, pain, wet diaper) with good reliability [18].

3.2.2.3 Rhythmic organisation of the sound

A final approach to cry detection is to consider it as a dynamic signal. The rhythmic organisation analysis of the sound looks at the durations of the infant's noise bursts and pauses. By monitoring the magnitude spectrum of the infant's expiratory sounds over time, an algorithm proposed by Sandford Zeskind et al. [19] tries to find temporal feature correlations among different individuals. However, even if this solution can run in real time without requiring efficient hardware, recent investigations have shown that rhythmic organisation is not yet a reliable indicator for cry detection.

3.2.2.4 Waking detection system

In the literature, the detection of infant waking is mainly addressed by recognizing cries. However, we believe that parents can find value in knowing when their child is awake, not only when they cry, but also to feed or change them.
Current research attempts focus mainly on sleep stage recognition using complex biometric sensors such as electroencephalograms (EEG), accelerometers or Galvanic Skin Response (GSR) [20-21], but no dedicated auditory study of the temporal waking process of an infant can be found. According to Karraker et al. [22], the waking process has some detectable auditory events such as giggles, sheet movements or shocks. These are sudden noises, and thus sudden changes in the signal spectrum. This gave us the idea to monitor the signal spectrum changes over time. When sudden peaks appear in several previously determined frequency ranges (e.g. the voice spectrum) at repeated instants over time, then conclusive evidence of an infant waking can be inferred. To support that idea, one approach would be to compute the power spectral density (PSD) of the signal for every sample window and to keep track of the past PSD values. If a sudden change appears at a specified frequency, a variable is incremented. If, after a number of samples, no other change is detected, the variable is reset to zero. Otherwise, if the variable exceeds a threshold value, the waking activity is recognized. The frequencies and variables involved in this solution will be defined during test sessions with babies, as it is a rather empirical system. The assumptions surrounding this idea will also be further tested with different baby noises and environments before adding it to the customer-facing application.
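The counter-based waking detector described above can be sketched as follows. All thresholds and window parameters are placeholders to be tuned during the test sessions the text mentions:

```java
// Sketch of the empirical waking detector: count sudden PSD changes in a
// monitored band, reset after a quiet period, fire above a count threshold.
// All parameter values are placeholders, not measured ones.
public class WakeDetector {
    private final double changeThreshold; // PSD jump considered "sudden"
    private final int countThreshold;     // changes needed to declare waking
    private final int resetAfter;         // quiet updates before the counter resets
    private double lastPsd = 0;
    private int changes = 0, quiet = 0;

    public WakeDetector(double changeThreshold, int countThreshold, int resetAfter) {
        this.changeThreshold = changeThreshold;
        this.countThreshold = countThreshold;
        this.resetAfter = resetAfter;
    }

    // Feed one PSD estimate for the monitored band; true once waking is inferred.
    public boolean update(double psd) {
        if (Math.abs(psd - lastPsd) > changeThreshold) {
            changes++;
            quiet = 0;
        } else if (++quiet >= resetAfter) {
            changes = 0; // no activity for a while: forget past changes
            quiet = 0;
        }
        lastPsd = psd;
        return changes >= countThreshold;
    }
}
```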
3.2.3 Pattern classification

Once the features have been extracted into a vector of values, there are two main approaches to recognize a pattern from this data:
- A static matching function, which compares the values against known and identified norms, giving a matching score between the signal and an ideal activity-related signal. If the score is greater than a decision threshold, the activity is recognized.
- Machine learning algorithms, which rather than processing the data directly, act as a black box that learns its own classification and regression model from previous outputs and concludes directly on a recognized activity given the vector's position in the data space.

Let us further detail and compare them.

3.2.3.1 Matching functions

The design of a matching function is empirical and involves three decisions that can severely impact its performance. Firstly, different functions can be employed. The simplest, and the one most adapted to our case, is the weighted differential addition (see Formula 3). Given the set of features we have previously determined (normalised energy (STE), signal envelope period (STZC), fundamental frequency (F0) and formants (F1 to Fx)), the function is the weighted sum of differences between the feature values of a given signal and those of an ideal activity signal. If the result of that function is lower than a threshold, the activity is recognized.
w = Σ_n w_n * |v_signal,n - v_norm,n|

Formula 3: Weighted differential addition

with:
w: output
n: number of indicators
w_n: feature weight
v_signal: feature value for the studied signal
v_norm: feature value for the ideal signal
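A sketch of this weighted differential addition as a reusable class (the feature order and the values used in the test are illustrative):

```java
// Sketch of the weighted differential addition matching function.
// Feature order (e.g. STE, envelope period, F0, formants) and all
// numeric values are illustrative.
public class MatchingFunction {
    private final double[] weights;  // one weight per feature
    private final double[] norm;     // feature values of the ideal activity signal
    private final double threshold;  // decision threshold

    public MatchingFunction(double[] weights, double[] norm, double threshold) {
        this.weights = weights;
        this.norm = norm;
        this.threshold = threshold;
    }

    // w = sum_n weights[n] * |v_signal[n] - v_norm[n]|;
    // the activity is recognized when w falls below the threshold.
    public boolean matches(double[] signalFeatures) {
        double w = 0;
        for (int n = 0; n < weights.length; n++)
            w += weights[n] * Math.abs(signalFeatures[n] - norm[n]);
        return w < threshold;
    }
}
```

Exposing the weights and threshold through setters, as the report's implementation does, would then allow tuning the sensitivity at run time.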
Once the function has been defined, the feature weights and the threshold values should be determined. The weights can be attributed considering: the importance of each feature for the activity recognition, its reliability (increased if reliable, lowered otherwise), but also the usual gap range between the signal and the norm, in order to reduce the unwanted impact of a non-determinant feature difference. The threshold value is defined through testing and experiments in order to lower the false positive and false negative rates. Once the matching function has been designed, it can be deployed anywhere. Considering our small set of features, it uses little computational power. Its only drawback is that the weights and threshold values must be determined again every time a new feature is added. However, once the matching function class has been implemented, it can be reused for other functionalities without any further development.

3.2.3.2 Machine learning algorithms

Machine learning is the branch of artificial intelligence that studies and develops architectures and algorithms to equip an agent (a machine, usually a computer) with a certain behaviour and an ability to build internal models from empirical training data in order to solve a given task [27]. Among these algorithms, we distinguish the Support Vector Machine (SVM) and the Neural Network (NN), which are often used for auditory event and activity classification.
Support Vector Machine (SVM)

These algorithms are based on training samples. At each iteration, the SVM is presented with a set of sample feature vectors and their associated activity (e.g. crying / not crying). By processing these examples, the SVM maps them into its internal value space and computes the regression model (e.g. segmenting the space between crying and not-crying activities). Once the SVM is trained, when unmarked feature vectors are given to it, it is able to recognize the pattern in a time- and computation-efficient manner.

Neural Network (NN)
A neural network is a multi-categorical classifier. It is composed of an interconnected, multi-layered set of entities called neurons, where each neuron can be activated, outputting its activity: a level of confidence in the recognition of a pattern. Each neuron is connected to the neurons of the next layer by weighted links.
[Figure 7: Neuron model - inputs y1 ... yn, weighted by w1 ... wn, summed and passed through the firing function φ(.) to produce the output for the next level]
The whole concept relies on the firing function φ(.). When the sum of all inputs multiplied by their associated weights exceeds a certain threshold, the neuron is activated and outputs a value yj, as explained in Figure 7. Thus the decision-making algorithm is the combination of multiple neurons organised in successive layers.
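A single neuron with such a hard-threshold firing function can be sketched as follows (a simplification for illustration; practical networks usually use smooth activations such as the sigmoid):

```java
// Sketch of one neuron as in Figure 7: weighted sum of inputs passed
// through a hard-threshold firing function. Values are illustrative.
public class Neuron {
    private final double[] w;       // link weights w1 ... wn
    private final double threshold; // firing threshold of phi

    public Neuron(double[] w, double threshold) {
        this.w = w;
        this.threshold = threshold;
    }

    // phi: fires (outputs 1) when the weighted input sum exceeds the threshold.
    public double fire(double[] y) {
        double sum = 0;
        for (int i = 0; i < w.length; i++) sum += w[i] * y[i];
        return sum > threshold ? 1.0 : 0.0;
    }
}
```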
3.3 Complete solution overview
When aggregating all these design choices into one solution, we end up with the following Activity recognition system architecture detailed in Figure 8 below.
4. Solution development & optimization

4.1 Development environment
Before starting to develop any application, it is important to install and configure a suitable development environment. Google provides (in addition to the operating system) a set of tools for application development projects. The development environment we use is composed of several layers with specific roles:
- A Java Runtime Environment (JRE)
- The Eclipse development platform
- A Java Development Kit (JDK)
- Modules and libraries related to the project
- An Android Software Development Kit (SDK)
- An Android device
The architecture of the development environment is detailed in the following diagram.
[Diagram: development environment layers - JRE, JDK, Android SDK, Eclipse, test devices]
4.2 The jAudio library
4.2.1 Presentation and reliability

jAudio is a framework for feature extraction designed to eliminate the duplication of effort in calculating features from an audio signal. This system meets the needs of audio processing researchers by providing a library of analysis algorithms suitable for a wide array of sound analysis tasks. It provides an easy-to-use GUI that makes the process of selecting the desired features straightforward, as well as a command-line interface to manipulate its services via scripting. Here is the common process of using jAudio. The system takes a sequence of audio files as input. In the GUI, users select the features they wish to have extracted (letting jAudio take care of all dependency problems) and either execute directly from the GUI or save the settings for batch processing. The output is either an ACE XML file or an ARFF file, depending on the user's preference. In order to address issues related to audio feature extraction, jAudio was designed taking technical specifications into account, and several design decisions were made. Many of these design decisions match our needs for the implementation of the cry detection system presented above:
BARREAU Pierrick, Activity Recognition System for baby monitoring

4.3 Solution development
The principal component of an Android application is the Activity. It is a single, focused thing that the user can do, and the entry point of the SDK. From that central point, one can invoke any object necessary for the application. In order to start the development with a good overview of which objects needed to be implemented, we first drew a UML class diagram. It separates the concerns between four main components:
- The activity recognition system, and how to record sound using the Android SDK.
- The pre-processing system, and how to filter the sound to improve its quality.
- The feature extractors, and how to use jAudio to quickly craft our own extractors.
- The matching functions, and how to adapt weights and thresholds to make them more reliable.
Figure 10: UML class diagram
4.3.1 Recording sound with Android
The AudioRecord object is provided by Android to directly pull sound from any audio source of the smartphone. We configure it to take the microphone as input (MediaRecorder.AudioSource.MIC). As previously stated, we will try two different sampling frequencies. When using the Eclipse emulator (AVD), the sampling frequency is set to 8 kHz, as the emulator cannot support more. When deployed on a real-world smartphone, it is set to 20 kHz. As for the audio encoding, we choose 16-bit quantization (using AudioFormat.ENCODING_PCM_16BIT), as it proved to give good enough results for Robb et al. [28]. Finally, we set the channel configuration to CHANNEL_IN_MONO to effectively pull voice sound from the microphone. Thus we end up with:

audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, 8000,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize);
Figure 11: Audio recording source code
This allows creating a stream to which we can then apply the pre-processing algorithms: noise suppression (via the NoiseSuppressor object), echo cancellation (via the AcousticEchoCanceler object) and signal normalization (via the AutomaticGainControl object). We can then store this stream as an array of shorts in a buffer, in order to forward it to the filtering section and then to the feature extractors. The code used to record sound on Android is provided in Appendix 4.
4.3.2 Signal pre-processing system

In order to further remove the noise coming from the external environment, we filter the signal using a digital FIR at 10 kHz with an attenuation of -30 to -50 dB in the stop-band and a ripple of 3 dB in the pass-band (see 3.2.1). We use Matlab to generate it, with the sptool functionality configured to employ the Hanning window method. This generates a table of coefficients that we store in the Coefficients attribute of the Filter class. To filter the signal, we then just have to implement a convolution algorithm.

4.3.3 Feature extractors
Using the jAudio library, we can directly apply an FFT algorithm (via the FFT object) to the buffer. Then, using the PeakFinder object, we can determine the fundamental frequency and the signal formants. To implement the STE and STZC extractors, we use the FeatureExtractor interface, which provides a set of common methods that fits our project well.

4.3.4 Matching function
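For reference, the short-time energy (STE) and short-time zero-crossing (STZC) features mentioned in the previous section can be computed as below. This is a standalone sketch, independent of jAudio's FeatureExtractor interface; the class and method names are illustrative.

```java
public final class FrameFeatures {
    // Short-time energy: mean of the squared samples over one frame.
    public static double shortTimeEnergy(short[] frame) {
        double sum = 0.0;
        for (short s : frame) {
            sum += (double) s * s;
        }
        return sum / frame.length;
    }

    // Short-time zero crossings: number of sign changes within one frame.
    public static int shortTimeZeroCrossings(short[] frame) {
        int crossings = 0;
        for (int i = 1; i < frame.length; i++) {
            if ((frame[i - 1] >= 0) != (frame[i] >= 0)) {
                crossings++;
            }
        }
        return crossings;
    }
}
```

An alternating frame such as {1, -1, 1, -1} yields a high zero-crossing count, which is what makes STZC useful for separating voiced cries from background noise.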
The implementation of the matching function is straightforward. It is a class whose attributes are the weights, the threshold and the ideal feature values for activity recognition. To be able to tune the sensitivity of the app to the baby's voice characteristics, we define getters and setters for the weights and the threshold. The pattern recognition is performed by the method computeFunction, a simple implementation of the weighted differential addition formula presented in section 3.2.3.1. The source code can be found in Appendix 5.
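The exact formula is the one given in section 3.2.3.1; the sketch below assumes it is a weighted sum of absolute deviations of the measured features from their ideal values, compared against the threshold. The names computeFunction, weights and threshold follow the text; everything else is illustrative.

```java
public class MatchingFunction {
    private double[] weights;
    private double[] idealValues;
    private double threshold;

    public MatchingFunction(double[] weights, double[] idealValues, double threshold) {
        this.weights = weights;
        this.idealValues = idealValues;
        this.threshold = threshold;
    }

    // Setters used to tune the app's sensitivity to the baby's voice.
    public void setWeights(double[] weights) { this.weights = weights; }
    public void setThreshold(double threshold) { this.threshold = threshold; }

    // Weighted differential addition: accumulate weighted absolute deviations
    // from the ideal values, then compare the score to the threshold.
    public boolean computeFunction(double[] features) {
        double score = 0.0;
        for (int i = 0; i < features.length; i++) {
            score += weights[i] * Math.abs(features[i] - idealValues[i]);
        }
        return score < threshold; // below the threshold: activity recognized
    }
}
```

Raising the threshold or lowering a weight makes the recognizer more tolerant to that feature's deviation, which is how the sensitivity tuning described above operates.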
4.4 Testing and optimisation
4.4.1 Recognition rate testing
To test our ARS efficiently, we need a database to benchmark the system against. We first thought of using the Baby Chillanto database [30] and asked the researchers involved at the Instituto Nacional de Astrofisica Optica y Electronica for access to it. However, as our requests went unanswered, we chose to build our own database of baby sounds. The main drawback is that we are unable to provide standard figures to compare with other existing solutions.

To build this collection, we searched online (mainly on sound-sharing platforms such as findsounds.com) and gathered 15 baby sound samples. We then played these sounds near the smartphone microphone in various environments and assessed our system. By constantly refining our weights and threshold, we finally reached a recognition rate of 40% (6 of the 15 samples).

A second optimisation step was to quantify the effect on the recognition rate of the audio effects added during sound acquisition. By successively disabling these extra functionalities, we discovered that the two most important pre-processing effects were the noise suppression and the signal normalisation algorithms. Given those results, we chose to disable the echo cancellation algorithm, saving computation resources in the process.
As stated in the functional analysis, our design goal is a recognition rate of at least 70% for a reliable baby monitor. We are far from this requirement, mainly because the matching function algorithm has shown its limits. As future work, we plan to migrate towards an SVM solution (see 4.4.3).

4.4.2 Performance testing
To test the performance of our application, we use two separate environments:
- The Android Virtual Devices (AVD), which emulate a smartphone on a computer. Directly integrated within Eclipse, they allow testing the application on several platforms without the need to buy them physically. We use this tool to test the application on the Samsung Galaxy Nexus and the Motorola MT870.
- Two real-world smartphones (the HTC Sense and the HTC One). The logs and resource consumption can be viewed directly in the Eclipse feedback console.

This allowed us to see the performance of our application when deployed on a broad range of smartphones. As stated in the functional analysis (see 2.2), the application requires a smartphone CPU rate of at least 1.5 GHz, so we chose the smartphones available as AVDs according to that characteristic. Moreover, our design goal is an application that uses at worst 15% of the CPU. The CPU use of the application depends on the smartphone performance and on the Android version deployed [31]. We therefore benchmarked different smartphones on different Android OS versions.

The performance test results are summarized in the table below. The CPU use percentage shown is the average maximal rate reported by Eclipse during a test session.

Android version   HTC Sense (real)   HTC One (real)   Samsung Galaxy Nexus (AVD)   Motorola MT870 (AVD)
Android 4.1       19.4%              /                20.5%                        /
Android 4.0       /                  /                21.2%                        /
Android 3.2       /                  20.2%            22.4%                        24.3%
Android 2.3.3     /                  /                26.7%                        27.6%
As we can see, the performance design goals are rarely met. However, we looked at possible optimisations. By recording the sound asynchronously, we would save resources: processing is given priority, while recording takes place when resources are free. To do so, we change the implementation of SoundRecorder so that it extends the AsyncTask class. This change only requires implementing the doInBackground() method, which contains the recording code.
For the most recent smartphones and OS versions the design goals are fulfilled, but only barely. Further work needs to be conducted to improve the resource use of the app. A possible future improvement would be to store and share the acquired audio signal in a dynamic buffer.

4.4.3 Future improvements
As the matching function proved limited as a pattern recognition technique, we plan to implement a Support Vector Machine (SVM) integrated in the Android application using the Native Development Kit (NDK). Indeed, the NDK allows programming in C/C++, a language more suitable than Java for implementing this type of solution. The ultimate goal is to implement advanced training algorithms and reach a 60-70% recognition rate. Moreover, we plan to use a memory buffer shared among threads to store the audio signal pulled from the microphone. This would let the recorder store sound continuously and asynchronously while the activity recognition system consumes that data to conclude on a recognised baby state. The goal would then be a CPU use rate lower than 10%.
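The shared buffer described above can be sketched with a standard bounded queue from java.util.concurrent; the recorder thread produces frames while the recognition thread consumes them. The class name SharedAudioBuffer and the frame representation are illustrative, not the actual implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SharedAudioBuffer {
    // Bounded queue of audio frames shared between the recorder (producer)
    // and the activity recognition system (consumer).
    private final BlockingQueue<short[]> frames;

    public SharedAudioBuffer(int capacity) {
        this.frames = new ArrayBlockingQueue<>(capacity);
    }

    // Called by the recorder thread; blocks if the buffer is full.
    public void put(short[] frame) {
        try {
            frames.put(frame);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Called by the recognition thread; blocks until a frame is available.
    public short[] take() {
        try {
            return frames.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new short[0];
        }
    }
}
```

The bounded capacity gives natural back-pressure: if recognition falls behind, the recorder blocks instead of exhausting memory, which matches the low-CPU, low-footprint goal.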
5. Experience Feedback
The practicum aims at bringing to market a focused and adapted product. The technical solution has therefore been constantly reshaped as the project went on, to fit the market needs and the evolutions of the business model. This challenging experience of continuously refining the network architecture and the software to answer evolving user issues, while keeping the whole product coherent, was my first real experience of an R&D process.

Moreover, I had the chance to collaborate with individuals from different backgrounds (business, management, and marketing) and from different countries (France, Ireland, and Spain). This combination of multiple competences, work methodologies and cultures within one team gave an interesting insight into what a real-world international start-up environment can be.

In addition to the technical knowledge I developed throughout the implementation of the solution, I also had the opportunity to help the business team define our key value proposition and business model, and to identify potential future prospects. This complete overview of the development of a project, from both a business and a technical perspective, added an entrepreneurial competence to my resume.

Furthermore, with users' growing will to capture their daily activity and the maturity of wireless body sensor networks, the importance of pattern recognition systems will grow in the upcoming years. Having a strong interest in these technologies, and more particularly in machine learning algorithms, this technological study fits well with my professional career expectations.

Finally, collaborating with the CLARITY research center allowed me to get a first experience of research activity. Our project being supervised by Cathal Gurrin and Alan Smeaton, two major managers of that center, the state-of-the-art and solution definition were performed in a laboratory context.
This master's thesis project thus gave me the opportunity to immerse myself in a highly technological start-up working with multiple important stakeholders of the field.
Appendices
Appendix 1: The Octopus Chart
[Octopus chart: the product linked to its interactors (Parents, Baby, Smartphone, Server, Mobile Network, Physical Environment, Legal Environment, Cost) through the principal function FP, the constraint functions FC1 to FC8, and the service functions FS-S1 to FS-S4 and FS-E1 to FS-E2.]
BARREAU Pierrick, Activity Recognition System for baby monitoring

Appendix 2: FAST diagrams
Appendix 3: Sound recording code
package com.example.sensanalytics;

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import android.annotation.TargetApi;
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.os.AsyncTask;
import android.util.Log;

public class SoundRecorder extends AsyncTask<Void, Integer, Void> {

    private File file;
    private Boolean isRecording;
    private int frequency = 8000;
    private int channelConfiguration = AudioFormat.CHANNEL_IN_MONO;
    private int audioEncoding = AudioFormat.ENCODING_PCM_16BIT;
    private AudioRecord audioRecord;

    public File getFile() { return file; }
    public void setFile(File file) { this.file = file; }
    public Boolean getIsRecording() { return isRecording; }
    public void setIsRecording(Boolean isRecording) { this.isRecording = isRecording; }
            }
            audioRecord.stop();
            audioRecord.release();
            dos.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
References
[1] Bao, L., Intille, S. S., "Activity Recognition from User-Annotated Acceleration Data", in Proceedings of the Second International Conference on Pervasive Computing (PERVASIVE '04), Vienna, Austria, pp. 1-17, 2004.
[2] Stikic, M., Van Laerhoven, K., Schiele, B., "Exploring semi-supervised and active learning for activity recognition", 12th IEEE International Symposium on Wearable Computers, 2008, pp. 81-88.
[3] Wasz-Hockert, O., Lind, J., Vuorenkoski, V., Partanen, T., Valanne, E., The Infant Cry: A Spectrographic and Auditory Analysis, Clinics in Developmental Medicine No. 29, London: Spastics International Publications, 1988.
[4] Clarkson, B., "Extracting context from environmental audio", Digest of Papers, Second International Symposium on Wearable Computers, 1998, pp. 154-155.
[5] Murry, T., "Acoustic and perceptual characteristics of infant cries", in Murry, T., Murry, J. (Eds.), Infant Communication: Cry and Early Speech, TX: College Hill Press, 1980, pp. 251-271.
[6] Wasz-Hockert, O., Michelsson, K., Lind, J., "Twenty-five years of Scandinavian cry research", in Lester, B. M., Boukydis, C. F. Z. (Eds.), Infant Crying: Theoretical and Research Perspectives, Plenum, New York, 1985, pp. 83-104.
[7] Michelsson, K., Sirvio, P., Koivisto, M., Sovijarvi, A., Wasz-Hockert, O., "Spectrographic analysis of pain cry in neonates with cleft palate", Biol. Neonate 26, 1975, pp. 353-358.
[8] Michelsson, K., Sirvio, P., Wasz-Hockert, O., "Sound spectrographic cry analysis of infants with bacterial meningitis", Devel. Med. Child Neurol. 19, 1977, pp. 309-315.
[9] Blinick, G., Travolga, W. N., Antopol, W., "Variations in birth cries of new-born infants from narcotic addicted and normal mothers", Am. J. Obstet. Gynecol. 110, 1971, pp. 948-958.
[10] Cacace, A. T., Robb, M. P., Saxman, J. H., Risemberg, H., Koltai, P., "Acoustic features of normal-hearing pre-term infant cry", International Journal of Pediatric Otorhinolaryngology, Vol. 33, Issue 3, 1995, pp. 213-224.
[11] Murray, A. D., Javel, E., Watson, C. S., "Prognostic validity of auditory brainstem evoked response screening in new-born infants", Am. J. Otolaryngol. 6, 1985, pp. 120-131.
[12] Oller, D. K., Eilers, R. E., Bull, D. H., Carney, A. E., "Prespeech vocalizations of a deaf infant: a comparison with normal metaphonological development", J. Speech Hear. Res. 28, 1985, pp. 47-63.
[13] Saraswathy, J., Hariharan, M., Yaacob, S., Khairunizam, W., "Automatic Classification of Infant Cry: A Review", International Conference on Biomedical Engineering, 2012, pp. 534-549.
[14] Kuo, K., "Feature Extraction and Recognition of Infant Cries", 2010 IEEE International Conference on Electro/Information Technology (EIT), 2010, pp. 1-5.
[15] Kondoz, A. M., Digital Speech, John Wiley & Sons Ltd, West Sussex, England, 2004.