BARREAU Pierrick
INSA Lyon, Telecommunications Dept.
DCU, MSc in Electronic Commerce
31 août 2012
Summary
1. Context of the study
   1.1 Practicum presentation
   1.2 SensAnalytics: Goals and motivations
   1.3 Market Research: Process and Findings
2. Product specifications
   2.1 Product definition
   2.2 Functional analysis
3. State of the art Overview
   3.1 Baby activity recognition: Characteristics and Challenges
   3.2 Current state-of-the-art
   3.3 Complete solution overview
4. Solution development & optimization
   4.1 Development environment
   4.2 The jAudio library
       4.2.1 Presentation and reliability
       4.2.2 How will we use it?
   4.3 Solution development
       4.3.1 Recording sound with Android
       4.3.2 Signal pre-processing system
       4.3.3 Feature extractors
       4.3.4 Matching function
       4.4.1 Recognition testing
       4.4.2 Performance testing
       4.4.3 Future improvements
5. Experience Feedback
1. Context of the study

1.1 Practicum presentation

Currently pursuing a Master in Electronic Commerce at Dublin City University, I carried out an innovative company-creation project (called the Practicum) over the summer with a team of five people. To fulfil my duties as an INSA Lyon engineering student at the same time, I turned this exercise into a Research & Development project meeting the requirements of both programmes. Before detailing the content of this R&D project, let us introduce the Practicum principles.

Similar to the Innovation project conducted during the fourth year of the Telecommunications department's curriculum, the Practicum's goal is to assess our understanding of the subjects (both technical and business) taught during the year. The outcome is a start-up creation project containing insights into the business aspects (business model and processes, marketing) and a technical mock-up proving the viability of the concept supported by the team. As part of an international team of three business- and two engineering-background students, we developed a start-up called SensAnalytics. I took the roles of quality manager, business analyst and developer, which gave me a complete overview of the R&D process and allowed me to carry out the technical implementation required for my engineering degree. Let us introduce the initial goals and motivations of my team.
1.2 SensAnalytics: Goals and motivations
In our everyday lives, millions of events take place around and inside our bodies. As humans, we naturally capture and interpret some of this data; however, most of it is lost or not understood. Our start-up SensAnalytics seeks to acquire these complex data and turn them into human-readable communication. Willing to establish our product in an original market not yet touched by recent high technologies, we aimed at delivering a product for the baby market. The most promising segment appeared to be baby care, because it is parents' largest expense budget after food. From there, we chose to design a baby monitor, because it is the segment's most technology-related product. Our initial idea was to build a sustainable technological advantage through the use of a heterogeneous Wireless Sensor Network (WSN) gathering complex biometric data in order to conclude on the child's status (health, sleep cycle, emotions, etc.) using machine learning algorithms. However, our idea changed as we considered the market environment.
1.3 Market Research: Process and Findings
To best answer the needs of the parenting market it aims at serving, SensAnalytics' first task was to analyse the worldwide market in order to identify and gather all the positive business drivers that would help establish its products. We then compared users' needs with competitors' offers and designed an offer that best fills the gap between the two.
Figure 1: Market Study Process
In order to put our decisions in context, let us review our key findings:
- 78% of respondents rank security guarantees (reliable communication, medical certifications, etc.) as the most important characteristic of a baby monitor.
- Price is the second determinant factor in the purchase decision, with 82% ranking it over size and simplicity of use.
- 84% of our target market owns a smartphone, a higher penetration rate than in the UK population as a whole (62%).
- 96% have already searched the Internet to check whether their child was developing normally compared to age norms. They also admit feeling stressed by their child's mental and physical development and frustrated by the poor results of a Google search.
- The baby monitor marketplace is crowded and involves big players such as Philips or Motorola. It is difficult to establish a hardware product, as the R&D process involves heavy costs and competitors have already rationalised their production-chain expenses.
- The baby monitor smartphone app market is competitive as well, but with low-quality products and no established big players.
These findings allow us to analyse the market environment and define a product that best bridges the gap between users' needs and current competitors' offers.
2. Product specifications
2.1 Product definition
After analysing the results of the survey and the market analysis, we defined our final solution and opted for a three-step product specification integrating the necessary success factors presented earlier. As illustrated below, these three steps would take place over a three-year plan to continuously develop a sustainable business.
Figure 2: Product specification plan
In the following technical study, we only consider the app-to-app baby monitor. It works with two smartphones: one placed with the child as a monitor, and the other with the parent as a receiver. The monitor detects auditory events, in particular crying and talking, and then notifies the receiver, which alerts parents about activity and allows them to listen to the monitor in real time. Unlike traditional baby-monitoring solutions, our product works over Wi-Fi and 3G, so parents can monitor their child from any location. This first development step thus consists in offering a simple smartphone baby-monitor application with the most important features, to ensure a secure, simple and efficient service that is easily accessible and open to further sophistication in future steps.

Features:
- Simple audio monitoring: the user can listen to his child at any time, anywhere.
- Two-way audio talk: the user can talk to his baby at any time, anywhere, via his smartphone; the audio is played on the other smartphone's speakers.
- Alerts when the baby is awake and crying, with sensitivity control.
- Customizable events associated with actions: if the baby cries, the user can configure the application to automatically play a song or another audio track.
- Sleep cycle analysis via auto-generated tables.
2.2 Functional analysis
External analysis

The functions resulting from the external functional analysis are drawn in the chart below. In order to translate their importance and to give concrete objectives to the development teams, each of them is given a weight along with an indicator of success and a target range the product has to comply with at the end of the development.

Service functions (FS):

N  | Service function            | Weight | Objective indicator(s)    | Range
S1 | Help configuring settings   | 2      | Learning time             | 1-5 min
S2 | Acquire baby data           | 5      | Data loss                 | < 10%
S3 | Recognize baby activity     | 4      | Positive recognition rate | > 70%
   |                             |        | False positive rate       | < 30%
S4 | Trigger actions accordingly | 3      | Nb possible actions       | 5
   |                             |        | Nb possible events        | 5
S5 | Compare with norms          | 4      | Nb milestones info        | 10
   |                             |        | Nb medical info           | 5
   |                             |        | Nb monitored info         | All possible
   |                             |        | Nb comparison indic.      | All possible
   |                             |        | Comparison time           | < 10 s

Constraint functions (FC):

Constraint function                      | Weight | Objective indicator(s) | Range
Intuitive interface                      | 4      | Learning time          | 1-5 min
                                         |        | Language supported     | ENG, FR
Interface accessible everywhere          | 3      | Access supported       | Wi-Fi, 3G, Internet
Need reliability and security guarantees | 3      | Application downtime   | < 48 h / year
                                         |        | Security guarantees    | Interference resilience, battery monitoring
ISO norms                                | 2      |                        | ISO 9001
Intensive test phases                    | 5      |                        | > 3
Smartphone size                          |        |                        | H > 6 cm, W > 6 cm
Toxic products                           |        |                        | 0%
Min acquisition range                    |        |                        | > 50 cm

Constraints related to smartphones:
- Should be hosted on smartphones with good battery life and computation power (weight 4)
- Should use few computation resources (weight 3)
- Should use little battery (weight 3)
- Should always keep top priority (weight 2)
- Should be hosted on a server with good response time and high availability (weight 4)
- Should have enough space to store data (weight 3)

Constraints related to environment:
5.1 Should adapt to background noises (weight 4)
5.2 Should be resilient to interferences (weight 5)
5.3 Should provide good acquisition range (weight 3)
6.1 Cheap app purchasing cost (weight 5)
6.2 Should only use smartphones (weight 3)

Legal constraints:
7.1 Medical data stored anonymously (weight 5)
7.2 Sensitive data are secured (weight 5)
7.3 Customers informed about stored data

Constraints related to the mobile network:
8.1 Should always be connected (weight 4) - Alerts to parents: Yes
8.2 Should ensure alerts are always forwarded (weight 5) - Message loss: < 2%
The Octopus chart resulting from the analysis is presented in Appendix 1.

Internal analysis

Having identified the market requirements and constraints, we then conducted a FAST analysis (see Appendix 2), giving us the different development parts that needed to be considered. The diagram highlights the main function of the system, performed through six service functions. Each of these service functions involves technical functions internal to the system, corresponding to physical solutions.
3.2 Current state-of-the-art
Generally, automatic classification of infant activity is a pattern recognition problem. It comprises two main stages: signal processing and pattern classification [13]. For our product, however, a preliminary stage is added, which consists in detecting infant cries in audio records. Once cry samples are detected and extracted from the audio records, the signal processing and pattern recognition steps can be applied. The signal processing step aims at normalizing, cleaning and filtering the raw signal before using suitable feature extraction techniques to build a vector of relevant values. This vector then serves as input for the classification algorithms, which compare it against norms to decide whether a given activity is recognized. Each step has its own set of technical solutions that can then be combined to form a complete baby activity recognition system. We will analyse the different techniques available at each stage and conclude on the most suitable combination for the product.

3.2.1 Signal Pre-processing
The pre-processing step is about isolating the baby's sound signal by filtering and amplifying it. The challenge here is to design a digital filter which can process the sound in real time without requiring too extensive resources. We opt for a low-pass Finite Impulse Response (FIR) filter cutting at the highest frequency of the baby cry spectrum. As this spectrum ranges from 0.1 to 10 kHz for the fundamental frequency and the formants, we opt for a FIR at 10 kHz with an attenuation of -30 to -50 dB in the stop-band and a ripple of 3 dB in the pass-band. It will filter out the high frequencies coming from mobile networks or household appliances surrounding the baby [4-5]. By computing a time-domain convolution, we end up with a filtered signal. Associated with a peak detector and an amplifier, the resulting sound is then altered to amplify only the frequencies coming from the infant.
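As a rough illustration of this design step, the coefficients of such a low-pass FIR can be generated with the windowed-sinc method and applied by time-domain convolution. The sketch below is ours, not the report's: the class name, coefficient count and sample rate are illustrative placeholders.

```java
// Illustrative sketch: windowed-sinc low-pass FIR design (Hann window)
// plus direct time-domain convolution. Not the report's implementation.
public class FirFilter {
    private final double[] h; // filter coefficients

    public FirFilter(double[] coefficients) { this.h = coefficients; }

    // Design n coefficients (n odd) of a low-pass FIR with cutoff fc (Hz)
    // at sample rate fs (Hz), using the windowed-sinc method.
    public static FirFilter lowPass(int n, double fc, double fs) {
        double[] h = new double[n];
        double omega = 2 * Math.PI * fc / fs; // normalized cutoff in rad/sample
        int mid = n / 2;
        double sum = 0;
        for (int i = 0; i < n; i++) {
            int k = i - mid;
            double sinc = (k == 0) ? omega / Math.PI : Math.sin(omega * k) / (Math.PI * k);
            double hann = 0.5 - 0.5 * Math.cos(2 * Math.PI * i / (n - 1));
            h[i] = sinc * hann;
            sum += h[i];
        }
        for (int i = 0; i < n; i++) h[i] /= sum; // normalize for unity gain at DC
        return new FirFilter(h);
    }

    // y(n) = sum_k h(k) * x(n - k): the time-domain convolution the text mentions.
    public double[] apply(double[] x) {
        double[] y = new double[x.length];
        for (int n = 0; n < x.length; n++)
            for (int k = 0; k < h.length && k <= n; k++)
                y[n] += h[k] * x[n - k];
        return y;
    }

    public double[] coefficients() { return h; }
}
```

In practice the coefficients would be generated offline (the report uses Matlab for this) and only the convolution would run on the phone.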
The audio also has to be sampled at an appropriate frequency in order to reduce the computational complexity while keeping sufficient sound quality for the subsequent cry detection and feature extraction steps. An 8 kHz sampling frequency is generally used for infant speech analysis [14], but 20 kHz sampling with 16-bit quantization has also been used with success by Robb et al. for the determination of the fundamental frequency and formants of baby cries [28]. Both will be tested during the implementation.

3.2.2 Feature extraction
Once the signal has been cleaned, we can study the most important features for baby activity recognition and their extraction techniques. Most techniques found in the literature relate to cry detection. The waking process is well described in theory but has not been addressed by scientists; we suggest our own way to detect it at the end of this section.

3.2.2.1 STE and STZC approach

Because of the physical limitations of human beings, speech analysis systems have to consider short-duration speech segments. Indeed, speech over short time intervals can be considered stationary; overlapping these 10-30 ms segments by half is a method used to reduce the amount of computation needed to analyse the infant cry signal [15]. The combination of two mathematical tools may be used to detect cry events from a pre-processed audio record: the Short-Time Energy (STE) and the Short-Time Zero Crossing (STZC).

Short-Time Energy (STE)

Short-time energy (STE) is defined as the average of the square of the sample values in a suitable window. It can be mathematically described as follows [15]:

E(n) = (1/N) * Σ_m [x(m) * w(n - m)]²
Formula 1: STE formula
where w(m) are the coefficients of a suitable window function of length N. As previously mentioned, short-time processing of speech should take place over segments between 10 and 30 ms in length. For signals sampled at 8 kHz, a window length of 128 samples (which represents a segment of 16 ms) is suitable. STE estimation is useful as a speech detector because there is a noticeable difference in average energy between voiced and unvoiced speech, and between speech and silence [15]. This technique is usually paired with short-time zero crossing for a robust detection scheme.
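A minimal sketch of the STE computation, assuming a rectangular window w(m) = 1 for simplicity (the class and method names are ours, not the report's):

```java
// Short-time energy: average of squared samples over a window.
// Rectangular window w(m) = 1 assumed for simplicity.
public class Ste {
    // Energy of the window of n samples ending at index end (exclusive).
    public static double shortTimeEnergy(double[] x, int end, int n) {
        double e = 0;
        for (int m = end - n; m < end; m++) e += x[m] * x[m];
        return e / n; // average of the squared samples
    }
}
```

With an 8 kHz signal, `n` would be 128 as discussed above.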
Short-time zero crossing (STZC)

Short-time zero crossing (STZC) is defined as the rate at which the signal changes sign. STZC estimation is useful as a speech detector because there are noticeably fewer zero crossings in voiced speech than in unvoiced speech. It can be mathematically described as follows [15]:

Z(n) = (1/2N) * Σ_m |sgn(x(m)) - sgn(x(m - 1))| * w(n - m)

where sgn(x(m)) = 1 if x(m) ≥ 0, and -1 otherwise.

Formula 2: STZC formula
Figure 1 displays the results of short-time signal detection using both STE and STZC tools. STZC makes it possible to envelop the periods when the signal changes sign at a significant rate (identified as speech events), while STE detects significant normalized energy within these envelopes, allowing us to conclude on infant cry events. In order to consistently pick up desired cry events, a desired cry was defined as a voiced segment of sufficiently long duration and sufficiently noticeable STE. We can express it with two quantifiable threshold conditions that both need to be met to constitute a desired cry:
(1) Normalized energy > 0.05: to eliminate non-voiced artefacts and cry precursors (breathing, whimpering).
(2) Signal envelope period > 0.1 seconds: to eliminate impulsive voiced artefacts such as coughing.
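These two threshold conditions, together with a zero-crossing rate estimate, can be sketched as follows. The 0.05 and 0.1 s values come from the text above; the class and method names are illustrative:

```java
// Sketch of the STE/STZC cry-decision logic. Thresholds from the text;
// everything else (names, structure) is illustrative.
public class CryDetector {
    static final double ENERGY_THRESHOLD = 0.05; // normalized energy floor
    static final double MIN_ENVELOPE_S = 0.1;    // minimum envelope period in seconds

    // Zero-crossing rate: fraction of adjacent sample pairs where the sign changes.
    public static double zeroCrossingRate(double[] x) {
        int crossings = 0;
        for (int m = 1; m < x.length; m++)
            if ((x[m] >= 0) != (x[m - 1] >= 0)) crossings++;
        return (double) crossings / (x.length - 1);
    }

    // Both thresholds must be met for a segment to count as a desired cry.
    public static boolean isCry(double normalizedEnergy, double envelopeSeconds) {
        return normalizedEnergy > ENERGY_THRESHOLD && envelopeSeconds > MIN_ENVELOPE_S;
    }
}
```

The duration test rules out coughs, the energy test rules out breathing and whimpering, matching the two examples discussed below.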
Figure 1 (a)
Figure 1 (b)
In Figure 1, each cry envelope is bounded by the STZC and the voiced portion of each cry is bounded by where the STE meets the t = 0 axis. Figure 1(a) contains two false signals where STZC suggests an infant vocalization has occurred. However, there is no significant STE to indicate the presence of a voiced infant cry until the third vocalization. Even though this third vocalization meets the normalized energy threshold of a voiced event, the duration does not meet the minimum time period. This third vocalization was actually a cough.
The STZC in Figure 1(b) suggests that five vocalizations have occurred, four of which meet the criterion for a voiced cry. However, two of these voiced vocalizations are impulsive and of too short a duration, and are thus ruled out as cries by the envelope period threshold. The final vocalization lacks the energy to be analysed as a cry event.

3.2.2.2 Frequency domain approach

Another approach to cry detection is to study the frequency domain of the signal by extracting:
- The vocal fundamental frequency (F0), which is the lowest frequency of the voice waveform.
- The formant frequencies, which indicate the acoustic resonances of the human vocal tract. They are measured as amplitude peaks in the frequency spectrum of the sound (see Figure 5).
- The Mel-Frequency Cepstral Coefficients (MFCC) features, which capture the spectral discriminant of each signal.
To study the spectral domain, a first step is to transform the signal representation from the time domain to the frequency domain using the Discrete Fourier Transform (DFT). This allows picturing the main frequencies of a signal. Our product requires a fast and computation-efficient algorithm to compute the DFT; by reviewing a benchmark of existing Fast Fourier Transform (FFT) algorithms [16], we chose the solution of Pei-Chen et al. [17], as it allows real-time FFT computing using few computational resources. Once the frequency domain of the signal is determined, the fundamental frequency and the formants can be measured using a peak detector, i.e. a function that finds maxima in the value range. To increase the reliability of the detection, some techniques aim at smoothing the signal to help the real maxima stand out. The Smoothed Spectrum Method (SSM) seems the most promising, with an efficiency of 97.99%, against 95.50% for a classical local maximum value detector and 96.86% for cepstrum analysis [23]. The idea is to use a weighted addition to smooth the spectrum and increase the detection reliability.
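As a sketch of the peak-detector step: a plain local-maximum search, preceded by a simple weighted moving average in the spirit of the SSM (the exact SSM weights are not given in the sources cited here; the 0.25/0.5/0.25 weights below are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative peak picking over a magnitude spectrum. The smoothing step
// mimics the SSM's weighted addition; its weights are placeholders.
public class SpectrumPeaks {
    // Indices of strict local maxima in the spectrum rising above a noise floor.
    public static List<Integer> findPeaks(double[] spectrum, double floor) {
        List<Integer> peaks = new ArrayList<>();
        for (int i = 1; i < spectrum.length - 1; i++)
            if (spectrum[i] > floor && spectrum[i] > spectrum[i - 1] && spectrum[i] > spectrum[i + 1])
                peaks.add(i);
        return peaks;
    }

    // Weighted moving average applied before peak picking.
    public static double[] smooth(double[] s) {
        double[] out = s.clone();
        for (int i = 1; i < s.length - 1; i++)
            out[i] = 0.25 * s[i - 1] + 0.5 * s[i] + 0.25 * s[i + 1];
        return out;
    }
}
```

On a cry spectrum, the lowest surviving peak index would correspond to F0 and the following peaks to the formants.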
To determine the MFCCs, we follow the process proposed by Vempada et al. [18]:
- Divide the cry signal into a sequence of frames with a frame size of 20 ms and a shift of 10 ms.
- Apply the Hamming window over each of the frames.
- Compute the magnitude spectrum of each windowed frame by applying the DFT.
- Compute the Mel spectrum by passing the DFT signal through a Mel filter bank.
- Apply the DCT to the log Mel frequency coefficients to derive the desired MFCCs.

The computation of these coefficients is CPU-intensive and is only supported in real time on large, optimized infrastructures. Yet it may provide interesting further developments, as new initiatives to improve the algorithm are under way, and because it allows distinguishing the cry cause among three main types (hunger, pain, wet diaper) with good reliability [18].

3.2.2.3 Rhythmic organisation of the sound

A final approach to cry detection is to consider it as a dynamic signal. The rhythmic organisation analysis of the sound looks at the durations of the infant's noise bursts and pauses. By monitoring the magnitude spectrum of the infant's expiratory sounds over time, an algorithm proposed by Sandford Zeskind et al. [19] tries to find temporal feature correlations among different individuals. However, even if this solution can run in real time without requiring efficient hardware, recent investigations have shown that rhythmic organisation is not yet a reliable indicator for cry detection.

3.2.2.4 Waking detection system

In the literature, the detection of infant waking is mainly addressed by recognizing cries. However, we believe that parents can find value in knowing when their child is awake, not only when they cry, but also to feed or change them.
Current research attempts focus mainly on sleep stage recognition using complex biometric sensors such as electroencephalograms (EEG), accelerometers or Galvanic Skin Response (GSR) [20-21], but no dedicated auditory study of the temporal waking process of an infant can be found. According to Karraker et al. [22], the waking process has some detectable auditory events such as giggles, sheet movements or shocks. These are sudden noises, and thus sudden changes in the signal spectrum. This gave us the idea to monitor the signal spectrum changes over time. When sudden peaks appear in several previously determined frequency ranges (e.g. the voice spectrum) at repeated instants over time, then conclusive evidence of an infant waking can be inferred. To support that idea, one approach would be to compute the power spectral density (PSD) of the signal for every sample window and to keep track of the past PSD values. If a sudden change appears at a specified frequency, a variable is incremented. If, after a number of samples, no other change is detected, the variable is reset to zero. Otherwise, if the variable exceeds a threshold value, the waking activity is recognized. The frequencies and variables involved in this solution will be defined during test sessions with babies, as it is a rather empirical system. The assumptions surrounding this idea will also be further tested with different baby noises and environments before adding it to the customer-facing application.
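The counter-based waking detector described above can be sketched as follows. All thresholds and window parameters are placeholders to be tuned during the test sessions the text mentions:

```java
// Sketch of the empirical waking detector: count sudden PSD changes in a
// monitored band, reset after a quiet period, fire above a count threshold.
// All parameter values are placeholders, not measured ones.
public class WakeDetector {
    private final double changeThreshold; // PSD jump considered "sudden"
    private final int countThreshold;     // changes needed to declare waking
    private final int resetAfter;         // quiet updates before the counter resets
    private double lastPsd = 0;
    private int changes = 0, quiet = 0;

    public WakeDetector(double changeThreshold, int countThreshold, int resetAfter) {
        this.changeThreshold = changeThreshold;
        this.countThreshold = countThreshold;
        this.resetAfter = resetAfter;
    }

    // Feed one PSD estimate for the monitored band; true once waking is inferred.
    public boolean update(double psd) {
        if (Math.abs(psd - lastPsd) > changeThreshold) {
            changes++;
            quiet = 0;
        } else if (++quiet >= resetAfter) {
            changes = 0; // no activity for a while: forget past changes
            quiet = 0;
        }
        lastPsd = psd;
        return changes >= countThreshold;
    }
}
```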
3.2.3 Pattern classification

Once the features have been extracted into a vector of values, there are two main approaches to recognize a pattern from this data:
- A static matching function, which compares the values against known and identified norms, giving a matching score between the signal and an ideal activity-related signal. If the score is greater than a decision threshold, the activity is recognized.
- Machine learning algorithms, which rather than processing the data directly, act as a black box that learns its own classification and regression model from previous outputs and concludes directly on a recognized activity given the vector's position in the data space.

Let us further detail and compare them.

3.2.3.1 Matching functions

The design of a matching function is empirical and involves three decisions that can severely impact its performance. Firstly, different functions can be employed. The simplest, and the one most adapted to our case, is the weighted differential addition (see Formula 3). Given the set of features we have previously determined (normalised energy (STE), signal envelope period (STZC), fundamental frequency (F0) and formants (F1 to Fx)), the function is the weighted sum of differences between the feature values of a given signal and those of an ideal activity signal. If the result of that function is lower than a threshold, the activity is recognized.
w = Σ_n w_n * |v_signal,n - v_norm,n|

Formula 3: Weighted differential addition

with:
w: output
n: number of indicators
w_n: feature weight
v_signal: feature value for the studied signal
v_norm: feature value for the ideal signal
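A sketch of this weighted differential addition as a reusable class (the feature order and the values used in the test are illustrative):

```java
// Sketch of the weighted differential addition matching function.
// Feature order (e.g. STE, envelope period, F0, formants) and all
// numeric values are illustrative.
public class MatchingFunction {
    private final double[] weights;  // one weight per feature
    private final double[] norm;     // feature values of the ideal activity signal
    private final double threshold;  // decision threshold

    public MatchingFunction(double[] weights, double[] norm, double threshold) {
        this.weights = weights;
        this.norm = norm;
        this.threshold = threshold;
    }

    // w = sum_n weights[n] * |v_signal[n] - v_norm[n]|;
    // the activity is recognized when w falls below the threshold.
    public boolean matches(double[] signalFeatures) {
        double w = 0;
        for (int n = 0; n < weights.length; n++)
            w += weights[n] * Math.abs(signalFeatures[n] - norm[n]);
        return w < threshold;
    }
}
```

Exposing the weights and threshold through setters, as the report's implementation does, would then allow tuning the sensitivity at run time.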
Once the function has been defined, the feature weights and the threshold values should be determined. The weights can be attributed considering: the importance of each feature for the activity recognition, its reliability (increased if reliable, lowered otherwise), but also the usual gap range between the signal and the norm, in order to reduce the unwanted impact of a non-determinant feature difference. The threshold value is defined through testing and experiments in order to lower the false positive and false negative rates. Once the matching function has been designed, it can be deployed anywhere. Considering our small set of features, it uses little computational power. Its only drawback is that the weights and threshold values must be determined again every time a new feature is added. However, once the matching function class has been implemented, it can be reused for other functionalities without any further development.

3.2.3.2 Machine learning algorithms

Machine learning is the branch of artificial intelligence that studies and develops architectures and algorithms to equip an agent (a machine, usually a computer) with a certain behaviour and an ability to build internal models from empirical training data in order to solve a given task [27]. Among these algorithms, we distinguish the Support Vector Machine (SVM) and the Neural Network (NN), which are often used for auditory event and activity classification.
Support Vector Machine (SVM)

These algorithms are based on training samples. At each iteration, the SVM is presented with a set of sample feature vectors and their associated activity (e.g. crying / not crying). By processing these examples, the SVM maps them into its internal value space and computes the regression model (e.g. segmenting the space between crying and not-crying activities). Once the SVM is trained, when unmarked feature vectors are given to it, it is able to recognize the pattern in a time- and computation-efficient manner.

Neural Network (NN)
A neural network is a multi-categorical classifier. It is composed of an interconnected, multi-layered set of entities called neurons, where each neuron can be activated, outputting its activity: a level of confidence in the recognition of a pattern. Each neuron is connected to the neurons of the next layer by weighted links.
[Figure 7: Neuron model - inputs y1 ... yn, weighted by w1 ... wn, summed and passed through the firing function φ(.) to produce the output for the next level]
The whole concept relies on the firing function φ(.). When the sum of all inputs multiplied by their associated weights exceeds a certain threshold, the neuron is activated and outputs a value yj, as explained in Figure 7. Thus the decision-making algorithm is the combination of multiple neurons organised in successive layers.
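A single neuron with such a hard-threshold firing function can be sketched as follows (a simplification for illustration; practical networks usually use smooth activations such as the sigmoid):

```java
// Sketch of one neuron as in Figure 7: weighted sum of inputs passed
// through a hard-threshold firing function. Values are illustrative.
public class Neuron {
    private final double[] w;       // link weights w1 ... wn
    private final double threshold; // firing threshold of phi

    public Neuron(double[] w, double threshold) {
        this.w = w;
        this.threshold = threshold;
    }

    // phi: fires (outputs 1) when the weighted input sum exceeds the threshold.
    public double fire(double[] y) {
        double sum = 0;
        for (int i = 0; i < w.length; i++) sum += w[i] * y[i];
        return sum > threshold ? 1.0 : 0.0;
    }
}
```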
3.3 Complete solution overview
When aggregating all these design choices into one solution, we end up with the following Activity recognition system architecture detailed in Figure 8 below.
4. Solution development & optimization

4.1 Development environment
Before starting to develop any application, it is important to install and configure a suitable development environment. Google provides (in addition to the operating system) a set of tools for application development projects. The development environment we use is composed of several layers with specific roles:
- A Java Runtime Environment (JRE)
- The Eclipse development platform
- A Java Development Kit (JDK)
- Modules and libraries related to the project
- An Android Software Development Kit (SDK)
- An Android device
The architecture of the development environment is detailed in the following diagram.
[Diagram: development environment layers - JRE, JDK, Android SDK, Eclipse, test devices]
4.2 The jAudio library
4.2.1 Presentation and reliability

jAudio is a framework for feature extraction designed to eliminate the duplication of effort in calculating features from an audio signal. This system meets the needs of audio processing researchers by providing a library of analysis algorithms suitable for a wide array of sound analysis tasks. It provides an easy-to-use GUI that makes the process of selecting the desired features straightforward, as well as a command-line interface to manipulate its services via scripting. Here is the common process of using jAudio. The system takes a sequence of audio files as input. In the GUI, users select the features they wish to have extracted (letting jAudio take care of all dependency problems) and either execute directly from the GUI or save the settings for batch processing. The output is either an ACE XML file or an ARFF file, depending on the user's preference. In order to address issues related to audio feature extraction, jAudio was designed taking technical specifications into account, and several design decisions were made. Many of these design decisions match our needs for the implementation of the cry detection system presented above:
BARREAU Pierrick, Activity Recognition System for baby monitoring

4.3 Solution development
The principal component of an Android application is the Activity. It is a single, focused thing that the user can do, and the entry point of the SDK. From that central point, one can invoke any object necessary for the application. In order to start the development with a good overview of which objects needed to be implemented, we first drew a UML class diagram. It separates the concerns between four main components:
- The activity recognition system, and how to record sound using the Android SDK.
- The pre-processing system, and how to filter the sound to improve its quality.
- The feature extractors, and how to use jAudio to quickly craft our own extractors.
- The matching functions, and how to adapt weights and thresholds to make them more reliable.
Figure 10: UML class diagram
4.3.1 Recording sound with Android
The AudioRecord object is provided by Android to directly pull sound from any audio source of the smartphone. We configure it to take the microphone as input (MediaRecorder.AudioSource.MIC). As previously stated, we will try two different sampling frequencies. When using the Eclipse emulator (AVD), the sampling frequency is set to 8 kHz, as the emulator cannot support more. When deployed on a real-world smartphone, it is set to 20 kHz. As for the audio encoding, we choose 16-bit quantization (using AudioFormat.ENCODING_PCM_16BIT), as it proved to give good enough results for Robb et al. [28]. Finally, we set the channel configuration to CHANNEL_IN_MONO to effectively pull voice sound from the microphone. Thus we end up with:

audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, 8000,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize);
Figure 11: Audio recording source code
This allows creating a stream to which we can then apply the pre-processing algorithms: noise suppression (via the NoiseSuppressor object), echo cancellation (via the AcousticEchoCanceler object) and signal normalization (via the AutomaticGainControl object). We can then store this stream as an array of shorts in a buffer, in order to forward it to the filtering section and then to the feature extractors. The code used to record sound on Android is provided in Appendix 4.
4.3.2 Signal pre-processing system

In order to further remove the noise coming from the external environment, we filter the signal using a digital FIR at 10 kHz with an attenuation of -30 to -50 dB in the stop-band and a ripple of 3 dB in the pass-band (see 3.2.1). We use Matlab to generate it, with the sptool functionality configured to employ the Hanning window method. This generates a table of coefficients that we store in the Coefficients attribute of the Filter class. To filter the signal, we then just have to implement a convolution algorithm.

4.3.3 Feature extractors
Using the jAudio library, we can directly apply an FFT algorithm (via the FFT object) to the buffer. Then, using the PeakFinder object, we can determine the fundamental frequency and the signal formants. To implement the STE and STZC extractors, we use the FeatureExtractor interface, which provides a set of common methods that fits our project well.

4.3.4 Matching function
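For reference, the short-time energy (STE) and short-time zero-crossing (STZC) features mentioned in the previous section can be computed as below. This is a standalone sketch, independent of jAudio's FeatureExtractor interface; the class and method names are illustrative.

```java
public final class FrameFeatures {
    // Short-time energy: mean of the squared samples over one frame.
    public static double shortTimeEnergy(short[] frame) {
        double sum = 0.0;
        for (short s : frame) {
            sum += (double) s * s;
        }
        return sum / frame.length;
    }

    // Short-time zero crossings: number of sign changes within one frame.
    public static int shortTimeZeroCrossings(short[] frame) {
        int crossings = 0;
        for (int i = 1; i < frame.length; i++) {
            if ((frame[i - 1] >= 0) != (frame[i] >= 0)) {
                crossings++;
            }
        }
        return crossings;
    }
}
```

An alternating frame such as {1, -1, 1, -1} yields a high zero-crossing count, which is what makes STZC useful for separating voiced cries from background noise.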
The implementation of the matching function is straightforward. It is a class whose attributes are the weights, the threshold and the ideal feature values for activity recognition. To be able to tune the sensitivity of the app to the baby's voice characteristics, we define getters and setters for the weights and the threshold. The pattern recognition is performed by the method computeFunction, a simple implementation of the weighted differential addition formula presented in section 3.2.3.1. The source code can be found in Appendix 5.
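The exact formula is the one given in section 3.2.3.1; the sketch below assumes it is a weighted sum of absolute deviations of the measured features from their ideal values, compared against the threshold. The names computeFunction, weights and threshold follow the text; everything else is illustrative.

```java
public class MatchingFunction {
    private double[] weights;
    private double[] idealValues;
    private double threshold;

    public MatchingFunction(double[] weights, double[] idealValues, double threshold) {
        this.weights = weights;
        this.idealValues = idealValues;
        this.threshold = threshold;
    }

    // Setters used to tune the app's sensitivity to the baby's voice.
    public void setWeights(double[] weights) { this.weights = weights; }
    public void setThreshold(double threshold) { this.threshold = threshold; }

    // Weighted differential addition: accumulate weighted absolute deviations
    // from the ideal values, then compare the score to the threshold.
    public boolean computeFunction(double[] features) {
        double score = 0.0;
        for (int i = 0; i < features.length; i++) {
            score += weights[i] * Math.abs(features[i] - idealValues[i]);
        }
        return score < threshold; // below the threshold: activity recognized
    }
}
```

Raising the threshold or lowering a weight makes the recognizer more tolerant to that feature's deviation, which is how the sensitivity tuning described above operates.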
4.4 Testing and optimisation
4.4.1 Recognition rate testing
To test our ARS efficiently, we need a database to benchmark the system against. We first thought of using the Baby Chillanto database [30] and asked the researchers involved at the Instituto Nacional de Astrofisica Optica y Electronica for access to it. However, as our requests went unanswered, we chose to build our own database of baby sounds. The main drawback is that we are unable to provide standard figures to compare with other existing solutions.

To build this collection, we searched online (mainly on sound-sharing platforms such as findsounds.com) and gathered 15 baby sound samples. We then played these sounds near the smartphone microphone in various environments and assessed our system. By constantly refining our weights and threshold, we finally reached a recognition rate of 40% (6 of the 15 samples).

A second optimisation step was to quantify the effect on the recognition rate of the audio effects added during sound acquisition. By successively disabling these extra functionalities, we discovered that the two most important pre-processing effects were the noise suppression and the signal normalisation algorithms. Given those results, we chose to disable the echo cancellation algorithm, saving computation resources in the process.
As stated in the functional analysis, our design goal is a recognition rate of at least 70% for a reliable baby monitor. We are far from this requirement, mainly because the matching function algorithm has shown its limits. As future work, we plan to migrate towards an SVM solution (see 4.4.3).

4.4.2 Performance testing
To test the performance of our application, we use two separate environments:
- The Android Virtual Devices (AVD), which emulate a smartphone on a computer. Directly integrated within Eclipse, they allow testing the application on several platforms without the need to buy them physically. We use this tool to test the application on the Samsung Galaxy Nexus and the Motorola MT870.
- Two real-world smartphones (the HTC Sense and the HTC One). The logs and resource consumption can be viewed directly in the Eclipse feedback console.

This allowed us to see the performance of our application when deployed on a broad range of smartphones. As stated in the functional analysis (see 2.2), the application requires a smartphone CPU rate of at least 1.5 GHz, so we chose the smartphones available as AVDs according to that characteristic. Moreover, our design goal is an application that uses at worst 15% of the CPU. The CPU use of the application depends on the smartphone performance and on the Android version deployed [31]. We therefore benchmarked different smartphones on different Android OS versions.

The performance test results are summarized in the table below. The CPU use percentage shown is the average maximal rate reported by Eclipse during a test session.

Android version   HTC Sense (real)   HTC One (real)   Samsung Galaxy Nexus (AVD)   Motorola MT870 (AVD)
Android 4.1       19.4%              /                20.5%                        /
Android 4.0       /                  /                21.2%                        /
Android 3.2       /                  20.2%            22.4%                        24.3%
Android 2.3.3     /                  /                26.7%                        27.6%
As we can see, the performance design goals are rarely met. However, we looked at possible optimisations. By recording the sound asynchronously, we would save resources: processing is given priority, while recording takes place when resources are free. To do so, we change the implementation of SoundRecorder so that it extends the AsyncTask class. This change only requires implementing the doInBackground() method, which contains the recording code.
For the most recent smartphones and OS versions the design goals are fulfilled, but only barely. Further work needs to be conducted to improve the resource use of the app. A possible future improvement would be to store and share the acquired audio signal in a dynamic buffer.

4.4.3 Future improvements
As the matching function proved limited as a pattern recognition technique, we plan to implement a Support Vector Machine (SVM) integrated in the Android application using the Native Development Kit (NDK). Indeed, the NDK allows programming in C/C++, a language more suitable than Java for implementing this type of solution. The ultimate goal is to implement advanced training algorithms and reach a 60-70% recognition rate. Moreover, we plan to use a memory buffer shared among threads to store the audio signal pulled from the microphone. This would let the recorder store sound continuously and asynchronously while the activity recognition system consumes that data to conclude on a recognised baby state. The goal would then be a CPU use rate lower than 10%.
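The shared buffer described above can be sketched with a standard bounded queue from java.util.concurrent; the recorder thread produces frames while the recognition thread consumes them. The class name SharedAudioBuffer and the frame representation are illustrative, not the actual implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SharedAudioBuffer {
    // Bounded queue of audio frames shared between the recorder (producer)
    // and the activity recognition system (consumer).
    private final BlockingQueue<short[]> frames;

    public SharedAudioBuffer(int capacity) {
        this.frames = new ArrayBlockingQueue<>(capacity);
    }

    // Called by the recorder thread; blocks if the buffer is full.
    public void put(short[] frame) {
        try {
            frames.put(frame);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Called by the recognition thread; blocks until a frame is available.
    public short[] take() {
        try {
            return frames.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new short[0];
        }
    }
}
```

The bounded capacity gives natural back-pressure: if recognition falls behind, the recorder blocks instead of exhausting memory, which matches the low-CPU, low-footprint goal.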
5. Experience Feedback
The practicum aims at bringing to market a focused and adapted product. The technical solution has therefore been constantly reshaped as the project went on, to fit the market needs and the evolutions of the business model. This challenging experience of continuously refining the network architecture and the software to answer evolving user issues, while keeping the whole product coherent, was my first real experience of an R&D process.

Moreover, I had the chance to collaborate with individuals from different backgrounds (business, management, and marketing) and from different countries (France, Ireland, and Spain). This combination of multiple competences, work methodologies and cultures within one team gave an interesting insight into what a real-world international start-up environment can be.

In addition to the technical knowledge I developed throughout the implementation of the solution, I also had the opportunity to help the business team define our key value proposition and business model, and to identify potential future prospects. This complete overview of the development of a project, from both a business and a technical perspective, added an entrepreneurial competence to my resume.

Furthermore, with users' growing will to capture their daily activity and the maturity of wireless body sensor networks, the importance of pattern recognition systems will grow in the upcoming years. Having a strong interest in these technologies, and more particularly in machine learning algorithms, this technological study fits well with my professional career expectations.

Finally, collaborating with the CLARITY research center allowed me to get a first experience of research activity. Our project being supervised by Cathal Gurrin and Alan Smeaton, two major managers of that center, the state-of-the-art and solution definition were performed in a laboratory context.
This master's thesis project thus gave me the opportunity to immerse myself in a highly technological start-up working with multiple important stakeholders of the field.
Appendices
Appendix 1: The Octopus Chart
[Octopus chart: the product linked to its interactors (Parents, Baby, Smartphone, Server, Mobile Network, Physical Environment, Legal Environment, Cost) through the principal function FP, the constraint functions FC1 to FC8, and the service functions FS-S1 to FS-S4 and FS-E1 to FS-E2.]
BARREAU Pierrick, Activity Recognition System for baby monitoring

Appendix 2: FAST diagrams
Appendix 3: Sound recording code
package com.example.sensanalytics;

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import android.annotation.TargetApi;
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.os.AsyncTask;
import android.util.Log;

public class SoundRecorder extends AsyncTask<Void, Integer, Void> {

    private File file;
    private Boolean isRecording;
    private int frequency = 8000;
    private int channelConfiguration = AudioFormat.CHANNEL_IN_MONO;
    private int audioEncoding = AudioFormat.ENCODING_PCM_16BIT;
    private AudioRecord audioRecord;

    public File getFile() { return file; }
    public void setFile(File file) { this.file = file; }
    public Boolean getIsRecording() { return isRecording; }
    public void setIsRecording(Boolean isRecording) { this.isRecording = isRecording; }
            }
            audioRecord.stop();
            audioRecord.release();
            dos.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
References
[1] Bao, L., Intille, S. S., "Activity Recognition from User-Annotated Acceleration Data", in Proceedings of the Second International Conference on Pervasive Computing (PERVASIVE '04), Vienna, Austria, pp. 1-17, 2004.
[2] Stikic, M., Van Laerhoven, K., Schiele, B., "Exploring semi-supervised and active learning for activity recognition", 12th IEEE International Symposium on Wearable Computers, 2008, pp. 81-88.
[3] Wasz-Hockert, O., Lind, J., Vuorenkoski, V., Partanen, T., Valanne, E., The Infant Cry: A Spectrographic and Auditory Analysis, Clinics in Developmental Medicine No. 29, London: Spastics International Publications, 1988.
[4] Clarkson, B., "Extracting context from environmental audio", Digest of Papers, Second International Symposium on Wearable Computers, 1998, pp. 154-155.
[5] Murry, T., "Acoustic and perceptual characteristics of infant cries", in Murry, T., Murry, J. (Eds.), Infant Communication: Cry and Early Speech, TX: College Hill Press, 1980, pp. 251-271.
[6] Wasz-Hockert, O., Michelsson, K., Lind, J., "Twenty-five years of Scandinavian cry research", in Lester, B. M., Boukydis, C. F. Z. (Eds.), Infant Crying: Theoretical and Research Perspectives, Plenum, New York, 1985, pp. 83-104.
[7] Michelsson, K., Sirvio, P., Koivisto, M., Sovijarvi, A., Wasz-Hockert, O., "Spectrographic analysis of pain cry in neonates with cleft palate", Biol. Neonate 26, 1975, pp. 353-358.
[8] Michelsson, K., Sirvio, P., Wasz-Hockert, O., "Sound spectrographic cry analysis of infants with bacterial meningitis", Devel. Med. Child Neurol. 19, 1977, pp. 309-315.
[9] Blinick, G., Travolga, W. N., Antopol, W., "Variations in birth cries of new-born infants from narcotic addicted and normal mothers", Am. J. Obstet. Gynecol. 110, 1971, pp. 948-958.
[10] Cacace, A. T., Robb, M. P., Saxman, J. H., Risemberg, H., Koltai, P., "Acoustic features of normal-hearing pre-term infant cry", International Journal of Pediatric Otorhinolaryngology, Vol. 33, Issue 3, 1995, pp. 213-224.
[11] Murray, A. D., Javel, E., Watson, C. S., "Prognostic validity of auditory brainstem evoked response screening in new-born infants", Am. J. Otolaryngol. 6, 1985, pp. 120-131.
[12] Oller, D. K., Eilers, R. E., Bull, D. H., Carney, A. E., "Prespeech vocalizations of a deaf infant: a comparison with normal metaphonological development", J. Speech Hear. Res. 28, 1985, pp. 47-63.
[13] Saraswathy, J., Hariharan, M., Yaacob, S., Khairunizam, W., "Automatic Classification of Infant Cry: A Review", International Conference on Biomedical Engineering, 2012, pp. 534-549.
[14] Kuo, K., "Feature Extraction and Recognition of Infant Cries", 2010 IEEE International Conference on Electro/Information Technology (EIT), 2010, pp. 1-5.
[15] Kondoz, A. M., Digital Speech, John Wiley & Sons Ltd, West Sussex, England, 2004.