
A real-time algorithm for acoustic monitoring of ingestive
behaviour of grazing cattle

Abstract
Estimating forage intake and monitoring the behaviour of grazing livestock are difficult tasks. Detection and classification of events such as chew, bite and chewbite are very useful for estimating that information. It is well known that acoustic monitoring is the best way to quantify and classify events of ruminant feeding behaviour. However, existing methods fail to be computationally efficient. In this work, we present an acoustic analysis system that works in real time and automatically detects and classifies the ingestive events of grazing cattle. The system employs a directional wide-frequency microphone facing inward on the forehead of the animal, signal analysis, and decision logic to detect, classify and measure ingestive events. The system measures acoustic parameters of ingestive events, such as duration, amplitude, shape and energy, which can support further event classification and become the inputs to a forage intake model. The performance of the algorithm was assessed by comparing the automatic labelling produced by our method with reference labelling made by an expert in animal behaviour. For testing and validation purposes, experiments with two different databases were conducted. For both databases we obtained a detection rate of 98% for events without differentiation between chew, bite and chewbite. Using the first database, the algorithm parameters were tuned and tested over different signals of 5 minutes duration, obtaining a recognition rate of 83% for event classification. Over the second database, two types of forage were analysed: alfalfa and fescue, each one at two different heights: tall (24.5 ± 3.8 cm) and short (11.6 ± 1.9 cm). Average recognition rates were 79% for tall alfalfa, 73% for short alfalfa, 78% for tall fescue and 77% for short fescue. These results are similar to the ones reported with other algorithms. However, the proposed method has the additional advantages of linear computational complexity and low computational cost, which make its implementation possible in low-cost embedded systems for real-time execution. Additionally, the output of the system is a small file containing only the results of the analysis, which allows its transmission through a wifi network or its storage in the device.

Keywords: Acoustic monitoring; Grazing cattle behaviour; Jaw movement classification; Signal processing; Real-time operation.

1. Introduction
Accurate monitoring of the diet and feeding behaviour of grazing cattle is necessary to ensure the health and welfare of these animals, which in turn results in a greater quantity and better quality of livestock-derived products (meat and dairy). Many efforts have been put into finding the most appropriate technique to address this problem, but the success of these techniques has been limited by several factors (Ungar, 1996; Delegarde, 1999). One possible way of monitoring ruminant feeding is through the detection of the three most common events of grazing activity: bite, chew and chewbite (Milone et al., 2012). Biting includes the apprehension and severing of forage, while chewing includes the crushing and processing of the forage. There is also another important event, called chewbite, which results from the superposition of chewing and biting performed with the same jaw movement. The detection and classification of these three types of events is necessary for accurate monitoring of the diet. The quantification of chews provides important information not only on the ruminal fermentation of fibre, but is also related to the rumen pH (Sauvant, 2000). Bites and chewbites are activities that incorporate matter into the animal's organism, the latter also including a pre-processing of the food (Laca and WallisDeVries, 2000). While the number and characteristics of these events within the same activity may vary according to several factors, they should stay within certain limit values to ensure the good health of the ruminant (De Boever et al., 1990). To the best of our knowledge, only a few authors have developed algorithms able to detect and classify these events, because of the difficulty of differentiating them, especially in noisy environments.
One approach for studying the grazing ​behaviour uses ​acoustic monitoring. Alkon and Cohen
(1986) and Alkon, Cohen, and Jordan (1989) suggested the use of ​acoustic biotelemetry to study
wild ​animal ​behaviour​, and successfully identified various foraging actions in porcupines. Laca et
al. (1992) used ​acoustic monitoring to study the ​short​-term grazing ​behaviour of cattle, by
mounting an inward-facing microphone on the forehead of the animal, because bone conduction
and the oral cavity tunnels deliver much stronger ​chew and bite ​sound intensities at the skull
surface than into the air space surrounding the head of the animal. The clarity of the ​signal thus
obtained left no doubt that this method detected all jaw movements that performed a bite or a
chew​. The ripping ​sound of a bite and the grinding ​sound of a ​chew were readily distinguishable,
and the ​acoustic method was considered to be much more reliable and efficient for counting bites
than direct visual observation. Acoustic monitoring has been used since then to study grazing
behaviour of domesticated herbivores in the context of ​short and controlled experiments intended
to study basic questions regarding ingestive ​behaviour​. All reported applications have involved
fresh herbage as the vegetation; several studies obtained encouraging ​results regarding estimation
of intake on the basis of ​acoustic variables (Galli et al., 2006; Galli, Cangiano, Milone, & Laca,
2011; Laca & Wallis ​De​ Vries, 2000).
In order to develop suitable algorithms to ​recognize the jaw movements from ingestive sounds,
Milone et al. (2009) used concepts from the field of automatic speech recognition to develop an
algorithm based on Hidden Markov ​Models (HMM) capable of identifying and classifying ​chew​,
bite, and compound chewbite actions in sheep. The algorithm achieved average recognition rates of 89%, 58%, and 56% for chew, bite and chewbite, respectively. Subsequently, Galli
et al. (2011) used this algorithm to demonstrate the possibility of using ​acoustic variables to
estimate dry matter intake in grazing sheep. Milone et al. (2012) developed an algorithm extending
the use of HMMs to ​recognize the three types of ingestive sounds of cattle (​chew​, bite and
chewbite​) in alfalfa (tall and ​short​) and fescue (tall and ​short​). They obtained recognition rates of
84% and 65% for tall and ​short alfalfa, respectively, and 85% and 84% for tall and ​short fescue,
respectively.
Clapham et al. (2011) developed an automated bite-detection system for cattle grazing mixed perennial pasture. The algorithm automatically detects only bite events for cows eating different types of pastures, reaching a recognition rate of about 95%. They calibrated and used the SIGNAL sound analysis program for Windows (developed by the Engineering Design company) for the detection of such events. According to the authors, the software processed signals approximately ten times faster than real time. All recordings were made at high quality (44.1 kHz sampling rate and 16-bit resolution). A high-pass filter with a cutoff frequency of 600 Hz (to attenuate signals such as wind) was applied, because they focussed on event detection in the band from 17 kHz to 22 kHz. They encountered some difficulties in trying to scale the experiments to several days of monitoring, including the capacity of storage devices and energy sources. The method seems to require a careful calibration before it can be used in each experimental condition.
Navon et al. (2012) proposed a new algorithm based on the analysis of ​time domain features
(shape, intensity, ​duration and sequence) of the recorded ​sound for detection of jaw movements.
It does not require calibration and it uses a machine-learning approach to separate true
jaw-movement sounds from background and intense spurious noises. The algorithm ​performance
was tested in three field studies by comparing automatic ​labeling generated by the algorithm with
reference ​labeling made by an expert in animal behaviour. For cattle grazing green pasture in a
low noise environment with a Lavalier microphone positioned on the forehead, the system
achieved similar ​results to previous works (94% correct identification and a false positive rate of
7%).
Recently, Tani et al. (2013) developed a monitoring system using a single-axis accelerometer instead of microphones. The proposed algorithm uses signal processing and pattern recognition techniques to iteratively estimate the eating and ruminating events from the recorded signal. The patterns employed by the recognition algorithm are defined in the frequency domain and are used to identify and classify the events. The algorithm achieves results similar to previous works (Clapham et al., 2011; Navon et al., 2012) without calibration; however, three problems related to the signal's properties arise for this algorithm: i) the spectral similarity of rumination and eating events could lead to poor results when raw signals with a poor signal/noise ratio are analysed; ii) the non-stationary nature of the noises (background noise and/or unexpected noises) makes their statistical properties change over time; and iii) the spectral estimation of signals sampled at high frequency requires a high computational power, which makes its implementation on portable embedded systems difficult. The idea of using an accelerometer for acoustic monitoring is interesting, but it has intrinsic errors that should be analysed for a more reliable recording. The authors performed these experiments in a stall, while admitting that it would be interesting to conduct such experiments on grazing cattle, where sensor attachment and drift may be important issues.
Although several previous studies have shown good performance in detecting events using acoustic monitoring, few of them have performed an event classification and none has analysed the computational complexity of its implementation. An analysis of complexity is necessary since the storage and transfer of high-quality, long-duration (several hours or days) data has many practical limitations. In this sense, it would be preferable to have a real-time analysis tool that stores the results directly, with the lowest computational cost possible. The aim of this work was to develop an algorithm for automatic identification and classification of jaw movements (chew, bite and chewbite) at a low computational cost. This idea points towards the need for an automatic real-time system that can be implemented on portable embedded systems, which is the reason why the computational cost must be low. Outdoor environments inevitably introduce some level of background noise into a recording, and it can be variable and unpredictable. We aimed to deal with commonly encountered levels of such noise, combining passive isolation (directional microphones and isolation material that could be easily acquired) with basic signal processing.
The paper is ​organised as follows: ​Section 2 introduces the algorithm and its implementation. Also
its computational ​complexity is ​analysed in order to show its ability to operate in ​real time​. In
Section 3, the two databases and the performance measures used are described. In Section 4,
results obtained from the application of proposed algorithm are presented and ​analysed​. A
discussion about ​results is presented in ​Section 5. Finally, a brief conclusion is presented in
Section 6.

2. ​The​ algorithm
The design goal of the algorithm was to achieve good performance in the detection and classification of events at a low computational cost, allowing its real-time execution on portable embedded systems. One of the adopted criteria was to work in the time domain instead of a transformed domain (frequency, time-frequency), to avoid the high computational load of signal analysis and machine-learning tools.

2.1 ​General description


When the animal is feeding, it moves the jaw to perform two activities: biting (Figure 1c), when forage is apprehended and cut, and chewing (Figure 1a), when forage is crushed to reduce the size of the particles and increase the surface/volume ratio. The combined movement, chewbite, is the combination of both (Figure 1b). The feeding behaviour can therefore be defined in terms of a temporal sequence of bites, chews and chewbites. Each of these actions produces sounds whose pitch and timbre are given by the ingested material, while the duration and intensity of the sound are determined by the jaw movement. The sound produced by a bite is characterized by a high intensity and a short duration, resembling a tap. The sound produced by a chew is characterized by a small amplitude and a duration similar to that of a bite, resembling a chirp. As a combination of the previous two, the chewbite has features of both and a longer duration. All the events (i.e., chew, bite and chewbite) show a peak in the intensity of the sound envelope that allows the detection of the jaw movement but not its classification.

Figure 1. Examples of signals of typical acoustic events produced by jaw movements and their corresponding features for: a) chew, b) chewbite and c) bite. In each row, top-down: acoustic signal; intensity envelope; sign of the envelope slope; maximum intensity; duration threshold.

Therefore, we isolated three properties of the sound envelope that, together, discriminate sounds produced by jaw movements with high accuracy:
● Shape: The start and end of a jaw movement are accompanied by changes in the slope of the intensity of the sound envelope (Figure 1). The sign of the envelope slope changes once or twice for chews and bites, while it changes more than twice for a combined chewbite. Therefore, the number of changes in the sign of the envelope slope can be used as a measure of the event shape;
● Intensity: Jaw movements produce sounds whose intensity changes over time (Figure 1). The maximum intensity remains constant for a chew (low intensity) and a bite (high intensity), while it changes (from low to high intensity) for a combined chewbite;
● Duration: A jaw movement has a defined and limited duration that is characteristic of chews and bites (Figure 1). The durations of chews and bites are similar to each other and shorter than that of a combined chewbite.
These properties were fundamental to the design of the algorithm, which can be clearly divided into two successive tasks:
1. Event detection: in this stage the algorithm detects the region of the sound envelope that shows the occurrence of a possible jaw movement. The detection is carried out through the identification of characteristic peaks in the sound envelope using an adaptive threshold;
2. Event classification: in this stage the algorithm uses the features of the sound envelope to classify the detected event using a simple set of rules (Table 1). Classification is carried out through the analysis and comparison of the shape, intensity and duration of the detected event against given thresholds.
For its implementation, the algorithm can be thought of as a set of five successive stages. The event detection task comprises the first four stages, while the event classification task is performed during the last stage. In the following paragraphs we briefly describe how the detection and classification algorithms were implemented.

Figure 2. Signals generated during the stages of the algorithm for 15 seconds of sound: a) sound signal, b) sound envelope computation, c) peak detection, d) shape, e) maximum amplitude and f) duration of events.

Stage 1 - Envelope computation: One of the key elements of this algorithm is the analysis of the sound envelope to detect and classify the events. In the first step, the absolute value of the signal elements is computed. Then, the resulting signal is filtered using a second-order low-pass Butterworth filter with a bandwidth of 5.5 Hz. The sound envelope of each event shows a longer period (around 0.5 s) than the original sound (Figure 1), so the second step is to subsample the sound envelope from its original sampling frequency down to 100 Hz. The main effect of this task is to reduce the computational requirements (load and time) of the subsequent tasks, since it reduces the amount of information to be processed without losing accuracy in the detection and classification results.
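A minimal sketch of this stage is given below. It is illustrative Python (the evaluation implementation reported later in the paper was written in MATLAB), and the function and variable names (compute_envelope, x, fs, env_fs) are ours, not the authors':

    import numpy as np
    from scipy.signal import butter, lfilter

    def compute_envelope(x, fs, cutoff_hz=5.5, env_fs=100):
        """Rectify, low-pass filter and subsample a sound signal x sampled at fs Hz."""
        rectified = np.abs(x)                             # absolute value of the signal
        b, a = butter(2, cutoff_hz, btype='low', fs=fs)   # 2nd-order Butterworth, 5.5 Hz
        envelope = lfilter(b, a, rectified)               # causal filtering, real-time friendly
        step = int(round(fs / env_fs))                    # subsample the envelope to ~100 Hz
        return envelope[::step]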
Stage 2 - Division of sound into segments: Short segments are easier to handle given computational resource constraints, and their use facilitates the treatment of unexpected events that need special attention. Such events include intense external noises of short duration and constant background noises. The size of the segments depends on the resources available in the computer employed to implement the algorithm. In a common desktop computer, segments can have a typical size of 30 s or longer. On the other hand, in an embedded computer the segments should be shorter due to its limited computing capacity, depending on the amount of memory available (minimum size of 2 s for analysis).
Stage 3 - Peak detection: The presence of peaks in the sound envelope reveals the possible target events. Each peak is detected as a change in the derivative of the envelope. However, to be considered a possible event it must be higher than given thresholds. The peaks are detected through the comparison of the sound envelope with a time-varying threshold T(k) (red dashed line in Figure 2.c). This threshold is generated by an algorithm that takes into account the following features given by the anatomical and behavioural characteristics of the animal: i) there is a minimum period of time between two consecutive jaw movements, and ii) the duration of jaw movements is constrained to a maximum period within a continuous activity (ruminating or grazing).
Taking into account these features of the jaw movements to be monitored, the peak detection algorithm generates the time-varying threshold T(k) with the following features (Christov, 2004):
● An unresponsive period (T_U): a period of time after detecting an event in which the algorithm does not search for a new event. It is computed for each event as a fraction α (0 < α < 1) of the average duration of the last five events detected.
● A maximum period (T_M): the maximum time that an event can last within the same activity. It is computed for each event as β ≥ 1 times the average duration of the last five events detected.
● The peak expectation threshold (T_P): the minimum value expected for the next peak (blue dot-dashed line in Figure 2.c). It is computed as a fraction γ (0 < γ < 1) of the moving average of the last five peaks S_P detected in the envelope signal:

T_P(k) = (γ/5) Σ_{i=1..5} S_P(k − i).

● The threshold slew-rate (ΔT): the amount by which the threshold T(k) is decreased at each sample, after the unresponsive period T_U has expired, to improve the event detection sensitivity. The threshold T(k) only changes during the period of time between T_U and T_M. It is given by

T(k) = T(k − 1) − ΔT,   for T_U < k < T_M.
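The following is a minimal, illustrative sketch (in Python, not the MATLAB implementation used for evaluation) of how such a time-varying threshold can drive peak detection. The values chosen for alpha, beta, gamma and delta_t are placeholders rather than the tuned settings of this work, and event durations are approximated here by inter-peak intervals for brevity:

    import numpy as np

    def detect_peaks(env, alpha=0.5, beta=2.0, gamma=0.3, delta_t=1e-4):
        """Adaptive-threshold peak detection on a 100 Hz envelope (illustrative only)."""
        peak_values = [float(np.max(env))] * 5   # last five peak amplitudes S_P
        durations = [50.0] * 5                   # last five event durations, in samples
        peaks = []
        threshold = gamma * np.mean(peak_values)
        last_peak = -np.inf
        for k, value in enumerate(env):
            mean_dur = np.mean(durations[-5:])
            t_u, t_m = alpha * mean_dur, beta * mean_dur  # unresponsive and maximum periods
            elapsed = k - last_peak
            if elapsed <= t_u:
                continue                         # still inside the unresponsive period T_U
            if value > threshold:                # candidate peak found
                peaks.append(k)
                peak_values.append(value)
                durations.append(min(elapsed, t_m))
                threshold = gamma * np.mean(peak_values[-5:])   # peak expectation T_P
                last_peak = k
            elif elapsed < t_m:
                threshold -= delta_t             # slew the threshold down between T_U and T_M
        return peaks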

This stage of the algorithm generates a file with timestamps indicating where the peaks have been detected in the analysed segment. This information is used by the event detection and classification stage to determine where to analyse the signal properties.
Stage 4 - Properties computation: This is the main step in capturing information to classify the candidate events detected in the previous stage. It computes the relevant properties for each candidate event. In this work we compute three properties: two from the sound envelope (shape and duration) and one from the sound itself (maximum amplitude).
● Shape of the event (NC): quantified by counting the number of changes in the sign of the envelope slope (Figure 2.d). To avoid confusion with noises, the slope is computed only when the magnitude of the sound envelope is higher than the background noise detected in the analysed segment (NT).
● Maximum amplitude of the event (EA): computed directly from the absolute value of the sound over a window whose length is half the duration of a typical chewbite event (Figure 2.e). This way of computing the maximum amplitude not only provides information about the maximum intensity of the event, but also includes additional information about the shape of the event (see Figure 1) that can be exploited by classification algorithms.
● Duration of the event (ED): determined from the sound envelope by measuring the time period during which the sound envelope is higher than the background noise NT (Figure 2.f).
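An illustrative computation of the three properties follows, assuming an envelope segment env_seg around the detected peak, a raw sound window sound_win of half a typical chewbite length, and the background-noise level nt (all names are ours, not the authors'):

    import numpy as np

    def event_properties(env_seg, sound_win, nt, env_fs=100):
        """Return (NC, EA, ED) for one candidate event (illustrative sketch)."""
        above = env_seg > nt                              # envelope samples above the noise NT
        slope_sign = np.sign(np.diff(env_seg[above]))     # slope sign, computed only above NT
        nc = int(np.count_nonzero(np.diff(slope_sign)))   # NC: changes in the slope sign
        ea = float(np.max(np.abs(sound_win)))             # EA: max |sound| over the window
        ed = float(np.count_nonzero(above)) / env_fs      # ED: seconds spent above NT
        return nc, ea, ed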
Stage 5 - Event classification: This stage of the algorithm combines all the information available about the individual events to classify them. Using a set of rules based on the properties computed in the previous stage, the events are classified into five categories: chew (C), bite (B), chew-bite (CB), silence (S) and noise (N). The algorithm explores the timestamp, shape, maximum amplitude and duration variables to detect and classify the events. It first checks the time at which a peak has been detected and analyses the three properties for that time instant. It then applies a set of rules to determine whether an event has occurred and, in a positive case, which kind of event has been detected. The set of rules employed in this work was established heuristically from a training data set, under the constraint that the total number of rules should not be greater than fifteen. The set of decision rules is detailed in Table 1. Each rule specifies the conditions that the shape of the event (NC), the maximum amplitude (EA) and the duration (ED) must meet for the event to be classified as CB, B or C, respectively. For example, if the number of changes in the sign of the envelope slope (NC) is greater than 2, the amplitude of the event (EA) exceeds the noise threshold (NT) and the duration is greater than 0.3 seconds, then the detected event is classified as CB.
Table 1: Rules for jaw movement classification

Event       Rule
Chewbite    IF NC > 2  AND EA > NT                    AND ED > 0.3 s  THEN L(j) = "CB"
Bite        IF NC <= 2 AND EA >= 0.5 T_P              AND ED < 0.3 s  THEN L(j) = "B"
Chew        IF NC <= 2 AND EA > NT AND EA < 0.5 T_P   AND ED < 0.3 s  THEN L(j) = "C"
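The rules of Table 1 translate directly into code. The sketch below (illustrative Python) assumes the properties nc, ea and ed from Stage 4, the background-noise level nt and the peak expectation threshold tp, and falls back to the noise label when no rule fires, as described in the discussion of the flow diagram:

    def classify_event(nc, ea, ed, nt, tp):
        """Apply the decision rules of Table 1 to one detected event."""
        if nc > 2 and ea > nt and ed > 0.3:
            return "CB"                        # chewbite
        if nc <= 2 and ea >= 0.5 * tp and ed < 0.3:
            return "B"                         # bite
        if nc <= 2 and nt < ea < 0.5 * tp and ed < 0.3:
            return "C"                         # chew
        return "N"                             # otherwise treated as noise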

Figure 3 shows the flow diagram of the algorithm for detection and classification of jaw movements as a whole. The envelope signal (S_P) is analysed in segments of N samples. When a segment is fully analysed, the results are saved before analysing the next segment. In the first stage the algorithm computes the time-varying threshold T(k). Then, it checks whether a peak has been detected. If no peak has been detected, the algorithm assigns the silence label (S) to the event. If a peak has been detected, the algorithm classifies the event by applying the rules to the three properties derived from the sound envelope (shape, maximum value and duration) and assigning the corresponding label (C, B, CB or N).
Figure 3.​ ​Flow​ diagram of the algorithm for ​event​ detection and classification.
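The per-segment loop of Figure 3 can be composed from the sketches above. The snippet below is again only illustrative: process_segment, the window bookkeeping and the use of the envelope value at the peak as a stand-in for T_P are our assumptions, not the authors' implementation:

    def process_segment(segment, fs, nt, env_fs=100):
        """Detect and classify events in one sound segment (illustrative composition)."""
        env = compute_envelope(segment, fs, env_fs=env_fs)      # Stage 1
        peaks = detect_peaks(env)                               # Stages 2-3
        if not peaks:
            return [("S", None)]                                # no peak: silence label
        half_cb = int(0.25 * fs)                                # ~half a typical chewbite
        labels = []
        for k in peaks:
            env_seg = env[max(0, k - env_fs // 2): k + env_fs // 2]
            centre = k * fs // env_fs                           # peak index in the raw sound
            sound_win = segment[max(0, centre - half_cb): centre + half_cb]
            nc, ea, ed = event_properties(env_seg, sound_win, nt, env_fs)    # Stage 4
            tp = env[k]                                         # crude stand-in for T_P
            labels.append((classify_event(nc, ea, ed, nt, tp), k / env_fs))  # Stage 5
        return labels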
2.2 ​Complexity​ analysis
One of the key problems of the algorithms reported in the literature (Clapham et al., 2011; Milone et al., 2012; Tani et al., 2013; Navon et al., 2012) is their computational complexity, which imposes severe limitations on their implementation in a system running in real time. This fact becomes relevant when high-quality and long-duration (several hours) audio signals need to be processed.
For analysis purposes, the computational cost of each step of the algorithm was evaluated as a function of the number of samples n to be processed each second (Table 2). Each stage of the algorithm can be decomposed into basic tasks. For example, the envelope computation (Stage 1) can be decomposed into three tasks: i) signal rectification, ii) signal filtering and iii) signal subsampling. The number of operations employed by the filtering task will depend on its implementation (IIR or FIR).
Table 2:​ ​Number​ of operations per second of the algorithm.
Stage Task Operations/s

1 Signal rectification 2​n


1 Signal filtering 9​n
1 Signal subsampling 2​n
2 Samples counting 100
3 Threshold generation 900
3 Event detection 100
4 Envelope slope computation 200
4 Maximum ​signal 100
4 Event ​duration​ computation 200
5 Silence rule 100
5 Chew-bite​ rule 500
5 Bite rule 500
5 Chew rule 500
5 Noise rule 500

The number of operations for each task performed by the algorithm is shown in Table 2, and the total number of operations f(n) required to execute the algorithm, for an IIR implementation of the filters, is

f(n) = 13n + 3700.
As shown in Table 2, only the first three tasks (i.e., rectification, filtering and subsampling) depend on the sampling frequency of the input signal. After subsampling (Stage 1), the signal processed by the remaining tasks has a constant sample rate (100 samples/s). Therefore, the remaining tasks are independent of the audio sample rate. For example, the computation of the envelope slope requires the subtraction of two consecutive samples and the computation of its sign, which involves two operations. The rule evaluation involves five comparisons to check whether the conditions of the rules are verified or not. A more detailed description of the complexity analysis for this algorithm is provided in Appendix A. While this algorithm shows a linear computational complexity (O(n)), a similar analysis of the algorithm developed by Milone et al. (Appendix B) shows a superlinear complexity (O(n log(n))).
For real-time operation, the algorithm must be able to process a signal segment before the next segment becomes available. To accomplish this objective, the algorithm implementation must complete at least f(n) fixed-point operations every second. For the range of sampling rates considered in this application (from 4 kHz to 44 kHz), it is easy to find a low-cost commercial microprocessor able to perform more than the required number of operations. For example, given a 44 kHz sample rate it is possible to complete the execution of the algorithm with a Tiva C microcontroller (Tiva™ C Series LaunchPad Evaluation Kit, Texas Instruments Inc.) using a 10 MHz clock. The processing speed could be increased further (by augmenting the clock frequency), but at the expense of increasing the energy consumption, which is an essential issue in portable systems.
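A quick back-of-the-envelope check of this budget, using the operation count f(n) = 13n + 3700 derived above (the two sample rates are the extremes mentioned in the text):

    def ops_per_second(n):
        """Total fixed-point operations per second for an IIR implementation."""
        return 13 * n + 3700

    print(ops_per_second(44_100))   # 577000 operations/s at a 44.1 kHz sample rate
    print(ops_per_second(4_000))    # 55700 operations/s at a 4 kHz sample rate

Both figures are well within the throughput of a low-cost microcontroller clocked at a few MHz, which is consistent with the Tiva C example above.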

3 ​Materials and methods


The purpose of the field studies was to obtain recordings of acoustic monitoring with which to test the software implementation of the above algorithm. Although in some cases signals were recorded over long periods (several hours), only periods of at most 5 minutes were considered for analysis, given the practical difficulty of aurally labelling longer periods. It was also necessary to establish performance measures for analysis purposes.

3.1 Experimental field conditions

The materials used were two databases of our own, obtained under different conditions and at different times and places. One of these databases was obtained in real conditions of grazing and rumination and was useful to test the algorithm in outdoor environments. The other database contains signals obtained for different pastures under controlled experimental conditions. This database was used to analyse the effect of forage (type and height) on the detection and classification capabilities of the proposed algorithm, and to compare its performance with another method that used the same database.
The acoustic signals to generate the first database were obtained from an experiment performed at
the Kellogg Biological Station (Michigan State University) dairy facility, during August of 2014.
Protocols for animal handling and care were reviewed, approved and conducted according to the
Institutional Animal Care and Use Committee of Michigan State University. In this experiment the
foraging behavior of five Holstein multiparous lactating cows grazing perennial ryegrass/white
clover and orchardgrass/white clover pastures were continuously monitored during six days. These
signals were recorded using SONY ICDPX312 recorders and a system of electret directional
microphones on the forehead of the ​animal ​(stereo configuration). All recordings were made at
44.1 kHz sampling rate and 16-bit resolution, providing a nominal 22 kHz recording bandwidth and
96 dB ​dynamic range, and stored in the WAV (Waveform ​Audio​) file format. In the following, these
recordings will be named as the Michigan ​Database​ (MDb).
The second ​database used in this work was the same used by Milone et. al. 2012. The fieldwork to
obtain this ​database was performed at the Campo ​Experimental ​J​.​F​. Villarino, Facultad ​de
Ciencias Agrarias, Universidad Nacional ​de Rosario, Argentina. The project was evaluated and
approved by the Committee on Ethical Use of ​Animals for ​Research of the Universidad Nacional
de Rosario. Sound signals from dairy cows grazing alfalfa or fescue of two different heights (tall,
24.5 ± 3.8 cm or ​short​, 11.6 ± 1.9 cm) were recorded in individual grazing sessions during a period
of 5 days from 17 to 21 February 2004. Forage species were selected because they differ in
structure and neutral detergent ​fibre content (alfalfa, 360 ± 11 ​g​/kg and fescue, 631 ± 6 ​g​/kg),
which are factors that have influence on the ​sound of ​chewing (Duizer, 2001). Two 4–6 year-old
lactating Holstein cows weighing 608 ± 24.9 kg, previously tamed and trained, were used. Two
wireless microphones (Nady 151 VR, Nady ​Systems​, Oakland, CA, USA) were randomly assigned
to ​animals each day. The microphone was placed facing inward on the forehead protected by foam
rubber (Milone et al., 2009). The distance between the wireless microphone and the receiver was
2–3 ​m​. Each cow grazed plants in pots that were firmly attached to a board placed inside a barn.
Behaviour was recorded with an ​analog video camcorder (Sony CCD-TR517), and then coded in
MPG format at 25 frames per second. The ​sound from the wireless microphone was recorded on
the tape soundtrack (16 bits, 44.1 kHz). A total of 50 signals were obtained: 15 from tall alfalfa, 11
from ​short alfalfa, 12 from tall fescue and 12 from ​short fescue. On average, for each
pasture/height the signals contained approximately 13 min of recording and around 800 events
(13% bites, 64% chews and 23% chewbites). In the following these recordings will be named as
the Rosario ​Database​ (RDb).
The algorithm was validated using these signals, which were labelled aurally by experts in animal behaviour, who were able to identify and classify individual events (C, B, CB, S, N) during grazing and rumination. Thus, we use this labelling as the reference for comparison and performance evaluation. In the case of signals belonging to MDb, only two periods of five minutes from the whole recording were labelled. Each period contains approximately 350 events (25% bites, 48% chews and 27% chewbites). One of these periods was used to analyse the effect of the parameters and the other was used for evaluation purposes. In a similar way to MDb, a data partition was made for RDb. For each pasture, 50% of the signals were used to analyse the effect of the parameters, while the remaining 50% were used for evaluation purposes.

3.2 Performance measures
One important issue in the comparison between the events recognized and classified by the algorithm and the corresponding reference labels is the time synchronization of the events in both sequences. To solve this, the HTK1 performance analysis tool (HRESULTS) was used. The comparison is based on a Dynamic Programming-based string alignment procedure (Young et al., 2005). The outputs of this tool were: the number of deletions (D), the number of substitutions (S), the number of insertions (I) and the total number of labels in the defining transcription files (N). The percentage of labels correctly recognised is given by:

C% = (N − D − S) / N × 100%,

and the accuracy is computed by:

A% = (N − D − S − I) / N × 100%.
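For reference, the two measures reduce to a few arithmetic operations on the HRESULTS counts; the helper below (illustrative Python, with hypothetical counts in the comment) computes both:

    def recognition_measures(n, d, s, i):
        """C% and A% from N reference labels, D deletions, S substitutions, I insertions."""
        c_pct = (n - d - s) / n * 100.0
        a_pct = (n - d - s - i) / n * 100.0
        return c_pct, a_pct

    # e.g. recognition_measures(350, 5, 40, 10) -> (87.1, 84.3) for hypothetical counts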

4. ​Results
The purpose of this section is to analyse the effect of the operational parameters on the system performance and to evaluate the performance of the proposed algorithm using the databases and performance measures introduced in the previous section. The effect of the algorithm parameters on performance was studied through an exploratory analysis: the algorithm was evaluated for different values of each parameter, and the system performance was measured in terms of recognition rate, accuracy and computing time.

1
http://htk.eng.cam.ac.uk/
4.1 ​The effect of parameters
The main parameters of the algorithm are: i) the sampling frequency, ii) the quantization level, iii) the cut-off frequency of the envelope detector filter and iv) the subsampling frequency. Their effects on the system performance are analysed in the following paragraphs.
One of the fundamental requirements for processing signals in real time on embedded systems concerns the computational load and complexity of the algorithm, since they determine the requirements of the hardware employed to implement the system. In signal processing, the computational load principally depends on two parameters: i) the sampling frequency and ii) the quantization level of the signal. The sampling frequency defines the information flow processed by the system per unit of time and plays a key role in the computational load of the algorithm (see Section 2.2). The quantization level of the signal defines the accuracy of the signal representation and, therefore, the word length required by the system to process the information. In this way, quantization defines one aspect of the complexity of the system implementation.
Figure 4 shows the effect of the sampling frequency on the performance of the algorithm (recognition rate and accuracy) and the corresponding computational time for a sound segment of five minutes from the MDb database. The recognition rate and accuracy remain high (around 80%) and constant over a wide range of frequencies (from 2 kHz to 11 kHz) and then decline for frequencies outside this range. This phenomenon can be explained by the fact that for sampling frequencies below 2 kHz the signal/noise ratio is degraded, because important components of the signal are filtered out. In a similar way, once the sampling frequency goes beyond 11 kHz, the signal information remains constant but the amount of noise processed by the algorithm increases, degrading the overall signal/noise ratio. However, in the range of frequencies from 2 kHz to 11 kHz, the information and noise processed by the algorithm remain constant, keeping the overall signal/noise ratio constant.

Figure 4.​ Performance ​evaluation​ and corresponding computational time for a frame of five
minutes ​duration​ using different sampling frequencies.

The linear dependence of the computational time on the sampling frequency (see Section 2.2) can be seen in Figure 4, where the time required to process a five-minute segment is shown for different sampling frequencies. It can be seen that a sampling frequency of 4 kHz provides good performance (recognition rate and accuracy) with low computational cost and fast execution. In this sense, the presented algorithm has proved capable of processing signals 50 times faster than real time, i.e., analysing 50 minutes of acoustic data per minute on a standard desktop PC. This is shown in Figure 4, where it can be observed that the algorithm processes 300 s (5 minutes) of sound signal in 6 seconds.

Figure 5.​ Performance ​evaluation​ and quantization error (computed as mean squared error) using
different quantization levels.

Figure 5 shows the effect of the quantization level (or word-length representation) on the performance of the algorithm (recognition rate and accuracy) for a sound segment of the MDb database. The recognition rate and accuracy remain high (around 80%) and constant for quantization using 8 or more bits. This phenomenon can be explained by the fact that the quantization error for data coded with 8 bits or more is not relevant, as can be seen in terms of the MSE values.

Figure 6.​ ​Performance​ ​evaluation​ for different ​cut-off​ frequencies of the ​envelope​ detector filter.
Figure 6 shows the effect of the cut-off frequency of the envelope detector filter on the recognition rate for the MDb database. In the range between 3 Hz and 5 Hz, the performance and accuracy of the algorithm improve as the cut-off frequency of the filter is increased. A performance over 75% and an accuracy over 70% can be observed in the frequency range between 5 Hz and 6 Hz. Beyond 6 Hz, performance and accuracy decline as the cut-off frequency grows. These phenomena can be explained by the fact that enlarging the bandwidth of the filter at low frequencies increases the amount of information processed by the algorithm, improving the overall signal/noise ratio and the performance of the algorithm. However, once the cut-off frequency goes beyond 6 Hz, the information remains constant while the amount of noise processed by the algorithm increases, reducing the overall signal/noise ratio and diminishing the performance of the algorithm.

Figure 7. Performance evaluation in terms of percentage of events correctly classified (C, B, CB, S, N) for different subsampling frequencies (MDb).

Figure 7 shows the effect of the subsampling frequency on the recognition rate for the MDb database. The performance (recognition rate and accuracy) shows a continuous increment as the subsampling frequency is increased up to 100 Hz. Beyond this frequency, the recognition rate remains steady while the accuracy decays slowly as the subsampling frequency rises. These phenomena can be explained in a similar way to the cut-off frequency: by increasing the subsampling frequency, the amount of information processed by the algorithm increases, improving the overall signal/noise ratio. However, once the subsampling frequency goes beyond 100 Hz, the useful information remains constant and the overall signal/noise ratio does not change.

The same analysis was performed for the RDb database over all pastures. A different effect of the parameters was observed. Considering all standard sampling frequencies, the best results (in terms of recognition rate and accuracy) were observed at 2 kHz and 4 kHz. With respect to the cut-off frequency, the best results were observed at 3.5 Hz, achieving a high performance with a high accuracy. Similar performances were obtained at cut-off frequencies of 4 Hz and 5 Hz, but with a lower accuracy. Regarding the decimation frequency, the best performance was observed at 100 Hz, similarly to the MDb database. We chose a sampling frequency of 2 kHz because of its lower computational cost.

4.2 Event detection

The event detection stage of this algorithm was evaluated over both databases, achieving a detection rate of 98% (calculated as (N − D)/N over both databases), in the same order as the algorithms published in the specialised literature. In this sense, Clapham et al. (2011) reported a detection of bites that was 95% correct, while Navon et al. (2012) reported detection rates of jaw movements of 94% in low-noise environments. Milone et al. (2012) (CBHMM) developed an algorithm extending the use of HMM models to detect and classify the three types of ingestive sounds of cattle (i.e., C, B and CB), reaching a recognition rate close to 94% for the detection of events (calculated as (N − D)/N over RDb). In a similar way, Tani et al. (2013) detect and classify ingestive and ruminating chewing, achieving results of approximately 98% for event detection.

These quantitative results (except the result of the algorithm developed by Milone et al.) are not directly comparable to ours, because the studies vary in the number of events analysed, the duration of the records, the type and height of the pasture, the recording procedures and devices, and the validation method. Furthermore, the data employed in these studies are not available for numerical experimentation.

4.3 ​Event​ classification


Regarding event classification, this method clearly distinguished among types of jaw movement in both the MDb and RDb databases (Tables 4 and 5). The algorithm can only be compared with the CBHMM method, because it was the only method found that performs a classification of ingestive events (i.e., C, B and CB). However, the comparison can only be carried out on the second database (RDb), because the models of the CBHMM method were originally fitted to this database. The application of the CBHMM method on a different database could be misleading, because the models would need to be adapted to the new recording conditions. It is also important to note that a hold-out cross-validation method was used to train and evaluate the models in CBHMM (Duda et al., 1999), while in the present method (CBRTA) the analysis of the effect of the parameters was made using a subset of RDb not further used for testing purposes.
For the event classification in the MDb database, the best sets of parameters for each database were used in a crossed way, in order to demonstrate robustness. Therefore, in Table 4, CBRTA (MDb) denotes the algorithm with the best parameters for the MDb database, while CBRTA (RDb) denotes the algorithm with the best parameters for the RDb database. For the MDb database, the proposed algorithm shows an average recognition rate of 83% of the total events for CBRTA (MDb) and an average recognition rate of 79% of the total events for CBRTA (RDb). Therefore, the results for event classification are on average 15% lower than the event detection rate using CBRTA (MDb). For both sets of parameters the algorithm achieves good performance rates for this database, demonstrating its ability to generalize. Table 4 also summarizes the recognition rates for the different events. In this table, the ability of the algorithm to correctly identify the chewing events in both cases can be observed. Some degree of confusion between bites and chewbites can be seen, which may be due to the similarity in the characteristics of both events. We believe that this confusion is less critical at a practical level, since both (B and CB) are ingestive events and not only food processing (C).

Table 4: Percentage of correct classification of bites, chews and chewbites (MDb).

                     Bite               Chew               Chewbite           Average
Event           CBRTA   CBRTA      CBRTA   CBRTA      CBRTA   CBRTA      CBRTA   CBRTA
                (MDb)   (RDb)      (MDb)   (RDb)      (MDb)   (RDb)      (MDb)   (RDb)
Bite              95      94          2       3          3       3
Chew               8       8         87      91          5       1         83      79
Chewbite          22      44          8       6         70      50

The results for the RDb database were obtained for CBRTA using the best set of parameters (resulting from the previous analysis) for the same database, with that subset not further used for testing purposes. The algorithm shows an average recognition rate of 77% of the total events over all pastures, while the CBHMM method reached an average of 79.5% over all pastures. Table 5 summarizes the recognition rates for the different events for this database. The best results were observed for tall pastures, reaching 79% and 78% for alfalfa and fescue respectively, while 77% was obtained for short fescue. An additional deterioration of 5% in the recognition rate can be appreciated for short alfalfa. One reason for this is the change in the water content of short forages (closer to soil moisture), which modifies the sound produced during the jaw movements (Rutter et al., 2002). Therefore, recordings for short forages have a lower signal-to-noise ratio, which introduces errors in the classification. In the same way as for the first database, good results were obtained for the classification of chews, which is a good sign for the identification of rumination activities. Moreover, some confusion regarding chewbites can be observed. This could be ameliorated by incorporating new features, such as a measure of the symmetry of the event or information about the sequence of events.

Table 5: Percentage of correct classification of bites, chews and chewbites for the different forages (RDb).

                                Bite             Chew             Chewbite         Average
Forage          Event      CBHMM  CBRTA     CBHMM  CBRTA     CBHMM  CBRTA     CBHMM  CBRTA
Tall alfalfa    Bite          79     67        11     18         9     15
                Chew           3      2        88     90         9      8        84     79
                Chewbite       2      5         3     11        94     84
Short alfalfa   Bite          76     62        16     30         8      8
                Chew           5      0        90     94         5      6        65     74
                Chewbite      23      5        15     29        61     66
Tall fescue     Bite          83     74         0     21        17      5
                Chew           1      1        93     95         7      4        85     78
                Chewbite       1     10         4     33        94     57
Short fescue    Bite          90     79         9     14         1      7
                Chew           0      1        99     99         1      0        84     77
                Chewbite       2     25         7     32        91     43
The algorithm for automatic detection and classification of masticatory event sounds was implemented using MATLAB R2010b for evaluation purposes.

5. ​Discussion

Most previous studies that addressed this problem have focused only on the detection of events and not on their classification. To the best of our knowledge, the only work that addresses both the detection and the classification of acoustic events is the CBHMM method developed by Milone et al. (2012). However, none of these studies has analysed the computational complexity. In the present paper we carried out a detailed analysis of the CBHMM method and of our algorithm, and then compared their computational complexities. The analysis showed a linear computational complexity for the CBRTA algorithm (O(n)), while a greater complexity (O(n log(n))) was found for the CBHMM method. On the other hand, in laboratory tests, the presented algorithm has proved capable of processing signals 50 times faster than real time. Other authors (Clapham et al., 2011) have reached 10 times faster than real time on the bite detection task only. This is a promising result, because it will allow the development of embedded implementations running in real time and achieving good performance with a low computational power.

The performance of the event detection stage could be compared with previous studies, achieving a detection rate of 98%, close to the values reported in those studies. Regarding the event classification stage, the recognition rate for the first database (MDb) averaged 83% when the set of best parameters for a partition of the same database (not further used for testing purposes) was used, and 79% when the set of best parameters for a partition of the second database (not further used for testing purposes) was used. This result suggests that the algorithm is robust across databases with large differences. For the second database (RDb), the proposed algorithm achieved a recognition rate of 77% on average, while the CBHMM method averaged 79.5% over all pastures. For this database we observed a decrease in the recognition rate for short alfalfa with respect to the rest of the pastures, which is consistent with the results obtained by Milone et al. (2012). In short alfalfa there was a higher proportion of stems (a lower proportion of leaves) than in tall alfalfa and fescue (tall and short); probably because of this, when cows cut the forage they produce bite sounds with lower amplitude, increasing the confusion between events. As expected, performance decreased for event classification because it is a more difficult task than event detection. It can be seen that both tasks are related, because the algorithm classifies only those events that were detected.

According to the flowchart of the algorithm (Figure 3), any detected event that is not classified according to the rules for chewbite, bite or chew is a noise event. Some noise events could be misclassified as chew, bite or chewbite, but we believe that their influence is minimal for two reasons. The first reason is the type and location of the microphone. Directional electret microphones were used (sensing only in one direction), placed facing inward on the cow's forehead and protected by foam rubber (Milone et al., 2009). This was done to avoid the influence of wind and to avoid using high-pass filters, which could also remove important components of the signal of interest. The second reason is the use of a low-pass filter with a cutoff frequency of 5.5 Hz (or 3.5 Hz, depending on the database) in one of the stages of the algorithm. Generally, noise is characterized as a non-stationary signal with high energy at high frequencies, so we believe that the noise energy falling within the frequency band of interest will not have much influence on the classification.
6. Conclusions

The importance of the acoustic monitoring technique for both the detection and the classification of ingestive events in ruminants has been demonstrated. Although this technique is very appropriate, it presents difficulties when trying to analyse large volumes of high-quality audio information. Usually these difficulties are related to computation time, data transfer between devices and storage capacity. In this regard, the proposal of the authors was to obtain good performance results with single-microphone signal processing techniques, in order to reduce the computational cost.

The algorithm developed was able to achieve good performance in the detection and classification of ingestive events (i.e., chew, bite and chewbite) in ruminants. Furthermore, its linear computational complexity makes possible its implementation and execution on an embedded system operating in real time. The results showed that, with a sampling frequency of 4 kHz, good performance rates can be obtained with a low computational cost. This indicates that the main energy for the classification of ingestive events lies below 2 kHz in the target signal, consistent with the results obtained by Milone et al. (2012).

The next challenge will be for this method to become part of a forage intake model. Its implementation in an integral system is also planned, one that allows the storage of the results and incorporates an intelligent supply system, in order to obtain long-term operation as a dedicated system in the application field.

Acknowledgements
The authors would like to thank Marcelo Larripa, Alejandra Planisich and Martín Quinteros from
Facultad ​de Ciencias Agrarias, Universidad Nacional ​de Rosario for their assistance in ​animal
management and gathering data. This work has been funded by Agencia Nacional ​de Promoción
Científica ​y ​Tecnológica​, Universidad Nacional del Litoral and Universidad Nacional ​de Rosario,
under Projects PICT 2011-2440 “Desarrollo ​de una plataforma ​tecnológica para ganadería ​de
precisión”, PACT CAID 2011 “Señales, Sistemas ​e Inteligencia Computacional”, CAID 2011-525 “”
and 2013-AGR216.
Appendix A:​ ​CBRTA algorithm ​complexity​ analysis
In this appendix we evaluate the computational cost of each step of the algorithm presented in this work and express it as a function of the number of samples n to be processed each second ( O(f (n)) ). The number of samples n to be processed each second depends on the sampling frequency. Therefore, the number of operations required by each stage of the algorithm will depend on the tasks performed:

1. Signal rectification: a simple pre-processing operation that guarantees a positive sign for all samples. This operation requires only a comparison and a multiplication, O(2n).
2. Filtering: a second-order low-pass filter is applied to the resulting signal to obtain the sound envelope. This filter can be implemented in two different ways: i) as a second-order difference equation (IIR), which involves five multiplications and four additions, O(9n), or ii) as a finite impulse response (FIR) filter, which involves N multiplications and N additions, O(2Nn), where N is the number of taps employed by the filter. The choice between the two will depend on the main constraint of the implementation: computational efficiency for the IIR or numerical stability for the FIR.
3. Subsampling: ​To reduce the computational requirements (load and time) in the
subsequent tasks, without losing accuracy, the sound envelope is subsampled from its
original sampling frequency to 100 Hz. This operation requires an addition and a
comparison O(2n) .
4. Sound segments generation: The data stream generated in previous task is divided into
short segments. From computational point of view this task only involves counting samples,
which requires an addition O(100) .
5. Threshold generation: The time-varying threshold ​T​(​k)​ is computed through two task: i)
The computation of the ​peak expectation threshold (​TP​ ​), which requires five additions and
one multiplication O(600) , and the computation of the threshold ​T​(​k)​ , which requires one
addition and two comparisons O(300) . Therefore, the overall computational complexity of
this task is O(900) .
6. Event detection: This stage only involves the comparison of the threshold ​T(​ ​k​) with the
sound envelope, which implies a computational complexity of O(100) .
7. Properties computation: This step computes the properties of the sound envelope and
itself employed to classify the event. The shape of the event is quantified through
computation of the number of changes in the sign of the envelope slope when its
magnitude is bigger than the background noise. It requires one comparison and one
subtraction O(200). The duration of the event is computed from the sound envelope by
counting the number of samples when the envelope is bigger than the background noise
NT. It requires one comparison and one addition O(200). Finally, the maximum amplitude
of the event is computed directly from the absolute value of sound over a window whose
length is half of the duration of a typical chewbite event. It only requires one comparison
O(100).
8. Event classification: ​the algorithm combines all the information available about the
individual events to classify them. Using a set of five ​rules​, based on the ​properties
computed in the previous ​stage​, the events are classified into ​chew​, bite, ​chew-bite​, silence
and noise. The evaluation of a rule to classify a silence only requires the comparison of the
sample counter ​k​T​, which involve one operation O(100). To evaluate the remaining rules,
the algorithm checks the conditions that define them (five for each rule). Therefore, the
overall computational complexity for each of these rules is O(500). Since all rules are
evaluated at every event, the overall complexity for this task is O(2100).

The total ​number of operations ​f(​ ​n)​ required to execute the algorithm, for an IIR implementation of
filters, is
f (n) = 13n + 3700.

Appendix B:​ ​CBHMM algorithm c​ omplexity​ analysis


In this appendix we will briefly evaluate the cost of each step of the algorithm presented by Milone
et al, 2012 and express it as a function of the ​number of ​samples ​n to be processed per each
second ( O(f (n)) ). The corresponding system was implemented by the authors using the HTK
toolkit (Young et. al., 2005).

The ​number of ​samples n to be processed each second depends on the number of windows (per
second) and the ​number​ of ​samples​ per window nw as:

n = (number of windows) · nw

Window duration and step were defined as 60 ms and 40 ms, respectively. Regardless of the sample rate of the input audio, the number of windows to be processed in each second is:

number of windows = (signal length − win step) / (win length − win step) = (1000 ms − 40 ms) / (60 ms − 40 ms) = 48 windows/second

The number of samples nw to be processed per window depends on the sampling frequency:

nw = (sampling frequency) · (win length)
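A small worked example of this bookkeeping, assuming a 44.1 kHz input (the sample rate is ours, chosen only for illustration):

    fs = 44_100                                   # assumed input sample rate, Hz
    win_length, win_step = 0.060, 0.040           # 60 ms windows with a 40 ms step
    windows_per_second = int(round((1.0 - win_step) / (win_length - win_step)))  # 48
    nw = round(fs * win_length)                   # 2646 samples per window
    print(windows_per_second, nw)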

Recognition processes can be separated into two main stages: feature extraction and
classification. During the feature extraction ​stage, in each window the exact same processes are
performed. The following ​complexity​ analysis will be done for a single window of nw samples:

1. Pre-emphasis filter: a simple pre-processing operation that emphasises the signal sn by applying a first-order difference equation ( s′n = sn − k · sn−1 ), which involves an addition and a multiplication, O(2nw).
2. Windowing​: a Hamming window function is applied to preprocessed ​signal​. This operation
requires a multiplication for each sample of the window. O(nw )
3. Window energy: a numeric value obtained from the windowed signal (E = Σ s′n²) that
becomes part of the feature vector. It requires O(2nw).
4. Fourier transform: the windowed signal is transformed using a fast Fourier transform and
then the magnitude is taken. The complexities of these operations are O(nw log(nw)) and
O(nw), respectively.
5. Filterbank analysis: a simple transform based on a bank of triangular filters designed to
give approximately equal resolution on a mel scale. Each Fourier magnitude coefficient is
multiplied by the corresponding filter gain and the results are accumulated, so each bin
holds a weighted sum representing the spectral magnitude in that filterbank channel. Ten
filters spread between 0 and 500 Hz were selected by Milone et al., so the complexity is at
most 10 · O(2 max.filter.length). Since max.filter.length << nw, this operation is not the
most computationally expensive one.
6. Logarithm: applied to each channel parameter of the filterbank, which requires 10
operations O(10).
7. Deltas: the feature vector comprises 22 elements arranged as: 10 log-filterbank
parameters, the window energy, the deltas of the log-filterbank parameters, and the delta of
the energy. The delta computation requires eleven additional operations O(11).

The total number of operations required to extract features from a single window is:

21 + 10 (value << nw) + 6 nw + nw log(nw).

This number must be multiplied by the number of windows to obtain the complete number of
operations of the feature extraction stage for one second of audio; a minimal sketch of this
per-window front end is given below. The cost of the classification stage is reviewed next under
the same assumption that one second of audio must be processed.
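The following sketch illustrates, for a single window, the front-end computations listed above (pre-emphasis, Hamming windowing, energy, FFT magnitude, a small triangular filterbank and the logarithm). It only approximates the HTK front end used by Milone et al.: the pre-emphasis coefficient, the linear spacing of the filterbank edges and the omission of the deltas (which require neighbouring windows) are simplifications made for illustration.

    import numpy as np

    def window_features(frame, fs, n_filters=10, fmax=500.0, k=0.97):
        """Approximate per-window feature extraction (steps 1-6 above).
        frame holds the nw samples of one 60 ms window; k is an assumed
        pre-emphasis coefficient."""
        # 1. Pre-emphasis: s'_n = s_n - k * s_{n-1}
        pre = np.append(frame[0], frame[1:] - k * frame[:-1])

        # 2. Hamming windowing
        windowed = pre * np.hamming(len(pre))

        # 3. Window energy E = sum(s'_n ** 2)
        energy = np.sum(windowed ** 2)

        # 4. FFT magnitude
        mag = np.abs(np.fft.rfft(windowed))
        freqs = np.fft.rfftfreq(len(windowed), d=1.0 / fs)

        # 5. Triangular filterbank between 0 Hz and fmax (linearly spaced here
        #    for simplicity; the original work uses a mel-scale spacing)
        edges = np.linspace(0.0, fmax, n_filters + 2)
        fbank = np.zeros(n_filters)
        for i in range(n_filters):
            lo, ctr, hi = edges[i], edges[i + 1], edges[i + 2]
            rising = np.clip((freqs - lo) / (ctr - lo), 0.0, 1.0)
            falling = np.clip((hi - freqs) / (hi - ctr), 0.0, 1.0)
            fbank[i] = np.sum(mag * np.minimum(rising, falling))

        # 6. Logarithm of each filterbank channel (floor avoids log(0))
        log_fbank = np.log(np.maximum(fbank, 1e-10))

        # The deltas of step 7 are computed across successive windows and are
        # therefore omitted from this single-window sketch.
        return np.concatenate([log_fbank, [energy]])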

Given the small number of “word models” in this application (only 3: chew, bite and chewbite,
leaving aside the silence model for simplicity) and the use of a relatively small “language
model”, it is reasonable to assume a complexity similar to that of an isolated word recognition
task. Moreover, one second of audio can contain only one event, given the typical durations of
masticatory events. Thus, to perform isolated word recognition, the following steps must be
carried out: (i) generate a sequence of feature vectors corresponding to the audio, (ii) compute
the model likelihoods for all possible models, and (iii) select the word whose model likelihood is
highest.

Step (i) was already addressed in the feature extraction stage. To perform step (ii), the Viterbi
algorithm is used. This algorithm requires on the order of V N²T computations, where V is the
number of words, N is the number of states in each model, and T is the length of the feature
vector sequence. Since V = 3 (chew, bite and chewbite), N = 4 and T = 48 (number of windows
per second), the computations needed are V N²T = 2304. Each Viterbi computation requires a
multiplication (1 operation), an addition (1 operation), and a likelihood calculation of at least
M(d + d²) operations (Duda et al., 1999, p. 111), where the number of features is d = 22 and the
number of Gaussian mixture components is M = 90. Then, the operations needed in step (ii) are
V N²T (2 + M(d + d²)) = 104 928 768 ≃ 105 000 000. Step (iii) only performs 3 comparisons to
obtain the highest likelihood. Therefore, the number of operations performed in the classification
stage is determined by step (ii).
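The figures quoted above can be reproduced with a few lines of arithmetic; the snippet below uses only the values given in this appendix.

    V, N, T = 3, 4, 48     # word models, states per model, windows per second
    M, d = 90, 22          # Gaussian mixture components, feature dimension

    viterbi_cells = V * N ** 2 * T         # 2304 dynamic-programming cells
    ops_per_cell = 2 + M * (d + d ** 2)    # one addition, one multiplication,
                                           # and the likelihood evaluation
    print(viterbi_cells * ops_per_cell)    # 104928768, i.e. about 105 000 000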

The total number of operations f(n) required to execute this algorithm is the sum of the feature
extraction and classification stage costs:

f(nw) = 48 {21 + 10 (value << nw) + 6 nw + nw log(nw)} + 105 000 000
f(n) = 48 {21 + 10 (value << n/48) + 6 (n/48) + (n/48) log(n/48)} + 105 000 000

where n = 48 nw.
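To put the two appendices side by side, the following snippet evaluates both cost expressions for one second of audio. The sampling rate, the small placeholder for the "value << nw" filterbank term and the use of a base-2 logarithm for the FFT term are illustrative assumptions, not values fixed by either derivation.

    import math

    fs = 22050                        # assumed sampling rate (samples per second)

    # Appendix A: proposed algorithm with IIR filters, n input samples per second
    ops_proposed = 13 * fs + 3700

    # Appendix B: CBHMM algorithm, per-window cost times 48 windows per second
    nw = fs * 0.060                   # samples in one 60 ms window
    filterbank_term = 10 * 100        # the small "value << nw" term (placeholder)
    feature_ops = 48 * (21 + filterbank_term + 6 * nw + nw * math.log2(nw))
    ops_cbhmm = feature_ops + 105_000_000

    print(f"proposed algorithm: {ops_proposed:>12,.0f} operations per second")
    print(f"CBHMM algorithm:    {ops_cbhmm:>12,.0f} operations per second")

Under these assumptions the proposed algorithm needs roughly 290 000 operations per second, while the CBHMM cost remains dominated by the Viterbi term at over 100 million operations per second, consistent with the motivation for a lower-cost real-time method.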
References

Alkon, P., & Cohen, A. (1986). Acoustical biotelemetry for wildlife research: a preliminary test and prospects. Wildlife Society Bulletin, 14, 193-196.
Alkon, P., Cohen, A., & Jordan, P. A. (1989). Towards an acoustic biotelemetry system for animal behavior studies. Journal of Wildlife Management, 53, 658-662.
Chambers, A., Hodgson, J., & Milne, J. (1981). The development and use of equipment for the automatic recording of ingestive behavior in sheep and cattle. Grass and Forage Science, 36, 97-105.
Christov, I. (2004). Real time electrocardiogram QRS detection using combined adaptive threshold. Biomedical Engineering, 3(1), 28-37.
Clapham, W., Fedders, J., Beeman, K., & Neel, J. (2011). Acoustic monitoring system to quantify ingestive behavior of free-grazing cattle. Computers and Electronics in Agriculture, 76, 96-104.
De Boever, J., Andries, J., Brabander, D., Cottyn, B., & Buysse, F. (1990). Chewing activity of ruminants as a measure of physical structure – a review of factors affecting it. Animal Feed Science and Technology, 27, 281-291.
Delagarde, R., Caudal, P., & Peyraud, J. (1999). Development of an automatic bitemeter for grazing cattle. Ann. Zootech., 48, 329-339.
Duda, R. O., Hart, P. E., & Stork, D. G. (1999). Pattern Classification (2nd ed.). John Wiley and Sons.
Galli, J., Cangiano, C., Demment, M., & Laca, E. (2006). Acoustic monitoring of chewing and intake of fresh and dry forages in steers. Animal Feed Science and Technology, 128, 14-30.
Galli, J., Cangiano, C., Milone, D., & Laca, E. (2011). Acoustic monitoring of short-term ingestive behavior and intake in grazing sheep. Livestock Science, 140, 32-41.
Laca, E., Ungar, E., Seligman, N., Ramey, M., & Demment, M. (1992). An integrated methodology for studying short-term grazing behaviour of cattle. Grass and Forage Science, 47, 81-90.
Laca, E., & Wallis DeVries, M. (2000). Acoustic measurement of intake and grazing behaviour of cattle. Grass and Forage Science, 55, 97-104.
Matsui, K., & Okubo, T. (1991). A method for quantification of jaw movements suitable for use on free-ranging cattle. Applied Animal Behaviour Science, 32, 107-116.
Milone, D., Rufiner, H., Galli, J., Laca, E., & Cangiano, C. (2009). Computational method for segmentation and classification of ingestive sounds in sheep. Computers and Electronics in Agriculture, 65, 228-237.
Milone, D., Galli, J., Cangiano, C., Rufiner, H., & Laca, E. (2012). Automatic recognition of ingestive sounds of cattle based on hidden Markov models. Computers and Electronics in Agriculture, 87, 51-55.
Navon, S., Mizrach, A., Hetzroni, A., & Ungar, E. (2013). Automatic recognition of jaw movements in free-ranging cattle, goats and sheep, using acoustic monitoring. Biosystems Engineering, 114(4), 474-483.
Penning, P. (1983). A technique to record automatically some aspects of grazing and ruminating behaviour in sheep. Grass and Forage Science, 38, 89-96.
Rutter, S. (2000). Graze: a program to analyze recordings of the jaw movements of ruminants. Behavior Research Methods, 32, 86-92.
Rutter, S., Champion, R., & Penning, P. (1997). An automatic system to record foraging behaviour in free-ranging ruminants. Applied Animal Behaviour Science, 54, 185-195.
Rutter, S., Ungar, E., Molle, G., & Decandia, M. (2002). Bites and chews in sheep: acoustic versus automatic recording. In Proceedings of the 10th European Intake Workshop, Reykjavik, Iceland.
Sauvant, D. (2000). Granulométrie des rations et nutrition du ruminant. INRA Productions Animales, 13(2), 99-108.
Stobbs, T., & Cowper, L. (1972). Automatic measurement of the jaw movements of dairy cows during grazing and rumination. Tropical Grasslands, 6, 107-112.
Tani, Y., Yokota, Y., Yayota, M., & Ohtani, S. (2013). Automatic recognition and classification of cattle chewing activity by an acoustic monitoring method with a single-axis acceleration sensor. Computers and Electronics in Agriculture, 92, 54-65.
Undersander, D. J., Albert, E. C., Cosgrove, D. R., Johnson, D. G., & Peterson, P. R. (2002). Pastures for profit: A guide to rotational grazing. Univ. of Wisconsin Coop. Ext. Publ. A3529. University of Wisconsin, Madison.
Ungar, E. (1996). Ingestive behaviour. In J. Hodgson & A. W. Illius (Eds.), The ecology and management of grazing systems (pp. 185-218). Wallingford: CAB International.
Ungar, E., & Rutter, S. (2006). Classifying cattle jaw movements: comparing IGER Behaviour Recorder and acoustic techniques. Applied Animal Behaviour Science, 98, 11-27.
Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., & Woodland, P. (2005). The HTK Book (for HTK Version 3.3). Cambridge University Engineering Department.
