…behaviour of grazing cattle
Abstract
Estimating forage intake and monitoring the behaviour of grazing livestock are difficult tasks. Detection and classification of events such as chews, bites and chewbites are very useful for estimating this information. It is well known that acoustic monitoring is the best way to quantify and classify events of ruminant feeding behaviour. However, existing methods fail to be computationally efficient.
In this work, we present an acoustical analysis system that works in real-time and automatically
detects and classifies ingestive events of grazing cattle feeding. The system employs a directional
wide-frequency microphone facing inward on the forehead of the animal, signal analysis, and
decision logic to detect, classify and measure ingestive events. The system measures acoustic
parameters of ingestive events, such as duration, amplitude, shape and energy, which can support
further event classification and become the inputs to a forage intake model. The performance of the algorithm was assessed by comparing the automatic labeling of our method with reference labeling made by an expert in animal behaviour. For testing and validation purposes, experiments with two
different databases were conducted. For both databases, we obtained a detection rate of 98% for events without differentiation between chew, bite and chewbite. Using the first database, the algorithm parameters were tuned and tested over different signals of 5 minutes duration, obtaining a recognition rate of 83% for event classification. Over the second database, two types of
forage were analyzed: alfalfa and fescue, each one for two different heights: tall (24.5 ± 3.8 cm)
and short (11.6 ± 1.9 cm). Average recognition rates were 79% for tall alfalfa, 73% for short alfalfa,
78% for tall fescue and 77% for short fescue. These results are similar to the ones reported with
other algorithms. However, the proposed method has the additional advantages of linear computational complexity and low computational cost, making its implementation in low-cost embedded systems for real-time execution possible. Additionally, the output of the system is a small file containing only the results of the analysis, which allows its transmission through a Wi-Fi network or its storage in the device.
Keywords: Acoustic monitoring; Grazing cattle behaviour; Jaw movement classification; Signal
processing; Real-time operation.
1. Introduction
Accurate monitoring of the diet and feeding behaviour of grazing cattle is necessary to ensure the health and welfare of these animals, which in turn results in a greater quantity and better quality of livestock-derived products (meat and dairy). Many efforts have been put into finding the most
appropriate technique to address this problem but the success of these techniques has been
limited by several factors (Ungar, 1996, Delegarde, 1999). One possible way of performing the
monitoring of ruminant feeding is through the detection of the three most common events of
grazing activity: bite, chew and chewbite (Milone et al., 2012). Biting includes the apprehension and severing of forage, while chewing includes the crushing and processing of the forage. There is also another important event, resulting from the superposition of chewing and biting performed in a single jaw movement, which is called a chewbite. The detection and classification of these three types of events is necessary for accurate monitoring of the diet. The quantification of chews provides important information on the ruminal fermentation of fiber and is also related to the rumen pH (Sauvant, 2000). Bites and chewbites are activities that incorporate matter into the animal organism, the latter also including a pre-processing of the food (Laca and WallisDeVries, 2000). While the number and characteristics of these events within the same activity may vary according to several factors, they should stay within certain limit values to ensure good health of the ruminant (De Boever et al., 1990). To the best of our knowledge, only a few authors have developed algorithms able to detect and classify these events, because of the difficulty of differentiating them, especially in noisy environments.
One approach for studying the grazing behaviour uses acoustic monitoring. Alkon and Cohen
(1986) and Alkon, Cohen, and Jordan (1989) suggested the use of acoustic biotelemetry to study
wild animal behaviour, and successfully identified various foraging actions in porcupines. Laca et
al. (1992) used acoustic monitoring to study the short-term grazing behaviour of cattle, by
mounting an inward-facing microphone on the forehead of the animal, because bone conduction
and the oral cavity tunnels deliver much stronger chew and bite sound intensities at the skull
surface than into the air space surrounding the head of the animal. The clarity of the signal thus
obtained left no doubt that this method detected all jaw movements that performed a bite or a
chew. The ripping sound of a bite and the grinding sound of a chew were readily distinguishable,
and the acoustic method was considered to be much more reliable and efficient for counting bites
than direct visual observation. Acoustic monitoring has been used since then to study grazing
behaviour of domesticated herbivores in the context of short and controlled experiments intended
to study basic questions regarding ingestive behaviour. All reported applications have involved
fresh herbage as the vegetation; several studies obtained encouraging results regarding estimation
of intake on the basis of acoustic variables (Galli et al., 2006; Galli, Cangiano, Milone, & Laca,
2011; Laca & Wallis De Vries, 2000).
In order to develop suitable algorithms to recognize the jaw movements from ingestive sounds,
Milone et al. (2009) used concepts from the field of automatic speech recognition to develop an
algorithm based on Hidden Markov Models (HMM) capable of identifying and classifying chew,
bite, and compound chewbite actions in sheep. The algorithm achieved average recognition rates of 89%, 58% and 56%, respectively, for these acoustic events. Subsequently, Galli
et al. (2011) used this algorithm to demonstrate the possibility of using acoustic variables to
estimate dry matter intake in grazing sheep. Milone et al. (2012) developed an algorithm extending
the use of HMMs to recognize the three types of ingestive sounds of cattle (chew, bite and
chewbite) in alfalfa (tall and short) and fescue (tall and short). They obtained recognition rates of
84% and 65% for tall and short alfalfa, respectively, and 85% and 84% for tall and short fescue,
respectively.
Clapham et al. (2011) developed an automated bite-detection system for cattle grazing mixed perennial pasture. The algorithm automatically detects only bite events for cows eating different types of pasture, reaching a recognition rate of about 95%. They calibrated and used the SIGNAL sound analysis program for Windows (developed by Engineering Design) for the detection of such events. According to the authors, the software processed signals approximately ten times faster than real time. All recordings were made at high quality (44.1 kHz sampling rate and 16-bit resolution). A high-pass filter with a cutoff frequency of 600 Hz was applied to attenuate signals such as wind, because they focused on event detection in the band from 17 kHz to 22 kHz. They encountered some difficulties in trying to scale the experiments to several days of monitoring, including the capacity of storage devices and energy sources. The method seems to require careful calibration before it can be used in each experimental condition.
Navon et al. (2012) proposed a new algorithm based on the analysis of time domain features
(shape, intensity, duration and sequence) of the recorded sound for detection of jaw movements.
It does not require calibration and it uses a machine-learning approach to separate true
jaw-movement sounds from background and intense spurious noises. The algorithm performance
was tested in three field studies by comparing automatic labeling generated by the algorithm with
reference labeling made by an expert in animal behaviour. For cattle grazing green pasture in a
low noise environment with a Lavalier microphone positioned on the forehead, the system
achieved similar results to previous works (94% correct identification and a false positive rate of
7%).
Recently, Tani et al. (2013) developed a monitoring system using a single-axis accelerometer instead of microphones. The proposed algorithm uses signal processing and pattern recognition techniques to iteratively estimate the eating and ruminating events from the recorded signal. The patterns employed by the recognition algorithm are defined in the frequency domain and are used to identify and classify the events. The algorithm achieves results similar to previous works (Clapham et al., 2011; Navon et al., 2012) without calibration; however, three problems related to the signal properties arise for this algorithm: i) the spectral similarity of rumination and eating events can lead to poor results when raw signals with a poor signal-to-noise ratio are analysed, ii) the non-stationary nature of the noise (background noise and/or unexpected noises), whose statistical properties change over time, and iii) the spectral estimation of signals sampled at high frequency requires high computational power, which makes its implementation on portable embedded systems difficult. The idea of using an accelerometer for acoustic monitoring is interesting, but it has intrinsic errors that should be analysed for more reliable recording. The authors performed their experiments in a stall, while admitting that it would be interesting to conduct such experiments on grazing cattle, where sensor attachment and drift may be important issues.
Although several previous studies have shown good performance in detecting events using acoustic monitoring, few of them have performed event classification and none has analysed the computational complexity of its implementation. An analysis of complexity is necessary since the storage and transfer of high-quality, long-duration (several hours or days) data has many practical limitations. In this sense, it would be preferable to have a real-time analysis tool that stores the results directly, at the lowest possible computational cost. The aim of this work was to develop an algorithm for the automatic identification and classification of jaw movements (chew, bite and chewbite) at a low computational cost. This idea points towards the need for an automatic real-time system that can be implemented on a portable embedded system, which is the reason why the computational cost must be low. Outdoor environments inevitably introduce some level of background noise into a recording, and it can be variable and unpredictable. We aimed to deal with commonly encountered levels of such noise, combining passive isolation (directional microphones and easily acquired isolation material) with basic signal processing.
The paper is organised as follows: Section 2 introduces the algorithm and its implementation, and analyses its computational complexity in order to show its ability to operate in real time. In Section 3, the two databases and the performance measures used are described. In Section 4, the results obtained from the application of the proposed algorithm are presented and analysed. A discussion of the results is presented in Section 5. Finally, a brief conclusion is presented in Section 6.
2. The algorithm
The design goal of the algorithm was to achieve good performance in the detection and classification of events, but with a low computational cost that allows its real-time execution in portable embedded systems. One of the adopted criteria was to work in the time domain instead of a transformed domain (frequency, time-frequency), to avoid the high computational load of transform-based signal analysis and machine-learning tools.
Figure 1. Examples of signals of typical acoustic events produced by jaw movements and their corresponding features for: a) chew, b) chewbite and c) bite. In each row, top-down: acoustic signal; intensity envelope; sign of slope envelope; maximum intensity; duration threshold.
Therefore, we isolated three properties of the sound envelope that, together, discriminate sounds produced by jaw movements with high accuracy:
● Shape: The start and end of a jaw movement are accompanied by changes in the slope of
the intensity of the sound envelope (Figure 1). The sign of the envelope slope changes one
or two times for chews and bites, while it changes more than two times for a combined
chew-bite. Therefore, the number of changes in the sign of the envelope slope can be used
as a measure of the event shape;
● Intensity: Jaw movements produce sounds whose intensity changes along time (Figure 1).
The maximum intensity remains constant for a chew (low intensity) and a bite (high
intensity), while it changes (from low to high intensity) for a combined chew-bite;
● Duration: A jaw movement has a defined and limited duration that is characteristic for chews and bites (Figure 1). The durations of chews and bites are similar, and shorter than that of a combined chew-bite.
These properties were fundamental to the design of the algorithm, which can be clearly divided into
two successive tasks:
1. Event detection: in this stage the algorithm detects the regions of the sound envelope that show the occurrence of a possible jaw movement. The detection is carried out through the identification of characteristic peaks in the sound envelope using an adaptive threshold;
2. Event classification: in this stage the algorithm uses the features of the sound envelope to classify each detected event using a simple set of rules (Table 1). Classification is carried out through the analysis and comparison of the shape, intensity and duration of the detected event with given thresholds.
For its implementation, these tasks can be thought of as a set of five successive stages. The event detection task is composed of the first four stages, while the event classification task is performed during the last stage. In the following paragraphs we briefly describe how the detection and classification algorithms were implemented.
Figure 2. Signals generated during stages of the algorithm for 15 seconds of sound: a) Sound
signal b) Sound envelope computation, c) peak detection, d) shape, e) maximum amplitude and f)
duration of events.
Stage 1 - Envelope computation: One of the key elements of this algorithm is the analysis of the sound envelope to detect and classify the events. In the first step, the absolute value of the signal is computed. Then, the resulting signal is filtered using a second-order low-pass Butterworth filter with a bandwidth of 5.5 Hz. Since the sound envelope of each event varies over a much longer period (around 0.5 s) than the original sound (Figure 1), the second step is to subsample the sound envelope from its original sampling frequency down to 100 Hz. The main effect of this task is to reduce the computational requirements (load and time) of the subsequent tasks, since it reduces the amount of information to be processed without losing accuracy in the detection and classification results.
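As a concrete illustration, Stage 1 can be sketched in a few lines of Python. This is only a sketch under the stated parameter values: the function and variable names are our own, and SciPy's Butterworth design is assumed as the filter implementation.

```python
import numpy as np
from scipy.signal import butter, lfilter

def sound_envelope(signal, fs, cutoff_hz=5.5, env_fs=100):
    """Stage 1 sketch: rectify, low-pass filter and subsample the sound.

    cutoff_hz and env_fs follow the values given in the text (5.5 Hz
    second-order Butterworth filter, 100 Hz envelope rate)."""
    rectified = np.abs(signal)                        # step i: rectification
    b, a = butter(2, cutoff_hz, btype="low", fs=fs)   # step ii: 2nd-order low-pass
    smoothed = lfilter(b, a, rectified)
    step = int(fs // env_fs)                          # step iii: subsampling
    return smoothed[::step]

# Usage on a 2 s synthetic chew-like burst sampled at 4 kHz:
fs = 4000
t = np.arange(0, 2.0, 1.0 / fs)
burst = np.sin(2 * np.pi * 300 * t) * np.exp(-((t - 1.0) ** 2) / 0.02)
env = sound_envelope(burst, fs)
print(len(env))  # 100 envelope samples per second of audio
```

Because the envelope is produced at a fixed 100 Hz, everything downstream of this stage runs at a rate that is independent of the audio sampling frequency, which is what makes the later complexity analysis favourable.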
Stage 2 - Division of the sound into segments: Short segments are easier to handle because of computational resource constraints, and their use facilitates the treatment of unexpected events that need special attention. Such events include intense external noises of short duration and constant background noises. The size of the segments depends on the resources available in the computer employed to implement the algorithm. On a common desktop computer, segments can have a typical size of 30 s or longer. On the other hand, in an embedded computer the segments should be shorter due to its limited computing capacity; the size depends on the amount of memory available (with a minimum of 2 s for analysis).
Stage 3 - Peak detection: The presence of peaks in the sound envelope reveals the possible target events. Each peak is detected as a sign change in the derivative of the envelope. However, to be considered a possible event it must be higher than given thresholds. The peaks are detected through the comparison of the sound envelope with a time-varying threshold T(k) (red dashed line in Figure 2.c). This threshold is generated by an algorithm that takes into account the following features given by the anatomical and behavioural characteristics of the animal: i) there is a minimum period of time between two consecutive jaw movements, and ii) the duration of jaw movements is constrained to a maximum period within a continuous activity (ruminating or grazing).
Taking into account these features of the jaw movements to be monitored, the peak detection
algorithm generates the time-varying threshold T( k) with the following features (Christov, 2004):
● An unresponsive period (TU): a period of time after detecting an event in which the algorithm does not search for a new event. It is computed for each event as a fraction α (0 < α < 1) of the average duration of the last five events detected.
● A maximum period (TM): the maximum time that an event can last within the same activity. It is computed for each event as β ≥ 1 times the average duration of the last five events detected.
● The peak expectation threshold (TP): the minimum value expected for the next peak (blue dot-dashed line in Figure 2.c). It is computed as a fraction γ (0 < γ < 1) of the moving average of the last five peaks SP detected in the envelope signal:

TP(k) = (γ/5) ∑ SP(k − i), for i = 1, …, 5.

● The threshold slew-rate (ΔT): the amount by which the threshold T(k) is decreased at each sample, after the unresponsive period TU has expired, to improve the event detection sensitivity. The threshold T(k) only changes during the period of time between TU and TM. It is given by

T(k) = T(k − 1) − ΔT, ∀ TU < k < TM.
This stage of the algorithm generates a file with timestamps that indicate where the peaks in the analysed segment have been detected. This information is used by the event detection and classification stage to determine where to analyse the signal properties.
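The adaptive-threshold idea described above can be sketched as follows. This is a simplified illustration, not the exact implementation: the parameter values, the local-maximum test and the fixed noise floor are our own assumptions.

```python
import numpy as np

def detect_peaks(env, gamma=0.7, alpha=0.3, delta_t=0.005, avg_dur=50,
                 noise_floor=0.05):
    """Stage 3 sketch: adaptive-threshold peak detection on the 100 Hz
    envelope.  gamma scales the peak expectation threshold TP, alpha sets
    the unresponsive period TU as a fraction of a nominal event duration
    (avg_dur envelope samples), and delta_t is the threshold slew-rate."""
    peaks = []
    last_peaks = [float(np.max(env))] * 5   # moving average of last 5 peaks
    t_u = int(alpha * avg_dur)              # unresponsive period TU (samples)
    threshold = gamma * float(np.mean(last_peaks))  # initial T(k) = TP
    k = 1
    while k < len(env) - 1:
        is_local_max = env[k] >= env[k - 1] and env[k] > env[k + 1]
        if is_local_max and env[k] > threshold:
            peaks.append(k)
            last_peaks = last_peaks[1:] + [float(env[k])]
            threshold = gamma * float(np.mean(last_peaks))  # reset to TP
            k += t_u                        # skip the unresponsive period
        else:
            # slew-rate decay of T(k), floored at an assumed noise level
            threshold = max(threshold - delta_t, noise_floor)
            k += 1
    return peaks

# Three synthetic envelope bumps -> three detected peaks
x = np.arange(500, dtype=float)
env = sum(np.exp(-((x - c) ** 2) / 18.0) for c in (100, 200, 300))
print(detect_peaks(env))  # [100, 200, 300]
```

The decaying threshold is what lets the detector recover sensitivity after a loud event while the unresponsive period prevents one jaw movement from being counted twice.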
Stage 4 - Properties computation: This is the main step in capturing information to classify the candidate events detected in the previous stage. For each candidate event, the algorithm computes three properties: the shape and duration from the sound envelope, and the maximum amplitude from the sound itself.
● Shape of the event: is quantified through the number of changes in the sign of the envelope slope, NC (Figure 2.d). To avoid confusion with noise, the slope is computed only if the magnitude of the sound envelope is greater than the background noise detected in the analysed segment (NT).
● Maximum amplitude of the event: (EA) is computed directly from the absolute value of the sound over a window whose length is half the duration of a typical chewbite event (Figure 2.e). This way of computing the maximum amplitude of the signal not only provides information about the maximum intensity of the event, but also includes additional information about the shape of the event (see Figure 1) that can be exploited by classification algorithms.
● Duration of the event: (ED) is determined from the sound envelope by measuring the time period during which the sound envelope exceeds the background noise NT (Figure 2.f).
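The three properties above can be computed as in the sketch below, assuming a 100 Hz envelope and names of our own choosing. Note that EA is taken here from the envelope segment for simplicity, whereas the text computes it from the rectified sound over a fixed window.

```python
import numpy as np

def event_properties(env_segment, noise_threshold, env_fs=100):
    """Stage 4 sketch for one candidate event.

    Returns (NC, EA, ED).  NC counts sign changes of the envelope slope,
    restricted to samples above the background-noise level NT; EA is the
    maximum amplitude (taken here from the envelope, see lead-in); ED is
    the time the envelope stays above NT, in seconds."""
    above = env_segment > noise_threshold
    slope_sign = np.sign(np.diff(env_segment))
    active = above[1:] & above[:-1]               # both samples above NT
    changes = (slope_sign[1:] != slope_sign[:-1]) & active[1:]
    nc = int(np.count_nonzero(changes))           # NC: shape
    ea = float(np.max(env_segment))               # EA: maximum amplitude
    ed = float(np.count_nonzero(above)) / env_fs  # ED: duration [s]
    return nc, ea, ed

# A single smooth bump: one slope sign change, ~0.2 s above the noise level
x = np.arange(100, dtype=float)
bump = np.exp(-((x - 50) ** 2) / 50.0)
print(event_properties(bump, noise_threshold=0.1))
```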
Stage 5 - Event classification: This stage of the algorithm combines all the information available about the individual events to classify them. Using a set of rules based on the properties computed in the previous stage, the events are classified into five categories: chew (C), bite (B), chew-bite (CB), silence (S) and noise (N). The algorithm explores the timestamp, shape, maximum amplitude and duration variables to detect and classify the events. It first checks the time when a peak has been detected and analyses the three properties at this time instant. It then applies a set of rules to decide whether an event has happened and, in the positive case, which kind of event has been detected. The set of rules employed in this work was established heuristically from a training data set, under the constraint that the total number of rules should not be greater than fifteen. The set of decision rules is detailed in Table 1. Each rule specifies the conditions that the shape of the event (NC), maximum amplitude (EA) and duration (ED) must meet for the event to be classified as CB, B or C, respectively. For example, if the number of changes in the sign of the envelope slope (NC) is greater than 2, the amplitude of the event (EA) exceeds the noise threshold (NT) and the duration is greater than 0.3 s, then the detected event is classified as CB.
Table 1: Rules for jaw movement classification

Event     Rule
Chewbite  IF NC > 2 AND EA > NT AND ED > 0.3 s THEN L(j) = “CB”
Bite      IF NC <= 2 AND EA >= 0.5·TP AND ED < 0.3 s THEN L(j) = “B”
Chew      IF NC <= 2 AND EA > NT AND EA < 0.5·TP AND ED < 0.3 s THEN L(j) = “C”
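The decision rules in Table 1 translate almost directly into code. In the sketch below, NT and TP are passed in as precomputed thresholds; the fall-through to a noise label for events matching no rule is our own assumption.

```python
def classify_event(nc, ea, ed, nt, tp):
    """Stage 5 sketch: the Table 1 decision rules.

    nc, ea, ed are the shape (NC), maximum amplitude (EA) and duration (ED)
    features; nt is the noise threshold (NT) and tp the peak expectation
    threshold (TP) for the current segment."""
    if nc > 2 and ea > nt and ed > 0.3:
        return "CB"                       # chewbite
    if nc <= 2 and ea >= 0.5 * tp and ed < 0.3:
        return "B"                        # bite
    if nc <= 2 and nt < ea < 0.5 * tp and ed < 0.3:
        return "C"                        # chew
    return "N"                            # no rule fired: label as noise

print(classify_event(4, 1.0, 0.5, nt=0.1, tp=1.0))  # CB
print(classify_event(1, 0.9, 0.2, nt=0.1, tp=1.0))  # B
print(classify_event(1, 0.3, 0.2, nt=0.1, tp=1.0))  # C
```

Because the rules involve only a handful of comparisons, this stage adds a constant number of operations per detected event, consistent with the complexity analysis of Section 2.2.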
Figure 3 shows the flow diagram of the algorithm for detection and classification of jaw movements as a whole. The envelope signal (Sp) is analysed in segments of N samples. When a segment is fully analysed, the results are saved before analysing the next segment. In the first stage, the algorithm computes the time-varying threshold T(k). Then, it checks whether a peak has been detected. If no peak has been detected, the algorithm assigns the silence label (S) to the event. If a peak has been detected, the algorithm classifies the event by applying the rules to the three properties derived from the sound envelope (shape, maximum value and duration) and assigning the corresponding label (C, B, CB or N).
Figure 3. Flow diagram of the algorithm for event detection and classification.
2.2 Complexity analysis
One of the key problems of the algorithms reported in the literature (Clapham et al., 2011; Milone et al., 2012; Tani et al., 2013; Navon et al., 2012) is their computational complexity, which imposes severe limitations on their implementation in a system running in real time. This fact becomes relevant when high-quality, long-duration (several hours) audio signals need to be processed. For analysis purposes, the computational cost of each step of the algorithm was evaluated as a function of the number of samples n to be processed each second (Table 2). Each stage of the algorithm can be decomposed into basic tasks. For example, the envelope computation (Stage 1) can be decomposed into three tasks: i) signal rectification, ii) signal filtering and iii) signal subsampling. The number of operations employed by the filtering task will depend on its implementation (IIR or FIR).
Table 2: Number of operations per second of the algorithm.
Stage Task Operations/s
The number of operations for each task performed by the algorithm is shown in Table 2, and the total number of operations f(n) required to execute the algorithm, for an IIR implementation of the filters, is

f(n) = 13n + 3700.
As shown in Table 2, only the first three tasks (i.e., rectification, filtering and subsampling) will
depend on the sampling frequency of the input signal. After subsampling (Stage 1), the signal
processed by the remaining tasks has a constant sample rate (100 samples/s). Therefore, the
remaining tasks will be independent of the audio sample rate. For example, the computation of the
envelope slope requires the subtraction of two consecutive samples and computation of its sign,
which involves two operations. The rules evaluation involves five comparisons to check if the
conditions of the rules are verified or not. A more detailed description of the complexity analysis for this algorithm is provided in Appendix A. While this algorithm shows a linear computational complexity (O(n)), a similar analysis of the algorithm developed by Milone et al. (Appendix B) shows a superlinear complexity (O(n log(n))).
For real-time operation, the algorithm must be able to process a signal segment before the next segment becomes available. To accomplish this objective, the algorithm implementation must complete at least f(n) fixed-point operations every second. For the range of sampling rates considered in this application (from 4 kHz to 44 kHz), it is easy to find a low-cost commercial microprocessor able to perform more than the required number of operations. For example, given a 44 kHz sampling rate, it is possible to complete the execution of the algorithm on a Tiva C microcontroller (Tiva™ C Series LaunchPad Evaluation Kit, Texas Instruments Inc.) using a 10 MHz clock. The processing speed could be increased further (by raising the clock frequency), but at the expense of increased energy consumption, which is an essential issue in portable systems.
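As a sanity check on this real-time budget, the operation count f(n) can be evaluated for the sampling rates of interest; the loop below is purely illustrative.

```python
def ops_per_second(n):
    """Total fixed-point operations per second for the IIR implementation,
    f(n) = 13n + 3700, with n the number of audio samples per second."""
    return 13 * n + 3700

# Even at the highest rate considered, the load stays well below the
# ~10 MHz instruction budget mentioned for the Tiva C microcontroller.
for fs in (4000, 8000, 22050, 44100):
    print(fs, ops_per_second(fs))
```

At 44.1 kHz this gives 577,000 operations per second, under 6% of a 10 MHz one-operation-per-cycle budget, which is why a modest microcontroller suffices.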
3.2 Performance measures
One important issue for the comparison between the events recognised and classified by the algorithm and the corresponding reference labels is the time synchronisation of events in both sequences. To solve this, the HTK1 performance analysis tool (HRESULTS) was used. The comparison is based on a Dynamic Programming-based string alignment procedure (Young et al., 2005). The outputs of this tool were: the number of deletions (D), the number of substitutions (S), the number of insertions (I) and the total number of labels in the defining transcription files (N). The percentage of labels correctly recognised is given by:

C% = (N − D − S) / N × 100%,

and the accuracy is computed by:

A% = (N − D − S − I) / N × 100%.
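These two measures can be written directly as functions of the HRESULTS counts; this is a straightforward transcription, and the function names are ours.

```python
def recognition_rate(n, d, s):
    """C%: percentage of labels correctly recognised, from the total number
    of labels N, deletions D and substitutions S."""
    return 100.0 * (n - d - s) / n

def accuracy(n, d, s, i):
    """A%: accuracy, which additionally penalises insertions I."""
    return 100.0 * (n - d - s - i) / n

# e.g. 100 reference labels, 5 deletions, 10 substitutions, 3 insertions:
print(recognition_rate(100, 5, 10))  # 85.0
print(accuracy(100, 5, 10, 3))       # 82.0
```

Note that A% can be lower than C% (or even negative) when the algorithm inserts many spurious events, which is why both measures are reported.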
4. Results
The purpose of this section is to analyze the effect of the operational parameters on the system
performance and analyze the performance of the proposed algorithm with the databases and
performance measures introduced in previous Section. The effect of the algorithm parameters on
performance was carried out through an exploratory analysis. The algorithm was evaluated for
different values of each parameter, obtaining the system performance measured in terms of
recognition rate, accuracy and computing time.
1 http://htk.eng.cam.ac.uk/
4.1 The effect of parameters
The main parameters of the algorithm are: i) the sampling frequency, ii) the quantization level, iii)
the cut-off frequency of the envelope detector filter and iv) the subsampling frequency. Their effect
on the system performance will be analyzed in the following paragraphs.
One of the fundamental requirements for processing signals in real time on embedded systems concerns the computational load and complexity of the algorithm, since they determine the requirements of the hardware employed to implement the system. In signal processing, the computational load principally depends on two parameters: i) the sampling frequency and ii) the quantization level of the signal. The sampling frequency defines the information flow processed by the system per unit of time and plays a key role in the computational load of the algorithm (see Section 2.2). The quantization level of the signal defines the accuracy of the signal representation and, therefore, the word length required by the system to process the information. In this way, quantization defines one aspect of the complexity of the system implementation.
Figure 4 shows the effect of the sampling frequency on the performance of the algorithm (recognition rate and accuracy) and the corresponding computational time for a sound segment of five minutes from the MDb database. The recognition rate and accuracy remain high (around 80%) and constant over a wide range of frequencies (from 2 kHz to 11 kHz) and then decline for frequencies outside this range. This phenomenon can be explained by the fact that for sampling frequencies below 2 kHz the signal-to-noise ratio is degraded because important components of the signal are filtered out. In a similar way, once the sampling frequency goes beyond 11 kHz, the signal information remains constant but the amount of noise processed by the algorithm increases, degrading the overall signal-to-noise ratio. However, in the range of frequencies from 2 kHz to 11 kHz, the information and noise processed by the algorithm remain constant, keeping the overall signal-to-noise ratio constant.
Figure 4. Performance evaluation and corresponding computational time for a frame of five
minutes duration using different sampling frequencies.
The linear dependency of the computational time on the sampling frequency (see Section 2.2) can be seen in Figure 4, where the time required to process a five-minute segment is shown for different sampling frequencies. It can be seen that a sampling frequency of 4 kHz provides good performance (recognition rate and accuracy) with low computational cost and fast execution. In this sense, the algorithm presented has proved capable of processing signals 50 times faster than real time, i.e., analysing 50 minutes of acoustic data per minute on a standard desktop PC. This is shown in Figure 4, where it can be observed that the algorithm is capable of processing 300 s (5 minutes) of sound signal in 6 seconds.
Figure 5. Performance evaluation and quantization error (computed as mean squared error) using
different quantization levels.
Figure 5 shows the effect of the quantization level (or word-length representation) on the performance of the algorithm (recognition rate and accuracy) for a sound segment of the MDb database. The recognition rate and accuracy remain high (around 80%) and constant for quantization using 8 or more bits. This phenomenon can be explained by the fact that the quantization error for data encoded with 8 or more bits is not relevant, which can be seen in terms of the MSE values.
Figure 6. Performance evaluation for different cut-off frequencies of the envelope detector filter.
Figure 6 shows the effect of the cut-off frequency of the envelope detector filter on the recognition rate for the MDb database. For the range between 3 Hz and 5 Hz, the performance and accuracy of the algorithm improve as the cut-off frequency of the filter is increased. A performance over 75% and an accuracy over 70% can be observed in the frequency range between 5 Hz and 6 Hz. Beyond 6 Hz, performance and accuracy decline as the cut-off frequency grows. These phenomena can be explained by the fact that enlarging the bandwidth of the filter at low frequencies increases the amount of information processed by the algorithm, improving the overall signal-to-noise ratio and the performance of the algorithm. However, once the cut-off frequency goes beyond 6 Hz, the information remains constant while the amount of noise processed by the algorithm increases, reducing the overall signal-to-noise ratio and diminishing the performance of the algorithm.
Figure 7 shows the effect of the subsampling frequency on the recognition rate for the MDb database. The performance (recognition rate and accuracy) shows a continuous increase as the subsampling frequency is raised up to 100 Hz. Beyond this frequency, the recognition rate remains steady while the accuracy decays slowly as the subsampling frequency is raised. These phenomena can be explained in a similar way to the cut-off frequency: by increasing the subsampling frequency, the amount of information processed by the algorithm increases, improving the overall signal-to-noise ratio. However, once the subsampling frequency goes beyond 100 Hz, the useful information remains constant and the overall signal-to-noise ratio does not change.
The same analysis was performed for the RDb database over all pastures, where a different effect
of the parameters was observed. Analysing all standard sampling frequencies, the best results (in
terms of recognition rate and accuracy) were observed at 2 kHz and 4 kHz. With respect to the
cut-off frequency, the best results were observed at 3.5 Hz, achieving a high performance with a
high accuracy; similar performances were obtained at cut-off frequencies of 4 Hz and 5 Hz, but
with a lower accuracy. Regarding the decimation frequency, the best performance was observed at
100 Hz, as for the MDb database. We chose a sampling frequency of 2 kHz because of its lower
computational cost.
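The front end whose parameters are tuned above (rectification, a second-order low-pass envelope filter and decimation to 100 Hz) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact filter design is not given, so two cascaded one-pole sections approximate the second-order low-pass, and all names are our own.

```python
import numpy as np

def sound_envelope(x, fs, cutoff_hz=5.5, out_fs=100):
    """Envelope detector sketch: rectify, low-pass filter, subsample.

    The paper's exact second-order filter is not specified; here two
    cascaded one-pole sections approximate it (an assumption).
    """
    rectified = np.abs(x)                        # signal rectification
    a = np.exp(-2.0 * np.pi * cutoff_hz / fs)    # one-pole coefficient
    y = rectified
    for _ in range(2):                           # two sections -> 2nd order
        out = np.empty_like(y)
        acc = 0.0
        for i, v in enumerate(y):
            acc = a * acc + (1.0 - a) * v        # difference equation
            out[i] = acc
        y = out
    step = fs // out_fs                          # decimate to ~out_fs
    return y[::step]

# A 5 s signal sampled at 2 kHz reduces to 500 envelope samples
fs = 2000
t = np.arange(5 * fs) / fs
env = sound_envelope(np.sin(2 * np.pi * 3.0 * t), fs)
```

Raising cutoff_hz in this sketch widens the envelope bandwidth, which mirrors the trade-off discussed for Figure 6: more signal information is kept, but beyond some point only noise is added.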
4.2 Event detection
The event detection stage of the algorithm was evaluated over both databases, achieving a
detection rate of 98% (calculated as (N - D)/N over both databases), which is in the same range as
the algorithms published in the specialised literature. Clapham et al. (2011) reported bite detection
that was 95% correct, while Navon et al. (2013) reported detection rates of jaw movements of 94%
in low-noise environments. Milone et al. (2012) (CBHMM) developed an algorithm that extends
HMM models to detect and classify the three types of ingestive sounds of cattle (i.e. C, B and CB),
reaching a recognition rate close to 94% for the detection of events (calculated as (N - D)/N over
RDb). Similarly, Tani et al. (2013) detected and classified ingestive and ruminating chewing,
achieving approximately 98% for event detection. These quantitative results (except those of the
algorithm developed by Milone et al.) are not directly comparable to ours because the studies vary
in the number of events analysed, duration of the recordings, type and height of pasture, recording
procedures and devices, and validation method. Furthermore, the data employed in these studies
are not available for numerical experimentation.
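Under the (N - D)/N definition used above — with N the total number of reference events and D the number of events the algorithm fails to detect, which is our reading of the text — the detection rate can be computed as:

```python
def detection_rate(n_reference, n_missed):
    """Detection rate (N - D) / N; N = events in the expert reference
    labelling, D = events missed by the algorithm (assumed reading)."""
    return (n_reference - n_missed) / n_reference

# e.g. missing 20 of 1000 expert-labelled events gives the reported 98%
rate = detection_rate(1000, 20)   # 0.98
```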
Table 4: Percentage of correct classification of bites, chews and chewbites (MDb).
The two columns per event give CBRTA results with parameters tuned on MDb and
on RDb, respectively.

                          Classified as
                  Bite          Chew         Chewbite       Average
  Actual event  MDb   RDb     MDb   RDb     MDb   RDb      MDb   RDb
  Bite           95    94       2     3       3     3
  Chew            8     8      87    91       5     1       83    79
  Chewbite       22    44       8     6      70    50
The results for the RDb database were obtained running CBRTA with the best set of parameters
(resulting from the previous analysis) for the same database, which was not further used for testing
purposes. This yields an average recognition rate of 77% of the total events over all pastures, while
the CBHMM method reached an average of 79.5% over all pastures. Table 5 summarizes the
recognition rates for the different events in this database. The best results were obtained for tall
pastures, reaching 79% and 78% for alfalfa and fescue respectively, while for short fescue 77%
was obtained. An additional deterioration of 5% in the recognition rate can be appreciated for short
alfalfa. One reason for this is the change of water content in short forages (which are closest to soil
moisture), which modifies the sound produced during the jaw movements (Rutter et al., 2002).
Therefore, recordings for short forages have a lower signal-to-noise ratio, which introduces errors
into the classification. As with the first database, good results were obtained for the classification of
chews, which is a good sign for the identification of rumination activities. However, some confusion
involving chewbites can be observed. This could be ameliorated by incorporating new features,
such as a measure of the symmetry of the event or information about the sequence of events.
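A measure of event symmetry like the one suggested above could, for instance, compare the envelope energy on either side of the event. The following is only a hypothetical sketch of such a candidate feature; it is not part of the published algorithm.

```python
import numpy as np

def envelope_symmetry(env_event):
    """Hypothetical symmetry feature: fraction of the event's envelope
    energy lying in its first half (0.5 means perfectly symmetric).
    Sketched as a possible extra feature; not in the published method."""
    env_event = np.asarray(env_event, dtype=float)
    total = env_event.sum()
    if total <= 0.0:
        return 0.5                         # degenerate event: call it symmetric
    return env_event[: len(env_event) // 2].sum() / total

# A symmetric envelope scores 0.5; a front-loaded one scores higher
sym = envelope_symmetry([1.0, 2.0, 2.0, 1.0])    # 0.5
asym = envelope_symmetry([3.0, 2.0, 1.0, 0.0])   # 5/6
```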
Table 5: Percentage of correct classification of bites, chews and chewbites for
the different forages (RDb). For each event, the first column corresponds to the
CBHMM method and the second to CBRTA.

                                Classified as
                          Bite         Chew        Chewbite       Average
  Forage      Event   CBHMM CBRTA  CBHMM CBRTA  CBHMM CBRTA   CBHMM CBRTA
  Tall        Bite      79    67     11    18      9    15
  alfalfa     Chew       3     2     88    90      9     8      84    79
              Chewbite   2     5      3    11     94    84
  Short       Bite      76    62     16    30      8     8
  alfalfa     Chew       5     0     90    94      5     6      65    74
              Chewbite  23     5     15    29     61    66
  Tall        Bite      83    74      0    21     17     5
  fescue      Chew       1     1     93    95      7     4      85    78
              Chewbite   1    10      4    33     94    57
  Short       Bite      90    79      9    14      1     7
  fescue      Chew       0     1     99    99      1     0      84    77
              Chewbite   2    25      7    32     91    43
For evaluation purposes, the algorithm for automatic detection and classification of masticatory
event sounds was implemented using MATLAB R2010b.
5. Discussion
Most previous studies that addressed this problem focussed only on the detection of events and
not on their classification. To the best of our knowledge, the only work that addresses both
detection and classification of acoustic events is the CBHMM method developed by Milone et al.
(2012). However, none of these studies has analysed computational complexity. In the present
paper we carried out a detailed analysis of the CBHMM method and of our algorithm, and then
compared their computational complexities. The analysis showed a linear computational
complexity for the CBRTA algorithm (O(n)), while a greater complexity was found for the CBHMM
method (O(n log(n))). Moreover, in laboratory tests the presented algorithm proved capable of
processing signals 50 times faster than real-time. Other authors (Clapham et al., 2011) have
reached 10 times faster than real-time on the bite detection task alone. This is a promising result
because it will allow the development of embedded implementations that run in real-time and
achieve good performance with low computational power.
The performance of the event detection stage can be compared with previous studies: a detection
rate of 98% was achieved, close to the values reported in those studies. Regarding the event
classification stage, the recognition rate for the first database (MDb) averaged 83% when the best
set of parameters for a partition (not further used for testing purposes) of the same database was
used, and 79% when the best set of parameters for a partition (not further used for testing
purposes) of the second database was used. This result suggests that the algorithm is robust to
databases with large differences. For the second database (RDb), the proposed algorithm
achieved a recognition rate of 77% on average, while the CBHMM method averaged 79.5% over
all pastures. For this database we observed a decrease in the recognition rate for short alfalfa
relative to the rest of the pastures, which is consistent with the results obtained by Milone et al.
(2012). In short alfalfa there was a higher proportion of stems (a lower proportion of leaves) than in
tall alfalfa and fescue (tall and short); probably because of this, when cows cut the forage they
produce bite sounds with lower amplitude, increasing the confusion between events. As expected,
performance decreased for event classification because it is a more difficult task than event
detection. Note that both tasks are related, because the algorithm classifies only those events that
were detected.
According to the flowchart of the algorithm (Figure 3), any detected event that is not classified
according to the rules for chewbite, bite or chew is a noise event. Some noise events could be
misclassified as chew, bite or chewbite, but we believe that their influence is minimal for two
reasons. The first reason is the type and location of the microphone. Directional electret
microphones have been used (sensing in only one direction), placed facing inward on the cow's
forehead and protected by foam rubber (Milone et al., 2009). This was done to avoid the influence
of wind and to avoid using high-pass filters, which could also remove important components of the
signal of interest. The second reason is the use of a low-pass filter with a cut-off frequency of
5.5 Hz (or 3.5 Hz, depending on the database) in one of the stages of the algorithm. Noise is
generally characterized as a non-stationary signal with high energy at high frequencies, so we
believe that the noise energy that falls within the frequency band of interest has little influence on
the classification.
6. Conclusions
The importance of the acoustic monitoring technique for both detection and classification of
ingestive events in ruminants has been demonstrated. Although this technique is very appropriate,
it presents difficulties when large volumes of high-quality audio information must be analyzed.
Usually these difficulties are related to computation time, data transfer between devices and
storage capacity. In this regard, the authors' proposal was to obtain good performance with
single-microphone signal processing techniques in order to reduce the computational cost.
The algorithm developed was able to achieve good performance in the detection and classification
of ingestive events (i.e., chew, bite and chewbite) in ruminants. Furthermore, its linear
computational complexity makes possible its implementation and execution on an embedded
system operating in real-time. The results showed that good performance rates can be obtained at
a sampling frequency of 4 kHz with a low computational cost. This indicates that most of the
energy useful for the classification of ingestive events lies below 2 kHz in the target signal,
consistent with the results obtained by Milone et al. (2012).
The next challenge will be to make this method part of a forage intake model. We also plan to
implement it in an integral system that stores the results and incorporates an intelligent supply
system, in order to achieve long-term operation as a dedicated system in the field.
Acknowledgements
The authors would like to thank Marcelo Larripa, Alejandra Planisich and Martín Quinteros from
Facultad de Ciencias Agrarias, Universidad Nacional de Rosario for their assistance in animal
management and gathering data. This work has been funded by Agencia Nacional de Promoción
Científica y Tecnológica, Universidad Nacional del Litoral and Universidad Nacional de Rosario,
under Projects PICT 2011-2440 “Desarrollo de una plataforma tecnológica para ganadería de
precisión”, PACT CAID 2011 “Señales, Sistemas e Inteligencia Computacional”, CAID 2011-525 “”
and 2013-AGR216.
Appendix A: CBRTA algorithm complexity analysis
In this appendix we evaluate the computational cost of each step of the algorithm presented in this
work and express it as a function of the number of samples n to be processed per second
(O(f(n))). The number of samples n to be processed each second depends on the sampling
frequency. Therefore, the number of operations required by each stage of the algorithm will
depend on the tasks performed.
1. Signal rectification: A simple pre-processing operation that guarantees a positive sign for
all samples. This operation requires only a comparison and a multiplication: O(2n).
2. Filtering: A second-order low-pass filter is applied to the resulting signal to obtain the
sound envelope. This filter can be implemented in two different ways: i) a second-order
difference equation (IIR) that involves five multiplications and four additions, O(9n), or ii) a
finite impulse response (FIR) filter that involves N multiplications and N additions, O(2Nn),
where N is the number of taps employed by the filter. The choice of implementation
depends on the main constraint: computational efficiency favours the IIR, while numerical
stability favours the FIR.
3. Subsampling: To reduce the computational requirements (load and time) of the
subsequent tasks without losing accuracy, the sound envelope is subsampled from its
original sampling frequency to 100 Hz. This operation requires an addition and a
comparison: O(2n).
4. Sound segments generation: The data stream generated in the previous task is divided
into short segments. From a computational point of view this task only involves counting
samples, which requires an addition: O(100).
5. Threshold generation: The time-varying threshold T(k) is computed through two tasks:
i) the computation of the peak expectation threshold (TP), which requires five additions and
one multiplication, O(600), and ii) the computation of the threshold T(k), which requires one
addition and two comparisons, O(300). Therefore, the overall computational complexity of
this task is O(900).
6. Event detection: This stage only involves the comparison of the threshold T(k) with the
sound envelope, which implies a computational complexity of O(100).
7. Properties computation: This step computes the properties of the sound envelope that
are employed to classify the event. The shape of the event is quantified by counting the
number of changes in the sign of the envelope slope while its magnitude is bigger than the
background noise; this requires one comparison and one subtraction, O(200). The duration
of the event is computed from the sound envelope by counting the number of samples for
which the envelope is bigger than the background noise NT; this requires one comparison
and one addition, O(200). Finally, the maximum amplitude of the event is computed directly
from the absolute value of the sound over a window whose length is half the duration of a
typical chewbite event; this only requires one comparison, O(100).
8. Event classification: The algorithm combines all the information available about the
individual events to classify them. Using a set of five rules based on the properties
computed in the previous stage, the events are classified into chew, bite, chewbite, silence
and noise. The evaluation of the rule that classifies a silence only requires the comparison
of the sample counter kT, which involves one operation, O(100). To evaluate the remaining
rules, the algorithm checks the conditions that define them (five for each rule); therefore,
the overall computational complexity for each of these rules is O(500). Since all rules are
evaluated for every event, the overall complexity of this task is O(2100).
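The properties computation in step 7 above can be sketched as follows. The thresholding details are simplified assumptions (a single scalar background-noise level, a whole-event window), so this is only illustrative:

```python
import numpy as np

def event_properties(env, noise_level):
    """Sketch of step 7: duration, shape and amplitude of one detected
    event from its (subsampled) sound envelope. Simplified assumptions:
    a single scalar background-noise level and a whole-event window."""
    env = np.asarray(env, dtype=float)
    above = env > noise_level
    duration = int(above.sum())            # samples above the noise level
    slope = np.sign(np.diff(env))          # sign of the envelope slope
    # slope-sign changes counted only where the envelope exceeds the noise
    changes = int(((slope[:-1] != slope[1:]) & above[1:-1]).sum())
    amplitude = float(env.max())           # peak of the envelope
    return duration, changes, amplitude

d, s, a = event_properties([0.0, 0.5, 1.0, 0.5, 0.8, 0.3, 0.0], 0.2)
# d = 5 samples, s = 3 slope-sign changes, a = 1.0
```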
The total number of operations f(n) required to execute the algorithm, for an IIR implementation of
the filter, is

f(n) = 13n + 3700.
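This total can be checked directly against the steps listed: the per-sample cost is 2n (rectification) + 9n (IIR filtering) + 2n (subsampling) = 13n, and the fixed per-second costs add up to 100 + 900 + 100 + 500 + 2100 = 3700. A quick sanity check in code:

```python
def cbrta_ops_per_second(n):
    """Operation count f(n) = 13 n + 3700 from Appendix A (IIR variant);
    n is the number of samples processed per second."""
    per_sample = 2 + 9 + 2   # rectification + IIR filtering + subsampling
    fixed = 100 + 900 + 100 + (200 + 200 + 100) + 2100
    return per_sample * n + fixed

# At the chosen 2 kHz sampling frequency:
ops = cbrta_ops_per_second(2000)   # 13 * 2000 + 3700 = 29 700
```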
For comparison, the computational cost of the CBHMM method is analysed below. In that method,
the number of samples n to be processed each second depends on the number of windows (per
second) and the number of samples per window nw:

n = (number of windows) · nw

Window duration and step were defined as 60 ms and 40 ms, respectively. Regardless of the
sampling rate of the input audio, the number of windows to be processed each second is:

number of windows = (signal length - win step) / (win length - win step)
                  = (1000 ms - 40 ms) / (60 ms - 40 ms) = 48 windows/second

The number of samples nw to be processed per window depends on the sampling frequency.
Recognition in this method can be separated into two main stages: feature extraction and
classification. During the feature extraction stage, exactly the same processing is performed in
each window, so the complexity analysis can be done for a single window of nw samples. The total
number of operations required to extract features from a single window must then be multiplied by
the number of windows to obtain the complete number of operations of the feature extraction stage
for one second of audio. The cost of the classification stage is analysed below under the same
assumption that one second of audio must be processed.
Given the small number of "word models" in this application (only three: chew, bite and chewbite,
leaving aside the silence model for simplicity) and the use of a relatively small "language model", it
is reasonable to assume a complexity similar to that of an isolated word recognition task.
Moreover, one second of audio can contain only one event, due to the typical durations of
masticatory events. Thus, to perform isolated word recognition, the following steps must be carried
out: (i) generate the sequence of feature vectors corresponding to the audio, (ii) calculate the
model likelihoods for all possible models, and (iii) select the word whose model likelihood is
highest.
Step (i) was already addressed in the feature extraction stage. To perform step (ii), the Viterbi
algorithm is used. This algorithm requires on the order of V N²T computations, where V is the
number of words, N is the number of states in each model, and T is the length of the feature vector
sequence. Since V = 3 (chew, bite and chewbite), N = 4 and T = 48 (number of windows per
second), the computations needed are V N²T = 2304. Each Viterbi computation requires a
multiplication (1 operation), an addition (1 operation), and a likelihood calculation of at least
M(d + d²) operations (Duda et al., 1999, p. 111), where the number of features is d = 22 and the
number of Gaussian mixture components is M = 90. Then, the operations needed in step (ii) are
V N²T (2 + M(d + d²)) = 104 928 768 ≃ 105 000 000. Step (iii) performs just 3 comparisons to
obtain the highest likelihood. Therefore, the cost of the classification stage is determined by
step (ii).
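The figures above can be reproduced directly from the quantities given in the text:

```python
# Classification cost of the HMM-based method, as given in the text
V, N, T = 3, 4, 48    # word models, states per model, windows per second
M, d = 90, 22         # Gaussian mixture components, feature dimension

trellis = V * N**2 * T                        # 2304 Viterbi computations
ops = trellis * (2 + M * (d + d**2))          # 104 928 768 operations
```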
The total number of operations f(n) required to execute this algorithm is the sum of the feature
extraction and classification stage costs:

f(nw) = 48 {21 + 10 (value << nw) + 6 nw + nw log(nw)} + 105 000 000

or, substituting nw = n/48,

f(n) = 48 {21 + 10 (value << n/48) + 6 (n/48) + (n/48) log(n/48)} + 105 000 000.
References
Alkon, P., & Cohen, A. (1986). Acoustical biotelemetry for wildlife research: a preliminary test
and prospects. Wildlife Society Bulletin, 14, 193-196.
Alkon, P., Cohen, A., & Jordan, P. A. (1989). Towards an acoustic biotelemetry system for animal
behavior studies. Journal of Wildlife Management, 53, 658-662.
Chambers, A., Hodgson, J., & Milne, J. (1981). The development and use of equipment for the
automatic recording of ingestive behavior in sheep and cattle. Grass and Forage Science, 36,
97-105.
Christov, I. (2004). Real time electrocardiogram QRS detection using combined adaptive
threshold. Biomedical Engineering, 3(1), 28-37.
Clapham, W., Fedders, J., Beeman, K., & Neel, J. (2011). Acoustic monitoring system to quantify
ingestive behavior of free-grazing cattle. Computers and Electronics in Agriculture, 76, 96-104.
De Boever, J., Andries, J., Brabander, D., Cottyn, B., & Buysse, F. (1990). Chewing activity of
ruminants as a measure of physical structure - a review of factors affecting it. Animal Feed
Science and Technology, 27, 281-291.
Delagarde, R., Caudal, P., & Peyraud, J. (1999). Development of an automatic bitemeter for
grazing cattle. Ann. Zootech., 48, 329-339.
Duda, R.O., Hart, P.E., Stork, D.G., 1999. Pattern Classification, second ed. John Wiley and
Sons.
Galli, J., Cangiano, C., Demment, M., & Laca, E. (2006). Acoustic monitoring of chewing and
intake of fresh and dry forages in steers. Animal Feed Science and Technology, 128, 14-30.
Galli, J., Cangiano, C., Milone, D., & Laca, E. (2011). Acoustic monitoring of short-term ingestive
behavior and intake in grazing sheep. Livestock Science, 140, 32-41.
Laca, E., Ungar, E., Seligman, N., Ramey, M., & Demment, M. (1992). An integrated
methodology for studying short-term grazing behaviour of cattle. Grass and Forage Science,
47, 81-90.
Laca, E., & Wallis DeVries, M. (2000). Acoustic measurement of intake and grazing behaviour of
cattle. Grass and Forage Science, 55, 97-104.
Matsui, K., & Okubo, T. (1991). A method for quantification of jaw movements suitable for use on
free-ranging cattle. Applied Animal Behaviour Science, 32, 107-116.
Milone, D., Rufiner, H., Galli, J., Laca, E., & Cangiano, C. (2009). Computational method for
segmentation and classification of ingestive sounds in sheep. Computers and Electronics in
Agriculture, 65, 228 - 237.
Milone, D., Galli, J., Cangiano, C., Rufiner, H., & Laca, E. (2012). Automatic recognition of
ingestive sounds of cattle based on hidden Markov models. Computers and Electronics in
Agriculture, 87, 51-55.
Navon, S., Mizrach, A., Hetzroni, A., & Ungar, E. (2013). Automatic recognition of jaw
movements in free-ranging cattle, goats and sheep, using acoustic monitoring. Biosystems
Engineering, 114(4), 474-483.
Penning, P. (1983). A technique to record automatically some aspects of grazing and ruminating
behaviour in sheep. Grass and Forage Science, 38, 89-96.
Rutter, S. (2000). Graze: a program to analyze recordings of the jaw movements of ruminants.
Behavior Research Methods, 32, 86-92.
Rutter, S., Champion, R., & Penning, P. (1997). An automatic system to record foraging
behaviour in free-ranging ruminants. Applied Animal Behaviour Science, 54, 185-195.
Rutter, S., Ungar, E., Molle, G., & Decandia, M. (2002). Bites and chews in sheep: acoustic
versus automatic recording. In Proceedings of the 10th European intake workshop, Reykjavik,
Iceland.
Sauvant, D. (2000). Granulométrie des rations et nutrition du ruminant. INRA Productions
Animales, 13(2), 99-108.
Stobbs, T., & Cowper, L. (1972). Automatic measurement of the jaw movements of dairy cows
during grazing and rumination. Tropical Grasslands, 6, 107-112.
Tani, Y., Yokota, Y., Yayota, M., & Ohtani, S. (2013). Automatic recognition and classification of
cattle chewing activity by an acoustic monitoring method with a single-axis acceleration
sensor. Computers and Electronics in Agriculture, 92, 54-65.
Undersander, D.J., Albert, E.C., Cosgrove, D.R., Johnson, D.G., & Peterson, P.R. (2002).
Pastures for profit: A guide to rotational grazing. Univ. of Wisconsin Coop. Ext. Publ. A3529.
Univ. of Wisconsin, Madison.
Ungar, E. (1996). Ingestive behaviour. In J. Hodgson, & A. W. Illius (Eds.), The ecology and
management of grazing systems, Wallingford: CAB International, pp. 185 - 218.
Ungar, E., & Rutter, S. (2006). Classifying cattle jaw movements: comparing IGER Behaviour
Recorder and acoustic techniques. Applied Animal Behaviour Science, 98, 11-27.
Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Moore, G., Odell, J., Ollason, D.,
Povey, D., Valtchev, V., & Woodland, P. (2005). The HTK Book (for HTK Version 3.3).
Cambridge University Engineering Department.