s_t^x = [r_t^x, . . . , r_{t+N−1}^x]^T,   (3)

s_t^z = [r_t^z, . . . , r_{t+N−1}^z]^T.   (4)

[Figure 1: System overview of deep activity recognition. Accelerometer data passes through data preprocessing and time-series segmentation, then through the deep model; softmax regression yields the class membership probability max P(a_i | x_t) over 1 ≤ i ≤ M, and an emission matrix links predictions to the supported activities, e.g., watching TV, walking, and eating.]
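Equations (3) and (4) stack N consecutive acceleration readings of one axis into a frame vector. A minimal sketch of this time-series segmentation (the frame length, overlap, and function name below are illustrative assumptions, not values from the paper):

```python
import numpy as np

def segment_axis(r, frame_len, step):
    """Slice a 1-D acceleration series r into frames
    s_t = [r_t, ..., r_{t+N-1}]^T, advancing by `step` samples."""
    frames = [r[t:t + frame_len] for t in range(0, len(r) - frame_len + 1, step)]
    return np.stack(frames)  # shape: (num_frames, frame_len)

# Example: a 20 Hz signal, 2-second frames (N = 40), 50% overlap (hypothetical)
rx = np.random.randn(200)
sx = segment_axis(rx, frame_len=40, step=20)
print(sx.shape)  # (9, 40)
```

The same call would be applied to each accelerometer axis to build the per-axis frames of (3) and (4).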
A spectrogram of an accelerometer signal is a three-dimensional […]

Figure 2: The greedy layer-wise training of DBNs. The first level is trained on triaxial acceleration data. Then, more RBMs are repeatedly stacked to form a deep activity recognition model until forming a high-level representation.

[…] x are continuous and are fed to a deep activity recognition model. As a result, the first layer of the deep model is selected as a Gaussian-binary RBM (GRBM), which can model the energy content in the continuous accelerometer data. Afterward, the subsequent layers are binary-binary RBMs (BRBMs). RBMs are energy-based probabilistic models which are trained using stochastic gradient descent on the negative log-likelihood of the training data. For the GRBM layer, the energy of an observed vector v = x and a hidden code h is defined as follows:

E(v = x, h) = (1/2)(v − b)^T (v − b) − c^T h − v^T W h,   (5)

where W is the weight matrix connecting the input and hidden layers, and b and c are the visible and hidden unit biases, respectively. For a BRBM, the energy function is defined as follows:

E(v, h) = −b^T v − c^T h − v^T W h.   (6)

An RBM can be trained using the contrastive divergence approximation (Hinton 2002) as follows:

ΔW_ij = α (⟨v_i h_j⟩_data − ⟨v_i h_j⟩_1),   (7)

where α is a learning rate, ⟨v_i h_j⟩_data is the expectation of reconstruction over the data, and ⟨v_i h_j⟩_1 is the expectation of reconstruction over the model using one step of the Gibbs sampler. Please refer to (Hinton, Osindero, and Teh 2006; Hinton 2012) for further details on the training of DBNs. For simplicity, we denote the weights and biases of a DBN model as θ, which can be used to find the posterior probabilities P(a_i | x_t, θ) for each joint configuration (a_i, x_t).

To this end, the underlying activity y_t can be predicted at time t using softmax regression as follows:

y_t = arg max_{1 ≤ i ≤ M} P(a_i | x_t, θ).   (8)

Alternatively, the temporal patterns in a sequence of activities can be further analyzed using HMMs. The following section establishes the probabilistic connection between the input data x_t and activity prediction y_t over a sequence of observations 1 ≤ t ≤ T.

Temporal Activity Recognition Models (DL-HMM)

In some activity recognition applications, there is a temporal pattern in executed human activities, e.g., car checkpoint (Zappi et al. 2008). Hidden Markov models (HMMs) capture such patterns. An HMM is defined by the tuple Φ = (π, ψ, Υ), where π = (P(y_1 = a_i) : i = 1, . . . , M) is the vector of prior probabilities of all activities in the first hidden state, ψ = (P(y_t = a_i | y_{t−1} = a_j) : i, j = 1, . . . , M) is the matrix of transition probabilities, and Υ = (P(x_t | y_t = a_i) : i = 1, . . . , M and t = 1, . . . , T) is the emission matrix for observables x_t from hidden symbols a_i. Given a sequence of observations, the emission probabilities are found using a deep model. In particular, the joint probability P(y_t, x_t) of each joint configuration (y_t, x_t) in an HMM is found as follows:

P(y_t, x_t) = P(y_1) P(x_1 | y_1) ∏_{i=2}^{T} P(y_i | y_{i−1}) P(x_i | y_i)   (9)
            = P(y_{t−1}, x_{t−1}) P(y_t | y_{t−1}) P(x_t | y_t).   (10)

Herein, (10) shows that an HMM infers the posterior distribution P(y_t | x_t) as a recursive process. This decoding problem is solved for the most probable path of sequential activities.

Computational Complexity

Our algorithm consists of three working phases: (a) data gathering, (b) offline learning, and (c) online activity recognition and inference. The computational burden of the offline learning is relatively heavy for a mobile device, as it is based on stochastic gradient descent optimization. Therefore, it is recommended to run the offline training of a deep activity recognition model on a capable server. Nonetheless, after the offline training is completed, the model parameter θ is simply disseminated to the wearable device, where the online activity recognition is lightweight with a linear time complexity O(L × D), where D is the number of layers in the deep activity recognition model. Here, the time complexity of the online activity recognition system represents the time needed to recognize the activity as a function of the accelerometer input length. The time complexity of finding the short-time Fourier transform (STFT) is O(L log(L)). Finally, the time complexity of the HMM decoding problem is O(M² × T).

Baselines and Result Summary

Datasets

For empirical comparison with existing approaches, we use three public datasets that represent different application domains to verify the efficiency of our proposed solution. These three testbeds are described as follows:
• WISDM Actitracker dataset (Kwapisz, Weiss, and Moore 2011): This dataset contains 1,098,213 samples from one triaxial accelerometer programmed to sample at a rate of 20 Hz. The data samples belong to 29 users
The Workshops of the Thirtieth AAAI Conference on Artificial Intelligence 5
and 6 distinctive human activities: walking, jogging, sitting, standing, and climbing stairs. The acceleration sam- […]

[…]cations of deep activity recognition models. The data samples are collected from patients with Parkinson's disease. Three triaxial accelerometers are fixed at the patient's […]

[Figure 3: Triaxial accelerometer signals. Panels (a) X-axis, (b) Y-axis, and (c) Z-axis each show the raw signal and its moving average (acceleration in m/s² vs. time in sec), together with the corresponding spectrograms (frequency in Hz vs. time in sec).]
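Spectrogram panels like those in Figure 3 come from a short-time Fourier transform of each axis. A minimal NumPy sketch; the window length, hop size, Hann window, and dB scaling are our illustrative choices, not the paper's exact settings:

```python
import numpy as np

def spectrogram(signal, win_len=64, hop=32):
    """Log-power STFT magnitude: rows are frequency bins, columns time frames."""
    window = np.hanning(win_len)
    frames = [signal[t:t + win_len] * window
              for t in range(0, len(signal) - win_len + 1, hop)]
    spectra = np.fft.rfft(np.stack(frames), axis=1)   # one-sided spectrum per frame
    power = np.abs(spectra) ** 2
    return 10.0 * np.log10(power + 1e-12).T           # dB scale, (freq_bins, frames)

acc = np.sin(2 * np.pi * 3 * np.arange(400) / 20.0)  # 3 Hz tone sampled at 20 Hz
S = spectrogram(acc)
print(S.shape)  # (33, 11)
```

Semi-static activities concentrate energy in the lowest bins (DC components), while active motions spread energy into higher bins (AC components), which is the separation the paper's analysis relies on.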
Figure 4: Optimizing deep activity recognition models. Activity recognition using the WISDM Actitracker dataset under different DBN setups. In each panel, the activity recognition accuracy rates are shown for both training and testing data samples against the number of layers (1–5). The input length is 303, which corresponds to 10-second frames.

[…] Daphnet dataset), we use three performance metrics: sensitivity (TPR) = TP/(TP + FN), specificity (TNR) = TN/(TN + FP), and accuracy (ACC) = (TP + TN)/(TP + TN + FP + FN), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. For multiclass classification of non-overlapping activities, which is based on the experimentation of the WISDM Actitracker and Skoda checkpoint datasets, the average recognition accuracy is found as

ACC = (1/M) Σ_{i=1}^{M} (TP_i + TN_i)/(TP_i + TN_i + FP_i + FN_i),

where M is the number of supported activities.
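These metrics follow directly from the confusion counts; a small sketch (the function names are ours):

```python
def binary_metrics(tp, tn, fp, fn):
    """Sensitivity (TPR), specificity (TNR), and accuracy from confusion counts."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return tpr, tnr, acc

def average_accuracy(per_class_counts):
    """Mean per-class accuracy: (1/M) * sum_i (TP_i+TN_i)/(TP_i+TN_i+FP_i+FN_i)."""
    accs = [(tp + tn) / (tp + tn + fp + fn) for tp, tn, fp, fn in per_class_counts]
    return accs and sum(accs) / len(accs)

print(binary_metrics(tp=73, tn=82, fp=18, fn=27))  # (0.73, 0.82, 0.775)
```

`average_accuracy` implements the multiclass formula above, with one (TP, TN, FP, FN) tuple per supported activity.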
Baselines

Table 1 summarizes the main performance results of our proposed method and some previous solutions on the three datasets. Deep activity recognition models introduce a significant accuracy improvement over conventional methods. For example, they improve accuracy by 6.53% over MLPs and by 3.93% over ensemble learning on the WISDM Actitracker dataset. Similarly, significant improvements are also reported for the Daphnet freezing of gait and Skoda checkpoint datasets. This summarized result shows that the deep models are both (a) effective in improving recognition accuracy over state-of-the-art methods, and (b) practical for avoiding the hand-engineering of features.

Experiments on Real Datasets

Spectrogram Analysis

Figure 3 shows triaxial time series and spectrogram signals of 6 activities of the WISDM Actitracker dataset. Clearly, the high-frequency signals (a.k.a. AC components) belong to activities with active body motion, e.g., jogging and walking. On the other hand, the low-frequency signals (a.k.a. DC components) are collected during semi-static body motions, e.g., sitting and standing. Thereby, these low-frequency activities are only distinguishable by the accelerometer measurement of the gravitational acceleration.

Performance Analysis

In our experiments, the data is first centered to the mean and scaled to unit variance. The deep activity recognition models are trained using stochastic gradient descent with a mini-batch size of 75. For the first GRBM layer, the pre-training learning rate is set to 0.001 with 150 pre-training epochs. For the subsequent BRBM layers, the number of pre-training epochs is fixed to 75 with a pre-training learning rate of 0.01. The fine-tuning learning rate is 0.1 and the number of fine-tuning epochs is 1000. For interested technical readers, Hinton (Hinton 2012) provides a tutorial on training RBMs with much practical advice on parameter setting and tuning.
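The pre-training described here optimizes each RBM with contrastive divergence, per equation (7). A minimal CD-1 update for a binary-binary RBM, using the paper's BRBM learning rate (0.01) as the default; the layer sizes and random data are illustrative, and this is a sketch rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, b, c, v0, alpha=0.01):
    """One contrastive-divergence step: Delta W = alpha * (<v h>_data - <v h>_1)."""
    # Positive phase: hidden probabilities given the data.
    h0 = sigmoid(v0 @ W + c)
    # One Gibbs step: sample hiddens, reconstruct visibles, recompute hiddens.
    h_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h_sample @ W.T + b)
    h1 = sigmoid(v1 @ W + c)
    n = v0.shape[0]
    W += alpha * (v0.T @ h0 - v1.T @ h1) / n
    b += alpha * (v0 - v1).mean(axis=0)
    c += alpha * (h0 - h1).mean(axis=0)
    return W, b, c

# Illustrative sizes: a 75-sample mini-batch (as in the paper), 40 visible, 16 hidden units.
v = (rng.random((75, 40)) < 0.5).astype(float)
W = 0.01 * rng.standard_normal((40, 16))
b = np.zeros(40)
c = np.zeros(16)
W, b, c = cd1_update(W, b, c, v)
print(W.shape)  # (40, 16)
```

The GRBM layer would use a different energy function (equation (5)) and hence a Gaussian reconstruction of the visibles; the update structure is otherwise the same.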
DATASET | REFERENCE | SOLUTION | WINDOW SIZE | ACCURACY (%)
WISDM | (Kwapisz, Weiss, and Moore 2011) | C4.5 | 10 sec | 85.1
WISDM | (Kwapisz, Weiss, and Moore 2011) | Logistic regression | 10 sec | 78.1
WISDM | (Kwapisz, Weiss, and Moore 2011) | MLPs | 10 sec | 91.7
WISDM | (Catal et al. 2015) | Ensemble learning | 10 sec | 94.3
WISDM | Our solution | Deep learning models | 10 sec | 98.23
Daphnet | (Bächlin et al. 2010) | Energy threshold on power spectral density (0.5 sec) | 4 sec | TPR: 73.1 and TNR: 81.6
Daphnet | (Hammerla et al. 2013) | C4.5 and k-NNs with feature extraction methods | - | TPR and TNR ∼ 82
Daphnet | Our solution | Deep learning models | 4 sec | TPR and TNR ∼ 91.5
Skoda | (Zappi et al. 2008) | HMMs | - | Node 16 (86), nodes 20, 22 and 25 (84)
Skoda | Our solution | Deep learning models | 4 sec | Node 16 (89.38)

Table 1: Comparison of our proposed solution against existing methods in terms of recognition accuracy. C4.5 is a decision tree generation method.
Number of layers | Generative & discriminative training | Discriminative training only
1 | 96.87 | 96.87
3 | 97.75 | 96.46
5 | 97.85 | 96.51
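The HMM decoding problem noted under Computational Complexity, finding the most probable activity path, is commonly solved with the Viterbi algorithm in O(M² × T). A log-domain sketch; the emission scores, which the DL-HMM derives from the deep model's posteriors, are stubbed with random values here:

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most probable state path. log_pi: (M,) priors, log_A: (M, M) transitions,
    log_B: (T, M) per-step emission log-probabilities."""
    T, M = log_B.shape
    delta = log_pi + log_B[0]
    back = np.zeros((T, M), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A          # (M, M): previous state -> next state
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(M)] + log_B[t]
    path = [int(np.argmax(delta))]               # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(1)
M, T = 4, 6                                      # M activities, T time steps
log_pi = np.log(np.full(M, 1.0 / M))
log_A = np.log(rng.dirichlet(np.ones(M), size=M))
log_B = rng.standard_normal((T, M))              # stub for log P(x_t | y_t) from the deep model
path = viterbi(log_pi, log_A, log_B)
print(path)
```

Each loop iteration considers all M × M transitions, giving the O(M² × T) cost stated in the text.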