Professional Documents
Culture Documents
Abstract—Device-free Wi-Fi indoor localization has received From the mathematical techniques perspective, wireless
significant attention as a key enabling technology for many indoor localization can be classified as using triangula-
Internet of Things (IoT) applications. Machine learning-based tion/trilateration or fingerprinting for position calculation. The
location estimators, such as the deep neural network (DNN),
arXiv:1904.10154v2 [cs.NI] 22 May 2019
carry proven potential in achieving high-precision localization triangulation/trilateration technique calculates the location of
performance by automatically learning discriminative features a target based on distances and/or angles of the target with
from the noisy wireless signal measurements. However, the inner respect to signal sources. The fingerprinting technique matches
workings of DNNs are not transparent and not adequately the online testing measurements with the database created
understood especially in the indoor localization application. In from offline site survey to determine the location of a tar-
this paper, we provide quantitative and visual explanations for
the DNN learning process as well as the critical features that DNN get. For fingerprinting indoor localization, machine learning-
has learned during the process. Toward this end, we propose to based approaches have been studied [6]–[14]. In [6], machine
use several visualization techniques, including: 1) dimensionality learning classifiers such as k-nearest neighbor (k-NN), support
reduction visualization, to project the high-dimensional feature vector machine (SVM), linear discriminant analysis (LDA),
space to the 2D space to facilitate visualization and interpretation, and random forests for localization in changing environments
and 2) visual analytics and information visualization, to quantify
relative contributions of each feature with the proposed feature were studied and compared. Wu et al. [7] proposed an indoor
manipulation procedures. The results provide insightful views localization system using the Naive Bayes classifier and com-
and plausible explanations of the DNN in device-free Wi-Fi pared it with k-NN and SVM. Zhou et al. [8] applied SVM
indoor localization using channel state information (CSI) fin- to device-free localization based on channel state information
gerprints. (CSI) fingerprinting. In [9]–[14], deep neural network (DNN)-
Index Terms—Wireless indoor localization, fingerprinting, based approaches to indoor localization have been proposed.
channel state information (CSI), machine learning, deep neural Saha et al. [9] proposed a localization scheme based on the re-
networks (DNN), Internet of Things (IoT), visual analytics. ceived signal strength (RSS) of Wi-Fi using a neural network-
based classifier, which was shown to outperform the nearest
I. I NTRODUCTION neighbor classifier and the histogram matching method. Zhang
et al. [10] developed an RSS-based Wi-Fi localization scheme
Indoor localization is an essential component of many that combines DNN and hidden Markov model (HMM). It
Internet of Things (IoT) applications [1]. Applications such was shown that DNN outperforms other machine learning
as healthcare management [2], object tracking [3], and IoT- approaches including k-NN, SVM, and locally linear embed-
based smart environment [4] all require precise indoor location ding (LLE) each in combination with HMM, in terms of root
information. From the device perspective, wireless indoor mean square error (RMSE). Wang et al. [11] proposed a novel
localization can be classified as either device-based or device- deep learning-based fingerprinting indoor localization system
free, depending on whether or not a tracking device needs based on the CSI of Wi-Fi. A multilayer framework of a deep
to be attached to the target to be localized [5]. Device-based network was developed for learning discriminative features so
methods generally have the advantages of higher accuracy and as to effectively reduce the distance error in comparison with
robustness against environmental interferences and dynamics the probabilistic methods. Chang et al. [12] proposed a device-
as compared to device-free methods. However, device-free free DNN-based Wi-Fi fingerprinting localization system. CSI
methods have the advantages of less hardware costs, less pre-processing and data augmentation (including noise injec-
power demands, better privacy, and provision of real-time posi- tion and interperson interpolation) were incorporated into the
tioning and tracking, and find a wide array of IoT applications DNN framework for enhanced robustness and performance.
such as intrusion detection, elderly care, etc. Wang et al. [13] designed a machine learning framework
for simultaneous location, activity, and gesture recognition by
This work was supported in part by the Center for mmWave Smart
Radar Systems and Technologies, under the Featured Areas Research Center integrating a sparse autoencoder network with the softmax-
Program within the framework of the Higher Education Sprout Project by the regression-based classification module. The learning process
Ministry of Education (MOE), Taiwan, and in part by the Ministry of Science can automatically learn the discriminative features from the
and Technology (MOST), Taiwan, under Grants MOST 106-2628-E-001-001-
MY3, MOST 107-3017-F-009-001, and MOST 107-2221-E-009-037-MY2. RSS measurements and achieve an accuracy higher than 85%,
S.-J. Liu and R. Y. Chang are with the Research Center for Infor- which is better than the system without utilizing the learning
mation Technology Innovation, Academia Sinica, Taipei, Taiwan (e-mail: process. Gao et al. [14] used a multilayer deep learning-
{cindy1445858, rchang}@citi.sinica.edu.tw).
F.-T. Chien is with the Institute of Electronics, National Chiao Tung based image processing framework to learn the optimized
University, Hsinchu, Taiwan (e-mail: ftchien@mail.nctu.edu.tw). deep features from the radio images (transformed from the
2
amplitude and phase information of the measured CSI) to • We model the device-free wireless indoor localization as
estimate the location and activity of a target person. Note that a classification problem using the DNN, which yields a
[9]–[11] are device-based and [6]–[8], [12]–[14] are device- better average precision and recall rate than the k-NN and
free indoor localization schemes. SVM techniques.
DNNs have achieved extraordinary results in many areas. • To the best of our knowledge, this paper presents the
While the DNN can perform automatic feature extraction first study that provides visual analysis to understand the
without much human intervention, it is largely conceived as a mechanism of DNN-based device-free wireless indoor
“black box,” and the design of a neural network is typically localization. We employ two visual analytics techniques,
a trial-and-error process. The transparency issue of the DNN namely, dimensionality reduction visualization and visual
has received increasing attention lately, where both machine analytics and information visualization, to analyze the
learning and visualization communities have attempted to resultant clustering effect among the data after DNN and
understand the training process and interpret the machine visualize the critical channels of a given input relative to
cognition in a visually conceivable way. Craven and Shavlik a particular output decision.
[15] surveyed several visualization techniques, including Hin- • We exploit the relevance score of a channel in an input
ton diagrams, bond diagrams, hyperplane diagrams, response- CSI relative to a specific output class by regarding it
function plots, and trajectory diagrams, which help provide as a correlation coefficient between two channels: one
visual evidence for understanding the learning and decision- associated with the location class of the input CSI and
making processes of neural networks. Mohamed et al. [16] the other associated with the specific output class that is
introduced a dimension-reduction technique which shows that learned by the DNN. Particularly, we develop the pro-
pre-training, fine-tuning, and nonlinear hidden layers are three gressive channel nullification and channel modification
major components that contribute to the superb acoustic recog- procedures to evaluate the impacts of the channels with
nition performance of deep belief networks (DBNs). Rauber et higher absolute values of relevance scores on the output
al. [17] analyzed the DNNs by visualizing the low-dimensional prediction of each class. With extensive experiments
projections of hidden layer activations, and provided insights using these two channel manipulation procedures, we
into how the learned representations of samples have evolved justify that relevance scores are truly reliable quantitative
during training. Bach et al. [18] introduced the layer-wise indicators of critical features that distinguish classes,
relevance propagation (LRP) algorithm for identifying impor- which allows for a better understanding of the critical
tant pixels linked to a particular DNN prediction in a given features in wireless signals learned by the DNN which
input image. The significant features extracted by the training cannot be well interpreted by human perceptions.
process were visualized in the pixel/input space. Samek et al. The rest of the paper is organized as follows. Section II
[19] showed that LRP, as compared to the sensitivity analysis provides preliminaries on the system and experiments of
and the deconvolution method, can more accurately identify DNN-based device-free Wi-Fi indoor localization. Section III
important pixels based on which the DNN prediction was presents the visual analytics techniques. Sections IV and V
made. present the results and discussion. Finally, Section VI con-
Applying the visual analytics techniques to understand the cludes the paper.
working mechanism of DNNs in wireless indoor localization
has not been studied. The significance of this endeavor is II. E XPERIMENTS AND DATASETS
twofold. First, providing visual evidence and interpretation
Two real-world experiments for device-free Wi-Fi indoor
of DNN workings in wireless indoor localization is by it-
localization were conducted, in a conference room and in
self valuable, as the wireless signal patterns generally lack
a lounge, respectively, at the Research Center for Infor-
visual evidence and human intuition to interpret important
mation Technology Innovation, Academia Sinica. The two
features. Making DNN classifier interpretable also opens up
environments exemplify different but common, realistic indoor
new ways for model improvement and allows for analysis and
settings (open space in the conference room and space with
comparison of various machine learning models for wireless
furniture/obstructions in the lounge) for our analysis.
indoor localization. Second, since the features in wireless
localization applications are physical signatures of multiple-
input multiple-output (MIMO) wireless links, an investigation A. Experimental Settings
and discussion from specifically wireless perspectives provides The experimental settings in the conference room are de-
new insights on DNNs which cannot be directly inferred picted in Fig. 1. The conference room has dimensions 5.8×8.6
from other domain applications. Specifically, in this paper, we m2 . There are M = 16 target locations (p1 , p2 , . . . , p16 )
address the following issues: 1) How well has the DNN learned which are 1.2 m apart between any two nearest locations. A
and been trained in device-free Wi-Fi indoor localization? single Wi-Fi access point (AP) of the model Zyxel NBG-419N
2) What critical features (wireless signatures) has the DNN as the transmitter (Tx) and a single ASUS laptop equipped
learned to distinguish different classes in device-free Wi- with Ubuntu 10.04 LTS and Intel Wi-Fi Wireless Link 5300
Fi indoor localization? Toward these ends, we adopt several 802.11n MIMO-OFDM radios [20] as the receiver (Rx) are
visualization techniques to provide quantitative and plausible both placed at fixed locations with heights 88 cm and 82 cm,
explanations of DNN workings in this specific application. Our respectively. No tracking device is attached to the target to
main contributions are summarized as follows: be localized (i.e., device-free). The receiver pings the packets
3
580 cm
Input layer Hidden layer Output layer
Concrete wall
Screen
Tx cabinet
1.2m 1.2m
Concrete wall
0
! ! !
Table ( ' % & Table Rx
……
……
……
……
……
$ # ! "
1.2m
Table Rx
Wooden wall
Door
Door
(a) (b)
Fig. 3. The architecture of the DNN.
Fig. 1. Experiment 1 (conference room). A single Wi-Fi AP (Tx) and a single
fixed-location receiver (Rx) are deployed for device-free indoor localization.
(a) Floor plan. (b) Photograph.
afternoon, evening). Overall, 200 CSI samples were collected
540 cm for each location (3200 samples for all 16 locations). Each
Door
Concrete wall CSI sample is a K-dimensional vector, where K = 120 is
TX
Tx specified by the number of OFDM subcarriers (30) multiplied
#! #% #' ## by two transmit and two receive antennas. The elements
Door
of the K-dimensional CSI sample, termed CSI channels or
#$ & * simply channels in this paper, are indexed in the order of
Concrete wall
Table
Sofa
" ( )
pairing (denoted by Tx1–Rx1), 30 subcarriers for Tx1–Rx2, 30
subcarriers for Tx2–Rx1, and 30 subcarriers for Tx2–Rx2. In
1.2m
! % ' #
the online testing phase, a similar task was performed, which is
independent of the training phase. Each of the 16 locations was
1.2m
tested twice, at different times of the same day to incorporate
Rx RX environmental variations. In each test, 100 CSI samples were
Concrete wall collected for each location (1600 for all 16 locations).
(a) (b)
In the lounge experiment, in the offline training phase,
Fig. 2. Experiment 2 (lounge). A single Wi-Fi AP (Tx) and a single fixed-
location receiver (Rx) are deployed for device-free indoor localization. (a)
the same person stood at each of the 14 locations with
Floor plan. (b) Photograph. four different orientations at each location (facing forward,
backward, right, and left with respect to the transmitter),
and the fixed-location receiver pings the packets from the
sent from the transmitter and collects channel state information transmitter. The same procedure was repeated 12 times, in two
(CSI) during packet transmissions. The CSI data are used to days with six times of two hours apart each day. Overall, 384
train a DNN-based classification model for localization. CSI samples were collected for each location (5376 samples
The experimental settings in the lounge are depicted in for all 14 locations). In the online testing phase, the same task
Fig. 2. The lounge has dimensions 5.4 × 6.6 m2 . There are was performed, independently of the training phase. Each of
M = 14 target locations (p1 , p2 , . . . , p14 ) which are 1.2 m the 14 locations was tested twice, at different times of the
apart between two nearest locations. Unlike the conference same day. In each test, 92 CSI samples were collected for
room experiment, the target locations here are not uniformly each location (1288 for all 14 locations).
arranged due to obstructions. The Wi-Fi transmitter and re-
ceiver are placed at fixed locations with heights 76 cm and C. Deep Neural Networks
69 cm, respectively. All other settings are the same as in the A fully connected feedforward network consisting of an
conference room experiment. input layer, L = 3 hidden layers (from first to third, 300,
280, and 260 neurons, respectively)1 , and an output layer
B. Datasets is adopted, as illustrated in Fig. 3. The input to the in-
put layer is the K-dimensional raw CSI data, denoted by
In the conference room experiment, CSI data for both train-
ing and testing phases were collected. In the offline training 1 It is known that by increasing the number of hidden neurons we can
phase, the same person stood at each of the 16 locations and improve the approximation of the desired function, yet with a higher compu-
then the fixed-location receiver pings the packets from the tational cost. The selection of the number of hidden layers and the number
of neurons in each hidden layer here is based on both conventional wisdom
transmitter. The same procedure was repeated eight times, in and our experiments to strike a reasonable tradeoff between complexity and
two days and at four different times each day (morning, noon, performance.
4
x = [x1 , x2 , . . . , xK ]T , where xk denotes the input data of j, and pii = 0, ∀i. Then, t-SNE finds the low-dimensional data
CSI channel k. The input to neuron i in the first hidden representation such that the low-dimensional counterparts have
(1) P (1) (1) (1) (1)
layer is zi = j wi,j xj + bi , where wi,j and bi
similar joint probabilities as the high-dimensional data. The
are the weight and bias with self-explanatory indices. The joint probabilities qij for the low-dimensional
2 counterparts χi
and χj are defined as qij = (1 +
χi − χj
)−1 / k6=l (1 +
P
output of neuron i in the first hidden layer is given by
(1) (1) 2
ai = ReLU zi , where ReLU(·) is the rectified linear unit kχk − χl k )−1 for i 6= j, and qii = 0, ∀i, which is based
(ReLU) activation function defined by ReLU(x) = max(0, x). on the heavy-tailed Student t-distribution instead of Gaussian
The input of neuron i in the lth (l = 2, . . . , L) hidden as in the high-dimensional space to avoid the “crowding
(l) P (l) (l−1) (l)
layer is similarly given by zi = k=1 wi,k ak + bi , problem” [22] in the low-dimensional representation. t-SNE
(l) (l)
with the corresponding output ai = ReLU zi . The soft- finds the low-dimensional data representation with such qij ’s
max output of neuron i in the output layer is described by that P Kullback-Leibler divergence between pij ’s and qij ’s,
the P
(L+1) PM (L+1) i.e., i j pij log(pij /qij ), is minimized.
yi = exp zi / m=1 exp zm . The prediction is
location pm0 , where m0 = arg max ym , f (x), with f (·) In our application, the high-dimensional input or hidden-
m layer activations are projected to the 2D space via t-SNE
denoting the decision function of the DNN. The network is
and presented as scatterplots. In the scatterplot, the points
trained in a supervised manner with the cross-entropy objective
are colored or labeled according to the corresponding classes
function, with 30 iterations of pre-training and 1500 iterations
(i.e., locations p1 , . . . , p16 ). We adopt the silhouette score
of backpropagation.
[23] to visually assess the clustering of points into differ-
ent classes in the 2D representation. The silhouette score
D. Performance s(i) of the ith point is defined as s(i) = (dnearest (i) −
Table I and Table II summarize the location-specific statis- dintra (i))/ max{dintra (i), dnearest (i)}, where dintra (i) is the
tical measures, i.e., precision and recall, for the conference mean distance between the ith point and all other points within
room experiment and lounge experiment, respectively, for the same cluster (i.e., the mean intra-cluster distance), and
DNN-based device-free indoor localization system in com- dnearest (i) is the minimum mean distance between the ith
parison with k-NN (k = 5, the best-performing configura- point and all points in any other cluster (i.e., the mean nearest-
tion2 ) and SVM. k-NN classification is based on the raw cluster distance). The silhouette score lies in the interval
CSI data and is adopted to benchmark the analysis of DNN [−1, 1]. The score is close to 1 if dnearest (i) dintra (i),
in Sections IV and V. SVM with a Gaussian radial basis i.e., the data are well-clustered, close to −1 if dnearest (i)
function (RBF) kernel and the one-against-all technique [8], dintra (i), i.e., the data are not well-clustered in the sense that
[21] for solving the multi-classification problem is adopted. the data would have been better clustered if clustered to the
The precision and recall measure the percentages of correct neighboring cluster, and near zero if dnearest (i) ≈ dintra (i),
classification for each predicted and true class in the testing, i.e., the data are in between two clusters. We calculate the
respectively. average silhouette score over all points for each scatterplot.
TABLE I
E XPERIMENT 1 (C ONFERENCE ROOM ): P RECISION AND R ECALL (U NIT: %, AFTER ROUNDING ) FOR L OCATIONS p1 , . . . , p16 FOR k-NN, SVM, AND
DNN BASED D EVICE -F REE I NDOOR L OCALIZATION
Measure Scheme p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 Avg.
k-NN 85 100 59 100 85 99 100 95 31 70 98 73 40 98 100 83 82.25
Precision SVM 100 100 47 96 93 91 96 97 61 79 95 93 58 100 100 70 86
DNN 100 100 33 94 100 96 75 85 90 76 100 94 51 100 95 88 86.06
k-NN 34 45 96 53 41 100 100 72 99 96 85 38 55 60 60 100 70.88
Recall SVM 19 100 99 78 26 100 99 55 95 98 99 100 84 53 60 100 79.06
DNN 8 100 62 96 11 100 96 65 100 100 100 100 89 55 88 100 79.37
TABLE II
E XPERIMENT 2 (L OUNGE ): P RECISION AND R ECALL (U NIT: %, AFTER ROUNDING ) FOR L OCATIONS p1 , . . . , p14 FOR k-NN, SVM, AND DNN BASED
D EVICE -F REE I NDOOR L OCALIZATION
obtain the relevance scores. The relevance score of neuron i ordering (1, . . . , K) according to a certain ordering of the
in the Lth hidden layer is given by normalized relevance scores:
(L) (L+1) 1) O1 (pn →pm ) = (r1 , . . . , rK ) such that the corresponding
(L) ai wm,i (L+1) sequence of relevance scores is in descending order, i.e.,
Ri (pn → pm ) = P (L) (L+1) (L+1) m
z (2)
a w + b m h0r1 (pn → pm ) ≥ h0r2 (pn → pm ) ≥ · · · ≥ h0rK (pn →
j j m,j
pm ).
for some m, n, which approximates the conservation principle. 2) O2 (pn →pm ) = (r1 , . . . , rK ) such that the corresponding
The relevance scores of neuron i in the lth (l = 1, . . . , L − 1) sequence of relevance scores is in ascending order, i.e.,
hidden layer and of CSI channel i in the input layer are given h0r1 (pn → pm ) ≤ h0r2 (pn → pm ) ≤ · · · ≤ h0rK (pn →
respectively by pm ).
3) O3 (pn →pm ) = (r1 , . . . , rK ) such that the corresponding
(l−1) (l)
(l−1)
X ai wk,i (l) sequence of relevance scores satisfies |h0r1 (pn → pm )| ≥
Ri (pn → pm ) = R (pn → pm ),
P (l−1) (l) (l) k |h0r2 (pn → pm )| ≥ · · · ≥ |h0rK (pn → pm )|.
k j aj wk,j + bk
4) O4 (pn →pm ) = (r1 , . . . , rK ) such that the corresponding
l = 2, . . . , L, (3) sequence of relevance scores satisfies |h0r1 (pn → pm )| ≤
(1)
X xi wk,i (1)
|h0r2 (pn → pm )| ≤ · · · ≤ |h0rK (pn → pm )|.
hi (pn → pm ) = (1) k
R (pn → pm )
P (1) Then, the proposed channel manipulation processes to fa-
k j xj wk,j + bk
cilitate information visualization are described below.
(4)
1) Channel Nullification: The process is to progressively
for some m, n. The relevance score hi (pn → pm ) is normal- nullify or remove the channel in the input CSI data of
0 hi (pn →pm )
ized to [−1, 1] by hi (pn → pm ) = maxj=1,...,K (|hj (pn →pm )|) . location pn , denoted by xpn = [xpn ,1 , xpn ,2 , . . . , xpn ,K ]T , one
A positive valued h0i (pn → pm ) suggests that CSI channel i channel at a time according to a specific channel ordering
provides positive evidence in support of the DNN prediction sequence, and measure the impact of the nullified channel
of location pm given the input CSI data of location pn , a on the classification result. The tth (t = 1, . . . , K) step in
(t) (t−1)
0
negative valued hi (pn → pm ) suggests that CSI channel i the removal process returns x p n = gnull (xpn , rt ), where
(0)
provides negative evidence against such a prediction, and a x pn , xpn , and the function gnull sets the rt -th element
0 (t−1)
near-zero hi (pn → pm ) suggests that CSI channel i provides of xpn to zero. The influence of nullified CSI channels is
(t)
little evidence for or against such a prediction. reflected by the classification decision f (xpn ) of the DNN.
To examine how each input CSI channel has contributed to a 2) Channel Modification: The process is to progressively
specific output DNN decision through the relevance scores, we modify the channel in the input CSI data of location pn
propose two channel manipulation processes, inspired by the “toward” that of location pm , one channel at a time according
approach in [18] where the process is associated with flipping to a specific channel ordering sequence, given the input
the pixel value of a binary image. We first define, for some CSI data of location pn and the output DNN decision of
m, n, the following channel ordering sequences, where each location pm . The tth (t = 1, . . . , K) step in the modification
(t) (t−1) (0)
(r1 , . . . , rK ) represents a permutation of the natural channel process returns xpn = gmod (xpn , rt ), where xpn , xpn ,
6
(t−1)
function gmod sets the rσt -th element of xpn
and the to shade of the same color. It is observed from Fig. 4(a) that the
x
max 0, E [xpm ,rt ] +h0rt (pn →pm ) σxpm ,rt (xpn ,rt −E [xpn ,rt ]) , raw CSI samples are highly scattered, with a silhouette score
pn ,rt
which is based on the linear minimum mean square of 0.22, showing that the indoor localization problem at hand is
error (LMMSE) estimation of unknown X given Y , i.e., not an easy classification problem. The untrained DNN shows
Xb = E[X] + ρ σX (Y − E[Y ]), with correlation coefficient ρ similar visual separation among different classes (silhouette
σY
and standard deviations σX , σY . The influence of modified score: 0.09) in Fig. 4(b). The trained DNN yields much
(t)
CSI channels is reflected by the classification decision f (xpn ) improved visual separation (silhouette score: 0.66), as shown
of the DNN. in Fig. 4(c), by automatically extracting important CSI features
We examine the percentage of correct classification for to distinguish among different classes. It is expected that, as
each true class pn in the testing (i.e., recall), and also the more distinctive patterns are learned through DNN training,
percentage of classification as location pm given the true class different locations can be better classified with DNN in the
pn , after channel nullification/modification. More specifically, testing. We use location p13 as an example. The rectangular
(t)
we calculate, for t = 1, . . . , K, the number of f (xpn ) = n and boxes in Fig. 4(a) and Fig. 4(c) enclose all the training samples
(t) for location p13 . It is seen that after DNN training these
f (xpn ) = m, respectively, divided by the number of testing
training samples become more concentrated. This leads to
samples for location pn .
reduced misclassification of other locations as location p13 in
The proposed technique can be extended to examine the
the testing. Specifically, for the DNN scheme, only locations
contribution of each subcarrier to the DNN decision. We can
p1 , p5 , and p13 are classified as location p13 in the testing,
calculate
while for the k-NN scheme based on the raw CSI samples,
s0i (pn → pm ) locations p1 , p4 , p5 , p9 , p12 , p13 , and p15 are classified as
1 0 location p13 in the testing. Quantitatively, DNN yields a higher
= h (pn → pm ) + h0i+30 (pn → pm ) precision than k-NN for location p13 (51% vs. 40%), as shown
4 i in Table I. Overall, DNN generally yields higher precision
+ h0i+60 (pn → pm ) + h0i+90 (pn → pm ) ,
values, with a higher macro-average precision as compared to
i = 1, . . . , 30 (5) k-NN (86.06% vs. 82.25%). Thus, DNN training is effective
in learning discriminative features from the original noisy
to represent the average normalized relevance scores of sub- wireless signals.
carrier i corresponding to the input CSI data of location pn
and output DNN decision of location pm . Once we have the B. Testing Performance
relevance scores of all the subcarriers corresponding to input Fig. 5 depicts the same figure as Fig. 4, but with the testing
pn and output pm , we can define subcarrier ordering sequences samples (instead of the training samples) highlighted in a
similar to O1 –O4 to rank the importance of subcarriers accord- darker shade of the same color. It is observed that Fig. 5(c)
ing to s0i (pn → pm ). By a slight abuse of notations, we use the presents better visual separation than Fig. 5(a), with silhouette
same O1 –O4 to denote the subcarrier ordering sequences and scores (calculated for the testing samples only) of 0.41 vs.
clearly make the distinction from channel ordering sequences −0.01. Since testing was performed in two different time
in the context. Subcarrier nullification/modification procedures slots, Fig. 5(c) suggests that the trained DNN model modifies
are also similarly defined. For example, nullification of sub- the testing samples collected in different time slots to be
carrier i represents setting the ith, (i + 30)th, (i + 60)th, and more coherent, even though the raw testing data are more
(i + 90)th elements of the CSI sample to zero. scattered in the data space due to environmental variations in
two different time slots. The more concentrated distribution
IV. A NALYSIS AND V ISUALIZATION OF E XPERIMENTAL of the testing samples per location leads to the reduced
R ESULTS : E XPERIMENT 1 (C ONFERENCE ROOM ) misclassification of other locations as the true location in the
The results for experiment 1 (conference room) and exper- testing, as shown by the higher macro-average recall for DNN
iment 2 (lounge) are analyzed and discussed in this section than for k-NN (79.37% vs. 70.88% in Table I).
and next section, respectively. Consider location p11 as an example. In Fig. 5(a), the testing
samples for location p11 are located near the training samples
for locations p11 and p13 , as enclosed by the rectangular
A. Training Effects box. After the DNN operation, the distribution of the testing
The 2D visualization (via t-SNE) of 1) the raw CSI samples, data is altered, where the testing samples for location p11
(L)
2) the last DNN hidden layer activations (i.e., ai ’s) before are now located near the training samples for location p11
training (with random initializations, i.e., all the weights and only, as enclosed by the rectangular box in Fig. 5(c). The
biases are randomly generated from a Gaussian distribution visual distribution matches the classification results where the
with mean zero and standard deviation one), and 3) the last recall for k-NN and DNN for location p11 are 85% and 100%,
DNN hidden layer activations after training, are presented in respectively.
Fig. 4(a)–(c), respectively. 200 training samples and 100×2 =
200 testing samples per location are plotted and color-coded. C. Line-of-Sight Effects
For each location, the training samples are shown with a darker It is observed from Table I that, for DNN, p1 and p5 have
shade to distinguish from the testing samples with a lighter very small values of recall but large values of precision. This
7
✁
✂ ✕✜ ✕✖
✕✗
☎
✆ ✕✗
✝ ✕✙
✕✖
✞ ✕✙
✟
✄✠
✄✄ ✕✖ ✕✛
✕✖✘
✄ ✕✚
✄✁ ✕✖✘ ✕✗
✄✂
✄☎
✄✆ ✎✏✑✒✓✒✓✔
Fig. 4. The 2D visualization of (a) the raw CSI samples, (b) the last DNN hidden layer activations before training (with random initializations), and (c) the
last DNN hidden layer activations after training. For each location, the training samples are shown with a darker shade to distinguish from the testing samples
with a lighter shade of the same color. The silhouette scores (calculated for the training samples only) for (a)–(c) are 0.22, 0.09, and 0.66, respectively.
The rectangular boxes in (a) and (c) enclose all the training samples for location p13 . Black location markers refer to training samples and colored location
markers (in colors corresponding to respective locations) refer to testing samples collected at the respective locations.
Fig. 5. Same plots as Fig. 4, but here, for each location, the testing samples are shown with a darker shade to distinguish from the training samples with
a lighter shade of the same color. The silhouette scores (calculated for the testing samples only) for (a)–(c) are −0.01, 0.16, and 0.41, respectively. The
rectangular boxes in (a) and (c) enclose the testing samples for location p11 and the nearby training samples. Black location markers refer to training samples
and colored location markers (in colors corresponding to respective locations) refer to testing samples collected at the respective locations.
suggests that location p1 (or p5 ) is largely misclassified as to room layouts and object placement, the CSI patterns and
another location (small recall), and very few other locations the resulting classification results are not the same for p4 , p8 ,
are misclassified as location p1 (or p5 ) (large precision). A p12 , and p16 , which are also farther away from the LoS, in
confusion matrix analysis reveals that the testing samples for this experiment.
location p1 are predominantly misclassified as locations p3 and
p13 , and the testing samples for location p5 are predominantly D. Discriminative Features
misclassified as locations p3 , p9 , and p13 . Fig. 4(c) and
Fig. 6 shows the effect of progressive channel nullification
Fig. 5(c) show that the testing samples for locations p1 and
on the DNN prediction using locations p6 and p13 as an exam-
p5 are near the training samples for locations p3 , p9 , and p13 .
ple of locations nearer and farther from the LoS, respectively.
Fig. 4(c) also shows that the training samples for locations p1 ,
For location p6 , channel nullification according to channel
p5 , p9 , and p13 are close to each other, which suggests that
ordering sequences O3 (p6 → p6 ) and O4 (p6 → p6 ) represent,
locations p1 , p5 , p9 , and p13 are not discernable after DNN
respectively, that the most relevant and irrelevant channels,
learning. Fig. 4(a) shows that the raw CSI training samples
as determined by LRP, leading to the correct DNN decision
for these four locations are also not well separated, which
of location p6 are progressively nullified. As can be seen,
indicates that these original CSI patterns in the radio map are
the percentage of correct classification generally decreases,
not discernable, making it difficult for the DNN to distinguish
more rapidly for O3 than for O4 , as more channels are
the location-specific features in the learning process. This may
nullified. For O3 , when as few as 20 channels (out of 120) are
be in part because p1 , p5 , p9 , and p13 are farther away from the
nullified, the percentage of correct classification drops from
line-of-sight (LoS) between the Wi-Fi transmitter and receiver.
100% (the recall for location p6 ) to nearly 0%, showing that
However, since the propagation of wireless signals is sensitive
the channels with large absolute values of relevance scores
8
80 O 3 (p13 p 13 ) from 89% (the recall for location p13 ) to 0%, since locations
O 4 (p13 p 13 ) farther from the LoS could easily mutually misclassify as each
70
other, as discussed previously.
60 Fig. 7 shows how the percentages of classification change
50 as channels of the true class (p6 ) are progressively modified
40
toward another class (p10 ) according to the channel ordering
sequence O2 (p6 → p10 ). Note that the leading elements of
30
O2 (p6 → p10 ) are channels that have the most negative
20 relevance scores, i.e., provide the most negative evidence
10 against the prediction of location p10 given the true class
being location p6 . By modifying these channels of location
0
0 10 20 30 40 50 60 70 80 90 100 110 120 p6 toward those of location p10 , we gradually erase the
Number of channels nullified (t) evidence against the prediction of location p10 and enhance
the evidence for the prediction of location p10 (analogous
Fig. 6. The percentage of correct classification (true class: p6 (or p13 )) after
progressive channel nullification according to the channel ordering sequences to pixel flipping in binary images). Thus, we expect that
O3 (p6 → p6 ) and O4 (p6 → p6 ) (or O3 (p13 → p13 ) and O4 (p13 → the percentage of predicting location p6 (p10 ) will decrease
p13 )). (increase), provided that the relevance scores are reliable
indicators of critical features that distinguish between classes.
This is confirmed in Fig. 7. In comparison, nullifying, instead
100 of modifying, the same channels in the same order does not
90
yield the same effect. Since channel nullification erases the
evidence against, but does not enhance the evidence for, the
80
prediction of location p10 , misclassification favors not only
Percentage of classification (%)
40 1 40 1
0.8 0.8
35 35
0.6 0.6 100
30 30
0.4 0.4
25 25
O 3 (p6 p 6)
90
CSI amplitude
CSI amplitude
0.2 0.2
O 4 (p6 p 6)
15
-0.2
15
-0.2
80 O 3 (p13 p 13 )
-0.4 -0.4
10
-0.6
10
-0.6
O 4 (p13 p 13 )
5 5
70
-0.8 -0.8
0 -1 0 -1
0 20 40 60 80 100 120 0 20 40 60 80 100 120 60
Channel index Channel index
(a) (b)
50
Fig. 8. Heatmaps of relevance scores (a) h0i (p6 → p6 ) and (b) h0i (p6 → p10 )
for i = 1, . . . , 120, superimposed on the 200 testing samples of CSI of 40
location p6 .
30
20
40 40
30 30
0
0 5 10 15 20 25 30
20 20
Number of subcarriers nullified (t)
10 10
Fig. 10. The percentage of correct classification (true class: p6 (or p13 ))
after progressive subcarrier nullification according to the subcarrier ordering
sequences O3 (p6 → p6 ) and O4 (p6 → p6 ) (or O3 (p13 → p13 ) and
0 0
0 20 40 60 80 100 120 0 20 40 60 80 100 120 O4 (p13 → p13 )).
40
Channel index 40
(c) = 75% (t=22) (d) = 49.5% (t=32) Tx1 - Rx1 Tx1 - Rx2 Tx2 - Rx1 Tx2 - Rx2 1
40 40 40 40 0.8
CSI amplitude
CSI amplitude
CSI amplitude
CSI amplitude
30 30 0.6
0.4
0.2
20 20 20 20 0
20 20 -0.2
-0.4
-0.6
0 0 0 0 -0.8
0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 -1
10 10
Subcarrier index Subcarrier index Subcarrier index Subcarrier index
0
0 20 40 60 80 100 120
0
0 20 40 60 80 100 120
Fig. 11. Heatmaps of relevance scores s0i (p6 → p6 ) for i = 1, . . . , 30,
40 40 superimposed on the 200 testing samples of CSI for four transmit-receive
(e) = 25% (t=52) (f) = 0% (t=98) antenna pairings of location p6 .
30 30
20 20
(yellow/red in Fig. 8(a)) but neutral or against predicting p10
10 10 (cyan/blue in Fig. 8(b)). In Fig. 9(c), for the 25% misclassified
samples, the channels with h0i (p6 → p6 ) > 0.4 (i.e., strong
0
0 20 40 60 80 100 120
0
0 20 40 60 80 100 120
evidence in support of predicting p6 ) have all been modified.
40 40
By modifying these critical channels in support of predicting
(g) = 0% (t=120) (h) True class ଵ
30 30
p6 , the percentage of correct classification as p6 decreases
rapidly (from 100% to about 50% when only 22 channels are
20 20 modified, from Fig. 9(b) to Fig. 9(d)). As channel modifica-
tion progresses, channels with near-zero and positive valued
10 10
h0i (p6 → p10 ), i.e., the cyan/yellow/red-colored channels in
Fig. 8(b), start to be modified. Comparing Figs. 9(e) and
0 0
0 20 40 60 80 100 120 0 20 40 60 80 100 120 9(f), the additional channels modified, roughly corresponding
to the red-circled ones in Fig. 9(f), are evidence neutral or
Fig. 9. 200 testing samples of CSI of location p6 (in (a)) progressively
modified toward 200 testing samples of CSI of location p10 (in (h)) according mildly supporting predicting p6 (cyan/yellow in Fig. 8(a)) and
to the channel ordering sequence O2 (p6 → p10 ). The unmodified/modified supporting predicting p10 (yellow/red in Fig. 8(b)). Modifying
channels are color-coded by the CSI of p6 /p10 (in (b)–(g)). The percentages these channels not critically supporting predicting p6 and
of correct classification as p6 and the numbers of channels modified (t) are
shown. against predicting p10 results in a slower decrease in the
percentage of correct classification (from about 50% to 0%
when as many as 66 channels are modified, from Fig. 9(d)
to Fig. 9(f)). When all 120 channels of location p6 are
with the most negative valued h0i (p6 → p10 ), i.e., the blue-
modified (Fig. 9(g)), the resulting CSI pattern approaches that
colored channels in Fig. 8(b). When up to 10 channels
of location p10 (Fig. 9(h)).
are modified, the percentage of correct classification remains
100%. When more channels are modified, the percentage starts
to decline. Comparing Figs. 9(b) and 9(c), the additional E. Subcarrier Importance
channels modified, roughly corresponding to the red-circled From a wireless perspective, the logical 120 channels com-
ones in Fig. 9(c), are evidence in support of predicting p6 prise 30 physical subcarriers in 2 × 2 MIMO wireless links.
10
✄✁ ✄✂ ✄☎
✍
✎
✏
✑
✒ ☛☞
✓
✔
✕
✖
✍✗
✍✍
✍✎ ☛✌
✍✏
✍✑
✆✝✞✟✠✟✠✡
Fig. 12. The 2D visualization of (a) the raw CSI samples, (b) the last DNN hidden layer activations before training (with random initializations), and (c) the
last DNN hidden layer activations after training. For each location, the training samples are shown with a darker shade to distinguish from the testing samples
with a lighter shade of the same color. The silhouette scores (calculated for the training samples only) for (a)–(c) are −0.18, −0.21, and 0.09, respectively.
The rectangular boxes in (a) and (b) enclose all the training samples for locations p1 and p4 . Black location markers refer to training samples and colored
location markers (in colors corresponding to respective locations) refer to testing samples collected at the respective locations.
✁
✂
☎
✕✖
✆
✝
✞
✟
✄✠
✄✄
✄ ✕✗
✄✁
✄✂
✎✏✑✒✓✒✓✔
Fig. 13. Same plots as Fig. 12, but here, for each location, the testing samples are shown with a darker shade to distinguish from the training samples with
a lighter shade of the same color. The silhouette scores (calculated for the testing samples only) for (a)–(c) are −0.22, −0.23, and −0.15, respectively. The
rectangular boxes in (a) and (b) enclose the testing samples for locations p1 and p4 . Black location markers refer to training samples and colored location
markers (in colors corresponding to respective locations) refer to testing samples collected at the respective locations.
the black-circled ones in Fig. 17(b), represent evidence in the lounge environment yields smaller correlation coefficients
support of predicting p12 (yellow/red in Fig. 16(a)) but neutral in 26 out of the total 29 pairs of adjacent subcarriers). In
or against predicting p13 (cyan/blue in Fig. 16(b)). Comparing the lounge experiment, the logical channels that correspond to
Figs. 17(d) and 17(e), the additional 51 channels modified, the same physical subcarriers have even more greatly varying
roughly corresponding to the black-circled ones in Fig. 17(e), relevance scores. Thus, progressively nullifying a specific
are evidence neutral for predicting both p12 and p13 , leading to subcarrier in four wireless links simultaneously translates to
a slow decrease and increase in the percentages of predicting channel nullification in neither the channel ordering sequence
p12 and p13 , respectively. Comparing Figs. 17(g) and 17(h), of O3 nor O4 . Thus, Fig. 18 exhibits a slightly different trend
there is a more pronounced discrepancy between the modified as compared to Fig. 14. The richer multipath effect, which,
CSI toward the mean of CSI of p13 and the actual CSI of in theory, provides more degrees of freedom which could aid
p13 , due to a larger variation in the testing CSI samples in the feature selection for classification, may have led to a greater
lounge experiment. performance improvement of DNN over k-NN in the macro-
average recall and precision in the lounge experiment (see
D. Subcarrier Importance Tables I and II). Yet, in the meantime, the same environment
induces more noisy CSI across different samples, which dete-
Figs. 18 and 19 present results related to subcarrier im- riorates the performance of DNN (as well as k-NN and SVM)
portance. There is a richer multipath effect (larger delay in absolute terms in the lounge experiment.
spread) in the lounge environment than in the conference room
environment, as shown by the generally smaller correlation
coefficients between two adjacent subcarriers in the lounge
experiment (comparing the same pairs of adjacent subcarriers,
12
40 1 40 1
0.8 0.8
100
0.6 0.6
30 30
O 3 (p12 p 12 )
CSI amplitude
CSI amplitude
0.4 0.4
90 0.2 0.2
O 4 (p12 p 12 )
Percentage of correct classification (%)
20 0 20 0
20
10
✄✗ ✘✙✚☞ ✖✛✗✜✜ ✞☛✂ ✄✑ ✞☛✂ ☎✒✓✔✡✟ ✄✠☎✕
0 ✸
0 10 20 30 40 50 60 70 80 90 100 110 120 ✷
✶
Number of channels nullified (t) ✴✵✳
✲
✱
Fig. 14. The percentage of correct classification (true class: p12 (or p1 )) after ✰
✯
progressive channel nullification according to the channel ordering sequences ✮
O3 (p12 → p12 ) and O4 (p12 → p12 ) (or O3 (p1 → p1 ) and O4 (p1 → ✭
p1 )).
✤✥✦✧✧★✩ ✪✧✫★✬
100
Channel mod. O (p p )
2 12 13
90 True class p 12 , Predicted class p 12
Channel null. O (p p )
2 12 13
70
True class p 12 , Predicted class p 12
60 True class p , Predicted class p
12 13
✄☞ ✞☛✂ ☎✌✟ ✄✠☎✍✌ ✄✹ ✞☛✂ ☎✌✟ ✄✠☎✏✏✆
50
40
30
20
10
0
0 10 20 30 40 50 60 70 80 90 100 110 120
✄✎ ✞☛✂ ☎✌✟ ✄✠☎✏✆✌ ✄✢ ✘✙✚☞ ✖✛✗✜✜ ✞☛✣
Number of channels modified/nullified (t)
Fig. 15. The percentage of classification as p12 and p13 (true class: p12 ) after
progressive channel modification according to the channel ordering sequence
O2 (p12 → p13 ), in comparison with the same setting but after progressive
channel nullification.
VI. C ONCLUSION
Fig. 17. 184 testing samples of CSI of location p12 (in (a)) progressively
In this paper, we have performed a detailed analysis of the modified toward 184 testing samples of CSI of location p13 (in (h)) according
to the channel ordering sequence O2 (p12 → p13 ). The unmodified/modified
DNN in device-free Wi-Fi fingerprinting indoor localization channels are color-coded by the CSI of p12 /p13 (in (b)–(g)). The percentages
through visual analytics techniques. By projecting the last of correct classification as p12 and the numbers of channels modified (t) are
DNN hidden layer activations to visualizable 2D representa- shown.
tions via the t-SNE dimensionality reduction technique, we ob-
served better clustering and visual separation among different
classes after DNN training. The more concentrated distribution procedures, revealed critical channels or features in wireless
of the testing samples matches the reduced misclassification signals learned by the DNN which cannot be well interpreted
in the testing for DNN. DNN generally yields higher precision by human perceptions. The relevance scores, determined by
results as compared to k-NN based on the raw data, suggesting LRP, provided analyzable, interpretable, and reliable measures
that DNN training is effective in learning discriminative fea- of how and why the percentages of classification change as
tures from the noisy wireless signals. The LRP visual analytics specific channels or features are manipulated, and thereby
technique, coupled with the channel nullification/modification allow for a better understanding of the mechanism of DNN-
13
80 O 3 (p1 p 1) for indoor localization: A deep learning approach,” IEEE Trans. Veh.
O 4 (p1 p 1)
Technol., vol. 66, no. 1, pp. 763–776, Jan. 2017.
70 [12] R. Y. Chang, S.-J. Liu, and Y.-K. Cheng, “Device-free indoor localization
using Wi-Fi channel state information for Internet of things,” in Proc.
60 IEEE GLOBECOM, Dec. 2018.
[13] J. Wang, X. Zhang, Q. Gao, H. Yue, and H. Wang, “Device-free wireless
50
localization and activity recognition: A deep learning approach,” IEEE
40 Trans. Veh. Technol., vol. 66, no. 7, pp. 6258–6267, Jul. 2017.
[14] Q. Gao, J. Wang, X. Ma, X. Feng, and H. Wang, “CSI-based device-
30 free wireless localization and activity recognition using radio image
features,” IEEE Trans. Veh. Technol., vol. 66, no. 11, pp. 10 346–10 356,
20
Nov. 2017.
10 [15] M. Craven and J. Shavlik, “Visualizing learning and computation in
artificial neural networks,” Int. J. Artif. Intell. Tools, vol. 1, no. 3, pp.
0 399–425, 1992.
0 5 10 15 20 25 30 [16] A. Mohamed, G. Hinton, and G. Penn, “Understanding how deep belief
Number of subcarriers nullified (t) networks perform acoustic modelling,” in Proc. IEEE Int. Conf. Acoust.
Speech Signal Process., 2012, pp. 4273–4276.
Fig. 18. The percentage of correct classification (true class: p12 (or p1 )) [17] P. E. Rauber, S. G. Fadel, A. X. Falco, and A. C. Telea, “Visualizing the
after progressive subcarrier nullification according to the subcarrier ordering hidden activity of artificial neural networks,” IEEE Trans. Vis. Comput.
sequences O3 (p12 → p12 ) and O4 (p12 → p12 ) (or O3 (p1 → p1 ) and Graphics, vol. 23, no. 1, pp. 101–110, Jan. 2017.
O4 (p1 → p1 )). [18] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and
W. Samek, “On pixel-wise explanations for non-linear classifier deci-
sions by layer-wise relevance propagation,” PLOS ONE, vol. 10, no. 7,
40
Tx1 - Rx1
40
Tx1 - Rx2
40
Tx2 - Rx1
40
Tx2 - Rx2 1 p. e0130140, Jan. 2015.
0.8
CSI amplitude
CSI amplitude
CSI amplitude
CSI amplitude
0.6
0.4
[19] W. Samek, A. Binder, G. Montavon, S. Lapuschkin, and K. R. Müller,
20 20 20 20
0.2
0
“Evaluating the visualization of what a deep neural network has learned,”
-0.2
-0.4
IEEE Trans. Neural Netw. Learn. Syst, vol. 28, no. 11, pp. 2660–2673,
0 0 0 0
-0.6
-0.8
Nov. 2017.
0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 -1 [20] D. Halperin, W. Hu, A. Sheth, and D. Wetherall, “Tool release: Gath-
Subcarrier index Subcarrier index Subcarrier index Subcarrier index
ering 802.11n traces with channel state information,” ACM SIGCOMM
Comput. Commun. Review., vol. 41, no. 1, pp. 53–53, Jan. 2011.
Fig. 19. Heatmaps of relevance scores s0i (p12 → p12 ) for i = 1, . . . , 30, [21] Z. Wu, K. Fu, E. Jedari, S. R. Shuvra, R. Rashidzadeh, and M. Saif, “A
superimposed on the 184 testing samples of CSI for four transmit-receive fast and resource efficient method for indoor positioning using received
antenna pairings of location p12 . signal strength,” IEEE Trans. Veh. Technol., vol. 65, no. 12, pp. 9747–
9758, Dec. 2016.
[22] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J.
Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008.
based device-free wireless indoor localization. [23] P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and
validation of cluster analysis,” J. Comput. Appl. Math., vol. 20, no. 1,
R EFERENCES pp. 53–65, 1987.