
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON CYBERNETICS 1

Single/Multi-Source Black-Box Domain Adaption for Sensor Time Series Data

Lei Ren, Member, IEEE, and Xuejun Cheng, Graduate Student Member, IEEE

Abstract—Unsupervised domain adaption (UDA), which transfers knowledge from a labeled source domain to an unlabeled target domain, has attracted tremendous attention in many machine learning applications. Recently, there have been attempts to apply domain adaption to sensor time series data, such as in human activity recognition and gesture recognition. However, existing methods suffer from drawbacks that hinder further performance improvement. They often require access to source data or source models during training, which are unavailable in some fields because of privacy protection and storage limits. Typically, the source domains may only provide an application programming interface (API) for the target domain to call. On the other hand, current UDA methods have not considered the temporal consistency and low signal-to-noise ratio (SNR) of sensor time series. To address these challenges, this article presents a black-box domain adaption framework for sensor time series data (B2TSDA). First, we propose a single/multi-source teacher–student learning framework to distill the knowledge from the source domains to a customized target model. Then, we design a new temporal consistency loss that combines an adaptive mask method and a dynamic threshold method to maintain consistent temporal information and balance the learning difficulties of different classes. For multisource black-box domain adaption, we further propose a Shapley-enhanced method to determine the contribution of each source domain. Experimental results on both single-source and multisource domain adaption show that our framework achieves superior performance compared with other black-box UDA methods.

Index Terms—Activity recognition, gesture recognition, multisource transfer learning, time series, unsupervised domain adaptation.

Manuscript received 27 May 2023; revised 28 June 2023; accepted 23 July 2023. This work was supported by the National Natural Science Foundation of China under Grant 62225302, Grant 92167108, and Grant 62173023. This article was recommended by Associate Editor H. Han. (Corresponding author: Lei Ren.)

Lei Ren is with the School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China, also with Zhongguancun Laboratory, Beijing 100194, China, and also with the State Key Laboratory of Intelligent Manufacturing System Technology, Beijing 100854, China (e-mail: renlei@buaa.edu.cn).

Xuejun Cheng is with the School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China (e-mail: lukesoldier@163.com).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCYB.2023.3300832.

Digital Object Identifier 10.1109/TCYB.2023.3300832

2168-2267 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

MULTISENSOR technologies in the Internet of Things have been applied in many fields, such as activity recognition [1], cyber-security [2], and clinical diagnosis [3]. Sensors can generate a large amount of multivariate time-series data. Recent trends in artificial intelligence technologies have led to a proliferation of studies on how to apply these new technologies in industrial environments, including fault diagnosis [4] and quality prediction [5]. Traditional sensor time series methods usually rely on feature engineering or data preprocessing, and their generalization performance is poor. Recently, with the rise of deep learning, deep neural networks for sensor time series have achieved promising performance in these applications, and several successful deep learning models have been proposed for time-series classification. Unlike interpretable feature-based representations, these deep learning models are designed to capture the spatial interactions in sensor time series. There are three main encoder architectures that incorporate temporal information in time series: convolutional neural network (CNN)-based models, recurrent neural network (RNN)-based models, and attention-based models. CNNs extract local relationships that are invariant across spatial dimensions. RNNs extract temporal dependencies of different ranges in the sensor time series. Attention mechanisms weigh temporal features to focus on significant time steps in the time series. However, these models usually need a large amount of labeled training data. To tackle the issue of expensive data annotation, unsupervised domain adaption (UDA) models are applied to sensor time series. UDA aims to close the gap caused by domain shift with the help of a small amount of unlabeled data from the target domain. Generally, UDA minimizes the empirical risk on the source domain data and generates domain-invariant representations [6].

Despite recent advances in deep learning-based approaches to domain adaption for sensor time series data [7], [8], we identify three key challenges in these circumstances. The first challenge is that the source domain models and source domain data are not always accessible due to data-privacy rules or limited computation resources [9], although most domain adaption methods require access to the source data during adaptation. For example, client information from different companies is sensitive and may not be shared; instead, an application programming interface (API) service is provided to customers. Under such data-privacy rules, current UDA methods, such as domain adversarial training [10], are not suitable. Regarding computation resources, the networks for the target domain adopt the same architectures as those of the source domains [11], which is inconvenient, especially when the source domain model is large and complex. The second challenge is that two particular properties of sensor time series data are not considered. The sensor time series data are collected from industrial machines. So the
Authorized licensed use limited to: Shenzhen Institute of Advanced Technology CAS. Downloaded on December 07,2023 at 14:44:18 UTC from IEEE Xplore. Restrictions apply.

sensor time series data have significant temporal consistency and low signal-to-noise ratio (SNR). The existing domain adaption methods are mainly devised for image or textual data. Trivially adopting UDA methods [12], [13] for time series data may break the temporal consistency and therefore harm the performance of the model. The third challenge is the negative transfer effect in multisource domain adaption. Irrelevant or malicious source domains may cause negative transfer [14], since we cannot verify whether the source domain and target domain data share a similar distribution.

To tackle these challenges, we propose black-box domain adaption for time series data (B2TSDA), which accounts for model-agnostic properties, data privacy, temporal consistency, and low SNR. The contributions of this article are summarized as follows.
1) Single/Multi-Source Teacher–Student Learning: The proposed "Black-Box Time-series UDA" framework is shown in Fig. 1, where neither the source domain time series nor the source models are accessed. We then introduce single/multi-source teacher–student learning to distill the knowledge from the provided source domain black-box predictors to a customized target model. Teacher–student learning is model-agnostic and is better suited to obtaining domain-invariant representations, especially when the source data cannot be directly accessed.
2) Temporal Consistency Loss Combining Adaptive Masking and Dynamic Threshold: Considering the characteristics of time series data, we introduce a temporal consistency loss that mixes adaptive masking and a dynamic threshold to take temporal information into consideration. We use pseudo-label learning [15] with a dynamic threshold to select domain-invariant samples from the target domain. Then, we design a Bernoulli masking method to leverage the time series information and improve model robustness.
3) Shapley-Enhanced Method: We propose a Shapley-enhanced method to address the negative transfer problem in multisource domain adaption, which conducts a model aggregation scheme. We define a confidence score to control the penalty degree for negative transfer. Then, we aggregate a series of source domain time series and set weights according to their relevance to the target domain time series.

Fig. 1. Description of the black-box domain adaption problem. The source domains only provide information through black-box predictors; neither the source domain data nor the source models are accessible during the domain adaption process.

The remainder of this article is structured as follows. Section II describes the related work, including single-source domain adaption, multisource domain adaption, source-free domain adaption, black-box domain adaption, federated domain adaption, and teacher–student learning. Section III introduces the proposed black-box domain adaption framework for time series data, followed by the details of each part of the method. Section IV presents experiments on single-source and multisource time-series datasets to show the effectiveness of the proposed approach. Section V concludes this article and discusses future work.

II. RELATED WORK

In this section, related work, including UDA and teacher–student learning for time series data, is briefly introduced.

A. Unsupervised Domain Adaption for Sensor Time Series Data

Single-Source Domain Adaption: Most single-source UDA methods attempt to minimize the discrepancy between domains to learn a domain-invariant space, using metric-based learning or adversarial learning. Metric-based learning designs a specific metric, such as the Gaussian kernel maximum mean discrepancy [16] or higher-order moment matching [17], which measures the discrepancy distance so that domain-invariant properties can be learned. Adversarial learning constructs an adversarial loss to train discriminators to distinguish between source and target domains [18]. Ganin et al. [6] proposed a domain-adversarial neural network to achieve domain adaption. Moreover, domain separation [19], Wasserstein distance [20], collaborative networks [21], and few-shot learning [22] have been combined with adversarial learning to further improve its performance. Because of the nonstationary and nonmonotonic properties of time series, direct metric-based and adversarial learning methods are not suitable for time series domain adaption. Zhao et al. [23] proposed deep representation-based domain adaptation to solve domain adaption for time series data. Ragab et al. [24] addressed time-series domain adaption with self-supervised autoregressive domain adaptation. Different from those approaches, we explore special situations where neither source domain data nor source models are accessible.

Multisource Domain Adaption: Compared to single-source domain adaption for time series data, there is limited research on multisource domain adaptation, especially for time series. MDAN [25] optimizes a task-adaptive generalization bound using adversarial neural networks. Xie et al. [26] adopted a multiclass domain classifier to


REN AND CHENG: SINGLE/MULTI-SOURCE BLACK-BOX DOMAIN ADAPTION FOR SENSOR TIME SERIES DATA 3

maximize the certainty of making task-specific predictions. Guo et al. [27] constructed a point-to-set metric to capture the relationship between the target domain and different source domains. These multisource domain adaption methods are all applied in computer vision and natural language processing. CALDA [28] transfers the distributions of weights in multisource time series domain adaptation via contrastive adversarial learning. CoDATS [10] uses weak supervision in the form of the target domain's label proportions to design a convolutional deep domain adaptation model for time series data. MRTN [29] designs a multisource-domain-refined adversarial adaptation network that is applied to fault diagnosis under both domain and category inconsistencies. The above methods depend on the availability of labeled data in the source sensor time series. Moreover, these methods may be inefficient and time-consuming when a source-domain time series model has a large parameter space.

Source-Free Domain Adaption: Source-free domain adaption has been proposed to solve the data privacy and security issues in domain adaption. Source-free approaches are classified into two types: 1) self-supervision and 2) virtual source transfer. Self-supervision methods adopt a pseudo-label-based strategy that combines information maximization and self-supervised pseudo-labeling. SHOT [30] develops a target-specific feature extraction module and implicitly aligns representations of the target domain with the source model. Virtual source transfer methods adopt generative adversarial nets (GANs) to generate more samples. MA [31] applies a class-conditional generative adversarial network to generate target-style data. However, because the model weights are exposed, data may still leak in source-free domain adaption.

Fig. 2. Our proposed framework: single/multi-source black-box domain adaption for sensor time series data.

Black-Box Domain Adaption: There is little research on black-box domain adaption. DINE [32] introduces a new two-step knowledge adaption framework called DIstill and fine-tuNE to solve black-box domain adaption. B2UDA [33] adopts iterative learning to conduct noisy labeling and to learn with noisy labels. However, on the one hand, these methods are designed for image data and are inappropriate for time series data. On the other hand, they apply a simple mean ensemble for multisource black-box domain adaption, which does not take negative transfer into consideration.

Federated Domain Adaption: Federated learning [34] is a distributed machine learning technique that learns a shared prediction model across multiple decentralized edge devices or servers holding local data samples, without exchanging them. FADA [35] applies adversarial training to align the representations learned among the different nodes with the data distribution of the target node. FedKA [36] provides a federated multisource domain adaptation method to improve the model's generality in a target domain. However, the communication cost restricts the application of federated learning, and privacy leakage is still a concern for enterprises.

B. Teacher–Student Learning

Teacher–student learning trains a student model under the supervision of the true labels and the teacher model's guidance [37]. Teacher–student learning has been applied in image classification [38] and emotion recognition [39]. Traditional teacher–student learning is employed to build a smaller model with faster inference speed while maintaining model accuracy. Recently, teacher–student learning in neural machine translation [40], speech recognition [41], and image


classification [42] has also been verified to be effective in domain adaption tasks. However, on the one hand, because the data could contain sensitive information, privacy and security are the main concerns. On the other hand, the validity of teacher–student learning in black-box domain adaption has not been explored.

III. SINGLE/MULTI-SOURCE BLACK-BOX DOMAIN ADAPTION FOR TIME SERIES DATA

Problem Formulation: Let $\{\theta_s^k\}_{k=1}^{M}$ be the set of source models, where the $k$th source model $\theta_s^k$, corresponding to the source domain $D_S^k$, is learned using the source domain dataset $\{(x_i^k, y_i^k)\}_{i=1}^{N_k}$; $x_i^k$ and $y_i^k$ denote the $i$th source time series and its label, respectively, and $N_k$ is the size of the source domain dataset. The target domain $D_T$ has $N_t$ unlabeled samples $\{x_i^t\}_{i=1}^{N_t}$, where $x_i^t$ denotes a target time series and $N_t$ is the size of the target domain dataset. UDA attempts to learn a model $\theta_t$ that minimizes the target error $\mathbb{E}_{z\sim\theta_T}[p_t(y\mid z) - p(y\mid z, f)]$, where $p_t(y\mid z)$ is the conditional distribution of the target domain and $p(y\mid z, f)$ is the conditional distribution of the prediction on the target domain. In the single-source UDA scenario, the number of source domains $M$ is equal to 1. Different from standard UDA, black-box domain adaption only provides a black-box predictor: neither the source-domain data nor the parameters of the source model are accessible.

A. Teacher–Student Learning for Black-Box Domain Adaption

The framework of single/multi-source domain adaption for sensor time series data is shown in Fig. 2. For UDA on sensor time series data, the teacher models $\{\theta_s^k\}_{k=1}^{M}$, which have been trained on their respective source domains, are directly used to conduct inference on the unlabeled target domain time-series data $\{x_i^t\}_{i=1}^{N_t}$. Since soft labels carry richer information than hard labels [43], the soft pseudo labels produced by the teacher model are harnessed to train the student model. By designing a loss between the predictions of the student model and the soft labels of the teacher model, the student model can not only learn the knowledge transferred from the teacher model but also obtain specific information from the unlabeled target domain. In black-box domain adaption, the teacher–student learning loss is calculated as follows:

$$\hat{y}_i = \hat{p}\left(x_i^t, \theta_s\right) \tag{1}$$

$$\mathcal{L}_t = \frac{1}{N_t}\sum_{i=1}^{N_t} D_{\mathrm{KL}}\left(\hat{y}_i, \tilde{p}\left(x_i^t, \theta_t\right)\right) \tag{2}$$

where $\mathcal{L}_t$ denotes the Kullback–Leibler (KL) divergence loss. Given an unlabeled time series, the teacher–student learning loss is defined as the KL divergence between the teacher model's soft label and the student model's output probability distribution. At inference time on the target time series, the trained student model predicts the probability distribution of the label, and the argmax function is then applied to obtain the label with the highest probability. Our research adopts teacher–student learning for black-box domain adaption. Rather than just compressing the model size, we aim to train a unified model from various teacher models or their ensembles for more accurate time series prediction. Teacher–student learning in domain adaption [40], [41], [42] assumes that the source model and source data are accessible, whereas we apply teacher–student learning to black-box time series domain adaption, where the source data cannot be directly accessed.

B. Temporal Consistency Loss

The outputs of the source models for target samples are inaccurate, sometimes unreliable, and even incomplete. Specifically, the soft label from the teacher model may lose important time series information, such as data structure and trend information. Besides, there is a domain shift between the source domains and the target domain [44], so the soft labels output by the teacher model may be noisy, which is harmful to student model training.

Considering the properties of sensor time series data, we follow the smoothness assumption proposed in [45] and [46]. The smoothness assumption encourages the model to produce the same output for an original sample and for perturbed samples within a small neighborhood of it, which helps to alleviate overfitting. We apply time-series mask strategies to create corrupted samples as follows:

$$\tilde{x}_i = M_i \odot x_i' + (E - M_i) \odot x_i \tag{3}$$

where $x_i'$ and $x_i$ represent the augmented and original time series, respectively, and $\odot$ denotes the elementwise product. $M_i \in \{0, 1\}^{d\times r\times c}$ is a binary mask matrix and $E := \mathbf{1}^{d\times r\times c}$ is the all-ones matrix, where $d$ is the number of mask transformations, $r$ is the number of time steps, and $c$ is the number of features of $x_i$. We have $M_i = [m_1, \ldots, m_d]^T$, where each $m_j \in \{0, 1\}^{r\times c}$ is sampled from a Bernoulli distribution with probability $p_m$. We adjust the difficulty of model training by changing the hyperparameter $p_m$, which controls the proportion of masked time steps. Traditional mask methods generate a binary mask vector that may set masked entries to 0 [47], [48]. These methods may destroy the intrinsic temporal structure of sensor time series data, even though temporal dependency is important for sensor time series; they fail to explicitly model the temporal dependency when masking sensor time series data. Instead, we add Gaussian noise or apply a Gaussian blur kernel to maintain the temporal structure [49]. Blurring extracts the trend information of a sensor time series, and adding noise enhances its local information. A perturbed sample $x_i'$ of $x_i$ is therefore generated as

$$x_i' = \begin{cases} x_i + \sigma_1, & \text{noise} \\ g_{\sigma_2}(x_i), & \text{blur} \end{cases} \tag{4}$$

where $\sigma_1$ is Gaussian noise with mean $\mu$ and variance $\sigma_1^2$, and $g_{\sigma_2}$ is a Gaussian blur kernel with isotropic variance $\sigma_2^2$. The generated time series not only lie in the input space of the target domain but also preserve the temporal information. After generating the augmented samples $\tilde{x}$ from $x$, we use consistency regularization to construct the loss for the unlabeled samples. The objective of consistency regularization is given as follows:

$$\mathcal{L}_u = \frac{1}{N_t d}\sum_{i=1}^{N_t}\sum_{j=1}^{d} D_{\mathrm{KL}}\left(f\left(\tilde{x}_{i,j}^t\right), f\left(x_i^t\right)\right) \tag{5}$$

where $\tilde{x}_{i,j}^t$ is the $j$th augmented sample generated from the target domain sample $x_i^t$, $N_t$ is the number of original samples, $d$ is the number of augmented samples per original sample, and $f$ is the student model function for black-box domain adaption.
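The augmentation of (3) and (4) and the consistency loss of (5) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a series is a list of $r$ time steps with $c$ features each, the toy student `make_toy_student` is hypothetical, the blur uses a truncated Gaussian kernel of radius $\lceil 3\sigma_2\rceil$ with clamped edges, and alternating noise and blur across the $d$ views is an illustrative choice that the text does not prescribe.

```python
import math
import random


def gaussian_blur(series, sigma):
    """Gaussian blur g_sigma along the time axis, applied to each feature channel.

    The kernel (radius ceil(3 * sigma), renormalised, edges clamped) is an
    illustrative choice, not taken from the paper.
    """
    radius = max(1, math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    norm = sum(kernel)
    kernel = [w / norm for w in kernel]
    r, c = len(series), len(series[0])
    return [[sum(w * series[min(max(t + k, 0), r - 1)][f] for k, w in zip(offsets, kernel))
             for f in range(c)] for t in range(r)]


def perturb(series, mode, rng, mu=0.0, sigma1=0.1, sigma2=1.0):
    """Eq. (4): x' = x + Gaussian noise (mean mu, std sigma1), or x' = g_sigma2(x)."""
    if mode == "noise":
        return [[v + rng.gauss(mu, sigma1) for v in row] for row in series]
    if mode == "blur":
        return gaussian_blur(series, sigma2)
    raise ValueError(f"unknown mode: {mode}")


def masked_views(series, d, p_m, rng):
    """Eq. (3): d views x~_j = m_j * x' + (1 - m_j) * x, with m_j ~ Bernoulli(p_m) per entry."""
    views = []
    for j in range(d):
        mode = "noise" if j % 2 == 0 else "blur"  # hypothetical: alternate the two modes
        x_prime = perturb(series, mode, rng)
        views.append([[x_prime[t][f] if rng.random() < p_m else series[t][f]
                       for f in range(len(series[0]))] for t in range(len(series))])
    return views


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def make_toy_student(weights):
    """Hypothetical student f: softmax of a linear map over per-feature time averages."""
    def student(series):
        means = [sum(row[f] for row in series) / len(series) for f in range(len(series[0]))]
        return softmax([sum(w * m for w, m in zip(w_row, means)) for w_row in weights])
    return student


def consistency_loss(student, batch, d, p_m, rng):
    """Eq. (5): KL between student outputs on each augmented view and on the original,
    averaged over the N_t samples and d views."""
    total = 0.0
    for x in batch:
        reference = student(x)
        for view in masked_views(x, d, p_m, rng):
            total += kl_div(student(view), reference)
    return total / (len(batch) * d)
```

With $p_m = 0$ every view equals the original series, so the loss is exactly zero; raising $p_m$ replaces more entries with perturbed values and makes the consistency target harder, which mirrors how $p_m$ is described as a difficulty knob.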
To alleviate domain shift and class imbalance, and inspired by [50], we develop pseudo-labeling with dynamic thresholds to mine high-confidence pseudo labels for target data. We use a curriculum learning method to adjust the thresholds flexibly for different classes so as to select more confident samples. The adjusted thresholds $T_t(x)$ are defined as follows:

$$\sigma_t(x) = \sum_{n=1}^{N} \mathbb{1}\left(\max p_{m,t}(y\mid u_n) > \tau\right)\cdot \mathbb{1}\left(\arg\max p_{m,t}(y\mid u_n) = x\right) \tag{6}$$

$$\beta_t(x) = \frac{\sigma_t(x)}{\max_x \sigma_t} \tag{7}$$

$$T_t(x) = \beta_t(x)\cdot\tau \tag{8}$$

where $x$ denotes a class and $T_t(x)$ is the threshold for class $x$ at training step $t$. For example, in the HAR dataset, $T_t(x = \text{"biking"})$ is the threshold of class "biking" at time step $t$. $\tau$ denotes the predefined threshold for target samples; we set $\tau$ above 0.8 to filter noisy labels and retain high-quality samples. $N$ is the total number of target domain samples. In (7), when $x$ is the best-learned class, $\sigma_t(x) = \max_x \sigma_t$ and $\beta_t(x) = 1$. For classes that are hard to learn, $\beta_t(x) < 1$ and $T_t(x) < \tau$, so more training samples from these classes are added to the model. We design the enhanced KL divergence loss $\mathcal{L}_t$ as

$$\mathcal{L}_t = \frac{1}{N_t}\sum_{i=1}^{N_t} \mathbb{1}\left(\max \hat{y}_i > T_t\left(\arg\max \hat{y}_i\right)\right)\cdot D_{\mathrm{KL}}\left(\hat{y}_i, \tilde{p}\left(x_i^t, \theta_t\right)\right) \tag{9}$$

In this way, we use $T_t$ to select high-confidence samples that capture the information from the teacher models. We apply the curriculum learning strategy [51] so that the dynamic threshold $T_t$ gradually decreases as training proceeds with step $t$. As in curriculum learning, we train the model in black-box domain adaption from easy samples (high-threshold samples) to hard samples (low-threshold samples). High-threshold samples in the target domain are usually closer in distribution to the source domains, while low-threshold samples are hard to learn but may carry specific time-series information.

Combining the objectives introduced in (9) and (5), we train the predictive model by minimizing the following objective function:

$$\mathcal{L}_{\text{final}} = \mathcal{L}_t + \alpha\,\mathcal{L}_u \tag{10}$$

where $\alpha$ is a hyperparameter, empirically set to 1, that controls the importance of $\mathcal{L}_u$ in black-box domain adaption.

Fig. 3. Confidence score of coalition in multidomain adaption for sensor time series data.

C. Shapley-Enhanced Method in Multisource Black-Box Domain Adaption

The core of multisource black-box domain adaption lies in a model aggregation scheme. We obtain a set of weights $w_k$ corresponding to the source models. These weights represent a probability mass function over the source domains, with a higher value implying higher transferability from that particular domain. After weighting the black-box predictors with $w_k$, the multisource objective reduces to the single-source objective in (10).

Considering multisource black-box domain adaption, it is essential to assign different weights to different teacher models. There are usually two ways to tackle this problem.
1) $\arg\max_k w_k$: This method uses the most confident prediction (maximum probability prediction) among the source predictors.
2) $w_k = 1/K$: This method computes the mean of multiple source predictors and uses the mean value as the pseudo label for target samples.

However, because of model bias and domain shift, when the same sample is fed to different source predictors, some of the outputs are assigned to the wrong class. These methods may therefore lose important information from the multiple source domains, and they do not consider the negative transfer problem.

Inspired by game theory [52], [53], we propose the Shapley-enhanced method to calculate the relevance between different source domains and the target domain. When the output class probability of a network exceeds a threshold $\tau$, we regard the predicted label as a highly confident label. The confidence number is the number of confident labels, and the confidence degree is the confidence of these labels. Generally speaking, when a source domain is more notable and more relevant to the target domain, both the confidence number and the confidence degree are larger. We further divide these high-confidence


labels into two types, approving and opposing confidence labels, to eliminate negative domain adaption.

We assume that there are multiple source domains denoted by $S = \{D_S^k\}_{k=1}^{K}$. For each coalition of source domains $S' \subseteq S$, we want to estimate the confidence score obtained from $S'$. For each sample $x_i^t \in D_T$, we obtain four values $(p^+(S'), np^+(S'), p^-(S'), np^-(S'))$ from $S'$. As shown in Fig. 3, $S' = \{D_S^1, D_S^2, D_S^3, D_S^4\}$, and we have $p_i^+(S') = 0.92$, $p_i^-(S') = 0.95$, $np_i^+(S') = 2$, and $np_i^-(S') = 1$. Then we can compute the confidence score for each coalition $S'$ of source domains as follows:

$$F(S') = \sum_{x_i^t \in D_T}\left[np_i^+(S')\cdot\max p_i^+(S') - \lambda\cdot np_i^-(S')\cdot\max p_i^-(S')\right] \tag{11}$$

where $x_i^t$ is the $i$th sample in the target domain. The confidence score is based on three assumptions. First, when a coalition of source domains $S'$ has a larger maximum probability $\max p_i^+(S')$, which indicates lower predictive entropy, the prediction carries more information and has higher confidence. Second, when most source domains support a particular label with high confidence, that label is more likely to be the true label, and the confidence number $np_i^+(S')$ is higher. Third, a domain that differs from the other domains may cause negative transfer; the hyperparameter $\lambda$ controls the penalty degree for negative transfer.

Algorithm 1: Procedure of Single/Multi Black-Box Domain Adaption for Sensor Time Series Data
Input: Black-box source models $\theta_s^k$; target data $\{x_i^t\}_{i=1}^{N_t}$;
Output: Target model $\theta_t$;
1  Teacher Model Output;
2  if K > 1 then
3      Calculate the weight for different teacher models via Eq. (14);
4      Apply the model aggregation via Eq. (15);
5  else
6      Obtain the pseudo label of the teacher model via Eq. (1);
7  end
8  Student Model Training;
9  for e = 1 to T_m do
10     for i = 1 to maximum iteration do
11         Acquire a batch of time series from the target domain;
12         Generate augmented time series using Eq. (3);
13         Update the student output by minimizing the objective in Eq. (10);
14     end
15     for c = 1 to C do
16         Determine the flexible threshold T(c) for class c;
17     end
18     Update the teacher output via Eq. (16);
19 end
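Lines 13 and 16 of Algorithm 1 rely on the class-wise dynamic thresholds of (6) to (8) and the selective KL loss of (9). The plain-Python sketch below illustrates them under assumptions: `class_thresholds` counts confident predictions from whatever distributions are passed in (the paper's $p_{m,t}$ is left abstract here, and the usage example feeds it the teacher soft labels $\hat{y}_i$), it falls back to $\tau$ when no prediction is confident yet (a hypothetical guard, since (7) is undefined then), and all function names are illustrative.

```python
import math


def kl_div(p, q, eps=1e-12):
    """D_KL(p || q); eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def class_thresholds(probs, num_classes, tau):
    """Eqs. (6)-(8): T_t(x) = beta_t(x) * tau with beta_t(x) = sigma_t(x) / max_x sigma_t.

    probs holds one predicted distribution per target sample; sigma_t(x) counts the
    samples whose top probability exceeds tau and whose argmax is class x.
    """
    sigma = [0] * num_classes
    for p in probs:
        conf = max(p)
        if conf > tau:
            sigma[p.index(conf)] += 1
    top = max(sigma)
    if top == 0:
        # Hypothetical guard: (7) is undefined when no sample is confident yet.
        return [tau] * num_classes
    return [tau * s / top for s in sigma]


def selective_kl_loss(teacher_probs, student_probs, thresholds):
    """Eq. (9): KL(y_hat_i || student_i) for samples whose teacher confidence beats the
    threshold of their predicted class, averaged over all N_t samples."""
    total = 0.0
    for y_hat, p_student in zip(teacher_probs, student_probs):
        conf = max(y_hat)
        if conf > thresholds[y_hat.index(conf)]:
            total += kl_div(y_hat, p_student)
    return total / len(teacher_probs)


def final_objective(loss_t, loss_u, alpha=1.0):
    """Eq. (10): L_final = L_t + alpha * L_u (the paper sets alpha = 1)."""
    return loss_t + alpha * loss_u


# Teacher soft labels y_hat_i for five target samples, three classes, tau = 0.8.
teacher = [[0.9, 0.05, 0.05], [0.85, 0.1, 0.05], [0.2, 0.7, 0.1],
           [0.3, 0.3, 0.4], [0.6, 0.2, 0.2]]
thresholds = class_thresholds(teacher, num_classes=3, tau=0.8)
# thresholds == [0.8, 0.0, 0.0]: class 0 keeps tau, the never-confident classes are relaxed.
```

Setting every threshold to 0 recovers the plain loss (2), because the maximum of a probability vector is always positive. In the example, the fifth sample is predicted as class 0 with confidence 0.6, below $T_t(0) = 0.8$, so it is excluded from (9) until the threshold drops.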
Shapley Value: A set function v : 2N → R is required as
input for the Shapley value [52]. Each player i ∈ N receives
attributions $s_i$ from the Shapley value that sum to $v(N)$. The Shapley value of a player $i$ is determined by

$$s_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \bigl(v(S \cup \{i\}) - v(S)\bigr) \tag{12}$$

Equivalently, if the players are added one at a time in a uniformly random order, $s_i$ is the expected marginal contribution $v(S \cup \{i\}) - v(S)$ of player $i$, where $S$ is the set of players that come before $i$ in the ordering. In this article, we extend the Shapley value to multisource domain adaption. To get the confidence score of the source domain $D_k^s$, we need to list all coalitions $S \subseteq \mathcal{S}$ of the source domains in (11). We use the information gain to quantify the contribution of each source domain as follows:

$$G(D_k^s) = \sum_{S_l \subseteq \mathcal{S} \setminus \{D_k^s\}} \frac{|S_l|!\,(K - |S_l| - 1)!}{K!} \bigl(F(S_l \cup \{D_k^s\}) - F(S_l)\bigr) \tag{13}$$

where $K$ is the number of source domains. For example, with four source domains, $K = 4$ and $K! = 4 \times 3 \times 2 \times 1 = 24$. $S_l$ ranges over every coalition of the other source domains, $S_l \subseteq \mathcal{S} \setminus \{D_k^s\}$. $G(D_k^s)$ describes the contribution of domain $k$ to all domains: the more a domain contributes, the larger its weight. After obtaining the gains of the different domains, we get the weight $w_k$

$$w_k = \frac{G(D_k^s)}{\sum_{j=1}^{K} G(D_j^s)} \tag{14}$$

Our purpose is to aggregate the predictions of a series of source domain models and weight them according to their relevance to the target domain time series. The formula is as follows:

$$\hat{y}_i = \sum_{k=1}^{K} w_k \cdot \hat{p}(x_i^t, \theta_s^k) \tag{15}$$

where $\hat{p}(x_i^t, \theta_s^k)$ is the output probability distribution of the source model $\theta_s^k$, and $w_k$ is the weight of the teacher model $\theta_s^k$. $w_k$ represents the degree of transferability from domain $k$, which is learned from the unlabeled target domain data. With $K$ source domains, we have $\sum_{k=1}^{K} w_k = 1$ and $w_k \ge 0$. To further alleviate the noise in the teacher predictions, we adopt an exponential moving average strategy

$$\hat{y}_i \leftarrow w\,\hat{y}_i + (1 - w)\,\hat{p}(x_i^t, \theta_t) \tag{16}$$

where $\theta_t$ represents the model at the current epoch. As in [54], we update the teacher predictions after every epoch. $w$ is a momentum hyperparameter that controls the degree of temporal ensembling. Algorithm 1 summarizes all the steps of single/multisource black-box domain adaption for sensor time series data.
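The weighting pipeline of (13)–(16) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `value_fn` is a hypothetical stand-in for the coalition score $F(\cdot)$ defined in (11), and all function names and toy numbers are ours. Exact enumeration visits $2^{K-1}$ coalitions per domain, so the sketch only suits a small number of source domains. The normalization step assumes non-negative gains, so that $w_k \ge 0$.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical coalition value function: maps a set of source-domain indices to
# a score. It stands in for F(.) in Eq. (13); the article defines F in (11).
ValueFn = Callable[[FrozenSet[int]], float]


def shapley_gains(num_domains: int, value_fn: ValueFn) -> List[float]:
    """Exact Shapley gain G(D_k^s) of every source domain, following Eq. (13)."""
    K = num_domains
    gains = []
    for k in range(K):
        others = [j for j in range(K) if j != k]
        gain = 0.0
        # Enumerate every coalition S_l of the other K - 1 domains, by size.
        for size in range(K):
            coef = factorial(size) * factorial(K - size - 1) / factorial(K)
            for members in combinations(others, size):
                s_l = frozenset(members)
                gain += coef * (value_fn(s_l | {k}) - value_fn(s_l))
        gains.append(gain)
    return gains


def normalize(gains: Sequence[float]) -> List[float]:
    """Eq. (14): w_k = G(D_k^s) / sum_j G(D_j^s); assumes non-negative gains."""
    total = sum(gains)
    return [g / total for g in gains]


def aggregate(weights: Sequence[float], probs: Sequence[Sequence[float]]) -> List[float]:
    """Eq. (15): Shapley-weighted sum of the K teachers' class probabilities."""
    num_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(num_classes)]


def ema_update(y_hat: Sequence[float], student_prob: Sequence[float], w: float) -> List[float]:
    """Eq. (16): y_hat <- w * y_hat + (1 - w) * p(x, theta_t)."""
    return [w * y + (1.0 - w) * p for y, p in zip(y_hat, student_prob)]


# Usage with a toy additive value function (hypothetical per-domain scores).
SCORES = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}


def additive_value(coalition: FrozenSet[int]) -> float:
    return sum(SCORES[j] for j in coalition)


gains = shapley_gains(4, additive_value)
weights = normalize(gains)
teacher_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.5, 0.5]]
y_hat = aggregate(weights, teacher_probs)
y_hat = ema_update(y_hat, [0.7, 0.3], w=0.9)
```

The toy value function is additive, so each domain's Shapley gain equals its own score. The weights therefore come out at about 0.4, 0.3, 0.2, and 0.1, and the four gains sum to $F(\mathcal{S}) - F(\emptyset) = 1$. The aggregated teacher output is about [0.63, 0.37]. One EMA step with $w = 0.9$ toward a student prediction of [0.7, 0.3] gives about [0.637, 0.363].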

Authorized licensed use limited to: Shenzhen Institute of Advanced Technology CAS. Downloaded on December 07,2023 at 14:44:18 UTC from IEEE Xplore. Restrictions apply.

REN AND CHENG: SINGLE/MULTI-SOURCE BLACK-BOX DOMAIN ADAPTION FOR SENSOR TIME SERIES DATA 7

IV. EXPERIMENTS

The source and target domain samples can be either univariate or multivariate sensor time series. Multivariate sensor time series are more complicated, so we select three multivariate time-series datasets to verify the model's performance.

A. Datasets

Our proposed framework is evaluated on three public datasets, shown in Table I, which collect sensor time series in IoT scenarios for different tasks. We select these datasets because they are the most widely used in current time-series domain adaption studies [10], [24], [55], and they are representative.
1) har [56]: The har dataset contains multichannel sensor time series data for 30 participants performing six tasks: 1) sitting; 2) standing; 3) walking; 4) biking; 5) stair up; and 6) stair down. The data are collected by heterogeneous devices, including 31 smartphones, four smartwatches, and one tablet.
2) uwave [57]: The uwave gesture recognition dataset contains multichannel sensor time series collected by accelerometers from 33 participants performing various hand gestures.
3) wisdm [58]: The wisdm dataset is collected from 36 subjects performing the same six tasks as in the har dataset. It is more challenging because of class imbalance.

TABLE I
DETAILS OF THREE SENSOR TIME SERIES DATASETS

B. Implementation Details

In our experiments, we run our method four times with different random seeds and report the average accuracy. We adopt the transformer as the main backbone and use SGD with momentum 0.9 as the optimizer. The batch size is 64, and the initial learning rate is 0.1. We use ReLU as the activation function, and we add batch normalization and skip connections to the model. To evaluate the model-agnostic property of the B2TSDA framework, different model architectures, including FCN, transformer, XceptionTime, XCMPlus, GRU, and ResCNN, are adopted and evaluated.

C. Baseline Methods

There is little prior work on time series domain adaption. To demonstrate the feasibility of the proposed method, we compare it with several representative and classic baselines from computer vision. A brief introduction to these baselines follows.
1) MT: Known as "source only" in this field, this method infers the class directly from the source model's predictions, so it is a direct model transfer method.
2) HT [59]: Trains the student model with one-hot labeling vectors (hard labels).
3) ST [59]: Trains the student model with probability labeling vectors (soft labels).
4) SP-UDA [60]: A source-protected UDA method using a generative model.
5) IterLNL [33]: An iterative learning method with noisy labels that mixes up the noise rate and categorywise sampling to tackle unbalanced label problems.
6) CMSS [61]: A source selection method using a curriculum manager.
7) DAEL [62]: A knowledge ensemble-based method that adopts domain adaptive ensemble learning.
8) SimKD3A [63]: KD3A is a privacy-preserving unsupervised multisource domain adaption method based on knowledge distillation.
9) DINE [32]: A two-step knowledge adaptation framework that combines distillation and fine-tuning to solve domain adaption for image data.
For a fuller comparison, we also compare our method with the latest source-free domain adaption methods, including SHOT [64], A2Net [65], GSFDA [66], and DECISION [67].

D. Comparison Results

1) Single-Source: We first present the results of cross-domain time-series classification on har in Table II, uwave in Table III, and wisdm in Table IV. These three tables report the target classification accuracy (source → target) on 14 randomly chosen problems for each dataset, each of which adapts between users. For example, 0 → 2 in Table II represents adapting from user 0 to user 2 on the har dataset. To compare with the other black-box methods, we adopt the same backbone, a transformer [68], which combines the advantages of 1D-CNN and the attention model. B2TSDA achieves the best mean accuracy: 76.4% on har, 74.6% on uwave, and 70.9% on wisdm. It outperforms the strongest black-box baselines by 0.4% on har, 0.7% on uwave, and 1.9% on wisdm. We also test different backbones in the ablation study to verify that B2TSDA is a model-agnostic method.
2) Multisource: We also study the performance of multisource black-box time-series domain adaption. The results on har, uwave, and wisdm are shown in Tables V–VII, respectively. → 1 in Table V indicates that the data of user 1 are the target domain and the other users are the source domains for multisource domain adaption. We find that B2TSDA obtains competitive results on all three datasets. KD3A uses the sizes of the source domains, but our setting does not provide this information, so the KD3A we apply is a simplified version. B2TSDA achieves a better result in multisource black-box domain adaption. On wisdm,
B2TSDA achieved 78% classification accuracy, outperforming SimKD3A (76.4%).

TABLE II
SINGLE-SOURCE DOMAIN ADAPTION PERFORMANCE ON THE HAR DATASET

TABLE III
SINGLE-SOURCE DOMAIN ADAPTION PERFORMANCE ON THE UWAVE DATASET

TABLE IV
SINGLE-SOURCE DOMAIN ADAPTION PERFORMANCE ON THE WISDM DATASET

E. Ablation Study

We conduct an ablation study to demonstrate the effectiveness of our designed components: mask augmentation (TM), the dynamic threshold (DT), and the Shapley-enhanced information gain (SE).
1) B2TSDA (w/o TM): The model is adapted without the temporal mask and only optimizes the objective $L_t$ in (10).
2) B2TSDA (w/o DT): The dynamic threshold $T_t$ is not used to select confident labels.
3) B2TSDA (w/o SE): The Shapley-enhanced method is removed, and the predictions of the multisource models are directly averaged, so all source model weights $w_k$ are equal.
The ablation results on the har, uwave, and wisdm datasets are shown in Table VIII, with single-source domain adaption at the top and multisource domain adaption at the bottom. With TM or DT ablated, the performance degrades by 2.1% and 1.2%, respectively, in single-source domain adaption, and by 0.6% and 0.5%, respectively, in multisource domain adaption. The absence of SE causes a 1.7% performance drop in multisource domain adaption. These results demonstrate the effectiveness and rationality of all design choices in our framework.

F. Model Agnostic Analysis

We also compare different backbones to verify that the framework is model-agnostic. We investigate different student models in our black-box domain adaption
task. We choose popular time-series models as the backbones of the student models, including FCN [69], transformer [68], XceptionTime [70], XCMPlus [71], GRU [72], and ResCNN [73]. XceptionTime [70] applies depthwise separable convolutions, adaptive average pooling, and a novel nonlinear normalization technique across the temporal and spatial dimensions of the time series. We use the transformer [68] encoder to extract time-series representations. XCMPlus [71] is an explainable CNN for time series classification. FCN [69] and GRU [72], classic deep learning models for time series, are also applied in the teacher–student model. As shown in Fig. 4(a) and (b), a simple backbone is sometimes more accurate than a complex model. A simple backbone can therefore serve as the student model. Because a simple model is effective, the student does not need to be as complex as the teacher model.

TABLE V
MULTISOURCE DOMAIN ADAPTION PERFORMANCE ON THE HAR DATASET

TABLE VI
MULTISOURCE DOMAIN ADAPTION PERFORMANCE ON THE UWAVE DATASET

TABLE VII
MULTISOURCE DOMAIN ADAPTION PERFORMANCE ON THE WISDM DATASET

TABLE VIII
ABLATION RESULTS OF DIFFERENT VARIANTS FOR SINGLE-SOURCE AND MULTISOURCE BLACK-BOX DOMAIN ADAPTIONS

Fig. 4. Illustration of classification accuracy when training on various amounts of target domain data on tasks 1→5 on the har dataset and 3→8 on the uwave dataset based on B2TSDA. (a) task1→task5. (b) task3→task8.

G. Robustness to Negative Transfer

We construct irrelevant and malicious source domains in the har dataset to verify the effectiveness of the Shapley-enhanced method and its robustness to negative transfer. We select domain
5 in the har dataset as the target domain. To construct an irrelevant domain, we select domain 2, denoted by IR-2, because domain 2 differs from the other domains. To construct a malicious domain, we conduct a poisoning attack on a high-quality domain, domain 4, with 30% wrong labels, denoted by AT-30. We also report the performance with the bad domain dropped. We compare the Shapley-enhanced method with two common strategies in SimKD3A [63]: 1) mean focus and 2) consensus focus. The results in Tables IX and X show the following.
1) Domain drop outperforms the other methods, which confirms that negative transfer from IR-2 truly exists.
2) The consensus focus method and the Shapley-enhanced method are robust to the irrelevant and malicious domains because both assign these domains low weights.
3) The Shapley-enhanced method handles the malicious domain better than the other techniques because it gives that domain a lower weight (5.1% for AT-30).

TABLE IX
MULTISOURCE DOMAIN ADAPTION PERFORMANCE WITH IRRELEVANT AND MALICIOUS DOMAINS

TABLE X
MULTISOURCE DOMAIN ADAPTION WEIGHTS ASSIGNED TO THE IRRELEVANT AND MALICIOUS DOMAINS

H. Parameter Sensitivity

In this section, we evaluate the sensitivity of two hyperparameters:
1) α, which controls the importance of $L_u$, on task 1→5 of the har dataset.
2) λ, which controls the penalty degree for negative transfer, on multisource domain adaption →1 of the uwave dataset.
The results in Fig. 5 show that the best value of α is around 0.6 and the best value of λ is around 0.4. These values have a major influence on model performance, so both hyperparameters need to be tuned for the specific industrial task.

Fig. 5. Illustration of classification accuracy when training under different parameters based on B2TSDA. (a) Sensitivity to α. (b) Sensitivity to λ.

V. CONCLUSION

In this article, we propose a black-box domain adaption framework for time series data (B2TSDA). We design a consistency loss that combines a dynamic threshold and an adaptive mask to balance the learning difficulties of different classes and maintain the consistency of temporal information. For multisource black-box domain adaption, we further propose a Shapley-enhanced method to weigh the information from different teacher models. Experimental results on both single-source and multisource domain adaption show that our framework outperforms other black-box UDA methods. Several directions remain to be explored. We conjecture that our time-series mask algorithms can perform well in universal UDA tasks. When there are too many source models to utilize, it would also be interesting to study how to improve computational efficiency.

REFERENCES
[1] J. Roche, V. De-Silva, J. Hook, M. Moencks, and A. Kondoz, “A multimodal data processing system for LiDAR-based human activity recognition,” IEEE Trans. Cybern., vol. 52, no. 10, pp. 10027–10040, Oct. 2022.
[2] L. Vu, V. L. Cao, Q. U. Nguyen, D. N. Nguyen, D. T. Hoang, and E. Dutkiewicz, “Learning latent representation for IoT anomaly detection,” IEEE Trans. Cybern., vol. 52, no. 5, pp. 3769–3782, May 2022.
[3] M. Liu, J. Zhang, C. Lian, and D. Shen, “Weakly supervised deep learning for brain disease prognosis using MRI and incomplete clinical scores,” IEEE Trans. Cybern., vol. 50, no. 7, pp. 3381–3392, Jul. 2020.
[4] Z. Chen et al., “A multi-source weighted deep transfer network for open-set fault diagnosis of rotary machinery,” IEEE Trans. Cybern., vol. 53, no. 3, pp. 1982–1993, Mar. 2023.
[5] C. Liu, K. Wang, Y. Wang, and X. Yuan, “Learning deep multimanifold structure feature representation for quality prediction with an industrial application,” IEEE Trans. Ind. Informat., vol. 18, no. 9, pp. 5849–5858, Sep. 2022.
[6] Y. Ganin et al., “Domain-adversarial training of neural networks,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 2096–2030, 2016.
[7] H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller, “Transfer learning for time series classification,” in Proc. IEEE Int. Conf. Big Data (Big Data), 2018, pp. 1367–1376.
[8] Y. Feng, J. Chen, S. He, T. Pan, and Z. Zhou, “Globally localized multisource domain adaptation for cross-domain fault diagnosis with category shift,” IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 6, pp. 3082–3096, Jun. 2023.
[9] S. Liu, H. Zhang, and Y. Jin, “A survey on computationally efficient neural architecture search,” J. Autom. Intell., vol. 1, no. 1, 2022, Art. no. 100002.
[10] G. Wilson, J. R. Doppa, and D. J. Cook, “Multi-source deep domain adaptation with weak supervision for time-series sensor data,” in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2020, pp. 1768–1778.
[11] G. Cai, L. He, M. Zhou, H. Alhumade, and D. Hu, “Learning smooth representation for unsupervised domain adaptation,” IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 8, pp. 4181–4195, Aug. 2023.
[12] S. Xie, Z. Zheng, L. Chen, and C. Chen, “Learning semantic representations for unsupervised domain adaptation,” in Proc. Int. Conf. Mach. Learn., 2018, pp. 5423–5432.
[13] S. Zhao et al., “A review of single-source deep unsupervised visual domain adaptation,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 2, pp. 473–493, Feb. 2022.
[14] Y. Yao and G. Doretto, “Boosting for transfer learning with multiple sources,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2010, pp. 1855–1862.
[15] D.-H. Lee, “Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks,” in Proc. Workshop Challenges Represent. Learn., vol. 3, 2013, p. 896.
[16] B. Sun and K. Saenko, “Deep CORAL: Correlation alignment for deep domain adaptation,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 443–450.
[17] C. Chen et al., “HoMM: Higher-order moment matching for unsupervised domain adaptation,” in Proc. AAAI Conf. Artif. Intell., vol. 34, 2020, pp. 3422–3429.
[18] J. Yang, Y. Xu, H. Cao, H. Zou, and L. Xie, “Deep learning and transfer learning for device-free human activity recognition: A survey,” J. Autom. Intell., vol. 1, no. 1, 2022, Art. no. 100007.
[19] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan, “Domain separation networks,” in Proc. Adv. Neural Inf. Process. Syst., vol. 29, 2016, pp. 343–351.
[20] J. Shen, Y. Qu, W. Zhang, and Y. Yu, “Wasserstein distance guided representation learning for domain adaptation,” in Proc. 32nd AAAI Conf. Artif. Intell., 2018.
[21] W. Zhang, W. Ouyang, W. Li, and D. Xu, “Collaborative and adversarial network for unsupervised domain adaptation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 3801–3809.
[22] S. Motiian, Q. Jones, S. Iranmanesh, and G. Doretto, “Few-shot adversarial domain adaptation,” in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 6670–6680.
[23] H. Zhao, Q. Zheng, K. Ma, H. Li, and Y. Zheng, “Deep representation-based domain adaptation for nonstationary EEG classification,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 2, pp. 535–545, Feb. 2021.
[24] M. Ragab, E. Eldele, Z. Chen, M. Wu, C.-K. Kwoh, and X. Li, “Self-supervised autoregressive domain adaptation for time series data,” IEEE Trans. Neural Netw. Learn. Syst., early access, Jun. 23, 2022, doi: 10.1109/TNNLS.2022.3183252.
[25] H. Zhao, S. Zhang, G. Wu, J. M. Moura, J. P. Costeira, and G. J. Gordon, “Adversarial multiple source domain adaptation,” in Proc. Adv. Neural Inf. Process. Syst., vol. 31, 2018, pp. 8559–8570.
[26] Q. Xie, Z. Dai, Y. Du, E. Hovy, and G. Neubig, “Controllable invariance through adversarial feature learning,” 2017, arXiv:1705.11122.
[27] J. Guo, D. Shah, and R. Barzilay, “Multi-source domain adaptation with mixture of experts,” in Proc. Conf. Empir. Methods Natural Lang. Process., 2018, pp. 4694–4703.
[28] G. Wilson, J. R. Doppa, and D. J. Cook, “CALDA: Improving multi-source time series domain adaptation with contrastive adversarial learning,” 2021, arXiv:2109.14778.
[29] Z. Chai, C. Zhao, and B. Huang, “Multisource-refined transfer network for industrial fault diagnosis under domain and category inconsistencies,” IEEE Trans. Cybern., vol. 52, no. 9, pp. 9784–9796, Sep. 2022.
[30] J. Liang, D. Hu, and J. Feng, “Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation,” in Proc. Int. Conf. Mach. Learn., 2020, pp. 6028–6039.
[31] R. Li, Q. Jiao, W. Cao, H.-S. Wong, and S. Wu, “Model adaptation: Unsupervised domain adaptation without source data,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 9641–9650.
[32] J. Liang, D. Hu, J. Feng, and R. He, “DINE: Domain adaptation from single and multiple black-box predictors,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 1–11.
[33] H. Zhang, Y. Zhang, K. Jia, and L. Zhang, “Unsupervised domain adaptation of black-box source models,” 2021, arXiv:2101.02839.
[34] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” 2016, arXiv:1610.05492.
[35] X. Peng, Z. Huang, Y. Zhu, and K. Saenko, “Federated adversarial domain adaptation,” in Proc. Int. Conf. Learn. Represent., 2019, pp. 1–19.
[36] Y. Sun, N. Chong, and O. Hideya, “Multi-source domain adaptation based on federated knowledge alignment,” 2022, arXiv:2203.11635.
[37] H. Chen, Y. Wang, C. Xu, C. Xu, and D. Tao, “Learning student networks via feature embedding,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 1, pp. 25–35, Jan. 2021.
[38] S. Li et al., “Distilling a powerful student model via online knowledge distillation,” IEEE Trans. Neural Netw. Learn. Syst., early access, Mar. 7, 2022, doi: 10.1109/TNNLS.2022.3152732.
[39] T. Gu, Z. Wang, X. Xu, D. Li, H. Yang, and W. Du, “Frame-level teacher–student learning with data privacy for EEG emotion recognition,” IEEE Trans. Neural Netw. Learn. Syst., early access, Apr. 29, 2022, doi: 10.1109/TNNLS.2022.3168935.
[40] X. Tan, Y. Ren, D. He, T. Qin, Z. Zhao, and T.-Y. Liu, “Multilingual neural machine translation with knowledge distillation,” 2019, arXiv:1902.10461.
[41] Z. Meng, J. Li, Y. Gaur, and Y. Gong, “Domain adaptation via teacher–student learning for end-to-end speech recognition,” in Proc. IEEE Autom. Speech Recognit. Understand. Workshop (ASRU), 2019, pp. 268–275.
[42] S. You, C. Xu, C. Xu, and D. Tao, “Learning from multiple teacher networks,” in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2017, pp. 1285–1294.
[43] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” Comput. Sci., vol. 14, no. 7, pp. 38–39, 2015.
[44] A. Hussein and H. Hajj, “Domain adaptation with representation learning and nonlinear relation for time series,” ACM Trans. Internet Things, vol. 3, no. 2, pp. 1–26, 2022.
[45] T. Miyato, S.-I. Maeda, M. Koyama, and S. Ishii, “Virtual adversarial training: A regularization method for supervised and semi-supervised learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 8, pp. 1979–1993, Aug. 2019.
[46] O. Chapelle, B. Scholkopf, and A. Zien, “Semi-supervised learning (Chapelle, O. et al., Eds.; 2006) [book reviews],” IEEE Trans. Neural Netw., vol. 20, no. 3, p. 542, Mar. 2009.
[47] W.-N. Hsu, B. Bolte, Y.-H. H. Tsai, K. Lakhotia, R. Salakhutdinov, and A. Mohamed, “HuBERT: Self-supervised speech representation learning by masked prediction of hidden units,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 29, pp. 3451–3460, 2021.
[48] Y. Zhao, M. Elhousni, Z. Zhang, and X. Huang, “Distance transform pooling neural network for LiDAR depth completion,” IEEE Trans. Neural Netw. Learn. Syst., early access, Dec. 13, 2021, doi: 10.1109/TNNLS.2021.3129801.
[49] R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice. Melbourne, VIC, Australia: OTexts, 2018.
[50] B. Zhang et al., “FlexMatch: Boosting semi-supervised learning with curriculum pseudo labeling,” in Proc. Adv. Neural Inf. Process. Syst., vol. 34, 2021, pp. 1–16.
[51] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 41–48.
[52] S. Lipovetsky and M. Conklin, “Analysis of regression in game theory approach,” Appl. Stochastic Models Bus. Ind., vol. 17, no. 4, pp. 319–330, 2001.
[53] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 4768–4777.
[54] K. Kim, B. Ji, D. Yoon, and S. Hwang, “Self-knowledge distillation with progressive refinement of targets,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 6567–6576.
[55] Q. Liu and H. Xue, “Adversarial spectral kernel matching for unsupervised time series domain adaptation,” in Proc. Int. Joint Conf. Artif. Intell., 2021, pp. 2744–2750.
[56] A. Stisen et al., “Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition,” in Proc. 13th ACM Conf. Embedded Netw. Sens. Syst., 2015, pp. 127–140.
[57] J. Liu, Z. Wang, L. Zhong, J. Wickramasuriya, and V. Vasudevan, “uWave: Accelerometer-based personalized gesture recognition and its applications,” in Proc. IEEE Int. Conf. Pervasive Comput. Commun., 2009, pp. 1–9.
[58] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, “Activity recognition using cell phone accelerometers,” ACM SigKDD Explorations Newslett., vol. 12, no. 2, pp. 74–82, 2011.
[59] H. Wang, Q. Zhang, J. Wu, S. Pan, and Y. Chen, “Time series feature learning with labeled and unlabeled data,” Pattern Recognit., vol. 89, pp. 55–66, May 2019.
[60] B. Yang, A. J. Ma, and P. C. Yuen, “Revealing task-relevant model memorization for source-protected unsupervised domain adaptation,” IEEE Trans. Inf. Forensics Security, vol. 17, pp. 716–731, 2022.
[61] L. Yang, Y. Balaji, S.-N. Lim, and A. Shrivastava, “Curriculum manager for source selection in multi-source domain adaptation,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 608–624.
[62] K. Zhou, Y. Yang, Y. Qiao, and T. Xiang, “Domain adaptive ensemble learning,” IEEE Trans. Image Process., vol. 30, pp. 8008–8018, 2021.
[63] H. Feng et al., “KD3A: Unsupervised multi-source decentralized domain adaptation via knowledge distillation,” in Proc. Int. Conf. Mach. Learn., 2021, pp. 3274–3283.
[64] J. Liang, D. Hu, Y. Wang, R. He, and J. Feng, “Source data-absent unsupervised domain adaptation through hypothesis transfer and labeling transfer,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 11, pp. 8602–8617, Nov. 2022.
[65] H. Xia, H. Zhao, and Z. Ding, “Adaptive adversarial network for source-free domain adaptation,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 9010–9019.
[66] S. Yang, Y. Wang, J. Van De Weijer, L. Herranz, and S. Jui, “Generalized source-free domain adaptation,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 8978–8987.
[67] S. M. Ahmed, D. S. Raychaudhuri, S. Paul, S. Oymak, and A. K. Roy-Chowdhury, “Unsupervised multi-source domain adaptation without access to source data,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 10103–10112.
[68] G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff, “A transformer-based framework for multivariate time series representation learning,” in Proc. 27th ACM SIGKDD Conf. Knowl. Disc. Data Min., 2021, pp. 2114–2124.
[69] Z. Wang, W. Yan, and T. Oates, “Time series classification from scratch with deep neural networks: A strong baseline,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), 2017, pp. 1578–1585.
[70] E. Rahimian, S. Zabihi, S. F. Atashzar, A. Asif, and A. Mohammadi, “Xceptiontime: Independent time-window xceptiontime architecture for hand gesture classification,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2020, pp. 1304–1308.
[71] K. Fauvel, T. Lin, V. Masson, É. Fromont, and A. Termier, “XCM: An explainable convolutional neural network for multivariate time series classification,” Mathematics, vol. 9, no. 23, p. 3137, 2021.
[72] L. Ren, X. Cheng, X. Wang, J. Cui, and L. Zhang, “Multi-scale dense gate recurrent unit networks for bearing remaining useful life prediction,” Future Gener. Comput. Syst., vol. 94, pp. 601–609, May 2019.
[73] X. Zou, Z. Wang, Q. Li, and W. Sheng, “Integration of residual network and convolutional neural network along with various activation functions and global pooling for time series classification,” Neurocomputing, vol. 367, pp. 39–45, Nov. 2019.

Lei Ren (Member, IEEE) received the Ph.D. degree in computer science from the Institute of Software, Chinese Academy of Sciences, Beijing, China, in 2009.
He is currently a Professor with the School of Automation Science and Electrical Engineering, Beihang University; Zhongguancun Laboratory; and the State Key Laboratory of Intelligent Manufacturing System Technology, Beijing. His research interests include neural networks and deep learning, industrial digital twin, and industrial AI applications.
Dr. Ren serves as an Associate Editor for the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS and other international journals.

Xuejun Cheng (Graduate Student Member, IEEE) received the bachelor's and master's degrees in automation from Beihang University, Beijing, China, in 2016 and 2019, respectively, where he is currently pursuing the Ph.D. degree with the School of Automation Science and Electrical Engineering.
His research interests include deep learning for time series forecasting and deep learning for Internet of Things.
