
Measurement 180 (2021) 109546

Contents lists available at ScienceDirect

Measurement
journal homepage: www.elsevier.com/locate/measurement

Graph neural network approach for anomaly detection


Lingqiang Xie a, Dechang Pi a, *, Xiangyan Zhang b, Junfu Chen a, Yi Luo a, Wen Yu a
a College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
b Beijing Institute of Spacecraft System Engineering, Beijing, China

ARTICLE INFO

Keywords:
Telemetry data
Graph neural network
Dynamic threshold
Anomaly detection

ABSTRACT

To ensure the stable long-term operation of satellites, evaluate satellite status, and improve satellite maintenance efficiency, we propose an anomaly detection method based on a graph neural network and dynamic threshold (GNN-DTAN). First, we build a graph neural network model for telemetry data. The graph construction module in the model extracts the relationships between features, and the spatial dependency extraction module and the temporal dependency extraction module extract the spatial and temporal dependencies of the data, respectively. The trained model is then used to predict the data, and the anomaly score between the predicted and actual values is calculated. Finally, the wavelet variance is used to analyze the data period, and a dynamic threshold method based on the period time window is used to detect anomalies in the data set. Experimental results on satellite power system telemetry data show that the proposed algorithm's accuracy exceeds 98%, with good effectiveness and robustness.

1. Introduction

Anomaly detection on satellite telemetry time series data can be used to evaluate the status of a satellite. The space environment in which satellites operate on orbit is very harsh. If an in-orbit failure is not handled in time, it may lead to system service failure and poor availability. The satellite may even go out of control, causing huge economic losses to the country. Anomalies must therefore be identified as soon as they occur [1]. Timely and effective anomaly detection and fault location can ensure the safe and reliable operation of satellites [2–4]. Fault diagnosis is playing an increasingly important role in satellite management, because it supports maintenance that extends satellite life [5,6]. Therefore, anomaly detection on satellite telemetry time series data is of great significance.

Because satellites carry many sensors and have complex physical structures, satellite telemetry time series data are characterized by large data volume and high dimensionality [7,8]. As neural network architectures and computing power have improved, deep learning has attracted more and more research. Because deep learning works well on large, high-dimensional data, deep learning methods have come into wide use in the field of anomaly detection.

Satellite telemetry time series data reflect the satellite's status. Anomaly detection on these data can provide an early warning of satellite failure, so we propose a method to detect anomalies in satellite telemetry time series data. Most current anomaly detection models ignore or fail to exploit the potential dependencies between the feature variables and the time series. Ignoring these correlations seriously affects the accuracy of the model. Besides, the harsh space environment leads to many peaks in satellite data, and a fixed threshold lowers anomaly detection accuracy. These problems have not yet been well solved in the field of anomaly detection.

In response to the above problems of existing anomaly detection methods, this paper proposes an anomaly detection method based on a graph neural network and dynamic thresholding for satellite telemetry time series data. The main contributions of this paper are summarized as follows:

(1) An unsupervised regression prediction method based on a graph neural network is proposed. The model requires neither knowledge from domain experts nor a large amount of manually labeled data. It predicts all feature values at the next moment with the graph neural network model.
(2) A novel method is proposed to calculate the relationships between the features of satellite telemetry data. Unlike traditional graph neural networks, our model does not require a predefined

* Corresponding author.
E-mail addresses: xlqnuaa@nuaa.edu.cn (L. Xie), pinuaa@nuaa.edu.cn (D. Pi), cjf@nuaa.edu.cn (J. Chen), Luoyi3819@nuaa.edu.cn (Y. Luo).

https://doi.org/10.1016/j.measurement.2021.109546
Received 10 January 2021; Received in revised form 23 March 2021; Accepted 3 May 2021
Available online 9 May 2021
0263-2241/© 2021 Elsevier Ltd. All rights reserved.

graph structure. We construct the graph adjacency matrix adaptively by calculating the relationships between feature nodes, transforming the linearly structured data into graph data. The graph structure obtained by learning is optimal.
(3) The TDE module of the graph neural network combines dilated convolution and inception structures to extract the temporal dependence of the data; the SDE module fuses and propagates the data information through the mix-hop propagation module to extract the spatial dependence of the data.
(4) A dynamic thresholding method based on periodic time windows is proposed for anomaly detection. The paper first analyzes the data period using wavelet variance, then uses the moving weighted average method to calculate the threshold value at each moment, and judges anomalies by that threshold value.

The rest of the paper is organized as follows. Section 2 introduces related work on aircraft anomaly detection and graph neural networks. Section 3 describes a regression prediction model based on graph neural networks. Section 4 introduces dynamic threshold anomaly detection. Section 5 evaluates the proposed model on the satellite power system dataset and compares it with state-of-the-art models. Finally, Section 6 summarizes our work and gives an outlook on future work.

2. Related works

2.1. Anomaly detection in aircraft

Satellite telemetry data anomaly detection has been a hot research topic. According to their algorithmic principles, telemetry time series anomaly detection algorithms can be broadly classified into five categories: clustering-based, classification-based, neighborhood-based, statistical, and regression-based approaches.

The clustering-based anomaly detection method takes the distance between a sample and the nearest cluster center as the anomaly score. Its advantage is that the model does not require labeled data and focuses on mining the relationships between data. Its disadvantage is that the effectiveness of anomaly detection depends on capturing the structure of normal data, which can lead to wide variation between clustering algorithms. Li et al. [9] proposed the ClusterAD-DataSample model and used a Gaussian mixture model to cluster flight data to judge anomalies. Benkabou et al. [10] measured the minimum intra-cluster distance by dynamic time warping. Based on the entropy of the time series and dynamic time warping, the task became a weighted clustering problem, and the time series with small weights were identified as outliers.

The core of the classification-based anomaly detection method is a classifier trained on labeled data, which can then be used to determine anomalies [11]. The classification-based approach makes full use of labeled data to better learn the category boundaries of the data, but it needs a large amount of labeled data to train the model. The number of anomalous samples in satellite time series data is small, so overfitting occurs easily because of class imbalance. Das et al. [12] used the normalized longest common subsequence and symbolic aggregate approximation to represent continuous and discrete heterogeneous data, respectively, and then constructed an anomaly detection model based on a one-class support vector machine. The principle is to construct the optimal hyperplane in the feature space that separates anomalous data from normal data, in order to detect anomalies in multivariate time series data. Janakiraman et al. [13] proposed the ADOPT model, which transforms the anomaly detection problem into a suboptimal decision problem.

The core of the neighborhood-based anomaly detection algorithm is the definition of the neighborhood, generally through a distance or similarity measure. The distance or relative density between a data point and its neighborhood is used as the anomaly score to judge anomalies. The benefit of the neighborhood-based approach is that it relaxes the assumptions on data distribution and prior knowledge. The drawback is that a complex domain leads to a high probability of detection errors. Besides, the computational and storage overhead of the neighborhood-based approach is large. Ishimtsev et al. [14] used deferred scores adapted to non-stationarity and a non-parametric probability measure to quantify the confidence that predicted values are anomalous. Breunig et al. [15] used the local outlier factor method to measure anomalies in the clustering results of flight data.

The statistics-based anomaly detection algorithm selects a statistical model and fits it to normal data. Statistical inference tests and data distribution hypotheses are then applied to calculate the anomaly scores of satellite time series data and to determine whether the data are outliers. The statistics-based approach reduces the dependency on labels. However, it has a serious problem: as the variable dimension increases, the model complexity of the statistical method increases quadratically, and the model may overfit. Schumann et al. [16] modeled real-time sensor health state abnormality determination with Bayesian networks, encoding a conditional probability table for each sensor node from the prior probability of the sensor health state. Khalastchi et al. [17] first assumed that the flight data obeyed a Gaussian distribution. Then, considering that the flight data had a low-rank characteristic, they used the Mahalanobis distance to measure data variation and determine anomalies.

The core of the regression-based anomaly detection algorithm is to build a regression model, fit the satellite data with it, and determine anomalies by comparing the degree of difference between the predicted and sample data. The regression-based approach does not need labeled data; it focuses on mining and exploiting the relationships between data and can process data online. The difficulty lies in fitting the satellite parameters accurately. Melnyk et al. [18] established a vector autoregressive model. They used the vector autoregressive coefficients to build a distance matrix for the data and determined anomalies from differences in that matrix. Akouemo et al. [19] applied a linear regression model to detect anomalies; the principle is to use a Bayesian maximum likelihood classifier to learn the known anomalous time series patterns. Abdelghafar et al. [20] used an optimized extreme learning machine for anomaly detection on satellite telemetry data.

For high-dimensional data anomaly detection, deep learning methods are more accurate than traditional machine learning methods, and a large number of researchers have started to investigate them. Kieu et al. [21] first enriched the feature space of the original data and then reconstructed the data using an autoencoder, with the reconstruction deviation serving as the anomaly indicator. Li et al. [22] used a deep belief network for anomaly detection on spacecraft, building neural network models to describe correlations between multiple variables. Junfu et al. [23] developed a Bayesian LSTM model for satellite telemetry data and re-evaluated the samples with high uncertainty using a variational auto-encoder. Pan et al. [24] proposed a bi-directional long short-term memory neural network to extract features and regress the data, determining anomalies by the deviation between the predicted and actual values. These deep learning methods do not consider the potential relationships between variables, which results in inadequate fitting of high-dimensional satellite parameter data and reduced anomaly detection performance.

2.2. Graph neural network

Graph neural networks have been studied extensively. Most existing graph neural network models take graph data as input. Graph data consist of feature nodes and edges, and the weight of an edge measures the correlation between the feature nodes it connects. Usually, an adjacency matrix or an adjacency list is used to store graph data. The following definitions are given for the concepts related to graphs.


Fig. 1. The basic principle diagram of graph neural network.

Definition 1 (Graph): A graph is an ordered pair $G = (V, E)$, where $V$ denotes the set of feature nodes and $E$ denotes the set of edges. In this paper, $N$ denotes the number of feature nodes.

Definition 2 (Neighborhood Node): Let $v \in V$ and $n \in V$ denote two feature nodes, and let $e = (v, n) \in E$ denote an edge pointing from $v$ to $n$. The neighbor nodes of the feature node $v$ are defined as $N(v) = \{n \in V \mid (v, n) \in E\}$.

Definition 3 (Adjacency Matrix): Let the adjacency matrix be $M \in \mathbb{R}^{N \times N}$. $M_{i,j} > 0$ denotes $(V_i, V_j) \in E$, and $M_{i,j} < 0$ denotes $(V_i, V_j) \notin E$.

In this paper, the high-dimensional satellite data features represent the feature nodes in the graph.

In recent years, Gama et al. [25] discussed the role of graph convolution filters in GNNs; characterizing GNNs through graph signal processing (GSP) helps us understand them and thus further improve their design. Isufi et al. [26,27] proposed EdgeNets and GTCNN; EdgeNets unify recent graph neural network architectures, and GTCNN was shown on classification and regression tasks to learn spatiotemporal representations well. Ruiz et al. [28] proposed Gated Graph Recurrent Neural Networks, which consider both the sequential structure of the data and the underlying graph topology. The PinSage algorithm proposed by Ying et al. [29] greatly improves the quality of node embeddings.

Current graph neural network models can be broadly classified into four categories [30]: recurrent graph neural networks (RecGNNs), convolutional graph neural networks (ConvGNNs), graph autoencoders (GAEs), and spatial–temporal graph neural networks (STGNNs). Graph convolutional neural networks can be divided into spectral-based and spatial-based networks. The spectral-based graph neural network is inefficient in processing large graphs, whereas the spatial-based graph neural network is more flexible and efficient. The idea of a spatial-based graph convolutional neural network is the same as that of a convolutional neural network. The difference is that the spatial-based graph convolutional neural network defines the graph convolution according to the spatial relationship between feature nodes. Each feature node then aggregates its own information and its neighbors' information through embedding. The basic principle is illustrated in Fig. 1.

3. Graph neural network model

Compared with other models, graph neural networks have achieved great success in dealing with spatial dependencies, and the fitting accuracy of the data has been improved to some extent. Conventional graph neural models have the following drawbacks. First, they are inefficient at handling pure high-dimensional time series data. Second, they use predefined graph structures, so there is no way to determine whether the graph structure is optimal. We propose a graph neural network-based model to address these drawbacks. The network structure is shown in Fig. 2.

Fig. 2. The framework of the regression prediction model.

The model contains three core parts: the graph construction (GC) module, the spatial dependency extraction (SDE) module, and the temporal dependency extraction (TDE) module. The GC module adaptively learns the adjacency matrix composed of feature nodes, and the graph structure obtained by learning is optimal. The learned adjacency matrix is used in the SDE module. The SDE module fuses the information of feature nodes and their neighbor nodes, and the TDE module obtains the temporal patterns of the data by 1D convolution. Besides, residual connections are added to the model to avoid vanishing gradients. The residual connection runs from the output of the TDE module to the output of the model. This section introduces the four modules in detail, together with the processing strategy for graphs.

3.1. Graph construction module

The input of a traditional graph neural network can only be graph-structured data. For telemetry data, we use the graph construction module to calculate the relationships between the feature nodes of the data and to construct the graph adjacency matrix of feature nodes adaptively, so that the graph structure is optimal. In satellite telemetry time series data, a change in one feature may cause a change in another feature, whereas the reverse change may cause no change or a change of different amplitude. The relationship between data features is asymmetric and bidirectional, so the adjacency matrix of feature nodes learned by our graph construction module should be asymmetric. To capture this relationship between the feature nodes of telemetry data, the relationship between feature nodes in the adjacency matrix is calculated as follows:

$$D_1 = \mathrm{sigmoid}(\alpha N_1 \theta_1) \tag{1}$$


$$D_2 = \mathrm{sigmoid}(\alpha N_2 \theta_2) \tag{2}$$

$$B = \mathrm{ReLU}\left(\mathrm{sigmoid}\left(\alpha\left(D_1 D_2^{T} - D_2 D_1^{T}\right)\right)\right) \tag{3}$$

where $N_1$ and $N_2$ represent randomly initialized feature node embeddings, $\alpha$ is a hyperparameter, set prior to training, that controls the saturation rate of the activation function, $\theta_1$ and $\theta_2$ are model parameters obtained by training, $D_1$ and $D_2$ are intermediate variables holding the values calculated by Eqs. (1) and (2), $B$ is the weighted adjacency matrix of feature nodes obtained from the GC module, and sigmoid and ReLU are activation functions.

From Eqs. (1)–(3), it can be seen that the computed relationship between telemetry time series data features is an asymmetric bidirectional relationship, and the diagonal values of the graph adjacency matrix are 0. We use the ReLU activation function to regularize the graph adjacency matrix and convert the matrix's negative values to 0.

The resulting adjacency matrix $B$ is further processed:

$$\mathrm{id} = \mathrm{topk}(B[i, :]), \quad i \in \{1, 2, \ldots, N\} \tag{4}$$

$$B[i, !\mathrm{id}] = 0, \quad i \in \{1, 2, \ldots, N\} \tag{5}$$

where $N$ is the number of features of the data, the topk function returns the indices of the $k$ neighbor nodes with the largest connection weights for each feature node, and id denotes the indices obtained.

Eq. (4) obtains the indices of the $k$ nearest neighbor nodes of each feature node, and Eq. (5) sets the connection weights of the remaining neighbor nodes to 0 for each feature node. This transforms the graph adjacency matrix into a sparse matrix, reduces the computational cost of the whole model, and improves the model training rate.

3.2. Spatial dependency extraction module

Our spatial dependency extraction module consists of three mix-hop propagation layers added together. It handles the spatial dependence of telemetry time series data by fusing the information of each feature node in the graph adjacency matrix with that of other feature nodes. The structure is shown in Fig. 3. The inputs of the three mix-hop propagation layers in the figure are $B$, $C$, and $D$, respectively, each together with $E_{in}$. $B$ is the adjacency matrix obtained from the graph construction module. $C$ is obtained from $B$ by reflection about the main diagonal, where $C_{i,j} = B_{j,i}$. $D$ is obtained from $B$ by reflection about the anti-diagonal, where $D_{i,j} = B_{n-j+1,\,n-i+1}$ and $n$ is the number of feature nodes of the satellite telemetry time series data. The inputs $B$, $C$, and $D$ allow for adequate information propagation and information selection. The structure of each mix-hop propagation layer is shown in Fig. 4. The vertical direction is used to propagate feature node information, and the horizontal direction is used to extract node features.

Fig. 3. Spatial dependency extraction module.

Fig. 4. Mix hop propagation layer.

The role of the mix-hop propagation layer is to propagate and integrate the input information. Its input is composed of the graph adjacency matrix learned by the graph construction module and the output of the previous temporal dependency extraction module. The mix-hop propagation layer uses the following equations to fuse the neighborhood information of feature nodes without adding noise and without losing the information of the feature nodes themselves:

$$E^{(d)} = \gamma E_{in} + (1 - \gamma)\,\tilde{T}^{-1}(B + I)\,E^{(d-1)} \tag{6}$$

$$E_{out} = \sum_{j=0}^{d} E^{(j)} w^{(j)} \tag{7}$$

where $\gamma$ is the hyperparameter that adjusts the proportion of the original feature node information, $d$ is the depth of the propagation layer, $I$ is the identity matrix, $E_{in}$ denotes the output of the previous temporal dependency extraction module, used as input, $E_{out}$ denotes the output of the current spatial dependency extraction module, $w^{(j)}$ is the model parameter matrix, $B$ is the graph adjacency matrix obtained from the graph construction module, and $\tilde{T}^{-1} = 1 + \sum_{j} B_{ij}$.

Eqs. (6) and (7) perform the information propagation and information selection calculations, respectively. Eq. (6) mitigates the excessive smoothing caused by too many graph convolution layers, which makes feature nodes indistinguishable, and Eq. (7) prevents the loss of feature node information and selects the important features.

3.3. Temporal dependency extraction module

Our temporal dependency extraction module consists of two dilated inception layers. Its function is to extract the temporal characteristics of telemetry time series data. The two layers are followed by tanh and sigmoid activation functions, respectively, and their outputs are then multiplied. The result is the output of the temporal dependency extraction module. The specific structure is shown in Fig. 5.

Fig. 5. Temporal dependency extraction module.

One of the essential components of the temporal dependency extraction module is the dilated convolution. It inserts gaps into the standard convolution kernel, thus greatly increasing the receptive field. The convolution operation can then cover a larger range of information without increasing the number of model parameters. Yu et al. [31] made the first attempt to do semantic


segmentation with dilated convolution. They demonstrated that dilated convolution could improve semantic segmentation accuracy, is particularly suitable for dense prediction, and can expand the receptive field without loss of resolution or coverage. The schematic diagram of the dilated convolution is shown in Fig. 6. The receptive field of an ordinary 3 × 3 convolution is 9, and the figure shows the dilated convolution process with a dilation of 2. The receptive field can reach 49, which is a great improvement.

Fig. 6. The principle diagram of the dilated convolution.

The telemetry time series of the satellite is very long, so its features cannot be extracted effectively by an ordinary convolution. The usual remedy is to increase the depth of the network or the size of the filter; these methods increase the model's complexity, so we use the dilated convolution to extract the time series features. The receptive field of the dilated convolution is calculated as follows:

$$V = 1 + d\,(a - 1)\,\frac{1 - p^{n}}{1 - p} \tag{8}$$

where $a$ is the convolution kernel size, $n$ is the number of 1D convolution kernels, $d$ is the initial dilation rate, and $p$ ($p > 1$) is the rate at which the dilation rate grows.

The features of the very long telemetry time series are extracted by increasing the receptive field in Eq. (8), and we adjust the receptive field by changing the growth rate $p$.

The inception structure in the temporal dependency extraction module is designed to enlarge the receptive field from the perspective of convolution width. The principle is to design a local network topology that performs multiple convolution operations, concatenating convolution kernels of different sizes to obtain different receptive fields while fusing feature information. Compared with improving the model by increasing the network's depth, the inception structure does not over-parameterize the network or cause overfitting, and the computational complexity is greatly reduced. The time interval of the telemetry time series data used in this paper is 1 min, and the period of satellite flight may be 1 h, 12 h, 24 h, 30 days, and so on. According to these periodic characteristics of satellite flight, the convolution kernels used in this paper are 1 × 2, 1 × 3, 1 × 6, and 1 × 12. Fig. 7 shows the dilated inception layer.

Fig. 7. Inception structure.

The outputs of the four convolution kernels are concatenated into feature maps four times as deep after the convolution operation. The dilated inception layer is expressed as follows:

$$e_{out} = \mathrm{cat}\left(e_{in} * g_{1\times 2},\; e_{in} * g_{1\times 3},\; e_{in} * g_{1\times 6},\; e_{in} * g_{1\times 12}\right) \tag{9}$$

where $g_{1\times 2}$, $g_{1\times 3}$, $g_{1\times 6}$, and $g_{1\times 12}$ denote the corresponding 1 × 2, 1 × 3, 1 × 6, and 1 × 12 convolution kernels, the cat function denotes concatenation, $e_{in}$ is the input, and $e_{out}$ is the output.

After filtering with the four convolution kernels in the inception structure, the largest convolution kernel, 1 × 12, is used to align the output lengths of the other three kernels' filtered results. The final expression of the dilated inception layer is as follows:

$$e_{in} * g_{1\times k}(t) = \sum_{s=0}^{k-1} g_{1\times k}(s)\, e_{in}(t - d \times s) \tag{10}$$

where $d$ is the dilation rate and $e_{in} * g_{1\times k}(t)$ denotes the result at moment $t$ after filtering by the convolution kernels.

3.4. Residual connection

The idea of the residual connection is to add inputs to the final output. To some extent, model performance improves as the layers of a neural network stack up, but deeper stacks can make the model suffer from problems such as exploding and vanishing gradients. For example, the gradient calculation of a traditional neural network contains the derivative $d(f)/d(x)$. With the residual connection, this derivative becomes $d(f + x)/d(x) = 1 + d(f)/d(x)$. The result does not approach 0, which ensures effective propagation and prevents vanishing gradients.

The residual connection in this paper linearly superposes the output of each temporal dependency extraction module and adds the result to the final output of the graph neural network. The output of the model after adding the residual connection is calculated as follows:

$$y = f(x, \omega) + \sum_{i=1}^{k} e_i(x) \tag{11}$$

$$Y = \mathrm{ReLU}(y) \tag{12}$$

where $e_i(x)$ represents the convolution output of the $i$-th temporal dependency extraction module, mapped to the final output of the model, $k$ denotes the number of residual connections, and $f(x, \omega)$ denotes the output of the previous layer of the network. The final output is obtained using the ReLU activation function.

Eqs. (11) and (12) show the calculation process of the residual connection. The residual connection in this paper maps each temporal dependency extraction module output to the model output with a standard 1D convolution. Finally, the output is transformed into the standard output format using a standard convolution.

3.5. Learning strategies

Because the telemetry time series data have many characteristic parameters, the graph adjacency matrix learned by the graph construction module is too large; it occupies a lot of memory during model training and greatly reduces performance. To improve the model's ability to handle high-dimensional feature data, we divide the feature nodes randomly into several sets and learn the sub-graph structure of each set separately. Because the division is random each time, each feature node may fall into several different sets over the course of training, and can therefore calculate its relationship with the other feature nodes through Eqs. (1)–(3). In this way, the space complexity of our graph neural network model is greatly reduced, as shown in the description of Algorithm 1.
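The random partitioning described above, combined with Eqs. (1)–(7), can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`build_adjacency`, `mix_hop`, `random_groups`), the embedding size, and the values α = 3, k = 2, and γ = 0.05 are all hypothetical, and the parameters are random rather than trained. The sketch also assumes that $E^{(0)} = E_{in}$ and reads $\tilde{T}$ as the diagonal matrix with entries $1 + \sum_j B_{ij}$, so that $\tilde{T}^{-1}(B + I)$ is a row normalization. With the sigmoid written in Eq. (3), every entry of $B$ lies in (0, 1). An odd activation such as tanh, offered here only as an optional argument, gives a zero diagonal.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def build_adjacency(N1, N2, theta1, theta2, alpha=3.0, k=2, inner=sigmoid):
    """Eqs. (1)-(5): a sparse, directed adjacency matrix from node embeddings."""
    D1 = sigmoid(alpha * N1 @ theta1)                              # Eq. (1)
    D2 = sigmoid(alpha * N2 @ theta2)                              # Eq. (2)
    B = np.maximum(inner(alpha * (D1 @ D2.T - D2 @ D1.T)), 0.0)    # Eq. (3)
    keep = np.argsort(-B, axis=1)[:, :k]                           # Eq. (4): top-k per row
    mask = np.zeros_like(B, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return np.where(mask, B, 0.0)                                  # Eq. (5): zero the rest

def mix_hop(E_in, B, weights, gamma=0.05):
    """Eqs. (6)-(7), assuming E^(0) = E_in and row normalization by 1 + sum_j B_ij."""
    A = (B + np.eye(len(B))) / (1.0 + B.sum(axis=1, keepdims=True))
    E, out = E_in, E_in @ weights[0]
    for W in weights[1:]:
        E = gamma * E_in + (1.0 - gamma) * A @ E                   # Eq. (6)
        out = out + E @ W                                          # Eq. (7)
    return out

def random_groups(num_nodes, c, rng):
    """Section 3.5: randomly split the node indices into c disjoint groups."""
    return np.array_split(rng.permutation(num_nodes), c)

rng = np.random.default_rng(0)
N, dim, c = 12, 4, 3                      # hypothetical sizes
N1, N2 = rng.normal(size=(N, dim)), rng.normal(size=(N, dim))
theta1, theta2 = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))

groups = random_groups(N, c, rng)
# One small adjacency matrix per group instead of a single N x N matrix.
sub_adjs = [build_adjacency(N1[g], N2[g], theta1, theta2, k=2) for g in groups]

weights = [np.eye(5)] * 3                 # hypothetical w^(0), w^(1), w^(2)
outs = [mix_hop(np.ones((len(g), 5)), A, weights) for g, A in zip(groups, sub_adjs)]
```

Each group only ever materializes a matrix of size (N/c) × (N/c), which is where the memory saving comes from. Re-drawing the groups on every iteration lets each pair of nodes eventually appear in the same sub-graph.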

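The temporal side of the model in Section 3.3 can be illustrated in the same hedged way. The sketch below evaluates the receptive field formula of Eq. (8), implements the dilated filtering of Eq. (10) as a plain loop, and concatenates four kernel sizes as in Eq. (9). The function names, the all-ones kernels, and the choice to align outputs by keeping the most recent samples are assumptions made for illustration only. A real model would use learned kernels and a deep learning framework.

```python
import numpy as np

def receptive_field(a, n, d=1, p=2):
    """Eq. (8): V = 1 + d (a - 1) (1 - p^n) / (1 - p), with p > 1."""
    return 1 + d * (a - 1) * (1 - p**n) / (1 - p)

def dilated_conv1d(x, g, d=1):
    """Eq. (10): y(t) = sum_{s=0}^{k-1} g(s) x(t - d*s), for valid t only."""
    k = len(g)
    first = d * (k - 1)          # earliest t whose whole window lies inside x
    return np.array([sum(g[s] * x[t - d * s] for s in range(k))
                     for t in range(first, len(x))])

def dilated_inception(x, kernels, d=1):
    """Eq. (9): filter with every kernel, align lengths, and stack the results."""
    outs = [dilated_conv1d(x, g, d) for g in kernels]
    length = min(len(o) for o in outs)       # length produced by the largest kernel
    return np.stack([o[-length:] for o in outs])

x = np.arange(30, dtype=float)               # toy telemetry channel
kernels = [np.ones(k) for k in (2, 3, 6, 12)]
features = dilated_inception(x, kernels)     # one row per kernel size

# Stack three kernel-size-2 layers with dilations 1, 2, 4 (d = 1, p = 2, n = 3).
y = x
for i in range(3):
    y = dilated_conv1d(y, np.ones(2), d=2**i)
```

With kernel size 2 and three layers, Eq. (8) gives a receptive field of 8, and the stacked output is shorter than the input by exactly 7 samples. This is a quick way to check the formula against the filtering loop.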

Algorithm 1. Graph neural network framework learning algorithm

Input: Satellite telemetry time series data training set D1, feature node number N, sliding window size H, number of channels T, batch size b, step size s, learning rate η, number of training epochs A
iter = 1, r = 1
REPEAT
    sample a batch from D1: X ∈ R^(b×T×N×H), Y ∈ R^(b×T′×N)
    split V into c groups, ∪_{i=1}^{c} V_i = V
    IF iter % s == 0 and r < T′ THEN
        r = r + 1
    END IF
    FOR i = 1 : c DO
        forward propagation
        compute the predicted value Ŷ
        compute the loss
        back propagation updates the model parameters θ
    END FOR
UNTIL convergence

Lines 3–4 draw a batch of data for each training step and then randomly divide the feature nodes into c groups; lines 5–7 record the number of groups divided; and lines 8–13 are the model training process.

4. Dynamic threshold anomaly detection method

Satellite sensors collect data in a complex environment, and satellite telemetry time series data are generally periodic, so fixed thresholds are not good enough to determine whether the data are anomalous. We propose a dynamic threshold anomaly detection (GNN-DTAN) method based on a periodic sliding time window. For each moment, the weighted average of the anomaly scores over the previous period is calculated and scaled by a corresponding multiple to give the threshold value for that moment. The exponential moving weighted average can better reflect the changing trend of the time series. The anomalous data in the telemetry time series are then determined. The data in the first period are assumed to be normal by default; from the second period onward, we calculate the dynamic threshold and compare it with the anomaly score. The flow chart of the GNN-DTAN model is shown in Fig. 8.

Fig. 8. Frame of the anomaly detection model.

Our GNN-DTAN model is divided into three steps: data preprocessing, building the graph neural network model, and anomaly detection using the dynamic threshold method with the sliding time window. The processed satellite time series data are input to the graph neural network model, which is trained to fit multiple parameters of the satellite time series data accurately and is then saved. The test set is input to the trained graph neural network model to predict the values of all parameters at a given moment, from which the anomaly score at that moment is calculated. The data period is obtained by wavelet analysis. The dynamic threshold algorithm is then used to calculate the threshold value for that moment. The threshold value and the anomaly score are compared to determine whether the data are anomalous.

4.1. Satellite data period analysis

The satellite orbits the Earth, so the telemetry time series parameters change periodically. The data at each moment are closely related to the previous period, so it is important to analyze the period of telemetry time series data accurately. Time series are commonly analyzed in the time domain or the frequency domain. However, the telemetry time series is not stationary, and its period cannot be obtained directly by time domain or frequency domain analysis. Wavelet variance analysis is a period analysis method with multiple resolutions in the time and frequency domains. Compared with period analysis using the traditional Fourier transform with a time–frequency window, wavelet variance analysis can adapt to the requirements of time–frequency signal analysis and thus obtain more details of the signal. Therefore, we use wavelet variance analysis to accurately obtain the telemetry data's variation period to provide the


period size for the dynamic thresholding method based on the period sliding time window. The following describes the important parts of wavelet analysis in turn: wavelet function, wavelet transform, and wavelet variance.

(1) Wavelet function

One of the keys to wavelet variance analysis is the wavelet function. Let the basic wavelet function be $\psi(t)$; its integral over $(-\infty, +\infty)$ must be 0. A family of wavelet functions is obtained by stretching and translation, as shown in Eq. (13):

$$\psi_{i,j}(t) = |i|^{-1/2}\,\psi\!\left(\frac{t-j}{i}\right) \tag{13}$$

Where $i, j \in R$, $i \neq 0$, $\psi_{i,j}(t)$ denotes the transformed daughter wavelet; $i$ is the scale expansion factor, which measures the wavelet period length; and $j$ is the translation factor, which measures the translation in time.
The choice of the basic wavelet function plays a crucial role in wavelet analysis and will directly determine the analysis results.

(2) Wavelet transform

For a finite-energy signal $f(t) \in L^2(R)$, combined with the daughter wavelet in Eq. (13), the continuous wavelet transform for continuous data is obtained as follows:

$$W_f(i,j) = |i|^{-1/2}\int_{R} f(t)\,\overline{\psi}\!\left(\frac{t-j}{i}\right)dt \tag{14}$$

where $W_f(i,j)$ is the wavelet coefficient, $i$ is the scale expansion factor, $j$ is the translation factor, $\psi(t)$ and $\overline{\psi}(t)$ are complex conjugates of each other, and $f(t)$ is the square-integrable signal function over the real numbers.

The telemetry time-series data used in this paper are discrete, so the wavelet transform for discrete data is written as follows:

$$W_f(i,j) = |i|^{-1/2}\,\Delta t\sum_{k=1}^{N} f(k\Delta t)\,\overline{\psi}\!\left(\frac{k\Delta t-j}{i}\right) \tag{15}$$

where $\Delta t$ is the sampling interval of the satellite telemetry time series data, and $f(k\Delta t)$, $k = 1, 2, \cdots, N$ is the signal function of the discrete data.

(3) Wavelet variance

The wavelet variance is obtained by integrating the squared values of the wavelet coefficients over the time translation axis as follows:

$$\mathrm{Var}(i) = \int_{-\infty}^{+\infty} |W_f(i,j)|^2\,dj \tag{16}$$

Similarly, the wavelet variance formula for the discrete satellite telemetry time-series data in this paper is:

$$\mathrm{Var}(i) = \frac{1}{N}\sum_{k=1}^{N} |W_f(i,j_k)|^2 \tag{17}$$

The wavelet variance is first calculated according to Eq. (17), and the wavelet variance plot is drawn. The horizontal axis of the wavelet variance plot represents the stretching scale, while the vertical axis represents the wavelet variance. We look at the highest point of the wavelet variance plot. The scale corresponding to the highest point is the period of the telemetry time series data.

4.2. Anomaly measurement

In this paper, the graph neural network regression prediction model is trained using telemetry time series data samples from normal satellite operations, and the trained model is preserved. When the test set data are fed into the trained model, the deviation of the model's predicted values from the actual data will be large when anomalies occur in the satellite. Considering that the telemetry time-series data are high-dimensional and that the parameters collected by different satellite sensors have different scales and units, we use the Mahalanobis distance as an anomaly score to measure the magnitude of the deviation and thus the satellite state. Eq. (18) is used to calculate a sample point's Mahalanobis distance from the whole sample.

$$D_m = \sqrt{(x-\mu)^{T}\,\Sigma^{-1}\,(x-\mu)} \tag{18}$$

Where $x$ represents a sample point, $x = (x^{(1)}, x^{(2)}, \ldots, x^{(n)})^{T}$. $n$ represents the number of features of the data, $\mu$ represents the vector consisting of the mean of each feature over the sample data with the same features as $x$, $\mu = (\mu_1, \mu_2, \ldots, \mu_n)^{T}$. $\Sigma^{-1}$ represents the inverse matrix of the covariance matrix of the sample space.
For the Mahalanobis distance between the predicted value of the graph neural network and the actual sample, the calculation formula is as follows:

$$\mathrm{Score}(t) = \sqrt{(x(t)-P(t))^{T}\,\Sigma_X^{-1}\,(x(t)-P(t))} \tag{19}$$

where $x(t)$ denotes the actual value of the test set samples at time $t$, $x(t) = (x_t^{(1)}, x_t^{(2)}, \ldots, x_t^{(n)})^{T}$. $\Sigma_X^{-1}$ denotes the inverse matrix of the covariance matrix of the previous period's telemetry dataset at time $t$. In particular, when calculating the anomaly scores for all moments of the first period, $\Sigma_X^{-1}$ denotes the inverse matrix of the covariance matrix of the first period's dataset. $P(t)$ is the predicted value of the graph neural network model at time $t$, $P(t) = (P_t^{(1)}, P_t^{(2)}, \ldots, P_t^{(n)})^{T}$.

Eq. (19) gives the anomaly score of the telemetry time series data at each moment. Let the threshold value at moment $t$ be $\mathrm{threshold}(t)$. When the anomaly score of a moment exceeds the threshold value of that moment, the data is judged to be an anomaly; otherwise it is considered normal. The description formula is as follows:

$$\mathrm{lab}(t) = \begin{cases} 0, & \mathrm{Score}(t) \leqslant \mathrm{threshold}(t) \\ 1, & \mathrm{Score}(t) > \mathrm{threshold}(t) \end{cases} \tag{20}$$

In this paper, anomaly data are marked as 1, and normal data are marked as 0 by Eq. (20).

4.3. Anomaly detection method

The telemetry time series data anomaly detection method based on the dynamic period sliding time window uses wavelet variance analysis to obtain the satellite data period. The data at each moment are correlated with the data in the previous period, and the closer in time, the higher the correlation. Hence, we calculate the threshold value based on the exponential moving weighted average of the anomaly score of the previous period at each moment. The exponential weighted moving average assigns exponentially decreasing weights to the data at each moment in the previous period: the further away in time, the smaller the weight. It can reflect the trend of the time series very well. The calculation formula of the threshold at each moment is as follows:

$$V_t = \beta V_{t-1} + (1-\beta)\,\theta_t \tag{21}$$

$$U_t = V_t \cdot (1+\alpha) \tag{22}$$

Where $\theta_t$ denotes the anomaly score at time $t$, $\beta$ denotes the weighted rate of decline, the larger the value of $\beta$ the slower the decline, $V_t$


denotes the exponential weighted moving average at time $t$, $\alpha$ is the scale factor, and $U_t$ denotes the threshold at time $t$.
Eq. (21) shows the formula for calculating the exponential moving weighted average, generally initialized with $V_0 = 0$. In the usual formulation the weights are bounded by $1/e$, and weights less than $1/e$ are treated as 0. We instead use all data from the previous period at each moment to calculate the exponential weighted moving average, and the weights are not bounded. Eq. (22) is used to calculate the threshold value at each moment, and a part of the sample set is set aside as the validation set, which is used to tune the hyperparameter $\alpha$ until a better anomaly detection effect is achieved.
The steps of the dynamic threshold method based on the periodic time window are as follows: ① Wavelet variance analysis is used to obtain the period of the telemetry data. ② The trained graph neural network model is used to predict the sample values at each moment, and then the anomaly scores at each moment are calculated according to Eq. (19). ③ The threshold value of each moment is calculated according to Eqs. (21)–(22). The anomaly score of each moment is compared with the threshold value to determine whether the telemetry data is an outlier. The specific process is described in Algorithm 2.

Algorithm 2: Sliding period time window dynamic threshold method
Input: Satellite telemetry time-series test data set D2, data set size n, trained graph neural network model F(∙)
Output: predicted results Result
FOR i = 1:n DO
  compute the predicted value Ŷ = F(D2(i))
  compute the anomaly score Score(i) according to Eq. (19)
END FOR
FOR i = 1:n DO
  compute the moving weighted average V(i) of the previous period's anomaly scores at each moment
  compute the threshold value threshold(i) according to Eq. (22)
  IF Score(i) > threshold(i) THEN
    Result(i) = 1
  ELSE
    Result(i) = 0
  END IF
END FOR
return Result

Lines 1–4 are used to calculate the predicted value and anomaly score for each moment. Lines 5–13 are used to calculate the threshold value and compare it with the anomaly score to determine whether the data is an anomaly or not.

5. Experiments

In this model, we are the first to apply graph neural networks to the field of anomaly detection in satellite telemetry time series data. The programming environment in the experiment is the Windows 10 operating system with Python 3.6.1 and PyTorch 1.2.0. The experiment is performed using the satellite power subsystem's time-series data, and the data feature names are hidden. The data contains 57 features in total, expressed as $X = (x_1, x_2, \ldots, x_{57})$. The data set has 170,911 samples, of which 7588 are anomalies.

5.1. Evaluation indicators

Anomaly detection can be understood as a binary classification problem, divided into a normal class and an anomaly class. The anomaly class is marked as 1, and the normal class is marked as 0. Since the number of normal data in satellite telemetry data is much larger than the number of anomaly data, we cannot use only the accuracy as an evaluation indicator as in a general classification problem. This paper uses five evaluation indicators: recall, precision, accuracy, false alarm rate, and F1. Here TP is the number of true positive samples, FP is the number of false-positive samples, FN is the number of false negative samples, and TN is the number of true negative samples.

The recall is the proportion of true-positive samples to those that are actually positive; the calculation formula is as follows:

$$\mathrm{Recall} = \frac{TP}{TP+FN} \tag{23}$$

The precision rate is the proportion of true positive samples to those predicted to be positive; the calculation formula is as follows:

$$\mathrm{Precision} = \frac{TP}{TP+FP} \tag{24}$$

The accuracy is the proportion of true positive samples and true negative samples to all samples; the calculation formula is as follows:

$$\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN} \tag{25}$$

The false alarm rate is the proportion of false-positive samples to those predicted to be positive; the calculation formula is as follows:

$$\mathrm{Error} = \frac{FP}{TP+FP} \tag{26}$$

F1 is calculated from recall and precision and is a composite metric for anomaly detection.

$$F1 = \frac{2\,\mathrm{Recall}\cdot\mathrm{Precision}}{\mathrm{Recall}+\mathrm{Precision}} = \frac{2TP}{2TP+FP+FN} \tag{27}$$

Of the above five indicators, recall, precision, accuracy, and F1 are better when larger, whereas the false alarm rate is better when lower.

5.2. Data preprocessing

The data are collected by sensors in orbit, where strong radiation from outer space and a harsh environment with many interfering factors introduce a lot of noise into the data collected by the satellite sensors, and some data loss occurs in the raw data. Hence, the data need to be cleaned and denoised.
Wavelet threshold denoising has the characteristics of low entropy, multi-resolution properties, de-correlation, and diversity of wavelet basis selection and has the advantages of small computation and flexibility. Hence, we use wavelet threshold denoising to process telemetry time series data. The wavelet threshold denoising principle is to distinguish the real signal from the noise by the difference in amplitude of wavelet coefficients. The wavelet coefficients of the real signal have larger amplitudes than those of the noise, so the noise can be removed by setting a threshold. The wavelet threshold denoising steps are described below.
Let the noise-containing signal be $f(a,b)$ and the wavelet decomposition coefficient be $W_{f_{a,b}}(s,b)$; the noise variance is estimated as:

$$\hat{\sigma}_{n_{a,b}}(s,b) = \operatorname{median}\!\left(\left|W_{f_{a,b}}(s,b)\right|\right) / 0.647 \tag{28}$$

where $s$ denotes the number of layers of signal orthogonal redundant wavelet variations and $\hat{\sigma}_{n_{a,b}}$ denotes the noise variance in each direction for each decomposition layer.
Then the wavelet coefficient variance is:

$$\hat{\sigma}_{f_{a,b}}(s,b) = \frac{1}{N^{2}(b)}\sum_{a,b=1}^{N(b)} W_{f_{a,b}}^{2}(s,b) \tag{29}$$

$$\hat{\sigma}_{h_{a,b}}(s,b) = \sqrt{\max\!\left(\hat{\sigma}_{f_{a,b}}^{2}(s,b) - \hat{\sigma}_{n_{a,b}}^{2}(s,b),\ 0\right)} \tag{30}$$

Where $\hat{\sigma}_{h_{a,b}}(s,b)$ is the noise-corrected estimate derived from the wavelet coefficient variance, and since they belong to the Gaussian distribution, we can obtain $\hat{\sigma}_{f_{a,b}}(s,b)$.
Based on the above equation, a new threshold expression can be obtained as follows:


$$T(s,b) = \left(\sqrt{\log\!\left(\frac{L_i}{b}\right)}\Big/ 4.08\right)\frac{\hat{\sigma}_{n_{a,b}}^{2}(s,b)}{\hat{\sigma}_{h_{a,b}}^{2}(s,b)} \tag{31}$$

where $L_i$ denotes the wavelet coefficient length of the $i$th decomposition layer and $b$ is the number of layers of the wavelet decomposition.
The following equation calculates the wavelet coefficients:

$$\widetilde{W}_{f_{a,b}}(s,b) = \mathrm{SOFT}\!\left(W_{f_{a,b}}(s,b)\right) \tag{32}$$

where SOFT is the soft threshold processing function.
The wavelet coefficients $\widetilde{W}_{f_{a,b}}(s,b)$ are passed through the inverse wavelet transform to obtain the data after noise removal. The parameter settings for wavelet threshold denoising are shown in Table 1, where DaubechiesN is a compactly supported orthogonal wavelet. The larger N is, the better the smoothness after noise removal and the better the frequency band division. The soft thresholding function avoids the local jitter generated by the hard thresholding function after denoising, and the smoothness is better.

Table 1
Wavelet threshold denoising parameter setting.

Parameters            Setting
Wavelet function      Daubechies8
Threshold function    soft
Decomposition order   7

The 57 features of the original satellite telemetry time series data are denoted as $X' = (x'_1, x'_2, \ldots, x'_{57})$ after wavelet threshold denoising to remove the noise. The variation of three parameters from moment 0 to moment 3000, before and after wavelet threshold denoising, is shown in Fig. 9. It can be seen that the variation curve of the data after noise removal is smoother compared to the curve of the original data, indicating that the noise is removed.
Wavelet variance was used to analyze the period of the telemetry time series data of this experiment. Fig. 10 shows the wavelet variance plots of attributes $x_1$ and $x_2$.
It can be seen that the wavelet variance plots reach their highest points at 1449 and 1436, respectively. Considering that the interval of telemetry time-series data acquisition for this experiment's satellite power system is 1 min, the data period is taken as 1440 min, which is one day.

5.3. Comparison methods and model settings

To test the effectiveness of the GNN-DTAN model in this paper, we experimentally compare it with a fixed-threshold anomaly detection algorithm based on graph neural networks, two classical models, and three recent anomaly detection models.
fixed-threshold-GNN: The graph neural network's parameter settings are kept consistent with the algorithm model proposed in this paper.

Fig. 9. Comparison diagram of partial parameter wavelet denoising.
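The shrinkage step behind the denoised curves in Fig. 9 can be illustrated with a minimal sketch of the median-based noise estimate of Eq. (28) and the soft threshold function of Eq. (32). It assumes the wavelet coefficients have already been computed; the function names, the sample coefficients, and the threshold value 1.0 are hypothetical, and the wavelet decomposition and reconstruction themselves are not shown.

```python
import math
from statistics import median


def estimate_noise_sigma(coeffs, divisor=0.647):
    """Median-based noise estimate; the divisor is the constant printed in Eq. (28)."""
    return median(abs(c) for c in coeffs) / divisor


def soft_threshold(coeffs, threshold):
    """Soft thresholding: shrink every coefficient toward zero by `threshold`
    and set coefficients whose magnitude is below `threshold` to zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


# Hypothetical detail coefficients of one decomposition layer.
detail = [3.0, -0.5, -2.0, 0.2]
sigma = estimate_noise_sigma(detail)
shrunk = soft_threshold(detail, 1.0)
print(sigma, shrunk)
```

Unlike hard thresholding, which keeps surviving coefficients unchanged, the soft rule also reduces their magnitude, which is what produces the smoother reconstructed curve.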

Fig. 10. The diagram of wavelet variance.
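The period analysis that produces plots such as Fig. 10 can be sketched as follows, using the discrete wavelet transform of Eq. (15) and the wavelet variance of Eq. (17). This is an illustrative sketch only: the paper does not state which wavelet is used for the period analysis, so the real-valued Mexican hat wavelet, the synthetic sine signal, and the scale range are all assumptions.

```python
import math


def mexican_hat(t):
    """Real-valued Mexican hat wavelet (an assumed choice); it integrates to zero."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def wavelet_variance(signal, scales, dt=1.0):
    """Return {scale: Var(scale)} following the discrete forms of Eqs. (15) and (17)."""
    n = len(signal)
    variances = {}
    for scale in scales:
        # The wavelet is negligible beyond about five scale widths, so truncate the sum.
        half_width = int(5 * scale / dt)
        total = 0.0
        for j in range(n):  # translation index; the translation time is j * dt
            lo = max(0, j - half_width)
            hi = min(n, j + half_width + 1)
            coeff = sum(
                signal[k] * mexican_hat((k - j) * dt / scale) for k in range(lo, hi)
            )
            coeff *= dt / math.sqrt(abs(scale))  # |i|^(-1/2) * dt factor of Eq. (15)
            total += coeff * coeff
        variances[scale] = total / n  # average over translations, Eq. (17)
    return variances


# Synthetic signal with a known period of 24 samples (hypothetical demo data).
demo_signal = [math.sin(2.0 * math.pi * k / 24.0) for k in range(480)]
demo_variances = wavelet_variance(demo_signal, scales=range(2, 13))
peak_scale = max(demo_variances, key=demo_variances.get)
print("scale with the largest wavelet variance:", peak_scale)
```

With this particular wavelet and normalization, the peak scale is a fraction of the true period rather than the period itself, so a wavelet-dependent conversion would be needed before reading a period off such a plot.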


Table 2
GNN-DTAN model parameter.

Parameters                           Setting
Input Sequence Length                90
Output Sequence Length               1
Nodes Number                         57
Step size                            100
optimizer                            adam
dropout rate                         0.3
Dilation exponential                 3
Number of neighbor nodes per node    20
Learning rate                        0.001
Weight decay rate                    0.0001
Activation function                  ReLu

Table 3
Anomaly detection index results.

Method                Recall    Precision   Accuracy   False alarm   F1
fixed-threshold-GNN   70.19%    68.70%      96.82%     31.30%        69.44%
OC-SVM                68.66%    53.51%      95.33%     46.49%        60.10%
AMSD-KNN              65.33%    54.63%      95.44%     45.37%        59.50%
OmniAnomaly           75.23%    81.12%      97.83%     18.88%        78.06%
ADDICT                73.20%    74.32%      97.33%     25.68%        73.76%
LSTM-NDT              81.32%    79.47%      97.95%     20.53%        80.38%
GNN-DTAN              85.30%    82.31%      98.30%     17.69%        83.79%
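The five indicators reported in Table 3 follow Eqs. (23)–(27). A minimal sketch of their computation from confusion-matrix counts is given below; the function name and the example counts are hypothetical and are not taken from the experiment.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute the indicators of Eqs. (23)-(27) from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),                      # Eq. (23)
        "precision": tp / (tp + fp),                   # Eq. (24)
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (25)
        "false_alarm": fp / (tp + fp),                 # Eq. (26)
        "f1": 2 * tp / (2 * tp + fp + fn),             # Eq. (27)
    }


# Hypothetical counts for an imbalanced data set (illustration only).
metrics = detection_metrics(tp=80, fp=20, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because Eqs. (24) and (26) share the denominator TP + FP, the false alarm rate is the complement of the precision, which matches every row of Table 3.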
Determine the optimal threshold according to the ROC curve, and then detect anomaly data.
OC-SVM: Support vector machine is a classical machine learning classification algorithm. Ma et al. [32] proposed a one-class support vector machine algorithm, OC-SVM, for anomaly detection of time series; we apply this algorithm model to the anomaly detection task of the experimental data.
AMSD-KNN: Sarmadi et al. [33] proposed an unsupervised anomaly detection model with adaptive Mahalanobis-squared distance and k-nearest neighbor algorithm. The adaptive Mahalanobis-squared distance is used to measure the variability between data and classification, find the nearest neighbor classification by a multivariate normality hypothesis test, and then model the block maximum as a generalized extreme value distribution to determine the threshold.
OmniAnomaly: Su et al. [34] used stochastic variable connection and planar normalizing flow to learn the normal patterns of multivariate time series data. They combined a variational autoencoder and a gated recurrent unit to build a stochastic recurrent neural network model and then used it to reconstruct multivariate data. Finally, the anomalies are determined according to the reconstruction probability.
ADDICT: Pilastre et al. [35] used a telemetry time-series data anomaly detection method based on sparse representation and dictionary learning to capture the correlation between data features by processing all high-dimensional data features together.
LSTM-NDT: Hundman et al. [2] proposed an LSTM-based model for anomaly detection of time-series data, which uses a temporal recurrent neural network to predict the data and then detects whether the data are anomalous using a dynamic thresholding approach.
For this experiment, the proposed GNN-DTAN model contains 4 GC modules, 4 SDE modules, and 4 TDE modules. The parameter settings are shown in Table 2, and each input sequence is 90 min of telemetry time series data. The number of nodes represents the number of features of the data. The number of neighboring nodes per node means that each feature node in the GC module keeps the weights of its 20 closest neighboring nodes, and the rest are set to 0. The dilation rate refers to the number of intervals of the kernel. To ensure the validity and fairness of the experiment, the parameters of the compared algorithm models are adjusted as far as possible to obtain their best results on the telemetry time-series data set of this experiment. OmniAnomaly uses the adam optimizer. The learning rate is set to 0.001. The activation function and other parameters are kept the same as far as possible, and the experimental environment is kept consistent.

5.4. Experimental results

The processed data are input into the graph neural network model for training, and the first period of the test set data of the experiment is treated as normal data without detection. After the first period, the threshold value at each moment is calculated by Eqs. (21)–(22). Fig. 11 shows the comparison plots of anomaly scores and thresholds for each moment in two representative periods. Table 3 shows the evaluation index values for the anomaly detection effectiveness of all algorithm models, with the indicators calculated as shown in Eqs. (23)–(27).
The threshold value for each moment in the first period selected in Fig. 11 is higher than the anomaly score, so all data in that period are detected as normal. In the first half of the second period, each moment's threshold value is also higher than the anomaly score, so the detection result is normal there. In the second half of the period, the threshold value is less than the anomaly score, so the detection result is an anomaly. From the two figures, we can see that the anomaly scores at each moment fluctuate greatly and contain some peaks, so the dynamic threshold anomaly detection based on the period time window is more appropriate.
Comparing the indicators in Table 3 shows that the GNN-DTAN model proposed in this paper performs better than the other models. It is impossible to distinguish the effect of anomaly detection based on accuracy alone, so evaluating with the multiple indicators used in this paper is more reasonable.
Among the seven algorithm models, OC-SVM and AMSD-KNN, two traditional machine learning methods, are less accurate in anomaly detection than the other models, which indicates that the deep learning anomaly detection method has advantages for telemetry time series data with large data volume and high dimensions.

Fig. 11. Anomaly detection diagram.
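The comparison of anomaly scores and thresholds shown in Fig. 11 can be sketched as below, following Eqs. (20)–(22) and the second loop of Algorithm 2. This is a minimal sketch under stated assumptions: the window for moment t is taken to be the `period` scores immediately before t, the first period is labeled normal without detection, and the values of `beta`, `alpha`, and the demo scores are hypothetical, since the paper does not list them.

```python
def dynamic_threshold_labels(scores, period, beta=0.9, alpha=0.5):
    """Label each anomaly score with 0 (normal) or 1 (anomaly) using a
    threshold derived from the previous period's scores."""
    labels = []
    for t, score in enumerate(scores):
        if t < period:
            labels.append(0)  # first period: treated as normal, no detection
            continue
        v = 0.0  # V_0 = 0
        for theta in scores[t - period:t]:  # previous period, oldest score first
            v = beta * v + (1.0 - beta) * theta  # Eq. (21)
        threshold = v * (1.0 + alpha)  # Eq. (22)
        labels.append(1 if score > threshold else 0)  # Eq. (20)
    return labels


# Hypothetical anomaly scores with a single spike in the second period.
demo_scores = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 1.0]
demo_labels = dynamic_threshold_labels(demo_scores, period=4, beta=0.5, alpha=0.5)
print(demo_labels)
```

The moving average is recomputed from scratch for every moment to keep the sketch close to the description; for a period of 1440 samples an incremental update would be cheaper.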


6. Conclusion

We proposed an anomaly detection model GNN-DTAN based on graph neural network and dynamic thresholding of periodic time windows for high-dimensional telemetry time-series data. A novel method calculates the relationship between feature nodes to transform high-dimensional time-series data into graph data. Considering the periodic characteristics of satellite rotation around the Earth, the dynamic threshold is also calculated using the periodic sliding time window and the exponential moving weighted average, which improves the detection effect of telemetry time series data anomaly.
Future research directions are as follows: (1) Consider different dynamic threshold setting methods to improve anomaly detection effectiveness. (2) Extending our model to other temporal data for anomaly detection to increase the generalizability of the model. (3) How to further reduce the time cost and achieve anomaly detection for high-dimensional spatial time series data.

CRediT authorship contribution statement

Lingqiang Xie: Methodology, Software, Writing - original draft, Visualization. Dechang Pi: Conceptualization, Writing - review & editing, Resources, Funding acquisition. Xiangyan Zhang: Formal analysis. Junfu Chen: Investigation. Yi Luo: Validation. Wen Yu: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] R. Tu, R. Zhang, L. Fan, J. Han, P. Zhang, W. Xing-xing, J. Hong, J. Liu, X. Lu, Real-time monitoring of the dynamic variation of satellite orbital maneuvers based on BDS observations, Measurement 168 (2021), 108331, https://doi.org/10.1016/j.measurement.2020.108331.
[2] Hundman, K., Constantinou, V., Laporte, C., Colwell, I., & Söderström, T. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding[C] //Proceedings of the 24th ACM SIGKDD International Conference on knowledge discovery & data mining. KDD, (2018) 387-395.
[3] H. Jiang, K. Zhang, J. Wang, X. Wang, P. Huang, Anomaly Detection and Identification in Satellite Telemetry Data Based on Pseudo-Period[J], Applied Sciences 10 (1) (2019) 103.
[4] J. Pang, D. Liu, Y. Peng, X. Peng, Anomaly Detection for Satellite Telemetry Series with Prediction Interval Optimization[C], //2018 International Conference on Sensing, Diagnostics, Prognostics, and Control. SDPC (2018) 408–414.
[5] A. Valmorbida, M. Mazzucato, M. Pertile, Calibration procedures of a vision-based system for relative motion estimation between satellites flying in proximity, Measurement 151 (2019), 107161, https://doi.org/10.1016/j.measurement.2019.107161.
[6] C. Sun, S. Chen, E. Mingzhang, Y. Du, C. Ruan, Satellite Micro Anomaly Detection Based on Telemetry Dat.[C], in: 2020 IEEE 9th Data Driven Control and Learning Systems Conference (DDCLS), 2020, pp. 140–144.
[7] Chen, Y., Wang, K. Prediction of Satellite Time Series Data Based on Long Short Term Memory-Autoregressive Integrated Moving Average Model (LSTM-ARIMA)[C] //2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP). IEEE, (2019) 308-312.
[8] Zhang, L., Yu, J., Tang, D., Han, D., Tian, L., & Dai, J. Anomaly Detection for Spacecraft using Hierarchical Agglomerative Clustering based on Maximal Information Coefficient[C] //2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA). IEEE, (2020) 1848-1853.
[9] L. Li, R. Hansman, R. Palacios, R. Welsch, Anomaly detection via a Gaussian Mixture Model for flight operation and safety monitoring[J], Transportation Research Part C: Emerging Technologies 64 (2016) 45–57.
[10] S. Benkabou, K. Benabdeslem, B. Canitia, Unsupervised outlier detection for time series by entropy and dynamic time warping[J], Knowledge and Information Systems 54 (2) (2018) 463–486.
[11] C. Dai, D. Pi, S. Becker, J. Wu, L. Cui, B. Johnson, CenEEGs: Valid EEG Selection for Classification[J], ACM Transactions on knowledge discover from data 14 (2) (2020) 1–25.
[12] Das, S., Matthews, B., Srivastava, A., & Oza, N. Multiple Kernel Learning for Heterogeneous Anomaly Detection: Algorithm and Aviation Safety Case Study[C] //Acm Sigkdd International Conference on Knowledge Discovery & Data Mining. ACM, (2010) 47-56.
[13] V.M. Janakiraman, B. Matthews, N. Oza, Finding Precursors to Anomalous Drop in Airspeed During a Flight’s Takeoff[C], //Acm Sigkdd International Conference. ACM (2017) 1843–1852.
[14] Ishimtsev, V., Bernstein, A., Burnaev, E., & Nazarov, I. Conformal k-NN Anomaly Detector for Univariate Data Streams[C] //In Proceedings of the Sixth Workshop on Conformal and Probabilistic Prediction and Applications, (2017) 213–227.
[15] Breunig, M., Kriegel, H., Ng, R., & Sander, J. LOF: Identifying Density-Based Local Outliers[C] //Acm Sigmod International Conference on Management of Data. ACM, (2000) 93-104.
[16] J. Schumann, K.Y. Rozier, T. Reinbacher, O. Mengshoel, T. Mbaya, C. Ippolito, Towards Real-time, On-board, Hardware-supported Sensor and Software Health Management for Unmanned Aerial Systems[J], International Journal of Prognostics and Health Management 6 (21) (2015) 1–27.
[17] E. Khalastchi, M. Kalech, G. Kaminka, R. Lin, Online data-driven anomaly detection in autonomous robots[J], Knowledge and Information Systems 43 (3) (2015) 657–688.
[18] I. Melnyk, A. Banerjee, B. Matthews, N. Oza, Vector Autoregressive Model-Based Anomaly Detection in Aviation Systems[J], Journal of Aerospace Information Systems 13 (4) (2016) 1–13.
[19] H.N. Akouemo, R.J. Povinelli, Probabilistic anomaly detection in natural gas time series data[J], International Journal of Forecasting 32 (3) (2016) 948–956.
[20] S. Abdelghafar, A. Darwish, A. Hassanien, M. Yahia, A. Zaghrout, Anomaly detection of satellite telemetry based on optimized extreme learning machine[J], Journal of space safety engineering 6 (4) (2019) 291–298.
[21] Kieu, T., Yang, B., & Jensen, C.S. Outlier Detection for Multidimensional Time Series Using Deep Neural Networks[C] //2018 19th IEEE International Conference on Mobile Data Management (MDM). IEEE, (2018) 125-134.
[22] X. Li, T. Zhang, Y. Liu, Detection of Voltage Anomalies in Spacecraft Storage Batteries Based on a Deep Belief Network[J], Sensors (Basel, Switzerland) 19 (21) (2019) 4702.
[23] C. Junfu, P. De-chang, W. Zhiyuan, Z. Xiaodong, Y. Pan, Q. Zhang, Imbalanced satellite telemetry data anomaly detection model based on Bayesian LSTM[J], Acta Astronautica. 180 (2021) 232–242.
[24] D. Pan, Z. Song, L. Nie, B. Wang, Satellite Telemetry Data Anomaly Detection Using Bi-LSTM Prediction Based Model[C] //2020 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), IEEE (2020) 192–196.
[25] F. Gama, E. Isufi, G. Leus, A. Ribeiro, Graphs, Convolutions, and Neural Networks: From Graph Filters to Graph Neural Networks[J], IEEE Signal Processing Magazine 37 (6) (2020) 128–138.
[26] Isufi, E., Gama, F., & Ribeiro, A. EdgeNets: Edge Varying Graph Neural Networks, ArXiv Prepr. ArXiv2001.07620, 2020.
[27] E. Isufi, G. Mazzola, Graph-Time Convolutional Neural Networks, ArXiv Prepr. ArXiv2003.01730 (2021).
[28] L. Ruiz, F. Gama, A. Ribeiro, Gated Graph Recurrent Neural Networks[J], IEEE Transactions on Signal Processing 68 (2020) 6303–6318.
[29] Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W.L., & Leskovec, J. Graph Convolutional Neural Networks for Web-Scale Recommender Systems[C] //Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (2018).
[30] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, P.S. Yu, A comprehensive survey on graph neural networks[J], IEEE Transaction on Neural Networks and Learning Systems 32 (1) (2020) 1–21.
[31] F., & Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions[C] // 4th International Conference on Learning Representations. ICLR, (2016).
[32] Ma, J., & Perkins, S. Time-series novelty detection using one-class support vector machines[C] //Neural Networks, 2003. Proceedings of the International Joint Conference on. IEEE, (2003) 1741-1745.
[33] H. Sarmadi, A. Karamodin, A novel anomaly detection method based on adaptive Mahalanobis-squared distance and one-class kNN rule for structural health monitoring under environmental effects, Mechanical systems and signal processing 140 (2020), 106495.
[34] Y. Su, Y. Zhao, C. Niu, R. Liu, W. Sun, D. Pei, Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network[C], in: //the 25th ACM SIGKDD International Conference, 2019, pp. 2828–2837.
[35] Pilastre, B., Boussouf, L., D’escrivan, S., & Tourneret, J. Anomaly Detection in Mixed Telemetry Data Using a Sparse Representation and Dictionary Learning[J]. Signal Processing, 168 (2019) 107320.
