
RESEARCH ARTICLE
10.1029/2023SW003472

Synthesis-Style Auto-Correlation-Based Transformer: A Learner on Ionospheric TEC Series Forecasting

Yuhuan Yuan1, Guozhen Xia1, Xinmiao Zhang1, and Chen Zhou1

1Department of Space Physics, Wuhan University, Wuhan, China

Key Points:
• Total electron content (TEC) data augmentation: synthesize TEC samples by feeding the selected original TEC map data set into a variational auto-encoder model
• Auto-correlation-based transformer and Transformer models are trained using the imitation samples without any further fine-tuning
• The accuracy of the predictive auto-correlation-based transformer models is improved through data augmentation

Correspondence to:
X. Zhang and C. Zhou, zxmwhu@whu.edu.cn; chenzhou@whu.edu.cn

Citation:
Yuan, Y., Xia, G., Zhang, X., & Zhou, C. (2023). Synthesis-style auto-correlation-based transformer: A learner on ionospheric TEC series forecasting. Space Weather, 21, e2023SW003472. https://doi.org/10.1029/2023SW003472

Received 2 MAR 2023; Accepted 5 OCT 2023

Author Contributions:
Data curation: Yuhuan Yuan
Funding acquisition: Chen Zhou
Investigation: Yuhuan Yuan
Methodology: Yuhuan Yuan
Resources: Chen Zhou
Software: Yuhuan Yuan, Guozhen Xia
Supervision: Chen Zhou
Validation: Yuhuan Yuan
Visualization: Yuhuan Yuan
Writing – original draft: Yuhuan Yuan
Writing – review & editing: Yuhuan Yuan, Guozhen Xia, Xinmiao Zhang

© 2023 The Authors. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

Abstract  Accurate 1-day global total electron content (TEC) forecasting is essential for ionospheric monitoring and satellite communications. However, it faces challenges due to limited data and the difficulty of modeling long-term dependencies. This study develops a highly accurate model for 1-day global TEC forecasting. We utilized generative TEC data augmentation based on the International GNSS Service (IGS) data set from 1998 to 2017 to enhance the model's prediction ability. Our model takes the TEC sequence of the previous 2 days as input and predicts the global TEC value at each hourly step of the next day. We propose a two-step framework: (a) a time series generative model that produces realistic synthetic TEC data for training, and (b) an auto-correlation-based transformer model designed to capture long-range dependencies in the TEC sequence. We compared the performance of our model with the 1-day predicted ionospheric products provided by both the Center for Orbit Determination in Europe (C1PG) and Beihang University (B1PG). Experiments demonstrate that our model significantly improves 1-day forecast accuracy over prior approaches. On the 2018 benchmark data set, the global root mean squared error (RMSE) of our model is reduced to 1.17 TEC units (TECU), while the RMSE of the C1PG model is 2.07 TECU. Reliability is higher at middle and high latitudes but lower at low latitudes (RMSE above 2.5 TECU), indicating room for improvement. This study highlights the potential of using data augmentation and auto-correlation-based transformer models trained on synthetic data to achieve high-quality 1-day global TEC forecasting.

Plain Language Summary  The ionosphere is prone to complex fluctuations that are difficult to predict. To enhance the 1-day global total electron content (TEC) prediction ability, we first augmented the limited International GNSS Service TEC data set by generating synthetic data sets. These synthetic data sets can better capture the intricate patterns in TEC fluctuations. We then built an auto-correlation-based transformer to model the temporal dependencies in TEC by presenting series-wise connections between time steps. Experimental results demonstrate that our proposed model is highly effective in predicting global TEC compared with the Transformer model, the prior C1PG model, and the B1PG model.

1. Introduction

Ionospheric total electron content (TEC) is significant for Global Navigation Satellite System (GNSS) and GPS signal propagation and applications. One TEC unit (TECU) corresponds to a 0.163 m range delay at the L1 frequency (1.57542 GHz) (Saito & Yoshihara, 2017). Radio waves propagating in the ionosphere experience a delay in group velocity and an advance in phase velocity due to the electrons in the ionosphere. The ionospheric delay is proportional to the ionospheric TEC along the propagation path. Moreover, because the ranging error at the GPS L1 frequency scales with TEC, even small errors in predicted TEC translate into ranging errors, compounding the challenges of accurate TEC forecasting for GPS applications. Global Ionosphere Maps (GIMs) are generated on a daily basis by using data from more than 600 GNSS sites of the International GNSS Service (IGS) and other institutions. The Differential Code Bias (DCB) can be estimated and eliminated by the fitting process (Q. Liu et al., 2021); however, for actual data, DCBs cannot always be calculated properly, and hence the error can increase significantly. Industrial applications that rely on good modeling and prediction of TEC include satellite navigation (Ratnam et al., 2018), precise point positioning (Prol et al., 2018; Z. Li et al., 2019), and time-frequency transmission (Béniguel & Hamel, 2011). For these reasons, although modeling long-term dependencies in TEC is hard, researchers in different communities, that is, space physics and remote sensing, have proposed various approaches to TEC forecasting (Feng et al., 2019; Kaselimi et al., 2020; Nath et al., 2023; Zhang et al., 2023; Zhao et al., 2019).


Recently, there have been mainly two directions of work on forecasting global TEC maps using learning-based methods. One direction follows a pipeline that first predicts the spherical harmonic (SH) coefficients and then expands them into complete TEC maps. For example, C. Wang et al. (2018) proposed an adaptive autoregressive model to predict the SH coefficients used in TEC map fitting, while Iyer and Mahajan (2023) used both linear and polynomial autoregression coefficients of recent past data to forecast TEC over equatorial regions. L. Liu et al. (2022) adopted a long short-term memory (LSTM) network to forecast the SH coefficients and then predict the TEC maps.

Another stream of work forecasts a sequence of global TEC maps from past TEC maps without introducing any prior information. Monte-Moreno et al. (2022) used a nearest-neighbor algorithm to search a historical database for the dates whose maps are closest to the current map and used the maps in the database as the prediction. L. Liu et al. (2020) adopted a convolutional neural network to extract features from past TEC maps and then predicted the future TEC maps based on the extracted features. Q. Li et al. (2022), Chen et al. (2019), and K. Yang and Liu (2022) proposed generative adversarial networks for TEC forecasting, composed of a generator that produces maps indistinguishable from real TEC maps and a discriminator that tries to distinguish the generated maps from real ones. This deep learning method can generate satisfactory ionospheric peak structures at different times and geomagnetic conditions and can be used to predict the regional TEC over China 2 hr in advance (Q. Li et al., 2022). H. Wang et al. (2022) and X. Lin et al. (2022) adopted spatiotemporal network models for forecasting TEC maps. These models were used to correct ionospheric delay and improve satellite navigation positioning accuracy, as well as to forecast global TEC 24 hr in advance (Cesaroni et al., 2020). LSTM can also serve as an end-to-end model for TEC forecasting (Cherrier et al., 2017; Xia, Zhang, et al., 2022). These studies show that near real-time TEC maps can be provided within 5 min after observation. As in Mendoza et al. (2019), such timely TEC maps can be used to estimate the GPS signal delay caused by ionospheric electron content between a receiver and a GPS satellite. M. Lin et al. (2022) used the self-attention mechanism of the transformer structure to capture the long-term characteristics of TEC over China.

However, despite flourishing progress, applying deep models to TEC forecasting still faces some challenges from a data perspective. Firstly, training very deep models like the transformer architecture (Vaswani et al., 2017) requires large-scale data sets; insufficient training data can lead to overfitting and degraded performance on out-of-distribution test samples. Secondly, the variational auto-encoder (VAE), a common tool for anomaly detection (Desai et al., 2021; Ha & Schmidhuber, 2018), is well suited to synthesizing exceptional cases: VAEs can help create data sets containing outliers or changepoints, which are necessary for robust TEC forecasting.

Existing models for TEC mapping forecasting exhibit limitations in effectively capturing the long-term spatio-
temporal dependencies in the data. RNN and LSTM-based approaches often struggle with vanishing and explod-
ing gradients when modeling long input sequences (Ruwali et al., 2020; L. Liu et al., 2022). This hinders their
ability to model the complex temporal relationships in TEC maps. Although LSTMs can learn dependencies
between memory units, the quadratic growth in parameters with memory size leads to prohibitively high compute
costs for the multi-step forecasting task. More recent transformer models adopt self-attention to mitigate the
challenges of modeling long-term dependencies (Xia, Liu, et al., 2022). However, relying solely on point-wise
self-attention disregards local temporal relationships, leading to an information bottleneck. By only calculating
relationships between individual points, transformers fail to fully capture the sequential dependencies inherent in
TEC map time series data. These limitations motivate the need to develop alternative models that can efficiently
learn both global long-range and local short-term spatiotemporal dependencies for accurate TEC forecasting.
Therefore, we aimed to address the following questions: (a) Can we design a generative module to synthesize inexhaustible high-quality samples that closely match the real data distribution? (b) Can we design a prediction model adept at capturing both long-term dependencies and temporal dynamics for long-term TEC forecasting? (c) Can a model trained on synthetic TEC data outperform overfitted deep models trained on limited real data?

In our work, we propose a novel two-step approach for forecasting TEC maps that utilizes a generative model and an auto-correlation-based transformer network as the prediction model (Desai et al., 2021; Vaswani et al., 2017). In the first step, the VAE model captures both the feature distribution and the temporal relationships in the IGS TEC data released by the IGS product center to generate the imitation samples. In the second step, we used the Auto-correlation-based transformer network as the prediction model to forecast 1-day global TEC data. The Auto-correlation-based transformer decomposes the time series into a trend-cyclical part and a seasonal part to capture complex temporal patterns in long-context forecasting.


Figure 1. Pipeline. We introduce a two-step synthesis and auto-correlation method for forecasting total electron content (TEC) maps. The generation model TimeVAE takes the selected original International GNSS Service (IGS) TEC maps as input and captures both the feature distributions and the temporal relationships to synthesize the generated data set. The prediction model is an Auto-correlation-based transformer that decomposes the series to learn complex temporal patterns in long-context forecasting. The model demonstrates robust performance by outperforming overfitted deep models on the IGS test sets.

The Auto-correlation-based transformer shows its robustness by outperforming other deep-learning models that suffer from overfitting. We summarize our contributions as follows: (a) We generate synthetic TEC data to enable training of 1-day global TEC forecasting models. (b) With the auto-correlation transformer, our approach captures the complex temporal patterns in the TEC map data sets, leading to more accurate 1-day global TEC forecasting results. (c) We quantitatively demonstrate the efficacy of synthetic TEC data by training the auto-correlation transformer on generated samples; this improves model robustness and reduces overfitting compared to models trained only on original data.

In summary, this two-step approach utilizing data augmentation and an auto-correlation transformer provides
better performance for the challenging task of global scale 1-day TEC forecasting.

The paper is organized as follows. Section 2 provides concrete details on the proposed methods. Section 3.1
describes the data sources and preprocessing methods. The experimental setup, results, and analysis are presented
in Section 4. Finally, Section 5 summarizes the conclusions, discussions, and future research directions.

2. Methods
Generative augmentation is a vital technique in machine learning for expanding and diversifying training data, which ultimately enhances model generalization (Calimeri et al., 2017; Sandfort et al., 2019; Shao et al., 2019). Y. Yang et al. (2020) described a framework for generative data augmentation for commonsense reasoning that generates synthetic training data while preserving quality and diversity. Additionally, synthetic data has been shown to significantly improve image classification performance (He et al., 2022). Inspired by these works, we propose a method to train the forecasting model using synthetic data. Our deep learning method mainly includes two steps. First, we synthesized samples by feeding the selected IGS TEC maps into a VAE model (Desai et al., 2021). Second, we trained the auto-correlation-based transformer using the synthetic samples. Moreover, the empirical reference Center for Orbit Determination in Europe 1-day forecast product (C1PG) and the adaptive autoregressive 1-day forecast model of Beihang University (B1PG) developed by C. Wang et al. (2018) were chosen as comparison models. In this section, we describe the architectures of our generation model and prediction model, as well as their training processes. Based on these designs, we implemented the pipeline shown in Figure 1.

Compared to RNN and LSTM models, the Transformer and Auto-correlation-based transformer models have a per-layer computational complexity of O(n²·d), where n is the sequence length and d is the dimensionality; self-attention layers are faster than recurrent layers when n is smaller than d (Vaswani et al., 2017). Thus, we utilized these models instead of RNN and LSTM models to achieve a lower computational complexity. While the original Transformer relies solely on point-wise self-attention, the architecture of the Auto-correlation-based transformer model incorporates auto-correlation to capture dependencies between subsequences. This mechanism provides vital local temporal context that is lacking in standard self-attention. By modeling relationships both within and across subsequences, the Auto-correlation-based transformer achieves more effective information utilization for TEC sequence data.

Considering sample efficiency, which we regarded as an important factor, we first generated the same amount of synthetic data as the original data. Additionally, we implemented further data augmentation by generating the original data twice. As GNSS stations are sparsely distributed in oceanic areas (W. Li et al., 2019), the availability of GPS sites is limited and sparse, which poses a challenge for accurately predicting TEC. However, data augmentation techniques have shown promise in addressing this challenge by enhancing diversity. The TimeVAE generative modeling approach provides the core functionality to synthesize realistic augmented training data, enabling critical performance gains in the downstream forecasting model. By generating the original data twice from existing examples, data augmentation creates a larger, more robust training set for the prediction model to learn from, which helps improve the performance of the prediction model.

Figure 2. The architecture of TimeVAE.

2.1. Generative Model: TimeVAE

By synthesizing highly realistic training data, the TimeVAE generative modeling approach plays a pivotal role in the substantial performance gains of the downstream forecasting model, improving the generalization performance of the prediction model.
2.1.1. TimeVAE Training Data Set
We considered the hourly TEC maps to be independent and identically distributed (i.i.d.) samples. The inputs consist of N i.i.d. samples, where N denotes the total number of hours in the TEC data set. The spatial longitude ranges from 180° west to 180° east with a resolution of 5°, and the latitude ranges from 87.5° north to 87.5° south with a resolution of 2.5°. As a result, the global TEC map grid consists of 71 × 73 points, with 71 and 73 representing the latitude and longitude dimensions of the TEC map at each hour, respectively, corresponding to different geographical locations. The structure of the generation model is shown in Figure 2, where the input data set, represented as (N, 71, 73), is a 3-dimensional array with N the total number of samples. In this generation module, the input consists of IGS TEC map data from 1998 to 2017, and the goal is to obtain an output that closely matches the probability distribution of the original input data set. Each time we input a data set of size (N, 71, 73), where N depends on the total number of hourly input IGS TEC maps from 1998 to 2017, we generate an output data set of the same dimensions (N, 71, 73).
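For illustration, a minimal sketch of this input layout is given below, assuming the hourly IGS TEC maps have already been parsed into 71 × 73 arrays (the hypothetical `hourly_maps` list stands in for the real parsed maps):

```python
# A minimal sketch of the (N, 71, 73) input layout described above. The
# hypothetical variable `hourly_maps` stands in for the real parsed IGS maps;
# the paper's own IONEX parsing is not shown here.
import numpy as np

hourly_maps = [np.random.rand(71, 73) for _ in range(24)]  # stand-ins for real maps

# Stack N hourly maps into the (N, 71, 73) array fed to TimeVAE.
dataset = np.stack(hourly_maps, axis=0)
assert dataset.shape == (len(hourly_maps), 71, 73)
```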
2.1.2. TimeVAE Architecture

To adapt the generation model to the synthesis of ionospheric TEC maps, we adopted an encoder-decoder VAE model. The encoder extracts features from the input, a 3-dimensional array of size N × T × D, with N the batch size, T the number of time steps, and D the number of feature dimensions, and maps them into a multivariate Gaussian distribution by passing the inputs through a series of convolutional layers with ReLU activation and a fully connected linear layer. The encoder outputs the parameters of the multivariate Gaussian, which are used to sample the latent vector z via the reparameterization trick. Taking the latent state vector z sampled from the multivariate Gaussian, the decoder first passes it through a fully connected linear layer, then reshapes the data into a 3-dimensional array and passes it through a series of transposed convolutional layers with ReLU activation. Finally, the data is passed through a time-distributed fully connected layer to produce the final output, which has the same shape as the original TEC map signal. The goal of the decoder is to generate TEC maps that are as similar as possible to the original TEC maps, based on the information encoded in the latent vector z.
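A minimal PyTorch sketch of this encoder-decoder structure follows; the layer counts, channel widths, and latent size are illustrative assumptions rather than the authors' exact configuration:

```python
# A sketch of the encoder-decoder VAE described above, under assumed layer
# sizes. Input shape follows the text: (N, T, D) with T time steps and D
# features per step.
import torch
import torch.nn as nn

class TimeVAESketch(nn.Module):
    def __init__(self, t_steps=24, d_feat=73, latent_dim=8, hidden=64):
        super().__init__()
        # Encoder: 1-D convolutions over time, then a linear layer that
        # outputs the mean and log-variance of the multivariate Gaussian.
        self.encoder = nn.Sequential(
            nn.Conv1d(d_feat, hidden, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        enc_len = t_steps // 4                      # length after two stride-2 convs
        self.to_stats = nn.Linear(hidden * enc_len, 2 * latent_dim)
        # Decoder: linear layer, reshape, transposed convolutions, and a final
        # time-distributed linear projection back to the original feature size.
        self.from_latent = nn.Linear(latent_dim, hidden * enc_len)
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(hidden, hidden, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(hidden, hidden, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, d_feat)       # applied per time step
        self.enc_len, self.hidden = enc_len, hidden

    def forward(self, x):                           # x: (N, T, D)
        h = self.encoder(x.transpose(1, 2))         # convolve over the time axis
        mu, logvar = self.to_stats(h).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        g = self.from_latent(z).view(-1, self.hidden, self.enc_len)
        g = self.decoder(g).transpose(1, 2)         # (N, T, hidden)
        return self.head(g), mu, logvar             # reconstruction is (N, T, D)

x = torch.randn(4, 24, 73)
recon, mu, logvar = TimeVAESketch()(x)
assert recon.shape == x.shape
```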
2.1.3. TimeVAE Loss Function

We trained TimeVAE using the Evidence Lower Bound (ELBO) loss function, which is written as follows:

$$\mathrm{ELBO} = \mathbb{E}_{q(\mathbf{z}\mid\mathbf{x};\phi)}\left[\log p(\mathbf{x}\mid\mathbf{z};\theta)\right] - D_{KL}\left(q(\mathbf{z}\mid\mathbf{x};\phi)\,\|\,p(\mathbf{z};\theta)\right) \quad (1)$$

The ELBO objective reconstructs x given z sampled from q(z∣x; ϕ). Specifically, the right-hand side is composed of two parts: the first term is the log-likelihood of our data given z sampled from q(z∣x; ϕ), and the second term is the KL-divergence between the encoded latent space distribution q(z∣x; ϕ) and the prior distribution p(z; θ).
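As a sketch, Equation 1 can be realized as a training objective as follows, assuming a diagonal-Gaussian posterior (so the KL term has its usual closed form) and a mean-squared-error reconstruction term standing in for the log-likelihood; the paper does not spell out its exact likelihood choice:

```python
# A sketch of the (negative) ELBO objective in Equation 1, under the stated
# diagonal-Gaussian assumption.
import torch
import torch.nn.functional as F

def elbo_loss(x, recon, mu, logvar):
    # Reconstruction term: negative log-likelihood (up to constants) for a
    # Gaussian decoder, realized as a summed squared error.
    recon_nll = F.mse_loss(recon, x, reduction="sum")
    # KL( q(z|x) || N(0, I) ) in closed form for a diagonal Gaussian.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # Training minimizes the negative ELBO: reconstruction error plus KL.
    return recon_nll + kl
```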

2.2. Transformer Architecture

The Transformer model consists of an encoder and a decoder. The encoder takes an input sequence and gener-
ates a sequence of hidden states, while the decoder takes the encoder output and generates a sequence of output
tokens. Both the encoder and decoder consist of multiple layers of self-attention and feedforward neural networks.
The self-attention mechanism allows the model to attend to different parts of the input sequence, while the feed-
forward neural networks enable the model to capture complex patterns in the data. To build the Transformer
model, we directly quote the model-building method mentioned in the article (Vaswani et al., 2017). Specifically,
we utilized the IGS TEC maps from 48 hr earlier as the input to generate a prediction for the subsequent 24 hourly
IGS TEC maps. Hence, the time dimensions of the input and output for the Transformer model are (48, 71, 73)
and (24, 71, 73) respectively. We trained our forecasting Transformer models using the Mean Absolute Error loss
function, and the Adam optimization algorithm is employed in the training process of our deep learning models.
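One plausible wiring of this setup is sketched below. The per-hour flattening and linear embedding of each 71 × 73 map into d_model are our assumptions, since the paper does not detail its map embedding, and positional encoding (Section 2.2.4) is omitted for brevity:

```python
# A sketch of the encoder-decoder Transformer for this task under the assumed
# per-hour flattening; 48 input hours, 24 predicted hours, as in the text.
import torch
import torch.nn as nn

class TECTransformerSketch(nn.Module):
    def __init__(self, grid=71 * 73, d_model=512):
        super().__init__()
        self.embed = nn.Linear(grid, d_model)        # per-hour map -> one token
        self.core = nn.Transformer(d_model=d_model, nhead=8,
                                   num_encoder_layers=6, num_decoder_layers=6,
                                   batch_first=True)
        self.head = nn.Linear(d_model, grid)         # token -> per-hour map

    def forward(self, src_maps, tgt_maps):
        # src_maps: (B, 48, 71, 73); tgt_maps: (B, 24, 71, 73), the decoder
        # input (e.g., shifted targets during teacher-forced training).
        b = src_maps.size(0)
        src = self.embed(src_maps.view(b, 48, -1))
        tgt = self.embed(tgt_maps.view(b, 24, -1))
        # Causal mask so each output hour attends only to earlier hours.
        mask = self.core.generate_square_subsequent_mask(24)
        out = self.core(src, tgt, tgt_mask=mask)
        return self.head(out).view(b, 24, 71, 73)

model = TECTransformerSketch()
pred = model(torch.randn(2, 48, 71, 73), torch.randn(2, 24, 71, 73))
assert pred.shape == (2, 24, 71, 73)
```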
2.2.1. Encoder
The encoder is composed of six identical layers. Each layer consists of two sub-layers: the first is a multi-head self-attention (MSA) mechanism, and the second is a simple, position-wise fully connected feed-forward network. A residual connection is applied around each of the two sub-layers, followed by layer normalization.

2.2.2. Decoder
The decoder is also composed of six identical layers, each consisting of three sub-layers. The first sub-layer is an MSA mechanism, the second performs multi-head attention over the output of the encoder stack, and the third is a simple fully connected feed-forward network. As in the encoder, residual connections are applied around each of the sub-layers, followed by layer normalization. To prevent positions from attending to subsequent positions, the decoder stack modifies the self-attention sub-layer into a masked multi-head self-attention mechanism.

2.2.3. Multi-Headed Self-Attention (MSA)


MSA allows the model to jointly attend to information from different representation subspaces at different positions; with a single attention head, averaging inhibits this capability. The basic structure is the same as self-attention, but MSA is divided into multiple heads, and the self-attention calculation is performed in parallel. The output vectors from the different heads are then concatenated, which allows different heads to learn different levels of knowledge.

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^O \quad (2)$$

where $\mathrm{head}_i = \mathrm{Attention}\left(QW_i^Q, KW_i^K, VW_i^V\right)$, and the projections are parameter matrices $W_i^Q \in \mathbb{R}^{d_{\mathrm{model}} \times d_k}$, $W_i^K \in \mathbb{R}^{d_{\mathrm{model}} \times d_k}$, $W_i^V \in \mathbb{R}^{d_{\mathrm{model}} \times d_v}$, and $W^O \in \mathbb{R}^{hd_v \times d_{\mathrm{model}}}$, with $d_{\mathrm{model}} = 512$, $d_v = d_k = 64$, and $h = 8$.

Figure 3. The architecture of the auto-correlation-based transformer, with N = 6 and M = 6.

2.2.4. Positional Encoding

This part injects information about the relative or absolute positions of the tokens in the TEC sequence. To this end, positional encodings are added at the bottoms of the encoder and decoder stacks. The positional encodings have the same dimension d_model as the embeddings, so that the two can be summed:

$$PE_{(pos,\,2i)} = \sin\left(pos / 10000^{2i/d_{\mathrm{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(pos / 10000^{2i/d_{\mathrm{model}}}\right) \quad (3)$$

Here, pos is the position and i is the dimension; that is, each dimension of the positional encoding corresponds to a sinusoid.
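For concreteness, a short sketch computing the sinusoidal encoding of Equation 3:

```python
# A direct implementation of the sinusoidal positional encoding in Equation 3.
import torch

def positional_encoding(max_len, d_model=512):
    pos = torch.arange(max_len).unsqueeze(1).float()   # (max_len, 1)
    dims = torch.arange(0, d_model, 2).float()         # even dimensions 2i
    angle = pos / torch.pow(10000.0, dims / d_model)   # pos / 10000^(2i/d_model)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(angle)                     # even dims: sine
    pe[:, 1::2] = torch.cos(angle)                     # odd dims: cosine
    return pe                                          # added to the embeddings

pe = positional_encoding(48)   # one encoding vector per input hour
```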

2.3. Auto-Correlation Based Transformer Architecture

Modeling long-term time series forecasting for IGS TEC maps is not easy: it requires handling intricate temporal patterns. The original transformer architecture (Vaswani et al., 2017) adopts self-attention modules to calculate the correlations between scattered points, but it ignores the dependencies among sub-series. In contrast, our approach leverages an auto-correlation-based transformer (Wu et al., 2021) as the prediction model, which enables series-wise connections to model dependencies between sub-series and improves information utilization. The architecture of the auto-correlation-based transformer is shown in Figure 3. We utilized the Auto-correlation-based transformer model to predict the IGS TEC maps for the next day: the model takes the IGS TEC maps of the two preceding days, that is, the preceding 48 hourly maps, as input and predicts the next 24 hourly TEC maps.

2.3.1. Series Decomposition Block

This block inherits ideas from Cleveland et al. (1990) and separates a long time series into two parts: a trend-cyclical part and a seasonal part, where the former reflects the overall trend and fluctuations and the latter reflects the repeating (seasonal) patterns of the series. The series decomposition block is deployed throughout the model as it goes deeper to capture complex patterns. For the input TEC map series X ∈ ℝ^{I×d}, where I is the length of the input series and d is the number of features of the TEC map series, the concrete inner operation (Cleveland et al., 1990) used to obtain the two component series is:

$$X_t = \mathrm{AvgPool}(\mathrm{Padding}(X)) \quad \text{and} \quad X_s = X - X_t \quad (4)$$

where X_t is the trend-cyclical part and X_s is the seasonal part. Within the inner operation, AvgPool(·) is a moving average pooling with padding to keep the series length unchanged.
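A minimal sketch of Equation 4, where the moving-average kernel size is an illustrative choice:

```python
# Series decomposition (Equation 4): a padded moving average extracts the
# trend-cyclical part; the residual is the seasonal part.
import torch
import torch.nn.functional as F

def series_decomp(x, kernel_size=25):
    # x: (batch, length, features). Pad by repeating the edge values so the
    # series length is unchanged after average pooling.
    pad = (kernel_size - 1) // 2
    xt = x.transpose(1, 2)                           # (B, D, L) for pooling
    xt = F.pad(xt, (pad, pad), mode="replicate")
    trend = F.avg_pool1d(xt, kernel_size, stride=1).transpose(1, 2)
    seasonal = x - trend
    return trend, seasonal

trend, seasonal = series_decomp(torch.randn(2, 48, 16))
assert trend.shape == seasonal.shape == (2, 48, 16)
```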

2.3.2. Auto-Correlation Mechanism

The auto-correlation mechanism is designed around the periodicity of the time series and aims to discover and aggregate dependencies at the sub-series level. According to stochastic process theory, the auto-correlation R_xx(τ) represents the time-delay similarity between the original series X_t and its τ-lagged series X_{t−τ}. It can be calculated by the following equation:

$$R_{xx}(\tau) = \lim_{L \to \infty} \frac{1}{L} \sum_{t=1}^{L} X_t X_{t-\tau} \quad (5)$$

In the real-world application to TEC map prediction, we first project the embedding of the TEC maps to obtain the query Q and key K. A time delay aggregation block then shifts the series based on the selected time delays, allowing us to aggregate the sub-series using softmax. Concretely, we first obtain the arguments of the top-k autocorrelations:

$$\tau_1, \ldots, \tau_k = \operatorname*{argTopk}_{\tau \in \{1, \ldots, L\}} \left(R_{Q,K}(\tau)\right) \quad (6)$$

where R_{Q,K} is the autocorrelation between Q and K. The selected autocorrelations are then fed into a softmax layer:

$$\hat{R}_{Q,K}(\tau_1), \ldots, \hat{R}_{Q,K}(\tau_k) = \mathrm{SoftMax}\left(R_{Q,K}(\tau_1), \ldots, R_{Q,K}(\tau_k)\right) \quad (7)$$

The auto-correlation output is then obtained from the relationship between the series and its time-delay shifted versions. Let Roll(X, τ) denote the operation that shifts the series {X_t} by time delay τ. The expression for auto-correlation is:

$$\mathrm{Auto\text{-}Correlation}(Q, K, V) = \sum_{i=1}^{k} \mathrm{Roll}(V, \tau_i)\,\hat{R}_{Q,K}(\tau_i) \quad (8)$$
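A simplified sketch of Equations 5-8 follows: it computes autocorrelations for all lags at once via FFT (the Wiener-Khinchin route used by Wu et al., 2021), selects the top-k lags, and aggregates rolled value series with softmax weights. Multi-head and channel-wise details of the original mechanism are omitted here:

```python
# Simplified auto-correlation mechanism (Equations 5-8), without heads.
import torch

def auto_correlation(q, k, v, top_k=4):
    # q, k, v: (batch, length, channels)
    L = q.size(1)
    q_fft = torch.fft.rfft(q, dim=1)
    k_fft = torch.fft.rfft(k, dim=1)
    # R_{Q,K}(tau) for all lags at once: inverse FFT of the cross-spectrum.
    corr = torch.fft.irfft(q_fft * torch.conj(k_fft), n=L, dim=1)
    corr = corr.mean(dim=-1)                          # (batch, L), channels averaged
    weights, delays = torch.topk(corr, top_k, dim=1)  # Equation 6
    weights = torch.softmax(weights, dim=1)           # Equation 7
    out = torch.zeros_like(v)
    for i in range(top_k):                            # Equation 8
        for b in range(v.size(0)):
            # Roll the value series by the selected lag, then weight it.
            rolled = torch.roll(v[b], shifts=-int(delays[b, i]), dims=0)
            out[b] += rolled * weights[b, i]
    return out

out = auto_correlation(torch.randn(2, 48, 16), torch.randn(2, 48, 16),
                       torch.randn(2, 48, 16))
```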

3. Experiments
We aim to achieve two primary objectives. First, we employed TimeVAE to enhance our capability to produce credible TEC time series samples given the shortage of actual TEC data for model training. Second, using these generated samples, we sought to train advanced time series forecasting models, specifically the Transformer model and the Auto-correlation-based transformer model, which are among the most advanced models for long-sequence time series forecasting. Finally, we evaluated the accuracy of both prediction models and present our findings. The advantage of TimeVAE is that it learns the distribution of the TEC data set, allowing us to generate multiple TEC data sets; this enables us to produce theoretically unlimited data even in the absence of sufficient real data.


3.1. Data Source and Processing

In this paper, we used the Global Ionosphere Maps (GIMs) provided by the IGS. Generative data augmentation plays a significant role in expanding and diversifying training data; by generating additional synthetic data, such an approach helps improve the generalization capability of models. Given the same amount of training data as the original data ori_data, we generated a synthetic data set gen_once. Deep learning models trained on larger data sets are able to generalize better to new data and perform better on a variety of tasks (Recht et al., 2019). Therefore, we also generated the gen_twice TEC data set and utilized it to train our forecasting models; by incorporating this augmented data set, we aimed to further enhance the models' predictive capabilities. Furthermore, to prevent any leakage of test data, we used the IGS TEC data from 1998 to 2017 as input data for the TimeVAE model, excluding the period from 2018 to 2020. Ninety percent of the input data was used as TimeVAE training data, and the remaining 10% was used as TimeVAE validation data. We thus obtained a synthetic data set covering the same period as 90% of the input data, namely gen_once. The gen_twice data set was obtained with the same method, but the data were generated twice, resulting in a data set twice the size of gen_once.

We conducted experiments with three distinct training data sets, namely ori_data, gen_once, and gen_twice. The experimental data sets are described as follows:

1) Training set ori_data: The global ionospheric TEC data generated from the standard IONosphere map EXchange (IONEX) format files that IGS provides. We processed the ori_data data set into a 4-dimensional time series sequence (number × 0.9, 24, 71, 73). The longitude dimension consists of 73 points, spanning from 180° west to 180° east with a resolution of 5°, and the latitude dimension consists of 71 points, ranging from 87.5° north to 87.5° south with a resolution of 2.5°, so the global TEC map grid is 71 × 73 points. The value 24 indicates the number of hourly time steps per day, with a time resolution of 1 hr, and number represents the total number of days from 1 January 1998 to 31 December 2017; number × 0.9 means that 90% of the days are selected as the training set.
2) Training set gen_once: We generated synthetic data as the second data set, which has been confirmed to have a high similarity to the original data set ori_data and an identical size.
3) Training set gen_twice: We repeated the generation from ori_data, each time obtaining a synthetic data set of the same size as ori_data. The final gen_twice data set consists of the two synthetic data sets and is therefore twice the size of ori_data.
4) Validation set: When training models on real data, 90% of the data set, which comprises IGS TEC data from 1998 to 2017, is used for training and the remaining 10% is used for validation.
5) Testing set: We chose the IGS TEC data in 2018, 2019, and 2020 as our three testing data sets.

We generated synthetic data of two different sizes from ori_data, that is, 1× and 2× the ori_data size, concretely 8.2 and 16 GB. We evaluated the benefit of using generated data over original data in our prediction models while keeping the amount of training data constant, that is, by comparing the performance of the models trained on the gen_once and ori_data data sets. Additionally, deep learning models depend to some extent on the amount of training data, and increasing the size of the data set helps enhance their predictive ability. Thanks to our TEC generation model, we were able to generate a large volume of TEC data, which allowed us to test the hypothesis that adding more data can improve the accuracy of prediction models. We therefore created the gen_twice data set, which is twice the size of the original data ori_data.

Moreover, the B1PG and C1PG models, as empirical reference models for the ionosphere, were selected for comparison with our Auto-correlation-based transformer model; like our models, both are designed to predict TEC data for the next day. Some of the B1PG prediction data in 2018 (21 days) and 2020 (22 days) are missing, so we test the accuracy of the B1PG model only on the days for which prediction data are available in 2018 and 2020.

3.2. Generated Samples

Figure 4. The root mean squared error scores between original total electron content (TEC) data and generated TEC data.

T-Distributed Stochastic Neighbor Embedding (t-SNE) is a machine learning algorithm for non-linear dimensionality reduction developed by van der Maaten and Hinton (2008). It is used to reduce high-dimensional data to a lower-dimensional space for visualization or machine learning. t-SNE works on the principle of minimizing the divergence between a distribution of the actual data points in the high-dimensional space and a distribution of corresponding points in a lower-dimensional space; this is done by mapping the data points to a probability distribution in the lower-dimensional space.
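A minimal sketch of this comparison using scikit-learn, assuming the original and synthetic TEC maps are flattened to one vector per sample (random arrays stand in for the real data):

```python
# t-SNE embedding of original versus synthetic TEC maps; the random arrays
# below are stand-ins for the real flattened 71 x 73 maps.
import numpy as np
from sklearn.manifold import TSNE

ori = np.random.rand(200, 71 * 73)
gen = np.random.rand(200, 71 * 73)

emb = TSNE(n_components=2, perplexity=30).fit_transform(np.vstack([ori, gen]))
ori_2d, gen_2d = emb[:200], emb[200:]   # plot ori_2d in red, gen_2d in blue
```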

We used the comparison metrics proposed by Yoon et al. (2019) to assess the quality of the synthetic data; the lower the discriminative score, the better the quality of the synthetic data. The abscissa (x-axis) and ordinate (y-axis) in a t-SNE visualization represent the positions of data points in the low-dimensional space. The t-SNE algorithm maps high-dimensional data to a low-dimensional space, typically two or three dimensions, for visualization purposes: each data point in the input data set is represented as a point in the low-dimensional space, and the algorithm aims to preserve the similarity between data points in the high-dimensional space through their proximity or distance in the low-dimensional space. The discriminative score for the generated TEC data set is 0.0035 ± 0.007, indicating that the synthetic data are nearly indistinguishable from the original data and that TimeVAE performs well in generating synthetic data. As shown in Figure 4, we calculated the root mean squared error (RMSE) between the IGS TEC data and the synthetic TEC data from 1998 to 2017; because we generated ori_data twice, there are two RMSE curves. Figure 5 displays the t-SNE charts of data generated by TimeVAE. The TimeVAE-generated data consistently show heavy overlap with the original data.

Figure 6 presents a comparison of the TEC provided by IGS and that generated by the TimeVAE model at six randomly selected time points from the gen_twice data set. The top row displays the TEC maps provided by IGS, while the bottom row shows the TEC maps generated by the TimeVAE model; the selected time points are labeled at the top of each map. The comparison showcases the accuracy of the TimeVAE model in TEC mapping, as well as its potential for use in ionosphere research and satellite-based communication systems.

3.3. Forecasting Deep Learning Models


3.3.1. Training Loss Function

In PyTorch, the Mean Squared Error (MSE) loss function is provided as torch.nn.MSELoss. It is a predefined loss function that calculates the mean squared error between the input and target. When training our models, we compute the MSE loss with this function and use it as a measure of how well the model is performing. The formula for calculating the MSE loss is as follows:

$$\mathrm{MSE\_loss} = \frac{1}{N}\sum_{i=1}^{N}\left(TEC_{\mathrm{ori}} - TEC_{\mathrm{pred}}\right)^{2} \quad (9)$$

where N is the total number of data samples, TECori and TECpred are the observed value and forecasting values,
respectively.
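Minimal usage of this criterion (random tensors stand in for real batches):

```python
# Using torch.nn.MSELoss as the training criterion (Equation 9). The random
# tensors below stand in for a real forecast batch and its IGS observations.
import torch
import torch.nn as nn

criterion = nn.MSELoss()                       # mean over all elements
tec_pred = torch.randn(8, 24, 71, 73)          # stand-in forecast batch
tec_ori = torch.randn(8, 24, 71, 73)           # stand-in IGS observations
loss = criterion(tec_pred, tec_ori)            # scalar used for backpropagation
```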


Figure 5. The visual t-SNE plots show generated and original International GNSS Service data from 1998 to 2017. Red is used for original data, and blue is used
for synthetic data. Higher overlap rates represent higher similarity. The x-axis and y-axis of t-SNE represent the positions of data points (number, 24, 73, 71) in a
two-dimensional space.

3.3.2. Evaluation Metric

Root mean squared error (RMSE) and percentage deviation (PD), defined below, are used to estimate the forecasting performance of the model. The lower the RMSE value, the better the model's prediction accuracy; in essence, RMSE represents the average magnitude of the errors in the predictions made by a model. The PD score is calculated by taking the absolute difference between the predicted value and the actual value, dividing by the actual value, and multiplying by 100. The formulas for calculating RMSE and PD are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(TEC_{\mathrm{ori}} - TEC_{\mathrm{pred}}\right)^{2}} \quad (10)$$

$$\mathrm{PD} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|TEC_{\mathrm{ori}} - TEC_{\mathrm{pred}}\right|}{TEC_{\mathrm{ori}}} \times 100\% \quad (11)$$

where N is the total number of data samples, and TEC_ori and TEC_pred are the observed and forecast values, respectively.
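Equations 10 and 11 translate directly into code, for example:

```python
# Direct implementations of Equations 10 and 11 over matching arrays.
import numpy as np

def rmse(tec_ori, tec_pred):
    return np.sqrt(np.mean((tec_ori - tec_pred) ** 2))

def pd_score(tec_ori, tec_pred):
    # Percentage deviation: mean absolute relative error, in percent.
    return np.mean(np.abs(tec_ori - tec_pred) / tec_ori) * 100.0
```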


Figure 6. Comparison between the global total electron content maps provided by the International GNSS Service and those generated (twice) by the TimeVAE model at six randomly selected times.

Additionally, the RMSE is calculated separately for three latitude ranges: low latitudes (30°S ≤ LAT ≤ 30°N), middle latitudes (30°S < LAT ≤ 60°S and 30°N < LAT ≤ 60°N), and high latitudes (60°S < LAT ≤ 90°S and 60°N < LAT ≤ 90°N), where LAT refers to latitude.
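A sketch of this per-band evaluation, assuming the 71 grid rows run from 87.5°N to 87.5°S in 2.5° steps as described in Section 2.1.1:

```python
# Per-latitude-band RMSE, assuming the row layout stated in Section 2.1.1.
import numpy as np

lat = 87.5 - 2.5 * np.arange(71)                       # latitude of each grid row
low = np.abs(lat) <= 30.0                              # low latitudes
mid = (np.abs(lat) > 30.0) & (np.abs(lat) <= 60.0)     # middle latitudes
high = np.abs(lat) > 60.0                              # high latitudes

def band_rmse(tec_ori, tec_pred, band):                # tec arrays: (..., 71, 73)
    diff = tec_ori[..., band, :] - tec_pred[..., band, :]
    return np.sqrt(np.mean(diff ** 2))
# e.g., rmse_low = band_rmse(obs, pred, low) for the low-latitude band
```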

4. Forecasting Performance Evaluation and Results Analysis


Our study trains two models, the Auto-correlation-based transformer and the Transformer, using three different data sets: the original data, once-generated synthetic data, and twice-generated synthetic data (ori_data, gen_once, and gen_twice). We evaluated the RMSE of all six trained models in 2018, 2019, and 2020, as well as the RMSE of the C1PG and B1PG models. Furthermore, we assessed the RMSE and PD scores of the models at high, middle, and low latitudes to compare their predictive capabilities. Overall, we demonstrate that enlarging the training set of the forecasting models brings further performance gains. Given the same amount of training data, a model trained on synthetic TEC data is more likely to generalize well to new data and can capture more complex patterns in the data.


4.1. Overall Results Analysis

To conveniently assess the loss values on the training and validation sets once the six models reached convergence, we recorded them in Table 1. As described in Equation 9, we utilized the MSE loss as the training loss function for our models. Figure 7 contains two subfigures depicting the loss curves of the Auto-correlation-based transformer model as the number of epochs increases during training: the right one is for the model trained on synthetic IGS TEC data (gen_once), and the left one is for the model trained on the original data (ori_data). We observed a clear trend of decreasing training loss as the number of training epochs increased. Both the model trained on generated data and the model trained on original data converged to a stable loss value after 100 epochs, demonstrating the success of our training and validation process. Models trained on synthetic TEC data consistently showed lower training and validation loss than models trained on original IGS TEC data. This suggests that synthetic TEC data is more likely to enhance the generalization ability of forecasting models, leading to more accurate prediction results.

Table 1
Mean Squared Error Loss on the Training and Validation Data Sets for Six Models: the Auto-Correlation-Based Transformer Model and the Transformer Model, Each Trained on Three Different Data Sets

                                                          MSE_loss (TECU)
Models                               Training data sets   Training set   Validation set
Auto-correlation-based transformer   ori_data             1.295          1.291
                                     gen_once             1.268          1.184
                                     gen_twice            1.208          1.189
Transformer                          ori_data             1.361          1.430
                                     gen_once             1.337          1.324
                                     gen_twice            1.327          1.323

We observed that our Auto-correlation-based transformer models outperform the C1PG and B1PG models. Table 2 displays the forecast RMSE of our trained prediction models and of the C1PG and B1PG models with respect to the IGS TEC in 2018, 2019, and 2020. With double the amount of training data, the global mean RMSE of the Auto-correlation-based transformer model is 1.17 TECU in 2018, which is 0.97 TECU lower than the B1PG model and 0.90 TECU lower than the C1PG model. With synthetic training data, our models achieve lower RMSE values than the C1PG and B1PG models across the three testing data sets, which demonstrates that our models trained using synthetic data outperform the C1PG and B1PG models. Compared to the model trained on original data, the Auto-correlation-based transformer model trained on once-generated data significantly decreased its RMSE, from 1.55 to 1.22 TECU in 2018, 1.48 to 1.09 TECU in 2019, and 1.58 to 1.25 TECU in 2020; the Transformer models also improved their RMSE, with an average gain of 0.13 TECU, indicating that using synthetic TEC data is likely to enable accurate predictions. Additionally, we observed that the more synthetic data used for model training, the better the model performance. With the synthetic gen_twice TEC data, we achieved a remarkable average gain of 0.35 TECU for the Auto-correlation-based transformer and 0.14 TECU for the Transformer model, with the largest performance boost, 0.38 TECU, for the Auto-correlation-based transformer model in 2018. In comparison to the Auto-correlation-based transformer model trained on 2× synthetic TEC data, the C1PG and B1PG models perform worse, with higher RMSE scores on the three testing data sets. Overall, the Auto-correlation-based transformer model trained on generated data outperforms the other models in forecasting TEC from 2018 to 2020. We demonstrated that our prediction models can further improve performance by adding generated training data and increasing the data amount.

Figure 7. The training loss and validation loss of the Auto-correlation-based transformer model on different data sets.


Table 2
Root Mean Squared Error Scores for Eight Models: the Auto-Correlation-Based Transformer Model and the Transformer Model Trained on Three Different Data Sets, the Center for Orbit Determination in Europe (C1PG) Model, and the Beihang University (B1PG) Model

                                                          RMSE (TECU)
Models                               Training data sets   2018    2019    2020
Auto-correlation-based transformer   ori_data             1.55    1.48    1.58
                                     gen_once             1.22    1.09    1.25
                                     gen_twice            1.17    1.19    1.21
Transformer                          ori_data             1.37    1.34    1.96
                                     gen_once             1.31    1.23    1.75
                                     gen_twice            1.29    1.22    1.73
C1PG                                 /                    2.07    1.92    2.13
B1PG                                 /                    2.14    2.02    2.55

Note. Training data sets include ori_data, gen_once, and gen_twice: ori_data denotes the IGS TEC data sets from 1998 to 2017, gen_once the data generated once from 1998 to 2017, and gen_twice the data generated twice from 1998 to 2017. All of these data sets are described in detail in Section 3.1. The best performances are in bold, and a lower RMSE score indicates better performance.

4.2. Comparison of Predicted TEC Maps

We trained the Auto-correlation-based transformer model and the Transformer model on the various training data sets and also evaluated their monthly RMSE in 2018 and 2020.

Figure 8 displays a visual comparison of predicted versus actual TEC maps for 3 January 2019, and Figure 9 does so for 3 January 2020. In each, the top map shows the original IGS data representing the ground truth measurements, the second map illustrates the prediction from the best-performing Auto-correlation-based transformer model, and the bottom three maps are generated by the B1PG, Transformer, and C1PG models, respectively.

We assessed the monthly RMSE of the C1PG and B1PG models in 2018 and compared it with the Transformer models and Auto-correlation-based transformer models trained on the three data sets in Figure 10. The results indicate that both the Auto-correlation-based transformer model and the Transformer model offer improved monthly predictive capability compared to the C1PG and B1PG models, and also reveal that the Auto-correlation-based transformer model trained on gen_twice outperformed the other models in every month of 2018. This model exhibited its highest accuracy in June and October, while its lowest accuracy, 1.175 TECU, was observed in January. On the other hand, the Auto-correlation-based transformer model trained on original data performed the worst among our six models; however, it still outperformed the B1PG and C1PG models, with all RMSE values ranging from 1.16 to 1.76 TECU across the months of 2018.

Figure 11 compares the monthly RMSE values of the C1PG and B1PG models and the Transformer models trained on the three data sets (ori_data, gen_once, and gen_twice) in 2020. The results indicate that, with an amount of training data equivalent to ori_data, synthetic TEC data for Transformer model training closes the gap between the real data and the predictions more than the original TEC training data does. The best predictions occur in October and November, and the worst occur in July and August, where the Transformer is not even better than the C1PG and B1PG models. However, with 2× the data amount, there is no significant improvement beyond that achieved with the 1× generated training data.

In terms of the Auto-correlation-based transformer model, the model trained on synthetic data shows surprisingly promising results. The comparison of the prediction accuracy of the Auto-correlation-based transformer model in 2020 on the three different training data sets in Figure 12 shows that, with 2× synthetic data, the Auto-correlation-based transformer model outperforms the other four models, achieving the highest accuracy in predicting monthly TEC values. The RMSE values of all of the models fluctuate between approximately 0.9 and 2.5 TECU throughout the year. On the other hand, when using the same amount of data, the Auto-correlation-based transformer model trained on the gen_once data set outperformed the model trained on ori_data in every month of 2020. Moreover, increasing the amount of training data has a positive impact on the model's prediction ability: comparing gen_once with gen_twice, the model using twice the amount of synthetic data performs better.

Figure 8. Visual comparison of predicted and actual total electron content (TEC) maps on 3 January 2019. The TEC maps predicted by the Auto-correlation-based transformer model and Transformer model trained on the gen_twice synthetic data set are labeled as "AUTO" and "TRANS" respectively, "IGS" denotes the real International GNSS Service data, "C1PG" the Center for Orbit Determination in Europe model, and "B1PG" the Beihang University model.

4.3. Latitude Results Analysis

To assess the performance of the models in different geographic regions, we classified the global TEC maps into distinct zones based on latitude, categorized as low, middle, and high latitudes. This categorization allows us to analyze how well the models perform across latitudinal areas and to understand any variations or trends that may exist. Specifically, the low-latitude range (30°S ≤ LAT ≤ 30°N) spans from 30°S to 30°N; the middle-latitude range (30°S < LAT ≤ 60°S and 30°N < LAT ≤ 60°N) extends from 30° to 60° north and south of the equator; and areas above 60° but below 90° form the high-latitude range (60°S < LAT ≤ 90°S and 60°N < LAT ≤ 90°N).

Figure 13 depicts the RMSE scores across latitudes for our Auto-correlation-based transformer model, which allows comparison of model performance across latitude regions. The outcomes indicate a notable decrease in error compared to the C1PG and B1PG models at middle and high latitudes, suggesting significant advancement in TEC forecasting capability in these regions. At low latitudes, however, all of the models show relatively higher RMSE values; even though our model does slightly better than B1PG, with an RMSE of 2.51 compared to 2.73 TECU, there is still room for improvement. The increased forecasting error in the equatorial region highlights the persistent challenge of modeling the complex physical dynamics governing the ionosphere-thermosphere system in this zone: the coupling between tidal, electrodynamic, and instability processes introduces variability that current approaches still struggle to capture.

Among the prediction models evaluated, the Auto-correlation-based transformer model demonstrates superior performance. Our model improves forecasting accuracy considerably in high- and middle-latitude regions compared to prior methods. All six of our trained machine-learning models substantially outperformed the prior B1PG and C1PG models: in these regions, the B1PG and C1PG models have RMSEs around 2 TECU, whereas our trained models achieve under 1 TECU. This improvement from nearly 2 TECU down to less than 1 TECU demonstrates the ability of our approach to greatly enhance forecasting accuracy compared to traditional modeling. In high latitudes, our best-performing model achieved substantial accuracy gains, reducing the RMSE by over 53% compared to the B1PG model and 58% relative to the C1PG model. This improvement in accuracy is of great significance, as it enables us to better understand and anticipate the variations in TEC at both high and middle latitudes.

Figure 9. Visual comparison of predicted and actual total electron content (TEC) maps on 3 January 2020. The TEC maps predicted by the Auto-correlation-based transformer model and Transformer model trained on the gen_twice synthetic data set are labeled as "AUTO" and "TRANS" respectively, "IGS" denotes the real International GNSS Service data, "C1PG" the Center for Orbit Determination in Europe model, and "B1PG" the Beihang University model.

Figure 10. The monthly root mean squared error scores for the Auto-correlation-based transformer models, Transformer models, Beihang University model, and Center for Orbit Determination in Europe model in 2018. Labels in this figure beginning with the letter "A" represent the Auto-correlation-based transformer model, while labels beginning with "T" represent the Transformer model; for example, A_once_ori means the Auto-correlation-based transformer model trained on ori_data.


Figure 11. The monthly root mean squared error values of the Center for Orbit Determination in Europe and Beihang
University models and the Transformer models trained on three data sets (ori_data, gen_once, and gen_twice) in 2020.

The percentage deviation (PD) score quantifies how far a model's predictions deviate from the observed values, expressed as a percentage; a lower score indicates predictions that are more accurate and closer to the true values.
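As a minimal sketch, assuming PD is defined as the mean absolute error normalized by the mean observed TEC (the exact normalization used in this study may differ), the metric can be computed as:

```python
import numpy as np

def percentage_deviation(pred, obs):
    """Percentage deviation (PD) score, in percent.

    Assumes PD = 100 * mean(|pred - obs|) / mean(obs); this normalization
    is an illustrative assumption, not necessarily the paper's definition.
    """
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return 100.0 * np.mean(np.abs(pred - obs)) / np.mean(obs)
```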

The results presented in Figure 14 show that all of the models perform best in the middle latitudes, where they yield remarkably low PD scores of 8%–10%, indicating highly precise estimates within this band. Our Auto-correlation-based transformer model performs especially well there, with a PD score of 8.4%. The Auto-correlation-based transformer model trained on gen_twice also achieved the lowest PD scores across all three latitude regions, consistently outperforming the Transformer models: in high latitudes it reached a percentage deviation of 12%, whereas the Transformer model exceeded 17%. This demonstrates the Auto-correlation-based transformer model's greater skill in modeling the complex high-latitude ionosphere, especially when leveraging augmented training data, and its superiority over the Transformer model for TEC forecasting across latitudes, as it better captures the geographic variability of ionospheric dynamics.

Figure 12. Monthly root mean squared error comparison in 2020 between the Center for Orbit Determination in Europe and Beihang University models and the Auto-correlation-based transformer models trained on three different data sets (ori_data, gen_once, and gen_twice).


Figure 13. The averaged root mean squared error scores of all models for three latitude regions (high, middle, and low latitudes) during 2018.

5. Conclusions
In this study, we compared the performance of two transformer models trained on synthetic data with that of the C1PG and B1PG models. We applied data augmentation techniques to TEC forecasting and demonstrated their effectiveness in improving prediction quality.

Our model significantly improves TEC prediction at middle and high latitudes. At low latitudes, although our model's RMSE is lower than that of the B1PG model, it remains above 2.5 TECU, indicating the need for further improvement in this region. Our model also outperforms the C1PG model in both monthly and yearly global mean RMSE: the Auto-correlation-based transformer model trained on synthetic data achieves a global RMSE of 1.17 TECU in 2018, compared to 2.07 TECU for C1PG and 2.14 TECU for B1PG. This finding has important implications for the development of advanced TEC prediction models and highlights the potential of transformer models trained on synthetic data for a range of applications in ionospheric research and satellite communication systems.

Figure 14. The percentage deviation scores of all models for three latitude regions (high, middle, and low latitudes) during 2018.


Moreover, the Auto-correlation-based transformer model trained on synthetic data reduces the RMSE by 0.38 TECU relative to the model trained on the original data when evaluated on the 2018 test set. This indicates that models trained on synthetic data can provide more accurate predictions than those relying solely on the original data. The use of synthetic data makes two contributions to this study. First, it effectively enhances the model's predictive performance, improving the accuracy of TEC prediction. Second, it offers a way to generate more training data without additional data collection, which is particularly useful when obtaining real data is difficult or expensive. These results underscore the promise of transformer models trained on synthetic data sets for precise TEC prediction tasks.
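As an illustration of the augmentation step summarized above, once a variational auto-encoder has been trained on windows of the TEC series, additional synthetic windows can be drawn by sampling the latent prior and decoding. The sketch below assumes a generic PyTorch decoder; the interface, latent dimension, and shapes are hypothetical placeholders, not the configuration used in this work.

```python
import torch

@torch.no_grad()
def sample_synthetic_tec(decoder, n_samples, latent_dim=16, device="cpu"):
    """Draw synthetic TEC windows from a trained VAE decoder.

    decoder: a trained torch.nn.Module mapping latent vectors of shape
    (n, latent_dim) to TEC windows, e.g. (n, timesteps, n_lat * n_lon);
    this interface is a hypothetical placeholder. Sampling z ~ N(0, I)
    from the prior is the standard VAE generation step.
    """
    decoder.eval()
    z = torch.randn(n_samples, latent_dim, device=device)
    return decoder(z)
```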

Data Availability Statement


The global ionosphere map (GIM) TEC data sets from global navigation satellite systems (GNSS) used in the creation of this manuscript are available at http://pub.ionosphere.cn/products/daily/. The B1PG prediction data, proposed by C. Wang et al. (2018), come from an adaptive autoregressive one-day-ahead forecast model. The C1PG prediction data, proposed by Schaer (1999), come from a global TEC prediction model based on spherical harmonic (SH) expansion developed by the Center for Orbit Determination in Europe (CODE).

Acknowledgments
The authors thank the International GNSS Service for providing the total electron content map data, and the Beijing University of Aeronautics and Astronautics for providing the 1-day ionospheric TEC prediction data (B1PG) used in this study. The authors greatly appreciate the Center for Orbit Determination in Europe for providing the 1-day forecast products (C1PG). We sincerely appreciate the constructive comments from the reviewers, which have helped improve this manuscript. We are also grateful to the editors for their professional handling and useful suggestions during the review process. This work was financially supported by the National Natural Science Foundation of China (Grants 41574146, 41774162, 42074187, 41804026, and 41931075), the National Key R&D Program of China (Grant 2018YFC1503506), the National Key Laboratory of Electromagnetic Environment (Grant 6142403180204), the Excellent Youth Foundation of Hubei Provincial Natural Science Foundation (Grant 2019CFA054), and the National Key Research and Development Program of China (Grants 2021YFC2802502 and 2022YFC3301401).

References
Béniguel, Y., & Hamel, P. (2011). A global ionosphere scintillation propagation model for equatorial regions. Journal of Space Weather and Space Climate, 1(1), A04. https://doi.org/10.1051/swsc/2011004
Calimeri, F., Marzullo, A., Stamile, C., & Terracina, G. (2017). Biomedical data augmentation using generative adversarial neural networks. In Artificial Neural Networks and Machine Learning–ICANN 2017: 26th International Conference on Artificial Neural Networks, Alghero, Italy, September 11–14, 2017, Proceedings, Part II (Vol. 26, pp. 626–634). https://doi.org/10.1007/978-3-319-68612-7_71
Cesaroni, C., Spogli, L., Aragon-Angel, A., Fiocca, M., Dear, V., De Franceschi, G., & Romano, V. (2020). Neural network based model for global total electron content forecasting. Journal of Space Weather and Space Climate, 10, 11. https://doi.org/10.1051/swsc/2020013
Chen, Z., Jin, M., Deng, Y., Wang, J.-S., Huang, H., Deng, X., & Huang, C.-M. (2019). Improvement of a deep learning algorithm for total electron content maps: Image completion. Journal of Geophysical Research: Space Physics, 124(1), 790–800. https://doi.org/10.1029/2018JA026167
Cherrier, N., Castaings, T., & Boulch, A. (2017). Forecasting ionospheric total electron content maps with deep neural networks. In Conference Proceedings of Big Data from Space (BiDS), ESA Workshop. Retrieved from https://delta-onera.github.io/files/2017_bids_esa_forecasting.pdf
Cleveland, R. B., Cleveland, W. S., McRae, J. E., & Terpenning, I. (1990). STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6(1), 3–73.
Desai, A., Freeman, C., Wang, Z., & Beaver, I. (2021). TimeVAE: A variational auto-encoder for multivariate time series generation. arXiv preprint arXiv:2111.08095. https://doi.org/10.48550/arXiv.2111.08095
Feng, J., Han, B., Zhao, Z., & Wang, Z. (2019). A new global total electron content empirical model. Remote Sensing, 11(6), 706. https://doi.org/10.3390/rs11060706
Ha, D., & Schmidhuber, J. (2018). World models. arXiv preprint arXiv:1803.10122. Retrieved from https://arxiv.org/abs/1803.10122
He, R., Sun, S., Yu, X., Xue, C., Zhang, W., Torr, P., et al. (2022). Is synthetic data from generative models ready for image recognition? arXiv preprint arXiv:2210.07574. https://doi.org/10.48550/arXiv.2210.07574
Iyer, S., & Mahajan, A. (2023). Short-term adaptive forecast model for TEC over equatorial low latitude region. Dynamics of Atmospheres and Oceans, 101, 101347. https://doi.org/10.1016/j.dynatmoce.2022.101347
Kaselimi, M., Voulodimos, A., Doulamis, N., Doulamis, A., & Delikaraoglou, D. (2020). A causal long short-term memory sequence to sequence model for TEC prediction using GNSS observations. Remote Sensing, 12(9), 1354. https://doi.org/10.3390/rs12091354
Li, Q., Yang, D., & Fang, H. (2022). Two hours ahead prediction of the TEC over China using a deep learning method. Universe, 8(8), 405. https://doi.org/10.3390/universe8080405
Li, W., Huang, L., Zhang, S., & Chai, Y. (2019). Assessing global ionosphere TEC maps with satellite altimetry and ionospheric radio occultation observations. Sensors, 19(24), 5489. https://doi.org/10.3390/s19245489
Li, Z., Wang, N., Wang, L., Liu, A., Yuan, H., & Zhang, K. (2019). Regional ionospheric TEC modeling based on a two-layer spherical harmonic approximation for real-time single-frequency PPP. Journal of Geodesy, 93(9), 1659–1671. https://doi.org/10.1007/s00190-019-01275-5
Lin, M., Zhu, X., Tu, G., & Chen, X. (2022). Optimal transformer modeling by space embedding for ionospheric total electron content prediction. IEEE Transactions on Instrumentation and Measurement, 71, 1–14. https://doi.org/10.1109/TIM.2022.3211550
Lin, X., Wang, H., Zhang, Q., Yao, C., Chen, C., Cheng, L., & Li, Z. (2022). A spatiotemporal network model for global ionospheric TEC forecasting. Remote Sensing, 14(7), 1717. https://doi.org/10.3390/rs14071717
Liu, L., Morton, Y. J., & Liu, Y. (2022). ML prediction of global ionospheric TEC maps. Space Weather, 20(9), e2022SW003135. https://doi.org/10.1029/2022SW003135
Liu, L., Zou, S., Yao, Y., & Wang, Z. (2020). Forecasting global ionospheric total electron content (TEC) using deep learning. In AGU Fall Meeting Abstracts (Vol. 2020, pp. NG004–0017). https://doi.org/10.1029/2020SW002501
Liu, Q., Hernández-Pajares, M., Yang, H., Monte-Moreno, E., Roma-Dollase, D., García-Rigo, A., et al. (2021). The cooperative IGS RT-GIMs: A reliable estimation of the global ionospheric electron content distribution in real time. Earth System Science Data, 13(9), 4567–4582. https://doi.org/10.5194/essd-13-4567-2021
Mendoza, L. P. O., Meza, A. M., & Aragón Paz, J. M. (2019). A multi-GNSS, multifrequency, and near-real-time ionospheric TEC monitoring system for South America. Space Weather, 17(5), 654–661. https://doi.org/10.1029/2019SW002187
Monte-Moreno, E., Yang, H., & Hernández-Pajares, M. (2022). Forecast of the global TEC by nearest neighbour technique. Remote Sensing, 14(6), 1361. https://doi.org/10.3390/rs14061361


Nath, S., Chetia, B., & Kalita, S. (2023). Ionospheric TEC prediction using hybrid method based on ensemble empirical mode decomposition (EEMD) and long short-term memory (LSTM) deep learning model over India. Advances in Space Research, 71(5), 2307–2317. https://doi.org/10.1016/j.asr.2022.10.067
Prol, F. D. S., Camargo, P. d. O., Monico, J. F. G., & Muella, M. T. D. A. H. (2018). Assessment of a TEC calibration procedure by single-frequency PPP. GPS Solutions, 22(2), 1–11. https://doi.org/10.1007/s10291-018-0701-6
Ratnam, D. V., Vishnu, T. R., & Harsha, P. B. S. (2018). Ionospheric gradients estimation and analysis of S-band navigation signals for NAVIC system. IEEE Access, 6, 66954–66962. https://doi.org/10.1109/access.2018.2876795
Recht, B., Roelofs, R., Schmidt, L., & Shankar, V. (2019). Do ImageNet classifiers generalize to ImageNet? In International Conference on Machine Learning (pp. 5389–5400). https://doi.org/10.48550/arXiv.1902.10811
Ruwali, A., Kumar, A. S., Prakash, K. B., Sivavaraprasad, G., & Ratnam, D. V. (2020). Implementation of hybrid deep learning model (LSTM-CNN) for ionospheric TEC forecasting using GPS data. IEEE Geoscience and Remote Sensing Letters, 18(6), 1004–1008. https://doi.org/10.1109/LGRS.2020.2992633
Saito, S., & Yoshihara, T. (2017). Evaluation of extreme ionospheric total electron content gradient associated with plasma bubbles for GNSS ground-based augmentation system. Radio Science, 52(8), 951–962. https://doi.org/10.1002/2017RS006291
Sandfort, V., Yan, K., Pickhardt, P. J., & Summers, R. M. (2019). Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Scientific Reports, 9(1), 16884. https://doi.org/10.1038/s41598-019-52737-x
Schaer, S. (1999). Mapping and predicting the Earth's ionosphere using the global positioning system [Dataset]. Geod Geophys Arb Schweiz, 59. Retrieved from https://cddis.nasa.gov/archive/gnss/products/ionex
Shao, S., Wang, P., & Yan, R. (2019). Generative adversarial networks for data augmentation in machine fault diagnosis. Computers in Industry, 106, 85–93. https://doi.org/10.1016/j.compind.2019.01.001
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86), 2579–2605. Retrieved from http://jmlr.org/papers/v9/vandermaaten08a.html
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. Retrieved from https://arxiv.org/abs/1706.03762
Wang, C., Xin, S., Liu, X., Shi, C., & Fan, L. (2018). Prediction of global ionospheric VTEC maps using an adaptive autoregressive model [Dataset]. Earth, Planets and Space, 70(1), 18. https://doi.org/10.1186/s40623-017-0762-8
Wang, H., Lin, X., Zhang, Q., Chen, C., Cheng, L., & Wang, Z. (2022). Global ionospheric total electron content prediction based on spatiotemporal network model. In China Satellite Navigation Conference (CSNC 2022) Proceedings: Volume II (pp. 153–162). https://doi.org/10.1007/978-981-19-2580-1_13
Wu, H., Xu, J., Wang, J., & Long, M. (2021). Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems, 34, 22419–22430. Retrieved from https://arxiv.org/abs/2106.13008
Xia, G., Liu, M., Zhang, F., & Zhou, C. (2022). CAiTST: Conv-attentional image time sequence transformer for ionospheric TEC maps forecast. Remote Sensing, 14(17), 4223. https://doi.org/10.3390/rs14174223
Xia, G., Zhang, F., Wang, C., & Zhou, C. (2022). ED-ConvLSTM: A novel global ionospheric total electron content medium-term forecast model. Space Weather, 20(8), e2021SW002959. https://doi.org/10.1029/2021SW002959
Yang, K., & Liu, Y. (2022). Global ionospheric total electron content completion with a GAN-based deep learning framework. Remote Sensing, 14(23), 6059. https://doi.org/10.3390/rs14236059
Yang, Y., Malaviya, C., Fernandez, J., Swayamdipta, S., Bras, R. L., Wang, J.-P., et al. (2020). Generative data augmentation for commonsense reasoning. arXiv preprint arXiv:2004.11546. Retrieved from https://aclanthology.org/2020.findings-emnlp.90
Yoon, J., Jarrett, D., & Van der Schaar, M. (2019). Time-series generative adversarial networks. Advances in Neural Information Processing Systems, 32.
Zhang, Q., Yao, Y., Kong, J., Ma, X., & Zhu, H. (2023). A new GNSS TEC neural network prediction algorithm with the data fusion of physical observation. IEEE Transactions on Geoscience and Remote Sensing, 61, 1–12. https://doi.org/10.1109/TGRS.2023.3285794
Zhao, J., Li, X., Liu, Y., Wang, X., & Zhou, C. (2019). Ionospheric foF2 disturbance forecast using neural network improved by a genetic algorithm. Advances in Space Research, 63(12), 4003–4014. https://doi.org/10.1016/j.asr.2019.02.038
