
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2022.DOI

Assessing Deep Generative Models on Time Series Network Data

MUHAMMAD HARIS NAVEED1, UMAIR SAJID HASHMI1 (Member, IEEE), NAYAB TAJVED1, NEHA SULTAN1, AND ALI IMRAN2 (Senior Member, IEEE)
1 School of Electrical Engineering & Computer Science, National University of Sciences & Technology, PK
2 AI4Networks Research Center, School of Electrical & Computer Engineering, University of Oklahoma, OK, USA

Corresponding author: Umair Sajid Hashmi (e-mail: umair.hashmi@seecs.edu.pk)

This work was supported in part by the National Science Foundation (NSF) under Grants 1619346 and 1923669, and in part by the Qatar National Research Fund (QNRF) under Grant NPRP12-S 0311-190302.

ABSTRACT To achieve zero-touch automation in next-generation wireless networks through artificial intelligence (AI), large amounts of training data are required. Such training data is largely unavailable to the public, which is a major hindrance to research on AI applications in wireless communication. One solution is to use the limited real data at hand to generate synthetic data that can be used in lieu of real data. Generative Adversarial Networks (GANs) have been used successfully for this purpose. In this paper, we choose two publicly available GAN-based models and one deep learning-based autoregressive model and compare their performance at generating synthetic time-series wireless network traffic data. We also assess the impact of data scarcity on the quality of the generated data by varying the amount of data available to the models for training. Moreover, to assess the usefulness of this generated data, we compare the performance of a gradient boosting regressor at forecasting network traffic when trained solely on generated data, on real data, and on a mix of both. Our experiments show that the GANs outperform the autoregressive approach in every aspect considered in this work, and that forecasting models trained to predict network load from data generated by these GANs yield error rates comparable to those of models trained on real data. Finally, augmenting small amounts of real data with generated data leads to minor performance gains in some cases.

INDEX TERMS Machine learning, GAN, TimeGAN, PAR, DoppelGANger, Time series, Forecast analysis

I. INTRODUCTION
The application of artificial intelligence (AI) in medicine, power systems, image processing and other domains has become commonplace. Although the gains and benefits of zero-touch automation have been laid out in earlier studies such as [1], the realization of AI-enabled gains in wireless networks is yet to be witnessed in the real world. This is set to change as the world moves to sixth generation networks (6G) and beyond, in which networks will be able to perform self-configuration, self-optimization and self-healing via real-time AI on network parameters [2]. Paired with the rapid proliferation of new and diverse paradigms, such as Augmented Reality (AR) and Virtual Reality (VR), and new connectivity use cases and applications such as the Internet of Things (IoT) [3], Ultra Reliable Low Latency Communication (URLLC) [4], and holographic communications, the potential of harnessing the vast data produced by these systems in supervised machine learning (ML) problems to perform usage prediction, optimal resource allocation, anomaly detection and other such applications is substantial [5]–[7]. The ultimate goal is to achieve AI-enabled zero-touch automation in next generation networks so as to minimize operational cost and overcome operational complexity and human error, thereby maximizing resource efficiency and Quality of Experience (QoE).

Unfortunately, the telecommunications network data required for executing sophisticated ML models is either not available, or is too scarce for effective ML model training and execution [8], [9]. This is largely due to privacy concerns and the hesitance of the telecom industry to open-source data that could potentially be used by its competition. Another challenge in getting ample training data is the large amount of technical effort required to get data out of the silos within operators where it remains trapped. When data is made available, it is usually locked behind non-disclosure agreements or released only to specific research groups. This is a major impediment to research in this domain and partly responsible for the lag in applying ML techniques in the communication systems domain compared to other domains where data is more freely available, such as image processing.
One solution to this deadlock is to generate synthetic data that is faithful to the properties of the original data, but differs from it in terms of actual values. This paper aims to test the latest synthetic data generation techniques, especially those based on the generative adversarial network (GAN) model. The GAN is a deep learning-based generative model that has delivered impressive results in generating synthetic but near-realistic images and videos [10]. We apply two variants of the GAN model to three distinct internet activity time-series data sets and compare their performance against a non-GAN based method. We then test the performance of the generated data in downstream supervised machine learning applications, specifically forecasting internet traffic levels. We also analyze how, if at all, the performance of these methods is affected by the amount of data available. While the idea that more data equals better performance is valid, it is interesting to see with how little data GANs can give usable results. Finally, since evaluation of GAN performance is an open problem, and the metrics that do exist are geared towards evaluating image output, we use a variety of direct and indirect metrics to comprehensively evaluate the quality of the generated synthetic data. Simply put, we assess how data sparsity impacts several deep generative models' ability to produce high-quality synthetic time-series data, and then assess how using this synthetic data improves or degrades the performance of a forecasting model on real, unseen test data.

With this study, we offer the following contributions:
• A performance analysis of publicly available deep generative models, with particular focus on GANs, designed to generate realistic time-series data on a scarce telecommunications data set, conducted using a select group of indirect metrics that assess the quality, fidelity and practical usefulness of the generated time-series data. We observe that the range of values and structure of a given time-series is retained in the GAN-generated series, but the same cannot be said for the non-GAN generated time-series data.
• An analysis of the impact of data scarcity on the performance of these techniques in generating realistic time-series synthetic data. While it seems intuitive that increasing the training data would improve the generated data quality, we observe that this is not always the case, and that longer time-series sequences with long-term trends are harder for GAN models to learn faithfully.
• A quantitative study of the utility of generated data in downstream predictive modelling using supervised ML approaches. We observe that for simple forecasting purposes, the gap in error between models trained on generated data and on real data is between 1 and 4 percent for our best performing model.

The paper is organized as follows: Section II discusses relevant literature in this domain, Section III describes the data sets we used, and Section IV describes different time-series generative models and the architectures we are using. Section V explains our methodology, while Section VI explains the experimental setup employed in our simulations as well as the results. The paper is concluded in Section VII with a summary of the work and possible future research directions.

II. RELATED WORK
GANs were originally designed for simple image data, but have seen great advancements since their inception [10]. Although the extent of the applicability of GANs in the wireless communications domain is still being explored, some recent studies have investigated this particular research theme. To model the channel effects in an end-to-end wireless network in a data-driven way, [11] proposes to use a Conditional GAN (CGAN). A novel architecture using a GAN in [12] is designed to directly learn the channel transition probability (CTP) from receiver observations. The authors in [13] leverage a GAN to create large amounts of synthetic call data records (CDRs) containing the start hour and duration of a call, and show a marked improvement in future call duration prediction accuracy using real data augmented with GAN-generated synthetic data points.

There have also been GAN-based approaches proposed for generating other types of time-series data. Fekri et al. introduce a Recurrent GAN (R-GAN) [14] for generating realistic energy consumption data by learning from real data. Hyland et al. propose a Recurrent Conditional GAN (RCGAN) [15] to produce realistic real-valued multi-dimensional medical time series data. Mogren proposes a continuous recurrent neural network (C-RNN) based GAN model [16] that works on sequential data to synthesize music, using a long short-term memory (LSTM) network for both generator and discriminator.

Many studies have tried to solve the training data scarcity problem using GANs. Han et al. [17] introduced data augmentation of medical images using GANs and concluded that such augmentation can boost diagnosis accuracy by 10%. Similar work has been done on images of skin lesions [18] and brain tumor MRIs [19]. Similarly, SimGAN [20] shows a 21% performance improvement in eye-gaze estimation by using an architecture similar to GANs.

Several evaluation techniques have been proposed for data generated by GANs. Borji [21] analyzes more than 24 quantitative and 5 qualitative measures for evaluating generative models, concluding that each has its own strengths and limitations, and proposing that evaluation techniques should be application specific. The most common GAN metrics are the Fréchet distance [22] and the Inception Score [23], but many more have been proposed, such as the fidelity and diversity generative model metrics explained in [24]. All in all, as noted in [25], GAN performance evaluation remains an open and challenging research problem.

To the best of our knowledge, our work is the first to compare several modern GAN models against a non-GAN approach, as well as the first to stress test the given models with scarce data. It is also one of the few papers that looks at telecommunications data instead of energy, financial or medical time-series data.
FIGURE 1. A sample figure taken from the Telecom Italia dataset [8], which divides the coverage area of the radio base stations into grids and helps in identifying the geographical location of the user.

FIGURE 2. Administrative map of Milan with our selected areas highlighted.

III. TELECOM ITALIA DATASET
A. BACKGROUND
As mentioned before, telecommunications data is rarely open-sourced or easily available. When it is available, it is limited to a few research teams that have signed non-disclosure agreements with service providers. Telecom Italia (renamed TIM Group in 2019) recognized that this situation hampered independent researchers who could not access data for analysis and model training purposes. So, in 2014, it organized the 'Telecom Italia Big Data Challenge'. Participants were given access to telecommunications, weather, social media, news and electricity consumption data for Milan and the province of Trentino from November 2013 to January 2014, with the non-telecommunication data sets provided by other industrial partners [8].

FIGURE 3. Internet activity over seven weeks in our selected areas. Note the differences in trend and seasonality over time across all three localities.
We focus on the telecommunication activity data provided in this set, specifically the call detail records (CDRs), which contain time-series data on internet usage in all regions of the Milan metropolitan area from November to December 2013. The time range considered in our work is from Monday, November 4, 2013, to Sunday, December 22, 2013. Since the overall challenge made use of data from multiple domains provided by different companies, all of which used different spatial and temporal aggregation methods, Telecom Italia standardised them all onto a single 100x100 grid for Milan, with each square covering about 235x235 metres of area. The internet usage is measured as the 'number of connections created' in 10-minute intervals. However, for ease of analysis and training, we down-sample these measurements to an hourly interval.
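As an illustration, this down-sampling step can be written in a few lines of pandas. The file and column names below are hypothetical, and we assume the hourly value is obtained by summing the six 10-minute connection counts:

```python
import pandas as pd

# Hypothetical file/column names: one row per 10-minute interval with a
# timestamp and the number of internet connections created in that interval.
cdr = pd.read_csv("milan_cdr_square.csv", parse_dates=["timestamp"])

hourly = (
    cdr.set_index("timestamp")["internet"]
       .resample("1H")
       .sum()  # aggregate six 10-minute counts into one hourly count
)
```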
B. REGIONS
In order to judge how well our selected generation methods work for time-series data of different natures, we look at data from three regions with interesting activity patterns. By 'interesting', we mean that the network activity from these regions contains daily and weekly seasonality as well as an overall trend over the selected timespan. The selected regions are Bocconi University, a private university; the Navigli district, a wedge between two canals that boasts upscale restaurants, bars and art galleries and is popular with tourists; and the Duomo cathedral and its surroundings, which is a religious and tourist hotspot.

Fig. 1 and Fig. 2 illustrate the coverage area and the geographical placement of the selected regions, respectively. Fig. 3 illustrates the activity patterns of each region over seven weeks. We can observe a daily seasonality in Bocconi, with usage peaking around midday and almost disappearing at night. There is an expected decline in internet usage on the weekends, which is due to low activity levels on the university campus. In Navigli, we do not observe a dramatic decrease in activity on any particular day, but we observe two peaks per day, one at midday and a higher one around midnight, indicating that the district is a nightlife hub. Lastly, Duomo resembles a sinusoidal pattern, but a closer look shows a slight uptick in activity every Friday and Saturday towards twilight. Thus, all three regions are distinct time-series data sets, and it will be interesting to see whether the performance of the GANs differs accordingly. This is also important since time-series data can possess many different trends and patterns; any generative model must therefore perform well across the board in order to be useful.

IV. TIME SERIES GENERATIVE MODELS
We now provide an overview of the structure of time-series data and the various methods used to generate it, including a more in-depth explanation of the specific techniques that we use in this work. A time series is an ordered sequence of values of a variable taken at equally spaced time intervals. Thus, any analysis or generation of time series must take into account that data points taken over time may have an internal seasonality and trend [26]. Mathematically, most time series can be written as

$$x_t = s_t + g_t + e_t \qquad (1)$$

where $s_t$ is the seasonality, $g_t$ the trend, and $e_t$ the residual. Here, $t = 1, 2, 3, \ldots, N$ represents the time index at which observations have been recorded.

A. DECOMPOSITION-BASED METHODS
As shown above, a time-series generally comprises seasonality, trend and residual terms. One method of time series generation involves decomposing a time series into its components, then adding a deterministic and a stochastic component, constructed by optimizing weights for the trend and seasonality components and by modelling the residuals via some statistical model [27]. Another approach uses bootstrapping on the residuals obtained after decomposition to create augmented signals, and then combines them with trend and seasonality to create a new time series [28].
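As a concrete illustration of Eq. (1), the sketch below decomposes the hourly series constructed in Section III into these three terms with statsmodels. The additive model and the daily period of 24 hours are our assumptions for this data, not prescriptions from [27] or [28]:

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# `hourly` is a pandas Series of hourly readings (see Section III).
# period=24 targets the daily seasonality; model="additive" matches Eq. (1).
result = seasonal_decompose(hourly, model="additive", period=24)

# The three terms of x_t = s_t + g_t + e_t:
s_t, g_t, e_t = result.seasonal, result.trend, result.resid
```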
B. AUTO-REGRESSIVE MODELS
Auto-regressive models try to forecast future values of a series based on past values of the same series and a stochastic term. In the simplest auto-regressive generative model, the conditional distributions $p(x_i \mid x_{<i})$ correspond to a Bernoulli random variable, and the model learns a function that maps the preceding variables $x_1, x_2, \ldots, x_{i-1}$ to the mean of this distribution, resulting in (2) from [29]:

$$p_{\theta_i}(x_i \mid x_{<i}) = \mathrm{Bern}\big(f_i(x_1, x_2, \ldots, x_{i-1})\big). \qquad (2)$$

New data can then be generated by sampling from the conditional distribution learnt by the model. The use of the Bernoulli distribution means that this model cannot learn all types of distributions. This weakness is corrected by models like the Neural Auto-regressive Distribution Estimator (NADE) [30], which use neural networks for parameter estimation, an approach more efficient than the simple one described above.

1) Probabilistic Auto-regressive model (PAR)
One implementation of an auto-regressive generative model is PAR, a model in the Synthetic Data Vault collection of generative models [31]. PAR uses recurrent neural networks (RNNs) to model the distributions. Since RNNs are designed to handle sequential data, PAR is well suited to our task of time series generation.

Given a function $h$ employed within an RNN, we calculate its hidden states $h_i = h(h_{i-1}, p_{i-1}, \theta)$ for each time instance, depending on the prior corresponding hidden state $h_{i-1}$, the preceding input value $p_{i-1}$ and the network hyper-parameters $\theta$. The three variables establishing the hidden state $h_i$ construct a set of parameters $\theta(h_i)$ which represent a distribution with density $\ell(p_i \mid \theta(h_i))$, resulting in the following conditional distribution:

$$q_\theta[p_{t_0:T} \mid p_{1:t_0-1}] = \prod_{i=t_0}^{T} \ell\big(p_i \mid \theta(h_i)\big) \qquad (3)$$

where $q_\theta[p_{t_0:T} \mid p_{1:t_0-1}]$ is a parametric distribution specified by the learnable parameters $\theta$, which are obtained by using the past values $p_{1:t_0-1}$ from a time series $p = (p_t)$, $1 \le t_0 \le T$, to forecast the future values $p_{t_0:T}$. This conditional distribution can then be used to generate new synthetic data similar to the original input data. We will be evaluating PAR against our chosen GAN-based models to determine whether the GAN framework yields any noticeable performance improvements over this simpler deep learning-based model. A more comprehensive review of time series data generation techniques, specifically for data augmentation in time series classification and clustering tasks, can be found in [32].

C. GAN-BASED METHODS
We now move to our main focus, the GAN-based models. We first give a brief overview of the GAN design, its structure, and its potential issues. First introduced in 2014 [10], GANs consist of two competing agents, typically neural networks, referred to as the generator and the discriminator. The generator network attempts to map a noise vector z to the probability distribution of the input data, whereas the discriminator attempts to accurately distinguish the generated data from the real data. Loss convergence of the generator and discriminator terminates the training period. Essentially, the two networks are jointly involved in a 2-player min-max game up until the discriminator fails to distinguish between the real data and the generated data, a point denoted by the attainment of a Nash equilibrium [33]. Mathematically, this process is expressed in [10] as:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \qquad (4)$$

Here $x$ is the input data, $D(x)$ is the predicted output of the discriminator for $x$, and $D(G(z))$ is the output of the discriminator on the GAN-generated data $G(z)$. The aim is to maximize the ability of the discriminator to tell real data from generator-produced data, so the discriminator seeks to maximize both terms of (4), whereas the generator tries to undermine the discriminator's ability to correctly classify real and fake data, which translates to minimizing the second term of (4). The basic GAN structure is illustrated in Fig. 4.

FIGURE 4. Simple GAN structure.
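To make the min-max game of (4) concrete, the following sketch implements one training step of a vanilla GAN in PyTorch for 24-hour windows of traffic values. The layer sizes are arbitrary placeholders, and this is the plain GAN objective only, not the TimeGAN or DoppelGANger architecture:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 24))               # noise -> 24-h window
D = nn.Sequential(nn.Linear(24, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())  # window -> P(real)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real):  # real: (batch, 24) float tensor of traffic windows
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator ascent on V(D, G): push D(x) -> 1 and D(G(z)) -> 0.
    fake = G(torch.randn(batch, 16))
    d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: push D(G(z)) -> 1 (the common non-saturating variant
    # of minimizing the second term of Eq. (4)).
    g_loss = bce(D(G(torch.randn(batch, 16))), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```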
GANs initially became popular for their ability to produce high-quality image data while avoiding the problems associated with using Markov chains or approximating intractable likelihood functions. This is achieved by training the GANs via backpropagation and using dropout regularization. The drawback is that GANs are often hard to train, and suffer from problems like overfitting (reproducing the input data), mode collapse (generating samples from only one class in the data) and training instability.

The use of GANs to generate tabular or time-series data is less common than generating images or video. One model used to generate tabular data is the CTGAN (Conditional Tabular GAN) [34]. The TimeGAN [35] and DoppelGANger [36], on the other hand, are frameworks that modify the traditional GAN architecture to make it more suitable for time-series data. These last two models are what we use in this work. A brief overview of each is given below.

1) TimeGAN
The autoregressive models mentioned above are good at capturing the temporal dynamics of a sequence, but are deterministic in nature as opposed to generative. Conversely, GAN architectures such as the RGAN used in [15] do not really take into account the inter-row dependencies of time series data. The TimeGAN solves this problem by combining the GAN framework with an autoregressive setup. Both are trained jointly, with the unsupervised GAN loss guided by the supervised autoregressive loss. Additionally, the model makes use of an embedding network to map high-level features to a low-level latent feature space. The generator network also first produces samples in the latent space, which are then converted back to the original feature space via a recovery network. This is done to reduce the high dimensionality of the adversarial learning space. A high-level structure of the TimeGAN is shown in Fig. 5.

The recovery and embedding parts are trained via a supervised and a reconstruction loss. The reconstruction loss ensures the learnt latent representation is correct, and the supervised loss aids the generator in learning the temporal dynamics of the data. The generator and discriminator are then trained in a typical adversarial fashion. The losses used to train the embedding, recovery, generator and discriminator networks can be found as Eqs. (7), (8) and (9) in [35].

FIGURE 5. Simplified block diagram of TimeGAN.

2) DoppelGANger
The DoppelGANger introduces several new ideas to solve typical GAN problems like overfitting, as well as problems faced when generating longer, more complex time-series data [36]. The design change most relevant to our work is how much of the data is generated in a single instance. Since RNNs produce a single measurement in a single pass, and for a time-series of length L perform L passes, they tend to forget prior values and struggle to capture long-term correlations in a time-series. Furthermore, the authors of [36] state (Section 4.1, page 5) that even LSTMs, which were designed to correct this problem with RNNs, empirically struggle to perform well when the length of a time-series surpasses a few hundred records.

DoppelGANger solves this by modifying the RNN structure to produce S values in a single pass. This reduces the overall number of passes required to generate the entire series, but the quality of the generated samples also deteriorates as S increases. It uses the Wasserstein loss as opposed to the regular GAN loss function, since the former leads to more stable training in this case. The optimization function for this GAN architecture may be expressed as:
$$\min_G \max_{D_1, D_2} L_1(D_1, G) + L_2(D_2, G)$$

where $L_i$ for $i = 1, 2$ is the Wasserstein loss, described in [36] as:

$$L_i = \mathbb{E}_{x \sim p_x}\big[D_i(T_i(x))\big] - \mathbb{E}_{z \sim p_z}\big[D_i(T_i(G(z)))\big] - \lambda\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D_i(T_i(\hat{x})) \rVert_2 - 1\big)^2\Big] \qquad (5)$$

where $T_1(x) = x$, $T_2(x) = tx + (1 - t)G(z)$ and $t$ is a value drawn from the uniform distribution. A basic block diagram of the DoppelGANger (DG) is illustrated in Fig. 6. A more detailed structure with an explanation can be found in Fig. 7 of [36].

FIGURE 6. Simplified block diagram of DoppelGANger.

V. EVALUATION METRICS
Evaluating the quality of GAN-generated data is an open research problem. Unlike discriminative models, which can be evaluated with fairly robust metrics like accuracy, precision and the F1 score, among others, generative models have no such counterparts. In the case of images, visual inspection is relied on to determine whether the produced image is of good quality. This is not feasible with tabular forms of data. Another method is to indirectly evaluate the quality of the generated data by seeing how well it performs when substituted in place of real data in supervised tasks such as classification and forecasting.

The study in [21] provides a comprehensive survey of potential metrics that can be used to evaluate GANs. Most of our chosen metrics indirectly evaluate the quality of the generated data, either via use in forecasting models, qualitative assessments, or quantitative measures such as distance. Our criteria for choosing evaluation metrics are based on the following three principles:
• The metrics should favor the generated data that is most similar to the original data.
• The metrics should reward models that generate diverse examples and are not prone to common GAN problems such as overfitting, mode collapse, and mode drop.
• The metrics should be computationally inexpensive and as simple to interpret as possible.

A. QUALITATIVE ASSESSMENT
Perhaps the simplest way to determine how similar our data is to the original, true data is to simply visualize it. While this approach does leave out hard numbers, it gives us a quick, high-level view of the shape and spread of the generated data distribution. In this work, we analyze histograms and auto-correlation function (ACF) plots for this purpose. The histograms allow us to judge whether the GANs produce data that faithfully captures the range of values present in the original data as well as its distribution. The ACF plots are based on calculating the correlation of a time-series with itself at different, equidistant points in time (referred to as lags). In our case, we choose lags of up to 168, since we have hourly data over multiple weeks and expect network activity patterns to repeat. In general, our aim is to see how similar the ACF plot and histograms of the generated data are to those of the real data.
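Both diagnostics are straightforward to produce; a minimal sketch using statsmodels and matplotlib (the series names are placeholders for a real and a generated series) is:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

def compare_plots(real_series, synthetic_series, lags=168):
    """Histograms and ACF plots for a real and a generated series."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 6))
    axes[0, 0].hist(real_series, bins=50)        # value distribution, real
    axes[0, 0].set_title("Real")
    axes[0, 1].hist(synthetic_series, bins=50)   # value distribution, generated
    axes[0, 1].set_title("Generated")
    plot_acf(real_series, lags=lags, ax=axes[1, 0])        # one week of hourly lags
    plot_acf(synthetic_series, lags=lags, ax=axes[1, 1])
    fig.tight_layout()
    plt.show()
```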
B. KULLBACK-LEIBLER DIVERGENCE
The Kullback-Leibler divergence (KL divergence, or KLD) measures the number of extra bits needed to represent a true distribution P with a code written to represent a distribution Q which is an approximation of P. Thus, the KL divergence can be interpreted as the inefficiency caused by using the approximate distribution Q rather than P. Note that this does not mean that KLD is a distance measure; it is not, since it is asymmetric and does not obey the triangle inequality. While the use of KL divergence is uncommon in comparing time-series data, we use it to evaluate the distributions of the two data sets rather than their relationship in time. This can be done by calculating the KLD after discretizing the continuous time-series and using the bin counts to create probability distributions. The mathematical form of the KLD is shown below:

$$D_{KL}(P \,\|\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}. \qquad (6)$$

P(x) represents our true distribution (the real data), whereas Q(x) represents the approximate distribution (the generated data). Since the metric is asymmetric, the position of the two series in the formula is important, and interchanging their positions changes the result.
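The discretization described above can be implemented directly with numpy; the number of bins is an arbitrary choice, and the small constant merely guards against empty bins:

```python
import numpy as np

def kl_divergence(real, synthetic, bins=50):
    """Eq. (6) over shared histogram bins; `real` plays P, `synthetic` plays Q."""
    edges = np.histogram_bin_edges(np.concatenate([real, synthetic]), bins=bins)
    p, _ = np.histogram(real, bins=edges)
    q, _ = np.histogram(synthetic, bins=edges)
    p = (p + 1e-9) / (p + 1e-9).sum()   # smooth empty bins, then normalize
    q = (q + 1e-9) / (q + 1e-9).sum()
    return float(np.sum(p * np.log(p / q)))
```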
C. DYNAMIC TIME WARPING (DTW)
First introduced in 1978, Dynamic Time Warping (DTW) is a class of algorithms that can be used to compare two ordered sequences against each other [37]. These could be speech sequences, music or any other time-ordered sequences. It was originally used in spoken word recognition, since it can align the time axes of two sequences, say the words now and noow, and calculate the Euclidean distance between them. This is opposed to directly applying a distance metric, which would give a large value in the comparison of any two such sequences, even though they represent the same content. In this work, we employ an R implementation of the algorithm [38].
FIGURE 7. Aligning two dummy series in DTW [38].

In mathematical terms, DTW finds a warping function $\phi(k)$, described in [38] as:

$$\phi(k) = \big(\phi_x(k), \phi_y(k)\big), \qquad (7)$$

where $\phi_x$ and $\phi_y$ remap the time indexes of the reference series $x$ and the test series $y$. Given these warping functions, we find the average distance between the warped $x$ and $y$ series. The aim of the algorithm is to align the two series in such a way as to reduce the distance between them as much as possible. Thus, the optimization problem is given in [38] as:

$$D(x, y) = \min_\phi d_\phi(x, y). \qquad (8)$$

The left-over distance is the inherent difference between the two sequences. Fig. 7 illustrates how the DTW algorithm aligns two time-series sequences.
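Although we use the R package of [38], its authors also maintain a Python port, dtw-python, with an essentially equivalent interface; assuming that port, the comparison reduces to a few lines:

```python
import numpy as np
from dtw import dtw  # dtw-python, a port of the R implementation used in [38]

def dtw_score(real_series, synthetic_series):
    """DTW distance between a reference and a test series, as in Eq. (8)."""
    x = np.asarray(real_series, dtype=float)
    y = np.asarray(synthetic_series, dtype=float)
    alignment = dtw(x, y, keep_internals=True)
    return alignment.distance, alignment.normalizedDistance
```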
D. TRAIN SYNTHETIC TEST REAL (TSTR) AND DATA AUGMENTATION
TSTR is an indirect evaluation technique in which a predictive model is trained on synthetic data and verified on real data: the data set generated by a GAN is used to train a model, which is then tested on examples from the real data set. The technique was proposed in [15]. We use it to evaluate the telecommunications data generated by our selected models using a simple gradient boosting regressor. First, we partition the original data set into a train and a test set. The model is trained on the training set and tested on the held-out test set. This process is Train on Real, Test on Real (TRTR). The test set produced during TRTR is then passed to the same model trained on synthetic data. The model's performance is assessed using the mean absolute percentage error (MAPE), which has the following mathematical representation:

$$\mathrm{MAPE}(y, \hat{y}) = \frac{100}{n} \sum_{i=1}^{n} \frac{\lvert y_i - \hat{y}_i \rvert}{y_i}. \qquad (9)$$

Here, $y$ and $\hat{y}$ represent the true and forecasted data points, respectively.

Additionally, we also augment a small amount of real data with different amounts of synthetic data to study any improvements in forecasting accuracy. The total data is fixed at 5 weeks, but the number of real and synthetic weeks in the data changes. The forecasted values are then compared against the real values to calculate the MAPE. Following this TSTR pipeline, illustrated in Fig. 9, allows us to compare how much prediction accuracy is affected by replacing the real data with synthetic data in a downstream application such as network load forecasting.

FIGURE 9. Block diagram of the TSTR process.
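As an illustration, the TSTR evaluation reduces to a few lines with scikit-learn; the feature matrices are assumed to be built from the tabularized series as described later in Section VI-A, and the hyper-parameters are the ones reported there:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def mape(y, y_hat):
    """Eq. (9), in percent."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 100.0 * np.mean(np.abs(y - y_hat) / y)

def tstr(X_synth, y_synth, X_test, y_test):
    """Train on synthetic features/targets, evaluate on the real test week."""
    model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.05,
                                      max_depth=12)
    model.fit(X_synth, y_synth)
    return mape(y_test, model.predict(X_test))
```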
E. IMPUTATION UTILITY
In machine learning problems, missing data poses a serious threat to the learning ability of the model. It can introduce bias, make data handling onerous, and reduce efficiency. Missing data is often filled in, or imputed, using a variety of statistical and ML-based techniques. These include: i) carrying the last available reading forward, ii) filling with mean values, iii) imputation via K-nearest neighbours, and iv) interpolation.

In this work, to introduce scarcity, we delete data from the sequence randomly. After we have obtained our GAN-generated data, we impute the missing data using the corresponding sample points from our synthetic data. Similarly, we fill the same points via quadratic interpolation. Finally, we pass the sequence with missing values, the sequence with imputed GAN-generated values, and the sequence with interpolated values to the same model and assess how the forecasting accuracy is affected. Fig. 8 shows the imputation utility methodology. The %x in the figure denotes the amount of data removed, which takes on the values of 20, 40, 60, 80 and 90 percent.

FIGURE 8. Block diagram of the imputation utility process.
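A sketch of this procedure is given below; we assume integer-indexed pandas Series so that the synthetic series aligns position-by-position with the real one:

```python
import numpy as np
import pandas as pd

def make_imputed_variants(real, synthetic, frac, seed=0):
    """Randomly delete `frac` of `real`, then fill the holes two ways."""
    rng = np.random.default_rng(seed)
    holes = rng.choice(len(real), size=int(frac * len(real)), replace=False)
    damaged = real.copy()
    damaged.iloc[holes] = np.nan
    gan_imputed = damaged.fillna(synthetic)                 # GAN-generated values
    interpolated = damaged.interpolate(method="quadratic")  # interpolation baseline
    return damaged, gan_imputed, interpolated
```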
VI. RESULTS AND DISCUSSION
A. PRE-PROCESSING & SETUP
Figures 10 and 11 show our data generation pipeline and how real and generated data are compared with our comparative methods, respectively.

FIGURE 10. Block diagram depicting our data division and generation process.

FIGURE 11. Block diagram depicting our comparative methods and outlining which techniques use what amount of data.

We break each region's data into four segments and one test week, the details of which are given in Table 1. The test week in Table 1 is the week we forecast for. For conciseness, we report the qualitative assessment, KL divergence and Dynamic Time Warping for 3, 5 and 7 weeks of data only.

TABLE 1. Data organization scheme. 'Weeks' represents the number of weeks of data that is being used, 'Timespan' shows the date range, and 'Length' is the number of recordings in the data.

Weeks     | Timespan                | Length
1         | 11/4/2013 - 11/10/2013  | 168
3         | 11/4/2013 - 11/24/2013  | 504
5         | 11/4/2013 - 12/08/2013  | 840
7         | 11/4/2013 - 12/22/2013  | 1176
test week | 12/9/2013 - 12/15/2013  | 168

For the Train Synthetic Test Real (TSTR), augmentation and imputation assessments, a forecasting model is trained on 1, 3 and 5 weeks of data only and then used to forecast over the test week, which is chronologically the 6th week in the time-series. Using 7 weeks of data as well would require making the test week the 8th week. The problem is that the 8th week lies in the Christmas season and its readings are radically different from our training data. Thus, using 7 weeks of data is not appropriate in these assessments.

To perform the forecasting, we avoid statistical models like the autoregressive integrated moving average (ARIMA) or error, trend and seasonality (ETS) models, since they require significant tuning with domain knowledge of the data set. LSTMs and other deep learning-based architectures are also not employed, owing to the lack of training data. Instead, a simple gradient boosting regressor provided by scikit-learn [39] is used in this work; it is possible to use tree-based models for this purpose, as demonstrated in [40]. In order to train the model, we extract several features of the time-series by hand; these are briefly explained below:
• T-1: the time-series at lag 1.
• Hours: the hour at which the sample is taken.
• 1st difference: the difference between consecutive time-series values.
• 2nd difference: the difference between consecutive values of the 1st difference.

Thus, by using the features above as inputs and training to predict the difference, we are able to 'tabularize' the time-series data and use it to train a decision tree-based algorithm to make forecasts. The model uses 100 estimators, a 0.05 learning rate, and a maximum tree depth of 12.
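The sketch below is one plausible reading of this feature construction (the exact arrangement is ours); all inputs are lagged so that only past information enters each prediction:

```python
import pandas as pd

def make_features(series):
    """Tabularize an hourly pandas Series with a DatetimeIndex."""
    df = pd.DataFrame(index=series.index)
    df["t_minus_1"] = series.shift(1)            # T-1: previous reading
    df["hour"] = series.index.hour               # hour of the sample
    df["diff1"] = series.diff().shift(1)         # last 1st difference
    df["diff2"] = series.diff().diff().shift(1)  # last 2nd difference
    df["target"] = series.diff()                 # difference to be predicted
    return df.dropna()
```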
B. MODEL TRAINING
1) DoppelGANger
Detailed instructions on how to prepare data for use with the DoppelGANger can be found in the GitHub repository of [36]. In this section, we only share the details relevant to reproducing our results.

The DoppelGANger is designed to work with time-series that also have corresponding static features/metadata (like ISP provider, area, etc.). However, the data sets we are working with are univariate time-series with no metadata. To solve this, we create a dummy metadata variable based on the week of the year corresponding to our data. Since our data has hourly resolution, there are 168 readings in each week. This dummy metadata is then scaled between 0 and 1. Note that it is also possible to use a constant value in place of the metadata.

Table 2 illustrates how the data should be structured when using three weeks of data; the same scheme extends to five and seven weeks. Table 3 lists the hyper-parameters used in our work. Any hyper-parameters not mentioned here used their default values.

TABLE 2. An example of the required structure for three weeks of data before it is passed to the DoppelGANger.

Week of year | 0   | 1   | 2   | 3   | ... | 168
45           | 100 | 150 | 200 | 250 | ... | 400
46           | 10  | 15  | 20  | 25  | ... | 40
47           | 1   | 5   | 2   | 4   | ... | 6

TABLE 3. Hyper-parameters used for each data set for the DoppelGANger.

Hyper-parameter   | Bocconi | Navigli | Duomo
epochs            | 10,000  | 2,000   | 2,000
aux discriminator | False   | False   | False
self-norm         | False   | False   | False
sequence length   | 24      | 12      | 12
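For concreteness, a small helper that arranges an hourly series into the Table 2 layout (one 168-reading row per week, plus the scaled week-of-year attribute) might look as follows; the function and variable names are ours, not part of the DoppelGANger code base:

```python
import numpy as np

def to_dg_format(hourly, n_weeks):
    """Arrange an hourly pandas Series (DatetimeIndex) as in Table 2."""
    values = hourly.to_numpy()[: n_weeks * 168].reshape(n_weeks, 168)
    weeks = hourly.index.isocalendar().week.to_numpy(dtype=float)[::168][:n_weeks]
    attrs = (weeks - weeks.min()) / max(weeks.max() - weeks.min(), 1.0)  # scale to [0, 1]
    # features: (samples, 168, 1); dummy metadata: (samples, 1)
    return values[:, :, None], attrs[:, None]
```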
2) TimeGAN
As in the case of the DoppelGANger, instructions on how to use the TimeGAN can be found in the GitHub repository corresponding to [35]. The code for the TimeGAN does not provide a way to get the generated time-series with timestamps and corresponding values. Therefore, certain modifications are required before it can be used. These changes are listed below:
• The input data cannot contain timestamps. Thus, in order to recreate timestamps for the generated data, pass 'day of year' and 'hour' as features along with the internet connections value. If the data spans multiple years, then a 'year' feature should also be included.
• The TimeGAN outputs generated data in normalized form. Thus, the 'MinMaxScaler' function in the code needs to be modified to return the maximum value and the difference between the minimum and maximum values. These values can then be used to denormalize the generated data.
• The generated data is in the form of a 3D array with the shape (number of samples, sequence length, number of features). In order to recreate the generated data in the form of the original data, iterate through each sample, denormalize it, and then append all the samples together.

We observed that the same parameters worked well for all three data sets in the case of the TimeGAN. Please note that the hyper-parameter 'sequence length' in Table 3 is not the same as 'sequence length' in Table 4.

TABLE 4. Hyper-parameters used for the TimeGAN.

Hyper-parameter   | Value
sequence length   | 10,000
module            | GRU
hidden dimensions | 28
number of layers  | 3
iterations        | 50,000
batch size        | 64
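The last two modifications amount to the post-processing sketched below; we assume the repository's scaler maps data to [0, 1], and the names here are our own:

```python
import numpy as np

def restore(generated, data_min, data_max):
    """Undo [0, 1] min-max scaling and stitch samples into one long series.

    `generated` has shape (num_samples, seq_len, num_features), as returned
    by the TimeGAN code; `data_min`/`data_max` come from the modified scaler.
    """
    restored = generated * (data_max - data_min) + data_min
    return restored.reshape(-1, restored.shape[-1])  # append samples end-to-end
```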
3) PAR
PAR (Probabilistic Auto-Regressive) is part of the Synthetic Data Vault [31]. PAR is the easiest model to train of the three used in this work. The input data is simply the internet usage values against the corresponding timestamps. Table 5 lists the hyper-parameter values we used.

TABLE 5. Hyper-parameters used for PAR.

Hyper-parameter | Value
epochs          | 1,000
segment size    | None

We generate one sample from the trained PAR model. This generated data is identical in format to the input data, but populated with synthetic values. More information on how PAR works can be found in the documentation in SDV's GitHub repository.
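For reference, training and sampling with PAR then takes only a few lines. This sketch follows the sdv.timeseries interface of the SDV 0.x releases available at the time of writing; later SDV versions have reorganized this API:

```python
from sdv.timeseries import PAR

# train_df: a DataFrame with a 'timestamp' column and the internet usage values.
model = PAR(sequence_index="timestamp", epochs=1000, segment_size=None)
model.fit(train_df)
synthetic = model.sample(1)  # one synthetic sequence shaped like the input
```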

C. QUALITATIVE ANALYSIS
1) Histograms
For Bocconi, a quick look at Fig. 12 shows the performance disparity between the three models. It is clear that both DoppelGANger (DG) and TimeGAN (tgan) perform well in terms of capturing the full range of values present in the original data. However, a closer look shows that the TimeGAN struggles to capture outlier values, such as the values greater than 700. We see that the DoppelGANger does a slightly better job at capturing less frequent values. The PAR model misses large parts of the original distribution, and even generates negative values at 7 weeks of data.

These observations are even more marked in a similar analysis of the Navigli area, shown in Fig. 14. We observe an increasing overlap between the two distributions as the training data is increased. PAR performs better here than in Bocconi: although it still does not capture the multi-modal distribution properly, it does not output any negative values either.

In Duomo (Fig. 16), both DoppelGANger and TimeGAN perform very well. PAR performs decently when trained and compared with three weeks of data, but its performance quickly deteriorates: on five weeks of data it produces negative values, and at seven weeks of data it underestimates the right skew of the original distribution.

In short, the data generated by both TimeGAN and DoppelGANger stays within the range of the original input data, whereas PAR's performance fluctuates significantly and its generated data contains negative values that were not present, or even possible, in the input data.

2) ACF Plots
While the histograms give an idea about the distribution of the values of a time-series, they do not provide any information on how well the generated series captures the temporal correlation of the original series. For this, we create a series of auto-correlation function (ACF) plots in a similar fashion to the histograms. Fig. 13 shows the ACF plots for the Bocconi region. The plots depict slightly better performance of the DoppelGANger than the TimeGAN in capturing the extreme correlation peaks. PAR, however, fails to capture the temporal dependence within the series.

The same idea is repeated for the Navigli district in Fig. 15, where the performance of the TimeGAN deteriorates compared to that of the DoppelGANger, while PAR again struggles to capture the temporal dynamics of the input time-series. We observe an improvement in how well the DoppelGANger captures the negative correlation peaks as we increase the training data, but the opposite applies in the case of the TimeGAN. Compare that to Fig. 17, which shows the ACF plots for Duomo, where the plots for DoppelGANger and TimeGAN are nearly identical. Thus, it seems that the models struggle to capture the correlations of a more chaotic time-series like Navigli, but easily reproduce the correlations in places like Duomo and Bocconi, which have more regularly repeating activity patterns.

D. QUANTITATIVE ANALYSIS
Now that we have a cursory idea of the quality of the generated data, we quantify its usefulness through some direct and indirect metrics, which have already been explained in the preceding sections.

1) Dynamic Time Warping
The DTW results in Table 6 show several interesting things and corroborate some of the observations made in the prior section. In all three regions, PAR's generated series are extremely different from the ground truth. Looking at Bocconi, we see that the DTW score increases as the generated series get longer. This may be explained by Fig. 18, where we see that the real data decreases in amplitude towards the end. This trend is not captured by the DoppelGANger, but is somewhat reflected in the TimeGAN's output, leading to an improved DTW score for the TimeGAN on 7 weeks of data and the opposite for the DoppelGANger. While not shown here, the same explanation also applies to Navigli. Duomo, on the other hand, features no such long-term downtrend, which is why we see a decrease in DTW scores as the amount of available data is increased.

2) KL Divergence
The KL divergence calculations for all three regions are shown in Table 7. There is no significant pattern to be seen, other than that the differences for DoppelGANger and TimeGAN tend to stay consistent, whereas PAR-generated data yields the largest values and demonstrates the most volatility. In general, from an information theory perspective, we would not need a great many more bits to represent the generated distribution in place of the real distribution.

3) Train Synthetic Test Real (TSTR)
Looking at Table 8, we can make two observations. The first can be seen in the results for Bocconi, where increasing the amount of data leads to negligible changes in accuracy for both DoppelGANger and TimeGAN. PAR goes against this trend, but we have already seen above that PAR's generated data is inconsistent, so this is understandable. In any case, compared to the TRTR values, we see roughly 5% and 9% error differences for models trained on five weeks of data generated by the DoppelGANger and the TimeGAN, respectively. This shows that for forecasting a series such as Bocconi's, which has weekly and daily seasonality, the generated data is somewhat less useful. The advantage of the DoppelGANger over the TimeGAN arises because the TimeGAN cannot predict the troughs on the weekends well, largely because it fails to capture them during generation as well.

The situation in Navigli and Duomo is somewhat better. In the former, the accuracy for DoppelGANger and TimeGAN improves as we increase the amount of generated data. A plot of the real vs. predicted values for the TimeGAN is given in Fig. 19. These accuracy values are also close to the accuracy obtained with the real data, indicating that both models can produce synthetic data comparable to the true data. In Duomo, this trend only applies to the DoppelGANger, with an error difference of less than or equal to 3% between it and the true data. Overall, the DoppelGANger generates significantly better quality data than the TimeGAN or PAR across all three regions.

4) Performance with Augmented Data Set
Augmenting the real data with synthetic data yields the results shown in Table 9. There are no across-the-board improvements, and the improvements observed are minimal. For instance, in Bocconi, adding 4 weeks of DoppelGANger synthetic data to 1 week of real data does improve performance over just using 1 week of real data, as shown in Table 9, but the improvement is only 0.6%. In Navigli, adding two weeks from either PAR, DoppelGANger or TimeGAN to three weeks of real data yields an improvement, but as before this improvement is small: less than 0.5%. In Duomo, we see no improvement in any combination of real and synthetic data at all.
FIGURE 12. (a), (b) and (c) show the distribution of values in the generated data when the models were trained on 3, 5 and 7 weeks of real data, respectively, for Bocconi. Note that DG and tgan are short for DoppelGANger and TimeGAN, respectively.

FIGURE 13. (a), (b) and (c) show the ACF plots of real vs generated data when the models were trained on 3, 5 and 7 weeks of real data, respectively, for Bocconi. Note that DG and tgan are short for DoppelGANger and TimeGAN, respectively.
FIGURE 14. (a), (b) and (c) show the distribution of values in the generated data when the models were trained on 3, 5 and 7 weeks of real data, respectively, for Navigli. Note that DG and tgan are short for DoppelGANger and TimeGAN, respectively.

FIGURE 15. (a), (b) and (c) show the ACF plots of real vs generated data when the models were trained on 3, 5 and 7 weeks of real data, respectively, for Navigli. Note that DG and tgan are short for DoppelGANger and TimeGAN, respectively.
FIGURE 16. (a), (b) and (c) show the distribution of values in the generated data when the models were trained on 3, 5 and 7 weeks of real data, respectively, for Duomo. Note that DG and tgan are short for DoppelGANger and TimeGAN, respectively.

FIGURE 17. (a), (b) and (c) show the ACF plots of real vs generated data when the models were trained on 3, 5 and 7 weeks of real data, respectively, for Duomo. Note that DG and tgan are short for DoppelGANger and TimeGAN, respectively.
Page 15 of 65 IEEE Access
M.H Naveed et al.: Assessing Deep Generative Models on Time Series Network Data

1
2 TABLE 6. DTW - based similarity scores (lower the better) for Bocconi, Navigli and Duomo. ’Weeks’ indicates the amount of data used in the calculations in terms
3 of number of weeks.

4 Bocconi Navigli Duomo


5 Weeks DoppelGANger PAR TimeGAN DoppelGANger PAR TimeGAN DoppelGANger PAR TimeGAN
6 3 19.28 260 30.65 35.2 67.93 32.37 42.38 95.74 47.52
7 5 21.89 67 36.7 31.58 53.04 34 38.73 459.25 48.4
7 25.28 238.54 31.47 40.19 54.91 34.26 36.14 109.4 36.8
8
9
TABLE 7. KL - Divergence (lower the better) for Bocconi, Navigli and Duomo. ’Weeks’ indicates the amount of data used in the calculations in terms of number of
10 weeks.
11
12 Bocconi Navigli Duomo
13 Weeks DoppelGANger PAR TimeGAN DoppelGANger PAR TimeGAN DoppelGANger PAR TimeGAN
3 0.01 0.08 0.02 0.006 0.12 0.009 0.02 0.1 0.049
14 5 0.01 0.13 0.032 0.014 0.10 0.004 0.05 0.15 0.06
15 7 0.03 0.22 0.034 0.034 0.04 0.017 0.05 0.24 0.05
16
17 TABLE 8. Train Synthetic Test Real (TSTR) and Train Real Test Real (TRTR) prediction accuracies (in MAPE) Bocconi, Navigli and Duomo. ’Weeks’ indicates the
18 amount of data used in the calculations in terms of number of weeks. Note that DG and tgan are short for DoppelGANger and TimeGAN respectively.
19
20 Bocconi Navigli Duomo
Weeks DG tgan PAR TRTR DG tgan PAR TRTR DG tgan PAR TRTR
21 1 12.21 15.72 28.16 10.48 7.6 10.33 15.46 7.34 12.5 13.2 26.5 9.34
22 3 12.25 16.54 26.66 8.47 7.51 7.89 24.58 7.20 11.08 12.4 22.1 7.68
23 5 12.59 16.21 20.7 7.26 7.24 8.1 16.54 6.84 10.11 18.3 30.8 7.14
24
25 TABLE 9. Augmented data performance for Bocconi, Navigli and Duomo (For reference, TRTR for 5 weeks’ real data is 7.26%, 6.84% and 7.14% for Bocconi,
Navigli and Duomo respectively). All values are in MAPE. ’R + S (Weeks)’ means Real + Synthetic number of weeks, so ’1 + 4’ means 1 real and 4 synthetic weeks
26 of data.
27
28 Bocconi Navigli Duomo
29 R + S (Weeks) DoppelGANger TimeGAN PAR DoppelGANger TimeGAN PAR DoppelGANger TimeGAN PAR
1+4 9.88 15.04 13.45 7.41 7.67 8.5 9.6 10.9 13.85
30 4+1 7.61 8.46 7.73 6.76 7.07 6.78 8.5 7.63 8.59
31 2+3 8.84 11.27 10.4 7.42 7.43 7.66 8.8 9 11.7
32 3+2 7.88 7.70 8.83 6.93 6.53 6.37 7.9 8.8 7.86
33
34
35 data at all.
36 The above results show that combining synthetic data with
37 real data can yield performance benefits in a time-series
38 forecasting framework. These improvements are minor, but
39 that can be expected given that we are working with scarce
40 data.
41
5) Imputation Utility
Tables 10, 11 and 12 display the prediction errors for five models trained on mixed data from all three generative models, as well as on interpolated data. For comparison, we also train a model on the smaller data set created by randomly removing observations. We see that the performance of all models deteriorates as we use them to fill larger gaps in the real data set. However, the DoppelGANger's performance appears the most stable, followed by the TimeGAN, although the latter exhibits higher errors overall. We also observe that the model trained on PAR-imputed data performs worse than a model trained on true, unimputed data, but its performance suddenly improves when the original data is reduced by 90%. PAR's uncharacteristically good performance in this case is probably due to our model learning some spurious correlations that happen to allow it to predict well on the test data despite the poor training data.

FIGURE 18. Generated data vs. real data, Bocconi, shown over 7 weeks for all three models.
TABLE 10. Imputing missing data via interpolation and with generated data for Bocconi University. All values are in MAPE. The total amount of data used is five weeks, or 840 hourly readings. 'Missing' denotes the model trained on the reduced data set with no imputation at all.

% data removed   DoppelGANger   TimeGAN   PAR     Missing   Interpolated
20               7.22           11.25     14.61   10.62     7.72
40               8.23           14.25     19.84   16.83     8.36
60               9.99           15.45     24.24   31.67     10.34
80               11.51          14.28     21.71   53.91     13.23
90               11.35          14.98     15.13   54.04     13.93

TABLE 11. Imputing missing data via interpolation and with generated data for Navigli District. All values are in MAPE. The total amount of data used is five weeks, or 840 hourly readings.

% data removed   DoppelGANger   TimeGAN   PAR     Missing   Interpolated
20               6.88           6.54      9.91    8.42      7.08
40               6.36           7.47      18.71   9.46      6.9
60               7.02           7.94      21.66   16.08     7.76
80               6.94           8.74      22.83   27.65     11.57
90               7.24           7.8       14.85   27.74     11.31

TABLE 12. Imputing missing data via interpolation and with generated data for Duomo Cathedral. All values are in MAPE. The total amount of data used is five weeks, or 840 hourly readings.

% data removed   DoppelGANger   TimeGAN   PAR     Missing   Interpolated
20               9.68           10.8      64.47   10.83     7.12
40               11.23          14.7      84.37   16.09     8.41
60               12.94          22.33     71.15   29.37     8.54
80               12.4           20.83     36.65   72.56     10.26
90               12.13          22.36     30.24   91.13     10.65
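A minimal sketch of the experiment behind Tables 10 through 12, under assumed inputs (a pandas Series of 840 real hourly readings and an aligned generated series from one of the three models); the random-gap mechanism and interpolation call are illustrative, not the paper's exact code:

    import numpy as np
    import pandas as pd

    def impute(real, synthetic, frac_removed, method="gan", seed=0):
        # Randomly blank out a fraction of the hourly readings, then fill.
        rng = np.random.default_rng(seed)
        gaps = rng.choice(len(real), size=int(frac_removed * len(real)), replace=False)
        damaged = real.copy()
        damaged.iloc[gaps] = np.nan
        if method == "gan":            # fill the gaps with generated values
            damaged.iloc[gaps] = synthetic.iloc[gaps].to_numpy()
            return damaged
        if method == "interpolate":    # quadratic interpolation baseline (needs SciPy)
            return damaged.interpolate(method="quadratic", limit_direction="both")
        return damaged                 # "missing": leave the gaps unfilled

Each imputed series is then fed to the same gradient boosting forecaster and scored against the real test week, as in the TSTR sketch above.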
The Navigli district recordings show similar trends, except that the TimeGAN and DoppelGANger perform equally well, and the gap between the GAN-imputed data sets and the interpolated data is wider, at around 4% in favor of the DoppelGANger and TimeGAN.
This is not the case in Duomo, where the DoppelGANger is the clear winner. However, simply interpolating over the missing data with a common technique like quadratic interpolation seems to impute the data just as well. The interpolated model's performance deteriorates at the 80% and 90% missing-data levels, where the DoppelGANger-imputed data yields a roughly 2.5% accuracy gain in Bocconi, a 4% gain in Navigli, and a 2% degradation in Duomo. These results emphasize that GAN-based models only start yielding true benefits at high sparsity levels, where interpolation schemes fall off.

FIGURE 19. True vs. predicted values, Navigli, with the forecaster trained on 5 weeks of TimeGAN data.

The key takeaway here is that the DoppelGANger's performance across different types of time-series data is the most reliable, since using it to impute random gaps in the true data yields the highest accuracy, in some cases outperforming the model trained entirely on true data (when 40% of the data is removed in Navigli). The TimeGAN performs well for Navigli, but does relatively poorly in Bocconi and Duomo. Finally, PAR's output is more or less random, and it only captures the value distribution of the original data sets reasonably well.

VII. CONCLUSION
In this work, we compared three publicly available time-series generative models against each other using an actual mobile network data set. Two methods are based on the GAN architecture, while one is a deep learning based auto-regressive model. We see that the GAN based architectures are superior to the auto-regressive approach across an array of numerical and graphical measures.
We used the generated data to train a supervised machine learning algorithm and assessed its performance on unseen real data. These experiments revealed that models trained on data generated by the GAN-based DoppelGANger and TimeGAN were competitive with a model trained on true data, with the DoppelGANger performing best across all three regions. Our simulations also revealed that increasing the training data did not always prove beneficial but in some cases degraded the generative model's performance, and that some models, like the DoppelGANger, perform very well even on relatively little data. We then saw that augmenting small amounts of real data with comparatively large amounts of synthetic data yielded minor performance improvements. Finally, we saw that GAN-generated values were a good substitute for real data when imputing missing values in a time-series, although interpolation techniques can in most cases perform just as well, except when the number of missing values is quite large.

While this is the first in-depth study comparing the latest deep learning based generative models on a time series telecommunications data set, future extensions of the work may involve replicating it with a much larger data set, as well as with multivariate time-series data. Another direction could be to work with tabular data, which would entail using different GAN models as well as different evaluation metrics. Finally, the DoppelGANger as well as PAR are capable of reproducing time-series data with corresponding context/metadata information, so it may be worthwhile to see if their performance differs in that environment.

REFERENCES
[1] Ali Imran, Ahmed Zoha, and Adnan Abu-Dayya. Challenges in 5G: how to empower SON with big data for enabling 5G. IEEE Network, 28(6):27-33, 2014.
[2] Jessica Moysen and Lorenza Giupponi. From 4G to 5G: Self-organized network management meets machine learning. Computer Communications, 129:248-268, 2018.
[3] Alireza Ghasempour. Internet of things in smart grid: Architecture, applications, services, key technologies, and challenges. Inventions, 4(1), 2019.
[4] Gordon J. Sutton, Jie Zeng, Ren Ping Liu, Wei Ni, Diep N. Nguyen, Beeshanga A. Jayawickrama, Xiaojing Huang, Mehran Abolhasan, Zhang Zhang, Eryk Dutkiewicz, and Tiejun Lv. Enabling technologies for ultra-reliable and low latency communications: From PHY and MAC layer perspectives. IEEE Communications Surveys & Tutorials, 21(3):2488-2524, 2019.
[5] Umair Sajid Hashmi, Arsalan Darbandi, and Ali Imran. Enabling proactive self-healing by data mining network failure logs. In 2017 International Conference on Computing, Networking and Communications (ICNC), pages 511-517, 2017.
[6] Umair Sajid Hashmi, Ashok Rudrapatna, Zhengxue Zhao, Marek Rozwadowski, Joseph Kang, Raj Wuppalapati, and Ali Imran. Towards real-time user QoE assessment via machine learning on LTE network data. In 2019 IEEE 90th Vehicular Technology Conference (VTC2019-Fall), pages 1-7, 2019.
[7] Mostafa Ibrahim, Umair Sajid Hashmi, Muhammad Nabeel, Ali Imran, and Sabit Ekin. Embracing complexity: Agent-based modeling for HetNets design and optimization via concurrent reinforcement learning algorithms. IEEE Transactions on Network and Service Management, 18(4):4042-4062, 2021.
[8] Gianni Barlacchi, Marco De Nadai, Roberto Larcher, Antonio Casella, Cristiana Chitic, Giovanni Torrisi, Fabrizio Antonelli, Alessandro Vespignani, Alex Pentland, and Bruno Lepri. A multi-source dataset of urban life in the city of Milan and the Province of Trentino. Scientific Data, 2, 2015.
[9] Deniz Gündüz, Paul de Kerret, Nicholas D. Sidiropoulos, David Gesbert, Chandra R. Murthy, and Mihaela van der Schaar. Machine learning in the air. IEEE Journal on Selected Areas in Communications, 37(10):2184-2199, 2019.
[10] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014.
[11] H. Ye, L. Liang, G. Y. Li, and B. Juang. Deep learning-based end-to-end wireless communication systems with conditional GANs as unknown channels. IEEE Transactions on Wireless Communications, 19(5):3133-3143, 2020.
[12] L. Sun, Y. Wang, A. L. Swindlehurst, and X. Tang. Generative-adversarial-network enabled signal detection for communication systems with unknown channel models. IEEE Journal on Selected Areas in Communications, 39(1):47-60, 2021.
[13] Ben Hughes, Shruti Bothe, Hasan Farooq, and Ali Imran. Generative adversarial learning for machine learning empowered self organizing 5G networks. In 2019 International Conference on Computing, Networking and Communications (ICNC), pages 282-286, 2019.
[14] Mohammad Navid Fekri, Ananda Mohon Ghosh, and Katarina Grolinger. Generating energy data for machine learning with recurrent generative adversarial networks. Energies, 13(1):1-23, 2019.
[15] Stephanie L. Hyland, Cristóbal Esteban, and Gunnar Rätsch. Real-valued (medical) time series generation with recurrent conditional GANs. arXiv, 2017.
[16] Olof Mogren. C-RNN-GAN: A continuous recurrent neural network with adversarial training. In Constructive Machine Learning Workshop (CML) at NIPS 2016, page 1, 2016.
[17] Changhee Han, Kohei Murao, Shin'ichi Satoh, and Hideki Nakayama. Learning more with less: GAN-based medical image augmentation. Medical Imaging Technology, 37(3):137-142, 2019.
[18] Maayan Frid-Adar, Idit Diamant, Eyal Klang, Michal Amitai, Jacob Goldberger, and Hayit Greenspan. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing, 321:321-331, 2018.
[19] Changhee Han, Leonardo Rundo, Ryosuke Araki, Yujiro Furukawa, Giancarlo Mauri, Hideki Nakayama, and Hideaki Hayashi. Infinite brain MR images: PGGAN-based data augmentation for tumor detection. Smart Innovation, Systems and Technologies, 151:291-303, 2020.
[20] Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, and Russ Webb. Learning from simulated and unsupervised images through adversarial training. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), pages 2242-2251, 2017.
[21] Ali Borji. Pros and cons of GAN evaluation measures. arXiv, 2018.
[22] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
[23] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.
[24] Muhammad Ferjad Naeem, Seong Joon Oh, Youngjung Uh, Yunjey Choi, and Jaejun Yoo. Reliable fidelity and diversity metrics for generative models. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 7176-7185. PMLR, 2020.
[25] Konstantin Shmelkov, Cordelia Schmid, and Karteek Alahari. How good is my GAN? In Proceedings of the European Conference on Computer Vision (ECCV), September 2018.
[26] NIST/SEMATECH. Introduction to time series analysis. e-Handbook of Statistical Methods, 2013.
[27] Lars Kegel, Martin Hahmann, and Wolfgang Lehner. Generating what-if scenarios for time series data. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management, SSDBM '17, New York, NY, USA, 2017. Association for Computing Machinery.
[28] Christoph Bergmeir, Rob J. Hyndman, and José M. Benítez. Bagging exponential smoothing methods using STL decomposition and Box-Cox transformation. International Journal of Forecasting, 32(2):303-312, 2016.
[29] Aditya Grover and Stefano Ermon. Autoregressive models, Nov 2019.
[30] Hugo Larochelle and Iain Murray. The neural autoregressive distribution estimator. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pages 29-37, Fort Lauderdale, FL, USA, 2011. PMLR.
[31] Neha Patki, Roy Wedge, and Kalyan Veeramachaneni. The Synthetic Data Vault. In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 399-410, 2016.
[32] Qingsong Wen, Liang Sun, Fan Yang, Xiaomin Song, Jingkun Gao, Xue Wang, and Huan Xu. Time series data augmentation for deep learning: A survey. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 4653-4660. International Joint Conferences on Artificial Intelligence Organization, 2021. Survey Track.
[33] John F. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48-49, 1950.
[34] Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. Modeling tabular data using conditional GAN. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
[35] Jinsung Yoon, Daniel Jarrett, and Mihaela van der Schaar. Time-series generative adversarial networks. Advances in Neural Information Processing Systems, 32 (NeurIPS):1-11, 2019.
[36] Zinan Lin, Alankar Jain, Chen Wang, Giulia Fanti, and Vyas Sekar. Using GANs for sharing networked time series data: Challenges, initial promise, and open questions. In Proceedings of the ACM Internet Measurement Conference, IMC '20, pages 464-483, New York, NY, USA, 2020. Association for Computing Machinery.
[37] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1):43-49, 1978.
[38] Toni Giorgino. Computing and visualizing dynamic time warping alignments in R: The dtw package. Journal of Statistical Software, 31, 2009.
[39] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[40] Amirhossein Ahmadi, Mojtaba Nabipour, Behnam Mohammadi-Ivatloo, Ali Moradi Amani, Seungmin Rho, and Md. Jalil Piran. Long-term wind power forecasting using tree-based learning algorithms. IEEE Access, 8:151511-151522, 2020.

MUHAMMAD HARIS NAVEED received his B.Sc. degree in Electrical Engineering from the School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Pakistan, in 2021. His research interests include the application of GANs to tabular and time-series network data and audio keyword spotting using ASR-free approaches.

UMAIR SAJID HASHMI (Member, IEEE) received the B.S. degree in electronics engineering from the GIK Institute of Engineering Sciences and Technology, Pakistan, in 2008, the M.Sc. degree in advanced distributed systems from the University of Leicester, U.K., in 2010, and the Ph.D. degree in electrical and computer engineering from the University of Oklahoma, OK, USA, in 2019. During his Ph.D., he worked as a Graduate Research Assistant with the AI4Networks Research Center. He also worked with AT&T, Atlanta, GA, USA, and Nokia Bell Labs, Murray Hill, NJ, USA, on multiple research internships and co-ops. Since Fall 2019, he has been serving as an Assistant Professor with the School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Pakistan, where he works in the broad area of 5G wireless networks and the application of artificial intelligence toward system-level performance optimization of wireless networks and health care applications. He is also affiliated with the University of Toronto as a postdoctoral research fellow. He has published about 20 technical papers in high impact journals and proceedings of IEEE flagship conferences on communications. He has been involved in four NSF funded projects on 5G self organizing networks, and is a Co-PI on an Erasmus+ consortium with a combined award worth over $4 million USD. Since 2020, he has been serving as a Review Editor for the IoT and Sensor Networks stream in Frontiers in Communications and Networks.

NAYAB TAJVED received her B.S. degree in Electrical Engineering from the National University of Sciences and Technology, Pakistan, in 2021. Her research work includes studies on tackling data scarcity problems faced in future big-data empowered cellular networks using analytical and machine learning tools.

NEHA SULTAN received her Bachelor of Science (B.Sc.) in Electrical Engineering from the National University of Sciences and Technology (NUST), Pakistan, in 2021. Her research interests include the application of AI-driven systems in wireless networks.
ALI IMRAN (Senior Member, IEEE) received the B.Sc. degree in electrical engineering from the University of Engineering and Technology Lahore, Pakistan, in 2005, and the M.Sc. degree (Hons.) in mobile and satellite communications and the Ph.D. degree from the University of Surrey, Guildford, U.K., in 2007 and 2011, respectively. He is a Presidential Associate Professor of ECE and the Founding Director of the Artificial Intelligence (AI) for Networks Research Center and TurboRAN Testbed for 5G and Beyond, University of Oklahoma. His research interests include AI and its applications in wireless networks and healthcare. His work on these topics has resulted in several patents and over 100 peer-reviewed articles, including some of the most influential papers in the domain of wireless network automation. On these topics, he has led numerous multinational projects, given invited talks, keynotes and tutorials at international forums, advised major public and private stakeholders, and cofounded multiple start-ups. He is an Associate Fellow of the Higher Education Academy, U.K. He is also a member of the Advisory Board of the Special Technical Community on Big Data, the IEEE Computer Society.
Original Manuscript ID: Access-2021-42131

Original Article Title: "Is Synthetic the New Real? Performance Analysis of Time Series Generation Techniques with Focus on Network Load Forecasting"

To: IEEE Access Editor

Re: Response to reviewers

Dear Editor,

Thank you for allowing a resubmission of our manuscript, with an opportunity to address the reviewers' comments.

We are uploading (a) our point-by-point response to the comments below (Response to Reviewers), (b) an updated manuscript with yellow highlighting indicating changes (Supplementary Material for Review), and (c) a clean updated manuscript without highlights (Main Manuscript).

A short summary of the changes implemented in the paper in accordance with the suggestions of the respected reviewers is given below.

Summary of Changes in the Manuscript:

• Reformatted figures and reworded their captions to improve readability. Added references suggested by the reviewers and restructured the text in some sections to improve clarity.
• Restructured the abstract to more clearly communicate our findings and contributions.
• Removed excess background information and references to decrease the length of the paper and make it more readable.
• Added a section that shares the parameters as well as the detailed methodology behind our work to aid reproducibility of our results.
• Performed the analysis on a new time-series from another region called Duomo and updated all sections with its results accordingly.

Point-by-point Responses:

In the point-by-point response below, the following scheme is used: (I) red represents the reviewers' comments, (II) our response is in black font, and (III) any changes made in the paper, or text taken directly from the paper, are shown in italic blue font. Note that not all revisions have been copied into this letter to avoid cluttering.

Best regards,
M.H Naveed et al.
Reviewer#1, Concern #1: All abbreviations should be defined in the first place, e.g., AR, VR, URLLC, IoT, etc. The same acronym in abstract and another part of the paper should be defined in both abstract and another place in the first place.

Author response: We are grateful to the reviewer for providing this suggestion.

Author action: We updated the manuscript by defining all abbreviations both in the abstract and when they initially appear in the main body of the paper.

Reviewer#1, Concern #2: To provide enough and more information about Internet of Things to readers, the following article is suggested to be used and cited in the 1st paragraph of introduction section, i.e., "applications such as IoT […], URLLC" and it should be added to reference section:
[…] "Internet of Things in Smart Grid: Architecture, Applications, Services, Key Technologies, and Challenges," Inventions journal, vol. 4, no. 1, pp. 1-12, 2019.

Author response: We read the reference suggested by the reviewer and decided that its addition improved our paper.

Author action: We updated the manuscript by adding the suggested reference in paragraph 1 of Section I (Introduction), page 1.

Reviewer#1, Concern #3: References should be provided for the equations which were borrowed from the literature.

Author response: We completely agree with the reviewer's suggestion.

Author action: The paper from which an equation was taken is now referenced before the said equation is presented in our paper. This applies to equations (2), (4), (5) in Section IV on pages 4-6 and equations (7), (8) in Section V on page 7. Equations related to the TimeGAN in Section IV, page 5 have been removed from the paper and are now referenced with respect to their number in the original TimeGAN paper. This change is shown below:

The losses used to train the Embedding, Recovery, Generator and Discriminator networks can be found as Eqs. (7), (8) and (9) in [35].

Reviewer#1, Concern #4: All figures and tables should be placed at the top of the page or at the bottom of the page but not between paragraphs, e.g., Fig. 5-8, Table 1.

Author response: We thank the reviewer for their invaluable observation.

Author action: All figures and tables are now positioned at the top of pages rather than between paragraphs.
Reviewer#1, Concern #5: Each part of Figures 10-14 should be labeled using a), b), c), and so on. Also, each part should have a separate explanation in its caption.

Author response: We agree with the reviewer's comment and believe it will help make the figures more understandable.

Author action: We have added the labels (a), (b) and (c) to each row in Figures 10-16. We have also explained each part in the corresponding caption of each figure.

Reviewer#1, Concern #6: The future work should be explained in conclusion section or separately after conclusion as section VIII.

Author response: We thank the reviewer for their constructive feedback on the structuring of the paper.

Author action: Possible future extensions of the work are now explained in paragraph 2 of Section VII, page 16. The paragraph is reproduced below:

While this is the first in-depth research on comparing the latest GAN models for a time series telecommunications data set, future extensions of the work may involve replicating it with a much larger data set, as well as with multivariate time-series data. Another direction could be to work with tabular data, which would entail using different GAN models as well as different evaluation metrics. Finally, the DoppelGANger as well as PAR are capable of reproducing time-series data with corresponding context/metadata information, so it could be worthwhile to see if their performance differs in that environment.

Reviewer#2, Concern #1: The author needs to restructure the abstract, it should be in line with contributions listed in the Introduction section.

Author response: We regret that our choice of words in the abstract did not clearly communicate our contributions.

Author action: We have restructured the abstract to reflect our contributions more clearly. The part of the abstract most pertinent to this is reproduced below:

In this paper, we choose two GAN-based models and one deep learning-based autoregressive model. We then compare their performance at generating synthetic time-series cellular traffic data. We also assess the impact of data scarcity on the generated data quality by varying the level of data available to the GANs for training. Moreover, in order to assess the usefulness of this generated data, we compare the forecasting performance of a gradient boosting regressor model trained solely on synthetic data, real data, and a mix of both.

This summarizes our main contributions, which are:

• Assess the performance of open-source GAN-based generative models vs a deep learning based autoregressive model.
• Determine how the availability of data impacts the performance of these models.
• Numerically evaluate the usefulness of generated data in downstream applications such as network load forecasting.

Reviewer#2, Concern #2: There are grammatical mistakes in the paper, in result analysis. line no 10, Hence, we train our forecasting model on 1, 3 and 5 weeks of data only and forecasted for a week from 12/09/2013 - 12/15/2013. Which week author consider for the experiment? Are these dates and weeks correct 12/09/2013 - 12/15/2013? Result analysis shows 1, 3 and 5 week, but in explanation author trace data to the 7th week as well.

Author response: We are thankful to the reviewer for pointing out this potentially confusing paragraph. 1, 3 and 5 weeks of data are used in training the forecasting model only. This is because our forecasting interval is chronologically the 6th week, so it is inappropriate to use 7 weeks of data to train a forecaster. However, the 7 weeks of generated data are used in other comparative metrics such as the DTW score, auto-correlation plots, histograms and KL-divergence.

Author action: We have added a paragraph to further elaborate on the data used in our analysis within Section VI on page 8. This added paragraph is presented as follows:

For conciseness, we report the qualitative assessment, KL-Divergence and Dynamic Time Warping of 3, 5 and 7 weeks of data only. For the Train Synthetic Test Real (TSTR), augmentation and imputation assessments, a forecasting model is trained on 1, 3 or 5 weeks of data only and then used to forecast over the test week, which is chronologically the 6th week in the time-series. If we were to use 7 weeks of data as well, that would require making the test week the 8th week. The problem is that the 8th week lies in the Christmas season and its readings are radically different from our training data. Thus, using 7 weeks of data is not appropriate in these assessments.

We have also added the test-week span and length to Table 1, which describes how the data is partitioned. The caption of Table 1 also offers more explanation now as well.

Reviewer#3, Concern #1: Please use a short yet informative title for the paper.

Author response: We agree with the reviewer that the original title was exceedingly long and too wordy.

Author action: We have changed the title of the manuscript from "Is Synthetic the New Real? Performance Analysis of Time Series Generation Techniques with Focus on Network Load Forecasting" to "Assessing Deep Generative Models on Time Series Network Data".
Reviewer#3, Concern #2: Please use correct abbreviations, GAN is abbreviated several times.

Author response: We appreciate the reviewer's observation and have taken measures to implement it.

Author action: Abbreviations are now defined once in the abstract and once when they are initially used in the main body. Any other excess abbreviations have been removed.

Reviewer#3, Concern #3: Please clarify we "traffic forecasting models" in the Abstract since it is misleading at the first glance!

Author response: We regret that the reviewer was inconvenienced by our choice of words and agree that our use of the term was not clear.

Author action: We have updated the abstract to be more explicit about what we are forecasting. The portion of the abstract most relevant to this suggestion is reproduced below:

Moreover, in order to assess the usefulness of this generated data, we compare the performance of a gradient boosting regressor model trained solely on generated data, real data, and a mix of both at forecasting network usage activity. We do not consider any privacy issues pertaining to the generated data in this work. Our experiments show that the GANs perform better than the autoregressive approach in each aspect considered in this work. Forecasting models trained to predict network load based on data generated by these GANs yield error rates comparable to models trained on real data.

Thus, it is now clearer that by 'traffic forecasting' we meant predicting network traffic.

Reviewer#3, Concern #4: Please replace section with Section at the end of the first part of the Introduction.

Author response: We regret that such a basic mistake occurred on our end.

Author action: The above amendment has been made and the manuscript has been checked for other such mistakes as well.

Reviewer#3, Concern #5: What is the main aim of the paper? Applying GANs, traffic forecasting or privacy issue?! What we are talking about sparsity or privacy?

Author response: We are thankful to the reviewer for sharing their concerns.

Author action: The main aim of the paper is to compare three different generative models, then determine how the availability of data affects their performance, and finally see how the generated data can be used in a network load forecasting task. We do not consider the privacy issue particularly in this work; rather, we conduct a thorough analysis of the performance of generative techniques in producing time-series data, along with a study of the impact of data scarcity, which is often created by virtue of privacy policies, on the efficiency of these deep generative techniques. This point has now been clarified within the abstract, the relevant part of which is reproduced below:
In this paper, we choose two publicly available GAN-based models and one deep learning-based autoregressive model. We then compare their performance at generating synthetic time-series cellular traffic data. We also assess the impact of data scarcity on the generated data quality by varying the level of data available to the GANs for training. Moreover, in order to assess the usefulness of this generated data, we compare the performance of a gradient boosting regressor model trained solely on generated data, real data, and a mix of both at forecasting network usage activity.

Reviewer#3, Concern #6: Preliminaries are described in excessive detail such that the reader wonders which method and where is used in the simulations. Please restructure the paper with concise descriptions to avoid making a simple complicated.

Author response: We thank the reviewer for bringing this to our attention. For the sake of completeness, we explained each model's novelties and its workings. However, we see now that this approach has done more harm than good.

Author action: We have significantly cut down on the amount of detail offered in Section II: Related Work and Section IV: Time-Series Generative Models. The paragraphs that underwent the most significant editing are reproduced below:

Section II, Para 1 (page 2):

The Generative Adversarial Network (GAN) framework was originally designed for simple image data, but since its inception has seen great advancements [10]. CycleGAN [11] deals with image-to-image translation as compared to generating an image from a noise vector, with applications such as super resolution and style transfer. StyleGAN [12] focuses on generating high resolution human faces by proposing an alternative generator architecture for GAN based on style transfer learning. StackGAN [13] synthesizes high-quality images from text descriptions using a two-stage process.

Section IV, Subsection C, Para 4 (page 5):

The use of GANs to generate tabular or time-series data is less common than generating images or video, with the most popular model (as per GitHub statistics) being the CTGAN (Conditional Tabular GAN) [47]. Another model is the ITS-GAN [48], which produces synthetic data tables that exhibit the same statistical properties and functional dependencies as the real, partially available data. MedGAN [49] was the first GAN to model high dimensional, multi-label discrete variables in electronic health records (EHRs). GANs have been used to generate time-series data from many domains such as physiological signals [50], medical ICU data [18], financial time-series [21], [20], and PV (photovoltaic) production [51]. The TimeGAN [52] and DoppelGANger [53] are frameworks that modify the traditional GAN architecture to make it more suitable for time-series data. A brief overview of these models is given below.

Section IV, Subsection C, TimeGAN (page 5):

We remove explanations of the individual losses of the TimeGAN and instead refer readers directly to the paper.
The losses used to train the Embedding, Recovery, Generator and Discriminator networks can be found as Eqs. (7), (8) and (9) in [35].

Section IV, Subsection C, DoppelGANger, Para 2 (page 5):

DoppelGANger solves this by modifying the RNN structure to produce S values in a single pass. This reduces the overall number of passes required to generate the entire series, but the quality of the generated samples also deteriorates as S increases. The authors recommend a value of S = 5 for best results. In order to counter mode collapse, common in data sets that have large variability in values, the authors use an idea they call auto-normalization. Instead of normalizing the entire data set using the minimum and maximum values, they normalize each sample individually and treat the maximum and minimum values of each time series as random data that has to be learnt by the model rather than passed as input. Perhaps the largest design contribution is the use of a separate auxiliary discriminator, in addition to the regular generator and discriminator setup, that works only on static features (called metadata) of a single time-series sample. This is done to ensure that the complex relationships between a time-series and its associated metadata are replicated in the generated data. This approach is unique because most models that we have discussed so far trained on the metadata and temporal features jointly. This is also why it uses the Wasserstein Loss …
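The batching idea quoted above can be made concrete with a small toy sketch. This is illustrative code under stated assumptions, not DoppelGANger's actual implementation: step_fn stands in for a trained generator cell, and the point is only that emitting S values per pass cuts the number of recurrent passes for a length-L series from L to roughly L/S.

    import numpy as np

    def generate_series(step_fn, length, s=5, state=None):
        # step_fn(state) -> (next_state, array of s values)
        out = []
        for _ in range(-(-length // s)):   # ceil(length / s) passes instead of length
            state, values = step_fn(state)
            out.append(values)
        return np.concatenate(out)[:length]

    def toy_step(state):
        # Stand-in for a trained generator cell; emits s = 5 values per pass.
        state = 0 if state is None else state + 1
        return state, np.full(5, float(state))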
Reviewer#3, Concern #7: Instead of referring to a paper from 2011, please refer to up-to-date references like 10.1109/ACCESS.2020.3017442 and make it clear if you used a simple gradient Boosting and not XGBoost, CatBoost or LightGboost.

Author response: We are thankful to the reviewer for pointing out this omission on our end and suggesting a paper that is relevant to our methodology.

Author action: The abstract as well as Section VI, Subsection A (Pre-processing setup), page 8, now explicitly state that we use a simple gradient boosting regressor model and not one of its variants such as XGBoost, CatBoost, etc. Furthermore, we also cite the suggested paper in Section VI, Subsection A (Pre-processing setup), paragraph 3, page 8 so that readers unfamiliar with the use of this technique for forecasting can read more about it.

Reviewer#3, Concern #8: There are 60 references which are excessive for a regular paper. Please critically review the most related ones.

Author response: This suggestion of the reviewer complements the earlier suggestion of cutting down on excessive explanations in the preliminaries, and so both have been adequately addressed.

Author action: We have removed all references that are not directly concerned with the main scope of our paper. This has led to the total references being reduced from 60 to 40.

Reviewer#3, Concern #9: It is totally unclear how the authors train the models! Please provide a clear flow diagram briefly clarifying the presented method and metrics at the beginning of the Simulation section. Please provide all hyper-parameters used in this paper in a table helping to regenerate the results.
Author response: We are immensely grateful to the reviewer for this particular suggestion. Lack of reproducibility in machine learning research is a very real concern, and so we have worked especially hard to accommodate this suggestion.

Author action: We have added Figures 10 and 11 in Section VI, page 9, which depict our comparison process as well as the generation process that comes before it. Together they provide a succinct summary of the methodology used in the paper. Furthermore, we added an entirely new subsection (Subsection B), called 'Model Training,' to Section VI (Results and Discussion) on page 8. We did this so that, in addition to providing the hyper-parameters for each model in Tables 3-5, we could also provide additional information on how to prepare data for each model or modify their code for our specific use case. This information will make it easier for any readers who wish to reproduce our work. Figures 10 and 11 are reproduced below:

Figure 10: [flow diagram image]

Figure 11: [flow diagram image]
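A minimal sketch of the data preparation step that precedes the flow of Figures 10 and 11, assuming hourly readings arrive in a pandas Series; the week-slicing is implied by the 168-hour weeks used throughout the paper, while the min-max scaling shown is an illustrative choice and not necessarily the configuration reported in Tables 3-5.

    import numpy as np
    import pandas as pd

    HOURS_PER_WEEK = 168

    def weekly_samples(series: pd.Series, n_weeks: int) -> np.ndarray:
        # Slice the first n_weeks * 168 hourly readings into week-long rows
        # and min-max scale them (the scaling choice here is illustrative).
        values = series.to_numpy()[: n_weeks * HOURS_PER_WEEK]
        weeks = values.reshape(n_weeks, HOURS_PER_WEEK)
        lo, hi = weeks.min(), weeks.max()
        return (weeks - lo) / (hi - lo + 1e-12)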
Reviewer#3, Concern #10: The performance deterioration mentioned in Section B.2 is the ease of learning of a small set of data. Gradient boostings, especially XGBoost, can easily handle more than 7 weeks without concerning about long-term dependencies. As far as we know LSTM, as an RNN, is developed to handle long-term dependencies, hence, how do the authors claim, "holding these dependencies in memory is difficult for RNN - based architectures"?
Author response: We thank the reviewer for pointing out the lack of clarity in our statement and appreciate the chance to explain it once more.

In the original manuscript, the idea that LSTMs cannot capture long-term dependencies is first introduced in Section IV, Subsection C, DoppelGANger, Para 1, where we state:

Since RNNs produce a single measurement in a single pass and for a time-series of length L perform L passes, they tend to forget prior values and struggle to capture long-term correlations in a time-series.

This claim is originally made in the paper 'Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions' (citation 36), where the authors prove empirically that even LSTMs struggle to capture long-term dependencies when a time-series is longer than a few hundred measurements in length. We regret that we did not make this more explicit in the paragraph, and this mistake has now been corrected.

Secondly, we refer to this statement while explaining the performance degradation observed in the ACF plots in Section VI, Subsection B.2, as:

Firstly, the amount of data added is simply not enough to have a significant improvement in performance; time-series data usually spans years instead of a few weeks. Secondly, increasing the data increases the number of long-term dependencies that each model has to learn. As previously explained, holding these dependencies in memory is difficult for RNN-based architectures, but the DoppelGANger's superior performance indicates that its solution for this problem does improve performance over standard RNNs (as used in PAR) or RNNs used in conjunction with auto-regressive models (as used in TimeGAN).

We reassessed this explanation and decided that while it could apply in our case (the length of one week of data is 168), there must be another explanation for the degradation observed. It turns out that we were correct. The degradation observed in the ACF plots is only visible for Navigli district, not for Bocconi or Duomo (a new region we added; more discussion on that in Concern #13). This indicates that the correlation plots show more deterioration when dealing with a volatile time-series (like Navigli) rather than more regular, repeating time-series data (like Duomo or Bocconi).

Author action: We have made the first statement in Section IV clearer; the new additions (in bold) are shown below:

Since RNNs produce a single measurement in a single pass and for a time-series of length L perform L passes, they tend to forget prior values and struggle to capture long term correlations in a time-series. Furthermore, authors in [36] in Section 4.1, page 5 state that even LSTMs, which were designed to correct the above mentioned problem with RNNs, empirically struggle to perform well when the length of a time-series surpasses a few hundred records.

We have also removed the original explanation in Section VI, Subsection B.2, page 10 and replaced it with the explanation given above. In Para 2, we now state:

Thus, it seems that the models struggle to capture the correlations across a more chaotic time-series like Navigli but easily reproduce the correlations in places like Duomo and Bocconi, which have more regularly repeating activity patterns.
5 better comparative methods to substantiate its originality and superiority. For example, did you answer the
6 claim you stated in the Abstract? i.e. “We also assess how much real data is required to produce high-quality
7 synthetic data.”? To this end, the authors are requested to compare the presented technique with active
8
learning-based models.
9
10 Author response: We agree that our original statement, “We also assess how much real data is required to
11
produce high-quality synthetic data”, did not accurately represent our contributions. Our work is more of a
12
13 case study that looks at how publicly available deep learning based generative models, particularly GANs
14 perform, especially in the case when training data is scarce. What we meant by the above statement was to
15 state how much real data would be required to produce useful synthetic data in our particular use-case.
16 However, we now realize that the statement came off as a generalization about how much data of any nature
17 would be required to produce useful synthetic data from these models.
18
19 The comparison with active learning – based ML models is unsuitable in this use case though. Active learning-
20 based solutions are applied when one has a small amount of labelled data, lots of unlabeled data and not
21 enough resources to manually label it. In that case, you train a model on the limited labeled data available
22
and then use that to predict the labels on the unlabeled data. Then, with a suitable score, we prioritize which
23
24
data to label and the process is iterated repeatedly. Approaches like these are more suitable for classification
25 problems where one has a distinct number of classes or problems that use independent data samples.
26
27 In contrast, deep generative models such as the models we use learn the underlying distribution from which
28 a time-series originates. This includes learning the distribution of values of the recordings in the time-series
29 as well as the temporal relationship between the measurements. We then use these trained models to
30 generate new, unique data points that are not identical to the true data we have but appear to have been
31 generated from the same process.
32
33 Since the two approaches address different problems, we feel it is outside the scope of our work to assess
34 active – learning models on this particular use case.
35
36 Author action: We have rewritten the abstract and removed the line stated above since it does not clearly
37 represent our work as noted above.
38
39
40
41
42
43 Reviewer#3, Concern # 12: Why augmented data does not improve forecasting accuracy? Is it cogent to
44 simply say “there is not enough data”?
45
46 Author response: Augmenting small amounts of real data with some synthetic data does lead to
47 improvements in forecasting accuracy over just using the small amount of real data. In the explanation in
48 Section VI, Subsection C.4, we state that although there is improvement, it is just not large enough to be
49
significant. We believe this is reasonable because we are only using five weeks of data to forecast, so any
50
51 measurements in the synthetic data that degrade the model’s learning have more weight than they would if
52 we were using twelve weeks of data to train a forecaster. In the latter case, the overall characteristics of the
53 synthetic data would have more effect that any individual readings. Therefore, we do believe that the small
54 amount of improvement seen is indicative of possible larger improvements if we were using a larger data set
55 to perform the forecasting (as well as the GAN training)
56
57 Author action: NA
58
59
60

For Review Only


IEEE Access Page 30 of 65

1
2 Reviewer#3, Concern # 13: Please make sure to use datasets with considerable sample numbers. Do you
3 think it is cogent claiming on a small-sized dataset? Or on a single dataset? I don’t think so!
4
5 Author response: We are extremely thankful to the reviewer for bringing this up since this comment gave us
6 a chance to really work through the entire paper again. This exercise brought new insights and resulted in
7 slightly improved results.
8
9 To answer the first part of the comment, it is important to reiterate the fact that the entire premise of this
10 work revolves lack of useful data. The idea of synthetic data generation, as detailed in our introduction,
11
hinges on the idea of large amounts of data being unavailable. If one has access to large amounts of data,
12
13 then the need for synthetic data is severely limited to only situations where the vendor does not want to
14 share real data as is. Another problem is that in our chosen domain; telecommunications, large datasets are
15 not open-source. The Telecom Italia set is one of few publicly available telecom activity data sets. Another
16 one is Broadband America, however that set deals with internet usage in homes whereas telecom Italia deals
17 with mobile internet usage, which we found more interesting in today’s mobile first environment. Note that
18 for many homes in the Broadband America data set, there are lots of missing values as well, so that even it
19 is not an extremely large data set.
20
21 Regarding the other point about using a single data set, note that while the measurements for each region
22
come from the same provider, they are in fact separate data sets. Both Bocconi University and Navigli District
23
24
have distinct measurements that are independent from one another. Therefore, the analysis conducted on
25 them should be considered as being done on two data sets instead of one.
26
27
28
29
Author action: We decided to analyze data of a third region in addition to the two we already had. This region
30 is a historical cathedral in central Milan and arguably one of its biggest tourism spots. We have updated all
31 sections of the paper with insights from this region. In particular, we have tried to rewrite Section III to make
32 it clearer that we are using three different data sets instead of one. For instance, we have updated Figure 3
33 as:
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52 We have also added a brief description of Duomo’s time-series in Section III, Subsection B, page 4 as:
53
54 Lastly, Duomo resembles a sinusoidal pattern, but a closer look shows that there’s a slight uptick in
55
56 activity every Friday and Saturday towards twilight. Thus, all three regions …
57
58
Reconducting the analysis on Duomo allowed us to reassess our choice of parameters in the case of
59 DoppelGANger and PAR. For instance, in the original work, we were applying an absolute function on PAR’s
60 output to ensure all its generated values were positive. While working on Duomo, we decided that this was

For Review Only


Page 31 of 65 IEEE Access

1
2 unfair since there was no such postprocessing being done for other models. This led to repeating the analysis
3 for PAR for the other regions as well and all corresponding figures/tables were updated accordingly.
4
5 Finally, assessing a region with a regular pattern like Duomo allowed us to contrast performance with other
6 regions with more volatile trends such as Navigli. This provided insights that were previously unclear from
7 just looking at the original two regions and as such, large parts of the results section have been updated,
8
especially the section regarding DTW Score, where we realized the degradation observed when using 7 weeks
9
10
of data was because of an overall downwards trend in Bocconi and Navigli (visible in Figure 3). This
11 downwards trend is somewhat inaccurately captured by the TimeGAN, which is why its performance
12 improves at that level.
13
14
15
16
17
18
19
20 Note: References suggested by reviewers should only be added if it is relevant to the article and makes it more
21 complete. Excessive cases of recommending non-relevant articles should be reported to
22 ieeeaccesseic@ieee.org
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

For Review Only


IEEE Access Page 32 of 65

1
2 Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
3 Digital Object Identifier 10.1109/ACCESS.2022.DOI

4
5
6
7 Assessing Deep Generative Models on
8
9 Time Series Network Data
10
11 MUHAMMAD HARIS NAVEED1 , UMAIR SAJID HASHMI1 ,(Member, IEEE), NAYAB TAJVED1 ,
12 NEHA SULTAN1 , ALI IMRAN2 , (Senior Member, IEEE)
13 1
School of Electrical Engineering & Computer Science, National University of Sciences & Technology, PK
2
14 AI4Networks Research Center, School of Electrical & Computer Engineering, University of Oklahoma, OK, USA

15 Corresponding author: Umair Sajid Hashmi (e-mail: umair.hashmi@seecs.edu.pk)

16 This work was supported in part by the National Science Foundation (NSF) under Grant 1619346, and Grant 1923669; and in part by the
17 Qatar National Research Fund (QNRF) under Grant NPRP12-S 0311-190302.

18
19
20 ABSTRACT To achieve zero touch automation in next generation wireless networks through artificial
21 intelligence (AI), large amounts of training data is required. This training data is publicly unavailable and is
22 a major hindrance in research on AI applications to wireless communication. One solution is using limited
23 real data to generate synthetic data that can be used in lieu of real data. Generative Adversarial Networks
24 (GAN) have been used successfully for this purpose. In this paper, we choose two publicly available GAN
25 - based models and one deep learning - based autoregressive model. We then compare their performance
26 at generating synthetic time-series wireless network traffic data. We also assess the impact of data scarcity
27 on the generated data quality by varying the level of data available to the models for training. Moreover, in
28 order to assess the usefulness of this generated data, we compare the performance of a gradient boosting
29 regressor trained solely on generated data, real data, and a mix of both at forecasting network traffic. Our
30 experiments show that the GANs perform better than the autoregressive approach in each aspect considered
31 in this work and forecasting models trained to predict network load based on data generated by these GANs
32 yield error rates comparable to models trained on real data. Finally, augmenting small amounts of real data
33 with generated data leads to minor performance gains in some cases.
34
35
INDEX TERMS Machine learning, GAN, TimeGAN, PAR, DoppleGANger, Time series, Forecast analysis
36
37
38
I. INTRODUCTION
The application of artificial intelligence (AI) in medicine, power systems, image processing and other domains has become commonplace. Yet, although motivated by the gains and benefits of zero touch automation presented in earlier studies such as [1], the realization of AI enabled gains in wireless networks is yet to be witnessed in the real world. This is set to change as the world moves to sixth generation (6G) networks and beyond, in which networks will be able to perform self-configuration, self-optimization and self-healing via real-time AI on network parameters [2]. Paired with the rapid proliferation of new and diverse paradigms such as Augmented Reality (AR) and Virtual Reality (VR), and new connectivity use cases and applications such as the Internet of Things (IoT) [3], Ultra Reliable Low Latency Communication (URLLC) [4], and holographic communications, the potential of harnessing the vast data produced by these systems in supervised machine learning (ML) problems to perform usage prediction, optimal resource allocation, anomaly detection and other such applications is substantial [5]–[7]. The ultimate goal is to achieve AI enabled zero touch automation in next generation networks so as to minimize operational cost and overcome operational complexity and human errors, thereby maximizing resource efficiency and Quality of Experience (QoE).

Unfortunately, the telecommunications network data required for executing sophisticated ML models is either not available or too scarce for effective ML model training and execution [8], [9]. This is largely due to privacy concerns and the hesitance of the telecom industry to open source data that could potentially be used by its competition. Another challenge in getting ample training data is the large amount of technical effort required to get data out of silos within the operators, where it remains trapped. When data is made available, it is usually locked behind non-disclosure agreements or released only to specific research groups. This is a major impediment to research in this domain and partly responsible for the lag in applying ML techniques that we see in the communication systems domain compared to other domains where data is more freely available, such as image processing.
One solution to this deadlock is to generate synthetic data that is faithful to the properties of the original data, but different from it in terms of actual values. This paper aims to test the latest synthetic data generation techniques, especially those based on the generative adversarial network (GAN) model. The GAN is a deep learning based generative model that has delivered impressive results in generating synthetic but near-realistic images and videos [10]. We apply two variants of the GAN model to three distinct internet activity time-series data sets and compare their performance against a non-GAN based method. We then test the performance of the generated data in downstream supervised machine learning applications, specifically forecasting internet traffic levels. We also analyze how, if at all, the performance of these methods is affected by the amount of data available. While the idea that more data equals better performance is valid, it is interesting to see how little data suffices for GANs to give usable results. Finally, since evaluation of GAN performance is an open problem, and the metrics that do exist are geared towards evaluating image output, we use a variety of direct and indirect metrics to comprehensively evaluate the quality of the generated synthetic data. Simply put, we assess how data sparsity impacts several deep generative models' ability to produce high quality synthetic time-series data, and then assess how using this synthetic data improves or degrades the performance of a forecasting model on real, unknown test data.

With this study, we offer the following contributions:
• A performance analysis of publicly available deep generative models, with particular focus on GANs designed to generate realistic time-series data, conducted on a scarce telecommunications data set using a select group of indirect metrics that assess the quality, fidelity and practical usefulness of the generated time-series data. We observe that the range of values and structure of a given time-series is retained in the GAN-generated series, but the same cannot be said for non-GAN generated time-series data.
• An analysis of the impact of data scarcity on the performance of these techniques in generating realistic time-series synthetic data. While it seems intuitive that increasing the training data would improve the generated data quality, we observe that this is not always the case, and that longer time-series sequences with long-term trends are harder for GAN models to learn faithfully.
• A quantitative study of the utility of generated data in downstream predictive modelling using supervised ML approaches. We observe that for simple forecasting purposes, the error difference between models trained on generated and on real data is between 1 and 4 percent for our best performing model.

The paper is organized as follows: Section II discusses relevant literature in this domain, Section III describes the data sets we used, and Section IV describes different time-series generative models and the architectures we are using. Section V explains our methodology, while Section VI explains the experimental setup employed in our simulations as well as the results. The paper is concluded in Section VII with a summary of the work and possible future research directions.

II. RELATED WORK
GANs were originally designed for simple image data, but have seen great advancements since their inception [10]. Although the extent of the applicability of GANs in the wireless communications domain is still being explored, some recent studies have investigated this particular research theme. To model the channel effects in an end-to-end wireless network in a data-driven way, [11] proposes to use a Conditional GAN (CGAN). A novel architecture using a GAN in [12] is designed to directly learn the channel transition probability (CTP) from receiver observations. The authors in [13] leverage a GAN to create large amounts of synthetic call data records (CDRs) containing the start hour and duration of a call; they show a marked improvement in future call duration prediction accuracy using real data augmented with GAN generated synthetic data points.

There have also been GAN-based approaches proposed for generating different types of time-series data. Fekri et al. introduce a Recurrent GAN (R-GAN) [14] for generating realistic energy consumption data by learning from real data. Hyland et al. propose a Recurrent Conditional GAN (RCGAN) [15] to produce realistic real-valued multi-dimensional medical time-series data. Mogren proposes a continuous recurrent neural network (C-RNN) based GAN model [16] that works on sequential data to synthesize music, using a long short-term memory (LSTM) network for both the generator and the discriminator.

Many studies have tried to solve the training data scarcity problem using GANs. Han et al. [17] introduced data augmentation in medical images using GANs, concluding that data augmentation can boost diagnosis accuracy by 10%. Similar work has been done on images of skin lesions [18] and brain tumor MRIs [19]. Likewise, SimGAN [20] shows a 21% performance improvement in eye-gaze estimation by using an architecture similar to GANs.

Several evaluation techniques have been proposed for data generated by GANs. Borji [21] analyzes more than 24 quantitative and 5 qualitative measures for evaluating generative models, concluding that each has its own strengths and limitations and proposing that evaluation techniques should be application specific. The most common GAN metrics are the Frechet distance [22] and the inception score [23], but many more have been proposed, such as the fidelity and diversity metrics for generative models explained in [24]. All in all, as noted in [25], GAN performance evaluation remains an open and challenging research problem.

To the best of our knowledge, our work is the first to compare several modern GAN models against a non-GAN approach, as well as the first to stress test the given models with scarce data. It is also one of the few papers that looks at telecommunications data instead of energy, financial or medical time-series data.
FIGURE 1. A sample figure taken from the Telecom Italia dataset [8], which divides the coverage area of a Radio Base Station into grids that help in identifying the geographical location of the user.

FIGURE 2. Administrative map of Milan with our selected areas highlighted.

III. TELECOM ITALIA DATASET
A. BACKGROUND
As mentioned before, telecommunications data is rarely open-sourced or easily available. When it is available, it is limited to a few research teams that have signed non-disclosure agreements with service providers. Telecom Italia (renamed TIM Group in 2019) recognized that this situation hampered independent researchers who could not access data for analysis and model training purposes, so in 2014 it organized the 'Telecom Italia Big Data Challenge'. Participants were given access to telecommunications, weather, social media, news and electricity consumption data for Milan and the province of Trentino from November 2013 to January 2014, with the non-telecommunication data sets provided by other industrial partners [8].

We focus on the telecommunication activity data provided in this set, specifically the call detail records (CDRs), which contain time-series data on internet usage in all regions of the Milan metropolitan area from November to December 2013. The time range considered in our work is from Monday, November 4, 2013, to Sunday, December 22, 2013. Since the overall challenge made use of data from multiple domains provided by different companies, all of which used different spatial and temporal aggregation methods, Telecom Italia standardised them all onto a single 100x100 grid for Milan, with each square covering about 235x235 metres of area. The internet usage is measured as the 'number of connections created' in 10 minute intervals. However, for ease of analysis and training, we down-sample these measurements to an hourly interval.
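As an illustration, this down-sampling step can be reproduced with pandas; the file and column names below are ours for illustration, not part of the released data set:

import pandas as pd

# Load the 10-minute internet activity records (hypothetical file/column names).
cdr = pd.read_csv("milan_internet_cdr.csv", parse_dates=["timestamp"],
                  index_col="timestamp")

# Aggregate the 'number of connections created' from 10-minute bins to hourly totals.
hourly = cdr["internet"].resample("1H").sum()

# Restrict to the timespan considered in this work.
hourly = hourly.loc["2013-11-04":"2013-12-22"]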
FIGURE 3. Internet activity over seven weeks in our selected areas. Note the differences in trend and seasonality over time across all three localities.

B. REGIONS
In order to judge how well our selected generation methods work for time-series data of different natures, we look at data from three regions with interesting activity patterns. By 'interesting', we mean that the network activity from these regions contains daily and weekly seasonality as well as an overall trend over the selected timespan. The selected regions are Bocconi, a private university; the Navigli district, a wedge between two canals that boasts upscale restaurants, bars and art galleries and is popular with tourists; and the Duomo cathedral and its surroundings, a religious and tourist hotspot.

Fig. 1 and 2 illustrate the coverage area and the geographical placement of the selected regions, respectively, while Fig. 3 illustrates the activity patterns of each region over seven weeks. We can observe a daily seasonality in Bocconi, with usage peaking around midday and almost disappearing at night. There is an expected decline in internet usage on the weekends, due to low activity levels on the university campus.
In Navigli, we do not observe a dramatic decrease in activity on any particular day, but we do observe two peaks per day, one at midday and a higher one around midnight, indicating that the district is a nightlife hub. Lastly, Duomo resembles a sinusoidal pattern, but a closer look shows a slight uptick in activity every Friday and Saturday towards twilight. Thus, all three regions are distinct time-series data sets, and it will be interesting to see whether the performance of the GANs differs because of this. This is also important since time-series data can possess many different trends and patterns; any generative model must therefore perform well across the board in order to be useful.

IV. TIME SERIES GENERATIVE MODELS
We now provide an overview of the structure of time-series data and the various methods used to generate it, including a more in-depth explanation of the specific techniques that we use in this work. A time series is an ordered sequence of values of a variable taken at equally spaced time intervals. Thus, any analysis or generation of time series must take into account that data points taken over time may have an internal seasonality and trend [26]. Mathematically, most time series can be written as:

x_t = s_t + g_t + e_t,   (1)

where s_t is the seasonality, g_t the trend, and e_t the residual. Here, t = 1, 2, 3, \ldots, N represents the time index at which observations have been recorded.

A. DECOMPOSITION-BASED METHODS
As shown above, a time-series generally comprises seasonality, trend and residual terms. One method of time series generation involves decomposing a time series into these components, then adding a deterministic and a stochastic component constructed by optimizing weights for the trend and seasonality components and by modelling the residuals via some statistical model [27]. Another approach uses bootstrapping on the residuals obtained after decomposition to create augmented signals, and then combines them with the trend and seasonality to create a new time series [28].
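As a sketch of the decomposition step, the additive components of (1) can be extracted with statsmodels (reusing the hourly series from the previous sketch; the residual bootstrapping of [28] is reduced here to a naive shuffle for illustration):

from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose x_t = s_t + g_t + e_t with a weekly (168 hour) cycle.
parts = seasonal_decompose(hourly, model="additive", period=168)

# Recombine trend and seasonality with resampled residuals to form a new series.
# Note: parts.trend is NaN at the edges of the series, so the result is too.
synthetic = parts.seasonal + parts.trend + parts.resid.sample(frac=1.0).values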
42 uses bootstrapping on the residuals obtained after decompos- C. GAN - BASED METHODS
43 ing to create augmented signals and then combines them with We now move to our main focus, the GAN-based models, in
44 trend and seasonality to create a new time series [28]. this section. We will give a brief overview of the GAN design,
45 its structure, and its potential issues. First introduced in 2014
46 B. AUTO-REGRESSIVE MODELS [10], GANs consist of two competing agents, typically neural
47 Auto-regressive models try to forecast future values of a networks, referred to as the generator and discriminator. The
48 series based on past values of the same series and a stochastic generator network attempts to model a noise vector z to fit
49 term. In the simplest auto-regressive generative model, the the probability distribution of the input data, whereas the
50 conditional distributions p(xi |x<i ) correspond to a Bernoulli discriminator attempts to accurately classify the generated
51 random variable and the model learns a function that maps data from the real data. Loss convergence of generator and
52 preceding variables x1 , x2 , ...xi−1 to the mean of this distri- discriminator terminates the training period. Essentially, the
53 bution resulting in (2) from [29]: two networks are jointly involved in a 2 – player min-max
54 game up until the discriminator fails to distinguish between
pqi (xi |x<i ) = Bern(fi (x1 , x2 ...xi−1 )). (2)
55 the real data and the generated data, a point denoted by
56 New data can then be generated by sampling from the attainment of Nash equilibrium [33].
57 conditional distribution learnt by the model. The use of the Mathematically, this process is expressed in [10] as:
58
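A minimal sketch of the sampling logic behind (3), written in PyTorch: a GRU cell carries the hidden state h_i, and a small head maps it to the parameters of a per-step Gaussian. This is our simplified illustration of the idea; the actual PAR implementation in [31] differs in architecture and output distributions.

import torch
import torch.nn as nn

class TinyAutoregressive(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRUCell(1, hidden)     # h_i = h(h_{i-1}, p_{i-1}, theta)
        self.head = nn.Linear(hidden, 2)     # theta(h_i): mean and log-std

    @torch.no_grad()
    def sample(self, seed, steps):
        h = torch.zeros(1, self.rnn.hidden_size)
        p = torch.tensor([[float(seed)]])
        out = []
        for _ in range(steps):
            h = self.rnn(p, h)
            mu, log_std = self.head(h).split(1, dim=-1)
            p = mu + log_std.exp() * torch.randn_like(mu)  # draw p_i from theta(h_i)
            out.append(p.item())
        return out

week = TinyAutoregressive().sample(seed=0.5, steps=168)  # one synthetic hourly week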
C. GAN-BASED METHODS
We now move to our main focus, the GAN-based models. We first give a brief overview of the GAN design, its structure, and its potential issues. First introduced in 2014 [10], GANs consist of two competing agents, typically neural networks, referred to as the generator and the discriminator. The generator network attempts to transform a noise vector z into samples that fit the probability distribution of the input data, whereas the discriminator attempts to accurately distinguish the generated data from the real data. Loss convergence of the generator and discriminator terminates the training period. Essentially, the two networks are jointly involved in a two-player min-max game, played until the discriminator fails to distinguish between the real data and the generated data, a point denoted by the attainment of a Nash equilibrium [33]. Mathematically, this process is expressed in [10] as:
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))].   (4)

Here x is the input data, D(x) is the predicted output of the discriminator for x, and D(G(z)) is the output of the discriminator on the GAN generated data G(z). The aim is to maximize the ability of the discriminator to tell real data from generator-produced data, so the discriminator maximizes this value, whereas the generator tries to minimize the discriminator's ability to correctly classify real and fake data. This translates to maximizing the first term and minimizing the second term of (4). The basic GAN structure is illustrated in Fig. 4.

FIGURE 4. Simple GAN structure.

GANs initially became popular for their ability to produce high-quality image data while avoiding the problems associated with using Markov chains or approximating intractable likelihood functions. This is achieved by training the GANs via backpropagation and using dropout regularization. The drawback is that GANs are often hard to train, and suffer from problems like overfitting (reproducing the input data), mode collapse (generating samples from only one class in the data) and training instability.
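A minimal sketch of one training iteration of (4), with two small multilayer perceptrons standing in for the generator and discriminator (our illustration only; the generator update uses the common non-saturating variant of the loss rather than the exact second term of (4)):

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCELoss()

def train_step(real):                      # real: (batch, 1) tensor of data points
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)
    fake = G(torch.randn(real.size(0), 8))
    # Discriminator step: push D(x) towards 1 and D(G(z)) towards 0.
    loss_d = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator step: push D(G(z)) towards 1.
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()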
The use of GANs to generate tabular or time-series data is less common than generating images or video. One model used to generate tabular data is the CTGAN (Conditional Tabular GAN) [34], whereas the TimeGAN [35] and DoppelGANger [36] are frameworks that modify the traditional GAN architecture to make it more suitable for time-series data. These last two models are what we use in this work; a brief overview of each is given below.

1) TimeGAN
The autoregressive models mentioned above are good at capturing the temporal dynamics of a sequence, but are deterministic in nature as opposed to generative. Conversely, GAN architectures such as the RGAN used in [15] do not really take into account the inter-row dependencies of time series data. The TimeGAN solves this problem by combining the GAN framework with an autoregressive setup. Both are trained jointly, with the unsupervised GAN loss guided by a supervised autoregressive loss. Additionally, the model makes use of an embedding network to map high-level features to a low-level latent feature space. The generator network also first produces samples in the latent space, which are then converted back to the original feature space via a recovery network. This is done to reduce the high dimensionality of the adversarial learning space. A high level structure of the TimeGAN is shown in Fig. 5.

FIGURE 5. Simplified block diagram of TimeGAN.

The recovery and embedding networks are trained via a supervised and a reconstruction loss. The reconstruction loss ensures the learnt latent representation is correct, and the supervised loss aids the generator in learning the temporal dynamics of the data. The generator and discriminator are then trained in a typical adversarial fashion. The losses used to train the embedding, recovery, generator and discriminator networks can be found as Eqs. (7), (8) and (9) in [35].

2) DoppelGANger
The DoppelGANger introduces several new ideas to solve typical GAN problems like overfitting, as well as problems faced when generating longer, more complex time-series data [36]. The design change most relevant to our work is how much of the data is generated in a single instance. Since RNNs produce a single measurement per pass, and for a time-series of length L perform L passes, they tend to forget prior values and struggle to capture long-term correlations in a time-series. Furthermore, the authors in [36, Section 4.1] state that even LSTMs, which were designed to correct this problem with RNNs, empirically struggle to perform well when the length of a time-series surpasses a few hundred records.

DoppelGANger solves this by modifying the RNN structure to produce S values in a single pass. This reduces the overall number of passes required to generate the entire series, though the quality of the generated samples deteriorates as S increases. It uses the Wasserstein loss as opposed to the regular GAN loss function, since the former leads to more stable training in this case. The optimization function for this GAN architecture may be expressed as:
\min_G \max_{D_1, D_2} L_1(D_1, G) + L_2(D_2, G),

where L_i for i = 1, 2 is the Wasserstein loss, described in [36] as:

L_i = \mathbb{E}_{x \sim p_x}[D_i(T_i(x))] - \mathbb{E}_{z \sim p_z}[D_i(T_i(G(z)))] - \lambda \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}[(\lVert \nabla_{\hat{x}} D_i(T_i(\hat{x})) \rVert_2 - 1)^2],   (5)

where T_1(x) = x, T_2(x) = tx + (1 - t)G(z), and t is a value drawn from the uniform distribution.

FIGURE 6. Simplified block diagram of DoppelGANger.

A basic block diagram of the DoppelGANger is illustrated in Fig. 6; a more detailed structure with explanation can be found in Fig. 7 of [36].
A more detailed structure with explanation can be found in that this does not mean that KLD is a distance measure; it
28
Fig.7 of [36]. is not since it is asymmetric and does not obey the triangular
29
inequality. While the use of KL - Divergence is uncommon in
30
V. EVALUATION METRICS comparing time-series data, we use it to evaluate the distribu-
31
Evaluating the quality of GAN generated data is an open tions of the two data sets rather than their relationship in time.
32
research problem. Unlike discriminative models which can This can be done by calculating the KLD after discretizing
33
be evaluated on fairly robust metrics like accuracy, precision the continuous time-series and using the bin counts to create
34
and F1 score among others, generative models have no such probability distributions.
35
counterparts. In the case of images, visual inspection is relied The mathematical form of the KLD is shown below:
36
on to determine whether the produced image is good quality.
37 DKL (P || Q) =
X
P (x) log
P (x)
. (6)
This is not feasible with tabular forms of data. Another
38 x∈X
Q(x)
method is to indirectly evaluate the quality of the generated
39
data by seeing how well it performs when substituted in place
40 P(x) represents our true distribution (the real data), whereas
of real data in supervised tasks such as classification and
41 Q(x) represents the approximate distribution (the generated
forecasting.
42 data). Since the metric is asymmetric, the position of the
43 Study in [21] provides a comprehensive survey of potential two series in the formula is important and interchanging the
metrics that can be used to evaluate GANs, Most of our cho-
44 position changes the results.
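These plots can be generated with statsmodels, e.g. as follows (assuming real and generated are hourly pandas Series of equal length):

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

fig, axes = plt.subplots(2, 1, sharex=True)
plot_acf(real, lags=168, ax=axes[0], title="Real data")        # one week of hourly lags
plot_acf(generated, lags=168, ax=axes[1], title="Generated data")
plt.show()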
B. KULLBACK-LEIBLER DIVERGENCE
The Kullback-Leibler divergence (KL divergence, or KLD) measures the number of extra bits needed to represent a true distribution P with a code written to represent a distribution Q that approximates P. Thus, the KL divergence can be interpreted as the inefficiency caused by using the approximate distribution Q rather than P. Note that this does not make KLD a distance measure; it is not, since it is asymmetric and does not obey the triangle inequality. While the use of the KL divergence is uncommon in comparing time-series data, we use it to evaluate the distributions of the two data sets rather than their relationship in time. This can be done by calculating the KLD after discretizing the continuous time-series and using the bin counts to create probability distributions.

The mathematical form of the KLD is shown below:

D_{KL}(P \parallel Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}.   (6)

P(x) represents our true distribution (the real data), whereas Q(x) represents the approximate distribution (the generated data). Since the metric is asymmetric, the position of the two series in the formula is important, and interchanging them changes the result.
C. DYNAMIC TIME WARPING (DTW)
First introduced in 1978, Dynamic Time Warping is a class of algorithms that can be used to compare two ordered sequences against each other [37]. These could be speech sequences, music or any other time-ordered sequences. It was originally used in spoken word recognition, since it can align the time axes of two sequences, say the words now and noow, and calculate the Euclidean distance between the aligned versions. This is in contrast to applying a distance metric directly, which would give a large value for any two such sequences even though they essentially represent the same word. In this work, we employ an R implementation of the algorithm [38].
FIGURE 7. Aligning two dummy series in DTW [38].

In mathematical terms, DTW finds a warping function \phi(k), described in [38] as:

\phi(k) = (\phi_x(k), \phi_y(k)),   (7)

where \phi_x and \phi_y remap the time indexes of the reference series x and the test series y. Given these warping functions, we find the average distance between the warped x and y series. The aim of the algorithm is to align the two series in such a way as to reduce the distance between them as much as possible. Thus, the optimization problem is given in [38] as:

D(x, y) = \min_\phi d_\phi(x, y).   (8)

The left-over distance is the inherent difference between the two sequences. Fig. 7 illustrates how the DTW algorithm aligns two time-series sequences.
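Although our experiments use the R package of [38], the core recurrence is compact enough to sketch directly as a plain O(nm) dynamic program with an absolute-difference point cost:

import numpy as np

def dtw_distance(x, y):
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])   # local distance between aligned points
            # Extend the cheapest of the three admissible warping steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A time-shifted copy of a series incurs only a small alignment cost.
print(dtw_distance([0, 1, 2, 2, 1], [0, 0, 1, 2, 1]))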
D. TRAIN SYNTHETIC TEST REAL (TSTR) AND DATA AUGMENTATION
TSTR is an indirect evaluation technique, proposed by [15], in which a predictive model is trained on synthetic data and verified on real data: the dataset generated by a GAN is used to train a model which is then tested on examples from the real dataset. We use this technique to evaluate the telecommunications data generated by our selected models using a simple gradient boosting regressor. Firstly, we partition the original dataset into a train and a test set. The model is trained on the training set and tested on the held-out test set; this process is Train on Real, Test on Real (TRTR). The same test set is then used to evaluate an identical model trained on synthetic data. The model's performance is assessed using the mean absolute percentage error (MAPE), which has the following mathematical representation:

\mathrm{MAPE}(y, \hat{y}) = \frac{100}{n} \sum_{i=1}^{n} \frac{\lvert y_i - \hat{y}_i \rvert}{y_i},   (9)

where y and \hat{y} represent the true and forecasted data points, respectively.

FIGURE 9. Block diagram of the TSTR process.

Additionally, we augment a small amount of real data with different amounts of synthetic data to study any improvements in forecasting accuracy. The total data is fixed at 5 weeks, but the number of real and synthetic weeks in the data changes. The forecasted values are then compared against the real values to calculate the MAPE. Following this TSTR pipeline, illustrated in Fig. 9, allows us to compare how much prediction accuracy is affected by replacing real data with synthetic data in a downstream application such as network load forecasting.
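The TSTR loop itself is short; a sketch with scikit-learn follows, where the X/y variables are assumed to hold the tabularized features and targets described in Section VI:

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_percentage_error

# TRTR baseline: train on real data, test on the held-out real test set.
trtr = GradientBoostingRegressor().fit(X_real_train, y_real_train)
mape_trtr = mean_absolute_percentage_error(y_test, trtr.predict(X_test))

# TSTR: train the same model class on synthetic data, test on the same real set.
tstr = GradientBoostingRegressor().fit(X_synthetic, y_synthetic)
mape_tstr = mean_absolute_percentage_error(y_test, tstr.predict(X_test))

print(f"TRTR: {100 * mape_trtr:.2f}%   TSTR: {100 * mape_tstr:.2f}%")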
E. IMPUTATION UTILITY
In machine learning problems, missing data poses a serious threat to the learning ability of the model: it can introduce bias, make data handling onerous, and reduce efficiency. Missing data is often filled in, or imputed, using a variety of statistical and ML-based techniques. These include: i) carrying the last available reading forward, ii) filling with mean values, iii) imputation via K-nearest neighbours, and iv) interpolation.

FIGURE 8. Block diagram of the Imputation Utility process.

In this work, to introduce scarcity, we delete data from the sequence randomly. After we have obtained our GAN generated data, we impute the missing data using the corresponding sample points from the synthetic data. Similarly, we fill the same points via quadratic interpolation. Finally, we pass the sequence with missing values, the sequence with imputed GAN generated values, and the sequence with interpolated values to the same model and assess how the forecasting accuracy is affected. Fig. 8 shows the imputation utility methodology; the %x in the figure denotes the amount of data removed, which takes on the values 20, 40, 60 and 80 percent.
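A sketch of this setup follows; the same random mask produces the missing, GAN-imputed and interpolated variants (real and synthetic are assumed to be equal-length pandas Series on the same index):

import numpy as np

def make_variants(real, synthetic, frac_removed, seed=0):
    rng = np.random.default_rng(seed)
    mask = rng.random(len(real)) < frac_removed   # randomly delete a fraction
    missing = real.copy()
    missing[mask] = np.nan
    gan_imputed = missing.fillna(synthetic)       # fill gaps with synthetic points
    interpolated = missing.interpolate(method="quadratic")
    return missing, gan_imputed, interpolated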
VI. RESULTS AND DISCUSSION
A. PRE-PROCESSING & SETUP
Figures 10 and 11 show our data generation pipeline and how real and generated data are compared with our comparative methods, respectively.

FIGURE 10. Block diagram depicting our data division and generation process.

FIGURE 11. Block diagram depicting our comparative methods and outlining which techniques use what amount of data.

We break each region's data into four segments and one test week, details of which are given in Table 1. The test week in Table 1 is the week we forecast for. For conciseness, we report the qualitative assessment, KL divergence and dynamic time warping results for 3, 5 and 7 weeks of data only.

TABLE 1. Data organization scheme. 'Weeks' is the number of weeks of data being used, 'Timespan' the date range, and 'Length' the number of recordings in the data.

Weeks       Timespan                  Length
1           11/4/2013 - 11/10/2013    168
3           11/4/2013 - 11/24/2013    504
5           11/4/2013 - 12/08/2013    840
7           11/4/2013 - 12/22/2013    1176
test week   12/9/2013 - 12/15/2013    168

For the Train Synthetic Test Real (TSTR), augmentation and imputation assessments, a forecasting model is trained on 1, 3 and 5 weeks of data only and then used to forecast over the test week, which is chronologically the 6th week in the time-series. Using 7 weeks of data as well would require making the test week the 8th week; the problem is that the 8th week lies in the Christmas season and its readings are radically different from our training data. Thus, using 7 weeks of data is not appropriate in these assessments.

To perform the forecasting, we avoid statistical models like the autoregressive integrated moving average (ARIMA) or error, trend and seasonality (ETS) models, since they require significant tuning with domain knowledge of the data set. LSTMs and other deep learning based architectures are likewise not employed, owing to a lack of training data. Instead, a simple gradient boosting regressor provided by scikit-learn [39] is used in this work; tree-based models have been used for this purpose before, as demonstrated in [40]. In order to train the model, we extract several features of the time-series by hand, briefly explained below:
• T-1: the time-series at lag 1.
• Hours: the hour at which the sample is taken.
• 1st difference: the difference between consecutive time-series values.
• 2nd difference: the difference between consecutive values of the 1st difference.
Thus, by using the features above as inputs and training to predict the difference, we are able to 'tabularize' the time-series data and use it to train a decision tree based algorithm to make forecasts (a sketch of this construction is given below). The model uses 100 estimators, a 0.05 learning rate, and a max tree depth of 12.
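A sketch of this feature construction and model fit with pandas and scikit-learn (the column names are ours):

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def tabularize(series):
    # 'series' is an hourly pd.Series with a DatetimeIndex.
    df = pd.DataFrame({
        "t_minus_1": series.shift(1),          # T-1: the series at lag 1
        "hour": series.index.hour,             # hour at which the sample is taken
        "diff1": series.diff(),                # 1st difference
        "diff2": series.diff().diff(),         # 2nd difference
        "target": series.diff().shift(-1),     # next step's difference to predict
    }).dropna()
    return df.drop(columns="target"), df["target"]

X, y = tabularize(hourly)
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.05,
                                  max_depth=12).fit(X, y)

A forecast for the next hour is then recovered by adding the predicted difference to the last observed value.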
B. MODEL TRAINING
1) DoppelGANger
Detailed instructions on how to prepare data for use with the DoppelGANger can be found in the GitHub repository of [36]. In this section, we only share the details relevant to reproducing our results.

The DoppelGANger is designed to work with time-series that also have corresponding static features/metadata (like ISP provider, area, etc.). However, the data sets we are working with are univariate time-series with no metadata. To solve this, we create a dummy metadata variable based on the week of year corresponding to our data. Since our data has hourly resolution, there are 168 readings in each week. This dummy metadata is then scaled between 0 and 1. Note that it is also possible to use a constant value in place of the metadata.

Table 2 illustrates how the data should be structured when using three weeks of data; the same scheme extends to five and seven weeks. Table 3 lists the hyper-parameters used in our work; any hyper-parameters not mentioned here used their default values.

TABLE 2. An example of the required structure for three weeks of data before it is passed to the DoppelGANger.

Week of year   0     1     2     3     ...   168
45             100   150   200   250   ...   400
46             10    15    20    25    ...   40
47             1     5     2     4     ...   6

TABLE 3. Hyper-parameters used for each data set for the DoppelGANger.

Hyper-parameter     Bocconi   Navigli   Duomo
epochs              10,000    2,000     2,000
aux discriminator   False     False     False
self-norm           False     False     False
sequence length     24        12        12
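A sketch of how the layout of Table 2 can be assembled; the array names and shapes follow our understanding of the repository's expected format, so treat them as assumptions and consult the repository of [36] for the authoritative preparation scripts:

import numpy as np

def prepare_dg_input(hourly, n_weeks, first_week=45):
    # One training sample per week: 168 hourly readings plus a metadata value.
    features = hourly.values[: n_weeks * 168].reshape(n_weeks, 168, 1)
    weeks = np.arange(first_week, first_week + n_weeks, dtype=float)
    weeks = (weeks - weeks.min()) / max(weeks.max() - weeks.min(), 1.0)  # scale to [0, 1]
    metadata = weeks.reshape(n_weeks, 1)          # dummy 'week of year' attribute
    return features, metadata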
2) TimeGAN
As in the case of the DoppelGANger, instructions on how to use the TimeGAN can be found in the GitHub repository corresponding to [35]. The code for the TimeGAN does not provide a way to get the generated time-series with timestamps and corresponding values, so certain modifications are required before it can be used. These changes are listed below:
• The input data cannot contain timestamps. Thus, in order to recreate timestamps for the generated data, pass 'day of year' and 'Hour' as features along with the internet connections value. If the data spans multiple years, a 'year' feature should also be included.
• The TimeGAN outputs generated data in normalized form. Thus, the 'MinMaxScalar' function in the code needs to be modified to return the maximum value and the difference between the minimum and maximum values. These values can then be used to denormalize the generated data.
• The generated data is a 3D array of shape (number of samples, sequence length, number of features). In order to recreate the generated data in the form of the original data, iterate through each sample, denormalize it, and then append all the samples together (a sketch of this post-processing is given below).

We observed that the same parameters, listed in Table 4, worked well for all three data sets in the case of the TimeGAN. Please note that the hyper-parameter 'sequence length' in Table 3 is not the same as 'sequence length' in Table 4.

TABLE 4. Hyper-parameters used for the TimeGAN.

Hyper-parameter     Value
sequence length     10,000
module              GRU
hidden dimensions   28
number of layers    3
iterations          50,000
batch size          64
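A sketch of the post-processing described in the last bullet, assuming the minimum value and the min-max range have been returned by the modified scaling function:

import numpy as np

def denormalize(generated, min_val, val_range):
    # generated: (num_samples, seq_len, num_features) array scaled to [0, 1].
    samples = [sample * val_range + min_val for sample in generated]  # undo scaling
    return np.concatenate(samples, axis=0)   # append samples into one long series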
3) PAR
PAR (Probabilistic Auto-Regressive) is part of the Synthetic Data Vault [31] and is the easiest of the three models used in this work to train. The input data is simply the internet usage values against the corresponding timestamps. Table 5 lists the hyper-parameter values we used.

TABLE 5. Hyper-parameters used for PAR.

Hyper-parameter   Value
epochs            1,000
segment size      None

We generate one sample from the trained PAR model. This generated data is identical in form to the input data but filled with synthetic values. More information on how PAR works can be found in the documentation of SDV's GitHub repository.
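A sketch of fitting and sampling PAR with the hyper-parameters of Table 5; the import path and call signature follow the SDV release we used and may differ in newer versions of the library:

from sdv.timeseries import PAR

# 'data' is a DataFrame with a timestamp column and the internet usage values.
model = PAR(sequence_index="timestamp", epochs=1000, segment_size=None)
model.fit(data)
synthetic = model.sample(1)   # one synthetic sequence over the input timespan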
C. QUALITATIVE ANALYSIS
1) Histograms
For Bocconi, a quick look at Fig. 12 shows the performance disparity between the three models. It is clear that both DoppelGANger (DG) and TimeGAN (tgan) perform well in terms of capturing the full range of values present in the original data. However, a closer look shows that the TimeGAN struggles to capture outlier values, such as the values greater than 700; the DoppelGANger does a slightly better job at capturing these less frequent values. The PAR model misses large parts of the original distribution, and even generates negative values at 7 weeks of data.

These observations are more marked in a similar analysis of the Navigli area, shown in Fig. 14. We observe an increasing overlap between the real and generated distributions as the training data is increased. PAR performs better here than in Bocconi; although it still does not capture the multi-modal distribution properly, it does not output any negative values either.

In Duomo, shown in Fig. 16, both DoppelGANger and TimeGAN perform very well. PAR performs decently when trained on and compared with three weeks of data, but its performance quickly deteriorates: on five weeks of data it produces negative values, and at seven weeks it underestimates the right-skew of the original distribution.

In short, both TimeGAN and DoppelGANger generated data lies within the range of the original input data, whereas PAR's performance fluctuates significantly and its generated data contains negative values that were not present, or even possible, in the input data.

2) ACF Plots
While the histograms give an idea about the distribution of the values of a time-series, they do not provide any information on how well the generated series captures the temporal correlation of the original series. For this, we create a series of auto-correlation function (ACF) plots in a similar fashion to the histograms. Fig. 13 shows the ACF plots for the Bocconi region. The plots depict slightly better performance of the DoppelGANger than the TimeGAN in capturing the extreme correlation peaks, while PAR fails to capture the temporal dependence of the series.

The same idea is repeated for the Navigli district in Fig. 15, where the performance of the TimeGAN deteriorates compared to that of the DoppelGANger, while PAR again struggles with capturing the temporal dynamics of the input time-series. We observe an improvement in how well the DoppelGANger captures the negative correlation peaks as we increase the training data, but the opposite applies to the TimeGAN. Compare that to Fig. 17, which shows the ACF plots for Duomo, where the plots for DoppelGANger and TimeGAN are nearly identical. Thus, it seems that the models struggle to capture the correlations of a more chaotic time-series like Navigli, but easily reproduce the correlations in places like Duomo and Bocconi, which have more regularly repeating activity patterns.

D. QUANTITATIVE ANALYSIS
Now that we have a cursory idea of the quality of the generated data, we quantify its usefulness through the direct and indirect metrics explained in the preceding sections.

1) Dynamic Time Warping
The DTW results in Table 6 show several interesting things and corroborate some of the observations made in the prior section. In all three regions, PAR's generated series are extremely different from the ground truth. Looking at Bocconi, we see that the DTW score increases as the generated series get longer. This may be explained by Fig. 18, where we see that the real data decreases in amplitude towards the end. This trend is not captured by the DoppelGANger but is somewhat reflected in the TimeGAN's output, leading to an improved DTW score for the TimeGAN on 7 weeks of data and the opposite for the DoppelGANger. While not shown here, the same explanation also applies to Navigli. Duomo, on the other hand, features no such long-term downtrend, which is why we see a decrease in DTW scores as the amount of available data is increased.

2) KL Divergence
The KL divergence calculations for all three regions are shown in Table 7. There is no significant pattern to be seen, other than that the values for DoppelGANger and TimeGAN tend to stay consistent, whereas PAR generated data yields the largest values and demonstrates the most volatility. In general, from an information theory perspective, we would not need a great many more bits to represent the generated distribution in place of the real distribution.

3) Train Synthetic Test Real (TSTR)
Looking at Table 8, we can make two observations. The first can be seen in the results for Bocconi, where increasing the amount of data leads to negligible changes in accuracy for both DoppelGANger and TimeGAN. PAR goes against this trend, but we have already seen above that PAR's generated data is inconsistent, so this is understandable. In any case, compared to the TRTR values, we see roughly 5% and 9% error differences for models trained on five weeks of data generated by the DoppelGANger and TimeGAN respectively. This shows that for forecasting a series such as Bocconi, which has weekly and daily seasonality, the generated data is somewhat less useful. The advantage of the DoppelGANger over the TimeGAN arises because the TimeGAN cannot predict the troughs on the weekends well, largely because it fails to capture them during generation as well.

The situation in Navigli and Duomo is somewhat better. In the former, the accuracy for DoppelGANger and TimeGAN improves as we increase the amount of generated data; a plot of the real vs predicted values for the TimeGAN is given in Fig. 19. These accuracy values are also close to the accuracy obtained with real data, indicating that both models can produce synthetic data comparable to the true data. In Duomo, this trend only applies to the DoppelGANger, with an error difference less than or equal to 3% between it and the true data.

Overall, the DoppelGANger generates significantly better quality data than the TimeGAN or PAR across all three regions.

4) Performance with Augmented Data Set
Augmenting the real data with synthetic data yields the results shown in Table 9. There are no across-the-board improvements, and the improvements observed are minimal. For instance, in Bocconi, adding 4 weeks of DoppelGANger synthetic data to 1 week of real data does improve performance over just using 1 week of real data, but the improvement is only 0.6%. In Navigli, adding two weeks from either PAR, DoppelGANger or TimeGAN to three weeks of real data yields an improvement, but as before this improvement is small, at less than 0.5%. In Duomo, we see no improvement from any combination of real and synthetic data.

The above results show that combining synthetic data with real data can yield performance benefits in a time-series forecasting framework. These improvements are minor, but that can be expected given that we are working with scarce data.
FIGURE 12. (a), (b) and (c) show the distribution of values in the generated data when the models were trained on 3, 5 and 7 weeks of real data respectively, for Bocconi. Note that DG and tgan are short for DoppelGANger and TimeGAN respectively.

FIGURE 13. (a), (b) and (c) show the ACF plots of real vs generated data when the models were trained on 3, 5 and 7 weeks of real data respectively, for Bocconi. Note that DG and tgan are short for DoppelGANger and TimeGAN respectively.
FIGURE 14. (a), (b) and (c) show the distribution of values in the generated data when the models were trained on 3, 5 and 7 weeks of real data respectively, for Navigli. Note that DG and tgan are short for DoppelGANger and TimeGAN respectively.

FIGURE 15. (a), (b) and (c) show the ACF plots of real vs generated data when the models were trained on 3, 5 and 7 weeks of real data respectively, for Navigli. Note that DG and tgan are short for DoppelGANger and TimeGAN respectively.
FIGURE 16. (a), (b) and (c) show the distribution of values in the generated data when the models were trained on 3, 5 and 7 weeks of real data respectively, for Duomo. Note that DG and tgan are short for DoppelGANger and TimeGAN respectively.

FIGURE 17. (a), (b) and (c) show the ACF plots of real vs generated data when the models were trained on 3, 5 and 7 weeks of real data respectively, for Duomo. Note that DG and tgan are short for DoppelGANger and TimeGAN respectively.
TABLE 6. DTW-based similarity scores (lower is better) for Bocconi, Navigli and Duomo. 'Weeks' indicates the amount of data used in the calculations in terms of number of weeks.

                     Bocconi                           Navigli                           Duomo
Weeks   DoppelGANger   PAR      TimeGAN   DoppelGANger   PAR     TimeGAN   DoppelGANger   PAR      TimeGAN
3       19.28          260      30.65     35.2           67.93   32.37     42.38          95.74    47.52
5       21.89          67       36.7      31.58          53.04   34        38.73          459.25   48.4
7       25.28          238.54   31.47     40.19          54.91   34.26     36.14          109.4    36.8

TABLE 7. KL divergence (lower is better) for Bocconi, Navigli and Duomo. 'Weeks' indicates the amount of data used in the calculations in terms of number of weeks.

                     Bocconi                        Navigli                        Duomo
Weeks   DoppelGANger   PAR    TimeGAN   DoppelGANger   PAR    TimeGAN   DoppelGANger   PAR    TimeGAN
3       0.01           0.08   0.02      0.006          0.12   0.009     0.02           0.1    0.049
5       0.01           0.13   0.032     0.014          0.10   0.004     0.05           0.15   0.06
7       0.03           0.22   0.034     0.034          0.04   0.017     0.05           0.24   0.05

TABLE 8. Train Synthetic Test Real (TSTR) and Train Real Test Real (TRTR) prediction errors (in MAPE) for Bocconi, Navigli and Duomo. 'Weeks' indicates the amount of data used in the calculations in terms of number of weeks. Note that DG and tgan are short for DoppelGANger and TimeGAN respectively.

                 Bocconi                         Navigli                        Duomo
Weeks   DG      tgan    PAR     TRTR    DG     tgan    PAR     TRTR   DG      tgan   PAR    TRTR
1       12.21   15.72   28.16   10.48   7.6    10.33   15.46   7.34   12.5    13.2   26.5   9.34
3       12.25   16.54   26.66   8.47    7.51   7.89    24.58   7.20   11.08   12.4   22.1   7.68
5       12.59   16.21   20.7    7.26    7.24   8.1     16.54   6.84   10.11   18.3   30.8   7.14

TABLE 9. Augmented data performance for Bocconi, Navigli and Duomo (for reference, TRTR for 5 weeks of real data is 7.26%, 6.84% and 7.14% for Bocconi, Navigli and Duomo respectively). All values are in MAPE. 'R + S (Weeks)' means real + synthetic weeks of data, so '1 + 4' means 1 real and 4 synthetic weeks.

                     Bocconi                           Navigli                           Duomo
R + S (Weeks)   DoppelGANger   TimeGAN   PAR     DoppelGANger   TimeGAN   PAR    DoppelGANger   TimeGAN   PAR
1 + 4           9.88           15.04     13.45   7.41           7.67      8.5    9.6            10.9      13.85
4 + 1           7.61           8.46      7.73    6.76           7.07      6.78   8.5            7.63      8.59
2 + 3           8.84           11.27     10.4    7.42           7.43      7.66   8.8            9         11.7
3 + 2           7.88           7.70      8.83    6.93           6.53      6.37   7.9            8.8       7.86

FIGURE 18. Generated data vs real data, Bocconi, shown over 7 weeks for all three models.

5) Imputation Utility
Tables 10, 11 and 12 display the prediction errors for five models per sparsity level: three trained on data imputed with each of the generative models, one trained on interpolated data, and, for comparison, one trained on the smaller data set created by randomly removing observations. We see that the performance of all models deteriorates as we use them to fill larger gaps in the real data set. However, the DoppelGANger's performance appears the most stable, followed by the TimeGAN, although the latter exhibits higher errors overall. We also observe that the model trained on PAR imputed data performs worse than a model trained on true, unimputed data, but its performance suddenly improves when the original data is reduced by 90%. PAR's uncharacteristically good performance in this case is probably due to our model learning some spurious correlations that happen to allow it to predict well on the test data despite the poor training data.
TABLE 10. Imputing missing data via interpolation and with generated data for Bocconi University. All values are in MAPE. The total amount of data used is five weeks, or 840 hourly readings.

% data removed   DoppelGANger   TimeGAN   PAR     Missing   Interpolated
20               7.22           11.25     14.61   10.62     7.72
40               8.23           14.25     19.84   16.83     8.36
60               9.99           15.45     24.24   31.67     10.34
80               11.51          14.28     21.71   53.91     13.23
90               11.35          14.98     15.13   54.04     13.93

TABLE 11. Imputing missing data via interpolation and with generated data for the Navigli District. All values are in MAPE. The total amount of data used is five weeks, or 840 hourly readings.

% data removed   DoppelGANger   TimeGAN   PAR     Missing   Interpolated
20               6.88           6.54      9.91    8.42      7.08
40               6.36           7.47      18.71   9.46      6.9
60               7.02           7.94      21.66   16.08     7.76
80               6.94           8.74      22.83   27.65     11.57
90               7.24           7.8       14.85   27.74     11.31

TABLE 12. Imputing missing data via interpolation and with generated data for the Duomo Cathedral. All values are in MAPE. The total amount of data used is five weeks, or 840 hourly readings.

% data removed   DoppelGANger   TimeGAN   PAR     Missing   Interpolated
20               9.68           10.8      64.47   10.83     7.12
40               11.23          14.7      84.37   16.09     8.41
60               12.94          22.33     71.15   29.37     8.54
80               12.4           20.83     36.65   72.56     10.26
90               12.13          22.36     30.24   91.13     10.65

FIGURE 19. True vs predicted values for Navigli, with the model trained on 5 weeks of TimeGAN data.

The Navigli district recordings show similar trends, except that the TimeGAN and DoppelGANger perform equally well, and the gap between the GAN imputed data sets and the interpolated data is wider, at around 4% in favor of the DoppelGANger and TimeGAN.

This is not the case in Duomo, where the DoppelGANger is the clear winner among the generative models. However, simply interpolating over the missing data with a common technique like quadratic interpolation seems to impute the data just as well. The interpolated model's performance deteriorates at the 80% and 90% missing data levels, where the DoppelGANger imputed data yields a roughly 2.5% accuracy gain in Bocconi, a 4% gain in Navigli and a degradation of 2% in Duomo. These results emphasize that GAN based models really start yielding true benefits at high sparsity levels, where interpolation schemes fall off.

The key takeaway here is that the DoppelGANger's performance across different types of time-series data is the most reliable: using it to impute random gaps in the true data yields the best accuracy, in some cases outperforming the model trained entirely on true data (e.g. when 40% of the data is removed in Navigli). The TimeGAN performs well for Navigli, but does relatively poorly in Bocconi and Duomo. Finally, PAR's output is more or less random, and it only captures the value distribution of the original data sets reasonably well.

VII. CONCLUSION
In this work, we compared three publicly available time-series generative models against each other using an actual mobile network data set. Two methods are based on the GAN architecture, while one is a deep learning based autoregressive model. We see that the GAN based architectures are superior to the autoregressive approach across an array of numerical and graphical measures.
Page 47 of 65 IEEE Access
M.H Naveed et al.: Assessing Deep Generative Models on Time Series Network Data

1
2 ated data to train a supervised machine learning algorithm [9] Deniz Gündüz, Paul de Kerret, Nicholas D. Sidiropoulos, David Gesbert,
3 and assessed its performance on unseen real data. These Chandra R. Murthy, and Mihaela van der Schaar. Machine Learning in the
4 experiments revealed that models trained on GAN - based
Air. IEEE Journal on Selected Areas in Communications, 37(10):2184–
2199, 2019.
5 DoppelGANger and TimeGAN generated data were com- [10] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David
6 petitive with a model trained on true data, but the Doppel- Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Gen-
7 GANger was most superior in performance across all three
erative Adversarial Nets. In Z. Ghahramani, M. Welling, C. Cortes,
N. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Infor-
8 regions. Finally, our simulations revealed that increasing the mation Processing Systems, volume 27. Curran Associates, Inc., 2014.
9 training data did not always prove beneficial but in some [11] H. Ye, L. Liang, G. Y. Li, and B. Juang. Deep Learning-Based End-to-End
10 cases degraded the generative model’s performance and that Wireless Communication Systems With Conditional GANs as Unknown
Channels. IEEE Transactions on Wireless Communications, 19(5):3133–
11 some models, like the DoppelGANger, perform very well 3143, 2020.
12 even on relatively little data. We then saw that augmenting [12] L. Sun, Y. Wang, A. L. Swindlehurst, and X. Tang. Generative-
13 small amounts of real data with comparatively large amounts Adversarial-Network Enabled Signal Detection for Communication Sys-
tems With Unknown Channel Models. IEEE Journal on Selected Areas in
14 of synthetic data yielded minor performance improvements. Communications, 39(1):47–60, 2021.
15 Finally, we saw that GAN generated values were a good [13] Ben Hughes, Shruti Bothe, Hasan Farooq, and Ali Imran. Generative Ad-
16 substitute for real data when imputing missing values in a versarial Learning for Machine Learning empowered Self Organizing 5G
Networks. In 2019 International Conference on Computing, Networking
17 time-series, but interpolation techniques in most cases can and Communications (ICNC), pages 282–286, 2019.
18 perform just as well except when the number of missing [14] Mohammad Navid Fekri, Ananda Mohon Ghosh, and Katarina Grolinger.
19 values is quite large. Generating energy data for machine learning with recurrent generative
20 While this is the first in-depth research on comparing the
adversarial networks. Energies, 13(1):1–23, 2019.
[15] Stephanie L. Hyland, Cristóbal Esteban, and Gunnar Rätsch. Real-valued
21 latest deep learning based generative models for a time series (medical) time series generation with recurrent conditional GANs. arXiv,
22 telecommunications data set, future extensions of the work 2017.
23 may involve replicating it with a much larger data set, as [16] Olof Mogren. C-RNN-GAN: A continuous recurrent neural network with
adversarial training. In Constructive Machine Learning Workshop (CML)
24 well as with multivariate time-series data. Another direction at NIPS 2016, page 1, 2016.
25 could be to work with tabular data, which would entail [17] Changhee HAN, Kohei MURAO, Shin’ichi SATOH, and Hideki
26 using different GAN models as well as different evaluation NAKAYAMA. Learning More with Less: GAN-based Medical Image
Augmentation. Medical Imaging Technology, 37(3):137–142, 2019.
27 metrics. Finally, the DoppelGANger as well as PAR are [18] Maayan Frid-Adar, Idit Diamant, Eyal Klang, Michal Amitai, Jacob
28 capable of reproducing time-series data with corresponding Goldberger, and Hayit Greenspan. GAN-based synthetic medical image
29 context/metadata information, so it may be worthwhile to see augmentation for increased CNN performance in liver lesion classification.
Neurocomputing, 321:321–331, 2018.
30 if their performance differs in that environment. [19] Changhee Han, Leonardo Rundo, Ryosuke Araki, Yujiro Furukawa, Gian-
REFERENCES
[1] Ali Imran, Ahmed Zoha, and Adnan Abu-Dayya. Challenges in 5G: how to empower SON with big data for enabling 5G. IEEE Network, 28(6):27–33, 2014.
[2] Jessica Moysen and Lorenza Giupponi. From 4G to 5G: Self-organized network management meets machine learning. Computer Communications, 129:248–268, 2018.
[3] Alireza Ghasempour. Internet of things in smart grid: Architecture, applications, services, key technologies, and challenges. Inventions, 4(1), 2019.
[4] Gordon J. Sutton, Jie Zeng, Ren Ping Liu, Wei Ni, Diep N. Nguyen, Beeshanga A. Jayawickrama, Xiaojing Huang, Mehran Abolhasan, Zhang Zhang, Eryk Dutkiewicz, and Tiejun Lv. Enabling technologies for ultra-reliable and low latency communications: From PHY and MAC layer perspectives. IEEE Communications Surveys & Tutorials, 21(3):2488–2524, 2019.
[5] Umair Sajid Hashmi, Arsalan Darbandi, and Ali Imran. Enabling proactive self-healing by data mining network failure logs. In 2017 International Conference on Computing, Networking and Communications (ICNC), pages 511–517, 2017.
[6] Umair Sajid Hashmi, Ashok Rudrapatna, Zhengxue Zhao, Marek Rozwadowski, Joseph Kang, Raj Wuppalapati, and Ali Imran. Towards Real-Time User QoE Assessment via Machine Learning on LTE Network Data. In 2019 IEEE 90th Vehicular Technology Conference (VTC2019-Fall), pages 1–7, 2019.
[7] Mostafa Ibrahim, Umair Sajid Hashmi, Muhammad Nabeel, Ali Imran, and Sabit Ekin. Embracing Complexity: Agent-Based Modeling for HetNets Design and Optimization via Concurrent Reinforcement Learning Algorithms. IEEE Transactions on Network and Service Management, 18(4):4042–4062, 2021.
[8] Gianni Barlacchi, Marco De Nadai, Roberto Larcher, Antonio Casella, Cristiana Chitic, Giovanni Torrisi, Fabrizio Antonelli, Alessandro Vespignani, Alex Pentland, and Bruno Lepri. A multi-source dataset of urban life in the city of Milan and the Province of Trentino. Scientific Data, 2, 2015.
[10] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Nets. In Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014.
[11] H. Ye, L. Liang, G. Y. Li, and B. Juang. Deep Learning-Based End-to-End Wireless Communication Systems With Conditional GANs as Unknown Channels. IEEE Transactions on Wireless Communications, 19(5):3133–3143, 2020.
[12] L. Sun, Y. Wang, A. L. Swindlehurst, and X. Tang. Generative-Adversarial-Network Enabled Signal Detection for Communication Systems With Unknown Channel Models. IEEE Journal on Selected Areas in Communications, 39(1):47–60, 2021.
[13] Ben Hughes, Shruti Bothe, Hasan Farooq, and Ali Imran. Generative Adversarial Learning for Machine Learning empowered Self Organizing 5G Networks. In 2019 International Conference on Computing, Networking and Communications (ICNC), pages 282–286, 2019.
[14] Mohammad Navid Fekri, Ananda Mohon Ghosh, and Katarina Grolinger. Generating energy data for machine learning with recurrent generative adversarial networks. Energies, 13(1):1–23, 2019.
[15] Stephanie L. Hyland, Cristóbal Esteban, and Gunnar Rätsch. Real-valued (medical) time series generation with recurrent conditional GANs. arXiv, 2017.
[16] Olof Mogren. C-RNN-GAN: A continuous recurrent neural network with adversarial training. In Constructive Machine Learning Workshop (CML) at NIPS 2016, page 1, 2016.
[17] Changhee Han, Kohei Murao, Shin'ichi Satoh, and Hideki Nakayama. Learning More with Less: GAN-based Medical Image Augmentation. Medical Imaging Technology, 37(3):137–142, 2019.
[18] Maayan Frid-Adar, Idit Diamant, Eyal Klang, Michal Amitai, Jacob Goldberger, and Hayit Greenspan. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing, 321:321–331, 2018.
[19] Changhee Han, Leonardo Rundo, Ryosuke Araki, Yujiro Furukawa, Giancarlo Mauri, Hideki Nakayama, and Hideaki Hayashi. Infinite Brain MR Images: PGGAN-Based Data Augmentation for Tumor Detection. Smart Innovation, Systems and Technologies, 151:291–303, 2020.
[20] Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, and Russ Webb. Learning from simulated and unsupervised images through adversarial training. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), pages 2242–2251, 2017.
[21] Ali Borji. Pros and Cons of GAN Evaluation Measures. 2018.
[22] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
[23] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved Techniques for Training GANs. In Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.
[24] Muhammad Ferjad Naeem, Seong Joon Oh, Youngjung Uh, Yunjey Choi, and Jaejun Yoo. Reliable Fidelity and Diversity Metrics for Generative Models. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 7176–7185. PMLR, 13–18 Jul 2020.
[25] Konstantin Shmelkov, Cordelia Schmid, and Karteek Alahari. How good is my GAN? In Proceedings of the European Conference on Computer Vision (ECCV), September 2018.
[26] NIST/SEMATECH. Introduction to Time Series Analysis. e-Handbook of Statistical Methods, 2013.
[27] Lars Kegel, Martin Hahmann, and Wolfgang Lehner. Generating What-If Scenarios for Time Series Data. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management, SSDBM '17, New York, NY, USA, 2017. Association for Computing Machinery.
[28] Christoph Bergmeir, Rob J. Hyndman, and José M. Benítez. Bagging exponential smoothing methods using STL decomposition and Box–Cox transformation. International Journal of Forecasting, 32(2):303–312, 2016.
[29] Aditya Grover and Stefano Ermon. Autoregressive models, Nov 2019.
[30] Hugo Larochelle and Iain Murray. The Neural Autoregressive Distribution Estimator. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pages 29–37, Fort Lauderdale, FL, USA, 11–13 Apr 2011. PMLR.
[31] Neha Patki, Roy Wedge, and Kalyan Veeramachaneni. The Synthetic Data Vault. In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 399–410, 2016.
[32] Qingsong Wen, Liang Sun, Fan Yang, Xiaomin Song, Jingkun Gao, Xue Wang, and Huan Xu. Time series data augmentation for deep learning: A survey. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 4653–4660. International Joint Conferences on Artificial Intelligence Organization, August 2021. Survey Track.
[33] John F. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49, 1950.
[34] Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. Modeling Tabular Data using Conditional GAN. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
[35] Jinsung Yoon, Daniel Jarrett, and Mihaela van der Schaar. Time-series generative adversarial networks. Advances in Neural Information Processing Systems, 32(NeurIPS):1–11, 2019.
[36] Zinan Lin, Alankar Jain, Chen Wang, Giulia Fanti, and Vyas Sekar. Using GANs for sharing networked time series data: Challenges, initial promise, and open questions. In Proceedings of the ACM Internet Measurement Conference, IMC '20, pages 464–483, New York, NY, USA, 2020. Association for Computing Machinery.
[37] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1):43–49, 1978.
[38] Toni Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. Journal of Statistical Software, 31, August 2009.
[39] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[40] Amirhossein Ahmadi, Mojtaba Nabipour, Behnam Mohammadi-Ivatloo, Ali Moradi Amani, Seungmin Rho, and Md. Jalil Piran. Long-term wind power forecasting using tree-based learning algorithms. IEEE Access, 8:151511–151522, 2020.

MUHAMMAD HARIS NAVEED received his B.Sc. degree in Electrical Engineering from the School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Pakistan, in 2021. His research interests include the application of GANs to tabular and time-series network data and audio keyword spotting using ASR-free approaches.

UMAIR SAJID HASHMI (Member, IEEE) received the B.S. degree in electronics engineering from the GIK Institute of Engineering Sciences and Technology, Pakistan, in 2008, the M.Sc. degree in advanced distributed systems from the University of Leicester, U.K., in 2010, and the Ph.D. degree in electrical and computer engineering from the University of Oklahoma, OK, USA, in 2019. During his Ph.D., he worked as a Graduate Research Assistant with the AI4Networks Research Center. He also worked with AT&T, Atlanta, GA, USA, and Nokia Bell Labs, Murray Hill, NJ, USA, on multiple research internships and co-ops. Since Fall 2019, he has been serving as an Assistant Professor with the School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Pakistan, where he is working in the broad area of 5G wireless networks and the application of artificial intelligence toward system-level performance optimization of wireless networks and health care applications. He is also affiliated with the University of Toronto as a postdoctoral research fellow. He has published about 20 technical papers in high impact journals and proceedings of IEEE flagship conferences on communications. He has been involved in four NSF funded projects on 5G self-organizing networks, and is a Co-PI on an Erasmus+ consortium with a combined award worth over $4 million USD. Since 2020, he has been serving as a Review Editor for the IoT and Sensor Networks stream in Frontiers in Communications and Networks.

NAYAB TAJVED received her B.S. degree in Electrical Engineering from the National University of Sciences and Technology, Pakistan, in 2021. Her research work includes studies on tackling data scarcity problems faced in future big-data empowered cellular networks using analytical and machine learning tools.

NEHA SULTAN received her Bachelor of Science (B.Sc.) in Electrical Engineering from the National University of Sciences and Technology (NUST), Pakistan, in 2021. Her research interests include the application of AI-driven systems in wireless networks.
ALI IMRAN (Senior Member, IEEE) received the B.Sc. degree in electrical engineering from the University of Engineering and Technology Lahore, Pakistan, in 2005, and the M.Sc. degree (Hons.) in mobile and satellite communications and the Ph.D. degree from the University of Surrey, Guildford, U.K., in 2007 and 2011, respectively. He is a Presidential Associate Professor of ECE and the Founding Director of the Artificial Intelligence (AI) for Networks Research Center and TurboRAN Testbed for 5G and Beyond, University of Oklahoma. His research interests include AI and its applications in wireless networks and healthcare. His work on these topics has resulted in several patents and over 100 peer-reviewed articles, including some of the most influential papers in the domain of wireless network automation. On these topics, he has led numerous multinational projects, given invited talks/keynotes and tutorials at international forums, advised major public and private stakeholders, and cofounded multiple start-ups. He is an Associate Fellow of the Higher Education Academy, U.K. He is also a member of the Advisory Board to the Special Technical Community on Big Data, the IEEE Computer Society.
Original Manuscript ID: Access-2021-42131

Original Article Title: “Is Synthetic the New Real? Performance Analysis of Time Series Generation Techniques with Focus on Network Load Forecasting”

To: IEEE Access Editor

Re: Response to reviewers

Dear Editor,

Thank you for allowing a resubmission of our manuscript, with an opportunity to address the reviewers’ comments.

We are uploading (a) our point-by-point response to the comments (below) (Response to Reviewers), (b) an updated manuscript with yellow highlighting indicating changes (Supplementary Material for Review), and (c) a clean updated manuscript without highlights (Main Manuscript).

A short summary of the changes implemented in the paper in accordance with the reviewers’ suggestions is given below.

Summary of Changes in the Manuscript:

• Reformatted figures and reworded their captions to improve readability. Added references suggested by the reviewers and restructured the text in some sections to improve clarity.
• Restructured the abstract to more clearly communicate our findings and contributions.
• Removed excess background information and references to decrease the length of the paper and make it more readable.
• Added a section that shares the parameters as well as the detailed methodology behind our work to aid reproducibility of our results.
• Performed the analysis on a new time-series from another region called Duomo and updated all sections with its results accordingly.

Point-by-point Responses:

In the point-by-point responses below, the following scheme is used: (I) red represents the reviewers’ comments, (II) our response is in black font, and (III) any changes made in the paper, or text taken directly from the paper, are shown in italic blue font. Note that not all revisions have been copied into this letter to avoid cluttering.

Best regards,
M.H. Naveed et al.
Reviewer#1, Concern # 1: All abbreviations should be defined in the first place, e.g., AR, VR, URLLC, IoT, etc. The same acronym in abstract and another part of the paper should be defined in both abstract and another place in the first place.

Author response: We are grateful to the reviewer for providing this suggestion.

Author action: We updated the manuscript by defining each abbreviation once in the abstract and once when it first appears in the main body of the paper.

Reviewer#1, Concern # 2: To provide enough and more information about Internet of Things to readers, the following article is suggested to be used and cited in the 1st paragraph of introduction section, i.e., “applications such as IoT […], URLLC” and it should be added to reference section:

[…] “Internet of Things in Smart Grid: Architecture, Applications, Services, Key Technologies, and Challenges,” Inventions journal, vol. 4, no. 1, pp. 1-12, 2019.

Author response: We read the reference suggested by the reviewer and decided that its addition improved our paper.

Author action: We updated the manuscript by adding the suggested reference in paragraph 1 of Section I (Introduction), page 1.
Reviewer#1, Concern # 3: References should be provided for the equations which were borrowed from the literature.

Author response: We completely agree with the reviewer’s suggestion.

Author action: The paper from which an equation was taken is now referenced before the said equation is presented in our paper. This applies to equations (2), (4) and (5) in Section IV on pages 4 – 6 and equations (7) and (8) in Section V on page 7. Equations related to the TimeGAN in Section IV, page 5 have been removed from the paper and are now referenced with respect to their number in the original TimeGAN paper. This change is shown below:

The losses used to train the Embedding, Recovery, Generator and Discriminator networks can be found as Eqs. (7), (8) and (9) in [35].
Reviewer#1, Concern # 4: All figures and tables should be placed at the top of the page or at the bottom of the page but not between paragraphs, e.g., Fig. 5-8, Table 1.

Author response: We thank the reviewer for their invaluable observation.

Author action: All figures and tables are now positioned at the top of pages rather than between paragraphs.
Reviewer#1, Concern # 5: Each part of Figures 10-14 should be labeled using a), b), c), and so on. Also, each part should have a separate explanation in its caption.

Author response: We agree with the reviewer’s comment and believe it will help make the figures more understandable.

Author action: We have added the labels (a), (b) and (c) to each row in Figures 10 – 16. We have also explained each part in the corresponding caption of each figure.
Reviewer#1, Concern # 6: The future work should be explained in conclusion section or separately after conclusion as section VIII.

Author response: We thank the reviewer for their constructive feedback on the structuring of the paper.

Author action: Possible future extensions of the work are now explained in paragraph 2 of Section VII, page 16. The paragraph is reproduced below:

While this is the first in-depth research on comparing the latest GAN models for a time series telecommunications data set, future extensions of the work may involve replicating it with a much larger data set, as well as with multivariate time-series data. Another direction could be to work with tabular data, which would entail using different GAN models as well as different evaluation metrics. Finally, the DoppelGANger as well as PAR are capable of reproducing time-series data with corresponding context/metadata information, so it could be worthwhile to see if their performance differs in that environment.
Reviewer#2, Concern # 1: The author needs to restructure the abstract, it should be in line with contributions listed in the Introduction section.

Author response: We regret that our choice of words in the abstract did not clearly communicate our contributions.

Author action: We have restructured the abstract to reflect our contributions more clearly. The part of the abstract most pertinent to this is reproduced below:

In this paper, we choose two GAN-based models and one deep learning-based autoregressive model. We then compare their performance at generating synthetic time-series cellular traffic data. We also assess the impact of data scarcity on the generated data quality by varying the level of data available to the GANs for training. Moreover, in order to assess the usefulness of this generated data, we compare the forecasting performance of a gradient boosting regressor model trained solely on synthetic data, real data, and a mix of both.

This summarizes our main contributions, which are:

• Assess the performance of open-source GAN-based generative models vs. a deep learning based autoregressive model.
• Determine how the availability of data impacts the performance of these models.
• Numerically evaluate the usefulness of generated data in downstream applications such as network load forecasting.
Reviewer#2, Concern # 2: There are grammatical mistakes in the paper, in result analysis. line no 10, Hence, we train our forecasting model on 1, 3 and 5 weeks of data only and forecasted for a week from 12/09/2013 - 12/15/2013. Which week author consider for the experiment? Are these dates and weeks correct 12/09/2013 - 12/15/2013? Result analysis shows 1, 3 and 5 week, but in explanation author trace data to the 7th week as well

Author response: We are thankful to the reviewer for pointing out this potentially confusing paragraph. 1, 3 and 5 weeks of data are used in training the forecasting model only. This is because our forecasting interval is chronologically the 6th week, so it is inappropriate to use 7 weeks of data to train a forecaster. However, the 7 weeks of generated data are used in other comparative metrics such as the DTW score, auto-correlation plots, histograms and KL-Divergence (a sketch of how two of these metrics can be computed is given below).

Author action: We have added a paragraph to further elaborate on the data used in our analysis within Section VI on page 8. This added paragraph is presented as follows:

For conciseness, we report the qualitative assessment, KL-Divergence and Dynamic Time Warping of 3, 5 and 7 weeks of data only. For the Train Synthetic Test Real (TSTR), augmentation and imputation assessments, a forecasting model is trained on 1, 3 or 5 weeks of data only and then used to forecast over the test week, which is chronologically the 6th week in the time-series. If we use 7 weeks of data as well, that would require making the test week the 8th week. The problem is that the 8th week lies in the Christmas season and its readings are radically different from our training data. Thus, using 7 weeks of data is not appropriate in these assessments.

We have also added the test-week span and length to Table 1, which describes how the data is partitioned. The caption of Table 1 also offers more explanation now as well.
36 The caption of Table 1 also offers more explanation now as well.
37
38
39
40
41
42
43
44
45
46
47
48
49
50
Reviewer#3, Concern # 1: Please use a short yet informative title for the paper.

Author response: We agree with the reviewer that the original title was exceedingly long and too wordy.

Author action: We have changed the title of the manuscript from “Is Synthetic the New Real? Performance Analysis of Time Series Generation Techniques with Focus on Network Load Forecasting” to “Assessing Deep Generative Models on Time Series Network Data”.
Reviewer#3, Concern # 2: Please use correct abbreviations, GAN is abbreviated several times.

Author response: We appreciate the reviewer’s observation and have taken measures to implement it.

Author action: Abbreviations are now defined once in the abstract and once when they are initially used in the main body. Any other excess abbreviations have been removed.
Reviewer#3, Concern # 3: Please clarify we “traffic forecasting models” in the Abstract since it is misleading at the first glance!

Author response: We regret that the reviewer was inconvenienced by our choice of words and agree that our use of the term was not clear.

Author action: We have updated the abstract to be more explicit about what we are forecasting. The portion of the abstract most relevant to this suggestion is reproduced below:

Moreover, in order to assess the usefulness of this generated data, we compare the performance of a gradient boosting regressor model trained solely on generated data, real data, and a mix of both at forecasting network usage activity. We do not consider any privacy issues pertaining to the generated data in this work. Our experiments show that the GANs perform better than the autoregressive approach in each aspect considered in this work. Forecasting models trained to predict network load based on data generated by these GANs yield error rates comparable to models trained on real data.

Thus, it is now clearer that by ‘traffic forecasting’ we meant predicting network traffic.
36
37 Reviewer#3, Concern # 4: Please replace section with Section at the end of the first part of the Introduction.
38
39 Author response: We regret that such a basic mistake occurred on our end.
40
41 Author action: The above amendment has been made and the manuscript has been checked for other such
42 mistakes as well.
43
44
45
46
47
48 Reviewer#3, Concern # 5: What is the main aim of the paper? Applying GANs, traffic forecasting or privacy
49 issue?! What we are talking about sparsity or privacy?
50
51 Author response: We are thankful to the reviewer for sharing their concerns.
52
53 Author action: The main aim of the paper is to compare three different generative models, then determine
54 how the availability of data affects their performance and finally see how the generated data can be used in
55
a network load forecasting task. We do not consider the privacy issue particularly in this work, rather conduct
56
57 a thorough analysis on the performance of generative techniques in producing time-series data along with a
58 study on the impact of data scarcity, often created by virtue of privacy policies on the efficiency of these
59 deep generative techniques. This point has now been clarified within the abstract, the relevant part of which
60 is reproduced below:

For Review Only


IEEE Access Page 60 of 65

1
2 In this paper, we choose two publicly available GAN - based models and one deep learning - based
3 autoregressive model. We then compare their performance at generating synthetic time-series
4
5
cellular traffic data. We also assess the impact of data scarcity on the generated data quality by
6 varying the level of data available to the GANs for training. Moreover, in order to assess the
7 usefulness of this generated data, we compare the performance of a gradient boosting regressor
8 model trained solely on generated data, real data, and a mix of both at forecasting network usage
9
activity.
10
11
12
13
14
15 Reviewer#3, Concern # 6: Preliminaries are described in excessive detail such that the reader wonders which
16
method and where is used in the simulations. Please restructure the paper with concise descriptions to avoid
17
18
making a simple complicated.
19
Author response: We thank the reviewer for bringing this to our attention. For the sake of completion, we
20
21 explained each model’s novelties and its working. However, we see now that this approach has done more
22 harm than good.
23
24 Author action: We have significantly cut down on the amount of detail offered in Section II: Related Work
25 and Section IV: Time – Series Generative Models. The paras that underwent the most significant editing are
26 reproduced below:
27
28 Section II Para 1 (page 2):
29
30 The Generative Adversarial Network (GAN) framework was originally designed for simple image
31 data, but since its inception has seen great advancements [10]. CycleGAN [11] deals with image-to-
32
image translation as compared to generating image from a noise vector, with applications such as
33
34 super resolution and style transfer. StyleGAN [12] focuses on generating high resolution human faces
35 by proposing an alternative generator architecture for GAN based on style transfer learning.
36 StackGAN [13] synthesizes high-quality images from text descriptions using a two-stage process.
37
38
39
40 Section IV, Subsection C, Para 4 (page 5):
41
42 The use of GANs to generate tabular or time-series data is less common than generating images or
43
44
video, with the most popular model (as per GitHub statistics) being the CTGAN (Conditional Tabular
45 GAN) [47]. Another model is the ITS-GAN [48], which produces synthetic data tables that exhibit the
46 same statistical properties and functional dependencies as the real, partially available data.
47 MedGAN [49] was the first GAN to model high dimensional, multi-label discrete variables in
48 electronic health records (EHRs). GANs have been used to generate time-series data from many
49
50 domains such as physiological signals [50], medical ICU data [18], financial time-series [21], [20],
51 and PV (photovoltaic) production [51]. The TimeGAN [52] and DoppelGANger [53] are frameworks
52 that modify the traditional GAN architecture to make it more suitable for time-series data. A brief
53 overview of these models is given below.
54
55
56
57 Section IV, Subsection C, TimeGAN (page 5):
58
59
We remove explanations of the individual losses of the TimeGAN and instead refer readers directly
60
to the paper.

For Review Only


Page 61 of 65 IEEE Access

1
2 The losses used to train the Embedding, Recovery, Generator and Discriminator networks can be
3 found as Eqs. (7), (8) and (9) in [35].
4
5 Section IV, Subsection C, DoppelGANger, Para 2, (Page 5):
6
7
DoppelGANger solves this by modifying the RNN structure to produce S values in a single pass. This
8
9 reduces the overall number of passes required to generate the entire series, but the quality of the
10 generated samples also deteriorates as S increases. The authors recommend a value of S = 5 for best
11 results. In order to counter mode collapse, common in data sets that have large variability in values,
12 the authors use an idea they call auto-normalization. Instead of normalizing the entire data set using
13
14
the minimum and maximum values, they normalize each sample individually and treat the maximum
15 and minimum values of each time series as random data that has to be learnt by the model rather
16 than passed as input. Perhaps the largest design contribution is the use of a separate auxiliary
17 discriminator in addition to the regular Figure 6. Simplified block diagram of DoppelGANger
18
generator and discriminator setup, that works only on static features (called metadata) of a single
19
20 time-series sample. This is done to ensure that the complex relationships between a time-series and
21 its associated metadata are replicated in the generated data. This approach is unique because most
22 models that we have discussed so far trained on the metadata and temporal features jointly. This is
23 also why It uses the Wasserstein Loss …
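To make the auto-normalization idea quoted above concrete, here is a minimal NumPy sketch, written by us as an illustration rather than taken from the DoppelGANger code base: each sample is scaled by its own minimum and maximum, and those extremes are kept as per-sample metadata.

import numpy as np

def auto_normalize(samples):
    # Per-sample min-max scaling in the spirit of auto-normalization:
    # each series is scaled by its own extremes, and those extremes are
    # kept as metadata for the model to learn rather than fixed globally.
    scaled, metadata = [], []
    for s in samples:
        lo, hi = s.min(), s.max()
        rng = hi - lo if hi > lo else 1.0  # guard against constant series
        scaled.append((s - lo) / rng)
        metadata.append((lo, hi))
    return np.stack(scaled), np.array(metadata)

def auto_denormalize(scaled, metadata):
    # Invert the scaling using the (min, max) pair stored for each sample.
    lo, hi = metadata[:, 0:1], metadata[:, 1:2]
    return scaled * (hi - lo) + lo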
Reviewer#3, Concern # 7: Instead of referring to a paper from 2011, please refer to up-to-date references like 10.1109/ACCESS.2020.3017442 and make it clear if you used a simple gradient Boosting and not XGBoost, CatBoost or LightGboost.

Author response: We are thankful to the reviewer for pointing out this omission on our end and suggesting a paper that is relevant to our methodology.

Author action: The abstract as well as Section VI, Subsection A (Pre-processing setup), page 8, now explicitly state that we use a simple gradient boosting regressor model and not one of its variants such as XGBoost, CatBoost, etc. Furthermore, we also cite the suggested paper in Section VI, Subsection A (Pre-processing setup), paragraph 3, page 8 so that readers unfamiliar with the use of this technique for forecasting can read more about it; an illustrative sketch of the setup follows below.
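As a rough illustration of the Train Synthetic, Test Real (TSTR) setup described in this response, the following sketch trains scikit-learn's plain GradientBoostingRegressor on lagged windows of a synthetic series and scores it on a real test week. The random placeholder arrays and the 24-hour lag window are our own assumptions, not the paper's exact configuration.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

def make_lagged(series, n_lags=24):
    # Turn an hourly series into (lag-window, next-value) training pairs.
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

# Placeholders for illustration: in the paper's setting these would be the
# GAN-generated training weeks and the real held-out test week (168 h each).
synthetic_weeks = np.random.rand(5 * 168)
real_test_week = np.random.rand(168)

# TSTR: fit on synthetic data only, then score on the real test week.
X_train, y_train = make_lagged(synthetic_weeks)
X_test, y_test = make_lagged(real_test_week)

model = GradientBoostingRegressor()  # the plain variant, not XGBoost/CatBoost
model.fit(X_train, y_train)
print("TSTR MAE:", mean_absolute_error(y_test, model.predict(X_test)))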
44
45 Reviewer#3, Concern # 8: There are 60 references which are excessive for a regular paper. Please critically
46 review the most related ones.
47
48 Author response: This suggestion of the reviewer compliments the earlier suggestion of cutting down on
49 excessive explanations in the preliminaries and so both have been adequately addressed.
50
51 Author action: We have removed all references that are not directly concerned with the main scope of our
52 paper. This has led to the total references being reduced from 60 to 40.
53
54
55
56
57
Reviewer#3, Concern # 9: It is totally unclear how the authors train the models! Please provide a clear flow diagram briefly clarifying the presented method and metrics at the beginning of the Simulation section. Please provide all hyper-parameters used in this paper in a table helping to regenerate the results.

Author response: We are immensely grateful to the reviewer for this particular suggestion. Lack of reproducibility in machine learning research is a very real concern, and so we have worked especially hard to accommodate this suggestion.

Author action: We have added Figures 10 and 11 in Section VI, page 9, which depict our comparison process as well as the generation process that comes before it. Together they provide a succinct summary of the methodology used in the paper. Furthermore, we added an entire new subsection (Subsection B, ‘Model Training’) to Section VI (Results and Discussion) on page 8. We did this so that, in addition to providing the hyper-parameters for each model in Tables 3 – 5, we could also provide additional information on how to prepare data for each model or modify their code for our specific use case. This information will make it easier for any readers who wish to reproduce our work. Figures 10 and 11 are reproduced below:
Figure 10: (block-diagram figure; reproduced in the revised manuscript)

Figure 11: (block-diagram figure; reproduced in the revised manuscript)
Reviewer#3, Concern # 10: The performance deterioration mentioned in Section B.2 is the ease of learning of a small set of data. Gradient boostings, especially XGBoost, can easily handle more than 7 weeks without concerning about long-term dependencies. As far as we know LSTM, as an RNN, is developed to handle long-term dependencies, hence, how do the authors claim, “holding these dependencies in memory is difficult for RNN - based architectures”?

Author response: We thank the reviewer for pointing out the lack of clarity in our statement and appreciate the chance to explain it once more.

In the original manuscript, the idea that LSTMs cannot capture long-term dependencies is first introduced in Section IV, Subsection C, DoppelGANger, Para 1, where we state:

Since RNNs produce a single measurement in a single pass, and for a time-series of length L perform L passes, they tend to forget prior values and struggle to capture long-term correlations in a time-series.

This claim is originally made in the paper ‘Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions’ (citation [36]), where the authors show empirically that even LSTMs struggle to capture long-term dependencies when a time-series is longer than a few hundred measurements; a toy sketch of the batched-generation idea that DoppelGANger uses to mitigate this is given at the end of this response. We regret that we did not make this more explicit in the paragraph, and this mistake has now been corrected.

Secondly, we refer to this statement while explaining the performance degradation observed in the ACF plots in Section VI, Subsection B.2:

Firstly, the amount of data added is simply not enough to bring a significant improvement in performance; time-series data usually spans years instead of a few weeks. Secondly, increasing the data increases the number of long-term dependencies that each model has to learn. As previously explained, holding these dependencies in memory is difficult for RNN-based architectures, but the DoppelGANger’s superior performance indicates that its solution for this problem does improve performance over standard RNNs (as used in PAR) or RNNs used in conjunction with auto-regressive models (as used in TimeGAN).

We reassessed this explanation and decided that while it could apply in our case (the length of one week of data is 168), there must be another explanation for the degradation observed. It turns out that we were correct. The degradation observed in the ACF plots is only visible for the Navigli district, not for Bocconi or Duomo (a new region we added; more discussion on that in Concern # 13). This indicates that the correlation plots show more deterioration when dealing with a volatile time-series (like Navigli) rather than with more regular, repeating time-series data (like Duomo or Bocconi).

Author action: We have made the first statement in Section IV clearer; the new additions (in bold) are shown below:

Since RNNs produce a single measurement in a single pass and for a time-series of length L perform L passes, they tend to forget prior values and struggle to capture long term correlations in a time-series. Furthermore, authors in [36] in Section 4.1, page 5 state that even LSTMs, which were designed to correct the above mentioned problem with RNNs, empirically struggle to perform well when the length of a time-series surpasses a few hundred records.

We have also removed the original explanation in Section VI, Subsection B.2, page 10 and replaced it with the explanation given above. In Para 2, we now state:

Thus, it seems that the models struggle to capture the correlations across a more chaotic time-series like Navigli but easily reproduce the correlations in places like Duomo and Bocconi, which have more regularly repeating activity patterns.
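The following toy PyTorch sketch illustrates the batched-generation idea referenced above: an RNN whose output layer emits S = 5 values per pass, so a week of 168 hourly values needs roughly 34 recurrent passes instead of 168. This is our own simplification for illustration, not DoppelGANger's actual implementation, and all names and dimensions are assumptions.

import torch
import torch.nn as nn

class BatchedRNNGenerator(nn.Module):
    # Illustrative sketch: each RNN step emits S consecutive measurements
    # instead of one, so a series of length L needs only L/S passes.
    def __init__(self, noise_dim=32, hidden_dim=64, s=5):
        super().__init__()
        self.s = s
        self.rnn = nn.LSTM(noise_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, s)  # S outputs per pass

    def forward(self, z):
        # z: (batch, L // S, noise_dim) -> generated series (batch, L)
        h, _ = self.rnn(z)                 # (batch, L // S, hidden_dim)
        out = self.head(h)                 # (batch, L // S, S)
        return out.reshape(z.size(0), -1)  # flatten to the full series

gen = BatchedRNNGenerator()
series = gen(torch.randn(8, 34, 32))  # 34 passes -> 170 values; trim to 168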
Reviewer#3, Concern # 11: There are flaws in the presentation and analysis of the results. The paper requires better comparative methods to substantiate its originality and superiority. For example, did you answer the claim you stated in the Abstract? i.e. “We also assess how much real data is required to produce high-quality synthetic data.”? To this end, the authors are requested to compare the presented technique with active learning-based models.

Author response: We agree that our original statement, “We also assess how much real data is required to produce high-quality synthetic data”, did not accurately represent our contributions. Our work is more of a case study that looks at how publicly available deep learning based generative models, particularly GANs, perform, especially when training data is scarce. What we meant by the above statement was to state how much real data would be required to produce useful synthetic data in our particular use case. However, we now realize that the statement came off as a generalization about how much data of any nature would be required to produce useful synthetic data from these models.

The comparison with active learning-based ML models is unsuitable in this use case, though. Active learning-based solutions are applied when one has a small amount of labelled data, lots of unlabeled data, and not enough resources to manually label it. In that case, one trains a model on the limited labeled data available and then uses it to predict the labels of the unlabeled data. Then, with a suitable score, the samples to label next are prioritized, and the process is iterated repeatedly (a generic sketch of this loop is shown after this response). Approaches like these are more suitable for classification problems where one has a distinct number of classes, or for problems that use independent data samples.

In contrast, deep generative models such as the models we use learn the underlying distribution from which a time-series originates. This includes learning the distribution of values of the recordings in the time-series as well as the temporal relationships between the measurements. We then use these trained models to generate new, unique data points that are not identical to the true data we have but appear to have been generated from the same process.

Since the two approaches address different problems, we feel it is outside the scope of our work to assess active-learning models on this particular use case.

Author action: We have rewritten the abstract and removed the line stated above since it does not clearly represent our work as noted above.
35
36 Author action: We have rewritten the abstract and removed the line stated above since it does not clearly
37 represent our work as noted above.
38
39
40
41
42
43 Reviewer#3, Concern # 12: Why augmented data does not improve forecasting accuracy? Is it cogent to
44 simply say “there is not enough data”?
45
46 Author response: Augmenting small amounts of real data with some synthetic data does lead to
47 improvements in forecasting accuracy over just using the small amount of real data. In the explanation in
48 Section VI, Subsection C.4, we state that although there is improvement, it is just not large enough to be
49
significant. We believe this is reasonable because we are only using five weeks of data to forecast, so any
50
51 measurements in the synthetic data that degrade the model’s learning have more weight than they would if
52 we were using twelve weeks of data to train a forecaster. In the latter case, the overall characteristics of the
53 synthetic data would have more effect that any individual readings. Therefore, we do believe that the small
54 amount of improvement seen is indicative of possible larger improvements if we were using a larger data set
55 to perform the forecasting (as well as the GAN training)
56
57 Author action: NA
58
59
60

For Review Only


Page 65 of 65 IEEE Access

1
2 Reviewer#3, Concern # 13: Please make sure to use datasets with considerable sample numbers. Do you
3 think it is cogent claiming on a small-sized dataset? Or on a single dataset? I don’t think so!
4
5 Author response: We are extremely thankful to the reviewer for bringing this up since this comment gave us
6 a chance to really work through the entire paper again. This exercise brought new insights and resulted in
7 slightly improved results.
8
9 To answer the first part of the comment, it is important to reiterate the fact that the entire premise of this
10 work revolves lack of useful data. The idea of synthetic data generation, as detailed in our introduction,
11
hinges on the idea of large amounts of data being unavailable. If one has access to large amounts of data,
12
13 then the need for synthetic data is severely limited to only situations where the vendor does not want to
14 share real data as is. Another problem is that in our chosen domain; telecommunications, large datasets are
15 not open-source. The Telecom Italia set is one of few publicly available telecom activity data sets. Another
16 one is Broadband America, however that set deals with internet usage in homes whereas telecom Italia deals
17 with mobile internet usage, which we found more interesting in today’s mobile first environment. Note that
18 for many homes in the Broadband America data set, there are lots of missing values as well, so that even it
19 is not an extremely large data set.
20
21 Regarding the other point about using a single data set, note that while the measurements for each region
22
come from the same provider, they are in fact separate data sets. Both Bocconi University and Navigli District
23
24
have distinct measurements that are independent from one another. Therefore, the analysis conducted on
25 them should be considered as being done on two data sets instead of one.
26
27
28
29
Author action: We decided to analyze data of a third region in addition to the two we already had. This region
30 is a historical cathedral in central Milan and arguably one of its biggest tourism spots. We have updated all
31 sections of the paper with insights from this region. In particular, we have tried to rewrite Section III to make
32 it clearer that we are using three different data sets instead of one. For instance, we have updated Figure 3
33 as:
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
We have also added a brief description of Duomo’s time-series in Section III, Subsection B, page 4:

Lastly, Duomo resembles a sinusoidal pattern, but a closer look shows that there’s a slight uptick in activity every Friday and Saturday towards twilight. Thus, all three regions …

Repeating the analysis on Duomo allowed us to reassess our choice of parameters in the case of DoppelGANger and PAR. For instance, in the original work, we were applying an absolute-value function to PAR’s output to ensure all its generated values were positive. While working on Duomo, we decided that this was unfair, since no such postprocessing was being done for the other models. This led to repeating the analysis for PAR for the other regions as well, and all corresponding figures/tables were updated accordingly.

Finally, assessing a region with a regular pattern like Duomo allowed us to contrast performance with other regions with more volatile trends, such as Navigli. This provided insights that were previously unclear from just looking at the original two regions, and as such, large parts of the results section have been updated, especially the section regarding the DTW score, where we realized the degradation observed when using 7 weeks of data was because of an overall downwards trend in Bocconi and Navigli (visible in Figure 3). This downwards trend is somewhat inaccurately captured by the TimeGAN, which is why its performance improves at that level.
of data was because of an overall downwards trend in Bocconi and Navigli (visible in Figure 3). This
11 downwards trend is somewhat inaccurately captured by the TimeGAN, which is why its performance
12 improves at that level.
13
14
15
16
17
18
19
20 Note: References suggested by reviewers should only be added if it is relevant to the article and makes it more
21 complete. Excessive cases of recommending non-relevant articles should be reported to
22 ieeeaccesseic@ieee.org
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

For Review Only

You might also like