
Computer Networks 148 (2019) 88–100

Contents lists available at ScienceDirect

Computer Networks
journal homepage: www.elsevier.com/locate/comnet

Data evacuation from data centers in disaster-affected regions through software-defined satellite networks✩

Rafael B. R. Lourenço a,∗, Gustavo B. Figueiredo c, Massimo Tornatore a,b, Biswanath Mukherjee a

a University of California, Davis, USA
b Politecnico di Milano, Italy
c Federal University of Bahia, Salvador, Brazil

A R T I C L E   I N F O

Article history:
Received 8 February 2018
Revised 14 September 2018
Accepted 29 October 2018
Available online 3 November 2018

Keywords:
Post-disaster recovery
Satellite and terrestrial networks
Data evacuation
Software-defined networks
Mega constellations
Traffic engineering

A B S T R A C T

Large-scale disasters can severely disrupt Information Technology (IT) infrastructure, e.g., Data Centers. Earthquakes, hurricanes, tsunamis, and other natural catastrophes may lead to such a scenario. Man-made threats, e.g., a High-Altitude Electromagnetic Pulse (HEMP), can also provoke such an aftermath. In the HEMP case, however, the aftermath might include damage even to non-terrestrial IT infrastructure, such as satellites. After any disaster, it is important that critical data located in the affected region be evacuated to secure locations where it can be useful for emergency operations, mission-critical activities, rescue and relief efforts, and society and businesses in general. To minimize the time it takes to perform this evacuation, we must use all available resources as efficiently as possible. This includes using the remaining satellites to connect the affected regions of the network to the unaffected ones. Utilizing the Software-Defined Network (SDN) paradigm applied to satellite networks, we propose an algorithm that can be executed by the SDN controller. This algorithm generates an evacuation plan for data located in possibly-isolated terrestrial systems, such as Data Centers, through the satellite network, towards final destinations in the main network. The evacuation plan is a transmission schedule that maximizes the amount of evacuated data. Considering the current industrial interest in mega satellite constellations, we compare how two constellations of 66 and 720 satellites perform in terms of amount of data evacuated. Our results show how the evacuation is affected by different satellite constellation configurations (i.e., buffers, inter-satellite link capacities, etc.). Since our approach allows for Traffic Engineering (TE) to be performed, we also demonstrate how it enables fair resource utilization among different affected infrastructures during data evacuation. Our illustrative examples also compare our method to an approach designed for Delay-Tolerant-Vehicle networks and show how our solution can evacuate up to 60% more data after a disaster.

© 2018 Elsevier B.V. All rights reserved.

✩ A short summarized version of this study was presented at the IEEE International Conference on Communications (ICC) 2017 [1].
∗ Corresponding author.
E-mail addresses: rlourenco@ucdavis.edu (R.B.R. Lourenço), gustavo@dcc.ufba.br (G.B. Figueiredo), tornator@polimi.it (M. Tornatore), bmukherjee@ucdavis.edu (B. Mukherjee).
https://doi.org/10.1016/j.comnet.2018.10.019
1389-1286/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Disasters are hazardous not only because of the potential casualties they might inflict, but also because of their effects on infrastructure society relies upon. The Information Technology (IT) infrastructure can be greatly harmed by disasters [2]. Natural phenomena such as earthquakes, tsunamis, and hurricanes might cause widespread damage to communication networks, as can catastrophes caused by human actions. After events like these, it is important to (re)establish communication with the affected region to maintain mission-critical operations as well as to aid in rescue and relief efforts.

The man-made High-Altitude Electromagnetic Pulse (HEMP) can cause grave disasters. A HEMP detonation might lead to complete isolation of terrestrial IT systems (such as Data Centers) and, as the HEMP pulse also propagates upward, it is also potentially harmful to lower-orbit satellite equipment [3]. Traditionally, one of the important uses of satellite communication is as a backup for when terrestrial communication fails or becomes heavily congested. After a HEMP, however, some satellites might become unavailable, possibly hindering their use as backup in the post-disaster environment. Hence, with respect to the IT infrastructure, a HEMP is one of the

most impactful disasters. In this study, we focus on the occurrence of a HEMP, but our contributions are applicable to other disasters.

Fig. 1 depicts a possible impact zone of a HEMP detonation. This zone might span hundreds of miles, in which electronic equipment might be completely damaged. In the example, nodes 28 and 29 stop functioning, which forces all communication coming from and going to nodes 34, 35, and 36 to go through the link from node 33 to node 40. As already observed in past disasters, a peak in offered traffic commonly occurs right after the disaster. After the Great Hanshin-Awaji earthquake of 1995 (a.k.a. the Kobe earthquake), for example, a peak of 50x the usual traffic created a network congestion that lasted for five days [4]. Hence, it is very likely that the link between node 33 and node 40 is heavily congested after the disaster. In practice, this scenario is similar to a situation where nodes 34, 35, and 36 become completely disconnected from the main network, i.e., where the network becomes fragmented.

After any disaster, it is important that critical data located in the affected region be evacuated to safe destination storages in the unaffected part of the network (referred to as the main network), where it might be useful to rescue efforts, mission-critical operations, and society and businesses in general. To perform the evacuation, the use of alternative means of communication, such as satellites, is very important, whether the network has been severely damaged (as in Fig. 1) or even fragmented. Nonetheless, since even the satellite constellation might have been damaged — particularly after the detonation of a HEMP — it might be the case that satellite services become intermittent.

Low-Earth Orbit (LEO) satellites are constantly moving around orbits of low altitude (typically around 2,000 km for circular orbits). Accordingly, such systems are able to provide low-latency communication. To stay in such orbits, LEO satellites must maintain constant motion with respect to Earth's surface, taking them around 90 min to revolve around the globe. Thus, to provide continuous coverage to any point on Earth's surface, multiple satellites are necessary. In contrast, one Geostationary (GEO) satellite can cover a region of Earth's surface continuously, albeit at a distance of around 36,000 km. A group of satellites working in concert is called a constellation. Although LEO satellites are more susceptible (than higher-orbit satellites, such as GEO) to being harmed by events such as a HEMP, even in the dire scenario of constellation fragmentation, there are still different reasons why LEO satellites are good candidates to evacuate critical data, as follows:

• Alternative evacuation means, such as physical evacuation of hard drives, would incur long delays;
• Higher-orbit unaffected satellites (e.g., GEO) would likely be used by emergency operations for always-on communications;
• Satellite Ground Stations (GSs) can be quickly deployed by emergency operations and can possibly be located in radiation-protected silos; and
• Currently, there are commercial plans to deploy at least two mega LEO constellations, one having more than 700 satellites [5] and another more than 4,000 [6].

As Software-Defined Networking (SDN) technology is introduced to satellite networks [7], it allows for complex orchestration of the network elements, which further motivates the use of satellites to evacuate critical data after disasters. As will be further explored, we utilize the SDN control plane to compute a data-evacuation strategy and, accordingly, to enforce that strategy on the network elements. The SDN control plane also enables the orchestration of different network domains [8], creating an arrangement of different networks (or domains) that allows for the maximization of the evacuated data, even when multiple network fragments are disconnected from the main network. In this last scenario, the control plane can use Traffic Engineering (TE) techniques to achieve fairness between different isolated components, avoiding unnecessary competition for potentially-scarce post-disaster satellite resources.

We focus on a scenario where a part of the terrestrial network is severely damaged. Our contributions, however, also apply to a strenuous scenario where the terrestrial network becomes fragmented (and maybe even the satellite network, too), as a possible aftermath of a HEMP. We focus on actions to be taken after the HEMP detonation.

To devise an efficient data-evacuation plan after a disaster, we leverage properties of satellite networks and Delay-Tolerant Networks (DTNs). We utilize knowledge such as the geographical locations of terrestrial network infrastructure and satellite orbits to formulate an evacuation strategy from systems within the affected regions to destinations in the main network. This evacuation strategy consists of a transmission schedule that determines when satellite GSs should transmit to specific satellites (and vice versa), and when satellites should transmit to other satellites (as well as when they should store data in their buffers). This schedule is such that it maximizes the amount of data evacuated from the affected region using satellite links that were not affected by the HEMP detonation. It also contains time information so that data that must reach its destination more urgently can be sent at the appropriate transmissions during evacuation. Our numerical examples show how the evacuation plan performs for different sizes of constellations, and how TE can be beneficial to the evacuation. Our examples also demonstrate how our method can evacuate up to 60% more data after a disaster, when compared to a solution specifically designed for Delay-Tolerant-Vehicle Networks.

The rest of this work is organized as follows: Section 2 briefly reviews the literature related to the impact of disasters on IT infrastructure. Section 3 analyzes the impact of HEMPs on satellites. Section 4 details the system architecture and problem statement. Section 5 describes how our algorithm works, and Section 6 discusses the scalability of the algorithm. Section 7 discusses the lessons learned using practical numerical examples. Section 8 concludes the study.

2. Related works

In this section, we review the relevant literature. To the best of our knowledge, no other work has focused on evacuating data from a disaster-affected region by utilizing both terrestrial and aerial networks. Thus, in Section 2.1, we review studies that have focused on evacuating data from disaster-affected regions (either through wired or wireless networks, without using both). Because this study utilizes methods similar to algorithms proposed for DTNs, in Section 2.2, we briefly review some studies that have focused on routing in these environments.

2.1. Post-disaster data evacuation and first response

The vulnerability of terrestrial networks to disasters is a problem that has been addressed in different studies. The importance of disaster resiliency in communication networks was analyzed in [9]. The resiliency of Wavelength-Division Multiplexed (WDM) networks to disasters was investigated in [10]. The problem of disaster survivability in optical networks was investigated in [2]. A progressive recovery approach for virtual infrastructure services after disasters was investigated in [11]. A method to protect anycast and unicast flows against attacks and disasters was studied in [12]. The effects of electromagnetic pulses on communication networks have been considered in [13]. None of the works above have focused on actions to be taken after disasters in order to retrieve important information that was stored in the affected region, which is the subject of this work.

Works on evacuating data from the affected region after disasters include the following: [14] studied disaster-aware evacuation of data for wireless sensor networks; [15] studied how to evacuate data from Data Centers through optical networks, once a disaster is predicted to occur in the near future; [16] investigated algorithms to decide where and how to evacuate virtual machines if a disaster occurs; and [17] analyzed the problem of executing fast backups between different data centers to lessen the effects of disasters. The problem of emergency backup in an inter-Data Center network due to a progressive, predictable disaster was investigated in [18]; and [19] optimizes the backup of data for a Data Center that will be impacted by a disaster that will soon affect that location.

All of the works above only studied how to evacuate data through one single network, i.e., either a terrestrial network or a wireless sensor network, usually assuming some knowledge of the disaster prior to its occurrence. Also, if the network is fragmented by the disaster, these works do not propose solutions to reestablish communication between the different fragments. To evacuate important data from a disaster-affected region, it is proper to consider all surviving communication resources after the occurrence of the disaster, because disasters can be unpredictable and can severely damage networks (to the point of fragmentation). Thus, we focus on evacuating data by cooperatively utilizing both terrestrial networks (e.g., optical WDM networks) and aerial networks (e.g., satellite constellations) in order to provide relatively large amounts of bandwidth to evacuate data from the affected region.

The use of aerial platforms in post-disaster scenarios is an active topic in the literature. In [20], literature related to the use of unmanned aerial vehicles (UAVs) to assist first response to disasters was surveyed. Among the studied works, [21] investigates the use of drones to assist post-disaster search-and-rescue operations; and [22] studies how to apply the SDN paradigm to wireless 5G networks. The authors of [7] provide an architecture for a temporal-spatial SDN-capable satellite network. The works above, however, do not investigate the problem of providing large bandwidth to the disaster-affected regions to enable the evacuation of important data — the focus of this work.

Fig. 1. HEMP footprint over a US-wide terrestrial network. All links in this terrestrial network have 100 Gbps throughput. Different zones inside the red circle experience distinct HEMP effects. In this scenario, nodes 28 and 29 fail, heavily congesting the link between node 33 and node 40. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. A possible immediate HEMP effect on two different LEO orbits. This is a snapshot of a 3D visualization of the HEMP shown in Fig. 1. The inverted cone leaving the Earth's surface represents the immediate radiation caused by an explosion. It is likely that satellites with line-of-sight to the explosion will be compromised, especially the ones inside the denoted region.

2.2. Providing high bandwidth in delay-tolerant networks

In [23], a time-invariant graph is proposed to decide how to optimally route data through a DTN. Their approach has similarities with ours; however, they consider that all contacts between different mobile nodes take a very short time to conclude (i.e., they are atomic), and they consider a network consisting of only mobile nodes (i.e., without considering routing within fixed, terrestrial networks).

Another time-invariant graph to define how to route data in a vehicular network is presented in [24]. The authors do not consider contacts to be atomic; however, they only consider fixed access points to which the mobile network is connected, without considering the terrestrial network connecting these access points to one another. Thus, the routing that occurs within the terrestrial network is not accounted for — an aspect that our current work takes into consideration. In Section 7, we explore how this solution compares to ours.

To the best of our knowledge, most works on routing large amounts of data in DTNs are based on the premise of "carry and forward", i.e., storing data in buffers to transmit it at a later time. A similar approach is also presented in [25]. In general, the carry-and-forward premise is usually not suitable for networks that lack large buffers and that can retransmit packets almost instantaneously upon receiving them, which is usual in terrestrial networks. To enable the use of terrestrial networks, our method (as explained in Section 5.4) allows packets to be retransmitted as they are received, without needing to store them in buffers; however, when buffers are present, our method is also able to use them if necessary.

3. HEMP impact on satellites

Possible impacts of a HEMP attack on satellites are studied in [3]. Such an event has coupled effects: immediately, there is an impacting radiation wave (which is more severe in regions within line-of-sight of the explosion); and, later, there is the formation (or worsening) of long-lasting radiation belts. In [3], most satellites considered were hardened to a higher level than common commercial communication satellites. Report [3] concludes that there is a significant risk that LEO satellites will fail due either to immediate or to long-term effects of a HEMP burst, while Medium-Earth Orbit (MEO) and GEO systems are usually not significantly impacted. There is no record of satellites effectively changing orbits due to HEMPs: their effects are limited to satellites' electrical and electronic malfunctions.

As Fig. 2 depicts, the HEMP radiation propagates upward throughout a large region of Earth's atmosphere. In the scenario presented, multiple LEO satellites can get damaged due to the HEMP. Historical data from the 1962 Starfish Prime nuclear test shows that 8 of the then 21 satellites in LEO orbits were compromised [26]. Although many satellite constellations have in-orbit spares, these take some time to be put into appropriate position. A severe strike can render even these backups useless. An effective strategy that relies on LEO satellite communications after a HEMP

needs to consider that constellations might not be fully operational, with some damaged satellites.

4. Problem statement

In this work, we assume that:

i. There are multiple Ground Stations (GSs) capable of handling several simultaneous ground-to-satellite links;
ii. Satellite networks might have Inter-Satellite Links (ISLs), and each different system has its own particular network topology with regards to how satellites communicate with one another. New connections to other satellites might be established as satellites travel through their orbits;
iii. All affected components have access to at least one GS;
iv. The SDN controller is able to reserve any links for transmission, even after the disaster;
v. Satellites are capable of setting up links in negligibly small time compared to the duration of their transmissions. Also, satellites are capable of transmitting using different protocols and frequencies, such that they can establish links with each and every covered GS;
vi. Even after the disaster, all the undamaged elements are capable of exchanging control information (e.g., through some emergency communication channel).

Now, we consider a system architecture as shown in Fig. 3. The bottom-leftmost box represents the affected system from where data will be evacuated. This system consists of a data storage facility connected to a GS through some terrestrial network infrastructure. The bottom-rightmost box shows the main network to where data will be evacuated with the assistance of a satellite network. The yellow Control box in Fig. 3 is an abstraction of all the SDN controllers [7] of each of the networks involved in the evacuation (i.e., satellite and terrestrial networks).

Fig. 3. Proposed architecture. Bold dashed red arrows are ground-to-satellite links. Bold solid red arrows are ISLs. Thin dotted green arrows represent the Control Plane interaction with elements. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Post-Disaster Data Evacuation Problem Statement:

I. Given:
• Terrestrial network topology: undamaged nodes and links, including geolocation information of GSs;
• Node and link information: node buffering and switching capacity, and link data rates and delays; and
• Satellite network topology: remaining LEO satellites and respective ISLs, including orbit predictors [27].
II. Output: A node-to-node time schedule listing all transmissions to be sent at each instant.
III. Goal: An evacuation plan that maximizes the amount of critical data evacuated from the affected region in a given time period.
IV. Constraints: Link rates, duration of the periods through which satellites get to communicate with each other and with GSs, and buffers.

Illustratively, data stored in some system (e.g., a Data Center) in the affected sub-component of Fig. 3 flows through the terrestrial links of the network (called non-intermittent links) until it reaches a GS. From there, data is sent to the LEO network through possibly-intermittent GS → Satellite links (intermittent due to the likely disruption of the satellite constellation and its coverage). In the satellite network, data might be exchanged (and possibly buffered) among satellites by means of ISLs (shown as bold solid red arrows in Fig. 3) until a GS in the main network is contacted, and data can flow from the satellites through these GSs of the main network. Finally, the evacuated data is routed inside the main terrestrial network until it reaches its destination Data Center. Our main contribution is the algorithm described in Section 5, which performs the calculation of this node-to-node transmission schedule.

The algorithm can be executed by the SDN controller (yellow "Control" box of Fig. 3), which is in charge of enforcing the evacuation plan on the network elements (satellites, GSs, switches and routers, etc.). Thus, we can avoid shortest-path routing and fully utilize the capacity of all links, even the ones that will be established in the future. Contact Graph Routing (CGR) [28] is a distributed routing mechanism to compute routing solutions across time-varying networks (i.e., networks whose topology changes through time). Unlike CGR, our approach maximizes the utilization of transmission capacity while also allowing the implementation of TE, which can guarantee overall fairness among different affected Data Centers.

5. Routing and scheduling algorithm

In this section, we often refer to the examples of Figs. 4 and 5, which are based on the physical topology of Fig. 4a. For illustration purposes, this example shows a fragmented network; however, the same procedures are valid for a non-fragmented network as well.

5.1. Initialization

Before the calculation begins, it is necessary to exchange information between all nodes and the controller. In case the network is still connected (albeit possibly congested), we consider that the packets for this initial information exchange will have the highest priority and, thus, this exchange will be performed by the surviving terrestrial infrastructure. In case the network becomes fragmented, we consider that there will be surviving GEO or other higher-orbit satellites (which are not likely affected by a HEMP), or even other types of communication such as microwave links, etc., which can be used to perform this initial information exchange. Because of the severity of a HEMP attack, such an exchange should be promptly executed due to matters of national safety. In this phase, all GSs inform the control plane about their positions, buffering capacity, satellite communication systems' transmission rates, etc. The controller also has knowledge of the location of all the operating satellite GSs in the main network. We assume that the controller is up-to-date about orbit information from all undamaged LEO satellites that are going to be used during the evacuation.

5.2. Contact times calculation

In this step, the future trajectories of the satellites are calculated to determine contact events for a large time period T during

which satellite assistance will be used. We define as a contact the period when two nodes get to communicate, e.g., when a satellite is passing above a GS. This step can use an orbit propagator tool to simulate satellite movements during the future period T. Once the system has calculated the contact times between satellites, and between each satellite and each GS, a list of contacts is created. The contacts in this list may happen simultaneously or not, i.e., they may overlap each other. If they start and end at the exact same times, we say that such contacts completely overlap one another. If they don't start and end at the exact same times (but still overlap), we say that they partially overlap. In the illustrative example shown in Fig. 4b, each contact has a duration of 4 time units (t.u.). Contacts GS A → Sat 1, Sat 1 → GS D, and GS B → Sat 2 partially overlap.

Fig. 4. (a) shows an example of two terrestrial networks communicating through a satellite network. Dashed red arrows are intermittent contacts between GSs and satellites. All links have a 1 Mbit per time unit (t.u.) data rate, and both satellites have a 125 kB (1 Mbit) buffer. (b) shows the times and durations of each contact that is depicted in (a). (c) shows the atomized Contact Graph that represents the partially-overlapping contacts of (a) and their capacities. Solid edges represent non-intermittent links. Dashed edges represent atomized contacts labeled s: t, c, with slice s, starting time t, and capacity c in Mbit. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Event-Driven Graph representation of the Contact Graph from Fig. 4. Bold edges are buffers. Solid edges are non-intermittent links. Dashed edges are intermittent links. Dotted lines connect a dummy source to all occurrences of the Isolated vertices and all DC vertices to a dummy sink. Edges with zero capacity are not shown. All capacities are in Mbit.

5.2.1. Contact atomization

In the next steps, we will build two graphs that will allow us to create the node-to-node transmission schedule. For that, we must first transform partial overlaps into complete overlaps using a procedure that we call Contact Atomization. This procedure transforms partial overlaps into smaller complete overlaps. As will be shown, this will allow us to represent each partially-overlapping contact with multiple edges in the Contact Graph (and multiple edges and multiple vertices in the Event-Driven Graph).

To understand why this procedure is necessary, consider a scenario where two satellites start covering one particular GS at exactly the same time, but one of these satellites will cover such GS for a longer time. In such a scenario, one of the possible actions that the GS can take is to send data to one satellite (the one with the shorter contact, which may offer a larger data rate, for example) and, as soon as this contact is over, start sending data to the other satellite (the one with the longer contact). The Contact Atomization step makes switching from the first satellite to the second possible in the Contact and Event-Driven Graphs (which would not be the case if we simply ignored this step). This is because the Event-Driven Graph is a directed, acyclic graph. Thus, if each partially-overlapping contact were represented by only one edge (i.e., without the Contact Atomization), it would not be possible to compute an evacuation strategy that included both satellites in this example.

Let each non-atomized contact ci have its duration interval defined as di = (si, ei), where si is its start time and ei its end time. Before the atomization procedure, for any two contacts ci and cj, the intersection di ∩ dj can range from empty (i.e., no intersection) to the smaller of di and dj (and anything in between these two options). After the atomization procedure, the intersection di ∩ dj between any two contacts ci and cj is either empty or equal to both di and dj (i.e., the two intervals coincide). The solution of this specific problem is not the focus of our study, but efficient algorithms can be found in [29].

The partially-overlapping contact of Fig. 4b can be represented by the smaller completely-overlapping contacts of each interval (slice) i, j, k, l, m, and n. The output of the Contact Atomization step is a list of either non-overlapping or completely-overlapping contacts, each labeled α: δ, γ, where α identifies the slice, δ the instant when the contact happens, and γ the data transmission capacity of the contact. In Fig. 4b, contact GS A → Sat 1 is atomized to edges i: 1 s, 1 Mbit; j: 2 s, 1 Mbit; and k: 3 s, 2 Mbit.

5.3. Contact graph

This step consists of generating a graph GC that represents the topology of the network, where contacts between satellites and GSs (i.e., intermittent links) are represented by edges labeled according to the moment they occur and with capacity equal to the link transmission rate multiplied by the contact duration. In this example, since all contacts are atomized, the Contact Graph shown in Fig. 4c has more intermittent edges than the original topology of Fig. 4a. Note that we also include non-intermittent links in the Contact Graph. To do so, instead of atomizing and labeling these edges, we set their time label to "−" and their capacity equal to the link data rate (see edges Isolated → GS A, Isolated → GS B, GS D → DC, and GS E → DC in Fig. 4c).

5.4. Event-Driven graph

Based on the Contact Graph GC generated above, we now create an Event-Driven Graph GT. The current example of Fig. 4c can be transformed into the Event-Driven Graph shown in Fig. 5. The Event-Driven Graph can be understood as a 3D structure: the first dimension (X) represents intermittent links, the second dimension (Y) represents non-intermittent links, and the third dimension (Z) represents buffer edges.

GT is a directed graph that removes the time dimension from GC's edges by representing each physical node A with a set of vertices, one for each instant ti when node A participates in some atomized contact (each of these vertices in GT being labeled "A, ti", where the set of ti are all the times when an atomized contact involving node A happens). When a contact happens at ti and the very next contact happens at tj, we connect A, ti to A, tj with an edge representing node A's buffer (thus, the capacity of such an edge is equal to that node's buffer). In Fig. 5, vertices Sat1,1, Sat1,2, Sat1,3, and Sat1,5 represent Sat1 of the Contact Graph in Fig. 4c. Note that the bold edges Sat1,1 → Sat1,2, Sat1,2 → Sat1,3, and Sat1,3 → Sat1,5 represent the buffer of Satellite 1. Accordingly, an intermittent edge between nodes A and B with time label ti in graph GC is translated to an edge from A, ti to B, ti in GT. The capacity of such an edge is the same in both GC and GT. In our example, edge A,1 → Sat1,1 of Fig. 5 represents the atomized contact GS A → Sat1, labeled i: 1 s, 1 Mbit, of the graph in Fig. 4c.

Note that we consider non-intermittent links in the Event-Driven Graph GT. To this end, if there is a contact at time t between nodes A and B, we add vertices representing all nodes C, D, E, ... that are reachable from A, and all nodes J, K, L, ... that are reachable from B, through non-intermittent links in GC. Each of these vertices carries the label t, for every t when there is a contact in a node A that can reach them. The capacities of the edges that interconnect these vertices of GT are equal to the equivalent GC edge capacity multiplied by the duration of the contact that starts at t. Indeed, this consists of replicating the whole contiguous network attached to nodes A and/or B whenever there is a contact involving these nodes. Thus, GT will have a set of vertices for each node C reachable from some node A for every time t at which A participates in a contact with some other node. Like before, the vertices representing C at each t, t′, t″, etc., are also connected by edges leaving C, t and reaching C, t′ with capacity equal to the buffer of C. The vertices B,1 and B,2 of the graph in Fig. 5, for example, are only included at the t = 2 XY-plane because of the non-intermittent links Isolated → GS A and Isolated → GS B of the graph in Fig. 4. Because GS B has no buffer, there is no edge connecting B,1 and B,2 in Fig. 5.

Since bidirectional connections prevail in both terrestrial and satellite networks, we introduce a method to represent bidirectional contacts in the Event-Driven Graph GT. Because the contact atomization process slices partially-overlapping contacts, several atomized contacts end up happening simultaneously. Compared to the overall evacuation period, terrestrial links, satellite-to-satellite links, and satellite-to-GS links have negligibly small propagation delay; thus, traversing from one node to any other at one single instant t is assumed to be possible (given that there is a route between them). This process is performed in the translation from the undirected Contact Graph of Fig. 6a to the directed Event-Driven Graph of Fig. 6b. In this example, edges S1 → T0, T1 → S0, F1 → T0, and T1 → F0 refer to the same instant t. Also, in Fig. 6, we omit the contacts that nodes S and T will perform in future t′ and t″ (i.e., edges of vertices S0, t′ and T0, t″ are not shown); but the buffer edges connecting these nodes at t to their respective representations in the following t′ and t″ are shown. The capacities of the edges S0 → S1, T0 → T1, and F0 → F1 are the switching capacities SWA of each node A (in this work, we consider these infinite). The link between nodes T and F is non-intermittent; thus, nodes F0, t and F1, t represent F at instant t. Note that it is possible to flow data from F0 to S1 in the same instant t.

We present a pseudo-code for this step in Algorithm 1, which creates the directed Event-Driven Graph GT from the bidirectional Contact Graph GC. The notations S0, t; S1, t; T0, t; T1, t; Q0, t; Q1, t; P0, t; P1, t; U0, t; U1, t; and W0, t; W1, t all refer to different bidirectional event-driven vertices, such as the ones exemplified in Fig. 6. The algorithm first creates a copy of the Contact Graph with all intermittent edges removed. Then, the outer for loop of line 5 goes through all intermittent edges of the original Contact Graph. For each edge, it adds a pair of vertices representing the edge's source and destination (connecting them as was discussed) to the set AllVertices. It also adds the corresponding edges to AllEdges. For the source and the destination nodes of each intermittent edge, it goes through (for loops of lines 11 and 19) each edge in the connected component of the copied Contact Graph (without intermittent edges) that that node is part of. Then, it adds a pair of vertices representing each of the graph component's nodes to AllVertices and edges to AllEdges. Once all intermittent edges have been considered, the rest of the pseudo-code adds vertices and then edges to the Event-Driven Graph (by first ordering by time-of-contact and then traversing the sets of all edges and vertices).

As we will discuss in the next section, the performance of this step is directly proportional to the size of its input, the Contact Graph. Similarly, the performance of the flow-maximization algorithm, discussed in the next step, is directly dependent on the size of the Event-Driven Graph generated in this step. Since both Contact and Event-Driven Graphs can be of large sizes, it is important to implement this algorithm efficiently.

5.5. Flow maximization

So far, all the procedures shown can be applied in the same way whether the network was fragmented or not (and, if so, whether it was fragmented into two or more fragments). Now, however, some of these cases have to be treated separately.

To maximize the amount of data evacuated from the affected Data Centers to the target Data Center in the main connected network, we execute a flow-maximization algorithm on top of the Event-Driven Graph created. The output of this step is a node-to-node transmission schedule. Thus, we consider two possible post-HEMP scenarios:
Algorithm 1 Event-Driven Graph Generation.

Input: Contact Graph GC
Output: Event-Driven Graph GT
 1: Create a copy of GC, and call it GC*
 2: Remove all intermittent edges from GC*
 3: Create Set AllVertices, Map AllEdges
 4: for all intermittent edges e_c^interm in GC do
 5:     Get Source, Target, capacity c, time t, and duration δ of e_c^interm
 6:     Add pair [S0,t; S1,t] to AllVertices
 7:     Add pair [T0,t; T1,t] to AllVertices
 8:     Add (eS1t→T0t; c), (eT1t→S0t; c) to AllEdges
 9:     Add (eS0t→S1t; c), (eT0t→T1t; c) to AllEdges
10:     Get from GC* the connected component CS of Source
11:     for all non-intermittent edges eQ→P in CS do
12:         Set capacity a = link-throughput × δ
13:         Add pair [Q0,t; Q1,t] to AllVertices
14:         Add pair [P0,t; P1,t] to AllVertices
15:         Add (eQ1t→P0t; a), (eP1t→Q0t; a) to AllEdges
16:         Add (eQ0t→Q1t; a), (eP0t→P1t; a) to AllEdges
17:     end for
18:     Get from GC* the connected component CT of Target
19:     for all non-intermittent edges eU→W in CT do
20:         Set capacity b = link-throughput × δ
21:         Add pair [U0,t; U1,t] to AllVertices
22:         Add pair [W0,t; W1,t] to AllVertices
23:         Add (eU1t→W0t; b), (eW1t→U0t; b) to AllEdges
24:         Add (eU0t→U1t; b), (eW0t→W1t; b) to AllEdges
25:     end for
26: end for
27: Create Map of MinHeaps (keyed by time) of vertex pairs, MinHeapsMap
28: for all vertex pairs in AllVertices do
29:     Add both vertices from the pair to Event-Driven Graph GT
30:     Add the vertex pair to its MinHeap in MinHeapsMap
31: end for
32: for all map-entries in AllEdges do
33:     Add edge between the vertex-pair key of the map-entry to GT
34: end for
35: for all MinHeaps in MinHeapsMap do
36:     Vertex type 1 u1 ← MinHeap.pop()
37:     while MinHeap is not empty do
38:         Vertex type 0 w0 ← MinHeap.pop()
39:         Add buffer edge eu1→w0 to GT
40:         Set vertex type 1 u1 ← w1
41:     end while
42: end for
43: return GT

5.5.1. No fragmentation or single isolated component

In this situation, add virtual Super Source and Super Sink nodes to the Event-Driven Graph and, respectively, connect them to all sources and all sinks at all ts. Then, execute a flow-maximization algorithm on GT^P.

5.5.2. Multiple isolated components

This problem has been addressed in [30] and related literature. It consists of finding a Max Min Fair (MMF) fractional flow allocation between many sources and the Super Sink. In this case, due to the urgency of the situation, we allow the data evacuated from each source Data Center to end up in possibly-different target Data Centers — this mixing can be later appropriately handled by an off-line data reorganization procedure between the different target Data Centers.

6. Scalability

Now, we investigate how the proposed algorithm can be scaled in case of large satellite constellations.

6.1. Time and space complexity

The complexity of the algorithm is directly related to the size of both the Contact Graph and its resulting Event-Driven Graph. The size of the Contact Graph GC is at least that of the graph representing the simple network topology. In fact, the size of GC is |VC| = |VPhysical| and |EC| = |ENon-interm| + Λ, where |VPhysical| is the number of nodes in the physical infrastructure, |ENon-interm| is the number of non-intermittent links in the physical network, and Λ is the number of non-overlapping (intermittent) contacts resulting from the atomization process.

The size |GT| of the bi-directional Event-Driven Graph is:

|VT| = 2 · Σv∈VC | (∪t E(Reachv)) ∪ (∪t E(v)) |

|ET| = |VT| + 2 · |EC|

where VC is the set of vertices of GC, Reachv is the set of vertices that participate in some contact and are reachable from v through non-intermittent links, E(v) is the set of edges of v, ∪t is the union operation with regard to the time label t of each vertex v, and |EC| is the number of edges of GC.

Since the occurrence of contacts is a result of satellite orbital propagation and GS locations, both |GC| and |GT| are highly dependent, not only on the terrestrial and satellite network topologies, but also on the GS locations and satellite orbits.

The steps presented in Section 5 can all be implemented with linear time complexity with respect to their inputs, except for the Event-Driven Graph generation and the flow-maximization steps. The contact-atomization procedure resembles the Skyline Problem [29], so it can be solved efficiently with the assistance of a min-heap in O(number of contacts) (since every contact must be pushed in and out of the heap when being sliced). The Contact Graph can be generated quickly; e.g., for a meshed 720-satellite constellation in a 20-hour evacuation period (studied in the next section), it takes less than 0.9 seconds to generate GC. For the Event-Driven Graph GT, Algorithm 1 has a worst-case performance of O(E²). Appropriate data structures, such as O(1) access-time Sets and Maps (a.k.a. HashSets and HashMaps), allow this algorithm to execute fast; e.g., in the previous scenario, it takes less than 44 seconds to build GT on an Intel Core i5 machine with 8 GB of RAM. Since the volume of critical data may be significant (a few terabytes) [15], the evacuation will likely take several hours; thus, using only a few seconds to compute the evacuation strategy is not an issue.

The flow-maximization algorithm must also be properly chosen to minimize the computational time of the evacuation plan. As noted in [31], several approaches can solve the flow-maximization problem, among which the fastest known worst case is an O(VE) algorithm. However, practitioners suggest different Max-Flow algorithms depending on the application. For regular-sized constellations, evacuation-plan calculation time tends not to be an issue. However, in case of mega constellations — with possibly a few million vertices in the Event-Driven Graph — the previous algorithm might not scale, so below we discuss methods to improve its run time.

6.2. Vertex shrinking and edge pruning

Currently, at least two companies are planning to launch mega constellations [5,6]. In Section 7, we study how two different con-
stellations might evacuate data from the terrestrial topology presented in Fig. 1. The first one, consisting of 66 satellites evenly distributed in six orbits (loosely based on the current IRIDIUM system [32]), generates a Contact Graph that contains about 100 vertices and 7000 edges for an evacuation period of 6 hours. Its corresponding Event-Driven Graph contains about 160,000 vertices and 440,000 edges. The second constellation, of 720 satellites (evenly distributed in 18 planes, loosely based on [5]), for the same evacuation period, yields a Contact Graph of 700 vertices and 30,000 edges. Its corresponding Event-Driven Graph contains almost 4 million vertices and more than 10 million edges.

We present some techniques, based on [33], to help shrink the size of the Event-Driven Graph. Shrinking this graph reduces memory consumption and makes the flow-maximization algorithm run faster. These techniques perform shrinking of vertices into super-vertices and pruning of edges, while maintaining the important information about contact times, so that the final transmission schedule can still be precisely generated; thus, they do not remove accuracy from the evacuation plan. These techniques can always be used. However, they might take some time to execute and, in a small constellation, this increase might not compensate for the improvement in the max-flow execution time.

6.2.1. Event-Driven graph pruning

We create the pruned graph GT^P from the Event-Driven Graph GT by traversing the latter starting at the Source node and recording in a set SReach all the vertices that were reached. Then, we traverse GT^Inv (the edge-inverted version of GT) starting at the Sink and record in another set SReach^Inv all the reached vertices. Finally, we check, for all v ∈ VT, whether v was reached in both traversals, i.e., whether v ∈ SReach and v ∈ SReach^Inv. If not, we remove v and all its incident edges from GT. The resulting graph is GT^P. To create GT^P from the GT of Fig. 5, vertices B,1; B,2; A,5; A,6; E,2; E,3; E,5; and D,8 are pruned. This procedure does not affect the evacuation calculation, because the vertices and edges that are pruned are either not reachable from the affected system (i.e., there is no possible sequence of transmissions that allows such a node to be reached at that time) or, if they are reachable, data cannot leave those nodes and reach the destination Data Centers (i.e., there is no sequence of transmissions leaving that node at that time and reaching the destination Data Centers).

6.2.2. Switching matrix shrinking

In case the switching capacity of the nodes is infinite, or very large compared to any of the link throughputs in the network, we merge each pair of nodes (as described in Fig. 6) that represents the same physical node at t into one single node. This procedure can be performed earlier, during the Event-Driven Graph generation, by simply not splitting vertices into two if the switching capacities are all very large. When the switching capacity is very large, this procedure does not affect the node-to-node evacuation schedule. This is because the edges representing the switching matrix in the Event-Driven Graph would have capacities larger than the aggregate incoming/outgoing capacities of the other edges that touch such a vertex; thus, the switching edges are not bottlenecks.

6.2.3. Event-Driven graph capacity normalization

Based on [33], this step is meant to speed up the max-flow algorithm. We traverse all vertices of the Event-Driven Graph; in each of them, we note the edge with maximum capacity leaving the node; and, if that capacity is larger than the summed capacity of all incoming edges, we reduce that outgoing capacity to the sum of the incoming edges. This step improves the run time of the flow-maximization algorithm.

Fig. 6. The bidirectional Contact Graph in 6a can be represented as the Event-Driven Graph in 6b. The duration of the slice that starts at t is δ.

7. Illustrative numerical examples

In this section, we present and compare some illustrative results of the proposed approach. The algorithm was implemented in Java, using Orekit [34] for orbit calculations and the JGraphT library [35] to aid in graph-related computations.

We start by analyzing which factors impact the data evacuation the most (Section 7.1). Then, we study how TE can provide fairness among multiple different network fragments, in Section 7.2 (i.e., if the network is severely damaged, to the point of disconnecting multiple elements). Finally, in Section 7.3, we compare our solution to a traditional DTN-based approach that could also be utilized to evacuate data after a disaster.

7.1. Scenario A: Evacuating data from affected region

Configuration: The terrestrial network is presented in Fig. 1. Data Centers are connected to nodes 1, 8, 12, 14, 22, 27, 34, and 36. Each Data Center is connected to the node closest to it through a 100 Gbps link. GSs are connected to nodes 1, 12, 30, 34, and 36. After the HEMP, the link between nodes 33 and 40 gets heavily congested, as nodes 28 and 29 stop functioning. This post-disaster topology can be seen in Fig. 7. We compare how two different satellite constellations would perform: one contains 66 satellites and the other has 720 satellites. All satellites in both constellations (except those in the first and last orbit planes) have 4 ISLs, one to each of their closest neighbors.¹ Satellites may establish ground-to-satellite links with as many GSs inside their coverage area as possible. Note that a scenario of no damage to the satellite network (i.e., 0% damage) accounts for a non-HEMP disaster.

¹ Due to lack of field data describing orbits for a super constellation, the orbits in this example were generated analytically, assuming orbits are circular and satellites are evenly distributed among them.

We consider four different configurations for each constellation (see Table 1). We consider three different damage scenarios for each constellation, i.e., one where no satellites are damaged, which
corresponds to a non-HEMP disaster; one where 15% of the satellites are damaged after the HEMP; and one where 30% of the satellites are damaged.

Fig. 7. Post-disaster topology of Scenario A. The network remains connected but the link between nodes 33 and 40 becomes heavily congested.

Table 1
Constellation Configurations.

Configuration       Buffer (TB)   ISL (Gbps)   Downlink/Uplink (Gbps)
No-buffer, No-ISL   0             0            10
Buffer, No-ISL      1             0            10
No-buffer, ISL      0             10           10
Buffer, ISL         1             10           10

Fig. 8. Total evacuated data for a 66-satellite constellation and a 720-satellite constellation. Each group of bars represents a different amount of damaged satellites in each constellation.

In Fig. 8, the amount of data evacuated from the affected region after 12 hours of evacuation is shown for each satellite configuration and for each satellite damage scenario. Intuitively, for both constellations, the larger the damage to the constellation, the less data is evacuated. Comparing the different configurations, we observe the following:

• The combination of buffers and ISLs performs the best, but ISLs have more impact than buffers;
• ISLs are more beneficial than buffers when the constellations are not damaged, most notably for the 720 constellation without damage. This benefit fades away as the fraction of damaged satellites increases, because the damaged satellites create holes in the covered area, minimizing the number of occasions in which a GS in the affected region and a GS in the main network are covered by the constellation simultaneously. Hence, as the satellite network gets more damaged, the satellites start acting as physical data carriers (using their buffers);
• The 720 constellation evacuates more data in the configurations "Buffer, No-ISL", "No-buffer, ISL", and "Buffer, ISL". This is due to a smaller reduction in the overall covered area for the 720 constellation and to the higher amount of total transmitting capacity in that constellation;
• A lack of buffers and ISLs hinders the performance of the 720 constellation more than that of the 66 constellation. This is because one satellite in the 720 constellation rarely covers a GS in the affected region and, at the same time, a GS in the main network, since its coverage area is smaller.

The performance of the two constellations differs mainly due to the different covered area per satellite. As the coverage areas of satellites in each constellation have different sizes (smaller in the 720 constellation), it takes less time for satellites in the 720 constellation to cross their coverage area than it takes for satellites in the 66 constellation to cross theirs. Because of this difference, the amount of data that a satellite in each of these configurations can evacuate from a GS during a contact also varies. The amount of data received from a GS is also subject to the size of the satellite's buffer, to the amount of data it can retransmit to other satellites (i.e., its ISLs), and, ultimately, to the capacity of the terrestrial network through which the evacuated data will flow before reaching its destination Data Center.

In Fig. 9, we explore how different buffer and ISL configurations affect the amount of evacuated data for increasing Downlink/Uplink bandwidths, after 12 hours of evacuation, for the undamaged scenario of each constellation. In each graph, two groups of results are shown: for the buffer sensitivity analysis, a group with ISLs and a group without ISLs (dashed lines); for the ISL sensitivity analysis, a group with buffers and a group without buffers (dashed lines).

In Fig. 9a, the impact of different buffer sizes is investigated for the 66 constellation. The size of the buffers has a limited effect on the performance of this constellation (especially in the presence of ISLs), as most of its transmissions occur by relaying communication from GSs in the affected region to GSs in the main network when both of them are inside the covered area. Thus, the amount of evacuated data continuously increases as the throughput of the ground-to-satellite links grows, to the point where it is capped by the maximum throughput of the terrestrial network.

Fig. 9b shows the impact of different ISL capacities on the evacuation strategy. As in Fig. 9a, the performance of the evacuation strategy is more impacted by the capacity of the ground-to-satellite links. Fig. 9b shows that the presence of ISLs has a much higher impact than the presence of buffers, while the capacities of
the ISLs (i.e., when they are present) do not yield much difference, whereas the capacities of the buffers in Fig. 9a have a greater influence on the performance.

Fig. 9. Total evacuated data for the 66 constellation and the 720 constellation for different ground-to-satellite link throughputs.

Fig. 9c also explores different buffer sizes, but for the 720 constellation. As satellites in this constellation rarely cover a GS in the affected region and a GS in the main network at the same time, the performance of this constellation is deeply tied to the presence of buffers and ISLs. The biggest difference in performance occurs between the group with ISLs and the group without (again, the curves with ISLs and larger buffers perform best). In this case, however, the amount of evacuated data converges to the maximum evacuation capacity of the terrestrial network with a smaller increase in the ground-to-satellite link throughput than in the 66 constellation. Note, however, that such maximum is reached by three different configurations (i.e., 10 TB buffers and 10 Gbps ISLs; 1 TB buffers and 10 Gbps ISLs; and 10 TB buffers and no ISLs). The other configurations do not achieve that maximum by simply increasing the ground-to-satellite link capacities, i.e., the satellite network gets saturated before the terrestrial network does in these configurations.

Finally, Fig. 9d describes the impact of different ISL capacities for the 720 constellation. This plot confirms how the transmission rate of the ISLs is more crucial in increasing the performance of the network than the presence of buffers for the undamaged 720 constellation. Note how the performances of the 20 Gbps ISLs with buffers, the 30 Gbps ISLs with buffers, and the 30 Gbps ISLs without buffers are very similar. This shows that the limiting factor for these configurations is the capacity of the terrestrial network, which gets saturated when the Downlink/Uplink capacity is larger than 70 Gbps.

The above results show that ISLs have a higher impact on the overall amount of evacuated data. This impact is more intense in the 720 constellation. However, in both constellations, the benefits of the ISLs decline when the constellation is severely damaged, to the point where satellites start acting as physical carriers of data.

7.2. Scenario B: Traffic engineering with two fragments

We show how TE — enabled by our approach — can be worthwhile. The satellite network considered is the 66 constellation with 15% damage.

Configuration: In this scenario, the network of Fig. 1 is fragmented. Nodes 40 and 35 also stop working. Thus, the terrestrial network is divided in three: the main network; a component consisting of node 34 and a Data Center; and a third component formed by node 36 and another Data Center. DC34 and DC36 (disconnected from each other) are trying to evacuate their data to the main network through the impaired LEO satellite constellation, to which they are connected through GSs at nodes 34 and 36 (respectively). This topology is shown in Fig. 10.

We compare a Max Min Fair (MMF) fractional flow allocation policy to an earliest-delivery (shortest-path) non-TE approach, similar to the scenarios presented in the previous section. In both cases, flows can be fractioned across different paths. MMF brings flows up to a Pareto-optimum equilibrium, i.e., to the point where no flow can be further increased without forcing some other flow to be low-
720 Constellation - No Damage


Comparison of Data Evacuated After 12 Hours
250
Our Approach - DC 1
Our Approach - DC 8
Our Approach - DC 14

Evacuated Data (TB)


200
Our Approach - DC 22
Our Approach - Total
150 DTN-based - DC 1
DTN-based - Total

100

50

Fig. 10. Post-disaster topology of Scenario B. The network becomes fragmented be-
cause nodes 40 and 35 fail as well. 0
1 2 3 4 5 6 7 8 9 10 11 12
66 Constellation Hours After Disaster
Data Evacuated With and Without Traffic Engineering
120 Fig. 12. Data evacuated using our solution vs. a Delay-Tolerant-Vehicle network ap-
DC34 - No TE
Evacuated Data (TB)

DC36 - No TE proach. Our approach evacuates data to different DCs, while a DTN-based approach
100
DC34 - MMF only evacuate data to DC 1, since it is the only one directly connected to a GS (see
80 DC36 - MMF Fig. 1).

60

40
720 Constellation - Data Centers With Ground Stations
Comparison of Data Evacuated After 12 Hours
20
115%
0
Evacuated Data Relative to

No-buffer, No-ISL Buffer, No-ISL No-Buffer, ISL Buffer, ISL


DTN-based Approach

Fig. 11. Scenario B: Data evacuated using MMF vs. a non-TE earliest delivery 110%
(shortest-path) approach.

105%
ered. The non-TE approach is the theoretical maximum that a dis-
tributed protocol (e.g., Contact Graph Routing [28]) can achieve
without utilizing extra tools to implement fairness. Non-TE flows 100%
can be routed through ISLs, are aware of the full contact plan, and
prioritize shortest paths in this plan.
The results shown in Fig. 11 demonstrate how TE can be implemented in all configurations of Table 1. They show that a non-TE approach to evacuating data does not provide fairness among different components. Without TE, DC34 gets to evacuate significantly more data at the expense of DC36 evacuating significantly less. DC34 ends up blocking resources from DC36, because its GS gets to see satellites first. This effect accumulates such that, after 12 hours, DC36 evacuates 46% less data than it would if MMF had been utilized instead.

The summed amount of data evacuated from both DCs 34 and 36 is the same in both the MMF and non-TE approaches, an observation elucidated in [30], i.e., the max-flow over multiple sources is the same as the max-flow over a single dummy source connected to those multiple sources. Because of the geographical proximity between the GSs of nodes 35 and 34, they both have access to the same satellite resources; thus, MMF is able to share capacity between the two DCs equally.

Fig. 13. Comparison of the amount of data evacuated after 12 hours using our solution relative to that of a DTN-based approach (for 0%, 15%, and 30% damaged satellites).

7.3. Comparison with DTN approaches

We now study how our solution compares to a generic DTN-based approach, as in [24]. As mentioned in Section 2, the main difference between [24] and ours is that, during the calculation of the evacuation plan, we consider the terrestrial topology that connects GSs to one another. If either destination DCs or affected DCs are not directly connected to a local GS, the DTN-based approach cannot be directly utilized; thus, no evacuation schedule can be calculated (whereas our method can still evacuate data in such a situation). Because of this, in this section, we first study a scenario where only one destination DC is locally connected to a GS (in Fig. 12); and, then, a scenario where all destination and affected DCs are connected to local GSs (in Fig. 13).

In the results of Fig. 12, we utilize a 1 TB buffer, 10 Gbps ISL, 720 constellation (as described in Table 1, in line "Buffer, ISL"). The terrestrial network is shown in Fig. 1, and all links have 100 Gbps capacity.

Fig. 12 demonstrates how our solution outperforms a DTN-based approach in a scenario where only one of the destination DCs (i.e., the DC at node 1) is directly connected to a GS, while the other DCs are only accessible through the terrestrial network. Our approach is able to evacuate data to DCs 1, 8, 14, and 22, while the DTN-based approach can only reach DC 1. As a result, our solution evacuates 60% more data than the DTN-based approach. GSs will not necessarily be collocated with DCs; thus, for a proper evacuation of data after a disaster, our approach should be utilized instead of approaches that require DCs to be directly connected to GSs.
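The multi-source/dummy-source max-flow equivalence cited from [30] is easy to see in code. Below is a minimal, self-contained sketch (plain Python with illustrative toy node names; it is not part of the evacuation planner described in this paper) that computes a max-flow with Edmonds-Karp after connecting all sources to a single dummy super-source:

```python
from collections import deque, defaultdict

def max_flow(cap, sources, sink):
    """Edmonds-Karp max-flow. Multiple sources are reduced to a single
    dummy super-source connected by infinite-capacity edges (cf. [30])."""
    graph = defaultdict(dict)
    for (u, v), c in cap.items():
        graph[u][v] = graph[u].get(v, 0) + c
        graph[v].setdefault(u, 0)          # residual (reverse) edge
    SUPER = "__src__"
    for s in sources:
        graph[SUPER][s] = float("inf")     # dummy source -> real source
        graph[s].setdefault(SUPER, 0)
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = {SUPER: None}
        queue = deque([SUPER])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, residual in graph[u].items():
                if residual > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow                    # no augmenting path left
        # Walk back from the sink, find the bottleneck, and augment
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(graph[u][v] for u, v in path)
        for u, v in path:
            graph[u][v] -= aug
            graph[v][u] += aug
        flow += aug

# Two affected "DCs" (A, B) share an 8-unit satellite bottleneck (x -> t):
cap = {("A", "x"): 10, ("B", "x"): 10, ("x", "t"): 8}
print(max_flow(cap, ["A", "B"], "t"))  # -> 8: both sources together
print(max_flow(cap, ["A"], "t"))       # -> 8: one source alone saturates it
```

Under these toy capacities, the total evacuated flow is identical whether both DCs or just one act as sources; only how that total is split between them changes, which is exactly what the MMF-based TE above controls.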
R.B. R. Lourenço, G.B. Figueiredo and M. Tornatore et al. / Computer Networks 148 (2019) 88–100 99
We also explore an opposite scenario where all DCs are locally connected to GSs. Even in such a scenario, we note that our method performs either similarly to or better than a DTN-based approach. In fact, if the three following factors are true, our solution outperforms a DTN-based approach:

1. The connection between a DC and its local GS provides less bandwidth than the bandwidth between the GS and the affected DC provided by the satellite network (i.e., more data is being downloaded through some GS than its local DC can absorb);
2. The local GS is also connected to a terrestrial network that can evacuate data to other, non-local DCs (i.e., because the DC cannot absorb all the data being downloaded by the local GS, the GS must be able to evacuate the excess data to other DCs through the terrestrial network);
3. Connections between the other DCs and their local GSs are not completely utilized by the data being evacuated through those GSs (i.e., at least one other DC in the network can absorb the data being downloaded through the GS referenced above).

In the results of Fig. 13, we use the same topology of Fig. 1; however, only nodes 1, 30, 34, and 36 have DCs and GSs (DCs 34 and 36 evacuate data to the others). We also consider a 720 constellation where ISL and uplink/downlink capacities are all 100 Gbps (the same capacity as the terrestrial network links).

An example of the three factors above occurring is as follows: (1) affected nodes 34 and 36 are evacuating data through the satellite network; (2) the disaster created a hole in the satellite coverage, and such hole is, at a certain moment, over GS 1, causing it to not be able to download any data from the satellite network (while GS 30 is still fully connected to the satellite network); and (3) at that particular moment, the amount of data being downloaded from the satellite network is larger than what its local DC (30) can absorb. Thus, GS 30 must reach DC 1 through the terrestrial network so that the evacuated data that cannot be locally stored in DC 30 can be retransmitted through the terrestrial network and stored in DC 1.

In Fig. 13, note that our approach performs only slightly better than the DTN-based solution in the scenario without damage to the satellite network. This is because our study utilizes evenly-distributed satellites in the network with orbits that allow satellite coverage to be uniform. However, as Fig. 13 shows, when satellite coverage is not so uniform, e.g., due to satellite failures (i.e., the 15% or 30% damaged constellations), the factors listed above occur frequently, resulting in more evacuated data with our solution when compared to the DTN-based approach.² Thus, our method achieves high volumes of evacuated data not only when some DCs do not have access to local GSs, but also when all DCs are locally connected to GSs.

² Note that, in practical satellite networks, because coverage is not as uniform as the coverage of our simulations, the three factors listed in this section might occur even without damage to the satellite constellation.

8. Conclusion

We studied how to utilize satellite networks to reestablish communication after a disaster in order to evacuate data from distressed regions in a post-disaster scenario. We focused on the occurrence of a High-Altitude Electromagnetic Pulse (HEMP), but our contribution is also applicable to other types of disasters (namely, the undamaged satellite network scenario). We proposed an algorithm which creates a node-to-node transmission schedule that maximizes the amount of data evacuated from affected regions to the main network. A possible architecture on top of which our proposed algorithm can function was also shown. We demonstrated how several factors impact the performance of the data-evacuation plan in two satellite networks of different constellation sizes. In the simulated scenarios, it is clear that ISLs have a higher contribution to the amount of evacuated data on undamaged satellite networks; and, as constellations are more damaged, buffers start playing an important role. We showed how Traffic Engineering can be implemented to enforce fairness among different disconnected components. We compared our method to a DTN solution and showed that we can achieve up to 60% higher throughput. Future work includes studying how other aerial platforms with flexible trajectories (such as planes and drones) might be used to minimize bottlenecks in the data evacuation plan.

There is a growing interest in utilizing aerial platforms for communication purposes. The success in using these aerial platforms in post-disaster scenarios depends on efficient routing and scheduling strategies that allow the maximum amount of data to be evacuated. Also, the relationship between buffer sizes, ISL throughputs, ground-to-satellite throughputs, and terrestrial network capacities must be investigated to devise an appropriate dimensioning of these elements, particularly when the goal of evacuating a specific amount of data in a certain amount of time is considered.

Acknowledgments

R. Lourenço was funded by CAPES Foundation (Proc. 13220-13-6). M. Tornatore acknowledges the research support from COST Action CA15127. This work was supported in part by the Defense Threat Reduction Agency grant HDTRA1-14-1-0047. We thank Dr. Paul Tandy of DTRA for many helpful discussions. We also thank the anonymous reviewers for their helpful comments which significantly improved the paper.

References

[1] R.B.R. Lourenco, G.B. Figueiredo, M. Tornatore, B. Mukherjee, Post-disaster data evacuation from isolated data centers through LEO satellite networks, in: Proc. IEEE ICC, 2017.
[2] M.F. Habib, M. Tornatore, F. Dikbiyik, B. Mukherjee, Disaster survivability in optical communication networks, Comput. Commun. 36 (6) (2013) 630–644.
[3] J.S. Foster Jr., et al., Report of the commission to assess the threat to the United States from electromagnetic pulse (EMP) attack: critical national infrastructures, Technical Report, DTIC Document, 2008.
[4] F. Ranghieri, M. Ishiwatari, Learning from megadisasters: lessons from the Great East Japan Earthquake, World Bank Publications, 2014.
[5] A. Vance, The new space race: one man's mission to build a galactic internet, 2015, http://www.bloomberg.com/news/features/2015-01-22/the-new-space-race-one-man-s-mission-to-build-a-galactic-internet-i58i2dp6.
[6] D. Mosher, SpaceX just asked permission to launch 4,425 satellites, 2016, https://www.businessinsider.com/spacex-internet-satellite-constellation-2016-11.
[7] B. Barritt, T. Kichkaylo, K. Mandke, A. Zalcman, V. Lin, Operating a UAV mesh internet backhaul network using temporospatial SDN, in: IEEE Aerospace Conference, 2017, pp. 1–7.
[8] S. Fichera, et al., On experimenting 5G: testbed set-up for SDN orchestration across network cloud and IoT domains, in: 2017 IEEE NetSoft, 2017, pp. 1–6, doi:10.1109/NETSOFT.2017.8004245.
[9] B. Mukherjee, M.F. Habib, F. Dikbiyik, Network adaptability from disaster disruptions and cascading failures, IEEE Commun. Mag. 52 (5) (2014) 230–238.
[10] P.K. Agarwal, et al., The resilience of WDM networks to probabilistic geographical failures, IEEE/ACM Trans. Netw. 21 (5) (2013) 1525–1538.
[11] M. Pourvali, F. Gu, K. Liang, K. Shaban, N. Ghani, Progressive recovery of virtual infrastructure services in optical cloud networks after large disasters, in: Proc. OSA Optical Fiber Communication Conference, 2016, p. W1B.6.
[12] J. Rak, K. Walkowiak, Reliable anycast and unicast routing: protection against attacks, Telecommun. Syst. 52 (2013) 889–906.
[13] S. Neumayer, E. Modiano, Network reliability with geographically correlated failures, in: Proc. IEEE INFOCOM, 2010, pp. 1–9.
[14] M. Liu, et al., The last minute: efficient data evacuation strategy for sensor networks in post-disaster applications, in: Proc. IEEE INFOCOM, 2011, pp. 291–295.
[15] S. Ferdousi, M. Tornatore, M.F. Habib, B. Mukherjee, Rapid data evacuation for large-scale disasters in optical cloud networks, J. Opt. Commun. Netw. 7 (12) (2015) B163–B172.
[16] A. Bianco, L. Giraudo, D. Hay, Optimal resource allocation for disaster recovery, in: Proc. IEEE GLOBECOM, 2010, pp. 1–5.
[17] J. Yao, P. Lu, Z. Zhu, Minimizing disaster backup window for geo-distributed multi-datacenter cloud systems, in: Proc. IEEE ICC, 2014, pp. 3631–3635.
[18] X. Xie, Q. Ling, P. Lu, W. Xu, Z. Zhu, Evacuate before too late: distributed backup in inter-DC networks with progressive disasters, IEEE Trans. Parallel Distrib. Syst. PP (99) (2017) 1–1.
[19] L. Ma, et al., E-Time: early warning data backup in disaster-aware optical inter-connected data center networks, IEEE/OSA J. Opt. Commun. Netw. 9 (2017) 536–545.
[20] M. Erdelj, E. Natalizio, K.R. Chowdhury, I.F. Akyildiz, Help from the sky: leveraging UAVs for disaster management, IEEE Pervas. Comput. 16 (2017) 24–32.
[21] D. Câmara, Cavalry to the rescue: drones fleet to help rescuers operations over disasters scenarios, in: Proc. IEEE Conference on Antenna Measurements Applications (CAMA), 2014.
[22] I.F. Akyildiz, P. Wang, S.-C. Lin, SoftAir: a software defined networking architecture for 5G wireless systems, Comput. Netw. 85 (2015) 1–18.
[23] D. Hay, P. Giaccone, Optimal routing and scheduling for deterministic delay tolerant networks, in: Proc. IEEE WONS, 2009.
[24] F. Malandrino, C. Casetti, C.F. Chiasserini, M. Fiore, Optimal content downloading in vehicular networks, IEEE Trans. Mob. Comput. 12 (7) (2013) 1377–1391.
[25] Y. Zeng, K. Xiang, D. Li, A.V. Vasilakos, Directional routing and scheduling for green vehicular delay tolerant networks, Wireless Netw. 19 (2) (2013) 161–173.
[26] W. Brown, W. Hess, J. Van Allen, Collected papers on the artificial radiation belt from the July 9, 1962, nuclear detonation, J. Geophys. Res. 68 (1963) 605–606.
[27] D.J. Kessler, B.G. Cour-Palais, Collision frequency of artificial satellites: the creation of a debris belt, J. Geophys. Res. 83 (A6) (1978) 2637–2646.
[28] G. Araniti, et al., Contact graph routing in DTN space networks: overview, enhancements and performance, IEEE Commun. Mag. 53 (3) (2015) 38–46.
[29] C. Sheng, Y. Tao, Worst-case I/O-efficient skyline algorithms, ACM Trans. Database Syst. 37 (4) (2012) 26:1–26:22.
[30] N. Megiddo, Optimal flows in networks with multiple sources and sinks, Math. Program. 7 (1) (1974) 97–107.
[31] J.B. Orlin, Max flows in O(nm) time, or better, in: Proc. ACM STOC, 2013.
[32] S.R. Pratt, R.A. Raines, C.E. Fossa, M.A. Temple, An operational and performance overview of the IRIDIUM low earth orbit satellite system, IEEE Commun. Surv. 2 (2) (1999) 2–10.
[33] F. Liers, G. Pardella, Simplifying maximum flow computations: the effect of shrinking and good initial flows, Discrete Appl. Math. 159 (17) (2011) 2187–2203.
[34] V. Pommier-Maurussane, L. Maisonobe, Orekit: an open-source library for operational flight dynamics applications, in: Proc. ESA ICATT, 2010.
[35] B. Naveh, JGraphT, 2011. Internet: http://jgrapht.sourceforge.net.

Rafael B. R. Lourenço received the B.Eng. degree in communication and networks engineering from the University of Brasilia, DF, Brazil, in 2012, and the Ph.D. degree in computer science from the University of California, Davis, CA, USA, in 2018. In the Summers of 2016 and 2017, he interned at Google, working with their Software-Defined Networks team. He was a grantee of the Brazilian Science Without Borders Scholarship during his Ph.D.