ABSTRACT These instructions give you guidelines for preparing papers for IEEE Access. Use this
document as a template if you are using LATEX. Otherwise, use this document as an instruction set. The
electronic file of your paper will be formatted further at IEEE. Paper titles should be written in uppercase
and lowercase letters, not all uppercase. Avoid writing long formulas with subscripts in the title; short
formulas that identify the elements are fine (e.g., "Nd–Fe–B"). Do not write “(Invited)” in the title. Full
names of authors are preferred in the author field, but are not required. Put a space between authors’ initials.
The abstract must be a concise yet comprehensive reflection of what is in your article. In particular, the
abstract must be self-contained, without abbreviations, footnotes, or references. It should be a microcosm
of the full article. The abstract must be between 150–250 words. Be sure that you adhere to these limits;
otherwise, you will need to edit your abstract accordingly. The abstract must be written as one paragraph,
and should not contain displayed mathematical equations or tabular material. The abstract should include
three or four different keywords or phrases, as this will help readers to find it. It is important to avoid over-
repetition of such phrases as this can result in a page being rejected by search engines. Ensure that your
abstract reads well and is grammatically correct.
INDEX TERMS Enter key words or phrases in alphabetical order, separated by com-
mas. For a list of suggested keywords, send a blank e-mail to keywords@ieee.org or visit
http://www.ieee.org/organizations/pubs/ani_prod/keywrd98.txt
VOLUME 4, 2016 1
Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS
ture to efficiently execute complex simulation models [2]. PDES is a collection of processes that run in parallel and interact through messages. These messages encapsulate the events and drive the simulation among the different processes. Events that run simultaneously on multiple computing systems need to be synchronized. Synchronization (or time management) algorithms ensure that events are processed in the correct order, adhering to the local causality constraint. The local causality constraint ensures that a parallel or distributed simulation produces the same results as a sequential execution.

Fig. 1 shows the classification of traditional synchronization algorithms. There are two basic categories of time flow management:
• Time stepped approach, where the simulation time is evenly spaced along a sequence of equal sized time steps or intervals.
• Event driven approach, where the simulation time does not progress in time steps but only when something interesting happens, which is referred to as an “event.”

Time stepped approaches are limited to specific applications. The PDES community has widely adopted event driven approaches, which are categorized into two well-known synchronization mechanisms: (1) conservative and (2) optimistic. In the conservative approach, causality errors are strictly avoided by applying strategies to determine when it is safe to process an event. In the optimistic approach, the simulation continues until a causality error is detected; the error is then handled by a rollback mechanism for recovery and later re-execution of the rolled-back events. There are two types of conservative mechanisms: synchronous and asynchronous. Synchronous algorithms require global synchronization to compute the Lower Bound on Timestamp (LBTS); thus all LPs proceed in a synchronous fashion. Asynchronous algorithms, on the other hand, do not use a global synchronization mechanism, and LPs proceed asynchronously.

FIGURE 1. Types of Synchronization Algorithms

The field of parallel and distributed simulation has been strongly influenced by emerging technologies. These technologies include massively parallel systems, Cloud computing, GPU computing, embedded computing systems and sensor networks. With the development of technology, the use of electronic devices is growing rapidly. Applications and services are hosted inside the Cloud and can be accessed through any device connected to the internet. In addition, with the inception of the internet-of-things (IoT), heterogeneous devices can become part of a grid network. Consequently, the available computation platforms have changed compared to the conventional cluster environment. This advancement in technology has introduced new challenges for the research community.

In a Cloud computing environment, the workload on a single node can affect the whole simulation process executing on the system. This can cause longer wait times for event execution, which can give rise to the straggler message problem [3]–[5]. Similarly, with IoT, any computing device can become a part of a network to share its computing resources and store data. Conventional simulation protocols fail to perform efficiently on such networks or devices. Most sensors and smartphones are resource constrained and cannot keep a large amount of data or perform large computations. Furthermore, sending and receiving data also has a cost in terms of energy and data transfer rate.

Because of the ever growing demand for mobile devices and embedded systems in the microelectronics market, many modern applications are designed specifically to execute on mobile platforms. Overall performance in terms of energy consumption is the primary design factor for such applications. Therefore, to improve the performance and energy efficiency of modern computing applications and platforms, it is important to accurately estimate their power and energy consumption and identify critical parameters and modules for further optimization.

Execution of large-scale distributed simulation over mobile and embedded devices opens new research areas that need to be explored. Such research fields include energy-aware distributed simulation and dynamic data-driven applications. Many aspects of such applications have already received attention in the research community. These systems can be useful in many applications such as manufacturing, telecommunications, preparation for inclement weather, defense, intelligent transportation systems and crisis management systems.

In battery operated systems, energy consumption is a major concern; however, minimizing power consumption may not always result in low energy consumption. Decreasing the frequency at which the CPU operates lowers power consumption but increases the total time required to complete the task, so the total energy (power multiplied by time) does not necessarily decrease. The problem becomes more complex in an environment where heterogeneous devices are participating in distributed simulations.

In traditional parallel and distributed simulation, logical processes (LPs) are mapped onto different systems or processing cores. These LPs communicate with each other by exchanging timestamped messages. However, process mapping on heterogeneous or resource-constrained devices can affect the performance of the entire simulation. Some simulation algorithms need significantly large storage capacity to store the history of processed events and more compu-
tation is required to undo out-of-order execution. Therefore, it increases the number of memory accesses, the sending and receiving of new event messages and the execution of events, all of which require a significant amount of energy. Thus, resource-constrained devices are not suitable for traditional simulation algorithms.

Traditional PDES protocols are well tested and designed for distributed systems and Cloud infrastructure, but they still need to be tested on handheld, mobile and IoT devices. Moreover, traditional frameworks are not designed to support mobile or handheld devices that have memory and energy constraints. Therefore, a thorough analysis of traditional PDES frameworks is required so that resource-hungry modules can be migrated to the cloud and accessed through well-defined services.

As a case study, we initially performed the instrumentation of a PDES framework (Rensselaer's Optimistic Simulation System, ROSS) on a desktop computing environment to measure power, CPU usage, energy and memory consumption using the PHOLD benchmark. This case study includes the results of serial, parallel conservative and parallel optimistic approaches. This allows us to understand the resource utilization of simulation approaches before adopting the best synchronization model for handheld, embedded and IoT devices. The objective is to precisely identify the modules that are resource hungry, so that those can be executed on cloudlets and traditional frameworks. Accurately identifying these modules allows simulations to be adapted to handheld devices with little modification.

Using these results, we proposed SEECSSim, a distributed simulation suite designed to work on mobile and embedded devices. It includes the core synchronization algorithms as classified in Fig. 1. The proposed suite includes the Chandy-Misra-Bryant (CMB) NULL message algorithm, Time-Stepped, Tree Barrier, Time Warp and Time Warp with Wolf algorithm (wolf calls). SEECSSim will help researchers in selecting suitable algorithms for mobile and embedded systems. A correctly selected energy efficient algorithm can exploit the true potential of embedded systems for highly scalable parallel and distributed simulation.

The rest of the paper is organized as follows. Section II covers the literature review. Some of the algorithms in SEECSSim are discussed in Section III. Section IV covers the results and discussion. Finally, Section V concludes the paper.

II. LITERATURE REVIEW
Despite the need for profiling the power and performance of simulation protocols and creating energy-aware PDES platforms, only a few articles address this issue. There exists substantial literature related to energy-aware computing in the domains of High-Performance Computing (HPC), mobile computing platforms and Wireless Sensor Networks (WSN). Moreover, different profiling methods, techniques, and tools have been proposed over the years. In this section, we briefly discuss some related contributions for performance analysis and energy-aware simulation for HPC systems as well as mobile and embedded systems.

Many tools have been developed over the years for profiling performance and power consumption to generate, analyze and visualize the data of systems and applications from functional components. Tuning and analysis utilities [6], [7], [8], [9] and [10] provide support for instrumentation and performance visualization of parallel applications. Isci et al. [11] presented an approach to estimate power consumption using performance logs. Similarly, the authors in [12] provided a framework called PowerPack that can be used for energy profiling and analysis of parallel applications on multi-core processors. The authors in [13] indicated that power profiles always correspond to the characteristics of the application, and that increasing the number of nodes results in more power consumption but not always in better performance. The authors in [14] analyzed power consumption in relation to software components. Their analysis shows that in different software scenarios, power consumption on a general-purpose computer system can vary from 12% to 20%. The authors in [15] and [16] focused on low power embedded systems to analyze and benchmark HPC applications for their energy consumption.

There is substantial work available in power-aware computing, and various techniques have been deployed to reduce power consumption [17], [18] and [19]. Computing the energy consumed by each machine instruction is one way to profile the energy consumption of all functional components. Tiwari et al. [20] discussed functional level energy profiles obtained by computing the energy consumed by each machine instruction. However, the proposed work is limited to function level energy marking, whereas communication between processes also accounts for a major portion of parallel and distributed simulation. In a distributed simulation, different techniques are used to reduce the communication delay between processes [21]. One such approach is to utilize the different cores available in a physical system [22]. However, the synchronization algorithms incur overhead that is difficult to reduce.

Most of the existing work on energy-efficient computing has been done for the HPC environment [23]–[26], where different techniques such as Dynamic Voltage and Frequency Scaling (DVFS), process migration, task consolidation and Dynamic Power Management (DPM) are used to optimize the use of energy. R. Child et al. [27] explored the features of DVFS to enhance performance and reduce power consumption by repeatedly reducing the operating frequencies of the cores. The authors investigated energy efficiency through DVFS for the Time Warp simulation algorithm. The study was conducted over physical systems using an MPI version of the wrapped TW simulator.

Similarly, G. Tom et al. [28] described the integration of an energy-aware module to simulate the energy consumption of distributed systems. The authors provided an overview of energy-aware simulations and described DVFS simulation tools required for obtaining accurate simulations in terms
of power consumption. This work is mostly related to the energy of cloud systems; therefore, the authors explored DVFS and cloud simulators in detail. Moreover, they also added DVFS features to one of the cloud simulators.

Communication is a common performance bottleneck for fine grained parallel applications like ROSS. The authors in [29] and [30] discussed techniques for improving network performance by reducing lock contention and overlapping communication. Jagtap et al. [31] analyzed the performance of the ROSS simulation framework on two different platforms and compared a multi-threaded implementation with an MPI based implementation. Erazo et al. [32] presented a case study to profile the energy consumption of distributed simulation and tested it on their PRIME simulator [33]. The authors concluded that using more nodes to achieve parallelism results in a significant increase in energy consumption.

In [34], Fujimoto shared future research challenges for PDES. These challenges include large-scale simulation of complex networks, exploiting GPUs, Cloud computing exploitation, composable simulation and energy consumption of PDES. In PDES, energy consumption is less explored than other aspects. Minimizing power consumption through a change in clock rate (DVFS) can eventually increase the overall energy required to complete the task. Schemes such as DVFS are more suited to data centers and supercomputers. Therefore, for IoT, power-aware and energy-aware techniques are more important design considerations for PDES over handheld devices, as energy consumption depends on many factors such as network communication and memory usage.

Traditional simulations are designed for the cluster environment. With the advancement of technology, the easy availability of infrastructure-as-a-service offers a flexible computing environment on a pay-as-you-go model. New PDES techniques have been proposed for such cloud architectures. In [4], the authors studied the execution of a conservative algorithm over various configurations of Amazon EC2. The objective was to assess the suitability of the cloud platform for distributed discrete event simulation. For a conservative algorithm, null messages play a significant role in the performance of the whole simulation system. Therefore, the authors tested several variations of synchronization algorithms such as Chandy-Misra-Bryant, time-out based null message sending, deadlock avoidance based null message, on-demand null message and timeout protocols. The results showed that the timeout and blocking protocols performed better in a cloud environment.

The Cloud provides a multi-tenant paradigm; therefore, execution of optimistic PDES over the cloud results in a large number of rollbacks. This is because systems are not equally loaded in terms of the number of jobs. Moreover, some tasks require more computation whereas others need more communication. Similarly, Asad et al. [3] proposed a PDES model for the cloud environment that improves the performance of optimistic parallel simulation by dynamically defining barrier points to reduce the total number of rollbacks. The experimental section shows a significant gain in performance over traditional optimistic simulation.

Similarly, Yihua Wu et al. [35] presented a BOINC based system for the Cloud environment to execute parallel and distributed simulation over private cloud infrastructures. BOINC is a middleware developed for volunteer and grid computing. It is an open-source framework designed to support task distribution and result gathering in a client-server model. However, use of a private cloud is not a recommended option due to its cost and other management factors. The main reason for adopting private clouds is the sensitive nature of simulation results.

In the context of distributed simulation over mobile/embedded devices, Biswas et al. [36] discussed techniques to create power profiles. The energy consumption of the simulation model, engine, computations and communications is separated to understand the energy consumption of each aspect of the simulation. They also presented a comparative analysis of the energy consumed by the Chandy-Misra-Bryant and YAWNS algorithms. Similarly, Malik et al. [37] analyzed the energy consumption of the Time Warp protocol on smartphones. Moreover, online distributed simulation, such as a traffic prediction system, requires a significant amount of energy. Neal et al. [38] analyzed the energy consumption of data driven traffic simulations on mobile devices. Because online traffic prediction requires a significant amount of energy, understanding the energy consumption at various levels helps in optimizing the use of resources. The authors presented an empirical investigation of modules such as data transmission, gathering and traffic computations. Fujimoto et al. [39] presented a detailed work on power efficient distributed simulation. The authors covered a few conservative and optimistic synchronization algorithms along with a discussion of energy efficient distributed simulations. The main objective of their work is to analyze the power consumption of various distributed simulation techniques along with profiling of the simulation engine, application, and communication. The experiments were conducted on multiple configurations, such as a Jetson TK1 development board and a quad-core LG Nexus 5 cellular phone.

In this study we present a detailed analysis of existing traditional algorithms on handheld devices. The analysis includes the execution time, CPU and memory usage, energy consumption and event rate. This article will serve as a guideline for the PDES community, especially new researchers, to help them select the right protocols for embedded systems in view of the resource constraints.

The next sections cover some background, the experimental setup for desktop and mobile devices, and their results and discussions.

III. BACKGROUND
A brief description of Parallel and Distributed Simulation platforms, especially Rensselaer's Optimistic Simulation System (ROSS), is presented in this section. We have selected ROSS to present a case study for the instrumentation of
PDES systems with different synchronization algorithms. A few features of the ROSS framework are discussed and explained. In Table 1, we briefly describe different PDES frameworks to give an understanding of how some of these frameworks have been designed previously. Software and hardware tools and techniques to perform instrumentation and power consumption analysis are discussed in the last part of the section.

A. PARALLEL DISCRETE EVENT SIMULATION (PDES)
A Discrete Event Framework is a system where state changes (events) happen at discrete occurrences in time, and events take zero time to happen. It is assumed that nothing (interesting) happens between two consecutive events; that is, no state change happens in the system between the events. Systems that can be categorized as Discrete Event Frameworks can be modeled using Discrete Event Simulation (DES) Systems.

B. RENSSELAER'S OPTIMISTIC SIMULATION SYSTEM (ROSS):
In this study, we have used ROSS, a PDES framework that is based on the Message Passing Interface (MPI). ROSS is a high performance and extremely modular PDES system that uses a small (and constant) amount of memory to keep state variables and execute events [53]. Its modular implementation, use of reverse computation, Kernel Processes and Fujimoto's GVT (Global Virtual Time) algorithm make it a state of the art simulation system for experimental studies. Continuous analysis shows that ROSS outperforms the well-known GTW (Georgia-Tech Time Warp) System. In this study, we analyze the performance and power consumption of the ROSS simulation implementation under the classical PHOLD simulation model benchmark. While ROSS is based on the optimistic scheduling approach, it also provides implementations of both the conservative and sequential approaches. Our study includes performance and power consumption analysis of all three: sequential, conservative and optimistic.
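To make the discrete event model of Section III-A concrete, the following is a minimal sketch of a sequential event loop. It is not ROSS code: the names `Event`, `Simulator`, `schedule` and `run` are hypothetical and chosen only for illustration. The sketch shows the two properties stated above: simulation time jumps directly from one event timestamp to the next, and state changes only when an event is processed.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class Event:
    """A timestamped event; ordering uses (timestamp, seq) only."""
    timestamp: float
    seq: int  # tie-breaker so equal timestamps keep insertion order
    action: Callable[["Simulator"], None] = field(compare=False)


class Simulator:
    """Hypothetical sequential discrete event simulator (illustration only)."""

    def __init__(self) -> None:
        self.now = 0.0                    # current simulation time
        self.pending: List[Event] = []    # future event list (min-heap)
        self.processed = 0
        self._seq = 0

    def schedule(self, delay: float, action: Callable[["Simulator"], None]) -> None:
        """Schedule `action` to run `delay` time units after the current time."""
        self._seq += 1
        heapq.heappush(self.pending, Event(self.now + delay, self._seq, action))

    def run(self, end_time: float) -> None:
        """Process events in timestamp order until `end_time` is passed."""
        while self.pending and self.pending[0].timestamp <= end_time:
            event = heapq.heappop(self.pending)
            self.now = event.timestamp    # time advances only at events
            event.action(self)            # the event may schedule new events
            self.processed += 1


if __name__ == "__main__":
    visited = []

    def ping(sim: Simulator) -> None:
        visited.append(sim.now)
        sim.schedule(2.0, ping)           # each event schedules a successor

    simulator = Simulator()
    simulator.schedule(2.0, ping)
    simulator.run(end_time=10.0)
    print(visited, simulator.processed)
```

A parallel simulator distributes such event lists over several LPs, which is where the conservative and optimistic synchronization mechanisms become necessary.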
1) PHOLD Benchmark Model:
To evaluate the performance of synchronization protocols used in parallel and distributed simulation systems, many different benchmark models have been used. PHOLD is one of the most widely used benchmark applications [54], [55]. PHOLD (Parallel HOLD) is a parallel version of the HOLD model, a synthetic benchmark used for performance analysis of sequential discrete event list simulation algorithms. The PHOLD model consists of N fully connected logical processes (LPs). The simulation model starts with a number of objects known as Logical Processes (LPs), with each LP having a fixed number of events. The event execution function sends a message to another LP that is selected uniformly among all the logical processes in the simulation. On receiving this message, each LP sends another event message to the neighboring LP. In this way, the total number of events in the simulation remains constant. The message size, message population, timestamp increment and the message routing probabilities can be varied to test the simulation system [56].

The different profiling tools that we have used are briefly described here:

2) Energy, Memory Consumption - Allinea MAP:
Allinea MAP is a profiling tool designed for a wide range of applications, including parallel, single threaded and multi-threaded (Pthread, OpenMP and MPI) applications that are based on Fortran, C and C++ [57]. Through analysis of a target application, it pinpoints the bottlenecks in code execution and keeps logs of power, energy and memory consumption traces.

3) CPU Usage - Intel® VTune™ Amplifier:
Intel® VTune™ Amplifier is a profiling tool that is used to optimize code for better performance by profiling the CPU usage of the system. It provides a user friendly interface to analyze and obtain results using enriched performance insights. It helps application developers to develop code that is more threaded, scalable, vectorized and tuned. We have used VTune™ Amplifier to check the average CPU usage and the amount of time that the ROSS framework spends on locks and waits in a parallel setup.

4) CPU Cores Temperature - Intel® SoC Watch:
Intel® SoC Watch is a command-line utility designed under the umbrella of the Intel energy profiler (a set of data collector tools) [58]. We have used this utility to study the temperature profiles of CPU cores during execution of the ROSS framework for different numbers of LPs.

V. PRELIMINARY RESULTS AND ANALYSIS OF CASE STUDY
This section contains in-depth results obtained from the analysis of simulation runs using various analysis tools. All the results, including CPU usage, CPU temperature, wait time, power, energy and memory consumption, are discussed in sub-sections. The results reported in this section are averaged over three independent simulation runs on the specified system, using the profiling tools mentioned in the previous section.

A. PHOLD EXECUTION
Results obtained by executing the PHOLD benchmark model in the ROSS framework with a varying number of logical processes are discussed in this section. To determine how a component, or more specifically a synchronization algorithm, of a simulation engine should be developed for mobile devices, we have done a comparative analysis of synchronization algorithms in terms of their CPU usage, memory consumption, total execution time, energy and power consumption. These results helped us to determine which simulation model is efficient in terms of the parameters used in performance analysis. Using these results, we can not only develop efficient algorithms but also improve the performance of existing algorithms. Moreover, simulation components that are resource hungry can be redesigned or moved to cloudlets. Detailed results of serial execution and of the parallel conservative and optimistic simulation algorithms are presented in this section.

Results specific to the PHOLD benchmark in the ROSS framework are presented in Tables 2, 3 and 4 respectively. In all simulations, linear mapping is used between logical and physical processes. The total number of events is kept constant and is also used as the stopping condition for the simulation. As expected, serial execution takes more time than parallel conservative and parallel optimistic simulation with a varying number of LPs. For 1024 LPs, sequential simulation took 34.827 seconds to complete, whereas the conservative and optimistic parallel approaches took 21.4 and 24.7 seconds respectively. For 524288 LPs, the sequential execution took 8.72 hours, whereas parallel conservative took 4.04 hours and parallel optimistic execution took 5.99 hours. The results show that conservative simulation execution outperforms the other techniques for all tested numbers of LPs. As discussed earlier, in optimistic simulation there are out of order event executions that cause some events to roll back and then re-execute. Thus the total number of events executed is larger than the number of committed events, which results in reduced performance compared to the conservative approach. Other functions that contribute to the increased execution time are GVT computations, fossil collection and reverse computation.

Other interesting results specific to PHOLD execution in the ROSS framework are memory usage and wastage for the serial and both parallel techniques. For execution with 1024 LPs, serial execution used a total of 12,080 MBs, parallel conservative used 11,528 MBs and optimistic used 29,768 MBs. Similarly, for 524,288 LPs, serial execution used a total of 38,8176 MBs, parallel conservative used 105,552 MBs and optimistic used 123,792 MBs to complete the simulation. This trend is similar to that of total execution time and the reason is the same: every LP must implement the rollback mechanism, and therefore the LPs need to maintain the history of all events to handle straggler/transient, and anti-
TABLE 3: Results - Parallel Conservative Execution of PHOLD with Varying Number of LPs
LPs 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288
Running Time (Seconds) 21.432 42.233 83.687 169.071 375.403 770.323 1598.350 3513.903 7212.069 14542.773
Event Rate (million events/sec) 4.78 4.85 4.89 4.85 4.36 4.25 4.100 3.73 3.63 3.60
Memory Allocated (MB) 11528 11712 12080 12816 12873 17232 23120 34896 58448 105552
Memory Wasted (MB) 677 629 533 341 469 213 213 213 213 212
Total Events Processed (billions) 0.102 0.205 0.410 0.819 1.638 3.277 6.554 13.107 26.214 52.428
Total GVT Computations (million) 0.200 0.300 0.500 0.900 1.700 3.300 6.504 12.920 25.751 51.394
Efficiency (%) 100 100 100 100 100 100 100 100 100 100
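As a reading aid for Table 3, the event rate row can be approximately reproduced from two other rows by dividing the total events processed by the running time. The sketch below does this for three columns of the table. The dictionary layout and the function name `event_rate_million_per_s` are illustrative assumptions, and small deviations from the reported rates are expected because the table rounds the event counts to three decimals.

```python
# Values copied from Table 3 (parallel conservative execution).
# Keys are LP counts; names are illustrative, not taken from ROSS.
table3 = {
    1024:   {"running_time_s": 21.432,    "events_billion": 0.102,  "reported_rate": 4.78},
    32768:  {"running_time_s": 770.323,   "events_billion": 3.277,  "reported_rate": 4.25},
    524288: {"running_time_s": 14542.773, "events_billion": 52.428, "reported_rate": 3.60},
}


def event_rate_million_per_s(events_billion: float, running_time_s: float) -> float:
    """Event rate in million events per second (1 billion = 1000 million)."""
    return events_billion * 1000.0 / running_time_s


for lps, row in table3.items():
    derived = event_rate_million_per_s(row["events_billion"], row["running_time_s"])
    print(f"{lps:>7} LPs: derived {derived:.2f}, reported {row['reported_rate']:.2f} "
          f"million events/sec")
```

The derived and reported rates agree to within rounding, and both show the event rate falling as the number of LPs grows.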
TABLE 4: Results - Parallel Optimistic Execution of PHOLD with Varying Number of LPs
LPs 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288
Running Time (Seconds) 24.727 50.468 100.051 216.820 510.869 1197.870 1187.070 5065.395 10401.267 21598.115
Event Rate (million events/sec) 4.141 4.058 4.094 3.778 3.207 2.735 2.760 2.588 2.520 2.427
Memory Allocated (MB) 29768 29952 30320 31056 32528 35472 35472 53136 76688 123792
Memory Wasted (MB) 869 821 725 533 149 405 405 405 405 404
Fossil Collect Attempts (million) 0.408 0.808 1.609 3.210 6.410 12.811 25.611 51.212 102.410 204.811
Total Events Processed (billion) 0.104 0.207 0.412 0.822 1.641 3.280 3.280 13.110 26.217 52.432
Total GVT Computations (million) 0.102 0.202 0.402 0.802 1.603 3.203 3.203 12.803 25.603 51.203
Total Roll Backs 133218 79114 45681 23980 12143 5983 5970 1665 716 421
Primary Roll Backs 105890 63101 35860 19571 10561 5541 5448 1578 681 398
Secondary Roll Backs 27328 16013 9821 4409 1582 442 522 87 35 23
Efficiency (%) 98.13 98.96 99.43 99.69 99.84 99.90 99.92 99.98 99.99 99.99
anti-messages.
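The running times quoted above can also be expressed as speedup over the sequential run. Speedup is not reported in the text; the following sketch derives it only as an illustration, using the sequential times quoted in Section V-A and the parallel running times from Tables 3 and 4. The function name `speedup` and the dictionary names are assumptions made for this example.

```python
SECONDS_PER_HOUR = 3600.0

# Sequential times quoted in the text (34.827 s and 8.72 h);
# parallel times taken from Tables 3 and 4.
sequential_s = {1024: 34.827, 524288: 8.72 * SECONDS_PER_HOUR}
conservative_s = {1024: 21.432, 524288: 14542.773}
optimistic_s = {1024: 24.727, 524288: 21598.115}


def speedup(sequential_time_s: float, parallel_time_s: float) -> float:
    """Speedup of a parallel run relative to the sequential run."""
    return sequential_time_s / parallel_time_s


for lps in sequential_s:
    cons = speedup(sequential_s[lps], conservative_s[lps])
    opt = speedup(sequential_s[lps], optimistic_s[lps])
    print(f"{lps:>7} LPs: conservative {cons:.2f}x, optimistic {opt:.2f}x")
```

In both configurations the conservative run shows the larger speedup, consistent with the observation that it outperforms the optimistic run on this benchmark.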
FIGURE 3. Energy consumption of sequential, parallel conservative and optimistic approaches for varying number of LPs
FIGURE 4. Power consumption of sequential, parallel conservative and optimistic approaches for varying number of LPs
have shown that the sequential execution, which did not use many CPU resources initially, still consumed a lot more energy in the end because of its longer execution time.

2) Power Consumption
CPU power consumption is a significant portion of the overall power consumed by desktop computers. It is expressed in watts and combines the electrical energy used by the CPU per unit time while performing various tasks with the energy dissipated in the form of heat during execution. Here, we discuss the total CPU power consumption; temperature statistics are discussed later. Figure 4 shows that for smaller numbers of LPs, such as 1024 and 2048, the CPU power consumption follows a similar trend for the conservative and optimistic approaches, but overall the conservative approach consumed more power. The CPU power consumption of the serial version is very low compared to both parallel versions (conservative and optimistic), as it uses a single core, but that eventually results in a higher execution time.

3) Memory Usage
Memory usage is also an important parameter in the design of parallel applications like ROSS. Memory usage is highest for optimistic simulations and lowest for serial executions. Figure 5 shows the memory usage of the sequential, parallel conservative and optimistic approaches with a varying number of LPs.

FIGURE 5. Memory usage of sequential, parallel conservative and optimistic approaches for varying number of LPs

4) Functional Level Execution Time
In this section, we present execution time results of core functions for serial, parallel conservative and parallel optimistic simulation execution. Tables 5, 6 and 7 contain the functional level execution time results, respectively, for varying numbers of logical processes (1024, 2048, 32768, 262144 and 524288). These tables contain the functional hierarchy and the time spent on execution of some major functions and their corresponding individual child functions. In all three simulation versions, total events are kept constant and linear mapping is used between logical and physical processes.

The functional level percentage time for each sequential simulation execution for different numbers of LPs is given in Table 5. The results show that serial simulation with 1024 LPs took 34.8 seconds, but when the number of LPs was increased to 524288, serial simulation took 8.72 hours to complete. tw_scheduler_sequential is the main executing function, responsible for event processing, memory management and virtual time computation. Parallel versions of PHOLD simulations showed more interesting results.

Table 6 contains percentage execution time results for simulation functions, with the portion of compute time, for parallel conservative execution. Total execution time of the simu-
10 VOLUME 4, 2016
Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS
TABLE 6: Functional Level Execution Time for Parallel Conservative Version of ROSS
LP’s
1024 2048 32768 262144 524288
Functions (% Time) Total MPI Total MPI Total MPI Total MPI Total MPI
tw_run 99.7 43 99.8 40 99.9 32 100 27 100 27
tw_scheduler_conservative 99.5 43 99.8 40 99.9 32 100 27 100 27
phold_event_handler 59 14 64 14 55 12 58 10 58 11
tw_event_send 21 14 22 14 18 12 19 10 21 11
tw_rand_exponential 15 - 16 - 13 - 12 - 11 -
tw_event_new 14 - 15 - 14 - 11 - 12 -
rng_gen_val 6 - 8 - 8 - 13 - 12 -
Others 3 - 3 - 2 - 3 - 2 -
tw_net_read 17 15 17 14 16 12 16 11 16 11
service_queues 17 15 17 14 16 12 16 11 16 11
Others 0 - 0 0 0 0 0 0 0 0
tw_gvt_step2 14 14 9 12 10 8 6 6 6 5
MPI_Allreduce 13 13 8 11 10 8 5 5 5 5
Others 1 1 1 1 0 - 1 1 1 -
Others 9.5 - 9.8 - 18.9 - 20 - 20 -
Others 0.3 - 0.2 - 0.1 - 0 - 0 -
Execution Time (sec) 21.4 42.2 770.3 7212.1 14542.8
lation code for 1024 LPs was 21.4 seconds out of which done in sequential fashion essentially. for rollbacks, as no
parallel execution time was 9.2 seconds. similarly when process can rollback to a timestamp smaller than GVT value
the number of LPs is increased to 524288, total execution [59]. This trend can be seen in 6 and 7 as the number of LPs
time was about 4.04 hours with parallel execution time of increase there is decrease in parallel execution time of GVT
1.09 hours. This gives us a rough idea about the degree of calculation function thus spending more time in sequential
parallelism that MPI based ROSS provides as compared to GVT calculation due to rollbacks.
the sequential execution; as the execution time is decreased Table 7 contains functional level percentage execution
to half in the case of parallel conservative executions. But time results for parallel optimistic simulation alogn with
it is interesting to note that for parallel version degree of execution execution time for different numbers of LPs. Total
parallelism tends to decrease as we increase the number of execution time of the simulation code for 1024 LPs was
LPs. This decrease in parallel execution time is because there 24.7 seconds out of which parallel execution time was 9.14
is an earliest time tag known as Global Virtual Time (GVT) seconds. Similarly, for 524288 LPs total execution time was
or (LBTS in the case of conservative synchronization) associ- about 6 hours with parallel execution time of about 1.38
ated with unprocessed pending events. This virtual time value hours. Similar to the case of conservative approach, serial
act as a lower bound in the simulation time that guarantees GVT computations that need to be performed in sequential
that there will be no event message with time-stamp lower manner are used to find a time in the past for which it is
than this GVT value. These GVT computations needs to be guaranteed that there will be no roll back before this time.
For this reason, in table 6 and 7 it can be seen that there
is a considerable increase in the computation time of the tw_gvt_step function as the number of LPs increases. More rollbacks cause more reverse computation overhead. Reverse computation is used in parallel discrete event simulation to reduce state saving. For energy-constrained systems, we had to make sure that there are no excessive rollbacks that may cause longer execution time. Sometimes a rollback starts a chain of rollbacks: primary rollbacks cause secondary rollbacks transitively, to reverse the effect of previously sent messages [60]. In conservative simulation, by contrast, causality errors are avoided by performing more GVT computations, which is why the execution time of the conservative approach is better than that of the optimistic algorithm.
D. CPU TEMPERATURE - INTEL® SOC WATCH RESULTS
In this section, we discuss the temperature statistics for each core while the PHOLD application is being executed. For serial execution, it was observed that the temperature of one specific core remained higher than that of the others, because serial execution utilizes only a single core. The statistics showed that in serial execution the minimum average temperature is 36.1°C and the maximum average temperature goes up to 54.4°C.
consumption for optimistic and conservative execution is that a significant part of energy is dissipated in the form of heat.
E. ANALYSIS OF ROSS FRAMEWORK
We have presented in-depth instrumentation results for the serial, parallel conservative and optimistic approaches. This helped us to determine the critical parameters that we need to focus on while designing synchronization algorithms for mobile platforms. A serial execution of the simulation model will consume few resources but will take much more time, and thus will not be efficient and effective for most real-time mobile applications. For applications that need to perform less extensive simulation tasks with no need to provide real-time results, the serial execution model can be used effectively. For parallel conservative and optimistic simulation models, a good strategy could be to migrate resource-intensive functions (or modules) to cloud-lets for their execution. In this analysis, the execution times of core functions are listed in Tables 5, 6 and 7. This kind of detailed analysis is the first step towards the adoption of any simulation framework on mobile handheld devices. A second step would be to evaluate the migration cost of compute- and energy-intensive code to cloud-lets. In some cases, this evaluation can be done before running the simulation (e.g., based on information derived from previous runs of the simulator), but often it needs to be executed at run-time. In fact, the decision to migrate or not to migrate a simulation module is based on many parameters, some of which can be unknown or unpredictable a priori. For this reason, heuristic approaches for the dynamic (and adaptive) reallocation of these modules could be very promising. The next sections of the paper cover our proposed simulation suite, its detailed results and performance analysis for mobile platforms.
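To make the migration decision discussed above more concrete, the following is a minimal sketch of one possible cost comparison. It is not the method used in this paper: the ModuleProfile fields, the should_migrate rule and every numeric value are hypothetical, and a real heuristic would have to cope with parameters that are unknown before run-time.

```python
from dataclasses import dataclass


@dataclass
class ModuleProfile:
    """Hypothetical per-module estimates, e.g. taken from previous runs."""
    local_time_s: float    # execution time on the handheld device
    remote_time_s: float   # execution time on the cloud-let
    transfer_bytes: int    # state shipped to and from the cloud-let


def should_migrate(profile: ModuleProfile,
                   bandwidth_bytes_per_s: float,
                   compute_power_w: float,
                   radio_power_w: float,
                   idle_power_w: float) -> bool:
    """Migrate only if offloading saves both time and device energy."""
    transfer_time_s = profile.transfer_bytes / bandwidth_bytes_per_s
    remote_total_s = transfer_time_s + profile.remote_time_s

    # Device energy: computing locally versus transmitting and then idling.
    local_energy_j = compute_power_w * profile.local_time_s
    remote_energy_j = (radio_power_w * transfer_time_s
                       + idle_power_w * profile.remote_time_s)

    return remote_total_s < profile.local_time_s and remote_energy_j < local_energy_j


# Illustrative values only.
heavy_compute = ModuleProfile(local_time_s=10.0, remote_time_s=1.0, transfer_bytes=1_000_000)
heavy_state = ModuleProfile(local_time_s=0.5, remote_time_s=0.05, transfer_bytes=5_000_000)

for name, module in [("heavy_compute", heavy_compute), ("heavy_state", heavy_state)]:
    decision = should_migrate(module, bandwidth_bytes_per_s=1_000_000.0,
                              compute_power_w=2.0, radio_power_w=1.0, idle_power_w=0.3)
    print(name, "-> migrate" if decision else "-> keep local")
```

Under these assumptions a compute-heavy module with little state is worth migrating, while a module whose state transfer dominates is better kept on the device.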
TABLE 7: Functional Level Execution Time for Parallel Optimistic Version of ROSS
LP’s
1024 2048 32768 262144 524288
Functions (% Time) Total MPI Total MPI Total MPI Total MPI Total MPI
tw_run 99.8 37 99.9 38 99.9 26 99.9 23 99.9 23
tw_scheduler_optimistic 99.6 37 99.7 38 99.9 26 99.9 23 99.9 23
tw_sched_batch 60 13 58 13 54 8 52 7 50 7
phold_event_handler 52 13 51 13 39 8 37 7 38 7
tw_event_send 18 13 18 13 13 8 13 7 13 7
tw_event_new 11 - 11 - 9 - 8 - 8 -
rng_gen_val 7 - 5 - 7 - 7 - 8 -
tw_rand_exponential 13 - 14 - 9 - 8 - 7 -
Others 3 - 3 - 1 - 0 - 1 -
tw_gvt_step2 18 11 20 12 28 8 28 7 29 8
tw_pe_fossil_collect 7 0 8 0 19 - 20 0 20 0
MPI_Allreduce 10 10 11 11 8 8 7 7 7 7
Others 1 1 1 1 1 0 0 0 2 1
tw_net_read 20 12 21 13 17 9 20 8 21 9
service_queues 20 12 21 13 17 9 20 8 21 9
test_q 9 2 9 2 9 0.8 12 1.2 13 1
recv_begin 11 10 12 11 9 8 8 7.1 8 7
Others 0 - 0 - 0 0.2 0 0 0 1
tw_kp_rollback_to 1.4 0.6 0.8 0.2 <0.1 <0.1 <0.1 <0.1 <0.1 <0.1
tw_event_rollback 1.2 0.6 0.8 0.2 <0.1 <0.1 <0.1 <0.1 <0.1 <0.1
Others 0.2 - 0 - 0 0 0 0 0 0
Others 0 - 0 - 0 0 0 0 0 1
Others 1.6 - 0.7 - 0.9 1 0 1 0 0
Others 0.2 0 0.1 0 0.1 0 0.1 0 0.1 0
Execution Time (sec) 24.7 50.5 1197.9 10401.3 21598.1
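The parallel execution times quoted in the text for Tables 6 and 7 appear to follow from multiplying each total execution time by the MPI percentage reported for tw_run. That reading is an assumption rather than something the tables state explicitly, and the helper name below is hypothetical. A short sketch of the arithmetic:

```python
def mpi_time_seconds(total_time_s: float, mpi_percent: float) -> float:
    """Time attributed to MPI, given a total time and an MPI share in percent."""
    return total_time_s * mpi_percent / 100.0


# (label, total execution time in seconds, MPI % of tw_run) taken from Tables 6 and 7
rows = [
    ("conservative, 1024 LPs", 21.4, 43),
    ("conservative, 524288 LPs", 14542.8, 27),
    ("optimistic, 1024 LPs", 24.7, 37),
    ("optimistic, 524288 LPs", 21598.1, 23),
]

for label, total_s, pct in rows:
    mpi_s = mpi_time_seconds(total_s, pct)
    print(f"{label}: total {total_s / 3600:.2f} h, MPI {mpi_s:.2f} s ({mpi_s / 3600:.2f} h)")
```

The results agree with the figures given in the text: about 9.2 s and 1.09 h for the conservative runs, and about 9.14 s and 1.38 h for the optimistic runs.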
FIGURE 9—Tree Barrier Execution Model on Mobile Platform
FIGURE 11—Time Warp Execution on Mobile Platform
Results of the Time Warp algorithm with a varying number of LPs are shown in Fig. 11. The total number of events processed, the total number of rollback events and the total number of GVT computations are plotted for varying numbers of LPs. As the number of LPs increases, the total number of rollback events does not grow as rapidly as the number of NULL messages in the CMB algorithm.
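The GVT computations counted here rest on the lower bound described earlier: no event with a timestamp below GVT can arrive, so nothing older than GVT can be rolled back. A minimal sketch of that idea follows, assuming GVT is taken as the minimum over the local virtual times of all LPs and the timestamps of messages still in transit. The function names are hypothetical, and a plain Python min stands in for the MPI_Allreduce reduction listed in Tables 6 and 7.

```python
def compute_gvt(lp_clocks, in_transit_timestamps):
    """Lower bound on any timestamp that can still be processed or rolled back to."""
    return min(list(lp_clocks) + list(in_transit_timestamps))


def fossil_collect(processed_events, gvt):
    """Keep only processed events that could still be rolled back (timestamp >= GVT)."""
    return [(ts, payload) for ts, payload in processed_events if ts >= gvt]


# Illustrative values only.
gvt = compute_gvt(lp_clocks=[12.0, 9.5, 15.0], in_transit_timestamps=[8.0, 20.0])
history = [(5.0, "a"), (8.0, "b"), (11.0, "c")]
remaining = fossil_collect(history, gvt)
print(gvt, remaining)
```

The message in transit with timestamp 8.0 holds GVT back even though every LP has advanced further, which is why in-transit messages have to be included in the reduction.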
1) CPU Usage
Average CPU usage of all four synchronization algorithms is shown in Fig. 13. The time-stepped approach consumed the least CPU resources, whereas the TW approach consumed the most. The reason for the excessive CPU utilization of the Time Warp algorithm is that it needs to process more events than the other approaches. Moreover, during rollbacks, more events pile up in the input queue ready to execute, so more CPU work is required. Because TW allows events to execute without abiding by the local causality constraint, it has to roll back out-of-order executions and then re-execute them, causing more computation. GVT computations and fossil collection attempts are also responsible for greater CPU utilization. CPU usage for the CMB algorithm is lower than for Time Warp but higher than for Tree Barrier. This is because of the number of NULL messages it has to generate and transmit to process a single event. Similarly, the lookahead value also has an impact on the performance of CMB. The time-stepped approach performed better than the others. However, it is important to note that CPU utilization does not mean the same thing for mobile devices as for desktop systems. For a desktop computer, more CPU utilization can be regarded as better CPU utilization, since a consistent power supply is always available. For mobile devices, on the other hand, more CPU usage means more energy consumption, which drains the limited battery more rapidly.
FIGURE 14—Memory Consumption – CMB NULL Message, Time-Stepped, Tree Barrier and Time Warp Algorithm.
2) Memory Consumption
Memory consumption is one of the important parameters that need to be considered for embedded or mobile devices, which have limited memory. As discussed in the previous section, the TW algorithm initially executes events without synchronizing with other processes. In order to perform rollbacks, each LP has to save its state variables and processed events. This state saving mechanism requires a significant amount of memory compared to the other techniques discussed here. The other approaches consume less memory and are very close to each other in terms of their memory consumption. The memory comparison for all algorithms is shown in Fig. 14.
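The state saving and rollback behaviour described above can be sketched with a toy logical process. This is an illustrative sketch only, not the SEECSSim implementation: the TimeWarpLP class is hypothetical, it uses full copy-based state saving, and it omits anti-messages, GVT and fossil collection.

```python
import copy


class TimeWarpLP:
    """Toy optimistic LP whose state is a single event counter."""

    def __init__(self):
        self.lvt = 0.0             # local virtual time
        self.state = {"count": 0}
        self.processed = []        # (timestamp, state snapshot taken before the event)
        self.rollbacks = 0

    def _execute(self, ts):
        # Save the state first so the event can be undone later.
        self.processed.append((ts, copy.deepcopy(self.state)))
        self.state["count"] += 1
        self.lvt = ts

    def _rollback(self, ts):
        """Undo every processed event with a timestamp later than ts."""
        self.rollbacks += 1
        undone = []
        while self.processed and self.processed[-1][0] > ts:
            old_ts, snapshot = self.processed.pop()
            self.state = snapshot
            undone.append(old_ts)
        self.lvt = self.processed[-1][0] if self.processed else 0.0
        return undone

    def receive(self, ts):
        if ts < self.lvt:
            # Straggler: roll back, then re-execute in timestamp order.
            pending = self._rollback(ts) + [ts]
        else:
            pending = [ts]
        for t in sorted(pending):
            self._execute(t)


lp = TimeWarpLP()
for ts in [1.0, 3.0, 5.0, 2.0]:    # 2.0 arrives late and triggers a rollback
    lp.receive(ts)
print(lp.state["count"], lp.rollbacks, [t for t, _ in lp.processed])
```

Every processed event keeps a snapshot until it is reclaimed, which is the memory cost referred to above, and the two events undone by the straggler are executed twice, which is the extra CPU work.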
1 https://developer.qualcomm.com/software/trepn-power-profiler
3) Energy Consumption
Energy consumed by the different synchronization algorithms is presented in Fig. 15. Here energy is given in milliwatt-hours: the watt-hour (Wh) is a unit of energy equivalent to one watt (1 W) of power expended for one hour (1 h) of time; thus, a milliwatt-hour (mWh) is 1/1000 Wh. In the figure, the number of LPs and the amount of energy are plotted on a logarithmic scale. It is important to note the energy consumption of the algorithms relative to each other rather than their individual energy consumption. As energy is computed by multiplying power by time, if the execution time for completing the simulation increases, its energy consumption also increases. Battery-operated mobile devices are energy constrained, so their design requirement is to minimize the total amount of energy consumed to complete the given computation task. Parallel and distributed simulations that need to be executed within deadlines require more energy and power at the same time. Like the trend depicted for CPU usage, Time Warp is energy intensive compared to the other algorithms. On the other hand, the Time Stepped and Tree Barrier approaches consume less energy. The energy consumption of CMB is nearly the same as that of the Time Stepped and Tree Barrier approaches.
FIGURE 15—Energy Consumption – CMB NULL Message, Time-Stepped, Tree Barrier and Time Warp Algorithm.
4) Total Execution Time
Previous results clearly show that the performance of TW is below par compared to the other techniques discussed in this paper. The time-stepped approach is the best among all the synchronization algorithms (except for execution time); however, it cannot exploit true parallelism. In distributed systems, performance is usually measured in terms of speedup and execution time. As shown in Fig. 16, the total execution time for the Tree Barrier and Time Stepped approaches is higher than for the other two approaches. The simulations using the Chandy, Misra and Bryant (CMB) algorithm and the Time Warp algorithm perform better in terms of execution time. Thus, we can conclude that the CMB algorithm is adequate in terms of execution time as well as energy consumption.
FIGURE 16—Total Execution Time – CMB NULL Message, Time-Stepped, Tree Barrier and Time Warp Algorithm.
Time Warp is regarded as one of the most efficient and extensively used algorithms in distributed simulation, but it consumes a lot of energy, which is limited for mobile or embedded devices. The energy consumption and CPU utilization of the TW algorithm can be improved using techniques that help to reduce the number of rollbacks. One well-known technique is Wolf Calls [65]. In the Wolf Calls protocol, when a logical process detects a straggler message, i.e., a message with a timestamp in its past, it sends a control message to all the LPs, causing them to stop processing until the error is removed. These control messages are called wolf calls. A better way to improve the performance of the Wolf Calls algorithm is to make sure that only those LPs to which the error may have spread stop processing. There are other techniques that can be adopted as well, such as lazy and re-lazy cancellation, and reverse computation. The proposed SEECSSim simulation suite includes one of these optimization techniques, i.e., Wolf Calls. Results in Fig. 18 suggest that the energy consumption of Time Warp is improved considerably with an increasing number of logical processes.
It is important to consider that the improvement in energy consumption is achieved at the expense of more execution time. But this does not greatly reduce the performance of the simulation system, as the execution time of Time Warp with the Wolf Calls protocol is still better than that of the Tree Barrier and Time Stepped approaches, as shown in the figure.
VIII. CONCLUSION
It is worth noting that, in comparison with traditional systems, handheld devices provide a limited amount of computational resources. Therefore, the completion time of simulations on handheld devices usually increases in comparison to traditional systems. This is due to the fact that handheld devices use ARM-based processors that are designed to optimize energy consumption ([66], [67]).
[22] K. Kumar, P. Rajiv, G. Laxmi, and N. Bhuyan, “Shuffling: a framework for lock contention aware thread scheduling for multicore multiprocessor systems,” in Parallel Architecture and Compilation Techniques (PACT), 2014 23rd International Conference on. IEEE, 2014, pp. 289–300.
[23] M. Curtis-Maury, A. Shah, F. Blagojevic, D. S. Nikolopoulos, B. R. De Supinski, and M. Schulz, “Prediction models for multi-dimensional power-performance optimization on many cores,” in Proceedings of the 17th international conference on Parallel architectures and compilation techniques. ACM, 2008, pp. 250–259.
[24] X. Feng, R. Ge, and K. W. Cameron, “Power and energy profiling of scientific applications on distributed systems,” in Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International. IEEE, 2005, pp. 10–pp.
[25] S. Hua and G. Qu, “Approaching the maximum energy saving on embedded systems with multiple voltages,” in Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design. IEEE Computer Society, 2003, p. 26.
[26] C. Lively, V. Taylor, X. Wu, H.-C. Chang, C.-Y. Su, K. Cameron, S. Moore, and D. Terpstra, “E-amom: an energy-aware modeling and optimization methodology for scientific applications,” Computer Science-Research and Development, vol. 29, no. 3-4, pp. 197–210, 2014.
[27] R. Child and P. A. Wilsey, “Using dvfs to optimize time warp simulations,” in Proceedings of the Winter Simulation Conference. Winter Simulation Conference, 2012, p. 288.
[28] T. Guérout, T. Monteil, G. Da Costa, R. N. Calheiros, R. Buyya, and M. Alexandru, “Energy-aware simulation with dvfs,” Simulation Modelling Practice and Theory, vol. 39, pp. 76–91, 2013.
[29] R. M. Yoo, C. J. Hughes, K. Lai, and R. Rajwar, “Performance evaluation of intel® transactional synchronization extensions for high-performance computing,” in 2013 SC-International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE, 2013, pp. 1–11.
[30] K. K. Pusukuri, R. Gupta, and L. N. Bhuyan, “Shuffling: a framework for lock contention aware thread scheduling for multicore multiprocessor systems,” in Proceedings of the 23rd international conference on Parallel architectures and compilation. ACM, 2014, pp. 289–300.
[31] D. Jagtap, N. Abu-Ghazaleh, and D. Ponomarev, “Optimization of parallel discrete event simulator for multi-core systems,” in Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International. IEEE, 2012, pp. 520–531.
[32] M. A. Erazo and R. Pereira, “On profiling the energy consumption of distributed simulations: A case study,” in Proceedings of the 2010 IEEE/ACM Int’l Conference on Green Computing and Communications & Int’l Conference on Cyber, Physical and Social Computing. IEEE Computer Society, 2010, pp. 133–138.
[33] J. Liu, “The prime research,” 2007.
[34] R. M. Fujimoto, “Research challenges in parallel and distributed simulation,” ACM Transactions on Modeling and Computer Simulation (TOMACS), vol. 26, no. 4, p. 22, 2016.
[35] Y. Wu, J. Cao, and M. Li, “Private cloud system based on boinc with support for parallel and distributed simulation,” in Dependable, Autonomic and Secure Computing (DASC), 2011 IEEE Ninth International Conference on. IEEE, 2011, pp. 1172–1178.
[36] A. Biswas and R. Fujimoto, “Profiling energy consumption in distributed simulations,” in Proceedings of the 2016 annual ACM Conference on SIGSIM Principles of Advanced Discrete Simulation. ACM, 2016, pp. 201–209.
[37] A. W. Malik, I. Mahmood, and A. Parkash, “Energy consumption of traditional simulation protocol over smartphones: an empirical study (wip),” in Proceedings of the Summer Computer Simulation Conference. Society for Computer Simulation International, 2016, p. 23.
[38] S. Neal, R. Fujimoto, and M. Hunter, “Energy consumption of data driven traffic simulations,” in Winter Simulation Conference (WSC), 2016. IEEE, 2016, pp. 1119–1130.
[39] R. M. Fujimoto, M. Hunter, A. Biswas, M. Jackson, and S. Neal, “Power efficient distributed simulation,” in Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. ACM, 2017, pp. 77–88.
[40] L. Bajaj, M. Takai, R. Ahuja, K. Tang, R. Bagrodia, and M. Gerla, “Glomosim: A scalable network simulation environment,” UCLA Computer Science Department Technical Report, vol. 990027, no. 1999, p. 213, 1999.
[41] J. H. Cowie, D. M. Nicol, and A. T. Ogielski, “Modeling the global internet,” Computing in Science & Engineering, vol. 1, no. 1, pp. 42–50, 1999.
[42] L. Bononi, M. Bracuto, G. D’Angelo, and L. Donatiello, “Artis: a parallel and distributed simulation middleware for performance evaluation,” in International Symposium on Computer and Information Sciences. Springer, 2004, pp. 627–637.
[43] G. D’Angelo, “The simulation model partitioning problem: an adaptive solution based on self-clustering,” Simulation Modelling Practice and Theory (SIMPAT), vol. 70, pp. 1–20, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1569190X16302350
[44] G. D’Angelo and S. Ferretti, “Lunes: Agent-based simulation of p2p systems,” in High Performance Computing and Simulation (HPCS), 2011 International Conference on. IEEE, 2011, pp. 593–599.
[45] A. I. McInnes and B. R. Thorne, “Scipysim: towards distributed heterogeneous system simulation for the scipy platform (work-in-progress),” in Proceedings of the 2011 Symposium on Theory of Modeling & Simulation: DEVS Integrative M&S Symposium. Society for Computer Simulation International, 2011, pp. 89–94.
[46] A. Pellegrini, R. Vitali, and F. Quaglia, “The rome optimistic simulator: core internals and programming model,” in Proceedings of the 4th International ICST Conference on Simulation Tools and Techniques. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2011, pp. 96–98.
[47] Y. M. Teo and Y. K. Ng, “Spades/java: object-oriented parallel discrete-event simulation,” in Simulation Symposium, 2002. Proceedings. 35th Annual. IEEE, 2002, pp. 245–252.
[48] L. Toscano, G. D’Angelo, and M. Marzolla, “Parallel discrete event simulation with erlang,” in Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing, ser. FHPC’12. New York, NY, USA: ACM, 2012, pp. 83–92. [Online]. Available: http://doi.acm.org/10.1145/2364474.2364487
[49] G. D’Angelo, S. Ferretti, and M. Marzolla, “Time warp on the go,” in Proceedings of the 5th International ICST Conference on Simulation Tools and Techniques, ser. SIMUTOOLS ’12. ICST, Brussels, Belgium, Belgium: ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2012, pp. 242–248.
[50] E. Mikida, N. Jain, L. Kale, E. Gonsiorowski, C. D. Carothers, P. D. Barnes Jr, and D. Jefferson, “Towards pdes in a message-driven paradigm: A preliminary case study using charm++,” in Proceedings of the 2016 annual ACM Conference on SIGSIM Principles of Advanced Discrete Simulation. ACM, 2016, pp. 99–110.
[51] ROSS, “Rensselaer’s optimistic simulation system,” https://carothersc.github.io/ROSS, 2017, accessed March 20, 2017.
[52] R. M. Fujimoto, “Parallel discrete event simulation,” Communications of the ACM, vol. 33, no. 10, pp. 30–53, 1990.
[53] C. D. Carothers, D. Bauer, and S. Pearce, “Ross: A high-performance, low-memory, modular time warp system,” Journal of Parallel and Distributed Computing, vol. 62, no. 11, pp. 1648–1669, 2002.
[54] K. S. Perumalla, “Scaling time warp-based discrete event execution to 10^4 processors on a blue gene supercomputer,” in Proceedings of the 4th international conference on Computing frontiers. ACM, 2007, pp. 69–76.
[55] D. W. Bauer Jr, C. D. Carothers, and A. Holder, “Scalable time warp on blue gene supercomputers,” in Proceedings of the 2009 ACM/IEEE/SCS 23rd Workshop on Principles of Advanced and Distributed Simulation. IEEE Computer Society, 2009, pp. 35–44.
[56] R. M. Fujimoto, “Performance of time warp under synthetic workloads,” 1990.
[57] Allinea, “Allinea-map,” http://www.allinea.com/products/map, 2017, accessed April 2, 2017.
[58] Intel, “Intel® soc watch,” https://software.intel.com/en-us/node/589913, 2017, accessed March 29, 2017.
[59] K. R. Rouse, William Boff, Organizational simulation. Wiley-Interscience, 2005.
[60] K. S. Perumalla, Introduction to reversible computing. Chapman and Hall/CRC, 2013.
[61] K. Shenoy, “Techniques for optimizing time-stepped simulations,” 2004.
[62] R. Garg, V. K. Garg, and Y. Sabharwal, “Efficient algorithms for global snapshots in large distributed systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 5, pp. 620–630, 2010.
[63] K. M. Chandy and J. Misra, “Distributed simulation: A case study in design and verification of distributed programs,” IEEE Transactions on software engineering, no. 5, pp. 440–452, 1979.