
Paper 1

6, JUNE 2012
Title: An Online Data Access Prediction and Optimization Approach for Distributed Systems
Authors: Renato Porfirio Ishii and Rodrigo Fernandes de Mello
Current scientific applications have been producing large amounts of data.
Examples of such applications include the Pan-STARRS project, which captures about 2.5
petabytes (PB) of data per year, and the Large Hadron Collider (LHC) project, which generates from 50 to
100 PB of data every year.
The processing, handling, and analysis of such data require large-scale computing infrastructures such
as clusters and grids.
These infrastructures provide distributed computing tools to meet high-performance and storage requirements.

In this area, studies aim at improving the performance of data-intensive applications by optimizing
data accesses. To achieve this goal, distributed storage systems have adopted techniques such as
data replication, migration, distribution, and access parallelism. However, the main drawback of
those studies is that they do not take application behavior into account when optimizing data
access.
This limitation motivated this paper, which applies strategies to support the online prediction of
application behavior in order to optimize data access operations on distributed systems, without
requiring any information on past executions. To accomplish this, the approach organizes
application behaviors as time series and then analyzes and classifies those series according to
their properties.
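A minimal sketch of how data access operations might be organized as a time series. It assumes a hypothetical trace of (timestamp_seconds, process_id, file_id, bytes_read) records and a fixed aggregation window. The record layout, the to_time_series name, and the choice of bytes read per window as the observed variable are illustrative assumptions, not the paper's definitions.

```python
from collections import defaultdict

# Hypothetical trace record: (timestamp_seconds, process_id, file_id, bytes_read).
# This layout is an illustrative assumption, not the paper's trace format.
Record = tuple[float, int, str, int]


def to_time_series(trace: list[Record], process_id: int, window_s: float = 1.0) -> list[int]:
    """Aggregate one process's accesses into bytes read per fixed time window."""
    events = [(t, n) for (t, pid, _file, n) in trace if pid == process_id]
    if not events:
        return []
    start = min(t for t, _ in events)
    buckets: defaultdict[int, int] = defaultdict(int)
    for t, n in events:
        buckets[int((t - start) // window_s)] += n
    # One observation per window, including windows with no accesses (0 bytes).
    return [buckets[i] for i in range(max(buckets) + 1)]


if __name__ == "__main__":
    trace = [(0.0, 1, "a", 100), (0.5, 1, "a", 50), (2.1, 1, "b", 200), (0.7, 2, "c", 999)]
    print(to_time_series(trace, process_id=1))  # [150, 0, 200]
```

Empty windows are kept as zero observations so that the series stays evenly spaced in time, which most time series models expect.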
Specifically, our approach consists of the following steps:
1. Application knowledge acquisition;
2. Organization of process behaviors as time series;
3. Analysis of time series generation processes;
4. Selection of techniques to model time series;
5. Definition of how many future observations will be predicted;
6. Prediction of observations; and, finally,
7. Execution of the optimization heuristic in an attempt to reduce the time consumed in data access.
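The seven steps above can be sketched as a small pipeline. This is a toy illustration under stated assumptions. Steps 1 and 2 are taken as already done, so the input is a series of accessed block numbers. Step 3 uses a least-squares line and its R² as a crude trend test. Step 4 chooses between a mean predictor and linear extrapolation. Step 7 is a hypothetical prefetch heuristic. The threshold, both candidate models, and all function names are hypothetical, and the paper's own classification and modeling techniques are not reproduced here.

```python
import statistics


def fit_line(series: list[float]) -> tuple[float, float, float]:
    """Least-squares line y = a + b*t over t = 0..n-1; returns (a, b, r_squared)."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(series)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    b = sxy / sxx
    a = y_mean - b * t_mean
    ss_tot = sum((y - y_mean) ** 2 for y in series)
    ss_res = sum((y - (a + b * t)) ** 2 for t, y in enumerate(series))
    # A constant series has no variance to explain; treat it as trend-free.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return a, b, r2


def analyze(series, r2_threshold=0.9):
    """Step 3 (crude): call the series 'trend' if a straight line explains most of it."""
    return "trend" if fit_line(series)[2] >= r2_threshold else "stationary"


def predict(series, kind, horizon):
    """Steps 4-6: pick a model for the class and forecast `horizon` observations."""
    if kind == "trend":
        a, b, _ = fit_line(series)
        n = len(series)
        return [a + b * (n + k) for k in range(horizon)]  # linear extrapolation
    mean = statistics.fmean(series)
    return [mean] * horizon  # stationary: forecast the mean level


def optimize(predicted, cache):
    """Step 7 (hypothetical heuristic): prefetch predicted blocks not yet cached."""
    to_fetch = []
    for block in (round(p) for p in predicted):
        if block not in cache and block not in to_fetch:
            to_fetch.append(block)
    return to_fetch


def run_pipeline(series, horizon, cache):
    kind = analyze(series)
    return kind, optimize(predict(series, kind, horizon), cache)


if __name__ == "__main__":
    print(run_pipeline([10, 11, 12, 13, 14, 15], horizon=3, cache={16}))  # ('trend', [17, 18])
```

For a sequential read pattern such as blocks 10 to 15, the line fits exactly. The sketch therefore extrapolates blocks 16, 17, and 18 for a horizon of 3 and prefetches only the blocks that are not already cached.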

By knowing these properties, the approach selects modeling techniques to represent the series and perform predictions, which are later used to optimize data access operations. This new approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm that the approach reduces application execution time by about 50 percent, especially when handling large amounts of data.

Index Terms: Distributed computing, distributed file system, data access optimization, time series analysis, prediction.

Proposed Method: The approach relies on time series, in which data are commonly organized in terms of variables and their observations over time. Before modeling, it is necessary to understand the implicit features embedded in the data, such as stationarity, linearity, and stochasticity. By understanding those features, we can better select modeling techniques to perform prediction.

[Figure: classification of time series as deterministic or stochastic, linear or nonlinear, and stationary or nonstationary]

In contrast with deterministic series, stochastic time series are formed by random observations and relations, which follow probability density functions and may change over time [34]. A time series is said to be stationary when its observations are in a particular state of statistical equilibrium, i.e., they evolve over time around a constant average and variance [34]. Finally, in linear time series, observations are composed of a linear combination of past occurrences and noise [35].
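For the linear case, a first-order autoregressive model, AR(1), writes each observation as a linear combination of the previous one plus noise: x_t = c + phi * x_{t-1} + e_t. Such a model is stationary when |phi| < 1, and its mean is then c / (1 - phi). The sketch below is an illustration, not the paper's method. It fits AR(1) by least squares on the mean-centred series and pairs it with a heuristic stationarity check that compares window means and variances. That check is not a formal test such as ADF or KPSS. The tolerance values and all function names are assumptions.

```python
import random
import statistics


def looks_stationary(series, n_windows=2, mean_tol=0.5, var_ratio=2.0):
    """Heuristic check (not a formal test such as ADF or KPSS): the mean and
    variance of consecutive windows should stay roughly constant."""
    size = len(series) // n_windows
    if size < 2:
        raise ValueError("series too short for the requested number of windows")
    windows = [series[w * size:(w + 1) * size] for w in range(n_windows)]
    means = [statistics.fmean(w) for w in windows]
    variances = [statistics.pvariance(w) for w in windows]
    scale = statistics.pstdev(series) or 1.0
    mean_ok = max(means) - min(means) <= mean_tol * scale
    var_ok = max(variances) <= var_ratio * min(variances)
    return mean_ok and var_ok


def fit_ar1(series):
    """Least-squares AR(1) on the mean-centred series:
    x_t - mu = phi * (x_{t-1} - mu) + e_t. Returns (mu, phi)."""
    mu = statistics.fmean(series)
    d = [x - mu for x in series]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    return mu, (num / den if den else 0.0)


def forecast_ar1(series, horizon):
    """Iterate the fitted recursion; with |phi| < 1 forecasts decay toward mu."""
    mu, phi = fit_ar1(series)
    last, preds = series[-1], []
    for _ in range(horizon):
        last = mu + phi * (last - mu)
        preds.append(last)
    return preds


def simulate_ar1(phi, n, c=0.0, sigma=1.0, seed=0):
    """Generate a synthetic AR(1) series, starting at its stationary mean c/(1-phi)."""
    rng = random.Random(seed)
    x, out = c / (1 - phi), []
    for _ in range(n):
        x = c + phi * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out
```

For example, a series from simulate_ar1(phi=0.6, n=2000, c=2.0) should yield a fitted phi near 0.6 and a mean near 2.0 / 0.4 = 5.0. A steadily increasing series such as 1, 2, ..., 8 fails the window-mean check.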

Conclusion: This paper has presented a data access optimization approach which uses predictive techniques for distributed computing environments. Our main objective is to minimize the application execution time by optimizing data accesses. In this approach, data access operations are transformed into time series. By modeling those series, we can understand the behavior of applications and, therefore, predict future observations. Such prediction supports taking decisions beforehand, improving decisions on replication, migration, and consistency. However, this modeling is related to specific aspects of each time series, such as stochasticity, linearity, and stationarity. We evaluated our approach to select modeling techniques for real systems data (SNIA data sets [46]). By conducting additional tests, we confirmed that the time series classification can indeed be used as a way to select the most appropriate set of modeling techniques. The efficiency of our approach was confirmed through experiments using real-world data.

Paper 2

Source: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 59, NO. 2, FEBRUARY 2014
Title: Symbolic Supervisory Control of Distributed Systems with Communications
Authors: Gabriel Kalyon, Tristan Le Gall, Hervé Marchand, and Thierry Massart
Abstract: We consider the control of distributed systems composed of subsystems communicating asynchronously; the aim is to build local controllers that restrict the behavior of a distributed system in order to satisfy a global state avoidance property. We model distributed systems as communicating finite state machines with reliable unbounded first in, first out (FIFO) queues between subsystems. Local controllers can only observe the behavior of their proper subsystem and do not see the queue contents. To refine their control policy, controllers can use the FIFO queues to communicate by piggybacking extra information (some timestamps and their state estimates) to the messages sent by the subsystems. We provide an algorithm that computes, for each local subsystem (and thus for each controller), during the execution of the system, an estimate of the current global state of the distributed system. We then define a synthesis algorithm to compute local controllers. Our method relies on the computation of (co-)reachable states. Since the reachability problem is undecidable in our model, we use abstract interpretation techniques to obtain overapproximations of (co-)reachable states. An implementation of our algorithms provides an empirical evaluation of our method.
Index Terms: Automata, discrete event systems (DES), supervisory control.
Conclusion: We propose in this paper a novel framework for the control of distributed systems modeled as communicating finite state machines with reliable unbounded FIFO channels. Each local controller can only observe its subsystem but can communicate with the other controllers by piggybacking extra information, such as state estimates, to the messages sent in the FIFO channels. Our algorithm synthesizes the local controllers that restrict the behavior of a distributed system in order to satisfy a state avoidance property, e.g., to ensure that an error state is no longer reachable or to bound the size of the FIFO channels. We abstract the content of the FIFO channels by the same regular representation as in [22]; this abstraction leads to a safe, effective algorithm. Even if we cannot have any theoretical guarantee about the permissiveness of the control (like a non-blocking property), we recall that this permissiveness depends on the quality of the abstraction: the more precise the abstraction, the more permissive the control. Our experiments show that our approach is tractable and allows precise control.

As future work, we intend to solve the main practical problem of our approach: we compute and send state estimates every time a message is sent. A more advanced technique would consist of the offline computation of the set of possible estimates. Estimates would be indexed in a table, available at execution time to each local estimator. A similar online method would be to use the memoization technique: when a state estimate is computed for the first time, it is associated with an index that is transmitted to the subsystem, which records both values. If the same estimate must be transmitted again, only its index needs to be transmitted, and the receiver can find the corresponding estimate in its table. We still have to determine which technique is the most efficient and evaluate how it improves the current implementation. We also believe that the work on decentralized control with communication and on modular control with a coordinator might be adapted to our framework in order to reduce the communication between controllers.
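The online memoization idea from the future-work discussion can be sketched for a single FIFO channel. The sender assigns an index to each estimate the first time it sends it, and afterwards sends only that index. The receiver rebuilds the estimate from its own table. Reliable FIFO delivery guarantees that the full entry arrives before any message that refers to its index. Several choices here are assumptions made for illustration: modeling an estimate as a frozenset of global-state labels, keeping one table per channel, and the class names ChannelEncoder and ChannelDecoder. None of them comes from the paper.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: a state estimate is a frozenset of global-state labels.
Estimate = frozenset


@dataclass(frozen=True)
class Message:
    payload: str
    est_index: int
    estimate: Optional[Estimate] = None  # sent only the first time an index is used


class ChannelEncoder:
    """Sender side of one FIFO channel: memoizes the estimates already sent."""

    def __init__(self) -> None:
        self._index: dict[Estimate, int] = {}

    def wrap(self, payload: str, estimate: Estimate) -> Message:
        if estimate in self._index:
            return Message(payload, self._index[estimate])  # index only
        idx = len(self._index)
        self._index[estimate] = idx
        return Message(payload, idx, estimate)  # index plus full estimate


class ChannelDecoder:
    """Receiver side: rebuilds estimates from indices. Correctness relies on
    reliable FIFO delivery, so an index is always recorded before it is reused."""

    def __init__(self) -> None:
        self._table: dict[int, Estimate] = {}

    def unwrap(self, msg: Message) -> tuple[str, Estimate]:
        if msg.estimate is not None:
            self._table[msg.est_index] = msg.estimate
        return msg.payload, self._table[msg.est_index]
```

For example, suppose two messages carry the same estimate. Wrapping them produces a full entry with index 0, followed by an index-only message. Unwrapping them in FIFO order returns the same estimate both times. The offline variant mentioned in the text would instead pre-populate both tables before execution.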