Evolving Smart Grid Information Management Cloudward:
A Cloud Optimization Perspective
Xi Fang, Dejun Yang, and Guoliang Xue, Fellow, IEEE
Arizona State University
Abstract—The Smart Grid (SG) is a power system that integrates and leverages advanced communication and information technologies. In this paper, we study an optimization problem of leveraging the cloud domain to reduce the cost of information management in the SG. We propose a cloud-based SG information management model and present a cloud and network resource optimization framework to solve the cost reduction problem in cloud-based SG information storage and computation.
Index Terms—Smart Grid, Cloud Computing, Information
Management, Optimization

I. INTRODUCTION
Smart Grid (SG) is an intelligent power system that uses
two-way communication and information technologies, and
computational intelligence to revolutionize power generation,
delivery, and consumption. Its evolution relies on the utilization and integration of advanced information technologies,
which transform the energy system from an analog one to
a digital one. In the vision of the SG, information plays a key
role and should be managed efficiently and effectively [10].
One of the important trends in today's information management is outsourcing management tasks to cloud computing,
which has been widely regarded as the next-generation computing and storage paradigm [5], [20]. The concept of cloud
computing is based on large data centers with massive computation and storage capacities operated by cloud providers,
which deliver computing and storage services as utilities.
The overwhelming data generated in the SG by widely deployed monitoring, metering, measurement, and control devices calls for an information management paradigm shift in large-scale data processing and storage mechanisms. Integrating the DNA of cloud computing into SG information management makes sense for the following four reasons.
First, highly scalable computing and storage services provided by cloud providers fit well with the requirement of the
information processing in the SG. This is because the resource
demands for many computation and data intensive applications
in the SG vary so much that a highly scalable information
storage and computing platform is required. For instance, the
resource demands for the electric utility vary over the time
of the day, with peak operation occurring during the day and
information processing needs slowing down at night [22].
Second, the level of information integration in the SG
can be effectively improved by leveraging cloud information
sharing. As stated in [10], autonomous business activities often
lead to “islands of information.” As a result, in many cases
[Footnote: Fang, Yang, and Xue are all affiliated with Arizona State University, Tempe, AZ 85281. Email: {xi.fang, dejun.yang, xue}@asu.edu. This research was supported in part by ARO grant W911NF-09-1-0467 and NSF grant 0901451. The information reported here does not reflect the position or the policy of the federal government.]

the information in one department of an electric utility is
not easily accessible by applications in other departments or
organizations. However, a well-functioning SG requires the
information to be widely and highly available. The property
of sharing information easily and conveniently enabled by the
cloud storage provides a cost-effective way to integrate these
islands of information in the SG.
Third, the sophistication of the SG may lead to a highly
complex information management system [10]. For traditional
electric utilities, realizing such complicated information systems may be costly or even beyond their capacity. Therefore,
it would be a good option to get the information technology
sector involved and outsource some tasks to the clouds, which
provide cost-effective computing and storage solutions. This
relieves the pain of electric utilities in the costly information
system design, deployment, maintenance, and upgrade during
the massive transformation to the SG.
Fourth, distributed electricity generation [10] lowers entry
barriers for new players (e.g. small businesses or households
capable of generating electricity) and brings about a new
ecosystem of the SG. Outsourcing information management to
the clouds allows these new players to focus on their business
innovation rather than focusing on building data centers to
achieve scalability goals. It also allows these players to cut
their losses if the launched products do not succeed.
Although a cloud-based SG information management
paradigm is promising, we are still facing many challenges.
The first challenge is the systematic optimization of the resources in the SG, cloud providers, and networking providers. For example, a fully functioning SG may be a large-scale or even continent-wide system, where information-generating
sources (e.g. smart meters and sensors [10]) are distributed
and scattered across a large geographical area, and heterogeneous communication networks (e.g. WiMax, fiber optic
networks, and powerline communications [10]) are used for
data transmission. Hence, many geographically and architecturally diverse cloud providers and networking providers may
get involved. Systematically optimizing the usage of different
resources in these diverse clouds and networks can reduce the
overall cost of the SG information management.
The second challenge is that the healthy operation of the SG depends on the high availability and the prompt analysis of critical information (e.g. power grid status monitoring data). Without a careful design, outsourcing information management to the clouds may bring about potential risks
to the operation of the SG. The first risk is that the information
security and privacy may be compromised, since outsourcing
information management may lead to electric utilities losing
the full control of the integrity, confidentiality, and availability
of the information [9]. The second risk is that the quality

of service delivery may not be guaranteed. Although service-level agreements can be established between cloud providers and actors in the SG to enforce quality of service delivery, risks of outsourcing still exist, since public clouds are often built upon the Internet, where real-time data delivery with an extremely high probability is not easily guaranteed. Compared with the consequences of failing to ensure quality of service delivery in other industries that outsource information management, the consequences in the power industry may be much more fatal. For example, if electric utilities fail to receive critical alerts from a grid status analysis service running in a public cloud, millions of households may lose power supplies due to the resulting power outage.

In this paper, we study an optimization problem of leveraging clouds to reduce the cost of information management in the SG, taking into account the concerns of security, privacy, and protection of service quality. We propose a cloud-based SG information management model, and present a resource optimization framework to solve the cost reduction problem in cloud-based SG information storage and computation.

The remainder of this paper is organized as follows. First, we review the related works in Section II. Then, we describe the system model in Section III. Next, we present an optimization framework for cloud-based SG information storage in Section IV and an optimization framework for information computation in Section V. We discuss practical issues in Section VI, present simulation results in Section VII, and conclude this paper in Section VIII.

II. RELATED WORKS

Recently, researchers have studied how to use cloud computing to help manage the SG. Rusitschka et al. [19] presented a model for SG data management based on cloud computing, which takes advantage of distributed data management for real-time data gathering and parallel processing for real-time information retrieval. Simmhan et al. [21] analyzed the benefit of using the cloud for demand response optimization in the SG. Kim et al. [14] proposed a cloud-based demand response architecture for fast response times in large scale deployments. Nagothu et al. [17] proposed to use cloud data centers as the central communication and optimization infrastructure supporting a cognitive radio network of smart meters. Nikolopoulos et al. [18] presented a decision-support system and a cloud computing software methodology that bring together energy consultants and modern web interoperable technologies. Mohsenian-Rad and Leon-Garcia [16] designed an approach to improve load balancing in the grid by distributing the service requests among data centers. Xin et al. [24] presented a cloud-based virtual SG architecture that embeds the SG into a cloud environment. The difference between our work and the above line of research is that, to the best of our knowledge, we are the first to present a cloud and network resource optimization framework for cloud-based SG information management, which takes into account the requirements of actors in the SG and the resources provided by cloud providers and networking providers.

Note that cloud optimization is also an active research area. Wang et al. [23] proposed a practical mechanism to securely outsource linear programming to the cloud. Chaisiri et al. [6] proposed an optimal cloud resource provisioning algorithm to provision resources offered by multiple cloud providers. Zhao et al. [25] proposed two resource rental planning models to generate rental decisions, which aim to minimize the resource rental cost for running elastic applications in clouds while fulfilling quality of service constraints. Hajjat et al. [11] proposed algorithms to tackle challenges in migrating enterprise services into hybrid cloud-based deployments. Bossche et al. [8] proposed an optimization approach to maximize the utilization of the internal data center and to minimize the cost of running the outsourced tasks in the cloud. Li et al. [15] used a combination of bin-packing, mixed integer programming, and performance models to make decisions on optimal deployments of large service centers and clouds. The difference between our work and the research on general cloud resource optimization is the following. First, our optimization framework is designed based on our novel cloud-based SG information management model. Second, even from the perspective of general cloud resource optimization, our work is novel, since our cloud model is designed based on novel computation flow structures and aims to optimize the cost and information flow in multiple domains.

III. SYSTEM MODEL

A. Overview

Our cloud-based SG information management model consists of four domains: the SG domain, the cloud domain, the broker domain, and the network domain, as shown in Fig. 1. More specifically, we use three submodels to characterize cloud-based SG information storage, computation, and security and protection requirements.

[Fig. 1. The cloud-based Smart Grid information management model.]

SG Domain: As defined by the National Institute of Standards and Technology, the SG domain is composed of seven subdomains (refer to [4] for more details). We introduce three concepts related to SG information management: data item (DI), computational project (CP), and user.
• Data item: A DI is an information object generated by some information sources in the SG, such as smart meters or phasor measurement units (PMUs) [10]. It should be securely stored, and it may be taken by some CPs as input. We use D to denote the set of DIs in the SG.
• Computational project: A CP, which aims to solve one SG information management job, consists of one or more tasks, each of which takes some DI(s) and/or the output(s) of previously finished task(s) as inputs and performs the required computing operations.

• User: A user is a party in the SG who is interested in accessing some stored DIs or the outputs generated from some CPs, e.g. a service. We use U to denote the set of users in the SG.

Cloud Domain: The cloud domain consists of one or more clouds, which provide storage and/or computing services. We use C to denote the set of clouds available. Each cloud has one pricing policy (including transfer-in, transfer-out, storage, and computation pricing), while different clouds may have different pricing policies. For example, Amazon Simple Storage Service is available in seven regions with different pricing policies [2]; we therefore consider that Amazon Simple Storage Service is provided by seven clouds. Note that virtualization technology may be used in clouds for resource optimization. The focus of our work is above the virtualization layer in a cloud. For a DI, for instance, we focus on the problem of which cloud(s) should be used to store this DI, rather than how to optimize the virtual and physical resources in the chosen cloud(s) for data storage.

Broker Domain: The broker domain consists of one or more cloud brokers that mediate between the SG domain and the cloud domain by gathering requirements from the actors of the SG domain (e.g. electric utilities), locating the suitable clouds, and assisting the actors of the SG domain in buying, obtaining, and releasing cloud services. The idea of the cloud broker is compelling because, as the number of services supported in the cloud domain increases, end users may have more difficulties in finding one that meets their requirements, such as cost, availability, and service category [5], or have difficulties in optimizing various system resources; cloud brokers help formulate the optimization programming according to their demands. A new class of cloud brokers may evolve from traditional brokers and specifically aim to solve SG information management problems. The detailed design and implementation of practical algorithms and protocols would be an interesting topic to be explored.

Network Domain: Networking providers in the network domain own the communications and network infrastructure and provide the information transmission service between any two of the above three domains.

Let us illustrate the above concepts using the following two examples.

Coordinated Electric Vehicle Charging Analysis: In the vision of the SG, electric vehicles (EVs) play an important role, since they can be used to help balance loads by "peak shaving" (sending power back to the grid when demand is high) and "valley filling" (charging when demand is low) [10]. However, serious problems (e.g. significant degradation of power system performance and efficiency, and even overloading) can arise under high penetration levels of uncoordinated charging. Therefore, coordinated EV charging is very important. A cloud-based service is clearly a good candidate for this coordination, since it can easily deal with the fluctuation of the number of EVs involved. The electric utility, which collects grid status information and charging station information, uploads the corresponding DIs, including the power grid status information, to a cloud. At the same time, a set of EV charging agents (i.e. users), each of which is responsible for coordinating a group of EVs' charging operations, gathers the locations of a large number of EVs, estimates user demands, and uploads the DIs, including the EVs' battery status information and behavior prediction, to the cloud. A CP running in the cloud takes these DIs as inputs and outputs an optimized charging schedule for these EVs. The charging agents then download the charging schedule from the cloud.

Information Sharing: As discussed in Section I, cloud storage provides a cost-effective way to integrate the islands of information in the SG domain. For example, customer electricity usage data (i.e. DIs) would be useful to multiple parties (i.e. users), such as the customer billing service in the electric utility, the electricity market analysis department, and the customer power saving recommendation service provider. The customer electricity usage data is full of customer behavior information, which can be mined to provide energy recommendations or consulting advice [9]. This information can also be used to advertise appropriate energy saving appliances or improve the hit rate of advertising. These DIs can be stored in a storage cloud and shared by different users for the purpose of advanced information processing, because information integration via clouds is relatively cost-effective and cloud services are highly scalable.

The above examples depict two intriguing scenarios for cloud-based SG information management. Table I shows the notations frequently used in this paper.

TABLE I
FREQUENTLY USED NOTATIONS

C : total cost
D : set of DIs
DS : set of DIs required to be split
DO : set of DIs required to be stored in one cloud
T : set of tasks
TD(d) : set of tasks taking DI d as input
TT(t) : set of tasks taking the output of task t as input
C : set of clouds
CD(d) : set of clouds in which DI d is allowed to be stored
CT(t) : set of clouds in which task t is allowed to be executed
U : set of users
UD(d) : set of users requesting DI d
UT(t) : set of users requesting the output of task t
p+(d,c) : unit price of uploading DI d to cloud c
p-(c,u) : unit price of downloading data from cloud c to user u
pS(c) : unit storage price in cloud c
pI(c1,c2) : unit inter-cloud transfer price from cloud c1 to cloud c2
κ(t,c) : computational cost of task t charged by cloud c
sD(d) : size of DI d
sT(t) : size of the output of task t
sD(d,c) : size of the portion of DI d stored in cloud c
ρ(d) : storage redundancy ratio of DI d
θ(d) : storage splitting ratio of DI d
X+(d,c) : binary variable indicating whether DI d is uploaded to cloud c for data computation
XT(t,c) : binary variable indicating whether task t is executed in cloud c
XI(t1,t2,c1,c2) : binary variable indicating whether cloud c1 transmits intermediate data to cloud c2 because task t2 takes the output of task t1 as input

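To make the four-domain model and the notation of Table I concrete, the following sketch (hypothetical Python; the class names, fields, and all prices are invented for illustration and are not prescribed by the model) shows one straightforward way to represent clouds and DIs programmatically.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Cloud:
    """A cloud c in C together with its pricing policy (illustrative per-GB prices)."""
    name: str
    p_store: float  # pS(c): unit storage price
    p_in: float     # transfer-in component of p+(d, c)
    p_out: float    # transfer-out component of p-(c, u)

@dataclass
class DataItem:
    """A DI d in D generated by an SG information source (e.g. a smart meter)."""
    name: str
    size_gb: float                 # sD(d)
    redundancy: float = 1.0        # rho(d): storage redundancy ratio
    candidate_clouds: list = field(default_factory=list)  # CD(d)

# Example instances with invented numbers, loosely in the spirit of Fig. 2.
c1 = Cloud("c1", p_store=0.12, p_in=0.02, p_out=0.01)
c2 = Cloud("c2", p_store=0.08, p_in=0.02, p_out=0.09)
d1 = DataItem("d1", size_gb=50.0, candidate_clouds=[c1, c2])
```

This is only a data-representation sketch; the optimization frameworks in Sections IV and V operate on exactly these quantities.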
B. Cloud Storage Submodel

One of the key information management tasks in the SG is data storage. In this model, we use C to denote the set of clouds providing storage services (for instance, recall the example of Amazon Simple Storage Service discussed before, which has seven clouds). Each DI d ∈ D is uploaded to the cloud domain and stored in one or more clouds. If one DI should be stored in two or more clouds, each of these clouds stores one portion of this DI. Let sD(d) denote the size of DI d, and sD(d,c) denote the size of the portion of DI d stored in cloud c. Each user u ∈ U is interested in one or more DIs and downloads them from the cloud domain.

There are three types of prices related to this model:
• We use p+(d,c) to represent the unit upload price of uploading DI d to cloud c. This price consists of the data transfer-in price charged by cloud c and the communication price, charged by networking providers, between cloud c and the information source of DI d.
• We use p-(c,u) to represent the unit download price of downloading data from cloud c to user u. This price consists of the data transfer-out price charged by cloud c and the communication price, charged by the networking providers, between cloud c and user u.
• We use pS(c) to represent the unit storage price in cloud c.

An illustrative example is given in Fig. 2, where DI d2 is forced to be split and stored in clouds c2 and c3 (we will discuss how to store DI d2 in Section III-D). DI d1 is requested by both user u1 and user u2. Although c1 has the highest unit storage price, it is easy to verify that the optimal total cost is achieved when d1 is stored in c1, due to the cheaper download prices from c1 to u1 and u2. The download prices may be cheaper because both u1 and u2 are closer to the data center of c1 than to the data centers of c2 and c3. This example also justifies our previous statement that we should systematically leverage the resources in the diverse clouds and networks to reduce the total cost.

[Fig. 2. An example for the cloud storage problem.]

C. Cloud Computation Submodel

Another key information management task in the SG is processing and analyzing the information generated in the SG. As defined before, a CP consists of one or more tasks, each of which takes the DI(s) generated in the SG and/or the output(s) of previously finished task(s) as inputs and performs the required computing operations. In this model, each DI d ∈ D is uploaded to the cloud domain and taken by some tasks as inputs, and each user u ∈ U is interested in the output of one or more tasks and downloads the output from the cloud domain. We use C to denote the set of compute clouds and T to denote the set of all tasks existing in the CPs. We use TD(d) to denote the set of tasks that take DI d as input, and sT(t) to represent the size of the output of task t.

The relationships among the tasks in a CP and the relationships among different CPs are represented by a graph, called a CP structure graph. A CP structure graph is a directed acyclic graph that characterizes the work flow and communication flow of the tasks in CPs. Let DI node, task node, and user node denote the nodes in this graph representing a DI, a task, and a user, respectively. Directed link <d, t> from DI node d to task node t represents that task t takes DI d as an input. Directed link <t1, t2> from t1 to t2 represents that task t2 takes the output of task t1 as an input. Directed link <t, u> from task node t to user node u represents that user u downloads the output of task t from the cloud domain. Let TT(t) denote the set of successor task nodes of task t on the CP structure graph.

The basic concept of the CP structure graph is similar to those used in Microsoft Dryad [12] and Google MapReduce [7]. A Dryad job is a directed acyclic graph where each vertex is a program and edges represent data channels; the overall structure of a Dryad job is determined by its communication flow. In MapReduce, there are two sequential steps: Map and Reduce. Tasks in MapReduce are a series of operations whose relationship can be modeled as a directed acyclic graph.

Fig. 3 shows an example of a CP structure graph that represents two CPs, seven tasks, four DIs, and three users. Let us take CP1 as an example. Task t1 takes DI d1 as an input, and its output is taken by task t2 as an input. Task t5 takes DI d3, the output of task t2, and the output of task t4 as inputs. The output of task t4 is downloaded by user u1, and the output of task t5 is downloaded by user u2. In this example, TD(d1) = {t1, t3}, TD(d2) = {t2}, and TD(d3) = {t5, t6}; TT(t1) = {t2}, TT(t2) = {t5}, TT(t3) = {t4}, TT(t4) = {t5}, and TT(t5) = ∅.

[Fig. 3. An example for a CP structure graph.]

There are five types of prices related to this model:
• As in Section III-B, we use p+(d,c) to represent the unit upload price of uploading DI d to cloud c.
• As in Section III-B, we use p-(c,u) to represent the unit download price of downloading data from cloud c to user u.
• As in Section III-B, we use pS(c) to represent the unit storage price in cloud c.
• We use pI(c1,c2) to denote the unit inter-cloud transfer price of transferring data from cloud c1 to cloud c2.
• We use κ(t,c) to represent the computational cost of task t charged by cloud c.

Fig. 4 shows an example of assigning clouds to the tasks shown in Fig. 3. DIs d1 and d2 are uploaded to clouds c1 and c2, respectively, in which tasks t1–t4 are executed. The arrow from c2 to c1 represents that the output of task t1 will be transferred to c1, since t2 takes the output of task t1 as an input. Similarly, the arrow from c1 to c3 represents the information flow resulting from transferring the outputs of t2 and t4 from c1 to c3. Note that only one copy of a DI is uploaded to a cloud, even if there are two or more tasks in this cloud needing this DI as an input. Although both t5 and t6 need DI d3, DI d3 is uploaded to cloud c3 only once (to save the upload cost), since they both operate in cloud c3. Since user u1 needs the output of task t4, it downloads this output from cloud c1. For simplicity, we do not show the prices in this figure.

[Fig. 4. An example for the cloud computation problem.]
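Since a CP structure graph is an ordinary directed acyclic graph, the sets TD(d) and TT(t) can be derived mechanically from its edge list. The sketch below (assumed Python; the edge list is transcribed from the Fig. 3 example, with user links omitted for brevity) recovers the sets stated above.

```python
from collections import defaultdict

# Edges of the Fig. 3 CP structure graph: <d,t> DI-to-task links and
# <t1,t2> task-to-task links.
di_links = [("d1", "t1"), ("d1", "t3"), ("d2", "t2"),
            ("d3", "t5"), ("d3", "t6")]
task_links = [("t1", "t2"), ("t2", "t5"), ("t3", "t4"), ("t4", "t5")]

T_D = defaultdict(set)  # TD(d): tasks taking DI d as input
T_T = defaultdict(set)  # TT(t): successor tasks of task t
for d, t in di_links:
    T_D[d].add(t)
for t1, t2 in task_links:
    T_T[t1].add(t2)

assert T_D["d1"] == {"t1", "t3"}
assert T_T["t2"] == {"t5"}
assert T_T["t5"] == set()  # t5 has no successor task
```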

D. Security, Privacy and Protection Submodel

1) Data Storage: The integrity, availability, and confidentiality of the data in the SG (e.g. the information generated from PMUs and smart meters) are critical for the operation of the SG [10]. We hence consider that each DI d ∈ D has one or more of the following storage requirements (note that the third and fourth requirements are mutually exclusive):
• DI d should be stored in one or more pre-selected candidate clouds.
• DI d should be stored with redundancy.
• DI d should be split and stored in two or more clouds.
• DI d should be completely stored in one cloud.
In the following, we explain the above four requirements.

The first requirement: Deciding a set of candidate clouds can be done by cloud brokers, which find a set of clouds satisfying the preferences of data storage, such as data center location, data availability, security and protection requirements, and the properties of the communication networks among DI sources, clouds, and users. For DI d, let CD(d) denote the set of these clouds.

The second requirement: Although some cloud providers (e.g. Amazon) claim that they provide data redundancy services, the actual data processing in such services is encapsulated in a black box and is done by the clouds rather than by the actors in the SG (e.g. electric utilities). In order to allow the actors in the SG to have better control of the redundancy of their critical data, they may need to ensure the data redundancy by themselves. We use ρ(d) to denote the storage redundancy ratio of DI d: the total amount of DI d stored in the cloud domain equals the size of DI d times its storage redundancy ratio.

The third requirement: Data splitting is an important approach to protecting sensitive DIs generated in the SG from unauthorized access, e.g. by encrypting the DIs and storing different portions of each DI in different clouds. Even if some clouds are compromised, the attacker may still be unable to retrieve the whole DI. Let DS denote the set of DIs required to be split, and θ(d) the storage splitting ratio of DI d.

The fourth requirement: This constraint is applicable when a DI of some type must be stored completely as a whole, in order to, for example, facilitate data content checks. We use DO to denote the set of DIs required to be stored completely in one of the clouds.

2) Data Computation: Each task has its own quality of service delivery requirements, such as computation deadlines and data center locations. For example, a sensitive task (e.g. a data analyzer of an electric utility that uses confidential algorithms) should only be executed in the private clouds of the electric utility. For each task t, cloud brokers decide a set of candidate clouds (denoted by CT(t)) satisfying these requirements.

IV. AN OPTIMIZATION FRAMEWORK FOR CLOUD-BASED SMART GRID INFORMATION STORAGE

We formulate the cost minimization problem for cloud-based SG information storage as a mixed-integer linear program.

System 1:
(Cost)
min C = Σ_{d∈D} Σ_{c∈CD(d)} sD(d,c) pS(c)
      + Σ_{d∈D} Σ_{c∈CD(d)} sD(d,c) p+(d,c)
      + Σ_{d∈D} Σ_{c∈CD(d)} Σ_{u∈UD(d)} sD(d,c) p-(c,u)          (1)
s.t.
(Data Redundancy Constraint)
Σ_{c∈CD(d)} sD(d,c) = ρ(d) sD(d),  ∀d ∈ D,                       (2)
(Data Splitting Constraint)
sD(d,c) ≤ ρ(d) sD(d) / θ(d),  ∀d ∈ DS, ∀c ∈ CD(d),               (3)
(Data Exclusive Constraint)
sD(d,c) ∈ {0, ρ(d) sD(d)},  ∀d ∈ DO, ∀c ∈ CD(d),                 (4)
over
sD(d,c) ∈ [0, ∞),  ∀d ∈ D, ∀c ∈ CD(d).                           (5)

Interpretations:
• The objective is to minimize the total cost. In Equation (1), the three terms represent the cost of storage, the cost of uploading DIs to the clouds, and the cost of downloading DIs to users from the clouds, respectively.
• Constraint (2) represents the second storage requirement (see Section III-D): the total amount of DI d stored in the cloud domain equals the size of DI d times its storage redundancy ratio.
• Constraint (3) represents the third storage requirement (see Section III-D). Constraint (3) means that no cloud stores more than 1/θ(d) of the total amount of DI d stored in the cloud domain.
• Constraint (4) represents the fourth storage requirement (see Section III-D).

V. AN OPTIMIZATION FRAMEWORK FOR CLOUD-BASED SMART GRID INFORMATION COMPUTATION

We formulate the cost minimization problem for cloud-based information computation as an integer linear program.
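In the data-exclusive special case (d ∈ DO with ρ(d) = 1), System 1 reduces to picking, for each DI, the single candidate cloud minimizing sD(d) × (pS(c) + p+(d,c) + Σ_{u∈UD(d)} p-(c,u)). The sketch below (all prices invented for illustration) solves this special case by enumeration and also reproduces the Fig. 2 intuition: a cloud with the highest storage price can still win overall because of cheaper downloads.

```python
# Minimal sketch of the data-exclusive special case of System 1: a DI stored
# wholly in one cloud, so objective (1) reduces to a per-cloud total.
def best_cloud(size, clouds, users, p_store, p_up, p_down):
    def total_cost(c):
        # storage + upload + downloads to every requesting user, per Eq. (1)
        return size * (p_store[c] + p_up[c] + sum(p_down[c][u] for u in users))
    chosen = min(clouds, key=total_cost)
    return chosen, total_cost(chosen)

# Toy instance echoing the Fig. 2 discussion (numbers are not from the paper):
# c1 has the highest storage price but the cheapest downloads to both users.
clouds = ["c1", "c2"]
users = ["u1", "u2"]
p_store = {"c1": 0.15, "c2": 0.05}
p_up = {"c1": 0.02, "c2": 0.02}
p_down = {"c1": {"u1": 0.01, "u2": 0.01},
          "c2": {"u1": 0.10, "u2": 0.10}}

choice, cost = best_cloud(100.0, clouds, users, p_store, p_up, p_down)
assert choice == "c1"   # cheap downloads outweigh expensive storage
```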

Binary variable $X^T(t,c)$ is 1 if and only if task $t$ is executed in cloud $c$. Binary variable $X^+(d,c)$ is 1 if and only if DI $d$ is uploaded to cloud $c$. Binary variable $X^I(t_1,t_2,c_1,c_2)$ is 1 if and only if 1) task $t_1$ is executed in cloud $c_1$, 2) task $t_2$ is executed in cloud $c_2$, and 3) cloud $c_1$ transfers data to cloud $c_2$ because task $t_2$ takes the output of task $t_1$ as an input.

System 2:

(Cost)
$\min C = \sum_{d\in D}\sum_{c\in C_D(d)} X^+(d,c)\,s_D(d)\,p^S(c) + \sum_{d\in D}\sum_{c\in C_D(d)} X^+(d,c)\,s_D(d)\,p^+(d,c) + \sum_{t\in T}\sum_{c\in C_T(t)} X^T(t,c)\,\kappa(t,c) + \sum_{t_1\in T}\sum_{t_2\in T_T(t_1)}\sum_{c_1\in C_T(t_1)}\sum_{c_2\in C_T(t_2)} X^I(t_1,t_2,c_1,c_2)\,s_T(t_1)\,p^I(c_1,c_2) + \sum_{t\in T}\sum_{c\in C_T(t)}\sum_{u\in U_T(t)} X^T(t,c)\,s_T(t)\,p^-(c,u)$  (6)

s.t.

(Task Execution Constraint)
$\sum_{c\in C_T(t)} X^T(t,c) = 1,\ \forall t\in T; \quad X^T(t,c) = 0,\ \forall t\in T,\ \forall c\notin C_T(t)$  (7)

(Data Upload Constraint)
$\frac{\sum_{t\in T_D(d)} X^T(t,c)}{|T_D(d)|} \le X^+(d,c),\ \forall d\in D,\ \forall c\in C_D(d)$  (8)
$X^+(d,c) \le \sum_{t\in T_D(d)} X^T(t,c),\ \forall d\in D,\ \forall c\in C_D(d)$  (9)

(Inter-Cloud Intermediate Data Transfer Constraint)
$\frac{1}{2}\big(X^T(t_1,c_1)+X^T(t_2,c_2)\big) - \frac{1}{2} \le X^I(t_1,t_2,c_1,c_2) \le \frac{1}{3}\big(X^T(t_1,c_1)+X^T(t_2,c_2)\big) + \frac{1}{3},\ \forall t_1\in T,\ \forall t_2\in T_T(t_1),\ \forall c_1\in C_T(t_1),\ \forall c_2\in C_T(t_2)$  (10)

$X^T(t,c)\in\{0,1\},\ \forall t\in T,\ \forall c\in C$  (11)
$X^+(d,c)\in\{0,1\},\ \forall d\in D,\ \forall c\in C_D(d)$  (12)
$X^I(t_1,t_2,c_1,c_2)\in\{0,1\},\ \forall t_1\in T,\ \forall t_2\in T_T(t_1),\ \forall c_1\in C_T(t_1),\ \forall c_2\in C_T(t_2)$  (13)

Interpretations:
• The objective is to minimize the total cost. The five terms represent the cost of storing DIs, the cost of uploading DIs to the cloud domain, the cost of executing tasks, the cost of inter-cloud intermediate data transfer, and the cost of downloading the outputs of tasks to users from the cloud domain, respectively.
• Constraint (7) ensures that each task is executed by exactly one of the clouds that can execute it.
• Constraints (8) and (9) ensure that $X^+(d,c)$ is 1 if and only if $X^T(t,c)=1$ for some $t\in T_D(d)$. In other words, we need to upload DI $d$ to cloud $c$ once if there is at least one task executed in cloud $c$ that takes DI $d$ as an input.
• Constraint (10) ensures that $X^I(t_1,t_2,c_1,c_2)$ is 1 if and only if $X^T(t_1,c_1)=X^T(t_2,c_2)=1$. In other words, if task $t_1$ is executed in cloud $c_1$, task $t_2$ is executed in cloud $c_2$, and $t_2$ takes the output of $t_1$ as an input, then cloud $c_1$ will transfer the output of $t_1$ to cloud $c_2$.

VI. DISCUSSIONS

In this section, we discuss some practical issues.

A. Dealing with Uncertain Prices

In the SG information storage problem, we may not know the accurate values of the unit upload, storage, and download prices when the programming is formulated. We list some possible scenarios in the following.
• The unit storage price is computed based on the knowledge of how long the DIs will be stored. If the actual storage period is longer than the designed one, the unit storage price would be larger.
• The data upload operation takes a long time, and the communication price changes during the operation.
• The actual unit download price could be different from the one assumed when the programming was formulated and solved.

One approach to dealing with uncertain prices is using a worst case approximation. Although the actual values of these prices may be uncertain, we assume that they are upper bounded (i.e., $p^+(d,c)\le p^+_{UB}(d,c)$, $p^S(c)\le p^S_{UB}(c)$, and $p^-(c,u)\le p^-_{UB}(c,u)$) and that we know these bounds. We replace objective (1) by the following:

$\min C = \sum_{d\in D}\sum_{c\in C_D(d)} s_D(d,c)\,p^S_{UB}(c) + \sum_{d\in D}\sum_{c\in C_D(d)} s_D(d,c)\,p^+_{UB}(d,c) + \sum_{d\in D}\sum_{c\in C_D(d)}\sum_{u\in U_D(d)} s_D(d,c)\,p^-_{UB}(c,u)$

We use the solution to this objective as an approximate solution to System 1. Another approach is adopting stochastic programming [13] (if we know the distribution of the uncertain prices), which aims to minimize the expectation of objective (1).

In the SG information computation problem, we also need to deal with uncertain prices. First, as in the storage problem, we may not know the accurate values of the unit upload, storage, and download prices; moreover, the unpredictable size of a task's output may also lead to uncertainty in the download cost. Second, the computational cost of tasks may be uncertain due to many reasons, such as unpredictable running time. The cost of inter-cloud transfer may also be uncertain, due to the varying inter-cloud transfer price and the unpredictable size of task output.

Similarly, one approach is using the worst case method for the treatment of uncertainty. We further assume that $s_T(t)\le s^T_{UB}(t)$, $\kappa(t,c)\le\kappa_{UB}(t,c)$, and $p^I(c_1,c_2)\le p^I_{UB}(c_1,c_2)$, and we replace objective (6) by the following:

$\min C = \sum_{d\in D}\sum_{c\in C_D(d)} X^+(d,c)\,s_D(d)\,p^S_{UB}(c) + \sum_{d\in D}\sum_{c\in C_D(d)} X^+(d,c)\,s_D(d)\,p^+_{UB}(d,c) + \sum_{t\in T}\sum_{c\in C_T(t)} X^T(t,c)\,\kappa_{UB}(t,c) + \sum_{t_1\in T}\sum_{t_2\in T_T(t_1)}\sum_{c_1\in C_T(t_1)}\sum_{c_2\in C_T(t_2)} X^I(t_1,t_2,c_1,c_2)\,s^T_{UB}(t_1)\,p^I_{UB}(c_1,c_2) + \sum_{t\in T}\sum_{c\in C_T(t)}\sum_{u\in U_T(t)} X^T(t,c)\,s^T_{UB}(t)\,p^-_{UB}(c,u)$
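To make the structure of System 2 concrete, the following sketch solves a tiny instance by exhaustive search (pure Python; all sizes, prices, and the two-task dependency chain are made-up illustrative values, not settings from this paper). It enumerates every task-to-cloud assignment and derives the upload and inter-cloud transfer decisions exactly as constraints (8)–(10) force them, so the cheapest assignment found is the optimum of the integer program.

```python
from itertools import product

# Toy instance (illustrative values only): 2 clouds, 2 DIs, 3 tasks.
clouds = [0, 1]
di_size = {0: 10.0, 1: 4.0}              # s_D(d), gigabytes
di_inputs = {0: [0], 1: [1], 2: []}      # DIs taken as input by each task
task_out = {0: 2.0, 1: 1.0, 2: 0.5}      # s_T(t), size of each task's output
deps = [(0, 2), (1, 2)]                  # task 2 takes the outputs of tasks 0 and 1
exec_cost = {(t, c): 1.0 + 0.5 * c + 0.1 * t for t in range(3) for c in clouds}  # kappa(t,c)
p_store = {0: 0.10, 1: 0.12}             # p^S(c), per gigabyte
p_up = {(d, c): 0.05 for d in di_size for c in clouds}                           # p^+(d,c)
p_inter = {(c1, c2): 0.0 if c1 == c2 else 0.08 for c1 in clouds for c2 in clouds}  # p^I(c1,c2)
p_down = {0: 0.03, 1: 0.04}              # p^-(c,u), one user per task for brevity

def total_cost(assign):                  # assign[t] = cloud executing task t
    cost = 0.0
    # X^+(d,c) = 1 iff some task that needs DI d runs in cloud c (constraints (8)-(9)):
    for d, size in di_size.items():
        for c in clouds:
            if any(assign[t] == c and d in di_inputs[t] for t in assign):
                cost += size * (p_store[c] + p_up[(d, c)])   # storage + upload terms
    for t, c in assign.items():
        cost += exec_cost[(t, c)] + task_out[t] * p_down[c]  # execution + download terms
    # X^I(t1,t2,c1,c2) = 1 iff t1 runs in c1 and t2 runs in c2 (constraint (10)):
    for t1, t2 in deps:
        cost += task_out[t1] * p_inter[(assign[t1], assign[t2])]
    return cost

best = min((dict(zip(range(3), a)) for a in product(clouds, repeat=3)), key=total_cost)
print(best, round(total_cost(best), 3))
```

Brute force is only viable for toy sizes; the paper's instances are handed to a MIP solver instead, but the cost accounting is the same.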

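The two treatments of uncertainty above can be contrasted on a toy one-DI placement choice (every number here is hypothetical): the worst case method prices each candidate cloud at its upper bounds, while the stochastic treatment scores it by a sample-average estimate of the expected cost, and the two criteria need not agree.

```python
import random

random.seed(7)
size_gb = 20.0  # s_D(d) in gigabytes, hypothetical

# (storage, upload, download) price upper bounds per cloud, $/GB (hypothetical),
# plus how far below the bound the true price tends to fall.
bounds = {"cloud_a": (0.10, 0.05, 0.10), "cloud_b": (0.12, 0.06, 0.12)}
slack = {"cloud_a": 0.9, "cloud_b": 0.3}   # true price ~ uniform(slack * p_UB, p_UB)

def worst_case_cost(c):
    # worst case method: replace every uncertain price by its upper bound
    return size_gb * sum(bounds[c])

def expected_cost(c, n=20000):
    # stochastic treatment via a sample-average approximation of the expectation
    lo = slack[c]
    total = 0.0
    for _ in range(n):
        total += size_gb * sum(random.uniform(lo * p, p) for p in bounds[c])
    return total / n

wc = min(bounds, key=worst_case_cost)
st = min(bounds, key=expected_cost)
print(wc, st)  # the two criteria can rank the clouds differently
```

Here cloud_b has the larger price bounds but much more slack below them, so the worst case method prefers cloud_a while the expectation prefers cloud_b; which criterion is appropriate depends on how risk-averse the utility is.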
If we know the distribution of the uncertain prices, we can again adopt stochastic programming and aim to minimize the expectation of objective (6).

B. Wisely Selecting Pricing Options

Cloud providers often offer multiple pricing options. For example, three common options are on-demand instance pricing, reserved instance pricing, and spot instance pricing [1]. On-demand instances let customers pay for compute capacity by the hour, with no long-term commitments or upfront payments. Reserved instances let customers make a low, one-time, upfront payment for an instance, reserve it for a one or three year term, and pay a significantly lower hourly rate for that instance; this pricing option is suitable for the information management services with steady state or predictable usage. Spot instances provide the ability for customers to purchase compute capacity with no upfront commitment and at hourly rates usually lower than the on-demand rate. The spot price fluctuates based on supply and demand for instances, but customers will never pay more than the maximum price they have specified. We list the following two example scenarios and analyze the corresponding pricing options.
• Customer behavior data analysis service: In general, analyzing customer behavior data is a service with steady state and predictable usage. Therefore, we can adopt the reserved instance pricing option to lower the hourly rate. The spot instance pricing option may also be applicable, because this service has flexible start and end times and can be executed at the time when the spot price is low.
• Coordinated EV charging analysis: As discussed in Section III-A, a service can be run in the cloud to coordinate EV charging. One property of this service is that the analysis workload is fluctuating and often unpredictable, because the number of EVs involved varies over time; this may lead to short-term, spiky, or unpredictable workloads. Therefore, in this scenario the on-demand instance pricing option may be better than the other two options.

C. Jointly Optimizing Information Computation and Storage

In some scenarios, the total cost may be further reduced if information computation, storage, and sharing are optimized jointly: the DIs would be taken by the CPs as inputs and also stored in clouds for information sharing. For example, the objective function of the problem of minimizing the total cost of computation, storage, and sharing can be formulated as

$\min C = \sum_{d\in D}\sum_{c\in C_D(d)} X^+(d,c)\,s_D(d)\,p^S(c) + \sum_{d\in D}\sum_{c\in C_D(d)} X^+(d,c)\,s_D(d)\,p^+(d,c) + \sum_{t\in T}\sum_{c\in C_T(t)} X^T(t,c)\,\kappa(t,c) + \sum_{t_1\in T}\sum_{t_2\in T_T(t_1)}\sum_{c_1\in C_T(t_1)}\sum_{c_2\in C_T(t_2)} X^I(t_1,t_2,c_1,c_2)\,s_T(t_1)\,p^I(c_1,c_2) + \sum_{t\in T}\sum_{c\in C_T(t)}\sum_{u\in U_T(t)} X^T(t,c)\,s_T(t)\,p^-(c,u) + \sum_{d\in D}\sum_{c\in C_D(d)}\sum_{u\in U_D(d)} X^+(d,c)\,s_D(d)\,p^-(c,u),$

which captures more details of the information flow: the last term accounts for the users who download the stored DIs directly from the clouds.

D. Refining Resource Optimization Model

Our model can be further refined to capture more practical and sophisticated properties. We list the following two extensions as future research work.
• It might be beneficial to consider the physical and virtual resources in clouds for some scenarios. Optimization on the physical and virtual resources catering to the information management tasks might lead to a cost reduction.
• The communication topology among the DI sources, clouds, and users may be taken into account to optimize the information flow and reduce the cost. For example, consider a scenario where we want to transport a DI from a cloud to two users. In our current model, both users download this DI from this cloud. However, when these two users are quite close to each other, the cost may be lower if one user forwards this DI to the other once the former downloads it from the cloud. It would therefore be beneficial to optimize communication resources based on a more sophisticated communication model.

VII. PERFORMANCE EVALUATION

In this section, we evaluate the performance of our optimization framework. The programs formulated were solved using the CPLEX™ solver [3] on a 2.8 GHz Linux machine. In the following, a range-limited Gaussian variable G(a, b, c, d) denotes a Gaussian variable with a mean of a and a variance of b, restricted to the range [c, d].

A. Cloud-Based Smart Grid Information Storage

We evaluated the following scenario. There exists one large electric utility, which maintains three geographically different private clouds by itself and can also use two public clouds (Amazon-I and Amazon-II). Four service departments in this utility (the grid and market analysis service, the customer billing service, the recommendation service, and the EV analysis service) request the collected data. The number of residential areas served by this utility was varied from 50 to 500 with an increment of 50, and the number of households in each residential area was set to a Gaussian variable with a mean of 10000 and a variance of 5000, with at most 100000 households per area. Each household is equipped with one smart meter, and we assumed that each smart meter generates 1 megabyte of information per day (according to Austin Energy, a smart meter generates about 1 megabyte of information per day if the sampling interval is 15 minutes); the data aggregated from the smart meters in each residential area is represented as one DI and categorized as customer behavior data. Each household also generates 1 kilobyte of important account information per day; the account information aggregated in each residential area is represented as one DI and categorized as customer account data. Each household has zero or more EVs, and one EV generates 50 kilobytes of information per day. We assumed that the total number of EVs in each residential area is uniformly distributed in [R/2, R], where R is the number of households in this residential area; the data aggregated from the EVs in each residential area is represented as one DI and categorized as EV data. In addition, the electric utility collects information from 100 PMUs, each of which generates 500 megabytes of phasor information per day; the data generated by each PMU is represented as one DI and categorized as PMU data. In total, four types of DIs are generated: customer behavior data, customer account data, EV data, and PMU data.
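The reserved-versus-on-demand choice in Section VI-B comes down to a break-even number of billed hours. The sketch below uses made-up rates (not actual Amazon prices) to show why a steady analytics service favors reserving while a spiky EV-coordination workload favors on-demand.

```python
def on_demand_cost(hours, rate=0.50):
    # hypothetical on-demand rate, $/hour
    return hours * rate

def reserved_cost(hours, upfront=800.0, rate=0.20):
    # hypothetical one-year reservation: upfront fee plus a discounted rate
    return upfront + hours * rate

# Reserving pays off once the billed hours exceed upfront / (rate difference):
breakeven_hours = 800.0 / (0.50 - 0.20)

steady, spiky = 18 * 365, 2 * 365   # hours billed per year for the two workloads
print(round(breakeven_hours),
      on_demand_cost(steady) > reserved_cost(steady),   # steady: reserved cheaper
      on_demand_cost(spiky) < reserved_cost(spiky))     # spiky: on-demand cheaper
```

Spot pricing would add a third curve whose hourly rate is itself a random variable, which is why it suits services with flexible start and end times.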

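Several evaluation parameters are drawn from range-limited Gaussian variables in the G(a, b, c, d) convention (mean a, variance b, range [c, d]). One simple way to realize such a draw is rejection sampling; this helper is our own sketch of that convention, using the household-count parameters as the example.

```python
import random

def trunc_gauss(a, b, c, d, rng=random):
    """Draw G(a, b, c, d): a Gaussian with mean a and variance b,
    re-sampled until the value falls inside the range [c, d]."""
    sigma = b ** 0.5
    while True:
        x = rng.gauss(a, sigma)
        if c <= x <= d:
            return x

random.seed(1)
# Household counts per residential area: mean 10000, variance 5000, at most 100000.
households = [trunc_gauss(10000, 5000, 0, 100000) for _ in range(1000)]
print(round(sum(households) / len(households)))
```

Rejection sampling is adequate here because the range is wide relative to the standard deviation; a tightly truncated range would call for inverse-CDF sampling instead.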
Now we explain the relationships between the DIs and the users. Customer account data is used by the customer billing service department. Customer behavior data and PMU data are used by the grid and market analysis service department to establish the grid and market status in the past, which can be further used to improve the future operation of the SG. Customer behavior data is also used by the recommendation service department, which mines these DIs to provide customers with energy recommendations that help them reduce billing. Customer behavior data, PMU data, and EV data are used by the EV analysis service department to investigate the impacts of charging/discharging EVs on the power grid status and customer electricity usage.

These DIs can be stored in public and/or private clouds; however, due to the sensitivity of the customer account data, those DIs can only be stored in the private clouds. We assumed that all the DIs are stored for six months, and we evaluated the cost of storing the DIs generated in one day. The storage splitting ratio and the redundancy ratio were both set to 2. The unit storage price for one gigabyte per month of Amazon-I is $0.14 (i.e., US standard pricing [2]), and that of Amazon-II is $0.125 (i.e., US West-Northern California pricing [2]). The unit storage price for one gigabyte per month of a private cloud is uniformly distributed between $0.15 and $0.2. The rationale behind the setting that the storage price of the private clouds is higher than that of the public clouds is that public cloud providers are usually focused IT companies and have the large-scale and professional advantages to reduce the price. The average transfer-in (transfer-out) price of Amazon-I and Amazon-II is $0 ($0.12) per gigabyte, and that of the private clouds was assumed to be $0.2 ($0.2) per gigabyte. The unit upload (download) price is equal to the sum of the transfer-in (transfer-out) price charged by the clouds and the communication price charged by the networking providers; the communication price was assumed to be Gaussian distributed with a mean of $0.05 per gigabyte and a range of [$0, $0.15]. The unit inter-cloud transfer price between clouds c1 and c2 is equal to the sum of the transfer-out price of c1, the transfer-in price of c2, and the communication price between c1 and c2.

Fig. 5. Performance Evaluation. (a) Information Storage Cost. (b) Running Time of System 1. (c) Information Computation Cost. (d) Running Time of System 2.

In Fig. 5(a), we compared the costs of four cases: OPT, STOC, WST, and PRVT. OPT represents the case where the optimal solution is achieved. STOC and WST represent the stochastic programming and the worst case method discussed in Section VI-A, respectively, when the real storage and download prices are unknown in advance. PRVT represents the case where all the DIs must be stored in the private clouds. We observe that in STOC and WST, although we may not know the exact values of the storage and download prices, the costs are just slightly higher than the optimal one. We also observe that it would lead to a 40% cost increase to enforce all the DIs to be stored in the private clouds. In addition, although solving mixed integer programming is NP-hard, Fig. 5(b) shows that in practice the running time of CPLEX is very short.

B. Cloud-Based Smart Grid Information Computation

We evaluated a CP that analyzes electricity saving products once a month (30 days). The CP first needs a set of tasks, each of which analyzes the customer behavior data generated in one residential area in that month. For example, in a residential area with 10000 households, a 300 gigabyte DI is generated (recall the setting of customer behavior data in Section VII-A). We assumed that the time each of these tasks takes is Gaussian distributed with a mean of 15 hours and a range of [10, 20] hours, and that these tasks output 10 kilobytes of intermediate data for each household; as a result, in a residential area with 10000 households, 100 megabytes of intermediate data is generated by the corresponding task. After every DI has been analyzed by the corresponding task, another task takes all the intermediate data and analyzes what kinds of power saving products should be produced to meet the customers' requirements of electricity saving. We further assumed that a secret commercial model is used in this task, and hence this task should be run in the private clouds for privacy reasons. We assumed that 1 megabyte of output is generated by this task, and that a Gaussian-distributed time with a mean of 5 hours and a maximum of 9 hours is needed to finish it.

The settings of the clouds, the upload prices, and the download prices are similar to those in Section VII-A. The price of Amazon-I for a high memory on-demand computational task is $2.00 per hour, that of Amazon-II is $2.28 per hour, and the price of the private clouds for a computational task is $3.00 per hour.

In Fig. 5(c), we compared the costs of four cases: OPT, STOC, WST, and PRVT. OPT represents the case where the optimal solution is achieved. STOC and WST represent the stochastic programming and the worst case method, respectively, when the task computational costs and download prices are unknown in advance. PRVT represents the case where all the DIs must be stored in the private clouds and all the tasks must be executed in the private clouds.
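The per-DI storage cost in Section VII-A decomposes into upload, monthly storage, and per-download charges, with the upload and download unit prices each being a cloud transfer price plus the network communication price. The helper below recomputes that decomposition for a 300 GB behavior-data DI; the exact dollar figures are illustrative stand-ins rather than the evaluation's sampled settings.

```python
def di_cost_usd(size_gb, months, store_per_gb_month, xfer_in, xfer_out, comm, downloads):
    upload = size_gb * (xfer_in + comm)            # unit upload price = transfer-in + comm
    storage = size_gb * store_per_gb_month * months
    download = downloads * size_gb * (xfer_out + comm)  # unit download price = transfer-out + comm
    return upload + storage + download

# A 300 GB behavior-data DI kept six months on a public cloud (illustrative prices):
public = di_cost_usd(300, 6, 0.14, 0.00, 0.12, 0.05, downloads=2)
# The same DI forced onto a pricier private cloud (illustrative prices):
private = di_cost_usd(300, 6, 0.18, 0.20, 0.20, 0.05, downloads=2)
print(round(public, 2), round(private, 2), private > public)
```

This is the arithmetic behind the PRVT case costing noticeably more than OPT: forcing every DI into the private clouds pays both higher storage and higher transfer prices.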

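In the computation experiment of Section VII-B, the dominant cost term is hourly execution: a per-area analysis task that runs about 15 hours is charged at the hourly rate of whichever cloud executes it. A back-of-envelope comparison (the rates below echo the section's price structure but are used purely illustratively):

```python
# Hourly rates for a high-memory computational instance, $/hour (illustrative):
rate = {"amazon_i": 2.00, "amazon_ii": 2.28, "private": 3.00}

def exec_cost(hours, cloud):
    return hours * rate[cloud]

# One ~15-hour per-area task on the cheapest cloud versus the private cloud:
cheapest = min(rate, key=rate.get)
print(cheapest, exec_cost(15, cheapest), exec_cost(15, "private"))
```

The gap per task is what the optimization framework exploits when it is free to place privacy-insensitive tasks on public clouds.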
We observe that in STOC and WST, although we may not know the exact values of the task computational costs and download prices, the costs are just slightly higher than the optimal one. We also observe that it would lead to a 50% cost increase to enforce all the DIs to be stored in the private clouds of the electric utility and all the tasks to be executed in the private clouds. In addition, although solving mixed integer programming is NP-hard, Fig. 5(d) shows that the running time of CPLEX is no more than 5 seconds in all the cases studied.

VIII. CONCLUSION

In this paper, we have proposed a cloud-based SG information management model and presented a cloud and network resource optimization framework to solve the cost reduction problem in cloud-based SG information storage and computation. Simulations have shown that our optimization framework can significantly reduce the SG information management cost.

ACKNOWLEDGMENT

We thank the editor and the reviewers, whose comments on an earlier version of this paper have helped to significantly improve the presentation and the content of this paper.

REFERENCES

[1] Amazon EC2 instance purchasing options: http://aws.amazon.com/ec2/purchasing-options/.
[2] Amazon Simple Storage Service: http://aws.amazon.com/s3/.
[3] IBM ILOG CPLEX optimizer: http://www-01.ibm.com/software/integration/optimization/cplex-optimizer.
[4] National Institute of Standards and Technology. NIST framework and roadmap for smart grid interoperability standards, release 1.0. Technical report, 2010.
[5] R. Buyya, C. Yeo, S. Venugopal, J. Broberg, and I. Brandic. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, pages 599–616, 2009.
[6] S. Chaisiri, B.-S. Lee, and D. Niyato. Optimization of resource provisioning cost in cloud computing. IEEE Transactions on Services Computing, 2012.
[7] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In USENIX OSDI, pages 137–150, 2004.
[8] R. Van den Bossche, K. Vanmechelen, and J. Broeckhove. Cost-optimal scheduling in hybrid IaaS clouds for deadline constrained workloads. In IEEE International Conference on Cloud Computing, pages 228–235, 2010.
[9] X. Fang, S. Misra, G. Xue, and D. Yang. Smart grid — the new and improved power grid: A survey. IEEE Communications Surveys and Tutorials, 2012.
[10] X. Fang, S. Misra, G. Xue, and D. Yang. Managing smart grid information in the cloud: Opportunities, model, and applications. IEEE Network, 2012.
[11] M. Hajjat, X. Sun, Y.-W. Sung, D. Maltz, S. Rao, K. Sripanidkulchai, and M. Tawarmalani. Cloudward bound: Planning for beneficial migration of enterprise applications to the cloud. In Proceedings of ACM SIGCOMM, pages 243–254, 2010.
[12] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys, 2007.
[13] P. Kall and J. Mayer. Stochastic Linear Programming: Models, Theory, and Computation, volume 156. Springer Verlag, 2011.
[14] H. Kim, Y.-J. Kim, K. Yang, and M. Thottan. Cloud-based demand response for smart grid: Architecture and distributed algorithms. In IEEE SmartGridComm'11, 2011.
[15] J. Li, J. Chinneck, M. Litoiu, and M. Woodside. CloudOpt: Multi-goal optimization of application deployments across a cloud. In International Conference on Network and Service Management, pages 1–9, 2011.
[16] A.-H. Mohsenian-Rad and A. Leon-Garcia. Coordination of cloud computing and smart power grids. In IEEE SmartGridComm'10, pages 368–372, 2010.
[17] K. Nagothu, B. Kelley, M. Jamshidi, and A. Rajaee. Persistent Net-AMI for microgrid infrastructure using cognitive radio on cloud data centers. IEEE Systems Journal, 6(1):4–15, 2012.
[18] I. Lykourentzou, I. Giannoukos, G. Mpardis, V. Loumos, and V. Nikolopoulos. Web-based decision-support system methodology for smart provision of adaptive digital energy services over cloud technologies. IET Software, 5(5):454–465, 2011.
[19] S. Rusitschka, K. Eger, and C. Gerdes. Smart grid data cloud: A model for utilizing cloud computing in the smart grid domain. In IEEE SmartGridComm, pages 483–488, 2010.
[20] S. Sakr, A. Liu, D. Batista, and M. Alomari. A survey of large scale data management approaches in cloud environments. IEEE Communications Surveys and Tutorials, 13(3):311–336, 2011.
[21] Y. Simmhan, A. Kumbhare, B. Cao, and V. Prasanna. An analysis of security and privacy issues in smart grid software architectures on clouds. In IEEE International Conference on Cloud Computing, 2011.
[22] Y. Simmhan, S. Aman, B. Cao, M. Giakkoupis, A. Kumbhare, Q. Zhou, et al. An informatics approach to demand response optimization in smart grids. Technical report, Computer Science Department, University of Southern California, 2011.
[23] C. Wang, K. Ren, and J. Wang. Secure and practical outsourcing of linear programming in cloud computing. In INFOCOM, 2011 Proceedings IEEE, pages 820–828, 2011.
[24] Y. Xin, I. Baldine, J. Chase, T. Beyene, B. Parkhurst, and A. Chakrabortty. Virtual smart grid architecture and control framework. In IEEE SmartGridComm, 2011.
[25] H. Zhao, M. Pan, X. Liu, X. Li, and Y. Fang. Optimal resource rental planning for elastic applications in cloud market. In IEEE International Parallel and Distributed Processing Symposium, 2012.

Xi Fang [StM'09] received B.S. and M.S. degrees from Beijing University of Posts and Telecommunications, Beijing, China, in 2005 and 2008, respectively. Currently he is a Ph.D. candidate at Arizona State University. He has received Best Paper Awards at IEEE ICC 2012, IEEE MASS 2011, and IEEE ICC 2011. One of his co-authored papers was a runner-up to the Best Paper Award at IEEE ICNP 2010.

Dejun Yang [StM'08] received his B.S. from Peking University, Beijing, China, in 2007. He is a computer science Ph.D. student in the School of Computing, Informatics, and Decision Systems Engineering at Arizona State University. He has received Best Paper Awards at IEEE ICC 2012, IEEE MASS 2011, and IEEE ICC 2011. One of his co-authored papers was a runner-up to the Best Paper Award at IEEE ICNP 2010.

Guoliang Xue [M'96, SM'99, Fellow'11] is a Professor of Computer Science and Engineering at Arizona State University. His research interests include survivability, security, and resource allocation issues in networks. He has published over 200 papers in these areas. He is an Associate Editor of IEEE/ACM Transactions on Networking and IEEE Network, served as a TPC Co-Chair of IEEE INFOCOM 2010, and was a Keynote Speaker at IEEE LCN 2011. He is an IEEE Fellow.