
IEEE TRANSACTIONS ON SMART GRID, VOL. 7, NO. 5, SEPTEMBER 2016, p. 2423

Energy Big Data Analytics and Security: Challenges and Opportunities
Jiankun Hu, Member, IEEE, and Athanasios V. Vasilakos, Senior Member, IEEE

Manuscript received August 29, 2015; revised December 11, 2015, March 9, 2016, and April 11, 2016; accepted April 24, 2016. Date of publication May 9, 2016; date of current version August 19, 2016. Paper no. TSG-01019-2015.
J. Hu is with the University of New South Wales, Canberra, ACT 2610, Australia (e-mail: J.hu@adfa.edu.au).
A. V. Vasilakos is with the Department of Computer, Electrical and Space Engineering, Luleå University of Technology, Luleå 97187, Sweden (e-mail: vasilako@ath.forthnet.gr).
Digital Object Identifier 10.1109/TSG.2016.2563461
1949-3053 © 2016 EU

Abstract: The limited availability of fossil fuels and the call for a sustainable environment have brought about new technologies for the highly efficient use of fossil fuels and the introduction of renewable energy. The smart grid is an emerging technology that can fulfill such demands by incorporating advanced information and communications technology (ICT). The pervasive deployment of advanced ICT, especially smart metering, will generate big energy data in terms of volume, velocity, and variety. The generated big data can bring huge benefits to energy planning and to efficient energy generation and distribution. As such data involve end users' privacy and the secure operation of critical infrastructure, there will be new security issues. This paper surveys and discusses new findings and developments in existing big energy data analytics and security. Several taxonomies are proposed to express the intriguing relationships among the various variables in the field.

Index Terms: Energy, big data, analytics, cyber security, smart grid, anomaly detection, SCOPF.

I. INTRODUCTION

Fossil fuel reserves are finite, and it is predicted that the known oil reserves will be exhausted by 2050 [1]. Renewable energy such as wind and solar energy can provide a solution to both energy shortage and environmental sustainability. Smart and efficient energy usage will also be effective in saving energy and reducing carbon emissions.

The emerging smart grid (SG) technology provides an effective means of incorporating various renewable energy sources into the existing energy system and of making smart energy a reality. It is considered a technological paradigm shift. In a smart grid, networking, intelligent communications technology, and information processing functions are immersed into every facet of the energy system, ranging from power generation, transmission, and distribution to consumer appliances [2]–[4]. A large-scale smart grid can consist of thousands of microgrids operating in both interconnected and isolated modes [2]. Smart metering is an integral component of a smart grid, and smart meters are being installed in homes and other parts of the system on a large scale. Reports indicate that global smart meter installation numbers will triple from 10.3 million in 2011 to 29.9 million by 2017 [4]. There are many potential advantages to be derived from smart grid data [4]: automated and real-time monitoring of users' energy consumption, automated billing, detection of energy losses (possible fault and/or fraud), early warning of blackouts, fast detection of disturbances in energy supply, and intelligent, real-time energy planning and pricing. Smart energy data exhibit the characteristics of the 3Vs model of big data, namely Volume, Velocity, and Variety.

A. Big Data Characteristics of Smart Energy Data

1) Volume and Velocity: Smart meters are usually deployed at the scale of millions of units. The E-Sketch project showed that hourly power consumption data miss much vital information on how homes consume energy, information that is needed for real-time pricing plans. Moreover, overlapping peak demand from individual homes may cause blackouts at some substations within several seconds [5]. Therefore, data generated at the minute level and even the second level are desirable [5]–[7]. Based on the number of housing units in New York State in 2012, 127.1 TB is needed to store each day's power consumption data [5].

2) Variety: Many structured and unstructured data from multiple sources and categories are relevant to energy generation, energy planning, distribution, etc. In addition to the household appliance-level energy consumption data generated by the Advanced Metering Infrastructure (AMI) and smart meters, other data, such as renewable energy, weather, and market data, are collected for the optimal operation of the system.

B. Motivation

The big data nature of smart energy poses new challenges in data analytics and security that conventional technologies cannot deal with. Recently, much research effort has been made to address the challenges of energy big data analytics and security. Energy big data analytics and security is a very broad area involving large distributed infrastructure; big data generation, transmission, storage, sharing, and processing; and security and privacy. In addition to the common challenges of big data analytics and security, energy big data analytics adds another dimension of difficulty: the unique factor of tight cyber-physical coupling. A systematic and comprehensive survey of the new findings will help track the latest developments in the field, and this is the main motivation of this paper. Although several surveys on energy big data analytics exist, none has been found that incorporates the security and privacy issues of energy big data analytics. Our survey attempts to fill this gap.

C. Main Contributions

In this paper, we present a survey reflecting the latest developments in the field. Our contributions are manifold:
• It provides the first comprehensive survey covering both energy big data analytics and its security in an integrated framework.
• It proposes several well-designed taxonomies for energy big data analytics and security, which can help in understanding the intriguing relationships among many variables and concepts in the field.
• We discuss several important lessons learned from previous research activities. For example, the power system community has a different security concept from the cyber security community, while conventional cryptography in the cyber security community does not consider real-time factors or the tight cyber-physical coupling in advanced power systems such as smart grids. Such parallel and unconverged security research activities leave severe security loopholes in the gap between them. We also suggest the adoption of an expanded dependability framework to close this gap.
• We provide and discuss open research questions for future research directions.

The rest of this paper is organized as follows. In Section II, an overview is given of the challenges of energy big data analytics and security/privacy. In Section III, an overall taxonomy of energy big data analytics is presented, and various algorithms and schemes for energy big data analytics are classified under this taxonomy. In Section IV, we discuss energy big data infrastructure security from the cyber-physical aspect. In Section V, the latest developments in energy big data security and privacy are surveyed and summarized from the data-driven aspect. Section VI discusses data-driven schemes for resilient smart grid operations. Open research questions for future research directions are discussed in Section VII. Section VIII concludes the paper.

II. CHALLENGES OF ENERGY BIG DATA ANALYTICS AND SECURITY

In this section, an overview is presented of the challenges of energy big data analytics and security/privacy. A taxonomy of these challenges is shown in Fig. 1.

Fig. 1. Taxonomy of energy big data challenges.

A. Scalable and Interoperable Computing Infrastructure

An SG is a highly distributed system. A huge amount of data is collected from every corner, including energy generation, distribution, renewable-energy-powered vehicles, and smart meters. The data include dynamic streaming and non-streaming data as well as structured and unstructured data. There is also a constant flow of data, e.g., machine to machine and machine to human. Storing, sharing, and processing such data is very challenging, so a scalable and interoperable computing infrastructure is needed.

B. Real-Time Big Data Intelligence

A report indicates that overlapping peak demand from individual homes may cause blackouts at some substations within several seconds [5]. Therefore, real-time decisions are essential for both system operation and real-time pricing. Intelligent decision making needs to process both current and historical data. Given the huge volume and high variety of the data, processing it is challenging enough; under real-time constraints, designing new algorithms that can provide real-time intelligence from such big data becomes extremely challenging.

C. Big Data Knowledge Representation and Processing

Big data analytics requires new machine learning theory and artificial intelligence. It is well known that the processes and outputs of machine learning and artificial intelligence lack intuitive physical interpretation [8]. Therefore, it is important to fill this gap by providing suitable knowledge interpretation, so that sound decisions can be made based on the intelligence derived from machine learning and artificial intelligence. This task is challenging due to the big data nature of smart energy data.
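To make the Volume and Velocity characteristics of Section I-A concrete, and to show why the storage challenge of Section II-A and the real-time challenge of Section II-B grow together, a back-of-envelope estimate helps. The meter count $N$, sampling interval $\Delta t$, and per-reading record size $b$ below are hypothetical values chosen for illustration; they are not the assumptions behind the 127.1 TB figure reported in [5].

$$
\begin{aligned}
V_{\text{day}} &= N \cdot \frac{86\,400~\text{s}}{\Delta t} \cdot b, \qquad R = \frac{N \cdot b}{\Delta t},\\[4pt]
N = 10^6,\; b = 100~\text{B},\; \Delta t = 3600~\text{s}: \quad V_{\text{day}} &= 10^6 \cdot 24 \cdot 100~\text{B} = 2.4 \times 10^{9}~\text{B} \approx 2.4~\text{GB},\\
N = 10^6,\; b = 100~\text{B},\; \Delta t = 1~\text{s}: \quad V_{\text{day}} &= 10^6 \cdot 86\,400 \cdot 100~\text{B} = 8.64 \times 10^{12}~\text{B} \approx 8.64~\text{TB},\\
R &= 10^{8}~\text{B/s} = 100~\text{MB/s}.
\end{aligned}
$$

Moving from hourly to second-level sampling multiplies both the daily volume $V_{\text{day}}$ and the sustained ingestion rate $R$ by 3600, which is why volume and velocity are treated as coupled challenges.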

TABLE I
SUMMARY OF SMART GRID DATA

D. Big Data Security and Privacy

Smart energy data contain individual users' private information, which must be protected under various legal regulations [9], [10]. The data can also contain sensitive information about an organization. More importantly, such data can be used to make decisions affecting the safe operation of critical infrastructure. Therefore, security and privacy are important issues. However, they are also very challenging to address due to the big data nature of smart energy data, the tight cyber-physical coupling, and the distributed and open environment of the infrastructure.

III. ENERGY BIG DATA ANALYTICS: OVERALL TAXONOMY

A. Overview of Energy Big Data Analytics

Load modeling and load forecasting are two major energy big data intelligence applications. Load modeling is essential to understanding the behavior of individuals and of the system in achieving efficient energy management. Load forecasting comprises generic load forecasting (long term and medium term) and short-term forecasting. The former is useful for system capacity planning and related planning tasks, while the latter is useful in many areas, including power distribution, demand response, and pricing [11], [12].
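Short-term load forecasting, described above as the basis for distribution, demand-response, and pricing decisions, can be illustrated with a minimal similar-day style forecaster. This is a sketch under stated assumptions, not a method from the surveyed works: each past day is summarized by a hypothetical feature tuple (for example mean temperature and a weekend flag), similarity is plain squared Euclidean distance on those features, and the forecast is the hour-by-hour mean of the `k` most similar days. The function name `similar_day_forecast` is illustrative.

```python
from statistics import mean
from typing import Sequence

# A historical day: (feature tuple, hourly load profile). The features are
# hypothetical, e.g. (mean temperature in degrees C, 1 if weekend else 0).
Day = tuple[Sequence[float], Sequence[float]]


def similar_day_forecast(history: list[Day], target_features: Sequence[float], k: int = 3) -> list[float]:
    """Forecast a day's hourly load as the hour-wise mean of the k most similar past days."""
    if not history or k < 1:
        raise ValueError("need at least one historical day and k >= 1")

    def sq_distance(features: Sequence[float]) -> float:
        # Squared Euclidean distance between a past day's features and the target day's.
        return sum((a - b) ** 2 for a, b in zip(features, target_features))

    nearest = sorted(history, key=lambda day: sq_distance(day[0]))[:k]
    hours = len(nearest[0][1])
    return [mean(profile[h] for _, profile in nearest) for h in range(hours)]


if __name__ == "__main__":
    # Hypothetical 3-hour load profiles (kW) for four past days.
    history = [
        ((21.0, 0), [1.0, 2.0, 3.0]),
        ((19.0, 0), [3.0, 4.0, 5.0]),
        ((30.0, 1), [10.0, 10.0, 10.0]),
        ((20.5, 0), [2.0, 3.0, 4.0]),
    ]
    print(similar_day_forecast(history, (20.0, 0)))  # [2.0, 3.0, 4.0]
```

The hot weekend day is far from the target in feature space and is excluded, so the forecast averages only the three mild weekdays. Practical forecasters would normalize features, weight recent days more heavily, and correct for load growth.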

Fig. 2. Overall taxonomy of energy big data analytics.

Due to its real-time nature, short-term forecasting is the most challenging. Several models have been developed for energy load forecasting over big data [12]–[22]. A cost-effective method is the penalized linear regression-based MapReduce algorithm [22], in which MapReduce significantly reduces the training dataset for the penalized linear regression. The framework of the similar-day approach, used in conventional short-term forecasting, also appears useful for short-term forecasting over energy big data [17], [18]. Energy big data analytics is a very broad area, covering big data generation infrastructure, big data computing platforms/architectures, big data intelligence applications, big data intelligence algorithms and tools, and big data security and privacy. To deal with such a complicated environment, a well-structured analysis framework is desirable. In the next subsection, a taxonomy is designed to provide such a framework.

B. Overall Taxonomy

To understand the issues related to energy big data analytics, it is helpful to know what types of data an SG can encounter. There are many different kinds of data related to energy data analytics, and it is impossible to provide a complete list here. Based on the power system application data dictionary [23], we provide a summary of various smart-grid-relevant energy data in Table I. The classification is based on the data generation source.

To provide a systematic analysis of the techniques related to energy big data analytics, we need a structure or classification for this vast number of techniques. A taxonomy is an effective tool for providing structure in a distributed and complicated environment. Designing a good taxonomy is non-trivial, as it needs to meet the seven principles of taxonomy design [2]. Following these seven principles [2], we design a taxonomy for overall energy big data analytics, as shown in Fig. 2.

In Fig. 2, energy big data analytics is divided into three distinct categories: (i) energy big data architecture/platform, (ii) energy big data intelligence, and (iii) energy big data security and privacy. Unlike the common perception that energy big data analytics is only about energy big data intelligence, our proposed taxonomy places these three components under an integrated framework. This is because an optimal energy big data intelligence scheme is tightly coupled with how energy big data is stored, accessed, and communicated. Also, as energy data involve human privacy and system security, energy big data intelligence schemes have to address privacy and security. For example, the best platform for energy big data intelligence is cloud computing in which data are encrypted. Data intelligence over such encrypted data has to tailor its design to comply with the underlying privacy and security protection mechanisms.
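The penalized linear regression-based MapReduce approach to short-term forecasting [22], discussed in Section III-A, also illustrates the point just made: an intelligence scheme is shaped by how the data are stored and accessed. The sketch below is not the algorithm of [22]. It assumes ridge (L2) penalization and shows the standard sufficient-statistics pattern: each map task compresses its partition of (feature vector, load) rows into the small matrices $X^\top X$ and $X^\top y$, the reduce step adds them, and the driver solves $(X^\top X + \lambda I)\beta = X^\top y$. The partition layout, features, and penalty $\lambda$ are hypothetical, and Python's `map`/`functools.reduce` stand in for a real MapReduce runtime.

```python
from functools import reduce

Row = tuple[list[float], float]  # (feature vector x, observed load y)


def map_partition(rows: list[Row]) -> tuple[list[list[float]], list[float]]:
    """Map task: compress one non-empty partition into (X^T X, X^T y)."""
    p = len(rows[0][0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in rows:
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def reduce_stats(a, b):
    """Reduce task: sufficient statistics from different partitions simply add."""
    (xtx_a, xty_a), (xtx_b, xty_b) = a, b
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(xtx_a, xtx_b)]
    xty = [u + v for u, v in zip(xty_a, xty_b)]
    return xtx, xty


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gauss-Jordan elimination with partial pivoting for the small p x p system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_mapreduce(partitions: list[list[Row]], lam: float) -> list[float]:
    """Driver: combine per-partition statistics, then solve (X^T X + lam*I) beta = X^T y."""
    xtx, xty = reduce(reduce_stats, map(map_partition, partitions))
    for i in range(len(xty)):
        xtx[i][i] += lam  # L2 penalty; for simplicity the intercept is penalized too
    return solve(xtx, xty)


if __name__ == "__main__":
    # Hypothetical data: load = 1 + 2 * t, with a constant intercept feature first.
    rows = [([1.0, float(t)], 1.0 + 2.0 * t) for t in range(6)]
    partitions = [rows[:2], rows[2:4], rows[4:]]
    print([round(c, 6) for c in ridge_mapreduce(partitions, lam=0.0)])  # [1.0, 2.0]
```

Only $p \times p$ statistics cross the network, regardless of how many meter readings each partition holds, which is the sense in which the map step "reduces the training dataset".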

This overall taxonomy shows the intriguing relations among the various components of energy big data analytics. For example, the Control System Aspect, the Cyber Security Aspect, and energy-big-data-oriented anomaly detection are tightly coupled with Data-driven Resilient SG Management. Such knowledge is helpful for understanding the issues in energy big data security and privacy, which are the focus of the remaining sections.

IV. ENERGY BIG DATA INFRASTRUCTURE SYSTEM SECURITY: CYBER-PHYSICAL ASPECT

Smart energy data contain companies' commercial information and involve end users' privacy. An SG, as an open critical infrastructure, is vulnerable to cyber-attacks, which could lead to catastrophic disasters. In the energy sector, the concept of system security differs from the one in the cyber security community. Although energy system security and cyber security are both integral parts of energy infrastructure security, little research effort has been made to consider them in a coherent framework. In this section, we discuss energy infrastructure security methods in a coherent framework and reveal lessons learned from the prior neglect of such coherence.

In the energy sector, security is defined as the ability of a power system to withstand sudden disturbances [24], [25]. Traditionally, such security is concerned with designing a system that can provide acceptable service under sudden disturbances caused by equipment faults and/or natural disasters. Such design relies heavily on a technique called contingency analysis [25]. Contingency analysis refers to a systematic system security assessment, which often depends on evaluating the state estimator against the possible occurrence of credible contingencies to determine whether steady-state operating limits would be violated. Traditionally, credible contingencies are most likely to arise from equipment faults and natural disasters. Therefore, most relevant theories and models are based on the occurrence probability distribution of random faults and natural disasters. Unfortunately, such energy systems are unlikely to withstand cyber-attacks. Instead of following the natural random distribution of equipment faults/failures, cyber-attacks usually target the most critical components of the energy system and are coordinated, generating cascading failures that lead to far more severe damage than traditional security system design can cope with. Security that incorporates tight cyber-physical coupling has become an exciting new research paradigm, and with the emergence of the smart grid it will become a pressing research issue.

Recently, some developments have been made to extend power system security to cover cyber-attacks as disturbances [26]–[29]. These works focus on two important aspects of smart grid security: (i) a security framework covering the cyber aspect, and (ii) cyber-physical coupling modeling.

Security framework covering the cyber aspect: Traditional power system security design is based on the properties of purely physical power system components such as relay devices and switches. Random fault/failure properties have played a pivotal role in system security assessment and resilient mechanism design. For emerging smart grid systems, it is essential to develop a new security framework that can cover the additional factors introduced by ICT. In [26], a distributed agent-based security framework is proposed to deal with possible system damage caused by cyber-attacks. The proposed framework uses a peer-to-peer communication architecture, a reputation-based trust management scheme, and a data retransmission scheme to detect possible cyber-attacks. Although such frameworks incorporate ICT components into security, they mostly rely on a high-level abstraction of the cyber impact, i.e., reputation, to detect possible cyber-attacks. Such coarse generalization tends to produce overly conservative results, leading to inefficiency. A more accurate description of the cyber-physical interaction will help produce optimal results.

Cyber-physical coupling modeling: The most unconventional security feature of smart grids is the tight cyber-physical coupling. We make the following two observations:
• Real-time factor: Traditional power system security involves a real-time factor, while traditional cryptography does not.
• Cyber-physical interaction: Neither traditional power system security nor cryptography considers such interactions. Yet, in reality, cyber activities can significantly impact physical processes, and the characteristics of a physical process can also be used to detect cyber-attacks [2].

Therefore, a good understanding of the tight cyber-physical coupling is essential for smart grid infrastructure security, and some efforts have been made in this respect. In [28], cause-effect relationships of cyber-physical systems are used to estimate the impact of cyber-attacks, with a graph theory approach used to model these relationships. Although cause-effect relationships are very important for understanding the characteristics of cyber-physical coupling, they capture only its qualitative aspect, which is not sufficient: the physical operation of a power system also depends on the real-time quantitative relationships of the cyber-physical coupling.

In [27], a security-oriented stochastic risk index, CPINDEX, is proposed to measure the security level of the underlying cyber-physical settings in the smart grid. This is the first comprehensive attempt to describe the quantitative relationships of the cyber-physical coupling and their impact on smart grid system security. CPINDEX uses system logs and other information to estimate the security index by means of belief propagation algorithms. An interesting security measure, named the cyber-physical index, is proposed as follows:

$$CP(s) = \sum_{\alpha \in A(s)} CI_{PC}(\alpha)\,\Pr\!\left(\mathrm{root}(CT_\alpha) = 0 \mid s\right), \tag{1}$$

where $s$ is the system security state, $\alpha$ represents a critical asset, $CI_{PC}(\alpha)$ is the physical contingency ranking centrality index, and $CT_\alpha$ represents the Boolean-expression consequence trees related to low-level detectable incidents. $A(s)$ refers to all the assets under the control of a physical equipment, and the probability that asset $\alpha$ is affected by the attack at the current network security state $s$ is represented by $\Pr(\mathrm{root}(CT_\alpha) = 0 \mid s)$.
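As a small numeric illustration of (1), the sketch below evaluates $CP(s)$ as the centrality-weighted sum of per-asset compromise probabilities. The asset names, centrality values $CI_{PC}(\alpha)$, and probabilities $\Pr(\mathrm{root}(CT_\alpha) = 0 \mid s)$ are hypothetical inputs; in [27] the probabilities are estimated by belief propagation over system logs, which is not modeled here. For contrast, the sketch also reports the largest single weighted term, one possible worst-case view that is our assumption and not part of [27].

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    centrality: float     # CI_PC(alpha): physical contingency ranking centrality (hypothetical value)
    p_compromised: float  # Pr(root(CT_alpha) = 0 | s), assumed to be already estimated


def cp_index(assets: list[Asset]) -> float:
    """Eq. (1): sum over alpha in A(s) of CI_PC(alpha) * Pr(root(CT_alpha) = 0 | s)."""
    return math.fsum(a.centrality * a.p_compromised for a in assets)


def worst_weighted_term(assets: list[Asset]) -> float:
    """Largest single term of the sum (an illustrative worst-case view, not from [27])."""
    return max((a.centrality * a.p_compromised for a in assets), default=0.0)


if __name__ == "__main__":
    # Hypothetical assets A(s) controlled by one physical equipment.
    a_s = [
        Asset("substation-breaker", 0.9, 0.1),
        Asset("feeder-relay", 0.4, 0.5),
        Asset("capacitor-bank", 0.7, 0.2),
    ]
    print(f"CP(s) = {cp_index(a_s):.2f}")                # CP(s) = 0.43
    print(f"max term = {worst_weighted_term(a_s):.2f}")  # max term = 0.20
```

Because $CP(s)$ sums probability-weighted impacts, a highly central asset with a low compromise probability can contribute less than a moderately central asset that is likely compromised, as the feeder relay does here.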

This index can effectively measure the severity of the damage/risk when a specific physical incident happens due to cyber-attacks. We make the following observations:
• The system security state is not a well-defined concept and is hard to describe quantitatively. Security risk could be a better mechanism.
• Mathematically, (1) represents the expectation of the impact on the assets controlled by the same physical equipment. In practical applications, the maximum impact on the system is of greater concern.

In summary, we make the following observations:
• Most system security work is done within the control engineering community, where the major research focus is on physical processes, and many conventional cyber security issues, e.g., confidentiality, are seldom considered.
• The system security definition is very similar to the concept of reliability in the cyber security domain: under the impact of cyber-attacks on command signals, system state, etc., the system can maintain its service quality within a tolerable error range [30]. The dependability framework can integrate these two security definitions seamlessly [30].

V. ENERGY BIG DATA SECURITY AND PRIVACY: DATA-DRIVEN ASPECT

In the domain of cyber security, security is defined and characterized as follows [30]:
• Reliability: Even under disturbances from faults and cyber-attacks, the system's correct service can be maintained within a certain level.
• Confidentiality: The property that data or information is not made available to unauthorized persons or processes. In the proposed dependability framework [30], it refers to the property that unauthorized persons or processes cannot observe the values/contents of the sensitive variables of the relevant systems.
• Availability: Readiness for correct service. Correct service is defined as delivered system behavior that is within the error tolerance boundary.
• Integrity: Absence of malicious external disturbance that drives the system output away from its desired service.
• Maintainability: Ability to undergo modifications and repairs.
• Authenticity: Ability to provide services with provable origin.
• Non-repudiation: Services provided cannot be disclaimed later.

Although many security solutions have been proposed for SGs, most of them are not big data oriented. Recently, new developments have been made in energy big data security and privacy [31]–[36]. These new achievements cover three important aspects of energy big data security: (i) big-data-oriented cryptosystems [31], [32], (ii) big-data-oriented privacy-preserving intelligent financial applications [33], and (iii) big-data-oriented anomaly detection [33]–[35], [37].

A. Big Data Oriented Cryptosystems

Cryptography is the most fundamental component of a security system. As an SG has a hierarchical structure, its big-data-oriented cryptosystems normally focus on individual hierarchical levels. Existing energy-big-data-oriented cryptosystems can be classified into the following three categories: (i) AMI- and smart-meter-oriented big data privacy and security, (ii) scalable privacy-preserving data-sharing and authentication schemes for overall smart grids, and (iii) smart-grid-big-data-oriented scalable public-key certificate revocation schemes.

1) AMI and Smart Meters Oriented Big Data Privacy and Security: AMI and smart meters are major components of big data generation, and they provide the major data source for intelligent applications. Data aggregation mechanisms can effectively reduce the data size for further processing and address the privacy issue. Although much research effort has been made on how to perform data aggregation, less effort has been made on making data aggregation and encryption scalable. Recently, some progress has been made by reducing the computation load of real-time smart meter data encryption [38] and by providing scalable encryption [39], [40]. In [38], an online/offline attribute-based encryption (ABE) scheme is proposed, based on J. Hur's ABE scheme with hidden policy. Data privacy and policy privacy are achieved through the properties of J. Hur's hidden-policy ABE scheme. In addition, the computational load of encryption is significantly reduced by splitting the encryption algorithm into online and offline phases, so that the expensive encryption operations can be delegated to the offline phase. This effectively reduces the real-time computing demand, which relates to the velocity aspect of big data.

In [39], a privacy-enhanced data aggregation scheme against internal attackers in the smart grid is proposed. In this scheme, electricity suppliers can learn the current energy usage of each neighborhood to arrange energy supply and distribution without knowing the individual electricity consumption of each user. Secure batch verification and formal proofs are provided. This is the first such scheme against internal attackers.

In this scheme, the aggregator and users generate their own public-private key pairs, and then the offline trusted third party (TTP) sends blinding factors to the aggregator and to each user. End user $U_i$ collects his/her power consumption reading $m_i$ from his/her own smart meter and computes a ciphertext $CT_i = g_0^{m_i}\,(H_2(t)\,h)^{r_i}$ and a signature $\sigma_i$. Upon receiving the signature-ciphertext pairs from all users, the aggregator can compute the aggregated energy usage of the neighborhood, $\sum_{i=1}^{n} m_i$, by taking the discrete logarithm of $g_0^{\sum_{i=1}^{n} m_i}$. This is feasible because the total power usage within a neighborhood over a regular interval is not a large number, so the computation can be completed in time polynomial in the range of the sum using Pollard's lambda method.
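The blinding-and-discrete-log idea behind the aggregation scheme of [39] can be sketched in Python under simplifying assumptions that are ours, not those of [39]: a toy group (the order-1019 subgroup of quadratic residues modulo the safe prime 2039, far too small for real security), ciphertexts of the simplified form $g_0^{m_i} H_2(t)^{r_i}$, a hypothetical hash-to-group function `hash_to_group` standing in for $H_2$, blinding exponents $r_i$ that the TTP chooses to sum to zero modulo the group order, and no signatures or batch verification. Because the blinding terms cancel in the product, the aggregator obtains $g_0^{\sum m_i}$ and recovers the sum by a discrete logarithm over a small range. For brevity the sketch uses baby-step giant-step instead of Pollard's lambda method; both solve the interval discrete-log problem in time proportional to the square root of the range, but lambda needs only constant memory.

```python
import hashlib
import math
import secrets

# Toy parameters (insecure, illustration only): P = 2Q + 1 is a safe prime and
# G0 = 4 generates the subgroup of quadratic residues, which has prime order Q.
P, Q, G0 = 2039, 1019, 4


def hash_to_group(t: str) -> int:
    """Hypothetical stand-in for H2: map time slot t into the order-Q subgroup by squaring."""
    x = int.from_bytes(hashlib.sha256(t.encode()).digest(), "big") % (P - 1) + 1
    return pow(x, 2, P)


def ttp_blinding_factors(n: int) -> list[int]:
    """Hypothetical TTP step: n random exponents that sum to 0 mod Q."""
    r = [secrets.randbelow(Q) for _ in range(n - 1)]
    r.append((-sum(r)) % Q)
    return r


def encrypt_reading(m: int, r: int, t: str) -> int:
    """User side (simplified): CT_i = g0^{m_i} * H2(t)^{r_i} mod P."""
    return pow(G0, m, P) * pow(hash_to_group(t), r, P) % P


def aggregate(ciphertexts: list[int]) -> int:
    """Aggregator: the blinding terms cancel, leaving g0^{sum of m_i} mod P."""
    acc = 1
    for c in ciphertexts:
        acc = acc * c % P
    return acc


def bsgs(target: int, bound: int) -> int:
    """Baby-step giant-step: find x in [0, bound] with G0^x = target (mod P)."""
    m = math.isqrt(bound) + 1
    baby = {pow(G0, j, P): j for j in range(m)}
    giant = pow(G0, -m, P)  # G0^{-m} mod P (negative exponents need Python 3.8+)
    y = target
    for i in range(m + 1):
        if y in baby:
            return i * m + baby[y]
        y = y * giant % P
    raise ValueError("aggregate lies outside the search bound")


if __name__ == "__main__":
    readings = [120, 75, 230, 64, 180]  # hypothetical household readings for one slot
    slot = "2016-05-09T10:00"
    r = ttp_blinding_factors(len(readings))
    cts = [encrypt_reading(m, ri, slot) for m, ri in zip(readings, r)]
    print(bsgs(aggregate(cts), Q - 1))  # 669
```

Each individual ciphertext hides $m_i$ behind the factor $H_2(t)^{r_i}$, so the aggregator learns only the neighborhood total (669 here). The search bound must stay below the group order for the recovered sum to be unique, which is why a real deployment needs a much larger group while keeping the plausible range of the total small.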

can infer a persons meter reading from the knowledge of the property of the identify-based ring signature can eliminate
persons presence or absence in the property. In [42], a new the process of certificate verification. The property of for-
attack has been identified and formalized. The attack could ward security can enjoy the following advantage: if a secrete
exploit about the presence or absence of a specific person key of a user has been compromised, all previous gener-
to infer his meter readings via human-factor-aware differen- ated signatures including this user still remain valid. This
tial aggregation (HAD). This attack could not be addressed property is very useful for large scale data sharing SG sys-
by existing privacy-preserving aggregation protocols proposed tem because it is infeasible to request all data owners to
for smart grids. In order to address this problem, two new re-authenticate their data when one user has been compro-
protocols, including basic scheme and advanced scheme, have mised. In [31], a secure cloud computing based framework for
been proposed to resist the HAD attack [42]. big data information management in smart grids is proposed.
Wan et al. [40] show that a recent proposed key management The proposed framework is a hierarchical structure consist-
scheme for AMI suffers from the desynchronization attack ing of multiple interconnected regional clouds. Four types of
and is not scalable. To address these issues, a new scalable services are provided via this cloud computing architecture:
key management scheme (SKM) based on the combination of (i) Information storages. They store all smart grid informa-
identity-based cryptosystem and efficient key tree technique tion received from intelligent devices such as smart meters.
is proposed [40]. The complexity of the proposed SKM is (ii) General user services. They supply all services that an
O(log n) in each aspect of computation and communication, energy consumer will need. (iii) Control and management ser-
which is scalable in terms of the number of smart meters n. vices, and (iv) Electricity distribution services. Based on this
Although it provides a solution for scalable key management architecture, an identity-based cryptography scheme is pro-
for AMI and associated smart meters, it does not address the posed for authentication of data and nodes. Confidentiality
issue of secure communication and /or secure data sharing and nonrepudiation security features have also been supported.
with other parts of the smart grid, e.g., control centers etc. It The proposed security scheme provides two related but differ-
is desirable to develop a framework that can cover the whole ent constructions for confidentiality service and authentication
smart grid operation ranges. This issue can be addressed by service.
cryptosystems designed for the overall system.

2) Scalable Privacy-Preserving Data-Sharing and Authentication Schemes for Overall Smart Grids: A smart grid is a large distributed system with a hierarchical structure. Most existing research works consider only the lowest-level components, e.g., smart meters, which leaves the system exposed to new attacks that span different hierarchical levels [43]. Recently, some developments have addressed this issue by covering higher hierarchical levels. In [44], the SmartAnalyzer security analysis tool is proposed for analyzing threats in AMI. It provides formal modeling of the AMI configuration and systematic diagnosis of unusual smart meter traces. The accuracy and scalability of the tool are evaluated on an AMI testbed and on various synthetic test networks. In [42], the scope and functionalities of a smart grid are introduced, covering its automation and control system and its communications. The paper presents a general SCADA cyberattack process and proposes a conceptual layered framework for protecting power grid automation systems against cyberattacks without compromising the timely availability of control and signal data. An on-site test of the developed security prototype system is also discussed.

In addition to the work on directly handling smart meter data, research efforts have been made to design scalable cryptosystems for the overall smart grid [45]-[49]. Homomorphic encryption has been used to design a distributed access control scheme for smart grid datasets that achieves both privacy-preserving data aggregation and access control [49], and a scalable privacy-preserving demand response scheme with adaptive key evolution [47]. Recently, more general SG-oriented scalable cryptosystems have been developed [48]. In [48], an identity-based ring signature scheme with forward security has been proposed. The

a) Confidentiality service:
Key generation phase: The trusted external party PKG (private key generator) generates two groups $G_1$ and $G_2$ of prime order $q$, a computationally efficient pairing $e: G_1 \times G_1 \rightarrow G_2$, a generator $g$, and hash functions $H_i$. It picks a random key $s \in \mathbb{Z}_q$ and computes its public key $u = g^s$. It then sets up the master key $mk = s$ for the top cloud and distributes the set of public parameters params $= (G_1, G_2, e, g, u, H_1, H_2)$ to the top and regional clouds and the end users. Upon receiving the identities $TC$, $IS$, $SerA$, and $EU$ from the top cloud, the information storage unit in the regional cloud, the service in the regional cloud, and the end user, respectively, the PKG calculates their corresponding private keys $K_{TC} = H_1(TC)^s$, $K_{IS} = H_1(IS)^s$, $K_{SerA} = H_1(SerA)^s$, and $K_{EU} = H_1(EU)^s$ and returns them.

Encryption phase: To encrypt a message $M$ intended for the top cloud, an entity in the regional cloud selects a random $r \in \mathbb{Z}_q$ and computes the ciphertext $C_1 = g^r$, $C_2 = M \cdot e(u, H_1(TC))^r$, which it sends to the top cloud. On receiving the ciphertext, the top cloud recovers the message as $M = C_2 / e(C_1, K_{TC})$. This works because, by bilinearity, $e(C_1, K_{TC}) = e(g^r, H_1(TC)^s) = e(g^s, H_1(TC))^r = e(u, H_1(TC))^r$. Encryption and decryption can be performed similarly between an end user and the information storage unit in the regional cloud, between the proxy and the service, etc.

b) Authentication service: The key generation process is similar to that of the identity-based encryption (IBE) scheme used in the confidentiality service. A signature component is added for the authentication service as follows. For a message $M$, a cloud picks a random number $r \in \mathbb{Z}_q$ and computes its signature as $\sigma_1 = g_M^r K_{TC}$, $\sigma_2 = g_0^r$, where $g_0$ denotes the public generator $g$ (so that $u = g_0^s$), $g_1 = H_1(TC) \in G_1$, and $g_M = H_1(TC, M) \in G_1$. Any party can verify this signature using the cloud's public key by checking $e(g_0, \sigma_1) = e(u, g_1)\, e(\sigma_2, g_M)$.

3) Smart Grid Big Data Oriented Scalable Public-Key Certificate Revocation Schemes: Most of the cryptosystems for
energy big data are based on public key infrastructure (PKI) because of its advantages in access control, nonrepudiation, etc. However, PKI schemes require public key certificates that bind a user to the associated public key, and a certificate must be revoked when it expires, when the security policy changes, and/or when a key or a node has been compromised. To address this issue, scalable and efficient SG-oriented PKI certificate revocation schemes have been proposed [50]-[52]. In [52], a Bloom filter based scheme is proposed for efficient certificate revocation in large-scale AMI networks. The scheme is built on a Merkle tree, which enables the gateway to provide proof of certificate revocation without contacting the certificate authority, achieving significant overhead savings. In [50], a scheme based on compressed Certificate Revocation Lists (CRLs) is proposed for pseudonymous public key infrastructure. The scheme is shown to be secure, and the size of the certificate revocation list is linear in the number of revoked certificate series. This scheme has been applied to vehicle-to-grid communication [51].

The cryptosystems discussed above are mainly concerned with data security and node authentication. They are not suitable for other important security-related energy big data applications, such as privacy-preserving intelligent applications and energy big data oriented anomaly detection, which need different security mechanisms.

4) Big Data Oriented Privacy-Preserving Intelligent Applications: For privacy-preserving intelligent applications, the big challenge is how to find useful information in potentially protected/encrypted big data. Efficiency/scalability is another hard constraint. Privacy-preserving pricing is a typical application of big energy analytics. In [53], a privacy-preserving usage-based dynamic pricing (UDP) scheme is proposed for the smart grid. The scheme has four phases: (i) the utility company first defines the usage threshold $e_m$ of the local power grid and the dynamic price function $f(\cdot)$, and selects random secrets for each community and its gateway $C$; (ii) customers report their electricity usage per time slot in encrypted form to their community gateway $C$, where decryption is conducted to obtain these data, and the gateway sends a price indicator back to the customer; (iii) the community gateway $C$ forwards the received electricity usage to the utility company; (iv) the utility company recovers the electricity usage and computes the actual electricity price for each customer. The scheme is built on homomorphic encryption, which protects customer privacy, and it achieves scalability and efficiency through this distributed encryption process.

The above scheme targets a specific SG application in a community environment. It is desirable to develop more generic energy big data intelligent schemes that accommodate various applications in more general environments. One solution is to retrieve general features over encrypted energy big data [32], [54]-[56]. In [54], a privacy-preserving deep computation model is proposed. The model uses the Brakerski-Gentry-Vaikuntanathan (BGV) encryption scheme to encrypt the private data and then employs cloud servers to perform the high-order back-propagation algorithm on the encrypted data for deep computation model training. The proposed scheme is highly scalable while preserving privacy.

Direct feature retrieval over encrypted big data is very difficult and costly. A more cost-effective approach is to first retrieve all relevant records (raw features) from the big data and then run the relevant intelligent applications on these records. A range query, which retrieves all records whose value for some attribute lies within a given range, is a popular database operation and is very useful for efficient applications over big data. For example, in a smart grid financial application, energy buyers can select energy offered at a reasonable price by using a keyword that specifies the price range. For privacy-preserving queries, the solution is to query an encrypted database. Unfortunately, existing encrypted keyword search schemes for the smart grid auction market cannot support range queries over keywords [56]. Some recent research efforts have addressed this issue [32], [55], [56]. In [56], an SG auction-market-oriented scheme is proposed that supports both range queries and ranked search over encrypted big data. Based on homomorphic Paillier encryption, it aggregates the multidimensional keywords of the buyer and the seller so that the keywords of all sellers can be compared with those of one buyer in a single calculation. In [32], a more generic privacy-preserving range query (PaRQ) scheme is proposed to query encrypted metering data for intelligent financial auditing applications. PaRQ constructs a range query predicate based on hidden vector encryption to encrypt the searchable attributes and the session keys of the encrypted data, and a requester's range is transformed into two query tokens that are used to find the matching query results. More specifically, for a range query
$$p = (a_1 \le x_1 \le b_1) \wedge (a_2 \le x_2 \le b_2) \wedge \cdots \wedge (a_n \le x_n \le b_n),$$
the control center divides $p$ into two parts:
$$p' = (a_1 \le x_1) \wedge (a_2 \le x_2) \wedge \cdots \wedge (a_n \le x_n),$$
$$p'' = (x_1 \le b_1) \wedge (x_2 \le b_2) \wedge \cdots \wedge (x_n \le b_n).$$
By using Boneh and Waters' Hidden Vector Encryption (HVE) predicate encryption [57], constant-size privacy-preserving tokens can be generated, which significantly reduces communication overhead, computation overhead, and response time. Security analysis demonstrates that data confidentiality and query privacy are preserved, and simulation results show that the scheme is scalable and therefore suitable for big data analytics. A major limitation of this scheme is its specific environment setting, in which a central control center is connected to two cloud servers and acts as a proxy for the users and the searcher. A practical SG would require a more general and complex environment, e.g., more distributed cloud servers, multiple control centers, and direct communication between users and clouds.

5) Summary of Energy Big Data Oriented Cryptosystems: Table II provides a summarized comparison of various energy big data oriented cryptosystems. The comparison is made in terms of functionality, including confidentiality, integrity, authenticity, non-repudiation,
and privacy-preserving intelligence; scalability; cost; and data source.

TABLE II
SUMMARY OF ENERGY BIG DATA ORIENTED CRYPTOSYSTEMS

B. Big Data Oriented Anomaly Detection

Although traditional cryptographic methods provide strong solutions for message-based security, they are ineffective against problems that arise from real-time, tight cyber-physical couplings, because conventional cryptography considers neither real-time constraints nor such couplings. On the other hand, a power system obeys known physical laws, so knowledge of the physical process can help detect anomalous behavior that may be related to cyber-attacks and/or energy theft. In [29], a reputation-based trust management system is introduced to address the risks of cyber-attacks that originate from misusing information generated within the smart grid. The risks of cyber-attacks are broad [29]:
• Customer profiling attack: Smart meter data contain private information about users' consumption. They can be used to determine, among many other things, when residential customers are and are not at home, which can be detrimental to individuals if the information is used by thieves.
• IP spoofing attack: It can be used to redirect smart grid information to an adversary's own computer for further analysis. It can also make a smart grid communication node appear non-responsive and faulty, which reduces its assigned trust value in trust-based security mechanisms.
• Man-in-the-middle attack: Such attacks can covertly monitor or manipulate information exchanged between two smart grid communication nodes.
• Denial of service attack: It can prevent the smart grid from providing services.
• System hijacking attack: It can obtain unauthorized remote access to a smart grid node.

Anomaly detection is a powerful mechanism for smart grid fault detection [58], power theft and fraud detection [36], [58], and cyber intrusion detection [37], [58]. Unfortunately, most existing solutions are not big data oriented. Recently, several algorithms have been developed to address this issue [33]-[35], [58]. These algorithms can be classified into the following three categories: (i) general energy big data oriented anomaly detection algorithms applicable to electricity theft, faults, and cyber intrusions; (ii) energy theft detection algorithms; and (iii) physical process based anomaly detection schemes against false data-injection attacks.

1) Generic Energy Big Data Anomaly Detection: In a smart grid, anomaly-based intrusion detection system (IDS) design is far more challenging, and also more critical, because of real-time and cyber-physical interaction factors. For example, a delayed or compromised command signal can cause the physical power system to become unstable. As there could exist multiple sources, e.g., fault, cyber intrusion and
electricity theft, leading to the observed abnormal data of a smart grid, it is not always easy to determine which source is behind the observed abnormal data. Therefore, research efforts have been made to design general energy big data oriented anomaly detection algorithms that are applicable to electricity theft, faults, and cyber intrusions without identifying the source causing the anomaly [33], [34]. In [33], a scalable method is proposed to observe anomalous behavior over energy big data by applying supervised learning classifiers to find anomalous data. It can be used to detect electricity theft, faults, and cyber intrusions. However, this approach relies on some ad hoc intuitions in addressing the big data challenge.

A more rigorous anomaly detection algorithm for energy big data is proposed in [34]: a data-driven scheme for monitoring abnormal events in a smart grid. To address the issue of high-dimensional analysis, a big data architecture is designed based on random matrix theory. A statistic named the mean spectral radius (MSR) is proposed to reflect the correlations of system data across different dimensions, and the smart grid state can be monitored from distributed MSRs estimated locally. It is shown that anomalous events in the smart grid can be detected with the proposed scheme at reasonable cost. More specifically, suppose the data collected from the energy system at time $t_i$ are expressed as a vector $x(t_i)$, and let $x$ represent the collection of all raw data accumulated in the database. The size of $x(t_i)$ is the dimension of the data at time $t_i$, whose upper bound is very large because of the large number of variables in the system. Using the Ring Law of random matrix theory, the smart grid system state can be estimated as the mean of the radii (moduli) of all eigenvalues of the matrix $Z$, which is formed from the data area of interest selected by a split window (e.g., $N$ rows and $T$ columns). If this value deviates significantly between two consecutive time windows, an anomaly is announced.

2) Energy Theft Detection: Energy theft has been a serious problem in the power industry; annual losses in the United States alone are estimated to reach $6 billion [59]. In [59], the means by which an adversary can commit fraud by manipulating AMI systems are investigated. An energy theft attack tree is constructed to explore the various vulnerabilities that enable energy theft, and manipulating the demand information in AMI is found to be the most effective way to steal energy [59].

Energy theft can also be detected via machine learning based anomaly detection. In [60], a multi-sensor energy theft detection framework for AMI is proposed. The scheme, named AMIDS, is an AMI intrusion detection system that fuses meter audit logs of physical and cyber events with consumption data to increase the accuracy of detecting theft-related behavior. One limitation of this approach is the difficulty of collecting reliable physical activity logs for a large-scale smart grid. In [61], a game-theoretic framework is proposed to model the adversarial nature of the electricity theft problem. It considers two environments, i.e., an unregulated monopoly and perfect competition, and can help detect electricity theft by observing power usage behavior. Suppose we use $q_g$ and $\bar{q}_f$ to represent the expected energy usage of genuine and fraudulent customers, respectively. The framework considers the following two hypotheses:
$$
\begin{aligned}
H_g &: Y_1^i, \ldots, Y_K^i \overset{\text{iid}}{\sim} f_g, \quad \mathbb{E}[Y_t^i] = q_g,\\
H_f &: Y_1^i, \ldots, Y_K^i \overset{\text{iid}}{\sim} f_f, \quad \mathbb{E}[Y_t^i] = \bar{q}_f < q_g,
\end{aligned}
\tag{2}
$$
where $y_t^i$, the meter measurement collected at the $i$th customer's meter at time $t$, is a realization of the random variable $Y_t^i$, and $f_g$ and $f_f$ denote the probability density functions of the meter measurements of genuine and fraudulent customers, respectively. Given a threshold $\tau$, the following fraud detection rule is applied to customer $i$:
$$
\sum_{t=1}^{K} y_t^i
\begin{cases}
< \tau, & \text{fraudulent},\\
\ge \tau, & \text{genuine}.
\end{cases}
\tag{3}
$$

Interestingly, like many similar approaches, this game-theoretic framework for energy theft detection is based on the behavior of individual meter readings, so it is not applicable to coordinated attacks. In [62], two types of data attacks are investigated. In the first, an adversary has compromised enough smart meters that the network state becomes unobservable by the control center; a graph-theoretic approach is proposed to characterize the smallest set of attacked smart meters that can cause network unobservability. In the second, the adversary controls only a small number of smart meters; such attacks are examined from a decision-theoretic perspective for both the control center and the adversary.

3) Physical Process Based Anomaly Detection Scheme Against False Data-Injection Attacks: With the emerging smart grid, a new type of cyber intrusion is looming that attempts to alter the power load through the Internet via automatic and distributed software intruding agents [63]. Such attacks can compromise direct load control command signals, demand side management price signals, etc. In [63], a systematic investigation is conducted to identify a variety of practical loads that can be vulnerable to Internet-based load altering attacks. It also provides an overview of potential defense mechanisms, including:
• Protection of command and price signals using (i) private key encryption and message authentication codes and (ii) efficient group key management;
• Protection of smart meters and data centers;
• Load anomaly detection.

In a smart grid, state estimation is used to estimate the power grid state from meter measurement data. This information is used for reliable operation of the smart grid, contingency analysis, optimal power flow, pricing, etc. [64], [65]. Recently, false data-injection attacks that can bypass existing bad data detection techniques have been reported [65]. This discovery has sparked strong interest in exploring various false data-injection attack strategies and countermeasures [64]-[80].

Suppose the state is $x = (x_1, x_2, \ldots, x_n)^T$ and the measurement data are expressed as $z = Hx + e$, where $H$ is related to
network configurations, and $e$ is the measurement error, assumed to be normally distributed with zero mean. Conventional bad measurement data detection checks whether the measurement residual $\|z - H\hat{x}\|$, where $\hat{x}$ is the estimated state, exceeds a predefined threshold [65]. In a basic false data-injection attack, the measurement data received at the control center are $z_a = z + a$, where $a$ is the attack vector. When the adversary chooses $a = Hc$, we have $z_a = H(x + c) + e$. Such attacks are unobservable: the estimator simply returns $\hat{x} + c$, so the residual $\|z_a - H(\hat{x} + c)\| = \|z - H\hat{x}\|$ is unchanged and stays within the bad data detection threshold [65]. In practice, an adversary can compromise only a small set of meters, so the attack vector $a$ is sparse: most of its entries are zero, and its $k$ nonzero entries correspond to the $k$ compromised meters.

In [67], it is shown that the smallest unobservable attack, i.e., an unobservable attack with the smallest number $k$ of compromised meters, can be identified in polynomial time using graph theory. If the adversary can compromise only a much smaller set of meters, i.e., in the weak attack regime, the presence of such attacks can be detected via the generalized likelihood ratio test (GLRT). Similar work can be found in [81], where the problem of finding the minimum number of measurement points to be attacked undetectably is reduced to minimum cut problems on hypergraphs. Most recently, two data-driven strategies for unobservable attacks that do not require detailed knowledge of system parameters have been presented [82]. The first strategy affects the system state by hiding the attack vector directly in the system subspace. The second misleads the bad data detection mechanism so that data not under attack are removed.

The most recent developments along this line are: (i) the centralized sparse attack construction is extended to a distributed framework for both the estimation and the attack problems, and the corresponding optimization is formulated together with a proposed solution [73]; (ii) an optimal data-injection attack strategy is designed that selects the set of meters causing maximum damage, and a temporal detection scheme using an online nonparametric CUSUM change detection mechanism is proposed to detect such attacks [64]; and (iii) new CUSUM-type algorithms based on the generalized likelihood ratio (GLR) are proposed for centralized and distributed cyber-attack detection, with low communication overhead and small detection delay [68].

The above schemes are based on the principles of the power system's physical process. The fundamental cause of false data-injection attacks, however, is the modification of sensed meter data. At the data level, cryptographic authentication mechanisms such as message authentication codes can effectively address data integrity, and some effort has recently been made in this direction. In [72], an anomaly detection mechanism with an integrated watermarking component is designed to address false data injection attacks in smart grids; it can help detect stealthier attacks that involve subtle manipulation of the measurement data. In [70], a polynomial-based compromise-resilient en-route filtering scheme (PREF) is proposed to filter false injected data effectively and to achieve high resilience to the number of compromised nodes without relying on static routes and node localization. Its limitation is the specific application environment of wireless sensor networks, in which filtering relies on the cooperation of sensor nodes during message forwarding. It is not applicable to a meter whose data do not go through this forwarding process, e.g., a meter that is just one hop from the smart grid. In addition, the pre-stored master key mechanism makes key updates costly, and hardware attacks can reveal the keys stored in the hardware. Most recently, a short-term state forecasting-aided method has been proposed to detect general false data injection attacks in smart grids [83]. The idea is to extend the approximate DC model to a more general linear model that can handle both supervisory control and data acquisition (SCADA) and phasor measurement unit (PMU) measurements.

VI. DATA-DRIVEN SCHEMES FOR RESILIENT SG OPERATION: SECURITY-CONSTRAINED OPTIMAL POWER FLOW (SCOPF)

The network state produced by the state estimator can be used for resilient SG operations, including planning, operational planning, pricing, and real-time operation of the system [84]. A severe incident, e.g., a natural disaster or a cyber-attack, can trigger contingency actions, and it is very important that the SG maintain resilient operation. SCOPF is a useful tool for this purpose, as it can determine correct actions/decisions against a set of predefined contingencies. Because of the big data nature of the SG, the SCOPF problem becomes a nonlinear, non-convex, large-scale optimization problem with both continuous and discrete variables, which is very challenging [84]. The SCOPF problem usually has two aspects: (i) SCOPF problem formulation and (ii) SCOPF problem solution, including reducing the size of the SCOPF problem and efficient algorithms for handling the challenges brought about by mixed continuous and discrete variables.

SCOPF problem formulation includes modeling a limited number of post-contingency control actions, determining the minimum number of control actions, determining the sequence of control actions, and handling voltage and transient stability in SCOPF [84]. In [85], an optimization-based approach is proposed for identifying the constraints that are necessary and sufficient to describe the feasible set of an SCOPF problem. The resulting SCOPF problems are shown to be much smaller and can be solved much faster.

Recently, some efforts have been made to incorporate cyber-attacks into the SCOPF problem formulation. In [86], the behavior of the Optimal Power Flow (OPF) algorithm in the presence of false data injection, and the resulting consequences for the system operator, are analyzed; the work characterizes the set of attacks that may lead the operator to apply an erroneous OPF recommendation. In [87], the problem of characterizing the impact of bad data on the real-time locational marginal price (LMP) is considered, and numerical simulations illustrate the worst-case performance for the IEEE 14-bus and 118-bus networks.

SCOPF problem solution includes reducing the size of the SCOPF problem, efficient algorithms for handling challenges brought about by the mixed continuous variables and
discrete variables [84]. For large-scale smart grids, it is impossible to obtain a direct solution of the SCOPF problem because of memory limitations and/or prohibitive computation times [88]. As a result, many indirect solutions have been proposed, such as iterative contingency selection schemes, decomposition methods, network compression, and hybrid methods that combine contingency selection and network compression [88].

VII. OPEN RESEARCH ISSUES

Smart energy big data analytics is a very complex and challenging topic. In addition to sharing many common issues with generic big data analytics, smart energy big data are closely intertwined with physical processes, where data intelligence can have a huge impact on the safe real-time operation of the system. This tight cyber-physical dimension raises many exciting research problems. The following is a list of open research problems:
• Holistic and modular architectures: Existing architectures for smart energy big data analytics are based on generic cloud computing architectures. They are either too general to be implemented or too specific to cover sufficient functionality. A platform based on a holistic and modular energy big data analytics architecture is needed to ensure the widest coverage of issues related to smart energy big data analytics and interoperability among various modules, including new modules.
• Platform incorporating real-time control: Existing platforms for smart energy big data analytics mainly serve data sharing and intelligence, or more precisely, system monitoring and finance-related intelligence. It is desirable to incorporate a real-time control module that can produce real-time physical control signals based on smart energy big data analytics.
• Co-design of smart energy big data analytics and security mechanisms: Existing smart energy big data analytics schemes and security schemes are designed separately, and security functions are mostly afterthoughts. Co-designing smart energy big data analytics and security mechanisms can produce a seamlessly integrated framework that reduces security risk to a minimum.
• Distributed and parallel intelligence: Smart energy big data analytics is experiencing an explosion of data collected from distributed sources. Distributed and parallel intelligence can effectively address this problem and can also significantly reduce raw data accumulation and communication. Existing aggregation or summarization methods can also reduce large volumes of raw data. However, because they target local raw data without considering the overall system objective, they can lose useful information that specific applications would need. A good distributed intelligence algorithm should be built upon a solid theoretical basis to approximate the relevant overall performance indicator. For example, for anomaly detection in a wireless sensor network, distributed local intelligence based on the observations of a single node or several neighboring nodes should be able to estimate the overall sensor network data probability density distribution with a solid theoretical basis [74], [75].

VIII. CONCLUSION

In this paper, we have surveyed the latest developments in the field of energy big data analytics and security/privacy. We have provided comprehensive coverage of energy big data analytics and security/privacy, ranging from energy big data architecture, intelligent applications, cryptosystem design, system security assessment, and cyber intrusion detection to secure financial intelligent applications and electricity theft. In addition to the usual 3Vs challenges of energy big data, we have also covered real-time operation and tight cyber-physical coupling, which are salient features of a smart grid. We have also proposed an energy big data oriented taxonomy for better understanding the complicated and intriguing relations among various components, security issues, and associated solutions. Finally, we have discussed various open research questions and future research directions.

REFERENCES

[1] Ecotricity. The End of Fossil Fuels. Accessed on Jul. 21, 2015. [Online]. Available: https://www.ecotricity.co.uk/our-green-energy/energy-independence/the-end-of-fossil-fuels
[2] J. Hu, H. R. Pota, and S. Guo, "Taxonomy of attacks for agent-based smart grids," IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 7, pp. 1886-1895, Jul. 2014.
[3] N. Bui, A. P. Castellani, P. Casari, and M. Zorzi, "The Internet of energy: A Web-enabled smart grid system," IEEE Netw., vol. 26, no. 4, pp. 39-45, Jul./Aug. 2012.
[4] D. Alahakoon and X. Yu, "Smart electricity meter data intelligence for future energy systems: A survey," IEEE Trans. Ind. Informat., vol. 12, no. 1, pp. 425-436, Feb. 2016, doi: 10.1109/TII.2015.2414355.
[5] Z. Huang, H. Luo, D. Skoda, T. Zhu, and Y. Gu, "E-Sketch: Gathering large-scale energy consumption data based on consumption patterns," in Proc. IEEE Int. Conf. Big Data (Big Data), Washington, DC, USA, 2014, pp. 656-665.
[6] J. Yin, P. Sharma, I. Gorton, and B. Akyoli, "Large-scale data challenges in future power grids," in Proc. IEEE 7th Int. Symp. Service Orient. Syst. Eng. (SOSE), Redwood City, CA, USA, 2013, pp. 324-328.
[7] M. Aiello and G. A. Pagani, "The smart grid's data generating potentials," in Proc. Federated Conf. Comput. Sci. Inf. Syst. (FedCSIS), Warsaw, Poland, 2014, pp. 9-16.
[8] K. L. Wagstaff, "Machine learning that matters," in Proc. 29th Int. Conf. Mach. Learn. (ICML), Pasadena, CA, USA, 2012, pp. 529-536.
[9] D. A. Powner, "Electricity grid modernization: Progress being made on cybersecurity guidelines, but key challenges remain to be addressed," GAO report, United States Gov. Account. Office, Washington, DC, USA, Tech. Rep. GAO-11-117, Jan. 2011. [Online]. Available: http://www.gao.gov/new.items/d11117.pdf
[10] S. Simitis, "From the market to the polis: The EU directive on the protection of personal data," Iowa Law. Rev., vol. 80, no. 3, p. 445, 1994.
[11] S. Bera, S. Misra, and J. J. P. C. Rodrigues, "Cloud computing applications for smart grid: A survey," IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 5, pp. 1477-1494, May 2015.
[12] J. W. Taylor and P. E. McSharry, "Short-term load forecasting methods: An evaluation based on European data," IEEE Trans. Power Syst., vol. 22, no. 4, pp. 2213-2219, Nov. 2007.
[13] M. Frincu, C. Chelmis, M. U. Noor, and V. Prasanna, "Accurate and efficient selection of the best consumption prediction method in smart grids," in Proc. IEEE Int. Conf. Big Data (Big Data), Washington, DC, USA, 2014, pp. 721-729.
[14] S. Aman, Y. Simmhan, and V. K. Prasanna, "Holistic measures for evaluating prediction models in smart grids," IEEE Trans. Knowl. Data Eng., vol. 27, no. 2, pp. 475-488, Feb. 2015.
[15] K. Nose-Filho, A. D. P. Lotufo, and C. R. Minussi, “Short-term multinodal load forecasting using a modified general regression neural network,” IEEE Trans. Power Del., vol. 26, no. 4, pp. 2862–2869, Oct. 2011.
[16] L. Ghelardoni, A. Ghio, and D. Anguita, “Energy load forecasting using empirical mode decomposition and support vector regression,” IEEE Trans. Smart Grid, vol. 4, no. 1, pp. 549–556, Mar. 2013.
[17] T. Senjyu, P. Mandal, K. Uezato, and T. Funabashi, “Next day load curve forecasting using hybrid correction method,” IEEE Trans. Power Syst., vol. 20, no. 1, pp. 102–109, Feb. 2005.
[18] Y. Chen et al., “Short-term load forecasting: Similar day-based wavelet neural networks,” IEEE Trans. Power Syst., vol. 25, no. 1, pp. 322–330, Feb. 2010.
[19] R. E. Abdel-Aal, “Short-term hourly load forecasting using abductive networks,” IEEE Trans. Power Syst., vol. 19, no. 1, pp. 164–173, Feb. 2004.
[20] S. Li, P. Wang, and L. Goel, “A novel wavelet-based ensemble method for short-term load forecasting with hybrid neural networks and feature selection,” IEEE Trans. Power Syst., vol. 31, no. 3, pp. 1788–1798, May 2016.
[21] A. Bracale, P. Caramia, G. Carpinelli, A. R. Di Fazio, and P. Varilone, “A Bayesian-based approach for a short-term steady-state forecast of a smart grid,” IEEE Trans. Smart Grid, vol. 4, no. 4, pp. 1760–1771, Dec. 2013.
[22] W. Lee, B.-W. On, I. Lee, and J. Choi, “A big data management system for energy consumption prediction models,” in Proc. 9th Int. Conf. Digital Inf. Manag. (ICDIM), 2014, pp. 156–161.
[23] R. Christie, “Power systems test case archive,” Dept. Elect. Eng., Univ. Washington, Seattle, WA, USA, 2000. [Online]. Available: http://www.ee.washington.edu/research/pstca/
[24] M. Shahidehpour, F. Tinney, and Y. Fu, “Impact of security on power systems operation,” Proc. IEEE, vol. 93, no. 11, pp. 2013–2025, Nov. 2005.
[25] N. Balu et al., “On-line power system security analysis,” Proc. IEEE, vol. 80, no. 2, pp. 262–282, Feb. 1992.
[26] K. J. Ross, K. M. Hopkinson, and M. Pachter, “Using a distributed agent-based communication enabled special protection system to enhance smart grid security,” IEEE Trans. Smart Grid, vol. 4, no. 2, pp. 1216–1224, Jun. 2013.
[27] C. Vellaithurai, A. Srivastava, S. Zonouz, and R. Berthier, “CPINDEX: Cyber-physical vulnerability assessment for power-grid infrastructures,” IEEE Trans. Smart Grid, vol. 6, no. 2, pp. 566–575, Mar. 2015.
[28] D. Kundur, X. Feng, S. Liu, T. Zourntos, and K. L. Butler-Purry, “Towards a framework for cyber attack impact analysis of the electric smart grid,” in Proc. 1st IEEE Int. Conf. Smart Grid Commun. (SmartGridComm), Gaithersburg, MD, USA, 2010, pp. 244–249.
[29] J. Fadul, K. Hopkinson, C. Sheffield, J. Moore, and T. Andel, “Trust management and security in the future communication-based smart electric power grid,” in Proc. 44th Hawaii Int. Conf. Syst. Sci. (HICSS), 2011, pp. 1–10.
[30] J. Hu, I. Khalil, S. Han, and A. Mahmood, “Seamless integration of dependability and security concepts in SOA: A feedback control system based framework and taxonomy,” J. Netw. Comput. Appl., vol. 34, no. 4, pp. 1150–1159, 2011.
[31] J. Baek, Q. H. Vu, J. K. Liu, X. Huang, and Y. Xiang, “A secure cloud computing based framework for big data information management of smart grid,” IEEE Trans. Cloud Comput., vol. 3, no. 2, pp. 233–244, Apr./Jun. 2014.
[32] M. Wen et al., “PaRQ: A privacy-preserving range query scheme over encrypted metering data for smart grid,” IEEE Trans. Emerg. Topics Comput., vol. 1, no. 1, pp. 178–191, Jun. 2013.
[33] W. Hurst, M. Merabti, and P. Fergus, “Big data analysis techniques for cyber-threat detection in critical infrastructures,” in Proc. 28th Int. Conf. Adv. Inf. Netw. Appl. Workshops (WAINA), Victoria, BC, Canada, 2014, pp. 916–921.
[34] X. He et al., “A big data architecture design for smart grids based on random matrix theory,” IEEE Trans. Smart Grid, 2015, doi: 10.1109/TSG.2015.2445828.
[35] S. Pan, T. Morris, and U. Adhikari, “Developing a hybrid intrusion detection system using data mining for power systems,” IEEE Trans. Smart Grid, vol. 6, no. 6, pp. 3104–3113, Nov. 2015.
[36] C. C. O. Ramos, A. N. de Sousa, J. P. Papa, and A. X. Falcão, “A new approach for nontechnical losses detection based on optimum-path forest,” IEEE Trans. Power Syst., vol. 26, no. 1, pp. 181–189, Feb. 2011.
[37] R. Mitchell and I.-R. Chen, “Behavior-rule based intrusion detection systems for safety critical smart grid applications,” IEEE Trans. Smart Grid, vol. 4, no. 3, pp. 1254–1263, Sep. 2013.
[38] Z. Wang, F. Chen, and A. Xia, “Attribute-based online/offline encryption in smart grid,” in Proc. 24th Int. Conf. Comput. Commun. Netw. (ICCCN), Las Vegas, NV, USA, 2015, pp. 1–5.
[39] C.-I. Fan, S.-Y. Huang, and Y.-L. Lai, “Privacy-enhanced data aggregation scheme against internal attackers in smart grid,” IEEE Trans. Ind. Informat., vol. 10, no. 1, pp. 666–675, Feb. 2014.
[40] Z. Wan, G. Wang, Y. Yang, and S. Shi, “SKM: Scalable key management for advanced metering infrastructure in smart grids,” IEEE Trans. Ind. Electron., vol. 61, no. 12, pp. 7055–7066, Dec. 2014.
[41] W. Jia, H. Zhu, Z. Cao, X. Dong, and C. Xiao, “Human-factor-aware privacy-preserving aggregation in smart grid,” IEEE Syst. J., vol. 8, no. 2, pp. 598–607, Jun. 2014.
[42] D. Wei, Y. Lu, M. Jafari, P. M. Skare, and K. Rohde, “Protecting smart grid automation systems against cyberattacks,” IEEE Trans. Smart Grid, vol. 2, no. 4, pp. 782–795, Dec. 2011.
[43] X. Dong, J. Zhou, and Z. Cao, “Efficient privacy-preserving temporal and spacial data aggregation for smart grid communications,” Concurrency Comput. Pract. Exp., vol. 28, no. 4, pp. 1145–1160, 2016.
[44] M. A. Rahman, E. Al-Shaer, and P. Bera, “A noninvasive threat analyzer for advanced metering infrastructure in smart grid,” IEEE Trans. Smart Grid, vol. 4, no. 1, pp. 273–287, Mar. 2013.
[45] Q. Li et al., “ESO: An efficient and secure outsourcing scheme for smart grid,” in Proc. Int. Conf. Wireless Commun. Signal Process. (WCSP), Hangzhou, China, 2013, pp. 1–6.
[46] E. Gonzalez, L. B. Kish, R. S. Balog, and P. Enjeti, “Information theoretically secure, enhanced Johnson noise based key distribution over the smart grid with switched filters,” PLoS One, vol. 8, no. 7, 2013, Art. no. e70206.
[47] H. Li et al., “EPPDR: An efficient privacy-preserving demand response scheme with adaptive key evolution in smart grid,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 8, pp. 2053–2064, Aug. 2014.
[48] X. Huang et al., “Cost-effective authentic and anonymous data sharing with forward security,” IEEE Trans. Comput., vol. 64, no. 4, pp. 971–983, Apr. 2015.
[49] S. Ruj and A. Nayak, “A decentralized security framework for data aggregation and access control in smart grids,” IEEE Trans. Smart Grid, vol. 4, no. 1, pp. 196–205, Mar. 2013.
[50] M. M. E. A. Mahmoud, J. Mišić, and X. Shen, “Efficient public-key certificate revocation schemes for smart grid,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), Atlanta, GA, USA, 2013, pp. 778–783.
[51] M. M. E. A. Mahmoud, J. Mišić, K. Akkaya, and X. Shen, “Investigating public-key certificate revocation in smart grid,” IEEE Internet Things, vol. 2, no. 6, pp. 490–503, Dec. 2015.
[52] M. M. E. A. Mahmoud, K. Akkaya, K. Rabieh, and S. Tonyali, “An efficient certificate revocation scheme for large-scale AMI networks,” in Proc. IEEE Int. Perform. Comput. Commun. Conf. (IPCCC), Austin, TX, USA, 2014, pp. 1–8.
[53] X. Liang, X. Li, R. Lu, X. Lin, and X. Shen, “UDP: Usage-based dynamic pricing with privacy preservation for smart grid,” IEEE Trans. Smart Grid, vol. 4, no. 1, pp. 141–150, Mar. 2013.
[54] Q. Zhang, L. Yang, and Z. Chen, “Privacy preserving deep computation model on cloud for big data feature learning,” IEEE Trans. Comput., vol. 65, no. 5, pp. 1351–1362, May 2015, doi: 10.1109/TC.2015.2470255.
[55] M. Wen et al., “ECQ: An efficient conjunctive query scheme over encrypted multidimensional data in smart grid,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), Atlanta, GA, USA, 2013, pp. 796–801.
[56] Y. Yang, H. Li, M. Wen, H. Luo, and R. Lu, “Achieving ranked range query in smart grid auction market,” in Proc. IEEE Int. Conf. Commun. (ICC), Sydney, NSW, Australia, 2014, pp. 951–956.
[57] D. Boneh and B. Waters, “Conjunctive, subset, and range queries on encrypted data,” in Theory of Cryptography. Heidelberg, Germany: Springer, 2007, pp. 535–554.
[58] S. S. S. Rawat, V. A. Polavarapu, V. Kumar, E. Aruna, and V. Sumathi, “Anomaly detection in smart grid using rough set theory and K cross validation,” in Proc. Int. Conf. Circuit Power Comput. Technol. (ICCPCT), Nagercoil, India, 2014, pp. 479–483.
[59] S. McLaughlin, D. Podkuiko, and P. McDaniel, “Energy theft in the advanced metering infrastructure,” in Critical Information Infrastructures Security. Heidelberg, Germany: Springer, 2010, pp. 176–187.
[60] S. McLaughlin, B. Holbert, A.-Q. Fawaz, R. Berthier, and S. Zonouz, “A multi-sensor energy theft detection framework for advanced metering infrastructures,” IEEE J. Sel. Areas Commun., vol. 31, no. 7, pp. 1319–1330, Jul. 2013.
[61] S. Amin, G. A. Schwartz, A. A. Cardenas, and S. S. Sastry, “Game-theoretic models of electricity theft detection in smart utility networks: Providing new capabilities with advanced metering infrastructure,” IEEE Control Syst., vol. 35, no. 1, pp. 66–81, Feb. 2015.
[62] O. Kosut, L. Jia, R. J. Thomas, and L. Tong, “Malicious data attacks on the smart grid,” IEEE Trans. Smart Grid, vol. 2, no. 4, pp. 645–658, Dec. 2011.
[63] A.-H. Mohsenian-Rad and A. Leon-Garcia, “Distributed Internet-based load altering attacks against smart power grids,” IEEE Trans. Smart Grid, vol. 2, no. 4, pp. 667–674, Dec. 2011.
[64] Q. Yang et al., “On false data-injection attacks against power system state estimation: Modeling and countermeasures,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 3, pp. 717–729, Mar. 2014.
[65] Y. Liu, P. Ning, and M. K. Reiter, “False data injection attacks against state estimation in electric power grids,” ACM Trans. Inf. Syst. Security, vol. 14, no. 1, 2011, Art. no. 13.
[66] O. Kosut, L. Jia, R. J. Thomas, and L. Tong, “Malicious data attacks on smart grid state estimation: Attack strategies and countermeasures,” in Proc. 1st IEEE Int. Conf. Smart Grid Commun. (SmartGridComm), Gaithersburg, MD, USA, 2010, pp. 220–225.
[67] O. Kosut, L. Jia, R. J. Thomas, and L. Tong, “Malicious data attacks on the smart grid,” IEEE Trans. Smart Grid, vol. 2, no. 4, pp. 645–658, Dec. 2011.
[68] S. Li, Y. Yilmaz, and X. Wang, “Quickest detection of false data injection attack in wide-area smart grids,” IEEE Trans. Smart Grid, vol. 6, no. 6, pp. 2715–2735, Nov. 2015.
[69] L. Liu, M. Esmalifalak, and Z. Han, “Detection of false data injection in power grid exploiting low rank and sparsity,” in Proc. IEEE Int. Conf. Commun. (ICC), Budapest, Hungary, 2013, pp. 4461–4465.
[70] X. Yang et al., “A novel en-route filtering scheme against false data injection attacks in cyber-physical networked systems,” IEEE Trans. Comput., vol. 64, no. 1, pp. 4–18, Jan. 2015.
[71] Y. W. Law, T. Alpcan, and M. Palaniswami, “Security games for risk minimization in automatic generation control,” IEEE Trans. Power Syst., vol. 30, no. 1, pp. 223–232, Jan. 2015.
[72] W. Yu, D. Griffith, L. Ge, S. Bhattarai, and N. Golmie, “An integrated detection system against false data injection attacks in the smart grid,” Security Commun. Netw., vol. 8, no. 2, pp. 91–109, 2015.
[73] M. Ozay, I. Esnaola, F. T. Vural, S. R. Kulkarni, and H. V. Poor, “Sparse attack construction and state estimation in the smart grid: Centralized and distributed models,” IEEE J. Sel. Areas Commun., vol. 31, no. 7, pp. 1306–1318, Jul. 2013.
[74] S. Cui et al., “Coordinated data-injection attack and detection in the smart grid: A detailed look at enriching detection solutions,” IEEE Signal Process. Mag., vol. 29, no. 5, pp. 106–115, Sep. 2012.
[75] Y. Yuan, Z. Li, and K. Ren, “Modeling load redistribution attacks in power systems,” IEEE Trans. Smart Grid, vol. 2, no. 2, pp. 382–390, Jun. 2011.
[76] A. Ashok and M. Govindarasu, “Cyber attacks on power system state estimation through topology errors,” in Proc. IEEE Power Energy Soc. Gen. Meeting, San Diego, CA, USA, 2012, pp. 1–8.
[77] O. Vuković and G. Dán, “On the security of distributed power system state estimation under targeted attacks,” in Proc. 28th Annu. ACM Symp. Appl. Comput., Coimbra, Portugal, 2013, pp. 666–672.
[78] A. Tajer, S. Kar, H. V. Poor, and S. Cui, “Distributed joint cyber attack detection and state recovery in smart grids,” in Proc. IEEE Int. Conf. Smart Grid Commun. (SmartGridComm), Brussels, Belgium, 2011, pp. 202–207.
[79] L. Liu, Z. Han, H. V. Poor, and S. Cui, “Big data processing for smart grid security,” in Big Data Over Networks. Cambridge, U.K.: Cambridge Univ. Press, 2016, pp. 217–243.
[80] J. Zhao, G. Zhang, Z. Y. Dong, and K. P. Wong, “Forecasting-aided imperfect false data injection attacks against power system nonlinear state estimation,” IEEE Trans. Smart Grid, vol. 7, no. 1, pp. 6–8, Jan. 2016.
[81] Y. Yamaguchi, A. Ogawa, A. Takeda, and S. Iwata, “Cyber security analysis of power networks by hypergraph cut algorithms,” IEEE Trans. Smart Grid, vol. 6, no. 5, pp. 2189–2199, Sep. 2015.
[82] J. Kim, L. Tong, and R. J. Thomas, “Subspace methods for data attack on state estimation: A data driven approach,” IEEE Trans. Signal Process., vol. 63, no. 5, pp. 1102–1114, Mar. 2015.
[83] J. Zhao et al., “Short-term state forecasting-aided method for detection of smart grid general false data injection attacks,” IEEE Trans. Smart Grid, vol. PP, no. 99, pp. 1–11, 2015, doi: 10.1109/TSG.2015.2492827.
[84] F. Capitanescu et al., “State-of-the-art, challenges, and future trends in security constrained optimal power flow,” Elect. Power Syst. Res., vol. 81, no. 8, pp. 1731–1741, 2011.
[85] A. J. Ardakani and F. Bouffard, “Identification of umbrella constraints in DC-based security-constrained optimal power flow,” IEEE Trans. Power Syst., vol. 28, no. 4, pp. 3924–3934, Nov. 2013.
[86] A. Teixeira, H. Sandberg, G. Dán, and K. H. Johansson, “Optimal power flow: Closing the loop over corrupted data,” in Proc. Amer. Control Conf. (ACC), Montréal, QC, Canada, 2012, pp. 3534–3540.
[87] L. Jia, J. Kim, R. J. Thomas, and L. Tong, “Impact of data quality on real-time locational marginal price,” IEEE Trans. Power Syst., vol. 29, no. 2, pp. 627–636, Mar. 2014.
[88] L. Platbrood, F. Capitanescu, C. Merckx, H. Crisciu, and L. Wehenkel, “A generic approach for solving nonlinear-discrete security-constrained optimal power flow problems in large-scale systems,” IEEE Trans. Power Syst., vol. 29, no. 3, pp. 1194–1203, May 2014.

Jiankun Hu received the B.E. degree from Hunan University, Changsha, China, in 1983, the Ph.D. degree in control engineering from the Harbin Institute of Technology, China, in 1993, and the Master's by Research degree in computer science and software engineering from Monash University, Australia, in 2000. He is a Full Professor and the Research Director of the Cyber Security Laboratory, School of Engineering and IT, University of New South Wales, Canberra, Australia. He held the prestigious German Alexander von Humboldt Fellowship at Ruhr University Bochum, Germany, from 1995 to 1996, was a Research Fellow with Delft University of Technology, the Netherlands, from 1997 to 1998, and was a Research Fellow with Melbourne University, Australia, from 1998 to 1999.

He is a Guest Professor with the College of Mathematics, Shandong University, China, and a Guest Professor with the State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences. His current research interests lie in cyber security, including biometric security, bio-cryptography, intrusion detection, and applied cryptography, areas in which he has published many papers in top journals, including the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, the IEEE TRANSACTIONS ON COMPUTERS, the IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, and the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY. He has served on the editorial boards of up to seven international journals, including the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, and as the Security Symposium Chair of the IEEE flagship conferences IEEE ICC and IEEE Globecom. He has obtained seven Australian Research Council (ARC) grants and has served on the prestigious Panel of Mathematics, Information, and Computing Sciences of the ARC Excellence in Research for Australia (ERA) Evaluation Committee 2012. He is an invited expert of the Australian Attorney-General's Office.

Athanasios V. Vasilakos is currently a Professor with the Luleå University of Technology, Sweden. He has served or is serving as an Editor for many technical journals, such as the IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, the IEEE TRANSACTIONS ON CLOUD COMPUTING, the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, the IEEE TRANSACTIONS ON CYBERNETICS, the IEEE TRANSACTIONS ON NANOBIOSCIENCE, the IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, ACM Transactions on Autonomous and Adaptive Systems, and the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS. He is also the General Chair of the European Alliances for Innovation.