
Advanced Engineering Informatics 42 (2019) 100945

Contents lists available at ScienceDirect

Advanced Engineering Informatics


journal homepage: www.elsevier.com/locate/aei

Review article

State of the art in big data applications in microgrid: A review


Karim Moharm
Alexandria University, Egypt

ARTICLE INFO

Keywords:
Big data
Microgrid

ABSTRACT

The big data era is emerging in the power grid. Multiple worldwide studies emphasize big data applications in the microgrid because of the huge amount of data produced. Big data analytics can steer design and applications towards a safer, better, more profitable, and more effective power grid. This paper presents the recognition and challenges of big data and the microgrid. The construction of big data analytics is introduced. The data sources, big data opportunities, and enhancement areas in the microgrid, such as stability improvement, asset management, renewable energy prediction, and decision-making support, are summarized. Diverse case studies are presented, covering planning, operation control, decision making, load forecasting, data attack detection, and maintenance aspects of the microgrid. Finally, the open challenges of big data in the microgrid are discussed.

1. Introduction

Over the past years, data size has increased dramatically, supported by new technologies such as the Internet of Things, widespread sensors, and new communication technologies. Massive amounts of data are being generated in structured, unstructured, and semi-structured forms, as shown in Fig. 1. Traditional data processing techniques are neither efficient nor sufficient for processing this huge amount of data, which requires more powerful computational analysis strategies. Big data analytics and storage offer scalable solutions and new methods to handle these huge, complex amounts of data.

Big data has found applications in different fields [1], such as business [2], where it helps analyze customer preferences and interactions to achieve better goods recommendations and reduce costs [3]. In social life, big data can be processed to gain more insight into public needs [3], for example by processing data from Facebook, Twitter, and LinkedIn to understand user interactions and behaviors. In the healthcare field [4], data sources are mostly unstructured and include Electronic Medical Records (EMR), X-rays, CT scans, documents, claims, machinery, etc. Big data helps in profiling patients, predicting illness, and supporting doctors. In the government and defense field, big citizen data can be analyzed for crime detection, attack analysis, and visualization. In the insurance field, big data extracts customer behaviors, social media, and call records to develop predictions, improve customer satisfaction, and maximize profits. In the finance field, big data adoption reveals new analytical trade analysis, invisible market chances, and client analysis because of the growing financial datasets [5]. In the automotive industry [6], big data finds applications in more selling opportunities and quick experimentation. In manufacturing, big data helps with risk analysis, identifying new investment areas, and planning.

Supported by those achievements in different fields, microgrid big data analytics offers interesting applications. Big data can enhance reliable, efficient, and cost-effective operation through high computational power. Data analytics applications can lead to better energy forecasting, smart meter analytics, asset management and analytics, more reliable grid operation, customer segmentation, more profitable energy trading, customer service data analytics, energy efficiency, end-user engagement and marketing, bad data detection, energy theft detection, load management and profiling, burden forecasting, and outage management [7].

The microgrid concept combines micro-source distributed generation (DG), loads, control, and storage elements operating together in islanded or grid-connected modes [8]. The utility network can see it as a controllable unit. DG uses micro turbines, fuel cells, internal combustion engines, solar energy, wind, hydro power, biomass, ocean energy, biofuels, and hydrogen to produce electrical and heat energy, which are easier to use and consume. Distributed storage systems include batteries, flywheels, and supercapacitors. The microgrid supports the use of renewable energy in the power grid. Demand for electrical power is increasing fast. Simultaneously, the conventional energy resources used for electricity generation are being exhausted. Also, the environmental effects of conventional power stations are causing global problems. Interest in renewable and environmentally friendly energy sources is increasing in order to achieve energy savings and the best performance. Renewable energy solutions are expected to present a reliable solution to the energy challenge, although oscillatory electrical power and imbalance can

E-mail address: karim.i.moharm@gmail.com.

https://doi.org/10.1016/j.aei.2019.100945
Received 31 December 2018; Received in revised form 1 May 2019; Accepted 7 June 2019
1474-0346/ © 2019 Published by Elsevier Ltd.
K. Moharm Advanced Engineering Informatics 42 (2019) 100945

happen if there is inconsistency between load and renewable generation. Moreover, disturbances can occur in the grid.

Fig. 1. Big data content format.

The wide deployment of renewable energy, with its uncontrolled, uncertain, and intermittent nature, triggers the need for a new real-time, data-based energy management system in the microgrid control system to avoid volatility of the whole grid. Huge data sources pervade the microgrid, such as data exported from PV measurements, advanced measuring units, smart meters, SCADA, weather reporting, environmental conditions, and sensor data. Smart control involves a high communication data rate to improve the power quality, reliability, efficiency, and regulation of the whole power grid. Data volumes reach the petabyte level and must be analyzed within a reasonably short time, which goes beyond traditional computational methods. The microgrid represents a high volume of data, and the transition towards new big data technologies is still at a primitive stage [9].

Electrical networks are generating big data. For example, only 40 Phasor Measurement Units (PMUs) can generate about 5.6 terabytes per year. One million smart meters can generate 2920 terabytes when data is collected every 15 min [10]. IBM analyzed 4 petabytes of heterogeneous data, such as environmental data (weather, wind, etc.), to select the best wind turbine location [11]. The data in the microgrid is increasing more and more in streaming volume and speed, with different formats and structures. The E-Sketch project [12] showed the importance of analyzing grid big data as a stream at the second level. The economic loss from power disturbances in the US is between $119 billion and $188 billion, which shows the emerging need for big data processing in microgrids [13].

Big data analytics provides the microgrid with appropriately powerful computational methods. Big data platforms such as Hadoop and Spark are able to process this data in minimal time to improve operation efficiency through fast data collection, grid adaptation to the user, and microgrid management for best operation. They can support maintenance decisions at the appropriate time and prevent failures in the power grid network. A simple conceptual view is shown in Fig. 2.

Fig. 2. Conceptual view.

The new era of big data analytics provides many more advantages than the techniques traditionally used in the power grid industry. It provides fast, parallel, real-time processing of the data for knowledge extraction and for discovering the value underneath the collected data on a decentralized computing platform where analysis is moved near the data [14]. The cost of deploying big data in the microgrid market is being studied. Moreover, big data processing centers can be fed by renewable energy for a completely economic and environmentally friendly setup. Big data applications in the microgrid field are of great value, impact, and importance. In this paper, we summarize the sources of big data in the microgrid, the platform construction for handling big data in the microgrid, and case studies.

The rest of the paper is organized as follows. The big data Vs in the microgrid are discussed in Section 2. Section 3 summarizes the data sources that formulate the big data problem in the grid. Section 4 presents the widely used big data environments for processing. Section 5 introduces the layered representation of big data problem solving. Opportunities offered by adopting big data solutions in the power grid are studied in Section 6. Challenges of proceeding with big data in the microgrid and open points are discussed in Section 7. Finally, the conclusion is presented in Section 8.

2. Big data problem in the microgrid

The microgrid includes advanced technologies, control, sensors, and data transmission for the sake of better services in the power grid. Data carries important information for different stakeholders such as distribution, power dispatch, power generation, and transmission [15]. Data gathered by the microgrid is increasing in size and complexity. Data becomes a big data problem when traditional data processing methods cannot handle its amount (volume), growth (velocity), or format (variety). Big data implies not only a huge amount but also the variety and velocity of the data [16]. Furthermore, big data can be characterized by the following 7Vs [17], shown in Fig. 3:

Fig. 3. Microgrid big data 7Vs.

• Value: Big data processing results in the value extracted from the microgrid data. The two-way communication between the grid and the customers reveals opportunities such as electrical energy transfer with dynamic pricing between loads and generation; better paying rates for the consumers, better profit for the utility, and more savings for the clients can be achieved.
• Volume: The tremendous data in the microgrid is generated by a plethora of sources such as consumer data, sensors, equipment data, real-time acquisition data, operation data, weather, cloud storage, SCADA, metering units, weather prediction, asset monitoring, finance, energy market, social media, power quality data, signal waveforms, etc. [14]. The weather prediction problem alone requires 18 billion calculations [18].
• Variety: Different kinds of data and communication protocols are available in the microgrid, such as remote sensing data for renewable


energy integration coming from satellites, historical data in the cloud, etc. Microgrid data is a mix of structured, unstructured, and semi-structured data. Structured data is modeled through tables and relational databases; examples are data generated by smart meters and sensors [19]. It can be used for representing consumption and market data. Semi-structured data is modeled through JSON and XML. It can be used for data from several sensors; one example is weather data representation. Social media data, human interaction, and visual data are examples of unstructured data [14].
• Velocity: The fast communication technologies inside the microgrid are increasing the speed of data generation and processing. Data in the microgrid can be in batches, like planning data; online, like SCADA data; or streaming, like PMU and cyber security data [14].
• Validity: Data should be valid for its intended use.
• Veracity: Data communicated between microgrid elements may be noisy, and it may or may not be certain and reliable. Observed data should be checked against a reasonable rate of change, for example by checking whether temperature can change by the observed amount within the given time range or whether the outlier observation should be discarded.
• Volatility: Microgrid data is not immediately discarded or removed after processing, in contrast to traditional methods.

3. Big data collection in microgrid

Sources of big data in the microgrid can be:

3.1. Equipment related data

Equipment data resulting from monitoring and management is widespread and can include data from transformers, switchgear, transmission lines, cables, substations, and circuit breakers [20,21].

Transformer data includes data about oil pressure, discharge, ground current, nominal data, manufacturer, capacity, connection, cooling system, release data, transformer model, fault recorders, description, fault source, transformer dielectric loss, polarization index, resistance, insulation data, moisture content, load data, short circuit current, tap-change data, and tripped situations [22]. Switchgear data includes discharge, moisture content, and discharge information. Transmission line data includes voltage sag, electrical signal waveforms, and leaning of the tower [22]. Power cable data includes temperature, discharge, and circulation current [22]. Substation data includes name, ID, type, location, and pollution status. Circuit breaker parameters include nominal voltage, current, capacity, connection, model, manufacturer, cooling system, release data, operation data, fault recorders, fault source, fault action, substation name, tripping details, SF6 inspection reports, insulation resistance, control mechanisms, equipment production data, and number of circuit breakers.

Equipment monitoring data represents a big data source. Monitoring data can include energy management system data such as equipment voltage, current, power, frequency, etc. Equipment real-time monitoring data can represent healthy transformer operation, ventilation, and lubricant or oil status. Also, voltage and current waveforms are stored for power quality calculations such as the total harmonic distortion and flicker index [23]. A distribution relay responds with 40 bytes over the standard Modbus protocol [24] for equipment monitoring [25]. Remote Telemetry Units (RTUs) are used for monitoring renewable energy [26]. Equipment diagnostics data can be data about previous faults, location, trip times, voltage dip records, transients, fault descriptions, previous disturbances, and digital protection data.

3.2. Distributed generation data

Generation datasets describe how well the generated power satisfies consumption. The data indicates whether the generated power is sufficient, more than, or less than the power needed by the load.

Each renewable micro energy source has its own data. Biogas energy depends on the anaerobic digestion of biomass. A data source for biogas energy simulation is available using the BSM2 tool [27]. Wind turbine generation includes data about the amount of generated energy; monitoring and control depend on data from different sensors, such as the PT100 temperature sensor used for external air temperature, gearbox temperature, casting temperature, and enclosure temperature [28]. Wind speed is measured using an anemometer, and direction is determined using a wind vane. Other sensors used in wind turbine operation include humidity sensors, accelerometers for tower movement measurement, electrical voltage and current measurement, and rotational speed measurement (for example, Hall effect sensors).

A concentrated solar plant (CSP) generates data such as tilt angles, location, orientation of the collector, mirror dimensions, generated power, current limit, pump head specifications, ultrasonic signals to detect possible damage, and temperature status signals [29]. A concentrated solar plant must have data about market demand and meteorological information given by anemometer, pluviometer, hygrometer, thermometer, barometer, pyrometer, cloud cover, and radiance index [29].

SCADA data provides online real-time data, measurement flow data, set points, control commands, grid assessment, and grid variables.

3.3. Storage battery data

A battery management system includes data about its internal states, terminal voltage, State of Health (SoH), State of Discharge (SoD), fault detection data, temperature, voltage, and current [30].

3.4. Load-side data

Load management data sources include smart meters, DHC communication between appliances in the smart home [31], customer life patterns, consumer questionnaires [32], cooling/heating, water status, gas consumption, and local building data such as room temperature and attendance in the building. The load side is an extensive source of big data. For example, 1 million smart meters produce more than 1.82 Tb of data [33,34]. Big data can reveal the load type from energy consumption data such as consumption patterns. Consumers may also share the obtained load information [35].

Load-generated data, such as electric vehicle data, is also a big data source. The data encompasses vehicle location, faults, pedal readings, motor drive, velocity, tire stress, etc. The Beijing Electric Vehicles Monitoring and Service Center [36] keeps both real-time (online) data and stored data.

3.5. Environmental data

Numerical weather prediction data represents high-volume suspicious data streaming at different velocities in different structures (the standard or time stamp) [37,18]. Weather prediction equations are highly complex and nonlinear.

Solar and wind data have different sources. Satellite data is provided globally by the National Aeronautics and Space Administration (NASA) [38]. Environmental data can also be provided by local organizations such as the Renewable Energies Organization of Iran (SUNA) and the Iran Meteorological Organization (IRMO) [39].

3.6. Sensors

The widespread sensors for maintenance or operation are a viable source of data. Sensor-generated data may be expressed in key-value pairs (sensor ID, time step, consumption, etc.). Metering infrastructures in New York State generate more than 127 terabytes per day [14].
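The key-value form of sensor data mentioned above, together with the plausibility check described under Veracity in Section 2, can be illustrated with a minimal Python sketch. The record layout (a `(sensor_id, time_step)` key with a consumption value), the function names, and the threshold are assumptions made for illustration only; they are not taken from the reviewed studies.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: key = (sensor_id, time_step), value = consumption in kWh.
Key = Tuple[str, int]
Reading = Tuple[Key, float]


def total_by_sensor(readings: Iterable[Reading]) -> Dict[str, float]:
    """Group key-value readings by sensor ID and sum their consumption values."""
    totals: Dict[str, float] = defaultdict(float)
    for (sensor_id, _time_step), consumption_kwh in readings:
        totals[sensor_id] += consumption_kwh
    return dict(totals)


def flag_implausible(readings: Iterable[Reading], max_step_change: float) -> List[Key]:
    """Return keys whose value differs from the last accepted value of the same
    sensor by more than max_step_change (an assumed, user-chosen threshold)."""
    flagged: List[Key] = []
    last_accepted: Dict[str, float] = {}
    # Sorting orders the records by sensor ID, then by time step.
    for (sensor_id, time_step), value in sorted(readings):
        previous = last_accepted.get(sensor_id)
        if previous is not None and abs(value - previous) > max_step_change:
            flagged.append((sensor_id, time_step))  # keep the earlier baseline
        else:
            last_accepted[sensor_id] = value
    return flagged


sample = [
    (("m1", 0), 1.0),
    (("m1", 1), 1.5),
    (("m2", 0), 0.5),
    (("m1", 2), 9.0),  # sudden jump, treated as a suspect observation
    (("m2", 1), 0.5),
]
print(total_by_sensor(sample))
print(flag_implausible(sample, max_step_change=5.0))
```

A flagged reading is only reported here, not removed; whether to discard or correct it is a separate design decision that depends on the application.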


3.6.1. PMU

Phasor measurement units (PMUs) and fault disturbance recorders are data sources. There are more than 2500 phasor measurement units in China [40,41]. A PMU logs the voltage, frequency, current, and phase angles with a time stamp. It communicates with the GPS.

3.6.2. Wireless network connections

Wireless networked sensors [42] are a main source of big data in the microgrid. One example [43] is PV module inspection. This data can be the current, power, temperature, serial number, voltage, and sun intensity for each PV module. All data sets are considered, as the highest irradiance does not strictly mean the highest output power. At the highest irradiance, temperature is high and affects the output power negatively. Data can be transferred using Wi-Fi or Zigbee [44].

3.6.3. Cloud storage

The cloud can serve as storage for the data collected by smart homes, sensors, meters, etc. The cloud also supports secure, available, fault-tolerant storage [45]. Big data can make use of the vast cloud storage.

3.6.4. Internet of Things

IoT is a source of big data [46]. IoT represents networked sensors embedded in devices.

3.6.5. Specialized monitoring sensors

Specialized monitoring sensors include anemometer, illumination, ceilometer, acceleration, stress, strain, speed, and power data given by blades, the gearbox of wind turbines, bearing sensors [47], frequency disturbance recorders (FDR), customer software for pricing, outage management systems, and wide-area monitoring systems (WAMS) [48].

3.7. Off-domain data sources

The off-domain data sources that are not directly related to the power industry but affect it are shown in Table 1.

3.8. Public datasets

There are several public datasets that represent big data in the microgrid, as shown in Table 2.

4. Big data environments in microgrid

Some general big data tools are:

• Apache Hadoop [54,55]: developed by Doug Cutting and Mike Cafarella in 2005. It is the most popular big data platform, built on top of the Hadoop Distributed File System (HDFS) [56]. HDFS is used for storage by dividing the data files into 64 MB partitions called blocks, using a master-slave architecture. The master node (called the NameNode) is responsible for locating the data and retrieving it from the DataNodes (slave nodes) when requested. Hadoop includes MapReduce [57] for distributed data processing and running the appropriate jobs on data. A disadvantage of Hadoop is its somewhat heavy resource usage during operation [58]. Hadoop is recommended for batch processing of microgrid big data, where the data is analyzed without strict requirements on response time [59]. With Hadoop, programmers can write Java programs, while other users can access big data using Hive [60], Pig [61], and Avro [62]. Hadoop consists of the Hadoop Common libraries and modules, HDFS, Apache YARN [63] for resource and task management and scheduling, and MapReduce. Hadoop offers high reliability, as a single failure or even multiple failures will not block Hadoop operation; easy scalability to thousands of nodes; high performance and fast operation; and open source availability at no cost [64]. It is an ecosystem rather than a single application and has a number of components, most of them created by the Apache Software Foundation and open source. The main components of the Hadoop ecosystem are Hive, Drill, Pig, Cascading, Giraph, and Mahout.
• MapReduce [65]: An effective distributed processing approach introduced by Google. It operates in two steps. The map task takes its input dataset as key/value pairs and performs some scalar transformations. The outputs of map are grouped by key, sorted, and divided into multiple smaller tasks. The reduce task takes as input the list of keys to be handled. It performs set operations on the arrays of values for each key, can optionally combine matching keys from the map operation, and collects the results.
• Apache Zeppelin [66,67]: A web-based interface for interactive data visualization, analysis, and understanding.
• Apache Zookeeper [68]: a distributed configuration and synchronization service. It can manage HBase and Hadoop clusters.
• Apache Chukwa [69]: built on HDFS and MapReduce and used for monitoring distributed systems. It has tools for displaying and analyzing data results.
• Apache SAMOA [70]: (Scalable Advanced Massive Online Analysis) a distributed streaming machine learning framework for data mining and machine learning tasks.
• Apache Mahout [71]: provides machine learning capabilities.
• RHadoop [72]: used for data mining and visualization.
• Apache Hive [60]: a data warehouse used for data storage on top of Hadoop MapReduce. Its main components are HCatalog and WebHCat. Data can be written easily in HCatalog, as it is a table, while an HTTP interface is available through WebHCat that enables running Hadoop MapReduce, Pig, and Hive.
• Apache Pig [73]: programmed using Pig Latin and compiled into MapReduce tasks on Hadoop, forming a parallel data flow system for dataset analysis.
• Apache Flume [74]: a tool for acquiring streaming data into HDFS.
• Apache HBase [75]: a NoSQL data store, the Hadoop database. Data is actually stored in HDFS. It is used when HDFS alone is not enough, in cases where data is updated by more than one application [76].
• Cassandra [77]: open source distributed database management

Table 1
Off-domain data sources in microgrid.

Data source | Summary
Human | Data such as customer feedback, customer calls at power outage [49], population, behaviors, public security data, and consuming behaviors. Human-generated social media data sources like emails, Facebook, Twitter, etc.
Off-Domain Generation | Data such as traffic condition data
Off-Domain Monitoring | Data such as environmental conditions and their effect on the equipment, such as temperature and rain; animal migration data for outage expectation and management; and seismic data [49]
Enterprise data | Data about the power generation from the enterprise
Video | On-site video recording
Financial data | Data during operation and management, including business reports, prices, sales, and market conditions
Records | Electrical energy consumption per population, spare renewable energy, education rate, land price, emissions, grid connectivity length to the area, water consumption


Table 2
Publicly available datasets.

Dataset | Description
Sinogreenergy | Represents solar generation data sets [50]
EnergyPlus | Other data set sources can be provided by [51]
NREL [52] | Provides geospatial data and large datasets about energy data resources like biomass, geothermal, marine, wave power, wind, solar, and hydrogen data for the U.S. and some other areas
OpenAMIData | A structured example of 5 min energy data containing time stamp, date, time, value, and correctness of the data can be found in the online repository [53]

system used to store high-volume datasets. It supports cloud storage. It is used at Netflix, Twitter, etc.
• MongoDB [78]: used for storage not based on Hadoop.
• Apache Sqoop [79]: (SQL to Hadoop) used to transfer structured data to Hadoop. Apache Sqoop or Apache Tajo can be used to integrate smart meter data with big data processing. Smart meter data can be saved in relational databases like PostgreSQL [80,76].
• Apache Tajo [81]: parallel querying for big data warehouses.
• Apache Spark [82,83]: used for analytics and for scalable, fast data processing thanks to its in-memory computing framework. Spark can run standalone, on Hadoop, in the cloud, or on Mesos. It can be used for real-time data processing and batch processing [59] and can be integrated with R for data mining. Spark consists of [84]:
– Spark Core: for task scheduling, memory management, and fault handling.
– Spark SQL: programmed in Java, Python, or Scala for processing data like Hive tables, Parquet, and JSON.
– Spark Streaming: handles data in memory, which makes data processing much faster than Hadoop, which uses disk.
– MLlib: a library that includes machine learning algorithms like regression and clustering.
– GraphX: a component for graph processing.
• Apache Flink [85,86]: Flink can also be used for real-time data processing and batch processing. Flink is scalable, fault tolerant, and fast. Flink is used for machine learning applications because it comes with its FlinkML machine learning library [87]. To store data, Flink has ready integration with HDFS.
• Apache Storm [88,89]: The recommended tool for real-time processing of big data in the microgrid. Other tools for real-time processing are S4 [90], StreamBase [91], and Splunk [92]. Storm is open source and offers fault tolerance, parallel processing, and simple programming. Storm is built on a few concepts: stream, spout, bolt, and topology. Storm creates a graph for computation called a topology. The stream is the core of Storm; it is an unbounded sequence of tuples, where a tuple is a named list of values (integers, decimals, etc.). The tuples are read from external sources and injected into the topology by the spout. A spout is a logical node acting as the source of streams. Bolts are logical units that process input streams and emit output streams.
• Apache Kafka [93]: fault tolerant, keeping the data if an error occurs. Kafka can be easily integrated with Storm for data integration. It can be used for storing streaming data.

In addition to general big data ecosystems, some customized platforms have been proposed. Generally in the microgrid, big data initiatives use Spark over Hadoop because of easier access to more machine learning algorithms and to iterative or streaming real-time processing [94]. Streaming data processing suits microgrid data better than batch processing because of continuous real-time pricing or stability analysis.

The customized big data processing platforms:

• SuperDoop environment provided by Ingenia [95] is a complete solution for big data storage and scalable parallel processing of real-time and historical data based on Hadoop and Storm with open source stack integration [96]. The solution is deployed in [97] for energy efficiency big data applications.
• PS CLEMENTINE PRO [98] provides a big data analysis and mining environment that was used by a study in [99] for high-reliability operation and for revealing correlations between elements based on equipment and monitoring data.
• 0xdata H2O [100] is a scalable machine learning platform for big data that can run standalone or over Hadoop or Spark. It is used for energy consumption prediction [101].

Big data and cloud computing are very popular research topics. While high-volume data is the main motive for both big data and cloud computing concepts, they are not the same. Big data methods can work without cloud computing technologies, but combining big data and cloud computing enables more economic and efficient analysis. Big data refers to having large-volume datasets and dealing with them. Cloud computing refers to network services and a platform for getting data, accessing, and computing.

5. Big data construction in microgrid

The construction of a big data platform in the microgrid can be described as a layered architecture [102], shown in Fig. 4:

Fig. 4. Big data processing and analytics platform in the microgrid.

• Database Collection Layer: It gathers the data from sources like sensors, load data, generation data, equipment data, etc. presented in Section 3. It also transmits the data via wire or wireless


communication for the analytics.
• Data preprocessing: The layer includes subsystems for data interfacing using tools like Sqoop for interaction with HDFS and traditional data storage, FTP for data exchange, and MQTT for database handling. The output of this layer is data transformation, assessment, and outlier detection and correction [103].
• Data Management: Management of the big data, whether unstructured or structured, using Hive, HBase, and Impala. Also, MongoDB has lately emerged as an outstanding solution, as it is compatible with MapReduce [104].
• Data Analytics Layer: applies big data tools to the data to mine it and make use of it. Tools of this layer can be Spark, MLlib [105], GraphX [106], Python, and MATLAB. The purpose of this layer is to achieve analysis of risks, decision support, and correlation [107].
• Application Layer: Reports the final results of the mining technologies and the sense of the data. The report may take the form of visualization, estimation, or grid and equipment expectations.

6. Big data applications in the microgrid

Big data analytics offers enhanced business decisions to win new customers, recognize potential competitors, eliminate electricity fraud, create more efficient markets based on data and not only experience, ensure reliable operation, and distinguish the needs of the customers [108].

Big data also finds applications in different elements of the microgrid, like green and renewable energies (wind, solar, marine, etc.) [109], control, load management, and operation. Big data enhances the optimal operation of the microgrid by making use of the data collected for better decision making. Big data applications can be [110]:

• Eliminating electrical energy leakage and detecting fraud [102].
• Identifying the electrical equipment connected to the microgrid based on classification of big consumption data patterns [45].
• Enhancing reliable, sustainable, safe, and effective operation of the power grid through the computational analysis and prediction provided by real-time big data mining.
• Improving integration of DG in the power grid, fault detection, and power quality [111].
• Supporting very high sampling rate data streaming of electrical signals like current, voltage, etc. This can enhance power quality analysis of the microgrid [112].
• Enhance deployment of complex control algorithms to achieve
• Value and knowledge discovery and visualization about customers, power plant operation and control, and markets after mining the big data in the microgrids [84].
• Protection against data hacking attacks [117].
• Real-time monitoring of the microgrid and discovery of correlations [118].
• Optimal operation considering operation and maintenance cost.

The applications are discussed according to:

6.1. Feasibility studies

Big data can effectively participate in feasibility studies for different improvements in the elements inside the grid. For example, big data supports discovery in battery materials innovation [119]. As another example, it can enhance the electric vehicle industry, which is part of the microgrid. Bryan et al. [120] presented a big data-based approach for analyzing how the range of a plug-in hybrid electric vehicle is affected by increasing the power limit of the battery to allow more regenerative braking power. Data were recorded from the different driving styles of Ford employees, and the study found a slight impact of the increase on the range.

6.2. Planning & decision making

Big data would improve urban planning [108] and microgrid planning as well, by supporting the transition to renewable sources of generation. Proper planning is achieved because big data can combine sales, operational data, management data, etc. to achieve optimal planning. The energy market combines different sources of energy like fuel, nuclear, and renewable energy, which includes biomass, solar, hydro, geothermal, and wind energies. Making a future policy plan for the emphasis of future energy sources involves big data. Big data supports policy makers in selecting energy sources. Chalvatzis et al. [121] used big data approaches to help UK decision makers select their future energy source plan, taking into consideration technical and environmental terms. The results of the detailed analysis indicated that investment in the UK should tend towards a mix of all renewable energy sources except biomass, even though it is environmentally friendly. The studies are summarized in the following areas:

6.2.1. Micro generation optimum plant design

Big data has applications in planning to design the most efficient
better and efficient power grid operation. residential power plant. Huang and Chang [122] provided an approach
• Support decision makers through highlighting the areas to be im- to build rooftop power plant based on data about roof material and
proved. height, inverter specifications, solar power, number of Maximum Power
• Implementation of dynamic pricing algorithms. Consumers pattern Point Tracking (MPPT) units and environment data like azimuth angles
analytics will enable analysis of the millions of users to visualize or gathered by Sinogreenergy [50] to recommend best efficient build solar
analysis of the rate that will be assigned [14]. EDF [113] already power plan in Taiwan based on best connection type (series, parallel,
estimated 35 million of smart meters in the French network for the series-parallel), altitude angle and module voltage. The system uses
real time pricing for lower bills. Data generation estimation is about Self-Adaptive Harmony Search (SAHS) [123] for distinguishing relevant
120 terabytes per year. elements and selecting their weights. Association rules, SVM, and K-
• Load forecasting data generated by big data analytics for every means classifiers are employed to identify optimal models.
consumer every time step can be used to improve utility planning, Vetas Wind Turbines in Denmark and IBM use big data processing
determine unit price, decrease operating costs, increase the quality, for optimal location for wind turbines based on data from weather, tidal
and help reducing power loss [14,]. expectations, geographical, and satellite [115].
• Assets monitoring and predictive fault analysis will allow predictive Big data and cloud computing perspectives can manage con-
maintenance. centrated solar plant and monitoring based on big data sources like
• Value extraction can be detected. Big data can help selection of solar data collected, data stored, meteorology, market demand, and
relevant and non-relevant factors in decision making and future equipment conditions [29].
planning.
• Enable backup solutions for the power grid enterprises [115]. 6.2.2. Wave energy integration support
• Enable dynamic pricing strategies. BigDataOcean project proposes maritime big data for interest of
• Peak load management. organization and scientists for maritime sector enhancements and em-
• Demand response management and scheduling of temperature powering development. The project value chain starts from filtering of
controlled loads like heat pump days ahead [116]. measurements of wave, weather, marine and equipment and supporting

site planning, assessment of wave energy potential, and grid impact analysis [124,125].

6.2.3. Operator decision making support
Big data applications in the microgrid can help operators make better decisions under circumstances like hurricanes and storms. [22] provided a Hadoop-based platform, called Disaster-Mitigating Dispatch Platform (DMDP), for disaster prevention and immediate response. The purpose of this platform is risk identification, visualization (for example through warnings, emergent locations, and repair centers), and emergency response decision making, such as distributing vehicles and transportation at the weak points and during disasters. ZooKeeper is used for cluster coordination. Data stored in HBase from weather, environmental, historical, equipment status, and other sources is mined to provide preventive procedures for weak areas, identify major problems with corrective actions in case of power outages due to disasters, identify weak areas in the grid, and support self-enhancement and efficiency calculation of the measures taken.

6.2.4. Optimal renewable energy integration
[35] presented a solution to dispatch between the renewable sources and the conventional grid. The renewable energy is supportive when the conventional grid is not able to cover the full electricity consumption. The methodologies are presented and applied to a town of 2700 residents.
Ifaei et al. [126] analyzed a case study of Iran using load and climate big data for renewable energy management with ANOVA and ANN. Regarding the RES prospect, the study clustered Iran into 5 clusters. The big data analysis is used to select the optimal design and areas for microgrid implementation and to produce a prioritized action plan for Iran.

6.3. Power prediction

Cloud-based platforms with big data find application in microgrid planning through matching the supply to the load. [127] proposes Cassandra cloud storage of historic weather, consumer profiles, generation data, and load data. The power generation is predicted based on these data. The data are retrieved using HDFS and then processed with MapReduce for prediction.
Grolinger et al. [101] presented an energy consumption prediction strategy based on local learning with Support Vector Regression (SVR) of big data [128]. The study compared SVR, local SVR, and H2O deep learning results for energy consumption forecasting, with superior accuracy gained by local SVR.
Daki et al. [129] proposed a big data-based solution for an energy forecasting system that matches production with consumption in a Moroccan engineering school. Data sources include semi-structured weather data, unstructured real-time sensor data streamed through Kafka, and structured operational data (planning and equipment) from relational databases using the HTTP protocol. Data are stored in the column-oriented HBase. MapReduce is used for data processing to manage the distributed computing system. ZooKeeper is a mandatory component for HBase coordination. Machine learning algorithms are applied to predict the electrical consumption. The studies can generally be categorized as follows:

6.3.1. Intermittent sources forecasting
Big data can improve forecasting of the output power of intermittent sources like wind turbines. [130] used game-theoretical big data analytics for short-term prediction of the output power of wind energy in a microgrid in Hebei, China.
[104] built a big data-based Solar Photovoltaic Power Forecasting System (SPPFS) to forecast the photovoltaic energy accurately based on real-time neural network processing of meteorological data.

6.3.2. Weather forecasting
The optimal operation of renewable energy sources like solar and wind requires estimates of the output power. Weather forecasting may strengthen the planning of the power grid days or hours ahead [37] and would enhance the scheduling of the sources in the grid. SUN4CAST [18] represents a model of a big data-based approach for weather prediction.

6.3.3. Burden forecasting
Old methods for long-period load forecasting like regression, wavelet analysis, SVM, and ANN suffer from disadvantages such as sensitivity to uncertain conditions like the environment and climate, so detailed analysis is required. Effective burden forecasting enhances the stability of the microgrid and improves the power dispatch between sources. Big data supports detailed analysis of environmental conditions and electricity consumers' behavioral data together with historical analysis of old data. Hu et al. [131] presented a framework based on Spark to predict the load to be considered for dispatching and planning.
[96] used SuperDoop to predict a consumption model based on energy data, using the Lambda architecture for big data analytics and considering the data acquisition challenge and speed and batch processing [132]. SuperDoop-based analysis with machine learning allows identification of the consumption of each home based on meters and historical data.
[133] studied short-term load forecasting based on Hadoop big data analytics of smart meter data.
[134] studied short-term load forecasting with an improved SVM based on Hadoop for faster computations and better prediction from load data.
[97] studied the application of big data analytics to consumption datasets to forecast future energy behavior and build a recommender system for smart home consumers. Big communication data between appliances, device active or inactive status, cloud-stored home data, historical data, and energy profiles of users are analyzed using an ANN deployed on SuperDoop for batch and real-time processing, to excavate consumption patterns and generate cheaper recommendations and energy predictions.
Zhu [135] presented a data mining technique based on a Hadoop HDFS and MapReduce platform with cloud technology for prediction of the yearly electrical power consumption using electrical power system data.

6.4. Electrical equipment assessment

Big data provides an opportunity for on-line monitoring and fault detection of equipment working in different environmental conditions. The approach in [131] shows that the framework can handle massive data from different monitoring sensors for health assessment of the equipment, to reduce blackout and power outage cases.

6.5. Control

Big data analytics can determine the risk based on analysis in the power grid [136].

6.5.1. Monitoring, reliability, management, cost reduction and maintenance
The management and maintenance of DG is a challenging topic for preserving healthy operation of those sources. Big data techniques can monitor the power system and evaluate its status for decision making and visualization [137].
Hu et al. [43] presented an experimental approach for PV power plant operation based on the wireless sensor networks presented in Section 3.6.2. Hadoop is used for big data collection from all sensors and historical data; the data are then merged, and semi-supervised Support Vector Machine (SVM) techniques [138] are used to distinguish the failed modules in real time. The presented system can also anticipate the output power using the irradiance data, and it uses the Password-based Encrypted Group Key Agreement protocol [139] to protect the data transmission.
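To make the PV monitoring idea of Hu et al. [43] more concrete, the following is a minimal sketch and not the method of [43]: it replaces the semi-supervised SVM with a much simpler residual check that compares each module's measured power against the power anticipated from irradiance. The linear irradiance model, the 250 W module rating, the thresholds, and all names (`ModuleReading`, `expected_power`, `flag_failed_modules`) are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModuleReading:
    module_id: str
    irradiance_w_m2: float  # irradiance seen by the module
    power_w: float          # measured DC output power


def expected_power(irradiance_w_m2, rated_power_w=250.0, stc_irradiance=1000.0):
    """Anticipated output from irradiance.

    Assumption: output scales linearly with irradiance relative to the
    rated power at 1000 W/m^2; temperature effects are ignored.
    """
    return rated_power_w * irradiance_w_m2 / stc_irradiance


def flag_failed_modules(readings, min_ratio=0.6, min_irradiance=100.0):
    """Return IDs of modules producing far less than anticipated.

    Assumption: a module counts as failed when measured/expected power
    falls below min_ratio; readings in low light are skipped because the
    ratio is unreliable there.
    """
    failed = []
    for r in readings:
        if r.irradiance_w_m2 < min_irradiance:
            continue  # too dark to judge this module
        ratio = r.power_w / expected_power(r.irradiance_w_m2)
        if ratio < min_ratio:
            failed.append(r.module_id)
    return failed


readings = [
    ModuleReading("M1", 800.0, 195.0),  # expected 200 W, ratio 0.975
    ModuleReading("M2", 800.0, 60.0),   # expected 200 W, ratio 0.30
    ModuleReading("M3", 50.0, 2.0),     # skipped: irradiance too low
]
print(flag_failed_modules(readings))  # ['M2']
```

In a deployment along the lines described in [43], a learned classifier would take the place of the fixed ratio threshold, and the readings would arrive from the sensor network instead of a hard-coded list.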

Liu et al. [107] presented a big data platform for stability enhancement through inspection of fault conditions of transmission lines. Based on the analysis, decisions can be made for prevention and fast response when faults occur. The data analytics is performed using Spark, Scala, R, Python, and computational tools like NumPy. The results are mapped in 3D using Cesium. The correlations between transients and transmission equipment faults are addressed using big data to enhance the operator’s knowledge about the relationship between the transmission weather, line trips, and voltage dips in order to take protection procedures.
[58] proposed a new big data architecture to recommend reductions of energy consumption in industrial and mining applications and provide insights and data visualization for management and decision makers. The proposed system makes use of MongoDB [140] for document storage to store multi-structured data, integrated with standalone Apache Spark. Python [141,142] is used for analytics and is integrated with Python using APIs. Applying the big data analytics to real-time emissions, energy consumption, and cost data of mining and industrial companies identified wastage areas and areas to improve in order to achieve more energy reduction, as well as pathways for planning the company’s road maps.
[143] presented a proof-of-concept big data analytics platform in the Al Akhawayn University campus microgrid using OpenStack [144] for high-performance scalable cloud computing [145], integrated with Hadoop and MapReduce for storage and analytics. Running electrical power consumption and storage status data are analyzed by the big data platform for decision making, to decide either to store the energy or transmit it in the campus. For efficient energy management, each water heater is connected to an actuator controlled by the analytics of the big data gathered from temperature and humidity sensors in the rooms, connected using the Zigbee [44] protocol through a Raspberry Pi controller [146].
[76] piloted a renewable energy management system called National Virtual Power Plant (NVPP) based on big data for excess energy storage, exchange, load shifting, or load reduction. The system provides future energy demand prediction, policy and economy simulations, automated demand response, and monitoring-based commissioning services using 2 clusters, one using Spark and the other using Hadoop. The system uses HBase for storage and Hadoop for big data analytics of the billing and power metering data. It uses Sqoop for gathering structured data, MapReduce for processing, and ZooKeeper and Apache Chukwa for ease of synchronization and searching. The machine learning algorithm is implemented using Mahout and RHadoop for mining and visualization.
Liu et al. [147] implemented an energy monitoring system at Tunghai University that gathers sensor and smart meter data using Java programs and then stores it in HDFS and Hive for real-time Spark big data processing of the historical data (using Sqoop) and the real-time data.
[94] provided Spark big data processing based on a hardware and software implementation for stability estimation and monitoring using PMU data. The system used Apache Kafka for streaming data, Cassandra for data storage, and Spark for streaming data processing. Meanwhile, [148] described an environment for big data analytics of millions of PMU data for analyzing and locating disturbances.
[149] used parallel detrended fluctuation analysis for fast detection of PMU measurements based on MapReduce processing of the PMU data from openPDC [150]. MapReduce divides the PMU data into smaller units and processes them to calculate the fluctuation, and the Reduce stage then compares the fluctuation to the threshold to identify abnormal situations.
[151] proposed a big data-based real-time monitoring system for power loss in transmission lines based on Spark streaming analysis of big data collected from spatial data and monitoring data.

6.5.2. Dispatch
Big data supports new methods for the grid scheduling and dispatch process [152].

6.5.3. Energy savings and optimization
[153] provided an energy saving optimization to reduce substation energy consumption based on HDFS storage, YARN for scheduling, and Hadoop big data analysis of substation variables like the current, voltage, line status, control equipment, primary and secondary equipment status, environment, etc. The system also provides visualization of the consumption data based on the analysis of energy saving strategies.

6.5.4. Fault diagnosis
[47] proposed a big data-based algorithm driven by Hadoop and Spark that supports maintenance of the vast number of wind turbines per wind farm and detection of abnormal conditions. The approach used HDFS for wind turbine historical data, Apache Kafka for real-time data acquisition, and Apache Spark for data processing. Decision trees provide the fault identification rules.
[154] provided health monitoring and failure analysis of wind turbines based on the fuzzy C-means algorithm implemented on a Hadoop HBase and MapReduce platform using cloud storage and computing of gathered sensor and SCADA data.
[28] proposed a big data solution based on Hadoop for FFT analysis of wind turbines for anomaly detection.

6.5.5. Maintenance
Maintenance of wind turbines affects the planning and dispatching of the microgrid. Maintenance of wind turbines is a field that can get support from big data [155] because each wind turbine contains 30 sensors and multiple business reports and descriptions [28].

6.5.6. Storage systems
Mismatch between generation and time-varying load may result in disturbances in the grid. The storage system is one of the microgrid elements that enhances reliable operation of the microgrid and generation-load stabilization. Terminal voltage faults are possible for several reasons, like mechanical wear and mechanical failure inside the battery. Big data can support building fault recognition and detection of exceptional faults in battery systems based on statistical methods and machine learning [30].
Undersized batteries in the microgrid lead to possible power outages and unreliable operation, while oversized, unneeded batteries are costly. Big data simulation can be used to optimally design storage systems to operate with intermittent PV generation and isolated PV-based microgrids [156].

6.5.7. Stability
Microgrids suffer from a lack of the inertia that sustains the stability of the grid. Big data analytics can be used to improve the stability of the microgrid. [40] proposed a big data approach for stability improvement during microgrid islanding operation, for example when a failure happens in a transmission line. The approach relies on monitoring to detect sudden islanding based on a Spatial-Temporal approach. After the outage is detected, an optimal PSO approach is employed to enhance the dynamic response simultaneously with the occurrence of disturbances, in the most efficient operation.
[157] proposed an intelligent stability analysis system using Hadoop processing of historical data, SCADA, EMS, WAMS, etc. to address stability and risk management with the aid of advanced data mining techniques like ANN, Deep Reinforcement learning, and Automatic supervised learning.

6.5.8. Energy efficiency support
[158] proposed a big data-based approach for energy loss detection
through analysis of irregular electrical consumption patterns inside buildings based on MongoDB data storage for historical, operational (lighting, temperature, ventilation, etc.), environmental, and energy usage data.

6.5.9. Power quality assessment
[159] proposed real-time transient response Naive Bayes classification [160] based on Hadoop-MapReduce big data processing of historical and streaming user data to identify whether the status is regular, small disturbance, potential risk, or defect mode.
[111] proposed a Hadoop-based system for harmonics analysis in active distribution systems for THD and harmonics estimation.

6.5.10. Power-cut management
Weather and environmental conditions usually cause power grid cuts. Big data can analyze historical data, PMUs, sensor readings, and Geographic Information Systems along with advanced measurement units, environmental conditions, and customer interaction through emails or calls for better prediction and localization of faults, proactive measures in sensitive areas, and detailed analysis of the reaction of protection schemes [49].

6.5.11. Intensive power flow analysis calculation
Big data's superior computational analysis can be used to solve complex power flow analysis. Spark and its interfacing with machine learning can handle the increased complexity of the matrix algebra in the Newton-Raphson equations [161] for power flow analysis [162]. The Newton-Raphson method includes the calculation of the power mismatch, the Jacobian matrix of partial derivatives of active and reactive power with respect to the phase angle and voltage magnitude, and the unknown vector of voltage and phase angle at each iteration. Spark can help with the computationally intensive steps like the matrix inverse and the iterations. Spark deployment led to a sixfold increase in calculation speed.

6.6. Customer loads

[163] proposed applying big data to SCADA system steady-state readings combined with WAMS dynamic data for real-time modeling of the loads.
Big data can provide customer analysis for a variety of applications like pricing [164] and optimizing power delivery efficiency. [32] proposed a system for 800,000 power consuming data, consumer analysis, social, economic, and geographical data in Shanghai city. The Hadoop platform is employed after data cleaning. The data mining can lead to improved planning and load-side management.

6.6.1. Smart building control
Big data and IoT can control smart building appliances. [165] proposed an integrated IoT-based big data analytics system with virtual streaming sensors data transferred from sensors using TCP (Transmission Control Protocol) [166] to a Flume agent, stored in HDFS, and processed using Spark for visualization and auto-tuning of oxygen pump control, illuminance adjustment, and fire alarms in the building.

6.6.2. Load adaptation to supply
[167] provided big data real-time processing using Apache Storm to reduce load at the peaks by disconnecting selected loads, and to identify whether a building is empty so that it can be disconnected, along with the savings. One Storm streaming data spout is fed by the customer location gathered through a mobile app, and the other spout represents device measurements. The database contains information about all appliances inside the home, family size, etc. Data is processed (Storm bolts) by retrieving the house ID and location, checking whether the family is inside the house based on their location and the home location, checking the customer's permission to disconnect their devices, and calculating the potential savings.

6.7. Marketing enhancement

Big data analytics supports customer segmentation and market statistics. [168] proposed an initial prototype of a cloud-based Hadoop platform for energy consumption statistics. Data are processed as <key, value> pairs, where the key is the serial number of the customer and the value is the energy feeding source, whether wind, hydro power, thermal, etc. The Reduce task collects the number of customers per energy source for statistics purposes.

6.8. Detection of data attacks

As data grows in the power grid, it becomes prone to hacking attacks. For example, SCADA system nodes communicate data packets of high importance. The packets include multiple attributes like the setpoints, gain, control commands, etc. Big data supports classification of those attacks for proper action. Using K-means clustering for analysis of the SCADA communication packet history, Paramkusem and Aygun [117] provided a big data analysis of communication packets (read and write packets) in the SCADA nodes based on Hadoop and random forest classification [169] built on Mahout for detection and classification of hacking attacks.

7. Challenges & future prospects

Big Data in the microgrid offers many achievements; at the same time, it leaves open points to be pursued in future work. Some challenges are in the fields of:

• Real-time parallel processing: some big data tools like Hadoop are not able to operate on streaming data [170], despite the persistent need for very hard real-time processing, as transient control in the microgrid for extreme conditions like faults requires real-time reaction. Regardless of latency, processing congestion, and algorithm complexity, the reaction time may need to be less than milliseconds [171].
• Data Collection: Collected data may at times be inaccurate, incomplete, and unreliable. For example, not all microgrid sources and elements have techniques for real-time data transmission, and some sensors may be missing in some units. More investment can be directed to operating new sensor technologies, enhancing data quality, and automating all data entries. Data collection also requires support from the government.
• Data Privacy: Data can be infiltrated, so it must be protected, especially personal data like usage and load information. Law can enforce data privacy for the big data in the microgrid through strict legislation on data ownership [19]. Customers must be able to trust that their private data is used for its intended purpose only.
• Data security: Data should be secure from hacking activities. This data is heavily relied on in prediction and analysis, so it must be transmitted securely and highly encrypted [172]. A platform that incorporates data security mechanisms with big data analytics in the microgrid is still missing [173].
• Data storage: The expansion in the amount, representation, and format of data requires expansion in the data storage management tools and technologies.
• Data visualization enhancements: The waveforms of the electrical network can be visualized easily. However, visualization of the complicated relationships and correlations required for decision making or for getting the value behind the data needs to be enhanced given the multi-source data, different standards, and variable formatting [171].
• Data Processing: Historical data shall be merged with the real-time, instantaneously gathered data related to operation and electricity usage and consumption, and processed in a very short time.
• Data opening: recording the data and giving the public access to good quality data so that companies and developers can find ways to
improve the grid.
• Existence of reliable databases for power data collected from different sources.
• Cost-effective big data solutions for the whole solution including data center, tools, storage, control algorithms, warehouses, etc. The current power grid stakeholders cannot easily accept the cost of the massive sensors and the new infrastructure [174].
• An integrated, holistic study of a big data solution that incorporates the cloud with sufficiently scalable specialized functionality in the microgrid.
• Performing online learning and making use of its continuous feedback correction and its more precise prediction compared with statistical machine learning approaches [175,176].
• Unsuccessful applicable obligatory data standards in the microgrid and lack of big data analysts for the whole microgrid data sources and processing [172]: Several sensors and communication methodologies operating in the microgrid are producing data. There is a need for standard communication and protocols that describe the speed, data transfer, and secure data transmission.
• Data integration [172,177]: The challenge of integrating different sources like social data, sensors, reports, etc. together, as there is no standardized environment today.
• Big data techniques are viable and effective only for huge data sets. Small data sets will not benefit from technologies like Hadoop [178].
• The combination of cloud computing and big data processing provides new perspectives for analysis and incorporation of IoT data for management of the microgrid [179,180]. Big data and cloud computing are complementary, as big data is an application of cloud computing while cloud computing provides tools for big data processing [115].
• Using Starfish for Hadoop-based big data processing to optimize the configurations and get the maximum Hadoop power and flexibility, like the number of reducers, memory settings, etc. [181]. Hadoop has over 190 configuration parameters that have a high effect on performance and need to be investigated. Few studies [182] consider tuning of the Hadoop parameters in power grid applications and show speedier performance for PMU data processing. Other advances can be achieved with the optimal configuration.
• Big data sources integration
• Uncertainty about the big data integration in the current microgrid systems [183].
• The need for very effective preprocessing and data quality enhancement algorithms. For example, the failure of a critical sensor must be discovered and cleaned [183].
• Missing innovative case studies that transform big data into operational intelligence [184].
• Imperfect research on big data system architecture analytics design in the microgrid [184].
• Missing return-on-investment studies and research about the impact of applying costly big data analytics in the power grid.
• The necessity of intelligent selective real-time processing algorithms for big gathered data, as a blackout may occur within multiple seconds [173].
• Shortage of skilled big data experts in the energy sector [185].
• Lack of familiarity of end users with energy data [185].

soundly the microgrid creating it more efficient, reliable, and profitable microgrid.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.aei.2019.100945.

References

[1] S. John Walker, Big data: A revolution that will transform how we live, work, and think, 2014.
[2] A. Vera-Baquero, R. Colomo-Palacios, O. Molloy, Towards a process to guide big data based decision support systems for business processes, Procedia Technol. 16 (2014) 11–21.
[3] K. Venkatram, M.A. Geetha, Review on big data & analytics–concepts, philosophy, process and applications, Cybernet. Inform. Technol. 17 (2) (2017) 3–27.
[4] M.R. Bendre, V.R. Thool, Analytics, challenges and applications in big data environment: a survey, J. Manage. Analyt. 3 (3) (2016) 206–239.
[5] B. Fang, P. Zhang, Big data in finance, Big Data Concepts, Theories, and Applications, Springer, 2016, pp. 391–412.
[6] X. Ge, J. Jackson, The big data application strategy for cost reduction in automotive industry, SAE Int. J. Commer. Veh. 7 (2014-01-2410) (2014) 588–598.
[7] Y. Wang, Q. Chen, T. Hong, C. Kang, Review of smart meter data analytics: applications, methodologies, and challenges, IEEE Trans. Smart Grid (2018).
[8] X. Liu, B. Su, Microgrids-an integration of renewable energy technologies, Electricity Distribution, 2008. CICED 2008. China International Conference on, IEEE, 2008, pp. 1–7.
[9] H. Jiang, K. Wang, Y. Wang, M. Gao, Y. Zhang, Energy big data: a survey, IEEE Access 4 (2016) 3844–3861.
[10] S. Sagiroglu, R. Terzi, Y. Canbay, I. Colak, Big data issues in smart grid systems, Renewable Energy Research and Applications (ICRERA), 2016 IEEE International Conference on, IEEE, 2016, pp. 1007–1012.
[11] Y. Song, G. Zhou, Y. Zhu, Present status and challenges of big data processing in smart grid, Power Syst. Technol. 37 (4) (2013) 927–935.
[12] Z. Huang, H. Luo, D. Skoda, T. Zhu, Y. Gu, E-sketch: Gathering large-scale energy consumption data based on consumption patterns, Big Data (Big Data), 2014 IEEE International Conference on, IEEE, 2014, pp. 656–665.
[13] J. Zhu, E. Zhuang, J. Fu, J. Baranowski, A. Ford, J. Shen, A framework-based approach to utility big data analytics, IEEE Trans. Power Syst. 31 (3) (2016) 2455–2462.
[14] H. Akhavan-Hejazi, H. Mohsenian-Rad, Power systems big data analytics: an assessment of paradigm shift barriers and prospects, Energy Rep. 4 (2018) 91–100.
[15] C. Kang, Y. Wang, Y. Xue, G. Mu, R. Liao, Big data analytics in china’s electric power industry: modern information, communication technologies, and millions of smart meters, IEEE Power Energ. Mag. 16 (3) (2018) 54–65.
[16] R. Lu, H. Zhu, X. Liu, J.K. Liu, J. Shao, Toward efficient and privacy-preserving computing in big data era, IEEE Network 28 (4) (2014) 46–50.
[17] M.F. Uddin, N. Gupta, et al., Seven v’s of big data understanding big data to extract value, American Society for Engineering Education (ASEE Zone 1), 2014 Zone 1 Conference of the, IEEE, 2014, pp. 1–5.
[18] S.E. Haupt, B. Kosović, Variable generation power forecasting as a big data problem, IEEE Trans. Sustain. Energy 8 (2) (2017) 725–732.
[19] N. Amaro, J.M. Pina, Big data in power systems leveraging grid optimization and wave energy integration, Engineering, Technology and Innovation (ICE/ITMC), 2017 International Conference on, IEEE, 2017, pp. 1046–1054.
[20] K. Ye, Y. Cao, F. Xiao, J. Bai, F. Ma, Y. Hu, Research on unified information model for big data analysis of power grid equipment monitoring, Computer and Communications (ICCC), 2017 3rd IEEE International Conference on, IEEE, 2017, pp. 2334–2337.
[21] Y. Guo, S. Feng, K. Li, W. Mo, Y. Liu, Y. Wang, Big data processing and analysis platform for condition monitoring of electric power system, Control (CONTROL), 2016 UKACC 11th International Conference on, IEEE, 2016, pp. 1–6.
[22] Y. Wang, M. Deng, Y. Bao, H. Zhang, J. Chen, J. Qian, C. Guo, Power system
8. Conclusion disaster-mitigating dispatch platform based on big data, Power System Technology
(POWERCON), 2014 International Conference on, IEEE, 2014, pp. 1014–1019.
[23] F.C. Trindade, L.F. Ochoa, W. Freitas, Data analytics in smart distribution net-
In this paper, we demonstrated the Big Data relevant opportunities
works: applications and challenges, Innovative Smart Grid Technologies-Asia
and studies in the microgrid applications using the presented layered (ISGT-Asia), 2016 IEEE, IEEE, 2016, pp. 574–579.
analytics framework. The data sources, storage, analytics, are pre- [24] I. Modbus, Modbus messaging on tcp, IP Implementation Guide v1. 0a, North
sented. Several opportunities are presented like the optimal design of Grafton, Massachusetts (www.modbus.org/specs.php), 2004.
[25] M. Aiello, G.A. Pagani, The smart grid’s data generating potentials, Computer
power plants, grid stability enhancement, optimal dispatching, assets Science and Information Systems (FedCSIS), 2014 Federated Conference on, IEEE,
maintenance, and fault diagnosis. Nevertheless, there are still open 2014, pp. 9–16.
challenges like the data security and privacy. Big Data will impact [26] G. Suciu, A. Vulpe, A. Martian, S. Halunga, D.N. Vizireanu, Big data processing for

renewable energy telemetry using a decentralized cloud m2m system, Wireless Pers. Commun. 87 (3) (2016) 1113–1128.
[27] I. Nopens, L. Benedetti, U. Jeppsson, M.-N. Pons, J. Alex, J.B. Copp, K.V. Gernaey, C. Rosen, J.-P. Steyer, P.A. Vanrolleghem, Benchmark simulation model no 2: finalisation of plant layout and default control strategy, Water Sci. Technol. 62 (9) (2010) 1967–1974.
[28] D. Ferguson, V. Catterson, Big data techniques for wind turbine condition monitoring, European Wind Energy Association Annual Event (EWEA 2014), 2014.
[29] A.A. Jiménez, C.Q. Gómez, F.P.G. Márquez, Concentrated solar plants management: Big data and neural network, Renewable Energies, Springer, 2018, pp. 63–81.
[30] Y. Zhao, P. Liu, Z. Wang, L. Zhang, J. Hong, Fault and defect diagnosis of battery for electric vehicles based on big data analysis methods, Appl. Energy 207 (2017) 354–362.
[31] I.G. Alonso, O.Á. Fres, A.A. Fernández, P.G. del Torno, J.M. Maestre, M.A.G. Fuente, Towards a new open communication standard between homes and service robots, the dhcompliant case, Robot. Auton. Syst. 60 (6) (2012) 889–900.
[32] H. Qu, P. Ling, L. Wu, Electricity consumption analysis and applications based on smart grid big data, Ubiquitous Intelligence and Computing and 2015 IEEE 12th Intl Conf on Autonomic and Trusted Computing and 2015 IEEE 15th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom), 2015 IEEE 12th Intl Conf on, IEEE, 2015, pp. 923–928.
[33] K. Zhou, C. Fu, S. Yang, Big data driven smart energy management: from big data to big insights, Renew. Sustain. Energy Rev. 56 (2016) 215–225.
[34] T. Dunne, Big data, analytics, and energy consumption, 2012.
[35] A. Nasiakou, M. Alamaniotis, L.H. Tsoukala, Power distribution network partitioning in big data environment using k-means and fuzzy logic, 2016.
[36] E. Aarseth, Electric vehicle service center and method for exchanging and charging vehicle batteries, Dec. 7 1999, US Patent 5,998,963.
[37] S.E. Haupt, B. Kosovic, Big data and machine learning for applied weather forecasts: forecasting solar power for utility operations, Computational Intelligence, 2015 IEEE Symposium Series on, IEEE, 2015, pp. 496–501.
[38] Nasa, https://www.nasa.gov/ (accessed: 2018-11-16).
[39] Irimo, http://irimo.ir/far/eng (accessed: 2018-11-16).
[40] H. Jiang, Y. Li, Y. Zhang, J.J. Zhang, D.W. Gao, E. Muljadi, Y. Gu, Big data-based approach to detect, locate, and enhance the stability of an unplanned microgrid islanding, J Energy Eng 143 (5) (2017) 04017045.
[41] B. Yang, J. Yamazaki, N. Saito, Y. Kokai, D. Xie, Big data analytic empowered grid applications-is pmu a big data issue? European Energy Market (EEM), 2015 12th International Conference on the, IEEE, 2015, pp. 1–4.
[42] M.J. Prieto, A.M. Pernía, F. Nuño, J. Díaz, P.J. Villegas, Development of a wireless sensor network for individual monitoring of panels in a photovoltaic plant, Sensors 14 (2) (2014) 2379–2396.
[43] T. Hu, M. Zheng, J. Tan, L. Zhu, W. Miao, Intelligent photovoltaic monitoring based on solar irradiance big data and wireless sensor networks, Ad Hoc Netw. 35 (2015) 127–136.
[44] P. Kinney et al., Zigbee technology: wireless control that simply works, in: Communications design conference, vol. 2, 2003, pp. 1–7.
[45] M. Rodriguez, I. González, E. Zalama, Identification of electrical devices applying big data and machine learning techniques to power consumption data, International Technology Robotics Applications, Springer, 2014, pp. 37–46.
[46] M. Chen, S. Mao, Y. Liu, Big data: a survey, Mob. Netw. Appl. 19 (2) (2014) 171–209.
[47] I. Abdallah, V. Dertimanis, H. Mylonas, K. Tatsis, E. Chatzi, N. Dervilis, K. Worden, E. Maguire, Fault diagnosis of wind turbine structures using decision tree learning algorithms with big data, Safety and Reliability-Safe Societies in a Changing World, CRC Press, 2018, pp. 3053–3061.
[48] S.M. Bhuiyan, J.F. Khan, G.V. Murphy, Big data analysis of the electric power pmu data from smart grid, SoutheastCon, 2017, IEEE, 2017, pp. 1–5.
[49] P.-C. Chen, T. Dokic, M. Kezunovic, The use of big data for outage management in distribution systems, International Conference on Electricity Distribution (CIRED) Workshop, 2014.
[50] Sinogreenergy, https://www.Sinogreenergy.com/ (accessed: 2018-11-16).
[51] Energyplus, https://EnergyPlus.net/ (accessed: 2018-11-16).
[52] Nrel, https://www.nrel.gov/ (accessed: 2018-11-16).
[53] Open data, https://open-enernoc-data.s3.amazonaws.com/anon/index.html (accessed: 2018-11-16).
[54] Hadoop, https://hadoop.apache.org/ (accessed: 2018-11-16).
[55] T. White, Hadoop: The Definitive Guide, O’Reilly Media, Inc., 2012.
[56] K. Shvachko, H. Kuang, S. Radia, R. Chansler, The hadoop distributed file system, Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, IEEE, 2010, pp. 1–10.
[57] J. Dean, S. Ghemawat, Mapreduce: simplified data processing on large clusters, Commun. ACM 51 (1) (2008) 107–113.
[58] J. Herman, H. Herman, M.J. Mathews, J.C. Vosloo, Using big data for insights into sustainable energy consumption in industrial and mining sectors, J. Clean. Prod. 197 (2018) 1352–1364.
[59] H. Daki, A. El Hannani, A. Aqqal, A. Haidine, A. Dahbi, Big data management in smart grid: concepts, requirements and implementation, J. Big Data 4 (1) (2017) 13.
[60] Y. Huai, A. Chauhan, A. Gates, G. Hagleitner, E.N. Hanson, O. O’Malley, J. Pandey, Y. Yuan, R. Lee, X. Zhang, Major technical advancements in apache hive, Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, ACM, 2014, pp. 1235–1246.
[61] S. Kamburugamuve, G. Fox, D. Leake, J. Qiu, Survey of apache big data stack, Indiana University, Tech. Rep., 2013.
[62] D. Cutting, Data interoperability with apache avro, Cloudera Blog, 2011.
[63] V.K. Vavilapalli, A.C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, et al., Apache hadoop yarn: yet another resource negotiator, Proceedings of the 4th Annual Symposium on Cloud Computing, ACM, 2013, p. 5.
[64] J. Tan, The design and implementation of big data platform for telecom operators, International Conference on Industrial IoT Technologies and Applications, Springer, 2016, pp. 3–11.
[65] M. Bhandarkar, Mapreduce programming with apache hadoop, Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on, IEEE, 2010, p. 1.
[66] Apache zeppelin, https://zeppelin.apache.org/ (accessed: 2018-11-16).
[67] A. MadhaviLatha, G.V. Kumar, Streaming data analysis using apache cassandra and zeppelin, IJISET-Int. J. Innov. Sci. Eng. Technol. 3 (10) (2016).
[68] S. Haloi, Apache ZooKeeper Essentials, Packt Publishing Ltd, 2015.
[69] A. Rabkin, R. Katz, Chukwa: A system for reliable large-scale log collection, in: Proceedings of LISA’10: 24th Large Installation System Administration Conference, 2010, p. 163.
[70] Apache samoa, https://samoa.incubator.apache.org/ (accessed: 2018-11-16).
[71] G. Ingersoll, Introducing Apache Mahout, Scalable, Commercial-friendly Machine Learning for Building Intelligent Applications, IBM, 2009.
[72] L. Cai, X. Guan, P. Chi, L. Chen, J. Luo, Big data visualization collaborative filtering algorithm based on rhadoop, Int. J. Distrib. Sens. Netw. 11 (10) (2015) 271253.
[73] Apache pig, https://pig.apache.org/ (accessed: 2018-11-16).
[74] Apache flume, https://flume.apache.org/ (accessed: 2018-11-16).
[75] A.H. Team, Apache hbase reference guide, Apache, version, vol. 2, no. 0, 2016.
[76] J. Choi, M. Kim, J. Yoon, Implementation of the big data management system for demand side energy management, Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM), 2015 IEEE International Conference on, IEEE, 2015, pp. 1515–1520.
[77] A. Lakshman, P. Malik, Cassandra: a decentralized structured storage system, ACM SIGOPS Oper. Syst. Rev. 44 (2) (2010) 35–40.
[78] K. Banker, MongoDB in Action, Manning Publications Co., 2011.
[79] K. Ting, J.J. Cecho, Apache Sqoop Cookbook: Unlocking Hadoop for Your Relational Database, O’Reilly Media, Inc., 2013.
[80] B. Momjian, PostgreSQL: Introduction and Concepts vol. 192, Addison-Wesley, New York, 2001.
[81] J. Jung, Sql-on-hadoop with apache tajo, and application case of sk telecom, 2013.
[82] Spark, https://spark.apache.org/ (accessed: 2018-11-16).
[83] M. Zaharia, R.S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M.J. Franklin, et al., Apache spark: a unified engine for big data processing, Commun. ACM 59 (11) (2016) 56–65.
[84] Y. Gu, H. Jiang, Y. Zhang, J.J. Zhang, T. Gao, E. Muljadi, Knowledge discovery for smart grid operation, control, and situation awareness-a big data visualization platform, North American Power Symposium (NAPS), 2016, IEEE, 2016, pp. 1–6.
[85] Flink, https://flink.apache.org/ (accessed: 2018-11-16).
[86] P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, K. Tzoumas, Apache flink: stream and batch processing in a single engine, Bull. IEEE Comput. Soc. Tech. Committee Data Eng. 36 (4) (2015).
[87] S. Aridhi, E.M. Nguifo, Big graph mining: Frameworks and techniques, Big Data Res. 6 (2016) 1–10.
[88] Storm, http://storm.apache.org/ (accessed: 2018-11-16).
[89] M.H. Iqbal, T.R. Soomro, Big data analysis: apache storm perspective, Int. J. Comput. Trends Technol. 19 (1) (2015) 9–14.
[90] F. Xhafa, V. Naranjo, S. Caballé, Processing and analytics of big data streams with yahoo! s4, Advanced Information Networking and Applications (AINA), 2015 IEEE 29th International Conference on, IEEE, 2015, pp. 263–270.
[91] Streambase, https://www.tibco.com/products/tibco-streambase (accessed: 2018-11-16).
[92] P. Zadrozny, R. Kodali, Big Data Analytics Using Splunk: Deriving Operational Intelligence from Social Media, Machine Data, Existing Data Warehouses, and Other Real-time Streaming Sources, Apress, 2013.
[93] Kafka, https://kafka.apache.org/ (accessed: 2018-11-16).
[94] R. Shyam, B.G. HB, S. Kumar, P. Poornachandran, K. Soman, Apache spark a big data analytics platform for smart grid, Procedia Technol. 21 (2015) 171–178.
[95] Ingenia company, https://www.ingenia.es/ (accessed: 2018-11-16).
[96] A. Cortés, A. Téllez, M. Gallardo, J. Peralta, Big data technology to exploit climate information/consumption models and to predict future behaviours, International Technology Robotics Applications, Springer, 2014, pp. 25–36.
[97] I. González Alonso, M. Rodríguez Fernández, A holistic approach to energy efficiency systems through consumption management and big data analytics, Int. J. Adv. Softw. 6 (2013).
[98] Ps clementine, http://www.psclementinepro.pl/index_en.html (accessed: 2018-11-16).
[99] L. Guo, H. Yan, Y. Hao, Y. Chen, Comprehensive analysis and evaluation of big data for main transformer equipment based on pca and apriority, IOP Conference Series: Earth and Environmental Science, vol. 108, no. 5, IOP Publishing, 2018, p. 052026.
[100] H2o, https://www.h2o.ai/ (accessed: 2018-11-16).
[101] K. Grolinger, M.A. Capretz, L. Seewald, Energy consumption prediction with big data: Balancing prediction accuracy and computational resources, Big Data (BigData Congress), 2016 IEEE International Congress on, IEEE, 2016, pp. 157–164.
[102] M. Ruiguang, W. Haiyan, Z. Quanming, L. Yuan, Technical research on the electric
power big data platform of smart grid, MATEC Web of Conferences, vol. 139, EDP Sciences, 2017, p. 00217.
[103] J. Neto, P. de Andrade, J. Vilanueva, F. Santos, Big data analytics of smart grids using artificial intelligence for the outliers correction at demand measurements, 2018 3rd International Symposium on Instrumentation Systems, Circuits and Transducers (INSCIT), IEEE, 2018, pp. 1–6.
[104] J. Wang, Y. Chen, R. Hua, P. Wang, J. Fu, A distributed big data storage and data mining framework for solar-generated electricity quantity forecasting, Photonics and Optoelectronics Meetings (POEM) 2011: Optoelectronic Devices and Integration, vol. 8333, International Society for Optics and Photonics, 2012, p. 83330S.
[105] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, et al., Mllib: machine learning in apache spark, J. Mach. Learn. Res. 17 (1) (2016) 1235–1241.
[106] R.S. Xin, J.E. Gonzalez, M.J. Franklin, I. Stoica, Graphx: a resilient distributed graph system on spark, First International Workshop on Graph Data Management Experiences and Systems, ACM, 2013, p. 2.
[107] Y. Liu, Y. Guo, Z. Yang, J. Hu, G. Lu, Y. Wang, Power system transmission line tripping analysis using a big data platform with 3d visualization, Computational Intelligence (SSCI), 2017 IEEE Symposium Series on, IEEE, 2017, pp. 1–8.
[108] Y.X. Sun, Q.Y. Yan, The application mode of energy big data and its enlightenment for power grid enterprises, Advanced Materials Research, vol. 1008, Trans Tech Publ, 2014, pp. 1452–1455.
[109] J. Tian, L. Huang, Big data analysis and simulation of distributed marine green energy resources grid-connected system, Polish Marit. Res. 24 (s1) (2017) 182–191.
[110] D. Liu, G. Li, R. Fan, G. Guo, Research about big data platform of electrical power system, International Conference on Industrial IoT Technologies and Applications, Springer, 2016, pp. 36–43.
[111] Z. Cao, J. Lin, C. Wan, Y. Song, G. Taylor, M. Li, Hadoop-based framework for big data analysis of synchronised harmonics in active distribution network, IET Gener. Transm. Distrib. 11 (16) (2017) 3930–3937.
[112] N. Mollaei, S.H. Mousavi, Application of a hadoop-based distributed system for offline processing of power quality disturbances, Int. J. Power Electron. Drive Syst. (IJPEDS) 8 (2) (2017) 695–704.
[113] Électricité de France, www.edf.fr (accessed: 2018-11-16).
[114] S.A. Jees, V. Gomathi, Load forecasting for smart grid using non-linear model in hadoop distributed file system, Cluster Comput. (2018) 1–13.
[115] J. Zhan, J. Huang, L. Niu, X. Peng, D. Deng, S. Cheng, Study of the key technologies of electric power big data and its application prospects in smart grid, Power and Energy Engineering Conference (APPEEC), 2014 IEEE PES Asia-Pacific, IEEE, 2014, pp. 1–4.
[116] L. Chang, X. Wang, M. Mao, Forecast of schedulable capacity for thermostatically controlled loads with big data analysis, Power Electronics for Distributed Generation Systems (PEDG), 2017 IEEE 8th International Symposium on, IEEE, 2017, pp. 1–6.
[117] K.M. Paramkusem, R.S. Aygun, Classifying categories of scada attacks in a big data framework, Ann. Data Sci. (2018) 1–28.
[118] M. Kezunovic, L. Xie, S. Grijalva, The role of big data in improving power system operation and protection, Bulk Power System Dynamics and Control-IX Optimization, Security and Control of the Emerging Power Grid (IREP), 2013 IREP Symposium, IEEE, 2013, pp. 1–9.
[119] X. Qu, A. Jain, N.N. Rajput, L. Cheng, Y. Zhang, S.P. Ong, M. Brafman, E. Maginn, L.A. Curtiss, K.A. Persson, The electrolyte genome project: a big data approach in battery materials discovery, Comput. Mater. Sci. 103 (2015) 56–67.
[120] S. Bryan, M. Guido, D. Ostrowski, N.K. Ahmed, Big data analysis of battery charge power limit impact on electric vehicle driving range while considering driving behavior, SAE Technical Paper, Tech. Rep., 2017.
[121] K.J. Chalvatzis, H. Malekpoor, N. Mishra, F. Lettice, S. Choudhary, Sustainable resource allocation for power generation: the role of big data in enabling inter-industry architectural innovation, Technol. Forecast. Soc. Chang. (2018).
[122] Y.-F. Huang, S.-H. Chang, Mining optimum models of generating solar power based on big data analysis, Sol. Energy 155 (2017) 224–232.
[123] C.-M. Wang, Y.-F. Huang, Self-adaptive harmony search algorithm for optimization, Expert Syst. Appl. 37 (4) (2010) 2826–2837.
[124] R.F.-W.-U.B.U. Germany, Bigdataocean project profile, Framework 6 (2017) 1.
[125] Bigdataocean project, http://www.bigdataocean.eu/site/ (accessed: 2018-11-16).
[126] P. Ifaei, A. Farid, C. Yoo, An optimal renewable energy management strategy with and without hydropower using a factor weighted multi-criteria decision making analysis and nation-wide big data-case study in Iran, Energy 158 (2018) 357–372.
[127] M. Mayilvaganan, M. Sabitha, A cloud-based architecture for big-data analytics in smart grid: a proposal, Computational Intelligence and Computing Research (ICCIC), 2013 IEEE International Conference on, IEEE, 2013, pp. 1–4.
[128] A.J. Smola, B. Schölkopf, A tutorial on support vector regression, Stat. Comput. 14 (3) (2004) 199–222.
[129] H. Daki, A. El Hannani, H. Ouahmane, Hbase-based storage system for electrical consumption forecasting in a moroccan engineering school, Optimization and Applications (ICOA), 2018 4th International Conference on, IEEE, 2018, pp. 1–6.
[130] Z. Zhou, F. Xiong, B. Huang, C. Xu, R. Jiao, B. Liao, Z. Yin, J. Li, Game-theoretical energy management for energy internet with big data-based renewable power forecasting, IEEE Access 5 (2017) 5731–5746.
[131] B. Hu, C. Pang, L. Wang, H. Chu, C. Mao, Big data management and application research in power load forecasting and power transmission and transformation equipment evaluation, Journal of Physics: Conference Series, vol. 1069, IOP Publishing, 2018, p. 012084 no. 1.
[132] M. Kiran, P. Murphy, I. Monga, J. Dugan, S.S. Baveja, Lambda architecture for cost-effective batch and speed big data processing, Big Data (Big Data), 2015 IEEE International Conference on, IEEE, 2015, pp. 2785–2792.
[133] P. Zhang, X. Wu, X. Wang, S. Bi, Short-term load forecasting based on big data technologies, CSEE J. Power Energy Syst. 1 (3) (2015) 59–67.
[134] H. Zhao, Z. Tang, W. Shi, Z. Wang, Study of short-term load forecasting in big data environment, Control And Decision Conference (CCDC), 2017 29th Chinese, IEEE, 2017, pp. 6673–6678.
[135] J. Zhu, Research on data mining of electric power system based on hadoop cloud computing platform, Int. J. Comput. Appl. (2017) 1–7.
[136] B. Schmidt, P. Flannery, M. DeSantis, Real-time predictive analytics, big data & energy market efficiency: key to efficient markets and lower prices for consumers, Appl. Mech. Mater. 704 (2014).
[137] Y. Guo, Z. Yang, S. Feng, J. Hu, Complex power system status monitoring and evaluation using big data platform and machine learning algorithms: a review and a case study, Complexity 2018 (2018).
[138] K.P. Bennett, A. Demiriz, Semi-supervised support vector machines, in: Advances in Neural Information processing systems, 1999, pp. 368–374.
[139] R. Dutta, R. Barua, Password-based encrypted group key agreement, IJ Netw. Secur. 3 (1) (2006) 23–34.
[140] K. Chodorow, MongoDB: The Definitive Guide: Powerful and Scalable Data Storage, O’Reilly Media Inc, 2013.
[141] T.E. Oliphant, Python for scientific computing, Comput. Sci. Eng. 9 (3) (2007).
[142] Python, https://www.python.org/ (accessed: 2018-11-16).
[143] M.R. Abid, R. Lghoul, D. Benhaddou, Ict for renewable energy integration into smart buildings: Iot and big data approach, AFRICON, 2017 IEEE, IEEE, 2017, pp. 856–861.
[144] K. Pepple, Deploying Openstack, O’Reilly Media, Inc., 2011.
[145] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al., A view of cloud computing, Commun. ACM 53 (4) (2010) 50–58.
[146] E. Upton, G. Halfacree, Raspberry Pi User Guide, John Wiley & Sons, 2014.
[147] R.-H. Liu, C.-F. Kuo, C.-T. Yang, S.-T. Chen, J.-C. Liu, On construction of an energy monitoring service using big data technology for smart campus, 2016 7th International Conference on Cloud Computing and Big Data (CCBD), IEEE, 2016, pp. 81–86.
[148] N. Matloff, The Art of R Programming: A Tour of Statistical Software Design, No Starch Press, 2011.
[149] M. Khan, M. Li, P. Ashton, G. Taylor, J. Liu, Big data analytics on pmu measurements, 2014.
[150] G.P. Alliance, Openpdc, 2011. Available on-line: http://openpdc.codeplex.com.
[151] Q.-H. Huang, J. Huang, X.-Z. Wang, C.-Q. Huang, X.-C. Heng, Big data based service platform and its typical applications of electric power industries, DEStech Transactions on Environment, Energy and Earth Sciences, no. epee, 2017.
[152] A. Minghao, G. Xianjun, W. Xiaohui, L. Zhihong, C. Naishi, P. Tianjiao, X. Zhiheng, Y. Fei, A big data analysis based new method for power grid dispatch and control training simulation, Electricity Distribution (CICED), 2016 China International Conference on, IEEE, 2016, pp. 1–4.
[153] J. Mu, Y. Pei, W. Li, J. Xu, C. Guan, J. Dong, H. Zhang, L. Pan, Research on energy saving optimization strategy of substation operation based on big data technology, 2018 Chinese Control And Decision Conference (CCDC), IEEE, 2018, pp. 3567–3571.
[154] H. Wang, S. Zhao, H. Zhao, Y. Yue, Research on data processing for condition monitoring of wind turbine based on hadoop platform, Mechatronics and Automation (ICMA), 2017 IEEE International Conference on, IEEE, 2017, pp. 322–326.
[155] E.G. Nabati, K.D. Thoben, Big data analytics in the maintenance of off-shore wind turbines: a study on data characteristics, Dynamics in Logistics, Springer, 2017, pp. 131–140.
[156] Q. Yuan, K. Zhou, W. Lu, J. Yao, Big data driven optimal sizing of stand-alone photovoltaic energy systems, 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA), IEEE, 2018, pp. 679–684.
[157] W. Hu, L. Zheng, X. Liu, P. Zhang, X. Xu, C. Wang, Power grid’s intelligent stability analysis based on big data technology, Power and Energy Engineering Conference (APPEEC), 2016 IEEE PES Asia-Pacific, IEEE, 2016, pp. 623–627.
[158] J.W. Kim, An automatic detection of anomalous energy consumption by leveraging bems big data analytics, Int. J. Appl. Eng. Res. 12 (14) (2017) 4345–4349.
[159] H. Zhiwei, G. Tian, Z. Huaving, H. Xu, C. Junwei, H. Ziheng, Y. Senjing, Z. Zhengguo, Transient power quality assessment based on big data analysis, Electricity Distribution (CICED), 2014 China International Conference on, IEEE, 2014, pp. 1308–1312.
[160] A.Y. Ng, M.I. Jordan, On discriminative vs. generative classifiers: a comparison of logistic regression and naive bayes, Adv. Neural Inform. Process. Syst. (2002) 841–848.
[161] J.J. Grainger, W.D. Stevenson, Power System Analysis vol. 67, McGraw-Hill, New York, 1994.
[162] D. Šutić, E. Varga, Apache spark as distributed middleware for power system analysis, Telecommunication Forum (TELFOR), 2017 25th, IEEE, 2017, pp. 1–4.
[163] R.C. Yuan, H. Yan, X.M. Zhou, F.C. Di, L.X. Li, Application and architecture of power dispatching & distribution system using big data technology, Advanced Materials Research, vol. 1070, Trans Tech Publ, 2015, pp. 1425–1429.
[164] J. Yang, J. Zhao, F. Luo, F. Wen, Z.Y. Dong, Decision-making for electricity retailers: a brief survey, IEEE Trans. Smart Grid 9 (5) (2018) 4140–4153.
[165] M.R. Bashir, A.Q. Gill, Towards an iot big data analytics framework: smart buildings systems, High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2016 IEEE 18th International
Conference on, IEEE, 2016, pp. 1325–1332.
[166] J. Postel, Transmission control protocol, Tech. Rep., 1981.
[167] I. Kovačević, A. Erdeljan, S. Vukmirović, N. Dalčeković, J. Stankovski, Combining real-time processing streams to enable demand response in smart grids, Networks, Computers and Communications (ISNCC), 2017 International Symposium on, IEEE, 2017, pp. 1–6.
[168] S. Yu, W. Maomao, M. Lin, Design and realization of the smart grid marketing system architecture based on hadoop, Control Engineering and Communication Technology (ICCECT), 2012 International Conference on, IEEE, 2012, pp. 500–503.
[169] M. Liu, M. Wang, J. Wang, D. Li, Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: application to the recognition of orange beverage and chinese vinegar, Sensors Actuat. B: Chem. 177 (2013) 970–980.
[170] Z. Zhao, Q. Ma, A real-time processing system for massive traffic sensor data, Connected Vehicles and Expo (ICCVE), 2012 International Conference on, IEEE, 2012, pp. 142–147.
[171] C. Tu, X. He, Z. Shuai, F. Jiang, Big data issues in smart grid–a review, Renew. Sustain. Energy Rev. 79 (2017) 1099–1107.
[172] V. Potdar, A. Chandan, S. Batool, N. Patel, Big energy data management for smart grids-issues, challenges and recent developments, Smart Cities, Springer, 2018, pp. 177–205.
[173] J. Hu, A.V. Vasilakos, Energy big data analytics and security: challenges and opportunities, IEEE Trans. Smart Grid 7 (5) (2016) 2423–2436.
[174] T.-H. Dang-Ha, R. Olsson, H. Wang, The role of big data on smart grid transition, 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), IEEE, 2015, pp. 33–39.
[175] P.D. Diamantoulakis, V.M. Kapinas, G.K. Karagiannidis, Big data analytics for dynamic energy management in smart grids, Big Data Res. 2 (3) (2015) 94–101.
[176] B. Wang, S. Huang, J. Qiu, Y. Liu, G. Wang, Parallel online sequential extreme learning machine based on mapreduce, Neurocomputing 149 (2015) 224–232.
[177] C. Zhu, H. Zhou, V.C. Leung, K. Wang, Y. Zhang, L.T. Yang, Toward big data in green city, IEEE Commun. Mag. 55 (11) (2017) 14–18.
[178] L. Lu, H. Dong, C. Yang, L. Wan, A novel mass data processing framework based on hadoop for electrical power monitoring system, Power and Energy Engineering Conference (APPEEC), 2012 Asia-Pacific, IEEE, 2012, pp. 1–4.
[179] N. Chen, C. Wang, P. Han, J. Zhang, K. Wang, E. Dai, W. Kang, F. Yang, B. Sun, G. Guo, Research about solutions to the bottleneck of big data processing in power system, International Conference on Industrial IoT Technologies and Applications, Springer, 2016, pp. 44–51.
[180] A. Ghosal, S. Halder, Building intelligent systems for smart cities: Issues, challenges and approaches, Smart Cities, Springer, 2018, pp. 107–125.
[181] H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F.B. Cetin, S. Babu, Starfish: a self-tuning system for big data analytics, Cidr 11 (2011) (2011) 261–272.
[182] M. Khan, Z. Huang, M. Li, G.A. Taylor, P.M. Ashton, M. Khan, Optimizing hadoop performance for big data analytics in smart grid, Math. Problems Eng. 2017 (2017).
[183] C.S. Lai, L.L. Lai, Application of big data in smart grid, Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on, IEEE, 2015, pp. 665–670.
[184] N. Yu, S. Shah, R. Johnson, R. Sherick, M. Hong, K. Loparo, Big data analytics in power distribution systems, Innovative Smart Grid Technologies Conference (ISGT), 2015 IEEE Power & Energy Society, IEEE, 2015, pp. 1–5.
[185] S. Rusitschka, E. Curry, Big data in the energy and transport sectors, New Horizons for a Data-Driven Economy, Springer, 2016, pp. 225–244.