

IoT Big Data Analytics for Smart Homes with Fog and Cloud Computing

Article in Future Generation Computer Systems · September 2018


DOI: 10.1016/j.future.2018.08.040


All content following this page was uploaded by Abdulsalam Yassine on 04 September 2018.



IoT Big Data Analytics for Smart Homes with Fog and
Cloud Computing
Abdulsalam Yassine (a), Shailendra Singh (b), M. Shamim Hossain (c),
Ghulam Muhammad (d)
(a) Department of Software Engineering, Lakehead University
955 Oliver Road, Thunder Bay, Ontario, P7B 5E1, Canada
(Email: ayassine@lakeheadu.ca)
(b) Department of Electrical and Computer Engineering, Lakehead University
955 Oliver Road, Thunder Bay, Ontario, P7B 5E1, Canada
(Email: ssingh59@lakeheadu.ca)
(c) Department of Software Engineering, College of Computer and Information Sciences,
King Saud University, Riyadh 11543, Saudi Arabia (Email: mshossain@ksu.edu.sa)
(d) Department of Computer Engineering, College of Computer and Information Sciences,
King Saud University, Riyadh 11543, Saudi Arabia (Email: ghulam@ksu.edu.sa)

Abstract
Internet of Things (IoT) analytics is an essential means to derive knowledge
and support applications for smart homes. Connected appliances and devices
inside the smart home produce a significant amount of data about consumers
and how they go about their daily activities. IoT analytics can aid in
personalizing applications that benefit both homeowners and the ever-growing
industries that need to tap into consumers' profiles. This article presents a
new platform that enables innovative analytics on IoT-captured data from
smart homes. We propose the use of fog nodes and a cloud system to allow
data-driven services and to address the challenges of complexity and resource
demands for online and offline data processing, storage, and classification
analysis. We discuss the requirements and the design components of the
system. To validate the platform and present meaningful results, we present
a case study using a dataset acquired from a real smart home in
Vancouver, Canada. The results of the experiments clearly show the benefit
and practicality of the proposed platform.
Keywords: Internet of Things (IoT), Cloud Computing, Fog Computing,
Big Data Analytics, Energy Management, Smart Homes

Preprint submitted to Elsevier July 9, 2018


1. Introduction
Smart homes are a central theme of future living. Many communities across
the globe are currently deploying smart homes as part of modernization
initiatives. These always-on houses generate massive amounts of valuable data
from smart devices and appliances connected to an IoT system [1]. The
ability to analyze these data in near real-time and off-line allows for the
discovery of various information that has significant impact on our society’s
safety, health, and economy. For example, a smart city’s health care sys-
tem can determine the status of patients inside a smart home by monitoring
their usage of appliances and detect their routine or abnormal activities that
could indicate signs of health problems [2][3]. A utility company may analyze
large amounts of energy consumption data from appliances inside the home
to learn about the behavior of occupants and recommend electricity bill
reduction plans for consumers based on energy usage profiles [4]. Such a
scenario leads to cost reductions not only for consumers but also for utility
companies. Real-time IoT applications allow manufacturers to analyze data
continuously and determine or predict an appliance maintenance schedule or
promptly replace malfunctioning equipment. These examples of IoT
applications reveal the
advantages of analyzing smart home data. While such data presents valuable
opportunities in understanding the dynamics and behavior of smart homes
and their occupants, it also spells out a tremendous challenge regarding data
management, storage, and analytics. To ensure that users are not drowning
in floods of data, they need systems capable of managing, analyzing and
transforming this amount of data into actionable insights for smart city ap-
plications that demand prompt actions with stringent requirements. These
systems must also meet the needs of scalability with the growing volume of
data and the temporal granularity of decision-making whether it is off-line
or near real-time.
In this paper, we propose a system which combines IoT and big data
analytics technology with fog and cloud computing. The proposed system
addresses the challenge of designing efficient solutions that are fast and can
handle large volumes of unstructured smart home data. In this system,
fog computing provides fast near real-time analytics while the abundance of
computing and storage resources in the cloud system is used to carry out
computationally intensive applications. The process involves taking
data-in-motion from IoT sources, such as individual smart meters, appliances,
and devices, and integrating them for highly sophisticated analytics
processes that deliver timely decision-making. Fog computing nodes are
resource-efficient because they are equipped with virtual machine
technologies capable of continuously processing fresh IoT streams of data and
transferring the processed data to the cloud for further processing [5].
Cloud computing offers a multitude of benefits, such as Infrastructure as a
Service (IaaS), which provides access to virtually unlimited storage space;
Platform as a Service (PaaS), which offers the potential to execute
resource-intensive applications; Software as a Service (SaaS), which
facilitates software access; and utility services, which store massive
volumes of data for remote access. Fog computing plays a critical role in the
IoT ecosystem to support
the processing of big data for near real-time responses. Furthermore, fog
computing fundamentally processes and stores data at the edge of the cloud
system [6]. This unified architecture allows us to resolve the latency issues
pertaining to the underlying transport communication network of cloud sys-
tems which has a significant impact on time-sensitive applications [1][7] [8][9].
For the evaluation of the proposed system, we present a case study of
analyzing and processing streams of data from a smart home. The smart
home generates continuous streams of massive data at short time intervals.
Processing and analyzing such data is vital for many applications (e.g.,
healthcare systems, smart grid energy management applications, etc.) [10]
[11]. The main contributions made in this paper are as follows:

• Proposing a platform for IoT smart home big data analytics with fog
and cloud computing. The system design allows massive IoT data from
multiple smart homes to be processed in distributed fog nodes, which
accommodate cognitive data mining algorithms that provide insight from
the processed data. This approach is significant for the many applications
that require timely access to information and that rely on economies
of scale, where smart home operations can be cost-effectively deployed
and used.

• Providing detailed requirements and design component analysis of the
platform architecture. Specifically, we discuss requirements of scalability
regarding the processing of multiple IoT data streams from smart
homes and the design aspects of minimizing communication overhead
between the fog nodes and the cloud system. Furthermore, we discuss
the design of mechanisms for task allocation, IoT management
and integration services, and admission authentication of smart home
and third-party applications.

• Presenting a case study of an actual smart home. We analyze the
smart home IoT data for behavioral and predictive analytics of occupants
pertaining to energy consumption routines and patterns. We discuss
the applications of these findings within the context of demand
response management and electricity cost reduction. These analyses
are considered among the primary functions and applications of smart
homes, which can be scaled with fog and cloud computing to an entire
smart community [34].

The organization of the paper is as follows: in the next section, we discuss
the related work. In Section 3, we present the components of the proposed
platform, followed by a case study in Section 4. Finally, in Section 5 we
present the conclusion of the study and provide directions for future work.

2. Related Work
Recently, several studies have proposed systems and frameworks for IoT
data analytics using various architectures involving fog and cloud computing.
In this section, we discuss these studies especially those that are representa-
tive of the state-of-the-art and close to our work.
Many researchers have tackled issues closely related to our work, such as
those in [14][33][29][24][31] and [25]. For example, the work in [14] focuses
on predictive analytics for smart homes that need access to historical data,
which must be stored in a large database that can only be provided by a cloud
system. The work in [33] investigated smart home services for in-depth
analysis of frequent usage patterns of home appliances, specifically the
discovery of co-utilization behavior of appliances inside smart homes. For
this purpose, the authors propose a multidimensional pattern mining framework
for a large number of residential users connected to an Internet Service
Provider (ISP). The authors in [29] developed a new gateway system to
automatically integrate and configure new home-based IoT devices for seamless
analytics in cloud systems. The SLASH framework in [24] presents a new
approach for smart home adaptivity and self-learning mechanisms. The idea
includes the development of a big data layer with an analytical engine that
supports occupants' behavior. The work in [25] proposes an end-to-end home automation
system that supports multiple IoT protocols for data acquisition and analy-
sis. The authors claim that their system is capable of handling data coming
from city-wide deployed devices. Similar to the work in [25], a general smart
city paradigm is proposed in [31] for an IoT big data analytics system that
integrates sensors from smart homes, traffic, vehicles, surveillance sensors,
etc., using the Hadoop ecosystem in a real-life environment.
In addition to the above-mentioned studies, the authors in [40] present a
real-time data analytics engine that facilitates processing of data near the
source of information. The proposed analytical engine ensures that data is
processed before it is offloaded to the central cloud system. The system
coordinates the analytics between the physical locations of the IoT devices
in the vicinity, leading to the creation of a device-to-device analytical
layer under the cloud system. The main issue with this approach is that it
adds complexity to the system to the point of making it impractical.
Similar to [40], the works in [15][18] and [19] address the issues of data
analytics at the edge of the cloud system, but focus on the latency problem
of processing large amounts of IoT data using fog computing. The design
approach in these systems brings resources of edge computing as close as
possible to the source where data is generated. The work in [36] further
investigates this issue and develops mechanisms to estimate the latency for
cloud-fog-IoT continuum systems.
For real-time analytics of IoT data in uncontrolled environments, the work
presented in [32] proposed a general-purpose IoT framework that integrates
wireless hub nodes to support analytical reliability and assure real-time data
acquisition. The work in [35] proposed a system that runs data analytics in
a distributed fashion using fog computing, IoT devices, edge and central
servers. The main approach is to optimize the decision-making of analytics
such that all IoT devices are fairly treated and satisfied. The results of this
work show a promising solution for enhancing the utilization of fog and cloud
computing systems. To facilitate intelligence of the edge network in providing
robust analytics for IoT systems, the work in [37] outlined a new approach to
dynamically automate the transitions between the central cloud system and
its edge taking into account the various conditions and requirements of the
applications. The author in [26] proposes a general model and architecture
that ingests IoT data streams into fog computing nodes. The model addresses
the challenges of existing techniques and the shortcomings pertaining to the
essential dimensions of data analytics related to system, data, human and
optimization.
It is important to note that a platform with fog computing nodes coupled
with cloud computing offers resource-efficient processing of IoT big data
on a near real-time basis while providing insights and processed data to the
cloud for further processing and analysis. This integrated design allows us
to address the latency issues of the cloud system that can have a
considerable impact on time-sensitive applications.
Our work in this paper is in line with the work presented in [25]; however,
our focus is on a scalable IoT big data analytics platform with fog computing
that is capable of managing, analyzing and transforming household energy
consumption data into actionable insights. Therefore, we present a holistic
architecture that is suitable for an end-to-end analytics of IoT connected
smart homes. We discuss the validity of the architecture and the intercon-
nectivity of analytical modules. For the evaluation of the system model, we
present a case study of data streams collected from an actual smart home
in Vancouver, Canada. Our case study addresses the challenges of data an-
alytics of smart home energy consumption for smart grid applications (e.g.
Automatic Demand Response). It must be noted that this work differs from
[4][3][27] and [28]. These works do not address IoT big data analytics
in fog and cloud computing systems, but focus on analyzing behavioral energy
consumption that leads to peak hours as in [4], activity recognition for
healthcare applications [3], and prediction models [28]. This paper
introduces detailed system requirements and component design analysis for an
IoT big data analytics platform for smart homes via fog and cloud computing.
The smart home dataset, the platform as well as the results in this paper are
completely different from our previous work.

3. Platform Overview
The deployment of smart homes is accelerating across the world, and
it is becoming a compelling business opportunity for various industrial
applications. Smart homes that are supported by the IoT paradigm generate
large volumes of useful data. However, unlocking the potential of this
information hinges on
the development of sophisticated big data analytics tools and platforms capa-
ble of processing, analyzing and managing these data in cost-effective ways.
In this section, we address the system requirements for the development of
IoT big data with fog and cloud computing, and we present the components
of the proposed platform.

3.1. Platform Requirements


The design of innovative platforms that are suited to support the large
amount of data generated from smart homes poses particular requirements,
functionalities, and design structures.
• Resource Distribution: Processing the large amounts of data generated
from household appliances and devices requires cost-effective and
resource-efficient big data analytics closer to the physical system. Mining
continuous streams of IoT data should meet the timing requirements of
many smart city applications, such as automatic demand response,
sensitive healthcare applications, and safety and surveillance operations,
which require predictable latency for near real-time detection and
notification. These functionalities face serious constraints when
processing data and invoking services from the back-end cloud. The
proximity of resources helps overcome the high latency that is associated
with the provisioning of cloud-based services. Therefore, optimized
scheduling mechanisms are required to coordinate the tasks among fog
computing nodes and to appropriately allocate resources from the
cloud system.

• Scalable Analytics: Data streams from smart homes present a challenging
prospect for processing and mining operations. These data
streams arrive at the system in high volumes, at high velocity, and
from various sources. Furthermore, smart home data change over time
due to the changing behavior of occupants. Hence, such data require
scalable tools to process and analyze them for different behavioral
traits. Also, these data come from different communication channels
that contribute anomalies and noise, which require further
cleaning and pre-processing to reduce dimensionality and better
extract useful information.

• Performance: IoT data streams should be handled in parallel
to boost the performance of data analytics and to optimize
smart home operations. Depending on the analytics activity, the specific
requirements include elastic resource acquisition, efficient data network
management, data reliability, and functional data abstractions.
Furthermore, IoT data processing should make full use of all computational
resources to address the performance challenges of near real-time
and off-line computation algorithms, such as quickly finding concealed
patterns and valuable knowledge. There is also a need for coordination
between fog computing nodes and the cloud system. Fog nodes cannot
carry out computationally heavy tasks because of resource
and storage limitations; therefore, computationally intensive applications
are performed in the cloud servers for better performance [30].

• Integration: At any single time instant, data collected from different
devices inside smart homes are unstructured and cannot be processed
using conventional tools. Therefore, IoT data from smart homes require
dedicated integration mechanisms to be further processed in a
unified system. Such an approach imposes restrictions on edge computing
selection concerning ad hoc pre-processing operations. As a result,
automated and adaptive data ingestion methods must be developed to
integrate incoming data of varying rates and volumes and to accommodate
various data sources and application requirements. These methods
should process all inbound time-series data, execute data transformations,
and coordinate the transmission of the processed data to the
cloud system.

• Visualization: Another major requirement for dealing with large-scale
IoT data from smart homes is data relevance through visualization.
This requirement is significant since different applications
rely on the visualization of the underlying trends and the perceived
presentation of data after being processed and analyzed for
decision-making. Application-specific data visualization serves domain
experts by providing meaningful results based on data quality. For
example, presenting an abstraction of data clustering for co-utilization of
appliances inside a smart home leads to a better understanding of the
behavior of occupants inside the house and their activities.
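
As a rough illustration of the coordination between fog nodes and the cloud
called for in the Resource Distribution and Performance requirements, the
following Python sketch assigns analytics tasks to a fog node or to the cloud.
It is not the scheduling mechanism of the platform: the task fields, the
capacity units, the assumed cloud latency, and the greedy rule (latency-sensitive
tasks take fog capacity first, everything else is offloaded) are all
hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpu_units: int      # hypothetical measure of compute demand
    deadline_ms: float  # maximum tolerated response time


# Hypothetical cloud round-trip latency; real values depend on the network.
CLOUD_LATENCY_MS = 150.0


def place_tasks(tasks, fog_capacity):
    """Greedy placement: tasks with the tightest deadlines get fog capacity first."""
    placement = {}
    remaining = fog_capacity
    for task in sorted(tasks, key=lambda t: t.deadline_ms):
        needs_fog = task.deadline_ms < CLOUD_LATENCY_MS
        if needs_fog and task.cpu_units <= remaining:
            placement[task.name] = "fog"
            remaining -= task.cpu_units
        else:
            # Either the deadline tolerates cloud latency or the fog node is full.
            placement[task.name] = "cloud"
    return placement


tasks = [
    Task("demand_response_alert", cpu_units=2, deadline_ms=50.0),
    Task("historical_pattern_mining", cpu_units=40, deadline_ms=60_000.0),
    Task("activity_anomaly_check", cpu_units=3, deadline_ms=100.0),
]
placement = place_tasks(tasks, fog_capacity=8)
print(placement)
```

Note that in this simple rule a latency-sensitive task that does not fit the
remaining fog capacity is still sent to the cloud and may miss its deadline; a
fuller scheduler would need to queue, preempt, or spread such tasks across
several fog nodes.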
To satisfy the above requirements, in this paper we present an IoT big
data analytics platform for processing and analyzing a large volume of smart
home data streams. The next subsection describes the proposed platform.

3.2. Platform Design Components


Figure (1) shows the architecture of the proposed platform. It consists
of IoT big data analytics with fog computing nodes and a cloud system. The
components of the platform support the complex operations of continuous
integration, processing, and analytics of data from multiple smart homes. The
fog nodes extend the services of the cloud system to the network edge close
to the physical locations of the smart homes, thus allowing for faster data
processing and for service applications that can only be served within
specific time constraints. The cloud system does the heavy lifting of
processing computationally intensive applications.
In the proposed model, smart homes are the source of data. Such data
typically arrive at the smart home IoT gateway from different sources in-
cluding household appliances and smart devices. The acquisition of data
is typically performed by specific IoT protocols such as machine-to-machine
(M2M)/Message Queuing Telemetry Transport (MQTT) that communicate
with smart home devices and IoT gateways. An IoT gateway acts as an
agent that mediates between the smart home and the cloud system. The IoT
gateway may also provide local processing and storage functions including au-
tonomously controlling and filtering of data streams. In the proposed model,
the IoT gateway can be used to serve multiple households while ensuring
trusted connectivity and security by enforcing policy-based access
mechanisms. The acquisition of data during this communication process passes
through several stages until the data rest on cloud storage devices, where
further processing may be performed in the future. As mentioned in Section 1,
smart home data have the volume, velocity, and variety characteristics
to be considered big data. The analytics operations include
filtering and cleaning, clustering and aggregation where each operation takes
extensive time depending on the nature of the data. The following are the
details of the platform components.

• Smart Home Components: The smart home consists of sensors, devices,
appliances, and metering systems. The components of the smart
home are roughly categorized into three tiers, namely cyber-physical,
connectivity, and context-aware. The cyber-physical tier consists of
smart devices, metering systems, sensor systems, appliances, electric
vehicle charging points, and energy management systems. These
elements are responsible for all actual operations of the smart home
and are installed on the home premises. They interact with the out-
side world through the connectivity tier. As mentioned earlier in this
section several communication protocols allow these devices to com-
municate among themselves and with the cloud system through an IoT
gateway. The connectivity tier is responsible for outbound and inbound
communication with the smart home, which includes direct interaction
with the occupants through mobile or web applications. The context-aware
tier provides real intelligence about the essence of the smart
home. The context-aware tier includes user-defined rules and policies to
manage the interaction and the services of the smart home. This tier
allows for context-based privacy and security configuration that
satisfies the occupants' concerns. Activity recognition, event detection,
and behavioral and predictive analytics are performed by the fog and cloud
computing system and reported to the smart home applications. For
example, behavioral analytics can be very effective for understanding how
users go about using their appliances and for deriving conclusions about
energy consumption, which can be used to forecast future demand.
Activity recognition can be used to allow caregivers in healthcare
applications [3] to detect abnormal behavior of patients. The applications
and benefits of such analytics and services are countless.

Figure 1: IoT Big Data Analytics with Fog Computing

• IoT Management and Integration Services: The IoT management
service is a broker-based subsystem responsible for handling IoT
service requests from multiple smart home applications into the cloud
system. It plays a vital role in providing authentication services for
these requests and ensures that the rules of admission are consistent
with the pre-configured policies in the rules engine. Services that
require access to data about the smart home must register with the IoT
management broker before using any data from the cloud system. The
orchestration of these tasks is handled by the requests handler after
registration and authentication. The IoT management services are
protocol independent and are responsible for maintaining
continuity and flexibility for the whole IoT ecosystem. Figure (2)
illustrates the operation of the IoT management services. In this figure, all
services are first authenticated and then registered before accessing the
data. Figure (3) shows a high-level description of the integration
services in the proposed platform. This service provides seamless access
to external applications via application programming interfaces (APIs). The
main approach in this design is the decoupling of the analytical components
of the fog nodes and the cloud system from user applications. By so
doing, the integration service assures security, since external applications
are not able to have any direct access to the analytical engine.
Also, it adds data abstraction by enabling the use of data for various
user-specific applications, including mobile and desktop. Another main
advantage is providing interoperability for various channel technologies
(Wi-Fi, Bluetooth, ZigBee, LPWAN, etc.) and data transfer protocols
(M2M, MQTT, CoAP (Constrained Application Protocol), AMQP
(Advanced Message Queuing Protocol), WebSocket, etc.).

Figure 2: The IoT Big Data management service is responsible for request handling,
authentication and service registration

Figure 3: The IoT Big Data integration service is responsible for smart home functional
and third party services and applications

• Fog Computing Nodes: The fog node provides additional resources
and computational services to support various time-sensitive smart home
applications. The fog nodes provide the means for accelerating analytics
services while ensuring increased responsiveness from the cloud
infrastructure. Figure (1) shows the main functions of the fog node
in our platform. It is composed of several functions, including data
pre-processing, pattern mining, classification, prediction, and
visualization. These functions are responsible for rapid analytics of smart
home data collected through the IoT system, while the aggregated
results are sent to the cloud or directly to the serviced applications. The
fog node performs all the short-term analytics at the edge of the cloud
system. Data arriving at the fog node are unstructured and do not have
a predefined model. During the cleaning and pre-processing stage,
errors, redundancies, and outliers are removed to ensure consistency, and
all IoT streams are filtered, parsed, and translated into a unified data
structure for further analysis. At this stage, raw data, which contain
millions of high time-resolution data records, are transformed into a
pre-defined resolution for each device. Frequent pattern mining techniques
are then applied to the data to discover the occurrence of appliance
correlations in data streams. Frequent pattern mining searches for
recurring patterns in a given dataset to determine associations and
correlations among patterns of interest [39].
In the clustering stage, we employ an unsupervised form of classification
which is capable of distinguishing classes of appliances learned from
the data [39]. Predictive analytics are responsible for forecasting
occupants' activities or use of certain devices. Visualization provides an
interactive medium for the user to discover knowledge from data to
enhance the decision-making process. Finally, the results of the stages
mentioned above are sent to the cloud system, which has an abundance
of resources for computationally intensive tasks.
It should be noted that configuring the fog node to ensure the
privacy of the smart home is a challenging prospect. As fog nodes are
becoming a major computation hub, smart home private data become
vulnerable to various attacks. Therefore, a new breed of trust management
systems and privacy protection mechanisms is required to tackle
this problem. These mechanisms are not considered in this paper.
However, remedies for this problem can be found in [20][21][22]
and [23].

• Cloud System: In the proposed platform, the cloud system is responsible
for providing core services to smart home applications, which include
historical data analytics, extended storage capabilities, and core smart
home management infrastructure. The cloud services include smart
home device tracking, configuration, analysis, reporting, and
authentication and authorization services. These functions provide
value-added services for users to control and manage their smart homes using
different means (e.g., web and mobile applications) as well as to interact
with third-party vendors. Also, the cloud system provides heavy computation
using large-scale data processing frameworks such as MapReduce,
Spark, Storm, etc. The cloud system uses its back-end computation to
gain business insight and update the fog nodes about new operational
rules.
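
The admission flow of the IoT management service described above
(authenticate, then register, then access data subject to pre-configured
policies) can be sketched in a few lines of Python. This is a toy, in-memory
illustration under stated assumptions, not the platform's broker: the class
name `IoTManagementBroker`, the shared-secret credentials, the topic names,
and the policy table are all hypothetical.

```python
import secrets


class IoTManagementBroker:
    """Toy broker: authenticate, then register, then serve data requests."""

    def __init__(self, credentials, policies):
        self._credentials = credentials  # service name -> shared secret (assumed scheme)
        self._policies = policies        # service name -> set of readable topics
        self._sessions = {}              # session token -> service name
        self._registered = set()         # services that completed registration

    def authenticate(self, service, secret):
        expected = self._credentials.get(service)
        if expected is None or not secrets.compare_digest(expected, secret):
            raise PermissionError(f"authentication failed for {service!r}")
        token = secrets.token_hex(16)
        self._sessions[token] = service
        return token

    def register(self, token):
        service = self._sessions.get(token)
        if service is None:
            raise PermissionError("unknown session token")
        self._registered.add(service)
        return service

    def request(self, token, topic, store):
        service = self._sessions.get(token)
        if service is None or service not in self._registered:
            raise PermissionError("service must authenticate and register first")
        # Rules-engine stand-in: a per-service allow-list of topics.
        if topic not in self._policies.get(service, set()):
            raise PermissionError(f"{service!r} may not read {topic!r}")
        return store[topic]


broker = IoTManagementBroker(
    credentials={"energy_app": "s3cret"},
    policies={"energy_app": {"energy/hourly"}},
)
store = {"energy/hourly": [0.4, 0.7, 1.2], "occupancy/raw": [1, 0, 1]}
token = broker.authenticate("energy_app", "s3cret")
broker.register(token)
hourly = broker.request(token, "energy/hourly", store)
print(hourly)
```

In this sketch the application never touches the data store directly; every
read goes through `request`, which mirrors the decoupling of external
applications from the analytical engine that the integration service aims for.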

4. Case Study

In this case study, we perform IoT data analytics of appliances and
devices from a smart home in Vancouver, British Columbia, Canada.
The data is publicly available from the Harvard Education website [41].
The dataset consists of one-minute interval measurements of multiple
smart home appliances over the span of two years, April 2012 to April
2014. We perform data analytics on IoT streams to uncover occupants'
behavior of appliance usage, such as identifying frequent patterns
associated with appliances, including hour of day, day of week, and month
of year, as a means of understanding how occupants go about their daily
routines. For this particular study, we assume that the data is acquired
at the fog node, where an analytical engine is responsible for performing
immediate analysis to satisfy the requirements of applications such
as energy consumption management, targeted advertisement, and activity
recognition. Ideally, for this particular case study, scalable computing
resources are required to enhance the performance with additional
acquisition of data. Our tests are conducted on a single node consisting
of a computer system running a Core i5 CPU with 8 GB of RAM and a 1 TB
storage device. The main processing resources are allocated to the
analytics part, where we process two years of data. The current running
time is a few minutes, which can be improved with more computing
resources; however, it shows that one fog node is capable of processing
more than one smart home.
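
To illustrate what associating appliance usage with hour of day, day of week,
and month of year can look like in practice, the following Python sketch
counts the one-minute readings in which an appliance draws more than a standby
threshold. It is a minimal sketch and not the analysis code used in the study:
the `(unix_timestamp, watts)` record layout, the 5 W threshold, the sample
values, and the use of UTC rather than local Vancouver time are assumptions
made for the example.

```python
from collections import Counter
from datetime import datetime, timezone


def usage_by_period(readings, standby_watts=5):
    """Count active readings per hour of day, weekday (0 = Monday) and month."""
    by_hour, by_weekday, by_month = Counter(), Counter(), Counter()
    for unix_ts, watts in readings:
        if watts <= standby_watts:
            continue  # treat the appliance as OFF (assumed threshold)
        # UTC keeps the example reproducible; a real analysis would use local time.
        moment = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
        by_hour[moment.hour] += 1
        by_weekday[moment.weekday()] += 1
        by_month[moment.month] += 1
    return by_hour, by_weekday, by_month


# Hypothetical readings; 1333238400 is 2012-04-01 00:00:00 UTC (a Sunday).
BASE = 1333238400
readings = [
    (BASE + 18 * 3600, 1200),          # Sunday 18:00, appliance active
    (BASE + 18 * 3600 + 60, 1180),     # Sunday 18:01, appliance active
    (BASE + 19 * 3600, 3),             # Sunday 19:00, standby only
    (BASE + 86400 + 18 * 3600, 900),   # Monday 18:00, appliance active
]
by_hour, by_weekday, by_month = usage_by_period(readings)
print(by_hour.most_common(1), dict(by_weekday), dict(by_month))
```

Counts of this kind are only a starting point; they say when an appliance
tends to be active, not which appliances are used together.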
In this paper, we focus on energy consumption analytics of IoT data
from smart homes. Specifically, we analyze aggregated IoT data of
energy measurements of home appliances. An application such as residential
Automatic Demand Response (ADR) requires energy consumption
data about appliances in residential homes to be analyzed in order to engage
them in demand response signals effectively [12]. To realize this concept,
we first assume that every home is attached to a fog node that is
responsible for performing the analytics. The aim is to analyze the frequent
patterns of home appliance usage, the context of appliance usage (i.e.,
co-utilization of appliances), the clusters of appliances with respect to
the time of use (i.e., spatio-temporal analysis), and the forecast of
appliance usage. The following steps illustrate the life-cycle of the
analytics at the fog node.
Data Cleaning and Preparation: The dataset contains millions of
records (a sample of the raw data is shown in Table 1) with a large
amount of data about appliances. Data about appliances is collected
every minute over a period of two years (April 2012 to April 2014).
The measurements include the Unix timestamp, line voltage, voltage, and
apparent power. The cleaning process started by importing the data
files into Python scripts. It consists of eliminating unnecessary
columns, converting the Unix timestamp to a human-readable date,
removing values that are below the standby power threshold, and
removing outliers and duplicate rows. The entire cleaning process was
completed using Python with regular expressions (RegEx). The
preparation of the data includes comparing every reading to a pattern;
only the matching readings were stored in a database. Tuples that do
not match the pattern are considered noise because the values for power
and timestamp are supposed to be integers only; hence, any other
character in these values indicates an error in the recording of those
tuples. Pattern matching also ensures the quality of the data, because
any tuple that was incomplete or inconsistent did not match the pattern
and was therefore ignored. For the purpose of training, we developed a
synthetic dataset which includes the appliance, the time of its
operation, the date, and the power. With this information in hand, we
can then perform clustering analysis and frequent pattern analysis of
appliance usage by hour of day, day of week, and month of year.
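The cleaning steps described above can be sketched in Python as follows. This is a minimal sketch and not the authors' script: the two-column `timestamp,power` row layout, the `ROW_PATTERN` expression, the `STANDBY_THRESHOLD` value of 5, and the use of UTC for the date conversion are all assumptions made for illustration, and outlier removal is left out.

```python
import re
from datetime import datetime, timezone

# Assumed row layout "timestamp,power"; both fields must be integers only.
ROW_PATTERN = re.compile(r"^\s*(\d+)\s*,\s*(\d+)\s*$")
# Hypothetical standby threshold; the source does not state a value.
STANDBY_THRESHOLD = 5


def clean_rows(lines, standby=STANDBY_THRESHOLD):
    """Return (datetime, power) tuples for the rows that pass every check."""
    seen = set()
    cleaned = []
    for line in lines:
        match = ROW_PATTERN.match(line)
        if match is None:
            continue  # incomplete or non-integer tuple: treated as noise
        ts, power = int(match.group(1)), int(match.group(2))
        if power < standby:
            continue  # below the standby threshold
        if (ts, power) in seen:
            continue  # duplicate row
        seen.add((ts, power))
        # Convert the Unix timestamp to a readable date (UTC assumed here).
        cleaned.append((datetime.fromtimestamp(ts, tz=timezone.utc), power))
    return cleaned


raw = [
    "1360548360,210",
    "1360548360,210",  # duplicate
    "1360548420,70",
    "1360548480,2",    # below the assumed standby threshold
    "13605x8540,33",   # corrupted timestamp
    "1360548600,",     # incomplete tuple
]
for when, power in clean_rows(raw):
    print(when.isoformat(), power)
```

Only the first and third rows survive in this example; the rest are dropped as a duplicate, a standby reading, or noise.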
Table 1: Sample of IoT Data from the Smart Home

Timestamp*    Apparent power consumption of appliances**
1360548360    210
-             -
1360548420    70
-             -
1360548480    28

* Unix timestamp
** Examples of appliances include Dishwasher, Toaster, TV, Dryer,
Home Theater, Washing Machine, Laptop, etc.

Frequent Pattern Mining: For frequent pattern mining, we are
interested in analyzing when certain appliances are being used by
examining the "ON/OFF" state and the energy consumption. An appliance
being in the ON state allows the inference that a person is currently
using it. This information can be beneficial in certain applications,
and as a result, the data and the mined patterns have value to
industries. For example, knowing when an individual is likely to have
the television turned on could help companies target advertisements.
We would like to derive these patterns in discrete-valued relations
[28]. Specifically, we study the appliance usage patterns of the whole
house and seek to uncover associations over
time domains. Formally, let $A$ be a database consisting of $n$ itemsets
$T_i$ such that $A = (T_1, T_2, \ldots, T_n)$. An itemset is considered
a frequent pattern if it appears with a certain frequency in the
database transactions. The user may define the threshold level of the
frequency count of an itemset. One of the methods of determining the
frequency count is known as the support count $s$, which is defined as
the statistical count of the frequency of an itemset in the
transactions carried over the database $A$. For example, two itemsets
$I$ ($I \subseteq A$) and $J$ ($J \subseteq A$) are counted as frequent
patterns if their supports $s_I$ and $s_J$ are above a threshold value
known as the minimum support $minsup$. Once frequent patterns are
found, the association rules are determined. An association rule is
expressed as $\{I \Rightarrow J\}$ and is derived from the
support-confidence framework, where the support
$s(I \Rightarrow J) = s_{I \Rightarrow J} = s(I \cup J)$ is the
percentage of all transactions in $A$ that contain $(I \cup J)$. The
support represents the mutual preconditions of this association in the
database, while the confidence represents the preconditions that
contribute to the consequence. In this sense, the frequency of itemsets
in the transactions suggests the statistical significance of the
association rule (meaning the probability $P(I, J)$), and the strength
of the rule is determined by the confidence
$\frac{|s(I \cup J)|}{|s(I)|}$ (meaning the conditional probability
$P(J \mid I)$) [38][39]. We employ the frequent pattern FP-Growth
algorithm [38][39] and its extension [4] on this smart home dataset.
Procedure 1 shows the steps of capturing the frequent patterns from
the dataset. Figures 4, 5, and 6 show the pattern of energy consumption
of six appliances in the home by hour of day, day of week, and month of
year. We applied a minimum support threshold of 30% on the dataset and
set all values below the threshold to 0 and all values above it to 1.
This allowed us to obtain a binary matrix to check which appliances are
in use at a specific time, as shown in Table 2.

Figure 4: Hour of the Day Energy Consumption Pattern - (a) Dishwasher (DWE), (b)
Clothes Dryer (CDE), (c) Kettle (KE), (d) TV, (e) Home Theater (HTE), (f) Laptop (EBE)

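As an illustration of the support, confidence, and thresholding steps, the following minimal Python sketch computes them on a handful of made-up transactions. The transactions, the helper names (`occurrences`, `support`, `confidence`, `binarize`), and the choice to map a value exactly equal to the threshold to 1 are assumptions for illustration; only the 30% threshold and the appliance codes come from the text.

```python
def occurrences(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, itemset):
    """Fraction of transactions containing the itemset, s(I)."""
    return occurrences(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """s(I u J) / s(I), i.e. P(J | I) for the rule I => J."""
    base = occurrences(transactions, antecedent)
    if base == 0:
        return 0.0
    both = occurrences(transactions, set(antecedent) | set(consequent))
    return both / base


def binarize(supports, minsup=0.30):
    """Map each support value to 1 (at or above minsup) or 0 (below)."""
    return [1 if s >= minsup else 0 for s in supports]


# Hypothetical transactions: the set of appliances ON in each time slot.
transactions = [
    {"DWE", "TVE"},
    {"DWE", "TVE", "CDE"},
    {"TVE"},
    {"DWE", "CDE"},
    {"DWE", "TVE"},
]
pairs = [("DWE", "TVE"), ("DWE", "CDE"), ("CDE", "TVE")]
pair_support = [support(transactions, p) for p in pairs]
print(pair_support)                                # [0.6, 0.4, 0.2]
print(binarize(pair_support))                      # [1, 1, 0]
print(confidence(transactions, {"DWE"}, {"TVE"}))  # 0.75
```

Here the rule DWE => TVE has support 3/5 and confidence 3/4, because the television is on in three of the four slots in which the dishwasher is on.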
The final result of the frequent pattern mining is the set of
associations among appliances that result from their simultaneous use
by occupants. Figure 7 shows an example of the hourly and day-of-week
use of appliances. From Figure 7(a), it is apparent that the two
appliances used together the most are the dishwasher and the
television, between 6 pm and 10:30 pm. The three appliances
(dishwasher, dryer, and television) are most likely to be on at the
same time between 8 pm and 8:30 pm. Figure 7(b) shows that the
dishwasher and the television are frequently on at the same time across
the days of the week. Inspecting each day individually reveals further
patterns: on Monday and Tuesday nights the dishwasher and television
are on together for the longest time, and on Saturdays they are on
later at night.
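One way to obtain hour-of-day and day-of-week views like these is to count, per time slot, the readings in which a given set of appliances is ON together. The sketch below is an assumption-laden illustration rather than the authors' procedure: the record layout (Unix timestamp plus the set of appliances that are ON), the function name `co_usage_counts`, and the use of UTC are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def co_usage_counts(records, appliances, by="hour"):
    """Count readings in which all `appliances` are ON, per hour or day."""
    wanted = set(appliances)
    counts = Counter()
    for ts, on in records:
        if not wanted <= set(on):
            continue  # at least one of the appliances is OFF
        when = datetime.fromtimestamp(ts, tz=timezone.utc)  # UTC assumed
        key = when.hour if by == "hour" else DAYS[when.weekday()]
        counts[key] += 1
    return counts


# Hypothetical records: (Unix timestamp, appliances that are ON).
records = [
    (1360548360, {"DWE", "TVE"}),         # Monday 02:06 UTC
    (1360548420, {"DWE", "TVE", "CDE"}),  # Monday 02:07 UTC
    (1360548480, {"TVE"}),                # dishwasher OFF, not counted
    (1360552020, {"DWE", "TVE"}),         # Monday 03:07 UTC
]
hourly = co_usage_counts(records, {"DWE", "TVE"}, by="hour")
daily = co_usage_counts(records, {"DWE", "TVE"}, by="day")
print(hourly.most_common(1))  # [(2, 2)]
print(dict(daily))            # {'Monday': 3}
```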

Procedure 1: Generating Frequent Patterns - FP-Growth

Require: Construct the Frequent Pattern Tree (FP-Tree) with the
algorithm in [28].
Ensure: Generation of Frequent Patterns
1: Start with a single path in the FP-Tree
2: Check the support s among all combinations of nodes in the FP-Tree
3: Determine the frequent patterns
4: Add the discovered patterns to Database A
5: Repeat the steps for multiple paths in the FP-Tree
6: Determine the frequent patterns for the multiple paths
7: Add the discovered patterns to Database A
8: Final Frequent Patterns = Single path ∪ Multiple path
9: Determine association rules based on 30% support using the
algorithm in [4]
10: Create a binary matrix and update it with all appliance
patterns within the threshold.
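For readers who want to experiment, the pattern-growth idea behind Procedure 1 can be sketched in Python as below. This is a generic, simplified FP-Growth (it mines every item through its conditional pattern base and omits the single-path shortcut); it is not the implementation of [28] or the extension in [4]. The names `Node`, `build_tree`, `fp_growth`, and `frequent_patterns`, and the example transactions, are hypothetical.

```python
import math
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and parent/children links."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted, min_count):
    """Build an FP-tree; return (header table, frequent item counts)."""
    freq = defaultdict(int)
    for items, weight in weighted:
        for item in set(items):
            freq[item] += weight
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)
    for items, weight in weighted:
        # Insert the frequent items in descending frequency order.
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += weight
            node = child
    return header, freq


def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {frozenset(pattern): support count} for frequent patterns."""
    header, freq = build_tree(weighted, min_count)
    patterns = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        patterns[pattern] = freq[item]
        # Conditional pattern base: the prefix path above each node.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(fp_growth(base, min_count, pattern))
    return patterns


def frequent_patterns(transactions, minsup=0.30):
    """Mine patterns whose support is at least `minsup` (a fraction)."""
    min_count = math.ceil(minsup * len(transactions))
    return fp_growth([(t, 1) for t in transactions], min_count)


# Hypothetical transactions: appliances that are ON in each time slot.
slots = [{"DWE", "TVE"}, {"DWE", "TVE", "CDE"}, {"TVE"},
         {"DWE", "CDE"}, {"DWE", "TVE"}]
for pattern, count in sorted(frequent_patterns(slots).items(),
                             key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(pattern), count)
```

With five slots and a 30% threshold, a pattern must appear in at least two slots, so the pair CDE-TVE (one slot) and the triple are not reported.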

Figure 5: Day of the Week Energy Consumption Pattern - (a) Dishwasher (DWE), (b)
Clothes Dryer (CDE), (c) Kettle (KE), (d) TV, (e) Home Theater (HTE), (f) Laptop (EBE)

Table 2: Sample of a binary matrix to uncover the frequent pattern of appliance usages

10:30pm-  11pm-     11:30pm-  DWE-  DWE-  CDE-  DWE-
11pm      11:30pm   12am      CDE   TVE   TVE   CDE-TVE
0         0         0         1     1     1     1
0         0         0         1     1     1     1
0         0         0         0     0     0     0
0         0         0         0     0     0     0
0         0         0         1     1     1     1
0         0         0         1     1     1     1
1         0         0         1     1     1     1
0         0         0         1     1     1     1

Figure 6: Month of the Year Energy Consumption Pattern - (a) Dishwasher (DWE), (b)
Clothes Dryer (CDE), (c) Kettle (KE), (d) TV, (e) Home Theater (HTE), (f) Laptop (EBE)

Cluster Mining: The above frequent pattern analysis provided insight
about how the smart home occupants are co-utilizing their appliances.
Clustering analysis allows us to interpret the time intervals associated
with groups of appliances. This is important for uncovering deeper
behavior of appliance energy consumption at specific times (e.g., peak
hours). To achieve this objective, we implement the k-means clustering
algorithm in [42]. The basic principle of the k-means algorithm is that
it defines $k$ centers, which are placed in specific positions away from
each other. Then, the function
$$G(z) = \sum_{i=1}^{k} \sum_{j=1}^{C_i} \left( \lVert a_i - b_j \rVert \right)^2$$
is used to determine the squared error value, where
$\lVert a_i - b_j \rVert$ is the Euclidean distance between $a_i$ and
$b_j$, and $C_i$ represents the number of data points in the $i$th
cluster. Determining the optimal number $k$ is vital for getting better
results. There are many methods for determining the ideal number $k$,
as described in [43]. The approach in this work uses the silhouette
coefficient as a means of calculating the optimal number $k$ [44]. This
method measures the quality of a clustering by evaluating how well the
data points are positioned within their clusters. For a data point
$y_j$ in cluster $C_i$, it computes the average distance
$x_j = \text{average}\{dis(y_j, y_i)\}$ to all other data points in
$C_i$, and then determines $w_j$, the minimum over all clusters except
$C_i$ of the average distance from $y_j$ to the data points of that
cluster. The silhouette coefficient for $y_j$ is determined as
$$r_{y_j} = \frac{w_j - x_j}{\max(x_j, w_j)},$$
and the silhouette coefficients for cluster $C_i$ and for having $k$
clusters as $r_{C_i} = \text{average}(r_{y_j})$ for
$j = d_1, \ldots, d_n$ and $r_k = \text{average}(r_{C_i})$ for
$i = 1, \ldots, k$, respectively. The higher the average silhouette
value, the better the clustering. In other words, the average
silhouette provides an observation about the various values of
$k \in \{1, 2, 3, \ldots, m\}$, where $m$ represents the number of
unique objects in the dataset. To find the optimal number of clusters,
the process is executed repeatedly and the average silhouette
coefficient is calculated for each run until the number of clusters $k$
that maximizes $r_k$ is found.

Figure 7: Frequent Pattern of Appliances - Dishwasher (DWE), Clothes Dryer (CDE), TV -
(a) Hour of the Day (b) Day of Week [D1 - Sunday - D5 Saturday]

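The selection of $k$ by the silhouette coefficient can be sketched with a small, standard-library Python example. It is a sketch under stated assumptions, not the implementation used in the paper: it uses plain Lloyd-style k-means with random initial centers, scores singleton clusters and single-cluster results as 0, starts the search at $k = 2$ (the silhouette needs at least two clusters), and the (hour, usage) points are made up. The score follows the text's definition of $r_k$ as the average of the per-cluster averages.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd-style k-means on tuples; returns (labels, centers)."""
    centers = random.Random(seed).sample(points, k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # An empty cluster keeps its previous center.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels, centers


def silhouette(points, labels):
    """r_k: average over clusters of the mean silhouette of their points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0  # assumption: a single cluster is scored as 0
    cluster_means = []
    for label, members in clusters.items():
        scores = []
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            x = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            w = min(sum(math.dist(p, q) for q in other) / len(other)
                    for m, other in clusters.items() if m != label)
            denom = max(x, w)
            scores.append((w - x) / denom if denom else 0.0)
        cluster_means.append(sum(scores) / len(scores))
    return sum(cluster_means) / len(cluster_means)


def choose_k(points, candidates, seed=0):
    """Return the candidate k with the highest silhouette value."""
    return max(candidates,
               key=lambda k: silhouette(points, kmeans(points, k, seed=seed)[0]))


# Hypothetical (hour of day, usage) points: a morning and an evening group.
points = [(7, 1.0), (7.5, 1.2), (8, 0.9), (19, 2.0), (20, 2.2), (20.5, 1.9)]
print(choose_k(points, range(2, 5)))
```

Because k-means depends on its initial centers, a practical run would typically repeat the clustering with several seeds for each candidate $k$ before comparing silhouette values.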
Figure 8 shows the clustering of appliances by hour of the day, where
cluster strength signifies the frequency of use of an appliance, i.e., a
higher cluster strength for an appliance indicates higher use of it during
that period. Patterns of higher or lower appliance usage can directly
represent the energy consumption behavior of occupants. Such an analysis
can be conducted at various levels, such as an individual house, a group
of houses, a community or neighborhood, or the system level. When done at
a higher level, such as the neighborhood or system level, the outcomes can
help profile houses according to energy consumption behavior and customize
demand response mechanisms to be more efficient. Further, for a single
home, the outcomes can help adapt recommendations to reduce household
energy cost while respecting the occupants' expected comfort. Moreover, it
is feasible to consider renewable energy generation at the neighborhood or
house level to fine-tune demand response programs or energy reduction
recommendations.

Figure 8: Smart Home Appliance Clustering - Hour of the Day

5. Conclusion and Future Work


In this paper, we presented a platform for smart home IoT big data
analytics with fog and cloud computing. We provided a detailed
requirements analysis and an illustration of the platform components. The
process of performing the analytics in the fog node is presented, and the
results show possible applications of the system in different areas. For
example, applications of the acquired data may include recognizing
activities to identify health problems, identifying energy consumption
patterns for energy-saving planning, and predicting appliance maintenance
schedules to avoid expensive repairs and ensure energy-efficient
operation.
In general, the platform can aid effective and timely decision making for
individual house owners by facilitating various energy management programs
at the home level. Household energy consumption management and data
analytics is a complex operation that requires continuous integration of
multiple sources into a common processing system with easy access to data.
Another possible application is to serve companies that are interested in
targeted advertisement. They can choose a time slot in which customers are
using these appliances, typically between 6 pm and 9 pm, because residents
are frequently watching television at this time. Analytics in fog nodes
increases the ability of the platform to manage an integrated array of
IoT data streams for various applications in highly automated ways, which
results in significant savings for service providers. Also, service
providers can design and develop their applications using fog nodes that
offer abundant elasticity to enhance the performance, redundancy, and
storage of their applications.
For future work, we plan to develop optimization mechanisms such as
those in [16][17] to determine the optimal distribution and configuration
of fog nodes while taking into consideration the computational resources
and the capability of processing the required data from multiple homes.
Furthermore, we plan to refine the platform components and test them with
different datasets from various homes. This approach is crucial to
validate the applicability of the platform and its robustness in dealing
with all kinds of IoT data measurements. We also plan to study a
benchmarking scheme to assess and capture the performance of the platform
and analytics under different concerns, including runtime, CPU
utilization, data size, incoming requests, etc.

6. References
[1] H. El-Sayed et al. Edge of Things: The Big Picture on the Integration
of Edge, IoT and the Cloud in a Distributed Computing Environment.
IEEE Access, vol. 6, pp. 1706-1717, 2018.

[2] P. Rashidi, D. J. Cook, L. B. Holder and M. Schmitter-Edgecombe.
Discovering Activities to Recognize and Track in a Smart Environment.
IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4,
pp. 527-539, April 2011.

[3] A. Yassine, S. Singh and A. Alamri. Mining Human Activity Patterns
From Smart Home Big Data for Health Care Applications. IEEE Access,
vol. 5, pp. 13131-13141, 2017. doi: 10.1109/ACCESS.2017.2719921

[4] S. Singh and A. Yassine. Mining Energy Consumption Behavior Patterns
for Households in Smart Grid. IEEE Transactions on Emerging Topics
in Computing. doi: 10.1109/TETC.2017.2692098

[5] M. Marjani et al. Big IoT Data Analytics: Architecture, Opportunities,
and Open Research Challenges. IEEE Access, vol. 5, pp. 5247-5261, 2017.

[6] A. Mebrek, L. Merghem-Boulahia and M. Esseghir. Efficient green
solution for a balanced energy consumption and delay in the IoT-Fog-Cloud
computing. IEEE 16th International Symposium on Network Computing
and Applications (NCA), Cambridge, MA, 2017, pp. 1-4.

[7] A. S. Chhabra, T. Choudhury, A. V. Srivastava and A. Aggarwal.
Prediction for big data and IoT in 2017. International Conference on
Infocom Technologies and Unmanned Systems (Trends and Future Directions)
(ICTUS), Dubai, 2017, pp. 181-187.

[8] Y. Ge, X. Liang, Y. C. Zhou, Z. Pan, G. T. Zhao and Y. L. Zheng.
Adaptive Analytic Service for Real-Time Internet of Things Applications.
IEEE International Conference on Web Services (ICWS), San Francisco, CA,
2016, pp. 484-491.

[9] P. Pouladzadeh, S.V.B. Peddi, P. Kuhad, A. Yassine, S. Shirmohammadi.
A virtualization mechanism for real-time multimedia-assisted mobile food
recognition application in cloud computing. Cluster Computing, 18 (3)
(2015) 1099-1110.

[10] A. R. Al-Ali, I. A. Zualkernan, M. Rashid, R. Gupta and M. Alikarar. A
smart home energy management system using IoT and big data analytics
approach. IEEE Transactions on Consumer Electronics, vol. 63, no. 4,
pp. 426-434, November 2017.

[11] A. Berouine, F. Lachhab, Y. N. Malek, M. Bakhouya and R. Ouladsine.
A smart metering platform using big data and IoT technologies. 3rd
International Conference of Cloud Computing Technologies and
Applications (CloudTech), Rabat, 2017, pp. 1-6.

[12] A. Yassine. Implementation challenges of automatic demand response for
households in smart grids. 3rd International Conference on Renewable
Energies for Developing Countries (REDEC), Zouk Mosbeh, 2016, pp. 1-6.

[13] M. Frincu, C. Chelmis, M. U. Noor, and V. K. Prasanna. Accurate and
efficient selection of the best consumption prediction method in smart
grids. Proc. IEEE Int. Conf. Big Data, 2014, pp. 721-729.

[14] H. Cai, B. Xu, L. Jiang and A. V. Vasilakos. IoT-Based Big Data
Storage Systems in Cloud Computing: Perspectives and Challenges. IEEE
Internet of Things Journal, vol. 4, no. 1, pp. 75-87, Feb. 2017.

[15] J. He, J. Wei, K. Chen, Z. Tang, Y. Zhou and Y. Zhang. Multi-tier
Fog Computing with Large-scale IoT Data Analytics for Smart Cities.
IEEE Internet of Things Journal, vol. PP, no. 99, pp. 1-1. doi:
10.1109/JIOT.2017.2724845

[16] M. Taneja and A. Davy. Resource aware placement of IoT application
modules in Fog-Cloud Computing Paradigm. IFIP/IEEE Symposium
on Integrated Network and Service Management (IM), Lisbon, 2017, pp.
1222-1228.

[17] Q. T. Minh, D. T. Nguyen, A. Van Le, H. D. Nguyen and A. Truong.
Toward service placement on Fog computing landscape. 4th NAFOSTED
Conference on Information and Computer Science, Hanoi, 2017, pp.
291-296.

[18] N. M. Gonzalez et al. Fog computing: Data analytics and cloud
distributed processing on the network edges. 35th International
Conference of the Chilean Computer Science Society (SCCC), Valparaiso,
2016, pp. 1-9.

[19] H. Cao, M. Wachowicz and S. Cha. Developing an edge computing
platform for real-time descriptive analytics. IEEE International
Conference on Big Data (Big Data), Boston, MA, 2017, pp. 4546-4554.

[20] A. Yassine, A. A. Nazari Shirehjini and S. Shirmohammadi. Smart
Meters Big Data: Game Theoretic Model for Fair Data Sharing in
Deregulated Smart Grids. IEEE Access, vol. 3, pp. 2743-2754, 2015.

[21] A. Yassine, S. Shirmohammadi. Measuring user's privacy payoff using
intelligent agents. IEEE International Conference on Computational
Intelligence for Measurement Systems and Applications (CIMSA'09), 2009,
pp. 169-174.

[22] A. Paverd, A. Martin, and I. Brown. Security and Privacy in Smart Grid
Demand Response Systems. Smart Grid Security, Lecture Notes in Computer
Science, vol. 8448, pp. 1-15, Springer, 2014.

[23] A. Yassine, S. Shirmohammadi. Privacy and the market for private data:
a negotiation model to capitalize on private data. IEEE/ACS
International Conference on Computer Systems and Applications, Doha,
2008, pp. 669-678.

[24] M. Sultan and K. N. Ahmed. SLASH: Self-learning and adaptive smart
home framework by integrating IoT with big data analytics. Computing
Conference, London, 2017, pp. 530-538.

[25] J. Lohokare, R. Dani, A. Rajurkar and A. Apte. An IoT ecosystem for
the implementation of scalable wireless home automation systems at
smart city level. TENCON 2017 IEEE Region 10 Conference, Penang,
2017, pp. 1503-1508.

[26] S. Yang. IoT Stream Processing and Analytics in the Fog. IEEE
Communications Magazine, vol. 55, no. 8, pp. 21-27, 2017.

[27] S. Singh, A. Yassine. IoT Big Data Analytics with Fog Computing for
Household Energy Management in Smart Grids. SGIoT 2018 - 2nd EAI
International Conference on Smart Grid and Internet of Things, Niagara
Falls, Canada, 2018.

[28] S. Singh, A. Yassine. Big Data Mining of Energy Time Series for
Behavioral Analytics and Energy Consumption Forecasting. Energies, 2018,
11, 452.

[29] B. Kang, D. Kim and H. Choo. Internet of Everything: A Large-Scale
Autonomic IoT Gateway. IEEE Transactions on Multi-Scale Computing
Systems, vol. 3, no. 3, pp. 206-214, July-Sept. 2017.

[30] J. Y. Kim, H. J. Lee, J. Y. Son and J. H. Park. Smart home web of
objects-based IoT management model and methods for home data mining.
17th Asia-Pacific Network Operations and Management Symposium
(APNOMS), Busan, 2015, pp. 327-331.

[31] M. M. Rathore, A. Ahmad and A. Paul. IoT-based smart city
development using big data analytical approach. IEEE International
Conference on Automatica (ICA-ACCA), Curico, 2016, pp. 1-8.

[32] G. Daneels et al. Real-Time data dissemination and analytics platform
for challenging IoT environments. Global Information Infrastructure and
Networking Symposium (GIIS), St. Pierre, 2017, pp. 23-30.

[33] G. Poghosyan, I. Pefkianakis, P. Le Guyadec and V. Christophides.
Extracting usage patterns of home IoT devices. IEEE Symposium on
Computers and Communications (ISCC), Heraklion, 2017, pp. 1318-1324.

[34] J. He, J. Wei, K. Chen, Z. Tang, Y. Zhou and Y. Zhang. Multitier
Fog Computing With Large-Scale IoT Data Analytics for Smart Cities.
IEEE Internet of Things Journal, vol. 5, no. 2, pp. 677-686, April 2018.

[35] H. J. Hong, P. H. Tsai, A. C. Cheng, M. Y. S. Uddin,
N. Venkatasubramanian and C. H. Hsu. Supporting Internet-of-Things
Analytics in a Fog Computing Platform. IEEE International Conference on
Cloud Computing Technology and Science (CloudCom), Hong Kong, 2017, pp.
138-145.

[36] J. Li, T. Zhang, J. Jin, Y. Yang, D. Yuan and L. Gao. Latency
estimation for fog-based internet of things. 27th International
Telecommunication Networks and Applications Conference (ITNAC),
Melbourne, VIC, 2017, pp. 1-6.

[37] P. Patel, M. Intizar Ali and A. Sheth. On Using the Intelligent Edge
for IoT Analytics. IEEE Intelligent Systems, vol. 32, no. 5, pp. 64-69,
September/October 2017.

[38] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate
generation. 2000 ACM SIGMOD International Conference on Management
of Data, USA, pp. 1-12, 2000.

[39] J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without
candidate generation: A frequent-pattern tree approach. Data Mining
and Knowledge Discovery, vol. 8, no. 1, pp. 53-87, 2004.

[40] F. Mehdipour, B. Javadi and A. Mahanti. FOG-Engine: Towards Big
Data Analytics in the Fog. IEEE 14th Intl Conf on Dependable,
Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence
and Computing, Auckland, 2016, pp. 640-646.

[41] S. Makonin, B. Ellert, I. V. Bajic, F. Popowich. AMPds2 - Almanac of
Minutely Power dataset: Electricity, water, and natural gas consumption
of a residential house in Canada from 2012 to 2014. Scientific Data,
DOI 10.1038/sdata.2016.37, vol. 3, pp. 1-12, 2015.

[42] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman
and A. Y. Wu. An efficient k-means clustering algorithm: analysis and
implementation. IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 24, no. 7, pp. 881-892, Jul 2002.

[43] C.A. Sugar and G.M. James. Finding the Number of Clusters in a Data
Set: An Information Theoretic Approach. J. Am. Statistical Assoc., vol.
98, pp. 750-763, 2003.

[44] P.J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and
validation of cluster analysis. Journal of Computational and Applied
Mathematics, 1987, 20, 53-65.
