Giacomo Veneri
Antonio Capasso
BIRMINGHAM - MUMBAI
Hands-On Industrial Internet of Things
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form
or by any means, without the prior written permission of the publisher, except in the case of brief quotations
embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented.
However, the information contained in this book is sold without warranty, either express or implied. Neither the
authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to
have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products
mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy
of this information.
ISBN 978-1-78953-722-2
www.packtpub.com
Thanks to Stefania and Gilda.
Thanks to Andrea Panizza, Giacomo Monaci, Renato Magliacani and Fausto Carlevaro for
their feedback and suggestions.
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as
well as industry leading tools to help you plan your personal development and advance
your career. For more information, please visit our website.
Why subscribe?
Spend less time learning and more time coding with practical eBooks and Videos
from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Packt.com
Did you know that Packt offers eBook versions of every book published, with PDF and
ePub files available? You can upgrade to the eBook version at www.packt.com and as a print
book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
customercare@packtpub.com for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a
range of free newsletters, and receive exclusive discounts and offers on Packt books and
eBooks.
Contributors
Antonio Capasso graduated in computer automation in 1999 and computer science in 2003
from the University of Naples. He has been working for twenty years on large and complex
IT projects related to the industrial world in a variety of fields (automotive, pharma, food
and beverage, and oil and gas), in a variety of roles (programmer, analyst, architect, and
team leader) with different technologies and software. Since 2011, he has been involved in
building and securing industrial IoT infrastructure. He currently lives in Tuscany, where he
loves trekking and swimming.
About the reviewer
Pradeeka Seneviratne is a software engineer with over 10 years' experience in computer
programming and systems design. He is an expert in the development of Arduino and
Raspberry Pi-based embedded systems, and is currently a full-time embedded software
engineer working with embedded systems and highly scalable technologies. Previously,
Pradeeka worked as a software engineer for several IT infrastructure and technology
servicing companies. He is the author of many books: Building Arduino PLCs, by Apress,
Internet of Things with Arduino Blueprints, by Packt, Raspberry Pi 3 Projects for Java
Programmers, by Packt, Beginning BBC micro:bit, by Apress, and Hands-on Internet of Things
with Blynk, by Packt.
Table of Contents
Summary
Questions
Further reading
Chapter 3: Industrial Data Flow and Devices
Technical requirements
The I-IoT data flow in the factory
Measurements and the actuator chain
Sensors
The converters
Digital to analog
Analog to digital
Actuators
Controllers
Microcontrollers
Embedded microcontrollers
Microcontrollers with external memory
DSPs
PLCs
Processor module
Input and output (I/O) module
Remote I/O module
Network module
Other modules
DCS
Industrial protocols
Automation networks
The fieldbus
Supervisory control and data acquisition (SCADA)
Historian
ERP and MES
The asset model
ISA-95 equipment entities
ISA-88 extensions
Summary
Questions
Further reading
Chapter 4: Implementing the Industrial IoT Data Flow
Discovering OPC
OPC Classic
The data model and retrieving data in OPC Classic
OPC UA
The OPC UA information model
OPC UA sessions
The OPC UA security model
The OPC UA data exchange
Preface
We are living in an era in which automation is being applied on an ever larger scale to obtain
accurate results, and industrial automation is one such environment. To build an automation
environment, we first have to set up a network that can be accessed anywhere and by any
device. This is why the industrial IoT is gaining traction; it has been stated that by the end of
2020 there will be 20 billion connected devices, which clearly demonstrates the momentum
behind connected devices and the IoT.
This book is a practical guide that lets you discover the technologies and the use cases for
the industrial IoT. Taking you through the implementation of industrial processes,
specialized control devices, and protocols, it covers the process of identification and
connection of different industrial data sources gathered from different sensors. You will be
able to connect these sensors to cloud platforms, such as AWS IoT, Azure IoT, Google Cloud
IoT, and OEM IoT platforms, and to extract this data from the cloud to your devices.
Over time, you will gain the knowledge to obtain the hands-on experience necessary for
using open source Node-RED, Kafka, Cassandra, and Python. You will develop streaming
and batch-based machine learning algorithms. By the end of this book, you will have
mastered the features of Industry 4.0 and will be able to build stronger, faster, and more
reliable IoT infrastructure within your industry.
Chapter 2, Understanding the Industrial Process and Devices, defines the factory processes.
This chapter describes the concept of distributed control system (DCS), programmable logic
controllers (PLCs), supervisory control and data acquisition (SCADA), Historian,
manufacturing execution system (MES), enterprise resources planning (ERP), and fieldbus.
It introduces the International Electrotechnical Commission (IEC) 61131 standard and the
CIM pyramid. Finally, it paints a big picture, from equipment through to the cloud.
Chapter 3, Industrial Data Flow and Devices, details the equipment, devices, network
protocols, and software layers that manage the industrial IoT data flow along its path, from
the sensors on the factory floor to the edge, which is the external boundary of the industrial
IoT data flow inside the factory.
Chapter 4, Implementing the Industrial IoT Data Flow, explains how to implement the
industrial IoT data flow in a complex industrial plant. This journey starts with an
understanding of how to select the industrial data source to connect to for the purpose of
gathering the data, and ends by providing five network scenarios for edge deployment in
industrial plants.
Chapter 5, Applying Cybersecurity, explores the industrial IoT data flow from the
cybersecurity perspective, outlining the goals of the DiD strategy, and the most common
network architecture to secure the industrial control systems, including the five network
scenarios for edge deployment discussed in the previous chapter.
Chapter 6, Performing an Exercise Based on Industrial Protocols and Standards, shows how
to implement a basic data flow from the edge to the cloud by means of OPC UA and Node-
RED.
Chapter 7, Developing Industrial IoT and Architecture, outlines the basic concepts regarding
industrial IoT data processing, providing the key principles for storing time series data,
handling the asset data model, processing the data with analytics, and building digital
twins.
Chapter 8, Implementing a Custom Industrial IoT Platform, shows how to implement a custom
platform leveraging the most popular open source technologies: Apache Kafka, Node.js,
Docker, Cassandra, KairosDB, Neo4j, Apache Airflow, and Mosquitto.
Chapter 9, Understanding Industrial OEM Platforms, explores the most common industrial
IoT platforms developed by OEM vendors, from Siemens to BOSCH to General Electric.
Chapter 10, Implementing a Cloud Industrial IoT Solution with AWS, explores the solutions
proposed by Amazon Web Services (AWS) and the capabilities of the AWS IoT platform.
This chapter introduces the Edge IoT of AWS (Greengrass), the IoT Core, DynamoDB, AWS
analytics, and QuickSight, for the purpose of showing data. We will learn these
technologies by performing a practical exercise.
Chapter 11, Implementing a Cloud Industrial IoT Solution with Google Cloud, explores the
solutions proposed by the Google Cloud Platform (GCP) and the capabilities of the GCP
IoT platform, the GCP Bigtable, and GCP analytics.
Chapter 12, Performing a Practical Industrial IoT Solution with Azure, develops an end-to-
end industrial IoT solution leveraging Azure, Azure IoT Edge, and the Azure IoT platform.
Chapter 13, Understanding Diagnostics, Maintenance, and Predictive Analytics, introduces the
reader to the basic concepts of analytics and data consumption. It also develops basic
analytics for anomaly detection and prediction.
Chapter 14, Implementing a Digital Twin – Advanced Analytics, develops a physics-based and
data-driven digital equipment model to monitor assets and systems.
Chapter 15, Deploying Analytics on an IoT Platform, shows how to develop maintenance and
predictive analytics on Azure ML and AWS SageMaker. Finally, the chapter explores other
common technologies.
Once the file is downloaded, please make sure that you unzip or extract the folder using the
latest version of your preferred archive utility.
The code bundle for the book is also hosted on GitHub at https://github.com/
PacktPublishing/Hands-On-Industrial-Internet-of-Things. In case there's an update to
the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available
at https://github.com/PacktPublishing/. Check them out!
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames,
file extensions, path names, dummy URLs, user input, and Twitter handles. Here is an
example: "Mount the directory /opt/ml as another disk in your system."
When we wish to draw your attention to a particular part of a code block, the relevant lines
or items are set in bold:
import pandas as pd

def my_function(x):
    # standard operability
    return 1 + 2*x

# plotting
plot(array)
Bold: Indicates a new term, an important word, or words that you see on screen. For
example, words in menus or dialog boxes appear in the text like this. Here is an example:
"Select System info from the Administration panel."
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book
title in the subject of your message and email us at customercare@packtpub.com.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes
do happen. If you have found a mistake in this book, we would be grateful if you would
report this to us. Please visit www.packt.com/submit-errata, selecting your book, clicking
on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we
would be grateful if you would provide us with the location address or website name.
Please contact us at copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in,
and you are interested in either writing or contributing to a book, please visit
authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on
the site that you purchased it from? Potential readers can then see and use your unbiased
opinion to make purchase decisions, we at Packt can understand what you think about our
products, and our authors can see your feedback on their book. Thank you!
Chapter 1: Introduction to Industrial IoT
This chapter introduces the reader to the IoT world, highlighting the key factors driving its
growth and the main technologies behind it. We will go through the basic concepts of the
IoT and how these have been applied, tailored, and specialized to fit Industrial Internet of
Things (I-IoT) scenarios. We will then look at the similarities and differences between the
IoT and the I-IoT, and consider some of the key use cases and expected outcomes of the I-
IoT. The reader will become familiar with some of the key concepts related to the I-IoT,
such as operational efficiency, preventive and predictive maintenance, and cost
optimization. In this chapter, we will also clarify the kind of I-IoT that will be dealt with in
this book.
IoT background
IoT key technologies
What is the I-IoT?
Use cases of the I-IoT
IoT and I-IoT—similarities and differences
IoT analytics and AI
Industrial environments and scenarios involving I-IoT
Technical requirements
In this book, we will work with several open source technologies and proprietary
technologies. To simplify the tests that we will carry out, we will use Docker to deploy
databases and frameworks.
IoT background
Over the last few years, the IoT has become a viral topic in the digital world, one that is
discussed, debated, and analyzed in many channels, forums, and contexts. This is common
for all new or emerging software technologies. Developers, architects, and salespeople
discuss the capabilities, impacts, and market penetration of the new technology in forums,
blogs, and specialized social media. Think, for example, of Docker or Kubernetes, which are
changing the ways in which software applications are designed, implemented, and
deployed. They are having a tremendous impact on the digital world in terms of time to
market, software life cycle, capabilities, and cost. Despite this, they remain primarily
confined to their specialized fields. This is not the case for the IoT.
Over the last 10 years, the IoT has become a familiar topic in the mass media. This is
because the IoT is more than just a new technology that impacts a restricted range of people
or a specific market. It can be better understood as a set of technologies that impacts us all,
and will change markets, even creating new ones. The IoT is changing our lives, feelings,
and perceptions of the physical world daily, by modifying how we interact with it. The
development of the IoT is a crucial moment in the history of humanity because it is
changing our mindset, culture, and the way we live. Just like the internet age, we will have
a pre-IoT phase and a post-IoT phase. The IoT era will not be an instantaneous transition,
but a gradual and continuous shift during which the evolution never stops.
Currently, we are just at the beginning of this journey. Like the arrival of e-commerce or
mobile applications, there is a certain time lag between when you hear about an upcoming
technology and when it actually exists in the real world. But the change has started. We are
moving toward a world in which we interact increasingly not with physical objects, but
with their digital images that live in the cloud and communicate with other digital images.
These images are integrated through a powerful injection of digital intelligence, which
makes them capable of suggesting actions, making decisions autonomously, or providing
new and innovative services. You might currently be able to regulate your heating system
remotely, but if it lived in the cloud and received information from your car, your calendar,
your geolocation, and the weather, then your heating system would be able to regulate
itself. When an object lives in the cloud and interacts with other digital images in a web of
artificial intelligence, that object becomes a smart object.
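As an illustration, the self-regulating heating system described above could be sketched as a cloud-side rule combining car position, calendar, and weather data. All of the inputs, names, and thresholds here are hypothetical; a real system would subscribe to live data feeds:

```python
# A toy cloud-side "smart thermostat" rule. Inputs and thresholds are
# invented for illustration only.

def target_temperature(distance_home_km, outside_temp_c, at_home_soon):
    """Decide a heating set point from geolocation, weather, and calendar."""
    if distance_home_km > 50 and not at_home_soon:
        return 15.0   # nobody nearby and nobody expected: save energy
    if outside_temp_c < 0:
        return 22.0   # cold day and someone is on the way home
    return 20.0       # mild day default

# The car is 10 km away and the calendar says the owner is heading home.
setpoint = target_temperature(distance_home_km=10,
                              outside_temp_c=-3,
                              at_home_soon=True)
```

The point is not the specific rule, but that the decision lives in the cloud, where data from several otherwise unrelated sources can be combined.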
These developments might seem to be paving the way for a new and perfect world, but
there is a dark side to the IoT as well. A lot of personal data and information is stored in the
cloud, in order that artificial intelligence can extrapolate information about us and profile
our behaviors and preferences. From a different perspective, therefore, the cloud
could also be seen as a sort of Big Brother, as in George Orwell's novel 1984. There is the
possibility that our data and profiles could be used not just to enhance our lifestyles, but
also for more malicious purposes, such as political or economic influence on a large scale.
An example of this was the Cambridge Analytica scandal that occurred in March 2018. It
was widely reported at the time that this company had acquired and used the personal data
of Facebook users from an external researcher who had told Facebook he was collecting it
for academic purposes. This researcher was the founder of the third-party app
thisisyourdigitallife, which was given permission by its 270,000 users to access their data
back in 2015. By providing this permission, however, these users also unknowingly gave
the app permission to access the information of their friends, which resulted in the data
of about 87 million users being collected, the majority of whom had not explicitly given
Cambridge Analytica permission to access their data. Shortly afterward, the media aired
undercover investigative videos revealing that Cambridge Analytica was involved in
Donald Trump's digital presidential campaign.
This book will not go into detail about the social, legal, political, or economic impacts of the
IoT, but we wanted to highlight that it does have a dark side. More often than not, human
history has demonstrated that a technology is not good or bad in itself, but instead becomes
good or bad depending on how it is used by humans. This is true for the IoT. Its power is
tremendous, and it is only just starting to be understood. Nobody yet knows how the IoT will
develop from now on, but we are all responsible for trying to control its path.
Kevin Ashton, the Executive Director of Auto-ID Labs at MIT, was the first to describe the
IoT in a presentation for Procter and Gamble. During his 1999 speech, Mr. Ashton stated as
follows:
“Today, computers, and therefore the Internet, are almost wholly dependent on human
beings for information. Nearly all of the roughly 50 petabytes (a petabyte is 1,024
terabytes) of data available on the Internet was first captured and created by human beings
by typing, pressing a record button, taking a digital picture, or scanning a bar code. The
problem is, people have limited time, attention, and accuracy, all of which means they are
not very good at capturing data about things in the real world. If we had computers that
knew everything there was to know about things, using data they gathered without any
help from us, we would be able to track and count everything and greatly reduce waste,
loss and cost. We would know when things needed replacing, repairing or recalling and
whether they were fresh or past their best.”
Kevin Ashton believed that radio-frequency identification (RFID) was a prerequisite for
the IoT, and that if all devices were tagged, computers could manage, track, and inventory
them.
In the first decade of the 21st century, several projects were developed to try to implement
and translate into the real world the IoT philosophy and Ashton's innovative approach.
These first attempts, however, were not so successful. One of the most famous and
emblematic cases was the Walmart mandate (2003). By placing RFID tags with embedded
circuits and radio antennas on pallets, cases, and even individual packages, Walmart was
supposed to be able to reduce inefficiencies in its massive logistics operations and slash out-
of-stock incidents, thus boosting same-store sales.
In 2003, Walmart started this pilot project to put RFID tags, carrying electronic product
codes, on all pallets and cases involving all of its suppliers. In 2009, Procter and Gamble,
one of the main suppliers involved in the project, stated that it would exit from the pilot
project after validating and checking the benefits of RFID in merchandising
and promotional displays.
The failure of the Walmart RFID project was due to various factors:
Most of the technologies used were in their initial stages of development and
their performance was poor. They had sensors with little information, and Wi-Fi
or LAN connectivity with high power and bandwidth usage.
The sensors and connectivity devices were expensive due to the small market
size.
There were no common standards for emerging technologies, and there was a
lack of interoperability between legacy systems.
Business cases were not very accurate.
The technology infrastructure and architecture was organized in vertical silos
with legacy hardware and middleware, and a lack of interactions between each
silo.
Technology infrastructure and software architecture was based on a client-server
model that still belonged to the so-called second digital platform.
From 2008, several changes were introduced to deal with the preceding issues, which were
led mainly by the mobile market. These included the following:
Due to these changes, the IoT evolved into a system that used multiple technologies. These
included the internet, wireless communication, micro-electromechanical systems, and
embedded systems such as the automation of public buildings, homes, factories, wireless
sensor networks, GPS, control systems, and so on.
The IoT consists of any device with an on/off switch that is connected to the internet. If it
has an on/off switch, then it can, theoretically, be part of a system. This includes almost
anything you can think of, from cell phones, to building maintenance, to the jet engine of an
airplane. Medical devices, such as a heart monitor implant or a bio-chip transponder in a
farm animal, are also part of the IoT because they can transfer data over a network. The IoT
is a large and digital network of things and devices connected through the internet. It can
also be thought of as a horizontal technology stack, linking the physical world to the digital
world. By creating a digital twin of the physical object in the cloud, the IoT makes the object
more intelligent thanks to the interaction of the digital twin with the other digital images
living in the cloud.
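To make the idea of a digital twin concrete, here is a minimal sketch of a cloud-side twin that mirrors the state of a physical object and is queried by other services instead of the device itself. The class and field names are invented for illustration:

```python
# A toy digital twin: a cloud-side mirror of a physical object's state.
# Names and attributes are illustrative only.

class DigitalTwin:
    def __init__(self, device_id):
        self.device_id = device_id
        self.state = {}            # last reported values from the device

    def report(self, **measurements):
        """Called when the physical device pushes new telemetry."""
        self.state.update(measurements)

    def query(self, key, default=None):
        """Other digital images interrogate the twin, not the device."""
        return self.state.get(key, default)

engine = DigitalTwin("jet-engine-42")
engine.report(temperature_c=650, rpm=8200)
```

Once many such twins live in the same cloud, they can exchange information with each other without ever touching the physical objects they represent.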
The technologies that have led to the evolution of IoT ecosystems are as follows:
New sensors that are much more mature, have more capabilities, and offer high
performance at a lower cost. These smart sensors are natively designed to hide
the complexity of the signal processing, and they interact easily through a digital
interface. The smart sensor is a system itself, with a dedicated chip for signal
processing. The hardware for signal processing is embedded in each sensor and
miniaturized to the point that it is part of the sensor package. Smart sensors are
defined by the IEEE 1451 standard as sensors with a small memory and
standardized physical connections to enable communication with the processor
and the data network. As well as this, smart sensors are the combination of a
normal sensor with signal conditioning, embedded algorithms, and a digital
interface. The principal catalyst for the growth of smart-sensing technology has
been the development of microelectronics at reduced cost. Many silicon
manufacturing techniques are now being used to make not only sensor elements,
but also multilayered sensors and sensor arrays that are able to provide internal
compensation and increase reliability. The global smart sensor market was
evaluated at between $22 and $25.96 billion in 2017. It is forecast to reach
between $60 and $80 billion by the end of 2022.
New networks and wireless connectivity, such as personal area networks
(PANs) or low power networks (LPNs), interconnect sensors and devices in
order to optimize their bandwidth, power consumption, latency, and range. In
PANs, a number of small devices connect directly or through a main device to a
LAN, which has access to the internet. Low-Power Wide-Area Networks
(LPWANs) are wireless networks designed to allow long-range communications
at a low bit rate among battery-operated devices. Their low power, low bit rate,
and their intended use distinguish these types of network from the already
existing wireless WAN, which is designed to connect users and businesses and
carry more data, using more power. (More information can be found on WANs
at https://en.wikipedia.org/wiki/Wireless_WAN.)
New processors and microprocessors coming from the world of mobile devices.
These are very powerful and very cheap. They have produced a new generation
of sensors and devices based on standardized and cheap hardware that is driven
by open and generic operating systems. These use common software frameworks
as an interface, allowing you to transition from a legacy solution, with strictly
coupled hardware and software, to a platform built on COTS components and
the adoption of an open software framework.
The battle between the big market players over the real-time operating system
(RTOS), each trying to gain a larger slice of new markets. This places more
sophisticated and powerful integrated development platforms at the maker's disposal.
Virtualization technology, which naturally spans the data center, big data,
and the cloud. This leads to the following features:
CPUs, memory, storage, infrastructures, platforms, and software
frameworks available as services on demand, with flexible and
tailored sizing. These are cheap and available without capital
expenditure (CAPEX) investment.
Elastic repositories for storing and analyzing the onslaught of data.
The profitable and flexible operational expenditure (OPEX) model
per CPU, memory, storage, and IT maintenance services. This
creates a business case for migrating the legacy data,
infrastructure, and applications to the cloud, and making the
collection of big data and subsequent analytics possible.
The convergence of IT and operational technology (OT). This has led to the
increasing adoption of COTS components in sectors in which hardware was
traditionally developed with specific requirements, as is the case in
industrial plants.
The diffusion of mobile devices and social networks, which has created a culture
and a mindset in which consumers expect to interact with the world through
apps that share related information.
The preceding factors are making it possible to transition from a vertical, legacy platform
with an application organized hierarchically, with the data confined in silos, to a horizontal,
modular, and cloud-based platform. This new platform uses a standardized API layer that
provides high interoperability capabilities and the ability to share data and information
between applications.
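The smart sensor described earlier in this list, namely a raw sensing element plus embedded signal conditioning behind a digital interface, can be sketched as follows. The moving-average filter and all names are illustrative choices, not part of the IEEE 1451 standard:

```python
# A toy smart sensor: raw samples are conditioned on board (here, a
# simple moving average) before being exposed over a digital interface.

class SmartSensor:
    def __init__(self, window=3):
        self.window = window
        self.samples = []          # small on-board memory

    def acquire(self, raw_value):
        """Embedded conditioning: keep a short history for smoothing."""
        self.samples.append(raw_value)
        self.samples = self.samples[-self.window:]

    def read(self):
        """Digital interface: return the conditioned measurement."""
        return sum(self.samples) / len(self.samples)

sensor = SmartSensor(window=3)
for raw in (10.0, 12.0, 14.0):   # noisy raw readings
    sensor.acquire(raw)
value = sensor.read()            # smoothed value
```

The consumer of the sensor only ever sees the conditioned value over the digital interface; the signal processing stays hidden inside the sensor package.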
Let's consider what might happen if the Walmart project was carried out now. In 2003, the
only RFID technology that existed was active RFID systems. Active RFID systems use
battery-powered RFID tags that continuously broadcast their own signals. They provide a
long-read range, but they are also expensive and consume a lot of power. Passive RFID
systems, on the other hand, use tags with no internal power source, and are instead
powered by the electromagnetic energy transmitted from an RFID reader. They have a
shorter read range, but they are embeddable, printable, and much cheaper, which makes
them a better choice for many industries. Also, at the time of the Walmart project, there
were no PANs or LPNs to capture and transmit the label data, meaning the developers had
to adopt an expensive, wired connection to transfer the information. The data was then
stored in a legacy database and processed by a custom application. If the Walmart project
were to be carried out now, instead of in 2003, the tracking information could be carried out
by passive RFIDs. The data could be captured by a PAN and transmitted via the cloud to be
processed by an application built on top of a common API and framework. This means that
all data and information could be easily shared between the project partners.
According to Forbes and Gartner, the IoT market and connected devices are expected to
grow strongly in the coming years, as shown by the following statistics:
The IoT market will grow from $900 million in 2015 to $3.7 billion in 2020 (McKinsey).
Source: https://www.forbes.com/sites/louiscolumbus/2016/11/27/roundup-of-internet-of-things-forecasts-and-market-estimates-2016/#10e08343292d.
There will be 21 billion IoT devices by 2021 (Gartner).
Source: https://www.informationweek.com/mobile/mobile-devices/gartner-21-billion-iot-devices-to-invade-by-2020/d/d-id/1323081.
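The modernized Walmart scenario described above, with passive tags read at a gateway and pushed to a shared cloud API that any project partner can query, might be sketched like this. The event format, function names, and the in-memory "cloud" are all invented for illustration:

```python
# A toy version of the modern RFID flow: a gateway collects passive tag
# reads and publishes them to a shared "cloud" inventory. The dict below
# stands in for a real cloud API.

cloud_inventory = {}

def publish_read(location, tag_id):
    """Gateway-side: push one passive-RFID read to the cloud."""
    cloud_inventory.setdefault(tag_id, []).append(location)

def last_seen(tag_id):
    """Partner-side: query the shared data for a tag's latest location."""
    reads = cloud_inventory.get(tag_id)
    return reads[-1] if reads else None

# A pallet is read first in a warehouse, then in a store.
publish_read("warehouse-3", "EPC-0001")
publish_read("store-17", "EPC-0001")
location = last_seen("EPC-0001")
```

Because every partner reads from the same shared store through the same API, there is no silo to integrate: the supplier and the retailer see the same tracking data.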
Energy
Smart cities
Smart Homes and Buildings
Government and military
Security forces
Education
Sports and fitness
All of these are already involved in the digital transformation that has been caused by the
IoT, and are likely to play a greater role in this in the future.
Across all uses of the IoT, the common feature is the smart object. From a qualitative
perspective, a smart object is a multidisciplinary object which includes the following
elements:
In another definition, the article How Smart, Connected Products Are Transforming
Competition, written by Michael E. Porter and James E. Heppelmann, details four increasing
levels that classify the smartness of an object or product:
IoT is focused on a data stream, rather than having huge amounts (petabytes) of
data storage
IoT can use on-premises solutions through virtualization technology
Machine learning on IoT is not as productive as simple threshold rules or
physics-based analytics
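As a concrete example of the point about threshold rules, here is a minimal streaming check of the kind that often does the job on plain IoT telemetry without any machine learning. The limits and names are invented:

```python
# A toy streaming threshold rule: flag readings outside fixed limits as
# they arrive, with no bulk storage and no model training.

def threshold_alerts(stream, low=18.0, high=25.0):
    """Yield (index, value) for every out-of-range reading."""
    for i, value in enumerate(stream):
        if value < low or value > high:
            yield i, value

readings = [21.0, 22.5, 30.1, 19.8, 17.2]   # simulated sensor stream
alerts = list(threshold_alerts(readings))
```

The rule processes the data stream one value at a time, which matches the first bullet above: the IoT is focused on the stream, not on petabytes of stored history.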
These concepts have been highlighted by the Eclipse Foundation's 2018 IoT survey. The
following diagram shows the adoption of cloud technologies by companies:
Eclipse's IoT survey—the technologies underlined in red are those that will be discussed in this book
The following shows IoT technology adoption from a storage point of view:
Eclipse's IoT survey—the technologies underlined in red are those that will be discussed in this book
In this book, we will explore the most common IoT cloud solutions, such as AWS, GCP, and
Azure, and the most common OEM I-IoT platforms, such as Bosch IoT, Predix, and
MindSphere, to provide state-of-the-art IoT technology. We will also look at other common
open source technologies, including those in the following table:
These technologies can be used to build an IoT platform from scratch or to integrate with an
existing one. We will also consider other commonly used commercial software in the
industrial environment.
We will discover the new generation of edge computing and the edge gateway, and, finally,
we will deal with machine learning and artificial intelligence. This journey is also the
journey of the IoT from the cloud to the big revolution expected around 2020.
The IoT is, almost by definition, the key to the further development of the manufacturing
industry, since it brings together technologies such as big data analytics, the cloud, robotics,
and, most importantly, the integration and convergence between IT and OT.
Generally speaking, the term I-IoT refers to the industrial subset of the IoT. The I-IoT, like
the IoT, is not just a specific new technology, but instead refers to the whole value chain of
a product. Similarly, the I-IoT impacts all sectors of the industrial world by significantly
modifying the processes at each stage of the life cycle of a product, including how it is
designed, made, delivered, sold, and maintained. Like the IoT, we are just at the beginning
of the I-IoT journey.
The I-IoT is expected to generate so much business value and have such a deep impact on
human society that it is leading the Fourth Industrial Revolution.
The global IoT market will grow from $157 billion in 2016 to $457 billion by 2020,
attaining a compound annual growth rate (CAGR) of 28.5%
Discrete manufacturing, transportation and logistics, and utilities will lead all
industries in IoT spending by 2020, averaging $40 billion each
The following is a list of several I-IoT use cases in manufacturing and their benefits:
Ultimately, all of these use cases highlight that data plays a key role. In the next few
chapters, we will see how the data that comes from sensors and other industrial equipment
is gathered and how big that data can be. Manufacturers who use this data can bridge the
gaps between the planning, the design, the supply chain, and the customer of a particular
product. In addition, thanks to this strong integration and shared data, islands of
automation can be easily linked together.
Cybersecurity is a critical topic for any digital solution, but its implementation in
the industrial world requires special attention. This is because the OT systems
and devices in industry have a much longer life cycle and are often based on
legacy chips, processors, and operating systems that were not designed to be
connected to the internet. For this reason, they live in isolated LANs, protected
from the external world by a firewall.
It is critical to ensure that industrial digital devices stay running; any temporary
disruption can imply a large economic loss.
I-IoT solutions must co-exist in an environment with a significant amount of
legacy operational technology. They must also co-exist with different devices
acting as data sources, including SCADA, PLCs, and DCSes, with various
protocols and datasets, and with the back-office enterprise resource planning
(ERP) systems.
Industrial networks are specialized and deterministic networks, supporting tens
of thousands of controllers, robots, and machinery. I-IoT solutions deployed into
these networks must, therefore, scale to tens of thousands of sensors, devices,
and controllers seamlessly.
Physical objects in the industrial world are more complex and have a wider
range of typologies when compared to the consumer world.
In the industrial world, robustness, resilience, and availability are key factors.
Usability and user experience, however, are not as relevant as they are in the
consumer world.
In the IoT, from the technical point of view, we can identify two broad categories of
analytics:
Physics-based and data-driven analytics can be combined to build a reliable hybrid model.
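As a minimal illustration of such a hybrid model, the sketch below combines an idealized physics relation with a data-driven correction learned from historical residuals. The quadratic relation, tag values, and numbers are illustrative assumptions, not taken from the book:

```python
def physics_model(flow, k=2.0):
    """Physics-based part: an idealized relation, e.g. pressure drop
    proportional to the square of the flow."""
    return k * flow ** 2

def fit_residual_bias(flows, measured):
    """Data-driven part: learn the average error (residual) of the
    physics model against historical measurements."""
    residuals = [m - physics_model(f) for f, m in zip(flows, measured)]
    return sum(residuals) / len(residuals)

def hybrid_predict(flow, bias):
    """Hybrid model: physics estimate corrected by the learned residual."""
    return physics_model(flow) + bias

# Historical data shows the plant consistently runs 0.5 above the ideal curve
bias = fit_residual_bias([1.0, 2.0, 3.0], [2.5, 8.5, 18.5])
```

Real hybrid models replace the constant bias with a regression or neural network over the residuals, but the structure, physics first, data-driven correction second, is the same.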
Recently, the introduction of deep learning (a branch of machine learning) in the contexts of
image and audio processing has brought a lot of attention to data-driven technologies.
Artificial intelligence is nothing without data; the IoT is nothing but data.
We are now aiming to expand the application of deep learning to the I-IoT to improve the
speed and accuracy of data analysis. In addition to audio and image data, IoT data can be
processed with deep learning pipelines based on learning, inference, and action.
Resolving both of these issues will ensure that an abundance of caution is built into
machine learning models used in industrial applications. We need to not only create better
algorithms, but also make sure that people with domain expertise understand machine
learning suggestions. We also need to build systems that take in feedback, and are aware of
the end user and the effects of a good or bad response.
In this book, we will not cover scenarios in which there are too few signals to be collected
for each data source to justify the need for an edge device.
Summary
In this chapter, we have analyzed the origin of the IoT and looked at how it came about
through a combined set of technologies. We then learned about the key technologies that
underlie the IoT, by going into its use cases and business models. We defined the IoT as a
technological layer that creates a digital twin of a physical object in the cloud, making the
object more intelligent through the interaction of its digital twin with other digital twins
living in the cloud. We also identified four levels that define the smartness of a product or
object.
We then looked at how the IoT can be applied to the industrial world, thereby beginning
the Fourth Industrial Revolution and Industry 4.0. We looked at the key transformation
elements that mark out the I-IoT. We also highlighted some of the main use cases of the I-
IoT and the main differences between the IoT and the I-IoT. We then listed and understood
the different types of analytics that apply to industrial data. Finally, we clarified and
defined the industrial scenarios that will be covered in the rest of the book.
In the following chapter, we are going to look at how a factory is structured and organized
from an OT perspective. We will consider who produces, processes, and enriches the data.
We will also explore some key concepts, including deterministic, real-time, closed loop,
sensor, fieldbus, PLC, CNC, RTU, SCADA, Historian, MES, and ERP.
Questions
1. Why did the Walmart mandate fail?
2. What are the main enabling factors for the IoT?
3. Which are the main technologies underlying the IoT?
4. What is a smart object?
5. What is the main scope of the I-IoT?
6. What are the main differences between the IoT and the I-IoT?
7. What are the two main categories of analytics in the I-IoT?
Further reading
Read the following articles for more information:
2
Understanding the Industrial Process and Devices
In this chapter, the reader will understand how industrial data is generated, gathered, and
transferred to the cloud. We will look at continuous and discrete processes and how they work,
becoming familiar with the model of computer-integrated manufacturing (CIM) from its
origin in the factories of the 1980s to the current day. The reader will learn about industrial
equipment, networks, and protocols, and come to understand terms such as distributed
control system (DCS), programmable logic controllers (PLCs), Supervisory Control and
Data Acquisition (SCADA), Historian, manufacturing execution system (MES),
Enterprise Resources Planning (ERP), and Fieldbus. We will also look at how the
industrial world interacts with the cloud, and look at the devices and protocols that allow
this to happen. Related to this, we will learn terms including OPC Proxy, store and
forward, edge, and IoT gateway. All the concepts that are sketched in this chapter will be
further explained and analyzed over the next few chapters. We will provide a high-level,
but all-encompassing, vision of the I-IoT from a data perspective. We will look at the path
of industrial signals, from their generation by the sensors to their processing in the cloud.
Technical requirements
In this chapter, we will present concepts related to industrial processes and equipment. A
basic knowledge of analog signal processing and analog/digital and
digital/analog conversion is required. You will also need to be aware of elements of
control theory, and of LAN and WAN networking.
Energy
Machines
Tools
Human work
The industrial process is a sequential process. It can be split into a set of sequential
production steps that transform the raw materials into the desired state along the way, as
shown in the following diagram:
In an automated system, we can identify the physical processes and the control system, as
shown in the following diagram:
Physical processes can be defined as the sum of the operations that act on entities
belonging to the physical world and which change some of their characteristics. Operations
that fit this definition include material or part movements, mechanical processing, or
chemical reactions. These physical processes can be considered objects of automation. Pure
and simple information, on the other hand, does not make changes to the real world, and so
cannot be considered a physical process.
A physical process receives raw materials and energy as inputs. It also receives
information, which can be in the form of electric voltage, current values, or fluid pressure,
or which can be coded in sequences of binary values. It produces output materials in the
form of finished products and waste, and also sends information. The noises coming from
the environment that act on the process can also be considered as inputs to the process
itself.
Sensor: Transforms the variable to be measured into the type necessary for
measurement
Transducer: Accepts information in the form of a physical or chemical variable,
and converts it into a magnitude of a different nature—typically electric
Very often, sensors and transducers coincide in the same physical component. We generally
call a device a sensor (or a transducer) if it measures a magnitude and gives an output as a
signal, typically an electrical one. The incoming information is used by the actuators to set
the value of the control variables for the process. Usually, the actual actuator is preceded
by a pre-actuator, which processes the information and converts it into a power signal.
Sensors, actuators, and pre-actuators carry out part of the physical process and act as
interfaces to the control system.
A control system receives information on the status of the process from the sensors and
processes it according to the specified algorithms. It then sends the actuators
information related to actions that provide the desired control of the physical process. The
control system also receives information from one or more external entities, such as human
operators or other control systems that are hierarchically higher. It is also able to
provide information about its own status and the controlled process to the external entities.
In modern control and measurement systems for industrial processes, PLCs, DCSes, and
general-purpose computers are used for this purpose. Generally speaking, PLCs and DCSes are
connected directly to the sensors and actuators, whilst a general-purpose computer exchanges
information with other devices, such as PLCs or other computers, through communication
networks.
A more precise classification splits each of these two main classes depending on the
countability of their inputs and outputs. These classes are continuous, batch, semi-continuous,
and discrete, all of which are described in the following sections.
Continuous processes
Continuous processes involve the continuous transformation of mass, energy, and
momentum on a flow of material. In a continuous process in normal operation, the goal is
to obtain a product of uniform quality over time, regardless of how long the process has
remained active. A process may last days or even weeks, and may only be stopped
for verification operations, cleaning, or maintenance (either scheduled or unplanned due to
failure). Continuous processes, whose control is also referred to as process control, include
the following:
Energy production
Distribution of energy, water, and gas
Crude oil and gas extraction
Rolling plants
Batch processes
Batch processes involve finite quantities of the final product, which are obtained from finite
quantities of raw materials and then processed according to an ordered set of activities in a
finite time interval. Batch processes can be applied to the following:
Semi-continuous processes
Semi-continuous processes have some characteristics in common with both continuous
and batch processes. Typical examples of semi-continuous processes are the following:
Discrete processes
Discrete processes are characterized by processing cycles based on single parts or
individual units of a product. In a discrete process, both the raw materials and the final
product are countable. Typical examples of discrete processes are as follows:
Business: These are activities that strictly face the customer and are related to the
start and end of the whole process. These include order management, marketing,
sales, and the budget.
Design: These are activities related to designing the product according to the
customer's needs, expectations, and requirements.
Planning: These are activities based on planning the business and design
functions of a product. They involve the working sequence, timing, storage, and
supplies.
Control: These are activities related to the management and supervision of the
process production. It includes controlling the production flow and checking the
quality of the production processes and products.
We can sketch these activities as a loop around the production activities managing the
related flow of information, as shown in the following diagram:
For all of these, there are specific software applications for automating and coordinating the
different activities. These include the following:
This type of approach, however, might lead to the creation of automation islands that are
not integrated with each other. A huge improvement can be achieved by integrating all
subsystems at the company level. This can be done as follows:
However, in all production processes, there is a trade-off between the variety and the
quantity of the production in terms of the number of parts or the output. Typically, the
higher the quantity of the production, the less variety there can be in the production.
This trade-off threshold can then be shifted up through a very flexible production system
that uses highly configurable machinery. This machinery can easily be set up for different
uses and allows for strict interaction and integration between the information systems of
the production process and the support activities, as depicted in the following diagram:
CIM is a logical model for production systems that was developed in the 1990s to integrate
the production processes, the automation systems, and the information technology systems
at a company or enterprise level.
CIM should not be considered as a design technique for building automatic factories, but
rather as a reference model for the implementation of industrial automation based on the
collection, coordination, sharing, and transmission of data and information between the
different systems and sub-systems by means of software applications and communication
networks.
The CIM model is often depicted as a pyramid made up of six functional levels, as shown
in the following diagram:
Level 1—field: This level includes all devices that interact directly with the
process, such as sensors and actuators.
Level 2—command and control systems: This level includes the devices that
interact directly with the sensors and the actuators, such as PLCs, micro-
controllers, proportional-integral-derivative (PID) controllers, robot controllers,
and computer numerical controllers (CNCs).
Level 3—cell supervisory: In a cell, a complete sub-process of production is
executed through various devices and machines that must be coordinated with
each other. The main functions of the control devices placed in this level are the
following:
The receipt of instructions from the upper level.
The translation of those instructions into actions and commands for
the lower-level devices.
The collection of information from the lower levels to pass to the
upper level.
The arrangement of information for the human supervisor, who
may issue commands or set up set-points or thresholds.
Level 4—plant supervisory: At this level, the production database collects and
stores the main parameters of the production process, and the coordination
between the cells is carried out to implement the whole production process.
Level 5—plant management level: At this level, the aforementioned integration
between the support systems takes place.
Level 6—company management: Typically, a company handles several plants,
so at this level the information is collected from the lower levels to feed the
decision support systems that help managers to plan flows of materials and
finance necessary for the maintenance, improvement, and optimization of the
production process.
The pyramidal shape used to present the CIM levels is suitable for a hierarchical
organization in which the following occurs:
Each level communicates directly with the upper one, from which it receives
commands and sends information
Each level communicates directly with the lower one to send commands and
receive information
Each level sends to the upper level less information at a lower frequency than
that received from the lower level
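The third rule, each level passing less information upward at a lower frequency, can be sketched as a small aggregation step, in which a level reduces the high-frequency samples it receives before sending them to the level above. The group size and values are illustrative:

```python
def aggregate_for_upper_level(samples, group_size=60):
    """Reduce high-frequency samples (e.g. one per second) to fewer,
    lower-frequency values (e.g. one-minute averages) for the upper level."""
    averages = []
    for i in range(0, len(samples), group_size):
        group = samples[i:i + group_size]
        averages.append(sum(group) / len(group))
    return averages

# 120 one-second samples become just 2 one-minute values for the level above
minute_values = aggregate_for_upper_level([float(i % 60) for i in range(120)], 60)
```

Each CIM level applies some form of this reduction, which is why the pyramid narrows toward the top: raw field data at the bottom, sparse aggregated indicators at the company-management level.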
This strict and mutual interaction between the different systems and sub-systems involved
in the production cycle has a number of benefits. Among the most important of these are
the following:
In IT companies that provide automation services to factories, the CIM pyramid is typically
simplified to a three-level structure:
Both coalesced representations of the CIM pyramid are shown in the following diagram:
The control devices and the industrial networks will be analyzed and discussed more
thoroughly in Chapter 3, Industrial Data Flow and Devices. In this section, we will provide a
description of each of these industrial systems. This will introduce the next section, which
will present the flow of industrial data from the factory to the cloud.
A description of the devices and applications related to the CIM pyramid is given in the
following section.
"The primary element of a measurement chain, which converts the input variable in a
signal suitable to the measurement."
"The device that accepts information in the form of a physical variable (its input variable)
and converts it to an output variable of the same or of different magnitude, according to a
defined physical law."
Since the sensor and the transducer are often physically within the same component, the
two terms are often used as synonyms. The actuator is a transducer that transforms a
command signal into a physical action on the process. Basically, its function is
complementary to that of the sensor. It receives a signal as an input in the physical domain
of the control device and sends energy as an output in the physical domain of the command
variable. They are often made by means of multiple transducers.
An embedded controller is generally a single chip or board that includes all the
necessary components to carry out the required control tasks. Embedded
controllers are usually designed for a specific application and are built on top of
specific or custom hardware.
Computer Numerical Controls (CNCs) are machine tools that are controlled by
an electronic device integrated into the machine. Movements and functions of
CNC machines are predefined and set through a specific software. They are used
for performing high-precision machining that requires long processing times
without any interaction with the external environment.
A PLC is an industrial controller dedicated to the control of industrial
processes. It executes a program in a cyclical fashion, processing the
signals coming as input from the sensors and sending the output values to the
actuators to control the physical process. The reading of the inputs, their
processing, and finally the writing of the outputs, takes place within a predefined
maximum time, called a scan cycle. This typically takes between 10 and 100
milliseconds.
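The scan cycle just described can be sketched as follows. The temperature/valve logic is a hypothetical example, not code from the book:

```python
def scan_cycle(read_inputs, control_logic, write_outputs):
    """One PLC scan: freeze a snapshot of all inputs, run the control
    logic on that snapshot, then write all outputs. The whole cycle must
    complete within the predefined maximum time (the scan cycle)."""
    inputs = read_inputs()           # read all sensor values at once
    outputs = control_logic(inputs)  # compute all actuator commands
    write_outputs(outputs)           # write all outputs at once
    return outputs

# Illustrative logic: open the cooling valve when the temperature is too high
def logic(inputs):
    return {"cooling_valve": inputs["temperature"] > 75.0}

result = scan_cycle(lambda: {"temperature": 80.0}, logic, lambda out: None)
```

In a real PLC, this loop runs forever, and the 10-100 millisecond bound on each iteration is what makes the controller's behavior deterministic.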
DCSes are typically used in continuous processes such as in refineries, energy
production, or chemical plants. They integrate both the control function
implemented in the PLC and the supervision of the SCADA system. While the
PLC and the SCADA are two separate systems, each with their own variables
and data structures, in the DCS, the control and the supervision of their
processing tasks share the same variables and data structures.
Receives data and information from the PLCs of the control system, and through
them from the sensors, RTUs, and the other devices that are located at the lower
levels of the CIM pyramid.
Processes the gathered data, storing the most relevant information.
Detects process anomalies, triggering alarms that are escalated to the operators in
the control room.
Presents information to the operators in the control room through a Human
Machine Interface (HMI).
From the HMI synoptics, the operator can monitor the data of the overall plant
under the supervision of the SCADA application. If needed, the operator can also
react promptly through the sequences and operational logics implemented in the
SCADA system by sending commands to the PLCs.
Level 4 – MES
The MES is a software system located between the ERP and the SCADA or PLC to manage
the production process of a company efficiently. Its main function is to synchronize the
business management and the production system by bridging the planning and control
levels to optimize processes and resources.
Level 5 – ERP
ERP includes the systems and software packages that organizations use to manage the
daily activities of their business, such as accounting, procurement, project management,
and production. ERP systems combine and define a set of business processes governing the
exchange of information and data between the systems involved. ERP collects and shares
transactional data coming from different departments of the organization, thus ensuring
data integrity by acting as a single source of truth.
CIM networks
At the lower levels, we have a large amount of simple information that must be transferred
within small and predefined deterministic time intervals. At the higher levels, the
requirement is to transfer much more complex information within time intervals that do
not need to be deterministic. So, in an integrated production system, different types of
communication networks are required, each one specialized for the task to which it is
dedicated. With regard to the CIM pyramid, we have the following networks:
Level 1: fieldbus
Level 2: control bus
Levels 3, 4 and 5: corporate network
Field or fieldbus networks were introduced for interfacing control devices, sensors, and
actuators, reducing the need for complex wiring. In the fieldbus, sensors and actuators
are equipped with a minimum set of processing capabilities to ensure the transmission of
information in a deterministic way.
The control network must ensure communication between the devices dedicated to the
control and supervision of the plants. In this case, the information sent is not very
complex, but the transmission must occur within predefined deterministic time intervals
and at higher frequencies. Control and fieldbus networks are also often called real-time
networks because of their determinism in the transmission of data and information.
The corporate network is the network channel located between the control systems and the
planning and management systems. This connectivity layer must guarantee the
processing of complex information, but at low frequencies. For this reason, there is no need
for determinism on this network.
Here, when explaining the I-IoT data flow, we have not mentioned cybersecurity. This is
not because the topic is not pertinent or relevant. On the contrary, cybersecurity is always
a hot topic, especially in an industry in which the availability and reliability of equipment is
a must, and confidentiality and intellectual property are increasingly important. In
addition, opening up systems and devices that have been isolated until now, and that were
not designed to be linked to the internet, puts them under stress from the cybersecurity
perspective once their isolation is broken. For this reason, Chapter 5, Applying Cybersecurity,
will be dedicated to discussing cybersecurity in industrial control systems (ICS) and
intellectual property (IP) in the I-IoT.
The following diagram shows the I-IoT data flow from the sensors to the cloud:
This cycle is the same for all kinds of controllers (embedded controllers, PLCs, or DCSes,
for example) but a cycle's duration varies according to the kind of controllers in use:
In any case, every controller, in every scan cycle, whatever its duration, performs a reading
of all the sensors and calculates the values for all the actuators linked to it. For example, a
PLC that controls a gas turbine can manage up to 3,000 readings coming from the sensors or
as feedback commands from the actuators. Basically, this means that 3,000 values are read
every 100 milliseconds, 30,000 every second, 1,800,000 every minute, 108,000,000 every
hour, and 2,592,000,000 each day. This is just for one PLC controlling a single piece of
equipment, even if that piece of equipment is quite complex, like a gas turbine.
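This arithmetic can be checked with a few lines of code:

```python
def samples_per_period(tags, scan_ms, period_ms):
    """Number of values a controller reads over a period, given the number
    of tags and the scan-cycle duration in milliseconds."""
    return tags * (period_ms // scan_ms)

TAGS, SCAN_MS = 3000, 100  # the gas-turbine PLC example from the text

per_second = samples_per_period(TAGS, SCAN_MS, 1_000)
per_minute = samples_per_period(TAGS, SCAN_MS, 60_000)
per_hour = samples_per_period(TAGS, SCAN_MS, 3_600_000)
per_day = samples_per_period(TAGS, SCAN_MS, 24 * 3_600_000)
```

These few lines reproduce the figures above: 30,000 per second, 1,800,000 per minute, 108,000,000 per hour, and 2,592,000,000 per day, for a single controller.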
The values refer to analog signals, such as temperatures and pressures, or to Boolean
signals for two-status representation, such as whether a valve is open or closed, or whether
a part on a conveyor is present or not. The samples of these values come as an input to the
systems of the upper level, such as SCADA and Historian, as a representation of the related
signals. In the industrial world, the digital representation of a signal is called a tag, and its
samples are called a time-series. A time-series is an ordered sequence of timestamps and
values related to a signal in a particular time interval. SCADA systems and Historians
gather the data from the controllers, but with a higher scan time. Typically, they collect the
time-series using a scan cycle of 500 to 1,000 milliseconds in SCADA applications, and from
one second to one minute or more in Historian systems.
In addition, a smoothing filter is applied to the samples of the analog signals gathered by
the SCADA and Historian systems, ignoring incremental changes in values that fall within
a deadband centered around the last reported value. Samples of the Boolean signals
are typically collected when their state changes rather than at every scan time. In this way,
the amount of data that SCADA and Historian systems must deal with is much smaller
compared to that managed by the controllers, but it has a lower time resolution. The
balance between the amount and the resolution of the data collected is a trade-off,
depending mainly on the dynamics of the industrial process and the resources that are
available. PLCs and DCSes can also provide other information to the upper levels besides
the time-series, such as the following:
A memory buffer area that temporarily stores the values of the internal variables
of the controllers at its scan time in a sliding time window. The memory buffer
area can be exported as a binary or text file on an event or a condition.
Alarms related to measurements or calculations that fall outside the configured
high or low limit for that variable.
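The deadband filtering described above can be sketched as follows; the deadband width and the sample values are illustrative:

```python
def deadband_filter(samples, deadband=1.0):
    """Keep an analog sample only when it leaves the band centered
    on the last reported value; samples inside the band are ignored."""
    kept = []
    last = None
    for timestamp, value in samples:
        if last is None or abs(value - last) > deadband / 2:
            kept.append((timestamp, value))
            last = value
    return kept

# Raw time-series: (timestamp, value) pairs from the controller
raw = [(0, 10.0), (1, 10.2), (2, 10.4), (3, 11.0), (4, 10.9)]
stored = deadband_filter(raw, deadband=1.0)
```

Only the first sample and the one that breaks out of the band are stored, which is how a Historian trades time resolution for a much smaller data volume.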
From SCADA, the most important information flows to the MES system, and vice versa. At
this level, the information provided from SCADA can be the total amount of a batch, the
timestamp of its start or stop, quality samples, and so on. The information received from
the MES might be the recipe to use, a cleaning cycle to be carried out, the kind of product or
part to be produced, on which equipment, and so on.
From the data collection perspective, the data sources of the I-IoT are the following:
Sensors, RTUs, CNCs, and single pieces of equipment or stations driven by an
embedded controller cannot act autonomously as data sources for the I-IoT gateway. This
is because they are connected to isolated networks that are managed by legacy
protocols. Moreover, their data is mostly collected and made available by the PLCs and
DCSes, so there is no need to connect to them directly.
PLCs and DCSes in the control network are managed by fieldbus protocols, such as
Modbus, Profibus, and ControlNet. Fieldbus protocols are specialized industrial network
protocols that ensure determinism in the packet delivery to the devices that are configured
as belonging to that network. PLCs and DCS manage all data coming from the sensors and
devices of the lower levels according to the resolution of their scan time. Control networks
are isolated networks that are protected by a firewall. They have a few well-controlled
ports that are opened to the upper levels by means of the firewall.
SCADA and Historian systems are in the corporate network and are based on Ethernet, so
they are not in an isolated and specialized network, and therefore don't need specific
network cards or boards to communicate with the other systems. SCADA and Historian
systems do not collect all the data that is made available by the PLCs to which they are
connected. Instead, they collect only the most important data, and at a lower frequency.
In any case, whatever device is acting as the data source, we need a software layer that
interacts with the industrial data source through its own protocol, querying tags and
time-series and exposing them by means of a standard interface to the upper levels and
external systems. This software layer, which acts as a translator between the controllers
producing the industrial data and the consumer systems, is Open Platform
Communications (OPC). OPC is a standard that is used to abstract PLC- or DCS-specific
protocols (such as Modbus or Profibus) into a standardized interface. This allows SCADA
and other systems to interface with a middle-man that converts generic OPC read or write
requests into device-specific requests, and vice versa.
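This translator role can be illustrated with a toy adapter. The class names, tag map, and register address below are hypothetical stand-ins, not a real OPC or Modbus API:

```python
class FakeModbusDevice:
    """Stand-in for a device speaking a vendor-specific protocol."""
    def read_holding_register(self, address):
        # Canned value for illustration; a real driver would poll the PLC
        return {40001: 21.5}.get(address)

class OpcStyleServer:
    """Middle-man sketch: maps generic tag names onto device-specific
    addresses and calls, hiding the protocol from the consumer."""
    def __init__(self, device, tag_map):
        self.device = device
        self.tag_map = tag_map

    def read(self, tag):
        # Generic read request -> device-specific request
        return self.device.read_holding_register(self.tag_map[tag])

server = OpcStyleServer(FakeModbusDevice(), {"Boiler.Temperature": 40001})
value = server.read("Boiler.Temperature")
```

A SCADA client only ever sees the tag name `Boiler.Temperature`; the mapping to Modbus register 40001 stays inside the translator, which is exactly the decoupling OPC provides.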
Through OPC, the PLC or the DCS can expose tags, time-series, alarms, events, and
Sequence of Events (SOE) records to any system that implements the OPC interfaces. In the
next chapters, we will look at how the OPC standard has changed from its first
specification in 1996, when it was called OLE for Process Control, to the current
specification, which is now simply called OPC. It has overcome various challenges related
to security, data modeling, and performance in manufacturing systems. Since their
beginnings, the OPC specifications and the related interfaces have enjoyed considerable
success, becoming over the course of a few years the de facto standard for exposing
industrial application data and making it available to external systems.
SCADA and Historian, which were initially developed to be consumers of industrial data
and therefore implemented an OPC client interface, themselves became producers of
industrial data, thereby implementing an OPC server interface as well and playing both roles at the same
time. The quality and the performance of the OPC technology had various issues and
limitations at the beginning. Even so, the adoption and the spread of the OPC in the
industrial world was considerable. This was because of the significant need of all the actors
involved, which includes vendors, end users, and integrators, to exchange data between
applications belonging to the automated production system in a standardized and reusable
way. OPC interfaces allow you to query for time series and data cyclically, on change, or
through subscriptions with dead bands, with a scan time typically starting from one
second. Cyclical data collection is also called polling, on-change data
collection is often referred to as unsolicited, and the publish/subscribe mechanism is also
known as subscription.
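The difference between polling and unsolicited (on-change) collection with a dead band can be sketched in plain Python. The tag values, scan sequence, and dead band below are illustrative assumptions, not part of any OPC specification:

```python
# Simulated OPC-style collection modes: polling returns every scanned value,
# while on-change (unsolicited) reporting suppresses values inside a dead band.

class TagCollector:
    def __init__(self, dead_band=0.5):
        self.dead_band = dead_band      # minimum change worth reporting
        self.last_reported = None

    def poll(self, value):
        """Polling: every scan returns the current value, changed or not."""
        return value

    def on_change(self, value):
        """Unsolicited: report only when the value leaves the dead band."""
        if self.last_reported is None or abs(value - self.last_reported) > self.dead_band:
            self.last_reported = value
            return value
        return None                      # suppressed: still inside the dead band

collector = TagCollector(dead_band=0.5)
samples = [10.0, 10.2, 10.4, 11.1, 11.2, 13.0]   # one sample per scan
reported = [v for v in samples if collector.on_change(v) is not None]
```

With these samples, only three values cross the dead band and are reported, which is exactly why unsolicited collection reduces traffic compared to polling.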
The OPC standards include two main specifications for OPC interfaces: the so-called OPC
Classic and the OPC Unified Architecture (UA). These specifications are not compatible
with each other. In the I-IoT data flow, the edge device that gathers the data should do
the following:
Implement OPC client interfaces (both OPC Classic and OPC UA) to gather the
data from the industrial data sources, which include PLCs, DCSes, SCADA, and
Historians
Have OPC proxy functionality to handle incoming data requests following either
the OPC Classic or the OPC UA interface transparently, addressing
them accordingly
Implement bi-directional communication to the cloud through VPN, SSL/TLS, or
cellular links, using the most common internet application protocols such as
Message Queuing Telemetry Transport (MQTT), Advanced Message Queuing
Protocol (AMQP), and Hypertext Transfer Protocol Secure (HTTPS) to:
Transfer the data to the cloud through the IoT Hub
Receive commands and configurations from the IoT Hub
Implement a store and forward mechanism to guarantee the data transfer along
the communication channel in the case of poor, intermittent, or overloaded
connectivity
Expose centralized functionalities available from the cloud. These include the
following:
Setup and monitoring of the device
Data acquisition parameters and configuration deployment
Software patches and software upgrade deployment
Be available both as a physical and a virtual appliance
Be multi-platform (at least on Windows and Linux)
Be flexible and scalable to support the collection and transfer of anything from
several hundred to tens of thousands of tags
Capturing, storing, and processing buffers of high-frequency data (from 1
Hz to 20 kHz)
Implementing drivers for the most common control bus protocols (PROFINET,
EthernetIP, and Modbus)
Implementing an analytics engine to detect anomalies and/or locally process
high-frequency data related to transient or highly dynamic phenomena
Implementing an Advanced Process Control (APC) to analyze the process and
optimize it autonomously, sending the suggested outputs to the controllers
Implementing a Trusted Platform Module (TPM) to improve cybersecurity
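The store and forward requirement listed above can be sketched as a small buffer that drains only while the uplink is available. The in-memory deque below stands in for the persistent store a real edge device would use, and the `send` callable for the actual cloud transport:

```python
# Minimal store-and-forward sketch: samples are buffered locally and flushed
# to the cloud only while the connection holds; order is preserved.
from collections import deque

class StoreAndForward:
    def __init__(self, send, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest samples dropped when full
        self.send = send                      # callable that pushes one sample

    def enqueue(self, sample):
        self.buffer.append(sample)

    def flush(self, link_up):
        """Forward buffered samples while the connection holds."""
        sent = 0
        while self.buffer and link_up():
            self.send(self.buffer.popleft())
            sent += 1
        return sent

delivered = []
saf = StoreAndForward(send=delivered.append)
for i in range(5):
    saf.enqueue({"tag": "T1", "value": i})
saf.flush(link_up=lambda: False)   # connection down: nothing leaves the buffer
saf.flush(link_up=lambda: True)    # connection restored: backlog drains in order
```

The bounded `maxlen` is a deliberate design choice: when connectivity is lost for a long time, the oldest data is sacrificed rather than exhausting the edge device's storage.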
On the cloud side, the main components of the I-IoT platform include the following:
IoT Hub: This is the dispatcher of data and the manager of devices. It checks
security and dispatches data to the right data processors (storage, analytics, or
queues). Normally, it is implemented with a multi-protocol gateway supporting,
for example, AMQP, HTTPS, or MQTTS, together with a message broker.
Time Series Database (TSDB): This is the centralized database in which events
and data points acquired from sensors are stored.
Analytics: These work with the data to extract anomalies, the machine's health,
efficiency, or generic key performance indicators (KPIs). Analytics can work either
in stream mode or in micro-batch processing mode. Normally, we use simple,
stream-based analytics to evaluate simple rules, and machine-learning analytics
or physics-based analytics for more complex analytics working in micro-batch
mode.
Asset registry: This supports additional (static) information, such as the model of
the machine being monitored and the operational attributes. This might include
the fuel used, the process steps followed, or the machine's status.
Data lake: This is normally used to support raw data, such as images or log files.
Sometimes, it is used to offer storage support for events and measures (time-
series). In the data lake, we normally store the outcome of the analytics.
Object storage: This stores additional information for large files or document-
based data. Object storage can be implemented using the data lake.
Big data analytics: These are not necessarily within the scope of the IoT, but
sometimes we need to run big data analytics over the entire fleet. Alternatively,
we might be using a huge amount of data to carry out business analysis.
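As an illustration of the stream-mode analytics mentioned above, a simple rule might flag a data point that deviates too far from a rolling mean. The window size, band, and sample series below are made-up tuning values, not a prescription:

```python
# Simple stream-mode rule: flag a (timestamp, value) pair as anomalous when it
# deviates from the rolling mean of the previous samples by more than a band.
from collections import deque

def stream_anomalies(points, window=5, band=3.0):
    history = deque(maxlen=window)
    anomalies = []
    for t, value in points:
        if len(history) == window:
            mean = sum(history) / window
            if abs(value - mean) > band:
                anomalies.append((t, value))
        history.append(value)
    return anomalies

# A slowly drifting temperature with one spike at the end
points = [(t, 20.0 + 0.1 * t) for t in range(10)] + [(10, 35.0)]
anomalies = stream_anomalies(points)
```

The slow drift stays inside the band and only the spike is flagged; more complex, model-based analytics would instead run in micro-batch mode, as described above.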
Summary
In this chapter, we have looked at what an industrial process and an automated system
are. We have identified the physical process, the control systems, and their interactions. The
physical process is a combination of operations that act on entities that belong to the
physical world and change some of their characteristics. The control system receives the
information on the status of the process, elaborates it, and performs the required actions on
the physical process. A more rigorous definition of process control and data acquisition
systems has been also provided. We have identified and defined the different entities
involved, which included devices, resources, data, events, and the interface.
Following this, we learned about the different kinds of industrial processes there are,
distinguishing between continuous, semi-continuous, discrete, and batch processes. We
also looked at which activities support the production cycle, including business, design,
planning, and control. After that, we introduced the CIM model, which is a hierarchical
structure that integrates the production processes, the automation systems, and the
information technology systems at a company or enterprise level. We analyzed in detail the
hierarchical structures of the CIM pyramid, acknowledging its levels and their aggregation
from different perspectives. We then learned what the main industrial devices and
networks are in each level of the CIM pyramid. From this, we gained an understanding of
which devices act as data sources in the I-IoT data flow.
Following on from this, we analyzed the data flow of industrial signals, from their
generation by the sensors, to when they are processed in the cloud. Data acquisition, data
sources, the related protocols, and the edge device for pushing the data into the cloud will
be analyzed and discussed in Chapter 3, Industrial Data Flow and Devices, and Chapter 4,
Implementing the Industrial IoT Data Flow. Cybersecurity in ICS and in the I-IoT will be
discussed in full in Chapter 5, Applying Cybersecurity.
Questions
1. Which component carries out the actions on a physical process?
1. The sensor
2. The actuator
2. What is a device, according to the IEC 61131 standard definition?
1. A device is defined as an independent physical entity able to
implement one or more functionalities
2. A device is a logical breakdown of a software or hardware structure
3. What is an event, according to the IEC 61131 standard definition?
1. An event is a representation of a fact in a suitable format for the
communication and elaboration by the resource
2. An event is an occurrence of a specific condition, such as the reaching
of a definite temperature
3
Industrial Data Flow and
Devices
In this chapter, we'll take a close look at the equipment, devices, protocols, and software
layers that manage industrial data along its path in the factory. We'll start from the sensors
and track the path of data until we reach the edge, which is the external boundary of the I-
IoT data flow inside the factory. We'll look at how industrial signals are generated and how
they are managed, exchanged, and enriched by passing through the different levels of the
CIM pyramid. We'll also consider what the main OT devices that are involved in each level
are. The reader will learn about analog to digital conversion and vice versa, sensors and
actuators, remote terminal units (RTU), embedded controllers, programmable logic
controllers (PLCs), distributed control system (DCS), Supervisory Control and Data
Acquisition (SCADA), Historian, industrial protocols, and fieldbus. We would require a
specific chapter for each system in order to explain them exhaustively, so instead we will
focus on analyzing them in this chapter from the point of view of the I-IoT data flow. We'll
briefly highlight their main mechanisms, but we'll explain in detail how they are important
and relevant from the perspective of data acquisition and collection. Acquiring industrial
data from sensors and controllers is one of the most critical processes in I-IoT and comes
with its own specific challenges and constraints to overcome.
Technical requirements
You will need to have read the previous chapters of this book to understand the concepts
presented in this chapter.
We will emphasize here that this logical schema is not a complete picture of all of the
equipment, devices, networks, and protocols that we have in an industrial plant—nor is it
the only possible representation of their interactions and interconnections. As we
mentioned in previous chapters, there is no such precise and clear-cut separation of
functionalities and devices between CIM levels in the real industrial world. Some devices
placed in one particular level can be deployed to an upper or lower level with an enlarged
or restricted scope. This is even more true nowadays, with the progress of this technology
providing additional capabilities and more computational resources to devices and
allowing them to enlarge their initial scope of functionality. Let's take a look at a few
examples to understand this better:
Sensors and actuators: Because these exchange very little information, they are
traditionally placed in level 1. Over the past few years, however, smart sensors
have been introduced with processing and computing capabilities to exchange
and provide much more information.
PLCs: These are generally placed in level 2. However, we can also have powerful
PLCs in level 3, acting as control coordinators and data collectors. We can even
have micro PLCs between levels 1 and 2 to control a working cell or to drive a
robot or a complex piece of equipment such as a rotating machine.
SCADA systems: These are located in level 3. We can also have a local Human
Machine Interface (HMI) in level 2 to supervise working cells that are
cooperating to carry out a specific task locally. SCADA systems have been
evolving over the last few years, covering tasks such as tracking, maintenance,
and production planning. These are traditionally covered by the MES system in
level 4.
Historians: These are placed in level 3, but we often have a time window
snapshot that pushes the time series to a larger replica database that is located in
level 4.
Cyber security is not covered in this schema, except for a generic firewall that is placed
between the factory and the edge that is exposed to the internet. We will learn how to
secure the I-IoT data flow in Chapter 5, Applying Cybersecurity.
The preceding diagram focuses on the I-IoT data flow. It shows the devices that act as data
sources and highlights the network channels and protocols that allow the data and the
information to pass along the devices and the layers of the CIM pyramid until they reach
the edge and are transferred to the cloud. Let's start by looking at where the digital signals
are generated.
Set point (SP): This is the reference and the desired value for a specific process
variable
Controlled value (CV): This is the calculated value for the control variable to fit
the reference value
Process value (PV): This is the measurement of the process value that comes
from the process:
Let's think about a very simple example. When you turn on your oven, the SP is the
temperature that you want to set. The CV is the value of the electric current that is feeding
the heating elements of the oven, and the PV is the real temperature of the oven that is
measured by the sensor.
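The SP/CV/PV loop of the oven example can be sketched with a proportional controller driving a made-up first-order thermal model. All gains and constants here are illustrative assumptions, not values from the book:

```python
# Toy control loop: SP is the desired temperature, CV is the heating power
# computed by a proportional controller, PV is the measured temperature.

def simulate_oven(sp, steps=200, kp=0.5, ambient=20.0):
    pv = ambient                       # process value: measured temperature
    for _ in range(steps):
        error = sp - pv
        cv = max(0.0, kp * error)      # controlled value: heating power
        # crude thermal model: heating effect minus loss toward ambient
        pv += 0.1 * cv - 0.02 * (pv - ambient)
    return pv

final_pv = simulate_oven(sp=180.0)
```

Note that the loop settles well below the 180-degree SP: a purely proportional controller leaves a steady-state offset, which is why practical controllers add integral action (the I in PID).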
Sensors
As we outlined in the previous chapters, the measuring device is actually made up of two
main components:
The sensor, which transforms the variable to be measured into the type of
variable that is required for measurement
The transducer, which accepts information in the form of a physical or chemical
variable and converts it into a variable of a different nature (typically electric)
Very often, the sensor and transducer coincide in the same physical component, and this is
why the words sensor and transducer are often used interchangeably to indicate a device
that measures a quantity and provides a signal, typically electrical, as an output.
A sensor is based on the identification of a physical law that binds two variables of a
different nature, such as temperature and resistance or speed and potential voltage.
Therefore, there are many different types of sensors. They can be characterized by
the following properties:
Transduction: This describes the relationship between the input variable (the
variable to be measured) and the output quantity. This relationship can be of
direct proportionality, at least in the range specified for the input quantity. It is
therefore either characterized by a proportionality constant, or it is non-linear.
Bandwidth: This information is of fundamental importance in the design of the
controller, as it gives an indication of the maximum frequency at which the
transducing characteristics of a linear sensor do not degrade. If this is much
higher than the maximum frequency that is needed for a project, which is usually
the frequency needed by the closed loop to be able to follow the dynamics of the
process, the sensor is simply characterized by its own constant proportionality
between the input and the output. Therefore, there is no need to consider a
dynamic model.
Dynamic features: These features are usually defined against the step response
of the sensor. They include delay, rise time, time to the half value, overshoot, and
settling time.
Input range: This refers to the range of variation of the input quantity for which
the sensor guarantees certain properties, such as linearity, accuracy, or precision.
Output range: This is the range of values that the output can assume.
Sensitivity: This is the ratio between the output and the input variation at a
steady state.
Resolution: This is the minimum variation of the input that produces a variation
in the output and is therefore captured by the sensor.
Linearity: This is usually defined as a range of input values for which the sensor
has linear behavior with proportionality between the input and the output.
Deterministic error (offset): This is the systematic error that is produced at every
measure. If known, this can usually be eliminated through sensor calibration.
Probabilistic error: This is a random error, which may depend on multiple
factors.
Accuracy: This term defines the maximum error that the sensor can make in a
measurement operation. It is not to be confused with the resolution. Accuracy
can be thought of as the sum of the deterministic and probabilistic errors.
Precision: This is defined as the variance between several measures. In the
following diagram:
Archer a has a high distance from the target (offset) and poor
precision. Therefore, they have poor accuracy overall.
Archer b has a high offset but good precision. Therefore, they have
poor accuracy overall.
Archer c has a low offset but poor precision. Therefore, they have
poor accuracy overall.
Archer d has a low offset and good precision. This means that they
are accurate:
Hysteresis: This means that the output signal for a specific input can take several
values, depending on whether the input variation is negative or positive.
Drift: This can be defined as the variation of the output over long time periods
when a constant signal is applied to the input.
Electrical output impedance: This is useful to deal with problems related to
interfacing. It's better to have a low impedance value since this helps to couple
the sensors to the device.
Loading effect: This refers to the fact that the introduction of a sensor to a piece
of equipment to measure a variable can alter the operation of the equipment
itself.
Noise: A sensor can generate noise in the output, which can corrupt the detected
information.
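Several of these definitions can be illustrated numerically. The sketch below simulates a sensor whose error is the sum of a deterministic offset and a bounded random component, and shows that calibration removes only the deterministic part; the true value, offset, and noise bound are arbitrary illustrative numbers:

```python
# Deterministic error (offset) vs probabilistic error (noise) in a simulated
# sensor, and offset removal through calibration.
import random

random.seed(42)                # fixed seed so the simulation is repeatable
TRUE_VALUE = 100.0
OFFSET = 2.0                   # deterministic error: identical at every measure
NOISE = 0.5                    # bound of the probabilistic (random) error

readings = [TRUE_VALUE + OFFSET + random.uniform(-NOISE, NOISE)
            for _ in range(1000)]

# Averaging many readings estimates the offset, which calibration removes
mean_error = sum(readings) / len(readings) - TRUE_VALUE
calibrated = [r - mean_error for r in readings]
```

After calibration, the mean of the readings matches the true value, but each individual reading still carries the random component: calibration improves the offset, not the precision.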
The converters
By conversion, we are referring to the transformation of some parameters of an electrical
signal, while keeping the quantity of information from the signal the same. Converters are
essential in control systems because they are present in any data acquisition and processing
chain. Remember that an analogical signal can assume all of the values within a range,
while a digital signal can assume only a finite number of values.
Digital to analogical
A digital to analog converter (DAC) acts as an interface between the numerical values that
are generated by a calculation system in discrete time instants and the continuous time
signals of the analogical world.
A DAC takes digital words as input and generates voltage or current values as output. It
therefore carries out a transduction operation. If the converter memorizes the input word
through an internal register, it must also directly carry out the so-called zero-order
hold (ZOH) operation. If not, the register must be preempted in the DAC when it interfaces
with a calculator, even if it is not actually included.
The digital input word and the reference voltage control the output of a DAC according to
the following relation:

Vout = K · Vfs · (a1·2^-1 + a2·2^-2 + ... + an·2^-n) + Voffset

Here, Vout is the voltage in the output, K is the gain, Vfs is the full-scale value of the output, a1,
a2, ..., an are the n bits of the binary word in the input, and Voffset is the offset voltage.
The output can be made available as a potential voltage or a current value, depending on
the converter. The input word is generally handled as a binary fraction that has the decimal
point on the left. a1 is the most significant bit (MSB) and an is the least significant
bit (LSB).
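Treating the input word as a binary fraction, the DAC relation can be checked numerically. The full-scale voltage and the input word below are chosen for illustration:

```python
# Numeric sketch of the DAC relation: the input word is read as a binary
# fraction (MSB first), so Vout = K * Vfs * sum(a_i * 2**-i) + Voffset.

def dac_output(bits, vfs=5.0, k=1.0, voffset=0.0):
    """bits: the input word as a string of '0'/'1', MSB (a1) first."""
    fraction = sum(int(b) * 2 ** -(i + 1) for i, b in enumerate(bits))
    return k * vfs * fraction + voffset

v = dac_output("10001001", vfs=5.0)   # the word used in the text: 137/256 of Vfs
```

For the 8-bit word 10001001 this yields 5 V times 137/256, about 2.676 V.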
If K = 1 and Voffset = 0, the potential voltage in the output can also be represented as follows:

Vout = Vfs · N / 2^n

Here, N is the decimal value of the word in the input to the DAC and n is its number of bits.
For example, the word 10001001 would be handled as the binary fraction 0.10001001. Its
decimal value, therefore, is 137/256 ≈ 0.535. Alternatively, you can consider the decimal
representation of the input word: 137 is the decimal value that corresponds to the binary
value 10001001, while 256 = 2^8 is the number of codes available with eight bits.
The reference voltage controls the full-scale value Vfs of the converter (or Ifs, if the output is
in the current). Typical values of the potential voltage are 2.5, 5.0, 10.00, and 10.24 volts,
which coincide with the required reference voltages. For outputs in the current, a typical
value for the full scale is 20 mA.
The constant K represents the gain of the converter and is typically equal to 1. The offset
voltage represents the output value when the digital input is zero and should ideally be zero.
The resolution of a DAC is defined as the smallest change in the output that is possible. It is
determined by the number of bits of the input word and by the full-scale value of the
voltage according to the following relation:

Resolution = Vfs / 2^n

Here, n indicates the number of bits of the converter. For example, a 12-bit converter
with a full-scale voltage of five volts has a resolution of 5/4096 ≈ 1.22 mV. This applies to
monopolar DACs, where the output is always positive. There are also bipolar DACs, in which the
input word is coded to represent negative values. In general, the relationship may not be
linear because the input-output couples are not on a straight line, as shown in the following
diagram:
For this reason, some errors are defined against the ideal relationship that is represented by
a straight line. These include the following:
Error of linearity: This is defined as the maximum deviation of the output value
from that of an ideal converter. It is usually expressed as a fraction of the LSB or
as a percentage of the full scale.
Error of differential linearity: When the input word changes by one bit, the
output should vary by one LSB. If this does not occur, the differential linearity
error is defined as the amplitude of the maximum difference between the actual
increase of the output when the input varies of one bit and the ideal increase.
Error of integral linearity: For a given input word, the error of integral linearity
is defined as the sum of differential linearity errors up to the given word.
Monotonic: If the input word increases by one bit, the output must always
increase, even if by an incorrect value. This property is very important in a
control system. If it is non-monotonic, this represents a variation of the desired
phase, which can lead to unstable behavior.
Settling time: This is the time needed for the output value to follow a change of
the input word within a limited percentage error.
Glitches: Glitches occur in the output due to non-simultaneous variations of the
bits of the input word. The DACs on the market have specific circuits to
minimize the occurrences of this phenomena.
The performance of a DAC changes over time, with variations in the temperature and the
supply voltage.
Analog to digital
An analog to digital converter (ADC) acts as an interface between the continuous values of
the analogue world and the digital numerical values used in a calculation system. An ADC
accepts an unknown analogical signal (typically, a voltage) and converts it into a digital
word with n bits, representing the ratio between the input voltage and the full scale of the
converter.
Generally, ADC converters include, or must be preceded by, a sampling and holding circuit
to avoid changes in the input voltage during the conversion operation. The input to output
relationship for an ideal three-bit monopolar converter is shown in the following diagram:
The output assumes the encoded values between 000 and 111 as the input varies
between zero and Vfs. Each code step has the amplitude of one least significant bit,
that is, Vfs/2^3 = Vfs/8.
The characteristics of an ADC can be expressed in the same way as those of the DAC. If an
analogical voltage is applied to the input, the output of the ADC is an equivalent binary
word:

Vin / Vfs ≈ a1·2^-1 + a2·2^-2 + ... + an·2^-n
The approximation (≈) is necessary since the input voltage is reproduced from the binary
word only up to the converter's resolution. An alternative formula to calculate the binary
word in the output is given by the following equation:

N ≈ (Vin / Vfs) · 2^n
The binary code remains the same for any change in the input voltage smaller than 1 LSB. For
an ideal converter, this means that as the input voltage increases, the output code
first underestimates the input voltage and then overestimates it. This error, called
the quantization error, is represented in the following diagram:
Quantization error
The transitions of the encoded values in an ideal converter should be on a straight line. This
does not always happen, and we can define differential and integral linearity errors just like
we did for the DAC. Each increment of the encoded values should correspond to a
variation that's equivalent to the LSB of the voltage in input. The differential linearity error
is the difference between the real increase of the input that causes an increase of one bit and
the ideal increase corresponding to the LSB. The integral linearity error is the sum of the
differential linearity errors.
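An ideal ADC and its quantization error can be sketched in a few lines. The three-bit resolution matches the example converter above, while the 2.3 V input is an arbitrary choice:

```python
# Ideal ADC sketch: map an input voltage to an n-bit code, then reconstruct
# the voltage from the code; the difference is the quantization error,
# bounded by one LSB.

def adc_code(vin, vfs=5.0, n_bits=3):
    lsb = vfs / 2 ** n_bits
    return min(int(vin / lsb), 2 ** n_bits - 1)   # clamp at full scale

def reconstruct(code, vfs=5.0, n_bits=3):
    return code * vfs / 2 ** n_bits

vin = 2.3
code = adc_code(vin)            # with a 0.625 V LSB, 2.3 V falls in code 3
error = vin - reconstruct(code)
```

The reconstructed voltage never differs from the input by more than one LSB, which is the quantization error shown in the diagram above.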
To summarize, we can classify signals as follows:
Analogic signals: These can assume all of the values within a range, so their
values change continuously with time
Quantized signals: These can assume a limited number of values, according to
the resolution
Sampled signals: These are analogical signals that are evaluated in specific time
instances and separated by the sampling time
Digital signals: These are quantized and sampled signals that are coded as
binary numbers
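The four signal types above can be demonstrated by sampling and quantizing a continuous function. The signal, sampling time, and bit depth below are illustrative choices:

```python
# From analogic to digital: a continuous function is sampled at a fixed
# sampling time, quantized to a finite resolution, and encoded as binary words.
import math

def analogic(t):
    return 1.0 + math.sin(2 * math.pi * t)     # continuous-time signal, 0 to 2

SAMPLING_TIME = 0.05                           # seconds between samples
N_BITS = 4
FULL_SCALE = 2.0
LSB = FULL_SCALE / 2 ** N_BITS

sampled = [analogic(k * SAMPLING_TIME) for k in range(20)]     # sampled signal
codes = [min(int(v / LSB), 2 ** N_BITS - 1) for v in sampled]
quantized = [c * LSB for c in codes]                           # quantized signal
digital = [format(c, "04b") for c in codes]                    # digital signal
```

Each digital word is a 4-bit binary number, and each quantized value stays within one LSB of the sampled value it encodes.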
Actuators
In typical process control applications, the acquisition of the process variables to be
controlled and the processing of the control laws is carried out through analogical or
numerical controllers that work on a low power signal level. The variable that is processed
by the controller and sent to the process to carry out the necessary corrective actions is
handled in the same low power representation. The process, on the other hand, may require
high levels of power to be controlled. Examples of processes that require high levels of
energy include moving thousands of cubic meters of a liquid or applying forces of hundreds
of thousands of newtons, such as in a metallurgical rolling process.
The function of the actuator is to convert low energy control signals into actions of a power
level that is suitable to the process to be controlled. This function can be considered as an
amplification of the actuation signal generated by the controller. In many cases, the signal is
also converted and then transduced into a completely different physical dimension.
A final control element carries out the necessary steps to convert the control signal
generated by the controller into actions performed on the process. To change the flow rate
of a fluid from 10 to 50 cubic meters per second, for example, a typical control signal of
between 4 and 20 mA requires some intermediate operations. The necessary operations
may depend on the process and the controller, but they can be summarized in the following
diagram:
The first operation is the Signal Conversion. This represents the modifications
that must be implemented on the control signal so that it is compatible with the
actuator. If, for example, the controller is numeric, and the final control element
is a valve, the actuator may be a direct current motor. In this case, the signal
conversion will be performed by a DAC and a power amplifier. The output of the
conversion produces a transduced and amplified signal, which establishes
the control variable of the process. This effect on the control variable is usually
carried out by a physical device that is related to the process, such as a valve or a
conveyor belt.
The Actuator can be defined as a device that converts the control signal into an
action on the control component, not on the process.
Finally, we arrive at the Final Control Component. This device has a direct
influence on the dynamic variables of the process and is an integral part of the
process. If a flow rate must be controlled, for example, the last control element
might be a valve that is part of the fluid transport system. If a temperature must
be controlled, the last control device will have a direct influence on the
temperature.
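As a numeric aside, the signal conversion from the 4-20 mA control range onto the 10-50 cubic meters per second flow range mentioned above is a plain linear scaling, which a conversion stage might perform as follows (the clamping behavior is an illustrative assumption):

```python
# Linear scaling of a 4-20 mA control signal onto a 10-50 m^3/s flow range,
# with the current clamped to the live-zero range before interpolation.

def current_to_flow(ma, i_min=4.0, i_max=20.0, f_min=10.0, f_max=50.0):
    ma = max(i_min, min(i_max, ma))            # clamp to the 4-20 mA range
    return f_min + (ma - i_min) * (f_max - f_min) / (i_max - i_min)

flow = current_to_flow(12.0)   # a mid-range current maps to a mid-range flow
```

The live zero at 4 mA is a deliberate feature of the standard: a reading of 0 mA then unambiguously signals a broken loop rather than a zero flow command.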
Controllers
We can define controllers as electronic devices that receive digital signals and process them
according to a user-defined program for generating other digital signals in the output in a
predefined time interval. In this section, we will explain the main types of controllers used
in industrial plants. These are microcontrollers, PLCs, and DCSs.
Microcontrollers
A microcontroller is a programmable electronic device that is based on an integrated chip.
It includes many of the components needed to make a control system. It can perform
different functions autonomously, according to the program that's been implemented. From
a computational point of view, a microcontroller is very similar to a microprocessor in that
it has a Central Processing Unit (CPU) and executes control program instructions in data
memory. It differs from a microprocessor, however, in its integrated functionalities. The
main differences between microprocessors and microcontrollers are highlighted in the
following table:

Microcontroller: The management program resides in a special non-volatile internal
memory area (ROM, EPROM, or FLASH).

Microprocessor: The program resides in an external memory area that is modified
according to the CPU commands.
The microcontroller follows the von Neumann architecture, in which a single memory
holds both the program and the data. Compared to a control system architecture built with
microprocessors and peripherals, an equivalent architecture based on microcontrollers
reduces a lot of the complexity of the circuit diagrams since many of the services such as
memory and I/O are included in the basic functionalities of the microcontroller. In a
microcontroller, all of the main units communicate with each other through a BUS made up
of 4, 8, 16, or 32 bits. This indirectly provides a measure of the capacity of the
microcontroller to process this information. A microcontroller, like a microprocessor,
recognizes only one type of language, called machine language. This has control
instructions that are written using the hexadecimal system, which is not very human-
readable. For this reason, a more intuitive language called assembly is used. The assembler
does not produce an executable file, just machine code.
Alarm systems
Automotive systems
Electrical appliances (such as washing machines, ovens, and dishwashers)
Distributor machines
Control systems (such as temperature, pressure, and liquid level)
Depending on the type and the number of integrated peripherals, the microcontrollers are
divided into the following groups:
Embedded microcontrollers
Microcontrollers with external memory
Digital Signal Processing (DSP)
Embedded microcontrollers
Embedded microcontrollers have the highest degree of integration: all of the memory
and peripherals they need are integrated on the chip.
DSPs
DSPs are quite a recent development. Their power and computing capacity make them
very similar to microprocessors in the following ways:
Generally speaking, they implement a small part of more complex systems. They
act as interfaces for the output of these systems.
They are used in digital controls.
PLCs
PLCs originated in 1968, when General Motors specified the desired features for a
new generation of controllers to be used in its production facilities. Currently, a PLC is
based on a multiprocessor system, with network integrated abilities that are able to perform
very complex functions. It is based on almost the same technologies as a usual computer,
but adapted to the control of industrial processes. A PLC consists of several main
components—a processor, I/O, a power supply, a remote I/O, a fieldbus, PID, a servo, and
an encoder. It also contains other components that are available for specific uses as
independent modules. The cabinet or rack contains and encloses all of the modules,
ensuring the first deep integration from the pure mechanic world and the
electronic world. There are some PLCs that are not modular but instead contain all of the
fundamental components in a single device. The standardization of PLC programming
languages from the IEC 61131-3 standard has greatly contributed to making PLCs more
open and less tied to a specific vendor. The IEC 61131 standard provides the following five
PLC programming languages:
The first three are graphical programming languages, while the last two are textual
languages. The most common PLC vendors in the market are Siemens AG, Rockwell
Automation, Mitsubishi Electric Corporation, Schneider Electric, and Omron Corporation.
In the following sections, we will look at some of the main modules of a PLC.
Processor module
The processor module is at the heart of the PLC system. It contains a board with one or
more microprocessors that execute the operating system, the programs
developed by the user, the memory where these programs are stored, and all of the
components necessary for its operation. The operating system of the processor module
implements a cycle that is composed of the following sequence of
operations:
1. Updates the input memory area with the values coming from the physical inputs
2. Executes the user program by operating on the values of the memory and
always keeping the results in the memory
3. Executes the system management programs, for example, for diagnostics
4. Writes to the physical outputs the values that are stored in the reserved memory
area
The preceding cycle optimizes the communication of the input and output modules and
guarantees that the stored values of the inputs remain unchanged during the execution of
the programs. The reading of the inputs and the writing of the outputs is managed entirely
by the operating system and not by the control program. The most common real-time
operating systems are Microware OS-9, VxWorks, QNX, and Linux.
An important parameter of the processor module is the scan time. This is defined as the
time interval that elapses between two consecutive activations of the same portion of the
program application, thus comprising the time necessary to update the inputs and outputs.
The scan time must not be confused with the response time of the system, which is defined
as the maximum time interval between the detection of a certain event and the execution of
the related programmed action. The response time also takes into account the delays
introduced by the input and output modules.
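The four-step cycle described above can be sketched as a simulation in Python. This is a minimal illustration, not a real PLC runtime: the function names and the dictionary-based I/O image tables are assumptions for the example, and an actual PLC executes this loop in firmware with strict timing guarantees.

```python
def plc_scan_cycle(read_inputs, user_program, diagnostics, write_outputs):
    """One scan of a simplified PLC cycle. The I/O images are snapshots,
    so the input values stay frozen while the user program runs."""
    input_image = read_inputs()               # 1. update the input memory area
    output_image = user_program(input_image)  # 2. run the user logic on the image
    diagnostics()                             # 3. system management programs
    write_outputs(output_image)               # 4. write outputs from memory
    return output_image

# Illustrative usage: a trivial thermostat program.
sensor_values = {"temperature": 25.0}
def thermostat(image):
    return {"heater_on": image["temperature"] < 20.0}

written = []
plc_scan_cycle(lambda: sensor_values, thermostat, lambda: None, written.append)
```

Because the program only ever sees the snapshot taken at step 1, the stored input values are guaranteed not to change during the execution of the user program, exactly as described above.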
Network module
It is much more convenient to connect all of the sensors to the PLC through a common
network or bus rather than to connect each sensor through dedicated wiring. Industrial
networks are specialized networks that ensure determinism in the message delivery
process. The fieldbus modules manage the communication protocols for the various types
of industrial networks, which include fieldbus, proprietary networks, and Ethernet.
Other modules
There are also additional modules for specific purposes. These include the following:
DCS
In the field of industrial automation, DCS represents the most widely adopted solution for
large continuous plants, including refineries, energy production plants, paper mills,
glassworks, and chemical plants. The characteristics of the DCS include the possibility of
distributing modules for data acquisition, processing and control, and a communication
network that is as efficient as possible between the various subsystems. Another important
feature of DCS is its ability to change the network topology and add and remove modules
on the fly with the system running. A DCS can also perform, in an integrated way,
functions that are normally implemented both on PLCs and on SCADA systems.
The typical architecture of a DCS is made up of several levels. Starting from the bottom, we
can find the interfaces to the field, consisting of the appropriate electronic acquisition
boards (inputs) and commands (outputs). At this level, there are also the communication
interfaces for the most common fieldbuses, through which the information is exchanged
with the transmitters and actuators that support the same type of protocol.
Through the I/O bus, the values of the inputs and outputs are conveyed to the controllers
and processed according to the control strategies and the logic designed for the specific
application. The software is developed through the implementation of function blocks,
which therefore constitute the so-called system database. Each of these is labelled with a
uniquely defined name, commonly known as a tag. It is therefore necessary to associate
each database record to the relative tag and its fields to the different variables used in the
function block, normally called items. A PID controller, for example, normally represents a
database tag. Its items are the process variable, the setpoint (SP), the control variable, the
alarm thresholds, the proportional, integral, and derivative parameters, and so on.
Supervisors are above the controllers. Nowadays, they are usually implemented on
commercial off-the-shelf (COTS) operating systems. They act as an interface for the
control room operator by accessing the tags and the database items, and consist of
numerous screens through which a graphic representation of the system and the process is
developed. This is one of the main differences between a DCS and a PLC-SCADA system.
PLC and SCADA are two separate systems, each making use of their own variables and
data structures, and exchanging data through a suitable communication driver, sometimes
making use of a proprietary protocol. In the DCS, however, the database is unique, shared,
and distributed between controllers and supervisors, each of which takes control of its
processing tasks.
In general, a PLC controls one or more machining processes, while the DCS controls the
entire plant. Typically, a DCS is used when:
In any case, the differences between the PLC and DCS have been reducing over the years.
The most common DCS vendors on the market are ABB, Yokogawa, Honeywell, Emerson,
and Siemens.
Industrial protocols
The reference model for computer networks is the Open Systems Interconnection (OSI)
model, which was developed by the International Organization for Standardization
(ISO). It is a conceptual model to which computer network makers can refer, rather
than a precise description of a real network. In the ISO/OSI model, every node of the
network can be sketched as a hierarchical structure of seven levels, as shown in the
following diagram:
The first level, starting from the bottom, is level 1, or the physical level. This
includes the mechanical and electrical connections between the nodes, together
with the software drivers for the communication ports. The features of this level
establish some parameters of the network, such as the speed and the
transmission mechanisms.
The second level is the data link. Like all of the following levels, it is
implemented in the network software of each node. This level is responsible for
the exactness of the flow of the structured sequences of bits that are exchanged
between the nodes. These are also called frames. These frames are composed of
the information to be transmitted, plus a control code that is added at the
beginning and checked upon arrival. If an error occurs, an attempt to restore the
right sequence is carried out using the control code that was added. If the
attempt fails, the re-transmission of the corrupted sequence of bits is requested.
The primary responsibility of this level is to ensure an error-free connection
between the nodes.
The third level, the network level, guarantees a logical path between two nodes
that are not directly connected. This level defines the characteristics of the
network from a management perspective, such as the node addresses, the
regulation of the accesses, and the treatment of the collisions. In summary, this
level implements the routing and the interconnections between the nodes.
Level 4, the transport level, independently implements the transport functions of
the network structures of the underlying lower levels, ensuring the transfer and
the integrity of the messages and their re-transmission, if necessary. It acts as an
interface between the network and the application software of the next three
levels.
From level 5, the session level, we enter the user area. This level manages the
ordinary data exchange and the synchronism between the nodes, making the
remote connection possible. This level therefore includes the functionalities
necessary to support the exchange of information between different machines.
Level 6, the presentation level, is the level where the information is encoded and
decoded. The sequences of binary data are related to their meaning and
represented in the form of texts, figures, and so on.
Level 7, the application level, provides interfaces and services to application
programs, such as file transfers between nodes, distributed databases, web
applications, and so on.
The real physical connection between the nodes only exists at the first level. For
all other levels, there is a virtual connection.
Each level is responsible for a defined set of functions such as coding,
fragmentation, or routing. These are implemented by specific entities or
functional blocks.
Each level can interact with the level below and above on the same node. Each
level can also interact with the same level on other nodes.
In passing the sequences of bits (data packets) from one level to a lower one, a
piece of information is added to mark and identify it. This is known as a Protocol
Data Unit (PDU).
Each level adds its own PDU to the information received from an above level and
removes it if the information comes from a level below. This is known as an
encapsulation technique.
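The encapsulation technique can be illustrated with a toy sketch: each layer prepends its own PDU header on the way down, and the receiving side strips them in reverse order. The header contents here are invented for illustration; only the mechanism is real.

```python
def encapsulate(payload, headers):
    """Wrap the payload with one PDU header per layer; the last header
    in the list (the lowest layer) ends up outermost."""
    data = payload
    for header in headers:
        data = header + data
    return data

def decapsulate(frame, headers):
    """Strip the PDU headers outermost-first, checking each one."""
    for header in reversed(headers):
        assert frame.startswith(header), "corrupted frame"
        frame = frame[len(header):]
    return frame

# Illustrative transport, network, and data link headers.
layer_headers = [b"T|", b"N|", b"L|"]
frame = encapsulate(b"hello", layer_headers)   # b"L|N|T|hello"
```

Note that only the lowest level travels on a physical medium; every higher layer's "connection" is just its own header being reproduced and interpreted at the peer node.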
The main features that differentiate these computer networks are their topology, the
transmission media, and the mechanism or protocol for accessing the network. The
topology defines the physical schema of the network, or how the different nodes are
connected to each other. The following are some examples of topologies:
An example of a topology is the bus or open loop. This topology does not have
any problem adding or deleting nodes, since the information moves
independently, but there is only one possible route between the nodes.
Another example is the ring or closed loop, in which the messages cross the
nodes. In this case, each node must be able to identify whether it is the receiver
of the message and, if not, forward it along the ring.
A further example is the star topology, where a primary node is connected
directly to all other nodes. All messages have to go through the primary node.
Transmission media defines the physical support through which the information flows. Its
characteristics obviously have a direct influence on network performance. The simplest
transmission medium is the twisted pair, which can either be shielded or not. Other
transmission media include the following:
Coaxial cable: This is more difficult to install, but it does support multiple
modulated transmission channels at different frequencies due to its bandwidth.
Optical fibre: This is the best transmission medium in terms of its transmission
rate, immunity to noise, and the possibility of having multiple channels inside
the same physical medium. However, it is more expensive and its installation is
more labor-intensive.
Powerline: This uses low-voltage power lines at between 230 and 400 V to transmit a
frequency-modulated signal. It can reach speeds of a few hundred Kbps, but
there is often noise on the network.
Wireless networks: These use infrared frequencies and radio frequencies in the
industrial bands. They do not require wiring, but they suffer from interference
problems and noise.
The access method defines how the nodes can access the network and avoid transmission
conflicts. Some access methods are explained as follows:
Automation networks
Looking at the CIM model, we can understand how important the information
exchanged between devices on the same level or on different levels is. In the lower levels, we have a
large amount of simple information that is exchanged within small and determined time
intervals. At higher levels, we need to transfer less information that is more complex and
within longer time intervals that are not necessarily determined. In an integrated
production system, different types of IT networks are needed, each tailored to the task to
which it is dedicated. We can identify three categories of networks: the information
network, the control network, and the field network, as follows:
Control network: This must ensure the communication between the devices
dedicated to the control and supervision of the plants. The information that's sent
is not very complex, but the transmission has to be within determined time
intervals and at higher frequencies. The control networks are usually developed
by the PLC makers, and their implementation is proprietary. This means that the
other devices in these networks can talk with the PLCs through specific network
cards and programs for interfacing. For these networks, the first two levels of the
OSI model are typically based on a token to ensure the determinism in the
transmission. The software for network management is usually integrated into
the system provided by the PLC maker, which allows the user to configure the
network by configuring the physical connections between the nodes by assigning
a logical address to each of them. Over the last few years, the market has been
driving a standardization process for the control networks through formal or de
facto standards and also using Ethernet for the first two levels of the OSI model.
Field network: The networks dedicated to the field, or fieldbuses, as they are
commonly referred to, have been introduced to interface the smart sensors and
actuators, which have basic computation capabilities, with the control devices.
We will discuss fieldbuses further in the next section.
The fieldbus
Field or fieldbus networks are computer network implementations for connecting control
devices, such as PLCs, with devices on the shop floor or in the field, such as sensors and actuators.
In a typical scenario, sensors and actuators are connected to the controller either directly,
through the input or output modules of its cabinet, or through a serial line from the remote
input and output cabinets. Using a field network, however, such devices become the nodes
of a computer network and so they must have some processing capacity to be able to
communicate through the network.
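To make the idea of a fieldbus protocol concrete, the following sketch builds a Modbus TCP request for reading holding registers (function code 0x03). The frame layout, an MBAP header followed by a PDU, follows the public Modbus specification; the transaction ID, unit ID, register address, and count used here are arbitrary example values.

```python
import struct

def modbus_read_holding_registers(transaction_id, unit_id, start_address, count):
    """Build a Modbus TCP ADU: MBAP header + PDU for function code 0x03."""
    # PDU: function code (1 byte), starting address and register count (2 bytes each).
    pdu = struct.pack(">BHH", 0x03, start_address, count)
    # MBAP header: transaction id, protocol id (0 = Modbus), remaining length, unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0x0000, len(pdu) + 1, unit_id)
    return mbap + pdu

# Request 3 registers starting at address 0x006B from unit 0x11.
frame = modbus_read_holding_registers(1, 0x11, 0x006B, 3)
```

Sending this frame over a TCP connection to port 502 of a Modbus device would prompt it to reply with the requested register values; the point here is only to show how little structure a field-level protocol needs.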
These networks differ from the preceding networks because of the following points:
These characteristics lead to the creation of lean networks, where just levels 1, 2, and 7 of
the OSI model are defined, leaving the intermediate levels empty. This reduced model of
the OSI model is called an Enhanced Performance Architecture (EPA) model, and is shown
in the following diagram:
They have a simplified architecture, since they are scalable and flexible.
They have less wiring, with a reduction in installation and maintenance costs.
They can handle more complex and bidirectional information, since the fieldbus
devices can perform signal processing locally. This allows the fieldbus devices to
do the following:
Linearize or convert the signals into engineering units
Send information about their state to the controller
Carry out closed regulation loops
They have a local processing capability. This makes the response time less critical
since many of the necessary operations, including the closed-loop regulations,
are carried out by the device itself autonomously.
They can carry out calibrations of sensors and actuators through software from a
terminal connected to the network.
They can provide robustness in their transmissions, since digital transmissions
are intrinsically less sensitive to noise than analog transmissions. They can
also implement techniques for the recognition and correction of transmission
errors.
Even if all of the devices that require networks such as these are located at the lower level
of the CIM hierarchy (on the field level), they need different functionalities. For this reason,
there are three different classes of field networks—the sensor bus operating at the bit level,
the device bus operating at the byte level, and the control bus operating at the level of
blocks of bytes. This is depicted in the following diagram:
Sensor bus: The sensor bus usually implements just the first two levels of the OSI
model, that is, the physical and data link levels. Its primary purpose is to reduce the
wiring; the typical length of a message is one byte. Usually, the devices that are
already available are connected to the network through a multiplexer. The most
common sensor buses are CAN, Seriplex, ASI, and LonWorks. An example of a
connected device could be a proximity sensor without internal diagnostics.
Device bus: This allows the transfer of messages that are up to 16-32 bytes in
size. In addition to the first two OSI levels, it implements some features
belonging to level 7, the application level. Device buses also allow us to send a
simple diagnostic. The most common device buses are CAN, DeviceNet, Profibus
DP, LonWorks, SDS, and Interbus-S. An example of a connected device could be
a photocell or a temperature sensor with internal diagnostic functions.
Control bus: This allows for the communication of blocks of bytes, which can be
up to several thousands of bytes. It implements levels 1, 2, and 7 of the OSI
model, as well as an additional level called the user level. The devices developed
for connecting to a control bus include predefined algorithms that are configured
through the network so that they can adapt them to a specific application. They
also include a real-time internal cache, which checks and updates all data
continuously, making it available to all other devices on the network. An
example of a connected device could be a valve with an embedded flow
regulator and self-diagnosis and configuration capabilities. The most popular
control buses are WorldFIP, Profibus, Modbus, and DeviceNet.
Polling: The signal is sampled on a regular basis. The polling time is the time
interval between the sampled values of the same signal. Even so, a sample of
an analog signal is only gathered by SCADA if its value changes by more
than a predefined deadband compared to the value of the previous sample
collected.
Unsolicited: The data is gathered when its value changes. The data collection
interval is the shortest time interval in which a variation of the value of the signal
is caught by the data acquisition system.
The time taken for the communication channel and the related protocol to talk to
the device from which the data is being gathered
The performance of the driver in terms of how long it takes to refresh the data of
the device from which the data is gathered
The amount of data to be collected from the same driver and data source
Typical data acquisition intervals for SCADA applications are in a range of between half a
second and one minute. It is not very common for SCADA to carry out data acquisition
sampling in under half a second unless you have dedicated hardware specifically for this
purpose.
Client nodes are mainly dedicated to data visualization. The communication between the
server and the client nodes is typically based on the Ethernet protocol. The presence of
multiple server nodes makes it possible to process data in a distributed way.
The core of a SCADA system is the process database, to which all other modules that make
up the system refer. A small or medium-sized control system typically handles
thousands of process signals. It adopts a common approach to process information based
on a set of parameters that are necessary for the correct management of the information.
These include the tag name, the description, the sampling time, the minimum or maximum
value, the engineering units, and so on. This piece of information is based on the asset
registry of the device to which the signals belong. The asset registry and the related asset
model are very important for industrial companies because they are used for measuring
performance, planning maintenance, and designing, setting, and running analytics. We will
explain these in detail later. The user interface of a SCADA system allows the operator to
access the tag name database and send commands to the system through graphic pages and
synoptic panels of either the whole system or parts of it. This is also called the Human
Machine Interface (HMI). Given the importance of the graphic representations that are
used in the operator user interface, SCADA systems provide a library of graphic objects
that allow you to develop complex synoptics easily.
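The set of parameters that describes a process signal in the tag database can be sketched as a simple record. The field names below are illustrative, not taken from any specific SCADA product:

```python
from dataclasses import dataclass

@dataclass
class Tag:
    """A process-database record: the metadata needed to manage one signal."""
    name: str               # unique tag name, e.g. "TANK01.LEVEL"
    description: str
    sampling_time_s: float  # data acquisition interval, in seconds
    min_value: float
    max_value: float
    engineering_unit: str   # e.g. "m", "bar", "degC"

    def in_range(self, value):
        """Check a sampled value against the configured limits."""
        return self.min_value <= value <= self.max_value

level = Tag("TANK01.LEVEL", "Tank 1 level", 1.0, 0.0, 10.0, "m")
```

A record like this is what the HMI, the alarm engine, and the Historian all consult when they handle the signal.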
SCADA also allows you to set thresholds and configure events and conditions for detecting
anomalies and generating alarms. Alarms are classified by type and priority.
Another important task that is carried out by the SCADA system is tracking production
sequences and volumes. It does this both for past data, by using the stored time series of the
values of the process variables, and for the current moment, through the real-time graphical
representation of the evolution of these values. The storage of this information takes place
either through specialized databases for managing time series (such as Historian) or
through relational databases that are managed directly by the SCADA application.
Historian
In the previous sections, we saw how data acquisition in SCADA systems starts from the
controllers (the PLCs or the DCSs) or RTUs gathering measurements from sensors and
equipment through industrial protocols. The digital representations of these measures are
usually called tags or datapoints. Each of these represents a single input or output signal
that is monitored or controlled by the system, and usually appears as value-timestamp pairs.
After generation, the data can also be sent to other monitoring servers for analysis by
humans or to the MES system for planning and maintenance. At the same time, data often
feeds a specialized database for storing and managing time series. This specialized
database is called a Historian, or data Historian. Historians are not relational or NoSQL
databases; they have fewer capabilities and features and a much simpler structure.
However, they are designed and optimized for managing time series.
They can acquire data from the controllers, ingest a huge amount of data, optimize storage,
and retrieve data quickly.
Up until a few years ago, the cloud was not a real option for databases such as this. As
Historians acquire data from controllers and require a high throughput in both input and
output, they need to be close to their data sources, and so they generally lived on premises.
Recently, however, a generation of cloud-native time series databases (TSDBs) have been
developed to store time series in the cloud. Over the next few years, they may well replace
on-premises Historians or, more likely, be tightly integrated with them. We may well have a
scenario in which the Historian on premises acts as a data collector and temporary storage,
and pushes all of the data to a cloud-native TSDB in the cloud. In the following few paragraphs,
we will just refer to on-premises Historians.
Historians need to gather data from controllers. They implement software interfaces for the
most common industrial protocols and fieldbuses, including Profibus, Modbus, DeviceNet,
and OPC, and sometimes also legacy protocols for old plants or DCS. Connectivity to
industrial devices is one of their most important features. Some Historians implement just
about ten of the industrial network protocols, while others, such as OSI-PI, implement many
more. These interfaces implement the most efficient way to gather the data according to the
capabilities provided by the specific protocol. Data acquisition by polling is always
supported, as is the dead band mechanism. Unsolicited data collection depends on the
capabilities provided by the underlying protocol. In addition, the insertion of values that
don't come directly from the controllers but are instead calculated or derived by them can
be carried out through specific interfaces or APIs.
Historians provide fast insertion rates: up to tens of thousands of tags can be processed per
second. This performance is made possible by a specific buffer area, which keeps recent
values in memory, sorted by increasing timestamp, before writing the data to disk.
Historians also provide a functionality called store and forward, which prevents data loss
during planned or unplanned network outages between the data sources and the Historian
server. To store large amounts of data with minimum disk usage and acceptable
approximation errors, Historians often rely on efficient data filtering and
compression engines. The data filtering mechanism ignores incremental
changes in values that fall within a deadband centered around the last reported
value. It only takes into account new values that fall outside the deadband, and then centers
the deadband around the new value. Basically, it identifies and discards repeating
elements.
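The deadband filtering mechanism can be sketched as follows. This is a minimal version under the assumptions above; real Historians also apply time limits that periodically force a value through even when it has not moved outside the band:

```python
def deadband_filter(samples, deadband):
    """Report only samples that move outside a deadband centered on the
    last reported value; the band then re-centers on the new value."""
    if not samples:
        return []
    reported = [samples[0]]            # the first value is always reported
    for value in samples[1:]:
        if abs(value - reported[-1]) > deadband:
            reported.append(value)     # outside the band: report and re-center
    return reported

# A slowly drifting signal with a deadband of 0.5 engineering units.
kept = deadband_filter([10.0, 10.2, 10.4, 11.1, 11.0, 12.5], 0.5)
```

Small oscillations around the last reported value are discarded, while genuine moves are kept, which is exactly the "discard repeating elements" behavior described above.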
The data compression mechanism is applied to all values that pass through the data
filtering mechanism. Its implementation depends on the specific Historian product, but the
basic idea behind it is to identify and discard redundant data without losing content
from an information perspective. The data compression algorithms hold the last one or two
values in memory, building up a sort of swinging door, and store or discard the held values
based on the next value acquired. If replacing the held value with the last acquired value
does not lose any information, the held value is discarded. The loss of information related
to discarding the held values depends on the specific Historian product, and it is a trade-off
between the storage allocation and the amount of information lost by the discard. It can be
tuned by the user through specific settings. The values that are physically stored at the end
are typically called raw data.
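The swinging door idea can be sketched as follows. This is a much-simplified version for illustration: a held point is discarded when the straight segment from the last archived point to the newest point stays within a deviation corridor of every discarded value. Commercial implementations differ in their details and tuning parameters.

```python
def swinging_door(points, deviation):
    """Simplified swinging door trending over (timestamp, value) pairs."""
    if len(points) < 3:
        return list(points)
    archived = [points[0]]
    up_slope, low_slope = float("inf"), float("-inf")
    held = points[1]
    for t, v in points[2:]:
        t0, v0 = archived[-1]
        ht, hv = held
        # Tighten the slope corridor so the held value stays representable.
        up_slope = min(up_slope, (hv + deviation - v0) / (ht - t0))
        low_slope = max(low_slope, (hv - deviation - v0) / (ht - t0))
        slope = (v - v0) / (t - t0)
        if low_slope <= slope <= up_slope:
            held = (t, v)          # held value discarded, new value held
        else:
            archived.append(held)  # corridor violated: archive the held value
            up_slope, low_slope = float("inf"), float("-inf")
            held = (t, v)
    archived.append(held)
    return archived

# A perfectly linear ramp compresses down to its two endpoints.
compressed = swinging_door([(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0)], 0.1)
```

A linear ramp collapses to its endpoints, while a step change forces the points around the step to be archived, which is the trade-off between storage and information described above.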
Historians are fundamental in technical information systems, providing data for plant-
operating applications and business intelligence through time series-specific features such
as interpolation, re-sampling, or retrieving pre-computed values from raw data.
Interpolation is a very important feature that differentiates Historians from other types of
databases. The raw data is stored in the Historians according to the data collection interval
and depending on its variations. Tags that have the same data collection interval can still
have their values stored at different timestamps. The same happens for tags that have their
data collection configured as unsolicited. This means that the signals that are related to the
same physical asset (for instance, a tank) have their raw data stored with different
timestamps, meaning that if we need to get a snapshot of the status of that asset at a specific
time, most of its signals do not have raw data for that specific time. The value is provided
by the Historian through interpolation. This involves different algorithms rebuilding the
missing data using the raw data, and also providing an indication of how accurate the
interpolation is. Values that are not measured directly, including key performance
indicators or diagnostics, may be computed automatically according to the input flow and
then stored in the Historian archives.
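The simplest such algorithm, linear interpolation over raw value-timestamp pairs, can be sketched as follows (a minimal version; production Historians also handle stepped signals, quality flags, and extrapolation policies):

```python
import bisect

def interpolate(raw, t):
    """Return the value of a tag at time t, linearly interpolated between
    the two raw samples surrounding t.
    raw: list of (timestamp, value) pairs sorted by timestamp."""
    times = [ts for ts, _ in raw]
    i = bisect.bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return raw[i][1]                 # an exact raw sample is available
    if i == 0 or i == len(times):
        raise ValueError("t is outside the archived range")
    (t0, v0), (t1, v1) = raw[i - 1], raw[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Snapshot a tag whose raw data was stored at t = 0, 10, and 20.
raw = [(0, 10.0), (10, 20.0), (20, 10.0)]
snapshot = interpolate(raw, 5)
```

This is how a Historian can answer "what was the tank level at 12:00?" even though no raw sample was stored at exactly that instant.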
Some Historians store time series following a hierarchical data model, which reflects the
operating environment. This data model refers to the asset model and is consistent
throughout the plant's organization to ease browsing and so that you can group similar
time series by subsystem. Sometimes, the asset model is a separate module within the
Historians application and other times, it is an external application or module that is part of
the MES system to which the tags are linked. In either case, the asset model is a
fundamental piece of the industrial data architecture, as we will see in the section on MES
later.
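A hierarchical asset model that groups tags by subsystem can be sketched as a simple tree. The class and the plant, line, and equipment names below are illustrative, not drawn from any specific MES or Historian product:

```python
class Asset:
    """A node in a plant asset model; tags are attached to the asset
    (for example, a tank) that physically owns the signal."""
    def __init__(self, name, tags=None):
        self.name = name
        self.tags = list(tags or [])
        self.children = []

    def add(self, child):
        """Attach a child asset and return it, to ease tree building."""
        self.children.append(child)
        return child

    def all_tags(self):
        """Collect the tags of this subsystem and everything below it."""
        collected = list(self.tags)
        for child in self.children:
            collected.extend(child.all_tags())
        return collected

plant = Asset("Plant")
line = plant.add(Asset("Line1"))
line.add(Asset("Tank01", ["TANK01.LEVEL", "TANK01.TEMP"]))
line.add(Asset("Pump01", ["PUMP01.SPEED"]))
```

Browsing from the plant down to a line or a single tank, and retrieving all of the time series of that subsystem in one call, is exactly the kind of navigation the asset model enables.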
Visualization features are provided by standalone services or web applications that are
supplied with Historians. They facilitate a sophisticated exploration of the archived data by
displaying plots, tables, and time series trends that retrieve only the representative
inflection points for a considered time range, along with statistics and other synoptics.
Historians also provide a proprietary SQL interface, a sort of SQL dialect with
extensions to support their time series-specific features.
A simple schema structure, based on tags with no support for the transactions
An SQL interface to query the archived data, offering ad hoc features for time
series
From an architecture perspective, Historians can be split into two main categories:
Historians that use a legacy architecture for storing the data. In this category, we
have OSI-PI (OSIsoft), Proficy Historian (GE), and InfoPlus21 (AspenTech).
Historians that use a third party such as Microsoft SQL Server for storing the
data. In this category, we have Wonderware Historian (Schneider), FactoryTalk
Historian (Rockwell Automation), PHD (Honeywell), and SIMATIC IT Historian
(Siemens).
Very often, the companies who develop controllers offer a product suite where each
product covers a specific area of the plant's automation system, while at the same time
providing an integrated and unified interface to the users. This means that, in the same
suite, there are tools to develop, test, and download the control program, a SCADA system,
a Historian application, a tool for designing and maintaining the asset model, and other
tools and applications that make up the MES system.
The term MES refers to an information system that has the responsibility of managing and
controlling the production functions of a company. The main features of an MES system are
as follows:
Standard interfaces to keep the ERP, warehouse, and the field in sync according
to the ISA-95 and ISA-88 standards. This includes ERP orders, material, product
definitions, planned production schedules, production performances, and
material consumption.
Traceability, backward traceability, and a full identification of all of the materials
in the plant using radiofrequency identification and bar code scanners.
The management and execution of orders that are created manually within the
MES solution or that come from the ERP level. These include plant orders,
production orders, maintenance orders, transport orders, and so on.
Material data management and product definitions that are created
manually within the MES solution.
The integration of data that comes from multiple systems into a database
optimized for Online Analytical Processing (OLAP). This data might come
from Laboratory Information Management Systems (LIMSs), Statistical Process
Control (SPC), ERP, or SCADA.
Monitoring of the production lines and equipment, with the option to associate
reasons with interruptions of the production activities.
Compliance with standards and regulations and product quality.
Stock accounting, material inventory and consumption, and work progress in
terms of quantity and time.
MES provides information that helps the production managers understand which actions to
put in place to optimize and improve production. MES works in real time (from the user's
perspective) to keep the production process under control. This includes materials,
personnel, machines, and support services. Moreover, because you are able to capture data
from the control system, you can use MES to create an as-built record of your
manufacturing processes. This is particularly important in regulated sectors such as food,
beverages, or pharmaceuticals. MES is an intermediate system between ERP and SCADA
(the latter also known as the process control system). Implementing an MES system
provides several advantages.
The ISA-95 equipment model is a hierarchy of various logical entities that are used to
model aspects of production. The general structure of the ISA-95 equipment model is
shown in the following diagram:
S95/S88 model
This model has a number of important characteristics. These include the following:
There are similarities between the process cell, the production line, and the
production unit. These are essentially the same levels of entity that are applied to
batch, continuous, or discrete production.
Storage zones and storage units are independent of the type of production.
Equipment modules can contain other equipment modules, and control modules
can contain other control modules. This allows you to build a complex
hierarchical model of a piece of equipment.
Equipment modules and control modules are not a part of the base ISA-95
standard. They are added as extensions of the model to unify the model with the
ISA-88 standard, which includes these objects.
Work unit: A work unit is an entity in the factory that adds value to a product.
The way a piece of equipment is used determines whether it is classified as a
work unit. If the device requires a sequence of steps, or specific settings to carry
out its production function, it is a work unit. Work units in ISA-95 are
differentiated based on whether they are used in batch or continuous operations,
discrete manufacturing, or storage. There are three types of work unit:
Unit: A unit is a work unit defined for batch and continuous
production
Work cell: A work cell is a work unit defined for discrete
manufacturing
Storage unit: A storage unit is a work unit within a storage zone
that can be a location and/or a piece of equipment that is dedicated
to the storage of materials and/or equipment
ISA-88 extensions
The ISA-95 equipment model stops at the work unit level since this is typically the lowest
level of equipment that is separately monitored and scheduled in the MES and ERP layers.
However, this is not granular enough for us to fully model the factory equipment down to
the control layer. Often, to model an entire enterprise from the head office down to a single
sensor or control input in a factory, the equipment model is extended to include the ISA-88
control and equipment modules:
To determine the asset model, we need to follow an iterative process that is based on the
following steps:
1. Design the conceptual model of the company asset. To do this, you need to
identify the physical asset and set up the customer business from a logical
perspective. You can then work out which logical resources you need to manage.
2. Fit the designed conceptual model in the hierarchy of the ISA-95 or the ISA-88
equipment model. This means that you need to associate each logical resource
identified in the previous step with the corresponding entity in the hierarchy of
the equipment model made available by the ISA-95/ISA-88 standards.
In general, the granularity of the model depends on what a company wants to do with the
model. A company needs to look at the work processes and the information flow in light of
their own business goals.
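The two steps above can be sketched as a toy mapping exercise. The level names follow the ISA-95 equipment hierarchy used in this chapter; the plant resources (an LNG-style example) are invented for illustration:

```python
# ISA-95 equipment hierarchy levels, from the enterprise down to the work unit.
# "ProductionLine" stands in for the work-center-level entity.
ISA95_LEVELS = ["Enterprise", "Site", "Area", "ProductionLine", "WorkUnit"]

# Step 1: logical resources identified in the conceptual model (hypothetical).
# Step 2: each resource is associated with an entity of the ISA-95 hierarchy.
asset_model = [
    ("AcmeEnergy",      "Enterprise"),
    ("CoastalPlant",    "Site"),
    ("Liquefaction",    "Area"),
    ("Train1",          "ProductionLine"),
    ("Compressor-C101", "WorkUnit"),
]

for name, level in asset_model:
    assert level in ISA95_LEVELS      # every resource must fit the hierarchy
    indent = "  " * ISA95_LEVELS.index(level)
    print(f"{indent}{level}: {name}")
```

The granularity check is the `assert`: a resource that cannot be associated with a hierarchy level signals that the conceptual model needs another iteration.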
The following diagram outlines what an asset model could be in an LNG company,
including its hierarchy from a logical perspective and the related entities of the
ISA-95/ISA-88 standards:
Summary
In this chapter, we analyzed the main devices involved in the I-IoT data flow in an
industrial plant in detail. We started from the generation of the data, looking at how
physical measurements are captured by the transducers, converted to a digital signal, and
passed through the industrial networks by controllers. We learned that industrial networks
provide determinism in message delivery, which is a fundamental requisite in the control
process. We discovered the enabling mechanisms that can guarantee deterministic
networks. We compared a simplified networking model (an EPA) with the full model of the
ISO/OSI stack and we looked at the methods that are used to access the physical link of the
control network to ensure that the packet is transmitted within a determined time
interval. We also learned that there are different types of industrial networks according to
the length of the message and the complexity of the industrial device.
We then moved our discussion to a higher level, that is, analyzing the SCADA system and
Historian. We looked at the main features of a SCADA system and the main mechanisms
that allow a Historian to collect, store, and retrieve a huge amount of time series data. We
learned a wide range of key terms, including tags and tag names, time series, sampling,
polling and unsolicited, store and forward, data filtering, and data compression. Finally, we
outlined the ERP system and explained what a MES system is. We also looked at the asset
model, as it plays a key role in managing industrial data.
In the next chapter, we will learn more about the OPC and the edge device, looking closer
at how the OPC allows data to pass to the edge. We will then analyze the edge and follow
the I-IoT data flow until it reaches its destination in the cloud.
Questions
1. What is the setpoint in a control chain?
1. The desired value for a specific process variable
2. The calculated value for the control variable fitting the reference value
3. The measurement of the process value coming from the process
2. What is the accuracy in a sensor?
1. The maximum error that the sensor can make in a measurement
operation
2. The resolution of the sensor
3. The deterministic error
3. What is the resolution of a 16-bit DAC that has a full-scale voltage of
10 volts?
1. 0.152 mV
2. 0.187 mV
3. 1.5 mV
4. In which electronic device is the A/D converter integrated?
1. Microcontroller
2. Microprocessor
5. Which three graphical PLC languages are provided by the IEC 61131
standard?
1. Sequential Functional Chart, Ladder Diagram, and Instruction List
2. Functional Block Diagram, Sequential Functional Chart, and Ladder
Diagram
3. Functional Block Diagram, Sequential Functional Chart, and
Structured text
6. In which scenario are DCSs typically preferred to PLCs?
1. When the production process is discrete
2. When the process requires determinism and a low scan rate
3. When the production process is continuous and the product value is
high
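As a sanity check for question 3, the resolution of an ideal n-bit converter is its full-scale voltage divided by 2^n, which a short script can verify:

```python
def resolution_mv(full_scale_volts: float, bits: int) -> float:
    """Smallest voltage step (in mV) of an ideal n-bit converter."""
    return full_scale_volts / (2 ** bits) * 1000.0

# 16 bits over a 10 V full scale: 10 / 65536 V, roughly 0.152 mV.
print(f"{resolution_mv(10.0, 16):.4f} mV")
```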
Further reading
Read the following articles for more information on what we have covered in this chapter:
4
Implementing the Industrial IoT Data Flow
In the previous chapters, we looked at how a typical industrial scenario works and at the
existing technologies that have been in place since the 1990s. In this chapter, we are going to
understand how to connect this old world to the new IoT world. We will look closely at
how to connect and gather industrial data from different devices and data sources. In
particular, we will explain the OLE for Process Control (OPC) protocol in detail. We'll start
by looking at its original implementation in 1996, based on the Microsoft architecture
of Component Object Model (COM) and the Distributed Component Object Model
(DCOM), and track its progress right up to the current Unified Architecture (UA), which is
based on open and interoperability standards.
We will then go into detail about some concepts that were outlined in the previous
chapters. These include high- and low-frequency data, sampling, data filtering, dead bands,
data compression, polling, and unsolicited data collection. We will also introduce other
concepts related to the collection of industrial data, such as subscription, collection
timeouts, temporary storage, and data buffers.
After that, we will look at edge devices in detail to give the reader a comprehensive idea
about their role, functionalities in the I-IoT data flow, hardware footprint, software
architecture, networking capabilities, and the options that are available in the market. We
will also take a look at the emerging models of edge and fog computing.
Following this, we will look at the I-IoT data flow from the perspective of a developer. This
will include choosing which industrial device to use to gather industrial data, highlighting
their strengths and weaknesses; how to collect the data that comes from sensors and
actuators through OPC or fieldbus; how to bubble them to the edge; and, finally, how to
transfer them in a reliable way to the cloud.
Discovering OPC
No other industrial communications standard has achieved such widespread acceptance
across so many different verticals, industries, and equipment manufacturers as OPC
Classic. It is used to interconnect a large variety of industrial and business systems.
SCADA, Safety Instrumented Systems (SISs), Programmable Logic Controllers (PLCs),
and Distributed Control Systems (DCSs) use OPC to exchange data with each other and
with Historian databases, MES, and ERP systems in the corporate world. The reason for the
success of OPC Classic is very simple—it is the only truly universal interface that can be
used to communicate with different industrial devices and applications, regardless of the
manufacturer, software, or protocols used in the control system. This ability has become
increasingly important, as the need to exchange information between the different devices
of the industrial automation pyramid has increased constantly over the years.
While OPC doesn't eliminate the need for drivers altogether, application vendors no longer
need to develop different drivers for each industrial network or processor. The effort
required is considerably reduced, since each vendor only needs to implement the OPC
interfaces. Typically, each manufacturer develops an OPC server for its specific product,
using the native protocol for that device, and an OPC client, to interface the OPC servers of
the other vendors. In this way, each OPC server will be implemented with full knowledge
of the underlying native protocol. Once an OPC server exists for a piece of equipment or an
application, it becomes much easier to integrate its data with other OPC-compliant
software. There are software houses, such as Matrikon, that specialize in this specific area
and offer a wide variety of OPC servers for a large number of products.
Initially, the OPC standard was based on the Windows OS and the Microsoft Object
Linking and Embedding (OLE) for Process Control (OPC) technology. These specifications
are now known as OPC Classic (http://opcfoundation.org/about/opc-technologies/
opc-classic/). In 2006, the OPC Foundation developed the OPC UA (http://
opcfoundation.org/about/opc-technologies/opc-ua/) specifications, which use a
service-oriented architecture. The new specifications were designed to do the following:
They overcome the limitations that were related to the architecture of OPC
Classic
They address the needs that come from new challenges in security and data
modeling
They provide a more scalable and flexible platform
They build an independent and open platform that is not linked to a specific
technology
Today, the acronym OPC stands for Open Platform Communications. One of the most
important things to highlight about OPC is that it is an Application Programming
Interface (API), rather than an on-the-wire protocol. It provides a higher level of
abstraction than communication protocols such as Ethernet, TCP/IP, or fieldbus protocols
such as EtherNet/IP or PROFINET. This is true both for OPC Classic and OPC UA. We will
analyze both of these in detail in the following sections.
OPC Classic
In 1995, various companies decided to create a working group to define an interoperability
standard. These companies were the following:
Fisher Rosemount
Intellution
Intuitive Technology
Opto22
Rockwell
Siemens AG
Members of Microsoft were also invited to provide the support necessary. The goal of the
working group was to define a standard for accessing information in the Windows
environment, based on the current technology at that time.
The technology that was developed was called OLE for Process Control (OPC). In August
of 1996, the first version of OPC was defined. The following diagram shows the different
layers of OPC Classic with the underlying critical communication protocols—COM,
DCOM, and Remote Procedure Call (RPC):
OPC layering
DCOMs use the mechanism of RPCs to send and receive information between COM
components in a transparent way on the same network. The RPC mechanism was designed
by Microsoft to allow system developers to request the execution of remote programs
without having to develop specific procedures for the server. The client program sends a
message to the server with the proper arguments and the server returns a message
containing the results that come from the executed program.
OPC Classic, however, suffered from some significant limitations:
The standard was coupled to a specific technology. OPC Classic, in fact, was built
around and on top of Microsoft technology.
COM traffic relies on several undetermined network ports to be opened, so it is
often blocked by firewalls.
DCOMs and RPCs were weighty and complicated mechanisms. DCOM objects
often suffered from a lack of performance and were difficult to maintain.
The following diagram shows the implementation of the same data retrieval scenario using
two different architectures and two different kinds of software. We have two PLCs
connecting to a computer that is running the OPC Classic server for the PLC vendor. In the
scenario on the left, named DCOM INTERFACE, the PLCs and the OPC server
communicate using the native PLC protocol. The OPC client running on the SCADA
computer accesses the data in the OPC server through the DCOM, since they are running
on different computers.
In the scenario on the right, named COM INTERFACE, the PLCs and the OPC server
communicate using the native PLC protocol, but the OPC client is embedded in a Historian
agent running on the same machine as the OPC server and therefore accesses the data in
the OPC server through COM. The data gathered by the Historian agent through the OPC
client is then sent to the Historian server through TCP/IP. In the scenario on the right, the
communication between the OPC client and the OPC server is done using COM since they
are running on the same Windows box, therefore avoiding the performance and security
issues that affect DCOM communication:
OPC layering
The OPC specifications serve two main purposes:
To structure the data on the server side in order to simplify the data gathering on
the client side
To define the communication services and the standard mechanisms to exchange
data
The information available from the OPC server is organized into groups of related items for
efficiency. Servers can contain multiple groups of items. There are two different types of
groups: private, local to a single client, and public, shared by several clients. As an
example, consider a water-level group containing the following five items:
Setpoint (SP)
Control output (CO)
Process variable (PV)
Low-water alarm (LoAlarm)
High-water alarm (HiAlarm)
The SCADA could register the water-level group with all of these members. It could then
read the current values for all five items, either at timed intervals or by exception, which is
when their values change. The HMI could also have write access to the SP variable.
A significant advantage of OPC is that you do not have to deal with the control device's
internal architecture directly. The software can deal with named items and groups of items
instead of dealing with raw register numbers and data types. This also makes it easier to
add or change control systems, when migrating from a proprietary protocol to an Ethernet-
based protocol, for example, without altering the client applications. The following points
provide some suggestions related to how to implement a data flow based on OPC Classic:
If possible, run the OPC Data Access (DA) client on the same Windows box as
the OPC DA server to avoid the configuration and performance issues that are
related to DCOM.
The OPC DA client can ask the OPC DA server for readings from either the
device or the cache. Requests for readings directly from the device instead of the
OPC cache should be limited to specific use cases where you need to have the
current values related to very few signals. If you need to read a large number of
tags on a regular basis, it is better to use cache readings.
To avoid overloading the communication channel and the OPC server, group the
tags according to the dynamic of their related signals, setting a different polling
time to each group. For instance, you could classify the signals according to their
behavior in high, medium, and low dynamic signals and assign a different
polling time to each of them. You might assign 5 seconds for the high dynamic
signals, 30 seconds for the medium dynamic signals, and 60 seconds for the low
dynamic signals. The OPC server will then create three different internal caches
and refresh them accordingly. Putting all the tags together and configuring just
one group for all of them would force you to assign it the lowest polling time of
five seconds. The OPC server would have to refresh all tags every five seconds,
even if many tags would not change meaningfully in that time interval.
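A minimal sketch of this grouping strategy follows. The group names and polling times are the ones suggested above, while the tag names and the read-counting are simulated; this is not a real OPC server:

```python
# Tags grouped by signal dynamics, each group with its own polling time (s).
groups = {
    "high":   {"poll_s": 5,  "tags": ["flow1", "pressure1"]},
    "medium": {"poll_s": 30, "tags": ["level1", "temp1", "temp2"]},
    "low":    {"poll_s": 60, "tags": ["ambient", "status"]},
}

def device_reads(groups, horizon_s):
    """Count device reads over a time horizon with per-group polling."""
    total = 0
    for g in groups.values():
        total += (horizon_s // g["poll_s"]) * len(g["tags"])
    return total

# Per-group polling vs. one flat group polled at the fastest (5 s) rate,
# over a 5-minute horizon.
all_tags = sum(len(g["tags"]) for g in groups.values())
print(device_reads(groups, 300))   # grouped: 160 reads
print((300 // 5) * all_tags)       # single 5 s group: 420 reads
```

The grouped configuration cuts device reads by more than half in this toy case, which is exactly the overload the text recommends avoiding.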
Configuring the OPC group to have unsolicited data readings would force the
OPC server to invoke a callback if any of the tags configured in a group changes.
The callback mechanism is not very efficient, especially when used with DCOM,
where the OPC DA client does not run in the same Windows box as the OPC DA
server. For this reason, it should be limited to situations in which you are
interested in only a few signals with a low dynamic that don't change their
values very frequently.
OPC UA
The first response of the OPC Foundation to the growing limitations related to the adoption of
the COM and DCOM architecture was the development of OPC XML-DA. This kept the
fundamental characteristics of OPC, but adopted a communication infrastructure that is not
linked either to a manufacturer or to a specific software platform. The conversion of the
OPC-DA specifications into versions based on web services was not enough to fulfill the
needs of factories that increasingly interact and integrate with the corporate and the
external world.
The OPC UA was therefore developed with the aim to replace all existing COM-based
versions and overcome security and performance issues, satisfying the need for platform-
independent interfaces and allowing the creation of rich and extensible data models to
describe complex systems without losing functionality. OPC UA is based on a service-
oriented approach defined by the IEC 62541 standard. In the OPC UA architecture:
The API isolates the client and server code from the OPC UA stack
The UA stack converts the API calls into messages
The UA stack receives messages and delivers them to the client or the server
through the API
OPC UA stack
The set of objects and related information that an OPC UA server makes available to the
clients is the address space. You can think of the address space as similar to the
implementation of the OPC UA information model. The address space of the OPC UA is a
set of nodes that are connected by references. Each node has properties, which are called
attributes. A specific set of attributes must exist in all nodes. The relation between nodes,
attributes, and references is shown in the following diagram:
Nodes can belong to different node classes, depending on their specific goal. Some nodes
might represent instances, others might represent types, and so on. In the OPC UA, there
are eight standard node classes: variable, object, method, view, data type, variable type,
object type, and reference type. In OPC UA, the most important node classes are object,
variable, and method:
The object node class structures the address space and can be used to group
variables, methods, or other objects.
The variable node class represents a value. Clients can read, write, or subscribe
to it.
The method node class represents methods that have input and output
parameters that clients can invoke through the call service to get a result. They
are designed to work quickly. Some of the quick actions that can be managed
through methods include opening a valve, starting a motor, or even calculating
the results of a simulation based on the input parameters provided.
A reference describes the relationship between two nodes and is uniquely
identified by the source node, the target node, the reference type and its
direction. We can think of references as pointers to other nodes. They are not
accessible directly but can be accessed by browsing the nodes. The reference type
is the way in which the link between the connected nodes is modeled. OPC UA
defines a set of built-in reference types organized hierarchically, but each server
has the option to define its own specific set by extending the basic one. The
reference types are exposed in the address space as nodes and this allows clients
to easily recover information on the references that are used by the OPC UA
server in which the client is interested.
A view is used to restrict the number of nodes and references that are visible in
an extended address space. Using views, servers can provide different
representations of their address space to different clients that they are connecting
to, depending on their use cases. There are two ways to use views in OPC UA:
A view can be represented in the address space as a node that
provides an entry point to the data to be displayed. All the nodes
that are part of a view are accessible, starting from the starting
node.
The node ID of the view node can be used as a filtering parameter
for address space navigation. In this way, the servers can hide the
references to the nodes that are not part of the view and the clients
only see a subset of the address space.
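The node, attribute, and reference relationships described above can be modeled with a toy address space. The node IDs, attribute names, and the HasComponent reference type mirror common OPC UA conventions, but this is a sketch, not a conformant implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    node_class: str                  # e.g. "Object", "Variable", "Method"
    attributes: dict = field(default_factory=dict)
    references: list = field(default_factory=list)  # (ref_type, target_id)

address_space = {}

def add_node(node_id, node_class, **attrs):
    """Register a node in the server's address space."""
    address_space[node_id] = Node(node_id, node_class, attrs)
    return address_space[node_id]

# An object node grouping a variable node, linked by a reference.
boiler = add_node("ns=2;s=Boiler", "Object", DisplayName="Boiler")
temp = add_node("ns=2;s=Boiler.Temp", "Variable", DisplayName="Temp", Value=87.5)
boiler.references.append(("HasComponent", temp.node_id))

# Browse: follow references from an entry node, as a client would.
for ref_type, target in address_space["ns=2;s=Boiler"].references:
    print(ref_type, "->", address_space[target].attributes["DisplayName"])
```

Browsing works purely by chasing references between nodes, which is why reference types themselves are exposed as nodes: clients can discover how the model is linked without prior knowledge.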
OPC UA sessions
OPC UA provides a client-server communication model that includes status information.
This status information is associated with a session. A session is defined as a logical
connection between a client and a server. Each session is independent from the underlying
communication protocol; any issues occurring at protocol level do not automatically cause
the termination of a session. The sessions terminate following an express request from the
client or due to the client's inactivity. The inactivity intervals are set during the creation of a
session.
The entities involved, as shown in the following diagram, are the Application Layer, the
Session, and the Transport Layer:
The Transport Layer is the level responsible for transmitting and receiving data
through a socket connection, to which error-handling mechanisms are applied to
ensure the system is protected against attacks such as denial-of-service (DoS).
The creation of a secure channel is based on the endpoint and each server offers one or
more endpoints. Each endpoint has the following features:
Endpoint URL: This is the network address of the endpoint used by the client to
establish a secure channel.
Server application instance certificate: This is the public key of the server used
by the client to make the exchange of data secure.
Security policy: This is the set of algorithms used in security mechanisms and
also includes the length of the key that is used. An example of a security policy is
Advanced Encryption Standard (AES) with a 128-bit key.
Security mode: This ensures the authentication at the level of application. There
are three different modes that can be used: SignAndEncrypt, Sign, or None.
Authentication: This refers to the mechanisms used to authenticate a user during
the creation of a session by means of a username and password, a certificate, or
through anonymous authentication.
Transport protocol: This specifies the network protocol used.
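These endpoint features map naturally onto a small record type. The following sketch uses illustrative values; the field names are ours, not the normative OPC UA attribute names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoint:
    url: str                   # endpoint URL used to open the secure channel
    server_certificate: bytes  # server application instance certificate (public key)
    security_policy: str       # algorithm set, e.g. AES with a 128-bit key
    security_mode: str         # "SignAndEncrypt", "Sign", or "None"
    authentication: str        # "username", "certificate", or "anonymous"
    transport: str             # network protocol

ep = Endpoint(
    url="opc.tcp://plant-server:4840/ua",  # hypothetical address
    server_certificate=b"...",             # placeholder, elided
    security_policy="AES-128",
    security_mode="SignAndEncrypt",
    authentication="username",
    transport="opc.tcp",
)
assert ep.security_mode in ("SignAndEncrypt", "Sign", "None")
print(ep.url)
```

A server typically advertises several such endpoints, and the client picks the one whose security configuration it supports.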
1. Set the configuration options for the secure connection. If the application is
preconfigured and already knows how to connect to the server, we can skip this
step. If not, the client must send a GetEndpoints request to the discovery
endpoint of the server to which it wants to connect in order to receive the
descriptions of the existing session endpoint and the related available security
configurations, including security policies, security modes, and
server Application Instance Certificate. The security policy defines the
algorithms to be used for signing and encrypting messages, while the security
mode defines the type of security.
The client selects a session endpoint with a supported security configuration and
validates the server Application Instance Certificate. This is done by checking its
validity status with the associated Validation Authority (VA).
2. If the certificate is reliable, the client sends an Open Secure Channel request in
line with the security policy and the security mode of the selected session endpoint:
If the security mode is None, the Open Secure Channel request is sent
without any security mechanisms.
If the security mode is Sign, the Open Secure Channel request is sent
using the private key of the client Application Instance Certificate as a
signature.
If the security mode is SignAndEncrypt, the Open Secure
Channel request is sent after encrypting it using the public key of the
server Application Instance Certificate.
The security policy specifies which encoding and signature algorithms should be
used for signing and encrypting messages:
Once the server receives the Open Secure Channel message, the
server validates the client's Application Instance Certificate by a
request to the VA.
If the certificate is considered valid, the message is interpreted
according to the security policy and the security mode. The
message is decrypted using the server's private key and the
signature is verified using the client's public key.
The server sends the response to the client in the same way as the
client sent the request.
The secure channel is established.
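The three security modes can be sketched as a dispatch over the outgoing message. Real OPC UA signs with the client's private key and encrypts with the server's public key according to the security policy; in this toy version an HMAC stands in for the signature and a trivial byte transform stands in for encryption, purely to show the control flow:

```python
import hashlib
import hmac

def prepare_open_secure_channel(message: bytes, mode: str, key: bytes) -> bytes:
    """Toy dispatch over the three OPC UA security modes. The HMAC and the
    byte transform are stand-ins, not the real asymmetric mechanisms."""
    if mode == "None":
        return message                  # sent without any security mechanism
    signature = hmac.new(key, message, hashlib.sha256).digest()
    if mode == "Sign":
        return message + signature      # signed, not encrypted
    if mode == "SignAndEncrypt":
        # Placeholder "encryption": real servers use the policy's cipher.
        return bytes(b ^ 0x5A for b in message + signature)
    raise ValueError(f"unknown security mode: {mode}")

msg = prepare_open_secure_channel(b"OpenSecureChannel", "Sign", b"demo-key")
print(len(msg))  # 17-byte payload plus a 32-byte HMAC-SHA256 signature
```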
The main purpose of creating the secure channel is to establish symmetric keys,
which enable the exchange of secret information between clients and servers.
Symmetric keys allow us to avoid public key cryptography (asymmetric keys),
which is less efficient in terms of computational speed.
3. A CreateSession request is sent to the server. The server replies and provides its
software certificates to communicate its functionality and to demonstrate the
ownership of the certificate used in the creation of the underlying secure channel.
4. Activate the session that was just created. The client sends an
ActivateSession request to the server, including the credentials of the current
user and the client's software certificates. The credentials can either be
represented by an X.509 certificate that has been validated by a VA, or by
a username and password pair. Once the user credentials and the software
certificates have been validated by the server, the session is established and
active, and the client can start accessing the server data.
maxAge: This is the maximum acceptable age of the values to be returned. It is
specified by the client. It forces the server to access the device (for example, a
sensor) if the copy in its cache is older than the maxAge parameter configured by
the client. If maxAge is set to zero, the server must supply the current value by
always reading it directly from the device.
Type of timestamps: In OPC UA, two timestamps are defined, the source
timestamp and the server timestamp. The source timestamp is the timestamp
that comes from the device, while the server timestamp is the timestamp that
comes from the OS where the OPC UA server is running.
List of nodes and attributes: The list of nodes and attributes are as follows:
NodeId
AttributeId for instance value
DataEncoding: This allows the client to choose the appropriate
data encoding; the defined values are XML and UA Binary
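The maxAge rule can be sketched as a cache-or-device decision. The cache layout and the device read below are simulated:

```python
import time

def read_device(node_id):
    """Stand-in for an actual sensor/device read."""
    return 88.1

# Server-side cache with one entry whose value is 10 seconds old.
cache = {"ns=2;s=Temp": {"value": 87.5, "ts": time.time() - 10.0}}

def read(node_id, max_age_s):
    """Serve from the server cache unless the cached value is older than
    maxAge; a maxAge of zero always forces a fresh read from the device."""
    entry = cache.get(node_id)
    age = time.time() - entry["ts"] if entry else float("inf")
    if max_age_s == 0 or age > max_age_s:
        value = read_device(node_id)
        cache[node_id] = {"value": value, "ts": time.time()}
        return value, "device"
    return entry["value"], "cache"

print(read("ns=2;s=Temp", 60)[1])   # cached value is young enough: "cache"
print(read("ns=2;s=Temp", 0)[1])    # maxAge of zero: "device"
```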
List of nodes, attributes, and values: The list of nodes, attributes and their
values are as follows:
NodeId
AttributeId
Value to write
Source timestamp: Null if not set
Server timestamp: Null if not set
A client can create one or more subscriptions for each session. For each subscription, the
client can create monitored items, which are related to the real items. Each monitored item
can create a notification, which can either be linked to the changes of the values of the
attributes or variables selected by the client, or it can be connected to when events occur.
Each monitored item continuously produces notifications either until the subscription is
cancelled, or the monitored items are deleted. Clients can use subscriptions to receive
updates on a regular basis.
The following diagram represents the relationships between sessions, subscriptions, and
monitored items:
OPC UA subscriptions
The subscription collects and groups all notifications related to it on a regular basis. All
notifications thus grouped are inserted in a NotificationMessage to be sent to the client.
Monitored items cannot exist if a subscription is not created by the client within a session;
the client must create a subscription before the monitored item can be defined. A
subscription has several parameters. The main ones are as follows:
PublishingInterval: This defines the time interval used by the server to create a
NotificationMessage for the given subscription.
PublishingEnabled: This is a Boolean value that enables the forwarding of the
NotificationMessage to the client.
Keep-alive counter: This counts the number of publishing intervals that have
elapsed with no notifications sent to the client.
Lifetime counter: This counts the consecutive number of publishing intervals
that have elapsed with no actions carried out by the client.
NotificationMessage queue: This holds the NotificationMessage that have been
sent but not yet acknowledged by the client.
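The interplay between the keep-alive and lifetime counters can be illustrated with a small simulation. This is a sketch; the field and method names are ours, not the exact OPC UA parameters:

```python
class Subscription:
    """Toy model of a subscription's keep-alive and lifetime counters."""

    def __init__(self, max_keep_alive=3, lifetime=10):
        self.max_keep_alive = max_keep_alive  # empty intervals before a keep-alive message
        self.lifetime = lifetime              # client-silent intervals before expiry
        self.keep_alive_count = 0
        self.lifetime_count = 0
        self.expired = False

    def publishing_interval_elapsed(self, notifications, client_active):
        """Called once per publishing interval; returns the message to send."""
        self.lifetime_count = 0 if client_active else self.lifetime_count + 1
        if self.lifetime_count >= self.lifetime:
            self.expired = True               # the client went silent for too long
            return "StatusChange: expired"
        if notifications:
            self.keep_alive_count = 0
            return f"NotificationMessage({len(notifications)} items)"
        self.keep_alive_count += 1
        if self.keep_alive_count >= self.max_keep_alive:
            self.keep_alive_count = 0
            return "keep-alive"               # empty message: 'still alive'
        return None

sub = Subscription(max_keep_alive=2, lifetime=3)
quiet1 = sub.publishing_interval_elapsed([], client_active=True)   # nothing sent yet
quiet2 = sub.publishing_interval_elapsed([], client_active=True)   # keep-alive message
busy = sub.publishing_interval_elapsed(["data-change"], client_active=True)
```

Two quiet intervals trigger a keep-alive message; three intervals with no client activity would expire the subscription.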
Attributes: Any attribute of any node can be associated with a monitored item.
Attributes are only monitored to indicate when their value changes. A value
change produces a notification.
Variables: The value attribute of nodes belonging to the Variable NodeClass
can be associated with a monitored item. The value attribute of a variable is
monitored to check for changes in its value or status under specific conditions. If
the conditions are met, the related change triggers a notification by the
monitored item.
Objects and views: Nodes belonging to the Object and View NodeClasses can
be associated with a monitored item. These are monitored for the occurrence
of a particular event.
For every change of attribute, value, or status, or for each event monitored by the
monitored item, a notification is produced. Several notifications are packaged by the
subscription within a NotificationMessage at time intervals according to the publishing
interval.
All monitored items have some common settings. The most important of these
is SamplingInterval. This defines the frequency in milliseconds at which the server
samples the items with which the monitored item is associated. The default value of the
sampling interval is the same as the publishing interval of the related subscription. A client
must set the sampling interval to zero if the subscription is related to events. If the client
specifies a sampling interval that is not supported by the server, the server assigns the most
appropriate rate to the monitored item. The sampling performed by the OPC UA server
and the scan cycle used to get the data from the underlying device are usually not
synchronized. This can cause delays in detecting changes.
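The relationship between the sampling interval and the publishing interval can be sketched with a millisecond-step simulation. This is illustrative only, not an OPC UA API:

```python
def simulate(device_values, sampling_ms, publishing_ms):
    """device_values[t] is the device value at millisecond t. A monitored item
    samples every sampling_ms and queues a notification only on a value change;
    the subscription packages the queue into one NotificationMessage every
    publishing_ms."""
    queue, messages, last = [], [], None
    for t in range(len(device_values)):
        if t % sampling_ms == 0:
            v = device_values[t]
            if v != last:                 # a value change produces a notification
                queue.append((t, v))
                last = v
        if t > 0 and t % publishing_ms == 0:
            messages.append(list(queue))  # merge the queue into one message
            queue.clear()
    return messages

# Device value changes every 300 ms; sample every 100 ms, publish every 500 ms
messages = simulate([t // 300 for t in range(1001)], sampling_ms=100, publishing_ms=500)
```

Each NotificationMessage carries only the value changes detected since the previous publishing interval, which is why a slow sampling rate relative to the device's scan cycle can delay change detection.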
OPC UA notifications
As mentioned in the previous section, the client can define one or more subscriptions
within an active session. For each subscription, it can create monitored items, which
produce notifications that are stored in specific queues. According to the frequency
specified by the publish interval, for every subscription, the current contents of all queues
of the monitored items related to the subscription are merged into a
NotificationMessage, to be delivered to the client. In OPC Classic, this information was
sent through callbacks, which allowed the server to invoke methods on the client. In OPC
UA, the communication mechanism is based on unidirectional connections that don't use
the callback mechanism. This means it can easily be managed by firewalls and is
independent of the underlying transport layer.
Through the Publish Service request, the client sends a request to the server, expecting to
receive a Publish Service response that contains a NotificationMessage within the
expiration time specified by the publishing interval. A Publish Service request does not
refer to a specific subscription, but just to the session in which the request has been
submitted. So, for each Publish Service request, the server will be required to forward to the
client a NotificationMessage from those produced by the subscriptions of the active
session. If several NotificationMessage have been produced and are ready to be
transmitted, the server will decide which NotificationMessage to send to the client,
either according to the priority of the subscription or by using a Round Robin algorithm.
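The server-side choice between subscription priority and round-robin scheduling might look like the following. This is a toy model; the real selection logic is server- and SDK-specific:

```python
from collections import deque

class Session:
    """Toy model of how a server might answer Publish Service requests:
    pick the ready subscription with the highest priority, breaking ties
    round-robin. Names are illustrative, not an OPC UA SDK API."""

    def __init__(self, subscriptions):
        # subscriptions: {name: priority}; each gets a queue of ready messages
        self.priority = dict(subscriptions)
        self.queues = {name: deque() for name in subscriptions}
        self._rr = 0                       # round-robin cursor for ties

    def enqueue(self, name, message):
        self.queues[name].append(message)

    def on_publish_request(self):
        ready = [n for n, q in self.queues.items() if q]
        if not ready:
            return None                    # nothing to send yet
        top = max(self.priority[n] for n in ready)
        candidates = sorted(n for n in ready if self.priority[n] == top)
        pick = candidates[self._rr % len(candidates)]
        self._rr += 1
        return (pick, self.queues[pick].popleft())

session = Session({"A": 1, "B": 1})        # two subscriptions, equal priority
session.enqueue("A", "m1")
session.enqueue("B", "m2")
session.enqueue("A", "m3")
```

With equal priorities, successive Publish requests are answered by alternating subscriptions until the queues are drained.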
The following diagram shows the notification mechanism we just described. For
each Publish Request received, the server selects a subscription, picks a
NotificationMessage, and sends it in a Publish Response. The diagram also
shows a Publish Request queue, in which all requests submitted by the client
and received in the session are queued:
The Edge Gateway is the core component and is responsible for forwarding the data from
the site to the IoT Data Hub, whether this is on the cloud or not. The Edge Tools are
utilities for configuration, log management, and patching the edge operating system, either
from a remote or a local user interface. Edge Computing is a newer component
that uses data to perform an action at site level or to provide insights to headquarters.
So far, the Edge has been limited to collecting data and forwarding it through the Edge
Gateway to the I-IoT middleware, whether this is on the cloud or not. Recently, however,
industrial companies have been able to turn data into actionable intelligence using edge
computing, which is available on the edge side. In 2017, Gartner declared the following:
While this statement might seem a bit controversial, it highlights the role that the edge has
played over the past two years. Industrial companies, after an initial phase of absolute
cloud-centricity, have realized that it is not always possible to do everything in a remote
location. The Edge, therefore, must provide the following capabilities:
The ability to gather data from different industrial data sources through OPC or
by implementing interfaces that are directly connected to the Fieldbus protocols
(such as Modbus, ProfiNet, or EthernetIP).
The ability to establish a secure channel with the I-IoT Data Hub and to manage
certificates and authorization.
The ability to send data to the I-IoT Data Hub and, in some situations, to
compress it.
The ability to easily manage and configure data acquisition either remotely or
locally
The ability to register to receive patching and updates
The ability to log which actions were carried out and by whom
The ability to browse and trend the data using a UI
The ability to self-configure and self-register to the cloud at startup
The ability to receive and execute commands from the cloud
The ability to perform an action on behalf of the I-IoT middleware, either offline
or online
The ability to host custom applications
The ability to run analytics in standalone mode, offline, in collaboration with the
I-IoT middleware, or in collaboration with the middleware of the local
headquarters
The ability to carry out actions or download analytics from the I-IoT middleware
The ability to send unstructured or a specific set of data to the I-IoT middleware
on demand or when triggered by a condition
Looking at the functionalities that the Edge must implement, we can understand that the I-
IoT Edge is not a simple device, but requires sophisticated abilities in terms of performance,
management, and computation.
From the perspective of the computational architecture on the edge side, the IoT has
introduced the concept of the fog.
The most important components of the Edge Gateway are the Industrial Adapter and the
IoT Adapter. The Industrial Adapter usually subscribes to the data from the industrial
field and publishes it on a data bus. Typically, it implements the connector or connectors
for the selected devices, which act as data sources in our I-IoT data flow, and makes their
data available on the Edge data bus. The IoT Adapter, on the other hand, grabs the industrial
data from the data bus and forwards it to the IoT Data Hub. An important part of the Edge
Gateway is the store-and-forward component. This is a general mechanism for storing the
data in temporary local storage to make the data transfer resilient to the instability of the
network. We discussed this previously when analyzing Historians.
In the Edge gateway, the store-and-forward component is crucial since the data transfer
occurs over the internet in a Wide Area Network (WAN) rather than in a Local Area
Network (LAN). In a WAN, the instability and the latency of the connectivity channel are
much higher. The store-and-forward mechanism can either offer a limited memory buffer that
covers a short period of connectivity downtime, or a large and specific storage area on a
disk drive that can cover a long period of connectivity downtime or a huge flow of data.
The range of the time window for which the data transfer must be ensured depends on the
specific scenarios and the physical resources of the edge memory and storage.
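A minimal store-and-forward queue for an edge uplink could be sketched like this. It is illustrative only; class and function names are ours, and the overflow policy (drop oldest) is just one possible choice:

```python
from collections import deque

class StoreAndForward:
    """Sketch of an edge store-and-forward buffer: queue samples while the
    uplink is down, drain oldest-first on reconnect, and drop the oldest
    data when the buffer overflows."""

    def __init__(self, send, capacity=10_000):
        self.send = send                      # callable that pushes one sample upstream
        self.buffer = deque(maxlen=capacity)  # deque drops the oldest item on overflow
        self.online = True

    def publish(self, sample):
        self.buffer.append(sample)
        if self.online:
            self.flush()

    def flush(self):
        while self.buffer:
            sample = self.buffer[0]
            try:
                self.send(sample)
            except ConnectionError:
                self.online = False           # keep the sample; retry on reconnect
                return
            self.buffer.popleft()             # remove only after a successful send

    def reconnected(self):
        self.online = True
        self.flush()

# Simulated uplink that can be toggled down, as during a WAN outage
sent, link_down = [], {"down": False}
def uplink(sample):
    if link_down["down"]:
        raise ConnectionError("WAN outage")
    sent.append(sample)

edge = StoreAndForward(uplink, capacity=100)
edge.publish({"tag": "pressure", "value": 1.2})   # delivered immediately
link_down["down"] = True
edge.publish({"tag": "pressure", "value": 1.3})   # buffered during the outage
link_down["down"] = False
edge.reconnected()                                # drains the buffer oldest-first
```

Removing a sample from the buffer only after a successful send is what makes the transfer resilient: nothing is lost if the connection drops mid-delivery.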
Edge computing normally subscribes to the Data Bus exposed by the Edge Gateway and
consumes data for data processing. In some circumstances, edge computing invokes cloud
or local servers through a web service to run advanced computational functions.
Edge implementations
Cloud vendors and OEM vendors are developing different solutions based on their own
operating systems or proposing cloud-agnostic Software Development Kits (SDKs). We
will take a closer look at some of these in the upcoming chapters, but in this section, we'll
list the most important edge implementations.
Greengrass
Greengrass is the new generation of the AWS IoT Edge. AWS provides an SDK to build
the edge and uses Greengrass to extend cloud capabilities to edge devices. This enables
them to carry out actions locally, while still using the cloud for management, analytics, and
permanent storage. Greengrass supports OPC UA, but not the OPC Classic interface. We
will be discussing Greengrass in more detail in Chapter 10, Implementing a Cloud Industrial
IoT Solution with AWS.
Android IoT
Google supports an SDK for edge development. It is sponsoring Android as the next
generation of edge devices.
Node-RED
Node-RED is an agnostic framework that can be used to build a simple IoT edge visually.
Node-RED also runs on the Raspberry Pi and other lightweight hardware. It is based on Node.js and
supports several protocols and several standards. Node-RED also allows us to develop our
own plugins. Node-RED is not specialized for industrial environments, but can
be used profitably for pilot projects and proof-of-concept projects, or in conjunction with
other components.
Docker edge
Docker edge is the new Docker for Enterprise solution proposed by Docker for edge
computing. It is the result of a collaboration between General Electric (GE) and Docker for
the Predix IoT platform.
Message Queue Telemetry Transport (MQTT) and the secure channel MQTTS
are the most important ISO protocols for machine-to-machine communication.
The standard is ISO/IEC PRF 20922. We will explore MQTT in Chapter 8,
Implementing a Custom Industrial IoT Platform. MQTT is based on the
publish/subscribe pattern.
Advanced Message Queuing Protocol (AMQP) was developed for
interoperability and messaging. AMQP is also based on the publish/subscribe
pattern. It is similar to MQTT, but it is a heavier protocol.
HTTP and HTTPS are not real protocols for IoT, but are normally used in
conjunction with the REST API to transmit data over the internet.
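The publish/subscribe pattern shared by MQTT and AMQP can be shown with a minimal in-process broker. This sketch uses exact-match topics only; real MQTT adds `+` and `#` wildcards, QoS levels, and retained messages:

```python
from collections import defaultdict

class Broker:
    """Minimal in-process illustration of the publish/subscribe pattern
    that MQTT is built on. Not a network broker."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # The publisher never addresses clients directly:
        # the broker decouples producers from consumers.
        for callback in self.subscribers[topic]:
            callback(topic, payload)

broker = Broker()
received = []
broker.subscribe("plant/line1/pump/pressure",
                 lambda topic, payload: received.append(payload))
broker.publish("plant/line1/pump/pressure", 4.2)   # delivered to the subscriber
broker.publish("plant/line1/pump/flow", 30.0)      # no subscriber: silently dropped
```

This decoupling is what makes publish/subscribe protocols attractive for machine-to-machine communication: a sensor publishes without knowing who, if anyone, is listening.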
Variety: In industrial plants and factories, the technology used is always a mix of
generations. We might find sensors that were installed just a year ago, using a
real-time kernel, with self-diagnostic, auto-tuning, and signal-processing
capabilities, together with very old sensors with limited or no fieldbus interfaces.
Dealing with all these different technologies at the same time is not feasible
without implementing different ad hoc connectivity solutions.
Amount: The factory floor is made up of hundreds of sensors and actuators and
the industrial process uses a huge number of microcontrollers and fully
automated working cells. All these signals and information are needed to
implement predictive and prognostic analytics on the cloud. They are also the
basic building blocks to construct digital twins of the machines or processes. It's
not possible to establish a direct connection to the cloud for each of these; the
variety and amount of them would require too much maintenance and
networking to be dealt with.
Security: Last but not least, cyber security is a big barrier to implementing
connectivity in a factory. Devices and equipment that are not designed to be
resilient to cyber-attacks will be much more exposed, thereby putting entire
islands of automation at risk.
Let's now move on to looking at the industrial data sources and the related data-gathering
techniques.
Let's start to analyze each of these in terms of their capabilities, their strengths and
weaknesses, and their connectivity and data acquisition options.
PLC
PLCs receive data from sensors and send commands to actuators. In a large plant, it is
likely that there will be several PLCs of different sizes with different capabilities. These are
often structured hierarchically, with the highest acting as data concentrators. There aren't
many PLCs that act as concentrators in a factory; you can often find them controlling an
entire line of production or a specific functional area of the industrial plant. The PLC
concentrators meet the needs of the I-IoT data flow very well. They are very powerful and
are usually linked to an OPC server gateway provided by the same vendor as the plant or
the corporate network, thereby simplifying the task of transferring the data to the cloud
over the internet. Sometimes, they are hosted in a separate board on the same rack as an
OPC UA server. Since OPC Classic only runs on Windows, the same option is not
possible for an OPC DA server. If PLC concentrators are not available, we need to connect
to the main PLC of every working cell, line production, or plant area from which we need
to gather data. In a large plant, we hardly ever just use one PLC concentrator.
The PLC is a device that processes the signals coming from the sensors, the commands sent
to the actuators, and the current status of the devices and equipment. Its fast scan time
allows us to have fresh data coming from the sensors every 100 milliseconds or less. It is
also reliable, steady, and deterministic. It has the fastest sampling rate and very few
instances of downtime, which are planned well in advance. This is very important if we
need to gather data without unplanned shutdowns or device reboots that might
compromise the quality of the signals or the integrity of the data stream.
PLCs are at the first level of the automation pyramid. This means that using them to get
data means working very close to the hardware, with a low level of abstraction. This brings
the following problems:
The PLCs, and the PLCs that act as data concentrators, manage, either directly or
indirectly, several thousand input and output signals plus their internal data,
calculations, and statuses generated by their control logic. In the context of the I-
IoT, all of these are tags, some of which come from physical instruments and
others from internal calculation and intermediate logic steps. These often appear
unstructured, or structured in a very basic way. They are rarely linked to a
comprehensive data model, even though this should be one of the outcomes of a
well-integrated I-IoT platform. In any case, selecting the PLC as a data source
means we have to deal with this complexity due to the low level of abstraction.
Naming conventions are another obstacle to overcome, since the name of tags
might be different in different areas of the same plant. This might depend on the
PLC vendor or the integrator that develops the control logic running on the PLC.
Often, the naming conventions also reflect the naming of the physical
instruments that are provided by the manufacturer of a specific plant area.
We cannot simply connect a PLC to a device, such as the edge, that is exposed to
the internet. This is not recommended by the Integrated Control System (ICS)
security standards and, in any case, is not something that the field engineer
would allow. We will be exploring how we can get around these security
constraints further in Chapter 5, Applying Cybersecurity, but it is not always
possible to do this for the following reasons:
The PLCs may be old and would therefore not be resilient to storm
attacks on their Ethernet port.
There may be other old devices on the same local network that are
potentially at risk.
There may be customer security policies to keep the control
network as isolated as possible, due to the variety of the industrial
devices that are linked to it and their vulnerability to cyberattacks.
The mentality of the team may also be a barrier; an automation
engineer with a lot of experience in the field may be unwilling to
accept the idea of connecting a PLC to the internet.
Connecting directly to a PLC implies that we are able to talk to it according to its
protocol. Although PLCs have become much more standardized due to the
adoption of the fieldbus standard over the past few years, we would still need to
develop, maintain, and keep updated the connectors for several protocols,
including EtherNet/IP, PROFINET IO, DeviceNet, PROFIBUS DP, and MODBUS.
This requires quite a lot of effort, especially if you consider that each of these
protocols is liable to require enhancements and fixing, meaning you have to deal
with the compatibility and the backward-compatibility of the developed
connectors. Another option for using PLCs as a data source for the I-IoT data
flow is to connect to them through the OPC server. OPC-UA, if available, is
without a doubt the preferred choice compared to OPC Classic. OPC Classic,
however, is preferred to a direct connection through the fieldbus protocol used
by the PLC. If your choice is OPC Classic, try to run your OPC DA client on the
same Windows box as the OPC DA server to avoid the DCOM issues discussed
in the previous section.
DCS
From the perspective of the I-IoT data flow, DCSes are very similar to PLCs, so all the
previously analyzed strengths and weaknesses also apply in this case. There are just a few
differences to underline, which are listed as follows:
The architecture of the DCS is natively organized hierarchically. This means that
in industrial plants where the automation systems mainly use DCSes, we don't
find many of them and they are typically supplied by the same vendor. For
example, in a medium-sized refinery, we might have three or four DCSes that
manage the whole plant.
The DCS plays the double role of controller and SCADA system. Many also have
an integrated Historian natively connected to the DCS controllers. This helps to
select the data to be gathered since they are usually organized and structured
according to a data model. This means we have to deal with less variety and
complexity compared to using PLCs.
The DCS, like the PLC, manages and collects thousands of sensors, actuators,
internal calculations, and derived values and statuses. Typically, its scan rate is
between 100 and 1,000 milliseconds. This is slower than the scan rate of a PLC but,
in most cases, the data is not collected by linking to its native protocol, but
instead through its OPC server. An OPC server is unlikely to provide data to its
OPC clients with a refresh time under 1,000 milliseconds. This means that from
the perspective of the I-IoT data flow, it doesn't really matter whether the
underlying device is a DCS or a PLC when gathering data from an OPC server.
SCADA
SCADA systems have the necessary drivers to communicate with the field devices of the
factory floor through their legacy communication protocols or, more often, through OPC or
a fieldbus protocol. Typical data acquisition intervals for SCADA applications are in the
range of half a second to one minute.
One of the main functions of a SCADA system is the data acquisition system. This acts as a
natural data concentrator, collecting the data from different field devices, micro-controllers,
and PLCs that control the different working lines or functional areas of the factory. From
the perspective of I-IoT data acquisition, SCADA systems are very similar to the DCS; they
are organized hierarchically, there are few of them, and they are typically supplied by the
same vendor. For example, in a medium-sized manufacturing plant, you might have one or
two SCADA systems supervising the whole plant, plus some local HMIs monitoring
working cells or small areas. Since SCADA adopts a common approach to processing
industrial data that is often based on a common data model, it is easier to identify and
recognize the data stream needed by the I-IoT data flow. SCADA systems may need to communicate
with MES and ERP systems so they are usually already linked to the plant or corporate
network. This simplifies the task of transferring data to the cloud over the internet.
SCADA systems are much less reliable than PLCs and DCSs. This is because of the
following reasons:
SCADA systems are updated quite frequently to add or change tags, units of
measure, scripts, alarm thresholds, and synoptic. This means that SCADA
applications have to be restarted often. They usually run on Windows operating
systems, so they are also frequently rebooted to apply security patches or other
installation tasks.
SCADA systems are built on several modules that execute different tasks.
Sometimes, these might be overloaded and they may drop the communication
tasks that are in charge of feeding the I-IoT data stream.
Connecting directly to a SCADA system implies that we are able to talk to it
according to its API or SDK. Although there are some SCADA systems, such as
InTouch or iFix, that are very common on the market, in any case we have to
develop, maintain, and update the connectors for the SCADA system. Just like
for the PLC, this requires a considerable effort, considering that each SCADA
system is liable to require enhancement and fixing, so you have to deal with the
compatibility and the backward-compatibility of the developed connectors.
Again, similar to the PLC, another option for using SCADA systems as a data
source for the I-IoT data flow is to connect to it through its OPC server. The
considerations about OPC interfacing that were mentioned in the section on the
PLC are also valid here.
Historians
Historians have interfaces toward the most common industrial protocols and fieldbuses,
including PROFIBUS, MODBUS, DeviceNet, and OPC, and sometimes also legacy protocols
for old plants or DCS. These interfaces implement the most efficient ways to gather data
according to the abilities of the specific protocol. To store large amounts of data with
minimum disk usage and acceptable approximation errors, Historians rely on a data-
filtering and compression engine. The values are stored according to the dynamic of the
signal and their filtering and compression mechanisms. Signals coming from the same
device, such as the pressure and the flow rate of a pump, could have, and typically do have,
their raw data stored with different timestamps. Typically, Historians store time-series data
following a hierarchical data model.
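The effect of filtering and compression can be shown with a simple deadband filter. This is a sketch; commercial Historians typically use more sophisticated algorithms, such as swinging-door compression:

```python
def deadband_compress(samples, deadband):
    """Store a (timestamp, value) sample only when it differs from the last
    stored value by more than the deadband. Small fluctuations are discarded,
    trading disk usage for an acceptable approximation error."""
    stored = []
    for ts, value in samples:
        if not stored or abs(value - stored[-1][1]) > deadband:
            stored.append((ts, value))
    return stored

# Six raw pressure readings; only the significant changes survive compression
raw = [(0, 10.00), (1, 10.02), (2, 10.04), (3, 10.50), (4, 10.51), (5, 9.00)]
stored = deadband_compress(raw, deadband=0.1)
```

Note that compression is applied per signal, which is one reason why signals from the same device end up stored with different timestamps.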
Advantages of Historians
Historians, by definition, act as data concentrators. They collect data from different field
devices, micro-controllers, and PLCs that control different working lines or functional areas
of the plant, and SCADA systems. Historian databases are typically structured and linked
to a structured asset model. This makes it easier to identify and recognize the data flow that
the I-IoT data stream is interested in. Like SCADA, they need to communicate with the
MES and ERP systems so they are usually connected to the plant or the corporate network,
which again simplifies the effort of transferring the data to the cloud over the internet. In
general, they are quite reliable compared to SCADA systems because their connectors,
which gather data from different data sources, have built-in data buffering abilities and
store-and-forward mechanisms. This means that, in general, they are quite resilient to
network instability or unexpected downtime of the Historian server. The reliability of the
data that is stored in the Historian depends almost entirely on the availability and
reliability of the device from which the data comes.
Disadvantages of Historians
Historians were developed to collect data coming from the whole plant and often from all
plants of the same company. In general, they are designed to manage and make available a
large amount of data. This means that optimizations are made with respect to data
sampling and storage. This might produce the following disadvantages:
The data collected by the Historians might be a subset of the data collected by
SCADA or the different PLC concentrators; sometimes, not all the data that is
available with other data-gathering techniques is collected and made available by
Historians.
Data is stored using filtering and compression mechanisms. This implies that
when we get data from Historians, we do not get the raw data, nor a snapshot
of an asset at a specific time as an outcome. This is something
that we must deal with, since typically the data-stream analytics running in the
cloud work with snapshots of data that are temporally consistent and related to
an asset.
The data-collection configuration of Historians usually implements a lower
sampling rate than SCADA systems to minimize the bandwidth consumption
and optimize the data storage. This means you might have a data collection time
of a minute or even more. This is something that you have to take into
consideration when developing analytics on the cloud.
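Rebuilding temporally consistent asset snapshots from compressed, unaligned series could be sketched with a sample-and-hold read. The function names are ours, and this is only one possible alignment strategy:

```python
def snapshot(series, at_ms):
    """Sample-and-hold read of one compressed tag: return the last stored
    value at or before at_ms (None if nothing was stored yet)."""
    value = None
    for ts, v in series:      # series is a time-ordered list of (ts, value)
        if ts > at_ms:
            break
        value = v
    return value

def asset_snapshots(tags, interval_ms, end_ms):
    """Rebuild temporally consistent snapshots of an asset from
    Historian-style series stored at unaligned timestamps."""
    times = range(0, end_ms + 1, interval_ms)
    return [{"t": t, **{name: snapshot(series, t) for name, series in tags.items()}}
            for t in times]

# Two tags of the same asset, compressed and stored at unaligned timestamps
tags = {
    "pressure": [(0, 1.0), (130, 1.2)],
    "flow":     [(40, 30.0), (210, 28.0)],
}
snaps = asset_snapshots(tags, interval_ms=100, end_ms=200)
```

Each snapshot holds, for every tag, the last value known at that instant, which is the kind of temporally consistent view that cloud analytics expect.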
Edge on fieldbus
Edge on OPC DCOM
Edge on OPC Proxy
Edge on OPC UA
OPC UA on the controller
These five options do not cover all possible scenarios of edge deployment in a plant
network to gather data from a factory, but they are the main ones; all other possibilities can
be derived from these or are combinations of them. In this section, we aren't going to take
into account cyber security requirements or constraints, as the ICS standards and their
associated best practices will be covered in Chapter 5, Applying Cybersecurity. Let's start by
taking a look at the first option—the edge on fieldbus setup.
Edge on fieldbus
This means that the edge device must implement and incorporate the connectors to the
fieldbus protocols, such as EtherNet/IP, PROFINET, or MODBUS, in order to gather the
data to be transferred to the cloud. At first sight, this architecture seems very simple, but it's
actually not that easy to implement and maintain.
The strengths of this setup include the following:

We have a direct connection to the PLC and the DCS, which provide the best
data-acquisition mechanisms and the most accurate time-series databases, as we
saw in the previous sections.
One device implements both the data collection and the data transfer with no
dependencies on other devices or applications. This means that there is just one
point of failure and potentially better availability and reliability.
There are just two interfaces with the external systems. The first is toward the
controllers, the PLC, or the DCS, and is steady and reliable by design. The second
is toward to the cloud, which is also reliable due to the data-buffering capabilities
and the store-and-forward mechanism implemented by the edge device.
The weaknesses of this setup include the following:

There is strong coupling with the controllers. The direct connection with the
controllers forces the edge to implement and maintain a connector for each
fieldbus protocol. This means it is susceptible to the previously discussed
disadvantages.
There is no single PLC or DCS that controls the whole industrial plant. A PLC
concentrator, like a DCS, controls just a single area of the plant. The related
control networks are not linked with each other by design as they are
deterministic networks. This would also make them vulnerable to security
breaches. As we saw in the previous chapters, their information is joined and
merged at a higher level by the SCADA and Historian systems. Because of this,
an edge device must be deployed for each control network of the industrial plant
from which the data is to be gathered. The scattering of the control networks
forces us to install more edge devices, leading to more complicated deployment
and maintenance.
In this setup, the edge device is connected at the same time to both the internet
and the control network. According to the ICS standards, we should not connect
a device that is connected to the internet directly to the control network; it should
instead be placed in a demilitarized zone (DMZ). This means that the edge
device should be separated by two firewalls, the first of which controls the
interface toward the internet and the second of which controls the interface
toward the control network. The isolation of the control network through
firewalls is not easy. Special firewalls, called operational firewalls, are required,
which have packet-inspection capabilities for the specific fieldbus protocol used.
Even with these firewalls in place, it is common that customer policies and
standards do not allow this type of connection.
Edge on OPC DCOM
This means that the edge must implement and incorporate the OPC Classic client interface.
The data source could be one of those mentioned previously—a PLC, a DCS, a SCADA
system, or a Historian. The OPC abstracts the data source, which is why, in the preceding
diagram, the OPC server has a dotted link with each of the potential data sources.
Obviously, the OPC server must be specific to a particular data source, but, from
the perspective of the client and the edge, this does not matter. Whichever data source is
connected, the OPC server will expose its data through an OPC Classic interface to which
the edge can connect through its OPC Classic client interface. This is the OPC-DA client
interface for the PLC, DCS, and SCADA systems, and the OPC-HDA interface for the
Historian system. According to the underlying data source abstracted by the OPC server,
we will have the related advantages and disadvantages that were discussed and analyzed
in the previous section.
The strengths of the edge in OPC DCOM setup include the following:
The OPC interface abstracts the underlying device that acts as the data source.
This means that there is no need to implement and maintain a connector for each
fieldbus protocol to which the edge connects.
The OPC server is typically linked to more than one PLC or DCS that have the
same underlying industrial protocols. This means that a single OPC server can
collect data coming from more than one PLC, thereby simplifying the edge
deployment.
Since the edge is not connected directly to the control network, it is easier to
deploy it in a DMZ, as stipulated by the ICS standards.
The weaknesses of this setup include the following:
There is DCOM communication between the edge and the OPC server, which
means we have to deal with DCOM issues related to configuration, performance,
and maintenance.
Since the edge uses the OPC Classic interface, it can only run on a Windows box.
For security reasons, the edge and the OPC server must run on different
networks that are segregated by means of firewalls. By default, however, DCOM
assigns ports dynamically from the TCP port range of 1024 to 65535, and it is
not easy to restrict the number of TCP ports used by DCOM to allow its traffic to cross the firewalls.
An OPC Proxy enables an OPC Classic server, which is based on the COM/DCOM architecture, to be connected to OPC UA products, and it can manage the traffic in both directions at the same time. In one direction, it acts as a wrapper: an OPC UA server that gets its data from an internal COM client. The wrapper is needed to connect OPC UA clients to OPC Classic servers. In the opposite direction, it acts as a proxy: a COM-based server that gets its data from an internal OPC UA client. The proxy is needed to connect OPC Classic clients to OPC UA servers. It can also be used to tunnel COM/DCOM-based OPC traffic through a secure OPC UA connection across a firewall.
In this setup, the OPC Classic server collects data from the related underlying device. The data is gathered through DCOM by the OPC Proxy linked to the same network as the OPC server, and is then tunneled through a TCP channel across the network boundary to reach the second OPC Proxy on the other side. This OPC Proxy exposes the data through its internal OPC UA server, making it available to the OPC UA client interface implemented by the edge, from which the data is later transferred to the cloud. As in the previous scenario, the data source could be a PLC, a DCS, a SCADA system, or a Historian, since OPC abstracts the data source.
The strengths are the same as those of the previous one, plus the following:
There is no need to move the DCOM traffic between firewalls since this occurs
solely between the OPC server and the OPC Proxy in the same network. Thanks
to the tunneling, just the TCP traffic between the two OPC Proxies on opposite
sides must be managed by the firewalls.
The OPC Proxy abstracts the OPC interfaces. The edge that is implementing the
OPC UA client interface can perform OPC UA reading requests even if there is
an OPC Classic server on the other side, and vice versa.
The OPC Proxy can act as an OPC concentrator, gathering the data from several
OPC Classic or UA servers and making it available through a unique endpoint.
The edge only has to implement an OPC UA client interface, so it doesn't have to
run on a Windows box.
The weaknesses include the following:
The OPC Proxy implements the OPC Classic interface so it can only run on
Windows boxes.
The OPC Proxy extends and makes the acquisition chain more complex by
introducing additional hosts and applications, therefore increasing the potential
points of failure and the related maintenance necessary.
Edge on OPC UA
A direct connection to an OPC UA server is definitely the preferred scenario as this allows
us to make the most of the features and capabilities of OPC UA. It can be deployed in two
different ways. In the first setup, the edge is directly connected to the OPC UA server by
means of its OPC UA client interface:
Edge on OPC UA
The data source can be one of the ones analyzed previously: a PLC, a DCS, a SCADA
system, or a Historian. OPC abstracts the data source, which is why, in the preceding
diagram, the OPC UA server shows a dotted link for each of these potential data sources.
Obviously, in this scenario, as in the previous one, the OPC server must be specific to that data source, but from the perspective of the client and the edge, this does not matter.
The strengths of the edge on the OPC UA scenario include the following:
The OPC interface abstracts the underlying device that acts as a data source. This
means there is no need to implement and maintain a connector for each fieldbus
protocol that the edge connects to.
The OPC UA server is typically linked to more than one PLC or DCS that have
the same underlying industrial protocols. A single OPC server can collect the
data coming from more than one PLC, therefore simplifying the deployment of
the edge. Moreover, the OPC UA server can act, through its discover service
function, as an OPC UA server concentrator. This means that it can also expose
the data coming from OPC UA servers that are linked to different underlying
industrial protocols.
Since the edge is not connected directly to the control network, it is easier to
deploy it in a DMZ, as stipulated by the ICS standards.
There is no COM or DCOM communication between the edge and the OPC UA
server. This means that there is no need to deal with the DCOM communication
mechanism based on TCP dynamic ports.
Since the edge must implement just the OPC UA client interface, it is not
restricted to running just on a Windows box.
The weaknesses include the following:
The OPC UA server gathers data from several different data sources. For this reason, it becomes a crucial device from the perspective of the I-IoT data flow. If it experiences downtime, the entire I-IoT data flow breaks and we lose data from all the data sources to which the OPC UA server is connected.
In case of partial data loss, it is not easy to understand and identify which
underlying data sources are involved. It is also difficult to identify the issue
causing the interruption.
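One way an edge implementation can make partial data loss traceable is to record a failure marker per node instead of silently skipping failed reads. The following is a minimal sketch: the OpcUaClient protocol and the node IDs are hypothetical, and a real edge would wrap an actual OPC UA client library behind this interface:

```python
from typing import Dict, List, Optional, Protocol


class OpcUaClient(Protocol):
    """Hypothetical minimal read interface; a real implementation
    would wrap an OPC UA client library."""

    def read(self, node_id: str) -> float: ...


def poll_tags(client: OpcUaClient, node_ids: List[str]) -> Dict[str, Optional[float]]:
    """Read each configured node once, recording None on failure so that
    a partial data loss can be traced back to the data sources involved."""
    samples: Dict[str, Optional[float]] = {}
    for node_id in node_ids:
        try:
            samples[node_id] = client.read(node_id)
        except Exception:
            samples[node_id] = None  # mark which underlying source failed
    return samples
```

With this pattern, a gap in the transferred data can be attributed to specific nodes, rather than manifesting as an unexplained hole in the whole flow.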
OPC UA on controller
The hosting of an OPC UA server directly on the controller is an option that several PLC
vendors are starting to offer. Typically, the OPC UA board has a Linux kernel that hosts the
OPC UA server, which is connected to the controller by means of its internal bus. This
setup, like the previous one, uses the OPC UA interface. In terms of its strengths and
weaknesses, therefore, it is very similar. There are a few main differences, which are listed
as follows:
The OPC UA server running on the controller only makes the data of that
controller available. Due to the segregation and isolation of the control networks,
an edge device would not be able to reach all the OPC UA servers that are hosted
by the different controllers.
There are no issues involved with deploying a firewall between the edge and the
OPC UA server running on the controller. The OPC UA traffic is made up of TCP
or HTTPS traffic, which is quite easy to manage by means of the firewall policies.
On the other hand, if its security mode is set to None, the OPC UA server neither authenticates its OPC UA clients nor secures the communication channels they open. As a consequence, it might not be easy to convince field engineers to connect the PLC to a device, such as an edge, that is connected to the internet.
Summary
In this chapter, we learned how to implement the I-IoT data flow in a complex industrial
plant. We started by looking at OPC, including what it is, how it has evolved, and how it
works in both its implementations—OPC Classic and OPC UA. We explored how it
connects to data sources to expose their data through common interfaces, following a
common data model. We then went into detail about the edge, analyzing its features,
internal components, architecture, interfaces, and current implementation.
After that, we started our journey on the I-IoT data flow. We looked at how to select a data
source to connect to in order to gather data. We explored controllers, such as the PLC or the
DCS, a SCADA system, and a Historian system. Each of these options has its own
advantages and disadvantages; none is inherently better than the others. The choice must be made
depending on the capabilities and constraints that exist in our plant and the requirements
of our specific use case.
We then proceeded on our journey through the I-IoT data flow by analyzing the different
setups for connecting the edge to selected industrial data sources. We looked in depth at
five different setups, highlighting the strengths and weaknesses of each. In general, setups
that use the capabilities of OPC UA are preferable, but this is not always possible, especially
when dealing with old equipment. In such cases, OPC Classic, with or without an OPC Proxy, is the only option.
In the next chapter, we will learn how to secure these edge deployment setups from the
perspective of cybersecurity.
Questions
1. What is the main difference between OPC Classic and OPC UA?
1. OPC Classic uses COM/DCOM, while OPC UA is platform-independent
2. OPC Classic uses TCP, while OPC UA uses HTTP
3. OPC Classic does not have a data model, while OPC UA does
2. What is the mechanism used by DCOM to send and receive information between
COM components in a transparent way on the same network?
1. Marshalling
2. Remote Procedure Call
3. Dynamic TCP ports
3. What is the main advantage of using OPC rather than the legacy industrial
protocols of the device or equipment?
1. OPC allows us to query for industrial data in the most efficient way
2. OPC allows us to query for the industrial data by means of a Windows
box
3. OPC abstracts the connected device, providing standard and common
interfaces to query its data
4. How can we secure the communication channel between an OPC UA client and
an OPC UA server?
1. If the security mode is None, the channel will be set using all the
security mechanisms available
2. If the security mode is Sign, the OpenSecureChannel is sent using the
PrivateKey of the client Application Instance Certificate as a signature
6. What is the main difference between an IoT Edge and an I-IoT Edge?
1. The IoT Edge has to deal with standard protocols, while the I-IoT Edge
has to deal with legacy protocols
2. The I-IoT Edge has to manage more signals than the IoT Edge
3. The I-IoT Edge must deal with cybersecurity constraints, while the IoT
Edge can be deployed without taking care of these
7. What is the best industrial data source for gathering the data to be sent to the
cloud?
1. PLC or DCS are the best since they manage all signals at a better
sampling rate and resolution
2. Historians are the best since they collect all data coming from all areas
of the plant
3. The best choice depends on the specific plant environment and the
requirements of the specific use case
8. What is the main advantage of implementing the edge using an OPC Proxy?
1. The OPC Proxy can act as an OPC concentrator, gathering the data
from several OPC Classic or UA servers and making it available
through a unique endpoint without the need to cross DCOM traffic
between network boundaries
2. The OPC Proxy can run on Windows boxes
3. The OPC Proxy makes the data-acquisition chain simpler
9. Why do we often choose to gather data through an OPC Classic interface, rather
than an OPC UA interface?
1. Because OPC Classic makes a more efficient mechanism available for
querying the data from the underlying device
2. Because OPC Classic exposes a simpler information model
3. Because we often have to deal with old devices, for which OPC UA is
not available
Further reading
Additional resources can be found at the following websites:
5
Applying Cybersecurity
Cybersecurity is a very hot topic and is becoming increasingly important, not just from an
economic perspective but also from a political and social perspective. The economic impact
of cybersecurity is easy to understand: if information technology infrastructure is
compromised in some way, it is logical to expect some kind of economic impact. What is
not so apparent is the importance that cyber security has from a political and social
perspective. The Cambridge Analytica scandal is representative of the importance of
cybersecurity from a political perspective. In this case, the personal data of several million
Facebook users was used, or at least there was an attempt to use it, to influence the US elections. If you take a look at the international arena, the battles being fought now have more to do with cyberwar than with traditional conflict. Many companies manage and own
tons of personal data. Analysis of this issue would require a book in itself and is beyond the
scope of this book, but it is important to bear this in mind when talking about cybersecurity
and data protection.
These concepts are certainly not new; they come from the military. We assume that if an attack is successful, this represents the failure of one security mechanism, and other security mechanisms should intervene to provide adequate protection for the whole system. This is quite logical if we consider that it is more complicated for an enemy to penetrate a complex defensive structure made up of a number of layers than a single barrier.
This reasoning could lead an organization or company to pile up technologies, solutions, and technological, operational, and organizational measures and countermeasures. However, this also leads to an exponential increase in complexity and costs, without any guarantee of actual results, and even with the risk of counterproductive effects. The way in which the security of company information is dealt with should be established by evaluating the tradeoff between risk and expected profit.
Let's now examine, one by one, the three key elements that comprise Defense in Depth (DiD): people, technology, and operating methods.
People
In a company, it is a good idea for information security to be the responsibility of people at
a high level. Senior management must have an in-depth knowledge of cyber risks and
threats. This awareness is the foundation for the following actions:
Issuing policies
Defining procedures
Assigning roles and responsibilities
Training personnel
Implementing physical security measures to protect the technological
infrastructure
[ 159 ]
Applying Cybersecurity Chapter 5
Technology
DiD involves the spread of technological barriers, a multi-level approach, and the implementation of mechanisms that detect intrusions. The main technological measures include the following:
Firewalls
Antivirus and/or anti-spyware software
Intrusion Detection System (IDS) and Intrusion Prevention System (IPS); these
could either be integrated into firewalls or added as independent systems
Cryptography
The definition and application of clear rules to access systems
A hierarchical password system and software that monitors the integrity of the
files
Authentication can occur using techniques linked to biometrics. In this context, Public Key
Infrastructure (PKI) can be implemented at the company level to manage the access keys to
the various components of the infrastructure robustly.
It is important to emphasize the detection aspects that support protection from intrusions. Besides analyzing activities, these must answer the following questions:
Am I under attack?
What is the source of the attack?
What is the goal of the attack?
Which countermeasures can I put in place?
The preceding list of activities is not exhaustive. There are many tools that can be used, ranging from reactive to proactive: preventive measures, remediation techniques, forensics, and even intelligence techniques.
Given the breadth of the solutions, following the directions of DiD in an indiscriminate style could increase the complexity of the whole system, violating the principle of simplicity, which is very often touted as a best practice in security environments. The addition of new security layers and features increases complexity, which, paradoxically, entails new risks. What must guide the decision-making process in the business environment is, as always, a balance. But how do we go about making the correct choices? The answer lies in risk assessment. The priorities for investments in the security sector must be dictated according to the risks to the company.
DiD is not just one thing, but a combination of people, technology, operations, and, last but
not least, adversarial awareness. The best technology in the world will not prevent humans
from making mistakes. Applying DiD strategies to ICS environments improves security by
making intrusions more difficult and simultaneously improving the probability of detection
and the ability to defend against a malicious threat.
The end goal is to reduce the opportunities for an adversary and decrease the potential
areas of attack, forcing the attacker to have greater abilities in order to accomplish their
malicious goal. Some of the available and recommended solutions and strategies for DiD
security are listed here. The reader can refer to the Further reading section of this chapter for
a more in-depth explanation of all of these. The DiD strategy elements are as follows:
Firewalls
One of the best practices of the DiD strategy is to isolate the Control Network (CN), which
is also often called the Process Control Network (PCN), from the corporate and internet
systems using firewalls. While firewalls are widely used in the traditional IT sector, their
adoption in CN/PCN environments is quite recent. Most IT firewalls are generally unaware
of industrial-control protocols and may introduce unacceptable latency into time-critical
systems. They may also face operational constraints that are not typical in the IT world. The
reality is that firewalls can be complex devices that need careful design, configuration, and
management to be efficient and effective. In this section, we are going to look at some basic
information about firewalls and how they are usually deployed in the factory to segregate
the control network and protect industrial devices.
Basically, a firewall is a mechanism used to control and monitor traffic to and from a
network to protect the devices on that network. It checks the traffic passing through it
and ensures that the network messages fit predefined security criteria or policies. Messages
that don't meet the policies are discarded. It can also be considered a filter that blocks
unwanted network traffic and forces specific constraints on the amount and type of
communication that occurs between a protected network and other networks. A firewall
can exist in different shapes and configurations. It can be a specific hardware device that is
physically connected to a network, a virtual appliance with firewall capabilities running on
a hypervisor, or even a host-based software solution. Separate hardware and software
devices are often referred to as network firewalls and typically provide the most robust and
secure solution and the best management options. From now on, when we use the term
firewalls, we are referring to this latter kind.
Network traffic is sent in discrete sequences of bits, called packets. Each packet contains
separate pieces of information, including, but not limited to, the following:
The source and destination IP addresses
The source and destination TCP or UDP ports
The protocol being transported
A firewall, upon receiving a packet, analyzes the preceding characteristics to establish the proper action to take. It may drop the packet, buffer it temporarily to limit the bandwidth usage according to a class of service, or forward it toward its recipient. The firewall behavior is based on a set of rules commonly referred to as Access Control Lists (ACLs). There are different classes of firewalls depending on their analysis and action capabilities; the main ones are the following:
Packet filter firewall: This examines each packet in isolation, allowing or dropping it according to static rules based on the addresses, ports, and protocol in its headers.
Stateful inspection firewall: This works like a packet filter firewall but, in
addition, tracks all TCP connections that are open from and to the outside. It
explicitly models the TCP session concept, allowing us to define rules on this
basis. For example, we can automatically accept all packets that come from a
previously authorized TCP session.
Application proxy firewall: In the application proxy firewall, there is no direct connection between the machines inside and outside the network; instead, there are two separate connections. The proxy works at the application level, handling protocols such as the Hypertext Transfer Protocol (HTTP) or the File Transfer Protocol (FTP). It receives requests according to these protocols and forwards or
blocks them, depending on the configuration. This means that all internal
machines and clients are forced to go through the proxy since direct access to the
external servers is blocked.
Deep-packet-inspection (DPI) firewalls: This is an emerging trend in the
firewall domain. These typically offer filtering deeper into the application layer
than a traditional application proxy, by allowing or blocking the packets
according to their semantics. They are specialized in understanding the industrial
protocols. For example, a DPI firewall for the OPC Classic protocol could be
configured to only allow reading of the underlying device, blocking any attempt
at writing.
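To illustrate how rule-based behavior works, the following sketch shows a first-match, default-deny packet filter in the style of a stateless ACL. The field names and the sample rule are illustrative, not taken from any specific firewall product:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """One ACL entry; None means 'match any value' for that field."""
    action: str                      # "allow" or "drop"
    protocol: Optional[str] = None   # e.g. "tcp", "udp"
    dst_port: Optional[int] = None


def evaluate(rules: List[Rule], protocol: str, dst_port: int) -> str:
    """First-match-wins evaluation with an implicit final 'drop' rule."""
    for rule in rules:
        if rule.protocol not in (None, protocol):
            continue
        if rule.dst_port not in (None, dst_port):
            continue
        return rule.action
    return "drop"  # nothing matched: default-deny


# Illustrative policy: allow outbound HTTPS, drop everything else.
acl = [Rule("allow", protocol="tcp", dst_port=443)]
```

The default-deny fallthrough mirrors the golden rule that only explicitly needed traffic should cross the boundary; a stateful firewall would add connection tracking on top of this basic match.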
Basically, the goal of the firewalls is to minimize the risk of unauthorized access and
unwanted network traffic to the internal devices on the PCN. The risk minimization
strategy is based on a few golden rules:
There should be no direct connection from the internet to any device linked to
the PCN.
Access from the corporate or plant network to the PCN must be restricted to
what is really needed after an in-depth analysis of the possible alternatives.
The remote support of control systems should only be allowed if secure methods
of authorization are in place.
If wireless devices are used, secured connectivity must be implemented.
Rules and policies must be well-defined, indicating the type of traffic allowed
between the networks.
There should be regular monitoring of the traffic coming in and going out of the
PCN.
There should be a secure communication channel for the management of the
firewall.
Common control-network-segregation
architectures
This section outlines the most common security practices currently used in industrial-control environments in terms of the architecture, design, deployment, and management of the firewalls that separate the PCN from the corporate network.
Firewalls on the market offer stateful inspection for all TCP packets and application proxy
services for common internet application protocols such as FTP, HTTP, and SMTP. If they
are well configured, they can significantly mitigate the risk of a successful external attack
on the control network.
Even so, in this scenario, there is an issue related to which network the servers that need to
be shared between the corporate and the control network are on. For this reason, the data
historian in the preceding diagram appears on both networks but is grayed out on the
control network. If a shared server, such as the data historian or an OPC server, resides on
the corporate network, a rule must exist within the firewall that allows the historian or OPC
traffic to communicate with the control devices on the control network. A packet
originating from a malicious or incorrectly configured host on the corporate network and
appearing to be a legitimate packet (data historian or OPC) would be forwarded to individual PLCs.
If the shared server resides on the process control network, a firewall rule must exist to
allow all hosts from the corporation to communicate with it, putting the shared servers at
risk of exploits or spoofing.
In this scenario, the firewall positioned in the corporate network blocks arbitrary packets from proceeding to the control network or the shared historians. The other firewall
prevents unwanted traffic from a compromised server from entering the control network.
In many cases, there are functional areas or working cells in the control network where
inter-area communication is not necessary or simply not wanted. Splitting and separating these areas into a number of VLANs means that any inter-VLAN communication can be controlled by simple packet filters using layer 3 and layer 2 switches, as shown in the
following diagram:
In this scenario, VLANs help to prevent the propagation of unwanted traffic across the
whole control network.
In the previous chapter, we analyzed five different setups for connecting the edge to the industrial data sources:
Edge on fieldbus
Edge on OPC DCOM
Edge on OPC Proxy
Edge on OPC UA
OPC UA on controller
We have not yet considered the cybersecurity requirements and constraints for each of
these options. In this section, we will understand how to secure them from a networking
perspective, according to the standards of the ICS and the related best practices. As we
outlined in the previous Common control-network-segregation architectures section, securing the control network is just one of the recommendations of the DiD strategy that can be used to mitigate the cyber risks of the whole control system environment. There are other best
practices and specific countermeasures to implement to create an aggregated, risk-based
security posture to defend the control systems against cybersecurity threats and
vulnerabilities. Such analysis would require its own book and is beyond our scope here. For
this reason, we have restricted our analysis of the mitigations of the cyber risks in the ICS to
the network architecture, since this plays a key role in the I-IoT data flow. However, the
reader can look at this topic in more depth by checking out the links provided in the Further
reading section.
For each of the preceding five options, the starting point will be the network schema that
we already discussed for the related edge deployment. It will be modified to segregate the
control network, mainly through the creation of DMZs that host the shared devices.
Edge on fieldbus
The most efficient way to secure this setup is to place the edge device in a DMZ by means
of two firewalls. The first one should control the interface toward the outside and the
second one should control the interface toward the control network, as shown in the
following diagram:
To isolate the control network from the edge device, a DPI firewall must be inserted
between the two. Since the DPI firewall provides deep-packet inspection at the application
level, it can really understand the meaning of the packets passing through it, according to
the specific control protocol. A typical use case is to secure a Modbus control network,
allowing only reading, and blocking any attempt at writing to the control devices linked to
it. A DPI firewall for the Modbus protocol could be deployed as described before with rules
that allow just read commands across the firewall and drop all packets with invalid or
unauthorized Modbus function codes. The other firewall that makes up the DMZ zone
would manage the rules to allow the HTTPS traffic generated by the edge device toward
the outside. There are two main difficulties associated with this setup:
DPI firewall availability: A DPI firewall for managing and filtering the traffic of
the specific industrial protocol may not be available or it may be unable to
implement all the rules needed for securing the required scenario.
Customer policy: Internal customer policies may not allow us to use the same
device to pull the data from the control network and push it at the same time
over the internet.
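The read-only Modbus rule described above can be sketched as a minimal DPI check. The function codes come from the Modbus specification; a real DPI firewall would of course validate far more than the function code:

```python
# Sketch of a DPI rule for Modbus/TCP: allow only read function codes,
# drop writes. Function codes are defined by the Modbus specification.
READ_CODES = {0x01, 0x02, 0x03, 0x04}    # read coils/inputs/registers
WRITE_CODES = {0x05, 0x06, 0x0F, 0x10}   # write coil(s)/register(s)


def allow_modbus_frame(frame: bytes) -> bool:
    """Return True if a Modbus/TCP frame carries a read-only request.

    A Modbus/TCP ADU is a 7-byte MBAP header (transaction id, protocol id,
    length, unit id) followed by the PDU, whose first byte is the function
    code. Anything too short or not a read request is dropped.
    """
    if len(frame) < 8:
        return False          # too short to contain a function code
    function_code = frame[7]  # first PDU byte, after the MBAP header
    return function_code in READ_CODES
```

Note that the default is to drop: frames with invalid, unauthorized, or write function codes never reach the control devices.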
Edge on OPC DCOM
The only way to secure this scenario is to place the edge device in a DMZ by means of two
firewalls, the first one controlling the interface toward the outside and the second one
controlling the interface toward OPC Classic. This is shown in the following diagram:
In this scenario, DCOM traffic occurs between the edge device and the OPC Classic device.
The DCOM traffic is not easy to manage by means of a firewall, since the DCOM
communication is based on the opening of dynamic TCP ports. Since the OPC server might
use any port number between 1024 and 65535, any firewall placed between OPC Classic
and the edge must necessarily allow the traffic through all those ports, making the firewall
useless. Therefore, in order to keep the DCOM traffic under effective control, we have the
following two options:
Follow the Microsoft suggestion of limiting the range of port numbers that are dynamically allocated, by modifying the Windows registry of the box where the OPC Classic server runs. Unfortunately, this solution makes the configuration more complex for the system administrator, because each OPC host needs to have its Windows registry adjusted. Furthermore, testing has indicated that some OPC Classic server products do not work properly with this technique.
Use the port number and protocol limitation provided by some OPC Classic
implementations. Unfortunately, not all vendors of OPC products offer this
option.
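As an illustration of the first option, Microsoft documents restricting dynamically allocated RPC/DCOM ports through registry values under the Rpc\Internet key, roughly as follows. The port range shown is only an example and must match the range opened on the firewalls; a reboot is required for the change to take effect:

```
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\Internet]
"Ports"                  (REG_MULTI_SZ)  5000-5020
"PortsInternetAvailable" (REG_SZ)        "Y"
"UseInternetPorts"       (REG_SZ)        "Y"
```

With these values in place, DCOM allocates its dynamic ports only from the configured range, so the firewall rules can be limited to that range instead of the whole 1024 to 65535 span.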
To isolate the edge device from OPC Classic effectively, a DPI firewall for OPC Classic
must be used. Since the DPI firewall provides deep-packet inspection at the application
level, it can really understand the meaning of OPC Classic packets passing through it. The
difficulties in managing this scenario are basically the same as those mentioned in the
previous Edge on fieldbus section. We should also consider the constraints and limitations
related to the use of DCOM communications. In general, the edge on OPC DCOM is not an
easy setup to deal with, so it is advisable to introduce an OPC Proxy.
Edge on OPC Proxy
This setup is much easier to secure from a networking perspective than the previous one.
Since the DCOM traffic is tunneled through TCP by means of the OPC Proxy, two
important goals from a security perspective are achieved:
The edge does not interface with OPC Classic but instead with the OPC Proxy.
This means there is no need to deal with the settings, configuration, and security
of the DCOM. The lower boundary of the edge device toward the control
network is the OPC UA internal server provided by the OPC Proxy.
DCOM traffic occurs just on the internal Ethernet network between the internal
OPC Proxy and the OPC Classic server. There is no DCOM traffic passing
through firewalls, so there is no need to deal with the limitations of the TCP port
opening or the Windows registry settings.
Then, the external OPC Proxy can be easily deployed in a DMZ by means of a three-way
firewall, as shown in the following diagram. The three-way firewall links the external OPC
Proxy to one of its interfaces and, by means of switches, connects the internal OPC Proxy to
another interface and the edge device to a third one:
In this setup, by placing the external OPC Proxy in a DMZ and tunneling the DCOM traffic
through TCP, three important goals are achieved from a security perspective:
The OPC Classic server does not receive any requests coming from the devices
outside its network. All requests to the OPC Classic server pass through the
internal OPC Proxy linked to the same Ethernet network.
There is no need to move DCOM traffic across the network boundaries. DCOM
traffic only occurs between the OPC Classic server and the internal OPC Proxy,
which are linked to the same Ethernet network.
The external OPC Proxy is in a DMZ. This means that no request comes directly
from the outside to any device on the control network.
Edge on OPC UA
This setup uses the security model provided by OPC UA, which is firewall-friendly. It
needs a firewall to filter and secure the HTTPS/TCP traffic generated by the requests
coming from the edge device, which is placed outside the network to which the OPC UA
server is linked. Deploying another firewall between the edge and the network linked to
the internet creates a DMZ for the edge deployment, as shown in the following diagram:
OPC UA has been designed for devices with very different computational capabilities.
According to the OPC UA standard, a device is not required to implement all of its
features; the implementation level is left to the vendor. This setup therefore requires a
risk analysis to understand whether, and how well, the vendor's implementation of the
OPC UA security model fits the security requirements of the specific scenario.
Eventually, an additional DPI firewall could be deployed between the OPC UA server and
the control network to further segregate the OPC UA server.
OPC UA on controller
This setup, like the previous one, uses OPC UA interfaces exclusively. In this scenario, a
firewall must be placed to filter and secure the HTTPS/TCP traffic generated by the
requests coming from the edge device, which is placed outside the network to which the
OPC UA server is linked. Deploying another firewall between the edge and the network
linked to the internet creates a DMZ for the edge deployment, as shown in the following
diagram:
Unfortunately, in this setup, there is no option to deploy a DPI firewall to segregate the
OPC UA server further from the control network, since the communication between the
OPC UA server and the controller occurs through the internal bus of the controller itself.
There are two additional considerations that should be taken into account with regards to
this setup:
In any case, since this setup exclusively uses the security model provided by OPC UA, a
careful analysis of the OPC UA security model implemented by the vendor must be carried
out to make sure that it is able to cover the security requirements of the specific use case.
Summary
In this chapter, we outlined the DiD approach. You learned that the goal of a DiD strategy
is to achieve a security posture through the coordinated and combined use of multiple
security countermeasures, based on two main concepts: defense in multiple places and
layered defenses. We looked at how DiD is based on the integration of three
different elements: people, technology, and operating methods. Since firewalls are an
important part of securing the control network, we also provided a short description of the
different classes of firewall. After that, we explored the most common architectures to
secure the industrial devices linked to the control network.
Following this, we looked at how to segregate a control network by means of DMZ and
VLAN. We examined the most common security practices currently used in the industrial
control environment and analyzed the five options for connecting the edge to the industrial
data sources from a cybersecurity point of view. We also looked at how to secure these
from a networking perspective, according to the ICS standards and the related best
practices.
In the next chapter, we will discover how to implement a basic data flow with OPC UA and
Node-RED.
Questions
1. What are the three main elements that make up a DiD strategy?
1. People, technology, and operating methods
2. Firewall, antivirus, and people
3. Patching, physical barrier, and people
2. Which is the main feature that differentiates a stateful firewall?
1. Packet filtering
2. TCP session modelling
3. Packet inspection at the application layer
5. What is the main security constraint of the edge in an OPC DCOM deployment
setup?
1. Allowing DCOM traffic to cross the firewall
2. Using a DPI firewall for filtering OPC packets
3. Building up a DMZ to segregate the OPC Classic server
6. What is the main advantage of the edge on OPC Proxy deployment setup?
1. Tunneling the DCOM traffic through TCP
2. Tunneling the DCOM traffic through TCP and putting the external
OPC Proxy in a DMZ
3. Segregating OPC Classic, since all requests pass through the internal
OPC Proxy
7. What is the main feature from a security perspective of the edge in an OPC-UA
deployment setup?
1. Using a DPI firewall
2. Using the OPC-UA security model
Further reading
Additional resources can be found at the following links:
Backdoors and Holes in Network Perimeters: Case Study for Improving ICS
Security: https://ics-cert.us-cert.gov/sites/default/files/recommended_
practices/CSSC-CaseStudy-001_S508C.pdf
Understanding OPC and How it is Deployed: https://www.tofinosecurity.
com/professional/opc-security-white-paper-1-understanding-opc-and-how-
it-deployed
OPC Exposed: https://www.tofinosecurity.com/professional/opc-security-
white-paper-2-opc-exposed
Guidelines for Hardening OPC Hosts: https://www.tofinosecurity.com/
professional/opc-security-white-paper-3-hardening-guidelines-opc-hosts
Security Implications of OPC, OLE, DCOM, and RPC in Control
Systems: https://ics-cert.us-cert.gov/sites/default/files/recommended_
practices/Security%20Implications%20for%20OPC-OLE-DCOM-RPC%20in%20ICS_
S508C.pdf
6
Performing an Exercise Based
on Industrial Protocols and
Standards
In Chapter 4, Implementing the Industrial IoT Data Flow, we learned about the Open
Platform Communications (OPC) protocol. We looked at how the features of its original
implementation were based on the Microsoft COM/DCOM architecture. We also looked at
how it evolved into its current Unified Architecture (UA) and how it uses open and
interoperable standards to overcome constraints and security issues. After that, we learned how
to gather industrial data from different sources in different deployment scenarios, using the
Edge Gateway and OPC UA. Finally, we briefly discussed the differences between the Edge
Gateway and edge computation. In this chapter, we will look at how to implement a basic
data flow with OPC UA and Node-RED.
In this chapter, we will cover the following topics:
Prosys OPC UA
Node-RED Edge Gateway
Technical requirements
In this chapter, we will need the following prerequisites:
Git: https://git-scm.com/downloads
Node.js: https://nodejs.org
Docker (optional): https://www.docker.com/products/docker-desktop
Grunt: https://gruntjs.com/
OPC UA Node.js
The OPC UA implementation for Node.js (http://node-opcua.github.io/) is one of the
most popular libraries. It supports both the client and the server. In this exercise, we will
use the library indirectly when we work with Node-RED. We will also reuse it when we
work with AWS Greengrass.
The Prosys OPC UA Simulation Server publishes three simple random measures, called
Temperature, Percentage, and MyVariable2. We will use this simple server for the
exercises in this book.
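Since we will refer to these simulated measures throughout the exercises, the idea behind them can be sketched in a few lines of Python. This is a hypothetical stand-in (the function name and random-walk behavior are our own assumptions), not the simulation server's actual implementation:

```python
import random

def simulate_measures(steps, seed=42):
    """Generate random-walk samples for the three simulated measures."""
    rnd = random.Random(seed)  # fixed seed keeps the sketch reproducible
    values = {"Temperature": 25.0, "Percentage": 50.0, "MyVariable2": 0.0}
    history = []
    for _ in range(steps):
        for name in values:
            values[name] += rnd.uniform(-1.0, 1.0)  # small random step
        history.append(dict(values))
    return history

samples = simulate_measures(5)
print(len(samples), sorted(samples[0]))
```

Each call with the same seed produces the same sequence, which is convenient when testing a data flow end to end.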
Simulating measures
Since we don't have a real industrial data source, we have to simulate the measures. To
build a simulated measure, follow these steps:
After creating this simulated measure, we can see the signal by checking the appropriate
box in the Visualize column, as shown in the following screenshot:
The edge
We can develop our edge with different technologies and frameworks. In Chapter 10,
Implementing a Cloud Industrial IoT Solution with AWS and Chapter 12, Performing a Practical
Industrial IoT Solution with Azure, we will learn about AWS Edge (Greengrass) and Azure
Edge. We will also develop a simple Edge MQTT with the AWS SDK, Azure SDK, and GCP
SDK. To develop a simple edge to access the OPC UA server, we can use Node.js and the
OPC UA client (http://node-opcua.github.io/). In this simple exercise, we are going to
use a graphical interface called Node-RED.
Node-RED
Node-RED is an open source, flow-based development tool built on Node.js, which we can
use as an independent edge device. Follow these steps:
3. In the same directory, we need to install the support for OPC UA, as follows:
npm install node-red-contrib-opcua
4. We are now ready to work with Node-RED. From the command console, enter
the following command:
node red
You can also run the node red command from Docker (see Chapter 7,
Developing Industrial IoT and Architecture for more information). From the
command line, enter the following command:
mkdir data
cd data
npm install node-red-contrib-opcua
cd ..
docker run -it -p 1880:1880 -v $(pwd)/data:/data --name
mynodered nodered/node-red-docker
Remember to replace the localhost with host.docker.internal to
access the local OPC UA server.
We can then connect to http://127.0.0.1:1880 to see the user interface with the OPC
UA modules installed, as shown in the following screenshot:
Node-RED UI
We can now build our flow to access the OPC UA server. Follow these steps:
1. From the input on the left, we have to drag and drop the inject node into the
Flow 1 area. In the Payload field, select string and set the Repeat field to every 1
seconds. The following screenshot shows the steps to accomplish this:
2. We can configure the access to OPC UA using the OpcUa Client and the OpcUa
Item sections. We need to drag and drop the two nodes into the Flow 1 area and
connect them to each other, as depicted in the following screenshot:
3. Double-click on OpcUa Item to enable the configuration page. In the Item field,
write ns=5;s=MyDevice.Pump_01.Pressure to indicate which measure to get
from the OPC UA server. The following screenshot shows how to do this:
5. To complete our flow, we need to connect the debug node as the output.
6. Finally, we need to click on the Deploy drop-down to start our flow. The final
flow is shown in the following screenshot:
In the Debug tab, we can see the simulated measures. We can use Node-RED to send data
to the I-IoT middleware using the MQTT node.
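The Item string used earlier follows the standard OPC UA NodeId notation: a namespace index (ns=5) and a string identifier (s=MyDevice.Pump_01.Pressure). A small sketch of how such a string can be parsed — a simplified illustration of the notation, not the node-opcua parser:

```python
def parse_node_id(node_id: str):
    """Split an OPC UA NodeId string such as 'ns=5;s=MyDevice.Pump_01.Pressure'
    into its namespace index, identifier type, and identifier value."""
    parts = dict(field.split("=", 1) for field in node_id.split(";"))
    return {
        "namespace": int(parts.get("ns", 0)),  # default namespace is 0
        "id_type": "string" if "s" in parts else "numeric",
        "identifier": parts.get("s") or parts.get("i"),
    }

print(parse_node_id("ns=5;s=MyDevice.Pump_01.Pressure"))
```

Note that this sketch does not handle identifiers that themselves contain a semicolon; it only illustrates the ns=...;s=... shape used in the exercise.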
Summary
In this chapter, we have implemented our first data flow. We will reuse the Prosys OPC UA
Simulation Server in Chapter 10, Implementing a Cloud Industrial IoT Solution with AWS, and
Chapter 12, Performing a Practical Industrial IoT Solution with Azure.
In these, we will implement a more practical data flow with Azure and AWS. In the next
chapter, we will also look at I-IoT middleware. We will learn about its basic functionalities
and some key use cases that are useful in an industrial context.
Questions
1. Which of the following technologies is the most recent standard managed by the
OPC Foundation?
1. UA
2. DA
3. Modbus
2. What is the best way to make the OPC-UA proxy a secure server?
1. Enable the firewall
2. Avoid connecting through DA
3. Connect with SIGNANDENCRYPT authentication
Further reading
Additional resources can be found through the following links:
7
Developing Industrial IoT and
Architecture
In this chapter, we will learn about the basic technologies required to develop an I-IoT
platform. We will look at different use cases and how these affect your choice of
technology.
Technical requirements
This chapter will require the following software to be installed on your local PC:
A typical IoT platform differs from standard applications in the following ways:
The amount of data acquired, ingested, and processed requires high bandwidth,
storage, and computational capabilities
The devices are distributed across a vast geographical area
Businesses require their architecture to be evolutionary so that new services and
capabilities can be added daily to deploy to customers
More so than other standard applications, flexibility and scalability are vital to a typical IoT
platform. However, IoT and I-IoT both have an interesting benefit, which is the fact that the
relationships between the actors are weak. In the IoT platform, signals are
independent. Data sharding can be used for storing purposes and parallel computation can
be used to improve computational performance. On a typical ERP or e-commerce
application, the data should remain centralized and connected. The following diagram
highlights these differences:
Data structure
This feature of the IoT platform makes it more similar to a NoSQL database than a SQL
database, a microservice-based application than a monolithic application, and a cloud- or
fog-based system than a centralized system. Therefore, IoT is more related to an ecosystem
of services and applications than to an actual application.
OSGi was chosen due to its high degree of modularity. However, after two years of
development and integration with third-party applications and different languages (such as
Java, Python, C/C++, C#, Scala, and Golang), it became apparent that the spaghetti
architecture was still present. I-IoT is not a simple application, but an ecosystem of services
and computational capabilities. After a period of consideration, the authors decided to
adopt a microservices-based architecture. The question remains, therefore, what is a
microservice?
The concept of microservices was introduced by Netflix and later formalized by Newman
and Fowler. Their key benefits are as follows:
Microservices also offer another important feature, which is the flexibility to develop
evolutionary architectures.
If we make the decision to adopt microservices, what happens to the old legacy applications
developed in previous years? How do we convert them to microservices? Refactoring the whole
application would require a huge amount of effort, which would not be approved by the
company. To solve this problem, a new technology called containers was developed.
Containers allow you to package an old application as a single autonomous service: not
quite a microservice, but a good starting point. The most common way of implementing this
is by using Docker.
Docker
Docker is a container-based platform that is able to host one or more components (such as
databases, services, or queues) in a single bundle. Docker is similar to a virtual machine
(VM) but instead of emulating hardware, it shares the operating system and libraries with
the host machine. For this reason, Docker is more similar to a dedicated environment than
to a VM. Other container-based technologies are available in the market, such as LXC or
the Open Container Initiative (OCI), and it is likely that over the next few years we will see
these technologies develop massively. For now, however, Docker is the standard
technology used to develop microservices using container-based architecture.
Docker offers the great advantage of allowing us to build a container from a legacy
application so we can port old applications into a new microservice-based ecosystem with
very few changes. Docker has become so popular in the past few years that today it is not
only used for porting old technologies, but also for building new ones. For instance, some
authors use Docker to build analytics.
Let's consider an example. Imagine you have an old legacy application developed in
FORTRAN with a reading file for input, a file database for parameters, and a writing file for
output. You want to deploy this application as a scalable service. You can build your
Docker container by packaging together all of the different elements and exposing it as a
service in your infrastructure.
The following code runs a local Python HTTP server in a container. Install Docker
Community Edition from https://store.docker.com/, then create a local directory and, in
it, a file called Dockerfile. Finally, copy the following instructions into the file:
FROM alpine
# Install python3
RUN apk add --update python3
# Copy html
RUN mkdir /src
ADD static/ /src
# Serve files from /src (a plain RUN cd does not persist between layers)
WORKDIR /src
# Run http server on port 8080
EXPOSE 8080
CMD ["python3", "-m", "http.server", "8080"]
From the command console, create a simple HTML page called index.html:
mkdir static
echo "<html><body>I-IoT</body></html>" > static/index.html
We have now built our first Docker microservice. In the Dockerfile, we provided the
following instructions to Docker:
Start from the alpine base image
Install python3
Copy the static HTML content into the /src directory
Expose port 8080
Run the Python HTTP server on port 8080 when the container starts
We then run the command to build the container (for example, docker build -t iiot-web .)
and start it, mapping container port 8080 to host port 80 (for example, docker run -p
80:8080 iiot-web). The most important thing is being able to build a container from a
simple script written in the Dockerfile.
These technologies need an orchestrator to allocate resources and manage scalability. Some
of the most common solutions are Kubernetes, Swarm, EC2, GCC, and Mesos. When we
talk about containers, we are referring to Container as a Service (CaaS). We will explore
these technologies later.
If you do not want to use Docker and you want to develop your own microservice, the
following table provides a list of the most common technologies used to build
microservices:
Technology Purpose
Spring Boot Service
Node.js UI
NGINX UI, balancing
Python Analytics
Golang Service
.NET UI, service
Docker Container
Microservice-oriented technologies
If you need an orchestrator, you can use native cloud capabilities or an agnostic framework,
such as Cloud Foundry (CF).
Recently, Amazon and Oracle have proposed serverless technologies. The basic idea of
these is to provide the developer with the ability to build and deploy a function. In other
words, the developer can build their own function and deploy it in a single click, without
having to redeploy the entire microservice. Serverless technologies can be considered an
extreme version of microservices. When we speak about functions, we refer to Function as
a Service (FaaS); see the following table. We will discover serverless technologies, such as
AWS, Predix, Azure, and GCP, in Chapter 9, Understanding Industrial OEM Platforms,
Chapter 10, Implementing a Cloud Industrial IoT Solution with AWS, Chapter 11,
Implementing a Cloud Industrial IoT Solution with Google Cloud, and Chapter 12, Performing a
Practical Industrial IoT Solution with Azure:
These technologies are not strictly related to IoT or I-IoT, but they provide a good point of
reference for implementing an I-IoT application.
The following diagram shows the flow of I-IoT data on the cloud side, summing up what
was already presented in Chapter 2, Understanding the Industrial Process and Devices. During
data-transfer, the data coming from the sensors is gathered from the data sources (such
as PLCs, DCSs, SCADA, or historians) and stored temporarily to avoid data loss due to a
connectivity issue. The data can be time-series data, such as events; semi-structured data,
such as logs or binary files; or completely unstructured data, such as images. Time-series
data and events are collected frequently (from every second to every few minutes). Files are
normally collected when they are triggered locally, which can happen through machine
shutdown or inspection. Shadow data can then be sent, through LAN, to the department
datacenter. The data is then sent by the edge over the WAN and stored in a centralized data
lake and a time-series database (TSDB). The data lake can be cloud-based, an on-premises
datacenter, or a third-party storage system.
Data can be immediately processed using data-stream analytics, which is called a hot path,
with a simple rule engine platform based on the threshold or the smart threshold:
Advanced analytics, including digital twins, machine learning, deep learning, and data-
driven or physics-based analytics, can process a large amount of data (from ten minutes' to
one month's worth of data) coming from different sensors. This data is stored on an
intermediate repository (called a cold path, as can be seen in the preceding diagram). These
analytics are triggered by a scheduler or by the availability of the data and need a lot of
computational resources and dedicated hardware, such as CPU, GPU, or TPU.
Azure refers to cold paths and hot paths. In a hot path, data is processed
immediately. In a cold path, data is stored and processed later. Normally,
we use data streams for hot paths and micro-batch analytics for cold
paths.
These analytics often need additional information, such as the model of the machine being
monitored and the operational attributes; this information is contained in the asset registry.
The asset registry has information about the type of asset you are monitoring, including its
name, serial number, symbolic name, location, operating capabilities, the history of the
parts it is made up of, and the role it plays in the production process. In the asset registry,
we can store the list of measures of each asset, the logic name, the unit of measure, and the
boundary range. In the industrial sector, this static information is important so that the
correct analytic model can be applied.
The output of the analytics, either stream-based or advanced, is typically calculated data.
This might be the expected performance; the operability drift; an alarm such as an anomaly
vibration; a recommendation, such as clean, replace filter, or close valve; or a report. These
insights can be analyzed by an operator to provide information to the operations manager,
who might then decide to open a work order to improve the process. Although these
maintenance operations are not part of the I-IoT process itself, the analytics should skip
the data produced while they are carried out, because the signals are not representative
and nothing useful can be concluded from them. For example, if you decide to turn off a
machine whose status is being monitored, the shutdown would otherwise be interpreted as
an anomaly, and the analytics would raise an alert that something is wrong.
Occasionally, I-IoT can use big data analytics to process a huge amount of data, such as
images or raw data. These technologies can employ data lakes and big data platforms.
The NAME column refers to the logic name of the item, the VALUE field stores the measured
value, and the QUALITY column, which can be GOOD, BAD, UNKNOWN, or ERROR, stores
information about the data acquisition. We can sometimes store additional information
such as the attribute and data type.
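A minimal sketch of one such time-series record as a Python data structure follows; the Sample class and its validation are our own illustration, with the column names and quality values taken from the description above:

```python
from dataclasses import dataclass

# Quality values described above for the QUALITY column
VALID_QUALITIES = {"GOOD", "BAD", "UNKNOWN", "ERROR"}

@dataclass
class Sample:
    """One time-series row: logic name, timestamp, measured value, and
    information about the data acquisition (quality)."""
    name: str
    timestamp: int          # UNIX epoch seconds
    value: float
    quality: str = "GOOD"

    def __post_init__(self):
        # Reject qualities outside the four values listed in the text
        if self.quality not in VALID_QUALITIES:
            raise ValueError(f"invalid quality: {self.quality}")

s = Sample(name="Pump_01.Pressure", timestamp=1529176746, value=80.0)
print(s.quality)
```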
OSIsoft PI
This is a native, commercial TSDB developed by OSIsoft. It is very efficient and
provides support for data collection and data streaming. This technology is often used for
local storage, departmental storage, and remote storage in the industrial sector. OSI PI uses
a proprietary protocol to send data between departments. One of the most interesting
features of OSI PI is its ability to support analytics and to make historical and real-time data
instantly accessible to users.
Proficy Historian
This is one of OSI PI's competitors, developed by General Electric (GE). It is a native TSDB
that provides optimal support for data-interpolation and filtering.
KairosDB
This is a popular, open source TSDB based on Apache Cassandra. It is not a native TSDB
since it uses Cassandra for data storage. We will work with KairosDB in Chapter 8,
Implementing a Custom Industrial IoT Platform.
Riak TS (RTS)
This is a NoSQL TSDB optimized for time-series data. It ingests, transforms, and stores
massive amounts of time-series data with good scalability. It is integrated with the most
popular languages and technologies.
Netflix Atlas
This is an in-memory database used to store time-series data for KPI. It is distributed under
the Apache License.
InfluxDB
InfluxDB is a very fast time-series database distributed under an open source license with
commercial support. InfluxDB supports timestamps with nanosecond precision.
Elasticsearch
This is a popular, document-based database for fast indexing and searching. It is not a real
TSDB, but thanks to its high indexing performance and its ability to integrate with Kibana,
it can be used to visualize data.
Cloud-based TSDBs
In the next chapters (Chapter 9, Understanding Industrial OEM Platforms, Chapter
10, Implementing a Cloud Industrial IoT Solution with AWS, Chapter 11, Implementing a Cloud
Industrial IoT Solution with Google Cloud and Chapter 12, Performing a Practical Industrial IoT
Solution with Azure), we will look at some of the solutions proposed by AWS, Azure,
Google, and Predix for TSDB. Generally speaking, these solutions support TSDBs based on
NoSQL and SQL databases (Apache Cassandra, SQL Server, and DynamoDB) and extend
the schema with the API.
OpenTSDB
OpenTSDB is very similar to KairosDB but it is less popular and based instead on either
Apache Cassandra or HBase. In the following simple example, we are going to start a
single-node OpenTSDB based on HBase. We will use a Docker image to avoid a long
installation process.
OpenTSDB uses the RESTful API and Telnet to query and ingest data. To install
OpenTSDB, download the code from GitHub:
git clone https://github.com/PacktPublishing/Hands-On-Industrial-Internet-of-Things
cd Hands-On-Industrial-Internet-of-Things/Chapter07/opentsdb
OpenTSDB GUI
Now we can push our first time-series using the RESTful API through curl. You could also
use a REST client, such as Postman or Advanced REST Client. Execute the following
command several times, changing the value and the timestamp:
curl -d '{"metric": "sys.cpu", "timestamp": 1529176746, "value": 80,
"tags": {"host": "localhost", "quality": "GOOD"}}' \
-H "Content-Type: application/json" \
-X POST http://localhost:4242/api/put
Each data point consists of the following:
A metric name
A UNIX timestamp
A value
A set of tags (key-value pairs) that describe the time-series that the point belongs to
This example highlighted two important concepts: tags, and metrics or measures. In the
IoT, we refer to tags when we want to provide a name to a measure or when we want to
add attributes to a measure. Unfortunately, this naming is not standard and can cause
confusion.
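As a sketch, the same data point can be assembled and posted from Python using only the standard library. The helper names are our own, while the endpoint and payload fields follow the /api/put example shown above:

```python
import json
import urllib.request

def build_datapoint(metric, timestamp, value, **tags):
    """Assemble an OpenTSDB /api/put payload: metric name, UNIX timestamp,
    value, and a set of tags (key-value pairs)."""
    return {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags}

def put_datapoint(point, url="http://localhost:4242/api/put"):
    """POST one data point to OpenTSDB (requires a running server)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(point).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

point = build_datapoint("sys.cpu", 1529176746, 80, host="localhost", quality="GOOD")
print(json.dumps(point))
# put_datapoint(point)  # uncomment once the OpenTSDB container is running
```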
Asset registry
The asset registry is the second most important repository of the I-IoT. In the asset
registry, we need to store information about our assets and the relationships between
them. This information is normally imported from an enterprise resource planning (ERP)
system or a computerized maintenance management system (CMMS). Assets are linked
together by membership or production-flow relationships.
Asset instances: The CT001 asset is a Barrel Pump and belongs to the fleet, a
subsystem, a plant, or a factory. We collect several measures from it, such as
temperature. Every node of the asset should expose some simple or structured
properties, such as asset type, aliases, name, ID, and custom attributes.
Asset classes: Through classification, you can define the type of asset using an
object-oriented methodology.
The asset registry is normally a graph or a relational database where information about the
asset and measures is stored to be monitored and collected. The following is a list of
technologies that can be used to implement our asset registry:
Neo4j: A NoSQL database with native support for graphs. Neo4j can be used to
model the asset registry and the relationship between assets.
MongoDB: A document-based (JSON) database. We can use it to store
information about our asset list and measures.
Standard SQL: Can be used to model our asset hierarchy using the relationship
between tables, but cannot be used to define a very flexible structure.
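As a toy illustration of what an asset registry stores, a tiny in-memory hierarchy can be sketched in plain Python. The Asset class below is our own sketch, not one of the databases above; it mirrors the CT001 Barrel Pump example with its properties and measures:

```python
class Asset:
    """A node of the asset hierarchy: type, custom attributes, measures, children."""
    def __init__(self, asset_id, asset_type, **attributes):
        self.asset_id = asset_id
        self.asset_type = asset_type
        self.attributes = attributes
        self.measures = []      # logic names of the collected measures
        self.children = []      # membership relationships

    def add_child(self, child):
        self.children.append(child)
        return child

    def find(self, asset_id):
        """Depth-first lookup of an asset by its ID."""
        if self.asset_id == asset_id:
            return self
        for child in self.children:
            found = child.find(asset_id)
            if found:
                return found
        return None

plant = Asset("PLANT01", "plant", location="Florence")
pump = plant.add_child(Asset("CT001", "Barrel Pump"))
pump.measures.append("temperature")
print(plant.find("CT001").asset_type)
```

A real registry would add asset classes, aliases, and production-flow links, which is why graph databases such as Neo4j are a natural fit.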
Analytics are commonly grouped into descriptive, diagnostic, predictive, and prescriptive
classes, and there is some overlap between them. They are summarized in the following table:
Descriptive: What is the status? Expected output: KPI, health of the asset, efficiency, or
performance. Techniques: mathematical formula and knowledge-based.
Diagnostic: Why is it happening? Expected output: alarm and root-cause analysis.
Techniques: rule-based, mathematical formula.
Predictive: What's going to happen? Expected output: recommendations and/or insights;
probability that the event will occur. Techniques: statistics, deep/machine learning,
data-driven, regression.
Prescriptive: What do I have to do? Expected output: recommendations and/or insights.
Techniques: digital twin, physical model, deep learning, reinforcement learning.
Generally speaking, prescriptive analytics are the most complex analytics and descriptive
analytics are the simplest.
Due to the large amount of data and the correlation between measures, we need at least
two levels of data-processing analytics: excursion monitoring analytics (EMAs) and
advanced analytics.
EMAs
EMAs are the first layer of analytics, which are normally used for descriptive, diagnostic, or
short-term predictions. Signals are acquired and sent to a rule engine directly using a data-
stream mechanism. Simple rules analyze signal by signal to discover any potential
deviations from standard behavior. A simple threshold rule is a common example. EMAs
are not as simple as they might appear: the engine should avoid raising the same event
multiple times within a time range due to oscillations of the signal. The following diagram
shows a typical example:
Example of an excursion-monitoring analytic raising the same event multiple times in a few minutes
EMAs can also monitor calculated measures that are made up of raw measures. In an
industrial context, we refer to raw measures to describe the measures acquired from the
sensor prior to conversion. More sophisticated rule engines tune the threshold according to
data acquired in the past or from user feedback.
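The deduplication problem described above can be sketched with a simple threshold rule plus hysteresis: an event fires when the signal crosses the high threshold and is not re-raised until the signal falls back below a lower reset threshold. The function and threshold values are illustrative, not a real EMA engine:

```python
def excursion_events(samples, high=80.0, reset=75.0):
    """Raise one event per excursion above `high`; the `reset` threshold adds
    hysteresis, suppressing repeated events caused by oscillation around `high`."""
    events = []
    in_alarm = False
    for i, value in enumerate(samples):
        if not in_alarm and value > high:
            events.append(i)        # first crossing: raise the event
            in_alarm = True
        elif in_alarm and value < reset:
            in_alarm = False        # excursion over: re-arm the rule
    return events

signal = [70, 81, 79, 82, 78, 74, 83]
print(excursion_events(signal))  # → [1, 6]
```

Without hysteresis, the oscillation at indexes 1 to 4 would raise the same event twice; with it, only the first crossing and the new excursion at index 6 are reported.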
From a technological point of view, there are sophisticated commercial products, such as
OSI PI, Predix APM, and Azure PI Integrator, that cover these use cases. These solutions
can be installed on the edge or in the cloud.
The fastest way to develop our rule engine solution is to use a serverless solution. We will
see an example of this in the next chapters (Chapter 10, Implementing a Cloud Industrial IoT
Solution with AWS, Chapter 11, Implementing a Cloud Industrial IoT Solution with Google
Cloud and Chapter 12, Performing a Practical Industrial IoT Solution with Azure).
Advanced analytics
The second level of analytics is advanced analytics, which are used for prediction and
prognosis. An example is the digital twin, which tries to replicate the physical model of an
asset through a mathematical formula or a data-driven model. A digital twin is a digital
representation of an asset or system across its life cycle, which can be used for various
purposes.
Advanced analytics require a more complex platform. They process a relatively large
amount of data (such as 30 to 60 measures over the last 20 minutes to one month) and
compare it with a digital model or the expected behavior. Advanced analytics require more
information, such as the model of the asset, its operability, the digital model (which is learned
or defined previously), and the status of the previous analysis. Common technologies used
to develop these analytics are R, Python, Julia, Scala, and MATLAB. Engines hosting these
analytics are cloud-based platforms or big data platforms.
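A data-driven Digital Twin can be as simple as a model that predicts the expected value of a measure from the operating conditions, with the analytic flagging any asset whose residual (actual minus expected) drifts beyond a tolerance. The following Python sketch is illustrative only: the linear model, the tolerance, and the sample data are assumptions, not taken from a real asset.

```python
def expected_temperature(load_pct):
    # assumed data-driven model, e.g. learned by regression on history
    return 20.0 + 0.1 * load_pct

def residuals(observations):
    """observations: list of (load_pct, measured_temperature) pairs."""
    return [measured - expected_temperature(load)
            for load, measured in observations]

def anomalous(observations, tolerance=3.0):
    """Flag the asset when any residual exceeds the tolerance."""
    return any(abs(r) > tolerance for r in residuals(observations))

obs = [(50.0, 25.2), (60.0, 26.1), (70.0, 31.5)]
# expected values: 25.0, 26.0, 27.0 -> residuals ~0.2, ~0.1, 4.5
```

The third observation deviates by 4.5 degrees from the twin's prediction, so the asset is flagged even though no fixed threshold was crossed.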
Summary
In this chapter, we looked at the basic concepts of I-IoT data processing. We learned about
the key principles of data storage, including time-series, asset management using an asset
registry, and data processing with analytics and the Digital Twin.
In the next chapter, we will build a real I-IoT solution based on common open source
technologies.
Questions
1. What are the main differences between the microservices and monolithic
architectures?
1. Microservices are small services that can be deployed separately
2. Monolithic applications improve modularity but not scalability
3. Microservices perform better than monolithic architectures
2. True or false: Docker is the primary technology used to build microservices.
1. True
2. False
3. What is an asset registry?
1. A SQL database to store time-series
2. A NoSQL database to store static information about devices monitored
3. A repository that stores information about assets and measures
Further reading
Check out the following links for more information:
8
Implementing a Custom Industrial IoT Platform
In Chapter 7, Developing Industrial IoT and Architecture, we described the general flow of
processing I-IoT data. In this chapter, we are going to update our flow, using the most
common open source technologies to develop our platform from scratch. These include the
following:
KairosDB as a TSDB
Neo4j as an asset DB
Kafka and Mosquitto to collect data and adapt
Airflow as an analytics platform
The purpose of our exercise is not to develop a real I-IoT platform, but instead to discuss
some key topics related to I-IoT and to introduce various technologies that we can use in
our on-premise I-IoT stack. At the end of this chapter, we will compare this solution with
other open source solutions such as Kaa IoT or Eclipse IoT.
Technical requirements
In this chapter, we need the following:
To get a complete solution of the exercises, take a look at the repository at https://github.com/PacktPublishing/Hands-On-Industrial-Internet-of-Things.
Return on investment: We might have a budget that is too small to justify a big
investment
Technology: We might want to use technology that does not depend strictly on a
supplier
Privacy: We might not want to export data outside our country
Data shadowing: We might need a copy of the data on our legacy platform,
either to experiment with or just as a backup
Integration: We might be implementing an inter-cloud platform, meaning we
need an integration layer
Experience: We might be developing our first I-IoT platform, so we want to start
small
Product: We might want to develop our product using only cloud infrastructure
Intellectual property: We might need a small space to run our analytics,
helping to protect our intellectual property
Implementing a Custom Industrial IoT Platform Chapter 8
In this chapter, we will develop a custom platform by reusing the concepts that we learned
in the previous Chapter 7, Developing Industrial IoT and Architecture. The following diagram
depicts the proposed solution:
We will use Mosquitto as a Message Queue Telemetry Transport (MQTT) data broker in
conjunction with Kafka to improve scalability and to provide an isolation layer. We can
replace MQTT with a different technology, and we can support a different protocol by
replacing Mosquitto alone. Our data will be stored on Cassandra, using
a KairosDB data abstraction layer. Rule-based analytics will process data through the Kafka
data stream (hot path), while advanced analytics will use Python and Airflow to process a
small portion of data via micro-batch processing (cold path). The results of these analytics
can be stored on Elasticsearch or a SQL database. We won't be covering how to manage this
data as part of this project. Finally, we will implement an asset registry using Neo4j.
Data gateway
In the first couple of chapters of this book, we dealt with the most common protocols for
data transmission. In this example, we will use MQTT. MQTT isn't the only choice for the
I-IoT, but it has great support for HTTPS encapsulation. In the industrial sector, we normally
have a plethora of technologies, such as MQTT, Constrained Application Protocol (CoAP),
Open Platform Communications (OPC), and OPC Unified Architecture (UA), and we
need to integrate all of these in a single data bus. In this example, we will use Mosquitto
(https://mosquitto.org/download/) for frontend data acquisition and Apache Kafka
(https://kafka.apache.org/) for data distribution.
2. We are going to deploy the official Mosquitto distribution, so we can test our
MQTT broker. We need to install an MQTT client. Run the following command
with an administrative user:
$ npm install mqtt-cli -g
In Kafka, each topic is divided into a set of logs called partitions. The producers write to the
tail of Kafka's logs and consumers read the logs. Apache Kafka scales topic consumption by
distributing partitions among a consumer group. A consumer group is a set of consumers
which share a common group identifier. The following diagram shows a topic with three
partitions and two consumer groups with two members:
Each partition in the topic is assigned to exactly one member in the group.
Apache Kafka will only conserve the order inside a partition. If we need to
preserve the order, we can use RabbitMQ or reduce the number of
partitions to one.
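The per-partition ordering guarantee works because the producer maps each record key to a fixed partition, so all messages with the same key (for example, the same device) land in the same log. The following Python sketch illustrates the idea with a stand-in hash; Kafka's default partitioner actually uses a murmur2 hash of the key:

```python
def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's murmur2-based default partitioner: any
    # deterministic hash keeps a given key on a single partition.
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0x7FFFFFFF
    return h % num_partitions

keys = ["device0", "device1", "device0", "device2", "device0"]
assignments = [partition_for(k, 3) for k in keys]
# all three "device0" messages map to the same partition, so their
# relative order is preserved for the consumer that owns that partition
```

Messages with different keys may land on different partitions, which is why ordering across devices is not guaranteed unless the topic has a single partition.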
We have to install Apache Kafka and an MQTT plugin which subscribes to Mosquitto's
topics. The MQTT plugin is available at https://github.com/evokly/kafka-connect-mqtt,
or we can use an already-configured Docker container at https://github.com/
PacktPublishing/Hands-On-Industrial-Internet-of-Things. Enter the following
command in the console:
$ git clone
https://github.com/PacktPublishing/Hands-On-Industrial-Internet-of-Things
$ cd Chapter08/kafka-mqtt-connector
$ docker build . -t iiot-book/kafka-mqtt
EXPOSE 9092
EXPOSE 2181
# INSTALL GETTEXT
RUN apk update \
&& apk add gettext
# MQTT
ADD mqtt.properties /tmp/mqtt.properties
ADD start-all.sh start-all.sh
VOLUME ["/kafka"]
The current Docker image uses an existing standard Kafka installation and copies the JAR
plugin and mqtt.properties onto it.
These examples lose data when the container is stopped. To keep data, we
need to mount an external volume.
We can now test our chain. Get your IP address using ifconfig (or ipconfig for
Windows), then launch the following command (please replace the <ip> marker with your
IP address, such as 192.168.0.1):
$ docker run -p 9092:9092 -p 2181:2181 -e MQTT_URI=tcp://<ip>:1883 -e
KAFKA_ADVERTISED_HOST_NAME=<ip> iiot-book/kafka-mqtt:latest
We now need to subscribe to the MQTT Kafka topic. We can install an official consumer
from the Apache Kafka distribution (https://kafka.apache.org/downloads) and run the
following command:
$ <your kafka home>/bin/kafka-console-consumer.sh --bootstrap-server
<ip>:9092 --topic mqtt --from-beginning
Alternatively, we can run the same Docker image using the interactive mode:
$ docker run -it iiot-book/kafka-mqtt:latest /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server <ip>:9092 --topic mqtt --from-beginning
To test the status of our containers, we can run the following command:
$ docker ps
To test our chain, we can publish our first message to the MQTT data broker:
$ mqtt-cli localhost topic/device0 "device0.my.measure.temperature,
27,GOOD"
Testing the MQTT Kafka chain, the top window shows the expected output, while the central window shows the signal sent
The MQTT Kafka connector publishes a message in a Kafka queue called mqtt and encodes
the payload in base-64:
{"schema":{"type":"bytes","optional":false},"payload":"ZGV2aWNlMC5teS5tZWFz
dXJlLnRlbXBlcmF0dXJlLCAyOCxHT09E"}
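We can check the payload by decoding the base-64 string, for example in Python:

```python
import base64

# payload taken from the Kafka message above
payload = "ZGV2aWNlMC5teS5tZWFzdXJlLnRlbXBlcmF0dXJlLCAyOCxHT09E"
decoded = base64.b64decode(payload).decode("utf-8")
print(decoded)  # device0.my.measure.temperature, 28,GOOD
```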
To provide good scalability, Apache Kafka uses a key for partitioning. This means that
every device or group of devices can push messages on different MQTT topics.
1. We have to define the file pom.xml in the root directory of the project. Then, we
can add the following dependencies:
...
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>1.1.0</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
<version>1.7.2</version>
<exclusions>
<exclusion>
<artifactId>slf4j-api</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.8.0</version>
</dependency>
<dependency>
<groupId>com.101tec</groupId>
<artifactId>zkclient</artifactId>
<version>0.4</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
...
import org.apache.commons.collections4.map.HashedMap;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.*;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.internals.*;
class RuleEngineDemo {
// window size within which the filtering is applied
private static final int WINDOW_SIZE = 5;
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source = builder.stream("mqtt");
Alternatively, we can add a timestamp when the signal has been acquired:
measure_name,timestamp,value,quality
3. We can write two functions to extract the measure and the value from the MQTT
payload:
public static String getKey(String key, String value) {
    String[] values = value.split(",");
    if (values.length > 1) {
        return values[0];
    } else {
        return value;
    }
}

public static Double getValue(String key, String value) {
    String[] values = value.split(",");
    if (values.length > 2) {
        return Double.parseDouble(values[2]);
    } else if (values.length > 1) {
        return Double.parseDouble(values[1]);
    } else {
        return Double.parseDouble(value);
    }
}
WindowedSerializer<String> windowedSerializer =
    new WindowedSerializer<>(Serdes.String().serializer());
WindowedDeserializer<String> windowedDeserializer =
    new WindowedDeserializer<>(Serdes.String().deserializer(), WINDOW_SIZE);
Serde<Windowed<String>> windowedSerde =
    Serdes.serdeFrom(windowedSerializer, windowedDeserializer);
// the output
max.to("excursion", Produced.with(windowedSerde, Serdes.String()));
Using the selectKey function, we extract the key and we group by key
using groupByKey. Then we extract only the maximum value in a range of five seconds
using the windowedBy function. Finally, we can apply our rule to filter values that are
greater than a given threshold. Please notice that our simple rule engine uses two maps to
translate the measure to a standard measure and to get the right threshold.
//rules definition
Map<String,Double> excursion = new HashedMap<>();
excursion.put("temperature", 25.0);
Normally, this information can be stored in the asset registry. In an I-IoT environment, we
acquire trillions of measures every day. Each measure has a hard-to-read encoded name, so
we normally translate it to a human-readable name before applying the algorithm.
For example, the following two measures shown on the left can be translated to the names
on the right:
device0.my.measure.temperature -> temperature of asset MY
device1.X523.MODEL7.TEMP -> temperature of asset X523
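Such a translation step can be sketched with an in-memory map standing in for the asset registry; the entries simply mirror the two examples above:

```python
# In-memory stand-in for the asset registry lookup; a real deployment
# would resolve the encoded name through the registry service.
TAG_MAP = {
    "device0.my.measure.temperature": ("MY", "temperature"),
    "device1.X523.MODEL7.TEMP": ("X523", "temperature"),
}

def translate(raw_name: str) -> str:
    asset, measure = TAG_MAP[raw_name]
    return "%s of asset %s" % (measure, asset)

print(translate("device1.X523.MODEL7.TEMP"))  # temperature of asset X523
```

Once both encoded names resolve to the same standard measure, a single temperature rule can be applied to both devices.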
We should apply the same rule to each. Rules are more complex than a simple threshold,
but we can extend our code easily. To test our example, we have to start Mosquitto, Kafka,
and our Rule Engine. We then have to subscribe to the excursion queue:
$ docker run -it iiot-book/kafka-mqtt:latest /opt/kafka/bin/kafka-console-
consumer.sh --bootstrap-server <ip>:9092 --topic excursion --from-beginning
When the temperature reaches 26 degrees, we receive a message in the excursion queue
to notify us.
To complete our example, we should also consider the quality of the signal, ignoring it if it
has a BAD or UNKNOWN quality.
Apache Cassandra
Apache Cassandra is a decentralized NoSQL database that has a good level of scalability
and high availability without compromising performance. Apache Cassandra supports
thousands of nodes and different levels of replicas and consistency. It also has a high level
of data sharding.
Apache Cassandra is organized as a ring of nodes. Each node takes care of a portion of
data, according to a hash code that is calculated based on a key. The following diagram
shows how a client should allocate the measures across the Apache Cassandra data nodes:
In the preceding diagram, node B is taking care of a specific hash data range, 51214-60012,
while node A is taking care of 32768 to 40917. When the client ingests data,
Murmur3Partitioner calculates the hash from the key.
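The ring lookup can be pictured as a search over sorted token ranges. In the following Python sketch, the ranges for nodes A and B match the diagram, while the other nodes are assumptions made only to close the ring; the real cluster computes Murmur3 tokens rather than this toy table:

```python
from bisect import bisect_left

# (upper token, node) pairs, sorted by token. The ranges for A and B
# come from the diagram; nodes C, D, and E are assumed to close the ring.
ring = [(32767, "D"), (40917, "A"), (51213, "C"), (60012, "B"), (65535, "E")]

def node_for(token: int) -> str:
    """Return the node whose token range contains the given token."""
    uppers = [upper for upper, _ in ring]
    return ring[bisect_left(uppers, token) % len(ring)][1]

print(node_for(55000))  # B: 55000 falls in the 51214-60012 range
print(node_for(35000))  # A: 35000 falls in the 32768-40917 range
```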
From a storage point of view, Apache Cassandra is based on SSTables. Writes to disk are
asynchronous: Apache Cassandra works in append-only mode, and data is invalidated
rather than physically deleted. These features make Cassandra a good candidate for I-IoT
storage. Unfortunately, Cassandra doesn't have native support for time-series, but open
source projects such as KairosDB provide an API that turns Cassandra into a TSDB.
KairosDB
KairosDB is a fast time-series database on top of Apache Cassandra. In the previous
chapter, we learned about OpenTSDB, which is based on HBase and has recently
introduced support for Apache Cassandra. KairosDB, similarly, has a very active and smart
community and a very flexible API to build plugins. The APIs of KairosDB are very similar
to OpenTSDB, so we can replace this implementation with OpenTSDB with minimum
effort.
To make KairosDB ready to ingest data from Apache Kafka, we need to perform the
following steps:
We can now install KairosDB; then we have to develop and configure our plugin to
ingest data from Apache Kafka into Apache Cassandra.
Installing KairosDB
To install KairosDB, get the latest version from https://github.com/kairosdb/kairosdb/releases
and unpack it. To connect to Apache Cassandra, we need to open
<KAIROSDB_HOME>/conf/kairosdb.properties, comment out the H2Module line, and
uncomment the CassandraModule line:
#kairosdb.service.datastore=org.kairosdb.datastore.h2.H2Module
kairosdb.datastore.concurrentQueryThreads=5
kairosdb.service.datastore=org.kairosdb.datastore.cassandra.CassandraModule
During the first run, KairosDB will create three column families to store information on
Apache Cassandra:
We should now test our KairosDB instance. We can ingest a small portion of data using the
KairosDB REST API:
curl -d '[
{
"name": "device0.my.measure.temperature",
"datapoints": [[1529596511000, 11], [1529596525000, 13.2],
[1529596539000, 23.1]],
"tags": {
"host": "localhost",
"data_center": "DC1",
"quality" : "GOOD"
},
"ttl": 300
}]' -H "Content-Type: application/json" -X POST
http://localhost:8080/api/v1/datapoints
To visualize the output, we can open the browser to http://localhost:8080/. Set the
From field and provide device0.my.measure.temperature as the measure name. Under
the Aggregators section, select the SCALE option. This gives us the following result:
The Kafka module gets the properties from the file, starts the consumer, and instantiates the
Kafka service:
@Provides
private Consumer<byte[], byte[]> provideConsumerConnector(
@Named("kairosdb.kafka.bootstrap.servers") String bootsrapserver,
@Named("kairosdb.kafka.group.id") String groupid,
TopicParserFactory factory)
{
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootsrapserver);
///TBD
props.put(ConsumerConfig.CLIENT_ID_CONFIG, "kafka-mqtt");
props.put(ConsumerConfig.GROUP_ID_CONFIG, groupid);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
ByteArrayDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
ByteArrayDeserializer.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
consumer.subscribe(factory.getTopics());
The KafkaService has the responsibility of fetching data and starting a thread to ingest it:
@Override
public void start() throws KairosDBException
{
while (run) {
final ConsumerRecords<byte[], byte[]> consumerRecords =
consumer.poll(1);
if (consumerRecords.count()==0) {
try {
TimeUnit.MILLISECONDS.sleep(10);
} catch (InterruptedException e) {
logger.error(" MAIN " , e);
}
continue;
}
int threadNumber = 0;
executor.submit(new ConsumerThread(publisher, topicParserFactory,
consumerRecords, threadNumber));
threadNumber++;
consumer.commitAsync();
}
}
Finally, for each record, ConsumerThread gets the parser and ingests the data:
@Override
public void run()
{
Thread.currentThread().setName("TH " + this.threadNumber);
stream.forEach(record -> {
String topic = record.topic();
System.out.println(topic);
TopicParser parser;
try {
parser = topicParserFactory.getTopicParser(topic);
} catch (Exception e) {
e.printStackTrace();
throw e;
}
The parser can be defined in a property file for each topic. The kairos-
kafka.properties file is as follows:
kairosdb.service.kafka=org.kairosdb.plugin.kafka.KafkaModule
kairosdb.service_folder.kafka=lib/kafka
kairosdb.kafka.consumer_threads=2
kairosdb.kafka.bootstrap.servers=localhost:9092
kairosdb.kafka.group.id=kairos_group
# To declare consumer thread classes you must prefix the property with
# kairosdb.kafka.topicparser and then give the declaration a name.
kairosdb.kafka.topicparser.stringparser.class=org.kairosdb.plugin.kafka.iiotbook.parser.MQTTJsonTopicParserImpl
kairosdb.kafka.topicparser.stringparser.topics=mqtt
kairosdb.kafka.topicparser.stringparser.metric=test_metric
The plugin folder (lib/kafka) should contain the following JARs:
gson-2.2.4.jar
jackson-jaxrs-json-provider-2.2.3.jar
kafka-streams-1.1.0.jar
jackson-core-2.2.3.jar
jackson-module-jaxb-annotations-2.2.3.jar
kairos-kafka-1.0-SNAPSHOT.jar
jackson-jaxrs-base-2.2.3.jar
kafka-clients-1.1.0.jar
When KairosDB starts, it looks for any properties files located in <KAIROSDB_HOME>/conf
to instantiate the plugins. We are now ready to test the entire flow from MQTT to Kafka to
Cassandra:
$ mqtt-cli localhost topic/device1 device0.my.measure.temperature,50,GOOD
$ mqtt-cli localhost topic/device1 device0.my.measure.temperature,50,GOOD
Graphite
KairosDB can also accept the Graphite (https://graphite.readthedocs.io/en/latest/feeding-carbon.html)
plain-text and pickle protocols. The plain-text protocol has the following form:
measure_name value timestamp
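For example, a reading can be serialized to a plain-text line before being sent to the listener; the helper below is an illustrative sketch:

```python
def to_graphite_line(measure_name: str, value: float, timestamp: int) -> str:
    # Graphite plain-text protocol: one "name value timestamp" triple per line
    return "%s %s %s" % (measure_name, value, timestamp)

line = to_graphite_line("device0.my.measure.temperature", 27.0, 1529596511)
print(line)  # device0.my.measure.temperature 27.0 1529596511
```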
Installing Airflow
Installing and running the Apache Airflow single node is quite simple. Follow these steps:
$ export AIRFLOW_HOME=$(pwd)
We now have to define our connection to KairosDB. Unfortunately, Airflow doesn't
provide a KairosDB connection out of the box, but we can easily create one by building a
simple operator.
If KairosDB is running on another host, we need to change the IP address from
localhost to the IP of the server. We can now start to develop our operator. In
the <AIRFLOW_HOME>/plugins directory, we need to create a Python file with the
name kairosdb_operator_plugin.py. The following code shows our implementation:
from airflow.plugins_manager import AirflowPlugin
from airflow.hooks import HttpHook
from airflow.models import BaseOperator
from airflow.operators import BashOperator
from airflow.utils import apply_defaults
import logging
import textwrap
import time
import json
import datetime
class KairosDBOperator(BaseOperator):
@apply_defaults
def __init__(
self,
query,
http_conn_id='http_kairosdb',
*args, **kwargs):
        super(KairosDBOperator, self).__init__(*args, **kwargs)
        self.query = query
        self.http_conn_id = http_conn_id
        self.acceptable_response_codes = [200, 201]

    def execute(self, context):
        self.http = HttpHook("GET", http_conn_id=self.http_conn_id)
        response = self.http.run("api/v1/datapoints/query",
                                 data=json.dumps(self.query))
        if response.status_code not in self.acceptable_response_codes:
            return None
        return response
def _mean(data):
ret={}
for d in data:
results = d['results']
for r in results:
m = [float(sum(l))/len(l) for l in zip(*r['values'])]
ret[r['name']] = m[1]
print(ret)
return ret
kairos_operator = KairosDBOperator(
task_id='get_data',
query={
"metrics": [
{
"tags": {},
"name": "device0.my.measure.temperature",
"aggregators": [
{
"name": "scale",
"factor": "1.0"
}
]
}
],
"plugins": [],
"cache_time": 0,
"start_relative": {
"value": "1",
"unit": "years"
}
},
dag=dag)
myanalytic_task= PythonOperator(
task_id='myanalytic',
provide_context=True,
python_callable=print_context,
dag=dag)
The first block imports the Airflow dependencies. The functions my_mean and _mean define
the main functionalities. These functions extract data and calculate the mean. We then have
to define our workflow:
kairos_operator >> myanalytic_task
In the first step, we get the data from the last year. During the second step, we calculate the
mean for each tag. To test our code:
To see the status of execution of the workflow, we can click on DAG: mymean to see the
details. The following screenshot shows the expected output:
What's still missing here? We have discovered the capabilities of Apache Airflow to
orchestrate our analytics and to develop extensions for our data source. To make Apache
Airflow a platform for I-IoT, however, we need a connector to our asset registry and a
simple task able to run analytics for each asset we want to monitor. It is very easy to extend
our plugin for KairosDB to Neo4j using the Neo4j driver for Python. This is described in
more detail in the following section. To scale the execution of our analytics across our fleet
according to our asset list, we can build our DAG dynamically.
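Building the DAG dynamically from the asset list can be sketched as a plain Python loop that generates one analytic task definition per asset; the asset list and the task and metric naming here are assumptions for illustration (in Airflow, each entry would become an operator added to the DAG):

```python
def build_tasks(assets):
    """Generate one analytic task definition per monitored asset."""
    tasks = []
    for asset in assets:
        tasks.append({
            "task_id": "myanalytic_" + asset["name"],
            "metric": asset["name"] + ".temperature",
        })
    return tasks

fleet = [{"name": "CT001"}, {"name": "CT002"}]
tasks = build_tasks(fleet)
# one task per asset: myanalytic_CT001, myanalytic_CT002
```

With the fleet read from the asset registry instead of a hardcoded list, adding an asset automatically adds its analytic task at the next DAG parse.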
Apache Flink: Apache Flink is a data-stream processor with great support for
streaming analytics. The major advantage of Flink is that it has stateful support.
Apache Flume: Flume is normally used for data-ingestion and Extract,
Transform and Load (ETL) in the Hadoop Distributed File System (HDFS).
Apache Storm: Apache Storm is a distributed computational processing system
with good support for real-time analytics, online machine learning, continuous
computation, distributed RPC, ETL, and more.
Apache Beam: Apache Beam is an abstraction API with support for Apache
Flink, Apache Apex, and Google Cloud Dataflow.
Apache Spark: Apache Spark is the most popular framework for big data and
stream processing. Apache Spark can run on a Yet Another Resource Negotiator
(YARN) cluster, a native cluster, or Mesos. Several IoT cloud platforms have
support for Spark, including AWS, Azure, Google, Predix, and IBM Bluemix.
In the following example, the ACME:Enterprise manages SydneyPlant, which in turn
hosts a subsystem that includes CT001:
ACME:Enterprise -> SydneyPlant:Plant -> Train1:Section -> CT001:Pump
CT001 is a pump, to which we can connect several sensors to monitor, for instance, the
temperature:
A consumer (analytic, human, hardware device) of an asset registry could simply ask, Give
me all the measures of CT001, or, give me all the temperatures of the assets belonging to Train1. An
asset registry could manage thousands of assets and millions of measures, so we need an
efficient mechanism to store and retrieve this information. It is very easy to understand
why a graph database is better than a relational database. A graph database is a database
which stores information about nodes and the relationships among nodes. An efficient
graph database is Neo4j (https://neo4j.com/).
An asset registry should also support different versions of the same asset,
to allow backward or forward compatibility with different tools.
Neo4j supports Cypher (https://neo4j.com/docs/developer-manual/3.4/cypher/), a
powerful query and create, read, update, and delete (CRUD) language. It also exposes a
REST API interface and Java, Python, C#, and JavaScript drivers to work with Cypher.
When the service is up and running, we can access the Neo4j UI by opening the browser
at http://localhost:7474/browser/.
The following screenshot shows the user interface of the Neo4j UI:
Neo4j will ask to change the password before you proceed; the default username is neo4j
and the password is neo4j.
1. Create the asset. On the Neo4j user interface, we can write the following
commands:
CREATE (CT001:Pump {name:'CT001', alias:'Pump-SN-993416776',
model:'standard'})
CREATE (Train1:Section {name:'Production Train 1'})
CREATE (SydneyPlant:Plant {name:'Plant of ACME in Sydney'})
CREATE (ACME:Company {name:'ACME International'})
2. Create the relationships between the assets:
CREATE
(CT001)-[:BELONGING_OF]->(Train1),
(Train1)-[:BELONGING_OF]->(SydneyPlant),
(SydneyPlant)-[:BELONGING_OF]->(ACME)
3. Create the measures (tags) and associate them to the CT001:Pump asset:
CREATE (TEMP01:Measure {name:'CT001.TEMPERATURE01',
alias:'TEMP01', type:'TEMPERATURE', uom:'DEG'})
CREATE (FLOW01:Measure {name:'CT001.FLOW01', alias:'FLOW01',
type:'FLOW', uom:'sm3/sec'})
CREATE
(TEMP01)-[:MEASURE_OF]->(CT001),
(FLOW01)-[:MEASURE_OF]->(CT001)
If everything went well, we can ask for the temperatures of the assets belonging to the
section Train1, as follows:
MATCH (:Section)<-[:BELONGING_OF]-(EQ)<-[:MEASURE_OF]-(M)
WHERE M.type='TEMPERATURE'
RETURN EQ.name, M.name, M.uom
Neo4j has a very flexible language, and we can work with Cypher using Python, JavaScript,
C#, or the Java driver. To work with JavaScript, we need to install the driver:
$ mkdir neo4j
$ cd neo4j
$ npm init
$ npm install neo4j-driver
We are now ready to develop our code. Build the ask_for_measure.js file with the
following code:
const neo4j = require('neo4j-driver/lib/index.js').v1;
const driver = neo4j.driver("bolt://localhost", neo4j.auth.basic("neo4j",
"admin"));
const session = driver.session();

session
  .run("MATCH (:Section)<-[:BELONGING_OF]-(EQ)<-[:MEASURE_OF]-(M) " +
       "WHERE M.type='TEMPERATURE' RETURN EQ.name, M.name, M.uom")
  .then(result => {
    result.records.forEach(singleRecord => {
      console.log(singleRecord);
    });
    session.close();
    // on application exit:
    driver.close();
  });
The first three issues can be improved and managed easily, but the last topic is very
important for an I-IoT platform. In a typical IoT platform, we have a few models and
thousands of machines with a few measures to monitor: the ratio of models to assets to
measures is about 10:100,000:10. In the I-IoT, however, we have dozens of models, fewer
machines, and hundreds of measures per asset to monitor, so the ratio of models to assets
to measures is closer to 100:10,000:100. This
means that we need an efficient mechanism to scale and distribute the analytics across the
fleet, using the asset model. In other words, a real platform should support both a catalog
of analytics and the capability to attach an analytic to a specific asset, using the definition or
the attributes of the model. For instance, we should avoid monitoring the corrosion of an
on-shore wind turbine against an off-shore wind turbine or deploying one analytic to a
two-stage elevator against a single-stage elevator.
Other technologies
Other open source platforms are available as valid alternatives to the proposed
technologies. Here is a short list of the most interesting platforms or technologies:
RabbitMQ
RabbitMQ is a queue-based messaging system built on AMQP. It has great support for
plugins and routing. RabbitMQ is a valid alternative to Apache Kafka. In the IoT, it is very
common to use AMQP as the data protocol.
Redis
Redis is an in-memory NoSQL database that is normally used for caching.
Kibana
Kibana is based on Elasticsearch and provides a very flexible way to build dashboards and
reports.
Grafana
Grafana is a good alternative to Kibana, with very interesting support for different data
sources. The current version of Grafana is strongly recommended for monitoring
infrastructure and logs.
Kaa IoT
Kaa IoT is an open source framework for IoT device management and data visualization.
Eclipse IoT
Eclipse IoT is an interesting I-IoT framework based on open source technologies for
Industry 4.0. One of the most interesting things about Eclipse IoT is that it has support for
CoAP, DTLS, IEC 61499, OMA LWM2M, MQTT, OGC SensorThings API, oneM2M, OPC
UA, and PPMP:
Eclipse Hono: The adaptor gateway which translates MQTT, CoAP, and HTTP to
an AMQP message
Eclipse Ditto: Supports an analytic framework for a Digital Twin
At its current stage of maturity, Eclipse IoT is not so robust, but it is a very promising
solution for the future.
Apache Hadoop
The Hadoop ecosystem includes several components that are commonly used for big data
storage and batch processing:
Apache Hive
HBase
Pig
MapReduce
YARN and Tez
Apache Presto
Apache Presto is an open source and distributed SQL query engine for running interactive
analytic queries. It supports Apache Hive and Cassandra.
Apache Spark
Spark is the most common framework for Big Data analytics. Spark can use Hadoop YARN,
HDFS, Apache Cassandra, Apache Kafka, and the Spark machine learning library.
Summary
In this chapter, we learned how to develop an I-IoT platform from scratch. The purpose of
this was to highlight the issues we might encounter during the development of an I-IoT
platform.
In the next chapter, we will discover the most common IoT and I-IoT platforms.
Questions
1. What is the most important benefit of using Kafka instead of RabbitMQ or
AMQP?
1. Performance and scalability
2. Routing and protocols
3. Reliability
2. What are the most important differences between cold-path analytics and hot-
path analytics?
1. Cold-path analytics use real-time processing
2. In hot-path analytics, data is processed before storage
3. In cold-path analytics, data is stored and sent to the hot path
3. Why would you use a TSDB database rather than a SQL/NoSQL standard
database to store time-series?
1. Data-sharding
2. It has a specific API to interpolate data and aggregate data
3. Scalability and reliability
Further reading
Read the following articles for more information:
9
Understanding Industrial OEM
Platforms
In this chapter, we will learn about the basic technologies that are used to develop an I-IoT
platform with the most common original equipment manufacturer (OEM) platforms. We
will discover the Predix Platform through a small exercise.
Technical requirements
The link to the code files of this chapter can be found at https://github.com/PacktPublishing/Hands-On-Industrial-Internet-of-Things.
There are more than 120 IoT platforms in existence, of which over 20 are I-IoT OEM
platforms. These include GE Predix, Bosch IoT, Siemens MindSphere, Honeywell Sentience,
Carriots, and Cisco Jasper.
In this book, we will focus on GE Predix and Siemens MindSphere. Both of these platforms
are cloud-based.
Understanding Industrial OEM Platforms Chapter 9
However, why might we avoid using commercial IoT platforms and the cloud? This might be the
case if we need a high level of data protection and control of exports, if we need high
flexibility for custom integration and proprietary protocols, or if we would like to avoid
dependencies on a specific vendor.
All these issues can be mitigated or solved, but the choice to adopt a particular platform
depends on several factors. The following diagram shows a simple decision diagram. First,
we need to check whether we have any restrictions on the export of data. If we don't, we
might opt for a cloud solution. If we can build everything from scratch, we can use a
vendor cloud solution. Otherwise, we can choose between a hybrid solution or a multi-
cloud solution:
Generally speaking, if we do not want to adopt a single strategy, we can keep the standard
business on-premises and migrate later. This means we can use the best functionalities
provided by each vendor by applying a multi-cloud solution that uses inter-cloud
communications.
To discover the basic capabilities of Predix, we will deploy a simple application that
accesses a time-series and an authentication service. The following diagram shows the main
components of the proposed architecture. On the cloud, we will deploy the authentication
service and the time-series service. We then build our application on our local PC, and after
having configured the manifest.yml, we are ready to deploy the application on the Predix
cloud:
In the proposed exercise, we will configure the Predix cloud service and deploy our first
application.
After registration, you can access the predix.io console and explore the applications and
the services. You can also log in through the command console:
px login -a https://api.system.aws-usw02-pr.ice.predix.io
By default, Predix allocates you to a default organization and a development space. We can
list services and applications deployed on our space as follows:
px a    # list applications
px s    # list services
Installing prerequisites
Predix is primarily based on Cloud Foundry (CF); more information can be found on its website
at https://www.cloudfoundry.org/. CF is an open source, cloud-agnostic platform that runs on
Azure, Google Cloud, AWS, and private clouds.
CF CLI: https://github.com/cloudfoundry/cli/releases
Git: https://git-scm.com/
jq: https://stedolan.github.io/jq/
Predix CLI: https://github.com/PredixDev/predix-cli/releases
For macOS X:
bash <( curl https://raw.githubusercontent.com/PredixDev/Predix-HelloWorld-WebApp/master/scripts/quickstart-hello-world.sh )
For Windows:
@powershell -Command "(new-object net.webclient).DownloadFile('https://raw.githubusercontent.com/PredixDev/Predix-HelloWorld-WebApp/master/scripts/quickstart-helloworld.bat','quickstart-hello-world.bat')" && "quickstart-hello-world.bat"
Node.js: https://nodejs.org
gulp: https://gulpjs.com
Grunt: https://gruntjs.com
Bower: https://bower.io
UAA and time-series services are available from the Predix catalog.
We can then configure the name of our UAA and the administrator password:
UAA configuration
We need to provide a name and a space to store the information. The following screenshot
shows the required information:
We can now access our web console to configure the client and the users. Our web console
should appear as shown in the following screenshot:
Configuring security
From the web console, we can click on the previously configured UAA, and then on Open
Service Instance. We will then be redirected to the management console of UAA. Please
take note of the URL of UAA at the bottom of the page. The URL should have the
following format:
https://<UUID provided>.predix-uaa.run.aws-usw02-pr.ice.predix.io
We need to configure a client to access our time-series and ingest the data. From the UAA
console, we have to add a new client by providing its client ID and secret, and then
add the services that it is authorized to access, as follows:
We also have to authorize access to the TSDB. We have to take note of the zonesId
parameter. The zonesId is defined as the ID of your partition on the Predix TSDB.
Replace <your zonesId> with the zone's UUID, which was annotated in the previous
step.
Take note of the zonesId of Predix TS, the UAA URL, the client ID, and
the client password.
From the API explorer, we have to carry out the following steps:
Remember to change the messageId parameter in the Request Body section in the
preceding screenshot.
A more complex query can be performed to retrieve aggregate or interpolated data. Predix
TS supports several aggregation functions, including sum, diff, max, min, rate, trend, and
scale. In this case, we used only the latest data point query. Obviously, Predix provides an
SDK to work with Predix TS, Predix UAA, and the other services managed by Predix.
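The latest data point query mentioned above can be sketched as JSON payloads. This is a hedged example: the exact field names should be checked against the Predix Time Series API reference, and the aggregation values shown here are illustrative. Requests are POSTed with an OAuth2 Bearer token and the Predix-Zone-Id header set to the zonesId noted earlier.

```javascript
// Sketch of Predix TS query payloads (schema assumed; the tag name is
// the one we ingest later in this chapter).
const latestQuery = {
  tags: [{ name: 'IIOT-Book:CompressionRate' }]
};

// An aggregating query adds a time window and an aggregation list:
const aggregateQuery = {
  start: '1w-ago',
  tags: [{
    name: 'IIOT-Book:CompressionRate',
    aggregations: [{ type: 'avg', interval: '1h' }]
  }]
};

console.log(JSON.stringify(latestQuery));
```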
We are going to use this platform to deliver our first application. From Predix Design,
which can be found at https://www.predix-ui.com/, we can get the code of the Predix
sample application:
Open package.json, which is located on the root directory, and ensure that the Node.js
version is as follows:
"engines": {
"node": ">= 7.5.0"
},
The application is based on Polymer, Node.js, and Bower. First of all, we need to install the
Bower and node packages:
cd px-sample-app
npm install
bower install
We are now ready to connect to the Predix time-series that we configured in the Configuring
the time-series database section. We need to open the server/localConfig.json file from
https://github.com/predixdesignsystem/px-sample-app and fill it with the right
information:
{
  "development": {
    "note": "IIOT Book example.",
    "uaaURL": "https://<my UAA UUID>.predix-uaa.run.aws-usw02-pr.ice.predix.io",
    "base64ClientCredential": "<client:password base64>",
    "loginBase64ClientCredential": "<client:password base64>",
    "appURL": "http://localhost:5000",
    "timeseriesURL": "https://time-series-store-predix.run.aws-usw02-pr.ice.predix.io",
    "timeseriesZoneId": "<my TS zoneId UUID>",
    "assetURL": "",
    "assetZoneId": "",
    "windServiceURL": "",
    "websocketServerURL": "/livestream",
    "rmdDatasourceURL": "",
    "rmdDocsURL": "https://raw.githubusercontent.com/PredixDev/predix-rmd-ref-app/master/README.md"
  }
}
As before, we need to provide the UAA's URL and the zone ID of the TSDB. We then have
to provide the base64-encoded credential in both base64ClientCredential and
loginBase64ClientCredential. Why do we have to do this twice? Normally, we would
define two clients: one for the backend and one for the login, but to simplify our example
we used the same client.
We can use an online service to encode the base64 string, or we can use the following
command line:
echo -n myclient:mypassword | base64
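The same encoding can also be done in Node.js, which avoids pasting credentials into an online service (myclient:mypassword is a placeholder):

```javascript
// Encode "<client>:<password>" for the base64ClientCredential fields
// and for HTTP Basic authentication headers.
const creds = Buffer.from('myclient:mypassword').toString('base64');
console.log(creds); // bXljbGllbnQ6bXlwYXNzd29yZA==
```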
Then, copy the component prepared for this book from https://github.com/PacktPublishing/Hands-On-Industrial-Internet-of-Things to the src folder.
We now have to select the Compression tag from the dashboard and set the last month of
the data ingested. Click on Get Data to retrieve the data, as shown in the following
screenshot:
We can now deploy the application on the cloud. On the Predix Cloud, the application reads
its configuration from the VCAP_SERVICES environment variables.
You can then access the application that has been deployed on the cloud as follows:
https://<my name>-px-sample-app.run.aws-usw02-pr.ice.predix.io/
Predix Machine
When we talk about I-IoT, we need a real device to acquire data from a real case. Predix
provides its own hardware device and a simple software gateway called Predix Machine.
Predix Machine can be downloaded from
https://www.predix.io/services/service.html?id=1185 and installed locally to acquire
data. We need to configure the data adapter to get data through Modbus, OPC-UA, or
MQTT, and we need to configure the data river so that we can send data to the cloud. The
configuration files are located in <PredixMachine HOME>/configuration/machine.
Alternatively, you can get the Predix developer kit from Predix's official site.
We now need to configure our Edge device to send data to the cloud. To configure the
Predix developer kit, we have to access it through a web interface. We then register the
machine and configure it with the zone ID and the UAA:
After that, stop and restart the Predix Machine. After rebooting, it will start to send data
acquired from Arduino's sensors to the cloud:
UI of the Predix developer kit's sensors; the Predix developer kit is based on the Intel Gateway and Node-RED
Finally, we are ready to see the data on the cloud from the Predix Toolkit:
https://predix-toolkit.run.aws-usw02-pr.ice.predix.io/. We can access the API Explorer,
query the time-series, provide the zone ID, and get tags. This is shown in the following screenshot:
Predix kit—the API Explorer provides a useful interface to ask for data
If we want to see this data on the cloud, we need to add the light, rotaryangle, sound,
and temperature tags on src/px-sample-dashboard.es6.js as we did before for
IIOT-Book:CompressionRate.
Predix Edge OS
Predix Edge OS is the next version of Predix Machine and is based on a hypervisor and
Docker. Predix Edge OS is a new development environment in which we can plug in
Docker images and connect different data sources, including MQTT, OPC-UA, EGD, OSI-PI,
and GE Historian.
Predix Asset
In the previous example, we hardcoded the tag list in the code, but in a real-world case we
need to use the asset registry. Predix provides an asset registry database called Predix
Asset, which can be found at https://docs.predix.io/en-US/content/service/data_management/asset/. We can also configure our application to ask for tags stored on the
TSDB, but when the tag list becomes very large we need to use the asset hierarchy.
The following diagram shows the main components of Predix and the data flow from the
edge devices to the final user. Data is acquired from sensors using Predix Machine through
OPC-UA or Modbus and then funneled into the Predix time-series. We can use Predix
analytics to run analytics on top of this data, and we can show the results or the time-series
through the Predix UI components:
The second most important service is the Predix analytics service. Predix analytics allows
us to develop our own analytics in Java, GoLang, Python, and MATLAB.
Registering to MindSphere
You can request free access to MindSphere at https://developer.mindsphere.io/.
We can then use the cf command to explore MindSphere's space. Here is a list of the most
important cf commands:
To deploy our first application, we can download the source code from Git at
https://github.com/cloudfoundry-samples/hello-spring-cloud. We then have to edit
manifest.yml and deploy it using the following command:
cf push
Other platforms
As we mentioned before, there are more than 120 platforms for IoT. Most of these are
specific to IoT and are not appropriate for I-IoT. We will compare a few of these platforms.
Bosch IoT: Bosch IoT is an IoT platform based on OSGi. It is very easy to manage
devices using this platform, and it offers a very good solution for integrating with
on-premises systems.
Rockwell Automation and ThingWorx IoT Platform: Rockwell is one of the
leaders in the industry sector of automation. It has a good collaboration with PTC
for ThingWorx. ThingWorx is an IoT platform which has a complete suite of
products, from acquisition to virtual reality, design, and operations.
Intel IoT: The Intel IoT platform offers good SDK, data management, and
device management support. We used this device during the exercise with the Predix
developer kit. It is very easy and intuitive to configure, and the support for
Arduino and Node-RED is very useful.
Summary
In this chapter, we learned how to configure the Predix application and how to acquire
data. We also compared the Predix platform with other platforms. In the next chapters, we
will look at the most common IoT solutions so that we can understand the main
capabilities of real I-IoT middleware.
Questions
Please answer the following questions:
1. Predix
2. Mindsphere
3. Intel IoT
Further reading
Read the following articles for more information:
Predix: https://www.predix-ui.com/
Predix Forum: https://www.predix.io/community/forum
MindSphere HelloWorld: https://developer.mindsphere.io
Eclipse IoT initiative: https://iot.eclipse.org/gateways/
Bosch IoT: https://developer.bosch.com/coming-soon
Intel Gateway IoT: https://www.intel.it/content/www/it/it/embedded/solutions/iot-gateway/overview.html
10
Implementing a Cloud Industrial
IoT Solution with AWS
In the previous chapter, we implemented a custom I-IoT platform. We looked at some
important topics, including storage, time-series, microservices, protocols, data stream
processing, and batch processing. The purpose of this chapter is to explore the solutions
proposed by Amazon Web Services (AWS) and the capabilities of the AWS IoT platform.
In this chapter, we will discover some of the most popular cloud solutions that apply to the
I-IoT. We will cover the following topics:
Technical requirements
To follow along with this chapter, the following prerequisites should be downloaded and
installed:
The code for this chapter is available at the official repository at https://github.com/PacktPublishing/Hands-On-Industrial-Internet-of-Things.
AWS architecture
In previous chapters, we designed an end-to-end standard architecture from the sensor to
the operator. In this chapter, we will discover that AWS has a lot of similarities with the
architecture proposed in the previous chapter; this cloud architecture is strongly based on
microservices, queues, and common protocols, including HTTPS, MQTT, and REST API.
From the point of view of infrastructure, AWS supports Kubernetes, Hadoop, Docker, and
serverless platforms. From an IoT perspective, AWS has built its own solution: AWS IoT.
AWS IoT
AWS IoT is the IoT platform of AWS. The key components of AWS IoT are the following:
AWS Greengrass
AWS IoT Core
AWS IoT Analytics
Other components of AWS that are not specific to the IoT are listed as follows:
AWS QuickSight
AWS SageMaker
AWS Athena
AWS DynamoDB
AWS Lambda
AWS S3
AWS Machine Learning (ML) analytics
Implementing a Cloud Industrial IoT Solution with AWS Chapter 10
The following diagram is the proposed architecture for the I-IoT using AWS:
IoT Core is the hub to which the devices send data using several protocols. In our exercise,
this was MQTT. The data can be stored in DynamoDB for time-series data, or S3 for object
data. AWS allows us to perform this storing action by enabling simple rules from IoT Core.
This is further explained in the IoT Analytics section. In parallel, we can process data using
AWS Lambda, a serverless platform, or using IoT Analytics. Results can either be stored on
S3 or Elasticsearch, or directly connected to QuickSight or Kibana for fast visualization. AWS
also allows you to export data from DynamoDB to S3 with a few clicks. We can use AWS
ML, SageMaker, or Athena for batch processing. Finally, AWS allows us to carry out
actions on the edge by deploying AWS Lambda on the on-premises component Greengrass.
To discover these features, we will develop a simple exercise based on IoT Core,
DynamoDB, AWS Lambda, MQTT, AWS Edge SDK, IoT Analytics, and QuickSight. To
make a start with AWS IoT, we need a valid AWS account.
2. We need to provide our details and our credit card information. AWS will not
charge you if you stay within their free tier limits.
3. Connect to the AWS console at https://aws.amazon.com and provide your
username and password.
4. Finally, choose your AWS default location.
2. In the menu on the left, click on the User option and then the Add user button.
Provide a user name and check the box to allow programmatic access. Proceed
by clicking on the Next: Permissions button:
4. Finally, click on the Next: Review button and then the Create user button. AWS
will provide an access key and a secret access key. Please save these in a
protected folder. We will reuse them later:
AWS will provide an access key and a secret access key after user creation
IoT Core
IoT Core is a basic service to manage devices and to receive data from an edge device. To
enable IoT Core, we need to open the web console and click on IoT Core as shown in the
following screenshot:
In the next sections, we are going to activate the policy to connect an external device
through MQTT/MQTTS, create certificates, and register a new device (which is called a
thing by AWS). AWS IoT uses X.509 certificates to enforce MQTT security.
Otherwise, we can copy and paste the following code into the Advanced
mode tab:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "iot:Publish",
        "iot:Subscribe",
        "iot:Connect",
        "iot:Receive"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    }
  ]
}
Registering a thing
The first step is to register a new device to our IoT Core platform. Follow these steps:
2. Click on the Create a single thing button or import a list of things from S3.
3. We then have to define our device name, my-iiot-book-device; a
simpletest type; and a group with the name my-iiot-book-group. To create
a type and a group, click on the Create a type button and the Create group
button. Finally, click on the Next button:
4. We then have to create a X.509 certificate for our device. We can simply click on
the Create certificate button to do this.
5. The last step is to activate the certificate by clicking on the Activate button and to
download all the certificates created:
We can save our certificates in an empty folder called iiot-book-aws, then rename the certificate's prefix with a new prefix, my-iiot-book-device. For example, this might look as follows: my-iiot-book-device.public.key.
In the same folder, download the certificate authority from https://www.symantec.com/content/en/us/enterprise/verisign/roots/VeriSign-Class%203-Public-Primary-Certification-Authority-G5.pem and rename it root-CA.pem.
7. From the drop-down menu, we have to select our policy and register our thing as
shown in the following screenshot:
The dashboard in AWS from which you can manage the things
We can add more devices by clicking on the Create button, or we can manage the
certificates from the Secure menu. Alternatively, we can define more groups and more
devices from the Manage menu. Finally, we can also subscribe or test the MQTT topic
using the Test menu.
We have now installed the Node.js client, so we can start to write our simple client, but we
need the endpoint URL. From the Settings option, we have to copy the custom endpoint, as
shown in the following screenshot:
We can now write our simple client using the following code. The device configuration shown at the top is a minimal sketch, not part of the original listing; adjust the certificate file names and the host to match your downloaded files and your custom endpoint:
var awsIot = require('aws-iot-device-sdk');

// Minimal device configuration (assumed; adjust file names and host).
var clientId = 'my-iiot-book-device';
var device = awsIot.device({
  keyPath: 'my-iiot-book-device.private.key',
  certPath: 'my-iiot-book-device.cert.pem',
  caPath: 'root-CA.pem',
  clientId: clientId,
  host: '<your custom endpoint>'
});

console.log(device);
device
  .on('connect', function() {
    console.log('connect');
    console.log('publishing');
    for (var i = 0; i < 100; i++) {
      console.log("sent " + i);
      device.publish('signals/' + clientId, JSON.stringify({ temperature: i }));
    }
  });
device
  .on('message', function(topic, payload) {
    console.log('message', topic, payload.toString());
  });
Our client simply connects to the AWS IoT Core and publishes a test message. We can test
the data that is sent with the test console.
Click on the Test option and subscribe to the signals/# topic, as shown in the following
screenshot:
Storing data
In a typical I-IoT scenario, we want to process incoming data, but also store this data in a
cloud storage system. From the device, we can receive unstructured data, such as images,
sounds, or logs, as well as events and sensor data. In our proposed I-IoT architecture, we
want to store sensor data, such as time-series data, but it is easy to extend these concepts to
unstructured data as well. AWS doesn't have native support for time-series data; we
suggest that you use DynamoDB for this purpose.
DynamoDB
DynamoDB is a highly scalable, key-value NoSQL store service. The AWS-proposed
time-series schema is shown in the following section (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-time-series.html):
The primary key is the date (the date of the day, without time), while time is used for the sort
key. Every column is an attribute. AWS also suggests partitioning the table so that each
partition ingests no more than 10 GB of data.
1. Click on DynamoDB from the AWS console and then click on the Create table
button
2. Enter the name as current_signals, the primary and partition key as ts_date,
and the sort key as ts_time, as shown in the following screenshot:
3. Finally, we can click on the Create button to set up our table in DynamoDB
1. From the menu on the left, click on Act, then Create a rule, as shown in the
following screenshot:
2. We can provide a name and the attributes that we want to import. In this
example, we want the data (*) and the name of the topic (topic()), so we set
the attributes to *, topic() as topic. Finally, we subscribe to all signals by
using signals/# as the topic filter:
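Behind the console form, the rule is expressed in AWS IoT SQL. The settings above correspond to a statement like the following, held here in a JavaScript string for illustration:

```javascript
// IoT SQL equivalent of the console rule: select every attribute (*)
// plus the originating topic name from any message under signals/.
const ruleSql = "SELECT *, topic() AS topic FROM 'signals/#'";
console.log(ruleSql);
```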
3. We need to configure the final action to store it in the DynamoDB. Click on Add
action, then DynamoDB, and finally the Configure action.
4. Finally, choose the DynamoDB table that we created previously, which is
current_signals in our case. To define the mapping, enter the following code
terms into the relevant fields, as demonstrated by the following screenshot:
ts_date : ${parse_time("yyyy.MM.dd", timestamp() ) }
ts_time : ${parse_time("HH:mm:ss", timestamp() )}
Then, write the payload in the Write message data to this column field and
define the operation as INSERT, as shown in the following screenshot:
5. We need to create a new role to allow the rule to write to the DynamoDB table.
Click on the Create a new role button, provide a name, such as DDB-IOT-role,
and then click on the Update role button.
6. We can now click on the Add action button.
We can now test our flow. From the command console, enter the following command:
$ node iiot-book-device.js
Finally, from the AWS console, click on DynamoDB, then Tables, and find our table,
current_signals. We can see that our data has been imported, as shown in the following
screenshot:
As we can see, not all of the data has been imported, due to key collisions: samples that arrive
within the same second overwrite each other. If we want to keep all of the data, we also need
to add millisecond support to the sort key.
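The key construction can be sketched locally. The code below rebuilds the rule's parse_time() keys in JavaScript and extends the time format with milliseconds; in the rule itself, the equivalent pattern would be "HH:mm:ss.SSS", assuming the rule engine accepts a millisecond pattern.

```javascript
// Build DynamoDB keys the way the rule's parse_time() calls do, extended
// with milliseconds so two samples in the same second no longer collide
// on (ts_date, ts_time).
function keysFor(epochMs) {
  const d = new Date(epochMs);
  const pad = (n, w = 2) => String(n).padStart(w, '0');
  return {
    ts_date: `${d.getUTCFullYear()}.${pad(d.getUTCMonth() + 1)}.${pad(d.getUTCDate())}`,
    ts_time: `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())}.${pad(d.getUTCMilliseconds(), 3)}`
  };
}

console.log(keysFor(Date.UTC(2018, 9, 31, 12, 0, 1, 42)));
// → { ts_date: '2018.10.31', ts_time: '12:00:01.042' }
```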
AWS Kinesis
It is also possible to attach AWS IoT to Amazon Kinesis. Amazon Kinesis, and in particular,
Data Firehose, is the easiest way of streaming data into data stores and analytics engines.
AWS Kinesis is very good for streaming events, logs, and videos.
AWS analytics
Once the devices are connected and the time-series data is stored, I-IoT data should be
analyzed, processed, and visualized. AWS provides six different mechanisms to process
data:
Lambda analytics
Serverless Lambda functions are the easiest and most customizable form of analytics.
Lambda analytics allow you to process data immediately, with powerful support for
Python, Node.js, Go, C#, and Java. The easiest way to configure a Lambda function is
through the AWS console as follows:
1. From the AWS console, we have to search for Lambda and click on the Create
function button
2. Select the Author from scratch section and, for the name field,
enter my_iiot_lambda_threshold. From the role template, we can define our
role name as my_iiot_lambda_threshold_role:
After clicking on the Create function button, AWS will show a page similar to the
following screenshot:
8. Now, it's time to test our function. From the command console, enter the
following command:
$ node iiot-book-device.js
Greengrass
Greengrass is a new component within the AWS family. AWS Greengrass extends AWS IoT
cloud capabilities to local devices. Sometimes, it is not possible to process all the data on the
cloud itself, so instead, we need to collect and analyze data closer to the source of
information.
In other words, developers can use AWS Greengrass to run serverless code (AWS Lambda
functions) in the cloud and conveniently deploy it to devices for local execution of
applications. AWS Greengrass also has great support for machine learning inference.
Greengrass is accessible from the menu on the left side of IoT Core under the Manage
devices section.
1. Build the Greengrass edge, the certificates, and the configuration file from the IoT
Core interface
2. Deploy and start Greengrass on the edge
3. Build the OPC UA Connector
4. Deploy the OPC UA Connector on the edge
The following diagram shows the architecture of our exercise and the four preceding steps
to be accomplished:
1. From the menu on the left, we need to click on Greengrass and follow the Get
Started procedure:
Greengrass group
3. Accept the next few steps and then download the Greengrass distribution and
the resources (certificates and configuration), as shown in the following
screenshot:
4. Finally, click on the Create Group and Core button, as shown in the following
screenshot:
Vagrant is a way to start a virtualized environment with just a few commands. First, we
need to download Vagrant from https://www.vagrantup.com/downloads.html. We can
then run the following script from the command console:
$ vagrant init ubuntu/xenial64
$ vagrant up
$ vagrant ssh
Vagrant will mount the current directory to the /vagrant directory so that we can easily
access the greengrass directory and copy the contents to a local folder:
$ sudo cp -R /vagrant/greengrass /greengrass
We can now unpack our resources and runtime into a local folder, such as aws-greengrass-home. From Ubuntu's console, we need to run the following commands:
The next step is to configure the certificates and the configuration file, config.json:
$ cd /greengrass/certs/
$ sudo wget -O root.ca.pem http://www.symantec.com/content/en/us/enterprise/verisign/roots/VeriSign-Class%203-Public-Primary-Certification-Authority-G5.pem
$ cp root.ca.pem certs/
$ cp certs/* greengrass/certs/
$ cp config/* greengrass/config/
We also need to add the ggc user and group. From the command console, run the
following commands:
$ sudo adduser --system ggc_user
$ sudo addgroup --system ggc_group
After that, we need to install Node.js 6.x on the device. From the command console, run the
following commands:
$ curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
$ sudo apt-get install -y nodejs
$ sudo ln -s /usr/bin/node /usr/bin/nodejs6.10
4. Finally, we need to configure the OPC UA data source. We can connect to the
Prosys OPC UA server or the Node.js example developed in Chapter 6, Performing an
Exercise Based on Industrial Protocols and Standards. We need to open the index.js
file and change the configSet variable:
const configSet = {
  server: {
    name: 'server',
    url: 'opc.tcp://localhost:26543',
  },
  subscriptions: [
    {
      name: 'Temperature',
      nodeId: 'ns=1;s=Temperature',
    },
  ],
};
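With this configuration, the connector publishes readings on a topic derived from the server and subscription names. A hypothetical helper makes the mapping explicit (topicFor is our own illustrative name, not part of the connector's API):

```javascript
// Derive the publish topic for a subscribed node, matching the
// /opcua/<server>/node/<name> pattern used by the connector.
function topicFor(serverName, subscriptionName) {
  return `/opcua/${serverName}/node/${subscriptionName}`;
}

console.log(topicFor('server', 'Temperature')); // /opcua/server/node/Temperature
```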
1. From the menu on the left on the Greengrass site, click on Lambdas and
then Create new Lambda, as shown in the following screenshot:
5. From the IoT Greengrass console, click on the Use existing Lambda button and
link to the function that we just created. We have to make this function
permanent, so we need to edit the function and choose the Make this function
long-lived and keep it running indefinitely option, as shown in the following
screenshot:
Greengrass will connect to the OPC UA server and will send its data to the AWS IoT hub on
the topic /opcua/server/node/Temperature. The topic will be created automatically.
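The topic name is derived from the server and node names declared in configSet. A minimal sketch of that mapping (the exact naming scheme used by the Greengrass Lambda is an assumption here, chosen to match the topic shown above):

```javascript
// Sketch: derive the IoT topic for each OPC UA subscription.
// The /opcua/<server>/node/<node> scheme matches the topic above;
// the helper itself is illustrative, not part of the AWS samples.
const configSet = {
  server: { name: 'server', url: 'opc.tcp://localhost:26543' },
  subscriptions: [{ name: 'Temperature', nodeId: 'ns=1;s=Temperature' }],
};

function topicsFor(config) {
  return config.subscriptions.map(
    (sub) => `/opcua/${config.server.name}/node/${sub.name}`
  );
}

console.log(topicsFor(configSet)); // [ '/opcua/server/node/Temperature' ]
```

Adding further nodes to the subscriptions array would produce one topic per signal.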
Amazon Athena is a powerful serverless query service that lets you run standard SQL for
business intelligence, and it is easy to apply in a NoSQL context.
IoT Analytics
IoT Analytics is another analytical framework for IoT data transformation and enrichment.
IoT Analytics is a single service that is able to process the full chain of our data, from data
storage to data enrichment and processing.
To work with IoT Analytics, we need to build the channels, the pipelines, and finally, the
data stores. These are, respectively, the input, the transformation, and the output. To work
with AWS IoT Analytics, connect the browser to the IoT console and search for IoT
Analytics.
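The channel, pipeline, and data store stages can be pictured with a small in-memory sketch. This is a conceptual model only, not the AWS SDK; the enrichment activity (a Celsius-to-Fahrenheit conversion) is an invented example:

```javascript
// Conceptual sketch of the IoT Analytics flow: a channel collects raw
// messages, a pipeline applies transformation activities in order, and
// a data store keeps the processed result.
const channel = [];                                      // input: raw messages
const pipelineActivities = [
  (msg) => ({ ...msg, tempF: msg.temp * 9 / 5 + 32 }),   // transformation: enrichment
];
const datastore = [];                                    // output: processed messages

function ingest(msg) {
  channel.push(msg);
  const processed = pipelineActivities.reduce((m, act) => act(m), msg);
  datastore.push(processed);
}

ingest({ device: 'device1', temp: 20 });
console.log(datastore[0].tempF); // 68
```

In the real service each stage is a managed resource that we create from the console, as the following steps show.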
Building a channel
To build our channel, we want to connect IoT Analytics to our IoT MQTT. Follow these
steps:
1. From the menu on the left, click on Prepare, then Channels, then click on the
Create a channel button as shown in the following screenshot:
2. Next, provide a name for the channel, such as signals_channel, and then click
the Next button
3. On the IoT Core Topic filter field, we can subscribe to the signals/# MQTT
4. Finally, we have to create the channel by clicking on the Create a channel button
Click on Act in the IoT Core console to see the rule that we just created.
1. From the channels list, we can create our pipeline by clicking on the Create a
pipeline from this channel option:
3. For now, we can skip the transformation activity, but we can easily build
attribute calculation and enrichment with just a few clicks if necessary
4. Finally, define where to store the data by clicking on the Create new data store
option and provide the symbolic name as signals_datastore
1. From the menu on the left, click on Analysis, then Data Set, and then Create data
set.
2. Enter the name as signals_dataset and select signals_datastore, which
we created previously, as the data store source as shown in the following
screenshot:
3. Finally, we can build our data set by clicking on the Next button. After creation,
click on the Run query now option. We will reuse this dataset in the QuickSight
section:
The IoT Analytics framework is a low latency integration platform for fast transformation
and prototyping.
QuickSight
QuickSight is mainly a business intelligence tool, but it also offers an easy way to visualize
IoT data. Although QuickSight is not specifically an IoT service, it has great
connectivity with AWS IoT Analytics and allows you to develop a user interface with just a
few clicks:
3. To enable the data source, check the box next to the Amazon IoT Analytics
checkbox, as shown in the following screenshot:
Configuring QuickSight
4. Finally, from the data sets page, create a new IoT Analytics data source and select
our previously created dataset, signals_dataset:
Now, drag and drop our signal (in this case, the temperature) onto the
AutoGraph visual:
Summary
In this chapter, we explored the most common functionalities used to deliver an IoT
solution with AWS. We understood the main components and the benefits. In the next
chapter, we will discover the Google IoT offering.
Questions
1. Which of the following technologies is a serverless technology?
1. Greengrass
2. Lambda functions
3. Docker
2. Which of the following technologies is a device manager technology?
1. AWS Greengrass
2. AWS IoT Core
3. AWS Hub
3. Which of the following technologies can we use for fast data processing and low
latency analytics?
1. IoT Analytics
2. AWS Machine Learning
3. AWS Lambda
Further reading
Additional resources can be found at the following links:
11
Implementing a Cloud Industrial
IoT Solution with Google Cloud
In the previous chapter, we looked at using an I-IoT platform with AWS. We explored some
of the most important topics—storage, time-series, microservices, protocols, data stream
processing, and batch processing. The purpose of this chapter is to explore the solutions
proposed by the Google Cloud Platform (GCP) and the capabilities of the GCP IoT
platform.
In this chapter, we will discover the most popular cloud solutions applied to the I-IoT. We
will cover the following topics:
Technical requirements
In this chapter, we need the following prerequisites:
Cloud IoT Core: The device hub to receive data and manage devices
Cloud Pub/Sub: A publisher/subscriber service to consume data
Cloud Dataflow or Cloud Functions: These are used to process data for
conversion, digital twins, and diagnosis
Cloud Bigtable: This is used for data storage
From an Edge perspective, Google is quite agnostic. However, they have recently indicated
that Android Things will be their platform for IoT device development. The following
diagram shows the basic flow diagram implemented by the GCP IoT:
Implementing a Cloud Industrial IoT Solution with Google Cloud Chapter 11
Google IoT Core supports Message Queuing Telemetry Transport (MQTT) and HTTPS. It
is the central hub for connecting remote devices and uses Google Pub/Sub for data routing.
With Cloud Dataflow, a Google service that allows you to process stream analytics, and
Cloud Functions, Google's serverless platform, we can process data and perform actions to
store time-series or object data. Google's Bigtable can be used to store time-series and
events, while Cloud Storage can be used to store objects. Cloud BigQuery and Cloud ML
are useful for batch analytics processing and we can develop a simple visualization
application using Google Data Studio. In the following simple exercise, we will learn how
to connect a device with the Google Edge SDK, how to process data with Google Functions
and Google Dataflow, and how to store time-series in Google Bigtable.
The next steps depend on where you are connecting from. Generally speaking, you will
have to carry out the following activities:
Google requires you to create a project space before starting. This can be done as follows:
2. From the project bar, we can create a new project. The following screenshot
shows the New Project page:
3. Then, we have to choose the language and confirm the creation of the project.
GCP uses the project to segregate the environment. In this chapter, we will call
this project iiot-book.
4. After the project has been created, select the project, access Home page, and pin
the following services from the menu on the left:
We also suggest that you pin the billing service so that you can manage
your free account.
We are now ready to work with IoT Core and the other technologies. The first time we use
it, GCP asks us to enable APIs for every service that we want to work with. GCP provides
an easy command-line interface for this called gcloud.
2. Then, we can initialize gcloud with our project ID and authorization, using the
following command:
$ gcloud init
Device registry
Then, we have to provide a name for the device registry, which in our case will be
iiot-book-registry. We also have to define two topics: one for signals, which will be
signals, and one for statuses, which will be statuses.
GCP uses two different topics to send data and statuses about the health
of a device.
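On the device side, the two streams correspond to two different MQTT topics on the Cloud IoT bridge: telemetry goes to the events topic (routed to our signals Pub/Sub topic) and device health goes to the state topic (routed to statuses). A small illustrative helper, assuming the standard Cloud IoT topic layout:

```javascript
// Telemetry (data) topic: routed to the "signals" Pub/Sub topic.
function telemetryTopic(deviceId) {
  return `/devices/${deviceId}/events`;
}

// State (health) topic: routed to the "statuses" Pub/Sub topic.
function stateTopic(deviceId) {
  return `/devices/${deviceId}/state`;
}

console.log(telemetryTopic('my-iiot-device')); // /devices/my-iiot-device/events
console.log(stateTopic('my-iiot-device'));     // /devices/my-iiot-device/state
```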
For now, we can skip adding a custom Certificate Authority (CA) certificate, but we
should provide this in a real production environment. CA certificates are used to verify
device credentials.
1. First, we need to create the private and public RS256 keys to be used for
communication between IoT Core and our sensors:
$ openssl genrsa -out device_private.pem 2048
$ openssl rsa -in device_private.pem -pubout -out device_public.pem
3. Then, we can create our device. From the registry details, we have to click on
Add a device. We should provide a name, such as my-iiot-device, and the
public key that we created previously, as shown in the following screenshot:
1. First, we need to clone the Node.js client from the official repository. From the
root path where we have stored our certificate, enter the following command:
$ mkdir my-iiot-device
$ cd my-iiot-device
$ git clone https://github.com/GoogleCloudPlatform/nodejs-docs-samples
$ cd nodejs-docs-samples/iot/mqtt_example/
$ npm install
2. We can then start publishing. The following code sends an MQTTS payload to
the IoT Core instance that was just configured on europe-west1, identifying itself as
my-iiot-device:
$ node cloudiot_mqtt_example_nodejs.js \
--projectId=iiot-book \
--cloudRegion=europe-west1 \
--registryId=iiot-book-registry \
--deviceId=my-iiot-device \
--privateKeyFile=../../../../../certificates/device_private.pem \
--algorithm=RS256
3. Finally, we can check the reception status on the Registry details page, as shown
in the following screenshot:
4. We now need to change the payload so that it is compliant with our CSV format.
On line 161 of the cloudiot_mqtt_example_nodejs.js file located in the
nodejs-docs-samples/iot/mqtt_example/ directory, we need to replace the
payload with the following:
…
setTimeout(function () {
  // const payload = `${argv.registryId}/${argv.deviceId}-payload-${messagesSent}`;
  const ts = new Date().getTime();
  const payload = `${argv.registryId}.${argv.deviceId}.signal1,${messagesSent},${ts},GOOD`;
…
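The payload is now a single CSV line of the form `<registryId>.<deviceId>.<tag>,<value>,<timestamp>,<quality>`. A minimal parser for it (the field layout is inferred from the template string above; the helper is not part of the sample code):

```javascript
// Parse the CSV telemetry line produced by the modified sample client.
function parsePayload(payload) {
  const [path, value, ts, quality] = payload.split(',');
  const [registryId, deviceId, tag] = path.split('.');
  return { registryId, deviceId, tag, value: Number(value), ts: Number(ts), quality };
}

const sample = parsePayload(
  'iiot-book-registry.my-iiot-device.signal1,42,1540000000000,GOOD'
);
console.log(sample.tag, sample.quality); // signal1 GOOD
```

The Cloud Function that we write later performs an equivalent parse before writing to Bigtable.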
Bigtable
To store our data, we need the support of Cloud Storage. We can use three different data
storage services—Cloud BigQuery, Cloud Bigtable, and Cloud Datastore. To store time-series
data, Google suggests that you use Bigtable. Bigtable is very similar to the popular open
source platform Apache HBase. Bigtable is organized into tables, rows, columns, and
column families:
In our exercise, we are going to build a schema, where a combination of device IDs and
timestamps is the row key and the column families are the sensors. Please look into the
layout and the content of the following table:
Because Cloud Bigtable tables are sparse, you can create as many column
qualifiers as you need in each row. There is no penalty for having empty
cells in a row. For a detailed explanation of Bigtable schema design, please
visit https://cloud.google.com/bigtable/docs/schema-design and
https://cloud.google.com/bigtable/docs/schema-design-time-series.
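The row-key scheme described above combines the device ID with the timestamp, so that all the rows for one device sort together in time order. A small sketch of one possible key format (the separator and zero-padding are illustrative choices, not a Bigtable requirement):

```javascript
// Build a row key of the form <deviceId>#<timestamp>. Zero-padding the
// millisecond timestamp to a fixed width keeps the lexicographic order
// of keys consistent with chronological order.
function rowKey(deviceId, tsMillis) {
  return `${deviceId}#${String(tsMillis).padStart(13, '0')}`;
}

console.log(rowKey('my-iiot-device', 1540000000000));
// my-iiot-device#1540000000000
```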
Before starting our deployment, we can test Bigtable locally using the GCP Bigtable
emulator. To run the Bigtable emulator, we need to launch the following command:
$ gcloud beta emulators bigtable start
When the emulator has started, we can execute the following commands:
$ export BIGTABLE_EMULATOR_HOST=localhost:8086
$ cbt -project iiot-book-local -instance iiot-book-data createtable iiot-book-signals
$ cbt -project iiot-book-local -instance iiot-book-data createfamily iiot-book-signals signal1
The first line forces the Bigtable command-line tool cbt to use a local emulator. The second
line creates a table called iiot-book-signals. The third line names our first column
family signal1.
We can now build our Bigtable in the cloud. Follow these steps:
1. We need to create an instance from the left-hand menu of the GCP homepage:
Due to the high cost of Bigtable, we suggest that you drop the instance
that we just created after the exercise.
To create the data table, we can either use the UI or the cbt script. In this exercise,
we will use the cbt tool.
3. We need to open a new Command Prompt and execute the following command:
$ cbt -project iiot-book-<id> -instance iiot-book-storage createtable iiot-book-signals
$ cbt -project iiot-book-<id> -instance iiot-book-storage createfamily iiot-book-signals signal1
4. Finally, we can check the creation of the table with the following command:
$ cbt -project iiot-book-<id> -instance iiot-book-storage ls iiot-book-signals
Cloud Functions
Google Cloud Functions are serverless services that respond to events, such as data coming
from our device. We are now going to set up a Cloud Function to extract the data sent by
the device, process it immediately, and store it in the central data storage:
1. From the Google Cloud Console's left-hand menu, click on Cloud Functions and
then Enable API:
4. Under the source code section of the inline editor on the index.js tab, we can
copy and paste the following code:
// needs npm install --save @google-cloud/bigtable
// Imports the Google Cloud client library
const Bigtable = require('@google-cloud/bigtable');

var bigtableOptions = {
  projectId: PROJECT_NAME,
};

// … (the handler decodes the Pub/Sub message and splits the CSV
// payload into its fields; that part of the listing is elided) …

const deviceId = data[0];
const tag = data[1];
const value = data[2];
const timestamp = data[3];

try {
  // … (rowsToInsert is built from the parsed fields) …
    key: `${ts}`,
    data: {
      [tag]: {
        [COLUMN_QUALIFIER]: {
          value: value
        },
      },
    },
  }));
  await table.insert(rowsToInsert);
} catch (err) {
  console.error(`Error inserting data:`, err);
  callback(); // DONE WITH ERROR
}
};
5. Under the source code section on the package.json tab, we can copy and paste
the following code:
{
  "name": "sample-pubsub",
  "version": "0.0.1",
  "dependencies": {
    "@google-cloud/bigtable": "^1.0.0"
  }
}
The purpose of this code is to parse our MQTT message and to store it in Google Bigtable.
We can now run our example.
The tool will send data to the official GCP MQTT Hub: mqtt.googleapis.com:8883. We
can now check the status of our Google Cloud Function:
1. From the left-hand menu of the homepage, click on Cloud Functions and then click
on our implemented function, iiot-book-function-1, as shown in the
following screenshot:
2. We can check the health of the function by looking at the number of errors.
Alternatively, we can click on View logs if we need more details. Finally, we can
ask for the data stored with the following query:
$ cbt -project iiot-book-<id> -instance iiot-book-data read iiot-book-signals
const deviceId = data[0];
const tag = data[1];
const value = data[2];
const timestamp = data[3];
const quality = data[4];

var dt = digitalTwin[tag];
if ((value > dt.upperLimit || value < dt.lowerLimit) && (quality == 'GOOD')) {
  console.log(`excursion on ${tag} : ${value}`);
}
callback();
};
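The digitalTwin object referenced above holds the expected operating limits for each signal. Its definition is not shown in the listing, so the shape and limit values below are assumptions for illustration:

```javascript
// Hypothetical digital twin: the expected operating envelope per tag.
const digitalTwin = {
  signal1: { upperLimit: 100, lowerLimit: 0 },
};

// An excursion is a good-quality measurement outside the envelope.
function isExcursion(tag, value, quality) {
  const dt = digitalTwin[tag];
  return Boolean(dt) && quality === 'GOOD' &&
    (value > dt.upperLimit || value < dt.lowerLimit);
}

console.log(isExcursion('signal1', 120, 'GOOD')); // true
console.log(isExcursion('signal1', 120, 'BAD'));  // false
```

Checking the quality flag prevents false alarms on bad or stale sensor readings.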
To deploy the current function, we can follow the same procedure that we carried out
previously:
1. From the left-hand menu of the homepage, click on Cloud Functions, then
Create function, and add the following information:
To check whether our first basic analytics have executed successfully, run the device
emulator again and check the function log.
Dataflow
Google Cloud Dataflow is a data streaming or batch processing platform for analytics based
on Apache Beam (https://beam.apache.org/). Using Google Cloud Dataflow, we can
write complex analytics based on Python or Java, or we can use a template. Dataflow has a
set of useful templates for fast data processing and conversion.
Every five minutes, the data received on the topic will be stored in a GCP storage file
system. This will be further explained in the Google Cloud Storage section that follows.
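The idea behind the template is to bucket incoming messages into fixed five-minute windows and write one file per window. A sketch of the bucketing logic (the output path layout is an assumption, not the template's exact naming):

```javascript
// Map a message timestamp to the storage path of its five-minute window.
function windowPath(tsMillis) {
  const windowMs = 5 * 60 * 1000;
  const start = new Date(Math.floor(tsMillis / windowMs) * windowMs);
  const p = (n) => String(n).padStart(2, '0');
  return `signals/${start.getUTCFullYear()}/${p(start.getUTCMonth() + 1)}/` +
         `${p(start.getUTCDate())}/${p(start.getUTCHours())}-${p(start.getUTCMinutes())}`;
}

console.log(windowPath(Date.UTC(2018, 9, 1, 12, 7, 30)));
// signals/2018/10/01/12-05
```

All messages arriving between 12:05:00 and 12:09:59 land in the same file.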
BigQuery
BigQuery is a very robust and mature technology provided by GCP to execute complex
OLAP queries. We can use BigQuery for interactive queries. GCP provides a command-line
tool to execute complex queries. Alternatively, we can use the online tool, which can be
found at https://bigquery.cloud.google.com/welcome.
In this simple example, we are going to use the console tool. We assume that the tool has
already been installed. From the command line, execute the following command:
$ bq query --use_legacy_sql=false 'select count(*) from `signals.iiot-book`'
The query counts the number of events that were previously stored in our Bigtable.
Cloud SQL: This is useful if we need a relational database with full SQL support
for online transaction processing (OLTP)
Cloud Bigtable: This is useful if we don't require support for ACID transactions
BigQuery: This is useful if we need interactive querying in an online analytical
processing (OLAP) system
Cloud Storage: This is useful if we need to store large immutable blobs, such as
large images or movies
In this exercise, we used Bigtable. In the I-IoT, we do not need support for transactions;
instead, we prefer to optimize for scalability rather than for a rich query language.
Summary
In this chapter, we have explored the most common functionalities of GCP, including its
pitfalls and benefits. In the next chapter, we will discover the Azure IoT.
Questions
1. Which of the following technologies is a serverless technology?
1. Greengrass
2. GCP Functions
3. Docker
2. Which of the following technologies is a device manager technology?
1. AWS Greengrass
2. GCP IOT Core
3. Azure Hub
3. Which technology can we use for fast data processing and low latency analytics?
1. GCP Dataflow
2. Azure Machine Learning
3. AWS Lambda
Further reading
Additional resources can be found at the following links:
12
Performing a Practical Industrial
IoT Solution with Azure
In the previous chapters, we looked at how to build an I-IoT platform with AWS, GCP, and
a custom solution. We have explored some of the most important concepts, including
storage, time-series, microservices, protocols, and data analytics. The purpose of this
chapter is to build an I-IoT solution with Azure.
In this chapter, we will discover the most popular Azure cloud components that can be
applied to I-IoT. We will cover the following:
Azure IoT
Azure analytics
Building visualizations with Power BI
Time Series Insights
Connecting a device with IoT Edge
Comparing the platforms
Performing a Practical Industrial IoT Solution with Azure Chapter 12
Technical requirements
To follow along with this chapter, we need to download and install the following
prerequisites:
Azure IoT
Azure IoT is a platform proposed by Microsoft to connect multiple devices, enable
telemetry, store measures, run and develop analytics, and visualize results. The key
components of Azure IoT are the following:
IoT Hub
Stream Analytics
Azure Data Lake
Data Lake Analytics
Time Series Insights
The following diagram shows the architecture proposed by Azure. Data is acquired
through the Azure IoT Edge and sent to the Azure IoT Hub. Data can be processed with
low latency Stream Analytics, stored in a time series database called Time Series Insight,
or stored in Azure Data Lake. It can then be processed by Azure ML Analytics or Data Lake
Analytics. Finally, we can use Power BI for fast visualization of the data:
IoT Hub
IoT Hub is the middleware that is used to register and manage devices. IoT Hub
supports the MQTTS, HTTPS, and AMQP protocols. To start working with IoT Hub, we
need to search for IoT using the search bar.
Once you have found it, click on the Create IoT hub button. For convenience, we can pin
IoT Hub to the dashboard by clicking on the pin button. These steps are shown in the
following screenshot:
1. Click on the Create IoT hub button and provide the name of the resource
(iiot-book-resources) and the name of the hub (iiot-book-hub)
2. Click on the Next: Size and scale >> button, as shown in the following
screenshot:
3. On the Size and scale page, we need to ensure that the F1 Free tier is selected
4. Finally, click on the Create button
Azure will create an instance of the IoT Hub in a few minutes. If everything goes well, we
will be able to go to the dashboard, click on the IoT Hub instance that we just created, and
see the overview page and the host name of our IoT Hub. In our case, it will be
iiot-book-hub.azure-devices.net. Please take note of this name; we will reuse it later.
After creating the device, we can get the host connection string from the Device Details
page:
From the command console, we can create an empty directory and install the right
packages. The commands required to do this are as follows:
mkdir my-iiot-device
cd my-iiot-device
npm init -y
npm install azure-iot-device azure-iot-device-mqtt --save
On the my-iiot-device path that we just created, we can create a new file called
my-iiot-device.js and copy and paste the following code. Replace the connection string
with your connection string:
'use strict';
var Protocol = require('azure-iot-device-mqtt').Mqtt;

// … (the client creation and the connectCallback that sends the
// telemetry messages on an interval are elided here) …
  });
  client.on('disconnect', function () {
    clearInterval(sendInterval);
    client.removeAllListeners();
    client.open(connectCallback);
  });
}
};

client.open(connectCallback);

// Helper function to print results in the console
function printResultFor(op) {
  return function printResult(err, res) {
    if (err) console.log(op + ' error: ' + err.toString());
    if (res) console.log(op + ' status: ' + res.constructor.name);
  };
}
Then, from the command console, we can run the following command:
node my-iiot-device.js
The client will send some data from two fake sensors, which will represent the flow of a gas
and the outside temperature. We will also add a property to indicate when the temperature
reaches 20 degrees. We assume that there is a relationship between the flow of the gas and
the outside temperature. Here, we are emulating a simple compressor station.
We can also monitor the messages that are sent from the Metrics tab of the IoT Hub, as
shown in the following screenshot:
To prepare our data lake repository, we can search for Data Lake Storage Gen1 and click
on Create a data lake storage gen1. In the pop-up window, we need to provide a name,
such as iootbook, select the resource group created in the previous section, and click on
the Create button, as shown in the following screenshot:
Azure analytics
Azure provides three different frameworks that can be used to work with data—Stream
Analytics, Data Lake Analytics, and ML Analytics.
Stream Analytics
Having added the data to the Azure IoT Hub, the next step is to develop an analytics job to
process our data. In this example, we will extract the efficiency of the simulated compressor
station, and will save the results into Data Lake Storage:
1. From the Azure portal, search for Stream Analytics and click on Stream
Analytics jobs. Then, click on the Create stream analytics job button and
provide a straightforward name, such as my-iiot-job, as shown in the
following screenshot:
2. Once this is done, we need to define an input. From the Stream Analytics page,
click on our job, then click on +Add stream input from the Inputs tab to add
our iiot-book-hub as an IoT Hub input. Finally, click on the Save button.
These steps are shown in the following screenshot:
4. We can now connect our output. From the Outputs tab, click on +Add and then
Data Lake Store. We have to provide a name, efficiencyoutput, and a path
prefix of out/logs/{date}/{time}. After that, click on Authorize.
5. We need to change the Event serialization format option to CSV and the
Delimiter option to tab. After that, click on Save, as shown in the following
screenshot:
6. We are now ready to connect our input with our UDF and our output. From the
Query tab, we can define this as follows:
SELECT
deviceId AS device,
ts,
temperature,
flow,
udf.efficiency(temperature,flow)
INTO
efficiencyoutput
FROM
iiothub
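The query calls a user-defined function, udf.efficiency, whose body is not listed in the text. Stream Analytics UDFs are written in plain JavaScript with a main entry point; the sketch below shows one plausible implementation, where the formula (flow over temperature) is an invented example for the simulated compressor:

```javascript
// Azure Stream Analytics JavaScript UDF sketch: 'main' is the entry
// point invoked as udf.efficiency(temperature, flow) in the query.
// The formula itself is an assumption made for this example.
function main(temperature, flow) {
  if (temperature === null || flow === null || temperature === 0) {
    return null; // guard against missing readings and division by zero
  }
  return flow / temperature;
}

console.log(main(20, 60)); // 3
```

The UDF is registered under the Functions section of the Stream Analytics job before the query can reference it.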
7. Finally, don't forget to start the job by clicking on Start from the Overview tab.
This is shown in the following screenshot:
From the Azure portal, we can access Data Lake Storage by clicking on iiotstore, and
then the Data explorer section. You can then expand the tree, as shown in the following
screenshot:
The job will process the data that is feeding the IoT Hub and save the results on Data
Lake using the out/logs/{date}/{time} syntax.
Remember to click on the Stop button to stop the analytics job from
running, in order to avoid unwanted billing.
The Advanced Stream Analytics query language also allows you to group by time using a
method called windowing, which means that you can define a relative time interval, such
as five seconds, to aggregate measures:
SELECT
deviceId AS Device,
max(ts),
Avg(temperature) AS temperature,
Avg(flow) AS flow,
Avg(udf.efficiency(temperature,flow)) AS efficiency
INTO
efficiencyoutput
FROM
iiothub
TIMESTAMP BY ts
GROUP BY TumblingWindow(second,5), deviceId
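The semantics of TumblingWindow(second, 5) can be sketched in plain JavaScript: every event belongs to exactly one fixed, non-overlapping five-second bucket, keyed by device as in the GROUP BY clause. This is an illustrative model, not the engine's implementation:

```javascript
// Group events into fixed 5-second windows per device and average them.
function tumblingAverage(events, windowSeconds) {
  const buckets = {};
  for (const e of events) {
    const win = Math.floor(e.ts / (windowSeconds * 1000)); // window index
    const key = `${e.deviceId}:${win}`;
    (buckets[key] = buckets[key] || []).push(e.temperature);
  }
  return Object.fromEntries(Object.entries(buckets).map(
    ([k, v]) => [k, v.reduce((a, b) => a + b, 0) / v.length]
  ));
}

const avg = tumblingAverage([
  { deviceId: 'd1', ts: 1000, temperature: 10 },
  { deviceId: 'd1', ts: 4000, temperature: 20 },
  { deviceId: 'd1', ts: 6000, temperature: 30 },
], 5);
console.log(avg); // { 'd1:0': 15, 'd1:1': 30 }
```

Hopping and sliding windows differ only in that their buckets may overlap.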
To test this simple query, we can stop our analytics. From the Query tab, acquire three
minutes of samples by clicking on Sample data from input. We can then test our query.
The following screenshot shows the expected output when the data is sampled every five
seconds. It is also important to group by deviceId to avoid mixing data between
devices:
Azure supports three types of windowed grouping, 14 aggregate functions, and a very
powerful time management system for handling out-of-order data, which is data that
arrives too late to be processed.
1. To enable Data Lake Analytics from the Azure portal, search for Data Lake
Analytics.
2. Click on the Create New Data Lake Analytics button.
3. We need to provide a name, such as iiotbookdla, and select our Data Lake
Storage, which is iiotstore. These steps are shown in the following screenshot:
If everything went well, we can create our first job. To do this, perform the following steps:
1. Add a new job from the Data Lake Analytics instance that we just created
2. Provide a name, such as my-dla-efficiency-job as shown in the following
screenshot:
3. In the text panel, copy and paste the following U-SQL code:
DECLARE @now DateTime = DateTime.Now;
DECLARE @outputfile = "/out/reports/" + @now.ToString("yyyy/MM/dd") + "-efficiency.csv";

// … (the EXTRACT block lists the columns of the CSV; only its last
// lines are reproduced here) …
        filename string
    FROM "/out/logs/{date:yyyy}/{date:MM}/{date:dd}/{filename}.csv"
    USING Extractors.Tsv(skipFirstNRows:1);
This simple U-SQL script evaluates the average efficiency daily for each device and saves
the report in Data Lake Storage to be operated on later by the operator or other analysts.
For people familiar with T-SQL, U-SQL will be quite easy to read. The first block parses the
CSV, the second block calculates the average grouping by device and date, and the last
block saves the output. After a few seconds, the output will be processed and we can open
our Data Lake Storage to see the report.
Data Lake Analytics also allows us to interact with the parallelization mechanism to build a
custom reducer. To implement our reducer in Python, we can rewrite the previous code
with Python extensions:
REFERENCE ASSEMBLY [ExtPython];

DECLARE @myScript = @"
def usqlml_main(df):
    return df.median('efficiency')
";

@result =
    REDUCE @d ON device
    PRODUCE device string, efficiency string
    USING new Extension.Python.Reducer(pyScript:@myScript);
…
ML Analytics
Azure ML Analytics is an environment in which we can deploy advanced analytics for
predictive modelling. We can use ML Analytics with both Python and HDInsight, which
includes Hive tables and Hadoop. We will learn more about ML in Chapter 14,
Implementing a Digital Twin - Advanced Analytics, and Chapter 15, Deploying Analytics on an
IoT Platform.
When we have registered, we need to connect our data flow from the IoT Hub to Power BI.
This process is the same as we did for Stream Data Analytics, but we need to set the Power
BI storage as the output of the data flow:
1. From Data Stream Analytics, we can create a new job, called my-iiot-vis-job,
with the following settings:
Input: "iiothub"
Query:
SELECT * into iiotpowerbi FROM iiothub
2. For the output, click on the +Add button and then Power BI. We can then
provide a name, iiotpowerbi, the dataset, myiiotdeviceds, and the table
name, myiiotdevicetb.
3. Finally, click on the Authorize button and then Save. These steps are shown in
the following screenshot:
Once the data flow has been built, we need to start the job by clicking on the Overview
section and then Start. We can then connect to the Power BI console at
https://app.powerbi.com/. From the Power BI console, we need to click on the dataset that we just
created, myiiotdeviceds, and click on the line chart from the panel on the right. On the Y-
axis, add the measures flow and temperature, and on the X-axis, add ts. The following
screenshot shows the expected output:
The following screenshot summarizes the main features of Azure IoT Edge from three
different perspectives—computation, gateway, and development:
Computation: With Azure IoT Edge, we can deploy Azure Stream jobs and
Azure ML Analytics on the Edge to perform simple or advanced actions. Azure
IoT Edge runtime supports C#, Node.js, Java, Python, and C.
Gateway: Azure IoT Edge can connect sensors through MQTT, AMQP, and
HTTP. However, it needs additional modules to support OPC UA, OPC DA, and
Modbus.
Development: We can compile Azure IoT Edge with Visual Studio Code.
Alternatively, we can use the Docker images built previously.
Azure IoT Edge also supports a store and forward feature in its offline mode, allowing us to
save data locally for a long period of time in case we have an intermittent connection.
The operating systems that are supported by Azure IoT Edge are the following:
Before starting the Docker image, we need to retrieve a few pieces of information:
1. From the IoT Hub, we need to retrieve the connection string. Click on the
iothubowner policy and copy and paste the connection string:
Pump_01.Pressure",
"OpcSamplingInterval": 2000,
"OpcPublishingInterval": 5000
}
]
}
]
3. Finally, from the command line, we can run the following command. We need to
replace <opc-ua server ip> with the IP address of the OPC UA Proxy Server, and
<IoT Hub connection string> with the Azure IoT Hub connection string retrieved
previously. We also need to provide the full path of the <azure-edge directory>
in which we stored the publishednodes.json file:
Azure IoT Edge will connect to the OPC UA Proxy Server and subscribe to the
temperature measure, polling for a new value every 5,000 ms. The measure will only be
published if the value has changed.
In the log file, publisher.log.txt, we should be able to find a line that is similar to the
following:
10/12/2018 11:02:37: Created monitored item for node
'ns=5;s=MyDevice.Pump_01.Pressure' in subscription with id '2' on endpoint
'opc.tcp://opcuaserver:53530/OPCUA/SimulationServer'
We can also check whether or not the messages have been sent in the IoT Hub console. The
expected output is as follows:
The Azure IoT Hub log, showing Azure IoT Edge connections to the Hub
In the I-IoT, we acquire data when it changes, so we may need interpolation functions to fill
the gaps, as we discovered with the OpenTSDB and KairosDB open-source databases in
Chapter 8, Implementing a Custom Industrial IoT Platform. It is important to consider
whether for our particular context it is more appropriate to use an analytical cold-path or
hot-path. Microsoft has good support for the OPC DA and OPC UA acquisition standards,
while AWS is currently developing support for OPC UA on Greengrass.
With regard to the key points mentioned here, it seems that while Azure is a very mature
platform for the I-IoT, AWS is filling the gap to integrate standard protocols and OPC UA
on Greengrass, and GCP performs strongly with regard to processing data. The U-SQL and
the windowing mechanism of Azure analytics seem to be very powerful in the context of
the I-IoT as well.
From a cost perspective, which of the three platforms is the most suitable will depend
largely on the discount plans on offer. In general, however, Microsoft Azure has the lowest
price, while AWS usually comes in the middle.
Summary
In this chapter, we have explored the most common functionalities of Azure IoT and looked
at its advantages and disadvantages. With this, we have come to the end of our discussion
on data acquisition technologies. In the next chapter, we will explore how to develop
analytics in the industrial sector. We will learn about diagnostic, prognostic, and predictive
analytics, and discover both physics-based and data-driven technologies. We will develop
several examples using machine learning techniques and deploy our analytics on the cloud.
Questions
1. Which of the following technologies is the most appropriate time-series
database?
1. Power BI
2. Time Series Insights
3. Azure Hub
2. Which of the following technologies is a big data technology?
1. Azure TSI
2. Azure Data Lake
3. Azure Hub
3. Which technology can we use for fast data processing and low latency analytics?
1. Data Lake Analytics
2. Azure ML Analytics
3. Azure Stream Analytics
Further reading
Additional resources can be found at the following links:
13
Understanding Diagnostics,
Maintenance, and Predictive
Analytics
In the previous chapter, we implemented I-IoT platforms based on open source, Amazon
Web Services (AWS), Original Equipment Manufacturer (OEM), Google Cloud
Platform (GCP), and Azure. We also introduced some important topics, including storage,
time-series, microservices, protocols, and data processing. Of these, the latter is without a
doubt the most valuable topic. General Electric (GE) has monitored aircraft engines and
power generation systems for 15 years and claims that just a 1% improvement in efficiency
can add $276 billion in profit to major industries. To support these claims, we need to learn
from our data and build a new generation of analytics. In the previous chapter, we
implemented a few simple analytics. In this chapter, we will discover the most important
use cases for I-IoT analytics.
In this chapter, we will cover the following topics:
I-IoT analytics
The different classes of analytics
Technologies related to I-IoT analytics
Infrastructure
Consuming data
Open system architecture
Technical requirements
In this chapter, we need the following prerequisites:
Git: https://git-scm.com/downloads
Python 3.7: https://www.python.org/downloads/
Anaconda 5.2: https://www.anaconda.com/download/
To find the complete solutions for the exercises in this chapter, go to the repository for this
book on GitHub: https://github.com/PacktPublishing/Hands-On-Industrial-Internet-
of-Things.
Jupyter
If you have installed Python, you may want to use a development IDE. One of the best IDEs
is the Jupyter Notebook. To install Jupyter, run the following command:
$ pip install jupyter
To run the notebook, run the following command from the command-line:
$ jupyter notebook
If you want to avoid installing Python and Anaconda, you can also run Jupyter Notebook
by starting the jupyter/datascience-notebook container:
I-IoT analytics
Analytics are one of the most important aspects of the I-IoT and, without a doubt, the most
complex. To manage complex systems, we need advanced techniques, and because these
techniques are applied in critical contexts, they must provide valuable and reliable insights,
reducing false positives and false negatives. Think, for example, of the analytics applied to
avionic systems or those that apply in safety contexts; clearly, we need high reliability.
Use cases
Over the last five years, the authors of this book have collaborated to develop more than
150 different analytics and thousands of simple rules that can be applied to thousands of
different assets. The following are the most important use cases of these analytics that we
have identified in the I-IoT:
Asset reliability management: Without a doubt, this is the most common class of
analytics. It is used to monitor the health of the system. An example of where this
may be applied is checking the standard operability of a pump, or predicting
cracking or malfunctioning.
Maintenance optimization/planning: This is the most valuable class. Its purpose
is to avoid or delay unwanted maintenance. An example of where this may be
applied is avoiding unwanted cleaning or part replacements of a filter.
Performance and efficiency management or optimization: These analytics try to
minimize the cost while maintaining performance. An example of where this
may be applied is optimizing oil extraction.
Regulatory and compliance: Sometimes, we need to monitor Key Performance
Indicators (KPIs) to ensure that production remains within the constraints of regional
laws. An example of where this may be applied is regulating sulphur emissions
at a refinery.
Operation and scheduling optimization: This isn't specifically related to I-IoT,
but it is needed to manage standard operations. An example of where this may
be applied is to optimize the maintenance schedule of a train of assets.
Security and safety: These are rules to check the safety or security of humans or
machines. An example of where this may be applied is to measure the reliability
of a jet engine.
Design and modeling: These are normally used during the design phase. They
allow you to redesign the production. An example of where this may be applied
is to optimize piping.
What-if simulation for decision support: These are normally applied to human
interactions. These techniques allow you to change the parameters of a particular
analytic to simulate the behavior of equipment such as wind turbines, oil and gas
plants, or equipment related to the pharmaceutical industry. An example of
where this may be applied is changing the furnace temperature in a
petrochemical process.
Smart environment: This involves monitoring large environments or utilities. An
example of where this may be applied is controlling human access to a prohibited
area.
All of these technologies can be applied to the I-IoT in conjunction with the standard
supply chain, Enterprise Resource Planning (ERP), business intelligence (BI), or process
optimization.
Descriptive analytics
Descriptive analytics are the most basic class of analytics. They analyze past data to
provide a broad view of the fleet. In other words, descriptive analytics try to answer the
question What happened? These analytics use data mining, aggregation, or visual
intelligence techniques to understand the status of the assets. In the I-IoT, the most common
technique used for descriptive analytics is KPI monitoring.
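As a minimal sketch of KPI monitoring (not from the book; the column names ts, flow, and temperature are assumptions echoing the device measures used earlier), a resampled aggregation with Pandas is often enough:

```python
import pandas as pd

def daily_kpis(df):
    """Aggregate raw sensor readings into daily KPIs.

    Assumes a 'ts' timestamp column and 'flow'/'temperature' measures.
    """
    return (df.set_index('ts')
              .resample('1D')
              .agg({'flow': 'mean', 'temperature': 'max'}))
```

A dashboard tool such as Power BI can then visualize the aggregated frame directly.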
Condition monitoring
In the previous chapters, we developed a few basic analytics based on thresholding.
Condition monitoring is exactly this kind of analytics: it uses thresholds or fuzzy rules to
extract potential issues that occurred in the past.
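A minimal sketch of such a threshold rule (the band limits are arbitrary example values, not from the book):

```python
def check_condition(value, warn=(20.0, 80.0), alarm=(10.0, 90.0)):
    """Classify a reading against warning and alarm operating bands."""
    if value < alarm[0] or value > alarm[1]:
        return 'ALARM'
    if value < warn[0] or value > warn[1]:
        return 'WARNING'
    return 'OK'
```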
Anomaly detection
Anomaly detection algorithms have been applied to the I-IoT since its inception. Anomaly
detection analytics are a special class of descriptive analytics that catch, and sometimes
anticipate, anomalies in a piece of equipment.
Generally speaking, these analytics build a data-driven model of the standard operability of
the equipment by monitoring it. This model is then compared to the current data or
features. If the difference is too high, an alert is raised to warn the user of a possible
malfunction. These analytics can be developed using simple rules, clustering algorithms,
simple moving averages, or more advanced techniques such as the Kalman filter.
Diagnostic analytics
Diagnostic analytics are the most common analytics in the I-IoT. These analytics use
advanced modelling techniques to analyze failure modes and to extract the root cause of an
issue. Diagnostic analytics try to answer the question Why did it happen? They typically
involve three steps:
Detecting anomalies
Discovering the failure mode
Determining the root cause of anomalies
Normally, the last two steps of diagnostic analytics are done by human investigation.
Diagnostic analytics can use feature extraction or anomaly reasoners to provide indicators
about why an anomaly happened. Anomaly detection is, on the contrary, an automatic step
performed by (more or less) sophisticated analytics.
After anomaly detection, we need to discover the failure mode and identify the cause of the
issue. Normally, these activities require human knowledge of failure modes and effect
analysis or a large dataset of past failures. For instance, we can implement a set of rules
(which can be deterministic, fuzzy, Bayesian, or machine learning-based), codifying the
cause and effect of the fault.
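As an illustration, a deterministic rule set can be sketched as a table mapping symptoms to probable causes (the measures, thresholds, and failure modes below are hypothetical examples):

```python
# Hypothetical failure-mode table: symptom predicate -> probable cause
RULES = [
    (lambda m: m['vibration'] > 7.0 and m['temperature'] > 90.0, 'bearing wear'),
    (lambda m: m['pressure_out'] < 0.5 * m['pressure_in'], 'clogged filter'),
]

def diagnose(measures):
    """Return the probable causes whose cause-and-effect rules match."""
    return [cause for rule, cause in RULES if rule(measures)]
```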
Predictive analytics
Predictive analytics look at the future and attempt to answer the question What could happen
in the future? These analytics use regression models or anomaly detection models to
anticipate potential issues that might occur in the future. In the I-IoT sector, the most
interesting sub-class of predictive analytics is prognostic analytics.
Prognostic analytics
Prognostic analytics is an estimation of time to failure and risk for one or more existing and
future failure modes (ISO 13381-1: 2015). Prognostic analytics predict the future
degradation or damage and the Remaining Useful Life (RUL) of an asset or part, based on
the measured data.
The goal of prognostics is to predict the cycles remaining before the damage grows beyond
the eligibility threshold. In order to predict the RUL, prognostics utilize the measured
damage levels or an estimation of the damage up to the current cycle.
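As a deliberately simple sketch of this idea (the linear damage trend is an assumption; real prognostic models use the richer degradation models and uncertainty treatment described in this section), the RUL can be extrapolated from the measured damage levels:

```python
import numpy as np

def estimate_rul(cycles, damage, threshold):
    """Extrapolate a linear degradation trend to the failure threshold.

    Returns the estimated number of cycles remaining after the last
    measured cycle (np.inf if no degradation trend is detected).
    """
    slope, intercept = np.polyfit(cycles, damage, 1)
    if slope <= 0:
        return np.inf  # no degradation trend detected
    failure_cycle = (threshold - intercept) / slope
    return max(failure_cycle - cycles[-1], 0.0)
```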
Unfortunately, these measures are often affected by uncertainty, which might include noise,
bad measurements, and lack of knowledge. When we estimate the RUL, we also have to
quantify a probability density function (PDF) describing how accurate the prediction is.
The process of estimating these uncertainties is called Uncertainty Quantification (UQ).
The following diagram summarizes these concepts:
Degradation prediction in a prognostic model, including the RUL and the UQ (adaptation from Prognostics and Health Management of Engineering Systems, Nam-Ho Kim, Dawn
An, and Joo-Ho Choi, Springer 2017)
Prescriptive analytics
Prescriptive analytics represents the next step in this prediction process and asks the
question: How should we respond to these potential future events? Prescriptive analytics
anticipates what will happen, when it will happen, and why, suggesting different options
related to decisions. These analytics forecast future opportunities and/or mitigate future
risks. The common application of prescriptive analytics in I-IoT is for condition-based
maintenance (CBM).
CBM
CBM is a maintenance strategy for repairing or replacing damaged or degraded parts that
might reduce the life of a machine. It monitors, detects, isolates, and predicts equipment
performance and degradation without shutting down daily production. In CBM, the
maintenance of systems and components is based on the actual health of the equipment,
rather than on the possibility that it might break down or on a fixed maintenance schedule.
The following diagram shows the basic idea of CBM compared to more standard
maintenance approaches:
shows the basic idea of CBM compared to more standard maintenance approaches:
CBM tries to answer the question—Should we change this part or piece of equipment, in
accordance with our maintenance plan, or shall we take the risk and continue operating?
The industrial sector has used these analytics on-premises since the 1990s to optimize
production directly from the controller. These analytics are not properly part of the I-IoT,
but they should be considered in our discussion.
Rule-based
Rule-based analytics use knowledge about a variable or a particular feature to build a
decision-based algorithm. Rule-based analytics can use expert systems, classifiers, or
rule-based ML. Rule-based analytics, for instance, can translate human knowledge or
empirical rules into an algorithm.
Model-based
What is a model? A model is a mathematical and probabilistic relationship that exists
between different variables.
Model-based analytics in the I-IoT try to describe the relationship between input
and output or the internal status of the monitored equipment. This relationship can either
be based on a mathematical formula or be inferred from data. In other words, we can
distinguish between physics-based models and data-driven models.
Physics-based
Physics-based techniques assume that you have fairly detailed knowledge of the
behavior of the equipment. Physics-based models combine this knowledge about the
equipment with the measured data to evaluate the performance of a piece of equipment
and/or to predict its future behavior.
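A classic physics-based sketch is crack-growth propagation with Paris' law, da/dN = C(ΔK)^m. The material constants C and m below are illustrative placeholders, not real material data:

```python
import math

def crack_growth(a0, stress_range, cycles, C=1e-12, m=3.0, step=1000):
    """Propagate a crack of initial size a0 (m) under cyclic stress (MPa)
    using Paris' law: da/dN = C * (dK)^m, with dK = stress_range * sqrt(pi*a)."""
    a = a0
    for _ in range(0, cycles, step):
        dk = stress_range * math.sqrt(math.pi * a)  # stress intensity factor range
        a += C * (dk ** m) * step                   # Euler integration over `step` cycles
    return a
```

Comparing the predicted crack size with inspection data is what distinguishes this family of models from purely data-driven ones.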
Data-driven
Data-driven methods can use ML, statistics, or Artificial Intelligence (AI) techniques such
as deep learning. These techniques depend on collecting a history of failures, which
requires large volumes of data. Without a comprehensive understanding of the system, it
can be hard to know how much data is good enough for a specific purpose. We suggest
that you collect at least six months' worth of data with relevant events. In data-driven
approaches, the most common technique is to use an artificial neural network (otherwise
known simply as an NN) or deep learning, in which a network model learns a way to
produce a desired output. In CBM, for instance, this might refer to the level of degradation
of a turbine or the lifespan of a filter.
Another common technique is Gaussian Process (GP) regression. GP is often used with
regression-based, data-driven approaches. Other common regression models include linear
regression, such as the linear least squares method. There is also a wide variety of other
algorithms, including fuzzy logic, the relevance vector machine (RVM), the support vector
machine (SVM), the gamma process, the Wiener process, the Hidden Markov Model
(HMM), and the Kalman filter.
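To make the GP idea concrete, here is a minimal GP regression sketch in plain NumPy (the RBF kernel and noise level are illustrative choices; libraries such as scikit-learn provide production-ready implementations):

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    # squared-exponential (RBF) kernel between two 1-D sample vectors
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-6, length_scale=1.0):
    """Posterior mean and standard deviation of a GP regression."""
    K = rbf(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf(x_test, x_train, length_scale)
    mean = K_s @ np.linalg.solve(K, y_train)
    cov = rbf(x_test, x_test, length_scale) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

The posterior standard deviation is what makes GPs attractive for prognostics: it gives the uncertainty quantification discussed earlier for free.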
Data collection: During this step, we collect data and related events or alerts
Data integration and wrangling: During this step, we can convert the data
format (using wrangling or munging) and integrate the data with the metadata
or additional information
Data preparation and cleaning: During this step, we need to prepare the data
and remove spikes or bad data
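The preparation and cleaning step can be sketched as a simple despiking filter (the window size and threshold are arbitrary example values):

```python
import pandas as pd

def despike(series, window=5, threshold=3.0):
    """Replace spikes with NaN, so they can later be interpolated.

    A point is a spike when it deviates from the centered rolling median
    by more than `threshold` times the residual standard deviation.
    """
    med = series.rolling(window, center=True, min_periods=1).median()
    resid = series - med
    return series.mask(resid.abs() > threshold * resid.std())
```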
The following decision table will help you choose between a physics-based, data-driven, or
hybrid model:
When the model is ready, we can test it with our test set to measure the accuracy of the
model. We also need the following tests:
Unit tests
Regression tests
Performance tests
We also need to write the data wrapper, which is the IN-OUT connector that transforms the
input into the internal object structure and vice versa. Finally, we can deploy the algorithm.
Step 5 – monitoring
Don't forget to periodically review the analytics to evaluate performance.
Analytics are strictly coupled with the support they require from the infrastructure. When
analytics require a large amount of data to be sampled every millisecond, we might not be
able to deploy them on the cloud due to bandwidth restrictions. If the analytics require
more data than was produced in the last few seconds or minutes, a data stream platform
may not be the right choice. If the analytics are stateless, we might be able to use serverless
technologies. If a user needs to interact with the analytics, they cannot easily work in
streaming mode.
Deploying analytics
Although analytics should be agnostic with regard to how the data is fed to the platform,
we have to consider several potential pitfalls that can affect the efficiency of the
analytics. There are several strategies that we can use to feed I-IoT data to the platform:
It might be in the wrong order. For example, a data point at 18:00 might be sent
at 18:10 and a data point at 17:59 might be sent at 18:11.
It might be of a bad quality.
It might have holes in it.
It might have anomalous spikes in it.
It might be frozen. This refers to a situation where you have a suspiciously flat
number for a long time.
Data might also be delayed for a long period of time. We know this from personal
experience in the oil and gas industry—one particular customer reactivated their
connection to the cloud after six months of being disconnected and the data from
the sensors filled the data lake in three days. Unfortunately, the analytics processed the
data of the entire time period and detected a whole series of anomalies and alerts. These
alerts were not useful at all because they were from the time in which the customer was
disconnected, so the operations center was flooded with junk alerts.
To build a real I-IoT platform, we have to develop our architecture with sufficient
robustness to address these issues. For example, we can adopt a timeout for data that
arrives too late, or we can pre-process the data and mark anything that is in the wrong
order or frozen. Moreover, we can interpolate the data before feeding the analytics, to
avoid holes.
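A sketch of such pre-processing with Pandas (the gap limit and the frozen-run length are arbitrary example values):

```python
import pandas as pd

def clean_feed(series, max_gap=3, run=5):
    """Sort out-of-order points, fill small holes, and flag frozen values.

    Returns the cleaned series and a boolean series that is True at the
    point where a value has repeated `run` consecutive times.
    """
    s = series.sort_index()                        # repair out-of-order arrivals
    filled = s.interpolate(limit=max_gap)          # fill holes up to max_gap samples
    same = filled.eq(filled.shift()).astype(int)   # 1 when the value repeats
    run_len = same.groupby((same == 0).cumsum()).cumsum()
    return filled, run_len >= run - 1
```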
In the previous section, we learned about the technologies required to build analytics. Now,
we need to deploy them, assuming that the infrastructure supports our use case. We then
need to define the method to trigger the analytics. We have three methods we can use to do
this:
Stream analytics
Micro-batch analytics
Condition-based analytics
An alternative to streaming is to schedule the analytics to run regularly after a set period of
time, such as every 10 seconds. This involves checking the availability of the data and then
pulling it from the data lake or the time-series database. We call these analytics micro-batch
analytics.
Data is acquired from sensors and feeds the streaming analytics layer. Data points can be
cleaned and a simple rule can be applied. In parallel, we register the last timestamp and we
store the cleaned data in the storage layer. Micro-batch analytics are activated by the data's
availability.
Condition-based analytics
Analytics can also be triggered by unpredictable events such as human interactions or rare
events such as the abrupt shutdown of a piece of equipment that is being monitored. The
previous framework remains valid; we simply trigger the execution based on a rule
condition.
It's a different story, however, if a human wants to work with these analytics to perform,
for instance, a what-if scenario, or to discover a specific pattern.
Interactive analytics
Interactive analytics represents a combination of queries, a user interface, and a core for the
analytics. It is not possible to define the best method or technology to implement interactive
analytics. The best suggestion is to keep the core of the analytics in a shared library and to
develop the user interface with a small micro-app that uses a common framework such as
Power BI or Google Data Studio, Python-based libraries such as Dash or Bokeh, JavaScript
libraries such as D3.js (https://d3js.org/), or R-based libraries such as R Shiny (https://shiny.rstudio.com/).
Alternatively, we can use more advanced tools such as Jupyter (http://jupyter.org/) or
Apache Zeppelin (https://zeppelin.apache.org/).
There are three use cases in which it is important to deploy analytics on the edge:
Normally, these analytics work on a real-time operating system (RTOS), which is very far
from the high latency of the cloud. AWS, OEM's Cloud, and Google, however, are working
to deliver an RTOS that works closely with an Edge device. Examples of this include AWS
Greengrass and FreeRTOS.
Advanced analytics
Recently, advances in AI have driven the need for greater computational capabilities.
Vendors such as NVIDIA and Google have developed special processors, the GPU and the
TPU respectively, to support this new requirement. Recently, Google has introduced a
technology to move these analytics to the edge. AWS, Azure, Google, and OEM Cloud
support these new processors on the cloud. We will look closer at these capabilities in
Chapter 15, Deploying Analytics on an IoT Platform.
The OSA for CBM is the most popular framework. It consists of six layers—Data
Acquisition, Signal Processing, Condition Monitoring, Health Assessment, Prognostics,
and Decision Support. All of these steps can be implemented in a cloud-based architecture
and partially on-premises, as explained in the previous sections. In particular, signal
processing and condition monitoring can be implemented by using on-stream analytics or,
generally speaking, rule-based engines. The health assessment and prognostics phases,
however, require advanced analytics based on ML or physics-based models. The last step
can use either on-demand analytics or a big data query.
Analytics in practice
Let's now put what we have learned into practice. We are going to build a diagnostic
analytic and a predictive analytic. We will develop an anomaly detection algorithm for an
airplane and a predictive algorithm for an oil and gas refinery. We want to remain as
generic as possible, so we won't make any assumptions about the system that we are going
to monitor.
We will develop these two use cases with Python, SciPy, NumPy, Seaborn, and Pandas. We
will assume that Anaconda 5.2 or Python 3.7 is already installed on your system.
For your convenience, Jupyter Notebooks are available at the official GitHub
repository: https://github.com/PacktPublishing/Hands-On-Industrial-Internet-of-Things.
To work with the notebooks, open a command console in the Chapter13 directory, then
run the following command:
jupyter notebook
Anomaly detection
For this exercise, we will use the free flight operations dataset provided by NASA, which
can be found at https://ti.arc.nasa.gov/opensource/projects/mkad/.
The data is provided in a compressed CSV format where the 16 columns are Time,
Altitude, AirSpeed, Landing_Gear, Thrust_Rev, Flaps, and a further 10 generic
parameters (Param1_1, Param1_2, ..., Param2, Param3_1, ..., Param4).
Problem statement
For our exercise, we want to calculate the anomalies in the time-series to understand the
potential issues that might occur during a standard flight.
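Assuming the dataset has been downloaded and decompressed, loading it into the df DataFrame might look like the following sketch (the file name flight_data.csv is an assumption; adjust it to the file extracted from the NASA archive):

```python
import pandas as pd

def load_flight_data(path='flight_data.csv'):
    """Load the flight recording and sort it by the Time column."""
    df = pd.read_csv(path)
    return df.sort_values('Time').reset_index(drop=True)
```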
Step 2 – EDA
The first step shows the data:
import matplotlib.pyplot as plt
import seaborn as sns
# showing dataset
df.plot(x='Time', kind='line', subplots=True)
plt.show()
Time-series of flight
It is easy to see that when Flaps is anything other than zero, the airplane is taking off or
landing, so we should not consider this time frame. The following code drops the rows in
which the airplane is taking off or landing:
# drop data during takeoff and landing
df = df.drop(df[df['Flaps']>0].index)
df = df.drop(df[df['Landing_Gear']>0].index)
The dataset is good, so we don't need to clean it. We can now analyze the standard
deviation of a few variables. The following code performs this analysis:
# analysis of variance
df_std = df.std()
print(df_std)
We can see that Landing_Gear, Thrust_Rev, and Flaps have a value of 0 for standard
deviation, so they are not useful at all. We can, therefore, remove these variables:
# remove useless variables
df = df.drop(['Landing_Gear', 'Thrust_Rev', 'Flaps'], axis=1)
We can now analyze the correlation between the sensors. The following code calculates the
Pearson correlation and its p-values. The Pearson correlation is a measure of the linear
correlation between two variables:
# correlation
from scipy.stats import pearsonr

def calculate_pvalues(df):
    corr = {}
    for r in df.columns:
        for c in df.columns:
            if not corr.get(r + ' - ' + c):
                p = pearsonr(df[r], df[c])
                corr[c + ' - ' + r] = p
    return corr

print('correlation')
d = calculate_pvalues(df).items()
for k, v in d:
    print('%s :\t v: %s \t p: %s' % (k, v[0], v[1]))
The correlations that we have found between the variables suggest that we need to carry
out further investigation. We can use a RandomForestRegressor function to select the
most interesting features:
# separate into input and output variables
array = df.values
x = array[:,0:-1]
y = array[:,-1]
# perform feature selection
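The feature-selection step itself is not shown above; it might be sketched as follows (the hyperparameters are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rank_features(x, y, names, n_estimators=100):
    """Rank the input features by random forest importance."""
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
    model.fit(x, y)
    return sorted(zip(names, model.feature_importances_), key=lambda p: -p[1])
```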
Selecting features
During the EDA step, we discovered that Landing_Gear, Thrust_Rev, and Flaps can be
removed from the dataset. This is because they only contribute during the takeoff and
landing of the airplane, which is out of our scope.
Normally, we wouldn't do this, because the relationship between the input and the output
is quite complex, but for our simplified exercise, we have decided to use only the following
variables—Time, Altitude, Param1_4, and Param3_1.
In this case, we cannot split the dataset to build a training model, because we do not know
what constitutes normal operability and what constitutes an anomaly. Instead, therefore,
we decided to use an unsupervised method based on a moving average filter.
def search_anomalies(y, window_size, sigma=1.0):
    avg = np.convolve(y, np.ones(window_size)/window_size, mode='same')  # moving average
    std = np.std(y - avg)  # residual standard deviation
    anomalies = []
    i = 0
    for y_i, avg_i in zip(y, avg):
        if (y_i > avg_i + (sigma*std)) | (y_i < avg_i - (sigma*std)):
            anomalies.append([i, y_i])
        i = i + 1
    return {'anomalies': anomalies}
This algorithm has been applied to the selected variables Param1_4 and Param3_1:
x = df['Time'].values
Y_Param3_1 = df['Param3_1'].values
Y_Param1_4= df['Param1_4'].values
events_Param3_1 = search_anomalies(Y_Param3_1, window_size=50, sigma=2)
plt.plot(x,Y_Param3_1, 'b')
plt.plot(x,Y_Param1_4, 'g')
plt.plot([x[row[0]] for row in A_Param3_1], [Y_Param3_1[row[0]] for row in
A_Param3_1], 'or')
plt.plot([x[row[0]] for row in A_Param1_4], [Y_Param1_4[row[0]] for row in
A_Param1_4], 'or')
plt.show()
Excluding the start and the end of the time-series, we can accept only three anomalies,
where both variables show an issue in the same timeframe.
import unittest
import numpy as np

class MyTest(unittest.TestCase):
    def test(self):
        d = np.array([10, 10, 10, 10, 30, 20, 10, 10])
        a = search_anomalies(d, 2)
        a = a['anomalies']
        self.assertListEqual(a[1], [4, 30])

if __name__ == '__main__':
    unittest.main()
To deploy the analytics on the cloud, we have to decide whether to use a cold-path or a hot-
path. The developed algorithm doesn't need a lot of data and should run on the data that
we have available. We can use a stream-based infrastructure and deploy the algorithm
directly onto the data stream.
Step 5 – monitoring
It is a good idea to review the performance of the algorithm quarterly, calculating the false
positives and the false negatives with the newly acquired dataset.
The following code defines a function to extract residuals from the dataset and to apply the
OCSVM:
from statsmodels.tsa.arima_model import ARIMA
from matplotlib import pyplot
import pandas as pd

def ARIMA_residuals(series):
    # fit the ARIMA model
    model = ARIMA(series, order=(5, 1, 3))
    model_fit = model.fit(disp=0)
    print(model_fit.summary())
    # extract the residual errors
    residuals = pd.DataFrame(model_fit.resid)
    return residuals.T
from sklearn import svm

def search_anomalies_OCSVM(y):
    X_train = y
    clf = svm.OneClassSVM(nu=0.005, kernel="rbf", gamma=0.01)
    clf.fit(X_train)
    anomalies = []
    X_test = y
    y_pred_test = clf.predict(X_test)
    for i in range(0, len(y)):
        if y_pred_test[i] < 0:
            anomalies.append([[i, X_test[i][0]], [i, X_test[i][1]]])
    return {'anomalies': anomalies}
Now, we need only to bring these two functions together in a single main function:
Y = np.vstack((ARIMA_residuals(Y_Param3_1), ARIMA_residuals(Y_Param1_4))).T
events_Param = search_anomalies_OCSVM(Y)
A_Param3_1 = [x[0] for x in events_Param['anomalies']]
A_Param1_4 = [x[1] for x in events_Param['anomalies']]
The first line of the preceding code computes ARIMA's residuals and puts them in an array.
We then apply the OCSVM clustering algorithm to identify potential anomalies. The last
two lines of the code split the result into two different arrays, ready to be plotted.
The basic idea is to remove the global trend of the data points, converting the dataset into a
cluster of points around the 0-axis and then identifying the anomalies using a clustering
algorithm such as OCSVM. The following graph shows the scatter plot of ARIMA's
residuals and the outliers identified by OCSVM:
Predictive production
In the previous exercise, we used ARIMA to build a model of the signal. ARIMA (and its
variants, AR, MA, ARIMAX, and so on) is one of the most common statistical algorithms to
work with time-series.
Our problem statement is to predict the future production of oil, given the information
about the last 62 days.
Step 2 – EDA
The dataset contains only one column of data, so we use the whole dataset. To build the model, we use 70% of the dataset to train the model and the remaining 30% to test it:
n = len(y)
s = int(len(y) * 0.7)
train, test = y[0:s+1], y[s:n]
# Evaluating ARIMA
order = (3, 1, 2)
model = ARIMA(train, order=order)
model_fit = model.fit(disp=0)
# Forecasting
prediction = model_fit.forecast(steps=n-s)[0]
# Visualization
plt.plot(train, 'y', label='train')
plt.plot(range(s, n), test, 'k', label='test')
plt.plot(range(s, n), prediction, 'k--', label='predicted')
plt.legend()
plt.show()
The following graph shows the predicted time-series versus the actual production, based
on the testing data:
Summary
In this chapter, we have explored the most important classes of analytics in the I-IoT from a
theoretical point of view. We have looked at the most important use cases, including CBM,
diagnostic analytics, prognostics, and predictive analytics. We also discussed the
relationship of the analytics with the data in terms of model accuracy and data processing.
Finally, we implemented a diagnostic algorithm (the anomaly detection exercise) and a predictive analytics model. Anomaly detection and production prediction are two of the most common I-IoT algorithms.
In the next chapter, we will focus on the kernel of the analytics—the digital twin.
Questions
1. Which of the following technologies is most applicable for cloud stream
analytics?
1. Machine learning
2. Stateful
3. Simple threshold
4. High sampling analysis
2. Which of these statements best describes the concept of prognostics?
1. The time before a component breaks
2. The probability that a component will break
3. The remaining life of a component
4. The prediction of the remaining life of a component
3. What's the EDA?
1. Explorative Discovery Architecture
2. Explorative Data Analysis
3. Explorative Diagnostic Application
4. What's the OSA?
1. A cloud architecture
2. An open software for CBM
3. A framework for diagnostics
4. An open framework in CBM systems
Further reading
Additional resources can be found at the following links and books:
14
Implementing a Digital Twin –
Advanced Analytics
In the previous chapter, we investigated some of the most interesting classes and use cases
to do with I-IoT analytics. We discovered that an analytic can be descriptive, diagnostic,
predictive, or prognostic. We also mentioned the remaining useful life (RUL) of an asset or
part within the I-IoT.
In this chapter, we will improve our knowledge of I-IoT analytics, using more advanced technologies based on machine learning (ML) and deep learning (DL). We will cover the following topics:
Digital twins
Practical examples with DL and ML algorithms
Platforms on which to build digital twins
Technical requirements
In this chapter, we need the following prerequisites:
Git: https://git-scm.com/downloads
Python 3.7: https://www.python.org/downloads/
Anaconda 5.2: https://www.anaconda.com/download/
Advanced technologies
When working with the I-IoT, we typically need to work with a plethora of technologies,
from statistical data modeling using stochastic processes to artificial intelligence. These
technologies often come from other fields, such as trading, image processing, and natural
language processing (NLP), and have been adapted for use in the I-IoT. Let's take a look at
some of the most interesting classes of technologies in the I-IoT.
ML
ML algorithms try to build a model from data. ML algorithms can be supervised,
unsupervised, or reinforcement-based.
Supervised learning
In supervised models, the algorithm learns from a set of data that is labelled with the
correct answers. For instance, in the renewable energy sector, we might have a set of data
that contains the input of a wind turbine, including the amount of wind and the turbine's
position, and the output, which is the amount of power generated. From this data, we can
infer a model that is able to predict the output of the wind turbine based on the input.
Other examples of supervised learning include regression, classification, tagging, ranking,
sequence learning, neural networks, and regularization.
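As a minimal sketch of the wind-turbine example above, a supervised model can be inferred from labelled input/output pairs; here, a polynomial regression with numpy. The wind/power numbers are invented for illustration:

```python
import numpy as np

# hypothetical labelled data: wind speed (m/s) -> generated power (kW)
wind = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
power = np.array([30.0, 120.0, 320.0, 640.0, 1000.0])

# learn a quadratic model from the labelled examples
regressor = np.poly1d(np.polyfit(wind, power, deg=2))

# predict the output for an unseen input
print(round(float(regressor(8.0)), 1))
```

The prediction for 8 m/s falls between the labelled neighbours at 7 and 9 m/s, which is what we expect from a model inferred from the data.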
Unsupervised learning
Unsupervised models don't know the correct answer and try to infer a model by using just
the data input. Some examples of unsupervised learning models include clustering,
instance-based models, subspace estimations, generative adversarial networks, and
Bayesian models.
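Clustering, the first example above, can be sketched with a tiny hand-rolled one-dimensional k-means on invented sensor readings; a real analysis would use a library implementation such as scikit-learn's KMeans:

```python
import numpy as np

def kmeans_1d(x, iters=20):
    # two clusters, initialised at the data extremes to keep the sketch deterministic
    centers = np.array([x.min(), x.max()])
    for _ in range(iters):
        # assign each point to its nearest center, then recompute the centers
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        centers = np.array([x[labels == j].mean() for j in range(2)])
    return labels, centers

# hypothetical unlabelled readings from two operating regimes
x = np.array([1.0, 1.2, 0.9, 1.1, 8.0, 8.3, 7.9, 8.1])
labels, centers = kmeans_1d(x)
print(centers)  # roughly [1.05, 8.075]
```

No labels are provided; the two operating regimes are discovered from the data alone.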
DL
Since 2010, DL has become an important ML technique for implementing neural networks (NNs). Thanks to the application of a hierarchical distributed representation across multiple layers, NNs have been restored to their former glory.
In the I-IoT, the most common NN is the recurrent neural network (RNN) or the long
short-term memory (LSTM) network. LSTMs are normally used to predict time-series data.
Recently, however, we have started to use Convolutional Neural Networks (CNNs) in the
IoT sector as well.
Several companies, including Google and NVIDIA, have been developing new hardware
processors to support this technology. A few vendors have also applied GPUs to NNs,
thanks to the ability of GPUs to work with connected graphs. Google has also developed a
new ASIC processor called a tensor processing unit (TPU) and the TensorFlow library to
work with GPUs or TPUs either on-premise or in the cloud. Later, we will look at an
example of using RNNs with TensorFlow.
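An LSTM consumes a time series as overlapping windows shaped [samples, timesteps, features]; the following numpy sketch shows that preparation step. The window length and the next-value target are assumptions for illustration, not the book's code:

```python
import numpy as np

def make_windows(series, timesteps):
    # each sample holds `timesteps` consecutive values; the target is the next value
    X = np.array([series[i:i + timesteps] for i in range(len(series) - timesteps)])
    y = series[timesteps:]
    # reshape to the [samples, timesteps, features] layout an LSTM expects
    return X.reshape(-1, timesteps, 1), y

series = np.arange(10, dtype=float)
X, y = make_windows(series, timesteps=3)
print(X.shape, y.shape)  # (7, 3, 1) (7,)
```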
TensorFlow
TensorFlow (https://www.tensorflow.org/) is an open source library for machine
learning algorithms with great support for GPUs or TPUs. The easiest way to work with
TensorFlow is through Keras and Python, but it is also possible to use other languages
(such as C++) directly with TensorFlow.
In the dataset, the time series ends some time before the system failure.
Problem statement
For our exercise, we not only want to calculate the RUL but also predict the upcoming
system failure in the last remaining cycles. We assume that one cycle is equal to 1 day. We
will calculate the RUL later.
1. From the command console, install the required packages with the following commands:
pip install pandas
pip install numpy
pip install matplotlib
pip install seaborn
pip install scipy
pip install keras
We can now analyze the correlation between the sensors. The following code calculates the Pearson correlation and p-values, and shows a pairplot of the first five engines:
# correlation
from scipy.stats import pearsonr

def calculate_pvalues(df):
    df = df.dropna()._get_numeric_data()
    dfcols = pd.DataFrame(columns=df.columns)
    pvalues = dfcols.transpose().join(dfcols, how='outer')
    for r in df.columns:
        for c in df.columns:
            pvalues[r][c] = round(pearsonr(df[r], df[c])[1], 4)
    return pvalues

# showing correlation
import matplotlib.pyplot as plt
import seaborn as sns
plt.show()
Given the correlation, we can use a RandomForestRegressor function to select the most
important variables:
# Selected features
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor

def select_feature(df):
    print("extract feature")
    # ... the RFE fitting that produces `fit` is omitted here ...
    names = df.columns.values[0:-1]
    for i in range(len(fit.support_)):
        if fit.support_[i]:
            print(names[i])
    ...
Selecting variables
During step 2 – exploratory data analysis (EDA) – we discovered that sensors 1, 5, 10, 16, 18, and 19, and operative setting 3, are not useful at all. The Random Forest Regressor suggested that we build a model on only a small subset of our variables. We will apply our model only to the following sensors:
columns_feature=['sensor_4','sensor_7']
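One way to see why sensors such as 1, 5, 10, 16, 18, and 19 drop out is that they are nearly constant in this dataset; the following is a minimal variance-screening sketch on invented data. The book's actual selection uses RFE with a RandomForestRegressor:

```python
import numpy as np

rng = np.random.default_rng(42)
# hypothetical sensor matrix: the middle column is flat, like the discarded sensors
data = np.column_stack([
    rng.normal(0.0, 1.0, 100),   # informative sensor
    rng.normal(5.0, 1e-6, 100),  # near-constant sensor, carries no signal
    rng.normal(0.0, 2.0, 100),   # informative sensor
])

# keep only the columns whose variance exceeds a small threshold
variances = data.var(axis=0)
keep = np.where(variances > 1e-3)[0]
print(list(keep))  # columns 0 and 2 survive
```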
1. Building the RNN and scaling the data set in the interval [0,1]
2. Building the data set to train the model
3. Training the model
4. Testing the model
1. In the first five lines, the Keras and scikit-learn packages have been imported. The prepare_dataset() function scales the data, and the build_model() function builds the model:
# fit the model
import math
from keras.models import Sequential
from keras.layers import Dense
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

def build_model(input_dim):
    # create model
    model = Sequential()
    model.add(Dense(16, input_dim=input_dim, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
The first layer has 16 neurons and expects four input variables. The second
hidden layer has 32 neurons. The output layer has one neuron, which predicts the
output.
During compilation, Keras will use the best numerical library, such as
TensorFlow. The backend will automatically choose the best way to model the
NN to run on your hardware. This might be with CPU, GPU, or distributed
computation.
2. The next step is to build the training set and train the model:
def create_train_dataset(dataset):
    dataX, dataY = [], []
    start = len(dataset)
    for i in range(len(dataset)):
        a = dataset[i]
        b = (start - i) / start
        dataX.append(a)
        dataY.append(b)
    return np.array(dataX), np.array(dataY)
#engine
i=1
4. Finally, we can test engine 1 with the data provided in the test_FD001 file and compare the output with the expected output provided in the RUL_FD001 file (the data is available in the data folder of the official GitHub repository):
# load testing
df_test = pd.read_csv('./data/test_FD001.txt', delim_whitespace=True, names=columns)
expected = pd.read_csv('./data/RUL_FD001.txt', delim_whitespace=True, names=['RUL'])
n = len(dataset)
dataset_test = prepare_dataset(df_test[(df_test.unitid == i)], columns_feature)
testPredict = model.predict(dataset_test)
testPredict = np.multiply(testPredict, n)
We have estimated an RUL of 114 cycles. The expected answer was 112 cycles, so we have
an error of about 2%.
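The "about 2%" figure is just the relative error of the prediction; as a quick check:

```python
predicted_rul = 114  # cycles estimated by the model
expected_rul = 112   # cycles from the RUL_FD001 file

relative_error = abs(predicted_rul - expected_rul) / expected_rul
print(round(100 * relative_error, 1))  # 1.8 (percent), i.e. about 2%
```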
To deploy the analytics on the cloud, we have to decide whether to use a cold path or a hot path. The developed algorithm needs a lot of data and should run daily or weekly, so we don't need a stream-based infrastructure; we can use a batch processing mechanism instead.
Step 5 – monitoring
It is a good idea to review the performance of the algorithm quarterly, calculating the false
positives and false negatives with the newly acquired datasets.
We can compare our data with the physical model of a turbine to evaluate the loss of
performance in order to predict degradation in the next 10 years.
The following is a plot of the model with varying wind speed on the x axis:
Note that this digital twin is only valid for a specific wind turbine model and we cannot
generalize it to another. Generally speaking, however, every wind turbine will follow a
similar sigmoidal model. Let's build our algorithm:
1. Import the dataset and calculate the expected power versus the measured power.
After that, visualize the dataset:
# load data
df = pd.read_csv('./data/wind_turbine.csv')
expected_power = [wind_turbine_model(x) for x in df.wind_speed_ms]
reference_power = [wind_turbine_model(x) for x in range(0, 30)]
3. Finally, to predict the degradation of the equipment over the next 9 years, we can estimate the first-order equation of the degradation using the following code:
# predict degradation
samples = 365*24*6
year = 9
z = np.polyfit(ts, de, 1)
print("degradation in %s years will be %.2f %%" % (year, 100*(z[0]*(samples*year) + z[1])))
The output is degradation in 9 years will be -15.72 %. This is not exactly the
same as the expected 16%, but it is quite close.
The platforms that can support digital twins today include AWS, Predix, and Google
Cloud Platform (GCP).
AWS
AWS recommends that we use SageMaker as the main platform for ML. With SageMaker, we can define our model and its training parameters and hyperparameters. We can also store the model to deploy it later, either on the cloud or on-premise. The basic idea of SageMaker is to train the model by coding it in Jupyter, and to later deploy the model as a microservice to get results during normal operation. A high level of computation is required during training.
Predix
Predix implements digital twins in a very similar way to SageMaker. It builds analytics as
microservices that are accessible through the REST API. Predix Analytics uses Predix Asset
to access metadata about the assets.
GCP
GCP recommends that we use its cloud ML engine with TensorFlow to deploy and train
digital twins.
Other platforms
Other platforms working on digital twins include Siemens MindSphere, IBM Cloud
Watson, and Azure.
Advanced modeling
We are likely to need more advanced modeling tools in areas such as thermodynamics, mechanics, fluid dynamics, chemistry, transportation, construction, and physics in general. MATLAB and Simulink are the most common commercial software packages available, but other open source technologies such as OpenModelica (https://www.openmodelica.org/), SageMath (http://www.sagemath.org/), or GNU Octave (https://www.gnu.org/software/octave/) can be considered as well. OpenModelica, in particular, supports a Python API, so the physics-based model can be built by an expert with OpenModelica and we can then decide to integrate the model in our analytics at a later date.
Summary
In this chapter, we have explored one of the hottest topics of the I-IoT—the digital twin. We
learned about the differences between the physics-based and data-driven approaches and
we applied these to two use cases—wind turbine and engines. We also looked at some new
technologies, such as DL and RL.
In the next chapter, we will apply the exercises that we have carried out in the past two
chapters to a cloud-based technology using Azure, GCP, and AWS.
Questions
1. Which of the following technologies is not a neural network technology?
1. CNN
2. RNN
3. LSTNN
4. LSTM
2. Which of the following is the best definition of digital twins?
1. A 3D model of a piece of equipment
2. A digital copy of the design project of a piece of equipment
3. A digital representation of a piece of equipment
3. What is the difference between a physics-based and a data-driven model?
1. A physics-based model is based on mathematics, while a data-driven
model is based on statistics
2. A physics-based model is based on design knowledge, while a data-
driven model is driven by data
3. A physics-based model is based on rules, while a data-driven model is
based on machine learning or deep learning
Further reading
Additional resources can be found via the following links:
15
Deploying Analytics on an IoT
Platform
In the previous chapters, we looked at the differences between different types of analytics.
We implemented some examples of prognostic and diagnostic analytics in the world of the
I-IoT. We also studied how to deploy our analytics on the most common platforms and
how to use open source technologies.
In this chapter, we will finalize our exercise by delivering the algorithms developed in
Chapter 14, Implementing a Digital Twin – Advanced Analytics. In particular, we want to
highlight the major differences between three platforms: AWS, Azure, and GCP. We will
discover that all three platforms adopt the same principles of providing a computational
infrastructure for training and a service-oriented platform for using the analytical model.
Deploying diagnostic analytics using the Azure Machine Learning (ML) service
Deploying prognostic analytics using AWS SageMaker
Understanding GCP analytics
Technical requirements
In this chapter, we need the following prerequisites:
Git: https://git-scm.com/downloads
Python 3.7: https://www.python.org/downloads/
Anaconda 5.2: https://www.anaconda.com/download/
Docker: https://www.docker.com
For the complete solution to the exercises, please go to the repository at https://github.
com/PacktPublishing/Hands-On-Industrial-Internet-of-Things.
The Azure ML service is different from the Azure ML Studio. The Azure
ML Studio is a collaborative visual workspace where we can build, test,
and deploy analytics without needing to write code. Models created in the
Azure ML Studio cannot be deployed or managed by the Azure ML
service.
The basic steps to develop our analytical model with the Azure ML service are as follows:
Once the model is deployed as a web service, we can call it from different sources (such as Azure Functions or the Azure IoT Hub) and integrate it into our business process. The following diagram demonstrates these concepts:
We can write our code either using our preferred IDE or directly in a Jupyter Notebook
exposed by Azure. In either case, we can perform the training and testing phases directly
on Azure ML by calling the Azure ML APIs. When we are happy with our results, we can
expose the model as a web service that can be consumed by other services such as the IoT
Hub.
Many data scientists prefer to use an integrated graphical environment, such as Azure ML Studio. Others prefer to access a cloud-based environment in which they can write their own code, such as Azure Notebook. Still others prefer to use a local IDE and integrate with the cloud through SDKs, so they can switch between a local environment and a cloud environment using only the APIs. We belong to this last class. In this exercise, we are going to use a locally installed Jupyter Notebook with the Azure ML SDK, but we could also use Visual Studio Code with the Azure ML extension (https://visualstudio.microsoft.com/downloads/ai-tools-vscode/). Obviously, it is quite easy to extend the exercise to Azure Notebook.
To use the Azure portal (https://portal.azure.com), follow these steps:
3. Choose the standard resource group, iiot-book-res, and then click on the
Create button, as shown in the following screenshot:
Azure ML instantiation
After that, we can access the Azure Notebook by clicking on the Experiments tab in the
workspace that we just created, then clicking on Open Azure Notebooks, as shown in the
following screenshot:
Azure Notebook
Note that when we create a workspace, Azure instantiates additional services, as indicated
by the dashed rectangle in the preceding screenshot.
1. From the command line, install the Jupyter Notebook and the Azure ML SDK:
$ pip install azureml-sdk[notebooks]
$ pip install jupyter
2. Then, we can start our Jupyter Notebook. From the command line, run the
following command:
$ jupyter notebook
ws = Workspace.create(name='iiot-book-ml-workspace',
subscription_id='<subscription-id>',
resource_group='iiot-book-res',
create_resource_group=True,
location='eastus2'
)
# store information on the configuration file
ws.write_config()
To get the subscription ID, we can search for it on the Azure portal, as shown in the
following screenshot:
Instead of creating the workspace every time, we can save the configuration in a
config.json file and restore it with the following Python code:
ws = Workspace.from_config()
From the command console, start the Jupyter server with the following command:
$ jupyter notebook
Then, we need to create a new notebook by clicking on New | Python 3 from the menu on
the right. As we learned before, we need an instance of the Azure Notebook workbench.
The following code instantiates an Azure workbench, stores this information in a local file
so that it can be loaded later, and starts an experiment called wind-turbine-experiment:
import azureml.core
print(azureml.core.VERSION)
subscription_id = '<my subscription id>'

from azureml.core import Workspace
ws = Workspace.create(name='iiot-book-ml-workspace',
                      subscription_id=subscription_id,
                      resource_group='iiot-book-res',
                      create_resource_group=True,
                      location='westeurope'  # or other supported Azure region
                      )
We can now write our physics-based model. This model is the same model as we
developed in Chapter 14, Implementing a Digital Twin - Advanced Analytics. The code is
printed here for convenience:
# model
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def wind_turbine_model(x):
    # standard operability
    return 376.936 - 195.8161*x + 33.75734*x**2 - 2.212492*x**3 + 0.06309095*x**4 - 0.0006533647*x**5
Finally, we can visualize the output on the Azure ML experiment with the following code:
run.log_list('Wind Turbine Model', reference_power) # log a list of values
The model works as expected, which means we can now deploy it on the Azure cloud.
from azureml.core.conda_dependencies import CondaDependencies

myenv = CondaDependencies()
#myenv.add_conda_package("keras")
with open("myenv.yml", "w") as f:
    f.write(myenv.serialize_to_string())
The code of our model should contain two methods: the init() method and the run() method. In the init() method, the model is loaded and prepared. In our exercise, this simply involves assigning the wind turbine formula to a global variable. The run() method is called when we need to run the model. This method expects the data to be passed as JSON and returns JSON. In our exercise, we parse the JSON file and call the wind turbine model.
def wind_turbine_model(x):
    # standard operability
    return 376.936 - 195.8161*x + 33.75734*x**2 - 2.212492*x**3 + 0.06309095*x**4 - 0.0006533647*x**5

def init():
    global model
    # no trained model for the wind turbine; use the physics-based formula
    model = wind_turbine_model
def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make evaluation
    y = model(data)
    # return the result as JSON (tolist() makes the numpy array serializable)
    return json.dumps(y.tolist())
We can customize the memory and the CPU to match our requirements.
service = Webservice.deploy_from_model(workspace=ws,
                                       name='wind-turbine',
                                       deployment_config=aciconfig,
                                       models=[],
                                       image_config=image_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
This procedure takes a few minutes to package the image and start the Docker container.
After that, we will receive the URL of the REST API to call.
Be careful with the name of the image. Do not use special characters; only
use lowercase letters.
We can also see the deployed model by accessing the Azure portal. Click on iiot-book-
ml-workspace and then access the Deployments tab to see the URL, as shown in the
following screenshot:
As indicated by the arrow in the preceding screenshot, we now have a new service called
wind-turbine, which is our Docker container.
print("result:", resp.text)
Note that this method is completely insecure; in a real-life situation, we should enable authentication. To understand how to enable SSL and authentication, go to https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-secure-web-service.
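For reference, a sketch of how such a scoring call might be assembled with the standard library; the scoring URI and the {"data": [...]} payload shape are assumptions taken from the run() method shown earlier, and the network call itself is left commented out:

```python
import json
import urllib.request

scoring_uri = "http://<your-aci-endpoint>/score"  # hypothetical; use service.scoring_uri
payload = json.dumps({"data": [5.0, 10.0, 15.0]}).encode("utf-8")
request = urllib.request.Request(scoring_uri, data=payload,
                                 headers={"Content-Type": "application/json"})
# resp = urllib.request.urlopen(request)  # uncomment against a live endpoint
# print("result:", resp.read().decode("utf-8"))
print(payload.decode("utf-8"))
```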
In the previous example, we covered all of these steps in our creation of a physics-based digital twin model. When we work with a data-driven approach, however, we need an additional sub-step to train and test the model using real data. We can use the machine learning capabilities and the high computational power (both GPU- and CPU-based) provided by Azure ML to accomplish this goal.
To demonstrate this, we will develop a simple logistic regression model to identify the
anomalies of our wind turbine based on experimental data.
We will use the wind turbine digital twins to produce simulated data to infer a surrogate
model. The purpose of our exercise is not to find the best algorithm, but simply to use the
computational capabilities of the Azure ML service.
y_train contains the value 0 when the relation between wind and power is an anomaly, and 1 when it is normal:
import random

X_train = []
y_train = []

# regular data (labelled 1)
X_train.extend([[x, wind_turbine_model(x)] for x in range(0, 30)])
y_train.extend([1 for x in range(0, 30)])

# anomaly data (labelled 0)
for x in range(15, 30):
    X_train.extend([[x, 50 + x*random.random()]])
    y_train.extend([0])
ds = ws.get_default_datastore()
ds.upload(src_dir='./data', target_path='mydata', overwrite=True)
We are now ready to train our model, using the logistic regression method from scikit-learn:
logreg = LogisticRegression(C=1.0/args.reg, random_state=0, solver='lbfgs', multi_class='multinomial')
logreg.fit(X_train, y_train)
import numpy as np
We get an accuracy of about 80%. This is quite low, but we aren't going to investigate the efficiency of our model any further. The purpose of this exercise isn't to develop an anomaly detection algorithm, but to show how to train a model with Azure ML.
When the model has been built and trained, we can dump it in the Azure ML repository, so
that we can restore it during the init() method of core.py:
from sklearn.externals import joblib
joblib.dump(value=logreg, filename='outputs/sklearn_windturbine_model.pkl')
try:
    # look for the existing cluster by name
    compute_target = ComputeTarget(workspace=ws, name=batchai_cluster_name)
    if type(compute_target) is BatchAiCompute:
        print('found compute target {}, just use it.'.format(batchai_cluster_name))
    else:
        print('{} exists but it is not a Batch AI cluster. Please choose a different name.'.format(batchai_cluster_name))
except ComputeTargetException:
    print('creating a new compute target...')
    compute_config = BatchAiCompute.provisioning_configuration(
        vm_size="STANDARD_D2_V2",  # small CPU-based VM
        #vm_priority='lowpriority',  # optional
        autoscale_enabled=True,
        cluster_min_nodes=0,
        cluster_max_nodes=2)
    # create the cluster
    compute_target = ComputeTarget.create(ws, batchai_cluster_name, compute_config)
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)
script_params = {
    '--data-folder': ds.as_mount(),
    '--regularization': 0.8
}
est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                entry_script='train.py',
                conda_packages=['scikit-learn'])

run = exp.submit(config=est)
Azure ML will build a job on the allocated cluster and will provide a link, at which we can
monitor the computational progress. We can see the accuracy in the TRACKED METRICS
section, as shown in the following screenshot:
We can also submit our Docker image to the edge. For more information, take a look at https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-to-iot.
The steps we need to follow to build the analytics are listed as follows:
Downloading a dataset on S3
To start our exercise, we need to download the data to a repository that is accessible by
SageMaker. To do this, we can use S3:
1. We need to sign in to the AWS Management Console and open the Amazon S3
console at https://console.aws.amazon.com/s3.
2. In the upper-right corner of the AWS Management Console, we can choose our
desired AWS region. In this example, we will use EU (Ireland).
3. Now, we will need to create a bucket. In the S3 console, click on the + Create
bucket button, as shown in the following screenshot:
Create bucket on S3
4. For the Bucket name field, type iiot-book-data in the textbox and click Next.
Leave everything as the default settings on the next two pages and click Create
bucket on the Review page.
5. Click the link for the bucket name you just created and then click on the Upload
button, as shown in the following screenshot:
Upload data
6. Click + Add more files, find and select the train_FD001.txt file, the
test_FD001.txt file, and the RUL_FD001.txt file, and click Upload.
1. In the AWS Management Console, search for SageMaker and press Enter.
2. We can create a notebook from the SageMaker dashboard. Click on Create
notebook instance.
3. Give the instance a name, such as iiot-book-notebook, and set the instance
type to be ml.t2.medium, as shown in the following screenshot:
4. Amazon SageMaker will need a role to launch and access resources in the
account. To simplify this, select Create a new role.
5. After that, select the Specific S3 buckets option.
6. Enter the name of the bucket that you created in the Downloading a dataset on
S3 section:
9. Select the No VPC option. Doing this will make it easier to configure access to
the Amazon S3 bucket.
10. Select No Custom Encryption.
11. Click Create notebook instance and wait until the instance's status is InService.
12. We are now ready to open the notebook and write our code. Click on the Open button
shown in the following screenshot:
The other items in the menu on the left allow us to check the status of the training, create
models, create endpoints, and see logs.
We can run the following code either in the online SageMaker Jupyter environment or in a
locally installed Jupyter Notebook:
print(region)
print(role)

import boto3
s3 = boto3.resource('s3')
s3.Bucket(bucket_name).download_file(file_name_train, file_name_train)
s3.Bucket(bucket_name).download_file(file_name_test, file_name_test)

# prepare model
columns_feature = ['sensor_4', 'sensor_7']
file_name_train = 'train.csv'
file_name_test = 'test.csv'

dataset_train = df[(df.unitid == i)]
dataset_test = df_test[(df_test.unitid == i)]
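The unitid filter above selects the rows belonging to a single engine. As a minimal offline sketch (the three-row frame below is a hypothetical stand-in for the turbofan dataset, not the book's data):

```python
import pandas as pd

# Toy frame standing in for the turbofan dataset: one row per (unit, cycle)
df = pd.DataFrame({'unitid': [1, 1, 2],
                   'time': [1, 2, 1],
                   'sensor_4': [0.10, 0.12, 0.30],
                   'sensor_7': [1.00, 1.10, 1.20]})

# Same pattern as the notebook: keep only the rows for engine unit i
i = 1
dataset_train = df[df.unitid == i]
print(len(dataset_train))  # 2 cycles recorded for unit 1
```

The same expression with `df_test` yields the held-out cycles for the same engine.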
s3 = boto3.resource('s3')
target_bucket = s3.Bucket(bucket_name)
We have now completed our first exercise with SageMaker, working with the S3 repository
and the SageMaker notebook.
The following diagram shows the architecture of the RUL analytics implementation. In the
local_test directory, we can find the data and the scripts to test our container locally
before pushing it to the cloud. In the rul directory, we can find the predictor script,
predict.py, and the training script, the train file. The files serve, nginx.cfg,
and wsgi.py build the REST API that exposes predict.py as a microservice:
1. The first part of the train file defines the imports:
#!/usr/bin/env python3
import pandas as pd
import numpy as np
2. The following section defines the variables where the algorithm will look for the
data:
…
prefix = '/opt/ml/'
3. Then, we define the training function that we built in Chapter 14, Implementing a
Digital Twin - Advanced Analytics:
...
# data columns
columns = ['unitid', 'time', 'set_1', 'set_2', 'set_3']
columns.extend(['sensor_' + str(i) for i in range(1, 22)])

# prepare model
columns_feature = ['sensor_4', 'sensor_7']

def build_model(input_dim):
    …
    return model

def create_train_dataset(dataset):
    …
    return np.array(dataX), np.array(dataY)
...
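The body of create_train_dataset is elided above. A minimal sketch of the sliding-window idea behind it (the look_back parameter and window size of 3 are illustrative assumptions, not the book's exact code) looks like this:

```python
import numpy as np

def create_train_dataset(dataset, look_back=3):
    # Slide a window over the series: each X sample holds look_back
    # consecutive values, and Y holds the value that follows the window
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back):
        dataX.append(dataset[i:i + look_back])
        dataY.append(dataset[i + look_back])
    return np.array(dataX), np.array(dataY)

X, Y = create_train_dataset(np.arange(10.0))
print(X.shape, Y.shape)  # (7, 3) (7,)
```

Each row of X is a short history of sensor values, which is the shape an LSTM-style model such as build_model expects.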
4. Finally, in the train() function, we read the input, build the model, train it, and
save it to a local file called rul-model.h5:
...
try:
    # Read in any hyperparameters that the user passed with the training job
    with open(param_path, 'r') as tc:
        trainingParams = json.load(tc)

    # Take the set of files and read them all into a single pandas dataframe
    input_files = [os.path.join(training_path, file)
                   for file in os.listdir(training_path)
                   if file.endswith('.csv')]
    …
    # prepare dataset
    dataset_train = prepare_dataset(dataset_train, columns_feature)
except Exception as e:
    …
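The input_files pattern above can be exercised offline. The following sketch uses a throwaway temp directory in place of /opt/ml/input/data/training to show how the CSVs end up in a single DataFrame (the concatenation step is an assumption about what the elided code does):

```python
import os
import tempfile

import pandas as pd

# Throwaway directory standing in for /opt/ml/input/data/training
training_path = tempfile.mkdtemp()
pd.DataFrame({'a': [1, 2]}).to_csv(os.path.join(training_path, 'one.csv'), index=False)
pd.DataFrame({'a': [3]}).to_csv(os.path.join(training_path, 'two.csv'), index=False)

# Same pattern as train(): gather every CSV and concatenate into one frame
input_files = [os.path.join(training_path, f)
               for f in os.listdir(training_path) if f.endswith('.csv')]
df = pd.concat((pd.read_csv(f) for f in sorted(input_files)), ignore_index=True)
print(len(df))  # 3
```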
The model file will be reopened by the predictor script, predictor.py, which includes
only the evaluation steps. The most important parts of the algorithm are as follows:
1. The first part of the predictor.py file defines the imports and the variables
where the algorithm will look for the data:
…
prefix = '/opt/ml/'
model_path = os.path.join(prefix, 'model')
2. Then, we define the prediction function that we built in Chapter 14, Implementing
a Digital Twin - Advanced Analytics:
...
# data columns
columns = ['unitid', 'time', 'set_1', 'set_2', 'set_3']
columns.extend(['sensor_' + str(i) for i in range(1, 22)])

# prepare model
columns_feature = ['sensor_4', 'sensor_7']
@classmethod
def get_model(cls):
    if cls.model is None:
        inp = os.path.join(model_path, 'rul-model.h5')
        print('Loading model %s ...' % inp)
        cls.model = load_model(inp)
        print('model loaded')
        cls.graph = tf.get_default_graph()
        print(cls.model.summary())
    return cls.model
...
...
# The flask app for serving predictions
app = flask.Flask(__name__)

@app.route('/invocations', methods=['POST'])
def transformation():
    data = None
    ...
    else:
        return flask.Response(response='This predictor only supports CSV data',
                              status=415, mimetype='text/plain')

    # Do the prediction
    testPredict = ScoringService.predict(data)
    testPredict = np.multiply(testPredict, N)
    print("RUL of Engine %s : predicted:%s expected:%s" % (1, testPredict[-1], 112))
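The transformation handler can be sketched end-to-end with Flask's test client. In the sketch below, a mean-based stand-in replaces ScoringService.predict, which we obviously cannot run without the trained model:

```python
import io

import flask
import pandas as pd

app = flask.Flask(__name__)

@app.route('/invocations', methods=['POST'])
def transformation():
    # Reject anything that is not CSV, as predictor.py does
    if flask.request.content_type != 'text/csv':
        return flask.Response(response='This predictor only supports CSV data',
                              status=415, mimetype='text/plain')
    data = pd.read_csv(io.StringIO(flask.request.data.decode('utf-8')), header=None)
    # Stand-in for ScoringService.predict(data): one value per input row
    result = data.values.mean(axis=1)
    return flask.Response(response=','.join(str(v) for v in result),
                          status=200, mimetype='text/csv')

client = app.test_client()
resp = client.post('/invocations', data='1,2,3\n4,5,6', content_type='text/csv')
print(resp.status_code, resp.get_data(as_text=True))  # 200 2.0,5.0
```

SageMaker invokes exactly this /invocations route when the container is deployed as an endpoint, which is why the local test and the cloud deployment share the same code path.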
To build the container image, we can run the following commands from the command line:
$ cd container
$ docker build -t rul-estimator .
In test_dir, we need to replicate the same structure and save the training data into
the input/data/training directory.
The predicted residual life is 113.80, compared to the expected value of 112.
This will upload the image to the AWS Container Registry (ECR). The process will look as
follows:
This syntax is the same as the one we used during the local test. We can now run our
estimator from the SageMaker notebook. We need to allocate the machine size, which
is ml.m5.4xlarge, pass the data, and fit the model:
import sagemaker as sage
from time import gmtime, strftime
region = sess.boto_session.region_name
bucket_name = 'iiot-book-data'
image = '{}.dkr.ecr.{}.amazonaws.com/rul-estimator:latest'.format(account, region)
output_location = 's3://{}/output'.format(bucket_name)
base_location = 's3://{}'.format(bucket_name)
print(image)

model = sage.estimator.Estimator(image,
                                 role, 1, 'ml.m5.4xlarge',
                                 output_path=output_location,
                                 sagemaker_session=sess)
model.fit(base_location)
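The image name passed to the Estimator follows the standard ECR naming scheme and can be checked offline. With a hypothetical account ID and region (placeholders, not real credentials), the string expands as follows:

```python
# Hypothetical account ID and region, used only to show the ECR naming scheme
account = '123456789012'
region = 'eu-west-1'
image = '{}.dkr.ecr.{}.amazonaws.com/rul-estimator:latest'.format(account, region)
print(image)  # 123456789012.dkr.ecr.eu-west-1.amazonaws.com/rul-estimator:latest
```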
SageMaker builds the model and saves the results in the model directory.
In this example, we used the default configuration, so SageMaker assumes the data is in
the default location. The fit() function supports more arguments, so we can pass
additional information to customize our model.
We can test the model by passing the CSV test file as an input:
import sagemaker
from sagemaker.predictor import csv_serializer, json_deserializer
predictor.content_type = 'text/csv'
predictor.serializer = csv_serializer
result = predictor.predict(dataset_test.values).decode('utf-8')
print(result)
ML Engine
The process of developing a model using ML Engine is quite similar to the one we followed
in the Working with the Azure ML service and Implementing analytics on AWS SageMaker
sections:
The basic concept of ML Engine is to use the gcloud ml-engine command-line tool and
the Google Cloud libraries to train the model locally and to interact with the cloud through
a REST API.
The following is an example of how we can run our algorithm locally:
$ gcloud ml-engine local train \
--module-name trainer.task \
--package-path trainer/ \
--job-dir $MODEL_DIR \
-- \
--train-files $TRAIN_DATA \
--eval-files $EVAL_DATA \
With GCP ML Engine, we can perform the same operations locally or remotely. The only
difference is how we store the data; for this, we can use a Google Cloud Storage bucket,
which is very similar to what we did with AWS S3.
PyTorch
If we are working with Python, deep learning, and GPUs, we can get interesting benefits
from PyTorch (https://pytorch.org). PyTorch is supported by Azure, AWS, and GCP, and
can also be installed on-premises.
Chainer
Chainer (https://chainer.org/) is supported by AWS and Azure, and partially by GCP. It
is a deep learning library for Python.
MXNet
MXNet (https://mxnet.apache.org) is another popular library for deep learning and
more. It is supported by AWS, GCP, and Azure.
Apache Spark
Apache Spark (https://spark.apache.org/) is a typical big data technology, but it is
supported by vendors such as Azure, AWS, Predix, and GCP. Apache Spark supports ML
algorithms through MLlib.
Summary
In this chapter, we discovered how to deploy analytics on Azure ML and AWS SageMaker,
using the algorithms developed in Chapter 13, Understanding Diagnostics, Maintenance, and
Predictive Analytics, and Chapter 14, Implementing a Digital Twin - Advanced Analytics. We
looked at the differences between the deployment methodologies of Azure, AWS, and GCP,
and we also explored the new trend of IoT analytics.
This chapter is the last stop on our journey into the Industrial IoT, which we started in
Chapter 1, Introduction to Industrial IoT. We began by looking at the differences between the
IoT and the I-IoT. The first four chapters covered the most important data sources in
the industrial sector and the differences between them. We looked at OPC UA and
learned why it is becoming the new standard in the industrial sector, and we explored
the differences between data polling and data subscription. In Chapter 7, Developing
Industrial IoT and Architecture, we explored a general-purpose architecture for I-IoT
platforms and implemented five different prototypes in the chapters that followed: open
source, Azure, AWS, GCP, and OEM platforms such as Predix. We learned that I-IoT data is
predominantly made up of time series, but also includes logs and static data such as asset
information. Finally, we discovered the most important classes of analytics: diagnostic,
predictive, and prognostic. We also traced the most important steps for building analytics
and the three ways to deploy it: on the hot path, on the cold path, and on the edge or in the
cloud.
We expect to see an industrial revolution in the next two years. Several vendors are
currently working on the IoT to deliver their own solutions. It is important to understand,
however, that the Industrial IoT is slightly different from the IoT and other big data
platforms. Here are three of the most important common pitfalls we have discovered, based
on our ten years of experience:
Data acquisition is not standard: You will need to adapt your platform to
acquire data from different data sources. Take into consideration what we
learned about different data flow scenarios.
Tag mapping is a grueling job: The names of measures are encoded in
different formats, and we need to translate (or map) them into a common
language. Refer to what we learned about assets in Chapter 8, Implementing a
Custom Industrial IoT Platform.
The data-driven analytics approach can be a false friend: Although we can
use ML to build general-purpose data-driven analytics, these sometimes fail to
explain the root cause of an issue. Consider what we learned in Chapter 13,
Understanding Diagnostics, Maintenance, and Predictive Analytics, and think about
developing hybrid solutions.
Lastly, it is essential to bear in mind that the basic idea of the microservice architecture and
the SOLID principles remain valid. The I-IoT platform is not a monolithic application, but
instead an ecosystem of different bricks. You should design every brick as a single
autonomous component with a standard input and output. If you do this, you should not
experience insurmountable issues.
Questions
1. What is a Jupyter Notebook?
1. An IDE in which to develop Azure ML or SageMaker analytics
2. A general-purpose interactive IDE for Python and other languages
3. A notebook used by NASA
2. Which one of the following steps is more appropriate to deploy analytics on
Azure ML, SageMaker, or GCP Analytics?
1. Prepare the data, train the model, test the model, deploy the model
2. Prepare the data, build the model, deploy the model, monitor the
model
3. Prepare the data, build the model, (train the model, test the model),
deploy the model, monitor the model
3. What is the basic idea of SageMaker and Azure ML?
1. To build the model as a web application on a containerized application
2. To learn the model using FPGA and GPU
3. To allocate a computational cluster in which to test the analytics
Further reading
Additional resources can be found at the following links:
Assessment
Physical objects in the industrial world are more complex and have
a wide range of typologies.
In the industrial world, robustness, resilience, and availability are
key factors.
Intellectual property is a sensitive and important topic in the
industrial world.
3. 0.152 mV
4. Microcontroller
5. Functional Block Diagram, Sequential Functional Chart, and Ladder Diagram
6. When the production process is continuous and the product value is high
7. Polling happens on a regular basis, while unsolicited happens when changes are
made
8. 1, 2, and 7
9. Between MES and PLC
10. The asset model is the conceptual model of the physical or logical assets that the
company uses to perform its business
3. Being able to create and deploy the devices acting as dual-homes in a specific
network
4. Building logical networks that share the same physical infrastructure
5. Allowing DCOM traffic to cross the firewall
6. Tunneling the DCOM traffic through TCP and putting the external OPC Proxy in
a DMZ
7. Using the OPC-UA security model
characteristics 60, 61, 62 about 52, 215
digital sensors 60 Cloud-based TSDBs 213
Sequence of Events (SOE) 50 Elasticsearch 213
serverless computing 202 InfluxDB 213
service-oriented architecture (SOA) 203 KairosDB 212
servo modules 77 Netflix Atlas 213
signals OpenTSDB 213, 215
analogic signals 69 OSIsoft PI 212
digital signals 69 Proficy Historian 212
quantized signals 69 Riak TS (RTS) 212
sampled signals 69 Uniformance Process History Database (PHD)
Single Responsibility Principle (SRP) 203 212
SSTable 235 time series databases (TSDBs) 91
standard I-IoT flow Time Series Insights (TSI)
about 209, 211 about 373
asset registry 216 reference 397
time-series 211 time-based method 211
stateful-inspection firewall 164 time-series 211
storage unit 99 time-series data
Stream Analytics storing, on Apache Cassandra 234
about 384, 386 token access method 82
advanced Stream Analytics 390, 391 token bus 82
testing 388 topologies
stream socket 204 examples 81
streaming transducer 60
versus batch analytics 421, 422 transmission media
supervised learning 443 about 81
Supervisory Control 89 optical fibre 82
Supervisory Control and Data Acquisition (SCADA) powerline 82
26 wireless networks 82
support vector machine (SVM) 415 Trusted Platform Module (TPM)
system database 78 about 52
edge device 51, 52
T two-factor authentication (2FA) 268
tags 91 types, industrial process
tensor processing unit (TPU) 444 batch processes 35
Tensor Processing Unit (TPU) processor continuous processes 34
reference 502 discrete processes 36
TensorFlow semi-continuous processes 35
about 444
reference 444 U
thing 300 U-SQL
ThingWorx IoT Platform 291 reference 395
time division method 83 Uncertainty Quantification (UQ) 412
[ 527 ]
Unified Architecture (UA) 51, 224 model, developing 468, 469, 470
unit 99 model, registering 472
unsupervised learning 443 model, testing 474
User Account and Authentication (UAA) 270 wind turbine
user-defined function (UDF) 386 dataset 454
deploying 457
V exploratory data analysis 454
virtual machine (VM) 205 model, building 454, 455, 457
Visual Studio Code tools for AI 463 monitoring 454, 457
von Neumann architecture 72 packaging 457
problem statement 454
W windowing 390
work cell 99
WANs
reference 12
wind turbine digital twins, developing with Azure ML
Y
about 468 Yet Another Resource Negotiator (YARN) 252
image, building of model 471
model, deploying 472 Z
zero-order hold (ZOH) operation 63