
Internet of Things (ETUA31202)(TY ETC SEM-I)

Unit V : BIG Data


Dr. Pravin G. Gawande
pravin.gawande@viit.ac.in

Department of Electronics and Telecommunication Engineering

BRACT’S, Vishwakarma Institute of Information Technology, Pune-48


(An Autonomous Institute affiliated to Savitribai Phule Pune University)
(NBA and NAAC accredited, ISO 9001:2015 certified)
Teaching Scheme
Topics (Unit V: Big Data):
1 Introduction, Big Data, Types of data
2 Characteristics of Big Data, Data Storage
3 Introduction to Hadoop
4 Types of Data Analytics, Statistical Models
5 Analysis of Variance, Data Dispersion
6 Contingency and Correlation
7 Regression Analysis

Objective: To be familiar with data handling and analytics tools in IoT

Course Outcomes: Use various techniques of Big data storage and analytics in IoT

Department of Electronics and Telecommunication Engineering, VIIT, Pune-48 2


Text Books
T1. The Internet of Things: Connecting Objects to the Web, Hakima Chaouchi, Wiley Publications, ISBN 978-1-84821-140-7
T2. The Internet of Things: Key Applications and Protocols, Olivier Hersent, David Boswarthick, and Omar Elloumi, Wiley Publications

Reference Books
R1. Internet of Things, Arsheep Bahga and Vijay Madisetti, Universities Press
R2. Building the Internet of Things with IPv6 and MIPv6: The Evolving World of M2M Communications, Daniel Minoli, Wiley Publications, ISBN 978-1-118-47347-4
R3. The Internet of Things: Enabling Technologies, Platforms, and Use Cases, Pethuru Raj and Anupama C. Raman, CRC Press
R4. Authorization and Access Control: Foundations, Frameworks, and Applications, Parikshit N. Mahalle, Shashikant S. Bhong, and Gitanjali R. Shinde, CRC Press

Online
O1. https://onlinecourses.nptel.ac.in/noc17_cs22/course (NPTEL course)
O2. http://www.cse.wustl.edu/~jain/cse570-15/ftp/iot_prot/index.html



TABLE OF CONTENTS
1. What is Big Data?
2. Types of Big Data
   1. Structured data
   2. Unstructured data
   3. Semi-structured data
3. Characteristics of Big Data
   1. Volume
   2. Variety
   3. Velocity
   4. Value
   5. Veracity

https://bau.edu/blog/characteristics-of-big-data/

https://energie.labs.fhv.at/~repe/bigdata/introduction-to-big-data-projects/introduction-to-big-data/

https://www.javatpoint.com/what-is-big-data
Introduction to Big Data
What is “big data”?
• "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of
processing to enable enhanced decision making, insight discovery and process optimization” (Gartner 2012)
• Complicated (intelligent) analysis of data may make small data “appear” to be “big”

Bottom line: Any data that exceeds our current capability of processing can be regarded as “big”



Introduction to Big Data
Why is “big data” a “big deal”?
• Government
• “DIGITAL BHARAT” creates huge amounts of data
• Many different big data programs launched
• Private Sector
• Walmart handles more than 1 million customer transactions every hour, which are imported into databases
estimated to contain more than 2.5 petabytes of data
• Facebook handles 40 billion photos from its user base.
• Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts world-wide
• Science
• The Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days.
• Biomedical computation like decoding human Genome & personalized medicine
• Social science revolution



Introduction to Big Data
Lifecycle of Data: 4 “A”s



Introduction to Big Data
Computational View of Big Data



Introduction to Big Data
What’s Big Data?
No single definition; here is from Wikipedia:

• Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using
on-hand database management tools or traditional data processing applications.
• The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

• The trend to larger data sets is due to the additional information derivable from analysis of a single large set of
related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be
found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime,
and determine real-time roadway traffic conditions.”



Types of Big Data
As the Internet age continues to grow, we generate an incomprehensible amount of data every second. So much so that the amount of
data floating around the internet is estimated to reach 163 zettabytes by 2025. That’s a lot of tweets, selfies, purchases, emails, blog
posts, and any other piece of digital information that we can think of. These data can be classified into the following types:

Structured data
Structured data has certain predefined organizational properties and is present in structured or tabular schema, making it easier to
analyze and sort. In addition, thanks to its predefined nature, each field is discrete and can be accessed separately or jointly along with
data from other fields. This makes structured data extremely valuable, making it possible to collect data from various locations in the
database quickly.

Unstructured data
Unstructured data entails information with no predefined conceptual definitions and is not easily interpreted or analyzed by standard
databases or data models. Unstructured data accounts for the majority of big data and comprises text-heavy information that may also
contain dates, numbers, and facts. Big data examples of this type include video and audio files, mobile activity, satellite imagery, and
NoSQL databases, to name a few. Photos we upload on Facebook or Instagram and videos that we watch on YouTube or any other
platform contribute to the growing pile of unstructured data.

Semi-structured data
Semi-structured data is a hybrid of structured and unstructured data. This means that it inherits a few characteristics of structured data
but nonetheless contains information that fails to have a definite structure and does not conform with relational databases or formal
structures of data models. For instance, JSON and XML are typical examples of semi-structured data.
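The contrast between structured and semi-structured data can be sketched in a few lines of Python (the records and field names below are made up for illustration): a structured record set shares one fixed schema, while a JSON document carries its own field names and may add attributes freely.

```python
import json

# Structured: fixed schema, every record has the same fields (tabular).
structured_rows = [
    {"id": 1, "temp_c": 21.5, "humidity": 40},
    {"id": 2, "temp_c": 22.1, "humidity": 38},
]

# Semi-structured: a self-describing JSON document; records of the
# same type may carry different attributes (here, an extra "tags" field).
semi_structured = '{"id": 3, "temp_c": 23.0, "tags": ["lab", "sensor-a"]}'
record = json.loads(semi_structured)

print(sorted(record.keys()))  # → ['id', 'tags', 'temp_c']
```

Because the JSON record is self-describing, a parser can list its fields without consulting any external schema, which is exactly what relational tables require.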



Sources of Big Data
These data come from many sources:
•Social networking sites: Facebook, Google, and LinkedIn all generate huge amounts of data
on a day-to-day basis as they have billions of users worldwide.
•E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge amounts of logs from which
users' buying trends can be traced.
•Weather stations: All the weather stations and satellites give very huge amounts of data, which are stored and
manipulated to forecast weather.
•Telecom companies: Telecom giants like Airtel and Vodafone study user trends and accordingly
publish their plans, and for this they store the data of their millions of users.
•Share market: Stock exchanges across the world generate huge amounts of data through their daily
transactions.



Characteristics of Big Data
• As with anything huge, we need to make proper categorizations in order to improve our understanding.
• As a result, features of big data can be characterized by five Vs.: volume, variety, velocity, value, and veracity.
• These characteristics not only assist us in deciphering big data but also give us an idea of how to deal with huge,
fragmented data at a controllable speed in an acceptable time period so that we can extract value from it, do real-
time analysis, and respond promptly.



Characteristics of Big Data
Big Data: 3V’s



Characteristics of Big Data
The Three Basic V’s Big Data: 3V’s



Characteristics of Big Data
The Three Basic V’s Big Data: 3V’s - Volume (Scale)
• Data Volume
• 44x increase from 2009 - 2020
• From 0.8 ZB to 35 ZB
• Data volume is increasing exponentially

Exponential increase in collected/generated data



Characteristics of Big Data
The Three Basic V’s Big Data: 3V’s - Volume (Scale)

CERN’s Large Hadron Collider (LHC) generates 15 PB a year.
Characteristics of Big Data
The Three Basic V’s Big Data: 3V’s - Volume (Scale)
• 12+ TB of tweet data every day
• 30 billion RFID tags today (1.3 billion in 2005)
• 4.6 billion camera phones worldwide
• ? TB of data every day
• 100s of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009, 200 million by 2014
• 2+ billion people on the Web by end of 2011
• 25+ TB of log data every day


Characteristics of Big Data
The Three Basic V’s Big Data: 3V’s - Volume (Scale)



Characteristics of Big Data
The Three Basic V’s Big Data: 3V’s – Velocity (Speed):

• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions → missing opportunities
• Examples:
  • E-Promotions: based on your current location, your purchase history, and what you like → send promotions right now for the store next to you
  • Healthcare monitoring: sensors monitoring your activities and body → any abnormal measurements require immediate reaction
• Velocity refers to the speed at which new data is being created, and the need for data to be processed in near real-time.



Characteristics of Big Data
The Three Basic V’s Big Data: 3V’s – Variety (Complexity)

• Relational Data (tables/transactions/legacy data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
  • Social networks, Semantic Web (RDF), …
• Streaming Data
  • You can only scan the data once
• A single application can be generating/collecting many types of data
• Big Public Data (online, weather, finance, etc.)

To extract knowledge, all these types of data need to be linked together.



Characteristics of Big Data
The Three Basic V’s Big Data: 3V’s – Variety (Complexity)

• Variety refers to the huge diversity of data types and data sources.
  • Structured data: data that conforms to a formal structure of a data model
    • e.g. data that fits into a relational database
  • Semi-structured data: data that does not conform to a formal structure but carries structural information (self-describing structures)
    • Entities of the same type may have different attributes
    • e.g. XML, JSON, EDI, …
  • Quasi-structured data: textual data with erratic data formats
    • e.g. clickstream data
  • Unstructured data: data that is not organized in a predefined way
    • e.g. text documents, pictures, …



Characteristics of Big Data
Two Additional V's
• Veracity:
  • Veracity refers to how reliable the data is: its conformity to facts and accuracy, and the quality and origin of the data.
  • Handling big data well means being able to filter, translate, and manage data of uncertain quality efficiently.
  • For example, Facebook posts with hashtags.



Characteristics of Big Data
Two Additional V's

• Value:
  • The benefit generated by using the information contained in the data to improve the outcomes of actions
  • e.g. profit, medical or social benefits, customer, employee, or personal satisfaction
  • Value is an essential characteristic of big data: it is not just any data that we store or process, but valuable and reliable data that we store, process, and analyze.



Characteristics of Big Data
Some Make it 4V’s





Characteristics of Big Data
• Big Data refers to large amounts of data that cannot be processed by traditional data storage or processing units.
• It is used by many multinational companies to process data and run the business of many organizations.
• The data flow can exceed 150 exabytes per day before replication.

5 V's of Big Data


•Volume
•Veracity
•Variety
•Value
•Velocity



How big data differs from traditional data



Big Data Vendor Landscape



IOT Analytics Technology/Vendor choices



Introduction to Hadoop
• Hadoop is an open-source framework from Apache used to store, process, and analyze data that are very huge in volume.
• Hadoop is written in Java and is not OLAP (online analytical processing).
• It is used for batch/offline processing.
• It is used by Facebook, Yahoo, Google, Twitter, LinkedIn, and many more.
• Moreover, it can be scaled up just by adding nodes to the cluster.



Introduction to Hadoop
Modules of Hadoop
1. HDFS: Hadoop Distributed File System. Google published its GFS paper, and HDFS was developed on that basis. It states that files are broken into blocks and stored on nodes in the distributed architecture.
2. YARN: Yet Another Resource Negotiator, used for job scheduling and cluster management.
3. MapReduce: a framework that helps Java programs do parallel computation on data using key-value pairs. The Map task takes input data and converts it into a data set that can be computed as key-value pairs. The output of the Map task is consumed by the Reduce task, and the output of the reducer gives the desired result.
4. Hadoop Common: Java libraries used to start Hadoop, also used by other Hadoop modules.
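The map/shuffle/reduce flow described above can be sketched as a tiny in-memory word count. This is a hedged illustration only: real MapReduce runs these phases in parallel across cluster nodes, and the function names and sample lines here are hypothetical.

```python
from collections import defaultdict

# Map phase: emit a (key, value) pair for every word in a line.
def map_phase(line):
    for word in line.split():
        yield (word.lower(), 1)

# Reduce phase: aggregate the list of values collected for each key.
def reduce_phase(grouped):
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["Hadoop stores big data", "Hadoop processes big data"]

grouped = defaultdict(list)          # the "shuffle" step: group values by key
for line in lines:
    for key, value in map_phase(line):
        grouped[key].append(value)

result = reduce_phase(grouped)
print(result)  # → {'hadoop': 2, 'stores': 1, 'big': 2, 'data': 2, 'processes': 1}
```

In Hadoop, the shuffle step happens over the network between mapper and reducer nodes; the key-value contract between the phases is the same as in this sketch.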



Hadoop Architecture
Hadoop Architecture
• The Hadoop architecture is a
package of the file system,
MapReduce engine and the HDFS
(Hadoop Distributed File System).
• The MapReduce engine can be
MapReduce/MR1 or YARN/MR2.
• A Hadoop cluster consists of a
single master and multiple slave
nodes.
• The master node includes Job
Tracker, Task Tracker, NameNode,
and DataNode whereas the slave
node includes DataNode and
TaskTracker.



Hadoop Architecture
Hadoop Distributed File System
• The Hadoop Distributed File System
(HDFS) is a distributed file system for
Hadoop.
• It contains a master/slave architecture.
• This architecture consists of a single
NameNode, which performs the role of
master, and multiple DataNodes, which
perform the role of slaves.
• Both NameNode and DataNode are
capable enough to run on commodity
machines.
• The Java language is used to develop
HDFS.
• So any machine that supports Java
language can easily run the
NameNode and DataNode software.



Hadoop Architecture
NameNode
• It is a single master server that exists in
the HDFS cluster.
• As it is a single node, it may become
a single point of failure.
• It manages the file system namespace
by executing operations such as
opening, renaming, and closing files.
• It simplifies the architecture of the
system.



Hadoop Architecture
DataNode
• The HDFS cluster contains multiple
DataNodes.
• Each DataNode contains multiple data
blocks.
• These data blocks are used to store
data.
• It is the responsibility of a DataNode to
serve read and write requests from the
file system's clients.
• It performs block creation, deletion,
and replication upon instruction from
the NameNode.



Hadoop Architecture
Job Tracker
• The role of the Job Tracker is to accept
MapReduce jobs from clients and
process the data by using the NameNode.
• In response, the NameNode provides
metadata to the Job Tracker.

Task Tracker
• It works as a slave node for the Job
Tracker.
• It receives tasks and code from the Job
Tracker and applies that code to the
file.
• This process can also be called a
Mapper.



Hadoop Architecture
MapReduce Layer
• MapReduce comes into play when the
client application submits a MapReduce
job to the Job Tracker. In response, the
Job Tracker sends the request to the
appropriate Task Trackers.
• Sometimes, a TaskTracker fails or
times out.
• In such a case, that part of the job is
rescheduled.



Advantages of Hadoop
Advantages of Hadoop
• Fast: In HDFS, the data is distributed over the cluster and mapped, which helps in
faster retrieval. Even the tools to process the data are often on the same servers, thus
reducing the processing time. It is able to process terabytes of data in minutes and
petabytes in hours.
• Scalable: A Hadoop cluster can be extended by just adding nodes to the cluster.
• Cost effective: Hadoop is open source and uses commodity hardware to store data,
so it is really cost-effective compared to a traditional relational database management
system.
• Resilient to failure: HDFS can replicate data over the network, so if one node is
down or some other network failure happens, Hadoop uses another copy of the data.
Normally, data is replicated thrice, but the replication factor is configurable.
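As a concrete illustration of the configurable replication factor, HDFS reads it from the `dfs.replication` property in `hdfs-site.xml`. The fragment below is a minimal sketch showing the usual default of three copies:

```xml
<!-- hdfs-site.xml: each HDFS block is replicated to this many DataNodes -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

Lowering the value saves storage at the cost of resilience; raising it does the opposite.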





Data Analytics





Data and Analytics for IoT
• As more and more devices are added to IoT networks, the data generated by these systems
becomes overwhelming
• Traditional data management systems are simply unprepared for the demands of what has
come to be known as “big data.”
• The real value of IoT is not just in connecting things but rather in the data produced by
those things, the new services you can enable via those connected things, and the
business insights that the data can reveal.
• However, to be useful, the data needs to be handled in a way that is organized and
controlled.
• Thus, a new approach to data analytics is needed for the Internet of Things.



An Introduction to Data Analytics for IoT
• In the world of IoT, the creation of massive amounts of data from sensors is common and
one of the biggest challenges—not only from a transport perspective but also from a data
management standpoint
• Modern jet engines are fitted with thousands of sensors that generate a whopping 10GB of
data per second
• Analyzing this amount of data in the most efficient manner possible falls under the
umbrella of data analytics



An Introduction to Data Analytics for IoT
• Not all data is the same; it can be categorized and thus analyzed in different ways.
• Depending on how data is categorized, various data analytics tools and processing
methods can be applied.
• Two important categorizations from an IoT perspective are whether the data is structured
or unstructured and whether it is in motion or at rest



Structured Versus Unstructured Data
• Structured data and unstructured data are important classifications as they typically
require different toolsets from a data analytics perspective
• Structured data means that the data follows a model or schema that defines how the
data is represented or organized, meaning it fits well with a traditional relational
database management system (RDBMS).
• In many cases you will find structured data in a simple tabular form—for example, a
spreadsheet where data occupies a specific cell and can be explicitly defined and
referenced



Structured Versus Unstructured Data
• Structured data can be found in most computing systems and includes everything from
banking transactions and invoices to computer log files and router configurations.
• IoT sensor data often uses structured values, such as temperature, pressure, humidity,
and so on, which are all sent in a known format.
• Structured data is easily formatted, stored, queried, and processed
• Because of the highly organized format of structured data, a wide array of data
analytics tools are readily available for processing this type of data, from custom
scripts to commercial software like Microsoft Excel and Tableau.
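As a small illustration of why structured data is easy to query, the sketch below reads tabular rows with Python's standard `csv` module and references fields by name. The sensor table and the 30-degree threshold are made up for illustration.

```python
import csv
import io

# A tiny structured (tabular) data set: every row follows the same schema.
table = io.StringIO("sensor,temp_c\nengine,91.0\ncabin,22.5\n")

# Each field is discrete and can be referenced explicitly by its header name.
rows = list(csv.DictReader(table))
hot = [r["sensor"] for r in rows if float(r["temp_c"]) > 30]
print(hot)  # → ['engine']
```

The same one-line query is much harder against unstructured data such as free text or images, which is why those require different toolsets.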



Structured Versus Unstructured Data
• Unstructured data lacks a logical schema for understanding and decoding the data
through traditional programming means.
• Examples of this data type include text, speech, images, and video.
• As a general rule, any data that does not fit neatly into a predefined data model is
classified as unstructured data
• According to some estimates, around 80% of a business’s data is unstructured.
• Because of this fact, data analytics methods that can be applied to unstructured data, such
as cognitive computing and machine learning, are deservedly garnering a lot of
attention.
• With machine learning applications, such as natural language processing (NLP), you
can decode speech.
• With image/facial recognition applications, you can extract critical information from still
images and video



Structured Versus Unstructured Data
• Smart objects in IoT networks generate both structured and unstructured data.
• Structured data is more easily managed and processed due to its well-defined
organization.
• On the other hand, unstructured data can be harder to deal with and typically requires very
different analytics tools for processing the data



Data in Motion Versus Data at Rest
• Data in IoT networks is either in transit (“data in motion”) or being held or stored (“data
at rest”).
• Examples of data in motion include traditional client/server exchanges, such as web
browsing and file transfers, and email.
• Data saved to a hard drive, storage array, or USB drive is data at rest.



Data in Motion Versus Data at Rest
• From an IoT perspective, the data from smart objects is considered data in motion as it
passes through the network en route to its final destination.
• This is often processed at the edge, using fog computing.
• When data is processed at the edge, it may be filtered and deleted or forwarded on for
further processing and possible storage at a fog node or in the data center.
• Data does not come to rest at the edge.
• When data arrives at the data center, it is possible to process it in real-time, just like at the
edge, while it is still in motion.
• Tools with this sort of capability are Spark, Storm, and Flink.
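The edge-filtering idea above can be sketched in a few lines, assuming made-up sensor readings and a hypothetical threshold: readings stream past a filter, and only anomalies are forwarded for further processing, so the rest never come to rest at the edge.

```python
# Hypothetical sketch of processing "data in motion" at the edge.
def sensor_stream():
    for temp_c in [71, 72, 95, 70, 101]:   # simulated engine readings
        yield temp_c

def edge_filter(stream, threshold=90):
    for reading in stream:
        if reading > threshold:            # forward only abnormal readings
            yield reading

# Only the anomalies travel onward to a fog node or the data center.
forwarded = list(edge_filter(sensor_stream()))
print(forwarded)  # → [95, 101]
```

Streaming frameworks like Spark, Storm, and Flink apply the same pattern at scale, operating on data while it is still in motion rather than after it comes to rest.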



Data in Motion Versus Data at Rest
• Data at rest in IoT networks can be typically found in IoT brokers or in some sort of
storage array at the data center
• Hadoop not only helps with data processing but also with data storage



IoT Data Analytics Overview
• The true importance of IoT data from smart objects is realized only when the analysis of
the data leads to actionable business intelligence and insights.
• Data analysis is typically broken down by the types of results that are produced



IoT Data Analytics Overview
• Types of Data Analysis Results



Four types of data analysis results



Four types of data analysis results
Descriptive:
• Descriptive data analysis tells you what is happening, either now or in the past.
• For example, a thermometer in a truck engine reports temperature values every second.
• From a descriptive analysis perspective, you can pull this data at any moment to gain insight
into the current operating condition of the truck engine.
• If the temperature value is too high, then there may be a cooling problem or the engine
may be experiencing too much load
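A descriptive analysis of the thermometer example can be sketched as a simple summary of recent readings; the sample values below are made up for illustration.

```python
import statistics

# Descriptive analysis: summarize what the truck-engine thermometer
# reported, now and over the recent past.
temps_c = [88.0, 90.5, 89.2, 91.0, 90.3]   # one reading per second

summary = {
    "latest": temps_c[-1],                     # what is happening now
    "mean": round(statistics.mean(temps_c), 2),
    "max": max(temps_c),
}
print(summary)  # → {'latest': 90.3, 'mean': 89.8, 'max': 91.0}
```

Comparing the latest value against a known safe operating range is what would flag a cooling problem or excessive load.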



Four types of data analysis results
Diagnostic:
• When you are interested in the “why,” diagnostic data analysis can provide the answer.
• Continuing with the example of the temperature sensor in the truck engine, you might
wonder why the truck engine failed.
• Diagnostic analysis might show that the temperature of the engine was too high, and
the engine overheated.
• Applying diagnostic analysis across the data generated by a wide range of smart objects can
provide a clear picture of why a problem or an event occurred



Four types of data analysis results
Predictive:
• Predictive analysis aims to foretell problems or issues before they occur.
• For example, with historical values of temperatures for the truck engine, predictive
analysis could provide an estimate on the remaining life of certain components in the
engine.
• These components could then be proactively replaced before failure occurs.
• Or perhaps if temperature values of the truck engine start to rise slowly over time, this could
indicate the need for an oil change or some other sort of engine cooling maintenance.
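The predictive idea can be sketched with a least-squares trend line over historical temperatures, extrapolated to estimate the hours remaining until a maintenance threshold is crossed. The readings and the 95-degree threshold are made-up values for illustration.

```python
# Predictive analysis sketch: fit a straight line to historical engine
# temperatures and extrapolate forward.
hours = [0, 1, 2, 3, 4]
temps_c = [80.0, 81.0, 82.0, 83.0, 84.0]   # rising about 1 °C per hour

# Ordinary least-squares slope and intercept.
mean_x = sum(hours) / len(hours)
mean_y = sum(temps_c) / len(temps_c)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, temps_c)) \
        / sum((x - mean_x) ** 2 for x in hours)
intercept = mean_y - slope * mean_x

# Estimate when the hypothetical maintenance threshold would be reached.
threshold = 95.0
hours_to_threshold = (threshold - intercept) / slope
print(round(hours_to_threshold, 1))  # → 15.0
```

A real predictive model would use far more history and features than a single linear trend, but the principle of projecting forward from past values is the same.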



Four types of data analysis results
Prescriptive:
• Prescriptive analysis goes a step beyond predictive and recommends solutions for upcoming
problems.
• A prescriptive analysis of the temperature data from a truck engine might calculate various
alternatives to cost-effectively maintain the truck.
• These calculations could range from the cost necessary for more frequent oil changes and
cooling maintenance to installing new cooling equipment on the engine or upgrading to a
lease on a model with a more powerful engine.
• Prescriptive analysis looks at a variety of factors and makes the appropriate recommendation

