
PART-B

ESSAY QUESTIONS WITH SOLUTIONS

IMPLEMENTATION OF IoT WITH RASPBERRY PI

Q12. Explain the implementation of IoT with Raspberry Pi.

Answer:

For answer refer Unit-III, Q24.

INTRODUCTION TO SOFTWARE DEFINED NETWORK (SDN), SDN FOR IoT

Q13. Write in brief about Software Defined Network (SDN).

Answer:                                                Model Paper-I, Q8(a)

Software Defined Network (SDN)

Software Defined Networking (SDN) is the method of separating the data plane and the control plane. It then assigns the separated control plane to a centralized network controller. The data plane consists of activities such as the outputs generated from the data packets received from end users. Examples of it are packet transmission, duplication of packets to be used in multicasting, and dividing and reuniting the data.

Whereas the control plane consists of the activities required to perform the activities related to the data plane, the activities of the control plane do not contain any end-user data packets. Examples of it are setting of policies related to packets and developing routing tables. The control plane can actually be referred to as the brain of the network.
.
SDN makes use of software-based controllers or APIs in order to interact with the embedded hardware infrastructure and even to guide the flow of traffic in the network. SDN is capable of controlling a virtual network or traditional hardware through the use of any software.
· •
Types of SDN

There are different types of SDN models. They are as follows,

1. Open SDN

In this model, the network administrators control and manipulate the virtual and physical switches at the data plane using the OpenFlow protocol.

2. SDN by APIs

In this model, the application programming interfaces control and manipulate the flow of data on all the devices across the network.
3. Hybrid SDN

In this model, an environment is embedded with SDN and traditional networking protocols to perform the functions related to a network. Here, the networking protocols take on the responsibility to control the flow of some traffic whereas SDN takes care of the other traffic.

4. SDN Overlay Model

In this model, the SDN develops dynamic tunnels over a virtual network which can be used by various on-premise and remote data centers.
Advantages of SDN

1. It provides centralized management for the networking devices.

2. It provides better efficiency, flexibility and scalability than any other networking.

3. It assures successful transmission and delivery of data.

4. It provides very low operational costs.

5. It allows to control and program the network through the controller.

6. It provides security through the controller since it holds the security policies.

Q14. Discuss about traditional network architecture.

Answer:

The traditional network architecture is created by special hardware including switches, routers etc. The architecture of a traditional network is depicted as follows,
Figure: Architecture of Traditional Network (each network device runs its network applications on a network operating system, which in turn runs on specialized packet forwarding hardware)


In the architecture of a traditional network, the network devices become complex as an increasing number of distributed protocols are implemented along with the hardware and interfaces. Here, the control plane as well as the data plane are combined. The signalling flow of data traffic is handled by the control plane whereas the flow of payload data traffic is handled by the data plane. But the traditional network architecture has various limitations, such as the following,

1. Management Overload

Management of various network devices received from various vendors is difficult to be handled by the network managers. Any modification to the network leads to alterations in the configuration of the devices. With this, management overhead is involved in the traditional network architecture.

2. Complex Network Devices

The traditional network architecture becomes complex with the implementation of protocols in order to enhance speed and reliability. Even the interoperability decreases with the decrease in the number of standard and open interfaces. The complexity of network devices affects the alterations in the network as well.

3. Limited Scalability

The traditional networks provide limited scalability to the computing environments where large volumes of data exchange is done among multiple virtual machines.

Q15. Illustrate the architecture and key elements of SDN.

Answer:                                                Model Paper-II, Q8(a)

Architecture of SDN

The architecture of SDN overcomes the limitations of the traditional network architecture. It is simple, scalable, agile and cost-effective. The architecture of SDN and its layers are depicted in the figure below,

Figure: SDN Layers and SDN Architecture (the network applications layer holds the SDN applications, which communicate with the control layer through a northbound open API; the control layer holds the SDN controller and network operating system, which communicate through a southbound open API, OpenFlow, with the infrastructure layer of simple packet forwarding hardware devices)

In the above figure, the control and data planes are separated and a central network controller is incorporated. The configuration, management and provisioning are made simple with the unified view of the network, and this is maintained by the software-based SDN controllers. The infrastructure of SDN makes use of packet forwarding hardware and it is not exposed to the applications. The centralized controller is responsible for instructing the network devices and flow packets. These devices are developed from standard hardware and software components, because of which they become simple and cost effective. The key elements of SDN are as follows,
1. Programmable Open APIs

The interfacing in between the SDN application and control layers is provided by programmable open APIs. This again provisions the network services like access control, routing and quality of service.

2. Centralized Network Controller

The SDN network is configured with separate control and data planes along with the centralized controller. This is handled by the network administrators.

3. Standard Communication Interface (OpenFlow)

A standard communication interface is used in between the control layer and the infrastructure layer in the SDN architecture. It is a SDN protocol provided for the southbound interface. It allows to access and manipulate the forwarding plane of the network devices. This protocol depends upon the concept of flows that are developed either statically or dynamically.

The below figure depicts the various components of an OpenFlow switch,

SPEO'ROM fill-IN-ONE JOQRNflL FOR ENGi.NEERiNG STUDENTS


---------=r-.__. . -__: --:__-:._. =-===~
4.8 FUNOAMeNrALS OF INTERNET OF THINGS (JNTU-HYDERABA01
Figure: OpenFlow Switch (the switch contains a pipeline of flow tables and a group table; an OpenFlow channel connects the switch, via the OpenFlow protocol, to the external SDN controller)


In the above figure, the OpenFlow switch consists of flow tables and group tables. They perform functions such as packet fetching and packet forwarding. There is an OpenFlow channel in the switch connected to the external SDN controller. The switch is controlled by the external controller through the OpenFlow switch protocol, which is implemented on either side of the interface that connects the controller and the network devices. The OpenFlow flow table consists of flow entries, each of which contains counters, match fields and certain instructions to be followed to match the packets.

4.3 DATA HANDLING AND ANALYTICS


Q16. Explain in detail about the functions that are required for IoT applications.

Answer:

The IoT applications require the following functions,


(i) Data Acquiring
Data acquisition can be defined as a process of obtaining or acquiring data from IoT or M2M devices. An application, i.e., a data acquisition system, interacts with several devices to obtain or acquire the required data. A device sends data either on demand or at certain programmed intervals. The data acquired from these devices travels through the network layer, transport layer and security layer.
A device can be configured by an application to send data only at certain programmed intervals. This is possible only if the device has the ability to support the necessary configurations. All these configured devices have the capability to control the frequency of data generation.
Consider an example where a system configures an umbrella device to obtain information related to weather from an Internet weather service on every working day in a week.

Consider another example, where an Automatic Chocolate Vending Machine (ACVM) is configured to transfer the sales data every hour in the day. The ACVM system can also be configured to transfer data immediately whenever fault events occur and also when there is a necessity for "Fill service" of any particular chocolate flavour.
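To make the two reporting modes concrete, here is a minimal Java sketch of such a configured device; the class and method names are invented for illustration and are not from any IoT framework,

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: an ACVM that reports sales data every hour and
// sends fault or "Fill service" events immediately when they occur.
public class AcvmReporter {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // Programmed interval: transfer the sales data once every hour.
        scheduler.scheduleAtFixedRate(this::sendSalesData, 1, 1, TimeUnit.HOURS);
    }

    // Called by the machine whenever a fault event occurs or a chocolate
    // flavour needs a "Fill service": data is transferred immediately.
    public void onEvent(String event) {
        send("event: " + event);
    }

    private void sendSalesData() {
        send("hourly sales report");
    }

    private void send(String payload) {
        // Placeholder: a real device would push this through the gateway
        // at the data adaptation layer described below.
        System.out.println("sending -> " + payload);
    }

    public static void main(String[] args) {
        AcvmReporter acvm = new AcvmReporter();
        acvm.start();
        acvm.onEvent("Fill service: milk chocolate");
    }
}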
An application sends data only after the filtering or enriching of data at the gateway, which is present at the data adaptation layer. This gateway exists between the device and the application and it provides functions like transcoding, data management and device management.

The data management function provides privacy and security, data integration, compaction and fusion, whereas the device management function provides device ID or address, configuration, activation, registration, deregistration, attaching and detaching.
(ii) Data Organizing
For answer refer Unit-IV, Q19.

(iii) Data Analytics

For answer refer Unit-IV, Q20.

Q17. Write short notes on data generation.

Answer:                                                Model Paper-III, Q8(e)

The data will be generated at a system or a device and it transmits to the Internet by using a gateway. The following are the various kinds of device data generation,

(i) Active Devices Data

An active device possesses a power source and it is with a microcontroller, transceiver and memory. In such devices, the data is generated at a system or a device on the consequences of communication.

Examples

An active RFID, a streetlight sensor or wireless sensor nodes are examples of active devices.

(ii) Passive Devices Data

A passive device does not possess a power source, so it requires an external power source to generate and transfer data. In such devices, the data is generated at a system or a device on the consequences of communication. These may or may not be connected with a microcontroller, transceiver and memory. A contactless card and a label are their examples respectively.

Examples

An RFID or an ATM debit card are the examples of passive devices.

(iii) Event Devices Data

The data is generated by a device, only once, for an event.

Examples

Traffic detection, dark climatic conditions, intrusion, security violation etc. All the above conditions raise the event and then the action is performed, i.e., data is generated for once.

(iv) Event-driven Device Data

In event-driven devices, the data is generated by a device, only once, for an event.

Examples

❖ Suppose, if a monitor sends a command to a device, then the device receives the command and executes it by using an actuator. After the implementation of necessary actions, the device sends an acknowledgement regarding the completion of execution.

❖ Suppose, if an application requests the device for its status, then the device status is transmitted to that application.

(v) Real Time Device Data

In real time device data, the information is instantly transmitted to the servers by using the internet.

Example

Whenever an ATM service is used, it generates the data which is immediately sent to the server using the internet. Thus, in this way Online Transaction Processing is implemented in real time.

Q18. Write short notes on
     (i) Data validation
     (ii) Data categorization for storage
     (iii) Assembly software for events.

Answer:                                                Model Paper-I, Q8(b)

(i) Data Validation

Data validation is a process of performing validation checks on the data acquired, and this can be done by using data validation software. This software applies several rules, logics and semantic annotations on the acquired data to check its validity. An application and its services should rely upon valid data so that the predictions, prescriptions, analytics and decisions are agreeable. However, the data acquired from a device is not always correct, meaningful and consistent, therefore data validation is necessary.

A large extent of data is obtained from several devices like automobiles, health devices in ICUs or wireless sensor networks, machines used in industrial plants or in embedded components etc. The data validation software uses important and necessary resources. It is essential to choose a proper strategy, so the plan of action can be any of the following,

❖ Filtering of worthless data at a device or at a gateway.

❖ Regulating the frequency of data acquisition.

❖ Cyclic scheduling of a set of devices in an industrial system.

Thus, the data enrichment, data aggregation, data integration and data compression can be achieved at the adaptation layer even before transmitting it into the internet.

(ii) Data Categorization for Storage

The data that is valid, beneficial and appropriate is used in business processes, services and business intelligence. This data can be classified into the following types for storage purposes,

(a) Store Data Alone

A copy of the data alone is stored so that it can be processed, referred and audited any number of times in the future.

(b) Store Data and Results of Processing

The data and the results of processing are stored so that they can be used for referencing and auditing in the future. This is also useful for quick visualization and report generation even without processing the data again.

(c) Store Results of Data Analytics

The online and real time data has to be processed, so the results of processing and the results of data analytics are stored.

(d) Store Big Data

The data acquired from a huge number of sources and devices can be categorized into Big data. This data can be stored as big data onto a cloud or a data warehouse on a server database.



(iii) Assembly Software for Events

Assembly software can be defined as a software component in an application that can congregate the events and also attaches a date/time stamp to the events. Each and every event contains an event ID, a logic value and a device ID associated with it.

(a) Event ID

When a device generates an event, then it is allocated an ID called the event ID.

(b) Logic Value

A logic value contains 1 or 0. It sets or resets based on an event state. Logic value 1 indicates that an event is generated but the action is not yet taken. Logic value 0 indicates that an event is generated and the action is also taken, or that the event is still not generated.

(c) Device ID

The device that generates an event is allocated an ID called the device ID.

Examples

❖ A temperature sensor generates an event when the temperature rises to an adjusted value or falls underneath a threshold value.

❖ A pressure sensor which exists in a boiler generates an event when the boiler pressure surpasses a critical value which requires attention.

Q19. What are the various ways of organizing the data? Explain.

Answer:                                                Model Paper-II, Q8(b)

The data can be organized into various forms such as objects, files, databases, relational and object oriented databases etc.

1. Database

A database is defined as a collection of data which can be organized into tables. The database tables are useful for retrieving and updating. A single database table file is called a flat file database. Every record is represented in a row and the records are not related to each other.

2. Relational Database

A relational database is defined as a collection of data which is organized into multiple tables. All these tables are related with each other by using keys. Keys are special fields such as primary key, unique key and foreign key.

Examples

MySQL, Microsoft SQL Server, Oracle database, PostgreSQL.

3. Object Oriented Database

An object oriented database is defined as a collection of objects that are used in object oriented design.

Examples

ConceptBase, Cache.

4. Database Management System (DBMS)

A database management system is a software that performs the following functions,

❖ Define a database

❖ Support query language

❖ Generate reports

❖ Create data entry screens.

DBMS basically consists of a set of interrelated data and a set of programs for accessing those data. The primary objective of a Database Management System is to provide an efficient environment that makes the retrieval and storage of database information easier.

ACID Properties

These are the properties that a transaction should possess in order to avoid failures during concurrent access of a database. ACID is an acronym which stands for Atomicity, Consistency, Isolation and Durability.

(a) Atomicity

Atomicity ensures that the transaction is either executed completely or not executed at all. Incomplete transaction consequences are not valid. Consider the example of a transaction involving crediting and debiting an account. If there is a failure during transaction execution, then measures are taken to get back the data in the form which it was in before the transaction. This is taken care of by the transaction management component.

(b) Consistency

The data in the database must always be in a consistent state. A transaction that is carried out on consistent data should bring the data to another consistent state after execution. However, a transaction need not maintain consistency at intermediate stages. It is the responsibility of the application program to ensure consistency.

(c) Isolation

All transactions must run in isolation from one another, that is, every transaction should be kept unaware of other transactions and execute independently. The intermediate results generated by the transactions should not be available to other transactions.
(d) Durability

This property ensures that data remains in a consistent state even after a failure. This is ensured by loading the modified data onto the disk. This task is handled by the recovery-management component of DBMS. It ensures that either the complete modified data, or information that would be sufficient to get the modified data, is loaded onto the disk. This enables a system to recover after a system crash.

5. Distributed Database

A distributed database partitions a single logical database into multiple data segments and later stores them on several independent computers connected through a single network.

Distributed Database Characteristics

The characteristics of a distributed database are as follows,

(a) Logical Relation

The distributed database is considered as a collection of several databases which have a logical relationship among them.

(b) Transparency

Transparency exists in between the databases, i.e., a database user can access data from all the databases, providing an illusion to the users that they are using only a single database.

(c) Location Independence

A distributed database must be location independent, i.e., the system user should not know about the location of the data, and it should be possible to move data from one location to another without any effect.

6. CAP Theorem

The CAP theorem states that it is not possible for a distributed system to guarantee all of the following three features. These three features are,

(a) Consistency

(b) Availability

(c) Partition tolerance.

Figure: CAP Theorem (a distributed system can guarantee at most one pairing at a time: consistency and availability, consistency and partition tolerance, or availability and partition tolerance)
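To make the ACID properties described above concrete, the following minimal Java sketch uses the standard JDBC API to run the credit and debit of the Atomicity example as one transaction; the connection URL and the account table are placeholder assumptions, not from the text,

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Minimal JDBC sketch of the credit/debit transaction from the Atomicity
// example: either both updates are committed or both are rolled back.
public class TransferExample {
    public static void transfer(String url, int fromId, int toId, double amount)
            throws SQLException {
        try (Connection con = DriverManager.getConnection(url)) {
            con.setAutoCommit(false); // start a transaction
            try (PreparedStatement debit = con.prepareStatement(
                     "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = con.prepareStatement(
                     "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setDouble(1, amount);
                debit.setInt(2, fromId);
                debit.executeUpdate();

                credit.setDouble(1, amount);
                credit.setInt(2, toId);
                credit.executeUpdate();

                con.commit();   // both updates become durable together
            } catch (SQLException e) {
                con.rollback(); // atomicity: incomplete results are discarded
                throw e;
            }
        }
    }
}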
Q20. What is Analytics? Explain about analytics phases.

Answer:                                                Model Paper-III, Q8(b)

Analytics

Analytics refers to decision making which is completely based on facts instead of intuition. It acts as a key factor for the success of an enterprise as it provides business intelligence. Analytics is used to design or build models by selecting the appropriate data, so the data must always be available and accessible. Further, these models are first tested and then used for various processes and services. Moreover, analytics uses various kinds of techniques to obtain new information and new parameters, which add even more value to the data. Some of these techniques are arithmetic and statistical methods, data mining, machine learning etc.

Phases of Analytics

The process of analytics is classified into three phases and they are as follows,

1. Descriptive Analytics

Descriptive analytics provides information to a query based on the historical data, or the data that is collected in the past. It groups the data, finds the mean value, the variance value, the number of occurrences of an item and aggregates of some specific properties. Moreover, descriptive analytics provides OLAP, data visualizations, generated spreadsheets and key performance indicators.

The descriptive analytics methods are as follows,

(a) Spreadsheets and Data Visualization

Spreadsheets provide the results of descriptive analytics in the form of a table. A spreadsheet contains values in its rows and columns, with each value within a cell. Each and every value in a cell can be related to another cell or a group of cells. Here, the relationship can be a formula, a statistical value or a boolean relation etc. Thus, in this way spreadsheets provide the user a visualization of the data.

(b) Descriptive Statistics based Reports and Data Visualization

Descriptive analysis uses descriptive statistics, which refers to computing the minima, variance, peaks and probabilities etc. It applies the formulae on the data sets to provide easily understandable data visualizations to the users.

(c) Data Mining and Machine Learning

Data mining uses algorithms to extract hidden patterns whereas machine learning designs a model to perform a specific task. The programming language is considered as an essential part of open source products and it also acts as a software environment for the statistical computing and statistical graphics.
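As a small, self-contained illustration of the descriptive statistics mentioned above, the following Java sketch computes the minimum, mean and variance of a data set; the sample values are made up for the example,

// Illustrative sketch: computing the basic descriptive statistics named
// above (minimum, mean, variance) over a small, made-up data set.
public class DescriptiveStats {
    public static void main(String[] args) {
        double[] data = {12.0, 15.5, 9.0, 11.2, 14.3}; // sample values

        double min = data[0];
        double sum = 0.0;
        for (double v : data) {
            if (v < min) min = v;
            sum += v;
        }
        double mean = sum / data.length;

        double sqDiff = 0.0;
        for (double v : data) {
            sqDiff += (v - mean) * (v - mean);
        }
        double variance = sqDiff / data.length; // population variance

        System.out.printf("min=%.2f mean=%.2f variance=%.2f%n", min, mean, variance);
    }
}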



2. Predictive Analytics

Predictive analytics is an advanced analytics where a user interprets the results which are obtained from descriptive analytics. The data visualization method displays the effects on a product that can occur in the future. This enables the enterprises to take better decisions. This type of analytics uses several algorithms like correlation and regression, optimization, and other techniques like simulation, modelling, machine learning, neural networks etc. The several software tools in the market provide the results of predictive analytics to the user in an easily understandable manner.

Examples

It predicts the future trends and identifies patterns and clusters with the same behaviour. It also performs preventive maintenance by observing the device failures that have occurred in the past. Moreover, it implements integrated marketing strategies. It also provides predictions based on anomaly detection and anomalous characteristics.

3. Prescriptive Analytics

Prescriptive analytics considers business rules and descriptive analytics in order to provide information regarding,

(i) What will happen?

(ii) When will it happen?

(iii) Why will it happen?

It suggests predictions along with necessary actions considering a set of rules and inputs.

There are some more types of analytics. They are as follows,

1. Event Analytics

Event analytics requires event data for tracking and reporting the events. Every event consists of the following components,

(i) Category

(ii) Action

(iii) Label

(iv) Value.

2. In-Memory Data Processing and Analytics

In some specific databases, there exists an option for selecting row format or column format within the memory. The two options are as follows,

(i) In-Memory and on-Store Row Format Option

In this format option there are few rows and many columns. Each and every row is associated with several columns. The data accessing becomes easy as the entire data in a row is brought into the CPU by a single memory reference. The row format is beneficial for performing OLTP operations such as inserting, updating, querying etc.

(ii) In-Memory and on-Store Column Format Option

In this format option there are few columns and many rows. This type of option is useful for analytics on monthly sales, yearly profits etc. In this option, the entire data in a column is brought into the CPU by a single memory reference.

3. Real Time Analytics Management

Real time analytics management is useful for providing a large speedup for OLAP and OLTP. It performs direct querying by using an OLTP database, and it can also query an OLAP database or a data warehouse on already queried results.

Q21. Explain in detail about big data analytics.

Answer:                                                Model Paper-I, Q9(a)

Big Data Analytics

The big data analytics technologies like Hadoop, NoSQL and Cassandra will support the big data architectures/infrastructures. The storage of big data is the storage infrastructure which is designed to store, manage and retrieve large amounts of data. It stores and formats the data in the storage such that it can be accessed, used or processed by the applications easily. The storage of big data is flexible in its scale. It allows input and output operations with a huge number of data files and objects. The storage is built using DAS (direct attached storage) pools, NAS (scale-out or clustered network attached storage), or infrastructure based on the object storage format. The storage infrastructure is connected to the computing server nodes that allow to process and retrieve the large amount of data.

Most of the companies are applying the big data analytics to achieve greater intelligence from metadata. The big data storage makes use of low cost hard disk drives, even though the falling prices of flash have permitted the use of flash in servers and storage systems as the base of big data storage. The storage system will gather multiple servers connected to high capacity disks to support analytic software that is written to test the data. It depends upon the databases that are connected parallelly to analyze the data that is retrieved from various sources. Because of this, the big data might not have a proper structure, leading to complexities while processing. Various components of Big Data storage include HDFS, NoSQL, Hadoop, MapReduce, server nodes etc.

The Apache Hadoop Distributed File System (HDFS) is one of the common big data engines, combined with a partial NoSQL database. Hadoop is software coded in the Java language. The HDFS advances the data analytics among various server nodes without a performance hit. The MapReduce component allows Hadoop to distribute the processing in the form of a safeguard against catastrophic failure. Various nodes act as the data analysis platform at the end of the network. The MapReduce will execute the processing directly on the storage node of the data. Then it collects the results from the servers and compresses them to produce a single cohesive response.

One more feature of Hadoop is that it receives data in unstructured form, such as audio, video and text. But this does not mean that Hadoop is replacing other relational technologies like databases; it means that the Hadoop and relational database platforms each have their own roles. The massively parallel relational platform, for example, deals with the high value transactional data and supports the feature of receiving data requests from users and applications with performance guarantees and enterprise-level security.

The technologies from the Hadoop ecosystem that deal with big data from databases are,

❖ Amazon web services for infrastructure

❖ Apache HDFS for distributed file system

❖ MapReduce for distributed programming model

❖ Cassandra or HBase.

HBase stores data in column format. It grants read or write access permissions to the users on several database tables that are distributed in HDFS. It provides random access with quick look-ups and less access latency.
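The HDFS access mentioned above can be illustrated with Hadoop's standard FileSystem API. In this minimal Java sketch, the HDFS path is a placeholder for illustration,

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Minimal sketch: open a file stored in HDFS and stream its contents to
// standard output. The path below is a placeholder for illustration.
public class HdfsCat {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();       // picks up core-site.xml etc.
        FileSystem fs = FileSystem.get(conf);           // the HDFS filesystem client
        InputStream in = null;
        try {
            in = fs.open(new Path("/data/sample.txt")); // placeholder HDFS path
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}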
Q22. Illustrate the architecture of data analytics.

Answer:                                                Model Paper-II, Q9(b)

Data Analytics Architecture

The data analytics architecture consists of four layers, given as follows,

(i) Data sources layer.

(ii) Data storage and data processing layer.

(iii) Data accessing and query processing layer.

(iv) Data services and advanced analytics layer.

Analytics Applications Layer: Data Services, Data Reporting, Data Visualizations, OLAP, Advanced Analytics

Analytics and Applications Support Layer: Data Access, SQL, OLTP, ETL, MapReduce Query Processing, R-descriptive Statistics

Data Storage Layer: Datastore, Data Warehouse, Event Stream Processing and Complex Event Processing

Data Sources Layer: IoT/M2M, Enterprise and External Data Sources

Figure: Architecture of Data Analytics



Q23. Discuss about the design of Hadoop Distributed File System (HDFS).

Answer:                                                Model Paper-II, Q9(a)

A file system that manages the storage across various networks is referred to as a distributed file system. The distributed file system associated with Hadoop is typically referred to as HDFS (Hadoop Distributed File System). HDFS has been designed for storing massive amounts of data, like megabytes, gigabytes or terabytes, and supports streaming data access patterns and so on. Moreover, it ensures availability of copies of the data to the end user by saving copies on different remote locations, thereby providing parallel processing.

The HDFS provides support for the following,

1. Storing files of larger size

2. Streaming data access

3. Commodity hardware.

1. Storing Files of Larger Size

Files of larger size refers to the files that are capable of storing terabytes (TB) of data.

2. Streaming Data Access

The data stored on HDFS follows the "write once and read several times" data processing pattern. Here, a data set is simply copied from the source location and analyzed. While performing data analysis, the major focus should be made on reading most (or) all of the data rather than on the latency in reading the first record.

3. Commodity Hardware

HDFS is designed in such a way that it can be run on commodity hardware that can be easily found everywhere. It also supports hardware components of different vendors to be easily integrated, thereby making it highly reliable and cost-effective. However, this feature makes the HDFS more prone to node failure. Moreover, it does not provide any notification to the user about failures and continues to work accordingly.

Q24. Discuss in detail about the building blocks of Hadoop.

Answer:                                                Model Paper-III, Q9(a)

The building blocks of Hadoop are nothing but the daemons that are distributed over different machines of the network and carry out their associated functionality. The building blocks (or) daemons of Hadoop are,

1. Namenode

2. Datanode

3. Secondary Namenode

4. JobTracker

5. TaskTracker.

1. Namenode

Namenodes manage the filesystem tree and store the metadata for all files and directories in persistent storage. A namenode uses two files for this purpose. They are the namespace image and the edit log. Namenodes have the information of the datanodes that store all the blocks of information pertaining to a specific file. This information can be re-created from the nodes as soon as the system starts. Hence, the namenodes do not keep the block locations on persistent storage.

2. Datanode

Datanodes store and retrieve the blocks of information upon request. They also generate reports to the namenode whenever they store blocks of information. This node is responsible for carrying out the operations allocated to it by the master node. They typically perform read and write operations between HDFS blocks and the local filesystem. These operations involve splitting of files into blocks distributed over different data nodes according to the instructions provided by the NameNode. After this, the client will be able to connect with the respective DataNodes to carry out their functionalities. These DataNodes perform inter-node communication to carry out data replication on at least 3 datanodes.

Figure: DataNodes (blocks replicated across DataNode1, DataNode2, DataNode3 and DataNode4)

By using replication, it is possible to make the system available even if some blocks get corrupted (or) deleted. The information contained within a DataNode gets updated periodically on the NameNode to carry out operations correctly.

3. Secondary Namenode

A Secondary NameNode (SNN) is used for merging the namespace images with the edit log. This integration is done so as to prevent the edit log from increasing its size. The secondary NameNode basically runs on a separate physical machine. This is because the merge operation requires a significant amount of CPU and huge memory space. Besides this, a secondary NameNode maintains a copy of the merged namespace images. These images can later be reused in the event of a NameNode failure.
4. JobTracker

JobTracker is another daemon of Hadoop. It is responsible for providing communication between Hadoop and the end-user application. It is a master which manages the entire execution of a MapReduce job. It is responsible for creating and executing the plan provided by the user in the cluster. This plan is created by performing the following tasks,

(a) Capturing the files that are needed to be included in the process.

(b) Allocating the tasks associated with the process to various nodes.

(c) Tracking the execution of these tasks.

When a task fails, the JobTracker automatically relaunches it on different nodes. In a Hadoop cluster there exists only a single JobTracker, which typically runs on the master node of the cluster on a server.
5. TaskTracker

TaskTracker is responsible for managing the execution of tasks that are allocated to individual slave nodes. The basic responsibility of each TaskTracker is to execute all the tasks assigned by the JobTracker. A single TaskTracker has the ability to spawn more than one Java Virtual Machine (JVM). This helps in carrying out multiple MapReduce tasks concurrently.

The TaskTracker even communicates with the JobTracker so as to provide updated information about its state. If the TaskTracker does not provide this information, then the JobTracker considers it as dead and takes back all the responsibilities given to that particular TaskTracker. These responsibilities are then handed over to some other node in the cluster.

This process is illustrated in the figure below,
Figure: Communication Process between JobTracker and TaskTracker (the client submits a job to the JobTracker, which distributes MapReduce tasks to the TaskTrackers on nodes 1, 2 and 3 and collects their results)


From the figure, it can be observed that the code submitted by the client gets received by the JobTracker, which in turn divides the code into different MapReduce tasks. These tasks are distributed over different nodes that carry their own TaskTracker. Once the tasks get completed, the results are returned to the JobTracker. During the time period of carrying out these tasks, the TaskTracker regularly provides updates regarding its current state to the JobTracker.
The figure below depicts the topology of a Hadoop cluster,

Figure: Hadoop Cluster (the master node runs the NameNode and JobTracker daemons, a standalone node runs the Secondary NameNode, and each slave node runs a DataNode with a TaskTracker)
The above figure illustrates the topology of a Hadoop cluster. It can be inferred from the figure that the master node runs the NameNode and JobTracker daemons. If a failure occurs, the standalone node with the secondary NameNode is used. If the cluster is small, the secondary NameNode can reside on one single slave node, whereas on large clusters the NameNode and JobTracker lie on separate machines. The slave machines use DataNodes and TaskTrackers so as to run the tasks on the node where the data is stored.

Q25. What are the things that need to be configured before running Hadoop?

Answer:
Hadoop comes up with a set of configuration files that reside in a configuration directory and are needed to be configured effectively for the successful execution of Hadoop. The list of configuration files can be generated using the 'ls' command.

Configuring these files requires providing the location of Java on all the nodes and the master. The JAVA_HOME environment variable is then defined in hadoop-env.sh in order to indicate the Java installation directory. This mapping is carried out using the following line,

export JAVA_HOME=/user/share/jdk
All the variables except JAVA_HOME are included in the file named hadoop-env.sh. All these variables are used in the creation of the Hadoop environment. The file hadoop-env.sh can also be changed according to the requirements of the system.

As can be seen from the list generated using the 'ls' command, most of the configuration resides in the XML files. In earlier versions of Hadoop there existed only two XML files, which are hadoop-default.xml and hadoop-site.xml. hadoop-default.xml, which carries all the default configurations, can be overridden accordingly through hadoop-site.xml. However, in the later versions some additional files are included, which are an expanded form of hadoop-site.xml,

❖ core-site.xml

❖ hdfs-site.xml

❖ mapred-site.xml.

Use of these files made the configuration more appropriate, which is far better than the configurations that resided in hadoop-site.xml. All these configurations are necessary to be carried out before running Hadoop.
Q26. Define MapReduce. Discuss about weather dataset.

Answer:

MapReduce

MapReduce is a data processing model used to handle large amounts of data distributed over the network. It is capable of processing the distributed data parallelly. It provides two methods, namely Map() and Reduce(). The Map() method is responsible for performing operations like filtering and sorting whereas the Reduce() method is responsible for performing summary operations like count. The programs that are based on MapReduce can be implemented in different languages in Hadoop.
Weather Dataset

A program that handles the process of mining the data associated with weather is referred to as a weather dataset. This data is captured by sensors on an hourly basis and by considering different locations globally. In this way a huge amount of data is collected for the purpose of analyzing it with MapReduce. An example of such data is the information provided by the NCDC (National Climatic Data Center). It follows ASCII format to save its data in the database in the form of records. This type of format is capable of carrying fields of both constant and variable lengths. The data can be formatted in the following way,

23759 23062015 0530 +47213 +23121 +523 470 3 S 750 1 S 15000 1 N 12 1 -14 1 12139 1

Field       Description
23759       Identifier to represent weather station
23062015    Date
0530        Time
+47213      x 1000 degrees latitude
+23121      x 1000 degrees longitude
+523        Elevation in meters
470         Direction of wind in degrees
3           Quality code
S           Southern direction
750         Sky ceiling height
1           Quality code
S           South
15000       Visibility distance
1           Quality code
N           Towards north
12          Temperature of air in Celsius
1           Quality code
-14         Temperature of dew point in Celsius
1           Quality code
12139       Atmospheric pressure in hectopascals
1           Quality code

It can be observed from the original record that the fields included in it are written continuously without any delimiters between them. The table describes the terms used in the record. Typically, the records are arranged in terms of the date and the weather station, and the records are placed in their respective directory. The directories are created in a year-wise manner, each carrying files of weather data associated with each weather station. A directory is composed of several small files because there exist many weather stations; however, the process of handling a small number of large files is simpler and more efficient.
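Since the table above fixes only the order of the fields, the following Java sketch parses a whitespace-separated form of the sample record shown earlier; real NCDC records are written without delimiters, so splitting on whitespace is a simplifying assumption made for illustration,

// Simplified sketch: pull the station identifier and air temperature out
// of the sample record shown above. Real NCDC records have no delimiters,
// so splitting on whitespace is an assumption made for illustration.
public class WeatherRecordParser {
    public static void main(String[] args) {
        String record =
            "23759 23062015 0530 +47213 +23121 +523 470 3 S 750 1 S 15000 1 N 12 1 -14 1 12139 1";
        String[] fields = record.split("\\s+");

        String stationId = fields[0];                       // weather station identifier
        String date = fields[1];                            // DDMMYYYY
        int airTemperature = Integer.parseInt(fields[15]);  // air temperature in Celsius
        String quality = fields[16];                        // its quality code

        if (quality.equals("1")) {                          // keep only good-quality readings
            System.out.println(stationId + " " + date + " temp=" + airTemperature);
        }
    }
}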

Q27. Explain with an example, the mapper code in Hadoop.

Answer:

Mapper Code

A mapper in MapReduce is responsible for providing parallelism. The TaskTracker contains the mapper and processes it. The code associated with a mapper is referred to as the mapper code. The logic used in the mapper code must be capable of executing independently. It should be capable of performing the parallel tasks mentioned in the algorithm. The input format resides in the driver program, as a specific InputFormat type for the file on which the mapper is executed. The output of a mapper is the key and value that are set in the mapper output. It is stored in an intermediate file that is specifically created in the OS space path. Operations such as read, shuffle and sort are performed on this file.

The general format of a mapper class is as follows,

org.apache.hadoop.mapreduce.Mapper<INPUT_KEY, INPUT_VALUE, OUTPUT_KEY, OUTPUT_VALUE>

These four parameters specify the type of inputs and outputs associated with the mapper function.



Example

Consider an example Java program that computes the number of words in a file to illustrate the use of the mapper code,
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable>
{
    // Emit (word, 1) for every token in the input line.
    private static final IntWritable iw = new IntWritable(1);
    private Text w = new Text();

    public void map(Object key, Text value, Context cnt) throws IOException, InterruptedException
    {
        StringTokenizer t = new StringTokenizer(value.toString());
        while (t.hasMoreTokens())
        {
            w.set(t.nextToken());
            cnt.write(w, iw);
        }
    }
}
Q28. Write short notes on reducer code.

Answer:

Reducer Code

A reducer is capable of reducing the intermediate values, all of which share a key, to a smaller set of values. A reducer in MapReduce performs three major operations. They are,

1. Shuffle

2. Sort

3. Reduce.

1. Shuffle

It is responsible for collecting the inputs and presenting the mapper outputs in a sorted order. It uses the HTTP protocol to retrieve the required partition of the output of the mappers.

2. Sort

It is responsible for arranging the reducer keys in various orders, typically using the merge sort approach.

3. Reduce

This phase of the reducer uses a method reduce(Object, Iterable, Context) which relates to the <key, (collection of values)> associated with every input in the sorted array. It uses TaskInputOutputContext.write(Object, Object) to forward the results generated from reduce() to the RecordWriter.

The general format of the reducer code can be written as,

org.apache.hadoop.mapreduce.Reducer<INPUT_KEY, INPUT_VALUE, OUTPUT_KEY, OUTPUT_VALUE>

It can be observed that it also uses four parameters, similar to the mapper code.

Example

Consider an example Java program that computes the total number of words in a file to illustrate the use of the reducer code,

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    // Sum the counts emitted by the mapper for each word.
    IntWritable cnt = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException
    {
        int total = 0;
        for (IntWritable value : values)
        {
            total += value.get();
        }
        cnt.set(total);
        con.write(key, cnt);
    }
}
In the above example, the four parameters are Text, IntWritable, Text and IntWritable, in which the first two are the input types and the remaining are the output types. Here, an iterator is used to move across all the intermediate counts of a word, summing them so that the total result is provided as output.
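The section shows only the mapper and the reducer. To run them, a driver program is also needed; the following minimal sketch wires the two classes together using Hadoop's standard Job API (the input and output paths are taken from the command line),

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal driver sketch: wires WordCountMapper and WordCountReducer
// into a job and submits it to the cluster.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);           // key type emitted by the reducer
        job.setOutputValueClass(IntWritable.class);  // value type emitted by the reducer
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);       // block until the job finishes
    }
}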
