PART-B

ESSAY QUESTIONS WITH SOLUTIONS

IMPLEMENTATION OF IoT WITH RASPBERRY PI

Q. Explain the implementation of IoT with Raspberry Pi.

Answer :

For answer refer Unit-III, Q24.
SOFTWARE DEFINED NETWORK (SDN)	Model Paper-I, Q8(a)

2.	SDN by APIs

	In this model, the application programming interfaces control and manipulate the flow of data on all the devices across the network.

3.	Hybrid SDN

	In this model, an environment is embedded with SDN and traditional networking protocols to perform the functions related to a network. Here, the networking protocols take on the responsibility to control the flow of traffic whereas SDN takes care of other traffic.
The traditional network architecture was created by special hardware including switches, routers etc. The architecture of a traditional network is depicted as follows,

Figure: Traditional Network Architecture (network applications running on the individual network devices)
1.	Management Overhead

	Management of various network devices and devices received from various vendors is difficult to be handled by the network managers. Any modifications to the network lead to alterations in the configuration of the devices. With this, management overhead is involved in traditional network architecture.

2.	Complexity

	The traditional network architectures become complex with the implementation of protocols in order to enhance speed and reliability. Even the interoperability decreases with the decrease in the number of standard and open interfaces. The complexity of network devices affects the alterations in the network as well.

3.	Limited Scalability

	The traditional networks provide limited scalability to the computing environments where large volumes of data exchange is done among multiple virtual machines.
WARNING: Xerox/Photocopying of this book is a CRIMINAL act. Anyone found guilty is LIABLE to face LEGAL proceedings.
Q.	Illustrate the architecture and key elements of SDN.

Answer :	Model Paper-II, Q8(a)

SDN Architecture

The architecture of SDN overcomes the limitations of traditional network architecture. It is simple, scalable, agile and cost effective. The architecture of SDN and its layers are depicted in the figure below.
Figure: SDN Architecture and its Layers

Network Applications Layer: Network Applications and SDN Applications
Northbound Open API
Control Layer: SDN Controller (Network Operating System)
Southbound Open API (OpenFlow)
Infrastructure Layer: Simple Packet Forwarding Hardware (Network Devices)
In the above figure, the control and data planes are separated and a central network controller is incorporated. The configuration, management and provisioning are made simple with the unified view of the network, and this is maintained by the software based SDN controllers. The infrastructure of SDN makes use of packet forwarding hardware and it is not exposed to the applications. The centralized controller is responsible for instructing the network devices and the flow of packets. These devices are developed from standard hardware and software components, because of which they become simple and cost effective. The key elements of SDN are as follows,

1.	Programmable Open APIs

	The interfacing in between the SDN application and control layers is provided by programmable open APIs. This again provisions the network services like access control, routing and quality of service.

2.	Centralized Network Controller

	The SDN network is configured with separate control and data planes along with the centralized controller. This is handled by the network administrators.

3.	Standard Communication Interface (OpenFlow)

	A standard communication interface is used in between the control layer and infrastructure layer in SDN architecture. OpenFlow is an SDN protocol provided for the southbound interface. It allows to access and manipulate the forwarding plane of the network devices. This protocol depends upon the concept of flows that are developed either statically or dynamically.
The below figure depicts the various components of an OpenFlow switch.

Figure: OpenFlow Switch (a pipeline of flow tables together with a group table)
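The flow concept described above can be illustrated with a small sketch: a controller installs match-to-action entries over the southbound interface, and the switch applies the first matching entry to each packet. All class and field names below are hypothetical and greatly simplified; a real OpenFlow switch matches on many header fields with priorities, not just a destination prefix.

```java
import java.util.ArrayList;
import java.util.List;

// Toy flow-table sketch: entries map a destination prefix to an action.
public class FlowTableSketch {
    static class FlowEntry {
        final String dstPrefix;  // match: destination address prefix
        final String action;     // e.g. "forward:port2" or "drop"
        FlowEntry(String dstPrefix, String action) {
            this.dstPrefix = dstPrefix;
            this.action = action;
        }
    }

    private final List<FlowEntry> table = new ArrayList<>();

    // The controller pushes flow entries into the switch's table.
    public void install(String dstPrefix, String action) {
        table.add(new FlowEntry(dstPrefix, action));
    }

    // The switch matches each packet against the table in order.
    public String lookup(String dstAddress) {
        for (FlowEntry e : table) {
            if (dstAddress.startsWith(e.dstPrefix)) {
                return e.action;
            }
        }
        return "send-to-controller"; // table miss: ask the controller
    }
}
```

A table miss, as in the last branch, is exactly where the southbound interface comes into play: the switch forwards the unknown packet to the controller, which may install a new flow entry.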
Q.	Write a short note on data generation.

Answer :	Model Paper-III, Q8(a)

The data generated will be at a system or a device and it transmits data to the Internet by using a gateway. The following are the various data generated,

(i)	Active Device Data

	An active device possesses a power source and it is with microcontroller, transceiver and memory. In such devices, the data is generated at a system or a device on the consequences of communication.

	Examples

	A streetlight sensor or wireless sensor devices are the examples of active devices.

(ii)	Passive Device Data

	A passive device does not possess a power source, so it requires an external power source to generate and transfer data. In such devices, the data is generated at a system or a device on the consequences of communication. These may or may not be connected with microcontroller, transceiver and memory. A contactless card and a label are their examples respectively.

	Examples

	RFID or an ATM debit card are the examples of passive devices.

(iii)	Event Devices Data

	The data is generated by a device, only once, for an event.

	Examples

	Traffic detection, dark climatic conditions, intrusion, security violation etc. All these above conditions raise the event and then the action is performed i.e., data is generated for once.

(iv)	Event-driven Device Data

	In event-driven devices, the data is generated by a device, only once, for an event.

	Examples

	❖	Suppose, if a monitor sends a command to a device, then the device receives the command and executes it by using an actuator. After the implementation of necessary actions, the device sends an acknowledgement regarding the completion of execution.

	❖	Suppose, if an application requests the device for its status, then the device status is transmitted to that application.

(v)	Real Time Device Data

	In real time device data, the information is instantly transmitted to the servers by using the internet.

	Example

	Whenever an ATM service is used, it generates the data which is immediately sent to the server using the internet. Thus, in this way Online Transaction Processing is implemented in real time.

Q18.	Write short notes on

	(i) Data validation

	(ii) Data categorization for storage

	(iii) Assembly software for events.

Answer :	Model Paper-I, Q8(b)

(i)	Data Validation

	Data validation is a process of performing validation checks on the data acquired and this can be done by using data validation software. This software applies several rules, logics and semantic annotations on the acquired data to check its validity. An application and its services should rely upon valid data so that the predictions, prescriptions, analytics and decisions are agreeable. However, the data acquired from a device is not always correct, meaningful and consistent, therefore data validation is necessary.

	A large extent of data is obtained from several devices like automobiles, health devices in ICUs or wireless sensor networks, machines used in industrial plants or in embedded components etc. The data validation software uses important and necessary resources. It is essential to choose a proper strategy, so the plan of action can be any of the following,

	❖	Filtering of worthless data at a device or at a gateway.

	❖	Regulating the frequency of data acquisition.

	❖	Cyclic scheduling of a set of devices in an industrial system.

	Thus, the data enrichment, data aggregation, data integration and data compression can be achieved at the adaptation layer even before transmitting it into the internet.

(ii)	Data Categorization for Storage

	The data that is valid, beneficial and appropriate is used in business process, services and business intelligence. So, this data can be classified into the following types for storage purpose,

	(a)	Store Data Alone

		A copy of data is only stored so that it can be processed, referred and audited any number of times in the future.

	(b)	Store Data and Results of Processing

		The data and the results of processing are stored so that they can be used for referencing and auditing in the future. This is also useful for quick visualization and report generation even without processing the data again.

	(c)	Store Results of Data Analytics

		The online and real time data has to be processed, so the results of processing and the results of data analytics are stored.

	(d)	Store Big Data

		The data acquired from a huge number of sources and devices can be categorized into Big data. This data can be stored as big data onto a cloud or data warehouse on a server database.
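The first strategy above, filtering worthless data at a device or gateway, can be sketched in a few lines. The class name and the plausibility range below are assumptions made for the example, not part of any validation product.

```java
import java.util.ArrayList;
import java.util.List;

// Toy gateway-side validation: readings outside a plausible physical
// range (assumed here as -40..85 Celsius) are filtered out before the
// data is transmitted onward to the internet.
public class ValidationSketch {
    public static boolean isValid(double celsius) {
        return celsius >= -40.0 && celsius <= 85.0;
    }

    // Filtering of worthless data at the device or gateway.
    public static List<Double> filter(List<Double> readings) {
        List<Double> valid = new ArrayList<>();
        for (double r : readings) {
            if (isValid(r)) {
                valid.add(r);
            }
        }
        return valid;
    }
}
```

In a fuller system the same hook is where semantic rules would run, and where aggregation or compression could be applied at the adaptation layer before transmission.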
FUNDAMENTALS OF INTERNET OF THINGS	[JNTU-HYDERABAD]
(iii)	Assembly Software for Events

	Assembly software can be defined as a software component in an application that can congregate the events and also attaches a date and time stamp to the events. Each and every event contains an Event ID, logic value and device ID associated with it.

	(a)	Event ID

		When a device generates an event, then it is allocated with an ID called event ID.

	(b)	Logic Value

		A logic value contains 1 or 0. It sets or resets based on an event state. Logic value 1 indicates that an event is generated but the action is not yet taken. Logic value 0 indicates that an event is generated and the action is also taken, or that the event is still not generated.

	(c)	Device ID

		The device that generates an event is allocated with an ID called device ID.

	Examples

	❖	A temperature sensor generates an event when the temperature rises to an adjusted value or falls underneath a threshold value.

	❖	A pressure sensor which exists in a boiler generates an event when the boiler pressure surpasses a critical value which requires attention.

Q19.	What are the various ways of organizing the data? Explain.

Answer :	Model Paper-II, Q8(b)

The data can be organized into various forms such as objects, files, database, relational and object oriented database etc.

1.	Database

	Database is defined as a collection of data which can be organized into tables. The database tables are useful for retrieving and updating. A single database table file is called as a flat file database. Every record is represented in a row and they are not related to each other.

2.	Relational Database

	A relational database is defined as a collection of data which are organized into multiple number of tables. All these tables are related with each other by using keys. Keys are the special fields such as primary key, unique key and foreign key.

	Examples

	MySQL, Microsoft SQL Server, Oracle database, PostgreSQL.

3.	Object oriented Database

	Object oriented database is defined as a collection of objects that are used in object oriented design.

	Examples

	ConceptBase, Cache.

4.	Database Management System (DBMS)

	Database management system is a software that performs the following functions,

	❖	Define a database

	❖	Support query language

	❖	Generate reports

	❖	Create data entry screens.

	DBMS basically consists of a set of interrelated data and a set of programs for accessing those data. The primary objective of a Database Management System is to provide an efficient environment that makes the retrieval and storage of database information easier.

ACID Properties

These are the properties that a transaction should possess in order to avoid failures during concurrent access of a database. ACID is an acronym which stands for Atomicity, Consistency, Isolation and Durability.

(a)	Atomicity

	Atomicity ensures that the transaction is either executed completely or not executed at all. Incomplete transaction consequences are not valid. Consider the example of a transaction involving crediting and debiting the account. If there is a failure during transaction execution, then measures are taken to get back the data in the form it was in before the transaction. This is taken care of by the transaction management component.

(b)	Consistency

	The data in the database must always be in a consistent state. A transaction that is carried on consistent data should bring the data to another consistent state after execution. However, a transaction need not maintain consistency at intermediate stages. It is the responsibility of the application program to ensure consistency.

(c)	Isolation

	All transactions must run in isolation from one another, that is, every transaction should be kept unaware of other transactions and execute independently. The intermediate results generated by the transactions should not be available to other transactions.
(d)	Durability

	This property ensures that data remains in a consistent state even after a failure. This is ensured by loading the modified data onto disk. This task is handled by the recovery-management component of DBMS. It ensures that either the complete modified data is loaded onto disk, or that which would be sufficient to get the modified data loaded onto disk. This enables a system to recover after a system crash.

5.	Distributed Database

	A distributed database partitions a single logical database into multiple data fragments and later stores them on several independent computers connected through a single network.

	Distributed Database Characteristics

	The characteristics of a distributed database are as follows,

	(a)	Logical Relation

		The distributed database is considered as a collection of several databases which have a logical relationship among them.

	(b)	Transparency

		Transparency exists in between databases i.e., a database user can access data from all the databases, providing an illusion to the users that they were using only a single database.

	(c)	Location Independent

		The distributed database must be location independent i.e., the system user should not know about the location of data and also about the possibility of moving data from one location to another without any effect.

6.	CAP Theorem

	CAP theorem states that it is not possible for a distributed system to guarantee all the three features. These three features are,

	(a)	Consistency

	(b)	Availability

	(c)	Partition tolerance.

Q20.	What is Analytics? Explain about analytics phases.

Answer :	Model Paper-III, Q9(b)

Analytics

Analytics refers to decision making which is completely based on facts instead of intuition. It acts as a key factor for the success of an enterprise as it provides business intelligence. Analytics are used to design or build models by selecting the appropriate data, so the data must always be available and accessible. Further, these models are first tested and then used for various processes and services. Moreover, analytics use various kinds of techniques to obtain new information and new parameters, which adds even more value to the data. Some of these techniques are arithmetic and statistical methods, data mining, machine learning etc.

Phases of Analytics

The process of analytics is classified into three phases and they are as follows,

1.	Descriptive Analytics

	Descriptive analytics provides information to a query, based on the historical data or the data that is collected in the past. It groups the data, finds the mean value, variance value, number of occurrences of an item and aggregates of some specific properties. Moreover, descriptive analytics provides OLAP, data visualizations, generating spreadsheets and key performance indicators.

	The descriptive analytics methods are as follows,

	(a)	Spreadsheets and Data Visualization

		Spreadsheets provide the results of descriptive analytics in the form of a table. It contains values in its rows and columns within cells. Each and every value in a cell is related to another cell or a group of cells. Here, the relationship can be a formula or a statistical value or a boolean relation etc. Thus, in this way, spreadsheets provide user visualization of the data.

	(b)	Descriptive Statistics based Reports and Data Visualization

		Descriptive analysis uses descriptive statistics, which refers to computing the minima, variance, peak and probabilities etc. It uses the formulae for applying on the data sets to provide easily understandable data visualizations to the users.

	(c)	Data Mining and Machine Learning

		Data mining uses algorithms to extract hidden patterns.
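The descriptive statistics named above, such as the minimum, mean and variance of a data set, can be computed with a short sketch. The class name is illustrative, and population (not sample) variance is assumed.

```java
// Toy descriptive-statistics sketch: minimum, mean and population
// variance of a data set, as used in descriptive analytics reports.
public class DescriptiveSketch {
    public static double min(double[] xs) {
        double m = xs[0];
        for (double x : xs) if (x < m) m = x;
        return m;
    }

    public static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Population variance: average squared deviation from the mean.
    public static double variance(double[] xs) {
        double mu = mean(xs);
        double sum = 0;
        for (double x : xs) sum += (x - mu) * (x - mu);
        return sum / xs.length;
    }
}
```

For the data set {2, 4, 4, 4, 5, 5, 7, 9}, the mean is 5 and the population variance is 4, values a descriptive report would surface directly.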
2.	Predictive Analytics

	It predicts the future trends, identifies patterns and clusters with the same behaviour. It also performs preventive maintenance by observing the device failures that have occurred in the past. Moreover, it implements integrated marketing strategies. It also provides predictions based on the anomalous detection and anomalous characteristics.

3.	Prescriptive Analytics

	Prescriptive analytics considers business rules and descriptive analytics in order to provide information regarding,

	(i)	What will happen?

	(ii)	When it will happen?

	(iii)	Why it will happen?

	It suggests predictions along with necessary actions considering a set of rules and inputs.

There are some more types of analytics, and they are as follows,

1.	Event Analytics

	Event analytics requires event data for tracking and reporting the events. Every event consists of the following components,

	(i)	Category

	(ii)	Action

	(iii)	Label

	(iv)	Value.

2.	In Memory Data Processing and Analytics

	In some specific databases, there exists an option for selecting row format or column format within the memory. The two options are as follows,

	(i)	In Memory and on Store Row Format Option

		In this format option, there are few rows and many columns. Each and every row is associated with several columns. The data accessing becomes easy as the entire data in the row is brought into the CPU by a single memory reference. The row format is beneficial for performing OLTP operations such as inserting, updating, querying etc.

Q21.	Explain in detail about big data analytics.

Answer :	Model Paper-I, Q9(a)

Big Data Analytics

The big data analytics technologies like Hadoop, NoSQL and Cassandra will support the big data architectures/infrastructures. The storage of big data is the storage infrastructure which is designed to store, manage and retrieve large amounts of data. It stores and formats the data in the storage such that it can be accessed, used or processed by the applications easily. The storage of big data is flexible in its scale. It allows input and output operations with a huge number of data files and objects. The storage is built using DAS (direct attached storage) pools, scale-out or clustered NAS (network attached storage), or infrastructure based on the object storage format. The storage infrastructure is connected to the computing server nodes that allow to process and retrieve the large amount of data.

Most of the companies are applying the big data analytics to achieve greater intelligence from metadata. The big data storage makes use of low cost hard disk drives, even though the prices of flash permitted the use of flash in servers and storage systems as the base of big data storage. The storage system will gather multiple servers connected to high capacity disks to support the analytic software that is written to test the data. It depends upon the databases that are connected in parallel to analyze the data that is retrieved from various sources. Because of this, the big data might not have a proper structure, leading to complexities while processing. Various components of Big Data storage include HDFS, NoSQL, Hadoop, MapReduce, Server nodes etc.

The Apache Hadoop Distributed File System (HDFS) is one of the common big data engines, combined with a partial NoSQL database. Hadoop is the software coded in the Java language. The HDFS advances the data analytics among various server nodes without a performance hit. The MapReduce component allows Hadoop to distribute the processing in the form of a safeguard against catastrophic failure. Various nodes act as the data analysis platform at the end of the network. The MapReduce will execute the processing directly on the storage node of data. Then it collects the results from the servers and compresses them to produce a single cohesive response.
One more feature of Hadoop is that it receives data in unstructured form such as audio, video and text. But this does not mean that it is replacing other relational database technologies; it means that both Hadoop and relational databases have their own place. The massively parallel relational platform, for example, deals with the high value transactional data and supports the feature of receiving data requests from users and applications with performance guarantees and enterprise-level security.

The technologies in the Hadoop ecosystem that deal with big data infrastructure, and for selecting and accessing the data from databases, are,

❖	Amazon web services for cloud infrastructure.

❖	Apache HDFS for distributed file system.

❖	MapReduce for distributed programming model.

❖	Cassandra or HBase.

HBase stores data in column format. It grants read or write access permissions to the users on several database tables that are distributed in HDFS. It provides random access with quick look-ups and less access latency.
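The column-format access pattern described for HBase can be illustrated with a toy in-memory sketch: each row key maps to named columns, and a read is a quick look-up by (row, column). This only illustrates the access pattern; it is not the HBase API, and all names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Toy column-store sketch: rowKey -> (column -> value).
public class ColumnStoreSketch {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Write a value into one column of one row.
    public void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new HashMap<>()).put(column, value);
    }

    // Random-access read: quick look-up by row key and column name.
    public String get(String rowKey, String column) {
        Map<String, String> cols = rows.get(rowKey);
        return cols == null ? null : cols.get(column);
    }
}
```

In HBase itself the rows are additionally sorted by key and distributed across HDFS, which is what makes the look-ups scale beyond one machine.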
Q22.	Illustrate the architecture of data analytics.

Answer :	Model Paper-II, Q9(b)
It '
I •
1stnbuted file st
,
s, . ns
• · em nssonah:' I .
l tstuhutcd tlk system. The
3. Secondary Namenode
referred lo as HDrs (H· d . l . with Hnllllop is typically 4, JobTracker
has hecn drsign 'd . , a _oop Dts~nh\ltcd File Systet\1). I-IDFS
5. TaskTracker.
gigah) tes or krab o1_ ,stonn
. . c g tnassi ' ',
, , "" .
· · \ c <1mount hke mega bytes,
i ~ ks ot data wl . ·I Namenode
aC'cess pat1e111 an , lie 1 support streaming data I.
. ' \1so on. Mot\'ov ·1
copies of the data t ti · er. 1 ensures avnilability of Namenodes manage the filesystem tree and
' 0 ,c cmi user by · · . · · st
remote lt1cation The. h , , .. savmg its copy on different metadata for all files an d dtrecto nes m persistent ore8
. . te ) prov1d111g parallel processing. uses two files for this purpose. They are namespace i geStora
111e HDFS ·ct edit log. Namenodes have the information of data no:age
prov1 es supports the following, . .'. .
store all the blocks of m1orm atton Perta'mmg
.
to a spe es. 11ih
l , Storing files of larger size This information can be re-create d from the nodes asCtfic •1~
2. Stteaming data access the system starts. Hence, the namenodes does not k SOon
• •
blocks on persistent storage. eep,
I
3. ·
Commodity hard~are.
2. Datanode
l. Storing Files of Larger Size
Datanodes store and retrieve the blocks of info .
. •
File~ of larger size refers to the files that are capable of . upon request. They also generate reports to the narnl'lllat
stonng Terra Bytes
(TB) of data. whenever they( store blocks of information. This n:
2. responsible for carrying out the operation allocated to i:J
Streaming Data Access
master node. They typically perform read and write operati I
The data stor~d on HDFS follows "writing once and ~etweeq. HDFS blocks and local filesystem. These op~rat~
involve splitting offiles into blocks distributed over diff~~1
reading several times data processing pattern. Here a . . en
data node according to the mstru ct10ns provided by ~
data set is simply copy from the source location and
NameNode. After this, the client will be able to connectlVi~
analyzed. While performing data analysis, the maj~r .their
respective DataNodes to carryout their functionalitiJ
, focus should be made on reading most (or) all of the These DataN
odes perform inter-node communication
data rather than providing high latency with ~espect to carryout data replication on atleas
t 3 datanodes.
the first record. ', · DataNodel bataNode2 DataNode3 DataNode4
3. Com ~odit y Hardware
TaskTracker
Clierit Job Tracker with MapRe- node2
duce Task
TaskTracker
withMapRe- node 3
duce Task
TaskTracker
Data Node ·
t
TaskTracker
. Figure: Hadoop Cluster
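The namenode's role described above, keeping a file-to-blocks map and a block-to-datanodes map in memory, can be sketched as two tables. All names here (class, files, block IDs, datanode IDs) are hypothetical; this illustrates the bookkeeping, not the HDFS implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy namenode bookkeeping: which blocks make up a file, and which
// datanodes hold a replica of each block.
public class NamenodeSketch {
    private final Map<String, List<String>> fileToBlocks = new HashMap<>();
    private final Map<String, List<String>> blockToDatanodes = new HashMap<>();

    // Record a new block of a file and the datanodes replicating it.
    public void addBlock(String file, String blockId, List<String> datanodes) {
        fileToBlocks.computeIfAbsent(file, f -> new ArrayList<>()).add(blockId);
        blockToDatanodes.put(blockId, datanodes);
    }

    // A client asks which datanodes hold the blocks of a file, then
    // connects to those datanodes directly for the actual data.
    public List<String> locate(String file) {
        List<String> locations = new ArrayList<>();
        for (String blockId : fileToBlocks.getOrDefault(file, List.of())) {
            locations.addAll(blockToDatanodes.get(blockId));
        }
        return locations;
    }
}
```

This also shows why the block map need not live on persistent storage: the datanodes' block reports can rebuild it at startup, exactly as the text describes.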
The above figure illustrates the topology of a Hadoop cluster. It can be inferred from the above figure that the master node runs the NameNode and JobTracker daemons. If failure occurs, the standalone node with the secondary NameNode is used. On small clusters, the secondary NameNode resides on one single slave node, whereas on large clusters the NameNode and JobTracker reside on separate machines. The slave machines use DataNodes and TaskTrackers so as to run the tasks on the node where data is stored.
Field		Description
23759		Identifier to represent weather station
23062015	Date
0530		Time
+47213		x 1000 degrees latitude
+23121		x 1000 degrees longitude
+523		Elevation in meters
3		Quality code
S		Southern direction
1		Quality code
S		South
15000		Visibility distance
1		Quality code
N		Towards north
12		Temperature of air in Celsius
1		Quality code
-14		Temperature of dew point in Celsius
1		Quality code
12139		Atmospheric pressure in hectopascals
1		Quality code
It can be observed from the original record that the fields included in it are written continuously without any delimiters between them. The table describes the terms used in the record. Typically, the records are arranged in terms of date and the weather station, where these records are placed in their respective directory. The directories are created in a year-wise manner, and each carries files of weather data associated with each weather station. The directory is composed of several small files because there exist many weather stations. The process of handling a small number of large files is simpler and more efficient.
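Because the fields carry no delimiters, a parser must rely on fixed character offsets. The sketch below assumes a simplified layout of just station (5 characters), date (8), time (4) and a signed temperature (3); the offsets and the reduced field set are assumptions for illustration, and the real record format has many more fields.

```java
// Toy fixed-offset parser for a delimiter-free weather record,
// in the style of what a mapper would do before emitting key/value
// pairs. Offsets below are for the simplified layout described above.
public class RecordParserSketch {
    public static String stationId(String record) {
        return record.substring(0, 5);     // chars 0-4: station id
    }

    public static String date(String record) {
        return record.substring(5, 13);    // chars 5-12: ddMMyyyy
    }

    public static int airTemperature(String record) {
        // chars 17-19: signed temperature; strip a leading '+' so
        // Integer.parseInt accepts it ("-14" parses as-is).
        return Integer.parseInt(record.substring(17, 20).replace("+", ""));
    }
}
```

In the classic MapReduce weather example, the mapper applies exactly this kind of slicing to each line and emits (station or year, temperature) pairs for the reducer to aggregate.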
Mapper Code

A mapper in MapReduce is responsible for providing parallelism. The TaskTracker contains the mapper and processes it. The code associated with the mapper is referred to as mapper code. The logic used in the mapper code must be capable of executing independently. It should be capable of performing the parallel tasks mentioned in the algorithm. The input format resides in the driver program of a specific InputFormat type or the file in which the mapper is executed. The output of the mapper might be the key and value that are set in the mapper output. It is stored in an intermediate file that is specifically created in the OS space path. Operations such as read, shuffle and sort are performed on this file.

These four parameters specify the type of inputs and outputs associated with the mapper function.
// tokenize the input line and emit a (word, 1) pair for each token
StringTokenizer t = new StringTokenizer(value.toString());
while (t.hasMoreTokens()) {
    word.set(t.nextToken());
    context.write(word, one);
}
Q28.	Write short notes on reducer code.

Answer :

Reducer Code

A reducer is capable of reducing the intermediate values, all of which share a key, to a smaller set of values. A reducer in MapReduce performs three major operations. They are,

1.	Shuffle

2.	Sort

3.	Reduce.

1.	Shuffle

	It is responsible for collecting the inputs generated by the mappers in a sorted order. It uses the HTTP protocol to retrieve the required partition of the output of the mappers.
2.	Sort

	It is responsible for arranging the reducer keys in various orders, typically using the merge sort approach.

3.	Reduce

	This phase of the reducer uses a method reduce(Object, Iterable, Context) which relates to the <key, (collection of values)> associated with every input in the sorted array. It uses TaskInputOutputContext.write(Object, Object) to forward the results generated from reduce() to the RecordWriter.
The general format of reducer code can be written as,

org.apache.hadoop.mapreduce.Reducer<INPUT_KEY, INPUT_VALUE, OUTPUT_KEY, OUTPUT_VALUE>

It can be observed that it also uses four parameters similar to the mapper code.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

protected void reduce(KEYIN key, Iterable<VALUEIN> values, org.apache.hadoop.mapreduce.Reducer.Context context)
        throws IOException, InterruptedException

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable cnt = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable value : values) {
            total += value.get();    // sum the counts for this word
        }
        cnt.set(total);
        context.write(key, cnt);     // emit (word, total)
    }
}
In the above example, the four parameters are Text, IntWritable, Text and IntWritable, in which the first two are input types and the remaining are output types. Here, an iterator is used to move across and count each of the words, i.e., the total result is provided as output.
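The map, shuffle/sort and reduce phases described above can be simulated end-to-end in plain Java to see the data flow, without the Hadoop runtime. Here a sorted map plays the role of the shuffle-and-sort step by grouping emitted values under sorted keys; this is a sketch of the data flow only, not the Hadoop execution model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory word-count simulation of map -> shuffle/sort -> reduce.
public class WordCountSimulation {
    public static Map<String, Integer> run(List<String> lines) {
        // Map phase: tokenize each line and emit (word, 1) pairs.
        List<String[]> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer t = new StringTokenizer(line);
            while (t.hasMoreTokens()) {
                pairs.add(new String[] { t.nextToken(), "1" });
            }
        }
        // Shuffle/sort phase: group emitted values by key, in key order.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String[] p : pairs) {
            grouped.computeIfAbsent(p[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(p[1]));
        }
        // Reduce phase: sum the grouped values for each key, exactly
        // as WordCountReducer does with its Iterable<IntWritable>.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int total = 0;
            for (int v : e.getValue()) total += v;
            counts.put(e.getKey(), total);
        }
        return counts;
    }
}
```

Running this on the lines "to be or" and "not to be" yields the counts be=2, not=1, or=1, to=2, which is what the mapper and reducer shown earlier would produce together on a cluster.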