Professional Documents
Culture Documents
Scanneroptimised PDF PDF Free
Scanneroptimised PDF PDF Free
8TH SEIVr
CSE/ISE
THREE CBCS MODEL QUESTION PAPERS &
PREVIOUS YEAR QUESTION PAPERS SOLVED
,? Network Management
As Per New Syllabus ofVfU 2015 Scheme
Choice Based Credit System(CBCS)
ALL IN ONE
SUNSTAR EXAM SCANNER
B.E.
STH Sem CSE/ISE
CBCS MODEL QUESTION PAPERS (WITH ANSWERS)
&
PREVIOUS YEAR EXAM PAPERS (WITH ANSWERS)
SUNSTAR PUBLISHER®
#4/1, Kuppaswamy Building, l 9lh Cross,
Cubbonpet, Bangalore - 560002
Phone:08022224143
E-mail: sunstar884@gmail.com
SUNSTAll EXAM SCANNER for 8,. Sein B.E. CSi/ISE, V. T. U
Authored by A Team of Experts and Published by Sunstar Publisher, Bangalore - 2.
©Authors
-----------.------------------,,
Pages : IV + 364 No of Copies : 1000
Paper used : 11.2 kg (58 GSM) Sripa thi
First Edition : 2020 - Book Size : 1/4th Crown
=l
is limited to replacement within one month
Sri Manjunatha Printers of purchase by similar edition. All expenses
Bangalore in this connection are to borne by the
purchaser.
All disputes a re subject to Bangalore
Jurisdiction only
, Bangal ore - 2.
Sripa thi
Dedicated
has been made
·ssions in this
to
All ~ngineering Students
c next edition. It
ublished nor the
onsible for any
any one, of any
' om.
e reproduced or
fD'ea ns (graphic,
luding. taping or
Oorreproduced
media or other
.tc., without the
lishers. Brea th
legal action.
,prints or. for
isher's liability
hin one month
n. All expenses
orne by the
o Ilaog,lott I_
____________________________________________
11 CONTENTS II
3. Netwo,rk Manageme_nt
► CBCS Mo_del Question Paper - 1 03 - 19
► CBCS Model Question Paper - 2 20- 36
► CBCSJune I July 2019 01 - 13
► CBCS December 2019 I January 2020 14- 20
SUNSTAR
---
-
- - - - - - ]
SUNSTAR EXAM SCANNER
- - - - - - - - - - - - -- - - - - - - - ---
MODULE 1
What is loT. Ge1~esis of loT, loT and Digitization, loT Impact, Convergence of IT and loT, loT
Challenges, loT Network Architecture and Design, Drivers Behind New Network Architectures.
Comparing loT Architectures, A Simplified loT Architecture, The Core loT Functional Stack, loT
Data Management and Compute Stack.
MODULE2
Smart Objects: Ti\e "Things" in loT, Sensors, Actual ors, and Smart Objects, Sensor Networks,
Connecting Smart Objects, Communications Criteria, loT Access Technologies.
MODULE 3
IP as the loT Network Layer, The Business Case for IP, The need for Optimization, Optimizing
IP for IoT, Profiles and Compliances, Application Protocols for loT, The Transport Layer, loT
Application Transport Methods. ·
MOOULE4
Data and Analytics for loT, An Introduction to Data Analytics for loT, Machine Leaming, Big
Data Analytics Tools and Technology, Edge Streaming Analytics, Network Analytics, Securing
loT, A Brief History ofOT Security, Common Challenges in OT Security, I low IT and OT Security
Practices and Systems Va,y, Formal Risk Analysis Structures: OCTAVE and FAIR, The Phased
Application of Security in an Operational Environment
MODULES
loT Physical Devices and Endpoints - Arduino UNO: Introduction to Arduino, Arduino UNO,
Installing the Software, Fundamentals of Arduino Programming. loT Physical Devices and
Endpoints - RaspberryPi: Introduction to RaspberryPi, About the RaspberryPi Board: llardware
Layout, Operating Systems on RaspberryPi, Configuring Raspberry Pi, Programming Raspberry Pi
with Python, Wireless Temperature Monitoring System Using Pi, OS I 8B20 Temperature Sensor.
Connecting Raspberry Pi via SSH, Accessing Temperature from DS 18820 sensors, Remote
access to RaspberryPi, Smart and Connected Cities, An loT Strategy for Smarter Cities, Smart
City loT Architecture, Smart City Security Architecture, Smart City Use-Case Examples.
Eighth Semester B.E. Degree Examination,
CBCS - Model Question Paper - 1
INTERNET OF THINGS TECHNOLOGY
Time: 3 hrs. Max. Marks: 80
Note : Am wer any FIVE /111/ q11esfio11s, selecting ONE /111/ quest/011 from each mod11/e.
Module - I
I. a. What is IOT'! Explain the features of IOT (08 Marks)
Ans. The most important features of IOT arc
Connectivity: Connectivity refers to establish a proper connection bcn,vccn all the
things of IOT to lvT platform i\l may be server or cloud. Afler connecting the JOT
devices. it needs a high speed messaging between the devices and cloud to enable
reliable. secure and bi-directional communication.
Analyzing: Aller connecting all the relevant things. it comes to real-time analyzing
the data collected and use them to build effective business intelligence. If we have a
good insight into data gathered from all these things. then we call our system has a
smart syste111 .
Jntcgrating: IOT integrating the various models to impro\'e the user experience as
well.
Artificinl lntelliJ!ence: IOT makes things smart and enhances life through the use
of data. For example, if we have a coffee machine whose beans have going to end.
then the coffee machine itself order the coffee beans of your choice from the retailer.
ScnsinJ?: The sensor devices used in IOT technologies dc1cct and measure any
clrnnge in the environment and report on their status. IOT technology brings passive
networks 10 active ne1works. Without sensors, 1here could 1101 hold an effective or
true IOT environmcnl.
Active Engagement: JOT makes the connected technology. product, or services to
active engagement between each other.
Endpoint Management: It is important to be the endpoint management of all the
IOT system otherwise; it makes lhe complete failure of1hc system. for example, if
coffee machines itself order the coffee beans when it goes to end but what happens
when it orders the beans from a retailer and wi: are not present at home for a few
days. it leads to the failure of the IOT systt:m. So, tht:re must bll a need for endpoinl
management.
b. Explain the IOT Applications (08 Marks)
Ans. The versatility of IOT has become very popular in recent years. There are many
advantages to having a device based on lvT. Mckinsey Global Institute reports
that IOT business will reach 6.2 trillion in revenue by 2025. There are lo1s of
applications are available in the market in different areas.
J ) Personal Home A11tom11tion Syi;tem: Home Automation system is the m,~or
example in this area.
3
VIII Se,rn,, ( CSE/ISC:) In.tern.et-of~ T ~
2)Wemo Switch Smart Plug: It is the mol>t useful devices which connected home
devices in the Switch. a smart plug. It plugs into a regular outlet, accepts the
power cable from any device, and can be used to turn it on and off on hit :1
button on your snrnrtphonc.
3) Enterprise: In the enterprise area many applications are there Like environmental
monitoring i.ystem. smart environment etc.
4) Nest Smnrt Thermostat: It is connected to the internet. The Nes t lenrns
automaticnlly your family 's routines and will automatically adjust the
temperature based on your acti\'ities, to make your house more efficient. There
is also a mobile app which allows the user to edit temperature and schedules.
5) Utilities: smarl metering, smarl grid, and water monitoring system are the
most useful applications in the various utility area.
6) Energy Management: Advanced Metering Infrastructure is the major example
in this area.
7) Medical and Health Care: Remote health monitoring and emergency
notification system are Cl\amples of JOT in the medicalfield.
8) Health patch Health Monitor: It can be used for the patient who can't go to
doctors. letting them get CCG. heart rate, respiratory rate, !.kin temperature, body
posture. fall de1ec1ion. and accivity readings remotely.
9) Transportation: Electronic toll collection system is the most useful example in
this area.
10) Large scale deployment: There are various large projects ongoing in the world.
Songdo (South Korea), the first of its kind fully wired Smart City, is near
completion. Everything in this cit) is planned to be wired. connected and turned
into a data strea111 that would be monitored by an array of computers without any
human interaction.
11) Another ex:imple is the Sino-Singapore Gunngzhou Knowledge City work
on improving ~,ir nnd wnter qu:1lity. French company Sigfox commenced building
an ultra-narro,,band wireless data network in the San Francisco Bay area in 2014.
Another ex:unple is the one completed by New York Wnterwnys in New York
City to connect all their vessels and being able to monitor them 24n.
So these are large applications are present in the market which is based on loT.
This world is going to become a beller place to live with more communication
with everyone. In near future large number of de" ices connected to the internet and
provide great facilities to the world.
OR
2. n. Explnin the IOT network architecture components. (10 Marl<S)
Ans. A "thing·• is an object equipped with sensors that gather data which will be transferred
' over a networl- and actuators that allow things to act (for example, to switch on or
off the light. to open or close a door. to increase or dccreasc engine rotation speed and
more). This concept includes fridges, street lamps, buildings. vehicles, production
machinery. rehabilitation equipment and everything else -imnginable. Sensors are
not in all cases physically attached to the things: sensors ma) need to monitor, for
example. what happens in the closest environment to a thing.
Gateways. Data goes from things to the cloud and vice, ersa through the gateways. A
gateway provides connectivity between things and the cloud part of the loT solution.
enables data preprocessing and filtering before moving it to the cloud (to reduce the
volume of data for detailed processing and storing) and transmits control commands
going from the cloud to things. Things then execute commands using their actuators.
Cloud gateway facilitates data compression and secure data transmission between
field gateways and cloud loT servers. It also ensures compatibility with various
protocols and communicates with field gatcwa) s using different protocols depending
on what protocol is supported by gateways. ·
Streaming data processor ensures eflective transition of input data to a data lake
and control applications. No data can be occasionally lost or corrupted.
Data Lake. A data lake is used for storing the data generated by connected devices
in its natural format. Big data comes in "batches" or in ''streams". When the data is
needed for meaningful insights it's extracted from a data lake and loaded to a big
data warehouse.
Big data warehouse. Filtered and preprocessed data needed for meaningful
insights is exfracted from a data lake to a big data warehouse. A big data warehouse
contains only cleaned. structured and matched data (compared to a data lake which
contains all sorts of data generated by sensors). Also. data warehouse stores context
information about things and sensors (for example, where sensors are installed) and
the commands control applications send to things.
Data analytics. Data analysts can use data from the big data warehouse to find
trends and gain actionable insights. When analyzed (and in many cases - visualized
in schemes, diagrams, info graphics) big data show, for example. the performance
of devices. help identify ineffi-:iencies and work out the ways to improve an loT
system (make it more reliable. more customer-oriented). Also, the correlations and
patterns found manually can further contribute to creating algorithms for control
applications.
Machine learning nnd the models ML generates. With machine learning, there
is an opportunity to create more precise and more efficient models for control
applications. Models are regularly updated (for example. once in a week or once in a
month) based on the historical data accumulated in a big data warehouse. When the
applicability and efficiency of new models are tested and approved by data analysts.
new models are used by control applications.
Control applications send automatic commands and alerts to actuators. for example:
• Windows of a smart home can receive an automatic command to open or close
.. depending on the forecasts taken from the weather service.
• When sensors show that the soil is dry, watering systems get an automatic
command to water plants.
• Sensors help mon.itor the state of industrial equipment, and in case of a pre-
failure situation, an loT system generates and sends automatic notifications to
VIII Se-mt (CSf(ISf) I n.tunet o f ~T ~
field engineers.
The commands sent by control apps 10 actuators can be also additionally stored in
a big data warehouse. This may help investigate problematic cases (for example.
a control app sends. commands, but they are not performed by actuators - then
connectivity. gateways and actuators need to be checked). On the other side, storing
command~ from control apps may contribute lo security. as an loT system can
identify that some commands arc too strange or come in 100 big amounts which ma)
evidence securit) breaches (as well as other problems which need investigation and
corrective measures).
Control applications can be either rule-based or machine-learning based. In the first
case, control apps work according to the rules stated by specialists. In the second
case, control apps arc using models which are regularly updated (once in a week.
once in a month depending on the specifics of an loT S)Stcm) with the historical data
stored in a big data warehouse.
Although control apps ensure belier automation of nn loT system, there should
be always an option for users to innuence the behavior of such appheations (for
example, in cases of emergency or when it turns out that an loT system is badly
tuned to perform certain actions).
User applications: Arc a software component of an loT system which enables the
connection of users to an loTsystem and gi,es the options to monitor and control their
smart things (while they arc connected to a network of similar things. for example.
homes or cars and controlled by a central system). With a mobile or web app. users
can monitor the state of their things, send commands to control applications, set the
options of automatic behavior (automatic notifications and actions when certain data
comes from sen<;ors).
b. Write explanatory note on JOT Data Management. (06 Marks)
Ans. loT data management enables businesses to discover usage patterns. It also challenges
assumptions m:ide during design :ind development phases. identifying weaknesses
in connected devices. In other words - it helps create the best connected products
possible. Before the product is released, loT data management allows to conduct a
field test.
I
they can with stand. When we consider that vehicle recall costs can run into millions
in compensation claims (not to mention the cost to your reputation), this is a no-
brainer. Collecting field data is also an important step post-lounch. we can provide
continuous product improvements with soHware updates and get important insights
for your next version. Througttout the producl lifetime, these insights will support
the development process of new products and additional ill:rations.
Module - 2
3. I , Whal arc Smart obj1..-cls in IOT? Explain {06 Marks)
Ans. The concept of smart in loT is used for physical objects that arc active, digital.
networked. can operate to some ex.tent autonomously. reconfigurable and has local
control of the resources. The smart objects need energy. data storage, etc. A smart
object is an object that enhances the interaction with other smart objects as well as
with people also. The world of loT is the network of interconnected heterogeneous
objects (such as smart devices, smart objects. sensors. actuators, RFID. embedded
computers. etc.) uniquely addressable and based on standard communication
protocols. In a day to day life, people have a lot of object with internet or wireless or
wired connection. Such ns:
• Smartphone
• Tablets
• TV computer
These objects can be interconnected among them and facilitate our daily life (smart
home, smart cities) no matter the situation, localization. accessibility to a sensor.
size, scenario or the risk of danger.
Smart objects arc utilized widely to transform the physical environment around us
to a digital world using the Internet of things (loT) technologies. A smart object
carries blocks of application logic that make sense for their local situation and
interact with human users. A smart object sense. log, and interpret the occurrence
within themselves and the environment, and intercommunicate with each other and
ex.change information with people.
The work of smurl object has focused on technical aspects {such as software
infrastmcture, hardware platforms, etc.) and application scenarios. Application
areas range from supply-chain management and enterprise applications (home and
hospital) to healthcare nnd industrial workplace support. As for human interface
aspects of smart-object technologies arc just beginning to receive attention from the
environment.
b. What arc Sensor Networks in JOT? Mention the churactcristics of IOT.
(10 Marks)
Ans. A Wireless Sensor Networl-. is one kind of wireless network includes a large number
of circulating, self-directed, minute, low powered devices named sensor nodes
called motes. These networks certainly cover a huge number of spatially distributed,
little, battery-operated, embedded devices that ore m:tworkcd to caringly collect,
process. and transfer data to the operators, and it has controlled the capabilities of
,
VIII Sem., (CSE/IS£)
computing & processing. Nodes are the tiny computers, which work jointly to form
the networks.
The sensor node is a multi-functional, energy efficient wireless device. The
applications of motes in industrial are widespread. A colleclion of sensor nodes
collects the data from the surroundings to achieve specific application objectives.
The communication between motes can be done with each other using tcansceivers.
In a wireless sensor net\\ork, the number of motes can be in the order of hundreds/
even thousands. In contrast with sensor n/ws, Ad Hoc networks will have fewer
nodes withou.t any structure.
Wireless Sensor Network Architecture
The most common WSN architecture follows the OSI architecture Model. The
architecture of the WSN includes five layers and three cross layers. Mostly in sensor
n/w we require five layers, namely application. transport, n/w. data link & physical
layer. The three cross planes arc namely power management, mobility management.
and task management. These layers of the WSN are used to accomplish the n/w
and make the sensors work together in order to raise the complete eHiciency of the
network.
Characteristics of Wireless Sensor Network
The characteristics of WSN include the following.
• The consumpti_?n of Power limits for nodes with batteries
• Capacity 10 handle with node failures
• Some mobil~ty or nodes and Heterogeneity of nodes
• Scalability to large scale of distribution
• Capability to ensure strict environmental conditions
• Simple to use
• Cross-layer design
Advantages of Wireless Sensor Networks
The advantages of WSN include the following
• Network arrangements can be carried out without immovable infrastructure.
• Apt for the non-reachable places like mountains. over .the sea, rural areas and
deep forests.
• Flexible if there is a casual situation when an additional workstation is required.
• Execution pricing is inexpensive.
• It avoids plenty of wiring.
• It might provide accommodations for the new devices at any time.
• It can be opened by using a centralized monitoring.
OR
4. a. How do we connect Smart Objects in IOT? Explain (08 Marks)
Ans. In a world of smart networked devices, wearables, sensors and actuators, transparent
and secure access to and usage of the available resources across various IoT domains
is crucial to satisfy the needs of an increasingly connected society. The current loT
ecosystem is however fragmented: a series of vertical solutions exist today which,
8
on the one hand. integrate connected objects within local environments using
purpose-specific implementations and, on the other hand. connect smart space.<.
with a back-end cloud hosting dedicated otien proprietary sotlware components.
symbloTc (symbiosis of smart objects across IoT environments) comes to
remedy this fragmented environment by providing an ah.\lractio11 layer for a
""unified view'" on various platforms and their resources so that platform resources
are transpl)rent to application designers and developers. In addition, symbloTc also
chooses a challenging task to implement loT platform federativ11.1· so that they can
St:curely interoperate. collaborate and share resources for the mutual benefit. and
what is more. support the migration of smart objects between various loT domains
and plalforms. i.e.. "smart object rva111i111('. symbloTe will achieve all of the above
by designing. and implementing an Open Soun·e mediatioi, prototype. In light of
the above, symbloTc opens up the potential for innovative business models that are
incrementally deployable. This is especially imporlant for SM Es who arc symbloTe's
primary targt:I group. Moreover, symbloTe removes the strict separation between loT
islands to ere;:ate ,in environment which i) has significant impact on the market since
it bridges individunl efforts nnd investments. ii) is attractive for a heterogeneous user
group. iii) malclu:s the dynamicity of modern life dut: lo case of use and iv) is helpful
for various businl'ss, home and public infrastructure use cases.
c,.n,pus
J;;;;!I Office
lluilding ~
g Q (:] ~ o~
/
□ Campus
B~ing. . . . . - -
'-
~G)
Platfonn 1
Stadium
Publlc
Q 1ransporr
□~
b. What arc the nchvork technologies for IOT connectivity'! (08 Marks)
Ans. The 3 m,,in network technologies for loT Connectivity are:
I. Standard Wireless Access - Wif.'i. 2G. 3G and standard LTE.
2. Priv~te Long Range - LoRA based platform. Zigbee, and SigFox.
3. Mobile loTTechnologies- LTE-M. NB-loT, and EC-GSM-loT
There is no doubt that loT is driving a lot of changes in connectivity (among many
9
VIII Sem, ( CSE/IS£) I ritern.et o f ~ r ~
other infrastructure technologies). The rule that connected devices impose over
networking technologies is not the same as in our mobile phones..
Standard Wireless Access - WiFi, 2G, 3G and st:mdard LTE
This is a no-brainer move. There are already plenty of devices that use this. such as:
• Smart-TVs ·
• Gaming consoles
• Panic buttons
• Video surveillance
• eHealth
• Flt:et tracking
• Industrial loT
It's an ..obvious choice'' for providers and consumers to continue using their current
internet access (Wi-Fi, 2G, 3G. LTE, etc) as the primary network option for their
home appliances. /\s always, there is a catch. the power consumption is quite a thing
for cordless devices. Most devices using this network access are statically connected
to a power outlet (or big batteries). In case of using a mobile network, another
·constraint is that their data plan acquired was thought for phones. This way it's quite
expensive if you are thinking in multiples devices per user.
Private Long lfange - LoRA based platform, Zigbee, and SigFox
The requirements of low power connectivity opened a window to private companies
to develop new networks with that specific constraint at their core which arc loT
native. These private wireless networks are used for the specific proprietary network.
It's also to deploy a network which allows third-party devices to connect and build
an ecosystem.The three leading technologies in this area arc, LoRAWAN, Zigbee.
_. ,.- .......
........_,._.,
.. ..., ......
- - • 1 " t ,.. . ......, .. ~
............
•I
1.-.. ,~
111'!o• ........ I I
..-
,: ..-;: 1•,n•'""'''
and SigFox.
Module - 3
5. a. Explain the business case towards IP. (08 Marks)
Ans. Much of the technology needed for the Internet of Things {loT) has been available
for some time, but connecting a world's worth of devices to the cloud requires more b. What arc Ligh
than just an Internet connection. For loT deployments to be successful. business Ans. /\long with ph
models and pain points for players on both the device side and network side must protocols for I
application mes
be identified, followed by solutions that can collect and analyze data securely and
efficiently. Lightweight M
protocol betwe
and are thus re
are as folio,
information be
this class treat
deleted, and re
can use differe
LWM2M proto
send messages
10
C8CS · tvfodel,Q~wn,pape,,- - l
TheClood Tetcos
e,ao,-...,t. "°"'°'
c&.,.twt...,i.-..:1~,.._
lllllll'< cb,d. t,Ot ~ TT \• --...r, -.:.t '" .lo, t} M•
ff~IIM~lhtup~,loM,1,,,1,ncf;. I. l!iUI
.,..._.,.,,....~ ..~.J-,..•- •H,o,.Mi.. D,111
.....9119'"''•""'-'-"
~
-,
Gateways J
S1;'1,.UtH;
frvi1n,mf\fo,(<J•h,,,rh,.,-, 1,11,,.1
..... ,
l•"' t\.\l<f",., l,1,11 ,f ,,.
,,1 ... ..,, .......r ,,...~.,,.., 1,,, ou•l•ou.,..._a-,.,.<1><•
..,.,,Lt,,, t..\l<t tl,.,1 ,l,. l;y .,,,r,_., , . j ,.., t r•lf"'1tf,1,l.l ,rt,"\.,,,
...,. , .•, .... .,..,,~,S.1,0:11 t, .... 0 1 ,~ • .......... J,,.....,.,.c;i
•ly•~J J•,-.f- •,J o,, ..... I 11 •~ ,, ,- lo tt,+IJ~111 •\•I\O(H
.,,ti,•,. ,, ..., ,, ......,,..-fT',
S~n~or Networks
Edge Management
llt-• .. ,,._,.,,_t••l-,..-..,,btu• ~ - '•"ti -t
,Ol..._..............frO<on1~-,..,..,, ..,•• ,..f,,•,..-,1 .,.,
.......,.,,.._,,"4.,....,-......... t.r• ....,
r.~. ~
,. ,j ,,.,.,,~,•.-..-.&el,,..,, •-.i
•.a,H,11 ..., , . 0 0
,,... .......o,1., ...",.-........ ,, . ___,
t ,~,. • ..'..,.,.-;••iltl'II" ••►' .,,..-,t' ttu"'t ... _,, ll 0
0
14
~
Llion The second step in the data acquisition process is collection and storage of data
MLS sets identified as big data. Since the archaic DBMS techniques were inadequate for
d bi- managing big data. a new method is used for collecting and storing big data. The
process is called MAD-magm:tic. agile and deep. Since, managing big data requires
WllSt:
a significant amount of processing and storage capacity. creating such systems is out-
ation. of-reach for most entities which rely on big data analytics. Thus, tht: most common
ware. solutions for big data proct:ssing today are based on two principles- distributed
mass
storage and Massive Paral.lel Processing a.k.a. MPP. Most of the high-end Hadoop
ators.
platforms and specialty appliances use MPP configurations in their system.
Non-relational Databases
, IETF
The databases that store these massive data sets have also evolved in how and where
col. It
the data is stored. JavaScript Object Notation or JSON is tht: preferred protocol for
saving big data nowadays, Using JSON. the tasks can be written in the application layer
and allow better cross-platform functionalities. Thus enabling. agile development of
scalable and flexible data solutions for the devs. Many companies are using it as a
replacement of XML as a way of transmitting stnactured data between the server and
ublish/ web application.
1ecting In-memory Database Systems
These database storage systems are designed to overcome one of the major hurdles in
tht: way of big data processing-the time taken by traditional databast:s to access and
arks) process information. IMDB systems store the data in the RAM of big data servers.
ilforent therefore. drastically reducing the storage 1/0 gap. /\pache Spark is an example of
market IMDB systems. VoltDB, NuoDB and IBM solidDB are some more examples of the
1aking. same.
Hybrid Data Storage and Processing Systems-Apache Hadoop
Apache Hadoop is. a hybrid data storage and processing system which providt:s
ig data. scah1bility and speed at reasonable costs for mid and smnll-scale businesses. It uses
-born a Hadoop lJistributed File System (HDFS) for storing large files across multiple
systems known as cluster nodes. Hadoop has a repl ication mechanism to ensure
smooth operation even during instances of individual node failun:s. Hadoop uses
, e.g. a Google's MapRcduce parallel programming as its core. The name originates from
grange 'Mapping· and 'Reduction' of functional programming languages in its algorithm
s. Born for big data processing. MapRt:duce works on the premise of increasing the
usiness number of functional nodes over increasing processing power of individual nodes.
Moreover. Hadoop can be run using readily available hardware which has sped up
its development and popularity, significantly.
s which Data Mining
his data It is a recent concept which is based on contextual analysing of big data sets to
s. voice discover the relationship between separate data items. The o~jcctive is to use a
o raised single data set for dilTerent purposes by different users. Data mining can ht: ust:d for
through reducing costs and increasing revenues.
15
VIII Sern., (CSE/ISE)
b. Wh11t do you menn by F.clge streaming a nalytic!>? Explain (04 Marks)
Ans. Azure Stream Analytics (ASA) on loT Edge empowers de\clopers to deploy near-
real-time analytical intelligence closer to loT devices so that the) can unlock the full
value of device-generated data. A.wre Stream Annlytics is designed for low latency.
resiliency, efficient use of bandwidth, and compliance. Enterprises can now deplo>
control logic close to the industrial operations and complement Big Data anal)tics
done in the cloud. Jong
Alurc Stream Analytics on 10·1 1-.dgc runs within the Al.lire lo I Edge frameworl,.. '-'ep,ncd
Once the job is creakd in ASA, you can deploy and manage it using lo r I lub. SA and el
~ of the
OR
physicall) iso
8. n. llow IOT Security is different from Physical and convcntiou:ll IT Security'~ 5. Ecos)stem
Explain. (06 Marks) When genera
Ans. I. Lifecycle mismatch alfect thi: 1arg
Many t) pes of ph) ,ical objects - buildings. automobile<;. refrigerators. light Data brcad11:
switches - last a long ti1m:, decades even. They t) picall) require maintenance, but stolen: thi~ a
they're otten not replaced until the repair bills get too high or they just don't work. multiplies th
We certainly don ·1 expect to replace them because a manufacturer decided not to even be a\\ ar
support them after five years. Ecosysll:m cl
Yet much of the software involved in loT is intended to be dispo~ablc. There may be significant di
no prO\ isions for upgrading client devices at all. Software support for even relatively Linux on ,~cl
expensive consumer devices is usually just on the order of a few) cars. attacks.
From a sccurit) per~pective, this means otherwise functional de, ice~ are likel) to be 6. Common,
exposed to unpatchcd vulnerabilities as they get older. Speaking at
2. Genernl-purposc extensible devices Bruce Schne
If these network-connected systems used specialized hardware and softwan: tu computeri;,e
operate and communicate, outdated software wouldn·t nccessaril) be a major issue. about the fi \
It would likely be hard to force such devices to take actions they weren't original!}' the same \\ a)
designed to do. into one phy
However, in practice, it"s very common for loT devices to effectively be general- I mentioned
purpose computers running open source operating systems and network stacl..s. more broadl
There arc good reasons for this: among other things, it"s easier (at least in principl..:) indi\ idual att
to update them and add new capabilities over time. However, it also means that an But even br
attacker who gains control of a device has more options to wreak ha, oc. failures in s
3, Dael economic incentives matters.
None of the above is unfixable. We keep industrial equipment running for decades. 7. ActuatorS
Sofiwarc vendors, including my employer, offer a vari•!IY of e:-.tended life support we·ve seen I
options for subscription products. These models work because customers are willing convenliona
to pay for ongoing maintenance and support at levels where ifs profitable for vendors different. it".
to supply them. Schneier cal
Those same incentives aren't in place when you buy a light switch, or perhaps even Software al
a vehicle. No consumer is likely to pay for an ongoing light switch contract. Some do. But the
may do so for cars, but it"s not common after the initial warranty period. /\s a result. pervasively
there nre 1\0 incentives for mo, t device nrnkerc; to continue supoortinit what tht:y" Vt:
16 &w.+M C.CAM ~MV
rks) sold beyond a fairly short window.
1ear- 4. Connected by default
full Vulnerability to attackers who connect to an loT device or gateway wouldn't matter
ncy. so much if making that corniection were difficult or impossible. But increasingly it is
ploy not. Tht: norm is to connect to networks, usually wirt:less networks. and often public
ytics networks. Even when thcrc·s no compelling reason to do so.
Ifs Jong been recognized that protecting computer systems again5t intruders w ho
vork. have gained physical access can be extremely challenging. (Witness breaches at the
NSA and elsewhere.) However, pervasive and routine network access introduces
many of the same threat vectors. Certainly it creates a far greater attack surface than
physically isolated systems.
rity? S. Ecosystem effects
nrl.s) When general-purpose. networkaconnected computers are breached. it doesn ·, just
affect the target or the attack.
light Data breaches can affect millions of customers when sensitive information is
e. but stolen: this applies whether we·re talking loT or more conventional IT systems. loT
vork. multiplies the issue by collecting more and more ambient data that people may not
not to even be aware is being collected.
Ecosystem damage can go beyond data. The Mirai botnet's OOoS attackcaused
ay be significant disruption to the internet as a whole. It resulted from outdated versions of
lively Linux on webcams being turned into remote-controlled bots for large-scale network
attacks.
to be 6. Common widespread failu re modes
Speaking at the Open Source L<!adership Summit earlier this year, security expert
Bruce Schneier noted that comp'Jtcrs and networks fail in a di/Terent way tha11 non-
are to computerized systems. "You worry about crashing all the cars. You·re concerned
r issue. about the five sigma guy, not the average guy. It doesn ·1 happen in lock picking in
1 inally
the same way.'' he said. because no matter how skilled, one person can only break
into one physical building at a time.
cneral- I mentioned the Mimi botnc:t c:arlier. But it"s the nature ofloT and connt:cte<I systems
stacks. more broadly that vulnerabilities and attacks usually affect many systems. Ofcourse,
ncipl::) individual attacks can still lead to data breaches or the shutdown of a critical system.
that an But even breaches that would be relatively innocuous in isolation can cause serious
failures in systems like the power grid if multiplied by a thousand or a million. Scale
matters.
ecades. 7. Actuators can affect our environment
support we·ve seen how loTcan differ in scale, connectedness and vendor support from more
willing conventional IT systems. But if I had 10 pick one aspect of loT IhaI ·s rundamentally
different. it's this one: loT is not read-only.
t:l:~;~;
result.
Schneicr calls it ··an intt:rnet that affects the world.'"
Software already controls many critical systems or directly tells humans what to
do. But the degree to which loT is replacing manual and disconnected controls
pervasively and hy default is striking.
they've
1\1\U'
17
VIII Se-n11 (CSE/ISE)
That loT can take physical actions may not really change its security model, but it
certainly raises the stakes.
We prioritize foatures. We prioritize low prici::s. We prioritize today. We do
notprioritize security over a product lifecyclc that may span decades. In devices that
have the power to a[ect the physical world.
All loT Agendu 11etwurk c:u11tributor.1· are respu11siblefor the cu11te11t and accuracy uf
/heir pus1.1·. Opi11io11s are ci /he ,wr-ilers and do 110I 11ec:essurily c:011vey t/1e tlruuglrt.1
<~( loT Agenda.
b. Wlrnt is OCTAVE and FAffi explain? (10 Marks)
Ans. Within the industrial environment, there arc a nup1bcr of standards. guidelines. and
best practices available to help understand risk and how to mitigate it. lEC 62443
is the most commonly used standa.rd globally across industrial v~rticals. It consists
of a number of parts, including 62443-3-2 for risk assessments. and 62443-3-
3 for foundational requirements used to secure the industrial environment from a
networking and communications perspective. Also. ISO 2700 l is widely used for
organizational people. process, and information security management. In addition. The first step
the National Institute of Standards and Technology (N IST) provides- a series of criterion. OC
documents for critical infrastructure, such as the NIST Cybcrsecurity Framework impact, , a lue
(CSF). In the utilities domain, the North /\merican Electric Reliability Corporntion's that at an) po
(NERC's) Criticnl lnfrastructure Protection (CIP) hns legally binding guidelines for model. (Whit
North /\merican utilities. and IEC 62351 is the cybersecurity standard for pO\\er model, dcscri
utilities. The seconds
The key for any industrial environment is that it needs to address security holistically with assets, a
and not just focus on technology. It must include people and processes, and it should owners, cust
include all the vendor ecosystem components thnt make up a control system. It is importar
In this section, we present a brief review of two such risk assessment frameworks: information
• OCT/\VE (Operationnlly Critical Thrent. t\ssct and Vulnerability Evaluation) critical.
from the Software Engineering Institute at Carnegie Mellon IJnivcrsity Within this ai
• f/\ll{ (l· actor Anal)sis oflnformation Risk) from The Open Group of the assets
These two systems work toward establishing a more secure environment but with identifying ti
two dilferent approaches and sets of priorities. Knowledge of the environment is key human actor
to determining security risks and plays n key role in driving priorities. There arc, I
OCTAVE than simply
OCTAVE has undergone multiple iterations. The version this sc<.:tion focuses on is the prioritiz
OCTAVE/\ llegro. which is intended lo be a light"cight and less burdcn~ome process technical con
to implement. Allegro nssumcs that a robust security team is not on stnndby or the applicati~
immediately at the ready to initiate a comprehensive security review. This approach with that ind·
rind the assumptions ii makes are quite appropriate, given thal man) operational The third stc
technology arens are similarly lacking in security-focused human assets. Figure 8 the range of
illustratei; the OCTAVE Allegro steps and phases. This refercn
However, it
or even the A
information.
18
- ModeiQ~Petpet"' · l
)lit it
• do ~,.p~: s,q,6:
ldenufy Are.IS
' that ofCouet:n1
ltknufy R,;ks
·yo/
SttpS:
gill.\ lde11tify Thrc>I
SttJI 7:
An;d)t.C Risks
~ SCC03CIOS
SleJ> 8:
M11tt;111on
A1lS~""(l:,ch
20
I. Tiu: identifying specific mitigations that can be mapped to the threats and risks exposed
ration. during the analysis process.
erized FAIR
1ic to FAIR (Factor Analysi~ of Information Risk) is a technical standard for risk definition
tion is from The Open Group. While information security is the focus. much as it is for
ments. OCTAVE. FAIR has clear applications within operational technology. Like OCTAVE.
in the it also allows for non-malicious actors as a potential cause for harm, but it goes to
vledgc greater lengths to emphasize the point. For many operntional groups. it is a welcome
acknowledgement of existing contingency planning. Unlike with OCTAVE, there is
a data a s ignificant emphasis on naming, with risk taxonomy definition as a very specific
apping target.
ge, the FA IR places emphasis on both unambiguous definitions and the idea that risk and
ialysis. associated allributes are measurable. Measurable. quantifiable metrics are a key art:a
factor of emphasis. which should lend itself well to an operational world with a richness of
tribute. operational data.
res an: At its base, FAIR has a definition of risk as the probable frequency and probable
magnitude of loss. With this defi nition, a clear hierarchy of sub-elements emerges.
ats are with one side of the taxonomy focused on frequency and the other on magnitude.
•fin it ion Loss even frequency is the result of a threat agent acting oo an asset with a resulting
eats. In loss to the organization. This happens with a given frequency called the threat
oint that event freq uency (TEF), in which a specified time v.1 iridow becomes a probability.
enarios There arc multiple sub-attributes that define frequency of events, all of which can
in turn. be understood with some form of measurable metric. Threat event frequencies are
applied to a vulnerability. V11/11erability here is not necessarily some compute asset
ty of an wcakm;.ss, but is more broadly dt:11111::d as the probability that tho: targeto:d asstlt will
pacted. fai l as a result of the actions applied. There are further sub-allributes here a~ well.
I to the The other side of the risk taxonomy is the probable loss magnitude (PLM). which
begins to quantify the impacts. with the emphasis again being on measurable metrics.
ation of The FAIR specification makes it a point to emphasize how ephemeral some of these
step are cost estimate~ can be, and this may indeed be the case when information security is
the target of the discussion. Fortunalt::ly for the OT operator. a signiticant emphasis
ccisions on operational efficiency and analysis makes understanding and quantifying costl>
her than much easier.
isk. The FA IR defi nes six forms of loss, four of them externally focused and two internally
walking focused. Of particular value for operational teams are productivity and replacement
controls loss. Response loss is also reasonably measured. with fines and judgments easy to
mented. measure but difficull lo predict. Finally, competitive advantage and reputation are the
·pted nor least measurable.
d by the
Module - 5
terms of 9. a. What do you setup Arduino UNO? (06 Marks)
security Ans. setup() - The setup() function is called when a sketch starts. Use it to initialize
eans of variables, pin modes. start using libraries. etc. The setup function will only run once.
21
VIII Se,m,, (CSE(ISE)
after each powerup or reset of the Arduino board.
Example
int button Pin = 3;
void setup()
{
Serial.begin(9600);
pinMode(buttonPin. INPUT);
}
void loop()
l
II ...
}
b. Explain the IOT netw ork protocol stack ( 10 Marks) IO'.! 15 4 I:, 12
Ans. The Internet Engineering Task Force (IETF) has developed alternative protocols for ~ the adapiat
communication between loT devices using IP because IP is, a flexible and reliable under routinl!.
standard. _The Internet Protocol for Smart Objects (IPSO) Alliance has published ofmthcnCl\~o
various white papers describing alternative protocols and standards for the layers net\\Ork.
of the IP stack and an additional adaptation layer, which is used for communication (3) ~etwork
between smart objects. from the trans
(1) Physical and MAC Layer (IEEE 802.15.4). The IEEE 802.1 S.4 protocol is (ROLL) \\Orki
designed for enabling communication between compact and inexpensive low power Lossy Nerwor
embedded devices that need a long battery life. It defines standards and protocols For such ncl\\
for the physical and link {MAC) layer of the IP stack. It suppo11s low po\,er describes ho"
communication along with low cost and sho11 range communication. ln the case of the nodes after
such resource constrained environments, we need a small frame size. low bandwidth. function is us
and low trnnsm it power. constraints ma
Transmission requires very little power (maximum one milliwatt). which is can be 10 avoi
only one percent of that used in WiFi or cellular networks. This limits the range function can a
of communication. Because of the limited range. the devices have to operate need to be sent
cooperatively in order to enable multihop routing over longer distances. As a result. rhe making of
the packet size is limited to 127 bytes only, and the rate ofcommunication is limited to to nt.•ighboring
250kbps. The codirrg scheme in IEEE 802.1 S.4 has built in redundancy, which makes not depending
the communication robust, allows us to detect losses. and enables the retransmission for\Vard the m
of lost packets. The protocol also supports short 16-bit link addresses to decrease the leaf nodes and
size of the header, communication overheads, and memory requirements. upwards hop b
(2) Adaptation Layer. 1Pv6 is considered the best protocol for communication in follows. We se1
the loT domain because of its scalability anti stability. Such bulky IP protocols were (towards leave
initially not thought lo be suitable for communication in scenarios with low power To manage the
wireless links such as IEEE 802.15.4. nonstoring no
6LoWPAN. an acronym for 1Pv6 over low power wireless personal area networks. nodes arc in a
is a very popular standard for wireless communication. It enables communication route informati
using IP¥6 over the IEEE 802.15.4 protocol. This standard defines an adaptation root. The root
layer between the 802.1 S.4 link layer and the tran~port layer. 6LoWPAN devices can
23
VIII Sem, ( CSE/ISE) I V\t"e,,-vte-C o f ~ i ~
with the path message lo the destination hop by hop. But there is a trade-off here
because nonstoring nodes need more power and bandwidth to send additional route
information as they do not have the memory to store routing tables.
(4) Transport Layer. TCP is not a good option for communication in low power
environments as it has a large overhead owing to the fact that it is a connection
oriented protocol. Therefore, UDP is preferred because it is a connectionless protocol
and has low overhead.
(5) Applicati~n Layer. The application layer is responsible for data formatting.
and presentation. The application layer in the Internet is typically based on HTTP.
However, HTTP is not suitable in resource constrained environments because it is
fairly verbose in nature and thus incurs a large parsing overhead. Many alternate
protocols have been developed for loT environments such as CoAP (Constrained
Application Protocol) and MQTT (Message Queue Telemetry Transport).(a)
Constrained Application Protocol: CoAP can be thought of as an alternative to HTTP.
It is used in most loT applications. Unlike HTTP, it incorporates optimizations for
product,
constrained application environments. It uses the EXI (Efficient XML Interchanges)
and 1,me. S,
data format, which is a binary data format and is far more efficient in terms of
can run on ,t and.
space as compared to plain text HTM L/XML. Other supported features arc built in ·
\\Ondcrful pro
header compression, resource discovery. autoconfiguration. asynchronous message
Installing and
exchange, congestion control, and support for multicast- messages. There arc
Raspbian, the defi
four types of messages in CoAP: nonconfirmablc, confinnable, reset (nack). and and 3, Ml loading
acknowledgement. For reliable transmission over UDP. confirmable messages arc
clicl.. the top let1 P
used. The response can be piggybacked in the acknowledgement itself. Furthermore.
"P~ thon 3 (IDEL)
it uses DTLS (Datagram Transport Layer Security) for security purposes.(b)Message
Queue Telemetry Transport: MQTT is a publish/subscribe protocol that runs over
TCP. It was developed by IBM primarily as a client/server protocol. The clients
are publishers/subscribers and the server acts as a broker to which clients connect
through TCP. Clients can publish or subscribe to a topic. This communication
takes place through the broker whose job is to coordinate subscriptions and also
authenticate the client for security. MQTT is a lightweight protocol. which makes it
suitable for loT applications. But because of the fact that it runs over TCP, it cannot
be used with all types of loT applications. Moreover. it uses text for topic names.
which increases its overhead.
MQTT-S/MQTr-SN i<: an extension of MQlT, which is designed for lo\\ power
and low cost devices. It is based on MQTT but has some optimizations for WSNs
as follows. The topic names arc replaced by topic IDs, which reduce the overheads
of transmission. Topics do not need registration as they are preregistered. Messages
are also split so that only the necessary information is sent. Further, for power
conservation, there is an offiinc procedure for clients who are in a sleep state.
Messages can be buffered and later read by clients when they wake up. Clients \\ hen P) thon 3 lo
connect to the broker through a gateway device. which resides within the sensor l. Entering Pytb
network and connects to the broker. 2. Input/output~
24
~
f hcrc OR
route
a. Ho" do ) 'OU progrnm raspbe rry pi with python? (10 'htr~)
P)1hon is a programming language that has recently become vcr) poµul;,r-so
ower
popular. in fact. that it is no" the fourth most popular language (according to the
.crion
tocol TIOBE indcx).lJnlikc other l;rnguagelt (C. Ct+, etc.). Python is an 1111e1 prc1cd
language. which means ,1,hcn a Python program is cxccutcd,';111 1111erpre1cr reads
tting the Python instructions and then performs the desired action. The: interr rer itself
TIP. is \Hitten in CPU nati,e instnactions, but the P)thon progra111 is not. fhi-, 111eans
it is Python programs can, in theory, run on ANY computer. so long as that computer
has a Python interpreter. This makes Python n cross-platform lm1gu:1g1.,, whh;h
is one of the main rea:.ons "h) it has bcconu.' so p11pular /\ program ., ri11en on
Windo\\ s does 1101 nccd to be rcwrillen to work on a Mm; or a Linux 111,11:hinc. It's
also incredibly simple. cas) tu read. and pti,,crful. II i;;•n ,Hile ad\'anccd programs.
including graphical interfaces, nel\1 orking. parnllelism. and mu-:h more. Python is a
pro<lucti, c language, resulting in faster program production and requiring less code
nnd time. Since Linux has been written for the Pi (ARM core) the Python mterprcler
can run on it and, thercforc. so can Python programs. So, ho,.,, do we \lart with this
wonderful programming language on the Pi?
e an: lnst:111ing :ind Loading the IDE
• and Rnspbian, the default OS choice for lhe Raspbcrl) Pi, should contain both Python 2
sore a1~d 3, so loadi1_1g !'~thon !thould be ca~y lo navigate through menu oplwns. firstly.
more. click the top lell P1 icon on thi! menu bar. Then na'vigate to ··Progn1111ming" and click
ssagc "Python 3 (IDEL)''.
o,cr • L:!11 •:? ~ •• -••W •.,.,r -
lients
nncct
~~----~
~
ation
also
tkes it
annot
m1:s.
0\\Cr
SNs
1eads
sages
ower
state.
lients Lmuli11g Python 3 ll>t
ensor \\ hen P) thon 3 loads. the, isiblc windo" 1, the shell. \\'luch ha l\\ u n1ai11 pu, poses:
1. Entering Python commands: ,;ome quick calc11int1ons
2. lnput/outpul for programs: gelling datn from a 11-;er and tli..,ph~ in:.: infonnation
MW' 25
VIII Se,m,, (CSE(ISE)
. .,..,,.
Sman trtr11--
Th~ 111ni11 !,/wf/ ~·hgent I
The 1•)• ,11s1> ha~ a 1•1en11 oar,,, .. 111·my options, including file opcmtions, editing., S)Stems is t
shcl\ : .;,iic nption.,; (''li.t-.:nup~ p10-rnm'' for example), program 11\!bugging, IDE avoid accident
optio,,s. ,,ml ·H·.•lp... l ur now, ,. c will only concern ourselves "ith creating a new technologies g
file, "hich is don., ~y chckir.f rile> New File. nccch:rometcrs
infrared sensors
,chide mo,cmcnt
Traffic sur-. eillanc
ncl\,ork to each o
.... -=~II (l•l•I,
Rf· ID de, ices. and
Qpen <~•O
~•t.Q<IUe M•U. pnrt, of the cit\ C:
fi~•Ml!f•~ cond1tiom, can be
CL:.• ~oo-•~l"f .oll•C surveillance using
Po•~Q:'11>,W
can .ilso be imple
~q Clr•S sensors. These ap
s.,,,iui,
5a',,!CoP!~
user •~ driving. St
and users nrc usi
CrM>
"""4\Yl."1<10>,\'
,\pphcntions toe
conditions. It nls
now was mainly i
to help drivers
Creating a new file of drivers and he
tired and helping
CreatingFirst Pro~rnm
Once the ne" w indo,~ lo.ids. "c can linall> enter ou,· first Python program. To do application\ are
this, enter the code as shown below into the window. Then, sa,e the file. the steering ( to
name=,.,. application, ,,hi
such as the acccl
age 0
currcntYcar - 0
26
~-- ----·- ---
yearBorn -= 0
name= input("Enter your name: ")
age=- inl(input("Enter your max age this year:•'))
cum:ntYear = int(inpul("What is the current year?:"))
yearBorn = currentYear - age
print("You were born in the year"+ str(yearBorn))
Running Our Program ·
With the code entered and the file saved, i1's lime to run lhc program. Running a
Python program can be: done in one or three ways: press FS in lhc window with the
program to run. go to the menu bar and click Run > Python Shell. or run the file via
a tenninal window as an argument for Python. For now. the easiest w;iy is In simply
press rs in the ,vindow with the code. Once pressed, lhe code should return no
errors. and the shell window should p~ompt for data.
b. Justify the statement " An IOT strategy for smarter cities". (06 Marks)
Ans; Smart transport applications can manage daily traffic in cities using sensors and
intelligent information processing systems. The main aim of intelligent transport
systems is to minimize traffic congestion. ensure easy and hassle-tree pa,-king, and
avoid accidents by properly routing traflic and spotting drunk dri,1ers. The sensor
technologies governing these types of applications are GPS sensors for location.
accelerometers for speed. gyroscopes for direction. RFIOs for vehicle identification.
infrared sensors for counting passengers and vehicles, and cameras for rccofding
vehicle movement and traffic. There are many typ.cs of applications in this area:( I)
Traflic surveillance and management applications: vehicles .ire connected by a
network to ench other, the cloud. and to a host of loT devici:s such as GPS sensors.
RflD devices. and cameras. These devices can estimate lrafiic conditions in different
parts of the city. Custom applications can analyze trafik patterns so that future traffic
conditions can be estimated. They implemented a vehicle tracking system for traffic
survei Ilance using video sequences captured on the roads. Traffic congest ion detection
can also be implemented using smartphone s~nsors such as accelerometers and GPS
sensors. These applications can detect movement patterns of the vehicle while the
user is driving. Such kind of information is already being colleclcd h, Google maps
and users are using it to route around potentially congested areas vf the city.(2)
Applications lo ensure safety: smart transport does not only impl) managing traffic
conditions. It plso includes safety of people trnvclling in their vehicl!:!s, v. hich up till
now was mainly in the hands of drivers. There are man) lo r applications developed
to help drivers become safer drivers: Such applications monitor driving behavior
of drivers nnd help them drive safely by detecting when they arc feeling drowsy or
tired and 111:lping them to cope with it or suggesting rest. Ted111ologie~ used 111 such
applications are face detection, eye movement detection, and pres~ure detection on
the steering (to measure the grip of the driver's hands on the skering).A srn:111phone
application, which estimates lhc drivcr'.s driving behavior using smnrtphonc sensors
such as the acceleromeler, gyroscope. GPS. and cnmcrn. has been propns,•d by Eren
t'
27
VIII Se-n-11 (CSE/ISE)
et nl. It can decide "hethcr the dri, tng i,; safe 01 rash b) an.ii~ 711lg the sensor data
())Intelligent p:uki11g 111anagen1cnt· in n small tr:111sportat1011 '-Y~tcm, parking. 1s
completely hassle free as one can easil) chcd, on the Internet to find out" l11ch parl,,.ing
lot has frec spaces. Such lots use scn-,ors to detect if the slot~ ate free or occupied
by vehicles. This data is then uploaded to a central server.(4)Smart traffic light<::
traffic lights equipped,, ith sensing. processing. and commun1catio11 capabilities .u-c
called smart traffic light~. These light- sense the traflk congestion at thl· inl<!r~cction
and the amount of trallic going each ''- ,l\. I his inform,tti,111 can he anal) Led and
then sent to neighboring traffic light, N •• ccntr:ll ~ont1ollcr It i'> possible to 11',c
this information crcativel). For example. 111 an en11.:rgenc) situation the tralhc lights
can preferential I} gin ,, 1) to 1111 ambulance. \\ hen the smart trnllic light senses an
ambulance com :ng. 11 dears the path for 1t and ,llso informs neighhoring lights about
it. Technologies used Ill thc~e lights arc cameras, communication technologies. and
data an .. ·ys1 :.1oduks. Su,·h 1 y<:t.!ms ha\'C already been deplo} cd in Rio De fanc1ro.
(S)Accl\lent dctcctio11 api'!1c·11ions: a smartphone application designed by White
dete.:ii. the occurn..11ec 1>f ,111 .1cciJcnt \\ ith the help of an accelerometer and acoustic
data It •.~mcdi:11i.:ly ~ends th1 ,niormation along\\ ith the location to lhc ncan.'.st
hospital '11 Ille addi11011al situational information ~uch as on-site photographs 1s Jlso
sent so th'.lt tlw fin,t r.:spondcrs kncm about the \\hole scenario '.llld the degree of
medical hi.:lp th:,t is required.
S)Sh:m or a C
a fac1lit) such
One of the co
loT \\orld ha,
i■lelligcmtl)p
arc the SA nnd
....
a.
is Eighth Seml'ster B.E. Degree Examination,
g CBCS - Moc.Jet Question Paper - 2
d INTERNET OF THINGS TECHNOLOGY
ls: ae: 3 hrs. Max. Marks: 80
re Net~: Atnwer a11y Ff VF. /11/1 q11e.\li1111.~, .1e/ecti11R ONE full 1111e.1·titJ11 from e11ch module.
bn
nd Module- I
SC
Explain the fou r pillars ofIOT &how arl' they interconnected with each other
~ts
(10 Miu·ks)
an
The four pillars of loT arc M2M, RFID, WSNs and SCA DA (Supervisory Contcol
iut and Data Acquisition).
nd
ro.
• M2M u~es devices to capture events, via a network connt:ction to a central server.
lite that translates the captured events into meaningful information.
,tic • RFID uses radio waves to transfer data from an electronic tag attached to an
·est object to a central system through a reader for the purpose of identifying and
lso
tracking the object.
of • A WSN consists of spatially distributed autonomous sensors co monitor physical
or environmental conditions.
• SCA DA is an autonomous system based on closed-loop control theory or a sma11
system or a CPS data connects, monitors, and controls equipment via network in
a facility s_uch as a plant or a building.
M2M
IOT RFIO
SCADA
29
WID'
Iclat~. Eighth Semester B.E. Degree Examin.ttion,
!ng IS
irking CBCS - Model Question Paper - 2
up1ed INTERNET OF THINGS TECHNOLOGY
ight,: • 3 hrs Mull.. Marks: 80
cs an.: N.-: An,i.·l!r an.I' F/J, F /11/l t/W!Miom, lt!lt!L'ling 01\'E/11/1 q11elti1111 from e11d1 mot/11/e.
·ction
d ,llld Module - 1
10 USC
lights Explain the four pill:trs of IOT &how :trc they interconnected with each other
ISC', an (10 Marks)
about Au. The four pillars of lo I are M2M, RflD, WSNs and SCA DA (Supcn isory Control
•s. and an<l Onta Acquisition).
iinciro. • M2M uses devices to caplure events,\ ia a network connection to a cenlral se1vcr.
White lhat translates the c.. ptured c,·ents into meaningful information.
coustic • RFID m,cs radio waves to transfer data from an t:lectronic tag attached to an
nearest ObJect to a central S)Stem through a reader for lht: purpose of identifying and
1s :ilso tracking the object.
.grce of • A WSN consi!.tS of spatially distributed autonomous sensors ro monitor physical
or environmental conditions.
• SCAOA is an autonomous system based on closed-loop control theory or a sma11
S) sti:m or a CPS dala connects, monitors, and controls equipment , ia nelwork in
a focility s.uch as a plant or a building.
IOT RFIO
29
VIII Se,m, ( CSF./ISF.) Interl'\.etof~T~
Four Pillars and Their Relevance to the Network:
-
RFIO Vu No Some
----...
2. RFID : is the 2nd pillar of IOT.
I!]
30
,,,
• t-,todeL,Q~t.Ct'\IPtJ1)e,.,- • 2
RFIO stands for Jbdio Frequency JDcntifica tion and ifs u non-contact ~cch~o,log~
that"s broadly used in man> industries for task~ <,u~h ~., pc_rsonncl tra~ht,ng, access
control. supply chain management, books tracking 111 hbrancs. tollgate S) stems and
~F~;
data
uses radio waves produced b) a reader to detect the ~resenc_c of(then read the
stored on) an RFl D tag rags arc embedded in small 111:ms hke cards. buttons.
or tin) capsules.
,,
/ ,,
I / ,,
I I I /
I I I I
I I I I
in that I I \ \
ion or \ '
e Type
' ' ...... __
'
' ..... __
- - \ol IU I
fficiently
ular area
31
VIII Se,m; ( CSf/ISf) I V\te+-"VU't" o f ~ T~
3. Wireless Sensor Network (Wsn):
WSN is the third pillar of IOT.
Wireless Sensor Network
It's a collection of devices " sensor nodes" . They arc small. inl!xpensivc, with
constrained power. The} an: organi1t:d 111 a cooperati, c network. rhc) communicate
wireless() in multi hop routing. I !cavil) deployment. Changing. network topology
WSN Definition
A sensor nctwo1" is composed of (I large number of scn'>or nodes that nrc dcnscl)'
deployed inside or vel) close to the phenomenon~
random dcployment8
sell:organi,ing capabilities0
(v)Com
Each node of the sensor networb consist of three subs) stcan:O
ll pro"d
Sensor subsystem: senses the en, ironmcn18
Processing subsystem: perfonns local computations on the sensed data0 b. Esplain thl' ap
Communication :;ubS) stem. rcspon'>iblc for mes~age c,chang.c ,, ith neighboring. Aa. Smart Homl':
sensor nodes0
l'hc features of sensor nodes8
Limited sensing region, processmg power, energ} 8 The number o
4. Scada (Supervisory Control ;md O:,t11 Acquisition): 60,000 people.
anal) tics inclu
HMI in\'olved in sm
amount of fun
comrnun~cauon
Suporvlsory
........ t lnte,tacc rapid rate. Th
CADA (
System
AlertMe or N
Baier, or Belk
•
~
¥ PLC Wearables
RTU
$CADA
Just like sma
P,ourammlng year, consum
Apart from t
It is impossible to keep control and supcn ision on all induc,trial acti1 ities manual\)'. such as the S
Some automated tool 1s required" hich can control, supcn ise. collect data. anal) sc:.
data and generate reports./\ unique solution 1s introduced to 111cc1 all this demand i~
SC ADA S}S!Clll
32
C'i3CS . ModcltQ1-<.e1ttowP~eY · 2
Communication Interface
33
VIII Se-vw (CS'f(I SE)
Smart City
Smart cities, like its name sugge!>tl>, i-, 'I big. inn0Hll1on and spans a ,, idc vantl} of
usc cases, from ,,ater distribution and 1raflic manag.l'mcnt to ,,aste management and
Cll\ iron mental monitoring. The n:ason "hy it 1s sv popular is that 11 trits to rcmo\c
the discomfort and problems of peopl. "I• J II\ e m cities loT sulul 1011,; offered in th,·
smart city s1.:1:tor solve various eity-relah:11 problem~. comprb,ng o ftrntnc, reducing
air and noisi;: pollution, and helping IL make 1.1t1es s,1for.
Smart Grids
Smart grids are another area ofloT t1;.!1111ll1J,:() 1hat stands out. A sm,trt grid bai.icall}'
promic;es to 1::\tract information on the hdia, h.>rs of consumen, and electricit)'
suppliers in an 'automated fashion to ,mproH the elllciem:y, economics. and
rcl111bil1ty of dectmil) ,ltstribut,on. 41,000 111011thl) Google searches ,s a testament
to this 1.onccpt's populant) Smart farm mg ,
lndu~tri:1I lntcl'net number of famu
Ont ,,.:) h.. think ol' 1he l11Justrial Internet i,; by looking 111 connected machines 1hat fam,ers \\O
and u vu·•-~ 111 111dustr:c.s ~•1 h ac; pC'\\CI !,(eneration, oil. g11s, and healthcare. II al~o
1
re\ olutiunize th
nrnl-.t:, ·•~1. 0f sttu tt l(lnS wi1t:n.: 1111pl111rncd downtime and system failures can re:,ult in scale attention.
lili:-: :11~11 ng ,,ituat10:1~. ,\ s,.stc,n \:mbcdded \\ ith the fol lend<; 10 include de, ICC!> not lx umh:n:~1
such"" fitm.:~,; t•:.mh for lie·,! 111oni111ri11g. or i.murl home applianc..:s ·1hcsc '-)'-t-.ms upphc,nion heli
are ii' 1c111 n,11 an.I cnn provide ca~c of 11~1.! hut arc not rl!!iable because the) do not
typicall) cr1;ate emergency situation!> if a cto1Antimc wns to occur. ·
Connl-ctctl Car 1. a. fapl:iin the 10
Connl!cted .:ar technology 1s a vast 1111CI an e:-.tensivc network of multiple sensors. Ans. An O,cn ic" a
antc,111,1s, cmb.:ctded sof\wrirc, and technologies that assist in communication to the4 Stage 10·1
navigalc 111 our 1;0111plex world It has 1hc responsibility of mal-.ing dcci~ton~ "ith Data Acquisiti
co11s1stc11cy, act.urae), and Si eccl I• also has tu be reliable. 1 he~I! rcq111remc11ts \\ ill ufthesestap.c~ c
become e, en more crit1c,1! ,,hen humans give up conlrol of the sttc11ng wheel nnd The 4 Stage
brak,·s II> 1he autonomou~, chicles that are being tested on our highwqys right 110\\,
Connected Health (Digital I lealthffcleh1:alth/Telcmedicine)
loT has various applications in healthcare, which are from ,emote monitoring.
equipment to advance and smart sensors to equipment integration. II has the Th Ill Q
potential to improve how phvsicians deliver care and also keep patients safe t1nd
healthy. I lealthcare lo r can nllow patients to spend more 111111:l mteracting ,, ith their
doctors. which can boo~t pati nt engagement nnd satisfaction I ro111 pl!rsonal fitness
sen~ms to surgical robots. loT in hcalthcnrc brings new tools updated" i\h the latest
technulog~ in the ecos) ~tern that hdps Ill developing bener healthcare. loT helps to
revolutio11i1c healthcare and prO\ idc pocket-friendly solutions for both tht: patient
and healthcare professional.
Snu1rt Retail
Retailer:. have started ndopting luT ~olutions nnd using lo1 embedded systems
across II number of applications that improve store operations, increasing purchases.
reduc111g. 1heft, .:nab ling i1l\ cnto1: mnnagcment, and enhancing the com,umer's
shopping experience. Tl11ough luT physical retaileri. can compcte against online
34
.,
chnllengcrs more strongly. They can regain their lost markel share and altr_act
consumers into the store. thus making it easier for the111 to buy more "' 11 ll: saving
money.
Smurt Su11ply Chain . .
'S\.Jf"Jf'tb• 4-'.'.h ,"lli.n'.'. hnvc already been 1:;c\lln~ SITHlt"lcr for a couple of yen1 •. {)ffenng
M>lulions to probhml5 like trncking or !l,OOd~ )Yhile lht:y are on the road or Ill :rnnsit or
hc!p,ng suppliers exchange inventOI)' information are some of the popular dfcrings.
With an loT cnablt:d system. factory equipment that contains emhecil!,..' <.cnsors
fly communicate data about difforcnt parameters. such as pressure. tempe,-- urc. and
lty utilization of' the mad1ine. The loT system can also process workflow , • ' change
nd equipment settings to optimize performance.
ml Smnrt Funning
Smart fanning is an otlcn overlooked in loT applications. However, be, ause the
number of farming operations is usually remote and the large numb;r J livestock
es that farmers work on. all of this can be moniton.:d by 1he Internet ofTh,ngs and can
·,o revolutionize the way farmers oper:ite day to day. But. this idea i~yet to 1\::tch a large-
t in scale attention. Nevertheless, it still remains on<: of the loT application~ tli.,t should
c~s 1101 be underestimated. Smart· fanning has the µ01cn1ial 10 become :11• imponant
nu, application fleld, spccificall} in the agricullurnl-produc1 exporting countril.s.
ot
OR
2. a. Explain the IOT network nrchitecturc nml design ( lO Murks)
ors. Ans. An Overview of the Main Singes in the loT Architecture D,agrnm. In simple terms.
I to the4 Stage loT architecture consi5ls of Sensors and actuators. Internet gi:taways and
ith Data Acquisition Systems, l:dge IT. Dnt,1 center a111.I <.:loud. The dcl:1iled prc,-,entation
"ill ofiheses1aµ,es can be found on the diaµ,rnm below.
and The 4 Stage loT Solutions Architecture
W.
.., . -,
~l.co,1 .,1....
ring s,... , t, 7
lrH1rnu,t G.,tew 1,.,-. ,
the oa,., ~ c1u1-,1tnJ11 f II T
Tnc ""Thinys
-nncl ~·.,,."'II
their Pt,,,,, I , . •r.• rc-h ,e
-,l -.~'
ness ,$,0.. •c-t;
atest 4 _,
,, . . JiJ,I
?S to l
tient
tems
C'J'l ,,1.ma J
1
C e"tfJJ :O}''i:
•r.,,l'f'O".-~~•
Ow .;f"<J~ n•.;
*
--1~
§:[.,;1
35
VIII Se-wt1 ( CSf./ISf.)
To get the proper understanding ofthl main actions and the importance of each stage
in this process, refer to the detailed reviewl> prt>sentcd below.
Stage 1: Networked things (wireless ~eusors and actuator~)
The outstanding feature about :,cm,ors is their abilit) to convert the information
obtained in the outer world into data tor analysis. In other words. it's important to
start with the inclusion of sensors in th..: 4 stages of an loT architecture framework
to get information in an appearance lh:1t can I v actually processed. r-or actuators. the
process goes even further- these de\ ices 1re able: to inu.:n em: the physical reality.
For example. they can switch off thl. light and :idjust the tempcrnture in a room.
Becau.,.: 0f this, sen,,ing and actual iug stag,e 1,;o, en, and adjusts everything needed in
the physical wnrld to ~•ain the ncce~sary insights for furthe1 analysis.
Stngc:? ~Clll>Or tlata aggreg:1'ion S)stt•ms 'lnd analog-to-digital data conversion
Even ,houth this stage of lol ,irchitectun: still means working in a close proximity rce -
with , ,i•·or~ and a'.'lu::itor.- in'i ~net getaways and data acquisition sysh:ms (DAS) cater to >our c
appca,· 'll'' 110. SF'~cifh:.ill) 11,e later conn~ct to the sensor network and aggregate to-end xperie
out1'11:1 , , ii.: I 1tcrnc1 gct,1•· , , , work through Wi-Fi, wired LANs and perform firrn,\a .
furths:, ·,, ",sing. I l•\! • ,\.,I impor1 •II~~ of this slagc is to process the enormous Layer I: ·n:.ors
amou •, ' ,nfonnatior l.ulh:c!<:d on the previou~ stage and squeeze it to the optimal Sensors h, ve bee
size tor furt,1er ·10al) s 1~. i.k,1de~, the necessary conversion in terms of .timing and healthcare aviati
strnl.l'lr: happen~ her1,; In sl:ort, Stage 2 1nak;;s data both digitalized and aggregated. and inexp nsive t
Stage 3. T hl· appeanmce of edge IT systems profession lly. Th
During thb monu.:nt among the stages of loT architecture. the prepared data is connected cnsors
transfeircd to the IT world. In particular, edge IT systems perform enhanced anal) tics Layer 2: icroc
and pn:-processing here. for c.\illl1pk. it refers to machine learning and visualization The secon layer •
technologi~s. At the same time, som\~ -idditional processing may happen here, prior data proc ssing a
to the stage of entering the data center. Likewise. Stage 3 is closely linked to the connectivity to s
previou , µhases in the building of anarchitecture ofloT. Because of this. the location generate o er I 0,
of edge IT ~ystcms is close to the one where sensors and actuators are situated. before se ing it
creating a wiring closet. At the same time. the residing in remote offices is also your colle ted da
possible of unnece sary d,
Stage 4. Analysis, management, and storage of data on data tr nsfer a
The main processes on the last stage of loT architecture happen in data center or Your mic ocontr
cloud. Precisely, it enables in-depth processing, along with a fol low-up revision for loT device store
feedback . Here, the skills of both IT :ind OT (operntion:11 technology) professionals datab:ise Your
arc needed. In other words. the phase already inclu$les the analytical skills of the hold data some
highest rank. both in digital and human worlds. Therefore. the data from other
sources may be included here to ensure an in-depth analysis. Aller meeting all the In some · ses. y
quality standards and requirements, the information is brought back to the physical take acti n and t
world-btll in a processed and precisely analyzed appearance. via a clo d appl'
Stage 5. loT Architecture when a nsor d
In fact. there is an option to extend the process of building a sustainable loT aastomc .
architecture by introducing an extra stage in it. It refers to initiating a user's control
over the structure if onl) your rc~ult doesn't include full automation. of course.
36
The: main tasks here arc visualization and management. After including Stages_. th.:
stage
system turns into a circle where a user sends commands to se1~sors/actuators (Stage
I) 10 pt:rform some actions. And the process starts all over aga111.
ation Ill. Explain the layered IOT Stack. . . . . (06 _Mar~)
ant to The loT stack is rapidly developing and maturmg mto the Thmg Stack. Tiu~ Thmg
work Stack consists of three technology layers: sensors. microcontrollcrs and mternet
s. the connectivity, and service platforms. . .
•ality. • Layer One - Sensors nre emheddt:d in objects or the physical environment lo
·oom. capture information and events for your company.
ed in • La)·er 1\vo - Microcontrollers and internet connectivity share information
captured by sensors within your loT objects and act based on this information to
rsion change the environment.
ximity • Layer Three - Through the aggregation and analysis of data, service platforms
(DAS) cater to your customers. Service platforms also control your loT product's end-
,regal~ to-cnd experience and enable your customers to define system rules and updntc
i:rform firmware.
rmous Layer l: Sensors
ptimal Sensors have been used for years in a number of different industry contexts. like
ng and healthcare. aviation. manufacturing and automotive. Now, sensors are so tiny
gated. and inexpensive they can be embedded in all the devices you use personally and
professionally. The sensor layer of the loT tech stack continues to expand as internet
data is connected sensors are added to new products and services.
ial) tics Layer 2: Microcontrollcrs and internet connectivity
lization The second layer in the Internet of Things technology stack allows for local storage.
e, prior data processing and internet connectivity. The Internet of Things needs internet
cl to the connectivity to send collected data to your cloud database. Because some sensors
ocation generate over I0.000 data points per second. it makes sense to pre-process data locally
ituated. before sentling it to your cloud database. By analy1.ing. extracting and summarizing
is also. your collected data before you send it to your cloud database. you reduce tht: volume
of unnecessary data you send to and store on your cloud database. saving you money
on data transfer and storage costs. ·
enter or Your microcontroller is a small computer embedded within a chip and it helps your
sion for loT device store and pre-process collected data before it's synced to your cloud
sionals database. Your microcontrollcr possesses a processor. a small amount of RAM to
s of the hold d11t11, some kilobytes of EPROM or flash memory to hold embedded software.
n other and solid-date memory to cache data.
, all the
In some cases. your loT device may need to use programmable microeontrollers to
hysical
take action and turn something on or off. In most cases, tht:se decisions are made
via a cloud application, but it makes sense to use progrnmmable microcontrollers
when a sensor detects something that could affect the health and safety of your end
ble loT customers.
control
course.
37
VIII Se-rn,, (CSEIISE) I vit"evn.e,t- o f ~T ~
The main and most important capability of this layer is neL\1orking. which is either
wireless or wired. _If a device is stationary and can access an external power source. a
wired network is sufficient. but a v.,ircd network docsn ·1 make sense for many loT use
cases because physical cables are needed to connec' to the ndwork. WiFi, wireless
modems, and wireless mesh ne_tworks an, the most common ways loT devices arc
connected to the internet.
If you plan to manufacture an loT device you must keep in mind dependencies for
your use case;.. Is your device mobile or fixed? Does your device need a battery or
is it connected to a fixed power supply? How much data do you need to transfer
to your cloud database per hour? Should your device's connectivity be episodic or
cont ii1uous?
Devices you use to track your health and fitness while bicycling. running, and
exercising store data while you· re active, and these devices use episodic connectivity.
Your device then syncs with the cloud v. hen it's dose lo your smartphonc or tablet.
Compare this to the continuous connectivity needed by i\mazon Echo's, oice based
digital assistant who is always listening for your commands, fetching answers from
the internet the instant you ask a question. Deptmding. on your loT product"s ust:
cases, you may need continuous connectivity.
When you research Think Stack vendors, you'll notice a wide range of different
networking protocols, hardware. software, and architectures arc used 10 build loT
products. Due to the variation in use cases and environments, you have many choices
when it comes to adding nerworking and computing capabilitie~ your loT device.
While some vendors focus more on hardware components, other vendors provide a
system of integrated software and hardware. Sometimt!s, loT sol'rware solutions spill
into the third layer of the Thing Stack. wliich is referred to as the service platform.
Layer 3: Service platforms
The first two layers for the Thing Stack embed sensor and microcomputers in your
loT device. but your loT product profits from the service platform layer. This layer
delivers value to your customers by automating processes and delivering rich data
analytics. Your cloud application combines data collected from numerous loT
sensors with your (or your customers) other business data lo produce insights that
generate business value.
It's important for your service platform 10 create a feedback loop between your
loT devices and you device management software, so you and your customers can
upgrade, monitor, and maintain the firmware on each_ t_hc device_. In most cases.
service platforms operate on cloud infrastructure and ut,hle a muln-t_enant software
architecture to deliver a seamless software-as-a-service (SaaS) expern:ncc. .
The convergence between our digital and physical worlds stresses your _IT ~pe_rallons
!t
by increasing demand for data management. storage, tagging and analys,_s. s Ill your
company's best interests to build your loTscrvice platfonn on robust clouct mtrastructure.
so you can scale infinitely as your business grows with your new loT product.
38
is either V. hile "<,OID\;ire i<; eating the world" as Marc Andreessen said, consumers and
·ource. a companies still purchase a lot of physical things. If )OU build a robust loT service
loTusc platform. ) ou obtain insights About how your customers use your loT products
\\ irelcss and scr\ ices. With a great loT sen ice platform you can manage post-trnnsaction
ticcs an: n:lu1ionships in new and engaging ways.
Because sef'I ice platforms store and make decisions based on data collected from
'lcies for all t) pes of loT de, ices. the) arc often considcn:d the backbone of post-tram,actions
tlery or relationships. Whnt if }OU reached out to cw,tomers ,, hone, er powered on your loT
trans for de, ice and proactivcly on-boarded them? What if you used anonymizcd data from
sodic or customers getting the most ,·aluc out of your product to help other customers unlock
more business value?
ing, and Now is the time 10 build ) our lo r product because sensors keep gelling smaller and
ectivity. network connectivity solutions keep gelling better. Create a plan to engage with your
r tablet. customers and facilitate gonl-orienh.:d post-transaction relationships, leading to nc,"
:c based opporluni1ii.::s 10 turn u single transnction in10 a ~trong n:lationship.
,rs from Module - 2
1ct·s use
J. a. \ Vh:it :ire sensors, nctuntors and s m:1rt objects in IOT'! (04 Marks)
Ans.
itfen:nt
ild loT OASIS FOR
choices SENSORS ACTUAT O RS
COMPA RISON
device.
rovidc a Used lo measure thi: co111i11uous and lmpd co111i11uous and discrete
Basic
<liscrct,: process variables. process,:~ parnrncters.
ms spill
tfon11. Plac.:d at Input pon Output port
Outcome Electrical signal Meat or motion
in your
is layer Magnetometer, Cameras. I.ED. Laser, Loudspeaker.
ch data Accelerometer, microphones. Solenoid, motor controllers.
1us loT
b. What 11rc smart physic.ii objccts'in IOT '! (06 Marks)
its that
A ns. The concept smart for a smart physical object simply means that it is active, digital.
;n your net\\orked, can operate to some ell.tent autonomously, is reconfigurable and h11• local
1crs can control of the resources it needs such as cncrg}, data storage, elc. Note, " smart
t cases. ob~cct_ do~s no_t necessarily nee~ lo be intdligenl as in exhibiting a strong essence of
oftware art1fic1al mtell1gence although 1t cnn be designed to also be intelligent.
Physical world smart objects can be described in h.:rms of three properties:!
• Awareue.u: is a smart object's ability to understand (thut i~. sense, interpret, and
erations
in your react to) e, ents,and human activitie~ occurring in the physical world.
ructure. • ~epre~·e11tatio11: refers to a smart object's applicalion and programmmg model-
'" particular, programming abstractions.
• Interaction: denotes the object's ability to converse with the user in terms of
input, output, control, and feedback.
39
VIII Se-m, ( CSE/IS£)
Based ~1~on these properties. these have been classified into three types:
• Ac:1tv1ty-Aware Sm art Object.\·: ~re objects that can record information about
work activities and its own use.
• ~olic:y-Aw11re Sm art Objects: Are objects that are activity-aware Objects can
interpn:t events and acti, ities with respect to predefined organi7.ationnl policie~.
• Process-Aware Smart Objects: Processes play a l'undan1ental role.: in industri,11
work management and operation. A process is a collection of n:latcd activities or
tasks that are ordered according to their position in time and space.
c. What arc the primary components of smart connected products? (06 Marks)
Ans. Smart. connected products have three primary components
• Physical - made up of the product's mechanical and electrical parts.
• Smart - made up of sensors, microprocessors. data storage. controls. software.
and an embedded operating system with enhanced user intcrfru.:e.
• Connectivity - made up ofpo11s, antennae, and protocols enabling wired/wireless
connections that serve two purposes, it allows data to be exchanged with the
product and enables some functions of the product to exist outside the physical
device. ·
Each component expands the capabilities of one another resulting in "a virtuous
cycle of value improvement"'. First. the smart components or a product amplify the
value and capabilities of the physical components. Then. connectivity amplifies the
value and capabilities of the smart components. These improvements include:
The embed cd sys
• Monitoring of the product's conditions, its external environment. and its of these ty cs cont
operations and usage. The csscnti 11 com
• Control of various product functions to belier respond to changes in its like Motor la 68H
environment, as well as to personalize the user experience. factor that ifferen
• Optimization of the product's overall operations based on actual performance their intern I read ·
data. and reduction of downtimes through predictive maintenance and remote and system archite
service.
• Autonomous product operation, including learning fro~ their environment.
adapting to users' preferences and self-diagnosing and service. ~ Sen
OR ';.\VJll.
~
(04 Marks) :;~-r,U
For example: . . · n
• Pneumatic actuators use compressed all' to ge'.1cratc motto .
• Hydrolic actuators use liquid to generate motion.
• Electric actuators use an external power source, such as a battery, to genera1e
motion. .
• Thermal actuators use a heat sour<.;e to generate motion.
40
IL What arc embedded systems in IOT? EAplnin ( 12 Marks)
ation about It is essential to know nbout the embedded clt.:vices while learning the loT or building
the projects on loT. The embedded devices arc the objects that build tht: unique
bjects can computing system. l'he!>c !>) stems may or may not connect to the Internet.
policies
1:11 An embedded device system genernlly runs :is n single npplication. llowevcr, these
n in<lusll 1,11 devices can connect through the internet connection, and able communicate through
nctivitics or other network devices.
(06 Ma rks)
ltmboddod
Sy•to,n
' ....
, solh~are. ,'
'd/wircless
d with the
he physical
·n virtuous
amplif) the lotcru @t of Thin&• tJoTJ
1vironment.
~--~
L \len-.or"· mn->ed 10
_ _ _ _;.;"'-""~'c,,:.;~:-"
J
.. _ _ __
' Se~
:
~
........
..... .,. t=.:>-
,:.,~ M iorocoat<oll•r
Controll•I'•
0-l Marks)
~Ju
:;,., .. -:,.
·..·.,·'·";I, •• 8051
tOFS.s
4TMl:0
ate motion. ~
::.,·-,t!!\
X.4\fll~
c=> 108.
03HC:1
r F•t
Dl•play Uatt
o generate - c..• ,nen1 Du la,· LC 0 <.,,11,~J"., C,15p1a,
41
VIII Semt (CSf(ISE) I nt"er-vi.et o f ~T ~
OR
5. a. Lis t the bus iness case for IT and IOT (05 Marks)
Ans. I) Recognise the need for a business case
2) Start on the shop floor
3) Identify meaningful data
4) Employ predictive analytics
5) Track your products and assests
6) Create a new revenue modela
7) Move from drawing board to reality
8) Choose the right 1OT platforms and pa1111crs
9) Build a proof of concept
I 0) Roll Olll at scale
b. Wh:11 is me:rnt by device management in JOT (05 Marks)
Ans. Device management (DM) is an es~ential part of the loTand provides efficient means
to perform many of the management tasks for devices:
applicatio s in he
• Provis ioning: Initialization (or ~ctivation) of devices in regards toconfiguration bits p.ir s ·ond) ,
and features to be enabled.
lo" cost a cl ener
• Device Configuration; Management of device st::ttings and parameters.
recent sma phon
• Software Upgrndcs: Installation of firmware. system software. and applications
on the device. 3. Low-R tc, Lo,
are anothe key le
• Faull Management: Enablt::s error reporting and access to device status
4. IP, 6 N vorki
1
t In tht: simplest dt::ployment. the devices communicate directly with the DM server.
making tic fact
This is. however. not always optimal or even possible due to network or protocol
\ariouscap hilitie
constraints, e.g. due to a firewa ll or mismatching protocols.
7 It is fores cable t
t In these cases. the gateway functions as mediator between the server and the
ii can som how co
devices. and can operate in three different ways:
S. 6LoWP N
• If the devices are visible to the DM server. the gateway can simply forward the
(1Pv6 Over Low ~
messages between the device and the server and is not a visible participant in tht:
by the 6Lo PAN
session.
The 6LoW AN c
• In case the devices arc not visible b11t understand the DM protocol in use, the and should e appl
gate\\ay can act as a proxy, essentially acting as a DM server towards the device
limited pro cssing
and a OM client towards the server.
6. RPL
• For deployments where the devices use a different DM protm:ol from the server. 1Pv6 Routi g Prot
the gakway can represent the devices and translate between the different protocols
Low-Powe and L
(e.g. TR-069. OM/\-DM. or CoAP). The devices can be reP.resented either a~
routers an their i
virtual devices or as part of the gate\\ ay.
constraints n proc
c. What :ire the netw orking technologies adopted with IOT (06 Marks) 7. CoAP
Ans. I . Power Linc Communication Constrainc Appli
(PLC) refers to communicating 0\ er power (or phone. 1.:om,, etc.) lines. compute-c nstrain
This amounts to pulsing. with various degrees of power and frequency, the electrical
line~ used fur power distribution. l'LC comes in numerous navors. At low frequencies
42
(tens to hundreds of 1-krlz) ii is possible to communica1c over kilometers with low
bit rates (hundreds of hits per scco nd >· · and was seen
Typically. this type of communication was used for remote mcte~•ng. . ·
as potentially useful for the smart grid. Enhancement~'? allow higher b_,1 rates have
led to the possibility of delivering broadband connec11v1ty over power Imes.
1. LAN (and WLAN)
7 Continues to be important technology for M2M and loT applications.
7 This is due to the high bandwidth, reliability. and lt:gncy of the technologies.
Where power is not a limiting factor. and high bandwidth is required. devices may
connect seamlessly 10 the Internet via Ethernet (IEEE 802.3) or Wi-Fi (IEEE 802. I I).
The utility of existing (W) LAN infrastructure is evident in a number of early loT
applications targeted at lhc consumer market, particularly where integration and
control with smart phones is required.
2. Bluetooth Low Energy
rks) (BLE; .. 131uctooth Smart'') is a recent integration of Nokia's Wilm:c
eans standard with the main 131uc1001h standard. fl is designed for short-range(,50 m)
applications in healthcare, filness, security, etc., where high daln mies (millions of
tion bits per second) arc required to enable application functionality. It is deliberately
low cost and energy efficient by design, and has been integrated into the majority of
recent smart phones. ·
tions 3. Low-Rate, Low-Power Networks
arc another key technology that form the basis of the loT.
4. rPv6 Networking
crver. making the fact that devices arc networked, wilh or without wires, with
tocol variouscapabilities in terms ofrl\ngc and bandwidth, essentially seamless.
T It is foreseeable that the only hard requirement for an embedclt:d device will be that
d the
it can somehow connect with a compatible gateway device.
S. 6LoWPAN
rd the (1Pv6 Over Low Power Wireless Personal Area Networks) was developed initially
in the by the 6LoWPAN Working Group (WG) of the IETF
The 6LoWPAN concept originated from the idea 1h111 "the Internet Protocol could
sc, the and should b.: applied even 10 the smallest devices•·. and that low-power devices with
cvice limited processing capabilities should be nble to participate in the Internet ofThings
6. RPL
erver. IPv6 Routing Protocol for Low-Power and Lossy Networks. Abstract
locols
Low-Power and Lossy Networks ( LLNs) are a class of network in which both the
her as routers and their interconnect are constrained. LLN routers typically operate with
conslrainls on processing power. memory, and energy (battery power).
11rks) 7. CoAP
Constrained Application Protocol (CoAP) is a prot,,col that specifies how low-power
compute-constrained devices can operate in the internet of things (loT).
ctrical
cncies
43
VIII Se,m,, (CSE/ISE)
OR a.AMQP
TIie Advanced Me sage
6. a. Explainany threelOT application layer transport mctho<ls with suitable
lhe financial indus ry. Sec
illustra tions. (12 Marks)
Its run over TCP. MQT
Ans. A.MQTI
messaging. AMQ
MQTT (Message Queue l'elemctry Transport) wa!. dc\clop by or introduci:: by IBM
then forward it, a
in 1999 and standardiLcd by OASIS in 2013 to target come up with lightweight
Rliability. Show i figure
M2M communication. It is publish/subscribe protocol architecture similar to client/
Exchange rcspon ibilit>
server protocol show in figure below. The importance ofMQTT protocol is due to its
simplicity and the 110 necll of high CPU and memory usage (reason is the lightweight Queues is based o pre-d
protocol). MQTT supports a\\ ide range or different de\ ices and mobile platforms. subscribers .,..hos bscrib
At transport layer TLS/SSL security provide to MQTf. Pl' /SHERS
.. Examples: Cs.
BROKER a da•I)
PCs. ICtS
a11dan.1 Servers
Queue alld Top1r titd potm
derice.s
C. COAP
Co/\P (com,train d appl
Show above figure there three component arc there publishers, a broker and embedded devic 'S whe
subscribers. Publishers are genernll) lightweight sensors that connect with a broker Currently there s I ITT
and send data to a broker and go back to sleep. Subscribers are loT applications that HTTP has many featurei
interested in data send by sensors and also connect with a broker. so broker send will need more csourc
mechanism. No for lo
interested data to subscribers. The brokers classif) senS0f) data in topics and send
them lo subscribers interested in the topics. This nil thing is on loT point of vie,~. protocols and .,.. can o~
MQTT provide 3 option to achieve message in As CoAP is a Re tful w~
uses client/serv model
Quality of Services (QoS): networks with I w overt
1. On( delivery (al mosc):
better protocol mparc
Deliver message according to best try of the network. • Co/\P runs er U
An acknowledgment b not required. LO\\ est le\ el of QoS. handshake fore da
2 . One delivery (at lcnst): • Co/\P prot
At least one message can be send and some duplicate message are there. An transfer as it uses fo
acknowh.:dgment is required • It suppon ~ r types
3. O n d e livery: _ I) Cnnfinnablc 2) No
Additional protocol required to ensure that one and only one mcs,ag.c <;end. It 1s Response laye used t
highcst le\cl of QoS. n:sponse, 3) n con
ocher
44
-------
B.AMQP
The Advanced Message Queuing Protocol (AMQP) is a protocol that across from
the financial industry. Security is manage with the use of the TLS/SSL protocols.
irks) Its run over TCP. AMQT is follow publish/subscribe communication protocol for
messaging. AMQP is same like MQTT but AMQT have advantage its store data
IBM then forward it, and this feature used at when network disruptions that tiroe ensures
reliability. Show in figure below a broker divide into two part exchange and queue.
Exchange responsibility to receive publishers messages and distribute to queue.
to its Queues is based on pre-define roles and condition.and it's basically send message to
eight
subscribers who subscribe those data.
rms.
PUBLISHERS Publish - - - = = : : : - - - - - .Subscrib• st:BSC/UIJrRS
BROKER
Qme,nd
uampies· Top:t endpoint Se1wrs
PCs,
and any r-----111r~,,av~~ ~ Sm·ers
dei'ices
Qu1t1•and
Topic endJ»!m
C.COAP
CoAP (constrained application protocol) is used for low power and low memory
embedded devices where it can be used for communication instead of HTTP.
er and
Currently there is I-ITTP protocol available with request/response paradigm but
broker
HTTP has many features and more footprint [5). I ITI' P runs over TCP where TCP
,ns that
will need more resources due to three way handshake and many more complex
r send
mechanism. Now for low power embediled devices, there is no need of this heavy
d send
protocols and we can optimize it to run ov~r TCP.
iew.
As CoAP is a Restful web transfer p.otocol for use with constrained networks. CoAP
uses client/server model of approach same as HTrP. It is designed for constrained
networks with low overhead and lower footprint. Some points for CoAP that makes
better protocol compared to HTTP is:
• CoAP runs over UDP (User data-gram protocol) that helps to avoid costly TCP
handshake before data transmission
• CoAP protocol is only 4-bytc header and provides reliable transter and no reliable·
re. An
transfer as it uses four types of messages.
• It support four types of message
I) Confirmable, 2) Non-Confirmable, 3) Acknowledgement and 4) Reset. Request/
d. It is
Response layer used those message and classified in I) Piggy-backed, 2) Separate
response, 3) Non confirmable request and response, and communicate with each
other.
45
VIII Se,m., ( CSE/IS£)
b. Write expla natory notes on netwo rk optimi1.ation in IOT. (04 Marks)
Ans. Generally, networl-. optimimtion is dl!ffoed as the tcchnolo~y used to improve the
performance of the network for any environment. This plays an important role in IT.
as day by day large amount uf data frum vnrious kinds of devices nnd applications
are being populated into the netv.ork. Networl.. optimization otfors various benefits
such as faster data rate, data recovery, eliminating redundant data and to increase the
response time of application and network. Network optimization in loT is gaining
increased attention due to the e:-.pectation of a high increase in traffic from loT things
and objects, as billions of loT devices are expected tu connect global network in
the coming )'ears. Due to this, it is obvious for researchers nnd operators to provide
efficient solution to optimize loT networl-.s to reduce the loT generated traffic
impacting other services in the network and to utilize network resourceefficiently,
The traffic generated by loT devices is different from the cellular network due to
heterogeneity in applications and device types. Additionally, loT traffic needs to be
regulated to monitor the working of loT devices and its services. loT application
generates fewer amount of data, however integration of devices 10 the application
generates tht: higher volume of traffic because of control plane mt:ssagcs. Hence
this non-application traffic puts a significant additional burden on the nt:twork. So to
· overcome from this burden, efficient mechanism is require(_! to addrei.i. and optimi1.e ai.:uon t at can
the control plane messaging from loT devices. make u e ofth
OR b. Explain I e com
7. a. Justify the stateme nt" Me rging Data Analytcs and IOT will pos itive ly impact Au. I ) Erosio, of Ne
business. {08 Marks) lwo of th major
Ans. Network optimization in loT is gaining increased allention elm: to the expectation of design and ongoin
a high increase in traffic from loT things and objects. a~ billions of loT etc\ ices arc th:itnch\ ol,.s \\ er
expected to connect global nct,\orl-. in the coming }Cars. Due to this, 11 •~ ob, ious or no ~·on ccti, it
for researchers and operators lo provide elftcient solution 10 optimi7e loT networks sullic icnt no" le
to reduce the loT generated tralftc impacting other sef\ ices in the network and to threat ton t\\iorl-.
utilize network rcsourcecfficiently. The tranic generated by loT devices is different or the nctl or!.. he
,t is b,:ttcr o t.. no,
from the cellular network due to heterogeneity in applications and device types.
Additionally, loT traffic needs to be regulated to monitor the working of loT devices communi ation p
and its services. loT application generates fewer amount of data, however integration solid dc!>i n to b
ofdevices to the application generates the higher volume of traffic because of control to har<lwa ·c and
plane messages. I lencc this non-application 1ralftc puts a significant additional Thi~ 1-.in<l of or,
burden on the network. So to o,ercome from this burden, efficient mechanism is nnd the i) troduct
required to address and optimi1.e the control plane messaging from loT de, ices. considcrn.iit•n of
• Competitive Edge: loT is a bull.word in the current era of technology and there poorl~ C'O trolk
are numerous loT application developers and pro, idcrs present in the market. or ina<lc1.1 1ate nc
Tiu:: use of data analytics in loT investments,, ill provide a business unit to offer In man)
better services and will, therefore, provide the ability to gain a competitive edge that arc
Pd ork
in the market.
46
r
1) . d anal tics lhat can be used and applied in the loT
le There are differe~l types of ataS y f th St: types havi: been listt:d and described
investments to gam advantages. ome o e
r.
s below. . .... Tl . . r of data analytics is also referred as event stream
• Stream1ng Analytics. us orm R 1 t' c data streams are
ts processing and it analyzes huge in-motion _data.sets. ea. - im ct· t'ons loT
he in this roccss to detect urgent s1tua11ons and imme iate ac I ••
ng :;;~t::ions base~ on financial transactions. air fleet tracking, traffic analysis etc.
gs can benefit from this method. • . z
in • S atial Analytics: This is the data analytics me_thod .that is used to ana 1_Y,-t:
1de g:ographic patterns 10 determine tht: spatial rclat1onsh1p b~tween t!1e ~hy~i~al
C objects. Location-based loT applications, such as smart parking applications can
t\y. benefit from this form of data analytics. . .
to • Time Serk-s Analytics: As the name suggests, this form of d?ta analytics 1s
o be based upon the time-based data which is analyzed to reveal as~oc'.ated trends and
tion patterns. loT applications, such as weather forecasting app~1cat1ons and health
tion monitoring systems can benefit from this form of data analy11cs metho~. . _
ence • Prescriptive Analysis: This form of data analytics is the comb11rnt1on of
So to descriptive and predictive analysis. It is applied to unde~stnncl tl.1e b_est ?eps of
imizc action that can be taken in a particulal'situation. Commercial loT applications can
make use of this form of data analytics to gain better conclusions.
b. Explain the common challenges in OT security (08 Marks)
1pact Ans. l ) Erosion ofNetworkArchitccturc
arks) Two of the major chnllenges in securing industrial environments have been initial
ion of design and ongoing maintenance. The initial design challenges arose from the concept
cs arc that networks were sale due to physical separation from the enterprise with minimal
vious or no connectivity to the outside world, and the assumption that attackers lacked
vorks sumcient knowledge to carry out security attacks. The challenge. and the biggest
and to threat to network security, is standards and best practices either being misunderstood
fferent or the network being poorly maintained. In fact, from a security design perspective.
types. it is better to know that communication paths are insecure than to not know the actual
eviccs communication paths. It is more common that. over time, what may have been a
ration
solid design to begin with is eroded through ad hoc updates and individual changes
ontrol to hardware and machinery without consideration for the broader network impact.
tional
This kind of organic growth hns led to miscalculations of expanding networks
ism is
and lhe introduction of wireless communication in a standalone fashion, without
s. consideration of the impact to the original security design. These uncontrolled or
, there
poorly controlled OT network evolutions have. in many cases. over time led to weak
arket. or inadi:quate network an·d systems security.
offer
In many industries, the control systems consist of packages. skids. or components
edge
that arc self-contained and may be integrated as semi-autonomous portions of the
network. These packages may not be as fully or tightly integrated into the overall
47
VIII Se,m,, (CSE/IS£)
control system. networl.. management tools. or securit) applications, resulting in
potential risk.
2) Pervasive Legacy Systems
Due to the static nature and long lifccyclcs of equipment in industrial environments.
many operational systems may be deemed legacy systems. For example, in a power
utility environment, it is not uncommon to have racks of old mechanical equipment
still operating alongside modem intelligent electronic de, ices (lEDs). In many cases.
legacy components arc not restricted to i!>olated net\\Ork segment!> but have now
been consolidated into the IT operational environment. from a security perspective.
this is potentially dangcrom, as many devices may have historical vulnerabilities or
weaknesses that have not been patched and updated, or it may be that patches are not
even available due to the age of the equipment.
Beyond the endpoints, the communication infrastructure and shared centralized
compute resources are often not built to comply with modern standards. In fact.
their communication methods and protocols may be generations old and must be
interoperable with the oldest operating entity in the communications path. This
includes switches. routers. firewalls, wireless access points. servers. remote access
systems, patch management, and network management tools. All of these may have
exploitable vulnerabilities and must be protected.
3) Insecure O perational Protocols dch\C'') o mcssa
Many industrial control protocols. particularly those that arc serial based, "ere "cal..ne,,,·rom a
designed without inherent strong security requirements. Furthermore, their operation unsoli~itc rc~po
was often within an assumed secure network. In addition to any inherent weaknesses s.ccunt) c emcnt
or vulnerabilities. their operational environmt:nt may not have been designed wuh
secured access control in mind.
The structure and operation of most of these protocols is often publicly available.
scc:urit,t"
lhc abili~ to tm~
s pr
has been ddrc~,
6) ICCP l ntcr-
While they may have been originated by a private firm, for the sake ofinteropcrability.
they are typically published for others to implement. Thus, it ~ecomes a rela~i~ ely lCCP is a ~omrn
used to mmu
simple matter to compromise the protocols themselves and ~ntroduce ma_l1c1ous
bet\\ ccn iffere
actors that may use them to compromise control systems for either reconna1ssa_nce
expose a utilil)
or attack purposes that could lead to undesirab_le imp~cts in normal syste~ opcra11~n.
)OPC OU:
The followingseetions discuss some common industrial protocols and the_ir respecuve
OPC is a...ed
security concerns. Note that many have serial, IP, or Ethernet~based ,crs~ons: and the
Embed mg (0
· chaIIen ges ,and vulnerabilities are different for the d1t1erent variants.
security doma111 oml P"
4) Modbus T• d facturing acro~s I indu!>
Modbus is commonly found in many industries. such as ~111 111es an manu . • In indu trtal c
environments. and has multiple, ariants (for examph:, serial, TC~/1 P). It was ere:::: of the ontrol
b) the fir.,t programmable logic controller (PLC) vendor, Mod1con, a~d _has . around OPC
in use since the l 970s. It is one of the most widely used protocols 1_n t~duStnal
deployments, and its development i~ ~~verned by the Modbus Organ1zat10~: 1:he
security challenges that have existed with Modbus arc not unm,ual. Au1henuca11on
48
of communicating endpoints was not a default operation because it would alto,~ an
inappropriate source to sead improper commands 10 the recipient. For example, tor a
message to reach its destination. nothing more than the proper Modbus address and
function call (code) is necessary.
Some older and serial-based versions of Modbus eommunic;ite via broadcast. The
ability to curb the broadcast function does not exist in some versions. Th~re _is
potential for a recipient to ;ict on a command that was not spcc!fi_cally 1a~gct1ng 11.
furthermort:, an attack could poten1ially impact unintended rec1p1cnt devices, thus
reducing the need 10 understand the details of the network topology. . .. .
Validation of the Modbus message content is also 1101 performed by the 1111t1at111g
application. Instead. Modbus depends on tht: nt:twork stack to perform this funct ion.
This could open up the potential for protocol abuse in the sys1en1.
S) DNP3 (Distributed Network Protocol)
DNP3 is found in multiple deployment scenarios and industries. It is common in
utilities and is also found in discrete and continuous process systems. Like many
other ICS/SCADA protocols. it was intended for serial communication between
ess controllers and simple IEDs.
ave There is an explicit "secure'' version of DNP3. but there also remain many insecure
implemcncations or DNP3 as well. DNP3 has placed great emphasis on the reliable
delivery of messages. That emphasis, while normally highly des\rablt:, has a specific
1vere weakness from a security pcrspeccivc. In the case of DNP3, participants allow for
1tion unsolicited responses, which could trigger an undesired response. The missing
:sses security clement here is the avility to establish trust in the system's state and thus
with the ability to trust the veracity of the information being presented. This is akin to the
security flaws presented by Grrtuitous ARP messages in Ethernet 111;:tworks. "hich
able. has been addressed by Dynamic ARP Inspection (DAI) in modern Ethernet switches.
ility. 6) lCCP (Inter-Control Center Communications Protocol) · ·
ively ICCP is a common control protocol in utilities acro·s s North America that is frequently
cious used to communicate between utiliti'es. Given that it must traverse the boundaries
;ance between different networks. it holds an extra level of exposure and risk that could
1tion. expose a utility to cybcr attack.
1
ctive 7) OPC (OLE for Process Control)
1d the OPC is ~ased 011 the ~i~rosoft interoperability methodology Object Linking and
Embe.ddmg (OLE). Thrs rs an example where an IT standard used within the IT
doma111 and personal computers has been leveraged for use as a control protocol
uring across an industrial network.
eated
In industrial control ne~vorks. OPC is limited to operation at the higher levels
been of the control sp~ce, .with a dependence on Windows-based platforms. Concerns
1strial ;ir?und OPC ~cg11.1 with the operating system on which it operates. Many or lhc
. The Wmdows devices 111 the operational spact: arc ola. 1101 fully patched, and al risk due
1 ation to a plethora of well-~nown vulnerabilities. The dependence on OPC may reinforce
Iha! dependence. Whtie newer versions ofOPC have enhanced security capabilities.
49
VIII Se,rn,, (CSE(ISE) I Y\t"ur\-et o f ~ T ~
Of p~rtic~ilar conccr~, with OPC is the dependence on the Remote Procedure Call
(RPC) piotocol. which creates two cl~s.s~s of exposure. The first requires you to
clear_ly understa1~cl th~ _man) vulnerabil111es associated with RPC. and the second
requires you to 1dent1fy the level or risk thc5e vulnerabilities bring to a specific
network.
OR
now ~T m11l OT Security Practices and Systems Vnry? (12 Marks)
8. a The differences between an enterprise IT environment and an industrial-focused OT r,gu 8 Th
Ans.
deployment are important tu understand because they have a direct impact on the
security practice applied to them. Some of these areas arc touched on briefiy earlier el iden
in this chapter. and the> .1re more explicitly discussed in the folloVving sections. operatio al dom
The Purdue Model for Control Hierarchy an mdu rial de
Regardless of where a security threat arises. it must be consistently and unequivocall) • Ente prise z
treated. IT information is typical!) used to make business decisions, such as those in • Le, 5: E
process optimization, whereas OT information is instead characteristically leveraged Res urce
to make physical decisions. such ns closing a valve. increasing pressure. and so on. dO\:l 01\!11t 11
Thus, the operational domain must also address physical safety and environmental the utside
factors as part of its security strategy-and this is not normally associated with the • Le, 1 4: B
IT domain. Organizationally, IT nml OT teams and tu\lh- have been hb1uri<.:nlh this le,el
separate, but this has begun to change. and they have sta11ed to converge. leading opti 11iiatio
to more traditionally IT-centric solutions being introduced to support operational pri1 ing. a1
activities. for example. systems such as firewalls and intru~ion prevention systems lndust ial de
• D:V Z: Th
(\PS) arc being used in loT networks. bet ,ecn th
As the borders between traditionally separate OT and IT domains blur. they must
align strategies and work. more closely together to ensure end-to-end security. The of organi
types of devices that are found in industrial OT environment!> are typically much e, ~thin
O pe tio nal
more higlily optimized for tasks and industrial protocol-specific opemtiun than their
• L cl 3 :
IT counterparts. Furthermore. their operational profile differs as well. m, naging
Industrial environments consist of both operational and enterprise domains. To
an cont,
understand the security and networking require111cnts for a control syst~m. the use of
c;c edulin
a logical framework to describe the basic comllosition and function is needed. The m nagcm
Purdue Model for Control Hierarchy. introduced in Chapter 2, is the most widely used
Sl has D
framework across indu~trial environments globally and is used in manufacturing. eL, cl2:
oil and gas. and many other industries. It segments devices and equipment by
hierarchical function lc\e\S and areas and has been incorporated into the ISA99/l[C
62443 security standard. as sho"n in Figure 8-3. For aclditional_detail on_ how the
Purdue Model for Control Hierarchy is applied to the manufacturing and 011 and gas
indllStries, sec Chapter 9. ··Manufacturing,'' and Chapter I0, "Oil and Gas."
•
Sahlw
50
,l,•.
Fig11re H: TIie L{)gic11/ Framewurk B11sed 011 the P1m/11e Model /or Cmllro/
Hier11rchy
This model identifies levels of op~rations and defines each level. The enterprise and
operational domains are separated into different zones and kept in strict isolation via
·all) an industrial demilitarized zone (DMZ): •
SC in • Enterprise zone
aged • Level 5: Enterprise network: Corporate-level applications such as Enterprise
o on. Resource Planning (ERP), Customer Relationship Management (CRM).
cntal document management, and services such as Internet access and VPN entry from
lh the the outside world exist at this level.
icall) • Level 4: Business planning and logistics network: The IT services exist at
,a<ling this level and may include scheduling systems, material now applications.
tional optimization and planning systems, and local IT scrvifeS l;uch as phone. email.
stems printing, and security monitoring.
Industrial demilitarized zone
must • DMZ: The DMZ provides a buffer zone where services and data can be shared
. The between the operational and enterprise .1:oncs. II also allows for easy segmentation
much of organizational control. By default. no traffic should traverse the DMZ;
n their everything should originate from or tenninate on this area.
Operational zone
s. To • Level 3: Operations and control: This level includes the functions involved in
use of managing the workOows to produce the desired end products and for monitoring
. The and controlling the l!nlire operational system. This could include production
used scheduling, reliability assurance, systernwide control optimization. security
uring. management, network management, and potentially other required IT services.
nt by such as DHCP, DNS, and timing.
9/IEC • Level 2: Supervisory control: This level includes zone control rooms, controller
w the status. control system network/application administration, and other control-
nd gas related applications, such as human-machine interface (HMI) and historian.
• Level I: Dasie control: At this level, controlkrs and IEDs. dedicated HM ls, and
other applications may talk to each other to run part or all ol'the control function.
• Level 0: Process: This is where devices such as sensors and actuators and
machines such as drives. motors. and robots communicate with controllers or
IEDs.
SM".+".. EKAM Sulll\w- 51
VIII Se.mt ( CSE/IS£) I Vlt"un.et o f ~ T ~
Safety 1.onc
• Safety-critica l: This level include~ device:.. sensors. and other equipment used to
manage the safety functions of the control S) s11:m.
One of the key advnntagcs of designing nn industrial network in !>tructured levels,
as with the Purdue mod..:1, is that it allows security 10 be correctly applied at each
level and between levels. For example, n networks typically rc.:side at Levels 4 and
5 and use security principles common to IT nel\\orks. the lo"cr levels nrc where the
industrial systems and loT net"orks rt:i.idc. As shown in Figure 8, 11 DMZ resides
between the IT and OT levels. Clearly, to protect the lo,"er industrial la} ers, security
technologies such as fire" alls. pro11.y !-.ef\ crs. and IPSs should he used 10 ensure
that only authorized connections from trusted sources on e11.pected ports arc being
used. Al the DMZ, and, in fact, e,en bct\\een the lower leH:ls. industrial fil'\:\\alls
that are capable of understanding the control protocols !,hould be used to ensure the
continuoui. operation of the OT network.
Although security vulncrnbilities may potential!) e11.ist at ench level of the model.
it is clear that due 10 the amount of connectivity and sophistication of devices and
systems. the higher levels ha, e a greater chance of im:ur~i~n due to the "ider atta~k
surface. This docs no\ mean that lower lcvch. art: 1101 as important from n security
perspective; rather, it means that their altac~ surfac~ is small~r, a,~d if m~tigat,~n
tcchniqm:s nrc implemented properly, there 1s poten~ially less 1mp~c_t _l~ the ov_er,,_11
sy!>tcm. /\ s s hown ·Ill ,,·,Oour•·~ 8• I , ..~ rcvic,\ of pubhshed \ ulncrnb1h11cs
•
associ.,ted
•
with industrial security in 2011 shows that the a!>scts at the higher le, els ol the
framework had more detected vulnerabilities.
7011 PHbllshC(l llulner~b•l•I~ ArOM
.\
HI'
53
VIII Sewt1 ( CSE/ISE)
optimiled ell\ ironmcn1 rhis
b. What 11rc the models of Raspberry pi ? (05 Marks)
room temperature, but also
Ans. Raspberry Pi - T[lerc are six different models of Raspberry Pi. The Pi 2 Model
and 1he tools )Oil use to incrc
Bor Pi I Model B+ and Pi 3 Model B arc ideal for beginner projects because the>
are the most versatile and have the widei.t range or capabilities The Pi 3 Mudd
8 has the added bonus of having a quad-core processor and I GB ul' RAM so it
supports heavier operating systems. like Ubuntu and Microsoft I 0. Tht: Model A+is
a powerful board for building robotics, but doesn · 1 have an Ethernet port and only
comes with one USB pon. Raspb1;::rry Pi Zero is basically a miniature version of the
Model A+, but has a more robust computing. power. It has a micro USB pon and mini
HDMI port for I080p output compatibility but doesn't have \\ ire less capability. It
only costs $5- and Ada fruit sells v. 1.3 for just $5, but you can only buy one per order.
The Raspberry Pi Zero W is the same singh:-board computer as tlw standard Zero but
does support wireless and [3luctooth connectivity. It costs $10 on Adafruit,
c. What is a temperature sensor? (03 Marks)
Ans. A temperature sensor is a device, typically, a thermocouple or RTD,that provides
for temperature measurement through an electrical signal. A thermocouple (T/C) is Play
made from two dissimilar metals that generate electrical voltage in direct proportion loT learns as much about )
to changes in temperature. technology to support leisu
OR • Culture and Night
response lo guide you i
10. a. Explain bow IOT can be used with consumer appliances (06 Marks)
recommending rcstaura
Ans. Consumers benefit personally and professionally from the optimization and data
• Vacations - Planning
analysis of loT. loT technology beha, es like a team of personal assistants, advisors.
many utiliLe agencies,,
and security. It enhances the way we live. work. and play.
• Products and Service
Home
need than current ana
loT takes the place or a full staff -
• Butler - loT wai_ts for you to rt:tum home. und ensure, your home remains full> information like your fi
prepared. It mo111tors your supplies, family. and the state of your home. It takes b. Ex.pl:1i11 the IOT Applic,
actions to resolve any issm:s that appear. Ans. Approximately 70 percen
• Chef - /\11 loT kitchen prepares meals or simply aids) ou in preparing them. 2050 according to Gartnc
• Nann_y - loT can _somewhat act a~ a guardian b} controlling acct:ss. providinr strain on the existing infra
supplies. and alertmg the proper individuals in an emergency. living, it's only going to
• Gard~er -The same loT systems of a farm t:asily work for home landscaping. To accommodate lhis nc
• Rep:urman - Smart systems perform key maintenance and repairs and also turning to the Internet of
r,:qu,:st ,h.,m. •
• Security
. Gunrd - loT watches over you 1417 It can observe and imprO\e communicat
.. to improve nearly every
individuals 111 ·1 d - · susp1c1ous
t b cl' I es away. an recognize the potentinl ofrninor equipment problems sm:,rt c:ili.:s.
. o ecomc 1sasters well before they do.
::s..
Work
s::~~:·. ~~
1
:~.:~;~: ~ t~~~ ~.r~ n~ ~h<lirlpool allows two different heat settings on the
11 0 111 g ...n remote control.
A more efficient water su
The Internet ofThings ha
Smart meters can impro\.
A smart office or other workspace combii . . due to ineffic1cnc>, and l
with smart tools. foT !cams about ou ies custom1zat1011 of the work environment entering and anal) ,ing da
y . your Job. and the wa'-, )·ou work to <Ie1·1vcr an
optimi 7 ed environment. This result!> in practical accommodations. lil-.e adjus1111g. the
room temperature. but also more advanced benehl~ lil..e mod1f>111~ your ~d1cdule
and the tools you use to increase your output and reduce your work tame. lo r acts as
a mana •er and consultant ca abk of seein "hat ou cannot.
Play
loT learns as much about you personally as it does professionally. 1 his enables the
technology to support leisure -
• Culture and Night Life - lo r can analyze your real-world activities and
response to guide you in findmg more of the things and places you enjoy such as
arks)
recommending restaurants and events bused on your preferences and experiences.
~ data
isors. • Vacations - Planning and saving for vacations prO\ es difficult for some, and
mimy utili.i:e agencies, wh ich can be replaced by loT.
• Products and Services - loT offers better analysis of the products you like and
need than current analytics based on its deeper access. It integrates with key
s folly
information like your finances to recommend great s~lutions.
\ \akcs b. Explain the IOT Application for smart cities. (10 Marks)
Ans. Approximately 70 percenl of the world"s population is expected to live in cities by
em. 2050 according. to Gartner. This rapid urban gro\\fh is already placing. a considerable;:
ovidinr strain on tht: existing infrastructure, and with more people making the move to urban
living, it's only going to get worse in the coming years.
aping. To accommodate this new demand on cities, municipalities around the globe are
nd also turning to the Internet of Things innovation to enhance their services. reduce costs,
and improve communication and interaction. Though the potential is there for loT
picious to improve nearly every aspect of urban living. there are three lo'f applications for
oblcms smart cities.
A more efficient water supply
on lht: The Internet of Things has the potential to transform the wa> cities consume water.
Smart meters can improve leak dctcction and dntu intcgrit): pn;, cnt lost 11::, cnue
due to inefficiency. and boost productivity b) rt:ducing the amount of time spent
onmenl entering and mialyzing data. Abo. these meters can he de~igncd 10 fo:nurc cu~tomcr-
liver an
55
VIII SetW ( CSE(ISE) I n.te.-net o f ~ r~
facing portals. pro, iding rcside111s wi1h real-lime access to information abou1 their
consumption and water supply.
An innovative solution to traffic congestion
As more and more people move lo cities. traffic congestion - which is alread) a
masl>ive problem-is only going to gt:t worse. Fortunald). the Internet ofThi11gs i-. well
positioned to make improvements in this area that can benefit rei,idcnts immediately.
For example, smart traffic signals can adjust their timing to accommodate commutes
and holiday traffic and keep cars moving. City officials can collect and aggregate data
from traffic cameras. mobilt: phones. vehicles. and road l>ensors to monitor traffic
incidents in real-time. Drivers can be alerted of accidents and directed to routi:s that
arc less congested. The pos1,ibili1ies a.-.: endle,;s and the impact "ill be substantial.
More reliable public transportation
Public transportation is disrupted \\htncver there are road closures. bad weather. or w l mimuce
equipment breakdowns. loT can give transit authorities the n:al-timc insights they - ID illlaKt
need to implement contingency plans. ensuring that residents alwnys have access to . . decisions taken.
safe, reliable, and efficient public transportation. Thi), might be done using insight:. llleapoflOTiso
from cameras or connected devices at bus shelters or other public arens. . _ bme period, the
Energy-efficient buildings ,apulation. With
loT technology is making it easier for buildings with legacy infrastructure to save a IICW age was u
energy and improve their sustainability. Smart building energy management systems. wid1 the creation o
for instance, use loT devices to connect disparate, nonstandard heating. cooling. for Procter & Garn
lighting, and fire-safety systems to a central management application. The energy ID linking the com
management application then highlights areas of high use and energy drills <:o staff
can correct them.
Research shows that commercial built.lings waste up to 30 percent of the energy they
use,' so savings with II smart building energy management system can be significant.
As hlorc smArt city buildings use energy management systems. the city will become
more sustainable 11s a whol..:.
Improved public safety
Smart cities and their CSP partners often implement video monitoring systems to
tackle the safety concern~ that come up in every growing cit). !)ome cities 110\\ have
Internet Phase
hundreds of cameras monitoring traffic for accidtnts and public streets for safet)
concerns. Video analytics software helps process the thousands of hours of video Connectivity
footage each camera produces, whinling it down to onl} important events. Systems (Digitize access
using loT technology turn every camera attached 10 the system into n sensor. with
edge computing and anal} tics starting right from the source. Artificial intelligence Networked Econ
technology like machine lt:arning will then complt:te the analysis 1111<1 st:nd video (Digitize busine
footage to humane; \,ho can react quick!) to ,ol,c problems and 1-eep residents sati:.
Cities arc also improving public safet} with smart lighting initiatives that replace
traditional streetlights \\ ith connected LED infrastructure. Nol only do the LED
lights last longer and conser\C energy, they also provide information on outages in
1cal lime. City workers can use that information to ensure important areas arc well lit
to deter crimes and make the public feel safer.
56
Eight Semeste r B.E. D'--gree Examination, CBCS - June / July 2019
Internet of Things Technology
Max. Marks: 80
Note : Answer atty FIVE fill/ questlotM, selecting ONE full quest/011 f rom each nwdule.
Module- I
What is JOT"! Explain in detail on Genesis ofIOT. (08 Marks)
Internet of Things (IOT) is an ecosystem of connected physical objects that are
accessible through the internet. The 'thing' in IOT could be a person with a heart
monitor or an automobile with built-in-sensors, i.e. objects that have been assigned
an IP address and have the ability to collect and transfer data over a network without
manual assistance or intervention. The embedded technology in the objects helps
them to interact with internal states or the external environment, which in turn affects
the decisions taken.
The age ofIOT is often said to have started between the years 2008 and 2009. During
this time period, the number ofdevices connected to the Internet eclipsed the world's
population. With more "things" connected to the Internet than people in the world,
a new age was upon us, and tho Internet of Things was born. The person credited
with the creation of the term "Internet of Things" is Kevin Ashton. While working
for Procter & Gamble in 1999, Kevin used this phrase to explain a new idea related
lo linking the company's supply chain to the Internet.
-
Buelnee•
Soclo111
IOISNCl
gcs in
ell lit
1
VIII Sem, ( CSE I ISE)
b. What does loT and digitization mean? Elaborate on this concept. (04 Marks)
Ans. loT and digitization are terms that are often used interchangeably. In most contexts,
this duality is fine, but there are key differences to be aware of. At a high level. 10
loT focuses on connecting "things," such as objects and machines, to a computer
network, such as the Internet. loT is a well-understood tenn used across the industry 0
as a whole. On the other hand, digitization can mean different things to different
people but generally encompasses the connection of "things" with the data they What these numbers
generate and the business insights that result. businesses interact wi
For example. in a shopping mall where Wi-Fi location tracking has been deployed, using real-time conne
the "things" are the Wi-Fi devices. Wi-Fi location tracking _is simply the capability This in tum results i
of knowing where a consumer is in a retail environment through his or her smart services that save tun
phone's connection lo the retailer's Wi-fi network. While the value of connecting quality of life.
Wi-Fi devices or " things" to the Internet is obvious and appreciated by shoppers,
tracking real-time location of Wi-Fi clients provides a specific business benefit to the
mall and shop owners. In this case, it helps the business understand where shoppers
tend to congregate and how much time they spend in different parts of a mall or
store. Analysis of this data can lead to significant changes to the locations of product
displays and advertising. where. to place certain types of shops, how much rent to
charge, and even where to station security guards.
Security
2
•June,/ J!Al,y 2019
encompas
ways bcin
Jications ar
50
•
...,.._ ·-~'-~~~~ / Billion ,
40
machines i
xperiences
fo -·- h-.-.-. / -·
po
-- ••'-
'I
contexts,
high level, 10
_.......,.
a computer 7 3,1• 7 83
the industry 0
2003 2008 2010 2015 2020
to different
data they What these numbers mean is that loT will fundamentally shift the way people and
businesses interact with their surroundings. Managing and monitoring smart objects
using real-time connectivity enables a whole new level ofdatadriven decision making.
This in tum results in the optimization of systems and processes and delivers new
services that save time for both people and businesses while improving the overall
connecting quality of life.
shoppers. OR
efit to the
shoppers Discuss IOT challenges. (08 Marks)
fa mall or
Challenge Description
While the scale oflT networks can be large, the scale of OT can be
several orders of magnitude larger. For example, one large electrical
utility in Asia recently began deploying 1Pv6-based smart meters on
its electrical grid. While this utility company has tens of thousands
Scale ofemployees (which can be considered IP nodes in the network), the
number of meters in the service area is tens of millions. This means
the scale of the network the utility is managing has increased by
more than 1,000-fold! "IP as the loT Network Layer," explores how
new design approaches are being developed to scale 1Pv6 networks
into the millions of devices.
With more "things• becoming connected with other "things" and
people. security is an increasingly complex issue for IoT. Your threat
Security surface is now greatly expanded. and if a device gets hacked, its
connectivity is a major concern. A compromised device can serve as
a launching point to attack ocher devices and systems. loT security
is also pervasive across just ..... ffery facet ofloT.
3
VIII Sern, ( CSE ( ISE)
hardware. Examples includ
As sensors become more prolific in our everyday lives, much ·or the VPNs, and so on. Riding o
data they gather will be specific to individuals and their activities. adds APls and midd\eware
This data can range from health information to shopping patterns Network layer: This is the
Privacy and transactions at a retail establishment. For businesses, this data It includes the devices
has monetary value. Organizations are now discussing who owns them. Embodiments of th
this data and how individuals can control whether it is shared and technologies, such as IEE
with whom. as IEEE 801.11 ah. Also i
loT and its large number of sensors is going to trigger a del~ge of power line communicatio
data that must be handled, This data will provide critical information
Big data and and insights if it can be processed in an efficient maimer. The
data analytics challenge, however, is evaluating massive amounts of data arriving
from different sources in various forms and doing so in a timely
manner,
As with any other nascent technology, various protocols and
architectures are jockeying for market share and standardization
bT within loT. Some of these protocols and architectures are based on
1nteropera 1 ,ty proprietary elements, and others arc open. Recent loT standards ate Commu
Ne
helping minimize this problem. but there are often various protocols
and implementations available for loT networks.
4
cscs ·J1M1.€,,(JIM(l2019
' uch 'of the hardware. Examples include backhaul communications via cellular, MPLS networks,
activities. VPNs, and so on. Riding on top is the common services layer. This conceptual layer
adds APls and middleware supporting third-party services and applications.
g patterns
s, this data Network layer: This is the commun ication domain for the IoT devices and endpoints.
It incfudes the devices themselves and the communications network that links
who owns
them. Embodiments of this communications infrastructure include wireless mesh
shared and
technologies, such as IEEE 802.15.4, and wireless point-to-multipoint systems, such
as IEEE 801.1 I ah. Also included are wired device connections, such as IEEE 190 I
a deluge of power line communications.
nformation c. Explain core "loT" functional stack.
aimer. The (04 Marks)
tocols and
dardization
re based on
Applications Cloud
7
Communicauons
-~
tandards ate
us protocols
Network j Fog
I
Things: Sensors and
(04 Marks) Actuators
Edge
I
1. "Things" layer: At this layer, the physical devices need to fit the constraints of
the environment in which they are deployed while still being able to provide the
information needed.
2. Communications network layer: When smart objects are not self contained, they
need to communicate with an external system. In many cases, this communication
uses a wireless technology. This layer has four sublayers:
Access network sublayer: The last mile of the IoT network is the access network.
This is typically made up of wireless technologies such as 802. l lah, 802.15.4g, and
LoRa. The sensors connected to the access network may also be wired.
. Gateways and backhaul network sublayer: A common communication system
organizes multiple smart objects in a given area around a common gateway. The
gateway communicates directly with the smart objects. The role of the gateway is
to connectivity to forward the collected information through a longer range medium (called the
pplication-layer backhaul) to a headend central station where the information is processed.
for interaction Network transport sublayer: For communication to be successful, network and
ndustry-specific transport layer protocols such as IP and UDP must be implemented to support the
:vertical entities. variety of devices to connect and media to use..
ss the vertical IoT network management sublayer: Addi~ prococols must be in place to allow
hysical network the headend applications to exchange data wida 6c sensors. Examples include CoAP
tocols, and the and MQTT.
5
-Jun,e,,/Ju½! 2
In.turtet of~ T~ and moisture ·analy
c:har&cteri,;tics of ga
VIII SeAW ( CSf / [Sf} upper layer, an application need\:
M •\he die density of gas su
~n,a:n. '->'ul~\;,'"5. ...,,..~~\'\ "~ce~~"a.t':(,
-. . - ..- - - -······ .... -...a. ..........., "'-~ ... , ..,,\" ; ' ._;;.on,\l'U\ \\'\.C
process the co\\ccte<l \lata, no\ o n Y " · · t t
to make intelligent decision based on the information collect~~ and, 111 turn, ms ru~ for thermal image
the "things" or other systems to adapt to the analyzed conditions and change their 6. Optical Sensors
behaviors or parameters. it measures a physi
Module- 2 maybe digital for
units. It involves n
3. a. List and explain dll!erent types of senson. . . (08 M arks)
Also, it is used in
Ans. Here is the list of Sensors most commonly used m the l oT devices, the two different
l. Temperature Sensor common devices,
2. Pressure Sensor that tum on auto
3. Proximity Sensor 7. Gas Sensor:
4. Accelerometer and Gyroscope Sensor
5. IR Sensor
6. Optical Sensor it can detect com
7. Gas Sensor 8. Smoke Senso
8. Smoke Sensor
1. Temperature Sensor: A Temperature Sensor senses and ~easure~ the te~perature
and converts it into an electrical signal. They have a maJor role m Environment,
smoke sensor, an
Agriculture and Industries. For example, these sensors can detect the temperature of
photoelectric smo
the soil which is more helpful in the production of crops.
2. Pres'sure Sensor: A pressure sensor senses the pressure applied ie, force per contains a pulsed
every I 0seconds t
unit area, and it converts into an electrical signal. It has high importance in weather
forecasting. There are various Pressure sensors available in the market for many
purposes.
3. Proximity Sensor: A proximity sensor is a sensor able to detect the presence
of nearby objects without any physical contact. A proximity sensor often emits an
electromagnetic field or a bean1 of electromagnetic radiation and looks for changes
in the field or return signal. A most common application of this sensor is used in cars.
4. Accelerometer and Gyroscope Sensor: The difference between Accelerometer
and the gyroscope is accelerometer measures linear acceleration based on vibration
whereas, the gyroscope is intended to detenuine an angular position based on the
principle of the rigidity of space. Accelerometers in mobile phones are used to
detect the orientation of the phone. The gyroscope, adds an additional dimension
to the information supplied by the accelerometer by tracking rotation or twist. A
3D gyroscope has three gyroscopic sensors mounted orthogonally. Accelerometers
and gyroscopes are the sensors of choice for acquiring acceleration and rotational
infqrmation in drones, cell phones, automobiles, airplanes, and mobile loT devices.
5. Infrared Sensors: An Infrared Sensor is an electronic device, which senses certain
characteristics of its surroundings by emitting Infrared radiation. It has the ability
to measure the heat being emitted by an object and also measures the distance. It
has been implemented in various applications. lt is used in Radiation them1omete~s
depend on the material of the object. IR sensors are also used in Flame monitors
6
and moisture analysis. IR sensors are used in gas analyzers which use absorption
characteristics of gases in the IR region. Two types of methods are used to measure
ary, but the density of gas such as dispersive and nondispersive. IR imaging devices are used
instruct for thermal imagers and also for night vision.
ge their 6. Optical Sensors: The Optical Sensors conven light rays into an electronic signal,
it measures a physical quantity of light and transforms into a fonn which is readable,
maybe digital form. It detects the electromagnetic energy and sends the results to the
units. It involves no optical fibers. It is a great boon to the cameras on mobile phones.
arks) Also, it is used in mining, chemical factories, refineries, etc. LASER and LED are
the two different types of light source. Optical sensors are integral parts of many
common devices, including computers, copy machines (Xerox) and light fixtures
that tum on automatically in the dark.
7. Gas Sensor: A Gas Sensor or a Gas detector is a device that detects the gas in an
area, which is very helpful in safety systems. It usually detects a gas leak in an area,
that results are sent to a control system or a microcontroller, that finally shuts down.
it can detect combustible, flammable and toxic gases.
8. Smoke Sensor: A smoke sensor detects smoke and its level of attainment.
Nowadays, the manufacturers of the sensor implement it with a voice alarm through
ALEXA, also notifies in our smartphones. The smoke sensor ifoftM'o types, Optical
smoke sensor, and the ionization smoke sensor. The optical smoke sensor also called
photoelectric smoke alarms works using the light scattering principle. The alarm
contains a pulsed Infrared LED which pulses a beam of light into the sensor chamber
rce per
every I 0seconds to check for smoke particles.
M"cathcr
r many b. Elaborate on small physical objects and small virtual obje\:ts. (04 Marks)
Ans. Refer Q.no 3 od Model Paper I
resence Explain "loT Access Technologies" (04 Marks)
mits an loT Access Technology is spread across licensed and unlicensed spectrum and there
hanges are several number of Radio technologies. At high this access can be classified in
in cars. two categories:
rometer I. Non -Cellular Technologies
ibration 2. Cellular Technologies
on the Each of the technologies available for loT connectivity has its own advantages and
used to disadvantages. However, the range ofloT connectivity requirements- both technical
and commercial - means cellular technologies can provide clear benefits across a
wide variety of applications. While choosing technology following requirement
needs to be considered
• It should have Global Reach
• Matured Ecosystem
• Diverse and Secure
• Scalable and QoS support
• Low Total Cost ofOwnership (TOC)
7
VIII Se.m, (CSE ( ISE)
A common infonnation set is being provided. Particularly, the· following topics are
addressed for each loT access technology: Wireless! !
1. Standardization and alliances: The standards bodies that maintain the protocols is a protoc
for a technology Wireless self:healin
HART the 2.4 G
2. Physical layer: The wired or wireless methods and relevant frequencies
3. MAC layer: Considerations at the Media Access Control (MAC) layer, which can be fi
engincerin
bridges the physical layer with data link control
4. Topology: The topologies supported by the technology Construct
5. Security: Security aspects of the technology . stack for
Thread
6. Competitive technologies: Other technologies that are similar and may be suitable products i
alternatives to the given technology Thread G
OR b. What is SANET? Ex
4. a. Briefty Explain protocol stack utilization IEEE 802.15.4. (08 Marks) based solution offcn.
Ans. A sensor/actuator netw
Description sense and measure thei
.Protocol
The sensors and/or a
Promoted through the ZigBee Alliance, ZigBee defines upper-layer cooperating in a produ
components (network through application) as well as application and cooperation is a p
profiles. Common profiles include building automation, home in SANETs are diverse!
automation, and healthcare. ZigBee also defines device object The following arc some
ZigBee functions, such as device role, device discovery, network join, and offers:
security. For more information on ZigBee, see the ZigBee Alliance Advantages:
webpage, at W\vw.ziibee.oo:. ZigBee is also discussed in more detail I. Greater deployment~
later in the next Section. 2. hard-to-reach places)
6 LOWPAN is an 1Pv6 adaptation layer defined by the fETF 6L0WPAN 3. Simpler scaling to a
4. Lower implementatic
6LoWPAN working group that describes how to transport I Pv6 packets over
IEEE 802.15.4 layers. RFCs document header compression and TPv6 5. Easier long-term mal
enhancements to cope with the specific details of IEEE 802.15.4. 6. Effortless introducti
7. Better equipped to h
An evolution of the ZigBee protocol stack. ZigBec IP adopts the
Disadvantages:
ZigBee lP 6F0WPAN adap~~tion_layer, l~v6 network layer, and RPI. routing
I. Potentially less secu
·pro~ocol. ln ~dd,tton, 1t offers improvements to [P security. ZigBee JP
is discussed 111 more detail later in this chapter.
2. Typically lower tran
3. Great.er level of imp
ISA I00. !,la .is developed by the International Society of Automation
(ISA) as Wireles~ Sy~tems for Industrial Automation: Process Control
ISAIOO. I la nod _Reln~ed Appl1cnt1ons." It is based on IEEE 802 15 4-2006 d Explain working of
spcc1ficnt1ons were bl' h d • · · , an
pu is e m 2 OIO and then as IEC 62734 Th ~oT network lechnolo
network and transport layers arc based on IETK 6L0WPAN IP 6. e mclude cellular, wifi.
UDP standards. , v , .nut
as LPWAN, Bluetooth
consider which ne
mindful of the folio
8
-JUAf!/(J!Afy 2019
9
VIII Semt (CSE I ISf) I~n.etc,f ~T~
I. Range technologies are released
2. Bandwidth subject to change.
3. Power usage Security
4. lntennittent connectivity Security is always a priori
5. lnteroperability implement end-lo-end secu
6. Security protection.
Range · h d · ty · II ._ Write note on B usiness
Networks can be described in terms of the distances over wh1c ata 1s pica Y Data flowing from or to
transmitted by the loT devices attached to the network: center servers eith~r in the c
PAN (Personal Area Network) Dedicated applications are
PAN 1·s short-range, where distances can be measured in meters, such as a wearable or on network edge platfi
. that communicates
fitness trackci: device • w1'th an app on a cell phone over BLE. applications communicate
LAN (Local Area Network) solutions combining vari
LAN is short- to medium-range, where distances can be up to hundreds of met~rs, approach with a common
such as home automation or sensors that are installe~ within_ a _factory pr~d~ction or upper (application) lay
line that communicate over wifi with a gateway device that 1s installed w1thm the started playing a key arch·
same building. in the IT markets but also
MAN (Metropolltan Area Network) .
MA1'!_ is long-range (city wide), where distances are meas~red up to a few k,lomet~rs, Discuss ne~ for optlm"
such as smart parking sensors ins1alled throughout a city that are connected m a The Internet ofThings wi
mesh network topology. challenges !.till exist for I
WAN (Wide Area Network) . . of non-IP devices, you
WAN is long-range, where distances can be measured m kilometers, such as levels that loT often im
agricullura\ sensors that are ins1alled across a large farm or ranch that are used to of the IP stack to hand
monitor micro-climate environmental conditions across the property. following sections take a
Bandwidth the nodes and the netwo
Oandwid1h, or the amount of data that can be transmitted in a specific period of is transitioning from vc
time, limits the rate at which data can be collected from loT devices and transmitted the loT space.
upstream.
Power usagt
Transmitting data from a device consumes power. and transmitting data over long Describo application p
ranges requires more power than over a short range. You must consider the devices The loT application prot
that operate on a battery to conserve power to prolong the life of the battery and Vertical industries they a
reduce operating costs. on the characteristics 0
lntennittent connectivity protocols that are suffici
loT devices aren't always connected. In some cases, devices will connect periodically well suited for constrain
by design in order to save power or bandwidth. ne Transport Layer:
Interoperability lhe constrained nature
With so many different devices connecting to the IoT, interoperability can be lnditional transport mec
a challenge. Adopting standard protocols has been the traditional approach for leT Application Trans
maintaining interoperability on the internet. However, for the loT, standardization loT application data an
processes sometimes struggle to keep up with the rapid pace of change and -.clitionaJ networks, TC
10
llc:hnologies are released based on upcoming versions of standards that are still
llbject to change.
~~ . .
Security is always a priority, so be sure to select networking technologies that
aplement end-to-end security, including authentication, encryption, and open port
protection.
Write note on Business case for IP (04 Marks)
Data flowing from or to "things" is consumed, controlled, or monitored by data
center servers either in the cloud or in locations that may be distributed or centralized.
Dedicated applications are then run over virtualized or traditional operating systems
or on network edge platforms (for example, fog computing). These lightweight
applications communicate with the data center servers. Therefore. the system
solutions combining various physical and data link layers call for an architectural
approach with a common laycr(s) independent from the lower (connectivity) and/
or upper (application) layers. This is how and why the Internet Protocol (IP) suite
started playing a key architectural role in the early I990s. IP was not only preferred
in the IT markets but also for the OT environment.
~~~~~iza~ ~~~
The Internet of Things will largely be built on the Internet Protocol suite. However,
challenges still exist for IP in IoT solutions. In addition to coping with the integration
of non-IP devices, you may need to deal with the limits at the device and network
levels that loT often imposes. Therefore, optimizations are needed at various layers
of the IP stack to handle the restrictions that aro present in loT networks. The
following sections take a detailed look at why optimization is necessary for IP. Both
the nodes and the network itself can often be constrained in loT solutions. Also, IP
is transitioning from version 4 to version 6, which can add further confinements in
;mitted the IoT space.
OR
•r long Describe application protocols for IoT. (08 Marks)
evices The loT application protocols you select should be contingent on the use cases and
vertical industries they apply to. In addition, IoT application protocols are dependent
ry and
on the characteristic~ of the lower layers themHlves. For example, application
protocols that arc sufficient for generic nodes and traditional networks often are not
well suited for constrained nodes and networks.
ically
Tbe Transport Layer: IP-based networks use either TCP or UDP. However,
the constrained nature of loT networks requires a c:loler look at the use of these
traditional transport mechanisms.
loT Application Transport Methods: This section aplores the various types of
loT application data and the ways this data can be canted across a network. As in
traditional networks, TCP or UDP are utilized in mOlt caes when transporting loT
11
VIII Set11t (CSE I I S£) -1~1J~ 2019
application data. The transport methods are covered in depth and form the bulk of the
material in this chapter. You will notice that, as with the lower-layer loT protocols,
there are typically multiple options and solutions prestinted for transporting loT
application data. This is because loT is still developing and maruring and has to
account for the transport of not only new application protocols and technologies but
legacy ones as well.
b. Discuss the various methods used in loT application transport. (08 Marks)
Ans. The following categories of loT application protocols and their transport methods
are explored in the following sections:
Application layer protocol not present: In this case, the data pay load is directly
transported on top of the lower layers. No application layer protocol is used.
Supervisory control and data acquisition (SCADA): SCADA is one of the most a. What do you mean by
common industrial protocols in the world. but it was developed long before the days Data Analytics has a si
of IP, and it has been adapted for IP networks. applications and inv
Generic web-based protocols: Generic protocols, such as Ethernet, Wi-Fi. and effective use of their da
4G/LTE, are found on many consumer- and enterprise-class loT devices that Volume: There are huge
communicate over non-constrained networks. The business organizatio
loT application layer protocols: loT application layer protocols are devised to run analyze the same for ex
on constrained nodes with a small compute footprint and are well adapted to the data can be analyzed easi
network bandwidth constraints on cellular or satellite links or constrained 6LoWPAN Structure: loT applicati
networks. Message Queuing Telemetry Transport (MQTT) and Constrained as unstructured, semi-s
Application Protocol (CoAP), covered later in this chapter, are two well-known significant difference in
examples of loT application layer protocols. business executive to an
SCADA and software.
In the world of networking technologies and protocols, loT is relatively new. Driving Revenue: The
Combined with the fact that IP is the de facto standard for computer networking business units to gain an
in general, older protocols that connected sensors and actuators have evolved and lead to the development
adapted themselves to utilize IP. A prime example of this evolution is supervisory expectations. This, in
control and data acquisition (SCADA). Designed decades ago, SCADA is an organizations.
automation control system that was initially implemented without IP over serial Competitive Edge: loT
links, before being adapted to Ethernet and 1Pv4. · numerous loT applicati
IoT Application Layer Protocols data analytics in loT in
When considering constrained networks and/or a large-scale deployment of and will, therefore, pro'II
constrained nodes, verbose web-based and data model protocols. To address this
b. Discuas Bigdata analy
problem, the loT industry is working on new lightweight protocols that are better
suited to large numbers of constrained nodes and networks. Two of the most popular
Alla. Generally, the industry
protocols are CoAP and MQTT. Figure 6-6 highlights their position in a common
1. Velocity: Velocity
Hadoop Distributed Fi
loT protocol stack.
Smart objects can gene
database or file system
12
·JIM\£,,/JiA½t 2019
CoAP MOTT
,_
UDP TCP
INS
~--~.:.:.:.:J'~-'
-~,'·--
.....:¥·---
.
·-·-=·- -b'·~~
~
r~:i:-,._ - .
~.: ,.__
~ I I
-
,,~ -
-
.,..,.:.~~1'
I:.;' .. _..,·
~~--.,.
Module-4
What do you mean by data and analytics for loT? Explain. (04 Marks)
Data Analytics has a significant role to play in the growth and success of loT
applications and investments. Analytics tools will allow the business units to make
effective use of their datasets as explained in the points listed below.
Volume: There are huge clusters of data sets that loT applications make use of.
The business organizations need to manage these large volumes of data and need to
analyze the same for extracting relevant patterns. These datasets along with real-time
data can be analyzed easily and efficiently with data analytics software.
Structure: loT applications involve data sets that may have a varied structure
as unstructured, semi-structured and structured data sets. There may also be a
significant difference in the data formats and types. Data analytics will allow the
business executive to analyze all of these varying sets of data using automated tools
and software.
Driving Revenue: The use of data analytics in loT investments will allow the
business units to gain an insight into customer preferences and choices. This would
lead to the development of services and offers as per the customer demands and
expectations. This, in tum, will improve the revenues and profits earned by the
organizations.
Competitive Edge: loT is a buzzword in the current era of technology and there are
numerous loT application developers and providers present in the market. The use of
data analytics in loT investments w ill provide a business unit to offer better services
and will, therefore, provide the ability to gain a competitive edge in the market._
b. Discuss Bigda ta a nalytics tools and technology. (04 Marks)
Ans. Generally, the industry looks to the "three Vs" to categorize big data:
I . Velocity: Velocity refers to how quickly data is being collected and analyzed.
Hadoop Distributed File System is designed to ingest and process data very quickly.
Smart objects can generate machine and sensor data at a very fast rate and require
database or fi le systems capable of equally fast ingest functions.
13
VIII Se,rn, (CSE I ISE)
2 • Variety: Variety refers to different types of data. Often you see data cate~orized
. d t t d Different database technolog,cs may
as structured, sem r structure , or uns rue ure . 1-1 d .sable to collect and store all
· ofthese types. a oop 1 •
only be capable ofncccptin,i ~e. l h~n combinino machine data from loT devices
~~::~ry- " Th1<1 cnn be ben1:nCIA w "' . \ d'
s\ructured in nature with data from other sources, such as socia me 1a or
multimedia, that is unstructured.
TbePurdueM
Regardless ofwh
. . . treated. IT inform
3. Volume: Volume refers 10 the scale of the data. Typ1cal\y, this 1s measured from
process optimizat
gigabytes on the very low end to petabytcs or even exabytes of da!a on ~he other
to make physical
extreme. Generally, big data implementations scale beyond what 1s available on
locally attached storage disks on a single node. It is common to see clusters of Thus, the operati
factors as part of
servers that consist of dozens, hundreds, or even thousands of nodes for some large
IT domain. Orga1
deployments.
separate, but this
c. With a case study relate the concept ofsecurin& loT. (08 Marks) to more tradition
Ans. It is often said that if World War Ill breaks out, it wil\ be fought in cyberspace. As loT activities. For ex
brings more and more systems together under the umbrella of network connectivity. (JPS) are being
security has never been more important. From the electrical grid system that powers As the borders
our world, to the lights that control the flow of traffic in a city, to the systems that keep align strategies a
airplanes flying in an organized and efficient way, security of the networks, devices, types of devices
and the applications that use them is foundational and essential for all modern more high\y opti
communications systems. Providing security in such a world is not easy. Security IT counterparts.
is among the very few, if not the on\y, techno\ogy disciplines that must operate with
external forces continually working against desired outcomes. To further complicate
matters, these external forces are able to leverage traditional technology as well as
nontechnical methods (for example, physical security. operational processes, and
so on) to m-eet their goals. With so many potential attack vectors, information and
cybcrsecurity is a challenging. but engaging. wpi~ \hat is of critical importance to
-.,uw,ogy vendors. enterprises, and service providers alike.
Information technolol)' (IT) environments have faced active attacks and information
socurity threats for many decades, and the incidents and lessons learned are well-
known aod documented. By contrast, operational technology (OT) environments
were traditionally kept in silos and had only limited connection to other networks. f :lolety
Thus, the history of cyber-attacks on OT systems is much shorter and has far ......,rlsezoae
fewer incidents documented. Therefore, the learning opportunities and the body of • Le¥el 5: Eate
cataloged incidents with their corresponding mitigations are not as rich as in the 1T Resource Pia
world. Security in the OT world also addresses a wider scope than in the IT world. cloc:wnent man
For example, in OT, the word security is almost synonymous with safety. In fact, 1be outside wo
many of the industrial security standards that form the foundation for industrial loT Lffel 4: Bui
security also incorporate equipment and personnel safety recommendations. tbis level and
optimization
printing, and
14
.OR
in In detail how IT 11nd OT sec:■rity practices and systems vary in real
..._ (OS Marks)
TIie Purdue Model for Control Hienrclly
ltegardless of where a security threat arises, it must be consistently and unequivocally
lreatcd.IT information is typically used to make business decisions, such as those in
process optimization, whereas OT information is instead characteristically leveraged
to make physical decisions, such as closing a valve, increasing pressure, and so on.
Thus, the operational domain must also address physical safety and environmental
factors as part of its security strategy-and this is not normally associated with the .
IT domain. Organizationally, IT and OT teams and tools have been historically
separate, but this has begun to change, and they have started to converge, leading
to more traditionally IT-centric solutions being introduced to support operational
activities. For example, systems such as firewalls and intrusion prevention systems
(IPS) are being used in loT networks.
As the borders between traditionally separate OT and IT domains blur, they must
align strategics and work more closely together to ensure end-to-end security. The
types of devices that are found in industrial OT environments are typically much
more highly optimized for tasks and industrial protocol-specific operation than their
IT counterparts. Furth~rmore, tJ1eir operational profile differs as well.
Er1e,,nse Zone
DMZ
OptmtJor•&w,,,t
rmation P"""'""Coiltll/
SCAOI\Zono
e wcll-
~nments
tworks.
has far Enterprise zone
body of • Level 5: Enterprise network: Corporate-level applications such as Enterprise
the IT Resource Planning (ERP), Customer Retltioaship Management (CRM),
world. document management, and services such II lllernet access and VPN entry from
In fact, the outside world exist at this level.
rial loT • Level 4: Business planning and lo1ladca ■■•11-ank: The IT services exist at
this level and may include scheduling s, fl • material flow applications,
optimization and planning systems, and local · such as phone, email,
printing, and security monitoring.
15
VIII Sem, (CSE ( ISE)
b. Discuss OCTAVE and FAIR formal r\sk. analysis. Here are \he vari
Am. Refer Q. no 8 b of Model Paper l 1. ARM CPU/
Module-5 that's made up o
processing unit (
9. a. Give a brief note on Arduino UNO (04 Marks) work (taking inp
Ans. TheArduino Uno is a microcontroller board based on theATmega328 (datasheet). It graphics output.
has 14 digital input/output pins (ofwhieh 6 can be used as PWM outputs), 6 analog 2. GPIO - Thes
inputs, a 16 MHz ceramic resonator, a USB connection, a power jack, an lCSP header, will allow the re
and a reset button. lt contains everything needed to support the microcontroller; 3.RCA-An R
simply connect it to a computer with a USB cable or power it with a AC-to-DC devices.
adapter or battery to get started. 4. Audio out -
The Uno differs from all preceding boards in that it does not use the FTDI USB-to• devices such as
serial driver chip. Instead, it features the Atmega 16U2 (Atmega8U2 up to version 5. LEDs-Ligh
R2) programmed as a USB-to-serial convener. Revision 2 of the Uno board has a 6. lJSB - This
resistor pulling the 8U2 HWB line to ground, making it easier to put into DFU mode. (including your
Revision 3 of the board has the following new features: can use a USB
• pinout: added SDA and SCL pins that are near to the AREF pin and two other keyboard if it ha
~wp~ . 7. HDMI -Thi
• placed near to the RESET pin, the IOREF that allow the shields to adapt to the
16
cs -Jr.,f,,tlU!//J~ 2019
volta e rovided from the board. In future. shields will be c~mp~tible b~th with
;hared the b!a! that use·the AYR, which operate with 5V and ~•th the_ Ardumo Due
,tation that operate w1·th~.>.3y. The second one is a not connected pm, that 1s reserved for
DMZ; future purposes.
• Stronger RESET circuit.
• Atmega l 6U2 replace the 8U2. . .
"Uno" means one in Italian and is named to mark the upc~mmg releas~ of Ardu~no
1.0. The Uno and version 1.0 will be the reference versions of Ardumo, movmg
forward: The Uno is the latest in a series of USB Arduino boards, and the reference
model for the Arduino platform; for a comparison with previous versions
:rvices, b. With a neat diagram, explain Raspberry PI board. (04 Marks)
Ans.
troller iCA VID£0 AUDIO lflS LAN
ontrol-
I' ~ ¥ .
. /
.. .
~
llers or Flv 2
CP
somo POWEi
~
17
VIII Sem, (CSc I IS£) •Jl,,l,.t'\,(!//J~20
othtr compatible device using an HDMI cable.
8. Power - This is a Sv Micro USB power connector into which you can plug your
compatible power supply. •· Explain in deta il sm
9. SD cardalot - This is a full-sized SD card slot. An SD card with an operating Refer Q. no IO b. of
system (OS) installed is required for booting the device. They are available for b. With case study expl
purchase from the manufacturers, but you can also download an OS and save it to
the card yourself if you have a Linux machine and the wherewithal.
10. Ethernet - This connector allows for wired network access and is only available
on the Model B.
c. With a neat diagram, explain wirless temperature monitoring system using
Raspberry Pl. (08 Marks)
Ans.
____
RASPBERRY Pl
_,,
- "' 0--...
',
l ONNa-Hl.l/LO '\
-
1/0
... ~
~....
t __
-
....
l
~
--
,I loT-enabled smart ci
environment and imp
:,• /, §:- lighting. Below, we
already implemented
-,
l _, /I
-- ~
... ,,
i' -
\
- Ruad traffic
Smart cities ensure
efficiently as possible
implement smart trail
cb
The components required for the implementation arc Digital tem~ratur~ sensor
OS18B20, Resistor4.7k0 , Breadboard, Jumper Wires, Connectm_g Wires and
Smart traffic solution
drivers' smart phone!
At the same time, s
allow monitoring g
current traffic situati
smart solutions for t
. k't The resistor is utilized as a 'pull-up' for the data-hne. It ensures take measures to pre
Raspberry P, , . . . . ..r ti
that the \-Wire data tine is at a described logic level, and hm1ts mtcucrence ~m for example, being
mechanical sound if the pin is let\ t\oating. Jumper wires is used for int~rfac1_ng has implemented a s
raspberry and sensor. Connect GPIO GND [Pin 61 on the Pi to the 1~e.gat1v_e side and closed-circuit le
of the breadboard and link GPIO 3.3V {Pin 1) on t~e Pi to the pos1t1ve side on a central traffic man
the breadboard. Next step is to insert the DS l 8B20+ mto the breadboard, C~nnect platform users of
OS\ 8820+ GND [Pin I] to the negative side of the breadboard and+ VDD [P~n 31 to Additionally, the ci
the positive side of the breadboard. Next step is to place one arm of 4.7kll res1~tor to make second-by-s1
OS 18B20+ DQ [Pin 21 and other end to the positive side of the breadboard. Finally, conditions in real t
link OS t 8820+ DQ [Pin 21 lo GPIO 4 [Pin 7] alongside a jumper wire.
18
• Jl.,(,t'\,f!, I Ju½12019
~g your OR
a. Explain in detail smart city loT Architecture • (08 Marks)
ReferQ. no 10 b. ofModel paper2
b. With case study explain smart and connected cities using Raspberry PI
(08 Marks)
IoT-enabled smart city use cases span multiple areas: from contributing to a healthier
environment and improving traffic to enhancing public safety and optimizing street
lighting. Below, we provide an overview of the most popular use cases that are
already implemented in smart cities across the globe.
Road traffic
Smart cities ensure that their citizens get from point A to point B as safely and
efficiently as possible. To achieve this, municipalities turn to IoT development and
implement smart traffic solutions.
Smart traffic solutions use different types of sensors, as well as fetch GPS data from
drivers' smart phones to determine the number, location and the speed of vehicles.
At the same time, smart traffic lights connected to a cloud management platform
allow monitoring green light timings and automatically alter the lights based on
current traffic situation to prevent congestion. Additionally, using historical data,
ensures smart solutions for traffic management can predict where the traffic could go and
ce from take measures to prevent potential congestion.
rfacing For example, being one of the most traffic-affected cities in the world, Los Angeles
ive side has implemented a smart traffic solution to control traffic flow. Road-surface sensors
side on and closed-circuit television cameras send real-time updates about the traffic flow to
Connect a central traffic management platform. The platform analyzes the data and notifies the
Pin 3) to platform users of congestion and traffic signal malfunctions via desktop user apps.
sistor to Additionally, the city is deploying a network of smart controllers to automatically
Finally, make second-by-second traffic lights adjustments, reacting to changing traffic
conditions in real time.
19
CBCS · Veo-2019 / J~
b. E;,i.plaio Access Ne;:
Eight Semester B.E. Degree Examination, CBCS - D"c 2019 / ,fan 2020
Ans. The last mile of the I
Internet of Things Technology
wireless technologies
Time: 3 hrs. Max. Marks: 80 to the access network
Note : Ans"'"' an)' FIVE fi,t/ qu.tstlons, selecting ONE full question from end, module.
Module- I t.FC
CM1
1. a. What is IOT? Explain evolutionai;y phases of the internet, (06 Marks) I
< 10 trn
Ans. Internet of Things (IOT) is an ecosystem of connected physical objects that ar~
accessible through the internet. The 'thing' in IOT could be a person with a heart
monitor or an automobile with built-in-sensors, i.e. objects that have been assigucd
an IP address and have the ability to collect and transfer data over a network without
manual assistance or intervention. The embedded technology in the objects helps
them to interact with internal states or the external environment, which in turn affects
the decisions taken. l. j
I J,,
.,,,.
l¥FM
fNtr. ..rYI
\'4\"l'fot'lllfl
CanMClhtly
BualMH WPA.~ W1ctes. P
oo,r@,._ WHAN. V/reeis
and
Soclelal •C....i WFAN Wi'l!less F
•-1 •'t/d,BrowM,
• s.,,,a,
WlAN W',..es, •
As IOT continues
applications and sp
required. !OT somet
match more or less
Internet Phase Definition
technologies w.ere d
Connectivity (Digitize lbis phase connected people to email, web services, One key parameter
access) and search so that information is easily accessed. the smart object a
This phase enabled e-commerce and supply chain technologies you
Networked Economy
enhancements along with co\\aborative engagement to distances.
(Digitize business) drive increased efficiency in business processes. Range estimates a
This phase extended the lnternet experience to the vertica\ where
as follows:
lmmersive Experiences encompass widespread video and social media while 1. PAN (persona
(Digitize internet ions) alwnys being connected through mobility. More and around a pers
more ap lications are moved into the cloud.
2. HAN(home
This phase is adding connectivity to objects and wireless tcchn
Internet of Things machines in the world around us to enable new services
(Digitize the world) 3. NAN (neighb
and ex riences. It is connectin the unconnected. NAN is often
4. FAN (field a1
meters. FAN
20
CS - Veo2019 / J<M\12020
a,. Explain Access Network sublayer witb a neat diagnm. . . . (06 Marks)
The last mile of the IOT network is the access network. This 1s typically made up of
wireless technologies such as 802. 1lah, 802. I5.4g, and LoRa. The sensors connected
to the access network may also be wired.
:JCFrus.
y r OSM. '/.'COVA.
lsFC E:-Of'fl3, • 3CPP "'3 1o-
EM'/ < 6 km Vl'l'°'A)
I
< 10cm
\
GelUat (lir~,....,,
Rongeo '\J4 9'0CI\'
AN Ari LPW
-1 W~l>N WFAN ~-.
,_
LPWA (Ut~Lla,n.<,ed)
/ \
~L \
IOA ,:'A.> 1, fJJIM'M, «2,,.~..., '-..
tJ-IUttO..~,l"'N
~.,,,.,.,
··--
00,(.
ZYl.,..;t w-,.••.•p ... r-,1or ~JbM-,.lCl,ol.f-A~
l•~fllreWJI.I ,.,.~ ')Nit.a ~ 11r•""~1 IA,cyO'-'t.
w...itc,..,.. .UIIMI\Vlft~
~ ,... (.f,.lit~
""'70.-.1>
WPA'i Wrelffl PlnOIIII "'"" 14•'-" WNAN; w,~ Net,r(,QttUJd Arw Nt;W<Jlk
wttm V/ue" Hom:, A11111 Nelwol\ WWAtl. W1R:l~s Wide Arda NtiMlll
WFAN Wireless F1et0 (OI FIICIOly) AIU NBIWOrll LPWA LOW Power Wkle Arel
Wt.AN Wireles, Local Alea Ne~
As IOT continues to grow exponentially, you will encounter a wide variety of
applications and special use cases. For each of them, an access technology will be
required. IOT sometimes reuses existing access technologies whose characteristics
match more or less closely the JOT use case requirements. Whereas some access
technologies w.ere developed specifically for IOT use cases, others were not.
One key parameter determining the choice ofaccess technology is the range between
the smart object and the information collector. Above Figure lists some access
technologies you may encounter in the IOT world 1tnd the expected transmission
distances.
Range estimates are grouped by category names that illustrate the environment or
the vertical where data collection over that range is expected. Common groups are
as follows:
1. PAN (penonal area network): Scale ofa few meters. This is the personal space
around a person. A commo~ wireless technology for this scale is Bluetooth.
2. HAN (home an:a network): Scale ofa few tens of meters. At this scale, common
wireless technologies for JOT include ZigBee and Bluetooth Low Energy (BLE).
3. NAN (neighborhood area network): Scale ofa few hundreds of meters. The term
NAN is often used to refer to a group of house units from which data is collected.
4. FAN (field area network): Scale of several tens of meters to several hundred
meters. FAN typically refers to an outdoor area larger than a single group of
Zl
· Veo2019
VIII Sem, (CSE I ISE)
This concept!
house units. The FAN is often seen as "open space" (and therefore not secured
and not controlled). A FAN is sometimes viewc:<l as a group ofNANs, but some and applicatij
verticals sec the FAN as a group of HANs or a group of smaller outdoor cells. specifications
As you can set:, fAN and NAN may sometimes be used interchangeably. In most be readily elll
cases, the vertical context is clear enough to determine the grouping hierarchy. connecting th
5. LAN (local area network): Scale of up to I00 m. This term is very common servers, whij
in 111:tworking, and it is therefore also commonly used in the JOT space when of oneM2M
standard networking technologies (such as Ethernet or IEEE 802.11) are used. business dom
Other networking classifications, such as MAN (metropolitan area network, with utility, industi
a range ofup to a few kilometers) and WAN (wide area network, with a range of J, Network la)1
more than a few kilometers), an: also commonly used. endpoints. It •
Note that for all these places in the IOT network, a " W" can be added to specifically that Iinks th
indicate wireless technologies used in that space. For example, HomePlug is a wired wireless mes
technology found in a HAN environment, but a HAN is often referred to as a WHAN multipoint sys
(wireless home area network) when a wireless technology, like ZigBce, is used in connectjons, s
that space.
c. What are the elements of one M2MIOT Architecture? Explain. (04 Murks) Explain the lune!
Ans. IP, TCP, and UDP
to take care of d
Multiple protocol
22
CS • 'Dec-2019 / ]CM\/2020
secured This conceptual layer adds A Pis and middlewarc supporting third-party sc:rvic\:S
ut some and applications. One of the stated goals of oneM2M is to "develop technical
, or cells. specifications which address the need for a common M2M Service Layer that can
. In most be readily embedded within various hardware and software nodes, and rely upon
rarchy. connecting the myriad of devices in the field area network to M2M application
common servers, which typically reside in a cloud or data center." A critical objective
of oneM2M is to attract and actively involve organizations from M2M-related
re used. business domains, including h:lematics and intelligent transportation, healthcare,
rk, with utility, industrial automation, and smart home applications, to name just a few.
range of 3. Network layer: This is the communication domain for the JOT devices and
endpoints. It includes the devices themselves and the communications network
~cifically that links them. Embodiments of this communications infrastructure include
a wired wireless mesh technologies, such as IEEE 802.15.4, and wireless point-to-
WHAN multipoint systems, such as IEEE 801.1 lah. Also included are wired device
s used in connections, such as IEEE 1901 power line communications.
OR
Explain the functionality ofIOT network management sublayer. (05 Marks)
IP, TCP, and UDP bring connectivity to JOT networks. Upper-layer protocols need
to take care of data transmission between the smart objects and other systems.
Multiple protocols have been leveraged or created to solve IOT data communication
problems. Some networks rely on a push model (that is, a sensor reports at a regular
interval or based on a local trigger), whereas others rely on a pull model (that is, an
application queries the sensor over the network), and multiple hybrid approaches are
also possible.
Following the IP logic, some IOT implementers have suggested HTTP for the data
transfer phase. After all, HTTP has a client and server component. The sensor could
.use the client part to establish a connection to the JOT central application (the server),
and then data can be exchanged. You can find HTTP in some IOT applications, but
kl HTTP is something of a fat protocol and was not designed to operate in constrained
unica111 environments with low memory, low power, low bandwidth, and a high rate of
packet failure. Despite these limitations, other web-derived protocols have been
ntion to suggested for the !OT space. One example is WebSocket. WebSocket is part of the
includes HTMLS specification, and provides a simple bidirectional connection over a single
und API connection. Some lOT solutions use WebSocket to manage the connection between
plications the smart object and an external application. WebSocket is often combined with
thus they other protocols, such as MQTT (described shortly) to handle the JOT-specific part of
the communication.
e vertical b. Describe IOT World Forum (IOTWF) stan~ardize architecture. (07 Marks)
i physical Ans. In 2014 the IOTWF architectural committee (led by Cisco, IBM, Rockwell
rotocols, Automation, and others) published a seven-layer JOT architectural reference model.
l cellular,
While various JOT reference models exist, the one put forth by the IOT World Forum
·ces layer.
Z3
VIII Sern; ( CSE I IS'f)
offers a clean, simplified perspective on IOT and includes edge computing, data
storage, and access. lt provides a succinct way of visualizing lOT from a technical
perspective. Each of the seven layers is broken down into specific functions, and this includes sensors
security encompasses the entire model. Above Figure details the IOT Reference and so on. The top o
Model published by the IOTWF. databases, and applic1
•
IT. In the past, OT an
a.a.borallon & p,_.... to even talk to each o
(1""'1Mr9 Pooplo &--. """'"-l
the IOT Reference Model defines a set of levefs with control flowing from the center
(this could be either a cloud service or a dedicated data center), to the edge, which
RealW
includes sensors, devices, machines, and other types of intelligent end nodes. ln Physical
~eneral,data travels up the stack, originating from the edge, and goes northbound to
the center.
c. Compare and contrast IT and OT.
Ans.
24
· Deo2019 I JMl/2020
IOT systcims have to cross several boundaries beyond just the functional layers. The
bottom of the stack is generally in the domain of OT. For an industry like oil and gas,
this includes sensors and devices connected to pipelines, oil rigs, refinery machinery,
and so on. The top of the stack is in the IT area and includes things like the servers,
databases, and applications, all of which run on a part of the network controlled by
IT. In the past, OT and IT have generally been very independent and had little need
to even talk to each other. IOT is changing that paradigm.
Module-2
With a neat diagram, explain bow actuators and sensors intctact with physical
world. Classify actuators based on energy type. (08 Marks)
Sensors are designed to sense and measure practically any measurable variable in
the physical world. They convert their measurements (typically analog) into electric
signals or digital representations that can be consumed by an intelligent agent (a
device or a human). Actuators, on the others hand, receive some type of control
signal (commonly an electric signal or digital command) that triggers a physical
effect, usually some type of motion, force, and so on.
,-
Sense
Sensors
.,. ___
M.iasum
._______
Real World - Digtlal Aeptese11tal100 -
Phyaicel Envirorvnenl Eleclnc Signal
useful Act
Work
Much like sensors, actuators also vary greatly in function, size, design, and so on.
Some common ways that they can be classified include the following: · Type of
t-cn rcol motion: Actuators can be classified based on the type of motion they produce (for
...
Twm,
example, linear, rotary, one/two/three-axes).
• Power: Actuators can be classified based on their power output ( for example, high
•
Ro.ii
Time
power, low power, micro power)
• Binary or continuous: Actuators·can be classified based on the number of stable-
state outputs.
• Area of application: Actuators can be classified based on the specific industry or
vertical where they are used.
• Type of energy: Actuators can be classified based on their energy type.
Type Examples
Mechanical actu11tors Lever, screw jack, hand crank
Electrical actuators Thyristor, biopolar transistor, diode
2S
VIII Se,m, ( CSE ( ISE) •
Electromechanical actuators AC motor, DC motor, step motor
Electromagnetic actuators Electromagnet, linear solenoid
Hydraulic and pneumatic actuators
Hydraulic cylinder, pneumatic cylinder, piston,
pressure control valves, air motors
Smart material actuators (includes Shape memory alloy (SMA), ion exchange fluid,
ti I d - t ) magnetorestricrive material, bimetallic strip,
1erma an magne11c ac1ua ors . • b" h
p1ezoc1cctric 1morp
Micro- and nanoactuators Electrostatic motor, microvalve, comb drive
b. List out the limitations of the smart objects in WSNs and explain the data
aggregation in WSN with a near diagram. (08 Marks)
Ans. The following are some of the most significant limitations of the smart objects in
WSNs:
I. Limited processing power
2. Limited memory
3. Lossy communication
4. Limited transmission speeds
S. Limited power
26
· Veo2019 I J~2020
very large numbers of deployed smart objects. While there are certain instances in
which sensors continuously stream their measurement data, this is typically not the
case. Wirelessly connected smart objects generally have one of the following two
communication patterns:
I. Event-driven: Transmission of sensory information is triggered only when a
smart object detects a particular event or predetermined threshold.
2. Periodic: Transmission of sensory information occurs only at periodic intervals.
The decision of which of these communication schemes is used depends greatly on
the specific application. For example, in some medical use cases, sensors periodically
send postoperative vitals, such as temperature or blood pressure readings. In other
medical .use cases, the same blood pressure or temperature readings are triggered to
be sent only when certain critically low or high readings are measured.
OR
What is Zigbee? Explain 802.15.4 physical layer, MAC layer, and security.
(08 Marks)
Zigbee is based on the IEEE's 802.15.4 personal-area network standard. All you
need to know is that Zigbee is a specification that's been around for more than a
decade, and it's widely considered an alternative to Wi-Fi and Bluetooth for some
applications including low-powered devices that don't require a lot of bandwidth -
like your smart home sensors.
Physical Layer
The 802. 15.4 standard supports an extensive number of PHY options that range
from 2.4 GHz to sub-GHz frequencies in [SM bands. The original IEEE 802.15.4-
2003 standard specifiecl only three Pl IY options based on direct sequence spread
spectrum (DSSS) modulation. DSSS is a modulation technique in which a signal is
intentionally spread in the frequency domain, resulting in greater bandwidth. The
original physical layer transmission options were as follows: 2.4 GHz, 16 channels.
with a data rate of 250 kbps 915 MHz, IO channels, with a data rate of 40 kbps 868
MHz, I channel, with a data rate of20 kbps
6 Bytes o - 127 Bytes
5 Byles
... 1 Bvte ►
MAC Layer
The IEEE 802.15.4 MAC layer manages access to the PHY·channel by defining
how devices in the same area wills~ 1bc fNqueocies allocated. At this layer, the
scheduling and routing ofdata frames are also coordinated. The 802.15.4 MAC layer ·
MM' 27
VIII Sem, (CS£ ( I S£)
performs the following tasks:
• Network beaconing for devices acting as coordinators (New devices use beacons
to join an 802.15.4 network)
• PAN association and disassociation by a device
• Device security
• Reliable link communications between two peer MAC entities
The MAC layer achieves these tasks by using various predefined frame types. Ln
fact, four types of MAC frames are specified in 802.15 .4:
• Data frame: Handles all transfers of data ~ SA<:11rir1 E~11b111rl
• Beacon frame: Used in the transmission of beacons from a PAN coordinator h1t 1nHRmft
Conu ol IS S8l lO l .
• Acknowledgement frame: Confirms the successful reception of a frame
• MAC command frame: Responsible for control communication between devices b. Explain LoRaWAN stan
MAC Frame 4 -201:fy!a• MAC Layer
26 18
var able The MAC layer is defined
of the LoRa physical la)I
battery life and ensure d
The LoRaWAN specifica
• Class A: This class is
nodes, it allows bidir
receive downstream t
after each transmissio
• Class B: This class
can be better defined
0·127 B';tes
windows compared t
beaconing process.
Security • Class C: This class i
The lEEE 802.15.4 specification uses Advanced Encryption Standard (AES) with a enables a node to be
128-bit key length as the base encryption algorithm for securing its data. Established when not transmitti
by the US National Institute of Standards nnd Technology in 200 I, AES is a block
1 Byte
cipher, which means it operates on fixed-size blocks of data. The use of AES by
the US government and its widespread adoption in the private sector has he lped it
become one of the most popular algorithms used in symmetric key cryptography. MAC Header (MHDR)
(A symmetric key means that the same key is used for both the encryption and
decryption of the data.) Security
. In addition to encrypting the data, AES in 802. I 5.4 also validates the data that is Security in a LoRa
sent. This is accomplished by a message integrity code (MIC), which is calculated architecture, as detail
for the entire frame using the same AES key that is used for encryption. of security, protecting
The first layer, called '
·the authentication ofth
LoRaWAN packets by
28 ..._..~~M'
CS - Vro2019 I ]<M112020
4 -20 Sylas
devices use beacons
ies
ned frame types. In
® Auxil ary Securify Header
IIPld ,s added to MAC lrllrM
Security
Security in a LoRaWAN deployment applies to different components of the
architecture, as detailed in Figure. LoRaWAN endpoints must implement two layers
of security, protecting communications and data privacy across the network.
The first layer, called "network security" but applied at the MAC layer, guarantees
'the authentication of the endpoints by the LoRaWAN network server. Also, it protects
LoRaWAN packets by perfonning encryption based on AES.
VIII Sem; (CSE I ISE) · Veo2019 / J,
30
· Veo2019 ( ]CU'l.12020
The join procedure must be done every time a session context is renewed. During
the join process, which involves the sending and receiving of MAC layer join
request and join accept messages, the node establishes its credentials with a
LoRaWAN network server, exchanging its globally unique DevEUI, AppEUI,
and AppKey. The AppKey is then used to derive the session NwkSKey and
AppSKey keys.
Module-3
With o ncot diagram, explain 6LOWPAN protocol heuder compression and
fragmentation. (08 Marks)
Header Compression
IPv6 header compression for 6LoWPAN was defined initially in RFC 4944 and
ne!work !ey
subsequently updated by RFC 6282. This capability shrinks the size of IPv6's 40-
IO - ~nl NS
byte headers and User Datagram Protocol's (UDP's) 8-byte headers down as low
as 6 bytes combined in some cases. Note that header compression for 6LoWPAN
,-., 11,turn TX
is only defined for an 1Pv6 header and not 1Pv4. The 6LoWPAN protocol does not
support 1Pv4, and, in fact, there is no standardized 1Pv4 adaptation layer for IEEE
. Key), used by both itself
802.15.4. 6LoWPAN header compression is stateless, and conceptually it is not too
s data integrity through
complicated. However, n number of factors affect the amount of compression, such
as well as encrypting and
as implementation of RFC 4944 versus RFC 6922, whether UDP is included, and
various IPv6 addressing scenarios. It is beyond the scope of this book to cover every
SKey), which performs
use case and how the header ~elds change for each.
and its application server.
6LoWPAN Without Header COmpreaslon
I MIC, if included. This
4--- - -- - --127 Byte IEEE 802.15.4 Frame -------♦
access to the application
1B 408 88 53B
ir AES-128 application
likely derived from r.........::=-...a....-r.--- 1Pv6 _ _ _ ~ ~
UDP [ .............:.~ ~.......... ~
aider the control of the t
6LoWPAN Header
gateways are llLoWPAN With tPYll and UDP Header Compression
management and - - - - - - - - 127 Byte IEEE 802.15.4 Frame - - - - - - - •
iooaJ VPN and 2B 4B 1088
!llts, Additional
ra!all111rmen._ Payload
or future revisions
J ~DPj
- .... ···~· ..
t
6LoWPAN Header
ust get registered and with Compressed
1Pv6 Header
join mechanisms:
't need to run a join Fragmentation
and the NwkSKey and The maximum transmission unit (MTU) for an 1Pv6 network must be at least 1280
e end device. This same bytes. The term MTU defines the size of the largest protocol data unit that can be
passed. For IEEE 802.15.4, 127 bytes is the MTU. You can see that this is a problem
to dynamically join a because fPv6, with a much larger MTU, is carried inside the 802.15.4 frame with a
rough a join procedure. much smaller one. To remedy this situation, large TPv6 packets must be fragmented
31
VIII Sem- ( CSE ( I S£) In.ru-rie:t o f ~ r ~
across multiple 802.15.4 frames a(Layer 2.
The fragment header utilized by 6LoWPAN is composed of three prima~ fields:
Datagram Size, Datagram Tag, and Datagram Offset. The I-byte Datagram Size field
specifics the total size of the unfragmented payload. Datagram Tag identifies the set
of fragments for a payload. Finally, the Datagram Offset field delineates how far
into a payload a particular fragment occurs. Below figure provides an overview of a
6LoWPAN fragmentation header.
6LoWPAN Fragmentation Hoador
18 18 28 18
Datagram Tag ~
r
Datagram
r
Datagram Offset
applicability ofexistin
their suitability for ce
solutions are validated
Size (F,eld Not Present
Oo First Fragment) the Datagram Transpo
6LoWPAN DICE
Fragmentation New generations of co
Header
access networks are e
b, List and explain the key advantage/I of Internet protocol. when implementing U
(04 Marks)
Ans.
• Adoption is just a matter of time The Internet Protocol is a must and a
requirement for any Internet connectivity. It is the addressing scheme for any data
transfer on the web. The limited address capacity of its predecessor, IPv4, has
made the transition to 1Pv6 unavoidable. Google's figures arc revealing an 1Pv6
adoption rate following an exponential curve, doubling every 6 months.
• Scalability: 1Pv6 offers a highly scalable address scheme. The present scheme.
of Internet Governance provides at most 2 x IO 19 unique, globally routable,
addresses. This is many orders of magnitude more that the 2 x I09 that is possible·
with 1Pv4 and the IO 13 that is the largest estimate oflOT devices that will be used
this century. It is quite sufficient to address the needs of any present and future
communicating device still allowing it to have many addresses. ·
• Solving the NAT barrier: Due to the limits of the 1Pv4 address space, the current
Internet had to adopt a stopgap solution to face its unplanned expansion: the
Network Address Translation (NAT). It enables several users and devices to share
the same public IP address. This solution is working but with two main trades-
off: The NAT users are borrowing and sharing IP addresses with others. While olapplicaf
this technique allows single stakeholders to mount large applications, it becomes IDlutions. I
completely unmanageable if the same end-points are to be used by many different IEEE IBIS 201
stakeholders; this would occur in an IOT deployment where the same sensors protocol over IP can
are to be used by multiple, independent, stakeholders. Secondly the mechanism or UDP or by installi
cannot be used to access specific end-points from the Internet. between the serial pro
32
• Malti-Stakeholder Support: 1Pv6 provides for end devices to have multiple
addresses and an even more distribllled routing mechanism than the IPv4 Internet.
This allows different stakeholders ID assign IOT end-device addresses that are
consistent with their own applicaliaa and network practices. Thus multiple
mkeholders can deploy lhcir owa lpplications, sharing a common sensor/
actuation infrastructure, wilhoul I lang the technical operation or governance
of lhe Internet.
Esplain Rl'L encryption and autheadalioa on constraint node. (04 l\:larks)
ACE
Much like the RoLL working group, the Authentication and Authorization for
Constrained Environments (ACE) working group is tasked with evaluating the
applicability ofexisting authentication and authorization protocols and documenting
their suitability for certain constrained-environment use cases. Once the candidate
solutions are validated, the ACE working group will focus its work on CoAP with
the Datagram Trnnsport Luycr Security (DTLS) protocol.
DICE
New generations of constrained nodes implementing an IP stack over constrained
access networks are expected to run an optimized lP protocol stack. for example,
when implementing UDP at the transport layer, the IETF Constrained Application
Protocol (CoAP) should be used at the application layer. In constrained environments
secured by DTLS, CoAP can be used to control resources on a device. (Constrained
environments are network situations where constrained nodes and/or constrained
networks arc present.
The DTLS in Constrained Environments (DICE) working group focuses on
implementing the DTLS transport layer security protocol in these environments.
The first task of the DICE working group is to define an optimized DTLS profile
for constrained nodes. In addition, the DICE working group is considering the
applicability oflhe DTLS record layer to secure multica.~t messages and investigating
how the DTLS handshake in constrained environments can get optimized.
OR
Explain tunneling legacy SCADA over IP networks and SCADA protocol
translation with a neat diagram. (08 Marks)
Deployments of legacy industrial protocols, such as DNP3 and other SCADA
protocols, in modern IP networks call for flexibility when integrating several
generations of devices or operations that are tied to various releases and versions
of application servers. Native support for IP can vary and may require different
solutions. Ideally, end-to-end native IP support is preferred, using a solution like
IEEE 1815 2012 in the case of DNP3. Otherwise, transport of the original serial
protocol over IP can be achieved either by tunneling using raw sockets over TCP
or UDP or by installing an intermediate device that performs protocol translation
between the serial protocol version and its IP implementation.
33
VIII Se,m,, ( CSc I ISc) - oeo2019 I Ji
connection to their ~
RTUs both ends of the linW
over the IP networ
piece of software is
to IP ports. This so
serial redirector in e
Raw Socket ~ aster and m ent Set-Up converts it to a TCP
Detween Routers and SCADA Server the SCADA server s
.i SCA DA
B, where a router o
serial ports to IP po
RTUs Server socket connections.
L:.;...--=-~-=--=-:=:::" Application Comrnunieates Through
COM Port~ Mapped to IP+ TCP/UDP DeKribe MQTT fi
Pons by IP/Senal HPCllrl!Ctor SW Message Queuing
Scenllrio B: Raw Socket between Router and SCADA Server - no SCADA appllcatlon change At th'e end of the I
on ....., but IPfSerlal Redirector aoftwar• and Ethernet Interface to be added Eurotech) were loo
Raw Soclcet Master and Cl .nt Set-Up monitor and control
Between Routers and SCADA Server location, as typicall
! SCADA
the development an
(MQIT) protocol th
ATU:i Server
Appllcanon Communicates Otreclly of Structured lnfor~
over Raw Socket IP + TCP/UDP Ports oil and gas industri
designed, with con
Soenarto C: R- Socket between Router Ind SCADA Server - SCADA appflcallon knows communications, am
how to clreotly COlllmunlcate over II Raw Socket end Ethernet Interface some of the rations
framework based on
Araw socket connection simply denotes that the serial data is being packaged directly
into a TCP or UDP transport. A socket in this instance is a standard application
programming interface (API) composed of an IP address and a TCP or UDP port
that is used to access network devices over an IP network. More modem industrial
application servers may support this capability, while older versions typically require
another device or piece of software to handle the transition from pure serial data to
serial over IP using a raw socket. Above Figure details raw socket scenarios for a
legacy SCADA server trying to communicate with remote serial devices.
In all the scenarios in above figure, notice that routers connect via serial interfaces to
the remote tenninal units (RTUs), which are often associated with SCADA networks.
An RTU is a multipurpose device used to monitor and control various systems, MQIT is a lightwei
applications, and devices managing automation. From the master/slave perspective, fixed header with op
the RTUs are the slaves. Opposite the RTUs in each Figure scenario is a $CADA note that a control pa
server, or master, that varies its connection type. In reality, other legacy industrial an overview of the
34
- Dec-2019 / Jouv2020
,,,,
.,, . MOTT Client
(Subscr1t>er)
35
VIII Sem, ( CSf I I S£) · Vec,2019 / JtM\/202
~essage Type
Rema1111ng Leng1h
-
•--:'\,,_ ;-- ... --:;:,;; -·
-:¥-' ~ -•~' .. ".' • ;'. -
- •:"} _ -~-~ ·:~;. ~·:.i.
R..icl.
Pavk...U ro1,11,1n,111
SNitth
NameNode-
Message Type Value Flow Description
CONNECT I Client to server Request to connect
CONNACK 2 Server to client Connect acknowledgement
Client to server
PUBLISH 3 Publish message
Server to client
Client to server
PUBACK 4 Publish acknowledgement Hadoop relics on a scale-o
Server to client
Client to server and storage to distribute
PURREC s Server to client
Publish received MapReduce and HDFS tak
process massive amounts
Client to server
PUEREI. 6 Publish release nodes in the cluster. For H
Server to client
cluster, including NameN
Client to server Publish complete NameNodes: These are a
PTJBCOMP 7
Server to client
HDFS. They coordinate
SUBSCRIM 8 Client to server Subscribe request each block of data is sto
SUBACK 9 Server to client Subscribe acknowledgement is coordinated through the
UNSUBSCRIBE 10 Client to server Unsubscribe request NameNode notified of th
UNSUBACK 11 Server to client Unsubscribe acknowledgement NamcNode takes write
available nodes in configu
PINGRBQ 12 Client to server Ping reouest NameNode is also respo
PINGRESP 13 Server to client Ping response should occur.
DlSCONNLCT 14 Client to server Client disconnecting DataNodes: These are th
NameNode. It is commo
Module- 4 the.data. Data blocks are
7. a. Explain the elements ofHadoop with a neat diagram. _(~7 Marks) three, four, or more time
Ans. Hadoop is the most recent entrant into the data management _market,_ but it 1s arguably of the DataNodes, the D
the most popular choice as a data repository and processmg engine. Hadoo~ -:vas replication policies, to e
originally developed as a result of projects at Google and_ Yahoo!, and the original techniques such as Red
intent for Hadoop was to index millions of websites and quickly return search results not used for HDFS beca
for open source search engines. Initially, the project had two key ele'.11ents: redundancy with this rep
• Hadoop Distributed File System (HDFS): A system for stormg data across b. Explain neural networ
multiple nodes . . .
• MapReduce: A distributed processing engme that splits a large task mto smaller Neural networks are M
ones that can be run in parallel you took at a human fi
36
• 'Dec-2.019 / JOIA'\,,2020
Switth
37
VIII Sem,, (CSE ( ISf) · Veo2019 /
colors, movements, facial expressions, and so on. Your brain combines these
elements to conclude that the shape you are seeing is human. Neural networks mimic
the same logic. The information goes through different algorithms (called units), • FNF Flow Mo
each of which i~ in charge of processing an aspect of the information. The resu_lting NetFlow cache
value ofone unit computation can be used directly or fed into another unit for further the flow record
processing to occur. In this case, the neural network is said to have several layers. record: match s~
For example, a neural network processing human image recognition may have two or characteristid
units in a first layer that determines whether the image has straight lines and sharp is the Flow Ex
angles because vehicles commonly have straight lines and sharp angles, and human information, in
Monitor includ
figures do not. If the image passes the first layer successfully (because there are no
or only a small percentage of sharp angles and straight lines), a second layer may the size of the c
look for different features (presence of face, arms, and so on), and then a third layer • FNF flow reco
might compare the image to images of various animals and conclude that the shape is values used to
a human (or not). The great efficiency of neural networks is that each unit processes predefined for
a simple test, and therefore computation is quite fast. This model is demonstrated record aggregat
How N•ural Nalwork1 Netflow. User-
A.eogni. • Oog In• Pholo in the flow reco
a wide range o1
expected that 1I
user-defined anl
example, secur
• FNF Exporte
Using the sho
an application
iJ1formation p
NetFlow Expo
of transport fo~
be configured
• Flow export f
collection and
• NetFlow expo
• NetFlow serve
export. It is oft
patterns.
Explain Formal
Refer Q. no 8 b o
b. Explain the puru
operational domai
an industrial demi
38
• Veo2019 / JC\.+'\12020
• FNF Flow Monitor (NetFlow cache): The FNF Flow Monitor describes the
NetFlow cache or information stored in the cache. The Flow Monitor contains
the flow record definitions with key fields (used to create a flow, unique per flow
record: match statement) and non-key fields (collected with the flow as attributes
or characteristics of a flow) within the cache. Also, part of the Flow Monitor
is the flow Exporter, which contains information about the export of NetFlow
information, including the destination address of the NetFlow collector. The Flow
Monitor includes various cache characteristics, including timers for exporting,
the size of the cache, and, if required, the packet sampling rate.
• FNF Oow record: A flow record is a set of key and non-key NctFlow field
values used to characterize flows in the Netflow cache. Flow records may be
predefined for ease of use or customized and user defined. A typical predefined
record aggregates flow data and allows users to target common applications for
NetFlow. User-defined records allow selections of specific key or non-key fields
in the now record. The user-defined field is the key to Flexible NetFlow, allowing
a wide range of information to be characterized and exported by NetFlow. It is
expected that different network management applications will support specific
user-defined and predefined flow records based on what they are monitoring (for
example, security detection, traffic analysis, capacity planning).
• FNF Exporter: There are two primary methods for accessing NetFlow data:
Using the show commands at the command-line interface (CU), and using
an application reporting tool. NetFlow Export, unlike SNMP polling, pushes
information periodically to the Netflow rcponing collector. The Flexible
NetFlow Exporter allows the user to define where the export can be sent, the type
of transport for the export, and properties for the exp.ort. Multiple exporters can
be configured pc::r FlowMonitor.
• Flow export timers: Timers indicate how often flows should be exported to the
collection and reporting server. ,
• NctFlow export format: This simply indicates the type offfow reporting format.
• NetFlow server for collection and reporting: This is the destinlllion of the flow
export. It is often done with an analytics tool that looks for anomalies in the traffic
pattems.
OR
Explain Formal R isk Analysis Structures. (08 Marks)
Refer Q. no 8 b of June/July 2019
b. Explain the purude model for control hierarchy and OT network characteristics.
(08 Marks)
This model identifies levels of operations and defines each level. The enterprise and
operational domains arc separated into different zones and kept in strict isolation via
an industrial demilitarized zone (DMZ):
39
VIII Sem- (CSE ( IS£) - t>ec-2019 / J
4. Level 0: Proc
machines such
EMerpr1so L one lEDs.
• Safety zone
DMZ l. Safety-critical
to manage the
OpemtlOns support
40
· 'Dec-2019 / J<M\/2020
4. Level 0: Process: This is where devices such as sensors and actuator,; and
machines such as drives, motors. and robots communicate with controllers or
IEDs.
• Safety zone . .
J. Safety-critical: This level includes devices, sensors, and other equipment used
to manage the safety functions of the conbOI system.
Module-5
Explain the following with respect to Anlaiao programmlag.
I) Structure ii) Fuactlons
W) Variables iv) Flow control statemeats
v) Data type vi) Constaata. . (08 Marks)
i)Structure: Arduino programs can be divided i~ three i:nam parts_: Structure, Values
l (variables and constants), and Functions. In this tutorial, we will learn abo~t the
Arduino software program, step by step, and how we can write the program without
any syntax or compilation error. Let us start with the Structure. Software structure
consist of two main functions - Setup() function, Loop() function
Void setup ( ) {
}
PURPOSE - The setup() function is called when a sketch starts. Use it to initialize
the variables. pin modes, start using libraries, etc. The setup function will only run
once, after each power up or reset of the Arduino board.
INPUT--
OUTPUT--
RETURN--
Void Loop ( ) {
}
PURPOSE - After creating a setup() function, which initializes and sets the initial
values, the loop() function docs precisely what its name suggests, and loops
consecutively, allowing your program to change and respond. Use it to actively
control the Arduino board.
INPUT--
OUTPUT--
RETURN--
ii) Functions: A function is declared outside any other functions, above or below
!11~ loop ~~ction. We can declare the function in two different ways - The first way
1s JU~ wnt1~g the ~ of the fun~tion called a function ·prototype above the loop
function, which consists of- Function return type Function name Function argument
type, ~o need to write the arg_ument name Function prototype must be followed by
a semicolon ( ; ). The following example shows the demonstration of the function
declaration using the first method.
Example
41
VIII Semt (CSE I ISE) • 'Deo2019 / JCM'V2
block ofcode. Local variables are not known to function outside their own. Following
Unsigned s
is the example using local variables long long
Void setup () {
Vii) Constants: The co1
}
Void loop() { modifies the behavior o~
int X, y; the variable can be use
int z ; Local variable declaration changed. You will get a
x=O: Constants defined wit
y = O; actual initialization govern other variable
z= 10; keyword a superior me
} Example Code
iv) Flow Control statements: const float pi = 3.14;
float x;
S.No. Control Statement & Description
II ....
If statement x = pi • 2; // it's fine t
It takes an expression in parenthesis and a statement or block ofstatements. pi = 7; // illegal - y
1 If the expression is true then the statement or block of statements gets ~ Explain Raspberry E
executed otherwise these statements are skipped. Refer Q. no 9 b of Jun
l f ... else statement
2 An if statement can be followed by an optional else statement, which
executes when the expression is false.
If.. .else if .. .else statement
,.
L Write a python pro
Blink
3
The if statement can be followed by an optional else if...else statement,
which is very useful to test various conditions using single if...else if
statement.
., Tums on an LED o
42
switch case statement
Similar to the if statements, switcb.;.case controls the flow of programs
4
by allowing the programmers to specify different codes that should be
executed in various conditions.
Conditional Operator? :
5
The conditional operator? : is the only ternary operator in C.
v) Data Type: Data types in C refers to an extensive system used for declaring
variables or functions of different types. The type of a variable determines how
much space it occupies in the storage and how the bit pattern stored is interpreted.
The following table provides all the data types that you will use during Arduino
programming.
e that function or void Boolean char Unsigned byte int
char Unsigned int word
ir own. Following
Unsigned String-char String-
long short Hoat double array
long array object
..
vn) Constants: The const keyword stands for constant. It 1s a variable qualifier that
modifies the behavior ofthe variable, making a variable "read-only". This means that
the variable can be used just as any other variable of its type, but its value cannot be
changed. You will get a compiler error if you try to assign a value to a const variable.
Constants defined with the const keyword obey the rules of variable scoping that
govern other variables. This, and the pitfalls of using #define, ma~es the const
keyword a superior method for defining constants and is preferred over using #define.
Example Code
consl float pi= 3.14;
float x;
II ....
x = pi • 2; II it's fine to use consts in math
ofstatements.
pi= 7; II illegal• you can't write to (modify) a constant
ents gets
b. Explain Raspberry Pi learning board. (08 Marks)
Ans. Refer Q. no 9 b of June/July 2019
t, which OR
10. a. Write ~ python program on Raspberry Pi to blink an LED. (06 Marks)
Ans. 1•
Blink
Turns on an LED on for one second, then off for one second, repeatedly.
•1
II the _setup function runs once when you press reset or power the board
void setup() { II initialize digital pin 13 as an output.
pinMode(2, OUTPUT);
43
VIII Sem- (CS£ I IS£)
}
// the loop function runs over and OIIII'
void loop() {
digita1Write(2, HlGH); // tum da the voltage level)
delay( I 000); II wait for a
digita1Write(2, LOW);
delay( I000); // wait for
}
(06 Marks)
b. Explain smart dly F , • 5 b II e.
Ans. ReferQ. no lOL•ll?~l~J"..,•
c. Writeasltort.- e
i) IOT Qal1eagel.
(04 Mark.,)
ii) Baeldaaal Technolopll.
Ans. i) IOT Cballen1es: Refer Q.
aetwork switching equipment use the
ii) Backhaul Technologies:
tcnn to mean "getting data• .ai.bolie'' (which is similar to its use in
the satellite communicatioa Ascend uses the term to describe
how its MAX 2000 swildl data from a backhaul T-1
line on which mobile _. ted to an Internet service
provider and the bac may sometimes be used to
mean the use of the unications line.
44
As Per New VTU Syllabus w.e.f 2015-16
Choice Based Credit System(CBCS)
SUNSTAR
(06 Marks)
(04 Marks)
etimcs be used to
ons line.
..•
(f.FFCCTIVF. FROM TIIE AlAonacYFAll 2016-20171
IA M ark• 20 Time: 3 hrs.
Subjttl Cllllt IS<liG
Eua M arki 80 Note : A11swer a11y FIVE full qu
Numbtr of l ,tttun- lloun/Wttk
t:u • Hours 03
Total Numbr r of Ltrtun- lloun
3
VIII Sem, (CS£flS£) CBCS · }vfodel,Q~Wt'IIP
b. Wilh II neal I a - ~ - l ) H e a i roles in an HDFS deployment? ( 12 Marks) lets the conversation between t
Ans. HDFS Cuap I • progressing, the NameNode al
The design ofHDf'S 1111111 lypaofnodes: from DataNodes. The lack of
JS DltaNocles. a case, the NameNode will ro
In a basic dcsip. • . . . ., I ---~all die metadata needed to ~tore and retrieve now-missing blocks Because t
B1 64 the actual d-■ ..... C M I acaually stored on the Name Node, however. (decommissioned) for maintena
The desil!II IS 1 17
5 C _
El
_._._ a sin~lr Name Node da-,mon and singl,:
-·
. . al: new fsimage, and uploads the n
-----
restarts, the fsimage ti le is reas
• since the last checkpoint. If th
'---·- ---·-· NameNode could take a prohrn
system.
5
VIII Sewv (CSE/IS£) 13 ~ Vci.t"c;1; A ~
for instancC, assume you need to count how many times the name ..KutuLov·· appears in the
novel War and Peace. One solution is to gather 20 friends and give them each a sectio~ of the Map(key I,value I)-> list(key2, value
book to search. This step is the map stage. The reduce phase happens when eve1 yone ,s done: The reducer function i~ then applied t
counting and you sum the total as your friends tell you their coums. _ _ of values in the saim domain:
Now consider how this same process could be m;complished using s1mpll' •n,x command- Reduce(key2. list(value2))-> list(va
line tool~.~ following grep comm.ind llpplied a specific mnp to a text file: Each reducer call typically produces
$ gn:p .. Kutuzav ·· war-and-peace.tAt MapReduce framework transforms a
This command searches for the word Kutuzov (with leading and trailing space) in a leAt The MapReduce model is inspired
file called war-and-peace.txt. Each match is reported as a single line of text that con1ains · ~any functional programming lang
the search 1enn. 1be actual text file is a 3.2MB text dump of the novel War and Peace and important properties:
is available from lhe book download page. The search term, Kutuzon. is a character in the i.D~ta flow is in one direction (map t
book. lfwe ignon: die grep count (-c) option for the moment, we can reduce the number of ~~e mp1~1 lo another MapReduce proc
instances to a smgle number (257)'by sending (piping) the re,ull~ of grep 11
• As w11h _func1ional programming.
into we -1 11
and reduction limc1ions lo lhe input
(we -1 or •word count" reports the number of lines•it rcce,vcs.) ~~ Hadoop data lake is always prese
S grep - KUllmJII" war-and-peace.txt!wc - I
111 . Because there is no dependency on
257 to th~ data,the mapper and reducer da I
Though noc strictly a MapReduce process, this idea is quite similar to and much faster than provide better perfonnancc
the manual process of counting the inst.inccs of Kut~on in the printed book. rhe analogy
Distributed(paraflel) imple~entations
can be laken a bit funbcr by using the two simple (and naive) shell scripts shown in Listing
analyzed qui~kly. In general, the ma
I. I and Listing 1.2. 1lle shell scripts are operation (much more slowly) and tokcni.:c both the ~ubset of the mpu1 data. Results from ~
Kutuzov and Petersbwa llriags in the text: m the reducer phase:.
S cat war-and-pea"e lAl mapper.sh I ./reducer.sh
KuhtL0V. 615 b. E1plain compllln& and runnln& P
Petersburg, 121 program.
Notice that more iAllace of Kutuzov have been found (the first grcp command ignore Ans. ~ord~ount is a simple application that
instance like -Kunaov.· or ~Kutuzov,"). The mapper inputs a text file and then outputs daia ~•ven mput set. The MapReduce frame
in a (key. value) paar (IOkeo-aame, count) format. Strictly speaking, the input to the 5cript is rs, the frame~ork views the input to the
the file and 111c keys ll'C Kutuzov and Petersburg. The reducer script takes these key-value ~ey-value parrs of diff'erent types. The
pairs and combines 111c similar token and counts the total number of instances. The rcsull is a (mpul) <k I. v I ~ ..... map.,,. ,ki. v" ....
_#
7
VIII Se,rn., (CSf/ISf) 13~V~A ~ CBCS - /vfodel,Q~im\f Po
<Goodbye. I>
<Hadoop, ,,..,.
1_5/05/24 18: 13:26 INFO
hmulus.8188 ·w~' v l/tin11!1in
I
WordCount sets a mapper l_.~105/24 18: 13:26 INFO
jub.se1MappcrClass (Tokeni1erMap~r cla~s) , hmulus/ I 0.0 0 I 8050
a combiner job sc1Comb1ncrClass(lntSumReducer.class) ; l:'\/05124 18:13:26 WARN
and a reducer JOb.se1Combiner('lass(ln1SumRcducercla~~), not performed. Implement tl
Hence, the output of each map 1~ passed through the local combiner (\\hich sums the values to remedy this.
in the same way as 1hc reducer) for local ;1ggrcga1,on :md 1hcn sends 1hc darn on to 1he final 15/05/24 18: 13:26 INFO int
reducer. Thus, each map a,cn"C the wmbincr ~rforms 1he folio\\ ing pre-reductions: 18: 13:27 INFO mapreduce.J
<Bye. I> r...J
,Hello, I> File Input format Counters
<World, 2> Bytes Read=3288746
<Goodbye, I> file Output funnm Counters
<Hadoop, 2> Bytes Wriltcn-467839
<Hello, I> In addition, the fol1011 ing fil
The reducer• n •ia 1M reduce method, simply sums the values, which are the file name may be slightly ditl
occurrcll« C.-S ilr cadl kC) 1be rele..,ant code section i5 as folio\\~: public void reduce S hdfs dfs -Is war·,and-peace~
(Texl kq-. lluak c:lalWrilable> waka. Found 2 times
Con1wccmca1 •rw-r--r-- 2 hdfs hdfs O2015-
) 1hn)lls IOEAccpuon, I ~ : int sum 0: ~rw-r-r-- 2 hdfs hdfs 467863•
for llntWritablc val ~~ - val get ( ) : r~c complete list of word co
I wrth lhe following command
result.set (sum); con1e,t •nlC (key rauhJ: • •and -peac
S hdfs dfs -get war
I If the WordCount program i,
The fi nal output oflhe reducer IS 111£ following: overwr11t: l~te war-and-peace
<Dye, t> remo_vcd wuh the following
,Goodbye. I> $ hdls dfs -rm •r -sJ..ipTrai.h w
<Hadoop. 2>
<Hello, 2> Explain with example A
<World,2> Apache Pig is a high-lcve
To compile ad . . 111e program from the command line, perform the following steps: transformations using n
1. Mae a local wordcount_Classes dam:tcxy a set of transformations
S makdirwordcount_classes cxtmct, transform, and I
2. Compile Ille WordCount.java pnignm using the 'hadoop classpath• data processing.
command to include ull the available Hadoop class paths. Apache Pig has several
Sjavac-q, ' hadoop classpath" -d \llordcoun1_dasscs WordCount.j:1va done on 1he local ma,hi
3. The jar file can be created using the following command modes execute the job o
Tezcngine
Siar -1:vt wordcounl.jnr -C 1\orJcount_d~-
4· lo run the c>.amptc, ..,,.,....... :~r"' J,....,·111ry in HUFS and place a te>.t file in the new
L
directory. For this e~ample, 1\e 1\ ill use the \\ar-and-peace.1>.t:
S hdfs dfs -mkdir war-and-peace-input Interactive Mode
S hdfs dfs -put war-and-pcace.txt war-and-peace-input Batch Mode
S. Run tfle WordCount application using the follo,~ing command: There arc also in1cmc1tH:
S hadoop jar wordcount.jar WordCount war-and-pl.'ncc-input developed locally in inte
__, war-and-peace-output scale on chc cl11s1er in a
If everything is worki ng ~orrcetly. I luduop mi:ssage~ for the job should look like the Pig faample Walk-Th
following tabbn:viated vcr.,1on):
8
15/05/24 18: 13:26 INFO impl.Timelincclientlmp I: Timelinc service address: http://
limulus:8188/ws/v 1/timdinc/
15/05/24 18. 13:26 INFO client .RMProx) Connec1ing to ResourceManager at
limulus/ I 0.0.0. I:8050
15/05/24 18: 13:26 WARN maprcduce.JobSubm111er Hadoop command-line option parsing
not performed. Implement lhe Tool interface and execute your application with ToolRunncr
er (which sums the values to remedy this.
ih the d:it:i on to the final 15/05/24 18:13:26 INFO input.FilclnpulFormat Total input paths to process : I 15/05/24
ng. prc-rcducuons: 18: 13:27 INFO maprcduce.JobSubmitter: number ot splits: I
[...)
File Input Format Counters
Bytes Read 3288746
File Output rom1a1 Counters
Bytes Wri1tcn=467839
In addition, the follo,1 ing fi les should be in the war-and-peace-output din:ctory. The actual
~ the values. "h1ch an: the tile name mny be slightly different depending on your Hadoop ver:,ion.
ollows: public void reduce $ hdfs dfs -ls war-and-peace-output
Found 2 times
-rw-r--r-- 2 hdfs hdfs O20 I5-05-24 11: 14 war-and-peace-output I _SUCCESS
-rw-r-•r-- 2 hdts hdfs 4678639 20 I S·0S-24 11: 14 war-and-peace-outpuV part -r-00000
The complete list of word counts can be copied from HOF S to the working direc1ory
wilh the following command:
$ hdfs dfs -get war-nnd-peace-oulput/part -r-00000.
If the WordCount program is run again using the same outruts, ii will foil when it tries
overwrite the war-and-peace-output directory. 1'11c output directory ,md all contc:nts can be
removed ,1 uh the following command:
$ hdfs dfs -rm -r -skipTrash war-and-peace-output
Module-2
3. a. Explain with example Apache pig and Apache Hive? (08 Marks)
the folio" ing steps: Ans. Apache Pig is a high-level language that enables programmers to IHile complex MapReduce
transfonna1ions using a simple scripting language. Pig Lalin (the actual lunguagc) defines
a set oftransformlltions on a data set such as aggrcga1.:, join. and son. Pig,~ o0en used 10
extmct, transform, and load (ETL) data pipelines, quick n:se.1rch on raw data, and itemtivc
data processing.
Apache Pig has several usage modes. The first is a local mode in which all processing is
done on the local machine. The non-local (cluster) modes are MapReduce and Jez. These
modes execute the job on the cluster using either tht MapReduce engine or the optimized
Tczenginc
l;k.~ a te\t file in the new
fable 2.1 Apache Pig Usage Modes
Local Mode Tez Local Mode Map Reduce Mode Tcz Mode
lnteraclivt Mode Yes Experimental Yes Yes
Batch Mode Yes fapen mental Yes Yes
There are also mteract1ve modes, usmg small amounts~ f data. and then run al
developed locally in interactive modes, using small amounts of data. and then run :11
scale on the cluster in a pmducrion mode. The modes nrc summar,;,cd in Tnt,lc ::?. '
Pig f.Aamplc Walk-Through
9
VIII Se-wt1 (CSf./ISE) 8~V~A ~ C6CS · Moctei Q ~
For this e"<ample, th..- folio,, mg software environment 1s assumed Other environments Comments are delincJ
should "ork in a s1m1lar fashion. called id.out for !he re
• OS: Linux and then start Pig wil
• Platform• RHEL 6 6 $ /bin/rm -r id.out/
• 1lorton\\orkS HOP 2 2 \\tllCh Hadoop v~ion 2 .6
S pig -x local id.pig
• Pig version: 0.14.0 .. . If the script worked c
I f pseudo-distributed installation IS uSN, hlnstallauon R..-c1pcs. instructions for installing
length file with then
Pig are. the only difference is
In this ~imple example, Pig is used 10 cx1rae1 user names fromt h~ 1~tc /pass~vd file. A full S hdfs dfs -rm -r 1d.Ol
description of the Pig Latin langua&e is beyon~ the scope of this mtroduct1on, but more $ pig id.pig
infonnation about Pig can be found al http l p1g.apache.org/docs rtl.14.0 start.html. The
If Apache lez is insta
following example assumes the user IS hdfs.but an> valid user" ith access to IIDFS can run
learn more about wri1
the example. . . . . Usin& Apache Hive
To begin the example, copy the pasPl,d file to a worl..1ng d1re1:tol')' tor 101:al Pig optm111on:
Apache Hive is a dat
S cp /etcipass\\d .
summarization, ad h
Next, copy the data file into IIDFS a Hadoop MapReduce operation:
called HiveQL. Hive
S hdfs dfs -put passwd pass" d
pe1abytes of data usin
You can confinn the file is in IIOFS by anering the following command:
• Tools to enable e
hadfs dfs -ls passwd
-rw-r-r- 2 hdfs hdfs 2526 2015-03-19 11 :08 passwd • A mechanism to i
In the following example of local Pia opm1tion, all processing is done on the local machine • Access 10 fi Jes st
HBase
(Hadoop is not used). F1r5t, 11,c llllffact1ve command line is started·
• Query execu1ion
S pig -x local Hive provides users ~
If Pig starts correct!~. ~ou •ill Jff a grunk prompt You may !llso s~~ a bunch of INro
Hadoop clusters. Alt
messages, which you can ...,.-c Next, enter the following commands to load the passwd file wiJh lhc MapRcduce
and then grab the user name and dump it to the tenninal. Note that Pig commands must end
Hive queries can also
with a semicolon(;).
YARN in lladoop ver
grun1> A~ load 'pass,.,d" using PigStorage l' ;');
Hive £\ample Walk-
grunt> B = forcach A gtnttale SO as id:
grunt..,. dump 0;
For this example. th
The processing will ~tan and a h~I of user name& w,11 ~ printo:d to the screen. To exit the should work in a simi
1n11:rac11ve scsMon. enter 11,c command quit. • OS: Linux
S pig -x mapreduce • Pla1fonn: RHEL 6
• Hononworks 11D
The same sequence of commands can be entered at the grunt> prompl. You may wish to
change the SO argument 10 pull out other items in the passwd file. In the case of this simple • Hive version: O 14.
script,you ,~ill notKC lhal lhc: MapReduce version llke) much longer. Also, bccuasc we arc Although the followi
running this applicalion under Hadoop, make sure t'1e file is placed in HDFS. HDFS can nm the e,, a
If you are using the Horton\\ orks 1-1 DP dis1ribution "ith te1 installed. the 1ez engine can be To star I live, simply e
used as follo" s: a hive> promp1.
S pig -x tez Shive
Pig can also be from a script. An exampll: script (id.pig) is available from the example code (some message may sh
download (see Appendix A, ·•Book Webpage and Code Download"). This script, which is hive>
repeated here.is designed to the same things as the interactive version: As a simple test, create
,. Id.pig., (;).
A = load ·passwd' using PigStorage ('; ·,; - load the pass,\d file hive> CREATE TABL
n - forcaeh A genemtc SO as id,•· extract the u~er IDs OK
dump B. Time tnkcn: 1.705 se1,;
store 13 mto ,id.out'; --,-.r11c rhc results to a directory name id.out hive> SHOW TABLE
10
Comments are delinealed by,.,. and •• at the end ofa line. rhe )Cripl "ill create a directory
. r h It f 'rst ensure that 1he id.out director) isnot in~ our loc:al directory.
called 1d.oul ,or t eresu s. 1 , •
and then s1:1r1 Pig with 1hc script on 1he command hoe:
11
VIII Semt (CSE/IS£) CBCS · Modei-Q~P
OK [DEBUG) 434
pokes [ERROR) 3
Time taken: 0.174 s«onds.. fetched: I row (s) [FATAL] I
hive> DROPTABLC pokrs [INFO] 96
OK [TRACE] 816
rime 1aken: 4.0311 ~ecoods [WARN] 4
A more detailed example can be de\elop,:J using a web server log tile to summarize Time taken: 32.624 seconds,
message types. First, create a labk using lhc folio" ing command: To exit Hive, simply type ex
hive> CREATE TABLE logs(tl 5lr1llg. C string, 13 string, t4 string, hive> exit;
..... tS string, t6 string, t7 string) ROW FORMAT DELIMITED FIELDS
b. Explain with lhc following
_, TERMINATED BY' · ;
I) Create the database
OK
Time taken: 0.129 seconds 2) Inspect the database
Next, load the data-in this caK.from the sample.log file This tile is available from the 3) Create row
example code download *•file is found in the local directory and not in HDFS. 4) Delete a row
hive> LOAD DATA LOCAL I PATH ·sample.log· OVERWRITE INTO TABLE logs; 5) Remove a ta ble
Loading data to table defaull logs 6) Adding data in Bulk.
Table default.logs stats: (..-f les=I, numRow=0, totalSize,=99271, rawDataSize"'0) I) Create the Data base
OK The next step is to create the
Time taken: 0.953 s«onds hbase (main) :006:0> create '
Finally, apply the ~lccl Skp 10 the file. Note that this invokes a lladoop MapReduce 0 row(s) in 0.8150 seconds
operation. The resulU appc.,ar II the end of the output (e.g., totals for the message types In this case. the table name i
DEBUG, ERROR. and so on) the row key. The price colun~
hive> SELECT t4 AS ~v. COUNT(") AS cnt FROM logs WIIERE t4 LIKE;(%' GROUP command is used to add to ti
BYt4; data can be entered by using t
Query ID "' hdfs_20 I50327 GOOOO_dlela265-a5d7-4ed8-b785-2c656979 l 368 Total jobs • I put 'apple'. '6-May-15', 'pric
Launching Job I oul of I put 'apple'. '6-May-15', ·pric
Number qf reduce tasks not specified. fatimated from input data size: I put 'apple'. ·6-May-15', 'pric
In order to change the average load for a reducer (in bytes) : put :apple', '6-May-15'. 'pric
set hive .exec.reduce.bytes.per.redueer=<number> put 'apple'. '6-May-ts·, 'vol
In ordtr to limit the mIDlimum number of n:ducers: Note that these commands c
set hive.exec.reduters.max-<number> from the book download files
In order to set a constant number of reducers: commands can be retrieved an
set mapreduce.job.reduces"'<numbcr> 2) Inspect the Database
Starting Job = job_ 1427397392757_000 I, Tkeing URL = http://norbert:8088/proxy/ The entire database can be I
application_ l42739739257 _000 I/ command with large database.
Kill Command= /opl/hadoop-2.6.0/bin/hadoopjoh killjob_l4273973927.57_0001 Hadoop scan 'apple'
job information for Stage- I: number of mappers: I; number of reducers: I 2015-03-27 hbsc (main) :006:0> scan 'app
13 :00: 17,399 Stage- I map= 0%, reduce = 0% Row COLUMN+CELL
20 I5-03-27 13:00:26.100 Stage- I map= I 00%, reduce= 0%, Cumulative CPU 2.14 sec 6-May-15 column~pricc:close
2015-03-27 13:00:34,979 Stage- I map= l00%, reduce = 100%. Cumulative CPU 4.07 sec 6-May-15 column" pricc:high.
MnpReduce To1al cumula1ivc CPU time: 4 seconds 70 msc1: 6-May-15 column- price:low.
Ended Job = job_l427397392757_000 I column-price:open, timestam
MapReduce Jobs Launched: 6-May-15 column=volumc:, ti
Stage-Stage-I : Map: I Reduce: I Cumulmiw CPU: 4.07 se1: I ll)FS Read: 106384 3) Get a Row
HDFS Write: 63 SUCCF.SS You can use the row key to ac
Total MapReduce CPU Time Spent: 4 seconds 70 msec the row key.
OK
12
[DEBUG] 434
[ERROR) 3
[FATAL) I
(INFO) 96
(TRACE] 816
[WARN)4
marize Time taken: 32.624 seconds, Fetched. 6 row(s)
To exit I live, simply type exit: :
hive> exit;
b. E~plain wilh lhe rollowlng com mud~ in 1hr If b11 se c.111111 model. (O.i Murks)
I) Create the database
2) Inspect the database
3) Create row
ilable from the
1 in HDFS.
4) Delete a row
BLE logs; 5) Remo,•e II table
6) Adding dala in Bulk.
111aSize..O) I) Creale the Dalabnse
The next step is to cn:ate the database in HBase using the following command:
hbase (main) :006:0> create 'apple', 'price·, 'volme'
doop MapReduce 0 row(s) in 0.8150 seconds
he message types In this case. the table name is apple, and two columns arc defim:d The data \\ ill be used a~
the ron key. The price rnlumn is a famil) of four values (open, close. low, high). The put
(IKE,(%' GROUP command is used to add to the database from "ilh in !he shcl I. For inslancc. the preceding
data can be cntl!rcd by using lhc following commands.
1368 Total jobs= I put 'apple' . ' 6-May- Is·. ' pricc:open·. · 126.56'
put 'apple'. '6-May-15', 'price:high', ' 126.75'
put 'apple'. '6-May-15', ·price:low·. '125.36'
put :apple', '6-May-l s·. 'price:close·. '125.0 I•
put 'apple'. '6-Mny- 15', ' volume·, '71820387'
Note tha1 these commands can be copied and paMed in10 I!Base shell nnd are available
from the book download files . 'I he shell also k.:t!ps a hbtory for the scc.:tion. imd previous
commands cnn be retrieved and edited for resubmission.
2) Inspect lhe Database
orbert:8088/proxy/ The emire database can be listed using 1he scan command. Be careful when using 1his
command with large database. This example is for one row.
757_0001 Hadoop scan 'apple'
·ers: I 2015-03-27 hbse (main) :006:0> scan 'apple'
Row COLUMN ➔ Crl.L
CPU 2. 14 sec 6-May-15 column -price:closc. timestamp 14309551 :!8359. value 125.0 I
ive CPU 4.07 sec 6-May-15 column pricc:high. timestamp 14309551~602-1. value 1~6.75
6-May-15 column=price:low. 1imestamp 1430955126053. val\Je- I 23.36 6-May- I 5
column=price:open, timestamp-1430955125977. valuc- 126.56
6-May- IS column=volumc:, timestamp= 1430955 I41440. value ·71820387
. 106384 3) Gel a Row
You can use lhe ro11 key lo access an individual ro11 In lhe ~hx:k pric.: datnhase. 1hc daw i~
the row key.
13
VIII Sem, (CSE/ISE)
hbase (main) OOl:O> . . apple', ·6-May- Is·
COLUMN CELL Using the VARN Dlstrlbut
price:closetin.e-..,=1430955 l:?8359, value: 125.0 I For th e purpose of the exam
pricc:high t.mes1an1p• 1430955 I:?6024, value= 126.75 assign the following installat'
price:low tunestanip-1•00955126053, value~l23.36 application:
price:opent1rnestampa 14;o955 I 25977, value-126.56 S export YARN OS
volume: timcstamp 1430955141440. value 71820387 distributedshell.jar -
row(s) in O 130 seconds For the pseudo-distributed in
4)Delete I Row run the Distributed-Shell Bppj
location Hadoop);
You can delete an entire row by &iv11111 the deleteall command as follows:
hbasc(main) :009:0> delctNIJ •app1c·, "6-May-15' S export YARN DS=SH
5) Remove a Table disrributedshell-2.6.0.jar
!fanother dislribution is use
To remove (drop) a table, YOII - fina disable it. The following two commands remo~c
the apple table from Hbasc lllmc(maan) :009:0> disable 'apple' hbase(main) :0 10:0> drop Jar and SCI SYARN_OS base
'apple' can be found by running the
6)Adding data in B11lk S yam org.apache.hadoop.ya
.... -help
There are several ways to efficiently load bulk data into 11 Base. Covering all of these methods
The output of this command
is beyond the scope of this chapter lastad, we will focus on the lmportTsv utility. which
usage: client
loads data in tab-separated values (tsv) format into HBase. It has two di~tinct usage modes:
• Loading data from I tsv-format file as HDFS into IIBase via the put command •appname <arg>
• Preparing StoreFilcs to be loaded via the completebulkload utility
The (OI/Qlrins example shows how to USC lmporTsv for the first option, loading the ISV• ~attempr_failures_ vafidir
rnterval <arg> ·
format file using the put COflUIIIDd. The s«Olld option works_ in a two-step fashion and can
be • Kplo""d by t'MSUlting hUp:l/hbac.apache.org/book.html#1mporttsv. . _ . .
The l\n,t "'"P •• cunv•r\ the, Appl.........__... Ale to 1£Y fonnat The following scr,pt. which IS
included in the book software. "di l'ftllOft Ille nr.,1 lim: and,,., 1hc cunv.:r~1on. In do,ng so,
-c.:u111ainer_mcmo,y <aJ'B>
it creates a file named Apple-stock.lSv
S convert-to-tsv.sh Apple-stock tsv /Imp
Finally, lmpottTsv is run using lhe follow mg command line. Note the column designation in -container_vcores <arg>
the •Dimporttsv.columns option. In lbc example, the HBASE_ROW_ KEY is set as the first
column-that is, the data for the data. -create
$ hbase org.apache.hadoop.hbase.mapmiuce.lmportTsv -Dimporttsv.columnss
.... HBASE_ROW_KEY, price open.price:high,pricc:low,price:close,volume -debug
- apple /tmp/Apple-stock.tvs
The lmportTvs command worts will use MapReduce to load the data into HHase. To "erify -domain <arg>
that the command worts. drop and re-create lhe apple dalabtie, as described previously,
before running the import command. •help
-jar <arg:>
c. What is YARN? E1plaia aay five commands? (04 Marks)
Ans. The Hadoop YARN project includes the Distributed-Shell application. which is example an ·keep containers acros
of a Hadoop non-MapRcducc application built on top of YARN. Distributed- Shell is a applicm ion_anempts
simple mechanism for running shell commands and scripts in containers on multiple nodes
in a Hadoop cluster. This application is not meant to I>.: a production administrntion tool. but -log_propenics <arg>
rather a demonstration of the non-MapRcducc capabiht) that can I>.: implemented on top of -master_memeory <arg>
YARN. There are multiple mature implementations of a distributed shell that administrators
typically use to manage a cluster of machines. In addition. Distributed-Shell can be used as
a s!arting point for exploring and building 1-ladoop YARN applications. This chapter offers -master_~cores <arg>
cu•~c~ on how the Distributed-Shell can be used to understand the operation of YARN
apphcataons.
14
tV~A~
Using the YARN Distributed-Shell
For the plll'J)Ose of the examples presented in the remainder of6is chapter. we assume and
assign 1he following ins1alla1ion path, based on Hononworks HDP2 2. the Dis1ributed-Shell
application:
$ ex pon YARN_ DS ~tusr/hdptcurrenlt hadoop-) arn-clientllladoop-yam-applications-
distributcd~hel I.jar
For the pseudo-distributed install using Apache Hadoop version i .6 0.die following path will
run the Distributed-Shell application (assuming SHA DOOP_HOME is defined to reflect the
location Hadoop):
$ export YARN_DS=SHADOOP_ HOME/shan:'hadoop/yam/h~yam-applications-
s:
distributedshell-2.6.0.jar
If another distribution is used, search for the file hadoop-yam-applicalionl- distribu1edshell•.
jar and sel SYARN_DS based'on its location. Distribu1cd-Shell exposes v«ious options that
0 commands remove
can be: found by running 1hc following command.
main) :010:0> drop
S yarn org.apache.hadoop.yam.applications.distribuledshcll.Client -jar SYARN_DS
_. -help
The output of1his command follows:
gall of these methods
usage: client
portTsv utility. "hich
-appname <arg> Application Name Defauh
!listinct usage modes:
value -distributedSMII
t command
-nt1empt_failurcs_validity_ when attempt_failun:_validit)_ intcrvol in milliseconds is
tion, loading the tsv- inlerval <arg> set 10 >O. the failures Nnber will not 1ake failure which
o-step fashion and can happen out of the validilylnterval into failure count. If
failure count reaches IO maXAppAnempts, the application
v.
!lowing script. which i~ will be: failed.
unversion. In doing :.o, -container_memory <arg> Amount of memory in MB to be requested to run the shell
command
-container_ vcorcs <arg> Amoun1 of vinual cores to be reques1cd 10 run 1hc shell
column designation in command
KEY is set as the first
-create Flag to indicate wht:thc:r to crea1e the domain
specified with -domain.
columns=
volume -debug Dump out information
-domain <arg> ID of the timeline domain when: the timclinc
cnlitics will be put
-help Print usage
Jar file containing the applicatio,i ma~u:r
-jar <arg>
(04 Marks) Flag to indicate whether to keep 1:ontainers acros~
-keep_ conta1ners_across_
. which is an example application attempts. If the nag is true. running containers
application_attempts
istributed- Shell is a will not be retrieved by the new application attempt
ners on multiple nodes log4j .propertit:s file
-log_propcrtics <arg>
inistration tool. but Amount of memory in MB to be reque~ted to run the
imph:m.:ntcd on top of -master_memeory <arg>
application master
hell that administra1ors
Amount of virtual cores 10 bt: requested to run the
Shell can be used as -master_vcorcs <arg>
application master
ns. This chapter offers
e operation of YARN 15
VIII Se-mt (CSE/IS£) CBCS · Model-Q~
UM:~ and groups thal allowed IO modif> lhe limtine of various Hadoop serviJ
-modify_acls -.arg>
cntilics the 1imeline enlilies in the given domain the scrv ice managed b)
services managed by A
-node_label_expression <arg> Node label expression to determine the nodes where all metrics (Ganglia).
the containers of this application will be allocated, " " Da~hboard View
means containers can be allocated anywhere,,f you don't The Dashboard view pr
spei:ify 1he option, default nod1:_labcl_expression of cluster. The actual servi
queue will bl: u~cd. remove, or add these wi
-num_containers , nrg~ No. of containers on \\htch the shell c9mmand needs to • Movmg. C:hcl.. and h
be e11ecuted • F.dit: Place the mous
-priority <arg> Application Priority. corner of the widget.
Default 0 the widget.
RM Queue in which this application is to be submitted • Remove: Place the m
-queue <arg>
• Add: Click the small I
-shell_args <arg> Command line arge for the shell script. Multiple args can will be displayed. Sel
~ separated b~ empt~ space.
Some widgets provide a
-shell_cmd_priority <arg..- Prionty for the shdl n1mmnnd rnntniners instance, 1he DataNodes
-shell_command <erg> Shell command to be executed by the Application Master. Figure 4.2 provides a de
Can only specify either. -shell_command or • -shell_script The Dashboard view als
Environment for shell script. Specified asenv_ kercnv_ map selected m.:trics acr
cluster will be displayed.
val pairs
from the Select Metric p
-shell_script <arg> Location of the shell script to he executed. Can only 4.3
specify either
- -shcll_command or
• -shell_script
-tinu."Out <arg> Application timi:0\1I in milliseconds
-view _acts <arg> Users and group that allowed to view the timeline entities
in the given domain
OR
4. a. Esplaln viru1 or Apache Ambarl? (12 Marks)
Ans. Afier completing the initial installation and logging into Ambari) a dashboard similar to
that shown in Figure 4.1 is presented. The same four-node cluster as created that will be
used to explore Ambari . It you need to reopen the Ambar1 dashboard interface. simpl) enter
the following command (which assumes you are using the Firefox browser. although other
browsers may also be used):
S fin:fo,.. localhost : 8080
The default login and password Arc admin and admin. respectively. Befor1: continuing any
further. you should change the default password. To change the password. select Manage
Ambari from the Admin pull-down menu in the upper-right corner. In the management
window. click Users under User+ Group Management, and then click the admin user name.
Select Change Password and enter a new password. When you arc finished, click the Go To _ ,_. ___
Dashboard link on the ldl side of the wiRdow lo retum to the dashboard view.
···--···-··'
,
To leave the Ambari interface. select the Adm in pull-down menu of the m,t~lled ,crvicc:s.
Figure 4./
A glance at the dashboard should allow you to get a sense of how the cluster is performing.
The top navigation m.:nu bar. shown in figure ,I.I, provides acces~ to th.: Dashboard, Services,
Hosts, Adm in and Views fc:atures (the 3 " J cube is the Views mc:nu). The Matus (up/down)
- -- --.., --
browser. although other
- 5Hvlcc View
The service menu provides a
provides a graphical 1111:thod
etc/hndoop!con f Xt-1 L fi ks).
service metrics and un Allers
Similar to the Dilshboard vie
menu. To select a scrvicc,clic
will have is 01111 summar). /\ I
example. Figure .J.5 shows II
status ofNamcNodc, Second,
displayed in the Summnry 11i
status or the scr1 i,c and its c
·•- -· .. II <> e
metrics are displayl·J ns 11 idg
As on the dashboard, tln:sc wi
the Configs lab will open an
...
--
-
(properties) an: the same on
through the Ambari intnfricc
--- __ _
iii ...........
..,.
-· ·----
_, ,,.._.~ .-~· ....... ..__
·~ ■--......,..., • • e
.• ----- ..........
-·
. 1w ,,...
-
1w
C
0
C
C
C
Cl ....... ,
C
....,_ . . . .11...,.1uo
C
Fi;:ur
18
llfrVah:t,A ~
Conliguralion hb1ory is 1he final tab in the dashboard window. This view provides a list
of configurn1ion changes made to the cluster. As sho\!on in Figure 4.4, Ambari enable
configurattons to be sorted by service, configuration, group, data, and author. To find the
specific configurmion se1tings, click the service name. More information on configuration
setting is provided lalcr 111 the chapter.
Service View
The service menu provides a detailed look at each St"n;ice running on the cluster. It also
provides a graphic.ii mc1hod for configuring each service (i.e., instead of hand-editing the/
etc/hadoop/conr XI\ IL Ii les). The summary tab provides a current Summary view of important
service mclrics :ind an Alters and Health Ched.s sub- window.
Similar to the Dashboanl view, the currently installed services arc listed on the lefi- side
menu. To S1.'h:ct a sen icc,click the service name in the menu. When applicable, each service
will have is 01111 M1111niary. Alters and I lealth Monitoring and Service Metrics windows. For
example, Figure .J.5 sho11s lhe service view for HDf'S. Important information such as the
status ofNamcNodc, SccondaryNameNodc, D:1taNodes, uptime, and available disk space is
displayed in the Summary II indow. The Allers and I lealth Checks widow provides the latest
status of the service and its component systems. Finally, several important real-time service
metrics are displayed as \\ idgets nt the bottom of the screen.
As on the dashboard, these widgcls can be c;,..panded to display a more detailed view. Clicking
the Configs tab will open an options from, shown in figure 4.6, for the service. The options
(properti~s) arc the same ones that arc set in the Hadoop XML should manage them only
through the Ambari intrrlhcc-th~t is, the us~r should not edit the files by hand.
• ♦
1io 1111 • • I •
w:::r:==i
........
... --·
,sa e
"" ..1•~,., -- ...,;,e~·......,..,_.,.~
. ...................
fl, • ~.,, .... ..... 11 •- ,.,., .., , .
_.,..._\'II
:l
19
VIII Sen11 ( CSE/IS£)
TI1e c~nl Sdlm,!.S a ,1ilablc for each service arc shown in the form. Tiu: adminis1rn1or can sofiware installed. I he rcm:iini,
set each of these properties by changing the values in the form. Placing the mouse in the input over the various service compon
box of 1hc proix-n~ <lispla> s n short description of each property. Where possible, property Further details for ;i particular h
are group,:d by functionalily. The form also has provisions for adding prope11ies that are not A s shown in fi1.:111c ,1.s. 1hc indi
listed. An e:.:unpk of changing service properties and restarting the service components is Host Metrics, a71d Summary info
providt>J in the ~M. n:i~1ng I ladoop Service" section. currently nmning on the ho~t. E
If a sen1ce prm J.:s 11s O\\O graphical interface (e.g., IIDFS, YARN, Oozic), then that plnccd in maintenance mode Til
interface can be opened in a separate browser tab by using the Quiel... Links pull- down menu metrics (e.g., C PU, llll'n1ory, dis
located in lop miJdlc of the window. version of the graphic. lhc Sui
Finall), the Sc-n ,,: i\C'tll'n pull-down menu in the uppcr-lefi corner provides a method for includinu till' la~, lime a heartbc~
starting and sto p '° c ch ~c, vice and/or its component dae'.11ons :icross the cluster. Some
service ma) h:nc a sct of unique actions (such a_s rebalancing_ 1IDFS) that apply to o~ly
· .· 1• ns e,n.,llv CH'ry service has a Service Check option to make sure the service
ce1tam s11ua 10 • ~ .. • , . • d
is working pmpcrt~. The scf\icc check is initially run as part of the installation process an
can be valuabk ,,hen d a_no~in:: roblcms.
• • IJ:II I
-~~
Q
--
c:a
. ..... ,..•,,,_
.. r,
----. "'
..
...
.....
..
. ..~ -
. ...... -
; - : :• ...; ;• •, .1• •
'•- ..
♦ • ■
,,_
t,....
- ..
I
I
I
...
..,
:J ' •
.........
_:_
• f , . •
-_______
,....,,.... ...,. , ..,.,___,,,
..........,.....,___.
which
the J to .a<>or.
= cgatc complete
ob~1storyServer can bc
Managing YARN Jobs
_YAR~ jobs can be managed
....,... Mmcludmg -kill • -li•t•, and -st
apReduce jobs can also be c
-appTypi:s <comma-scparared
HU,M - - - -- - •
.,.,, .. _....,,....ni.
.._..........,,.. .........~-,.
-help D
_______
•~ -kill <Application 10>
·---
K
., ' "°''.. - - - - - - ....,. __...._.,. __....,._
....---·-..............._,.,,... ",,..
-list
.,,..
Li
..,..
...
',.,....
....,.,.. ;P
-------
• S tatus <Application ID>
,,nut ,..__ _ _ _ _ _ _ 'f"""'
,.,,......___.......,..
Neither the YARN Resource r
...,,.. applications. If a job needs to
Application ID :ind then use lhe
- -
Setting Container Memory
YARN manages application re
amount ofcomnincr melllOl)' 13~
file:
Figure 4. 9 t1111h11ri i1/.\t11lled p,,c~ with 1•enio11s,1111mbers a11d descriptlo11s • yam.nodcma11ager.reso11rce.1
use for containers.
b. Explain the llusil: "ttiuloop YARN administration? {04 Marks)
Ans. YARN has several administrative features and commands. To find out more about them, • schedulcr.minimum-allocatio
examine the YARN commands documentations at https://hadoop.apache.org/ docs..'currcnt/ ResourccManag.cr. A request.:
hadoop-yarn/hadoop-yarn-site/YamCommands.html#Administration_ Commands. The container of1his size (default
main administration commands is yam rmadmin (resource manager administration). Enter • yarn.scheduler.maximum-all
yam nnadmin -help to learn more about the various options. Resoum:M:in:igcr (default 81
Setting Cont.1incr Cores
Decommissioning YARN Nodes
If a NodeManag.cr host/nodes to be removed from the cluster, it should be decommissioned You can set the munbcr of cores fi
first. Assuming the node is responding, you can easily decommission it from the Ambari web xml:
UI. Simply go 10 1hc 110s1s view, click on the host, and select Decommission from the pull• • yarn.scheduler.maximum-all
down menu next lo the NodcManagcr component. Node that the host may. also be acting as request at the RcsourceMana
a HOFS DotoNodc. Use 1hc Ambari Hosts view to decommission the HDFS host in a similar this allocation will not wke c
number of cores. ·1 h.: default
fashion.
22
VCtmlA~
23
vrrr Sem, (csr:/rsr:)
C13CS • Model, Q ~ t'r.c-n,, Pet.per .
• yarn.schcduli:r.maximum-allocation-vcorcs: The maximum allocation for every container
Bl can help nrnt,.c bolh bctlcr.
request at the Res~Manager, in terms of virtual CPU cores. Request larger than this
Strategic dcci,ion~ are those that i,
allocation will not take dl"cct, anJ the number of cores will be capped at this value. The
out to a Mw customer set would be
default is 32.
Operational decbions arc more
• yarn.no'demanager.resou«e cpu-vcorcs: The number of CPU cores that can be allocated greater e~cicncy. Upd:uing an old 1
for containers. lo stratcgrc decision-making. the !t
Setting Maplkduce Pruptttia the path 10 reach the goal. ~
MapReduce runs <is a YAR!'J a,,plia8aM Consequently, it may be necessary to adjust some of The consequences ol'thc decision w
the mapred-site. ,1111 properties as~ ldlle IO the map and reduce containers. The following scanni?g for new possibilities and 1
properties arc used to set some Jff-cz n -.id memory size for both the map and reduce analysis of man) possible scenarios.
contaim:rs: found from
. data min in":,•
• mnpred.child.Java.opts provides a lllp • smaller heap size for child JVMs of maps Opera11onal dl'cisions can be made m
(e.g., - -Xmx2048m). system can be crc:ucd and modded
• mapreduce.map.mcmory.mt. p.o•-= a lllpr or smaller re~ource limit for maps (default of the domain. rliis model can help
= 1536Ml3). m'.tomalc opcmtio11s level decision-
• m:tpreduce.n:duce.memOl).mb ,..,.,as • larger heap size for child JVMs of maps micro level operational decisions in
(default = 3072Ml3). . F~r e~ample, a bank migh1 want to
• mapreducc.rcduce.java.opts pnmdG a a.pr or smaller heap size for childreducers. sc1t:nt1fic way using data-based mod
Modale-3 A dccision-lrec-bascd model could
Developing sud1 dccision lree m
5. a. Why )houltl uri::111iz11tiun in\·csl •• ,11iu■ lalelllgence(Bl) solutions? Are Bl toob techniques. Elfcc1ive OI has an cvolt
more important than IT sccurit)' toht..._. (08 Marks) people and organizations at't, new fo
Ans. Buslne~s intl'llii;cnce (UI) ban umbrella . . . . _ includes a variety of IT applications that be testcd against the new d,na. and i
are used to analyze an organiwtion 5 ~adQlllllllunicatc the information to relevant users. In that casc, deci~ion models should
Business Intelligence, 131 is a conccpl • ...ity involves the delivery and integration of An _u~cnding process of gencrnting
relevant and usdul business informaaoa ••organization. Its major components are data dec1s1ons, and thus c,111 be a sii.tnifican
warehousing, tbta 111 in ing. query111g. ad lqlllllmg (figure 5. 1). DI tool~ more import.ult Iha~ IT sc
I
..
Hu I . . . lnt-elllgencu 01 tools includc darn wnrdiousino
of management and tactil:al strah:gic management processes can be improved . include: many st:11is1ical l'i~nctions. Th
DI for Belter Dcd~iuns: There are two main kinds of decisions: end to ensure that the 1al>ks and or:ip
• Strategic da'(;isions and real time. "
• Operational decisions. Data mining ~)'Mems, sudi as ([3M
24
cacs - Modcl, Q u.e¢c,0' n, f'oq}(w • 1
location for every container Bl can help make l>oth bcltcr.
s. Request larger than this Strategic dcci,ions an: those that impact the direction of the company. The deci~ion to reach
capped at this value. The out to a new customer set would be a strat.:gic decision.
Operational decisions arc more routine and tactical decisions, focused on developing
ores that can be allocated greater d'ficiency. Updating an old website with new features will be an operational decision.
In strnt.:gic dccision-makini:;, the goal itself may or may not be clear, and the same is true for
the path to reach the goal.
essary to adjust some of The consl!quences of the decision would be apparent some ume later. Titus, one i~ rnnstantly
containers. The following scanning for new possibilitic:s and 11ew paths to achieve the goals. B l can help with what-if
001h 1he mup nnd rcd,1cc analysis ofm:iny possible sc.,narios. 131 can also help create new ideas based on n"w patterns
found from data mining.
for child JVMs of maps Operational decisions can be made more efficient using an analysis of past data. Aclassification
system can be created and modeled using the data of past instances to develop a rood model
limit for maps (default of the domain. This model can help improve operational decisions in the fu1urc. Bl can help
automate operations level decision-making and imprO\I! efficiency by makin 6 millions of
or child JVMs of maps micro level operational decisions in a model-driven v.a)
For example, a bl!llk might want to make 1focisions about making financial loans in a more
p size for childreduccrs. scientific way using data-based models.
A decision-tree-based model could provide a consistently accurate loan decisions.
Developing such d.:cision tree models is one of the main applications of data mining
solutions? Are Bl tools techniques. Elfoclivc Bl has an ev.olu1ionary compom:nt, as business models evolve. When
(08 Marks) peopl.: and organizations act, new facts (data) are generated. Current business models can
iety of IT applications that be tested against the new data, and it is possible that those models will not hold up well.
onn,llion to relevant users. In that case, decision modds should be revised and new insights should be incorporated.
li\ery and integration of An unending process of generating fresh new Insights in real time can help make better
~or components are data decisions, and thus can be a signifil:ant compelitivc advantage.
Bl tools more i111porlant than IT sccurily solutio11s:
131 tools include data ward10using, online analytical processing, social media analytics,
reporting, dashboards, querying, and data mining.
Bl tools can range from wry simple tools that could be considered end-user tools, to very
sophisticall!d tools that offer ~ very hro~d and complex set of functionality. Tims, even
executives can be their own !31 i:xperts, or they can rdy on Bl specialists to set up the Bl
mechanisms for lhcm. Thus, large organizations invest in expensive sophisticated Bl solutions
illg cycle that provide good information in real time.
lhe lifeblood of business. A spreadsheet tool, such as Mkrosoft Excel, can act as an easy but elfecti,·e I•I tool by itself.
ent and predicting the Data can be downloaded and stored in the spreadsheet, then analyzed to produce insights,
facts and feelings. Data•
then prcsenkd in lhc form ofgr:iphs and tables. This systern offers limited automation using
alone. Actions based on
macros and other li:!:1turc:s. The analytical features incluck basic statistical and financial
ting, using fresh insights,
functions. Pivot rnbh:s h.:lp do sophisticated what-if analysis. Add-on modules can be
installed to enable modcratdy sophisticated statistical analysis.
A dashboarding 5y~tcm, such as l~1blcau, can offer a sophisticated set of tools for gathering,
onitor business trends in
analyzing, and prcscnting data. At the user end, modular dashboards can be de~igned and
nario. If effective business redesigned easily 11 ith a graphical uso:r interface. The back-end data analytical capabilities
king processes at all levels
include many stmislical functions. The dashboards artl linked to data warehouses at the back
be improved .
end to cnsurt! that the tables and graphs and other tllements of the dashboard are updated in
real time.
Data mining systems, such as IBM SPSS Modeler, are industrial strt)ngth systems that
25
VIII Se.tw (CS'E/IS'E) 8ift'Vam-A~
providccapabilitics to npply a wide mngc ufanalytical models on large data sets. Open source decision making and helpin
systems, such as Wd;a, arc popular platforms dcsigm:u 10 help mine lilrgc amounts of Uilla DW enables a consolidated
to discover path:ms. Thus, the entire organiL1tio
DW lhus provides l>clli.:r a1
b. What ls tlae pur11ose or a dah1 warehouse? (04 Marks)
users to perform c,tcnsive a
Ans. Purpose or D,ua W:m:housc lies socnc..-herc in its definition it.self i.e.a database created by
operational u.11al,as,:s us,·J
combining data th.it is gathcred through various sources thilt can be of different types and
formats (e.g. te,1. sql. xml elc). c. Dusincsscs ncl.'d 11 "two-s~
Now what , ou 11 ill <lo afkr stonng such huge amount of data from different sources into a
single database. you 11 ill a11alyx lite data which you ha".i accumulated and try to answer Ans. Some of the ex;irnplcs citl.'
queries which wi.:re not possible: or were performance intensive earlier. The airlincs haw all lhi: d;tt
In a nutshi.:11 Dara "archoust I process of coll\.'Cting dala , transforming it, loading into until all lhc bags h,t\ i.: arriw
single uatabnsc and th.:n 1151111 a Bl tBusin.iss Intelligence) tool to answer your analytical then n:port ii lo 1he1r n"to
queries and prediction ofal) flll1ba' questions that may arise are helpful to your domain or l..110,1 upfront that lhcir bag
busines~. Po1,er companil.'~ hav.: 1hc
Below arc few rl.'nSOIIS hours after dozo.'ns of c11~101
Im pro, ing \'i~ibility of Dat:a : An organization registers data in different systems, which ohcad oftiim: to pr.:l'cnt foil
support the , ,tn us busm~-ss p11xcs~~ In order to create an overall picture of business The Fcd has all lhi: d,tta 10
opct Miur.~. customm amt supplii.:rs-thus creating a single version of the truth -the data must is still clinging on to ,m o
come 10,.dh,r an one pl.11:.: anJ made compatible. Both external (from the environment) and data and :idjust 1hi: policies
internal dala (from ERP. <:RM and liaanc:1al systems) should merge into the data warehouse smancr. r,•,11 time u>111p111.:1
and thi:n ~ arou~d. Therefor,: having a single source to answers all your qucrie~. l1bco 's products bring Ihat
lmpro,ed Pcrfora1111m:e: On.: could ll5C an already existing operational d.11abase if there is d:lla. Thl• most cvmmon fo
only single <k-stinatio11(D,1tabasc:) ofall the data. yet there few constraints like performance and columns.
which degr.ade for bolh 01)1!ratlOl\al processes and reporting processes. Therefore wc: create n Documents. on th.: othcr h,1
database tum:d and optimiz..-d d;atab.lsc which will be ready to answer queries which require of information conwined i
to bring h11ge amo11111 of data :ind :mal)Sis. anal) zed casil) sincc 1hc co1
lnc:re11sc: D11ta Qualit)· : Stal.chold.:rs and users frequently overestimate the quality of data Enlcrpris.: could unkash if i
in the sourc.: sy~tcms. Unfortunately, source systems quite of\en contain data of poor quality. of docu1111:nts 10 ob1ain the t
When w,· nsc a d.11.1 v.,irchous..:. \IC can gn:atly improve the data quality, either through - lnfinote brings stru,turl! 10
were possihl.: - corrcc1i11g the data 11hilst loading or by tackling the problem at its source. build tools Ihm cnn giw cor
Faster and Mori: 1111\'llnccd Hc11ortini:: The structure of both data warehouses enablc:s end and beconic pru.icti~e instc
ustrs to r.:port in a lkxibll! mannc:r ,tnd 10 quickly perform interncti\'e analysis on the basis lnfinote can hclp) uur corp
of various prcdclini.:tl :mg.ks. Thi.:y may, for example, with a single mouse click jump from your documi:111s.
year level - tv q11,1rt,·r - lo monlh l,:v,:I and quickly switch between the customer data and the Liberty Ston•s C11sc £\ere
product data II ltl.'rcby 1111: indii:.itor remains fixed. In this way, end users can actually juggle Liberty Stores Inc is a spcci,
with the data and lhus ,1uickly gain knowledge about business operations and KPls (Key wellness producls. .imt edui:
Performance lmlii:.itur). and Sus1ainabll.') ciliLl!ns \\\
A dat;a w.1rchuusc ( U\\ ) 1s ,Ill organiz.:d collection of Integrated, ~ubje\:t ui icnlt:u u.1tabases The comp:m) i~ :?O ~""~ o
designl•d to suppm1 d,:cision support functions. DW is organized at the right level of countries, 150 citi.:s, nnd hn
granularil) to pro, idc i:11!,in cnlcrprisc wide data in a standardized format for reports, queries, The company has rl!\cnu,:s
and analysis. DW is pit~ ~icall) .utd func1iom1lly separate from an op,:mtional and transactional The company pa) s ~pecial
datab,tsc. Cn:a1ing a DW for analysis and queries represents significant investment in time nnd produci:d. II donates ab
and elfot1. It h.is to 1>1: wns1a111ly kepi up-to-date for it to ~ useful. OW offers many business chari1ablc causl.'s.
and technical b,:ndits. I. Crcme a cumpr.:hcnsin: l
OW supports b11~i11<.:>> 1.:po11ing and dal:i mining uc1ivities. It can facilitate distributed access 2. Create anolhl!r dasltbo:trd
to up-10-d,ll.: bu~1110·~~ l..no,, h:dge for departments and functions, thus improving business
ellicicncy and customer scrvicl•. OW can pr.:sent a competitive advantage by facilitating
26
riV~A~
data sets. Open source decision making and helping reform business processes
large amounts of data DW enables a consolidated view of corporate d:.ta. all cleaned and organized.
rhus, the entire organi.G.'ltion can see an int..:e:r.:ited, iew of usctf.
DW thus provides belier anti timely information. It simplifies data access and allows end
(0-t Marks)
.a database created by users lO pt:rform extensive analysis. It enhances overall IT performance by not hu ·:fening the
of different types and operational databases used by Enterpris.: Resource Planning (CRPJ and other S) , ms.
c. Businesses need a " two-second :1d,·a111;a~~ to succeed. What does that mean to you?
different sources into a rot Marks)
ated and try to answer s. Some of the examples crtcd for ... Iwo S1."~00l.1 Advantage" are:
r. The airfim.:s have all the dala about your ba=-s Why is then that you have to 11 ai! fc-r eternity
orming it, loading into until all the bags have arrived at the baggag~ can•usel 10 discover that your bag i,. •issing and
answer your analytical then repo11 it to their customer service'.' \\ h) c.10 ·1 airlines be proactive and 11.t ;,n~sengcrs
lpful to your domain or know upfront that th.:ir bags 11 ill be nrri1 ing later''
Powt:r wmpanic~ hnvc thc data rtt hand on grid foilures. Why do they only respond several
hours allcr dozcns of custom.:rs call and compl,,in? Wouldn't it be better ifth~, use the data
ifferent systems, which alwad of timc lo prcl'cnt foilurcs in the first place''
II picture of business The Fed has all the darn to take fi:.cul. economic and monetary dccis,on~ in re::l 1:me. Why
the truth -the data must is still clinging on to an obsoklc model of meo.:11ng only a few tunes a )car to review the
m the environment) and data and adjust the policies and rates in h1mlstght? Why can't the Fed be replaced by a much
into the data warehouse smm1er. real time computer :tlgorithm?
1your querh:s. Tibco's products bring that valu:tbk two second ,1dvantagc ro the cnterprbc fot structured
ional database if there is data. Th.- mo5t common for111 of structun:d data is a database, where data is storl!d in rows
traints like performance and colu11111s.
. Therefore we create a Docu111cnts. on the other hand. represent thc world of unstructured d,1ta. There is a wealth
er queries which require of information contnincd in Enterprise documents. However, this information cannot be
analyzed easily since the content i~ not organiLed in a structured way. Imagine the potential an
imate the quality of data Enterprisc could unleash if it were ablc to analyzl.! the information scattered across thousands
t:tin data of poor quality. of documc.:nts to obtain the two ~ccomt adl'antagc.
quality.-either through - tnfinote brings strurturc to thc u11s1ruc111rcd wo, Id of documents. It provides a platform to
problem at its source. build tools tlwt can give rnrporntions the ability to extract information from their documents
warehouses enables end and become proactive in~tcad of being reactive. In my future columns, I will explain how
1\'C analysis on the basis tnfinotc <:,Ill hdp }'\JUI' corporation get the two second advantage by mining information from
mouse click jump from your documents.
c customer data and the Liberty Storrs C.1sc E,crcisc:
Liberty Stores Inc is a specialized glob:tl rc.:tail ch:iin th:tt sells organic food. u.-ganic clothing.
well_ncss products. :ind education products to enlightened LOI 1/\S (Lifestyles of the Healthy
and Sustainablo.:) citi,:cns worldwide.
oriented databases The company is 20 years old and is growing rapidly. It now operates in 5 continents, SO
at lh..: right level of countrit:s. 150 citi.:s, and has 500 stores. It sells 20.000 products and has I0.000 employees.
for reports, queries, The company has rl.!venucs of over SS billion and has a profit of about 5 p~rctJnt of revenue.
Iand transactional The company pa)s special attention to the conditions under which the products are grown
and 'produced. ft tfonat..:s about one-fifth (20 percent) of its pretax profits from global focal
charitable caus~s.
I. Create a comprehensive dashboard for the CEO of the company.
·1r1a1e distributed access 2. Create another dashbo:irtf for a country hl·ad.
us improving business
\anrage by facilitating
27
VIII Setw (CSE(ISE) 'B~Da.tt..,A~
Suµ,-rv,,ed
le,lr,ung;
Cl,>~slfl(:.itlo
UnsupNvl~
Le.trninr.:
fxr,lnr,,tion
Fi
Data may be mined to he1
e'ICplore lhe data 10 rind in
kind of prnhlem being so
The most import.int class
OR These are problems whet
pallcms 1ha1 would impr
6. a. Whal l.s Ll,1111 ml in,:? \\ hal .ire sur~·n'i)cd ond unsupervised lcarninJ techniques?
data of past decisions is 0
(08 Marks) codified 10 produce more
Ans. Dara min,ni; I) ti,: art anJ scirncc of di5'overing knowledge, insights, Md patterns in data.
learning as lht"rc is a way
It is 1he a~I of c:~•,acting usdul pallcm~ from an organized colleclion of data. Patterns must
A decision ln:e is a hiera
be ,alid, novd, potentially usl•ful, and understandable. The implicit assumption is that data
about the past c.in I cHul patterns of activity that can be projected into the future. an easy and logical mann
many reasons.
Data mining is a multidisciplinary field that borrows techniques from a variety of fields.
It utilizes the lnowlcdgc of data quality and data organizing from the databases area. It I. Decision lrccs arc eas
draws modeling and analytical techniques from statistics and computer science (artificial TI1ey also show a high pr'1
intelligence) areas. It also draws the knowledge of decision-making from the field of business 2. They sdcct lhe mosl rel
management. decision-making.
The field of d.ita mining emerged in the context of pattern recognition in defens.:, such as 3. Decision trees arc tolcr
from the user~.
identifying a fri.:nd-or-foc on a battlefield. Like
4. Even nonlinear relation
many other ddi:nse-inspircd technologies, it has evolved to help gain II competitive advantage
in business. There an: many algorithn
CART, and CHAIO.
For example, "customers who buy cheese and milk also buy bread 90 percent of the time"
would be a useful p:ittcrn for a grocery store, which can then stock the products appropriately. Regression is a relative!
Similarly, "peQplc with blood pressure greater than 160 and an age greater than 65 were at a The goal is to fit a smoo
high risk of dying from a hcan stroke'" Is of gi cat diagnostic value for doctors, who can then for exampl!.!, can be used t
focus on treating such patients with urgent care and great sensitivity. IClllf>':ra1ure !:>imply plolli
Past data can be or predictive value in many complex situations, especially where the pattern equation will lit lhc data
may not be so easily visible without tin: modeling technique. Here is a dramatic case uf a future day can be predict
Artilici11I ncurul nd wor
data-driven decision-making system that beats the best of human experts. Using past data, a
Intelligence slrcam in Co
decision tree mo<lcl was developed to predict votes for Justice Sandra Day O'Connor, who Neurons receive stimuli,
had n swing vote in a 5-4 divided US Supreme Court. All her previous decisions were coded successiwly, and eventual
on n few variables. What emerged from data mining was a simple four-step decision tree that just one neuron :ind the re
was able to :iccurntcly predict her votes 71 percent or the time. In contrast, the legal analysts layers of neurons involve~
could at best predict corrcclly 59 percent of the lime. (Source: Martin et al. 2004) The neural network can~
points. It will continue to
parameters bnscd on fceJ
28
{~,O:;I; A ~ C8CS · Mod<wQ~Paper · l
29
VI·II Semt (CSE/ISE)
CBCS - 1-,fcxlel,Q~
pa:.sc\l \\ ithin th.: In) .:rs of neurons mu> not make intuitive sense to an observer. Thus, the
neural m.-tworks ore considered a blat.:k-box system. 6. Outlier data clemen
At some point, the neural network will have learned enough nnd begin to match the predictive of results. For exam
accur01:y ofa human cxpcn or allernativc classlfkation techniqucs The predictions of some educational scuing.
7• Any biases in the sci
ANNs that have been trained over u long p..:riod of time with a large amount of data have
becomt: dcci:.ivcly more aci:uratc than human e.,pcrts At that point, the ANNs can begin co of the phenomena un
than is typical of th
be seriott\l)' considered for dcploymcm, in real situations in real time data.
ANNs nrc reP I ,r bcclluse they arc cvl!ntually able to reach a high predictive accuracy. 8 Data should bc brou
ANNs arc J. , relatively simple to imph:mcnt and do not have any issues with data quality. be available daily, bu
ANNs rl!qum: a Jot of d:ua to tmin to develop good predictive ability. To relate these varia
Clu,ter 11nalysa is an explor.itory learning tc1.hniquc that helps in identifying a set ofsimilar this case, monthly.
gt"()up5 in lhc J.ala It is a techni,1uc used for nucomntic idcn11ficntion of natural groupings of 9. Data may need 10 bet
things Oa&:I lllSla&\ces that ;ire similar to (or near) each other arc catcgorited into one cluster, much variability, bee;
\\ hile dala mst:inces that :ire VCI) different (or far a\\3)') from each other arc categorized into may dull the effects
separale clusacrs TI1ere can be an)' number of clusters that could be produced by the data. information density
The K-means technique is 3 popular 1cchn1quc ant.I al!O\\S the user guidance in selecting the
c. What is data , isunliza
right number (KJ ofclusters from the data. . .
Clustenng is also kno\\11 as the segmentation technique. l he technrque shows thc_clustcrs of Ans. Data Vis11:ilil.1tion is th
things from past data. rhe output is the centroids for each cluster and the ollocatton of data for the end user. ldcal vi
points 10 their clllitcr. right visual form. to con
The ccntruid definition i~ uwd 10 assi~n new data instances that can be assigned lo their
understanding ofthl! co
clustcr hom"-s Clllitering is also a part of the anificial intelligence family of techniques.
availabtc to present data
Assodatlon rules arc a popular data mining method in business, especially where selling is
totality of the situation.
involved. Also l.nown ns marl.ct t,askct analysis. it helps in answering questions about cross•
Data visualization is the
selling opponunities. This is the hean of the personalization engine used by e-commerce
sues til.e Amazon.1:om ,111d streaming movie sites like Netflix.com. 1 he technique helps find presentation in an easy•
in1erestingrdationships (a0init11:s) bet\~ecn \'ariables (items or events). These arc represented data should be convertc
as rules of the form X => Y, "here X and Y are sets of data items. A form of unsupervised the consumer of data. T
lcanung, 1t has no tlt:1'1Cndcn1 variabk; and th,:rc are no right or wrong answers. There arc just an actionablt: 111,inner. I
stronger and \\ eaker allinitics. Thus, each ntle has a confidence level assigned to it. A part data might lose interest
of the machine-learning family, this t\.-chnique achieved legendary status when a fascinating The qu:ility of t.lata vir,
relationship \\as found in the sales of diapers and beers. Data can be pres.:ntl!d
graphs of v:irious type•
b. Why Is data pn.'par11tion so important and time consuming? (04 Marks) in t:ibles" . 1101\ e, cr. ;
Ans. Data cleansing :ind preparation is :i laborintensive or semiautomated activity that can take up give shap.: to dat:i. Tu
to 60 10 70 percent of the time needed for a data mining project. objcctivcr. for graphic,
I. Duplic:11.: data needs lo be n:moved. The snmc data may be rcceivcd from multiple I. Sho\\, and en·n re
sources. When merging the data sets. data must be de-duped. large masSC\ of dat
2. Missing values necd to be fillet.I in, or those rO\\S should be removed from analysis. 2. Induce the viewer t
1\-li~~ing \ olues c;in bc fi 11.:d in \\ ith 11\'crage or modal or default values. so natural to the dal
3. Data elemcnti. ma)' 1wed lo b.: transformed from om: unit to another. For example, total 3. ~void distorting \\
co~ts of health care and the total number of patients may need to be reduced to cosl/ simplifying. some
patient to allow compar.ibility of that value. 4. Make large dat:i sc
'1. Continuous v,alues may ne.:d 10 be binned into a few buckets to help with some analyses.
For c"ample. work ex1ll!fJence could be binned as low, mcdium, and high. data together 10 tt.:I
5. Encourngc th.: eye•
S. Data elements may need to be adjus~cd to make them comparable over time. For example,
e> c~ \\ 01ild rwtural
currency v:ilues may need to be adJustcd for innation; they would need to be convened
to th.: same base yc.ir for com1,arability. TI1ey may need to be convened to a commo 6. Reveal the data :11
~Ull\ill\.)• n curiosity. and thus
30
fi-g-VCttiv A ~ I M \ '
to an observer. Thus, the 6. Outlier data clements need to be removc(j aficr careful review, to avoid the skewing
of results. For example, one big donor could ~i...ew the analysis of alumni donors in an
in to match the pn:clicllve educational setting.
The predictions of some 7. Any biases in the selection of data should be corrected to ensure the data is representative
rge amount of data have of the phenomena under analysis. If the data includ,"S many more members of one gender
t, the ANNs can begin to than is typicnl of the populmion of intcrcM, then ndJustments need to be applied to the
e. data.
igh predictive accuracy. 8. Data should be brought to the same granularity to ensure comparability. Sales data may
issues with data quality. be available daily, but the s:iles person compensation data may only be avail:ible monthly.
To relate these variublcs, the dat:i must be brought to the lowest common denominator, in
dentifying a set of similar this case, monthly.
of natural groupings of 9. Data may need to be selected to incre:ise infonnation density. Some data may not show
e ori2ed into one cluster,
0 much v:iriability, because it was not properly recorded or for ony other reasons. This data
o~hcr arc categorized into may dull the effects of other differences 1n the data and should be removed to improve the
be produced by the data. information dcnsi!y of the d.ua.
guidance in selecting the c. What is data visu:1lizutio11'! llow would you judge the quality of data visualizations?
(0-' M:irks)
,quc shows the clusten. of Ans. Data Visuali1ntion is the a11 and ~cierice of mnking data easy to understand and consume,
and the allocation of data for the end user. Ideal visuali1ation shows the right amount of d.ita, in the right order, in the
right visual form. to convey the high priority information. The right visuali;:•1 ion requir~s an
can be assigned to their understanding oftlte consumer·s needs, nature of the data, and the many tools and •~~!m1ques
family of tcchniq11cs. available to pr..:senl data. The right visualization arises from a complete understanding oftht:
esix-cially where selling is totality of the situation. One should use visuals to tell a trne, complete and fast-paced story.
mg questions about cross• Data visunti,ation is the last step in the dntn lif.: cyclc. This is wherc thc data is processed for
me u~cd by e-commerce presentation in an casy•to-consuml' manner to the right audience for the right purpose. The
The technique helps find data should be conve11ed into a language and fom1at that is best preferred and understood by
ts). These are represented the consumer of dal:i. The pr.:scntation should aim to highlight the insights from the data in
A form of unsupervised an actionable manMr. If the data is presented in too much detail, then the consumer of that
g anS\\ crs. There an: juM data might los.: intcNsl nnd the insight.
level as~i~ncd to it. A pm1 The quality of data visuali1..1tions can be judged by Excellence Visualization.
status "hen a fascinating Data can b1: prcsl'ntcd in the form of 1cctangular tables, or it can be presented in colorful
graph~ of v:iriou~ t:,-pcs. ·'Small. non comp:ir..itive. highly-lnbeh:d da~a sets usually_ belong
in tables" . 1101\ ever. as th.: amount of data grows. graphs arc prelcrablc. Graphics help
give shape to data. fufie. a pioneering expert on data visualization, presents the following
objectives for grnphical cxci:lh:nce: . . .
I. Sho11, and e,en rc\eal, the dat:i: The data should tell a story, especially a story hidden in
large masses of d.ita. 1101\ever, reveal the data in context, so the story is correctly told.
oved from analysis 2. Induce the vie11cr to think of the subst:incc of the data: The fonnat of the graph should be
lues so naturnl to the dat,1. that it hidcs itself anu lets data shim:.
• For example, tota 3. Avoid distorting wh:it th.: data have to say: Statistics can be us~d to lie. In thc n_am~ of
to be ri·duced to cos simplifying, some crucial context could be rcmo,ed leadi~g to_ dts~orted commu111~at1on.
4. Make large data sets cohcn:nt: 13) giving shape to dntn. v1sualiza11ons cnn h~lp bring the
o help 1\ ith some analysi:s. ili1rn togcther to tell a compr<'h1:nsi1 c St0I') . .
, and high. 5. Encour;ige th..: eyes to compar..: diffcrcnt pieces of data: Organize the chart m 1,ays the
le over time. For example, e)·cs 1\ ould naturally mow to derive insights from the graph: . . .
uld need to be conve11ed 6. Reveal thc cl,lla at scvcrnl kwb of detail: Graphs leads to insights, which raise further
converted to a common curiosity. and thus present,1tions should help get to the root cause.
31
VIII Sem,, (CSE(ISE}
7. Serve a ~casonably ~lear pu~sc. - informing or decision-making.
8. Closely mtegi:ate with the stallsllcal and verbal descriptions of lhe dataset. There should
be no separation of ch~rts and text in presentation. Each mode should tell a complett:
story. Intersperse text \\1th the map/graphic to ~ighlight the main insights.
Modulc-4
7. a. What is a decision tree'! Why arc decision trees the most popular classlfkatlon
technique? (02 Marks)
Ans. A decision tree is a tree where each node represents a fcature(attributc), each link(branch)
represents a decision(rule) and each leaf repn:scnts an outcome(categorical or continues
value).The whole idea is to create a tree like this for the entire data and process a single
Fi,:ure 7.1: Scull
outcome at every kaf{or minimite the error in evef) leaf) (Source: Groebner el al.
Decision trees arc a simple way to guide one's path to a decision. The decision may be a simple
Chart (a) sho,\S a very s
binary one, whether to approve a loan or not Or it may be a complex multi-valued decision,
the value ofy increases A
as to what m:iy be the diagnosis for a particular sickness. Decision trees are hierarchically
between the vuriablcs x
branched structures that help one come to a dcc~ion based on asking cert:iin questions in a
decreases proportionnll
particular sequence. • . . .. Chart (c) shows a curvi
Decision trees arc one of the moM widely used techniques for class1ficat1on. A good dec1s1on the value of) decr.:ase
tree should be short and ask only a fe\\ meaningful questions. They are very efficient to use,
relationship, like an nrc
easy to explain, and their cl:issifica1ion acancy is competitive with other methods. Decision
(quadratic means the
trees can gcn.:rate knowledge from a rc:w &esl instanc'--s that can then be applied to n broad
positive curvilinear relal
populallon. 0-:1.i~io,.. tNec ar,· us,•,1 to answer relatively simple binary decisions.
Jh!JS would nol be a stro
That means variables x
·1,. Whot is a rcl(rcssion model? Whal is a scallcr plot? llow does ii ·help?(6 Muks) candidates that model
Ans. Rc:gression is a wdl-knonn st.1tistical tedmi4uc: to model the predictive relationship between regression cqu:11ion en
sevi:ral indepcnd,mt variables (DVs) and one dependent variable. The objective is to find little: more complex, qu
the best-fitting curve for a deJ)l!nd.:nt variable in a mul1idimensional space, with each order polynomial regrcs
ind.:pcndcnt v,u·ioblc being a dimension. TI1e curve could be a straight line, or it could be a Charts (c) and (f) have
nonlinear curve. or using any other mod
The qualit) of fit ofth.: cun.: to the data can be measured by a cocffiei,mt of correlation (r),
i:.Euminc the steps in •
which is the: square root of the amoun1 of variance explained by 1he curve.
kind or ohjci.:1he func
The key slcps for rcsression a~ simple: price predictor S)Slem
I. List oll tho, voriables available for making the model.
2. Establish a Dcpendcnl Variable (DV) of interest.
Ans. It t:ikes resourc.:s, train
3. E~aminc visual (if pos~ible) rdalionships between variables of inlerest. mining pla1fo1111s offer
4. Find a way lo prc:dict DV using the olhcr variables. required to build :in AN
A s~atter plot (or sca~1cr diagram) is a simple exercise for ptouing all dala poinls belween two I. Gather dnt:i: Divide 1
1
variables on a 1wo-d1111ensional graph. It provides a visual layout of where all the da1a po·nts divided into trainin
are placed in that two-dimensional space. 2. Select the nct,1 ork ar
The s~ath:r_plot can_b.: useful for graphically intuiting the rela1ionship between two variables. 3. Sdecl the algorithm.
Here 1s a picture (figure 7.1) that shows many possible patterns in scatter diagrams. 4. Set 11.:1 \\ ork p,1ra111et
5. Train tin: ANN with '!
6. Validate the model w
7. Freeze the" eights a,
8. Test the trained net"
9. Deploy the ANN uh ,
Otht:r neural nct\\'ork :ir
32
classification ....... ...,_....,
(02 Marlu)
link(branch)
or continues
ess a single
Fig11re 7.1: Sc•11fler p/1,1f.f lltuH·i11g IJ-pn ofrdMJuRsl,ips among two 1•arlab/es
ay be a simple (Source: Groebner et al.20 13)
alued decision, Chart (n) shows a vcry strong linear relationship between the variables x and y. That means
hierarchically the value of y increases proportionally with x. Chart (b) also shows a strong linear relationship
questions in a between the variables x and y. Here it is nn invffW ~lationship. That means the value ofy
decreases proportionally with x.
!A good decision Chart (c) shows a curvilinear relationship. It is an inverse relationship, which means that
efficient to use, the value of y decn:ascs proportionally with x. However, it seems a relatively well-defined
thods. Decision relationship, like an arc ofa circle, ,1hich can be represented by a simple quadratic equation
plied to a broad (quadratic means the power of two, that is, using terms like x2 and y2). Chart (d) shows a
decisions. positive curvilinear relationship. J lowever, it docs not seem to resemble a regular shape, and
thus would not be a strong rcl.11ion~hip. Charts (e) and (f) show no relationship.
That means variables x and y ar..i independent of each other. Charts (a) and (b) are good
~6 Marks)
candidates that model a simple linear regression model (the terms regression model and
~tionship between
bjective is to find regression equation can be used interchangeably). Chart (c) too could be modeled with a
space, with each little more comple:-., quailratic regression equation. Chart (d) might require an even higher
,e, or it could be a order polynomial regression equation to represent the data.
Charts (e) and (f) have no rclation~hip, thus, they cannot be modeled together, by regression
t of correlation (r), or using any other modeling tool.
c. Examine the steps in developing a neural nehtork for predicting stock prices. What
kind of objective function and "hat kind of data would be required for a good stock
price predictor ,ptcm using ANN? (08 Marks)
Ans. It takes resourccs, training data. and skill and time to develop a neural network. Most data
mining platfom1s offer at !cast the MLP algorithm lo implement a neural network. The steps
required to build an ANN ar..i as follows:
I. Gather data: Divide mto training data and test data. TI1e training data needs to be further
divided into training data and validation data.
2. Select the m:t\\ork architecture, such as feedforward network.
een two variables. 3. Select the algorithm, ~uc:h as Multila}cr Pcrception.
diayams. 4. Set network p.ir;unctcrs.
5. Train the ANN with training data.
6. Validate the mo<ld with validation data.
7. Fr..:ezc th.: w..:ights :ind other parameters.
8. Test the traincd net"ork with test data.
9. D..ip/oy 1he /\NN nhen it arhie,·cs good pri:dict_i~e ~ccuracy.. . .
Other neural n..:hvorl- architectures include probab11is11c networks and self-org:u11z111g feature
33
VIII Sem,, ( CSf/ ISf) C8CS · N odel-Q~ p
1
maps. A computer neuron is built
to the neuron marked with
Training an ANN: Training data Is split into three parts
~omputation. The input lay
Trainln2 sci lfhis data set is used to adiust the ·•wei11hts on thc nc:ural nctuoric (- we/4). 1s the a>.on. Each input sig
V1lid1tioa wt This data i;t'I is USl.'tl tominimize over liuinit and verifying aecuracv (- 2Xi /o).
0
inpul value and the neuron
This dala set is used only for testing the final solution in order to confirm nrc compuled in the tmini
Testing act lhe actual pn:dic1ivc: JIO" ,-r ofdtc ndwotk (- 20 ¾) d~ttn\ and back \lrO\)ag;l
Neural Ncl\\orks. An acti
approach mcnn~ lhAI Ibo data ii divided into k equal pieces, and the learn in~
· r'C'pCIIICd It-times
· · h each pieces
· · the trammg
· · set in the output signal of the
lid cross - process ,s
k - fold wit becommg
va a IIon trh·1s process leads to Iess b'1as and more accuracy, but is more time C0l1$uming output of other neurons, a
manner. This is the basic i
Machme leaming• has provcd to improve efficiencies si gnificantl y, and there are many
jobs which have b.:en replaced by smarter and faster machines using artificial intelligence more detail in this article.
or machine lcaming. The stock markets are no exceptions to this. Today, there are several Working of Neural Ncl"
Machine Leaming algorithms running in the live markets. These algorithms often provide We will look at an examp
greater returns than other ahemate algorithms or sometimes even higher than experienced consists of the parameters
traders. In this article, I will talk about the concepts involved in a neural network and how it Our brains essentially hav
can be applied to predict stock prices in 1he live markets. Let us staJt by understanding what see, smell und taste. The ri
a neuron is. emotions and feeling:,, fr
make us act or take deci
Neuron
brains. Therefore, there ar
The first layer takes in th
are the inputs to the next
Hence, in this cxtn:mely
input layer, l \10 hidden I
all know that the brain is
computations arc done in
To understand the 11 orl..in
example, ,~here the 0 1
parameters, there is one
price.
This is the neuron that you . . . lie familiar with, well if you aren't you should now be
pateful that you can understacl 6il kause there are billions of neurons in your brain. There
are three components to a neura. • claldrites, the axon and the main body of the neuron.
1be dendrites arc the receivers o(dlc lipal and the axon is,the transmitter. Alone, a neuron
is not of much use, but when ii ii CIGMCCted to other neurons, it does several complicated
computations and helps openle tbc _ . complicated machine on our planet, the human ~
body.
Input value 1 X, ,
Input value m Xm
34
C15CS · ModwQue¢icn,p~e,- - l
A comp~ter neuron is b~ilt in a sim_ilar manner, as shown in the diagram. There are inputs
to the ne~ron mark_ed with yellow circles, and the neuron emits an output signal after some
computation. The input layer resembles the dendrites of the neuron and the output s· al
~s the axon. Each Input sign.ii i) U)~igncd • weight, w, This weight is multiplied b~g~he
input value and the neuron stores the weighled sum of all the input variables. These weights
are computed in the training phase of the neural network through concepts called gradient
descent and back propagation, ,,e will cover these topics in our subsequent blog posts on
Neural Networks. An activation function is then applied to the weighted sum which results
in the output signal of thc neuron. The input signals .ire generated by other n~urons, i.e, the
output of other neurons, and the network is built to make predictions/computations in this
manner. This is the basic idea ofa neural network. We will look at each of these concepts in
morc detail in this article.
Working of Neural Nl!l"orks
We will look at an example to understand the working of neural networks. The input layer
consists of the parameters that ,1 ill help us arrive at an output value or make a prediction.
experienc Our brains essentially have five basic input parameters, which are our senses to touch, hear,
rk and how i sec, smell and taste. The neurons in our brain crca_te more complicated parameters such as
tanding wha emotions and feelings, from these basic input parameters. And our emotions and feelings,
make us .ict or take decisions which is basically the output of the neural network of our
brains Therefore, then: arc 1\1 o layers of compulations in this case before making a decision.
The first layer takes in the five senses as inputs and results in emotions and feelings, which
arc the inputs to the next layer of computations, where the output is a decision or an action.
Hence, in this e>.tremcly simplistic modd of the working of the human brain, we have one
input layer, two hidden layers. and one output l.iyer. Of course from our experiences, we
all know th.it the brain is much more complicated than this, but essentially this is how the
comput.ilions an: done in our brnin.
To undersland the working of a m:ural network, let us consider a simple stock prjce prediction
example, where the 01 ILCV (Open-High-Low-Close-_Volume) valu~s. are the input
parameters, there is one hidden layer and the output consists pf the pred1ct1on of the stock
price.
lnpotl.a,rt
u should now be
your brain. There
ody of the neuron.
•r. Alone, a neuron
veral complicated
planet, the human
loft• .S
- OUtput signal
35
VIII Se,m,, {CS'E/IS'E)
.
• layer will have different weights for each ofthe fi ve input
·
lo various combinauons of the inputs· for example, the fi rst neuron m1g
parameters
an m~ght have _different ~ct•~alton functions, which will activate the input parameters
according
!ook1?g at tht: ~olumc and the ~ilfermce_ between the Close and the Open price and might be
rgnormg the lllgh and _Low pnc~. In this case, th~ weights for High and Low prices will be
· ht be
ze_ro. Based_on the we1~1s that the modrl has trained itself to attain, on activation function
w1ll_bc applied to th~ ,~e1ghted sum an die neuron, this will result in an output value for that
l
This approach could be succ
n~eds to be optimized. Howe
particular neuron. S1m1larly, the other IWO neurons will result in an output value based on ~•dd~n larers increases, the n
their individual nctivation functions and ~ t s . finally, the output value or the predicted lime II will require to train su
value of the stock price will be the sum oldie three output values ofeach neuron. This is how supercomputer. For this rcasQ
the neural network will work 10 pr(dict SIDCk p-ices. computing the weights of the
Gradient descent involves an
Conclusion
There are two ways to code a pro".PID far pafonning a specific task. One is to define all the slope we adjust the weights, t
rules required by the program t o ~ die result given some input to the program. The values for all possible combi
other way is 10 develop tht: frnmcw• .,_ wllidl the code will learn to perform the specific diagrams below. The first pl
iask by training itself on a datak.'t .._.. ......8 the result it computes to be as close to the can be seen that the red ball n
actual results which have been obsam 11lis process is called training the model, we will function. In the second diagra
function. Therefore, we can
now look at how our neural neouwt will- 111elfto predict stock prices.
The neural network will be gi,ca * ....., which consists of the OHLCV data as the moving in the direction of th]
input and as the output, we "ould •a,w* model the Close price of the next day, this is
duration. With this approach,
computations do not take vet)
the value that we want our modd IO - - •predict.The actual value of the output will be
represented by 'y' und the predicted ¥alue will be represented by y", y hat. The training of
the model involves acijllllia& the wcigbas of the variublcs for all the di!Terent neurons present
in the neW':11 nct•111L . . . . . . . . ailimizing the 'Cost Function'. The cost function, as
the name sugaD ~ - - • SI J a pediction using the neural network. It is a measure
of how far oft"dlt. p I S • • 111n die actua1or ~rved valued, y. Thhefrc fareh many
·ons...... $ 6e most popu1ar one 1s compute as a1 o t c sum
cost fun
--" t d.,.-
Cl IINa . - ,__,_,...,ed v:ilues for the ll'lllll•n&'' dataset.
of sqUUJCU 1uC:il - r-· .
C = L□ 1/2 ( "Y - y) 2
h
The way the neural m:t"ork trains ibelfis by first compu~ing the cost functi~n for t e tra_ining
datnscl for a given set of "eights for the neurons. Then ti goes back and adJUStS the we~ghts,
followed by computing the cost function for the training dataset based on the new weights,
The process of i;ending the errors back to the network for adjusting the weights is called Gradient descent can be do
backpropagation. This is .-.:pealed several times till the cost function has been minimized. We gradient descent and mini-ba
will look al how the weights are adjusted and the cost function is minimiz.ed in more detail is computed by summing al
next. computing the slope and adl
Gradient Descent the cost function and the adji
The weights are adjusted to minimize the cost function. One way to do this is through brute dataset. ·1his h ci-tremely u
force. Suppose we take IOOO values for the weights, and evaluate the cost function for these cost function is not strictly
valm:s. Wh.:n w.: plot the graph of the cost function. we will arrive at a graph as shown below. process 10 arrive at the glo
The best value for weights would be the cost function corresponding to the minima of this getting stuck with a subopl
graph.
36
--------
- - -----
er consists of 3
ck price. The 3
nput parameters
put parameters
euron might be
e and might be
iw prices will be
·vation function '
. be successful for a neural network involving a s,·ngle werg
This approach .could • ht w h.rch
n~eds to be op!rmrzcd. However. as the number of weights to be adjusted and the number of
hidden layers increases, lhe number of computations n:quired will increase drastically. The
time it will require to train such a model will be extremely large even on the world's fastest
sul)l!rcompuler. For this reason, ii is essential to develop a better, faster methodology for
computing lhe "'eights of lite neural network. This process is called Gradient Descent.
Gradient descent involves analyzing the slope of the curve of the cost function. Based on the
to define all the slope we adjust lhe weights, lo minimize the cost f1111Ction in steps rather than computing the
e program. The values for all possible combinations. The visualization of Gradient descent is shown in the
form the specific diagrams below. The first plot is a single value of weights and hence.is two dimensional. It
be as close to the can be seen lhat the red ball moves in a zig-zag pattern to arrive at the minimum of the cost
te model, we will function. In the second diagram, we have to adjust two weights in order to minimize the cost
function. Therefore, we can visualize it as a contour, as shown in the graph, where we are
'LCV data as the moving in the direction of the steepest slope, in order to reach the minima in the shortest
e next day, this is duration. With chis approach, we do not have to do many computations and as a result, the
the output will be computalions do not lake very long, making the training of the model a feasible task.
~t. The training of C
mt neurons present
e cost function, as
ork. It is a measure
,, y. There are many
d as half of the sum
dataset. y
37
VIII S0011 (CSf(I Sf)
batch gradient descent, which is a combination of the batch and stochastic methods. Herc,
we create different batches by clubbing together multiple data entries in one batch. This lnp
essentially results in implementing the stochastic gradient descent on bigger batches of data
entries in the training dataset. Next, let us understand how backpropagation works to adjust
the weights according to the error which had been generated.
The Design Principl or A
Backpropaga tion
I . A neuron is the basi
Backpropagation is an advanced algorithm which enables us to update all the weights in the
element) receives inputs fo
neural network simultaneously. This drastically reduces the complexity of the process to
adjust weights. If we were not using this algorithm, we would have to adjust each weight computation on the basis
individually by figuring out what impact that particular weight has on the error in the passes on the output to the
the weights for each input
prediction. Let us look at tht: steps involved in training the neural network with Stochastic
X
Gradient Descent:
• Initialize the "eights to small numbers very close to O(but not 0)
• Forward propagation - the neurons are activated from left to right, by using the first data x2
entry in our training dataset, until we arrive at the predicted result y
• Measure the error which will be generated
• Backpropagation - the error generated will be back propagated from right to left, and the
weights will be adjusted according to the learning rate
• Repeat the previous three steps, forward propagation, error computation and back
propagation on the entire training dataset Flg11
• This would mark the end of the fu'st epoch, the successive epochs will begin with the 2. A Neural network is a
weight values of the previous epochs, we can stop this process when the cost function neuron, and at least one p
converges within n certain acceptable limit a simple, single-stage co
OR neuron and the result may
of processing elements in
8. a. What is a neural network? llow docs it work? Explain the Design Principles of an depending upon the comp
Artificial Neural Network. (08 Marks) sequence, or they could w
Ans. Artificial Neural Networks (ANN) are inspired by the information processing model of the
mind/brain. The human brain consists of billions of neurons that link with one another in an
intricate pattern. Every neuron receives information from many other neurons, processes it,
~~~l
x2 ' Wll
gets excited or not, and passes its state information to other neurons. ''-wl,il wl l
x~~x>. ...:-
-
Just like the brain is a multipurpose system, so also the ANNs arc very versatile systems.
They can be used for many kinds of pattern recognition and prediction. They are also used
for classification, regression, clustering, association, and optimizatio,n activities. They are
,,.-..-Iii
used in finance, marketing, manufacturing, operations, information systems applications, and ,"
x~124
soon. lnp
ANNs are composed of a large number of highly interconnected processing elements
(neurons) working in a multi-layered structures that receive inputs, process the inputs, and
produce an output. An ANN is designed for a specific application, such as pattern recognition 3. The processing logic of j
or data classification, and trained through a learning process. Just like in biological systems, input streams. The processi
ANNs mak~ adjustments to the synaptic connections with each learning instance. function, from the proces!
ANNs arc_ like a ~la~k box trained into solving a particular type of problem, and they can intermediate weight and pr
develop high prcd1c11vc powers. Their intermediate synaptic parameter values evolve as the in its objective of solving a
sy~tem obtains feedback on its predictions, and thus an ANN learns from more training data an opaque and a black-box
(Figure 8.1 ). 4. The neural network can
many training cases. It w.
communication based on
become better at making a
38
stochastic methods. Herc,
entries in one batch. This
t on bigger batches of data
Inputs L :;:!:§~ § ~utpu\~
x3 -::-~,tm-□'tf
right, by using the first data
ult y 'imat~!!_ Y
x4 ·
39
VIII Sem, ( CSE/ISE)
Depending upon the nature of the problem and the availability of good training data, at some
point the neural network will learn enough and begin to match the predictive accuracy of Now append the scr-J
a h~man e~pert. _In many practical situations, the predictions of ANN, trained over a long number of columns
penod ofllme with a large number of training data, have begun to decisively become more original data and the I
accurate than human e>.pcrts. Al that point ANN can begin to be seriously considered for data to label records
deployment in real situ:itions in'renl time. Generate a predictive
sets. lf it is impossible
b. What is unsupervised learning? When Is ii used? (04 Marks) artifacts then there is !
Ans. Unsupervised learning, by contrast, docs not begin with a target variable. Instead the objective st rong strncture in the
is to find groups of similar records in the data. One can think or unsupervised learning as In the CART model se
a form or data compression: we search for a moderate number of representative records of Original records d ]
to summarize or stand in for the original database. Consider a mobile telecommunications nodes reveal patterns •
company with 20 million customers. The company database will likely contain various randomized anifact.
categories of infonnation including customer characteristics such as age and postal code, We don not expect the i
product infonnation describing the customer's mobile handset, features of the plans the of Original from Copy
subscriber has selected, details of the sub~crib.:rs use of plan fcntures, and billing and payment interesting data group~
infonnation. Although it is almost certain that no two subscribers will be identical on every This approach to uni
detail in their customer records, we would expect to find groups of customers that are very technology because:
similar in their ovcrnll pnttern of llcmoi;rnphics, selected equipment. plan use, and spending • Variable selection
and payment behavior. If we could find say 30 representative customer types such that the groups of variable
bulk of custonms an: \\Cl! described as belonging to their "type". this information could be • Preprocessing or re
very useful for marketing. planning, and new product development. We cannot promise that influenced by how
we can find clusters or groupings in data that you will find useful. But we include a method • Missing values pr
quite distinct from that found in other statistical or data mining software. CART and other • The CART-based
Salford data mining modules now include an approach to cluster analysis, density estimation select the optimal 1
and unsupervised learning using ideas that we trace to Leo Brciman, but which may have
been kno,, n informally In among statisticians at Stanford and elsewhere for some time. The c. What arc assoriatio
Ans. Association rnll! mini
method detects structure in data by contrasting original data with randomized variants of that
help identify shoppin
data. Analysts use this melhod implicitly ,,hen viewing data graphically to identify clusters
interesting relationshi
or other structure in d:11a visually. Take for example customer ages and handsets owned. If
there is a p:lllern in the data then we expect to see certain handsets owned by people in their cross-sell related item
AII data used in this te
early 20's. and rather different handsets owned by customers in their early 30's. If every
learning algorithms. '
hnndscr is just as likely to be owned in every age group then there is no structure relating
these two data dunensions. The method we use generalizes this everyday detection idea to how it is often e:-.plaii
high cfimcnsions. of- sale transaction da
among it.:ms. An exn
The method consists of these slcps:
compulcr and vims
Make a copy of the original datn, and then randomly scramble each column of data separately.
the time."
As an exam pk st.lrting with dnta typical of n mobile phone company, suppose we randomly
exchanged date or bi11h informmion at random in our copy of the database. Each customer In business environ~
marketing, it is used I
re~ord woul~ likely contnin age information belonging to another customer. We now repeat
th~s _process •~ every column of the datn. Orciman uses a variant in which each column of design. online :td\eni
ongmal data 1s rcplaced \\ ith a bootstrap resample of the column and you can use either This analysis can su
method in Salford sofiware. of products promote
In retail environment!
Note that all 11e have donl! is mo,ed infonnation about in the database, but other than moving
data we not changed an) thing. So nggregates such as averages and totals will not have close tougher for cus~
cha?gcd. Any on~ customer rccord is now a "Frankenstein" record, with item of information the cuslomer has to \~
having been obt~ined from a diffl!rcnt cuslomer. Thus, date of birth might be from customer In medicine, this tee
IO 1135. the service plan taken from customer 456779 and the spend data from 98700 I. dingnosis and patient
Represi:nting Associ
40 s..,,..-,....... E,,,..... Sc.,.,,,,...,. s......-,....... E,,,..... Sc.,.,,,,...,.
of good training data, at some Now append the scrambled data set to the original data. We therefo~ now have the _same
h the predictive accuracy of number of columns as before but twice as many rows. The top pon1on of the data 1s the
of ANN, trained over a long original data and the bottom portion will be the scrambled copy. Add a new column to the
n to decisively become more data to label records by their data source (''Original" vs. "Copy").
be seriously considered for Generate a predictive model to attempt to disc~iminate between ~e Original ~nd Copy data
sets. If it is impossible to tell, after the fact, which records are original and which are random
(04 Marks) artifacts then there is no structure in the data. If it is easy to tell the difference then there is
variable. Instead the objective strong structure in the data. . . .
of unsupervised learning as In the CA RT model sepnrating the Original from the Copy records, nodes with o high fraction
r of representative records of Original records define regions of high density and qualify_as potential "clusters"._Such
mobile telecommunications nodes reveal patterns of data values, which appear frequently m the real data but not m the
will likely contain various randomized ai1ifact.
uch as age and postal code, We don not expect the optimal sized tree for cluster de1ection to be the most accurate separator
t, features of the plans the of Original from Copy records. We recommend that you prune•back to a tree size that reveals
tures, and billing and payme interesting data groupings.
rs will be identical on every This approach to unsupervised learning represents an important advance in clustering
ps of customers that ace very technology because:
ment, plan use, and spending • Variable selection is not necessary and different clusters may be defined on different
customer types such that th groups of variable
pe", this information could • Preprocessing or rescaling of the data is unnecessary as these clustering methods are not
ment. We cannot promise th influenced by how data is scaled
eful. But we include a meth • Missing values present no challenges as the methods automatically manage missing data
ing software. CART and oth • The Ci\RT-bnsed clustering gives easy control over the number of clusters and helps
er analysis, density cstimati select the optimal number
Breiman, but which may ha
c. What arc association rules? llow do they help? (04 Marks)
elsewhere for some time.
Ans. Associa1ion rule mining is a popular, unsupervised learning technique, used in business to
ith randomized variants ofth
help identify shopping pallcrns. It is also known as market basket analysis. It helps find
graphically to identify cluste
interesting relationships (allini1h:s) between variables (items or events). Thus, it can help
r ages and handsets owned.
cross-sell related items and increase the size of a sale.
dsets owned by people in the'
in their early 30's. If eve All data used in this technique is categorical. There is no dependent variable. It use~ machine-
there is no structure relati learning algorithms. The fascinating '·relationship between sales of diapers and beers" is
is everyday detection idea how it is ollen explained in popular literature. This technique ac.cepts as input the raw point
of- sale transaction data. The output produced is the description of the most frequent affinities
among items. An example ofan association rule would be, "a Customer who bought a laptop
h column ofdata separately. computer and virus .protection so fl ware also bought an extended service plan 70 percent of
lhe time."
pany, suppose we random
the database. Each custom In business environments a pauern or knowledge can be used for many purposes. In sales and
customer. We now re marketing, it is used for cross-marketing and cross selling, catalog design, e-commerce site
in which each colum design. online adve11ising optimization, product pricing, and sales/promotion configurations.
n and you can use e This analysis can suggest not to put one item on sale al a lime. and instead to create a bundle
of.products promoted as a package to sell other nonsclling items.
base, but other than movinJ In retail environments, it can be used for store design. Strongly associated items can be kept
and totals will not have close tougher for customer convenience. Or they could be placed far from each other so that
with item of information lhe customer has to walk the aisles and by doing so is potentially exposed to other items.
inh might be from customer In medicine, this technique can be used for relationships between symptoms and illnesses;
pend data from 98700 I. diagnosis and patient characteristics/trllatments: genes nnd their functions; and so on.
Representing Association Rules
41
VIII Sem, (CSf/IS'E)
CBCS • Mod<!l,Q"""'°"'l
A generic rule is rcpresenicd berwccn a ser X and Y: X ⇒ y (S%, C%J w.,.... , .....
X, Y: products and/or services
X: LeO-lrnnd-side (LHS or Anrecedcnl) .. ... ......
~inw
O\·-ett11:
...,
~<
::J
!!!!
Y: Right-hand-side (RHS or Consequenr)
S: Support: how often X and Y go together in the total transaction set
C: Confidence: how often Y goes together wi1h X
.......
~
O\'Mllf
r..- .,,,
....,,... i
.:.:.::.
Example: {Laptop Computer, Antivirus Software} ⇒ {Extended Service Plan} [30%, 70%]
Module -5
n..ov
•.a~"\'W
r1A ..,.,
......
,;r
'"""..,... ..,.
9. a. Why ls text mining useful in the age of sodaI media? (04 Marks)
Ans. Text mining is the art and science of discowring knowledge, insights and patterns from an
1°'1.·.-11'11•t
01:~rr,1~
RA'•"
....,.
•,c
organized collcction of textual databases. Textual mining can help with frequency analysis of Step 3: Now, use Naive
important terms, and their semantic relationships. class. The class with th.: I
Text is an important part of th.: growing data in the world. Social media technologies have Problem: Players will pl:i
enabled users to become producers of text and images and other kinds of information. We can solvc ii using abo
Text mining can be applied to large-scale social media data for gathering preferences, P(Ycs I Sunny) - P( Sunn
and measuring emotional sentiments. II can also be applied to societal, organizational and llere we havi: P (Sunny I
individual scales. Now, P (Yes l Sunny) = o.
Text mining works on texts from practically any kind of sources from any business or non- Naive Oaycs uses a simi
business domains, in any formnts including Word documents, PDF files, XML files, text various a"ributes. lliis a
messages, etc. Herc arc some representative examples: having multiple: classes.
I. In thd legal profession. text sources would include law, court deliberations, court orders, Naive Oa) .:s stand for:
etc. The "-Ord Bayes refers to
2. In acadcm ic research, it would include texts of interviews. published research articles, etc. B:iycs) which cornpulcs t
J. The: world of finance will include statutory reports, internal reports, CFO statements, and also on the basis of prior
more. l11e word Naive reprc~cu
4. In medicine, it would include medical journals, patient histories, discharge summaries, etc. arc inJc:pc:nJcnt variables
S. In marketing, it would include advertisi:ments, customer comments, etc. height , weight, age, g1md
6. In the world of technology nnd search, it would include patent applications, the whole of
c. What is II su11port \'Ccto
information on the world-widi: web, nnd more.
Ans. A Supp0rt Vector Mach inc
b. What Is a Naive-Bayes technique? What does Naive & Bayes stand for? hyperplane. In other wor
(08 Marks) outputs an optimal hypi:'11
Ans. Naive Bayes al&orithm : Naive Bayes is a simple technique for constructing classifiers and lhis hypi:rplane is a lined'
models that assign class labels to problem instances. rcpre51:nted as vectors of feature values, Confu~ing? Don't worry,
where the class labels are drawn from some finite set. It is not a single algorithm for training Suppose you arc given pl
such classifiers, but a family of algorithms based on a common principle: all naive Bayes decide a separating line fi
classifiers assume that the value of a particular feature is independent of the value of any
other feature, given the class variable.
Naive Bayes alaiorilhm works :
Let's understand it using an example. Below I have a !raining data set of weather and
corresponding target variable ·Play' (i.uggesting possibilities of playing). Now, we need to
classify whether players will play or not based on weather condition. Let's follow the below
steps to perfonn it.
Step I: Convert the data set into a frequency table Image A: D
Step 2: Create Likelihood table: by finding the probabilities like Overcast probability • 0.29 You might have come
and probability of playing is 0.64. separatl:ls the two classes
42
s.u..+.-t> e.r- ~
rV~A~ CBCS - ModeltQu.e~t"'i.ct'\IPoi.pe,.,- · l
wenttt
...
FIii'
.........
o o] >Jc ...,_,_tatl1
"""""
0\.'4f'Cllt
narav
.
,.1i1t"l4t
.,=··•:•r.
hO
4 ....s,;1,
, ...
....
oz I
......
..._,. > l I
~"'""'"' ~,w
!,tll'l•w ~
,
> d 1 !4 O.>< I
0\'4rtllf
(.A "IV
rt,\f'\y
~r
'-ir
A'I
.I\,,..., ..
5
..
rl'/1.J
,
.,.
vice Plan} [30%, 70%] ",Ot'l'W
nA "'iV
'•tJfl'W
...
...,
(')\•.-n-Hl' .,.
(04 Marks) t-..i.'.."l'T
...,._..
R#li-,V ~;r
ts and patterns from an
ith frequency analysis of Step 3: Now, use Naive Oayesian equation to calculate the posterior probability for each
class. The class with the highest posterior probability is the outcome of prediction.
edia technologies have Problem: Players will play if weather is sunny. Is this statement is correct'/
r kinds of information. We can solve it using above discussed method of posterior probability.
r gathering preferences, P(Ycs I Sunny)== P( Sunny I Yes) • P(Ycs) / P (Sunny)
cietal, organizational and Herc we have P (Sunny /Yes)= 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P( Yes)= 9/14 = 0.64
Now, P (Yes I Sunny)= 0.33 • 0.64 / 0.36 = 0.60, which has higher probability.
om any business or non• Naive Bayes uses a similar method 10 predict the probability of different class based on
F files, XML files, text various allributes. This algorithm is mostly used in text classification and with problems
having multipl.i class.:s.
eliberations, court orders, Naive 13ayes scand for: •
The word Bayes refers lo llaysian analysis (based on the work of the mathematician Thomas
shed research articles, etc. 13ayes) which com pules the probability of a new occum:nce not only the recent record, but
orts, CFO statements, and also on the basis of prior experience.
The word Narvc represcncs the strong assumpcion chat all the parameters of the instances
discharge summaries. etc. arc in<lcpemlent variables with little or no correlation. Thus if people are identified by their
ents, etc. height, weight, age, ~ender, all these variables are nssunh:<l to be independenc of each other.
applications, the whole of
c. Whut is a su1111orl \'cctor 11111chinc'! (04 Marks)
Ans. A Support Vector Machine (SVM) isa discriminative classificrfonnallydefincd by a separating
stand for? hyperplane. In ocher words, given labeled training data (supervised learning), the algorithm
(08 Marks) outputs an optimal hyperplane which categorizes new examples. In two dimentional space
onstructing classifiers and this hyperplane is a line dividing a plane in two parts where in each class lay in either side.
s vectors of feature values, Confusing'/ Don't wony, we shall learn in laymen terms.
ingle algorithm for training Suppose you are given plot of two label classes on graph as shown in image (A). Can you
principle: all naive Bayes decide a separating line for the classes?
endent of the value of any
•••
data set of weather and •• •
laying). Now, we need to • ■ II
'on. Let's follow lhe below • •
Image A: Drawn line that separates black circles and blue squares.
You might have come up with something similar to following image (image B). It fairly
separates the two classes. Any point that is left of line falls into black circle class and on right
43
VIII Se,m, (CSE/IS£)
C8CS • Model,Q~
falls into blue square class. Separation of classes. Thal's whal SVM does. It finds out a line/
hyper-plane (in multidimensional space that separate outs classes), Shortly. we shall discuss pages. There are two bas
why I wrote multidimensional space. I. Hubs: These are pagei
gathering point, where pe
com, or government site!
•••
••
com and yelp.com could
2. Authorities: Ultimate;
• •• • complete and authoritat
■ ■ information, news, advic
of inbound links from ot~
page for expert medical 0
Image B: Sample cut to divide into two classes. news.
Web usage mining
OR As a user clicks anywhe
10..a. What are the three types of web mining? (10 Marks) entities in many location
web server providing the
Ans. The web could be analyzed for its structure as well as content. The usage pattern of web
pages could also be analyzed. Depending upon objectives,_,~eb mining can be divide~ i?to _ activity on those pages.
proxy server, or ad server,
three different types: Web usage mining, Web content mmmg and Web structure mmmg
The goal of web usage min
(Figure I 0.1 ).
through Web page visits a
access logs, referrer logs,
usage profiles are also g
- -
( Web Mining )
44
~V~A~
I does. It finds out a line/ pages. There are two basic strategic models for successful websites: Hubs and Authorities.
Shortly, we shall discuss I. Hubs: These are pages with a large number of interesting links. They serve as a hub, or a
gathering point, where people visit to access a variety of information. Media sites like Yahoo.
com, or government sites would serve thal purpose. More focused sites like Traveladvisor.
com and yelp.com could aspire to becoming bubs for new emerging areas.
2. Authorities: Ultimately, people would gravitate towards pages that provide the most
complete and authoritative information on a particular subject. This could be factual
information, news, advice, user reviews etc. These websites would have the most number
of inbound links from other websites. Thus Mayoclinic.com would serve as an authoritative
page for e;,.pert medical opinion. NYtimes.com would serve as an authoritative page for daily
news.
es. Web usage mining
As a user clicks anywhere on a webpage or application, the action is recorded by many
entities in many locations. The browser at the client machine will record the click, and the
(10 Marks)
web server providing the content would also male a record of the pages served and the user
he usage pattern of web activity on those p:iges. The entities between the client and the server, such as the router,
ning can be divided into pro;,.y server, or ad server, too would record that click.
d Web structure mining The goal of web usage mining is to extract useful information and patterns from data generated
through Web page visits and transactions. The activity data comes from data stored in server
access logs, referrer Jogs, agent logs, and client-side cookies. The user characteristics and
usage profiles are also gathered directly, or indirectly, through syndicated data. further,
metadata, such as page attributes, content attributes, and usag9tatn are also gathered.
The web content could be analyzed al multiple levels (Figure 10.2).
I. The server side analysis would show the relative populnrity of the web pages nccessed.
Those websites could be hubs and authorities.
2. The client side analysis could focus on the usage pattern or the actual content consumed
and created by users.
I. Usage pattern could be: analyzed using 'clickstream' analysis, i.e. analyzing web activity
for pallerns of sequence of clicks, and the location and duration of visits on websites.
iversal resource locator). Clickstream analysis can be useful for web activity analysis, software testing, market
their content is managed research, and analyzing employee productivity.
Systems. Every page can 2. Textual information accessed on the pages retrieved by users could be analyzed using
inds of content including text mining techniques. The te;,.t would be gathered and structured using the bag-of-words
technique 10 build a Term-document matrix. This matrix could then be mined using cluster
ge/URLs, including the analysis and association rules for patterns such as popular topics, user segmentation, and
Id be analyzed to gauge sentiment analysis.
population. The text and Si.Ulm.A Mine Data for
visit counts. The pages PctRi!:'9 dos, u,~,g~ PJttcrn~
foe ao-1lvil1 u~, Prot,ies
that attracts most users. Webslto
Wcbl.oCt,
lde#1Ufv u,e,,
..w.t, p:a,re
v,slton, ldcm,fv vi1l11
can be transfonned with Wobsltc Unirs, CUtk st,.e.1ms
Identify PO&O
proftles
•Web
igned 10 keep the more Customcirs views
op1lmlt~lon
-summar,zo
o1cl1V'lty dal:t
ocol (http). Any page Figure: 10.2 Web Usage Mi11i11g arc//itcct11rt
paite. The intertwined Web usage mining has 11ld11y bu~incss applications. It can help predict user behavior based on
lyrical algorithms. The previously learned rules and users' profiles, and can help determine lifetime value of clients.
of hyperlinks among It can also help design cross-marketing strategies across products, by observing association
45
VIII Se.tw (CSE/IS'E)
'B(,g,Vam-A~
rules a~ong the pages on the website. Web u_sage can help evaluate promotional cam a, ns
and see rfthe ~s~rs were attracted to the websnc and used the pages relevant to the cam~i'!n.
_Web usage mmrng could be used to present dynamic information to users based on their
interests and profiles. This includes targeted on line ads and coupons at user groups based on Eigl1U
user access panems. C
b. What is social network analysis? faplain the a pplications of SNA? (06 M11rks) Time: 3 hrs.
Ans. Social Network Analysis (SNA) is the mapping and measuring of relationships and streams ,._ Note : ; f11swer Q11y Fl VE
between people, groups, organizations, computers, URLs and other sources of information
that arc connected. M:inagcment consultants in particular use Social Network Analysis to
map their business relations and further investigate their mutual relationships. This can be
compan:d to a central nervous system that connects everything. I . a. Expluln Gcncrul HOF:
Social networks are a graphical representation of relationships among people and/ or entities. Ans. General HOFS Comm
SN/\ is the art nnd science of discouvcring patterns of interaction and influence within the Examples in this scctior
participantsinnnetwork.Thcseparticipantscouldbcpeople,organizations,mnchincs,eoncepts.or S hdfs version
any othcr kind of entities. Then: nrc two mnjor levels ofsocial nel\\ ork analysis: discovering Hadoop 2 .6 .0 .2 .2 _4 .,
sub- networks within the network, and ranking the nodes to find more important nodes or Subversion
hubs. 22aS6~cl>l:448969d0790
Computing the relative influence of each node is doni: on the basis of an input• output matrix Compiled with protoc 2
offlo\\S of influencc among the nodes. Fr~m source with check
Applications orSNA: using /usr/hdp/2 .2 .4 _2_.
• Sclf-awnrcm:ss 1-lDFS provides a series o
• Communities A list of those comman
• Marketing these commands will be t;
• Public henllh S hdfs dfs
Sctr -awareness: Visualizing his/her social network can help a person organize their Usage:. hadoop fs (generi
relationships and support network [•cat [·1gnoreCrc] <src::,.
Communities: SNA can help identification. construction, and strengthening of networks {•checksum <src>...J ..
within communities to build wellness. comfort, and resilience. Analysis of joint authoring [-chgrp [·RJ GROUP PA
relationships and citations help identify subnet works of specializations of knowledge in [·chown f·RJ <MODE(
an academic field. Rescarchcrs nt Northwestern university found that the most detenninant [-copyFromLocal HJ [.'pJ
factor in the success of n broad way play was the strength amongst the crew and cast. (-count (·g) f·h) <path>...
Mnrkcling: there is a popular network insight that any two people arc related to each other [•cp HJ [·p] ·p[topax] J <
through at most seven degrees of links. Organizntions can use this insight to reach out with [•createSnapshot <snapsh
their message to large number of people and also to listen actively to opinion leaders as [-dt!lelesnapsho1 <snapsh
ways to understand their customers needs and behaviors. Politicians can reach out to opinion {-df(-hJ <path::,., ..)
leaders to get their message out. (-du (-sJ [·hJ <path>...] {•c
Public health: Awareness' of network can help identify the paths that certain diseases take [·gel [· pJ [•ignoreCrcJ [-er
to spread. Public health profossionals can isolah: and contain diseases before they expand to f•fetfa11r (·RJ [•n name I ·d
other networks. [•help [cmd...] J
[•Is (·dJ [•h] [·RJ [<path>..
{·mkdir {·pJ <palh>...) {mo
[•movcToLocal <src::,. <loc
[·put HJ f•p) HJ <localsrc
(•rt:nameSnapshot <snapsh
[·rm HJ [•r l ·RJ {•skipTra
[-rmdir [-ignorc-fail-on-n
[•setfacl [·RJ [{·b I •k} {·m
<acl_spec> <path>] J
46
te promotional campaigns
s relevant to the campaign. E ighth Semester B.E. Degree Examimatioo,
on to users based on their
ons at user groups based on CBCS - Model Question Paper - 2
BIG DATA ANALYTICS
SNA? (06 Marks) Time: 3 hrs. Mox. Marks: 80
f relationships and streams Note: Am111er any FIVE/11ll ,111eltio,u, lelet:tlng ONl::ful/ que:itlonfrom eac:11 m odule. ~
1,..
47
VIII Set111 (CSE/ISE) Blg,VattvA ~
[-setfattr {-n name [-v value] 1-x name} <path>] (-setrep [·RJ [-w] <rep> <path>...)
[•stat [format] <path>..] drwxr-xr-x -hdfs
[-tail -[-f] <file>] drwxr-xr-x -hdfs
[~test -[dcfsz] <path>] d,wxr-xr-x -hdfs
[-text [-ignoreCrc] <src>...] [-touchz <path>...]
drwxr-xr-:-. -hdfs
[-truncate [-w] <length> <path>...] [-usage [cmd...]]
Generic options supported are drwxr-xr-x -hdfs
-conf <configuration file> specify an application configuration file drwxr-xr-x -hdfs
-D <property=value> use value for given property The samc result l·an be
-fs <local I namenode:port> specify a namenode $ hdfs dfs -Is /user,hdts
-jt <local lresourcemanager:port> specify a ResourceManager Make a Directory in I
-tiles <comma separated list offi lcs> specify comma separated files to be copied to the map To make a directory H9
reduce cluster path is supplied. tht: us
-libjars <comma separated list of jars> specify comma separated jar files to include in the S hdfs dfs -mkdir stuff
class path. Copy Files to IIDFS
-archives <comma separated list of archives> specify comma separated To copy a file from you
archives to be unarchived on the compute machines. full path is not supplit:d
The general command line syntax is in the din:ctory stuff tha
bin/hadoop command [genericoptions) [commadOptions] $ hdfs dfs -put test stuff
List Files in HDFS The file transfer can be
To list the files in the root HDFS directory, enter the following command: $ hdfs dfs -Is stuff
$ hdfs dfs -Is / ' Found I iteri1s
Found 10 items -rw-r--r--- 2hdfs hdfs 12
drwxnvxrwx -yarn hadoop 0 2015-04-28 16:52 /app-logs Copy Files from IIDFS
-hdfs hdfs 02015-04-21 14:28/apps Files can be copied back
drwxr-xr-x
the file we copit:d into H
drwxr-xr-x -hdfs hdfs 0 2015-04-2 1 I0:53 /benchmarks name test-local.
drwxr-xr-x -hdfs hdfs 02015-04-21 15: 18/hdp $ hdfs dfs -get stulli'test t
drwxr-xr-x -mapred hdfs 02015-04-21 14:26 /mapred Copy files Within 1101
drwxr-xr-x -hdfs hdfs 02015-04-21 14:26 /mr-history The following will copy
S hdfs dis -ep stuff/test te
drwxr-xr-x -hdfs hdfs 0 20 15-04-21 14:27 /system Delete a File within 110
drwxrwxrwx -hdfs hdfs 0 2015-05-07 13:29 /tmp TI1e following command
drwxr-xr-x -hdfs hdfs 0 2015-04-27 16:00 /user prcviously:
02015-05-27 9:0 I /var $ hdts df.s -m1 test.hdfs
drwx-wx-wx -hdfs hdfs
Movc;:d: ·hdfs: / /limulus:
To hst files m your home directory, enter the following command.$ hdfs dfs -ls
hdfs/ .Trash/Current
Found 13 items Note thrtt when the ts.tr.i
drwx---·· · -hdfs hdfs 0 2015-05-27 20:00 .Trash deleted files are moved to
drwx------ -hdfs hdfs 0 2015-05-26 15:43 .staging -skipTrash option.
drwxr-xr-x -hdfs hdfs 02015-05-28 13:03 Distributedshc 11 S hdfs dfs -rm -skipTrash
Oi,ll;ti, u Diri,c to ry in HO
drwxr-xr-x -hdfs hdfs 0 2015-05-14 09: 19 TeraGen-S0GB
The following command\
drwxr-xr-x -hdfs hdfs 0.2015-05-14 10: 11 TeraSort-50GB S hdfs dfs -rm -r -skipTra
drwxr-xr-x -hdfs hdfs 0 2015-05-24 20:06 bin Gel an IIDFS Status Re1
Regular users can gt:t an
drwxr-xr-x -hdfs hdfs 02015-04-29 16:52 ecamples
Those with I-IDFS admini
this commamr uses dfsad,
48
~Vc::t,0;vA~ C !iCS · Modclt Que«c.en., Petpe-Y · 2
repo,, is ~imilar to the data presented in the IILJFS web GUI dbcount: An exam
S hdfs dfsadmin -report tlistbbp: A map/red
Configured Capacity: 1503409881088 (1.37 TB) i:rcp: A mapfn:duce
Present Capacity: 1407945981952 ( 1.28 ra) effects a join over s~
DFS Remaining: 12555 I0564864 l 1.14 TB) 1nullililc1,c: /\job t
DFS Used: 152435417088 (141.97 GB) program to find sol
DFS Used%: 10.83% pi: A map rcJuce pn
Under replicated blocls: 54 Blocks with corrupt replicas: 0 Missing blocks:0 nt11domtc1Hilcr: A
------······-·············-···-··-··-·· sccond11rysort: An
report: J\ccess denied for user deadline. Superuser privilege is required. program that sorts ti
b. Wrile a ~hort note on ruMini: map reduce na mplc and alJ>o e,plain the e1isting )udoku: A sudol..u 5
ft\'ailable l'\amplc~. (08 Marls) tcragcn: Generate <l
Ans. Running J\111pltcducc E'\:111111lcs tcrasorl: Run lhc tc~
All Had Jop , ell-ases come wi1h MapReduce example applications. Running the existing lt'r:1111lidatc: C'heckl
MapReJu~e c:-.amplcs is a s;mple pru.:css•once the example files are located, that is. For \\Ordl'Ount: A rnap-'ri
example. 1f you installed ! laddop version 2.6.0 from the Apache sources under /opt, the \\ord111can: A map/r
cx:unrk ,··ill he in •he follo1\i11g dirwory: files.
/opt /h.,<' -r·" .6.0l~hare'hadoop maprcduce/ wordian: A map'redu
In other, .i1 ~ion~. the c:-.amples may I~ in ·un.'lib 'hadoop-maprcduce/ or some other location. "ord slandard de, i
The exact location .>f the e-.:amplc jur file can be found using the find cominand. length of the words in
S find/ -name ''haddop-mapreduce-i:xample* .jar" •print
Consider the following software cn~ironment
2 · a. EAplain with neat di·
• OS: Linux
• Platform: RHEL 6.6 basic steps of MopR •
• Hortonworks I IDP 2.2 with Iladoop Version: 2.6 (diagrnm).
In this en, ironment, the location of the examples is /usr/hdp/2.2.4.2•2/hadoop- mapreduce. Ans. M:1flltcduce rarallcl
For the purpose of this example, an env1ronn1cnt variable called HADOOP EXAMPLES can algorithm is fairly sim
be defim:d as follows: - function. Operntionall
$ export HA DOOP_EXAM Pl ES•tusr/hdp/2.2.4.2•2/hadoop-mapreducc be quite compfc:,. Pura
Once yo,11kfine thi: examples path. >ou can run the Hadoop examples using the commands mapper and reducer pru
discussed in lhe folio" ing sections. The basic steps 11rc as
Listing AvnHuble Eumplcs l.lnpu l Splits, IIDFS ~
A list of the available c:-.umples cnn be found by running the following command. In some chunk or block and wril
cases, tl_1c version number may be pa11 of the jar file (e.g., in the version 2.6 Apache sources, on multiple machines (1
the file 1~ named hadoop-mapreducc-cxamples•2.6.0.jar). determined by I IDf'S a
$ yam jar SHA DOOP_EXAMPLES/hadoop-mnpn:duce•cxnmpte.jar considered pa11 or tire I
Note: In previous Vl!rsion of Hadoop, the command hadoop jar... was used to run Map Reduce throughout HOFS ~crvci
programs. Ne_wer versions provides the yarn command. which offers more capabilities. Both The inpul splits used b>
commands will ,,ork for these e~.imples. example, lhe splil siLe c
The possible examples arc as follows: records) or an nclual size
An example program must be givcn as the first argument. The number of~plits corr
Valid program names are: 2.Map Step. The mappir
agg~rgalc n ordcount: An Aggregate based map/reduce program that counts the words in For large amounts ofdma
the input files. the specific mapping pr
agi:rcgalt!wo_rdhis~: An Aggregate based map/reduce program that computes the histogram where the block resides.
of the words 111 the input files. will lry to pick a node th
bbp: A map redure program that uses Oailey•Borwcin•Ploulfe that compute exact bits of Pi. called rack awareness)
50
C.BCS • >--fodel,Q~«on,petpe,.,- - 2
tlbcount: An example job that count the pagevie" counts lrom a tlatabasc
tlistbbp: A map'reduce program that uses a BUP•t)pe formula to compute exact bits of Pi.
grep: A map/reduce program that counts the match~'S of a regcx in the input jo1fi ·\ job that
effects a join over sorted, equally partitioned data~ct~
mulllfilt>,,c: A job that counts words from several files. pcntomlno: A map/reduce, till laying
program lo lind solutions to pentomino problems.
pi: A map. reduce program that estimates Pi using a quasi-MonteCarlo method,
blocks:0 rnntlomtewritcr: A map/reduce program that writes IOGA of random textual d~ta per node.
sccontlar)sort: An example defining a secondary sort to the reduce. aort: A mJpircduce
program that sorts the data written by the random writer.
ed.
sudoku: A s11dok11 solver.
0 exi>lain the c,isting tcragcn: Generate dntn for the ter.i~on
(08 Marks) tcrn)ort: Run the teroso11
tcravalidate: Checking results oftcrasort
s. Running the e>.tsting wonlcount: ,\ mnp/rl·ducc progrnm thnt counts the word~ in the input files.
are located. that is. For wordmcun: A map/n.:duce progrnm that counts the.: average length of the words in the input
• sources under lopt. the files.
wordian: A map/reduce program that counts the median length of the words in the i11put files.
" ord standard dc1 iation: A map reduce program that counts stand.ire.I dcvbt,on of the
•e/ or sortll! other location length of the words in the input liks.
d cominnnd:
OR
2. E.~plain with ucat diagram Apache lladoop parallel m:1p !'educe data now (or) Explain
11.
basic steps of MapRcducc parallel data flow "ith the CAamplc of 11ord count program
(diagram). (08 Marks)
Ans. MapRcducc Parallel Data Flow: From a programmcrs perspective, the MapReduce
4.2-2/hadoop- mapreduce. algorithm is fairly simple. 1 he programmer must provide a mapping f11ncti1•11 nnd a ri.'ducing
\DOOP_EXAMPLES can function. Operationally, ho11 ever, the Ap:iche Hndoop parallel MapReduc~ data now can
be quite complex. Pnralll'I execution of MapRcduce r.:quircs other steps in addition to the
reduce mapper and reducer processes.
pies using the commands The basic ~tl'ps arc ns follows:
I.Input Splits, HDFS distributes and replicates data owr multiple servers. The default data
chunk or block and wrillen to different machines in the cluster. The data arc also replicated
wing command. In some on multiple machines (typically thr.:e machine). These dnta slices are physical boundaries
rsion 2.6 Apache sources, determined by HDFS and have nothing to do "ith the data in the file. \lso, while not
considered part of tire MapReduce process. the time required to load and Jistributc data
:ll' throughout I IDFS servers can be considercd part of the total processing llmc.
used to run MapReduce TI1e input splits used by MapRe<luce :ire logical boundaries based on the input dnta. For
more capabilities. Both example, the split size can b.: based on the number of records in a ti le (if the d,tl,1 exist ns
records) or an actual size in b:rtes. Splits are almost always smaller than the I IDFS block size.
The number ofsplits corresponds to the number of mapping processes used in the map stage.
2.Mnp Step. The mapping process is II here the parallel nature of I ladoop comes into play.
For large amounts of dat:i, many mt1ppcrs can be operating at the same time. I he user provides
that counts the words in the specific mapping process. MapReducc will try to ext!cutc the m~pper on the machines
where the block resides. Because thc file is replicated in I IDFS, th..: kast b•t~y. Mar,Reducc
computes the histogram will try to pick a node that is closest to the node that hosts the tfata block (a ch:iracteristic
called rack awareness). Thi: last choice is any node in the cluster th.it has access to HDFS.
comp1.1e exact bits of Pi.
51
VIII S<?-iw (CSf(ISE)
3.Combincr Step. It possibl~ to provide an optimization or pre-reduction as pa~ of the map step is to write the outpu
stage when: key-value pairs are combined prior to tht: next stage. The comb111c:r stage 1s As mentioned, a combin
optional. · ·1 k tb For instance, 111 the prev
4.Shuffte Step. Because the parallel reduction stage can complch:, a 11 s11111 ar eys mus e (spot, I)
combined and counh:d by th.: same reducer process. Therefore, results of the map stage ~1ust (run, I)
be collected by the key-value pairs and shulTh:d to the same reducer process. If only a smgll: As shown in figure: I .2,
reducer process is used, the shullle stage is not needed. . . This optimization can he
5.Ucduce Step. The final step is the actual reduction. In this stage, the data reductton 1s
erformed as per the programmers design. The reduce step is also optional. The results ~re
~rittcn to HDFS. Each reducer will write a11 output file. For example, a MapRcducc Job
ru11ni1w four rc:ducers will create files called pa11-0000. part-000 I , and part-0003.
Figure"1. I is an example ofa simple Hadoop MapReduce data now for a word count program.
The map process counts tin: words in the: split,and the reduce process calculates the total for
each word As mentioned earlier, the actual computation of the map and reduce stages are up
to the pro~•rammcr. The Mapreduce data now shown in Figure 1.1 is the same regardless of
the specific map and reduce t,1sl-s.
Input , Spl~ Map Shullle Aeduce OutplA Figure 1.2 Atltl,
b. With lwo programming c
Interface.
Ans. Using lhc Streaming lute
The Apache Hadoop strca
engine. The streams intcrf
and stdout.
When working in the 1-/ado
by ~he user. This approach
eas,ly tested from the comm
Listings I. I and 1.2, will be
Listing 1.1 Python Mappc
#!/ust/ bin/cnv py1hon
import sys
Figure I.I Ap11d1e ll<11luop p1m1llel /llapre<luce 1/maj/0111 # input comes from STDIN
The input to 1hc MapRcduce application is the following file in HDFS with three lines of for line in sys.stdin:
text. The goal is to count the number of times each word is used. # ren_iovc leading and trniling
see spot run # split lhc line into words wo
run spot nm # increase counters
see the cat for word in words:.
The first thing MapReduce will do is create the data splits. for simplicity, each line will be # write the results to STDOU
one split. Since each split will require a map task, there are three mapper processes that count the
the number of words in the split. On a clusler. the results of each map task are written to local # Reduce step. i.e. the input r.
disk and not to HDFS. Next, similar keys need to be collected and sent to a reducer process. # tab-delimited; the trivial wo
The shufile s11:p required data movement and can be expansive in terms of processing lime.
Depending on the nature of the applicntion, the amount of data that must be shuffle throughout Listing 1.2 Python Reducer
the cluster can va,y from small to large. #!/usr/bin/env python
Once the da1a have been collected at1d so,1cd by key, the reduction step can begin (.even if from operator import itemegct
only partial results arc availablc). It is not necessary-and not normally recommended-to have
currcnt_word~None current c
a reducer for each key-value pair as shown in Figure 1.1 . In some cases. a single reducer will
#input comes from STDIN -
provide adequate p,•rfonnnnc.:: in 01hcr cases. multiple reducers may be required to speed up
for line in sys.stdin:
the reduce phase. Th.: number of reducers is a tunable option for many applications. The final
# remove leading and trailing
52
C13CS - Mode.lt Q ~ Pc;,.,per- • 2
ct ion as part of the map step is to write the output to I IDFS.
I he combmer ~tage is As mentioned, a combiner step enables SIJfflC ~reduction of the map output data
For inst:mcc, 1,1 the previous cAample, one map produced the following counts: (nin, I)
II similar keys musl be (spot, I)
!I of then1.1p stage must (nrn, I)
process. If 01,l) a singlt: As shown in figure I .2, the count for run can be combined into (run ,2) before the snuffle.
This optimization can help minimize of data lnlnsfer needed for the shuffic phase.
e, the data reduction is ""' j' ~hulDC!
53
VIII Se.mt (CS'E/ISf)
# parse the input we got from mapper .py word, count= line .split(' /t', I)
Locate the hadoop-strenm
# convert count (currently a string) to int contain a version tag. In 1
try: following command line \
count= int(count) input file.
except ValucError: $ hadoop jar /usr/hdp/curr
# count was not a number, so silently# ignore/ discard this line -+
continue -file ./mapper .py
# this IF-switch only works because Hadoop sorts map output # by key (here: word) before -mapper ./mnppcr .py
it is passed to the reducer
-~ le .!reducer ·PY -reduce .
if current_word ~ = word: current_count += count else:
-mput war-and-peace-inpu
if current_word: -output w:ir-and-peacc-out
# write result to STUDOUT
print '%s/t%s' 5 tcurrent_word, current_count) current_count "- count
T_he output will be the f.11111
din:ctory. The ac1ual file na
current word "-word
Also note th,:! lhe Python sc
# do not forget to output the last word if needed! if current_word "- = word:
C code,o,· nny language that
print 'o/os/to/os• % ( current_word, current_count)
The operation of the mapper .py script can be observed by running the commands as shown Al_though the streaming int
usmg Jav~ directly. In parti
in the following: Another disadvantage is that
$ echo "foo foo quux labs foo bar quux" I ./mapper .py
API are not itvaifable in strc
Foo I
Poo I
Quux I 3. a. Explain How to quire data'
Labs I Ans. Apache flume is nn indepen
Foo I HDF~. Often dnta trnnsport i
Bar I machmes and locations. Flu
Quux I message, and just about nny c
Piping the result of the map into the sort command can create a simulated shuffic phase: As shown in Figure 3.1, a Flu
Bar I • Source. The source compo
Poo I to more than one channel.
Foo I another Flume ngent.
Foo I • Channel. A channel is a d,
Labs I It can be though of as buffe
Quux I • Sink. The sink delivers data
Quux I A Flume agent must have nil 1
Finally, the full MapReduce process can be simulated by adding the reducer .py script to the severn/ sources, channels, and
following command pipeline: take data from only a single ch
$ ehco "foo foo quux labs foo bar quux" I ./mapper.py I so1t a si_nk removes the data. By de
-+ -k I, I l./rcduccr .py optionally stored on disk to pre
l3ar I
foo 3
Labs I
Quux 2
To run this application using a I-ladoop installation, create, if needed, a dirertory and move
the war-and-peace.Ix! input file inlo HDFS:
$ hdfs dfs -ml-.dir war-and-peace-input
$ hdfs dfs -put war-and-peace.txt war-and-peace-input
Mnke sure the output directory is removed from any previous rest runs: ..
$ hdfs dfs •rm •r -skpiTrash war-and-peace-output Figure J. / Flume Age11I will,
54
rLg, Vctt~A ~ '
cacs - ModeLQ~t'WYIIPc>.per -2
(' I t', I)
Locat~ thc ha~oop-streaming.jar file in )Our d1~nbution. The location may vary, and it may
contam a vers1on tag. In this example, the Hononworks HOP 2.2 distribution was used The
:ollowing command line will use the mapper py and reducer .py to do a word count 0 ~ the
input file.
S hadoop jar /usr/hdp'currctn/hadoop-mapreduce-cltent/hadoop-streaming.jar
-+
-file /mapper .py
y key (here: word) before -mapper ./mapper .py
-file }reducer .py -reduce ./reducer .py
-input war-and-peacc-input/war-1nd-pcacc .txt
-output war-and-peace-output
The output will be the familiar LSUL'CESS and part -00000) in the war-and-peace- output
unl directory. The actual file name may be slightly difference depending on your I ladoop version.
Also note th~! the Python scripts used in this example could be Bash, Perl, Tel, Awk, compiled
., =word· C code,or nny langua1:,.: that can read nnd ,Hite from stdin and ~tdout.
Although the streaming interfa~e i~ rather sunple, it docs have some disadvantages over
g the commands as shown using Java directly. In pm1icular, not all .. p;,lications are string and character binary data.
Another disadvantage is that many tuning parameter~ available through the full Java Hadoop
API are not available in streaming.
Modulc-2
Explain I low lo quire data stn•ums using Apache F'lume? (0~ Marks)
Apache rlume i~ an 111dcpcndcnt ngent designed to collec1, transport, and store data into
HDFS. Often daw transport involves a number of flume agents that may traverse a series of
machines and locations. Flume is often used for log files, social media-generated data, email
message, and just about any conlinuous data source.
As shown in Figure 3.1, a flume agent is composed of three components.
imulated shuffie phase: • Sou rec. The source component receives data and sends it to a channel. It can send the data
to more than one channel. The input dala can be from a real-time source (e.g., weblog) or
another Flume agent.
• C h111111cl. A channel is a data queue thnt forwards the source dntn to the sink destination.
It can be though of as bufft:r thal manages input (source) and oulput (sink) flow rates.
• Sink. The sink delivers data to destination such as IIDFS, a local file, or another flume agent.
A Flume agent must have all lhree of these components defined. A flume agent can ha,·e
several sources, channels, and sinks. Sources can \Hite to multiple channels, but a sink can
the reducer .py script to the take data from only a single channel. Doto written to n channel remain in the channel until
a sin"- removes the data. By default, the data in a chanmd arc kept in memory but may be
optionally stored on dis"- to prevent data loss in the event ofa network failure.
runs:
..
Figure J. l Flume Age111witlt source, ,·1tu1111el, u11d li11k(atfuptetlfrom Apatl,e Flume
t/tJt'llllll!IIIOII0/1)
55
VIII Set11, (CSE/ISE) CBCS - Modcl,Q~
As sho\\n in Figure 3.2, Sqoop agents may be placed in a p1pclmc, possibly tu traverse namespacc andlogs.
several machines or domains. This configuration i~ normally used when data are collected Th-.: web-based UI c
on one machine (e.g., a web server) and sent to another nmchine that has access to I !DI'S. the NamcNodc. In A
Links pull-down me
tab 1~ iIf open with t
~ ~c,,
____..,,...____
the following comm
S fircfu.-: http:loc:alh
loo Thcire are five labs
Ulilities. fhc Overv
Figure 3.2 Pipeline is created b} connecting Fume agents ( Adapted from Apache Flum.: line tools also offer
Sqoop Documentation) information lir..c th~
The Sanpshot win
informalion on snap
Figure 3.6 provide
read~ the prcv iou~ Ii
image, thereby crca
DataNodcs come o
starts. Completed pl
in italics. Phases tha
have been complete
out of safe mode.
The Ut1lillcs menu
browser. From this\
which is not shown,
......
....,,...a.,....., ...
Overvi8W °llolu
Figure J.J A Flume c:u,uulidutiu11 11f!lwork (Ad11ptedfrom Ap11c:fle Flume
D uc11m,mtutlo11)
In a Flume pipdine, the sink fonn one agent is connected to the source of another. The
data tran~fer format normally used by Flume. which is called Apache Avro, provides several
useful features. First, Avro is a data serialization/deserilization system that uses a compact
binary format. The schema is sent as part of the data exchange and is defined using JSON
(JavaScript Object Notation). Avro also n:mot.: procedure calls (RPCs) to send data. That is, Summary
an Avro sink will contact an Avro source to send data. Another useful flume configuration is
shown in Figurc 3.3. In this configuration.Flume is used to consolidate SC\'eral data sources
before committing them to IIDFS. There arc many possible ways to constnict Flume transport
...,.. • ._...wd1nn
networks. In addition, other Flume features not described in depth here include plug-ins and .....,_ toon(IJ . . . ., . , . . •u
56
l lfl'Vett"G\IA~
namespace and logs.
line, possibly to traverse The web-based UI con be s1nrted from within Amban or from n web browser connected tu
when daH1 arc collec1ed the NamcNode. In Ambari, simply select the IIDFS Kn.<ice window and click on the Quick
al hns access to \-IDFS. Links pull-down rncmt"in the top middle of the page. Select NameNodc UI. A n.:w browser
tab will open with the UI shown in Figure 3.4. You can also start the UI directly by entering
the following command:
S fire fox http: loc(llhost:50070
There arc five t111Js on the UI : O,·en iew, Datanodcs, Snapshot, Startup Progress, and
Utilities. fhc Ovcrvicw page provides much of the essential information that the command-
line tools also offer, but in a much easier- to - read format. Th" Dntnnodcs tab cJbpluys nude
pied from Apache Flume information like that shown in figun; 3.5
The Sanpshot window lists the "snap-shottablc" directories and the snapshots. Further
information on snapshots can be found in the "HDFS Snapshots'' section.
Figure 3.6 provides a NamcNode sta1111p progress view. when the NameNodc starts, ii
reads the previous file system image file(fsirnage); applies any new edits to the file system
image, thereby creating a new file system image; and drops into safe mode until enough
DataNodes come onlinc. This progress is shown in real time in the UI as the NameNode
starts. Completed phases nre displayed in bold te,t. The currently running phase IS displayed
in italics. Phases that have not yet begun are displayed in gray text. Figurl! 3.6, all the phases
have been completed, and as indicated in the overview window in Figure 3.4, the system is
out of safe mode.
The Utilities menu ull't:rs twu options. The first, as shown in Figure 3.7. is a file system
browser. From this window, you can easily explore the HDFS namespace. The second option,
which is not sho,~n. links 1~ the various NnmcNodc lolls.
HDFS
_..
:::::.~::....-C ..
lidate sc,·eral cfala sources 4L"l,ot,t'Ml l • )\J)fll ..... ,._~•lolfl<.....
construct Flume transport ".,..,.,....,y..witna,,.,...,,M-4 ... --'~"Ma•,....,..,..,._.,., .. ,_... -----.•~Ml-
here include plug-ins and 9"ff'MWMll"9'r.,...•••l.,__,,,. M,MI(....,.,....,.,_,.__,,,,,.._._._.., tilt
_.,.
......-
o,,.,..., Ot\C'A
(12 Marks)
-
.... Of • . , . .,
lUff
57
VIII Se.tw (CSE/ISE) 13i.g-Vc;,.m,A~
1. Add the 1
most,cases, the
hdfs.
Datanode Information
In operalJOn
- <_.,
l~IC.J
no1c.1
1 tJ{4
11n<:ot
.... .,,~
'4,.....
, .. uca
- -...
nn,01
lUUQI
"'
"'
...... ,._. ..-.-
111 w1>.»111t
ll.!SCil4l.N
.• -
f tUJ,.N
JUUU~
Browse Direc
Jid t ,c.a
lM1fA
UM<A
11Mt.r
HAl(;,I
1•••ot
)UJI-
'"""°'
...
1tOU•C10'M
UMc.t j a.ti-. .
• ,UU-l:U-l
)U)U)-.J
0
Oecom,sslomng
- .... .-
__.............
....... _.,__
..
-
~ ,__... ~~...J<wl'r•~ -• 1111•••••••teM.l1 I._JJ Ml
.,.,...._
.....
.....
--
------
aac
2. Create the username
hdfs dfs -mkdir /uscr/<u
3. Give that account
<usemame>:<groupnam
·-·- ·-
To check the health of
command. The entire 11
--
--
,Mf,.~tUfff~---00000,aoaaaaa,,,.)t.,OOOGOOCUIDIDUIJTIJ I Ml 0.Hll~J
~repo,urdMKt,(111n:11
l-
·- ·-
,..,.
·-
1---.,-
$ hdfs fsck/
Connecting to namenode
FSCK started by hdfs (a
14:48:01 EDT20 15
·············· .. ,,,,............
,,
' Status: HEALTHY
Figure: 3.6 NameNo,le web i11terfau sl,owi11g start11p progress To\a\ si1.e·. \004'.BS6S1g
Adding Users to HDFS Total symlinks: o (F
Keep in mind that errors that crop up while Hadoop applications are running are often due 'Tota\ b\ocks (va\idated:
to file pennistiont. b)t:>rk.s- (not vs,J;J,.,.J),
To quickly crea1e user accoums manually on a Lmux-based system, perform thefollowing Min
steps: · Ove
Und
S8
l. Add the user to the group for your operating S)stcm on the HDFS client system. In
most,cases, the groupnamc should be that of the HDFS superuser, which is ofi.:n hadoop or
hdfs.
useradd G <ttrou name> <username>
''"' •
(lt\"'4
.....,...
•
..
110101
,uu,1-1
u•1• • H
, ••,1,>J
,.,,..,
... ~- .....
.,.,....,.I'
°""''
~tfll
....
-·
......
.....
....
,.
..••
••
..- --.... ••
-.......
""
~
...
......
.... _,.
-- ..... "11:tM14fb
.. -... .....--
-----·IT·I
....,
..... ....
..,
_
""" ...
--
"""
....••
,...
...,..
··-
.,....,..,. ....
....
.....
.. ....
_..,.
..........
-··
.,.....,
-
-.... -....
.....
..
01
....
..,
, .
be done when new DataNodes are ndded or whc11 a Data Node is removed from service. This ♦- · ~ ......-•.· ,, ' p ... .., . .. . .
step does not create more space in I IDFS, but rather improves efficiency.
The HDFS superuser must run the balancer. The simplest way to run the balancci is to enler
the following command:
S hdfs balancer Snapshot Summ
By default, the bal:incer will continue 10 rebalance the nodes until the nwnber of data block
on all Data Nodes are within I0% of each other. The balancer can be stopped without harming SnapshoUablo dlreetor,es.
HDFS, at any tum: by entering a Ctrl-C. Lower or higher thresholds can be sci using the
-threshold argument. For example, gmng the following command sets a 5% threshold:
$ hdfs balancer -threshold 5
...
The lower the threshold. the longer the balancer will run. To ensure the balancer d~s not
swamp the cluster networks, you can set a bandwidth limit before running the balancer, as S naps~ollcd (1,rcccoroes. I
follows:
$ dfsadmin -setl3alancerl3andwidlh newbandwidth
111c newbandwidth option is the maximum amount of network bandwidth, in bytes per
second, that each DotaNode con use during the balancing opcrnlion.
Balancing datablocks can also break HBase locali1y.When HBase rgions are moved, some Figure: 3.8 Apuc/1
data locality is lost, and the RcgionServcrs will then reqnest the data over the network from $ hdfs dfs -Is /user/hdfr
remote DataNode(s). This condition will persist until a major Hl3ase compaction event take, -rw-r-r- 2 hdfs hdfs
place (which may either occur at r<!g111:lr intcrvnls or be initiated by the administrator). input/war-and-peace.txt
IIDFS Safe Mode The NameNode UI prov
when the NameNode starts, it loads the file system state from the fsimage and then applies have been 1aken. Figure 3
th.: edits log file.. It then waits for DataNodes to n:pon their blocks. During this time, the snapshot. give the followi
NameNode stays 111 a-read-only Safo Mode. The NameNode leaves Safe Mode automatically $ hdfs dfs -deleteSnapsho
60
~V~A~
after the DataNodcs have reported that most file system blocks are available.
The administrator can place liDFS in Safe Mode by giving the following command:
$ hdfs dfsadmin -safcmode enter
Entering the following command turns off Safe Mode:
cks: I $ hdfs dfsadmin -safemode leave
onds I IDFS may drop into Safe Mode if a major issue arises within the file system (e.g., a full
DataNode). The;: file system will not leave Safe Mode until the situation is resolved. To check
les, and management of whelher HDFS is in Safe Mode, enter the following command:
$ hdfs dfsadmin -safemode get
Decommissioning HDFS Nodes
If vou need to remove a DataNode host/node from the cluster you should dccom- mission
it first. Assuming the node is responding, it can be easily decommissioned from the Ambari
web UI. Simply go to the I losts view, click on the host and selected Decommission from the
ates the existence of a pull-down menu next to the DataNode component.
ories under it. Note that the host may also be acting as a Yarn NodeManager. Use the Ambari 1-1 to
e fi ks to ~-.,hich theybelong decommission the YARN host in a similar fashion.
The restoration process is basically Hl-irnple copy from the snapshot to the previous directory
(or anywhere else). Note the use of the ~/. snapshot/wapi-snap- 1 path to restore the file:
$ hdfs <lfs -cp /uscr/hdfs/war-and-peacc-input/.snapshut/wapi-snap-1/war-and- peace . txt /
user/hdfs/war-and-pcace-input
r of data blocks across the Confirmation that the file has been restored can be obtained by issuing the following
Nodes, the HDFS balancer command:
ta blocks are moved from
hreshold. Rebalancing can • ~•~ • ,A l ~
♦.. cflo,r,,.IJ'! ..,..,~.• ti • 1 p •I•• ,t 'II •'(1
l
ciency.
, run the balancei is to enter
Snapshot Summary
ii the number of data block
be stopped without harming SnJpshonable dtrec10,,es· 1
sholds can be set using the
d si:ts a 5% threshold: .............
nsure the balancer does not
bre running the balancer, as Snaps'1ollcd d1tectoroes· 1
--
rk bandwidth, in bytes per -· ---
"1tnYU,llt.-01,..
se rgions are moved, some Fig11rl!: 3.8 Ap11clte N11111eNotll! Hll!b i11ll!tjt1ce slwHJi11g Mltp.f/,o/ i11for111atio11
data over the network from $ hdfs dfs -ls /user/hdfs/war-and-peacc-input/ Found I items
ase compaction event take, -rw-r--r- 2 hdfs hdfs 32887462015-06-24 21: 12 /user/hdfs/war-and- peace-
by the administrator). input/war•all(f-peace.txt
The NameNode lJI provides a listing of snapshottable direclories and the snapshots that
e fsimage and then applies have been taken. Figure 3.8 shows the results of creating the previous snapshot. To delete a
cks. During this time, the snapshot, give the following command:
s Safe Mode automatically $ hdfs dfs -deleteSnapshot /user/hdfs/war-and-peace-input wapi-snap-1
61
VIII Sem, (CSf/ISf) CBCS - 1,,todel,Q~
To make a directory "un-snapshottable'' (or go back to the default state), use the When the DataNode
following command: from the user.
,... - .....
S hdfs dfsadmin -disallowSnnpshot /user/hdfs/war-and-peace-input Disallowing snapshot
on /user/ hdfs/war-and-peace-input ~uccceded i - -· - ~-~
....
OR
4. a. llow to manage Hadoop scn •kc? (08 Marks)
Ans. During the cot1rse of normal lladoop cluster operations, service may fail for any number of ·-··-
·-·-·-
' u ..
reason. Amb,1ri monitors all of the Hadoop service and reports any service interruption to
the dashboard. In addition, when th!! system was installed, an administrative email for the
Nagios monitoring system was required. All service interruption notifications are sent to this
email address.
·----
g ..
Figure 4.1 shows the Ambari dashboard reporting a down DataNode. The service error
indicator numbers next to the IIDFS service and Hosts menu item indicate this conditions.
The DataNode widget also has turned red and indicates that 3/4 DataNode are operating.
·-··-·-
~
Clicking the HDFS service link in the left vert ical menu will bring up the service summary
screen shoOwn in figure 4.2. The Alters and llealth Checks window confirms that a DataNodc
is down.
The specific host (or hosts} with an issue can be found by examining the llosts window. As
shown in Figure 4.3, the status of host n I ha~ changed from a green dot with a check mark
inside to a yellow dot with a dash inside. An orange dot with a question mark inside indicates
the host is not responding and is probably down. Other service interruption indicator may also
·-- -
be set as a result of the unrcs sive node.
- •· --··-··
I
.............. . . • 1•
Fi~ur~ 4,2 AnllJari
·-·•
·-- ---_... - ·-
·- ---- --
_.,. ....
., •-·
1·-- ...
♦ . . ..........
---
·--
•·-
- - - --
ll;~•- 3,14 ·-·-· ....... ..
'
·-
•·-
.. --
.h• ·-
~
--_, -- - 0.1 4 ms
L• ~
no
"··.. -
-· .. ·-
....... ---
52.9d
,_
1.33 52.9 d
A
- .. - - ·- ·- - --
-- Flg11r~ 4.J
62
~V~An~
fie), use the When the DataNodc daemon is restarted, a confirmation similar to Figure 4.5 is required
from the user.
Disallowing snapshot
•.....,.~•- , ......,"_'
......,. ......., • ~
~-----~........
~ •...:.:-'-"---•------.....-a......,,~ie:,..,.~....,.•-
(08 Marks)
y fail for any num~r of
service intcm1puon to ·-........
...
- ---·-- .,.......,_
----
A=.,,,.,,_ __
---- ..
ninistrative email for th_e
ifications are sent to this
·- --··-
__ .... ·- -
...................
.... - •
,......
· - • • • • •~ t ~ -
- ·•
---
•
ataNode are operating.
g up the service summary
confirms that a DataNode
~
-··-·· .......
.,. Fi ure 4.2 Ambari HDFS sen•h·e s11111111arv wi111/ow i11dit-ali11 a dow11 DutaNode
... .... .._..._,_
L- --
-.;,Will ..... ·- . . . -·
-- - ,_...
0
••••
-
n
- ..--. I$ "' • co
......
t \ lo~. .
I
... ,,.
, 11-t. , .... ,.
.. -~~--
u• • I , I ,,·~ "
r:e •• ~
I
....
,.
.33
--- 52.9 d Fig11rt! 4.J A111b11ri Ho1ts SL'tet!1t l11dicat/11g an inut! witll /1ost n/
.:J
414
ataNode issue
cting the Components sub-
host. At this point, checking
of the failur.:. Assuming the
the Start option in the pull•
63
VIII Se+w (CSE/ISE) 13lft'V~A~ CBCS - Modcl,q~
~, _. - .....-
.,,. g ... __,_.o.,:....._,~ 0 D 6 • •
-- ·-·
··-- ·-
·-- - ~· ·-
.....
., __
__
c.........._..,,.,.~~
"'°";,,...t ,_ '--·
~~-'°"""'°"""'
,c-, ,.....-.,1., .. p ~ -
_.,_
.,.,_
·--.
.~~J~
...... ,;.,..
i,
.......
Ml
L--
• ,,,..,_ tcflll!
__, ,,... 1-
OI , ...... ~ ...
--
~f'.-,4 , •
... _
,_..... . - 1...
,.,. .l.._1 . . .
l ... . . . 6 """
'M-'',.,,,.
----
Fig11re 4.4 Amb"ri 111ii11low for /toll 11/ i11dic"ti11g tl,e D"t"Notle/1/DFS sefl•lce /,as
~topped :ig_ure 4.6 Ambari dash
When a service daemon is started or stopped, a progress window similar to Figure 4.5 is ~nd~cators will slowly J
opened. The progress bar indicates the status of each action. Note that previous actions are md1cators arc beginning
part of this window. If something goes wrong during the action, the progress bar will tum n:d. real-time widget updates
If the system generates a warning about the action, the process bar will turn orange.
When these background operations are running. the small ops (operations) bubble on the top b. Wrile a shore note on v.
menu bar will indicate how manv o rations are runnin •.
Ans. The central YARN Re
machine and acts as the
1 Background Operations Running
applications in the clusterl
·,i.-i f ...... . Sto. M ! IO) J resources and, therefore,
I ... users. Depending on the a
the ResourceManagcr dy
I..,,.
• particular nodes. A conta1
to a particular cluster n
100'0
• interacts with a special sy
100'0
• for scalabili1y. NodeMan
fault reporting, and conta
.,,,~,!~ ,,..,v.......
100'0
• ResourceManager dcpen
C:oo~~ ,.,., <>. :11 --- IOO'lo
• User applications are su
through an admission c
various operational and
I-
• accepted pass 10 the scti
Oo,.,. _ _ _ _ _ _ _ _ . . _ _
resources to satisfy the re
state. Aside from intcma
single ApplicationMaster
Fig11re 4.5 Amb,,ri progress wi111/ow for DataNo(le restart the Applicat1onMaster d
Once the DataNode has been resfarted successfully, the dashboard will reflect 1he new status request additional resour
(e.g., 4/4 Data Node are Live). As shown in figure 4.6, all four
64
I•- · - ..........--
. \ ...,.._,
v (: u ... .,.w- o.-M-b~... 0- e -. -;- ■
•-·
--__ --
--.... --
..
--L a..... , ...
;::.:,-_J.._,..
--e) 0.07 ms
-""-
....
-- •...
52.9 d
............_
"'
-·-
--·
'"'---•
,......,
1.33
... ......
52.9d
......-.
- . --
L-- - - - i l - -
-
S ·- . - ~
to a particular cluster node. To enforce and track such assignments, the ResourceManager
interacts with a special system daemon running on each node called agers are heartbeat based
for scalability. NodeManagers are responsible for local monitoring of resource availability,
fault reporting, and container life-cycle management (e.g., stmting and killing jobs). The
ResourceManager depends on the NodeManagers for its "global view" of the cluster.
6-- User applications are submitted to the ResourceManager via a public protocol and go
1
,..,,. . through an admission control phase during which security credentials are validated and
various operational and administrative checks are performed. Those applications that are
accepted pass to the scheduler and are allowed to run. Once the scheduler has enough
resources to satisfy the request, the application is moved from an accepted state to a running
slate. Aside .from internal bookkeeping, this process i1.volves allocating a container for the
single ApplicationMaster and spawning it on a node in the cluster. Often called container 0,
otle rellart the ApplicationMaster does not have any additional resources at this point, but rather must
d will rcf!l::ct the new status request additional resources from the ResourceManagcr.
65
VIII Sem, (CSE/IS£)
The Application Master is the ·•master" userjob that manages all application life-cycle aspects, was designed and integ
including dynamically increasing and decreasing resource consump1io11 (i.e., containers), Figure 4.7 illustrates th
managing tile flow of execution (e.r., in case of MapReduce jobi, running reducers against the YAl{N components app
oulpul of maps), handling faults and computation skew, and performing other optimizations. and the two applicatio'l
The Applica1ionMaMer is designed to run arbirtratry user code that can be \Hillen in any application uses a dille
programming language, as all communication with the Reso'urceManager and NodeManager passing Interface (MPI)
is encoded using extensible network protocols application.
YARN makes few assumptions about the ApplicationMaster, although in practice it expects The darker client(MPI
most jobs will use a higher-level progr.1mmin:; framework. By delegating all these functions running n MapReduce a
to ApplicationMash:rs, YARN's archilecture gains a great deal of scalability, programming
c. E,pl:iin c:ipaeicy sch
model flexibility, and improved user agility. For example, upgrading and teMing a new
MapReduce framework can be done independently of olhcr running MapReduce frameworks.
Ans. Ca11acity Scheduler Ba
Typically, an ApplicationMastcr will m:ed to harness the processing power ofmuhiple servers Thi: Capacity scheduler
to complete a job. To achieve this, the ApplicationMastcr is~ues resource reqmms to the si:curcly ~hare a large I I;
Resoum:Manager The form of these requests includes specification of locality preferences Capacity scheduh:r has
(e.g., to accommodate I IDFS use) and properties of the containers. The ResorceManager \\ ill To use the Capacity sc
:11tempt to satisfy the resource requests coming from each npplication according to avnilability fmction of the total slo
and scheduling policies. When a resource is scheduled on behalf of an ApplicationMaster, amount of resources fo
hard limits on tho: capa
the ResourceManagcr generates a lease for the resource, which is acquired by a subsequent
Control Lists) that cont
ApplicalionMaster heartbeat.
~- - Mapll.._Cllenl
safeguards arc in place
users.
I Sche!duler I ~ The Capacity scheduler
IAppllcatlonManage, I .,
I
NMU¼#SIM minimum capacity guar
of dcmand (i.e.. a group
fa.ccss slots arc given t
divided by rhe queui: ca
capacity guarantee gel th
elasticity for the users in
Administrntors can chan
run time without disrupt
delete queues at run time
that while existing appli
The Capacity schedule,
application can oprional
Using information from
on the best-suited nodes.
The capacity scheduler
assiiining the minimum
should be assigned a m
Figure 4. 7 : ~1r11 11rd1lre,·111re wir/1 fwo t:lie1lfl(M11pR,:d11,·e tmd MP/). Within each queue, mul
The ApplicationMasler then works with the NodeManagcrs to start the resource. A loken- the approach used with t
based si:curity mechanism guaranh:cs ils a111hentici1y when the ApplicationMaster presents jobs nri: placed in the di!
the container lease to the NodcManager. In a typical situation, running containers will The ResourceManager
communicate with the Applicatio11Mastcr through an applicalion-specific protocol to report thi:ir utilintion. Figure
slatus and health infonn:uion and 10 receive fra111cwork-sp,:cific co111111an<ls. In this way, scheduler view, click th
YARN provides basic infr.i\lructurc for monitoring and life-cych: managcment of containers, Information on configur
"hile each framework manages applic:uion-specific semantics design, in which scheduling org/does/currcnt/hadoo
66
VCNtl;vA~
at ion life-cycle aspects, was designed and integrated around managing only MapReduce tasks.
ption (i.e., containers), Figure 4.7 illustrates the relationship between the application and YARN components. The
ing reducers against the YARN components appear ns the large outer boxes (ResourceManager and NodeManagers),
ing other optimiz.1tions. and the two applications appear as smaller boxes (containers), one dark one light. Each
t can be written in any application uses a different ApplicationMaster; the darker client is running a Message
ager and NodeManager passing Interface (MPI) application and the lighter client is running a traditional MapReduce
application.
gh in practice it expects The darker client(MPI AM.) is running an MPI application,and the lighter client(MR AM 1)is
ating all these functions running a MapRcduce appiication.
calability, programming c. Explain capacity sclu:dulcr background. (04 Marks)
ding and testing a new Ans. Capacily Sd1cdulcr Background
apRcduce frameworks. The Capacity scheduh:r is the default scheduler for YARN thal enables multiple groups lo
ower of multiple servers securely share a large Hadoop clus1cr. Developed by 1he original I ladoop team at Yahoo!, the
resource requests to the C:ipacity scheduler has successfully run many of the largest Hadoop clusters.
n of locality preferences To use tht! Capacity scheduler, one or more queues are configured with a predetermined
he ResorceManager will fraction of the total slot (or processor) capacity. This assignment guarantees a minimum
n according to availability amount of n:sourccs for each queue. Administrators can configure soil limits and optional
f an ApplicationMaster, hard limits on the capacity allocated to each queue. Each queue has strict ACLs (Access
acquired by a subsequent Control Lists) that control which users can submit applications to individual queues. Also,
safeguards anJ in place to ensure that users cannot view or modify applications from other
users.
The Capacity scheduler pcrmits sharing a cluster while giving each user or group certain
~~:MPICllont ,,._':-•· minimum capacity guarantees. Thes..: minimum amounts are not given away in the absence
of th:mand (i.e.. a group is always guaranteed a minimum number of resources is available).
Excess slots an: giwn to the mos, starved queues, based on the number of running tasks
divided by th..: queue capacity. Thus. lhc fullest queues as defined by their initial minimum
capacity gunrantee get the most nee,Jt!d resoum:s Idle capacity can be assigned and provides
elasticity for the users in a co~l-cffectivc manner
Adminislrators can change queue dcfinilions and properties, such as capacity and ACLs, at
run t:me without disrupling users. They can also add more queues at run time, but cannot
delele queues at run time. In addition, administrators can stop queues at run time to ensure
that while existing applications run to completion, no new appli-cations can be submitted.
The Capacity scheduler currently supports memory-intensive applications, where an
application can optionally specify higher memory resource requirements than the default.
Using information from the NodeManagcrs, the capacity scheduler can lhcn place containers
on the best-suited nodes.
The capacity scheduler works best when the workloads arc well known, which helps in
assigning the minimum capacity. For this scheduler to work most effectively, each queue
should be assigned a minimal capacity that is less than the maximal expected workload.
•,hm! "'"' MP/), Within each queue, multiple jobs are scheduled using hierarchical FIFO queues similar to
tart the r..:sourcc. A token- the approach used with the stand-along Flf'O scheduler. If there are·no queues configured. all
l\pplicat ionMastcr presents jobs are placed in the default queue.
, running containers will The ResourceManager UI provides a graphical represrntation of the schedular queues and
-specific protocol to report
their utilization. Figure 4.8 shows two jobs running on ,1 four - node cluster. To select the
c commands. In this way, scheduler vicw, click the scheduler option at the bottom of the lefi-side vertical menu.
nanagcmcnt of containers, Information on configuring lhe capaci1y scheduler can be: found at https : //handoop. apache.
sign, in which scheduling org/docs/currcnt/hadoop-yarn/hadoop -yam-site'CapacitySchcduler.html and from Apache
67
VIII Se,m, (CSE/IS'E) 13~Vam.,A~ CBCS • Mode!,qd
Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2.
In addition to the capacity schedular, H.idoop YARN offers a Fair scheduler. More information
13. Microsoft Bl pla
can be found on the Hadooo webs ite. 14. MicroStrategy
l S. Ml't'S
· - -·..... • 1- - . ...................... .
• .A.._..J..'.~~ .. •· ~ -- ---· --- .. c ■•'l'!lfl,/t • ••••• 16 Open!
~
__ NEW,NEW_ SAVING,SUBMITTED,ACCEPTED,RUNNING Application• - - - -- 17. Oracle Bl
--..,.,
,.-. a ...,.. -..""
..... ._. ._._ ._. <;rt....., .,..,.._~ ......., K •"'IIC.-.IWC'- ..., ~ _ , . . ~ ~ ....... 18. Oracle Enterpris,
~" ....... .........., , ...~ ... --..... ~ ... fr.I.I -.. "'-" .. "'""'" ..... ...... .... ..... --
• • , , a JJ:)O . ....._ oe n 11 t t t t t t
19. Oracle Hyperion
..- ear r · ... "'
,. - • ,..,. :iG2P ; tsrt norcesn
~ :et e cer The Bl tool used in o 1
.....
·--
l'Q ~
..... \,e. . . . . -
As higher education
decision-making. The
quality of student ex
,.,.,.
....... w.,.,.,, ...... ~ .ao --- I. Student enrolmen
·-1!)11
.,,.~ \.11""'1' requires schools to de
. . ... ..
...,,..
, t,.,t .....Mu
- H~"'f
,.,.,.
~>on
~ IN)l'OC D alW
develop models of wll
,..,,..,....
u , ,10,
those students. The st
can be taken in time.
Fig11re 4. 7 : Ap11cl1e YARN res11urce 111a11age web i11terface s!,owi11g c11pacity scl1e,luf(:s
2. Course offerings:
i11/or111atl1111
new courses are likely
Modulc-3 reduce costs, and impr~
Describe list or business intelligence tools used in the organization. Explain any 2 or 3. Alumni pledges: Sc~
5. a
to pledge financial sup
them used in your organization. (10 Marks)
to pledge donations to
Ans. According to the list of best business intelligence tools prepared by experts from Finances
other forms of outreach
Online the leading solutions in this category comprise of systems designed to capture,
categorize, and analyze corporate data and extract best practices for improved decision b. What are the key elem
making. The more advanced the system is, the more data sources it will combine, including Ans. OW has four ke elcmc
internal metrics coming from different company departments, and external data extracted Pata Sources
Operations
from third-party systems, social media channels, emails, or even macroeconomic data.
-ERP systems
Ultimately, business intelligence sofiware helps companies gain insight on their overall · Legacy systems
growth, sales trends, and customer behavior. · Point of Sale
• RFID systems
I. Sisense 20. Palo OLAP Server -W eb usage
2. Actuate Business Intelligence and Reporting Tools (BIRT) 21. Pentaho External
-Suppliers
3. icCube 22. Profit base a Cust omer s
23. QlikView •Government
4. Domo
5. Board Management Intelligence Toolkit 24. Rapid insight.
The first element is the
6. Clear Analytics 25. SAP business intelligence
process of transforming t
7. Duccn 26. SAP DusinessObjects of regularly nnd accurate
8. Gooddata 27. SAP NetWeaver BW is the data access and an,
9. IDM Cognos Intelligence 28. SAS Bl deliver insights and othc
I0. lnsightsquared 29. Silvon Data Sources
OWs are created from s~
I I. JasperSoft 30. Solver need lo be structured befi
12. Looker 31. SpagoBI
68
~ - - --- --
ing with Apache Hadoop 2. 13. Microsofi Bl platform 32. SQL St:rver Analysis Servic~
·che<luler. More information
14. MicroStmteg~ 33. Style Intelligence
15. MITS 34. Syntell solutions
16. Openl 35.Targit
UNNING Applications 17. Oracle DI 36. Vismatica
18. Oracle Enterprise Bl Server 37. WebFOCUS
19. Oracle Hyperion System 38. Yellowfin Bl
The Bl tool ust:d m our organization : Education
As higher education becomes more expensive and competitive, it is a great user of data-based
decision-making. There is a Wong need for efficiency, increasing revenue, and improving the
quality of student experience at all levels of education.
J. Student enrolment (recruitment and retention): Marketing to new potential students
requires schools to develop profiles of the students that are most likely to auend. Schools can
develop mod1ils of what kinds of students are attracted to 1he school, and then reach out to
those students. 1l1e students at risk of not retuming can be nagged, and corrective measures
can be taken in time.
2. Course offerings: Schools can use the class enrolment data to develop models of which
new courses are likely to be more popular with students. This can help increase class size,
reduce costs, and improve student satisfaction.
3. Alumni pledges: Schools can develop predictive models of which alumni arc most likely
iization. Explain any 2 of to pledge financial support to the school. Schools can cr.eate a profile for alumni more likely
( 10 Marks)
to pledge donations to the school. This could lead to a reduction in the cost of mailings and
i1 by experts from Finances other forms of outreach to alumni.
terns designed to capture,
Ices f~r impr~ved. decis~on b. What arc the key clements or a data warehouse? Describe each. (6 Marks)
s it wall combine, including Ans. DW has four ke clements (Figure 5.1 ).
nod external data extracted Qi!Si! ~!2!.!'~l:i
Oper.,t ions l2ila oasa Mart or . Accesslne u,en
ven macroeconomic data. -ERP system~ Transformation & apol(cauons
~ar1bP!.!H
in insight on their overall · Legacy systems • One data OlAP tools
-Point of Sale - Select Data Reporting Tools
-Extract Data mart ror e;,ch
-RFID systems department,
Dashboards
Palo OLAP Server •Web USilgC?
-Cleanse data or
Queries
Exte?rnal
-Compute Data Mobile Devices
•A Ware?house
-Suppliers fields Data Mining
•Integrate Data for the whole
rofit base ·Customers enterprise Custom apps
-Government •load dala
69
VIII Semt (CS'f/IS'f) CBCS - lvtodel,Q~
I. Operations data include data from all business applications, including from ERPs systems database management sy
that form the backbone of an organization's IT systems. The data to be extracted will depend and reliable providers of
upon the subject matter or OW. For example, for a sales/marketing DW, only the data about The provider oflhe ope
customers, orders, customer service, and so on would be extracted. Alternatively, a best-of-b
2. Other applications, such as point-of-sale (POS) terminals and e-commerce applications, there for data migration,
provide customer-facing data. Supplier data could come from supply chain management OW Access
systems. Planning and budget data should also be added as needed for making comparisons Data from OW could be
against targets. I. A primary use of D
3. External syndicated data, such as weather or economic activity data, could also be added example, a sales perform
to DW, as needed, to provide good contextual information to decision mal,.ers. with plan. /\ dashboardin
Figure 5.2 Data warehousing architecture users. The data from o
D11ta Transformation Processes executives. The dashboar
The heart ofa useful DW is the processes to populate the DW with good quality data. This is data for root cause analys
called the extract-transform-load (ETL) cycle. 2. The data from the war
I. Data should be extracted from many operational (transactional) database sources on a that make use of the inte
regular basis. 3. Data from DW is used
2. Extracted data should be aligned together by key fie lds. It should be cleansed of any extracted, and then combi
irregularities or missing values. It should be rolled
up together to the same level of granularity. Desired fields, such as daily sales totals, should
be computed. The entire data should then be brought to the same fonnat as the central table 6. a. Des~ribc the key steps i
ofDW. . processes?
3. The transformed data should then be uploaded into DW. This ETL process should be Ans. Effective and successful u
run at a regular frequency. Daily transaction data can be extracted from ERPs, transfonned, skills. The business aspect
and uploaded to the database the same night. Thus, DW is up-to-date next morning. If DW one imagine possible relati
is needed for near-real-time information access, then the ETL processes would need to be help fotch the data from m,
executed more frequently. ETL work is usually automated using programing scripts that are business problem, and the
written, tested, and then deployed for periodic updating DW. An important element is to
OW Design the problem with smaller
Star schema is the preferred data architecture for most DWs. There is a central fact table that it~r~tive sequence of step
provides most of the information of interest. There are lookup tables that provide detailed mmmg techniques over a I
values for codes used in the central table. For example, the central table may use digits to Cross-Industry Standard p
represent n sules person. The lookup table will help provide the name for that sales person (Figure 6. 1):
code. 1lere is an example of a star schema for a data mart for monitoring sales performance
(figure5.2).
m:clJIIOtitm' a tt-ltt'tirtlil:Jtff\RI
I
ISh@frASi t ·M!Mli
Fig
Figure 5.2 Star schema arcl1itect11re I. The first and most import
Other schemas include the snownake architecture. The difference between a star and the right business questions
snowflake is thnt in the latter. the lookup tables can have their own further lookup tables. payolfs for the organizatio
There are many technology choices for developing DW. This includes selecting the right mining project is like any otl
is successful. There should be strong executive suppon for the data mining project, which 2.Ceomctric Project
means that the project aligns well with the business strategy. A drawback of pixel•
2. A second important step is to be creative and open in proposing imaginative hypotheses understanding the dist
for the solution. Thinking outside the box is important, both in terms of a proposed model as Geometric projection
well in the data sets available and required. data sets.
3. The data should be clean and of high quality. It is important to assemble a team that has a A scatter plot displays
mix of technical and business skills, who understand the domain and the data. Data cleaning added using different
can take 60 to 70 percent of the time in a data mining project. It may be desirable to add new Eg. Where x and y
data elements from external sources of data that could help improve predictive accuracy. different shapes
4. Patience is required in continuously engaging with the data until the data yields some good Through this visualilU
insights. A host of modeling tools and algorithms should be used. A tool could be tried with
different options, such as running different decision tree algorithms.
5. One should not accept what the data says at first. It is better to triangulate the analysis by
applying multiple data mining techniques and conducting many what-if scenarios, to build
confidence in the solution. Evaluate the model's predictive accuracy with more test data.
6. The dissemination and rollout of tht: solution is the key to project success. Otherwise the
project will be a waste of time and will be a setback for establishing and sllpporting a_data•
based decision-process culture in the organization. The model should be embedded in the
organization's business pr~esses.
b. What arc the data visualization techniques? When would you use tables or graphs?
(08 Marks)
Ans. Data Visualization techniques are :
I .Pi,.el oricnlL-d visualization techniques:
Fig
A simple way to visualize the value of a dimension is to use a pixel where the color of the 3.k on based \ isualiza
pixel renects the dimension's value. It uses small icons to re
For a data set of m dimensions pixel oriented techniques create m windows on the screen, 2 popular icon based te
one for each dimension. 3.1 Chem off faces: _ TI
The m dimension values of a record are mapped to m pixels at the corresponding position in human face.
the windows. The color of the pixel renccts other corresponding values.
Inside a window, the data values are arranged in some global order shared by all windows
Eg: All Elcctronil"s maintains a customer information table, which consists of 4 dimensions:
income, credit_limit, trnnsaction_volume and age. We analyze the correlation between
income and other attributes by visualization. Fig 6. 4 : d tcm ujffi
We sort all customers in income in ascending order and use this order to layout the customer ~hspacc {I cm}$ 3.2 St
data in the 4 visualization windows as shown in fig. where each figure has 4
The pixel colors are chosen so that the smaller the value, the lighter the shading. 2 dimensions are mappe
Using pixel ba~cd visualintion 11e can easily observe that credit_limit increases as income angle and/ or length oft
increases customer whose income 1s m the middle range are more likely to purchase more 4.Hlcra rchical \'l~ua li
from All Electronics, these is no clear correlation between income and age. The subspaces are visual
□ EJ@
II Types or Charts
TI1ere arc many kinds
I"' popular fonn of data. It
around alphabetical fist
I shows some of the popu
Income ctC\SitJun.at truu1e1.1oa_volumc .qe
I. Linc graph. This is a
Fig 6.2 : Pixel uric11/1t1/ 1°illl(1/iwtio11 of, 11/lr lbutes ~~ sorli11g 111/ c·ustumers in i11c·ome as a series of points conn
Asc1t11di11g order . is usually sho" n on the l
72
[9'V~A~
' mining project, which 2.Ccometric Projection visualization tech■-...
/\ drawback of pixel-oriented visualization techniques IS dlat they cannot help us much in
imaginative hypotheses understanding the distribution of data in a multidimensional space.
; of a proposed model as Geometric projection techniques help users find interesting projections of mullidir.wnsional
data sets.
emble a team that has a A scatter plot displays 2-D data point using Cartesian cCMXdinates. A third dimension can be
the data. Data cleaning added using different colors of shapes to represent different data points.
be desirable to add new Eg. Where x and y arc two spatial attributes and the third dimension is represented by
predictive accuracy. different shapes
e data yields some good Through this wisualization, we can see that points of types"+" &"X" tend to be collocated.
tool could be tried with IO
~
10 □
iangulate the analysis by 60
hat-if scenarios, to build t:,
y with more test data. y so
ct success. Otherwise the 00 0
ng and supporting a data- a
30
ould be embedded in the b.
10
I,.
to L:,..
use tables or i raphs?
(08 Marks) 0
=
o tO 10 lO 40 ~ 60 10 IO
73
VIII Se.m, (CSE/ISE)
y-axis to compare of the line graphs of all the varmbles.
2. Scatter plol: This is another very basic and useful graphic fonn. It helps reveal the
relationship bcl\\een two variables. In the 11bo,e caselet, it shows two dimensions: Life
Expectancy and Fertility Rate. Unlike in a line graph, there are no line segment• connecting
the points.
3. Dai- graph: A bar graph shows thin colorful rectangular bars with their lengths being
proportionnl to the values represented. The bars can be plotted vertically or horizontally. The
bar graphs use a lot of more ink than the line grnph and should be used "hen line graphs are
inadequate.
4. Slacked Dar graphs: These are a particul.u method of doing bar graphs. Values of
multiple variables are stacked one on top of the other to tell an interesting story. Bars can
also be nonnaltzed surh as the tolal height of every bar is equal, so ,t can show the relative
composition of each bar.
5. Hlstoi;rams: The~..: ore like tiar graphs, except that they are useful in showing data
freq11en-.1i!!- v: data values on classe~ (or ranges) ofa numerical variable
6. r,e c,,:irts: Tht~e an: very ropul,tr to ~hO\V the di~tribution of a variable, such as sales by
region. ·11,: s:1, of a slice? is 1cp,csentative oithe relative strcngths of each value.
7. 801. l:h:irts· r'it~e arc sped.ti form of charts to show the distribution of variables. The
box slco, ~ thc 1111Jdlc nalfofthe value~, while \,hiskcrs on both side~ extend to the extreme
value, ,11 either direction.
8. Bubbl~ C rnrh: This 1s an inleresting way of displaying multiple dimensions in one chart.
It is a variant of a scatter plot with many data points mao..ed on t\\ o dimensions. Now imagine
that each data point on the graph is a bubble (or a circle) ... the size of the circh: and the color FOOTP
fill in the circle could represent two additional dimensions.
~~~-,--
fr
6SO ....,.,
I
6so r....,__
...-... . Fi,:11re6.7:
~'.~
C.,-,ot, s.--.vi,;.1..,
·-·.,._·.,,• A table ;, b.:st when;
• You need to look up $
• Users need precise vt
• You need to prccisel)
Figure 6.5: Man.I' typ~ ofgraphs • You have multiple da
9. Dials: These are charts lil-e the speed dial in the car, that shows whether the variable value A graph is best when:
(such as sales number) is in the low range, medium r.1ngc. or high range. These ranges could
• The message is conta
be colored red. ycllow and grcen to give an instanl view oftbe data. • You want to reveal r,i
10. Geographical Data maps are panicularly useful maps to denote statistics. Figure 6.6 • Show general trends
shows a tweet density map of the US. It shows where the tweets emerge from in the US. • You have large data ~
11. Pictographs: One can use picturis to rtpresent data. E.g. Figure 6.7 shows the number of
• Graphs and tables sei
liters of water needed to produce one pound ofeach of the products, \Vhere images are used to purpose.
show the product for ea>y reference. Each droplet of \~ater also represents 50 liters of water.
74
C'BCS • MocleiQ~Petpu · 2
f'/ ,, ~ f~
6SO •ui.r__ 65Q Wh•".;_ 140 0 ~••shwn 2SOOMIIJ•!._
.1
6SQ Tou1__
i
750'•··'~&.t
0
go ru -
r
84 0 C.lln -
.·-·...·....
Figure 6. 7: Pictugraph uf llrtter footprilll (lo11rce: waterfoQ/J)ri11t.org)
75
VUI Se-111, (CS£:/IS£)
77
VIII Seoi,, (CS£/IS£)
difficult.
Decision Trees
Should be faster once trained (although both algllrilluns can train ~lowly depending on An ideal cluster can be de
exact algorithm and the amount/dimens1onality of the data). This is because a decision tree In reality, a cluster is a subj
inherently "throws away" the input features that ii doesn·1 find useful, v.hereas a neural net knowledge. In the sample~
will use them all unless you do some feature sdection as a pr.:-processing step.
If it is important to understand what the model is doing, the trees are very 111terpretnbtc.
Onl)' model functions which are axis-parallel splits of the data, which may not be the case.
You probabl· •111 to be sure to prune the tree to avoid over-fining.
Neural Neb
Slower (both for training and classification), and less interpretable.
If your data arrives in a stream, you can do incremental updates with stochastic gradient
descent (unlike decision trees, which u~e inherent!)' batch-learning algorithms).
Can model more arbitrary functions (nonlinear int.:ractions, etc.) and therefore might be more Jr seems like there are two
accurate, provided there is enough training data. But it can be prone to over-fitting as well. as three cl11sters, dependin
There are many advantages of using ANN. way to calculate ii. Heuristi
I. ANNs impose very little restrictions on their use. ANN can deal with (identify/model) Three business application<
hi1;hly nonlinear relationships on their own, without much work from the user or analyst. Cluster analysis is used in
1l1ey help find practical data-driven solutions where algorithmic solutions arc nonexistent or It helps provide characteriz
too complicated. ~atural groupings of custom
2. There is no need to program ANN neural networks, as they learn from examples. They get m a specific domain and
better with use, without much programing effort. business applicatioo of clus
3. ANN can handle a variety of problem types, including classification, clustering, clusters based on their cha~
associations, and so on. on. Here are some e>.ampl~
4. ANNs arc tolerant of data quality issues, and they do not restrict the data to follow strict I. Market Se;:1111mtatio1t: C·
nonnality and/or independence assumptions. by their common wants an
S. ANN can handle bol11 numerical and categorical variables. 2· Product portfolio: Peopl
6. ANNs can be much faster than other techniques. and large sizes for clothing •
7. Most importantly, ANN usually provide better results (prediction and/or clustering) 3· Tl!.\! Mil1/11g: Clustering
10 their contc111 similarities ~
compared to statistical counterparts, once they have been trained enough.
The key disadvantages arise from the fact thal they are not easy to interpret or explain or
compute.
9. II. Give the Comparison bet
I. They are deemed to be black-box solutions, lacking explainability.
Ans. Text Mining is a fonn of da
2. Optimal design of ANN is still an art: It requires expertise and extensive experimentation.
3. It can be difficult to handle a large number of variables (especially the rich nominal Data Mining. However lher
attributes) with an ANN. is that te>.1 mining requ,ires
4. It takes h1r~e data sets to train an ANN. techniques can be applied.
Dimtnsion
b. Define Clusters'! Describe three business applications in you r industry where cluster
analysis will be useful. (08 Marks) Natur, or
data UnMru<:lun:d d
Ans. Definition of a Cluster :An operational definition of a cluster is that, given a representation of
n objects, find K groups based on a mt:asure of similarity, such that objects within the same Languaee Many languag
group arc alike but the objects in different groups are not alike. ustd world; many I
However, the notion of similarity can be interpreted in many ways. Clusters can differ in documtnts an:
terms of their shape, size, and density. Clusters are pattcms, and there can be many kinds Clarity a nd S.:ntcnc.:s can
of pattcms. Some clusters are the traditional lypcs, such as data points hrnging together. prtcision conrrndict the
However, there are other clusters, such as all points representing the circumference of a
Consistenfy Din.:rcnt pans
circle. ll1crc may be concentric circles with points of different circles representing different other
clusters. The presence of noise in the data makes the detection of the clusters even more
78
' V~A~
difficult.
An ideal cluster can be defined as a set of points that is compact and isolated.
n slowly depending on In reality, a cluster is a subjective entity whose significance and interpretation requires domain
· because a dccision m:e J...nowledge. In the sample data below (Figure 8.1), how many clusters can one visualize?
ul, whereas a neural net X
essing step.
e very mterprctable. X
X
ch may not be the case. X
X X
X X
X
X X X
X
with stochastic gradient Fig11re 8.1: Visual clus/er example
algorithms). It seems like there are two clustm of approximately equal sizes. However, they con be seen
d therefore might be more as three clusters. depending on how we draw the dividing lines. There is not a truly optimal
c to over-fitting as well. way to calculate it. I h:uristics are ofien used to define the number of clusters.
Three business applications:
cal with (identify/model) Cluster analysis is used in almoM every fir Id where thet e is a large variety of transactions.
from the user or analyst. It helps provide characterization, definition, and labels for populations. It can help identify
lutions arc nonexistent or natural groupings of customers, products, patients, and so on. It can also help identify outliers
in a specific domain and thus decrease the size and complexity ot prchlems. A prominent
from examples. They get business applicatioo of cluster analysis is in market research. Customers are sei>:-iented into
clusters based on their characteristics-wants and needs, geography: price sensitivity, au-! :o
classification, clustering, on. Herc are some examples of clustering:
I. Market Seg111e11tati1111: Calegorizing customers according to their similarities, for instance
ct the data to follow strict by their common wants and needs, and propensity to pay, can help with targeted marketing.
2. Product portfolio: People ofsimilar sizes can be grouped together to make small, medium
and large sizes for clothing items.
3. Text Mi11i11g: Clustering can help organize a given collection of text documents according
.diction and/or clustering) to their content similarities into clusters of rcluted topics.
enou!?,h.
to i~terpret or explain or Modulc-S
9. a. Give the Comparison between Tcxl Mining and Data Mining? (08 Marks)
11ity. Ans. Text Mining is a form of data mining. There are many common clements between Text and
extensive experimentation. Data Mining. However, there art: some key differences (Table Below). The key difference
:specially the rich nominal is that text mining requires conversion of text data into frequency data, before data mining
techniques can be applied.
Dim,nsion Tut Mining Data Mlaln1
ur industry where cluster Nature of Numbers; alphabetical and logical
(08 Marks) Unstructured data: \\-Ords, phrases. sentences
data values
iat, given a representation of Many languages and dialects used in lhc
hat objects within the same Language Similar numerical S)Stcms across
world; many languages arc extinct. new
UKd
docu1rn:nts arc di~owrcd the world
ways. Clusters can differ in
Clarity 11ml Sentences can be ambiguous: sentiment may Nwnbers an: pn.-cisc
d there can be many kinds pr,cision con1rndict the words
ta points hrnging together.
ing the circumference of a Ditlen:nt parts of data can b.:
DitTen:nl parts of the tc,t can contradict each inconsistc:nt.
Consist,ncy thus. requiring
ircles representing different oth.:r statistical significance analysis
of the clusters even more
79
VIII Sem, (CS'E/IS'E) cacs - ModeiQ~P,
Tcxl may prcscnl a clear ond consis1cn1 nr the computations involved
Scnlimenl mi:1.cd ~n1i1111:nt. across a ~ontinuum. Spoken NIA Bayes, simple Bayes, or in
"ords adds funhcr s.:nlimcnt The advantages of Naive B1
Spdling errors Ditrcrinp , alucs of proper ls,ucs wilh missing valm:,. · II uses a very in1ui1ive 1
Quality nouns, such as nnnu:s. Varying 4uali1y of several free parameters tha
language translation 0111licrs, and so on
· Since lhe classifier return
N111ur\' or ~\:) wonJ•ba~J ~arch. 1,;oc:\.i~lcnce of
/\ full "id.: rJngc of ,1a1is11cal of tasks than ifan arbitrary
and machinc-1.:ammg analysis tor It does not require large
analysis 1hcmcs: scntimcnt mining relationships and diffc:n:nccs · Naive Bayes classifiers a
b. In what wiiys Is Naive-Day&:$ belier than other classification technlqul-s? Compare with
decision tree. (08 Marks)
10, a. What is clickstream anal
Ans. Classification is the separation or ordering of objecls into classes .There are two phases in
On a Web site, clicks1rean
classification algorithm: first, the algorithm tries to find a model for the class attribute as a
collecting, analyzing and
function of other variable\ of the datasets. Next, ii applies pn:viously desigi;icd model on the
- and in what order. The p
new and unseen datasets for detennining the related class of each record.
There are 1wo levels of c
Classification has been applied in many fields such as medical, astronomy, commerce,
Traffic analytics operates at
biology, media, e1c. There are many techniques in classifi&alion method like: Decision Trt:e,
how long ii lakes each png
Nai've Bayes, k-Nearesl Neighbor, Neural Networks, Support Veclor Machine, and Genetic
and how much dala is •~
Algorithm. In this paper we \\-ill use Decision Tree, Naive Bayes, and k-Ncarest Neighbor.
uses clickstream data to de
They are both supervised learning algorithms used for classification tasks. It strongly depends
concerned with what pages
of the data you have and what you arc trying 10 leam. a shopping can, what ilem
Although ii depends on the problem you arc ~olving, but some gcm:ral advan1ages are
loyally program and uses a
following:
Because an extremely larg~
Naive - Biiycs :
many e-businesscs rely on
I. Work well wi1h small dataset compared to DT which need more data
inlerprcl the dala and gene
2. Lesser overfilling
considered to be most effect
3. Smaller in size and faster in processing
evaluation resources.
Decision Tree:
I. Decision Trees arc very flexible, easy lo understand, and easy to debug b. Explain brlcny the leehnl
2. No preprocessing or transfonnalion of features required
3. Prone 10 overfilling bul you can user pruning or Random forests to avoid that. Ans. TECIINIQUES AND AL
In Britf : Decl.s/011 Trl!I! analysis • discovering sub-1
A d1:cision tree is a now-chart-like 1rce s1ruc1un:, \\-here each internal node denotes a test on important nodes or hubs.
an auribute, each branch represents an outcome of 1he test, and leaf nodes represent classes Finding Sub-networks: A
or class distri butions. l l1e popular Decision Tree algorithms are ID3, C4.5, CART. The ID3 be seen as an interconnected
algorithm is con~idcrcd as a very si111ple decision lrce algorithm. II uses infonnalion gain as unique charac1eris1ics. This
splitting criteria. C4.5 is an evolution of 1D3. It uses gain ratio as splitling criteria. between lhem would belong
CART algorithm uses Gini coefficicnl as lhe test attribute selection criteria, and each time belong to separate sub-netwo
selects an a11ribu1e with 1he smallest Gini coefficient as the 1es1 attribute for a given set [5]. no correct number ofsub-ne
n ,e advantage: of using Decision Trees in classifying the dala is that they are simple to for decision -making is lhe
undersland and in1erpre1 . I lowever, decision trees have such di~dvantagcs ,s : Visual represen1a1io11 of 11
I) Most ofthe ulgorithms (Ii/cit /DJ und C.U) require thut the target ut1rih11te will /11.rve only differentiate the types of n
discrete values. help visually identify the s
2) A.f decision tre1ts use the "clivide um/ ,·mUJuer" 1111ttltod, the,• t,mcl to perjurm well ifufew strong relationships around
highly relt.1vun1 uflribut<!.!i l!.ri.il. but less .10 ifmmry complex i11teractiuns are present. sub-network. A sub-network
Naive Buyes : Nai"ve Bayesian classifiers assume 1ha1 there are no dependencies amongst them. In this case ,one or m
attributes. This assumption is called class conditional indcpendenc1:. It is mad11 to simplify
80
the computations involved and, hence is called "naive" . This classifier is also called idiot
Bayes, simple Bayes, or independent Bayes .
The advantages of Naive Bayes are:
· It uses a very intuitive technique. Bayes classifiers, unlike neural networks, do not have
missing values. several free parameters that must be set. This greatly simplifies the design process.
u so on · Since the classifier returns probabilities, it is simpler to apply th!i!se results to a wide variety
range or statistical of tasks than ifan arbitrary scale was used.
tnc-lcarning analysis for · It does not require large amounts of data befon: learning can begin.
iPS and ditTercnccs · Naive Bayes classifiers arc computationally fast when making decisions.
o debug b. Explain briefly the techniques and algorithms of social network analysis?
(14 Marks)
s to avoid that. TECHNIQUES AND ALGOIUTIIM : There are two major levels of social network
analysis - discovering sub-networks within the network and ranking the nodes to find more
rnal node denotes a h:st on important nodes or hubs.
af nodes represent classes Finding Sub-networks: A large network could be 1he better analyzed and managed if it can
D3, C4.5, CART. The ID3 be seen as an interconnected set of distinct sub-networks each with its own distinct identity and
It uses infom1ation gain as unique characteristics. Thi~ is like doing a cluster analysis of nodes. Nodes with strong ties
splitting criteria. between them would belong to the same sub-network ,while those with weak or no ties would
ion criteria, and each time belong to separate sub-networks .This is unsupervised leaming technique ,as in Apriori there is
tribute for a given set [5]. no correct number of sub-networks in a nc:twork. The usefulness of the of the network structure
is that they are simple to for decision -making is the main criterion for ad'opting a particular structure.
Visual representation of networks can help identify sub-netw<;irks. Use of color can help
differentiate the types of nodes. Representing slrong ties with thicker or bolder lines could
help visually identify the stronger relationships. A sub-network could be a collection of
d lo perform well if a few strong relationships around a hub node. In 1his case, th1. hub node could represenl a distinct
·tiuns are present. sub-network. A sub-network could also be a subset of nodes with dense relationships between
no dependencies amongst them. In this case ,one or more nodes will act as gateway to the rest of the network.
. It is made to simplify
81
VIII Sem.- (CSf/ISf) CBCS - Mode.l,Q~wn, j
There are two outbound I
half of node A's influenc
A ,so both C and A recei
There is only outbound I
node D . There is only
influence of node C.
Node A gets all of the i1
Thus,
Node B gels half the inH
Thus,
Node C gets half the infl
Thus,
Node D gets all of the in
Thus,
Fig A nch••ork wit/1 Dlsti11ct sub Nt!twork We have 4 equations usi
Computing lmportuncc of Nodes: When the connection~ between nodes in the network We can represent the coe
have a direction to them ,then the nodes can be compared for their relative influence or ( I0. I) given below..This
rank. This is done using · Influence Flow Model'. Every outbound link from a node can be not represented in an equ
considered an outflow of influence. Every incoming link is similar an inflow of influence.
More in-links to a node means greater importance. Thus there will be many direct and
indirect flows of influence between any two nodes in the network.
Computing the relative influence of each node is done on the basis of an input-output matrix Ra
of flows of influence among the nodes. Assume each nodes has nn influence value . The Rb
computational task is to identity a set of rank values that satisfies the set of links between the Re
nodes. It is an iterative task ,,here we begin with some initial values and continue to iterate
H.d
till the rank values stabilize.
For simplification , let us
Consider the following simple network with 4 nodes (A,B,C,D) and 6 directed links between
a fraction as the rank valu
them as shown in the figure( I0.2). Note that there is a bidirectional link . Here are the links:
compute new rank values
Node A links into B
as Ji n or //.I for the nod~
Node B links into C
Node C links into D
Node D links into A
Node A links into C
Node B lin!..s into A
0 :!:==:=====;
j
Vari11
ll&i
H.~
0◄•------~ Rel
Fig R~
Computing the revised va,1
The goal is to find the relative importance, or rank, or every node in the network . This will
help identify the most imponance node(s) in the network. shown as iteration I.
Using the rank values fro
We begin by assigning the variables for influence ( or rank) value for each node, as Ra , Rb
for these variables ,show
, Re , and Rd. The goal is to find the relative values of these variables.
82
There are two outbound links from node A to node 8 and C. Thus , both Band C receives
half of node A's influence. Similarly, there are two oudlound links from node B to node C and
A ,so both C and A receives half of node B's influcnc:c
There is only outbound link from node D to node A . Thus ,node A gets all the influence of
node D. There is only outbound link from node C to node D nnd hence, node D gets all the
influence of node C.
Nodc A gets all of the influence of node D and half the inlluence of node 8 .
Thus, Ra =IJ.5 X Rb +Ri.
Node O gets half the infl uence of node A.
Thus, Re =0.5 X Rn.
Node C gets half the infl uence of node A and half the influence of node B.
Thus, Re =IJ.5 X Ra+ 0.5 X Rb.
Node D gets all of the influence of node C and half the influence of node B.
Thus, R,I : Re.
rk We have 4 equations using 4 variables . These can be solved mathematically.
een nodes in the network We can represent the coefficient of these 4 equations in a matrix form ns shown in the Dataset
their relative influence or (10. 1) given below..This is the Influence Matrix. The zero value represent that the term is
d link from a node can bo not repre~ented in an equation.
'tar an inflow of influence. Data St!t lO. I
will be many direct and Ra Rb Re Rd
Ra 0 0.50 0 1.00
is of an input-output matrix
Ith 0.50 0 0 0
an influence value . The
the set of links between the Re 0.50 0.50 0 0
ues an<l continue to iterate Rd 0 0 1.00 0
For sirnplifica11on, let us also state that all the rank values add up to I. Thus. each node has
ind 6 directed links between a fraction as the rank value . Let us start with an initial set of rank values and then iteratively
al link . Here are the links: compute new rank values till they stabilize. One can start with any initial rank values, such
as J/11 or // .J for the nodes.
Variable Initial Value
lb 0.250
Rb 0.250
Re 0.250
Rd 0.250
Variable Initial Value Iteration I
lb 0.250 0.375
Rb 0.250 0. 125
Re 0.250 0.250
Rd 0.250 0.250
Computing the n:v1se<l values usmg the equations startc1l earhcr,we get a revised set of values
~e in the network . This will shown as iteration I.
Using the rank values from Iteration I a~ the new starting values ,we can compute new values
e for each node , as Ra , Rb for these variables ,shown as Iteration 2. Rank values will continue to change.
blcs.
83
VIII Sun, (CSE/ISE)
85
VIII Sern, (CSf/ISf) C!lCS·M~Q~
this limitation by adding support for multiple NameNode namespaces to the HDFS file I. Run teragen to gene~
system. The key benefits are as follows: $ yam jar SHA DOOP El
• Namespace scalability. HDFS cluster storage scales horizontally without placing a burden 500000000 -
to the NameNode. --,/ user/hdfs/TeraGen-50
• Better performance. Adding more NamcNode to the cluster scales the file system read/ 2. Run terasort to sort
write operations throughput by separating the total namespaces. $ yam jar $HA DOOP E1
• System isolation. Multiple NameNode enable different categories of applications to be --tuser/hdfs/TeraGen:So
distinguished and users can be isolated to different namespaces. 3. Run teravalidate to
Figure I.I illustrates how IIDFS NameNode Federation is accomplished. NameNodel $ yam jar SHADOOP E
manges the / research an /marketing namespaces, and NamcNodc2 manages the / data and /
--t user/hdfs/TeraSort:SO
project namespaces. The NameNode do not communicate with each other and the DataNodes
To report results, the tim
"just store data block" as directed b either NameNode.
megabytes/second (MB/
should be run with a rcpli
- .....
Nl..,...,;oonOO l"IOI
tasks is set to I. lncreasi
- -
....
,
"""'-
.... aaafN For example, the followi 1
$ yarn jar SHA DOOP E
_, -Dmapred.reduce::Us
do not forget to clean u
following command will
$ hdfs dfs -rm -r -skipTra
Running the TcslDFSI
Hadoop also includes an
Figure I . I HDFC Na111eN01le Feder111/011 ex11mple.
benchmark is a read and
b. Explain with a concept or running basic lladoop Benchmarks. (8 Marks) to and from HDFS and i
Ans. Running Basic Hadoop Benchmarks: Many Hadoop benchmarks can provide insight file size and number offi I
into cluster performance. The best benchmarks are always those that reflect real application benchmark, you should r
performance. steps. In the following ex
The two benchmarks discussed arc : tcr asort and TcstDFSIO, provide a good sense of how 16files of size IGB ares
well your Hadoop installation is operating and can be compared with public data published mapreduce-client-jobclien
for other Hadoop systems. The results, however, should not be taken as a single indicator for file. Running it with no
ssytem-wide perfommnce on all applications. . (load testing the NameN~
The following benchmarks are designed for full Hadoop cluster installations. These tests commonly used Hadoop I
assume a multi-disk HDFS environment. Running these benchmarks in the Hortonworks reported of these benchm
Snadbox or in the pseudo-distributed single-node install is not recommended because all I . Run TestDFSIO in w~
input and output ( 1/0) are done using a single system disk drive. $ynrnjar$HADOOP E
Running the Ter11sort Test -+ TestDf.'SIO -write .:;-,rf
The terasort bechmarks so11s a specified amount of randomly generated data. Example results arc as fol
This benchmark provides combine testing of the HDFS and MApReduce layers ofa Hadoop fs.TestDFSIO: ----TcstO
cluster. A full terasort benchmark run consists of the following three steps: fs.TestDFSIO: Date & ti1
I. Generating the input data via teragen program. files: 16
2: Running the actual tcrasort benchmark on the input dllta. fs.TestDfSIO: Total M
3. Validating the so1ted output data via the teravalidatc program. 14.890106361891005 fs.
In general, each row is 100 bytes long; thus the total amount of data wrillen is 100 times fs.TcstDFSIO: 10 rate ~t
the number of rows specified as part of the benchmark (i.e., to write 100GB of data, use I sec: 105.631
billion rows). The input and output directories need to be specified in HDFS. The following 2. Run TcstDFSIO in re
sequence of commands will run the benchmark for 50G B of data as user hdfs. Make sure the $ yarn jar $HA DOOP_ E
/ user/hdfs directory exists in HDFS before running the benchmarks. --> TcstDFSIO -read -nrFi
'Bi,g,V~ A ~
mespaccs to the HDFS file I. Run tcragen to generate rows of random data to sort.
S yam jar SHADOOP_EXAMPLES/hadoop-mapreduce-examples.jar teragen
olly without placing a burden 500000000
-+/user/hdfsfferaGen-50O0
scales the file system read/ 2. Run tcrasort to sort the d u tu base.
$ yam jar $HA DOOP_EX/\MPLES/hadoop-mapreduce-examples.jar h:rasort
ces.
gories or applications lo be -+/user/hdfsfferaGcn-50GO /user/hdfs/TcraSort-50O8
ces. 3. Ru n teravalidatc to validate the sort.
accomplished. NameNode I S yarn jar SHA DOOP_EXAMPLES/hadoop-mapreduce-cxample.jar ter::ivalidate
cle2 manages the / data and / -+/user/hdfsfferaSort-50GB /user.hdfsfferaValid-50GB
ach other and the DataNodes To report results, the time for the actual sort (terasort) is measured and the benchmark rate in
megabytes/second (M13/s) is calculated. For best performance, the actual tcrasort benchmark
should be run with a replication factor of I. In addition, the default number of terasort reducer
tasks is sel to I. Increasing the number of reducers oOcn helps with benchmark performance.
For example, the following command will instruct tcrasort to use four reducer tasks:
S yarn jar SIIADOOP_EXAMPLES/hadoop-mapreduce-example.jar terasort
.... -Dmapred.reduce.tasks~4 /user/hdfs/TeraGen-50GB /user/hdfsfferaSort-50GB Also,
do not forget to clean up the terasort data belween runs (and after testing is finished). Tite
following command will perform the cleanup for the previous examplt::
$ hdfs dfs -rm -r -skipTrash 'Iera•
Running the TcstOFSIO Benchmark
Hadoop also includes an I IDFS benchmark application called tcstDFSIO. Tite TestDFSIO
ex ample. benchmark is a read and write test for I IDFS. That is, it will write or rc:ad a number of files
(8 Marks) 10 and from HDFS and is designed in such a way that it will use: one map task per file. The
ch~arks can provide insight file size and number of files are specified as command-line arguments. Similar to the terasort
sc that reflect real application benchmark, you should run this test as u~er hdfs. Similar to terasort, TestDFSIO has several
steps. In the following example.
provide a good sense o~how 16files of size I GB are spc:cificd. Note that the TestDFSIO benchmark is part of the hadoop-
~d with public data published mapreduce-client-jobclicnt.jar. O1her benchmarks are also available as part of this jar
taken as a single indicator for file. Running it with no arguments will yield a list. In addition to TestDFSIO, NNBench
(load testing the NameNo<lc) 3nd MRBcnch (load testing the MapReduce framework) are
ter installations. These tests commonly used Hadoop benchmarks. Nevertheless, TestDFSIO is perhaps the most widely
hmarks in the Hortonworks reported of these benchmarks. The steps to run TestDFSIO are as follows:
t recommended because all I. Run TestDFSIO In write mode and create data.
•e. S yam jar SHA DOOP_EXAMPLES/hadoop-mapreduce-clicnt-jobclient-tests.jar
-+ TestDfiSIO-wrilc -nrfiles 16 -fileSize 1000
encratcd data. Example results are as follows (data and time prefix removed).
pReduce layers of a Hadoop fs.TcslDFSIO: -----TestDFSIO ----: write
three steps: fs.TestDFSIO: Date & lime: Thu May 14 10:39:33 EDT 2015 fs.TestDFSIO: Number of
files: 16
fs.TestDFSIO: Total MBytes processed: 16000.0 fs.TestDFSIO: Throuhput mb/scc:
14.890106361891005 fs.1estDFSIO: Average 10 ralc mb/sec: 15.690713882446289
of data written is 100 times fs.TestDFSIO: 10 rate std deviation: 4.0227035201665595 fs.TestDFS IO: Test exec time
write I00GO of data, use I sec: 105.631
ed in HDFS. The following 2. Run Tcst0FSIO in read mode.
as user hdfs. Make sure the $ yam jar $HA DOOP_EXAMPLES/hadoop-mapreduce-client:jobclient-tests.jar
-+ TcstDFSIO-read -nrFiles 16 -fileSize 1000
87
VIII Semt (CSE/IS£) 8~V~A~ C13CS - Modei-Q~
Example results are as follows (data and time prefix removed). The large standard deviation archives to be unarch
is due to the placement of tasks in the cluster on a small four-node cluster. fs.TestDFSIO: rhe general commam
--TestDFSIO --- : read bin/hadoop command
fs.TestDFSIO: Data & time: Thu May 14 10:44:09 EDT 2015 fs.TestDFSIO: Number of
files: 16
fs.TestDFSIO: Total MBytes processed: 16000.0 fs.TestDFSIO: Thrughput mb/sec: 2. a. Write a short note o
32.38643494172466 fs.TestDFSIO: Average 10 rate mb/scc: 58.72880554199219 i. Specuulative exec
fs.TestDFSIO: 10 rate std deviation: 64.60017624360337 fs.TestDFSIO: Te~t exec time sec: ii. Hadoop Map red;
62.798 Ans. i. Specuh1tivc EllCd
3. Clean up the TestDFSIO data. predict or manage un
$ yam jar SHA DOOP_EXAMPLES/hadoop-mapreduce-client-jobclient-tests.jar and monitor resourc
-. TestDFSIO -1.:lenn prac1ice, however, tH
Running the TestDFSIO and terasort benchmarks help you gain confidence in a Uadoop possible that a cong ,
installation and detect any potential problems. It is also instructive to view the Amabri some olher similar p
deshboard and the YARN web GUI (as described previously) as the tests run. When one pan of a
Managing Hadoop MapReducc Jobs else because lhe ap
Hadoop MapReduce jobs can be managed using the maprcd job command. The most of lhe parallel Map
important options for thi~ command in terms of the examples and benchmarks are -list, -kill, that input da1a are i
and -status. copy of a running m
In particular, if you need to kill one of the examples or benchmarks, you can use the mapred example. suppose 1h.
job -list command to find the job -id and then use mapred job -kill <job-id> to kill the job that some are still
across the cluster. MapReduce jobs can also be controlled at the application level with the busy or free servers.
yam application command. The possible options for mapred job are as follows: then terminated (or
S mapredjob approa.ch can be ap~
Usage: CLI <command> <args> [-submit <job-file>] execution can seem
[-status <job-id>] .xml configum1ion ~
[-counter <job-id> <group-name> <counter-name>) (-kill <job-id>] ii. Hadoop MapRe
[-set-priority <job-id> <priority>] . Valid values for priorities are: VERY HIGH HIGII The capability of I
NORMAL LOW VERY LOW -
foilures can intluenc
[-events <job-id> <from:-evcnt-#> <#-of-events>) [-history <jobHistoryFilc> J Hadoop clusters ha
[-list [all] J
possible for many d;
[-list-active-trackersJ
a cluster.
[-list-black Iisted-tmckers]
The use of server
[-lisl-attempt-ids
RED ..;job-id> <task-t..,..,..>
., r- <task-state>] • "vii I'd va Iucs for <task-typ.!> arc
I somewhat dilfercnl
. UCE MAP. Valid values for <taks-state> are running, completed
is possible lo build
[-krll-task <task-attempt-id>) [-fail-task <task attempt-id>)
[-logs <job-id> <tosk-allempt-id>] nodes). However.a
Generic options supported are bo1h roles. Another
tolcra1e dissimilars
-conf <configuration file> specify an application configuration file
large disparities in
-D <property; valucs> use value for given propeny
-~s <local I namcnodc:port> specify a namenode MapReduce executi
·JI <local I resourcemunager:port> specify II RcsourceManager b. Ei.plaln com pili ng
-files ~comma separated list offiles> specify comma separated files 10
be cop,ell to the m:ip reduce cluster Ans. rile Hadoop Grep.j
-libjars _<comm? scpamtcd list ofjars> specify comma separated jar times they occum:
files to include m the classpath. docs not display the
-archives <comma separated list of archives> specify conima separated needed for the strin
The program runs
88
oved). The large standard deviation archives to be unarchived on the compute machines.
II four-node cluster. fs.TestDFSIO: The general command line syntall. is
bin/hadoop command (genericOptionas) [commandOptions]
T 2015 fs.TestDI-SIO: Number of OR
.TestDFSIO: Thrughput mb/sec: Write a short note on following
te mb/sec: 58.72880554199219 i. Spccuulative e,.ccution
fs.TestDFSIO: Test exec time sec: ii. lladoop Map reduce h:irdwure. (4 Marks)
Ans. i. Speculative Execution: One of the chalhtngcs with man)' large cluster is the inability to
predict or manage unexpected system bonlcncd.s or failures. In theory, it is possible to control
and monitor resources so that network traffic and processor load can be evenly balanced: in
client-jobclient-tests.jar practice, however, this problem represents a difficult challenge for large systems. Thus, it is
possible that a congested m:twork, slow disk controller, failing disk, high processor load, or
ou gain i:onfidence in a lladoo~ some other similar problem might lead 10 slow performance without anyone noticing.
instructive to view the Amabn When one pait of a Map Reduce process runs slO\\ ly, it ultimately slows down evel)'thing
ly) as the tests run. else because the application cannot compkte until all processes arc finished. The nature
of the parallel MapReduce model provides an interesting solution 10 this problem. Recall
apred job command. The most that input data arc immutable in the MapReduce process. Thercforc,il is possible to start a
les and benchmarks are -list, -klll, copy of a running map process without disturbing any other running mapper processes. For
example, suppose that as most of the map tasks arc coming to a close, the ApplicationMaster
chmarks, you can use the mopred that some arc still running and schedules redundant copies of the remaining jobs on less
ed job -kill <job-id> to kill t~e job busy or free servers. Should the secondary processes finish first, the other first processes are
•d at the application level with the then terminated (or vice versa). This process is known as speculative cxc<:ution, The same
cd job are as follows: approach can bi: applied to n.:duccr processes that sc!\!m to be taking a long time. Speculative
execution can seem to have a slow spot. It can als? b.: turned olf and on in the mapred-si1e
.xml configuration file.
Ii. Hadoop MapRcduce Hardwuc
<job-id>] The capability of I ladoop MapRcduce and IIDFS lo tolerate server-or even whole rack-
iorities are: VERY_HIGH HIGH
failures can influence hardware designs. The use of commodity (typically x86_64) servers for
Hadoop clusters has made low-cost, high-cost, high-availability implementations ofHadoop
<jobH istoryFile>1 possible for many data centres. Indeed, the Apache Hadoop philosophy seems to assume on
a cluster.
The use of server nodes for both storage (I IDFS) and processing (mappers, reducers) is
somewhM dilferent from thi: traditional separation of these two tasks in the data centre. II
Valid values for <task-type> are is possible to build Hadoop system and separate the roles (discrete storage and processing
completed nodes). However.a majority of Iladoop systems use the general approach where servers enact
l both roles. Another interesting feature of dynamic MapReducc execution is the capability to
tolerate dissimilar servers. Thnt is, old and m:w hardware can be used together. Of course,
large disparities in performance will limit the faster systems, but the dynamic nature of
MapReduce execution will still work effectively on such systems.
b. E:r.plain compiling and running the lladoop Grep chaining cumpte .. ith pro,:ram.
(08 Marks)
Ans. The Iladoop Grep.java example extracts matching string from text fi lcs and counts how many
times they occurred. Thc command works differently from the •nix grcp command in that it
docs not display the co111ple1e matching line, only the matching string. If matching lines are
needed for the string foo, use. •roo. • as a regular expression.
The program runs two map / reduce jobs in sequence ans is an example of MapReduce
89
VIII Sem, (CSc/IS'f) cscs · ModevQ~
chaining. The first job counts how many times a matching string occurs in the input, and the grepJob.setOutputKeyCla
second job sorts matching strings by their frequency and stores the output in a single ft le. grepJob.se1Output-Valuescl
Listing 1.1 displays the source code for Grcp.java. grcpJob. waitforCompletio
Note that all the Hadoop example source files can be extracted by locating the hadoop- Job sortJob = new Job (co
mapreduce-example-•-sources.jar either from a Hadoop distribution or from the Apache Fi le InputFormat.setlnputP
Hadoop website (as part of a full Hadoop package) and then extracting the files using the sortJob.sctlnputFormatCla
following command (your version tag may be different): sortJob.setMapperClass(ln
$ jar xf hadoop-mapreduce-example-2.6.0 sources.jar. sortJob.setSortNumReduc
Listing 1.1 Hadoop Grep.java Eumplc sctOutputPath(sortJob,nc
package org.apache.hadoop.examples; sortJob. waitForCompletio
import java.util.Random; }
import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.conf. finally {
Configuration; import org.apachc.hadoop.fs.Filesystem; import org.apache.hadoop.fs.path; FileSystem.get(conf).delet
import org.apache.hadoop.io. LongWritable; }
import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.•; return O;
import org.apache.hadoop.mapreduce.lib.input.FilelnputFormat; }
import org.apache.hadoop.mapreduce.lib.input.Sequencefilelnputformat; public static void main (St
import org.apache.hadoop.mapreduce.lib.map.lnverseMapper int res == ToolRunncr,run(n
import org.apachc.hadoop.mapreduce.lib.map.RegexMapper; System.exit(rcs);
import org.apache.hadoop.mapreduce.lib.output.fileOutputformat; }
!mport org.apache.hadoop.mapreduce.lib.output.SequenceFileOutpulFormat; }
~mport org.apache.hadoop.mapreduce.l ib.reduce. LongSumReducer; In the preceding code, each
import org.apache.hadoop.util.Tool; provided regular exprcssio
import org.apache.hadoop.util.ToolRunner; task and extracts text mate
/*Extracts matching regexs from input files and counts them.*/ output as <matching strin
public class Grep extends configured implements Tool ( sum up the number of mate
private Grep ( ) () // singleton ' reducer uses the LongSum
public int run ~String [] args) throws Exception { if(args .length <3) { key.
System.out.prit In ("Grep <inDri> <outDir> <regex> (<group>]"); The second job takes the ou
return 2;
reverses (or swaps) its input
)
so the Identity Reducer class
Path tempDir =
There is also an ldentityMal
new path ("grep-temp-"+
s1ored in one file and it is soJ
Integer .toString (new Random ( ) .
nextlnt (Integer .MAX_YALU£)); count and a string per line.
The example also demonst ,
Configuration conf=getCom () ;
or a reducer.
~onf.set (RegexMapper .PATTERN, arge(2] );
tf(args. length = .. 4) The following discussion de
conf .set (RegexMapper .GROUP, nrgs [JJ ); The steps are similar to the p
Job grepJob "' new Job (count); I. Create a directory for the
try { $ mkdir Grep_classes
grepJob.setJobName ("&rep-search")· 2. Compile the WordCount.j
Filelnputformat.setlnputPaths (grcpiob, args[OJ); $ javac -cp 'hadoop classpat
grepJob.setMapperClass (RegexMapper.class) ; 3. Create a Java archive usin~
grepJob.selCombinerClass (LongSumReducer.class); $ jar -cvf Grep.jar -C grep_ch
~pJob.setReducerClass (LongSumReducer.class); If needed, create a directory a
FtleOutputFormnt.setOutputPath (grepJob tempDir)· $ hdfs dfs -mkdir war-and-pe
$ hdfs dfs -put war-and-peace
grepJob.setOutputFormatClass (SequencefileOutputFormat.class);
90
S.,,1,l-11.- e«- 5r...,,,,..,..
I 8~V~A~
g occurs in the input, and the grepJob.sctOutputKeyClass (Text.class);
s the output in a single file. grepJob.setOutputValucsclass (Long Writable.class);
grepJob.waitForCompletion (true);
·ted by locating the hadoop- Job sortJob - new Job (conf); sortJob.setJobName ("grep-sort'');
ibution or from the Apache FilelnputFormat.setlnputPaths (sortJob, tempDir);
extracting the files using the sortJob.sctlnputformatClass(Sequencefilelnputfonnat.class);
sortJob.setMapperClass(lnverscMapper.class);
sortJob.setSortNumReduceTasks (I); //Write a single file FileOutputFormut.
sctOutpulPath(sortJob,new path(args [I]); LongWritablc.DccreasingComparator.class);
sortJob.waitForCompletion (true);
}
finally {
org.apache.hadoop.conf.
FilcSystcm.get(conf).delcte (tcmpDir, true);
org.apache.hadoop.fs.path;
}
return O;
apreduce.•; )
public static void main (String [] args) throws Exception {
utformat;
int res = ToolRunner,run(new Configuration ( ), new Grew (), args);
System.exit(res);
}
at; }
utputformat;
In the preceding code, each mapper of the first job takes a line as input and matches the user-
cer; provided regular expression against the line. The RcgcxMnppcr cfnss i~ used to perform this
task and extracts text matching using the given regular expression. Thc matching strings are
output as <matching string, I> pairs. As in the previous WordCount example, each reducer
sum up the number of matching strings and employs a combiner to do local sums. The actual
reducer uses the LongSumRcducer class that outpub 1hc sum of long values per reducer input
key.
<3){
The second job takcs the output of the first job as its input. The mapper is an invcrsemap thal
I"); rcverses (or swaps) its input , key, value> pairs into <valuc,key>.There is no reduction step,
so the ldcntityRcduccr class is used by default. /1. II input is simply passed to the output (Note:
There is also an ldentityMappcr class.) The number of reducers is set to I, so the output is
stored in one file and it is sorted by count in descending order. The output text file contains a
count and a string pcr line.
The example also dcmonstmh:s how to pass a command-line parameter to a mapper
or a reducer.
The following discussion describes how to compile and run the Grep.java example.
The steps arc similar to the previous WordCount example:
I. Create a directory for the application classes as follows:
$ mkdir Grep_classes
2. Compile the WordCount.java program using the following line:
S javac -cp 'hadoop classpath' -d Grep_classcs Grcp.javn
3. Create a Java archive using the following command:
$ jar -cvf Grep.jar -C grep_classes/.
If needed, create a directory and move the war-and-pcacc.txt lite into I IDFS:
S hdfs dfs -mkdir war-and-pcaC\.'-input
S hdf~ dfs -put war-and-peacc.txt war-and-peace-output
91
VIII Se+1'11 ( CSE/IS£)
As always, make sure the output directory has been removed by issuing the following Container: container_ 143266'
command: ....:----=-=-- - -- ---
$ hdfs dfs -rm -r -skipTrash war-and-peace-output
Entering the following command will run the Grep program: Container: container_ 142667
$ hadoop jar Grep.jar org.apache.hadoop.example.Grep war-and-peace-input -=====-===-~= - == It:
92
cacs • ModeiQ~Pc;q,u. 3
y issuing the following Container: container_ l432667013445_000 I_0 I_OOOOOI on no_45454
93
VIII Se.+111 (CSE:/ISE:)
• Control flow nodes enable decisions to made about the previous task. Control decisions
run on standard Hadoop
are based on the results of the previous action {e.g., file size or file existence). Decision
inefficienl nml totally unn
nodes are essentially switch-case statements that use JSP . . YARN provides the user
EL (Java Server pages-Expression Language) that evaluate to either true or false. Figure 3.2
MapReduce. Support for
depicts a more complex worl.folw that uses all of these node types.
In addition, using the flex
..
own web interface to mo
iii) I lamster: Hadoop a
The Message Passing lnte
ERROR MPI is primarily a set of
operate over popular ser
have full control over thei
run within a Hadoop clus
discussion of the issues in
~Wo,ldloWDAO
version ofMPICH2 is ava
Figure 3.1 A Jimple 001,ie DAG worJ.jfow iv)Apache Spark
Spark was initially develo
,:;. performance, such as ite
interactive d(lta mining. Sp
Spark holds intermediate r
holds supports more than
possible analyses that can
Java, and Pylhon.
Since 2013, Spark has been
of porting and running Sp
single underlying file syste
.·--
J•- ......., . .__.__
..
.. •(! ••-~ ... .. .. , ••
Clw:.I!. \\\1!. \!S.l!.t ~~ ~~'j
Figure 4.4, is presented.
new property is ohanged
. --
The new property will n
< ) ·;
m - -- --
·--- • ·--- • ·--- •
....
-· Figure 4.5, the Restart b
To be safe, the Rcsta11 Al
service will be restarted ·
Aller the user clicks R•
displayed. Click Confirm
Save Co
---
Fl ure
--
Figure 4.1 Ambari YARN properties view
Summary eon11g,
The number can be options offered depends on the service; the full range of YARN properties
can be viewed by scrolling down the form. To easily view the application logs, this property 0....., YARNOola1M1(•) I .
must be set to true. This property is normally on by default. As an example for our purpose
Figure 4.5 Ambarl
here we will use the Ambari interface to disable this feature. As shown in Figure 4.2, when
a pr~pe is chan ed the ~ccn Save button becomes activated. Confirma
>:•
. _..._ You are at>ot.t to
This ,. 11 :r 991!1
... Ma1n1~• .lt'C8 !
.,.,,,.aa~• 0 • 0
Figure 4.
Figure 4.2 YA RN prop,uties wit!, log aggre1:atio11 t11me1/ off Similar to the DataNodc re•
Changes do not become permanent until the user clicks the Save button. A save / notes progress bar is for the entire
window will then~ displayed. It is highly reco1rnrn:nded that historical notes concerning the arrow to the right of the bar
change be added to this window. Once the restart is complet
nf.~v==e:".c~o='n~filg
i:::u-:-,-a:-;
tl-o-
n- - - - - - - - - - ~
YARN ResourceManager Ap
down menu in the middle oil
4.8 will be displayed .
Ambari tracks all changes m
4.1 and in more detail in Fig
created. Reverting back to a
potential for version confusi
Figure 4.3 Amburi ,•011jig11rutio11 l·u11d11otes w/11<10111. Figure 4.3 and Figure 4. t I).
96
... _____ .., .. Once the user adds nny notes a<ld clicks the Save bunon. another window, shown in
Figure 4.4, is presented. This window confirms that the properties have been saved. O~ce the
new property is changed, an orange Restart button will appe~r at the top left of the wmdO\~.
The new property will not take effect unlll the required services are restarted. As shown tn
- . -. Figure 4.5, the Restart button provides two options: Restart All and Restart NodeManagers.
To be safe the Restart All should be used. Note that Restart All docs not mean all the Hadoop
service will
be restarted ; rather, only those that use the new property will be restarted.
After the user clicks Restart All, a confirmation window, shown in figure 4.6 will be
displayed. Click Confirm Restart All to be in the cluster-wide restart.
ve Configuration Changes
m
Fl ure 4.4 Ambur/ co11 11r"tio11 clw11 e 1101/ 1mllo11
Sunvna,y Cor,'G• Q..d'.ll.Q•
/el view .
he full range ofYARN properties
e application logs, this property
. As an example for our purpose Figure 4.5 Am/Jar/ R~larl 1111clio11 a es i11 lervlce ro erlles
e. As shown in Figure 4.2, when onfirmation X
rated
Voll aro abOl.t 10 leslllrt YAHi
Tn,s " , gge, ., M:i ~. tr~ ur,w 1. •e.tartO<I To ~ppioss alMS.tum OIi
MaT'~ffdr,c.- Mxiv rlJ, •'M,,, ~•I)! IC •IJlnr~ r&slJ~ ail
97
VIII Se.m, (CSf/ISf) 8£'1'Vatl;l,A~
current version is indicated by a green Current label in the horizontal version boxes or in the
dark horizontal bar. Scrollin thou h the version boxes
1 Background Operationa Running •
.:l rm
cc----.
v-• . .
c..,. ..
■- ...
a::,
,,,_&,_..,.....,c....
Con"9'to<OOZIE
.,1'1e51111_.<lll--~ _.,,..,:,o,i,O:.Z
---· 0
.........._..._ ..
Fi 11re ,. 7 Ambur/ ro ress wl11dow ur c/u.fter wide YARN start
. . . . . . . . . ...
~
,,...
·-·-
- L09~ not iva,lab~ for Job_104480791998_0001 "99~a11on m1y not t,., co~oete CM<k
~ck later or try the nOdemanooer al nl •5•5•
Fl,:ure 4.8 YARN Resourcc/lla11agcr l11terface wlt/1 lug aggregatio11 turm!d off
---
or pulling down the menu on thi: ten-hand side of the dark horizontal bar will display the
previous configuration versions.
To revert to a previous version, simply select the version from the version boxes or the pull- ""'PM■
down menu. In Figure 4.10, the user has selected the previous version by clicking the Make
Current button in the information box. This configuration will return to the previous state ,.,..,_.
--..--
where log aggregation is enabled.
Figure-I.I
As shown in Figu
is saved. Again. it
box. When the sa
configuration. The
needed before the c
98
c.:acs - ModeltQ~Paper - 3
orizontal version boxes or in the &ullffil/)' Conllgo
o..o...--
...
- - M (10,
"""'
.:l rm,. __- cm ........ _ - 11 . . . . ._ - m •• _,,.ag,J
• .
---
100'II,
.
f'k•W:Juf0.1 M.UM()t 1'
'°""' • .....
'°""' •
'°""' •
,..,...
~-
_.,,. 1024
0 • 0
• yam.ad- •
·- .. yarn_... • ,.!. .
yarnJoO~ 0 • 0
El Figure 4.9 Ambarl co11jig11ration cl,ange 111a11age111ent for YARN service (Version JI/2
current
wide YARN start
rm
..,_
---•---•-..•• •"u• ·-- --·
rm g-·- rm ~---· m. _...
- m _...
-· _. ■
-- -
• ~ ~ ""90'
,: aggregation tumed off
horizontal bar will display the
99
,
VIII Semt (CSE/ISE)
:.Fs~~
In th~ following example, a simple four-node cluster is used to demonstrate tne
~~::~~t~e gateway. Other potential options, including those, related to sec:~~!~;
HDFS . s e ant is e>.ampl~. A D~taNodc is used a!, the gateway node in this example 'and
Log into a DatnNode and 0
Data Node nO is used as th
# service rpebind stop # se
Next, start the HDFS gatei
is mounted on the main (login) cluster node. ,
Step I : &I Conliguration Files and nfs3 as follows :
S~veral Hadoop configuration lites need to be chan ed I h' . #/usr/hdp/2.2.4.2-2/hadoo
will be used to niter the IIDF'S configuratio Iii Dg . n t ts example, the Amban GUI hadoop.sbin/hadoop-daemo
until all lhe following changes are made ~f es. o not sa~e the changes or restart HDF'S The portmap daemon will
the~.: files by hand and then rcstar1 the ap~rop~;t~ :': ?~' u~,ng Ambari, you must change /vvar /log/hadoop/root/h.id
cnv1ronmen is assumeck crvices across the cluster. The following To confirm the gnteway is
• OS: Linux look like the following:
• Platform: RHEL 6.6 #rpcinfo -p no
• Hononworks IIDP 2.2 with lladoop version- 2 6
Several properties need to be add d . . ~
• e to the /etc/hadoop/conf ig/core-site.xml file.
Using LJ.§
100
Sw.~i,_... E.c- ~""-
A b · 0 to the HOFS service window and sel«t the Configs tab. Toward the bottom of~he
X m an, g r k · the Custom core- site xml section Add the following
screen, sele~ tll(c1hA~d1e~~:yro~n1h:key field Ill Ambari is· tile name fi~ld included in this
two properties e 1
code):
<property>
<name>hadoop.proxyuser.root.groups</name>
E·AiA: :I <value>•</value>
< /property>
,r 11ew ,·01,jigurut/011
<property>
nabri versioning tool: <name>hadoop.proxyuser.root.hosts<lname>
5 created.Reverting 10 a previous <value>•<Jvalue> <!property>
The name of the user who will start the Hadoop NFSv3 gateway is placed in the name field.
thout having to change or restart
In the previous example, root is used for this purpose. This setting can be any user who
starts tile gateway. If, for instance, user nf sadmin starts the gateway, then the two names
would be hadoop.proxyuser.nfsadmin.groups and hadoop.proxyuser. nfsadmin.hosts. The •
he service by using the
value, entered in the preceding lines, opens the gateway to all groups and allows it to run on
any host. Access is restricted b entering groups (comma separated) in the group's property.
·v3Gateway to HDFS Entering a host name for the host's property can restrict the host running tile gateway.
(04 M11rks) Next, move 10 the Advanced hdfs-site.xml section and set the following property: property>
<name>dfs.nfs3.dump.dir</name>
Feature enables files to be easily <value>/tmp1.hdfs-nfs<lvalue>
vay supports NFSv3 and allows <lprop1my>
, . Currently the NFSv3 gateway The NFSv3 dump directory is needed because the NFS client ofierl recorders writes.
Sequential writes can arrive at the NFS gateway in random order. This directory is lZ to
>eal file system using an 'NFSv3 temporarily save out-of-order writes before writing to I IDFS. Make sure the dump directory
has enough space. For example, if tile application uploads 10 files, each of size 100MB, it is
~eir local file system. recommended that this directory have IGB of space to coYer a worst-case write reorder for
y to the IIDFS file svstem. every file.
t point. File append is supported, Once all the changes have been made, click the green Save button and note the changes you
made to the Notes box in the Save confinnation dialog. Then restart all of HDFS by clicking
llameNode, or any HOFS client. the orange Restart button.
at https:/' hadoop. apache.org,I Step 2: Start the Gateway
11ay Log into a DataNode and make sure all NFS services are stopped. In this example,
Data Node nO is used as the gateway.
cd to demonstrate tne steps for # service rpebind stop# service nfs stop
g those, related to security, are Next, start the HDFS gateway by using tile hadoop-daemon script to start portmap
way node in this example, and and nfs3 as follows :
#/usr1hdp/22.42-2/hadoop!sbin/hadoop-daemon.sh start portmap #/usr/hdp/224 2-2/
hadoop.sbin/hadoop-daemon.sh start nfs3
is example, the Ambari GUI The portmap daemon will write its log to
e the cllanges or restart HOFS /vvar /log/hadoop/roollhadoop-root-nfs3-n0.log
g Amb:iri, you must change To confinn the gateway is working, issue the following command. The output should
the cluster. The following
look like the folio" ing :
#rpcinfo -p no
program vers proto port service
10000.S 2 top 4242 mountd
f ig/core-site.xml tile. Using
101
VIII Setfl/ (CSF:/ISE) 13iif'V~A~
102
I. Siscnsc 20. Palo OL/\P Server
2. /\ctu~t.: Rusin.:ss lntellig.:ncc and Rc:porting Tools (BIRr) 21. Pcnlaho
3. icCube 22. Profit base
4. Domo 23. QlikVic11
5. 80(,rd Management lntcllig.:nc,: Toolkil 24 Rapid tn<ight
6. Clear Anni) tics 25. SAP busine5s intelligence
7. Duccn 26. SAP BusinessObJccts
8. Gooddala 27. SAP Net Weaver OW
9. IOM Cognos lnh:lligence 28. SAS DI
command: I0. lnsightsquared 29. Silvon
11. JasperSoll 30. Solver
12. L<>oker 31. SpagoOI
eek the previously 13. Mieroson Bl platform 32. SQL Server Analysis Services
14. MicroStralcgy 33. SI) le lnlclligcncc
15. MITS 34. Syntell solutions
he main login node is used.
16. Opcnl 35. rargit
the following directory :
17. Oracle 81 36. Vismatica
way node will be different 18. Orack Enterprise 81 Server 37. WebFOCUS
i: name. #mount •t nfs -0 19. Oracle llyp,:rion Sy5tcm 38. Ycllowfin 01
The 131 tool used 111 our orgamzatton : Educallon
client users. The following As higher education becomes more expensive and competitive, it is a great user of data-based
decision-making. There is a strong need for efficiency, increasing revenue, and improving the
quality of studt:nl eKperience at all levels of education.
er var The gateway in the I. Student enrolment (recruitment and retention): Marketing 10 new potential students
an requires that the login requires schools to develop profiles of the sludents that are most likely to attend. Schools can
DFS. For example, if the develop models of what kinds of students are anracled lo the school, and then reach out to
s user admin and existing those students. Tht: students al risk of not returning can be flagged, and corrective measures
can be taken in time.
ient machine has the same 2. Course offerings: Schools can use the class enrolment data to develop models of which
is usually not a problem if new courses arc likely lo be more popular with students. This can help increase class size,
to create and deploy users reduce costs, and improve student satisfaction.
3. Alumni pledges: Schools can develop predictive models of which alumni are most likely
to pledge financial suppon to the school. Schools can create a profile for alumni more likely
to pledge donations to the school. This could lead to a reduction in the cost of mailings and
zation. Explain any 2 or other forms of outreach to alumni.
(IO Marks)
by experts from Finances b. What are the sources and types ordata for a data warehouse? How will data warehousing
ems designed to capture, evolve in the age of social media? (06 Marks)
s for improved decision Ans. Data Sources: DWs arc created from structured data sources. Unstructured data, such as text
it will combine, including data, would need to be structured before insened into DW.
d external data extracted I. Operations data include data from all business applications, including from ERPs systems
en macroeconomic data. Ihat form the backbone of an organization's IT systems. The data to he extracted will depend
upon the subject math:r of DW. For example, for a sales/marketing DW, only the data about
insight on 1heir overall
customers, orders, customer service, and so on would be extracted.
103
VIII Sem., (CSf(ISE) CS-Modei,Q~P
2. Other applications, such ns point-of-sale (POS) terminals and e• commerce applications, for example, work experie
provide customer-facing data. Supplier data could come from supply chain management 5. Data elements may need
systems. PlaMing and budget data should also be added as needed for making comparisons currency values may need
against targets. the same base year for com
3. External syndicated data, such as weather or economic activity data, could also be added 6. Outlier data elements
to DW, as needed, to provide good contextual information to decision makers. of results. For example, o
Three main types or Data Warehouses are: educational setting.
I . Enterprise Data Wareho use: 7. Any biases in the selecti
Enterprise Dalo Warehouse is a centralized warehouse. It provides decision support service of the phenomena under
across the enterprise. It offers a unified approach for organizing and representing data. It nlso than is typical of the popul
provide the ability to classify data according to the subject and give access according to those 8. Data should be brought
divisions. available daily, but the sal
2. Operational Data Store: relate these variables, the
Operational Data Store, which is also called ODS, are nothing but data store required when case, monthly.
neither Data warehouse nor OLTP systems support organiz.ations reporting needs. In ODS, 9. Data may need to be se
Data warehouse is refreshed in real time. Hence, it is widely preferred for routine activities much variability, because
like storing records of the Employees. may dull the effects of otli
3. Data Mart: information density of the
A data mart is a subset of the data warehouse. II specially designed for a particular line of
b. Describe some key steps
business, such as sales, finance, sales or finance. In an independent data mart, data can collect Ans. Data has been described 35
directly from sources. The volume of data used
A DW project reflects a significant investment into IT. All ofthe best practices in implementing
and continues to grow. Fo
1111y IT project should be followed.
1. The DW project should align with the corporate strategy. Top management should be
downloaded from Science
profiles on Scopus and 3
consulted for setting objectives. Financial viability Return on Investment (ROI) should be
harder for a user to grab a
established. The project must be managed by both IT and business professionals. The OW That's where data visual"
design should be carefully tested before begiMing development work. It is of\en much more
and easy-to-understand vis
expensive to redesign after development work has begun.
There arc many advanced
2. It b important to manage user expectations. DW should be built incrementally. Users
fur specialized purposes s
should be trained in using the system, and absorb
disaster relief monitoring.
the many features of the system. is to help readers sec a pal
J. Quality and adaptability should be built In from the start. Only cleansed and high· read tedious descriptions s
quality data should be loaded. The system should be able to adapt to new access tools. As a profit growth of 25% in
business needs change, new data marts can be created for new needs
visualization summarizes i
OR on the points that are relev
6. a. Why Is data preparation so important and time consuming? (0-1 Marks)
An analysis clearly explain
creating a good visualizati
Ans. Data cleansing and prcpnmtion is a laborintensive or semiautomated activity that can take up
to 60 to 70 percent of the time needed for a llata mining project. Visualization E"'ample
To demonstrate how each
I. Duplicate data needs 10 be removed. The same data may be received from multiple sources.
a company who wants to
When merging the data sets, data must be de-duped.
important raw sales data fo
2. Missing values need to be filled in, or those rows should be removed from analysis. Missing
values can be filled in with average or modal or default values. Produ
3. Data elements may need to be transformed from one unit to another. For example, total AA
costs of health care and the total number of patients may need to be reduced to cost/patient to 88
allow comparability of that value.
4. Continuous values may need to be binned into a few buckets to help with some analyses.
cc i
DD
ti.MA
104
-
CBCS - ModevQ~P"'Pet"" - 3
e- commerce applications, For example, work experience could be biMed as low, medium, and high.. .
upply chain management 5. Data clements may need to be adjusted to make them comparable over ttme. !'or example,
d for making comparisons currency values may need to be adjusted for inflation; they would need to be converted to
the same base year for comparability. They may need to be converted to a common currency.
data, could also be added 6. Outlier data elements need to be removed after careful review, to avoid the skewing
sion makers. of results. For example, one big donor could skew the analysis of alumni donors in an
educational setting.
7. Any biases in the selection of data should be corrected to ensure the data is representative
s decision support service of the phenomena under nnnlysis. If the data includes many more members of one gender
d representing data. It also thon is typical of the population of interest, then adjustments need to be applied to the data.
,e access according to those 8. Data should be brought to the same granularity to ensure comparability. Sales data may be
available daily, but the sales person compensation data may only be available monthly. To
relate these variables, the data must be brought to the lowest common denominator, in this
t data store required when case, monthly.
reporting needs. In ODS, 9. Data may need to be selected to increase information density. Some data may not show
:ferrcd for routine activities much variability, because it was not properly recorded or for any other reasons. This data
may dull the elTccts of other differences in the data and should be removed to improve the
information density of the data.
~ned for a particular line of b. Describe some key steps in data visualization. (08 Marks)
nt data mart, data can collect Ans. Data has been described as the new row material for business and the ..oil of the 21" century.
The vol1tme of data used in business, research and technological development is massive
est practices in implementing and continues to grow. For instance at Elsevier, there are about 700 million articles per year
downlondcd from ScienceDircct, 80.000 institution profiles on Scopus, 13 million researcher
Top management should be profiles on Scopus and J million researcher profiles on Mendeley. II becomes harder and
nvestment (ROI) should be harder for a user to grab a key message from this universe of data.
ness professionals. The OW That's where datn visualization comes in: summarizing and presenting large data in simple
work. It is of\en much more and easy-to-understand visualizations 10 give readers insightful information.
105
VIII Set111 (CS'E/ISE)
EE 933 30 7
FF 676 35 6
CG 1411 128 13
HH 5116 132 38
JJ 215 7 2
KK 3833 122 50
LL 1348 15 7
MM 1201 28 13
Table 6.1: Raw Performance Data
To reveal some meaningful pattern, a good first step would be to sort the table by Product
The number of orde
revenue, with highest revenue first. We could total up the values of Revenue, Orders, and
Sales persons for all products. We can also add some important ratios to the right of the table the revenue is widcl
number of orders.
(fable 6.2).
Product Revenue Orders SalesPers Rev/Order Rev/Sales P Orders/Sales P
AA 9731 131 23 74.3 423.1 5.7
HH 5116 132 38 3S.S 134.6 3.5
KK 3333 122 50 31.4 76.7 2.4
GG 14 11 128 13 11.0 108.5 9.8
LL 1348 15 7 89.9 192.6 2.1
MM 1201 28 13 42.9 92.4 2.2
cc 992 32 6 31.0 165.3 5.3
EE 933 30 7 31.1 133.3 4.3
Therefore, the ord
FF additional data is m
676 35 6 19.3 112.7 S.8 into 4 sizes: Tiny, s
BB 3SS 43 8 8.3 44.4
JJ 21S 7
5.4 Product 1
2 30.7 107.5 3.S AA
DD 12S 31 4 4.0 31.3 7.8 1-fH
Total 25936 734 177 KK
35.3 146.5
Tttble 6.2: Sorted data witlt .. 4.1
GG
There are too many n be . • adtlttt011al raltos
. . um rs on this table to visualize a t els . LL
m different scales so plouing them on th h ny ren m them. The numbers are
n be . e same c art would not be E MM
um rs are m thousands while the SalesP be easy. .g. the Revenue
or double digit. ers num rs and Orders/SalesPers are in the single cc
~ne_ could start by visualizing the revenue as . EE
significantly frorn the first product to th ( ~ pie-chart. The revenue proportion drops FF
It is interesting to no1e that lhe to 3 odenext. Figure 6.1 ).
p pr ucts produce almost 750,
70 ofth e revenue.
BB
JJ
DD
Total
RevfflUof 1hare by
Products
38
2
so
7 ,u . .... ... , ..'O •
13
'o sort the table by Product Figure 6.1: R,mm11e Sitare by Proc/11ct
The number of orders for each product can be plotted as a bar gmph . This show~ thm while
of Revenue, Orders, and the n:venue is widely different for the top four products, they have approximately the same
ios to the right of the table number of orders.
O rdors by Product
,ales P Orders/Sales P
.3.1 5.7
.7
08.5
2.6
3.5
2.4
9.8
2.1
2.2
5.3
II I. III
Figure 6.2: Orcll!fS by Pro,l11cts
I I-I
Therefore, the orders data could be investigated further to sec order patlcms. Suppose
4.3 additional data is made available for Ordi:rs by their size. Suppose the orders are chunked
5.8 into 4 sizes: Tiny, Small, Medium, and Large. Additional data is shown in Table 6.3.
5.4
Product Total Orders Tiny Small Medium Large
AA 131 5 44 70 12
3.5
HII 132 38 60 30 4
7.8 KK 122 20 so 44 8
4.1 GG 128 52 70 6 0
tius L,L 15 2 3 s 5
in them. The numbers an: MM 28 8 12 6 2
t be easy. E.g. the Revenue
cc ~,
.)~ 5 17 10 0
rs/SalesPers are in the single
EE 30 6 14 10 0
revenue proportion drops FF 35 10 22 3 0
BO 43 18 25 0 0
% of the revenue. JJ 7 4 2 I 0
DD 31 21 10 0 0
Total 734 189
.. 329 185 31
711ble 6.3: Aclc/111011al ''"'" Oil Oft/er M:es
figure 6.3 is a stacked bar graph that shows the p.:rcentage of Orders by size for each product.
This chart (Figure 6.3) brings a different set of insights. fl shows that the product HH has
107
VIII Sem, (CSE/ISE) 'Blg,-V~A~ C'BCS·M~Q~
a larger proponion of tiny orders. The products at the far right have a large number of tiny 3. Cltoose approprlu)
orders and very few large orders. it could be presented
4. Tlte data set ,·011l
Product Orders by S ize
0
is not necessarily bett
'j S. Tl,e vlsun/lu,tlon
or targets wilh which
t:-; "'j 6. Tl1e numerical du
#-o
'j I I I sI I I I I I I I I
~ ~ Ill
person were plotted
choices.
J; f i '11 'o 1. l/ig/1-lellf!/ visua/1
Ill
or targets with which to compare the results.
6. The numerlcal data may 11ud to 1w bhuwtl .,_ 11 few t·ategorles. E.g. the orders per
person were plotted as actual values, while the order sizes were binned into 4 categorical
choices.
7. lligll-level 11isuull1.utio11 could be backedby more tlttloiled analysis. For the most significant
results, a drill-down may be required.
8. There may be need to present 11ddlliona( textutll illfom,ot/011 to tell 1hc whole s1ory. for
example, one may requir~ noteS' lo explain some extraordinary results.
r Slt.t
Modulc -4
What Is pruning? Whal are pre-pruning and post-pnaaln&? Why chOOH one over the
persons. This analysis could be other? (08 Marks)
;alesperson. There could be two Pruning : TI1e tree could be trimmed to make it more balanced and more easily usable.
and the other for the revenue The pruning is often done after the tree is constructed. to balance out the tree and improve
~easures on the same graph to usability. The symptoms of an overfilled tree are a tree too deep, with too many branches,
e two data have different scales. some of which may reflect anomalies due to noise or outliers. Thus, the tree should be pruned.
l. There are two approaches lo avoid over-fitting.
• Pre-prunins mcons 10 halt the tree construction early, when certain criteria are met. The
downside is that it is difficult to decide what criteria to use for halting the construction,
because we do not know what may happen subsequently, ifwe keep growing the tree.
• Post-pruning: Remove branches or sub-trees from a "fully grown" tree. This method is
commonly used. C4.S algorithm uses a statistical method to estimate the errors at each node
--1 •- for pruning. A validation set may be used for pruning as well.
The most popular decision tree algorithms are CS, CART and CHAID (Table 7. 1)
Tuble 7.1: Co111purl11g popular Deds/011 Tree ul,:orllltno
Decision Tree C4,S CART CIIAID
llerati\e Classification and r,:gr,:ssion Chi-square automatic
~ product Full nami,
her. One line shows the revenue Dichotomi~r (103) tl\."\.'S interaction d.:1......1or
r salesperson. It shows that the Adjusted significanc-:
Basic algorithm Hunt's algorithm Hunt's algorithm
2.1 orders per salesperson. The h:stin2
for each for the products. The £>.:\eloper Ross Quinlan Bn:mman Gordon KDSS
at just 30. And thus additional When do:vclor>cd 1986 1984 1980
Classification and regn:ssion Classification and
Tyjl(s of11'\l\:S Classification
trees rcl!Jc:ssion
(0-& Marks)
Serial Trec growth and tree Tree gro"th ond tn:o: pnming Trt.:1: gro" lh and Ire:.:
ing are some key requirements im11kmcnt;1tion orunin2 I orunin11
Discreh: and
· s some understanding of the Non-normal data also
Tyl'( of data continuous; Discr.:tc: and continuous :acc.:plc..-U
in a business setting, one may incomnl<.1,: da1a
uctivity.
sorted by numerical variables,
109
VIII Setn, (CSE(ISE) t3i.g,V~A~ cacs . Model,Q
Binary splits only; cle, er
Multiway splits as
T) p,:s of splits Multiway splits surrogatc splits to r.:duc.: trcc dc:fault
depth
Spliuing criteria Information gain Gini coclTicicnt, and oth<'rs Chi-squnrc lCSl
Cle, er bouom up
T.-.:.:s can l>l:comc , cry
Pruning criteria technique avoids Remove weakest Links first
large
overfilling
Popular in market $3,50,000
Publicly availablc in most rcs.:arch, for
Implementation Publicl) a, ailabh: S3,00,000
packages SC'1.mcntation
s2.so.000
b. Using the data that follows, create a regression model to predict a house price Crom the
size or the house. llcre arc samrle house ttata: (08 Marks) i $2,00,000
110
CBCS - Mo-d,cl,Q~Pct.per - 3
$168.500 1,840
Multiway split, a~ $180.-lOO 1.720
default Sl~6.:?00 1,660
S288.HO 2,405
Chi-square test $156,750 1,525
$202,100 2,030
Trees can bccom~ very $256,800 2,240
large
$3,50,000
Popular in market
research, for $3,00,000
s.: m.:ntation $2,S0,000
~
ct a house price from the S: $2,00,000 +-::,-+-::~H:.it7~~~-i-t
(08 M arks) :,c
i $1,50,000 • House Price
$50,000
$0
0 1000 2000 3000
Size (Sq ft)
Figure 7. I Scatter plot and regression equal ion between House price and house size
The two dimensions of (one predictor, one outcome variable) data can be plotted on a scatter
diagram. A scatter plot with a best-fitting line looks like the graph that follows (Figure 7.1).
Visually, one can see a positive correlation between house price and size (sqfl). However, the
relationship is not perfect. Running a regression model between the two variables produces
the following output (lruncated).
Rt>&rtssion Statistics
Multiple r 0.891
r2 0.794
Cocllicicnts
ollows. y is the dependent Intercept -54,191
nt variable, or the predictor Size (sql\) 139.48
1• .r2, ...) in a regression It shows the coefficient of correlation 1s 0.891. ,2, the measure of total variance explained
in the regression equation. by the equation, is 0.794, or 79 percent. That means the two variables are moderately and
positively correlated.
Regression coefficients help create the following equation for predicting house prices.
House Price (S) = 139.48 x Size (sqn) - 54,191
This equation explains only 79 percent of the variance in house prices.
Suppose other predictor variables are made available, such as the number of rooms in the
house, it might help improve the regression model. The house data now looks like this:
llouse Prict> Si:i~ (sqrt) # Rooms
$229,500 1.850 4
$273,300 2,190 s
$247,000 2,100 4
111
VIII Sem, (CSE/ ISf)
C3CS - ~~Que.,stum,,p_
$195,100 1,930 3 The predicled Yalues shoul
$261.000 2.300 4 able lo predict the actual v
$179,700 1,710 2 10 fine-tune and improve tH
$168,500 1,550 2
$234,400 , 1,920 4
$161!,SOO 1,840 2 What makes a neural neh
$180,400 1,720 2 learning tasks?
Supervised Learning
$156,200 1,660 2
• Training data includes
$288,350 2.405 s • For some examples the
$156,750 1,525 3
model during the learning
$202,100 2,030 2
• The construction of a pro
$256,800 2.240 4 • These method~ are usual
While it is possible to make a three-d1mc:ns1onal scatter plot, one can alternatively examine • Have lo be able lo gene
the correlation matrix among the variables. without knowing a priori
House Price Size (sq R) # Rooms Supervised learning is bl
llouse Price I classification already assi1
Size (SQ ft) 0.891 I Perceptron (MLP) models
Rooms 0.944 0.748 I I. One or more layers of
It shows that the house pnce has a strong corrclatton with number of rooms (0.944) as well. network that enable the m:
Thus, it is likely that adding this variable: lo the regression model will add to thi: strength of 2. The nonlinearity reflect
the model. Running a regression model between these three variables produces the following 3. The interconnection m
output. These characteristics alonE
Learning through training
Rearesslon Statistics
algorithm. The error co
Multiple r 0.984 output samples and finds
0.968 the desired output and a
rJ
the product of the error s1
Coefficients principle, error back prol)i
lnh:rcepl 12.923 Forward Pass: llere, in~
forward, neuron by neuro
Siic(sq0) 65.60 as output signal: y(n) =
Rooms 23.613 v{n) -l: w(n)y{n). The o
It shows the coefficient o corrclatton of this regression mod el is 0.984. r2, the total variance desired response d(n) 1111
explained by the equation, is 0.968, or 97 percent. That means the variables are positively network during this pass 1
and very strongly correlated. Backward Pass: The e
Adding a new relevant variable has helped improve the strength of the regression model. propagated backward thr
Using the regression coefficients help~ 1.1 ealc the following equation for predicting house each layer and allows the
prices. with the delta rule as:
House rrlce ($) = 65.6 x Si:te (sqR) + 23,613 x Rooms + 12,924 t.w(n) = 11 • 6(n) • y(n).
Thi~ equation sho"s a 97-percent goodness-of-fit with the data, which is very good for This recursive computat'
business and economic dala. There is always some random variation in naturally occurring for each input pallem till
business data, and it is not desirable to overfit the model to the data. Supervised learning para
This predictive equation should be used for future transactions. Given a situation that follows non-linear problems sud
it will be possible 10 calculate the price of the house with 2,000 sqft and 3 rooms. ' Unsupervised Learninil
• The model is not provil
House Price Size s rt # Rooms
•. Can be used to cluste
? 2000 3
House Price (S) - 65.6 x 2,000 (sqR) 1 23,613 x J -t 12,924 - $214,963
112 Swi.tAf' e.- ~ W '
The predicted values should be compared to the ac:aal values to see how close the model is
able 10 predict the actual value. As new data pocnlS become available, there are opportunities
to fine-tune and improve the model.
OR
What makes a neura l network nrsatlle enough for supervised 11s well as non-supervised
lenrning h1slui? (08 Marks)
Supervised Learning
• Training data includes both the input and the desired results.
• For some examples the correct results (targets) are known and are given in input to the
model during the learning process.
• The construction of a proper training, validation and test set (Bok) is crucial.
• These methods are usually fast and accurate.
e can alternatively exami • Have to be nble to generalize: give the correct results when new data are gi\C0 in input
without knowing a priori the target.
# Rooms Supervised learning is based oil ·training a data sample from data sou•-cc with correct
classification already assigned. Such techniques are utilized in feeJfoa ,Hird or Multilayer
Perceptron (MLP) models. These MLP has three distinctive character isucs:
I. One or more layers of hidden neurons that are not part of the input or output layers of the
network that enable the network to learn and solve any complex problems
ber of rooms (0.944) as well;
2. The nonlinearity reflected in the neuronal activity is dilfen:ntiahlc and,
clel will add to the strength 3. The interconnection model of the network exhibits a high degree of connectivity
dobles produces the followi
These characteristics along with learning through training solve difficult and diverse problems.
Learning through training in a supervised ANN model also called as error back-propagation
algorithm. The error correction-learning algorithm trains the network based on the input-
output samples and finds error signal, which is the difference of the output calculated and
the desired output and adjusts the synaptic weights of the neurons that is proportional to
the product of the error signal and the input instance of the synaptic weight. Based on this
principle, error bock propagation learning occurs in two passes:
Forward Pass: Herc, input vector is presented to the network. This input signal propagates
forward, neuron by neuron through the network and emerges at the output end of the network
as output signal: y(n) = 'P(v(n)) where v(n) is the induced lo..:al field of a neuron defined by
v(n) • l: w(n)y(n). The output that is calculated at the output layer o(n) is compared with the
desired response d(n) and finds the error e(n) for that neuron. The synaptic weights of the
s 0.984. r2, the total variance.
network during this pass are remains same.
s the variables are positivelJ Backward Pass: The error signal that is originated at the output neuron of that layer is
propagated backward through network. This calculates the local gradient for each neuron in
of the regression model.
each layer and allows the synaptic weights of the network to undergo changes in accordance
quation for predicting house
with the delta rule as:
6w(n) "' 11 • 6(n) • y(n).
24 This recursive computation is continued, with forward pass followed by the backward pass
ata, which is very good for
for each input pattern till the network is converged .
iation in naturally occurrina Supervised learning parndi~m oran ANN is efficient and finds solutions to several linear and
tlata. non-linear problems such as classification, plant control, forecasting. prediction, robotics etc
iven a situation that follows, Unsupervised Learning
sqft and 3 rooms. • The model is not provided with the correct results during the training.
•. Can be used to cluster the input data in classes on the basis of their statistical properties
113
VIII Se.w (CS'E(ISE) 8(1J'V~A~
only.
• Cluster significance and labeling.
• The labeling can be carried out even if the lnbcls are only a~ailnble for a small number of
objects representative of the desired classes. A scatter plot of IO data
Self-Organizing neural networks learn using unsupervised learning algorithm to identify (Figure 8.1 ). As a bottom
hid<len patterns in unlabelled input data. This unsupervised refers to the ability to learn and intuited.
vrgani.Gc informu1ion without providing :m error sienal to evaluate the potential solution. The points arc distributed
The lack of direction for the learning algorithm in unsupervised learning can sometime circle would represent the
be advantageous, since it lets the algorithm to look back for patterns that ha\e not been distance between the poin
previously considered. The main charactcnstics of Self-Organizing Maps (SOM) are: The three points at the bo
I. It transforms an incoming signal patt..:rn of arbitrary dimern,ion into one or 2 dunensional the other cluster. The two
map and perform this trnnsformation adaptively new centroids.
2. The network represents feed forward stmcturc \\ ith a single computational layer consisting The bigger cluster seems t
of ncun:,,s arnngcd in rows and cc-lumns. sep.irah: r luster. The three
3. At enc 11 st~!_le ot rl'prcscntation, cacl1 input signal is kept in its proper context and, TI1is solution h,1:; rhree cl
4. Ncuro ·s d,:alin~ wtlh closr l} related pieces of information are close together and they However, its centroid is n
comnwn ,ate through ,ynapt1c connections tight-fitting, with a nice ce
The cvm:·utati,•r.al hi)er i~ also called a, competitive layer since the ne1irons in the layer much usefulness.
compe:e with each other to become active Hence, this learning algorithm is called competitive
algori,'111' Unsupervised ahorithm in SOt-.1 works in three phases:
Competition phJse: for ec1ch input pattern x, presented to the network, inner product with
synaptic w.:ight w 1s cJlculoted and the neuron~ in the competitive layer finds a discriminant
function that induce competition among the neurons and the synaptic weight vector that is
•
close 10 the input vector in the Euclidean distance is announced ns winner in the competition.
That neuron is called best matching neuron, i.e. x = arg min ~ x • w ~. •
Cooperative phase: the\\ inning neuron determines the center of a topological neighborhood
h ofcooperating neurons. This is performed by the lateral interaction d among the cooperative
neurons. This topological neighborhood reduces its size over a time period.
Adaptive phase : enables the winning neuron and its neighborhood neurons to increase their
individual values of the discriminant function in relation to the input pattern through suitable
synaptic weight adjustments, t:.w = 11h(xXx - w). n ~
Upon repeated prcsentntion of the training patterns, the synaptic weight vectors tend to Fig11u 8.1 /11
follow the distribution of the input patterns due to the neighborhood updating and thus ANN
learns without supervisor [2].
Self-Organizing Model naturally represents the neuro-biological behavior, and hence is used
in many real world applications such as clustering. speech recognition, texture segmentation,
vector coding etc
b. ldentiry lhe clusters from the dala as shown in Dataset below table.Determine the
number or cluslen and the center points or those cluslen. (08 Marks)
X y
2 4
2 6
s 6
4 7
8 3
s 6
5 2
114
for a small number of ~
Ans. A scatter plot of 10 data points in 2 dimensions shows lhem distributed fairly randomly
algorithm 10 identify tFigure 8.1 ). As a bottom-up technique, the number of clusters and their centroids can be
he ability to lo:mn and intuited.
the potential solution. The points arc distributed randomly enough that it could be considered as one cluster. The
earning can sometime circle would represent the central point (centroid) of these points. However, there is a big
ns that have not been distance between the points (2,6) and (8,3). So, this data could be brol,.en into two clusters.
aps (SOM) are: The three points at the bottom right could form one cluster and the other seven could form
o one or 2 dimensional the other cluster. The two clusters would look like this (Figure 8.2). The circles will be the
new centroids.
lalional layer consisting The bigger cluster seems too far apart. So. it seems like the four points on the top will form a
separat.: cluster. The three clusters could look like this (Figure 8.3).
per context and, This solution has three clusters. The cluster on the right is for from the other two clusters.
close together and they However, its centroid is not loo close to all the data points. The cluster at the top looks very
tight-filling, with a nice centroid. I he tliird cluster, at the left. is spread out and may not be of
the neu10ns in the layer much usefulness.
thm is called competitive
• 4, T
• ~- 7
vork, inner product with
ayer finds a discriminant • 2. 6
• !,, 6
• 6. G
topological neighborhood
n d among the cooperative
• 6, 3
• 8, 3
e period. • s. :l
115
VIII Se..111 (CSE(ISE) CBCS • Mod.et,Q~F
ti
.,.,
their old (shaded!_values
n
Fl,:ure8.4 R
0 9
Figuu ll.1Dfridl11,: l1110 two clusters (cemroldl 1how11 a.s tllidc dots)
Ii
0 Ii 8 9
Figure I.J Divldi11g /1110 three clusters (ce11troids show11 a.s thick dot1)
This was an exercise in producing three best-filling cluster definitions from the given data.
The right number of clusters will depend on the data and the application for which the data
would be used.
K-Means Al&orithm for Clustering
K-means is the most popular clustering algorithm. It iteratively computes the cluste~ and
their centroids. It is a top-down approach to clustering. Starting with a given number of K
clusters, say 3 clusters; thus, three random centroids will be created as starting points of the
centers of three clusters (Figure 8.4). The circles are initial cluster centroids.
s,~p I: For a data point, distance values will be from each of the three centroids. The data
point will be assigned to the cluster with the shortest distance to the centroid All dat."I points
will thus be assigned to one data point or the other. The arrows from each data element show
the centroid that the point is assigned to (Figure 8.5).
Step 2: The centroid for each cluster will now be recalculated such that it is closest to all the
data points allocated to that clw.ter. 111c dashed arrows show the centroids being moved from 0
116
cacs· - Model, Q ~ Pc;q,iw - 3
their old (shaded) values to the re~ised new values (Figure 8.7).
• 4, 7
• s. 7
• • 2, 6
• S,6
• 6, 6
• ,, . ......
• 5, 2
• 6,,
- ♦ 8, 3
•
0
" " ..
Fi,:11re 8.4 Ru11domly assi1:11i11g t//ree ce11troi<lsfor t//ree tlflta clusters
r--
8 9
2. 4
l--~---~--------.J
0
2 a 4 S 6 7 R
Figure 8.5 Assig11i11g data poi11ts to closest ce11troi,/
8 9
~
.l~.8.3
0 l 2 3 4 5 6 7 a 9
118
'BC.S,,VamtA~ CBCS - Mod®Q~PRJ)U • 3
closest to it (Figure 8.7). Herc is the pseudocode for impleN '""" & 1 algorithm. Algorithm K-Means (K
e cluster until finally the number of clusters, D list of data ~ )
omputcd by th1s algorithm I. Choose K number of random data pol-.a-.. ccaln>ids (cluster centers).
2. Repeat till cluster centers stabilize:
id (6.5,4.5), a 2-datapoint a. Allocate each point in D to the nearal all. oida.
id (3.5,3). b. Compute centroid for the cluster usial II,... ■ lbc cluster.
lly. This is a function of the
in the visua I cxen:ise were Selecting the Number or Clusters
. The K-means clusterinl The correct choice of the value ofK is often ambiguous. It depends on the shape and scale of
w random centroid startina the distribution points in a data set and the desired clustering resolution of the user. Heuristics
zc. tfthc cluster dcfinitiom are needed to pick the right number. One can graph the percentage of variance explained by
osen is too high or too low. the clusters against the number of clusters. The first clusters will add more infonnation, but at
some point the marginal gain in variance will fall, giving a sharp angle to the graph, looking
like an elbow.
At that elbow point, adding more clusters will not add much incremental value. That would
be the desired value of K (Figure 8.9).
20
Elbow for KMeans clust ertno
----,.... --,- -r-- -·
0.4
7 I 9
: !;1.
r. :.!i&I
-1
C.tegon,lno N ~ beforehand is not nee
3. SVMs provide a g
ofa Gaussian kernel)
,... ,__... _,.·
··--··-
-""·- ......-
generalization grade,~
4. SVMs deliver a u
I I1 ---~- j advantage compared
local minima and for
Senument ~tys.i.s
5. With the choice ofi
stress on the similari
---
of two companies is,
company, the values 0
of the training sample
__,,___ M•d1~I 0 1Agno•\ &
classified according to
Here arc some exam
monotonicity. One c
estimated with a lin
expected one occordin
l~l~:
The reason for that
121
VIII Se,m, (CSf/ISf)
which the financial ratios of non-eligible companies should change, in order to reach
eligibilil)'.
The PD can represent a third dimension of the graph, by means of isoquants and colour 10. •· What are the two majo
coding. The approach chosen for the estimation of the PD can be based on empirical estimates Ans. The Web works through
or on a theoretical model. can create a hyperlink to
Since the relation between score and PD is monotone, a local linearization of the PD can be or self-referral nature 0
calculated for single companies by estimating the tangent curve to the isoquant of the score. structure of Web pages ~
For single companies this can offer interesting information about the factors inftuencing their pages. There are two
financial solidity. I. H11bs: These are pag
In the figure below the PD is estimated by means of a Gaussian kemelS on data belonging to gathering point, where ~
the trade sector and then smoothed and monotonized by means of a Pool Adjacent Violator com, or government sit
algorithm.6 The pink curve represents the projection of the SVM threshold on a binary space com and yelp.com could
with the two variables K2J (net income chnnge) and K24 (net interest ratio), whereas all 2. A11tl1orltla: Ultimat
other variables are fixed at the level of company j. The blue curve represents the isoquant for complete and authorita
the PD of companyJ, whose coordinates are marked by a triangle. information, news, advi
Figure, Graphical Visualization or the SVM Threshold and or ■ Loe■I Llnearization or of inbound links from
the Score· page for expert medical
Function: Example of a Projection on a Bi-dimensional Graph with PD Colour Coding news.
Probability of Default
Write short notes on w
Web Minin& Al&oritbm
Hyperlink-foduced Topi
as being hubs or authori
The most famous and
Google co-founder
its search function. This
web page by counting
number of links, and/or
works in a similar way
with relations to more
higher status. PageRank
a Google: Search query.
many ways and the lates
... of the algorithm and m
·228 ,o, '38 788 110 stan~rd el~ments that "i
1
Nol Income Ct,;,na-, 1<21 website. This process is
The grey line corresponds to the linear approximation of the score or PD function projection c. Esplain the Pnctlcal ,
for company j. One interesting result of this graphical analysis is that successful companies between Social Networ
with a low PD often lie in a closed space. This implies that then: exists an optimal combination
area for the financial ratios being considered, outside of which the PD gets higher. If we Ans. PRATICAL CONSID
consider the net income change, we notice that its influence on the PD ii; non-monotone. Both Network Siu :Most SN
too low or too high growth rates imply a higher PD. This may indicate the existence of the network can be very chal
optimal growth rate and suggest that above a certain rate a company may get into trouble; of the number of nodes.
especially if the cost structure of the company is not optimal i.e. the net interest ratio is too possible pairs of links.
high. Out if a company lies in the optimal growth zone, it can also afford a higher net interest
ratio.
122 s.-+M ~ ~
B"if'Vam,A~
thange, in order to reach OR
s of isoquants and colour What are the two major ways that a website can become popular? (04 Marks)
. sed on empirical estimates The Web works through a system ofhyperlinks using the hypertext protocol (http). Any page
can create a hyperlink to any other page, it can be linked 10 by another page. The intertwined
arization of the PD can be or self-referral nature of web lends itself to some unique network analytical algorithms. The
structure of Web pages could also be analyzed 10 examine the panem of hyperlinks among
0 the isoquant of the score.
the factors influencing their pages. There are two basic strategic models for successful websites: Hubs and Authorities.
I. H u bl ': These are pages with a large number of interesting links. They serve as a hub, or a
gathering point, where people visit to access a variety of information. Media sites like Yahoo.
crnelS on data belonging to
com, or government sites would serve that purpose. More focused sites like Traveladvisor.
of 3 Pool Adjacent Violator com and yelp.com could aspire to becoming hubs for new emerging areas.
threshold on a binary space 2. A11tl1oriti~s: Ultimately, people would gravitate towards pages that provide the most
interest ratio), whereas all complete and authoritative information on a particular subject. This could be factual
e represents the isoquant for information, news, advice, user reviews etc. These websites would have the most number
e. f of inbound links from other websites. Thus Mayoclinic.com would serve as an authoritative
or 8 Local Unearlzation o page for expert medical opinion. NYtimes.com would serve as an authoritative page for daily
news.
ith PD Colour Coding
b. Write short notes on web mining algorithm. (04 Marks)
... Ans. Web Minln& Al1orithm1
"' Hyperlink-fnduced Topic Search (HITS) is a link analysis algorithm that rates web pages
as being hubs or authorities. Many other HITS-based algorithms have also been published.
The most famous and powerful of these algorithms is the PageRank algorithm. Invented by
Google co-founder Larry Page, this algorithm is used by Google to organize the results of
its search function. This algorithm helps determine the relative importance of any particular
web page by counting the number and quality of links to a page. The websites with more
number of links, and/or more links from higher-quality websites, will be ranked higher. It
works in a similar way as determining the status of a person in a society of people. Those
with relations to more people and/or relations to people of higher status will be accorded a
higher status. Page Rank is the algorithm that helps determine the order of pages listed upon
a Google Search query. The original PageRank algorithm formuation has been updated in
many ways and the latest algorithm is kept a secret so other websites cannot take advantage
of the algorithm and manipulate their website according to it. However, there are many
standard elements that remain unchanged. These elements lead to the principles for a good
110
website. This process is also called Search Engine Optimization (SEO).
ore or PD function projection c. Explain the Practical consideration or Social network analysis. Give the dlfl'erence
s is that successful companies between Social Network Analysis vis Tn ditlonal Data Analytics
ell.ists an optimal combination (08 Marks)
ich the PD gets higher. If we Ans. PRATICALCONSIDERATION:
the PD is non-monotone. Both Network Size :Most SNA research is done using small networks. Collecting data about large
indicate the existence of the network can be very challenging. This is because the number of link is the or1er of the square
ompany may get into trouble; of the number of nodes. Thus, in a network of 1000 nodes there are potentially I million
i.e. the net interest ratio is too possible pairs of links.
lso afford a higher net interest
123
VIII Sem, ( CSf/ISf)
Gathering Data: Electronics communication records (email, chat, etc.) can be han1essed to Eight Semester B.E.
gather social network data more easily. Data on the nature and quality of relationship need to
be collected using survey documents. Capturing and cleansing and organizing the data can
take a lot of time and effort , just like in a typical analytics project.
Computation And Visualization: Modeling large networks can be computationally Note: Amw~r any FIVEfu
challenging and visualizing them also would require special skills . Big data analytical tools
may be needed to compute large networks.
Dynamic Networks: Relationships between nodes in a social network can be fluid. They can How docs the Radoop
change in strength and functional na16urc. For Example, there could be multiple relationships an example.
between two people... they could simultaneously be coworker, coauthors, and spouses, The In Hadoop, MapReduce
network should be modeled frequently to see the dynamics of the network. individual tasks that can
Table 10. I Social Network Analysis vis Traditional Data Analytics tasks can be joined toget
Dimension Social Network Analysis Traditional Data Mining MapReduce consists of
Nature of learning Unsupervised learning Supervised and Unsupervised Map Function - It take
learning individual elements are
Analysis of goals Hub nodes ,important nodes, Key decision Example - (Map functio~
and sub-networks centroids Input Set of data
Dataset structures A graph of nodes and (directed) Rectangular data of variables
links and instances
Convert into~
Analysis techniques Visualii.ation with statistics; Machine learning, statistics Output set of data
iterative graphical computation
(Key, Value)
Quality measurement Usefulness is key criterion Predictive accuracy
classification techniques Reduce Function _ Take
tuples into a smaller set
Ex.ample - (Reduce funct
Input
(output or Map Seto
function)
Conv
Output small
tuple
124
Eight Semester B.E. Degree Eu■imdioll. CBCS - June / July 2019
t. etc.) can be harnessed 10
ality of relationship need to BigDataAllalytlcs
and organizing the data can e: 3 hrs. Max. Marks: 80
ct. ty Note : Am,wer a11y FIVE full question:,, sel«tbti ONE/11// q11estion/rom eacll module.
can be computational
is . Big data analytical tools
Module- I
How does the Hadoop MapReduce Data flow work or a word count program? Give
ork can be fluid. They can an example. · (08 Marks)
Jld be multiple relationshiP5 In Hadoop, MapReduce is a computation that decomposes large manipulation jobs into
:oaulhors, and spouses, The individual tasks that can be executed in parallel across a cluster of servers. The results of
• network.
tasks can be joined together to compute final results.
Data Analytics
MapReduce consists of2 steps:
itional Data Mining Map Function - It takes a set of data and converts it into another set of data, where
[Vised and Unsupervised individual elements are broken down into tuples (Key-Value pair).
mg Example- (Map function in Word Count)
decision Bus, Car, bus, car, train, car, bus, car, train, bus,
Input Set of data
~ids TRAIN,BUS, bus, caR, CAR, car, BUS, TRAIN
angular data of variables (Bus, I), (Car, I), (bus, I ). (car. I). (1rain, I),
stances Convert inlo another
(car. I), ( bus, I). (car, I), (train, I), (bus. I).
ine learning, statistics Output set of data
(TRAIN,I),(BUS, I), (buS,I), (caR, I), (CAR, I),
(Key,Value)
(car,I), {BUS,1), {TRAIN, l)
ictive accuracy Reduce Function - Takes the output from Map as an input and combines _those data
~fication techniques tuples into a smaller set of tuples.
Example - (Reduce function in Word Count)
(Bus, I), (Car, I), (bus, I), (car, I), (train, I),
Input Set ofTuples (car, I), (bus,I), (car,l), (train,!), (bus,I),
(output of Map
function) (TRAIN,l),(BUS,1), (buS, I), (caR,I), {CAR,1),
(car,I), (BUS,1), (TRAIN,I)
Converts into (BUS,7),
Output smaller set of (CAR,7),
tuples (TRAIN,4)
1
VIII Setr11 ( CSE I IS£)
j.setJarByClass(WordC
j.setMapperClass(Map
lnl-"'l Bu, ~
~1
~ -
1-!.:::.1.l...J:
Out1>~l j .setRcducerClass(Red
C•r 1.
j.setOutputKeyClass(Tc
;,
l lu<Cu Jron t f •••n l
@:] I
,[oTD
eu1 l j.setOutputValueClass(I
-u
au, Cat , , 11n CAR l
Ttt11""1 flll'lt C•r ,~ -{ fro , Plant ~
~ JAAIN l FilelnputFormat.addlnp
lo"1lo<PLO.. ,
1 TAAIN I ,[rAAiN}J- ~~f 2 FileOutputFormat.setO
t lu< &u, Plo"tJ\ 1_:AAIN
s
I
Systcm.exit(j.waitForC
I l •tA~( :
PIAN[ 1J ,u...t 2)
}
public static class Ma
lntWritablc>{
Reducing Combining
public void map{Long
Splitting Mappina Intermediate
Splllllng lnterruptedl::xception
{
F11, Workflow of ~\lplleducinc Srring line = value.toStr
Workftow of MapReducc consists of S steps: String(] words=Jinc.split
1. Splitting - The splitting parameter can be anything, e.g. splitting by space, comma, for(String word: words )
semicolon, or even by a new line ('\n'). {
2. Mapping - as explained above. Text outputKey .. ne
3. Intermediate splitting - the entire process in parallel on different clusters. In order to lntWritablt: outputValu
group them in "Reduce Phase"1he similar KEY data should be on the same cluster. }con.wrile(outputKey, 0 ~
4. Reduce - it is nothing but mostly group by phase.
5. Combining- The last phase where all the data (individual result set from each cluster) }
is combined together to form a result. }
package PackagcDemo; public static class Red
import java.io.lOException; lntWritable>
import org.apache.hadoop.conf.Configurntion; {
import org.apache.hadoop.fs.Path; public void reduce(Text
import org.apache.hadoop.io.lntWritablc; IOException, Interrupted
import org.apache.hadoop.io.LongWritable; {
import org.apache.hadoop.io.Text; int sum= O·
import org.npache.hadoop.mnpreducc.Job; for(lntWritablc value .
import org.apache.hndoop.maprcduce.Mapper; { .
import org.apachc.hadoop.mapreduce.Reducer; ;um += value.get();
import org.apache.hadoop.mapreduce.lib.input.FilelnputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; } con.write(word, new Int
import org.apachc.hadoop.util.GenericOptionsParser;
public class WordCount { }
public static void main(String 0 args) throws Exception }
{
Co'.'f\guration c = ncw Configuration(); b. Briefly explain IIDFS Na
Stnn~0 filcs=new GenericOptionsParser(c,args).getRemniningArgs()· and Backups.
Path input- new Path(files[O)); • Ans. HDFS Federation impro
Pat~ output=new Path(filcs[ I]); of namespace and storage
Job J=new Job(c,"wordcount''):
multiple namespnccs in '
2
· JIM\el(Jvly 2019
j.setJarRyClass(WordCount.class);
j.setMapperClass(MapForWordCount.class);
j.setRcducerClass(ReduceForWordCount.class);
j.setOutputKeyClass(Text.class);
ous) j.setOutputValueCla:;s(lntWritable.class);
CAR 2 Filelnputformat.addlnputPath(j, input);
TP.AJN 2
~ TR.\IN 2 { PlAAf 2 FileOu1putformat.sctOutputPath(j, output);
System.exit(j. waitForCompletion(true)?0: I);
} .
PlAI;[ 2 public static class MapForWordCount extends Mapper<LongWritable, Text, Text,
lntWritable>{ .
public void map(LongWritable key, ~xt value, Context con) throws IOExcept1on,
Combining lnterruptedExccption
{
String line= value.toString();
String[] words=linc.splil(",");
for(String word: words )
b. Brielly explain HDFS Name Node federation, NFS Gateway, snapshots, Checkpoint
and Backups. (08 Marks)
'ningArgsO;
HDFS Federation improves the existing HDFS architecture through a clear separation
of namespace and storage, enabling generic block storage layer. It enables support for
multiple namespaces in the cluster to improve scalability and isolation. Federation
3
VIII Sem, ( CSE I ISE) cs -JIM'\£,/Jul;y 20
also opens up the architeclure, expanding the applicability of J IDFS cluster to new HDFS snapshots ares·
implementations and use cases. dfs-snapshot comman
r·· ··· · ··· ····· ··~ r---------------• -----------------.
: :~ •: :NN-n : . system. They offer the
l~
' NN-1
I • Snapshots can be t
I '
I
I '' • Snapshots can be
I
I
I
recC1Very.
I ••
I • Snapshots creation
I
I • Blocks on the Dat
t••·
I
I
list and the file si
there are duplicat
,________________ Jj • Snapshots do not
A Checkpoint Node
changes are just writ
NameNode runs for
because more chang
the metadata. The C
NameNode and me
HDFS has two main layers: uploads the result to
I . Namespacc There was also a si
• Consists of directories, fiks and blocks. the "upload to Nam
• It supports all the namespace related file system operations such as create, delete. the Secondary Name
modify and list files and directories. Secondary NameNod
2. Block Storage Service, which has two parts: Backup Node The •
• Block Management (performed in the Namenode) Node, but is synchr
a. Provides Datanodc cluster membership by handling registrations, and periodic periodically because
heart beats. the current state in-1
b. Processes block reports and maintains location of blocks. checkpoint.
c. Supports block related operations such as create, delete, modify and get block
location.
d. Manages replica placement, block replication for under replicated blocks, and What do you unde
deletes blocks that arc over replicated.
• Storage - is provided by Datanodes by storing blocks on the local fi le system and Hadoop File Syste
allowing read/write access. commodity hardwa
The prior IIDFS architecture allows only a single namespace for the entire cluster. In that and designed using I
configuration, a single Namenode manages the namespace. HDFS Federation addresses HDFS holds very la
this limitation by adding support for multiple Namenodes/namespaces ro HDFS. data, the fi les are st
The NF~ ~11teway for HDFS _allows clients to mount I fDFS and interact with it through fashion to rescue th
NFS, ~s 1f 11 were pa!' of their local file system. The Gateway supports NFSv3. After makes applications
mounting HDFS, a client user can perform the following tasks: Features of IfDFS
Brows~ the I IDFS file system through their local file system on NFSvJ client-compatible a. It is suitable for
oi>c;ra11ng systems. ·
b. 1ladoop provid
Upload and download files between the HDFS fil . C. The built-in se
Stream data directly to I IDFS throu h th e sy~tem ~nd their local file system.
random write is not supported. g e mount point. File append is supported, but of cluster.
d. Streaming acces
e. IIDFS provides
4
• JI.MU?/( JIALy 2019
HDFS snapshots are similar to backups. but arc created by administrator using the hdfs
dfs-snapshot command. HDFS snapsbo&s arc read-only point-in-time copies of the file
system. They offer the following features.
• Snapshots can be taken of a sub-tree of the file system or the entire file system.
• Snapshots can be used for data backups. protection against user errors. and disaster
recovery.
• Snapshots creation is instantaneous.
• Blocks on the DataNodes are not copied, because the snapshot files record the block
list and the file size. There is no data copying, although it appears to the user that
there arc duplicate files.
• Snapshots do not adversely affect regular HDFS operations.
A Checkpoint Node was introduced to solve the drawb'acks of the NameNode. The
chunges are just written to edits and not merged to fsimage during the runtime. If the
NameNode runs for a while edits gets huge and the next startup will take even longer
because more changes have to be applied to the state to detennine the last state of
the metadata. The Checkpoint Node fetches periodically fsimage and edits from the
NameNodc and merges them. The resulting state is called checkpoint. After this is
uploads the result to the NameNode.
There was also a similar type of node called "Secondary Node'' but it doesn't have
the "upload to NameNode" feature. So the NameNode need to fetch the state from
the Secondary NameNode. It also was confussing because the name suggests that the
Secondary NameNode takes the request if the NameNode fails which isn't the case.
Backup Node The Backup Node provides the same functionality as the Checkpoint
Node, but is synchronized with the NameNode. It doesn't need to fetch the changes
periodically because it receives a strem offile system edits. from the NameNode. It holds
the current state in-memory and just need to save this to an image file to create a new
checkpoint.
OR
What do you understand by HDFS? Explain Its components wilb a neat diagram.
(10 Marks)
Hadoop File System was developed using distributed file system design. It is run on
commodity hardware. Unlike other distributed systems, HDFS is highly faulltolerant
and designed using low-cost hardware.
r the entire cluster. Int HDFS holds very large amount of data and provides easier access. To store such huge
FS Federation add data, the files are stored across multiple machines. These files are stored in redundant
spaces to HDFS. fashion to rescue the system from possible data losses in case of failure. HDFS also
d interact with it throu makes applications available to parallel processing.
supports NFSv3. Aft Features of HDFS
a. It is suitable for 1he distributed storage and processing.
b. Hadoop provides a command interface to interact wilh HDFS.
c. The built-in servers of namenode and datanode help users to easily check the status
their local file S) stem. of cluster.
ppend is supported, bti d. Streaming access to file system data.
e. HDFS provides file pennissions and authentication.
5
VIII Sem, (CSE I I SE)
cacs •JIM\.e/lJ~201
HOFS Architecture
number of replicas of a]
11N Cleta (Name, ...,i1u,,...): and can be changed latt
ltlome/loo/data, 3, ... any time.
The NameNode make~
receives a Heartbeat an~
of a I leartbeat implies t
a list of all blocks on a
•• ■
D ata Nodes
,._ _ _, 1~1
__o_a_t_a_N_o_d_e_s___,J
Na
/use1
/usel
Rack 1
'
Rack2
HDFS Architecture (Components)
Given above is the architecture ofa Hadoop file System. HDFS follows the master-slave OJ
architecture and it has the following elements. 0
Namenode The namenode is the commodity hardware that contains the GNU/Linux
operating system and the namenode software. It is a software that can be run on
commodity hardware. The system having the namenodc acts as the master server and it
does the following tasks -
I!] ~'
• Manages the file system namespace.
• Regulates client's access to files.
• It also executes file system operations such as renaming, closing, and opening files 3. a. Explain Apache Sq
and directories. Marks)
Datanode The datanode is a commodity hardware having the GNU/ Linux operating Ans. Figure I describes the
system and datanode sonware. For every node (Commodity hardware/System) in a two steps. In the first
cluster, there will be a datanode. These nodes manage the data storage of their system. the necessary metada
reduce step) Hadoop j
• Datanodes perform read-write operations on the.file systems, as per client request.
transfer using the me
• They also perform operations such as block creation, deletion, and replication
import must have ace
according to the instructions of the namenode.
The imported data ar
Block Generally the user data is stored in the files of HDFS. The file in a file system for the directory, or t
will be divided into one or more segments and/or stored in individual data nodes. These be populated. By de
file segments are called as blocks. In other words, the minimum amount of data that separating different re
HDFS can read or write is called a Block. The default block size is 64MB, but it can be over by explicitly sp
increased as per the need to change in HDFS configuration. placed in I IDFS, the
b. .B.-ing o ut the con ceplJi ofHDFS block replication, with an example. (06 Marks) Data export from the
Ans. Data Replication HDFS is designed to reliably store very large flh::. across machines in as shown in Figure 2
a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the for metadata. I he ex
last block are the same size. The blocks of a file arc replicated for fault tolerance. The database. Sqoop divl
block size and replication factor are configurable per fi le. An application can specify the push the splits to the
the database.
6
C'8CS · JIM'\R/( J"'½' 2019
number of replicas of a file. The replicatioe . . _ can be specified at file creation time
and can be changed later. Files in HDFS -WIiie-once and have strictly 0111: writer at
, 191lllca1,...):
any time.
Illa,~-·-
The NameNode makes all decisions •& iis aq>lication of blocks. It periodically
receives a Heartbeat and a Blockreport from....,llle DataNodes in the cluster. Receipt
of a Heartbeat implies chat the DataNode at 2 1 « prop1:rly. A Rlockn:port contains
a I ist of all blocks on a DataNode.
Blocl. Reillc1lo11
-
Namenode (Filename, nurnRel)lcas, block-ids, ... )
/users/sameerp/data/pat-0, r:2, {1.3}, ...
/users/sameerp/dala/part-1, r:3, {2,4,5}, ...
odes Datanodes
.. -
Slllrt map-,educe._,_OIC;;;..;..,a,j
--t I
, __
','\
---·
_ _ ,.:t_ _ _ _ _ _ _
I I
I
wordcounl
ERROR
,--......,,_.,...._.,
__,
>CfflaJaM1rCo•
o:
I
I -0 J\
l __ - c -_ _
How do you run Map Reduce a■d M1 FE I ...... Interface (MPI) on YARN
architecture? Discuss. (10 Marks)
Figuro 2
w for Hadoop archltectu
(06 Mara
d manage multiple relat
analysis may require several
tput ofone job serves as the
d manage these work.flows.
ARN manages r~ources for
and control Hadoop jobs
(DAGs} of actio
types of Oozic jo . lld,ntw
:.... ' .• o,f ConWnerII I
· outcome-based decisioo
another cannot happen until
The ResourceManager has a central and global view of aJI cluster resources and,
therefore, can ensure fairness, capacity, and locality are shared across all users.
Depending on the application demand, scheduling priorities, and resource availability,
tch a set ofcoordinatorjobs. the ResourceManager dynamically allocates resource containers to applications 10
ing several types ofHadoop run on particular nodes. A container is a logical bundle of resources (e.g., memory,
apReduce, Pig. Hive, and cores} bound to a particular clust~r node. To enforce and track such assignments, the
and shell scripts}. Oozie also ResourceManager interacts with a special system daemon rwming on each node called
the NodeManager. Communications between the ResourceManager and NodeManagers
zie runs a basic MapReduce are heartbeat based for scalability. NodeManagers are responsible for local monitoring of
an error occurred, the job is resource availability, fault reporting, and container life-cycle management (e.g., starting
and killing jobs). The Resoun:eManager depends on the NodeManagers for its "global
view" of the cluster.
User applications are submitted to the ResourceManager via a public protocol and go
through an admission control phase during which security credentials are vaJidltted
and various operational and administrative checks are perfonned. Those applications
9
VIII Semt (CSE/ I Sf)
that are accepted pass to the scheduler and are allowed to run. Once the scheduler has
enough resources to satisfy the request, the application is moved from an accepted state Version 1 Apal
MapReducc ~
to a running state. Aside from internal bookkeeping, this process involves allocating
a container for the single ApplicationMaster and spawning it on a node in the cluster.
Often called container 0, the ApplicationMaster does not have any additional resources
at this point, but rather must request additional resources from the ResourceManager.
The ApplicationMaster is the ·•master" user job that manages all application life-cycle
aspects, including dynamically increasing and decreasing resource consumption (i.e.,
containers), managing the flow of execution (e.g., in case of MapReduce jobs, running
reducers against the output of maps), handling faults and computation skew, and
performing other local optimizations.
Above Figure illustrates the relationship between lhe application and YA RN components. Hadoo
The YARN components appear as the large outer boxes (ResourceManagcr and
NodeManagers), and the two applications appear as smaller boxes (containers), one dark
and one light. Each application uses a different ApplicationMaster; the darker client is
running a Message Passing Interface (MPI) application and the lighter client is running Write any four Busioe.
a traditional MapReduce application. 1. Customer Relations~
b. What do you understand by YARN Dbtributcd-Shcll? (06 Marks) customer becomes
Ans. Distributed-Shell is an example application included with the Hadoop core components sentiments of the cu
that demonstrates how to write applications on top of YARN. It provides a simple method also, expand the poo
for running shell commands and scripts in containers in parallel on a Hadoop YARN of marketing.
cluster. a. Maximize the ret
The distributed shell client allows an application master to be launched that in turn would points from data-
run the provided shell command on a set of containers. tuned to better re
This client is meant to act as an example on how to write yarn-based applications. b. Improve custome
To submit an application, a client first needs to connect to the ResourceManager win new custome
aka ApplicationsManager or ASM via the ApplicationClientProtocol. The on their likelihoo
ApplicationClientProtocol provides a way for the client to get access to cluster such as discounts
information and to request for. a new ApplicationId. manner.
Fortheactualjob submission, theclient first has to create anAppl icationSubm issionContext. c. Maximize custon
The ApplicationSubmissionContext defines the application details such as Applicationld customer should
and application name, the priority assigned to the application and the queue to which this a customer new p
application needs to be assigned. In addition to this, the ApplicationSubmissionContext increase revenue
also defines the ContainerLaunchContext which describes the Conlainer with which the opportunity to wo
ApplicationMaster is launched. and value, the busi
The ContainerLaunchContcxt in this scenario defines the resources to be allocated for the d. Identify and delig
Ap~licationMaster's ~ontainer, the local resources Uars, configuration files) to be made best customers can
available and the environment to be set for the ApplicationMaster and the commands to with greater attenti
be executed to run the ApplicationMaster. effectively.
Using the ApplicationSubmissionContext, the client submits the application to the e. Manage brand ima
Re~ourceManager and then monitors the application by requesting the RcsourceManager chatter about itself.
for an Application Report at regular rime intervals. In case of the application taking too nature of comments
long, the client kills the application by submitting a KillApplicationRequest to the 2. Health Care and Wel
ResourceManager. economies. Evidence-
run. Once the scheduler
oved from an accepted s ....,, ,_
~1
11
VIII Se,m., (CS£ I I S£) 'B~V~A~ C'BCS • JIM'\R./( J ~ 20
care management. Bl applications can help apply the most effective diagnoses and c. What is Confusion Ml
prescriptions for various ailments. They can also help manage public health issues, Ans.
and reduce waste and fraud.
3. Education As higher education becomes more expensive and competitive, it is a great
user of data-based decision-making. There is a strong need for efficiency, increasing
revenue, and improving the quality of student experience at all levels of education.
4. Retail organii.ations grow by meeting customer needs with quality products, in a
convenient, timely, and cost-effective manner. Understanding emerging customer
shopping patterns can help retailers organize their products, inventory, store layout,
and web presence in order to delight their customers, which in tum would help
increase revenue and profits. Retaih:rs generate a lot of transaction and logistics data
that can be used to solve problems.
b. Explain the star schema design or Data Warehousing with an example.
A confusion matrix
(06 Marks)
Ans. Star schema is the preferred data architecture for most DWs. There is a central fact table classification m~del
that provides most of the information of interest. There are lookup tables that provide known. The confusi
detailed values for codes used in the central table. For example, the central table may use terminology can be ci
digits to represent a sales person. The lookup table will help provide the name for that
sales person code. Herc is an example of a star schema for a data man for monitoring 6. a. Explain CRISP-D
sales performance Ans.
Item id Item name Region id Name
-- Prriod id Month/yr
Other schemas include the snowflake architecture. The difference between a star and
snowflake is that in the latter, the lookup table~ can have their own funher lookup tables.
There are many technology choices for developing OW. This includes selecting the right
The data mining I
database management system and the right set of data management tools. There are a
Mining (CRISP-D
few big and reliable providers of DW systems. The provider of the operational DBMS
I. The first and m
may be chosen for DW also. Alternatively, a best-of-breed DW vendor could be used. asking the right
There are also a variety of tools out there for data migration, data upload, data retrieval,
11ml data analysis. lead to large pa
selecting a dat
12
CBCS -JIM\,(!/(J~ 2019
t effective diagnoses and c. What Is Confusion Matrix. (02 Marks)
age public health issues, Am.
d competitive, it is a great
for efficiency, increasing
t all levels of education.
ith quality products, in a
~ding emerging customer
cts, inventory, store lay~
which in tum would hel
-ansaction and logistics
h an example. A confusion m~trix is a table that is often used II> dacribe the performance of a
(06 Ma classification model (or "classifier") on a set of test dlla for which the true values are
. There is a central fact tab known. The confusion matrix itself is relatively simple to understand, but the related
lookup tables that provi terminology can be confusing.
le, the central table may OR
p provide the name f?r t_
a data mart for momtonna 6. a. Explain CRISP-OM cycle with a neat diagram. (08 Marks)
Ans.
13
VIII Sett11 (CSE ( I SE) 'B~V~A~ L13CS · JIM'\.el( J u½' 2011
strong payoffs if the project is successful. There should be strong executive support
for the data mining project, which means that the project aligns well with the business Data mining is a method
strategy. large amounts of data t
2. A second important step is to be creative and open in proposing imaginative
patterns.
hypotheses for the solution. Thinking outside the box is important, both in terms ofa Data mining is usu;i
proposed model as well in the data sets available and required. business users with th,
3. The data should be clean and of high quality. It is important to assemble a team that engineers.
has a mix of technical and business skills, who understand the domain and the data. Data mining is the co
Data cleaning can take 60 to 70 percent of the time in a data mining project. It may process of extracting d
be desirable to add new data elements from external sourc~s of data that could help data sets.
improve predictive accuracy. One of the most impc
4. Patience is required in continuously cn$aging with the data until the data yields some of data mining tech
good insights. A host of modeling tools and algorithms should be used. A tool could detection and identific2
be tried with different options, such as running different decision tree algorithms. in the system.
s. One should not accept what the data says at first. It is better to triangulate the analysis
by applying multiple data mining techniques and conducting many what-if scenarios,
to build confidence in the solution. Evaluate the model's predictive accuracy with 7. a. What Is splitting va1
more test data. variable.
6. The dissemination and rollout of the solution is the key to project success. Otherwise Ans. Decision trees arc traine
the project will be a waste of time and will be a setback for establishing and supporting repeatedly split accordin
a data-based decision-process culture in the organization. The model should be homogeneous) in terms
embedded in the organization's business processes.
b. What do you understand by the term Data Visualizatio n? How It is Impo rtant in
...
Big Data Analytics? (OS Marks)
Ans. As data and insights grow in number, a new requirement is the ability of the executives
and decision makers to absorb this information in real time. There is a limit to human
comprehension and visualilation capacity. That is a good reason to prioritize and manage
with fewer but key variables that relate directly to the key result areas of a role.
Here are few considerations when presenting data:
I. Present the conclusions and not just report the data.
2. Choose wisely from a palette of graphs to suit the data.
3. Organize the results to make the central point stand out.
4. Ensure that the visuals accurately reflect the numbers. Inappropriate visuals can
create misinterpretations and misunderstandings.
5. Make the presentation unique, imaginative, and memorable.
c. Differentiate between Data Mining and Data Warehousing. (03 Marks) Three criteria for choo
Ans.
a) Which variable to u
Data Mining Data Warehouse variable for the fi _
Data mining is the process of A data warehouse is database system which is measures like least
analyzing unknown patterns of data. designed for analytical instead of transactional b) What values to use
work. age or BP, what vah
c) How many branch~
with just two branc
b. List some of the advar
14
,c strong executive support
Data mining is n method ofcompar;ns Duo, "'~huu5ing Is a method or centralizing
ligns well with the business
large amounts of data to finding right data from different sources into one common
patterns. reposit.or)
in proposing imaginati\e
!
mportant, both in terms of a
1uircd.
Data mining is usually done by
business users with the assistance of
Data w.-ehousing is a process which ne~ds to
occur ~fore any data mining can take place.
engineers.
ant to assemble a team that On the other hand, Data warehousing is the
nd the domain and the data. Data mining is the considered as a
process of extracting data from large process of pooling all relevant d~ta together.
data mining project. 1t may
data sets.
urc~s of data that could help
One of the most important benefits One of the pros of Data Warehouse is its
ta until the data yields some of data mining techniques is the ability to update consistently. That's why it is
hould be used. A tool could detection and identification of errors ideal for the business owner who wants the
decision tree algorithms. . in the system. bc!>t and la1eM features.
ter to triangulate the anal~s•s Module- 4
ting many what-ifscenan~s,
l's predictive accuracy with 7. a. What is splitting variable? Describe three criteria for choosing a splitting
variable. (04 Marks)
o project success. Otherw_ise Ans. Decision trees arc trained by passing data down from a root node to leaves. The data is
r establishing and supporting repeatedly split according to predictor variables so that child nodes are more "pure" (i.e.,
ltion. The model should be homogeneous) in terms of the outcome variable. This process is illustrated below:
Root
on? How it is important in
(OS Marks)
1 theability of the executives
e. There is a limit to human _./' , root split
.. :':--$···
.- ~
ason to prioritize and manage
result areas of a role.
st
1 child split /
. _/
'
nd
2 child spltt
~
/ \
s. Inappropriate visuals can
15
VIII Semt ( CSE ( ISE)
Ans. Advantage:
I. Regression models are easy to understand as they are built upon basic statistical
principles, such a~ correlation and least square error.
2. Regression models provide simple algebraic equations that are easy to understand
and use.
3. The strength (or tho: goodncs,; of fit) of the regression model is measured in terms
of the correlation coefficients, and other related :.tatistical parameters that are well
understood.
Disad,•antage:
I. Regression models cannot cover for poor data quality issues. If the data is not
prepared well to remove missing values, or is not well-behaved in terms of a normal
distribution, the validity of the model suffers.
2. Regression models suffer from collinear problems (meaning strong linear correlations
among some independent variables). If the independent variables have strong
correlations among themselves, then they will eat into each other's predictive power
and the regression coefficients will lose their ruggedness.
3. Regression models will not automatically choose between highly collinear variables,
although some packages attempt to do that. Regression models can be unwieldy and
unreliable if a large number of variables are included in the model. All variables
entered into the model will be reflected in the regression equation, irrespective of their
contribution to the predictive power of the model. There is no concept of automatic
pruning the model. 8. a. Explain the deslg,
c. Create a decision tree for the following data set. Ans. Design Principles
Credit Loan Approved • I. A neuron is th
Age Job House
element) recei11
Young FALSE No Fair No
weighted com
Young FALSE No Good No output value, a
Young TRUE No Good Yes the inputs, w's
Young TRUE Yes Fair Yes
Young FALSE No Fair No
Middle FALSE No Fair No
Midd°le FALSE No Good No
Middle TRUE Yes Good Yes
Middle FALSE Yes Excellent Yes
Middle FALSE Yes Excellent Yes
Old FALSE Yes Excellent Yes
Old FALSE Yes Good Yes
Old TRUE No Good Yes
Old TRUE No Excellent Yes
Old FALSE No Fair No 2. A neural net\\
Then solve the following problem using the model. output neuro
structure wo
16
Age J ob Houe Loan Approved
Young FALSE No ???
--
Yes
- Y
Yes
Yes X3 - W3
Yes
Yes /w,
Yes X4
Yes
No 2. A neural network is a multilayered model. Then: is II least one input neuron, one
output neuron, and at least one processing neuron. An ANN with just this basic
structure would be a simple, single-stage computational unit. A simple task may be
17
VIII Sem, (CSE ( ISE) CBCS · J(;(..t'\£11J«½t 201
processed by just that one neuron and the result may be communicated soon. ANNs,
however, may have multiple layers of processing elements in sequence. There could Scru, I> f or
cnun\ of c:.ch
be many neurons involved in a sequence depending upon the complexity of the c:u1d1d~1e
predictive action. The layers of PEs could work in sequence, or they could work in
parallel.
•1 ~
' Wm , w,11 Neuron 11 ~ HJ.- ~
.,.)_:t,-,~1-J,-,~,,
('l
Ki w Wu, ~Neuron
~~~
W, 11 Neuron., Ct:nctntr C2 llcn>>et
wm' _ !\ / Y!.m cnnd1dntc1 hum r..1 { l1, 1_)
Kl Wm w.,~ j I
In /12 : ~:ll Neuron
13
w,u \11 . 15}
{12 , 13)
(12, 14)
w"1 I rn In
{12.15}
\13. 14}
(IJ. •~J
(14. I~)
Input Neuron layer Hidden Layer Output Neuron Laye r
3. The processing logic of each neuron may assign different weights to the various
incoming input streams. The processing logic may also use nonlinear transformation,
such as a sigmoid function, from the processed values to the output value. This Vcnt!r.lfe (' ~ tnu.cr
ca.uct1<1,1,- h om ( II . 12. n )
processing logic and the intermediate weight and processing functions are just what l.1
works for the system as a whole, in its objective of solving a problem collectively.
Thus, the neural networks are considered to be an opaque and a black-box system.
4. The neural network can be trained by making similar decisions over and over
again with many training cases. It will continue to learn by adjusting its internal 9. a. What is Na'ive Bayes 1,
computation and communication based on feedback about its previous decisions. Ans. It is a classification te
Thus, the neural networks become better at making a decision as they handle more independence among pr
and more decisions.
the presence of a partic
b. How does the Apriori Algorithm work? Apply the same for the following example. feature.
TIO List of Item-Ids For example, a fruit ma
T I00 11 , 12. 15 inches in diameter. Even
the other features, all oft
T200 12, 14
this fruit is an apple and ,
TJ00 12, 13 Naive Bayes model is e
T400 11,12,14 Along with simplicity, N
T500 I 1,13 classification methods.
T600 12,13 Bayes theorem provides ~
and P(xlc). Look at the e
noo 11 ,13
T800 11,12,13,15
T900 11,12,13
Assume the support count- 2.
A ns. (08 Marks)
·18
cs -J~( J~ 2019
-
t.:i1\;:ll lh."BlJ.CI Sun .u11111
in si:quence. There could !fl I r,
on the complexity of the
coun: of eoch
cand1<la.~ {1'2) 7 ~_,
l..~C.-• {I I)
{12)
(\
7
(13) 6 !IH t,
ce, or they could work io {14) 2 {hi} !
1151 2 !IS} 2
Genern1c C,
r,
llcmM:I Senn D for
•1 /.,
llcmm Su11..- c,.,.,r:irc cand1dotc llemset Sun count
-
Neuron31 cn11d11Ja1c., (,.,,;, l1 I II. 12 J coun1 of each (II , 12) 4 'iaf"P1rt cuunl wi1h { II, I~) ~
4
2
2
to the output value. This canJ«lll~\ ",,m {II. 12. ll) coun1 or c>ch {II. 12, 13) 2 minimum <Uppon fl 1. 12. 13) ?
l,1 cond«J:u~ count
· ng functions are just what {11.12.15} -
{11.11.15} 2
.
(11.12. JS} 2
ing a problem collectively.
and a black-box system. Module- 5
decisions over and over
n by adjusting its internal What is Na'ive Bayes Technique? Explain its model. (05 Marks)
out its previous decisions. It is a classification technique based on Bayes' Theorem with an assumption of
ision as they handle more independence among predictors. In simple terms, .a Naive Bayes classifier assumes that
the presence of a particular feature in a class is unrelated to the presence of any other
feature.
or the following example,
For example, a fruit may be considered to be an apple if it is red, round, and about 3
inches in diameter. Even if these features depend on each other or upon thi: existence of
the other features, all of these properties independently contribute to the probability that
this fruit is an apple and that is why it is known as 'Naive'. ·
Naive Bayes model is easy to build and particularly useful for very large data sets.
• Along with simplicity, Naive Bayes is known to outperform even highly sophisticated
classification methods.
Bayes theorem provides a way of calculating posterior probability P(cJx) from P(c), P(x)
and P(xlc). Look at the equation below:
(08 Marks)
29
VIII Sem- (CSE I I SE) CBCS - J ~ / J ~ 2
l.i-elmood 1 N
-:,w1 w+CL s,
- za; l
P( c I-") = P(x I c) P(c) subject to the constrai
I P(x) yc(w1 ¢{~J +b)~1
POS-tt:flOr Prob.ab~ o,d1ctorPr~P,obabRrty where C is the capac
represents parameter
training cases. Note t
variables. The kernel
Above, feature space. It sho
P(clx) is the posterior ·probability of class (c, target) given predictor (x. attributes). Thus, C should be ch
P(c) is the prior probability of class. · CLASSIFICATION
P(xlc) is the likelihood which is the probability of predictor given class. In contrast to Class
P(x) is the prior probability of predictor. minimizes the error ti
b. What is Support Vector Machine? Explain its model. (08 Marks) 1 1 .,
Ans. In a nutshell, a support vector machine (or SVM) is an algorithm that works as follows. It
- w 1 w-vp+-~
uses a nonlinear mapping to transform the original training data into a higher dimension.
2 .v;=i
subject to the constra•
Within this new dimension, it searches for the linear optimal separating hyperplane
(that is, a "decision boundary" st:parating the tuples of one class from another). With an Y;(w1 ¢\x,)+b)~
appropriate nonlinear mapping to a sufficiently high dimension, data from two classes In a regression SVM,
can always be separated by a hyperplane. The SVM finds this hyperplane using support variable yon a set ofi
vectors ("essential" training tuph:s) and margins (defined by the support vectors). that the relationship
y deterministic functio
Regression SVM
y = f(_x) + noise
The task is then to fi
the SVM has not bee
model on a sample s~
above), the sequential
this error function, t
REGRESSION SV
For this type of S VM
1 .\'
X -w1 w+cL ,:, +
To construct an optimal hyperplane, SVM employs an iterativt: training algorithm, which 2 r• l
is used to minimize an error function. According to the form of the error function, SVM which we minimizes
models can be classified into four distinct groups: . w 1 ¢(x 1)+ b - y 1 :5'
I. Classification SVM Type I (also known as C-SVM classification)
2. Classification SVM Type 2 (also known as nu-SVM classification) Y, -w1 ¢(x, )-b, :S
J. Regression SVM Type I (also known as epsilon-SVM regression) ~1. ~,· 2::0,i = 1,.... .
4. Regression SVM Type 2 (also known as nu-SVM regression)
Classification SVM REGRESSION SV
CLASSIFICATION SVM TYPE 1 For this SVM modd,
f or this type of SVM, training involves the minimization of the error function:
20
I V
.,
-
~.,,,
-w'w+C~ ,,,
t• l
subject to the constraints:
y,fw'9(x.)+b)~1-s,and{,~O •l ,.V
where C is the capacity constant, w is 6e wector of coefficients, b is a constant, and
represents parameters for handling no,asq.aable data (inputs). Thi;: index i labels thi;: N
training cases. Note that represents the class labels and xi represents the independent
variables. The kernel is used to transform dala from the input (independent) to the
feature space. It should be noted that the larger the C. the more the error is penalized.
ictor (x. attributes). Thus, C should be chosen with care to avoid over fitting.
CLASSIFICATION SVM TYPE 2
ivcn class. In contrast to Classification SVM Type I. the Classification SVM Type 2 model
minimizes the error function:
} T 1 X
(08Ma -w w-vp+- L "'·
m that works as follows, 2 N 1-1 .,,,
ta into a higher dimcnsi sub)ect to the constraints:
mat separating hyperpl .vilw' 9Cx:)+b)~ p- ~t' !t ~ 0, i =l, ...,X and p ~o
lass from another). With
In a regression SVM, you have to estimate the functional dependence of the dependent
ion, data from two cl variable yon a set of independent variables x. It assume:,, like other regression problems,
is hyperplane using sup that the relationship between the independent and dependent variables is given by a
the support vectors). deterministic function f plus the addition of some additive noise:
Rcaresslon SVM
y = f(x) + noise
The task is then to find a functional form for f that can correctly predict new cases that
the SVM has not been presented with before. This can be achieved by training the SVM
model on a sample set, i.e., training set, a process that involves, like classification (see
above), the sequential optimization of an error function. Depending on the definition of
this error function, two types ofSVM models can be recognized:
REGRESSION SVM TYPE 1
For this type of S VM the error function is:
J X X
X
-w'w+C""
') t
£-. . +CY ..... ~Jt • 1
- ,-1 t•l
ve training algorithm, which
which we minimize subject to:
of the error function, SVM
w 1 ¢(x:)+b-y1 ~-E+,:;
Y, -w'¢(x,)-b :Se+::,1
Zt
VIII Sem, (CSE/ IS'f)
CBCS · JIM'\£/( l!M:1
.!..wr
2 .'llf ,.. 1
f
w-c( ve+..!... k +c;:,· )1
) I
10. a. Explain briefly t
which we minimize subject to: Ans.
(wr¢(xi)+b)-yi $e+{,
y, -(w1 ¢(xi)+ b,} ~ E+ ~•;
~ 1. ~ · ; ~ O,i = l, ..., N ,e ~ 0
There are number of kerne.ls that can be used in Support Vector Machines models. These
include linear, polynomial, radial basis function (RBF) and sigmoid: Web Conte
Kernel Functions Using HTM
X 1°Xi Linear
Web Content
f , \- (rxi •X j +C)d Polynomial A website is de
K X i, X i .- ( )
locator). A large
. exp -rl X 1 - X ; I
2
RBF
are managed usi
tan1i(rx1.x 1 +c) Sigmoid audio, video, fo
content. The wet
where K(X;.Xi)- 4l{X1)• ~iJ
of these request
that is, the kernel function, represents a dot product of input data points mapped into the and application
higher dimensional feature space by transformation , pages on a webs
Gamma is an adjustable parameter of certain kernel functions.
pages could be t
The· RBF is by far the most popular choice of kernel types used in Support Vector altogether. Simih
Machines. This is mainly because of their localized and fi nite responses across the entire more fresh and i
range of the real x-axis.
Web Structure
c. Mention the 3-steps process of Text Mining. (03 Marks) The Web works t
Ans. page can create
Text mining is a s~miautomated process. Text data needs to be gathered, structured, and
then mined, in a three-step process. Web lends itselft
I. Tho text and documents are first gathered into a corpus and organized. also be analyzed
2. The corpus is then analyzed for structure. The result is a matrix mapping important Web Usage Min·
terms to source documents. As a user clicks a
3. The structured data is then analyzed for word structures, sequences, and frequency. entities in many I
Ter'?•document matrix (TDM): This is the heart of the structuring process. Free the web server pr
Howmg text can be transformed into numeric data, which can then be mined using entities between t
regular data minin techniques. too, would record
data generated th
£stablish the Structure using
Term Document Mine TOM for data stored in ser
Corpus of Text:
Matrix (TOM): Patterns user characteristi •
Gather syndicated data. F
documents, Select a bag of ·Apply data data, arc also gath
clean, prepare for words, compute mining tools like b. Compute the ra
analysis frequencies of classification and
cluster analysis FigQlO(b), Whic
occurrence
OR
a. [ 1plain briefly the three different type .r"" mining. (06 Marks)
( web-.] =c::::::
ector Machines models. These
dsigmoid: Web Content Mining
Using HTML pages
Web Strudlft Mlling:
Using URL links
Web Usqe Mining:
Using visits, clicks, logs
I
Web Content Mining
A website is designed in the form of pages with a distinct URL (univcr5al resource
locator). I\ large website may contain thousands of pages. Those pages and their content
arc managed using content management systems. Every page can have text, graphics,
audio, video, forms, applications, and more kinds of content, including user-generated
content. The websites make a record of all requests received for its page/URLs. The log
of these requests could be analyzed to gauge the popularity of those pages. The textual
and application content could be analyzed for its usage by visits to the website. The
ut data points mapped into the pages on a website themselves could be analyzed for quality of content. Th«: unwanted
pages could be transformed with different content and style, or they may be deleted
unc tions. altogether. Similarly, more resources could be assigned to keep the more popular pages
types used in Support Vector
more fresh and inviting.
nite responses across the entire Web Structure Mining
The Web works through a system of hyperlinks using the hypertext protocol (http). Any
(03 Marks) page can create a link to any other page. The intertwined or self-referral nature of the
to be gathered, structured, and Web lends itself to some unique analytical algorithms. The structure of web pages could
also be analyzed to examine the structure ofhyperlinks among pages.
and organized. Web Usage Mining
·s a matrix mapping important As a user clicks anywhere on a web page or application, the action is recorded by many
entities in many locations. The browser at the client machine will record the click, and
s, sequences, and frequency. the web server providing the content would also log onto the pages-served activity. The
the structuring process. Free entities between the client and the server, such as the router, proxy server, or ad server,
ich can then be mined using too, would record that click. The goal of web usage is 10 el\tract useful information from
data generated through web page visits and transactions. The activity data comes from
data stored in server access logs, referrer logs, agent logs, and client-side cookies. The
MlneTDMfor user characteristics and usage profiles arc also gathered directly, or indirectly, through
Patterns syndicated data. Further, metadata, such,as page attributes, content attributes, and usage
data, are also gathered.
-Apply data
mining tools like b. Compute the rank v11lues for the Nodes for the following network shown in
classificat ion and FigQJ0(b), Which is the Highest ranked node. Solve the snme with e ight ilcn11lons.
cluster analvsl~ (10 Marks)
23
VIII Set111 (CSE/ ISE)
Ans.
Ra Rb Re Rd
Ra 0 0.50 0 1.00
Rb 0.50 0 0 0
Re 0.50 0.50 0 0
Rd 0 0 1.00 0
Variable Initial Value
Ra 0.250
Rb 0.250 .
Re 0.250
Rd 0.250
u C C C g C C C C
:;:;
.!! ~1 -~ -
0 0 0
-~ ..... .! .... "!..,. ·s 0 0
·.:i
E so
0
-~"
_g
~ ..
NET
V>
]~
~ 1! l! ~ ~ -= ~ ~
Rd
1.00
0
SUNSTAR
0
po 0
C: C:
0
g
0
-~ "' -~ .... -~ 00
6
~
0.332
~
0.334
~
0.333
NETWORK MANAGEMENT
64 0.168 0.167 0.167
50 0.250 0.250 0.250
50 0.250 0.250 0.250
.33.
(VIII SEM. B.E. CSE / ISE)
. SYLLABUS
NETWORK MANAGEMENT Eight
li\S rl'll <.1101('1 OASLD <.RU)IT SYS II M (c:tKS) S( 111 M~)
terrl'C-IIVI FROM1111 A(ADl~11l'YI \1(21Jlt\-W17)
Subjrrl C'odr •~< ~8,; I\ \l:1rk} 2Q
,umbrr of I ntun· lloun/'I\ ,-ck UJ L ,:on \ 1...rl..., ~o Time: 3 hr~
Iota! :-.umhn 11r l.rcturc llour, ~o f,:om 111/Ul'll 113 'iotc: fmwu ""Y r11 ·
MODULE 1
Introduction: AnalO!l' ul kkphun~ Nc11,ork M,10,1!!,· 11cm. IJ,ir., aml ·1ck.iw1111nmi.:,11iu~ Scl\\ork 1~1,:nt,111,-J
compulin!! r"' ironm,nts I Cl' 11'11:i;;c,I ~Cl\\ ,rk, I he lmcrn,·1,rn.l lnlr,m:K <.:c•mm11111,,111un, I r..1..,,,1s.,ndS1,,ml,1nls- I a. What :ire the 1:0:11.
('uminunu;ation t\r..,;I it~~ hm::-.. 1•it..)h 1,,;,,1 I .i: . "mt S-:1, 11.:~'i (. ·''"' ~ 11:-,h•ric:-. ul \!t,..,,.,rk1ng tiOd ~vtina~~mL'nl I I
lmpurlancc uf wpulo"~ . r:;1tcn,1,! D1>c • "' Hc,l11cc I uaJ <111 ~ •.le. ',omc (.\m'.1111111 Nct~vurl._ l'rohkms· l_hallcn~, Ans, The goals of nctwo
of Inform.Ilion kd1 .11h1g.1 Manag.er,, Nc11111rl. :-..1a11,1)!c111cn1 Ciuab. Org.an11,111u11. :,nil hmc11011,- G1>al ol Ncl\\Ork
IT services ,1 ilh a q
\lanag.cmc111. Nci" rk Pr,,· i,mmn:!.. t,;,1,H1rk Or,•r.1111111, anll the MX ·. 1'cl\\1•rk ln_,1allat10n and M.unlcnan,,:
~el\\Url. and S: ,, >-l.111.1g,·11,1,1 :S:c11,u • M~ •.:mcnl S"1"111 pl.ufor111. ( urrcnt StalU~ ;uni I ulur.: ol :S.,l\\ui\ The function!> of nc
Mana~cm.:111.
'VIODULE 2
Basic ~·u1111da1i1>1 , ,!,ml,. \h•1'cl~. an,! I ·11:,ll,I!(, ,·1111111. M,magcm.:111 Stan,lards.
11.111,kl. Oriµini1.o•• '.' I. lnli.,, ,: t· ,n \h•lcl \fan ,;,,ment Information lr.:cs. \!anag.,•,I Ob1cc1 l'cr-1-.:,ti,,
Communkalion M• ,1,, \Sl'-:. •-".:mt •I ,. S"n'l\ils. :in<l Con,cmion,. Objcci, .ind Data l~pc,. ObJ,'ct ,\.,mes. Aa
fo,:imph: ol'ASN I from hO li'P I: l.n,ud1n:-, '.'itr- e111rc. M:,cro, ! 1111,11,,nal \10\lcl
MODULE 3 Nc1,,ork
SNMl'vl Ncl\,01k :-..1anag.:rn~11t: M,111agcll N,·1,,ork I he lli,tN) ol SNl\11' l\1anagcmenl. lntcrni:l Org,ini7,lllllll5 Prm isioning
and \landards. ln1e111.:1 D,,cum-·111,. I he S!\i\ II' \10\lcl. I he O~;mi1.i1ion Mo1.kl. S~ ,tc:m O, en ic1,. The lnli11m:11iua
Model - lntrodu,1ion. I h,· S1111 111r • of M.1n,1;_,cn11·n1 lnform~ll,•n..Mana11,cd Obje,1,. Mana11,cmen1 lnfurmalion l\,1-.: Planning
'I he SNMP Communicalinn Mn,td I'll, S'-~il· h:hi1,e111:-c. /\dministrati,c: M01lc:I, SNMP Spc,ificalion,. SN\11' Ocsign
Operations. SNMI' 1\1111 (iroup. Func111•nal Mo,kl SNl\11' M.m.1!(cmcn1 - RMON ; l{cmnh: Monitoring,. Rl\l(JN Sl\11
and MIB. Rl\10NII- RMO'NI 'lc:-.111.il (\,mcnti,,ns. RM(•:-.11 (irm1p, and F11nc11ons. Relationship lkh,c:IXI Control
and O~ta !'ables. UMON I ('omnwn and l:1hcrn,·1 l,ruup,. R\10N Iokcn Rmp F,t<!nsion C.rc111,,,. ltMO;s:2 rtw
!{MON~ Man.,gcmem Information 11.,sc. !l\lO:-.:? Confonnanec Sp,:~1lka11011,.
MODULE4
Broadband A'-css Nct"or\..,. flro.1db,m.f ,\cccs, l'cchnnlv!l). I IF(.' I' Tedmolot!): l'hc 13roadband I .AN. The Cable
Modern. Th.: C:ahlc Mo1lcm Tcrn1in.11ion ~: ,1e111. The Ill'(.' 1'1:inl. I he 1(1' Spectrum forC,hle fllodem: 1)31n Ch er Cable.
Rcfcn:m..-c Arehi1cc1ure: I IFC Managcmcn• - Cat-le Modem and lM IS \olanagernenl. I IF(' l.inl. M,111;1gcmcnt. Ill
Spcctnim Management OSI. li:ehnulog~ ,\,~ mmeirie D1g11.1I Suh,cnl>,·r I.in.: k,hnolug~ -Role ul the /\D',f \cc.:.-
Ne1,\ork in an 01crall Network, /\D!'it. t\rchitccturc. ,\DSI { 'h,umcling ~chcmc,, 1\DSL lncod111g Scheme,. i\l>SL
Mnnngcmenl -/\DSI Network Manag.cmenl Elcmen1'. /\DSI. ( unligura1ion 1\111nagcmcn1. 1\0SI F,,ult Manni:cmcnt.
/\OSI. l'crformanc.: Managcmenl. SNML'-llaseli ADS!. I.inc MIil. MIU lmcsra1111n "ilh lmcrl'acc, Group, in Mlll-2.
ADS!. (;011figur;1tion Profiles
MODULES
Network Management \pplicmion,· Conli-Jura1mn Man.,gcrncnt- Nct\\orl.: Provisioning. lmcmol) \fanag.cmcnt.
Network 'lopolog). Fault Managcnicn1-l'auh Dc1cct1vn. Faull I o,,llion an,I holation 24 techniques. Pcrform<1ncc
Mnnng.cmcnl - Perfonnance Metric,. Data Moniluring. l'rnhlc111 bolmion. l'crl<mnanec Sta1i,1ic,. 1:.,ent Corrdalion
leehmques - Ruk-lla,cd Rea,unmg. :'1111,kl-ll:,,cd Re;iwning. C'.1'ellased Re.1,unmg. Codd,uok corrcl;nion \1<'<ld.
Stale 1 ran.,ition Graph \.10\lcl l'inite State \fachinc l\foJcl. Securi1~ M,tn;igcm,nt - l'olic,~, a11d Procedur.:s. Sccuril)
Rn:a chcs ,md the Hc,nurlcs N~cdc,111, l're<ent lhcm. I irc11,1II,. C'l")ptoi:r.1flh,. /\uthentieation ;,nd Au1hori1.nion
ClicnVScl'\cr /\uthen11c;llion S)slcm,. Mcss;iec, lr.rn,li:r Scc11ri&). Pro1cc11u~ of Network., from Virns A11:1ck~
i\~cuunttnl? Mana1,1cmcn1. R~pon I\ lnn11gcme111, l'ol,c)- Based M,urngcmcnl. Ser\'i,c Le, cl Manugcmcnt.
Eighth Semester B.E. Degree Examinntion,
C BCS - Model Q uestion P:1pcr - 1
211
NETWORK M ANAGEMENT
80
Time: 3 hrs. M11x. !\,, rks: 80
Nole : Am over 1111y Fl V/:,,f11ll q11eJtiom, 1e/efti11g ONEf ull question from l!(lc/t module.
OJ ..-,:
Module - 1
I a. What 11rc the goals, org1111iz11tio11 and function of Network m:1n11gcm,·11t?
(I~ Marks)
Ans. The goals of network management is to en~urc 1ha1 users of a network a1, provided
IT services with a qualities of services.
The runc1ions of network management arc
Ncl\,ork
Management
k Nc1work MJn.,J~
agc,1 Object Pcrs1•~d
.t Types. Obj.:cl N,1111.:s.
3
VIII Se,m, ( CS£/ISE) Network, M ~ C13CS · Mode.l.,Q~
Regional
Netwo,k :-nivi<Pn;,: .. ,rn, isls of nt'lwo,k planning and design is the responsibilities centre class
of engiuc..:nng ~rnup. t: 0 ha!. defined fiw OSI network mam1gement application switch
which are fault. confi~t:~,llion. performance. securities and account management.
Trouble admini~tration 1•, th1· administrati,c i-1 of fault mnnagcm..:nt and is used
to track problem in the nel\,ork. When e\et there is a service failure. it is NOC's Sccrional
responsibility to restore service as soon as possible. centre class
Data need tu be gathered by NOC and kept updalcd in a timely foshiun in order to 2 switch
perform some of the nbu, c h,nctiom,. Security management can cover a vel) broad
range of securit). Primary
b. Explain common netwo rk 1>roblcm (08 Murks) ccn1rc cl,1ss
Ans. The most common and serious problems of network's arc connectivity failures, 3 switch
which are in category of fault management. F•lt is the mean to failure in accessing
network and systems by user. The network failure is caused more often by a node
Toll centre
failure than by failure of passive links. class 4
The node failures are limited lo specific interface failures. When this happens, switch
all down streams systems from that interface are in accessible. Such failures arc
associated with network interface card which needs replacement.
The node fai lures manifest :ts connectivity railures to the user. These are networking End off
tools available to localize the fault. Another cause of network connectivity failure is class 5
procedural. Network connectivity is based on IP address, which is a logical address switch
assigned by the network administrator. Mistakes are mnde in assigning duplicate IP
address. ~
Voice
A host or s~stem !nlerfocc problem in a shared medium can bring entire segment
down. The 1ntcrm 111cnt problems could also occur as a result of traffic overload
which causes packets to be lost, Performance monitoring tools can be useful i~
tracking problems.
4
C'BCS - MoctevQ~POtpe-r - l
Power hits could reset network component configuration. causing network failure.
The network has a permanent configuration and a dynamic configuration and a power
~it could change configuration.
A performance problem could manifest as a network delay and is an annoyance to
network management. They need to separate the network delay from the application
or application processes delay and then rectifies the situation.
onfiguration dat With the ever increasing. si.:e of the n..:twork's and the connectivity 10 the internet.
TT Restoration security violation is a frequent problem.
OR
2. a. With diagram, CXJllain airnlogy of telephone network management.
Ans. The telephone network is reliable and dependable and the quality and speed of
connection is gud.
The key is the management and operations of the network. The architecture of
telephone network is hierarchical.
To other
Regional centre
Regional Regional Sectional centre
ign is the responsi~i lit_ies
centre class 1------~ centre class Primary centre
Toll centre
switch I Switch End ofllccs
management applicat1on
nd account managcmenL To other
management an<l is used Sectional Sectional Primary centre
centre class Toll centre
rvice failure. it is NOC's centre class End offices
2 switch
timely fashion in order to
ent can cover a very broad To other
~m~ ~m~ Clan 4 toll points
centre class ci:ntre class End offices
(08 Marks)
arc connectivity failures,
_ 1---,1,;....--►I 3 switch
.._;..._.,..... _,
s
CBCS · Mo-d.citQ
VIII Se,m, ( CSf/ISf)
The message in e
There arc five le, els of network S\\ itches and three types of trunks that connect (PDU 's) "hich co
these switchi:s. A trunk is .1 logical link b/ " two ~witchc~ 1ha1 ""'> lnl\ erse one or PCI comain~ hen
more ph)~ical links. ('ht: c:nd office the lo,,cst in the: hic:rarchy. is the local switching.
a service provide
office. I he other four le,els of switches (class 4 through class I) arc to switches
that carr) toll call!>. /\ din.:ct tnmk connects two end omces. a toll connecting truak
connects an end ollice to any toll office: and a toll trunk connects any two offices. 3. n.
Operation ~· r 'NI :.ystcms ensure the qualtl} of service in thc tt:lephone network. Ans.
The,· con"t::ntlv monitor ,·nrious parameters of the network. The quality of call is representation o
me;surcd in te~ns of sign.ii to noise (s/\\) ratio. is measured regularly b) n trunk management inti
maintenance !lystem. For n given rt:g.ion. tht:re is a network operatiuns centre (NOC) bm,c to descri
whert: the global status of the netnork is moniton:d. Thc NOC is the nave centre management inf
of telephone network operations. To manage a network remotely, i.e. to monitor information stor
and control the network components from a central location nel\\ork management agent and m:ina1
function need to be considered in building the components of the network. MIB associated
b. Explain OSI lnycrs .mtl services (08 \\forks) manager is dcsi ,
Lnycr No L:l)'Cr ·amc Salient features pro,·ldctl b) layer on all the netwo
onl} its local inti
Transfers to and ga1hcrs from the physical medium raw
The manager has
bit data
Physical base (MIO). M
I landl~ physical and electrical interfaces lo the configured value
transmission medium
and contains inf
Consists of two sub layers. Logical link control (LLC)
2. Data link
and media access control (MAC)
LL~: Formats the data to go on the medium. Perform
~oaj
error control and flow control
MAC: Controls data transfer to and from LAN resolves
conflicts with other data on LAN.
3. Network Form the switching/ routing layer of network
Multiplexes and de- multiplexes message
application
Acts as a transparent layer to application's and thus
4. Transport isolates tht:111 from transpon system layc:r~
Makes and breaks connection for connection oriented
commun1cat1on Net\\ orl.. clemcn
Controls flows of data in both directions. programs, algori
Establishes and clear sessions for applications and thus contact person a
5. Session
minimizes loss ol data during last data exchange.
b. Explain the tcr
Provides a set of standard protocols so that the display Ans. ASN I is based o
6. Presentation would be transparent to systcm of the applicallon of back us nonna
Data encryption and deeryption <name>:: <de 0
6
kt M~
The message in each la)er is contained in message units called protocol data units
of trunks that connect (POU's) ~hich consists oft,\O pan~- protocol control info (PCI) ancl user data (lJD).
al ma) 1n1, erse one or PC\ contains header 111fo about the la~er. lJI) contain:-. the data thM the layer acting a~
y. is the local sv,it~hmg a sen ice pro, idcr receive:. from or transmitter 10 the upper layer / scr, ice user ln)er.
lass I) are to switches Module - 2
a toll connecting trn;i\i. 3. n. Exphiin infornrntion model (08 Marks)
ccts any two offices. Ans. Information model is concerned with structure and storage of information. The
the tl!lcphone networ~. representation of objects and information relevant lo their management form the
. The qualil~ of cnll ,s management information model. The information model specifics the information
eel regularl) by a trunk base to describe manai;ed objects and their rclatiom,hips. l'hc :,tructurc ol
operations centre l OC) management information (SMI) dcfinl!s the syntax and Sl!manticc, of management
NOC is the nave cc1~tre information stored in the management information b,1se (MIR). MIB is u<,cd b> hoth
motely. i.e. to monitor agent and management process to store and exchange managl!mcnt information.
ion net\\ork management MIB associated with an ngcnt is called ag.1::nt MIB and the MIB associated\\ ilh a
of the net\\ ork. manager is designated as the manager MIB. A manager M1B consists of information
{08 Marks) on all the net\\ork components that it managcc;, where as agent MIB needs to know
ovillctl by luyer only its local information. its MIB view.
The manager has both the management database ( MDB) and management information
, the ph) sical medium raw
ba:,c (MIB). MD0 is II r~·nl databn5\: and \:Ontains 111c::asurcd or acl111inistrati\l:ly
configured value oft:lemc11t~ of network. On the other hand, MIB.is a\ irlual database
and contains infornrntion necessary for proces~ to e.,change information.
Logical linl-. control (Ll.C)
AC)
~0€)-----1 Manager 1----~0~
, ,o on the medium. Perform
rol ·
er to and from LAN resolves
l LAN.
ng. layer of nch, ork
lllhiplexes message
MOO= Management database
er 10 application's and thus Mll3= Mnnagement information ba~e
Manuged objects
rt system hi) ers
tion for connection oriented
IIJ!llliit! =- Agent process
Network clements: Hubs, bridges, roulers, transmission facilities sollware pro..:ess
,oth directions. programs, algorithms, protocol functions. dntabascs administrative informntio11,
ions for applications and thus contact person account number.
ing last data exchange. b. Explnin the terminology, symbols i1nd con,•cntion of ASN I (08 Marks)
protocob so t!lat the display Ans. ASN I is based on Backer S) stem and uses th.: formal S) stem languag.t: and gr:unmar
stem of the arplication ofbackus normal form (BNF) which looks like
ption <name>:: = <definition>
cific protocols for each Where the notation <entity> denotes nn <entity> and the symbol ::= represents
port protocol system. "defi lll!d as·•.
7
VIII Se.m; (CSE(ISE) Networ-~ M ~
8
an operation entity <OP>
SN MP
managed
nd construct <number>
objects
•nmcnt.
There are amplifiers on tlw outside cab Ii: plant. \\ hich do not have built Ill SNMP
agents. The outside c:ibk pl:int uses some of e11.is1ing cable lechnulog) and has
LSE arc called keywords. monitoring tools built into it.
EGER,REAL,NULL~d Alternatives : CHOICE
List: SET:md SEQUENCE
Replication : SET or- and SEQUENCE OF
ASN - I symbols
(08 Marks) Symbol Meaning
raw data from agents and
Dc!incd as. or assignment
ng the events and calculating
mote monitoring) is inserted Or, alternatives, options of a list signed number
ger. The RMON function. Signed number
e pure SNMP management Following the symbol arc comments
f I
I! end ofa list
S1.111 nnd
() Start and end ora sub type
Range
ASN - I Keywords .
Key\\ord Arief description
BEGIN Start ofASN - I module
CHOICE List of alternative
DEr-INITIONS Definition of a data type or managed object
END End ofASN - I module
ommunication management EXPORTS Data types can be exported to other module:.
IDENTIFIER A sequence of non negnti, e numbi.:rs
IMPORTS Data types defined in external modules
INTEGER Any negative or non m:gative integer
NULL A pl,1cc holder
9
VIII S0t111 (CSE(ISE)
Appliration ommunicatio,
Aprlication PDU Version SNMPPDU Data
header units N/w
Gathi:rin ,
Transpon PDU IUDP Meader! SNMPPDU
Network POU
I IP IMeader Transport POU
Data ]
- fnh:rnal
S1a1is1ics
Enternal I li!,tory
Ifo,tol)
Control -
~unicatio1
- Data
1-lOSl & CONVERSATION STATISTICS
Host I-lost top N Matrix
r+
Network
N/w ...,
nits Gathering Staistics statistics St:itistic~ manager
-
~lPOU
~
P:illet
ti ltcring
Channel
filh:ring .
Packet
1:apturc
~
11
VIII Se-wv (CSE/ISE) Netwo-vlv i\-1 ~
12
I . RI\ION I tcxtu11l rnA,er,.ahon . (08 M:trks) d
b. E,p nm . . . . I RMON I tell.tun I convection arc 0\\ner ~,ring an
Two nc" data types dehned is t 1.:h £\ >rk ,na11agemen1 S' stem '"hich could
at the probt: can monitor entrv status. A ncl\\Or·k has '· mon: t an one ne Vt '
· J. • t ble' rhc owner
/I D:st table. Thu protouols - create u<,e irnd delete the control paramelers Ill ,1 a .
[a}cr nnd nn: ide1111hed by be
. pcrm,tted
'fi to . ,,f. t h l' controI 1·•• Il le d~fincd
· lHl<l. prart ~ by owner string data lypc.. lheh
rden11 u,111011 1s • • " · fl.
entn status is used !o rcsohc con acts t11.11 m1g11 . • . arise manHl!t:ment
~
!>ystem 111 t e
contiol table is configured i
13
VIII Se-wtt (CSE(ISf) Networ~/vl~ C13CS - /vfode,l; Q
Digital
Modern
Modul:ited annlog
Cnrrier
Modern I Digital
. ~'
The single lim: di
duplex COlllllHllllC
The any metric di
I
0 n_n_n subscriber line (v1
~l-reque~1c~me signal has a large
Time
Channel upstream signal i
band" ilith difference betwee
Bit rate is the number of bits per second traversing the medium. The band rate is the shorter Iines than
numb1.:r ol :.ignal units per second. The bit rntc equals the hand rate times the number
of bits per symbol. . 8. a. Explain RF spcct
The channel bandwidth and the data rate defined on the rate at which the signal unit Ans. /\n asymmetric c
and on the type of modulation. In QPSK modulation four levels (00, 01, I 0, and allocation of band
11) arc represented four phase states (0°, 90°, 2101) and 180°) phase shirt keying is The t I ical s ectn
limited by the difficulties of detecting small phase shifts , but it can be combined
Upstream Gu
with amplitude modulation to increase the number of levels of a signal. The number
(Forward) 42-
of po~sible levels is the number of PSK le\els times the AM four levels. The
5-42 MHz
downstream signal is at a higher band and carries much more information than the
upstream signal. The downstream signal ha!> higher information capacity. The digital
over cable service interface specification (DOCS ICS) standard de, eloped by MCNS
in the indusiry standafd.
b. Explain DSL technology (08 Marl.s)
Ans. The main motivating factor for employing digital subscribes line (DSL) for access
technology in multimedia services is the pre existence of loop facilities to most
residences. The information capacities of 3000l lz analog noice channel with a 30-
dB signal to noise ratio is 30000 bits per second an unloaded twisted pair of copper
wires from a central office to a residence can carry a digital a digital Tl/OSI signal
at 1.544 mbps. a 0S2 signal at 6.312 mbps and STS. I signal at S 1.840 mbps thus The upstream or re
Shannon fundamental limitations of data rate prevalent in analog modern can be to 42MHz. The do,
overcome by direct transmission. guard band from 42
These distances can be increased for analog telephony if loaded cables that downstream band c
compensate for loss and dispersion arc u~ecl. They cannot support digital subscribes with TV requiremc,
lines because the loaded coils alternate high frequencies. The length limitations of offered at a data rat
copper cable is practically eliminated by 1h1.: 11s1.: offibcr optic configuration. Special services have band,
digital subscriber I inc multiplexing needs 10 be done at the termination of fiber optic Cable modern are cl
line. downstream channc
The basic DSL architecture consist of an unloaded pair or pairs of wire connected cable modern could
between a trans receiver unit at a central office and a trans receiver unit ;it a customer channels to improve
premises. This trans receiver multiplexes and demultiplexes voice and d.ata converts capabilities and corr
the signal to the format suitable for lra115111ission on the DSL link.
b. Explain ADSL chan
A high data rate digital subscriber line (HSDL) operates at a T or £ data rate in
Ans. There are two aspect
a duplex mode with l\\o pairs of wires. The duplex mode with ~airs ~f wires. The
transpor1 bare chann
duplex mode is defined as two way communication \\ ith the same speed in both
as bearer channels a
directions.
14
The single line digital subscriber line (SDSL) is the saint: as NSDL e::1.cept the 2 way
Di ital
duplex communication occur!> over a smgle twisted pair.
The any metric digital subscriber line (Al)SI ) and the very high data rate digital
subscriber line (VDSL) both operate any metrically. As in HFC. the down stream
signal has a larger bandwidth and is at the high end of the spectrum, where as the
upstream siglllll is at lower end of the spectrum and has a lower bandwidth. The
difference between ADSL and VDSL is that VDSL operates at higher data rates over
sho11er lines than ADSL.
um. The band rate is the
nd rate times the number OR
8. a. Explain RF spectrum for cable modern (08 Marks)
al which the signal unit Ans. An asymmetric configuration is required to achieve two way communication and
r levels (00. 01, 10, and allocation of bandwidth is the two directions is different based on the type of service.
on) phase shift keyi1~i; is Tl1c tvo1ca
. I SJJcctrum a IIocatton
. extends on Iv lo 750 Ml IL.
but it can be combmed Upstream Guard band Down slre111n (forward) 54- 750 Ml lz
~ of a signal. The number (forward) 42-54 MHz Analog Digi1al data Digital Telephone
he AM four levels. The 5-42 Mill. video 54- scrviccs 550- video 560- 700-750
nore information than the 550 MHz 560 MHz 700 Ml-12 MHz
ation capacity. The digital
~
dard developed b) MCNS
(08 Marks)
Upstream (Reverse)
ibcs line (DSL) for access
of loop facilities 10 most I Digital
5-42 Milz
Digital Tele
I
g noice channel with a 30-
video Data shopping
aded twisted pair of copJ)(f
•ital a digital Tl/OSI signal control services 25-40 MHz
signal at S1.840 mbps thus The upstream or reverse signal is allocated the low end of spectrum from 5MHz
t in analog modern can be to 42MHz. The downstream or forward signal is allocated from 54 to 750MHz. A
guard band from 42 to SO Ml It separates the forward and reverse spectral bands. The
ony if loaded cables ~hat downstream band contains the analog video from 54 to 550 MHz and is compatible
with TY requirements. The digital services providing data service to the home are
01 support digital subscribes
•s. The length limitations of offered at a data rale of upto 30 mbps. Digital data, digital video arid tek'r,honcs
optic configuration. Spcci~I services have bandwidth allocated in both forw,1rd and reverse directions.
the termination of fiber optic Cable modern are designed to tune themselves automatically to the upstream and
downstream channel frequencies upon initial installation. Under noisy conditions.
or pairs of wire connected cable modern could automntically switch to different downstream and upstream
s receiver unit at a customer channels to improve quality of service. Such a feature is called frequency agile -
xcs voice and d_ata converts capabilities and corresponding cable modern the frequency agile cable modern.
DSL link. b. Explain i\DSL c hanne lling, encoding schem es (08 Marks)
s at a T I or EI data rate in Ans. There are two aspects of channels in ADSL access network. The first is tradilional
e with pairs of wires. The transport bare channel as they are defined in ISDN for ADSL transport frames. level
ith the same speed in both as bearer chan_nels arti dtifined in the downstream signal upstream signal operating
15
VIII Sewtt (CSE/IS£)
in a simph:x mode. The a<, bcau:r channels :-m: in 111ultiph:s ( 1.2,3 or 4) of the rate
1.536 mb. rhrec additional l.S duple:-. channel can call signal in both do,, n '>tream
and upstream directions. . . . . Plain text
Second. is how the signal is bulTercd "hilc traversing AOSL lmk. Real ume signals.
such as cudio and real - time 1.:ideo. use a fa~tcring scheme und hence are reft:rred a~
fast chRnncl. .
ADSL mg,m·, ! is dependent on line ccno<ling scheme The two types of cncod111g
schemes uwJ in ADSL line encoding 1.:arricr~ amplitude 111odulation (LAP) and
discrdc multitonc (DMi). Both arc based on the Quadrature amplitude modulation
lQAM). In both cases, basic approach ll> to sep:irate no rs band (0.4 JHz). Crom
P1
Cipher text is
the secret ke)
j
fack technique can be used to separate c,, crlapp1ng band hclwecn upstrenm and Message ,ligc
do,\nStn:am signals. in digil:tl Irani
Although ANSI has recommendcd the us~ of D~H for ADSL, a significant number frame or pae
. f current I) deployed sysh.:111 use of CAP ') ~tcm. In D\t1T. the entire bandwidth of nlso known as
approximately 1-1 MI +2 is used in encoding. channels each of an approximate!~ 4 receiv<:d chec
Kl-lz bane!. Cl') ptograph ic
Signature
Public Encrypted
l key and
compn:s!.ed
Plai Encryption
text messa •e
SMTP fon
1::,-mail User pta,~ convcrsio
ration
s ·stem IC.\t
Plain S1gnnturc
text gc11cntin11
i Private
1-.cy
Fig below shown enc!
Then.:, c1-sl p, ,1t·css vecur at 'c receivin~ end PGP is a package, it does not r.:i11vent
the " '. It 1s 111; inl:r 11 ~.:d to tn111~n:it e-mail The signature generation module uses
MDJ 'u .:1.1,... rntc a h.,,I, ~·,,de ,,f 111a, ..1ge and encrypts it "ith senders pri\'att: key.
using R:,A algorithm. II iE/\ or t{SA 1i. employed to generate encrypted manage ·1he
encrypll:d mc,.,ngc ts compn:.,~ed "ith ZIP. 1ne si~naturc is concatenated with the
encrypted 11wssa!,e and com cncd to ASCII:> format by using Radix 64 conversion SMTP formal
module to make it co111patihli: with the e-mail system. t:00\((SIOO
PGP 1s similar tu ENCRYP fED PEM but has additional compression capability. The IC\I
main difference between PGP nnd PEM is how the public key is administered.
b. Explain privllC) enlumccd m.til (PEM) (08 M 11rks) DEK :: Data encrypti
Ans. PEM \\as developed h) IETF. It ii; in11:ndcd to provide privacy enhanced mail, using I K = Inter exchange
end to met cryptography bct\\een originator and recipient process. The privacy MIC:: Message inteo
enhanced services provided arc SMIP - Simple ruail
I. Confidentiality The specification pro
2. Authentication exchange key (IK). O
3. Mcssng.: integrity nssurancc is used to cnct)'pt th •
4. Non repudiation of origin. The lK is a long ran
The cryptographic key called {iata encryption key (DEK) can be either a secret key the DEK transmissiQ
or a public key. generated for digital ,
Fig sho"n MIC- clear message MIC- clear process
encl) pied D[K and
In encl)pted PEM
Public key is the bes
message. encrypted
to pan through e-mn
to e-mail system.
18
~
Signature Signature
ncrypte
Encrypted \_ .
and
; ' TEXT
compr~sscd
messa •e +
E-mail
s stern
SMTP format
User plair, conversion
text
MIC
DEK
t
-~
MIC/DEK ~
t
1~11:iil
IK
Fig below shown encrypted PEM process
MIC:
r.11cryp1c<l
on:
it docs not r.:invcnt tnci, pied
mid
n~ration module uses
senders private key. ~ - - - - \•I IC/OrK-----,
I encoded
ll1<>132C
♦
crypted manage. The
oncatenated with the
Radix 64 conversion -
Us~r 11l11in
ICXI
SMTP format
.
~onvcrs1un
MIC
er11cralor
t
IJl:f-
ENCRYP
fEI) l'l, M
t
ession capability. The DEK IK
is administered. DEK = Data encryption key
(08 Mi1rks) IK = Inter exchange l,.ey
t:nhanced mail, using MIC Message integrity code
process. The privacy SMIP = Simple mail transfor protocol.
The specification provides two types of keys. Data encryption key (DEK) and inter
exchange key (IK). DEK is a rnnclorn 11111nbcr generated on a per message basis and
is used to encrypt the message text to generate an MIC. ifone is needed.
The IK is a long range key agreed between sender and receivt:r i.; "~ed to encrypt
the DEK trnnsmission within the message. The message integrity <.Nie (MIC) is
11 be either a secret key generated for dig.ital signature and is included as past of e-mail.
MIC- clear process and is the simplest. The MIC generated is concat, natcd with
encrypted OF.Kand SMIP text and inserted as the text position in the e-mail.
In encrypted PEM process. SMTP text is padded ir necessary and encrypted.
Public key is the best choice because it guarantees the originator ID. The encrypted
message, encrypted MIC and the encrypted DEK arc all encoclcd in printable code
to pan through e-mail system as ordinary text. They are concatenated and then fed
to e-mail system.
19
CBCS - lvtodei Q
Eighth Semester B.E. Degree Examinution, layers in the two sy
with those layers.
CBCS - Model Question Paper - 2
node/ system.
NETWORK MANAGEMENT The message in cac
Time: 3 hrs. Max. Marks: 80 (POU) which consi
!I. Note: Am wer 1111,v FIVE/11/11f1testio11s, !>~/,:ding ONE/11/l 1f11estio11fro111 eac:/r 111otl11le. ,
(UD). PCI contains
the layer. acting as a
Module - 1 service user 1a;cr.
1 a. Explain the basic communication as architecture. (08 Marks)
Ans. Communication between users and applications occurs at various levels. They can b. Give the comp:iriso
communicate :it the application level. highest level of communication architecture.
Each· system can be <lividecl into t,H) hroad sets of commun ication layers. Top set of Ans. The internet model d
layers w nsist of the application layers and the bottom set of transport layers. layers form the suite
[ __
· _user A I
Peer - protocol interface I User
2
I End user applrcatio
Presentation service
r- - ··
t
\ ';,1,l,..:ntio11 !aver Application layer Data flow control
Transport layer ~ Transpot1 layer
Transrn ission contro
Path co111rol
Physical medium
Data link
System A System 2
Physical
Intermediate system
User A User2 Comparison ofSNA, 0
1: Peer - protocol interface 1: I· In the seven layered s
to one correspondence
Application layer i S stem N
Transport layer
L Application layer (transport). S (session)
model. The p.ith contro
Transport layer conversion Transport layer but it overlaps a little wi
session layer function cq
transmission control lay
Ph sical medium Ph sical medium SNA transmission sub sy
Communication between end system via intermediate system level services combine
Fig (b) shown the enc.I systems communicating via an intermediate system N which control functions.
enables are of dillerent physical media for the two end systems. System N converts
the transport layer information into the appropriate protocols. System A could be 2. a. What are the challenge.
copper wire Li\N and system L could be an fibre optic cable. Ans. The top chnllenges in m,
Various professional organization propose, deliberate anc.l establish stanc.lards. One I. Analysing problems,
of the organisation_is international standard organization (ISO) The corresponding
20 &.111•-f-"'" f".c"'"" Sui""u
layers in the l\\O !>)Ste1m, communicauon on a per to per protocol 1111erfacc associated
with those layer:-.. The end systems communicate b)' going through an inter mediate
node/ ~ysh:m.
The mcssagi: 1n each layer is conca111cd an message units called protocol data units
Max. Marks: 80 (POU) which consists of two parts protocol control information (PCI) and user data
111 tad, 11101111ft. (UD). PCI contains header information about the la)cr. UD contains the data till
the layer. acting :1s a service provider rccci, c1 from or transmits to the upper la)cr /
service user layer.
(08 Marks) b. Give the comparison of SNA, OSI 11nd Internet protocol layer models.
us le,·els. The) can (08 Marks)
ication architecture. Ans. The internet model does not spectly the two lower layers. The transport and nctwrok
1011 layers. Top set of layers form the suite ofTCP/ IP protocol.
nsport layers
End user application Application Application specific
r2 Presentation protocols
Presentation services
Session Transport
Data tlo,, control Transport ~onnection:Connectior
SNICP ... less Ul.J '' less TCP
Transmission control
Nc1,,ork
-------·-
SNDCP .•• Network IP
Path control ·---·---C. '
SNDAP
Data link Data link
Not specific
ystcm 2 Physical Physicnl
Comparison of SN A, OSI and internet protocol lasses models.
User 2
In the seven layered SNA model. physical data link. application layers hnve one-
to one corrcspondi:nce "ith OSI layers. OSI reference model layer 3 (network) 4
(transport). 5 (session) and 6 (presentation) are structured differently in the SNA
lication layer model. The path control layer in the SNA model is similar to OSI m:twork la)cr
but it overlaps a little with OSI transport functions. Much of the SNA transrort and
session layer function equivalent to those of OSI model are done in data flow and
transmission control layers. The combination of these two services is nlso called
SNA transmission sub system. Presentation services, which are known as SNA high
level services combine presentation services :md function with some of session
ediatc system
control functions.
ediatc system N which
s. System N converts OR
Is. System A could be 2. a. What arc the challenges of information technology managers.
Ans. The top challenges in managing. the network are
blish standards. One I. Analysing problems, which requires intuition and skill.
ISO) The corresponding
21
VIII Se+w ( LS[/ISr) Network,, M ~
□
3 Acquiring. resources
4 Managing client/ server environment
S Networking "ith emerging technolog) • s n par! of contnimng education.
6. M:11nt.1i11i11g rL·liabilit).
7. M.1intai11inl,.\ a secure fire wall bet,,een internal network and internet while gaming
the vnlue c,r ,i: • info and services"' ailable from thi: internet.
8. Dctenn1n, 1u responsibility for out:iges to the WAN.
D
Worksta11on
9. Config.uring mnnngement system itself.
I 0. Lxpanding the network
I I. Gathering and analysing stati<;tics for pn:scntation to upper mg.mt.
12. rraditional maintenance, mandatory maintenance.
13. I·ault isolation.
14. 1 rouble shooting.
15. Remo, c constraints and bottle necks.
b. Explain internet fal,ric moc.lcl. (08 Marks)
Ans. Transmission control protocol/ internet protocol (TCP/ IP) is a suite of protocols D
□
that cnab)e networks to be inter conncctcJ TCP IP form the basic foundation of the
internet
Internet can be viewed as n layered an.:hitecturc as \hown in fig. The nrchitccturc
Workstation
shown the global internet as C!om;cntm; la)ers or work~tations. LANS, WANS
interconnected b) fabrics of medium ncccss controls (MAC's) switches and gatewa) s.
The workstation belong 10 tho.; user pl11111:. the LAN's to the LAN plane and WAN's to
□
the WAN plnnc. fhe interfaces are defined as the fabrics. MAC fabric interface the
user plane nnd LAN plane. fhc LAN nnd WAN planes interface through switching Gate\
fnbric. rt1c \VAN's in the WAN plane interface via the gnte\\Oy fabric.
The user's \\ orkstation intc1faccs tu a LA , in a medium access control (MAL).
LAN's interfaces to WAN by a switches fabric of bridges. routers and switches.
Communication between two user palnec. takes the following path. The physical path
trnvcls the I\IAL fabric, LAN plane, s,,itchmg fabric. WAN plnnc nnd the gateway
D Swit
fabric to the core and then Murn to the user plane going through all the planes and
interface fabrics.
D MA~
22
M~
□ l ,..:, plane □
educa11on.
(08 Mnrks)
a suite of protocol!. □ □
, sic foundation oflhe
□
c fabric interface the Gntcwny fabric
ace through switching
D
y fabric.
ccess control (MAL). Switching fabric
routers and switches.
path. The physical path
plane and the gatcwa}
ugh all the planes and
D MAC fabric
Module- 2
J, a. W ith di11gram, Cl.plain organiz.ation model. (08 Marks)
Ans. Organizalion model describes the components of network management and their
relationships. Network objects consists of network clements such as hosts, hubs,
bridges routers and so on.
The managed clements have a managed process running in them, called an agent.
The manager communicates with the agent in the managed clement. Manager
manages the managed element Fig presents a three tier configuration in which the
intermediate layer acts as both agent and manager.
23
VIII Sem, (CSE(ISf) Network, M ~ C-BCS · ~ode,l,-Q
information be1wJ
Manager protocol) messag,
message (comman
Fig represt:nts corr
Agent/ manager
Man,1gcd objects
Tram,port I
Manager
/\ppli~:ttiun
Notification / Traps
The application in the manager module initillfe requests to the agent in the internet
model. It is II part of operations in the OSI model. Tht: agent c>..ecutes the request
on the net\1 ork element i.e. managed object and returns responses to the manager.
Notifications / traps arc the unsolicited messages such as alarms generated by agent.
Fig below presents communicntion protocol used to transfer info between mannged
object and manai.dn~ orocess, as well as between management processes.
Manager Operations/ requests/ responses
applications Trans / notifications Agent nppli<;ation
Manager
Manager Agent
SNMP (Internet)
communication communication
Module CMIP(OSI)
Module
OSI model uses common management information protocol (CMIP) alons with
common management information si:rvicc {CMIS). Internet uses simple network
managcmenl protocol (SNMP) for communication. The services are part of
C'\\or the networl-. ~an be operations using requests, responses and alann notifications.
the manager devices are OSI uses both TCP and UDP. CMIP and SNMP specifies the management
communication protocols for OSI and internet management respectively.
three tier configuration. OR
manager. it collects data
in database. As 11gent, it· 4. n. With diagrnm, explain TLV encoding s tructures. (08 Mu-rks)
Ans. ASN I synta1' that contains management inform Ilion is encoded using the basic
encoding rules (BER). ASCII text data is converted into bit oriented data.
(08 Marks) TLV encoding structure is shown below.
managers process as well
m the communication of
25
VIII Se+111 ( CSf/ISE) Network,l-,1~ CBCS - l-,iode,l, Q
Object
signat_ed as the 6th bit.
specifies the length of
octets. The MSB (8th
the value. If the value Object
t e Object
s and rest of the seven
Instance 3
the length.
Object
Instance 2
ich it is computed.
Name objec Syntax Encoding
(08 Marks) Object
idcntfier ASN-1 BER Instance I
SMI is concerned only with object type and not object type and not object instance.
Fig shown the situation where there are multiple instances of an object type. A
ork LAN and WAN managed object need not be just a network element. In internet as an organization has
an object name "internet'' with OBJECT IDENTIFIER 1.3.6.1. A managed object is
only a means of identifying an object, whether it is physical or abstract.
Object type is a data type, has a name. syntax and an encoding scheme syntax of an
object type is defined using abstract syntax notation ASN- I basic encoding rules (BER)
lion. is the encoding scheme for transfer ofdata types between agent and manger processes.
Every object type is uniquely identified by a DESCRIPTION and an associated
of internet OBJECT IDENTIFIER. The DESCRIPTOR name is mnemonic and is all in lower
nication systems. case letters. OBJECT IDENTIFIER is a unique name nnd number in the management
information tree.
Internet OBJECT IDENTll711:.R :::. [ iso org. (3) dod (6) I!
Any combination of unique name and node number on the management tree can be
used.
27
VIII Semt (CSf/ISE) Network, l - 1 ~
b. Expluin RMON token r ing utcnsion groups. (08 Mllrks) types of net"orks. Co1
Ans fhere are two tok1:n ring statistic~ group~. one at MAC layer and the other on packets \)O\\in\!, 9enod ,md o\\\c
coll~")cd promiscuously. Both contams statis.tici. on ring utilization and ring error Hic.-,\(\t'\' \;fOtlj) IS hdpfo
stat1st1cs.
3) .\larm j.\roup - p
Token ring Function Tables
group the probe and Cl•mp
V. lu:never lh1: monit
l. Statistics Current utilization and error Token ring ML stats table avoid excessive en:nt
statistics of MAC layer
are specified. Groups
2. Promiscuous Current utilization and error Token ring ML stats 2 table alarm parameters. I h
statislics statistics of promiscuous data token ring P stats table to select the variable a
3. MAC layer Historical utilization and error Token ring P stats 2 table token -') Host group - cont·
history statistics of MAC layer ring ML history table a list of host~ by loo•
4. Promiscuous Historical utilization and error Token ring P history table source and dcstinatior
history statistics of promiscuous data and tune table host c
5. Ring station Station statistics Ring station control table ring done host-able contair
stat ion table: S) nchroni,cd \\ ith res
6. Ring station Stntion ordl.!r Ring station control 2 table ring 5) Host top N group.
order station order table categories. ·1he ho,t t
7. Ring station Active configuration of ring Ring station con fig table 6) l\1:ltri'1. group - sto
configuration stations is ere/lied for each co
8. Source Utilization statistics ofsourcl.! Source routing slats table source are - rna1ri, control t
routing routines info routing stats 2 table keeps track or source
The promiscuous group com::cts s1at1st1cs on the number of packets of various sizes destination 10 source t
and the type of packets multicast or broadcast data. There are tow corresponding 7) Filter group - 1s 11.
history statistics groups. Current and promiscuous. One data tabl~ is associated with Stream of data based
each of four statistics groups. The history control table is common to both Ethernet a fi her table and a ch,
and token ring. arbitraf) fi her expres
Three groups are associated with the stations on the ring. The ring station group each filter associated
provide statistics on each station being monitored on the ring, along with its status. 8) Pucket capture gr
The data art: stored' in the ring station tablt:. The ring ~la lion order group provider the based on fi lrcr criteria
order of the station of the monitored rings and has only a data table. The ring station 9) Event group - C
configuration group manager the station on the ring. and falling alam1s c:i
The last of the ring groups is the source routing group. It is used to gather statistics trnnsmit events, syste
on routing on routing info in a pure source routing environment. b. Write short notes on
OR i\nll,
Structure
6. a. Expla in RMON, common and Ethernet groups (10 Marks)
Ans. The RMON, Ethernet groups are
I) Statistics group- contains statistics measured by the probe for each monitored
Ethernet interface on a device. The ether stats table in the group has an entry for each I. l'rim itivc I) pcs t
interface. The data include statistics on packet types. seizer and errors. The group is
used to measure live statistics on nodes and segments.
2) History eroup consists of two sub 1:,aroups history control group and history (data)
group. History control group controls the periodic statistic sampl mg of data from various
28 ~l\.f-M f ,t,.M Sul\l\'-"
M~
(08 Marks) types of neh\orks. Control table stores configuration entries compressing interface
the other on packets polling period ·and other parameters. The information is stored in media sp~··111c table.
ation and ring error J listory group i1, helpful in travelling o,erall trend in the volume oftrallic.
3) Alarm group - periodically takes statistical samples on spt:cificd ~, ,ables in
the probt: and compares them with the preconfigured threshold store, probe.
Whenever the monitored variable crosses the threshold. an event is gl: 1.:1.,ted. To
Lstats table avoid excessive event generation on the threshold border, rising and fall in~ '1 rcsholds
arc specified. Groups contains an alarm table with a list of entries tha, ,;, nne the
stats 2 table alarm parameters. The columnar objects alarm variable and alarm inter. a, nre used
ats table to select the variable and the sampling interval.
tats 2 table token 4) Host group - contains information about the hosts on the network. It ompiles
ry table a list of hosb by looking at the good baskets traversing the network a11< , ,tracting
history table source and destination. Group compresses three tables host contrast table, ::ost table
and time table host control table controls the interfaces on which data bathering in
ontrol table ring done host-able contains statistics about thc::"11ost. The entries in the two data tc,bles ar.e
synchronized with respect to the host in· host cohtrol table.
ontrol 2 table ring S) Host top N gmup- generates reports ranking the top N hosts is selected ~tatistics
table categories. The host top N control table is used to initiate gc111:ratio,n of such a report.
6) Matrix group - ~tores statistics on conversation bet~,een pairs of hosts. An entity
con fig table
is created for each conversations that probe dctects. Thc three tables in the group
arc - matrix control tablt: contains the information 10 be gathered uiatrix SD table
g stats table source
2 table keeps track of source to destination conversations. Matrix DS table keeps data by
destination to source trallic.
ckets of various sizes
7) Filter group - is used to base the capture of filter packets on logical expressions.
re tow corresponding
bl~ is associated with Stream of data based on a logical expression is called a channel. The group contains
mon to both Ethernet u filter table nnd n chnnncl 1abl1:. Filler tabl,; allow:; pockets to be filtered with c111
arbitrary frlttr expression. for cach channel. the input packet is validated against
he ring station group cach·filtcr associated with that channel and is accepted if it passes any of tht tests.
along with its status. 8) Packet cnpttlrc group is a post filter group it captures pnckcts from each channel,
er group provider the based on fi Iler criteria of packet and channel filters in the fl lter group.
table. The ring station 9) Event group - Controls the generation and notification of events. Both rising
and falling alarms can be specified in the event table associated with ,he group. To
cd to gather statistics transmit events, system maintains a log.
nt. b. Write s hort notes on 'SNMP based ASN -l data type structure (06 Marks)
Ans.
termination system
s.
(CMTS Fiber
Receiver
2
Switch/ WAN
router
Splitler ISP
6
and filter
Servers
Video
.-------,
Upemtions support 3 Security and
S) stems/ clement access controlle
mcss:11.tcs
lnterfnccs :-
1. CMCI Cable modem to CPl:. interface At the head encl,
2. CMTS - NSI CMTs Net\\ ork side interface WAN and intern
3. DOCS - OSSI Data over cable services operations support system arc multiplexed
interface Communication
4. CMTRI Data over cable security system head end to the Ii
5. DOCSS Data over cable security system from the hcntl c1
G. Rrl Cable modem to RF interface. signal. The sign
up~tream i.i1malJ
30 S.-1\.tM C.CAM Si.AM
M~
The bottom pan offig sho" nan expanded, ic\\ of the head end It comparc1l tl-ie cable
mod..:r 11 1cn111na1ion system (CM.IS) S\\ itch , router combiner transmiill·, splilh:r
al Ip address "r.1p around. and ti hers servers, operations support systems / t:lemcnt manager and ~- ·1ity and
integer montonically acccs~ controller.
ax 2 32 -1 CMTS consi,ts of a modulator mod and a dcmodulator clcmod on ti· •re link
negotive integer ,ocrc~se side and a m:twork terminator term to the S\\ ilch / rouler connecting l\l \,.\N. Tht:
modulator is connectt:d to the combiner, which multiplexes the data and, ,~.!,, ,;ignals
e int,:ger in hundreds of and feed~ tht:m lo transmillt:r. The rccciH:r converts optical signal do,, , the RF
level and feeds il lo the splitter where the various channels arc split. ThL dcmod in
wide arbilrary ASN -1 the CMTS demodulates the analog signal back to digital data.
le wrapped OCTET Sen er al the head end handle applications and databases. The securit) ' o1ction is
manngccl by security and access controller~. Two interfoces in the datn ' 1 . · face nre
cable modern 10 customer premises equipment (CPE) interlace and the l . rs - NSI
interfoce, \\hich is the nct\\ork side of the interface ofCMTS ,~ith s, i .1 , router.
l"hc second cntegory of interfaces include operations supporl S) stems interfaces and
third categoi} include security system int.:rfacc.
re (10 Marks)
The IU 1nterfac.:cs arc between the cable modern and lhe HI C network ,111. between
ure of HFC data over CMTS and the I IFC network in the upstream and downstream directions respectively.
b. Explain IIFC tcchnolo1,:) (08 '.\1:lrks)
Ans. 1IFC technologies i~ based on existing cable t-:levision I cablc tv or(ATV) technology.
In I IFC, the signal is brought to a fiber node via a pair of optical fibus and then
distributed via co cablc to customers as illustmted in fig below
Cab c
Satellite dish
·mnsmitte
Fiber
Receiver
\\'AN
ISP------
Work station
At the head end, signals from various sources analog and digital Ser\ ices using
WAN :ind inkrnct service pro, idcr ( ISP) sen ice~ 11si11~ prh ate backbrn1L' network
are multiplexed and converted up from an electrical signal to optical signal.
tions support system
Communication is one ''") on the optical fiber. each p:iir <'I opllc/ll lil>bN f·om the
head end to the fiber node calling onc way traffic in one dir..:c1tons l'hc signal going
from the head end to the customer premier is cnlled a do" nstream sig,1,11 or a path
signal. The signal going from the customer_prem icr to tire hi:all .:•1CI is called an
up~lrt'am \ignal or arc, 1:1 w p.iiu ~ig.nal.
SMl\t.-tA.. C..AM Sul"""" 31
VIII ~em., (CSf/ISE) h'etwo-r-k, M~~ C13CS · Mod.cl,Q~
Broad band !iignal transmitted over co a>.ial ..able difl"t:rs from the band signal u - R, - lntc1face be
transmitted over a pair of wires. V,. Lo1:,ical interfo
At the customer premises, a network interface unit (NIU) also refereed to as networl-. v,. = lt1tcrfac..• bctw
interface device (NID). The analog signal is split at the NIU. The tv signal is directed TL= Terminal equip,
to the TV and the data to the cable mC'dcrn Cahle modern convert the analog signal POTS = Plain old te
to an Ethernet output feeding Either a PC or LAN. ·1he tclt:phone signal is abo PSTN = Public swit
transmiued \\ ith, idco and data at s0111e cable site:;.
Splitter:, at central
HIC technolog) is based on
telephones s ignal ti
I. Broadband LAN 2. Asymmetric b,11,dw1dth allocation to achieve two \\ay
central office. klcp
communication. 3. Radio frequency )prcad )pectrum technique for causing signals
are all off the spfitt
oH:r IIFC 4. Radio frequency spectrum allocation to cause multimedia telephones
POTS interfoct: arc
(voice). tdcvi<;icn (v1cleo) and computer communication data services.
satellite feed direct I
OR
b. With diagram exph
8. a. With •li11f!rnm, c~ph1in ADSLsy,;tcm reference model (08 Murks) Ans. fhe access networ
Ans. SON CT WAN arcs
'·., V
! '
U-C: 4-c UR ' UR-1 .F-sm
T
Digit.ii_:
I •
' s
.
□
''
broadcast\ ''
ATU-C I
Broa,llast '
''
nctworl-. j ATU-C
----,,
ATU-R i : TE
: r.,_
Narrow band :
network : I .
t I '
I : TE
POT JC -·· POTS-R ....
1:1 ATU-C TE
Nctwo,k ATU-C Premises :
management Phone (s) distribution
PSTN network
Acc..:si.
node
Interfaces:
B= Auxiliary data input
POTS-C = Interface between PSTN and POTS splitter at network end
POTS- R Interface between phones and POTS satellite at premier end
T = Interface between premier distribution network and service mobiles
T/ sm = Interface between ATU-R and premier distribution network
u - c = Interface between LOOP and ATu - c Two types of occei
u - C1 = Interface between POTS splitter and ATu -c customers. They :ire
11-R = Interface between LOOP and ATU - R subscriber line (OS
can support the trn1
transmitted to cabl
The alternnti.,.c me
32
from the band signal u. R = Interface between POTS spliller and ATU • R
V = logical interface b/ w ATU - C and access node
refereed to as network VC = lnterfac·.! between accesi. node and network
he tv signal is din:cted TE·= Terminal equipment •
vert the analog sigm,I POTS = Plain old tclcpho11e service
lcphone signal is abo PSTN = Public switched telephone network
Splillers at central office and customer premises separate the low frequencies
telephones signal from video and digital data. PSTW is the switch connected at
to achieve two way central office. telephones operate off the splitter at customer end. The u- interface~
ue for causing signals are all olf the splitters. Interfaces may disappear when ASDL lite is implemented
multimedia telephones POTS interface are also from splitters. B interface is for auxiliary data input in a
services. satellite fet:d din:ctty into a service module such a set top box.
b. With diagrnm explain broad bnnd ncct:ss network (08 Marks)
(08 Mnrks) Ans. The access nctworl.s serving the customer premises from the backbone SDH/
SON ET WAN arc shown in tig below.
Cabk
F-sm
T
Tclophouc
s loop
□
IIFC
clwork
'TE
: Ti-.
'Tr
• TE
Premises :
distribution SD/1/00NF.T
WAN
network
Satclli1c communication
and h:lcphon~ loop
twork end
premier end
ice mobiles Wireless - - ~
link
1elcphonc loop
network
Two types of access technology are available to residential and small business
custo~ers. '.hey arc based on hybrid fibre coaxial (HFC cable) modern and the digital
subscnbcr l111e (DSL). Network access based on either of theses two technologies
can support th9 transmission of voice video and data. In case of HFC information is
transmitted to cable modern at the customer from an MSO facility.
The alternative method of getting broadband-services to the home is via a satellite
33
VIII Se-w11 ( CSE/ISE) Networ'lv~~
communica1ion m:1work. /\;, with "irdcss, thi~ neh,ork is t:nvisiont:d as bc111g on~
"a> 10 the customers local ions Md telcphon~ return. /\gain, the potential or 1,,0 -
way satcllilc communication cxi~t,;. SMT
Broadband acce;,s technologies consist of four mutu..llv independent of transporting Gote
multimedia data to a customer premises. ·1he} are based on cable modern, digital
subscriber line and two wireless media methods. rrn
Gate\\-
Module- 5 Screene
9. a. E,plain linit.- s t:lll' rn:11.:hinc model (08 Marks) nnd f
Ans. Model based fou lt detection scheme uses communicating finite :-latt: machine Thc m.iin
foature or 1hi:, procc~:. is. it is a passive testing ~)stcm based on th1: assumption than Packet ti ltering routers c
an observer agent is pres\;nt in each node and reports nonnali1:iblc to a centrnl point. further screening. The nil
A failure in a node or a link is indicated b) state machine associated with component Application level gnlcw
entering an illegal state. An applicntion on the server co- relate--. the events. packet fi hering.
FSM enables communication between a client and server -.ia a communication Firewalls l find 2 will dnt
channel. Client and server both have t\\0 states each. The client '-"hich is in send A secured LAN is a gm
request stale. sends a request message to the sener and transition to the rece1,·c gateway. Ex: FTP service
rcsponsc state. For TELNET service, ap
The sen er in the receive requc<,t state. rccei\ e:. the request state. receiver rcqm:st
and transition to the send n:spom,e stati:. Alier processing the request. it send<, the
response and transition back 10 the rccci, c request state. The client then receive the
response from the scrver and transition to send request state.
l ,,mmunn;J11on
Kc4uc,t chaMcl
Firewalls protect a secu
parameters (In FTP, NNTI
34
k,l-,1~
FTP
Packet
filtering
router
g
(08 Marks)
Secure network
late machine. The main
Packet filtering routers can l'ither drop packets or redirect them to specific hosts for
11 the assumption than
• further screening. The rules to be implemented should be simple.
,able to a central point.
ciated with component
Application level gateway - is used to overcome some of problem identified for
packet fi ltcring.
s the events. . .
via a communication Firewalls I and 2 will data only if it going to or coming from the application gateway.
client which is in sc_nd A secured LAN is a gateway LAN. Filtering is handled by the proxy services in
allsition to the receive gateway. Ex: FTP service fi le is stored first is application gateway and then forwarded.
For TELNET service, application gateway authenticates foreign host.
t state. receiver reqm:sl Secure
the request. it send~ the Firewall I
c dient then receive the
Proxy
services
Application
gnteway
35
VIII Sew11 ( CSE(ISE)
E ighth Semester 8.1
Time: 3 hrs.
~ Note : Am wer any FfJIJ::J
I VIC}
I. a. Wha l are tbc challc11
~pace____.___ _,
why?
Domain Ans. For challenge Refer Q.
s ace ..__ _ Polic) Action NMS is required for l
_ _ _ driver space (a) For proactive rnana
(b) To verify customer
Rule space (c) To diagnose problc,
(d) To provide statistici
Policy driwr i~ 1he control mechani!.m. Thus _t~e o~jccts in 1he do~ain ~pace and tl~e (e) To help remove bot
set of rules in rule space are combined for dec1l>IO,l 1.c. made by policy driver. A rule '.s (f) To see the trend in ~
that all network failures an: to be reported to engineering group. A statement that this b. What is Nehvork
rut~ applies to the operations personnel at the netwon.. management desk is a policy. management function
Ans, Refer Q.No. I .a. of M(
b. Explain fo ult detection, location and isolation techniques. {08 Marks!·
Ans. fault detection is accomplished by using either a polling scheme or by generation o c. Explain Network mar
traps. An application program generates the ping command pcriod1call) and "~its
for response. Connecti\ it) is declared broken \\hen a pre set number of consccl'.tlve Ans. A network managem
responses are not received. The freque ncy of pinging and pre set number for fo'.lt~rc systems A and B exch
detection may be optimiLed for balance betnecn traflic over head and the rap1d1ty of management'infonn
\\ ith which failure i~ detected. Traps genetic trap van be set in the agents, giving management controls (
them capability to report events to the network management system with legitimate
communit) name. Advantage of traps o,er polling is that failure detection is
accomplished faster with less traffic overhead.
fault location using a simple upproach would be to detect all the net\\.Ork component:.
that have failed. The origin of the problem could then be traced by traversing the
topology tree to where the problem starts. If an interlace card on a router failed, all
managed components connected to that interface would indicate failure.
/\fter having located where the fault is. nc:1.t step is to isolate the fault. f-irst is to
determine whether the problem il> failure of the component or failure of ph)s1cal
link. Let us assume that the link is not the problem and the interface card is then
proceed to isolate the problem to the layer that is causmg. t:..,cessive packet toss may
cause disconnection and can measure the packet loss by pinging, if p,nging can be
used. We can query the various MIB parameters on the node itself or on other related
nodes to localiLe further the cause of the problem.
The i?eal solution t~ locating and isolating the fault is to have an artificial intelligence
sulullon. Uy observing all the symptoms, we might be able to identify the source of
the r,roblcrn,;
36
Eighth Semester B.F,. Degree Eumination, CBCS - June/ July 2019
Network Management
Max. Marks: 80
Note : A11swer any FIVE full q11ntiom, selec·ting ONEfull quell/011 from ea,·h module.
Module- I
Whal ~ire the challenges of managing nct"'·ork? Explain the ust: of NMS and
why? (05 Marks)
For challenge Refer Q.No. 2.a. of MQP - 2
NMS is required for following reasons
Action
(a) For proactive management of network
space
(b) To verify customer configuration
(c) To diagnose problems
(d) To provide statistics on performance
(c) To help remove bottlenecks
the domain space and the (f) To see the trend in growth.
e by policy driver. A rule ~s
roup. A statement that thl'i b. What is Network management? With neat diagram, explain network
agcment desk is a policy. management functional groupings (06 Marks)
Refer Q.No. I.a. of MQP - I
ues. (08 Marks)
chemc or b) generation of c. Explain Network management Dumbell architecture, with neat diagram.
and periodical\) and waits (OS Marks)
set number of consecutive A network management dumbell architccturc is shown in fig where vendor
pre set number for fa'.11'.re systems A and B exchange common management messages. The message consist
over head and the rnp1d1t) of mnnagement'information data (ex type , id and status of managed objects) and
set in the agents, giving management controls (ex setting and changing configuration of an object)
--
c.~.....
ent svstem \\ ith legitimate t.4M - I
s that failure detection is
2
CBCS · J (;(NU!/ I J 1M.:1 2019
!cture are presented in c. Explain TLV encoding structure with value of class bits. (03 Marks)
ilications such as fault Ans. Refer 4a Model Questions paper 2.
, CMIP for OSI model Module-2
r layers of OSI model
I model. 3. a. What is Management Information base'! Explain MIB module structure.
(05 Marks)
Ans. Reter Sa Model Questions paper 2
ehvork management
b. Explain in brief with neat diagram, two tier, three tier and proxy server SNMP
(08 Marks)
organization model. (06 Marks)
Ans. Refer 4a Model Questions paper 1
nd tags. Give example
(05 Mark,) c. Explain SNMP Network management architecture, with neat diagram,
(05 Marks)
Ans. Network management architecture portrays the data path between mam1ger
Example application and agent application process via the four transpo1t function protocols :
r ethcrstats pkts UDP, IP, DLC and PHY.
I '
r, lpAddress RFC 1157 describes SNMP system architecture SNMP is management info for a
network element may be inspected or altered by logically remote users.
~el Record
The get request message is generated by the management process requesting value
-MIB of an object . the value of an object is a scalar variable.
The get next request or get next can have multiple values because of multiple
, s ends with EN D instances of object.
SNMP Manager SNMPAgent
,.._
Manageme11t
dnl3 K SNMP Manager
Application
'
( SNMPAgent
Application
i1:, ii:,
1
1ii
" .,X i1:,
::,
<,
~
r:
0
C.
-.. .,..8'
"'
:, ;; t1 0c.."'"r:
~
~ u
C
i
[ ~
i g
C. e C:
:,
e ~
0/)
t " 0/)
C.
!l u g 0/)
SNMP SNMP
UDP UDP
·he data type whose name JP IP
can be defined as follows DLC DLC
UENCE are list builders.
PHY PHY
Physical Medium
3
,.
_.l I
•
Nol
_t_
Read
•
Write
•
Read /
accessible only only \Hile
Object I Object 2 Obj~-ct 3 Object 4
< MiRAccess
ifNumber i
if Entry
4
,O' E«AM 5c.A1111
cscs -JIM'\£// J~ 2019
p access mode
Manngcr 2
Object 4 (Communit 2
5
VIII Se.111 ( CSE I IS£) Network, M ~
c. Explain ICMP group with neat diagram. (05 Marks) The netw01k ma
Ans. Internet control message protocol group contains the statistics on ICMP control probe monitor th
messages. and backbone ne
ICMPGroup network segment
Entity 0 1D Description (brief) Ex :- RMON cou
Total no of ICMP messages receive by the an abnormal con
icmp ln Mrgr icmpl an alann.
entity including icmp In Errors
No of messages received by entity with ICMP The individual
icmp In Errors icmp2 diagnosed and
specific errors.
No of icmp destination unreachable message technology in a
icmp ln Destunreaches icmp3 productivity for
received.
icmp ln Time Excds icmp4 No ofICMP Time Exceeded messages received. b. Explain RMON
No of ICMP parameter problems messages Ans. Refer Sa Model c
icmp In parm Probs icmp 5
received. c. Explain four b
icmp l n Src Quenches icmp6 No of ICMP source Quencr messages received Ans. Four modes of
icmp In Redirects icmp7 No of ICMP Redirect messages received fibre coaxial (Ii
icmp In Echos icmp8 No of lCMP Echo (reQuest) messages received. Communication
deployed means
icmp In Ecro Reps icrnp 9 No oflCMP Echo reply messages received
..
The syntax of all ent1t1es 1s read - only counter. Ex stat1st1cs on the no of reQucsts
sent might be obtained from the counter reading of icmp out Echos.
Module-3
5. a. What ls Rcmoh: Monitoring? With a fiture, explain use orRMON probe.
(06 Marks)
Ans. Remotely monitoring the network with a probe is referred to as Remote Network
monitoring (RMON . Fi below shows the use of RMON probe
In one - way te
traverses cable.
network telephone facilit
acceptable. For I
internet, there r
implementation
trans mission Ii
NMS three categoric
Probe
Wireless satelli
cacs -JIA.,V\,(!/1JtAl:Y 2019
(05 Marks) The network management system(NMS) is on local ethemet LAN. A token ring
ics on ICMP control probe monitor the token ring LAN. It communicates with NMS via routers, WAN
and backbone network. One advantage is that each RMON device monitor the local
network segment. it relays info in both solicited and unsolicited fashion to the NMS.
)On (brief) Ex :- RMON could be locally polling the network elements in a segment. If it detects
gcs receive by the an abnormal condition such as heavy packet losses or excessive collisions, it sends
an alarm.
~ Errors
The individual segments can be monitored almost continuously. A fault can be
id by entity with ICMP diagnosed and reposted to NMs. The overall benefits of implementing RMON
technology in a network are higher network availability for users and generate
unreachable message productivity for administrators.
b. Explain RMON, Groups and its functions in detail. (OS Marks)
eeded messages reee ·ve Ans. Refer 5a Model Questions paper I.
problems messages
c. Explain four broad band access technology with example. (05 Marks)
Ans. Four modes of access rely on four different technologies There are Hybrid
,encr messages receive fibre coaxial (HFC) Cable, Digital subscriber line (DSL) wireless and satellite
messages received Communication . HFC uses television facilities and cable modem and is most widely
Quest) messages received. deployed means of access of fou,.r.;....._ _ _ __,
Broadband Access
y messages received
0 11 the no of reQue~ts
Echos.
ofRMON probe.
(06 Marks)
to as Remote Network
obe
7
VIII Sem., (CSE I I SE)
OR c. Explain S layers Al
6. a. Explain SNMP operations with details. (06Ma r ks)
Ans. Refer 6a Model Questions paper I. Ans. Management ofl-U,
of either a compute
b. Explain RMON1 m1magement info base with details. (05 Murks) layer architecture fl
Ans. Refer Sb Model Question paper I. WAN via an ATM
c. Explain Data over Cable system reference Architectures. (05 Marks) the applications am
Ans. Refer 7a Model Question puper 2. in the cable mode
Module-4
7. a. Briefly explain HFC technology , wi~b neat block diagram. (06 Marks)
Ans. Refer 7b Model Question paper 2.
b. What is CMTS? Bricfty explain with example. (05 Marks)
Ans. Cable modern and Cable modern termination system (CMTS) can be managed with
SNMP management. Fig shov.n the Mm•s associated with cable modern and MIB
associated with CMTS that are relevant to managing cable modern and CMTS.
Three functional f
support and plan
include detection
gains signal levelf
docsit MIB docsTrCntMlB
127 128 8. a. What is ADSL T
The first category is the generic set of lETF MIB'S, system {mib - 2 I} interfaces with example.
{mib - 2 2} [RFC 1213). The second category comprises MlB'S for interfaces of Ans. Fig shown the c
cable modern and CMTS. The docs If mib is a sub node under transmission and premise network
includes objects for the Baseline privacy Interface. The docs Tr Cm MIB specifics ADSL = Asynch
the telephone return for cable modem and CMiS. ATM = Asynchr
The Docs interfaces objects group, docs if MIB objects has three sub groups. Base STM = Synchro
interface objects common to CM and CMIS. The RF spectrum has n layered interface TE = Terminal ~
similar to that of ATM sub layers. The CMTS interface objects group, docs if cmts OS = Operation i
objects pertains to the CMTS. PDN = Premises
SM = Service M
8
7<,M~ CBCS ·]IM'\£,/]ufy2019
c. Explain S layers Architecture of network management, with neat diagram.
(06 Marks) (05 Marks)
Ans. Ma1~agement of HFC system with cable modems is more complex than management
of either ~ computer network or a telecommunication network. Figure presents the
(05 Marks) layer architecture for network management. the head end is connected to the SON ET
WAN via an ATM link and to the HFC via an HFC link. The head end contains both
(OS Marks)
!he applications and SNMP manager in the application layer. An SNMl> agent resides
in the cable modem that monitors both RF and Ethernet interface.
Head end Cable modem Subscriber
Applications Modem Applications
SNMPAgent Applications
(06 Marks) SNMPMGR
Applications SNMP, FTP
SNMP
(OS Marks) SNMPMGR HTTP etc
) ca1\ be managed with
TCP/UDP TCP/UDP TCP/UDP
cable modern and MIB
odern and CMTS.
IP IP IP
'
Er ATM
link
HFC
link ~
HFC
link
Ethernet
link ~
Ethernet
link
SM
Brood b:1nd
NIW
Narrow band Access
NIW Node Message
AOSL ADSL proeessin
Pa.:~,1
NW model
STM
ATM
Packet
STM
..
ATM Packet
10
CBCS •JIAKI.(!//Jufy 2019
Securit model
Onto intcsri1~ Aulhen11cation
module
Data origin authentication
Message Privacy
processing i--- Data confidcntialit) _ _....,.
module
model
Timeliness
Message: Timeliness and module
limited rc:play protection
The authentication module provides two services - data integrity and data origin
authentic cation. TI1e authentication process involves the use of protocols such
perators system (OS) as HMAC - MOS or HMAC - SHA - 96 in SNMPV1 . The data origin service
AM) of network and authentication ensures that claimed identify of the use on behalf is truly the originator
of the message. The authentication module appends to each message a unique
es on _ line services . identifies associated with an authoritative SNMP engine.
onsists of broad band.
d band signal • narrow
Module- 5
8 remote location such 9. a. Briefly explain ADSL fault management with example. (08 Marks)
Ans. Fault isolation parameters are ~hown in table below
remises network starts Parameter Component Line Description
port modes are shown
ADSL Line Indicates operational status and
ir cable of a telephone ADSL line Physical
status various types of failures of link.
1cable, fibre optic cable
Alann thresh Generates alarm on failures or
ATU-CIR Physical
holds Crossing of thresh holds.
overall network with Initialization failure of ATU - R from
(06Marks)
Unable to A.TU- CIR Physical
initalizc ATU - R ATU-C
Event generation upon rate changes
presentation of SNMP Rate Change ATU-CIR Physical when shift margins arc Crossed in
(04 Marks) the both upstream and downstream.
curity subsystem user Faults should be displayed by network management system. After the automatic
indication of faults, ATU • C and ATU - R self tests as specific in Tl.413
ADSI line st:itus shown the current status ot line - whether it is operational or there
is a loss of any of the frame , signal , power or link parameters. rt also indicates
initialization errors Alanns are generated \\hen the reset counter reading exceeds IS
minutes on loss of signal, frame. power, link and error seconds.
11
VIII Seffl/ ( CSE I ISE) CBCS • JfM/l.e.,/J~ 2
------------------------------' Network - - M ~
b. What is Event Correlation ? Explain case ba,sed reasoning in detail with
necessary diagram. (08 Marks)
Ans. The management system has to correlate all the events and isolate root cause of the
problem. The methods used to do so are called Event Correlation.
Case based reasoning (CBR) overcomes many of the deficiencies of RBR. The
general CBR architecture i~ shown in fi
12
CBCS • JIMU!/( ]IMIJ 2019
..-------'
c,~~
Application Ticket
~~ Granting
server
server
A user logs on to a client workstation and sends an logins request to the authentication
ng with case library. server. After verifying the user on its access control list, the AS grants an encrypted
:xtends it to current ticket to client. The circuit workstation reQucst a password from the user and then
decrypts the message from AS. The circuit interacts with ticket granting server
in the case library, and obtains a service granting ticket and session key. The application server, after
ten by adapt module validating ticket and session key provides service to user.
ss module takes the (3)Authentication server system - is similar to ticket granting system but no ticket is
wly adapted case is granted. No login identification and password pair is sent from client workstation.
The user sends id to central authentication server,aner validation of user, as a proxy
compared to similar agent to client and authentication the user to the application server.
ith resolution. The (4) Authentication Using Cl) ptographic functions - involves the use of cryptographic
e of three ways (I) functions. The sender can encrypt an authentication request to the receiver, who
ation and (3) Critic decrypts the message to validate the user's identification. Algorithms and keys are
used to encrypt and decrypt messages.
b. What is Privacy Enhanced Mail (PEM) ? Briefly PEM process, with neat block
diagram. (08 Marks)
erver Environment.
(08 Marks) Ans. Refer 10 b Model Question paper I.
ofan authentication
13
1/
Eighth Semester 8.E. Degree Examination, CBCS - Dec 2019 / Jan 2020 protocol (CMIP). them
Network Management of tl~e internet suite of I
Time: 3 hrs. Max. Marl-s: 80 and is connectionless ne
Note: A11swer any FIVE/111/ q11estiom, selecting ONE full questio11/ro111 each 111od11/e. An~· Discuss the challenges
. Refer Q.No. 2.a. MQP. 2
Module- I
1. a. Explain the salient services provided in OSI layer. (08 Marks) 3. a. Explain the various dat
Ans. Refer Q.No. 2.b. of MQP - I Ans. Refer Q.No. 2.b. of MQP
b. Briefly explain Network Management functional groupings, with diagram. Refer Q ·No· 2·b· ofJ une
(08 Marks) b. Discuss 2 Tier mod 1
Ans. ReferQ.No. l.b. of June/ July 2019 Ans Tw • c and
. o tier model • network
OR hubs, bridges and routcrs.
~
2, a. Discuss application-specific protocol in OSI Models (08 Marks)
Ans. All application specific protocol services in OSI arc sandwiched between the user
and presentation layers. _ _ _ __
OSI user MDB-M
- _- anagement Datab
-Agent Process
Terminal
VT Application
As shown in fig the .
mana ' re is a d
ges the managed ele
~
management
. . data, Processes
FTAM File Transfer aF m11111nal set of infiormation
.
or three tier, Refer Q.No. 4.
Mail/ Message
MUTIS -----1 Transfer . a. Discuss the vnrious models
ns. Refer Q.No. 2.a. of June/ Jul
Management b. Explain the following·
CMIP i) Macros ·
Application
ii) Functional Model
ns. (a) Macros. ASN.I langua
Presentation layer types and values by defining
ThcASN. l macros also fncili
The boxes on the right hand side describe the services offered. A user interfaces with characteristics associated wi
a hu:,l at a remote terminal using virtual terminal (VT). File transfer are accomplished Structure of ASN. I macro is
using file transfer access using file transfer access and management (FTAM) in OSI <macroname> MACRO : :
model. Similar to SMTP protocol is message oriented text interchange standard BEGIN
(MOI IS). Network management is accomplished using common management info TYPE NOTATION : : = <S)
14
CBCS - 'f)ec,2019 I Jt:M'\/2020
protocol (CMIP). the most used network protocol is lntemct protocol (lP). It is a part
2019 / Jno 2020
of the internet suite of transmission control protocol and internet protocol (TCP/IP)
and is connectionless network service protocol.
Max. Marks: 80 b. Discuss the challenges facL-d by Network Managers. (08 Mark.,)
1 from eacll module. Ans. Refer Q.No. 2.a. MQP - 2
Module- 2
J. a. Explain the various data types and symbol used in ASN.1 (08 Mark.,)
(08 Marki)
Ans. Refer Q.No. 2.b. of MQP - l
Refer Q.No. 2.b. of June / July 2019
pings, with diagram,
(08 Marks) b. Discuss 2 Tier model and 3 Tier model. (08 Marks)
Ans. Two tier model - network management consists of network elements such as hosts,
hubs, bridges and routers.
(08 Marki)
iched between the user
MDB = Management Database
- = Agent Process
erminal
pplication As shown in fig, there is a database in the manager but not in the agent. The manager
manages the managed element. TI1e manager acquires the agent and receives
management data, processes it and stores it in the database. The agent can also send
a minimal set of information to the manager unsolicited.
le Transfer
For three tier, Refer Q.No. 4.a. of MQP - I
OR
ii / Message Discuss the various models in Network Management. (08 Marks)
Transfer Refer Q.No. 2.a. of June / July 20 19
b. Explain the following:
anagement i) Macros
pplication ii) Functional Model (08 Mark.,)
ns. (a) Macros - ASN. l language permits extension of capability to define new data
types and values by defining ASN. l macros.
The ASN. I macros also facilitate grouping of instances ofan objects and determining
. A user interfaces with characteristics associated with it.
nsfcr are accomplished Structure of ASN. I macro is shown below
gcment (FTAM) in OSI <macronamc> MACRO : : =
xt interchange standard BEGIN
mon management info TYPE NOTATION : : = <SyntaxofNewType>
15
VIII Se,m, (CSE I ISE) Net:wor-'/v M ~ CBCS - Veo201~ I J~
V/\LUE NOTATION : : = <SyntaxofNewValue> (I) It should minimi~
<auxiliary assignments> by management ager
END (2) lt should be flexi
The keyword for MACRO is in capital letters Type notation defined the syntax of (3) SNMP architect
new types and VALUE notation defined the syntax of new values. The auxiliary particular hosts and
assignments define and describe any new types identified. Only non aggregate
Example of object identity macro is shown below are communicated a
OBJECT - IDENTITY MACRO The get request and
BEGIN data from network el
TYPE NOTATION : : = of network element.
"STATUS" Status messages from mana
"DESCRIPTION" Text time stamp. SNMP n
Refer Part protocol in order to b
Value NOTATION : : = b. Explain various gro
Value (VALUE OBJECT IDENTIFIER)
Status : : =·'Current''\ "depreciated"\ Obsolete"
Refer part :: = "REFERENCE'' Text I empty Ans. Refer Q.No. S.a. of M
Text : : ="'"'String"'""'
END 6. a. Explain various S~
The two syntactical expression STATUS and DESCRIPTION are mandatory and
Ans. Refer Q.No. 6.a. of M
the type Refer part is optional. The value in VALUE NOTATION defines object
identifier. b. Explain the followin
b) Func tional Model - Component of OSI model addresses the us~r oriented i) Create and Go Ro
npplications and an, specified as shown in fig below
ii) Create and Wait E
OSI Functional Ans. i Create and Go Ro
model
Manager
Configuration process
Fault Performance
Management Manngemcnt Security A •
Management M ccountmg
. . __..;;..._..J anagement Management
Configurat1on management addresses the I . .
networksand theircomponents Fa It se tmg ~nd chang111g of configuration of
o~the problem causing the fail~rc ~n t;~ea~:f:::n~t~v~~e; delectation and !solation
displays in real time all major and minor I .b const_antly rn~nrtors and
trouble ticket administration of fault. a arm ascd on seventy of failures. The <lat
Module -3
5. a. Explain SNMP ~etwork management architecture. (08 Marks
Ans. The SNMP architecture consists of communication between netwo k )
stations and managed network elements or objects SNMP co ~ ~anagement
is us d 1 • . fi · mrnun1cat1on protocol
. e _o communicate m o between network management stations and ,
station m the elements. management
There are three goals of architecture
16
CBCS · Veo2019 ( J<M\/2020
(I) It should minimize the number and complexity of management functions realized
b} management agent.
(2) It should be fl exible to allow expansion
tion defined the syntax of (3) SNMP architecture should be independent of architecture an<\ mechanism of
new values. The auxiliary particular hosts and gateways.
Only non aggregate objects arc communicated using SNMP. the aggregate objects
arc communicated as instances of object.
The get request and get next req message and generated by manager to retrieve
data from network elements. The set request is used to initialize and edit parameters
of network element. The get response request response from agent to get and set
messages from manager. there are three types of traps: Generic trap, specific trap and
lime stamp. SNMP messages are ex-changed using connectionless UDP transport
protocol in order to be consistent with simplicity of model as well as to reduce traffic.
b. Explain various groups and functions, RMON 1 perform at the data link layer.
(08 Marks)
Ans. Refer Q.No. 5.a. of MQP - I
OR
6. a. Explain various SNMP operations. (08 Marks)
lPTlON are mandatory and Ans. Refer Q.No. 6.a. of MQP • I
NOTATION defines object b. Explain the following briefly:
i) Create and Go Row creation
ii) Create and Wait Row creation. (08 Marks)
Ans. i Create and Go Row creation
Manager Agent Managed
process process Entity
Security Accounting
mcnt Management
Set Request {
changing of configu~ation_ of Status 3 = 4
Ives delectation and isolauon index 3 = 3
MS constantly monitors and data, 3 = Def (Data) Create
on severity of failures. The instance
Instance
Created
(08 Marks) Response {
cen network management Status 3 = I
MP communication protocol index 3 = 3
nt stations and management data, 3 = Def(Data)
17
CBCS · Veo2019 ( j<M'V1020
(I) lt should minimize the number and complexity of management functions realized
by management agent.
(2) It should be flexible to allow expansion
tion defined the syntax of (3) SNMP architecture should be independent of architecture anct mechanism of
ew values. The auxillary particular hosts and gateways.
Only non aggregate objects are communicated using SNMP. the aggregate objects
are communicated as instances of object.
The get request and get next req message and generated by manager to retrieve
data from network elements. The set request is used to initialize and edit parameters
of network element. The get response request response from agent to get and set
messages from manager. there are three types of traps: Generic trap, specific trap and
time stamp. SNMP messages are ex-changed using connectionless UDP transport
protocol in order to be consistent with simplicity of model as well as to reduce traffic.
b. Explain various groups ond functions, RMON 1 perform ot the data link layer.
(08 Marks)
Ans. Refer Q.No. 5.a. of MQP - I
OR
6. a. Explain various SNMP operations.
PTlON are mandatory and (08 Marks)
Ans. Refer Q.No. 6.a. of MQP - I
NOTATION defines object
b. Explain the following bric0y:
i) Create and Go Row creation
addresses the user oriented
ii) Create and Wait Row creation. (08 Marks)
Ans. i Create and Go Row creation
Manager Agent Managed
process process Entity
Security Accounting
anagement Management
Set Request {
changing of configu~ation. of Status 3 = 4
Ives delectation and isolauon index 3 = 3
MS constantly monitors and Create
data, 3 = Def (Data)
cl on severity of failures. The instance
Instance
(08 Marks) Response { Created
etween network management Status 3 = I
MP communication protocol index 3 =3
ment stations and management data, 3 = Def (Data)
17
VIII Sem, (CSE I ISE) Networ-k, M ~ CBCS - Veo2019 (
The creation of a row and deletion of a row arc significant new features in SMIV2.
There are 2 states for Row status. CreateAndGo and CreateAndwait. which are
operations. In createAndGo the manager sends a message to the agent to create a row 7. a. With neat diag
and make its a status active immediately. Fig below shown create and go operation.
The manager process initiates a set request PDU to create a conceptual row with the Ans. Refer Q.No. 7.a 0
values given for three columnar instances of the row. The value for the index column b. Explain with nea
is specified by VarBind index = 3. Ans. Refer Q.No. 5.c.
This is suffixed to the other two columnar objects in the next row to be created.
ii) Create and Wait Row creation.
Fig shows the o rational sequence to create a row using the create and wait tnethod. 8. a. Discuss the folio
Manager Agent i) ADSL Nehvor
..----t-------
Process
Sec Request l
Process Ans. i) ADSL Networ
ii) ADSL Fault M
Scaua-3 : 5
ladn•3 • J b. Discuss protocol I
Ans. Refer Q.No. 7.b. 0
9. a. Discuss security
Ans. Security manageme
management. ltinvo
Set Request ( access to data store
through the network
-
dll>J DclDaia) :::::---;
Response(
Status- 3 • 2 mobile station. Secu
Index - 3 • DelDala) The IETF work gro
Sclltoqual( statement of lhe rut
11111113 • I ::::-----:,.
technology and info
Rc,poosc[ - An enterprise policy
- - Sta1u1-3 • I
into systems and ace
The manager and the agent are shown. The manager process sends a set request A basic guide for ge
POU to the agent process. the value for status is 5, which is to create and wait. I. Identify what you
The third columnar object expects a default value, which is not in the set-request 2. Determine what y
message. The agent process responds with a status value of 3, which is not ready. 3. De1ermine how Ii
The manager sends a get request to get the data for the row. The agent responds with 4. Implement measu
a nosuchlnstance message, indicating that the data vlue is missing. The manager 5. Review the proces
subsequently sends tho value for data and receives a response of not Inservice(2) The assets that needs
from the agent. The fourth and final enchange of message in the fig is to activate data, documentation,
the ro~ with a ~tatus value of I. With each message received from the manager, the classic threats are fro
agent either validates or sets the instance value on the managed entity. I or unauthorized dis
Denail of service is a
state in which it no I
b. Discuss Event corrc
Ans. Refer Q.No. 9.b. of J
JD
cscs -Veo2019 IJCM'\12020
[~e~==-
eAndwait. which are
Module- 4
7. a. With neat diagram, explain data over cable system reference architecture.
agent to create a row (08 Marks)
ate and go operation. Ans. Refer.Q.No. 7.a ofMQP- 2
nceptual row with the (08 Marks)
b. Explain wilh neat sketch brondbund access technology.
for the index column
Ans. Refer Q.No. 5.c. of June/ July 2019
w to be created. OR
8. a. Discuss the following:
ate and wait method. I) ADSL Network Munagemcnt ii) ADSL Fault Management (08 Marks)
Agent Ans. i) ADSL Network Management : Refer Q.No. 8.b. of June / July 2019
Process ii) ADSL Fault Management : Refer Q.No. 9.a. of June / July 2019
b. Discuss protocol layer in IIFC system. (08 Marks)
Ans. Refer Q.No. 7.b. of MQP - 2
Module- S
9. a. Discuss security Management. (08 Marks)
Ans. Security management is both a technical and an administrative consideration of info
management. It involves securing access to the network and info flowing in the network,
access to data stored in the network and manipulating the data stored and flowing
through the network. Another area of concern in secure communication is the use of
mobile station. Security management goes beyond the realm of SNMP management.
The IETF work group that generates RFC defined a security policy as "a formal
statement of the rules by v,hich people who are given access to an organization's
technology and info assets must abide".
An enterprise policy should address both access and security breaches. Illegal entry
into systems and accessing of networks must be protected against.
A basic guide for getting up policies and procedures includes following.
css sends a set request I. Identify what you arc trying to protect
h is to create and wait. 2. Determine what you are trying to protect
is not in the set-request 3. Determine how likely threats are
f 3 which is not ready. 4. Implement measures that will protect your assets in a cost efiective manner
Th~ agent responds with 5. Rc,ic\\ the process continuously and make improvements of weakness are found.
s missing. The manager The assets that needs to be protected should be listed including hardware, software,
nse of not lnservice(2} data, documentation, supplied and the people who have responsibility for them. The
in the fig is to activate classic threats are from unauthorized access to resource and/ or info, unintended and
from the manager, the I or unauthorized disclosure of info and devial of sen ice.
ed entity. Dena ii of service is a serious attack on network because the network is bought into a
state in \\hich it no longer carry legitimate user's data.
b. Discuss t'~ent correlation techniques. (08 Mar~)
Ans. Refer Q.No. 9.b. of June/July 2019
19
VIII Sem,(CSE ( ISE) Netwo-r-k- M ~
OR
10. a. Discuss the Following:
i) Fault Management ii) Performance MIIDagement. (08 Marks)
Ans. i) Fault Management : Refer Q.No. 10.b. of MQP - 2
ii) Performance Management : The purpose of network is to carry info, and
performance management is (data) traffic management. It involves data monitoring,
problem isolation, perfonnancc tuning • analysis of statistical data for recognising
trends and resource planning.
The parameters are throughput, response time, network availability an network
reliability. The metrics of these parameters depend on what when and where
measu:-ements are made. The parameters that affect network throughput are
bandwidth or capacity of the transmission media, its utilization, channel error rate,
peak load and average traffic load. The response time depends on the application.
Three types of metrics arc used to measure applications responsiveness application
availability, response time between user and server and burst frame is the rate at
which the requested data arrives at user station.
b. Explain configuration Management. (08 Marks)
Ans. Configuration management in network management is used in discovering
network topology, mapping the network and setting up configuration parameters in
management agents and management systems.
Network provisioning also called circuit provisioning is an automated process. A
trunk and a special service circuit are designed by application programs written in
operation system.
Planning and inventory systems are integrated with design systems to build an overall
SYST
system. Ex :- Trunk Integrated Record keeping system. Network provisioning in a
computer communication user packet switched paths, broad band communication
uses WAN communication.
Network management is based on knowledge of network topology. As a network
grows, shrinks or otherwise changes. network topology needs to be updated
automatically. Auto discovery can be done by using broadcasting on each segment
and following up with further SNMP queries to gather more details on system.
One efficient method is to look at the ARP cache in the local router. This process is
continued until the desired info is obtained on all IP addresses defined by the scope
(VIII S
of auto discovery procedure. A map showing network topology is presented by auto
discovery procedure after the address of the network entities have been discovered.
TI1e auto discovery procedure becomes more complex in virtual LAN configuration.
An efficient database system is an essential part of inventory management system.
Two of the systems that TRICKS uses are equipment inventory (El) and facilities
inventory (Fl). An object oriented relational database is helpful in configuration and
inventory management.
20
rktM~
ailability an network
iat when and where
~vork throughput are
on, channel error rate,
SUNSTAR·
ds on the application.
nsiveness application
t frame is the rate at
(08 Marks)
used in discovering
uration parameters in
automated process. A
011programs written in
SYSTEM MODELLING AND
terns to build an overall
vork provisioning in a
cl band communication
SIMULATION
opology. As a network
needs to be updated
sting on each segment
.ore details on system.
I router. This process is (VIII SEM. B.E. CSE / ISE)
es defined by the scope
gy is presented by auto
have been discovered.
ual LAN configuration.
management system.
tory (El) and facilities
fut in configuration and
SYLLABUS
SY ST EM MODE L LING AN D SIM U LATION E
(AS l'lil< Cl!OlCE BASED CREDl'I SY~ 1l Mt('(l(.-;l ~Ull·MF)
(EHEC'I IVE FROM TIil· ACADF\I.IIL 'i I ,\R 2lllh • 20171
SulJj«t C:od• 1;;('S833 I \ \l:trk, zo s
~:»1111 \ l:,rl,, 80 Time: 3 hrs.
SumlJ<•r of L,•,tun· Ilours/Week 03
Note : ,<1,,,.,.,er 0 11_,
rotal Xumbcr of L«1urr llou•·• ~o •.·u•m llou,·s 03
MODULE 1
Introduction: When simulation is the appropriate tool amt" hen it is not appropriate, Advantages l. a. With a neat d
and disadvantages of Simulation; Areas of application, Systems and system environment; Ans. Steps in simul
Components of a system; Discrete and continuous systems, Model of a system; Types of Models,
Discrete-Event System Simulation examples: Simulation of queuing systems.
G ene ra l Principles, S imulntio n Softw11re: Concepts in Discrete-Event Simulation. The Event-
Scheduling / Time-Advance Algori.thm, Manual simulation Using Event Scheduling
MODULE 2
Statistical Models in Simulation : Review of terminology and concepts. Useful statistical
models,Discrete distributions. Continuous distributions,Poisson process. Empirical distributions.
Queuing M od els : Characteristics of queuing systems,Qucuing notation.Long-run measures of
performance of queuing systems,Long-run measures of performance of queuing systems cont. ..
,Steady-state behavior of M /GI 1 queue, Networks of queues.
/
MODULE3
Random-Number Gcner:ttion: Properties of random numbers; Generation of pseudo-random
numbers, Techniques for generating random numbers, Tests for Random Numbers,
Ra ndom-V11ria te G eneratio n : Inverse transform technique Acceptance-Rejection technique.
MODULE4
Input Modeling: Data Collection; Identifying the distribution with data, Parameter estimation,
Goodness of Pit Tests, Fitting a non-stationnry Poisson process, Selecting input 1_nodels without
data, M11ltiv11riatt: and Time-Series input models.
F.stimation o f Absolu te Pc rfo rm11ncc: Types of si1111.lations with respect to output ·analysis
,Stochastic nature of output data, Measures of pcrforma1.cc and their estimation, Contd..
MODULE 5
Measures of performance and their estimation,Output analysis fur terminating simulations
Continucd..,Outpul analysis for steady-state simulations.
Verification, C:tlibr:ttion And V:1lidatio11: Optimiz:tlion: Model building, verification and
validation, Verification of simulation models, Verification of simulation models,Calibralion and
validation or models, Optimiwtion via Simulation.
.TION Eighth Semester B.E. Degree Examination,
C BCS - Model Question Pnper - I
SYSTEM MODELING AND SIMULATION
20
Time: 3 hrs. Max. Marks: 80
80
),. Note: A mwer a11y FIV£fi1/I t/lll!M/1111s, .1·e/ecti11,: ON£/11ll 1111e.1t/011/ro111em:/111wd11/e.
OJ
Module - 1
I . a. With a neat diagrum explain the steps in the simul11tion study. (08 Marks)
1 appropriate, Ad,antages
nd system environment; Ans. Steps in simulation study
i.ystem; Types of Models, Problem
formulation
stems.
Ill Simulation. The F,vent-
Selling of objectives and
nt Scheduling 2
ovcrnll project plan
,cration of pseudo-random
m Numbers,
cc-Rejection technique.
No No
I building, ,critication nn
tion models,Calibration an
12 Implementation
VIII Seffll (CSE/IS'f:) Syst"em, ~~ a-t'\.d,S~t,0-n,
I. Problem formulation 12. l mplementatlo ]
Every study should begin "ith a statement of the problem. The success of the
• If the statement is provided by the policymakers or those that have the problem. ha,e been performe
the analyst must ensure that the problem being described is clear!)' understood. I phase:- Consists
• lfa problem statement is being developed by the analyst, it is important that the • It is period of di
policymakers understand and agree with the fonnulation. • The analyst ma
2. Setting of ol.Jjcctives and overall project plan • Recalibration nn
The objectives indicate the questions to be anS\\ered by simulation. At this po_int. 11 phase:- Consist
a determination should be made concerning whether simulation is the appropriate • A continuing int
methodology for the problem as formulated and objectives as stated. • Exclusion of m
111 phase:- Consist
3. Model conceptuali1.ation .
The art of modeling is enhanced by an ability abstract the essential features of a • Conceives a thro
problem. 10 select and modify basic assumptions that characterize the S)i.tem. and • Discrete event s1
then enrich and elaborate the model until a useful approximation result~. • lhe o/p variabl
4. D11t11 collection statistical analys
There is a constant interplay between the construction o f the model and the collection IV phase:- Consists
of the needed input data. As the complexity of the model changes. the required data • Successful impl~
e lements can also change. successful compl
5. Model translation
The modeller must decide whether to program the model in a simulation language
I. b. Explain advantao
Ans. AdvunCuges of sim,
..
such as GPSS/ 11 or 10 use special purpose simulation sollware. • New policies,
6. Verified organ izational p
Verification pertains to the computer program prepared for the simulation model, operations of th
with complex models. it is difficult if not impossible to translate II model successfully • Ne,\ hardware
in its entirety without a good deal of debugging. tested without c
7. Validntcd • H) pothesis abou
Validation usually is achieved through the c11libration of the model, an iterative • ln<;ight can he o
process of comparing the model against actual system behaviour and using the the system.
discrepancies between the two and the insights gained, to improve the model. • Insight can be o
8. Exr1crlmcnt:tl design
• Bottleneck ana
l11e altc111ativcs that arc to be simulated must be dctenni111..'<I. The decision conceming \\hich information, m
altematives to simulate will be function of runs that ha\.c bl..-cn complcH.'<i and anal) sect. • A simulation st
9. Production runs nnd nnnlysis ' how individual
Production runs and their subseqm:nt analysis. arc used to estimate measures of Dis1ulv11111agcs
performance for the system designs that an: being simulated. • Model l>uildin
JO. More runs
and through e ·
Gi\ en the analysis of runs 111111 havi: bi:1:11 completed the analyst determines ,~hcthcr eom~tcnt indi
additional runs are needed and what design those additional experimentation ,hould the) \\ iII be th
follO\\ . • Simulation res
II. Documentation nntl ~porting essentially ran
There are two t)'pes of documentation program and progress. observation is
• Program documentation:- If the program is going lo be used again by the • S11nulation m
same or diffen:nt analysts, if could be ncccssa~ tu understand how the program • Simulation is
operates.
4
12. Imple mentation
The success of the implementation phase depends on how well the previous steps
at have the problem, have been performed. The simulation modeling can be broken into 4 phase
clearly understood. I phase:- Consists of step I and step 2
t is important that the • It is period of discovery/ orientation.
• The analyst may have to restart the process if it is fine tuned.
• Recalibration and classifications may occur in this phast: or another phase.
ulation . At this point, II phase:- Consist of step 3, 4, 5, 6 and 7
ion is the appropriate • A continuing interplay is requirt:d among the steps
• Exclusion of model user results in implications during implementation.
stated.
III phase:- Consists of step 8, 9 and I0
essential features of a • Conceives a through plan for experimenting
• Discrete event stochastic is a satisfaction experiment
·tcri1.e the system, and
• The o/p variables estimates that contain random error and therefore proper
tion results.
statistical analysis is required.
IV ph.1sc:- Consists of step 11 and 12
odcl and the collection
• Successful implementation ckpends on the involvement of user and every steps
nges, the required data
successful completion.
. .
I . b, Expl:iin advantages :rnd disadvant:igcs ofsimul:1tio11. (08 Marks)
n a simulation language Ans. Adv:intugcs of simulution
are. • New policies, operating procedures. decision n,lcs, information nows,
organizational procedures and so on can be explored without disrupting ongoing
r the silllulation lllodel, operations of the real system.
late a model successfully • New hardware designs, physical layout, transportation systems and so on can be
lt:sted without committing resources for their acquisition.
• Hypothesis about how or way certain phenomena occurs can be tested for feasibility.
the model, an iterative • Insight can be obtained about the interaction of variables to the performanct: of
behaviour anti using the the system.
mprove the model. · • Insight can be obtained about the inkraction of variables.
• Oottlcneck analysis can be performed indicating where work - in - process,
d1.-cision conceming which information, materials and so on are being exclusively delayed.
completed and analysed. • I\ simulation study can help in understnnding how the system operntcs rnther than
how individuals think the system operates.
10 estimate measures of Dismlvm1tages .
ed. • Model building requires special training it is an art that is learned over time
and through experience. Further more, If 2 mod~ls are constructed by different
alvst determines whether competent individuals they might have similarities but it is highly unlikely that
I ;ll.perimentation ~hould they will be the same.
• Simulation results can be different to interpret most simulation outputs arc
essentially random variables, so it can be hard to distinguish whether an
ss. observation is a result of syste111 interrelationships on of randomness.
to be used again by the • Simulation moclt:lling and analysis can be time con~uming a,nd expensive.
Jcrstand how the program • Simulation is used in some cases when an analytical solution is possiblt:.
5
VIII Se,w (CSE/ISE)
C13CS - Mo-de.iq~
OR Service lime dctcrnl
2. a. A small store bas only one check out counter customen arrive al the checkout
counter at random times that arc l to 8 minutes apart. Each possible value of
Inter-arrival time has the same probability of0.125 the sen•ice time vary from
J to 6 minutes, with probability show below.
s·ervice time minutes I 2 J 4 5 6
Probabilities 0.10 0.20 0.30 0.25 0.10 0.05
G iven the random digit distribution for customen arriv11ls us 64, 112, 678, 289,
87 1 11nd ror service times 84, 18, 06, 91 s1,,-qucatially.
Develop simul11tion table fqr 6 customcn. Find
(i) Average w:iiting time (ii) Average senice time (iii) Averuge time customer Custonirr Inter ~
spends in the system. (10 Marks) nrriv:il
Ans. Distribution of time between arriva ls
Time b/w Probability Cumulative Random digit I
lime
.
I
j
arrivals probability ;1,;signment 2 I 1
0.125 3
I 0. 125 001-125
4
I I
0.250 126-250 6
2 0.125
3 0.125 0.375 251-375
5 3 •
6 7
•
4 0.125 0.500 376-500
5 0. 125 0.625 501-625 I. Average waiti,ig tim
6 0.125 0.750 626-750
7 0.125 0.875 7S 1-87S
8 0.125 1.000 876-000
Service hme d1Str1buhon 2- Avcragcscrvice tim
Service time Probability Cumulative probability Random digit
I 0.10 0.10 01-10
2 0.20 0.30 11-30
3 0.30 0.60 31-60 3 · A vcrage time custon
4 0.25 0.85 61-85
s 0.10 0.95 86-95
6 0.005 1.00 96-00
lal er arrival time determln11Cloa 2. b. Explnin event sehc,
Customer s1rn11:.hoc 111 cluck ... 1
Random digits Time b/w arrivals
Ans. After lhc sysh!m snap
I - - advanced to simulmioi
2 064 I FEL and event is exec
3 112 I • At time 11 nc\\ full
4 678 6 arc scheduled hy
s 289 position on the FE
3
6 • A Her the nc\\ syst
871 7 to the time of then
6
SM11•+M f:.c"'M Su.Mv
Service time determination
·ve at the checkout Customer Random digit Service: lime
,b possible v1tlm! of I 84 4
ice time vary from 2 18 '.!
3 87 s
5 4 81 4
'. o io \ o~s ] 5 06 I
~s 64, ll2, 678, 289, 6 91 5
Simulation table for smgle channel queuing problem
Customer Inter Arrival Sen•ice Time Wniting Time Time lcJtl
ge time customer arrival lime timr servicr time in ~er, ice ruMomrr lime of
(10 Murks) lime Begins <111eue ends ~11e11ds ~yslem service
I . 0 4 0 () 4 4 0
tandom digit 2 I I 2 4 3 <, s 0
:issi):!.11mc11t 3 i 2 s 6 4 II \I 0
001-125 4 6 8 4 II 3 15 7 0
126-250 5 3 II I 15 4 16 5 ()
A ranido~ variable x is uniformly distributed on the interval (a, b) if its pelf is given by
f(x) j;':';• as; x s; b
0, Otherwise
Thecdf is given by
0, <a
X
o-----
F(x) ~ ~ - aSx<b The Exponential dis
{ b-a arrival:; arc compleh:
I. In these instances. A. i
di~lribution has mea
8
~ s~ W t ' \ I
X. - X1
to advance the clock P(x, <x <x:) = F( x:)-F ( x1) = b-a
~duling / time advance ls proportional to the length of the interval for all x, and x~satisfying a=:x, <x;::b.
The mean and variance of the distribution are given by.
E(x) = n+b
2
and
cur at time 11
ccur at time t 2 V{x)= (b-a)'
ccur at time 13 12
The pdf and cdf when a= I and b= 6 are shown below
pdf cdf
\Ccur at time t__ !lx) l~x)
1.0
cnt 3, time 4) from f"EL. 0.2
0.8
hge entity attributes and 0.6
0.1 0.4
e 011 fEL rank by eve 0.2
X X
I 2 3 -1 5 6 7 8 I 2 3 4 5 c, 7 8
The uniform distribution plays a vital role in simulation. Random numbers, uniformly
distributed between zero and one, provide the means to generate random events.
nl to occur at time 12 ii) F.xponcntial distribution
nt to occur at time t* A random variable x is a said to be exponentially distributed with parameter A..>0 if
mt occur at t3 its pdf is given by
nt to occur at time t. f(x ) ={A,.-'·', x ;,: O
0, else where
The density function is shown below
(OS Ma l{x) f(x)
() X O :\
The Exponential distribution has been used 10 model inter arrival times when
arrivals arc completely random and to model service times that arc highly variable.
In these instances, A is a rate, arrivals per hour or servic..: per minut<:. Tht: exponential
distribution has mean and variance is given by
9
VIII Sem, (CSE/ISE) sy~em, ~ ~ ~ s~ t m ' \ I cacs - Modei Q
I I The population
E ( x) = °i and V( x) = },: assumed to be
Thus, the mean and standard deviation are equal. The cdf can be exhibited by For example. c
integrating equation to obtain of time a Iliac
rerno, es the tir
o. x <0
customers. 11 h
F(x)- '
p,i:-..,clt . ., I -c-••. c!: 0 server who ser
1
11.
finite .ind consi
In syslt:111 with
3. b. A production process mnnufactures computer chips on the average at 2% non assumed to be ;
confo rmi'lg· Everyday, 11 rumlom sum pie or size 50 is taken rrom the process. Ir 2. System capn
the sample contnins more than two non conforming chips, the process will be In many qut:11in
stopped. Compute the probability that the process is stopped by the sampling the 11aiting line
scheme and also compute mean and variance. (08 Marks) For c.,ample. an
Ans. Consider the sampling process ns n=50 Bernoull i trnils, ench with p=0.02, then to enter the mcc
the total number of non conforming chips in the sample. x, would have a binomial customer who (i
distribution given by population.
fC
0 3. The Arrirnl I
P(x) = }o.02)' (0.98)50-'. x =0,1,2 .... 50 The anfrnl proc
1 0, Otherwise
of inter armal
time\ or at rand
It is much easier to compute the right - hand side of the following identify to compute charactcri✓ed by
the probability that more than two non conforming chips 11re found in a sample. a time or in bate!
P(x c!:2}=1- P (:-. s2) 4. Queue bcl111\'i
The probability P(x ~ 2) is calculated from Queue bt:haviou
service to begin.
P(x s 2} = L:,, (-o) (0.02)'(0.98t°-x will balk. renege.
Queue discipline
••O X
\\hich customer,
= (0.98r +50(0.02)(0.98}49 + l225(0.C12f (0.98}48 5. Sen ·icc times ,
=0.92 The !.en ice ti111e
Thus, the probability that the production process is stopped on any day, based on the be constant or o
sampling process, is approximately 0.08. The mean number of non confonning chips characteri/ed as
in a random sample of sizt: 50 is given by variahlcs. ·1he ci
E(x) = np = 50(0.0'.!) = I distributions hav
situations.
and the variance is given by
4. h. Lis t the differcn
V{x) =npq =50(0.02)(0.98) = 0.98 Ans. Rccogni£ing the
for parallel scrv
OR
4. •· Explnin the clrnrnctc ristics of queuing system. of this conventio
(08 Marks)
Ans. Charncteristics of queuing system following system
I. T he c:111ing popul:1tio11 A represents the 1
0 represents the ~
10
cacs - Model,Q,-t.e~t"i,owPet.pu- - 1
Tho.: popula1iu11 of potential customers. referred to as the calling population. may be
assun11::d 10 be finite or infinite.
For example. consider a pank or five machines that are curing tires. Atier an interval
df can be exhibited by or 1i11;e a machine automatically opens nnd must be aucnclt:d b) a \\Orker who
removes the tire nnd puts an uncured tire into the machint:. The machines arc the
customers. who arrive at the instant they automatically open. The worker is the
server who serves an open machine as soon as possible. The calling population is
finite <1nd consists of five machines.
In systcni with a large population of potential customers, the calling is usually
the average at 2% non assumed to be infinite.
en from the process. If 2. System cap:1city
ips, the process will be In many queuing systems. there is limit to the number of customers that may be in
the waiting line or system.
topped by the snmpling
(08 Marks) For example. an au10111mic car wash might have room for only IOcars to wait in line
each with p=0.02, then lo enter the 1m:chanism. II may be illegal for cars to wait in the street. An arriving
, would have a binomial customer who finds \ht: system full docs not enter but immediately to the calling
' population.
3. The J\rrirnl (lroccss
The arrival process for i111i11itc population modt:ls is usually charncterizecl in terms
or inter arrival times or successive customers. Arrivab ma) occur at scheduled
times or at random limes. when al random 1i11ics, the inter arrival times are usually
owing identify to compute
a
characterized by probability distribution. In addition. customers may arrive one of
a time or in batches. The batch may be of constant size or of random size.
re found in a sample.
4. Queue behaviour nnd Queue discipline
Queue behnviour refers to the actions of customers while in a queue waiting for
service to begin. In some situations, there is a possibility that incoming customers
will balk, renege.
Queue discipline refers 10 the logical ordering ofcustomers in a c1ucuc and determines
which customer will be chosen for service when a server becomes free.
5. Sen'icc times and the service nH•chanism
The service times of suc~essivc <1rrivals nrc denoted by S,. S:. S,... They may
be constant or of random duration. In the laucr case, [S,. S:, S3..... J is usually
ed on any day, based on the
characteri;,ccl as a sequence of inilependent and identically distributed random
r of non conforming chips
variables. The exponential. weibull. ga111111:1. log normal and truncated normal
distributions have all been used successfully as 111odcls of service limes in different
situations.
4. b. List the different <1ueuing notations using in queuing systems. • (08 Marks)
Ans. Recognizing the diversity of queuing systcms,Kendall proposed a notational system
for pnrallel server systems which has been' widely adopted. An abridged version
(08 Marks) of this convention is based on the format A/8/C/N/K. These letters represent the
fol lowing system chnractcristics.
A represents the inter arrival time distribution.
B represents the service time distribution.
11
VIII Semt (CSE/ISE) Sy~M~~S~ cacs - Modei Q
C repn:sents the number of parallel servers.
N represents the system capacity. o• = mnx I s i s N
K represents the size of the calling population.
Common symbols for A and B include M(exponential)
D(constant or deterministic), Ek (Erlang of order k) o- ""max I S i s
PH(phase - type). H (hyper exponential). G(arbitrary or gcneral) and Gl(general
indepcndent) Step 3:- Compute
Queuing notation for parallel server systems Step 4:- The criti
r n- steady state probability of having n customers in the system. sample: size N.
P.(t)- Probability of n customers in system at time t Step 5:- If D>Da,
,.,,_ Arrival rate Problem
).,, - Effective arrival rate
µ-• Service rate of one server
t- Server utilization
A - Inter arrival time between customers n-1 and n I
S: Service time of the n'11 arriving customer 2
W Total time spent in the system by the n'h arriving customer
11
-
3
W:- total timc spent in the waiting line by customer n
L(t)- The number of customers in Queue of time I 4
L"(t) - TI1e number of customers in queue at time t 5
L- Long run time average number of customers in the system. Compute D= max
Lv· Long run time average number of customers in Queue D= max (0.26, 0.2
W- Long run average time spent in system per customer D=0.26
W v· Long nm average time spent in Queue per customer. Check = D>Da, j
Module-3 0.26<0.S6S
The hypothesis is
S. a. Explain steps in the kolmorgo,· - simronv test. The sequence of random numbers
0.44, 0.81, 0.14, 0.05, 0.93 arc generated using kolmorgo,·- simrno,, test with 5. b. Wlrnt is nccepl:1
a ...0.05 td determine if the hypothesis that he numben arc 011 the interval (0.1) ~ can cx=0.2. The
can be rcjccteil take Du.=0565. (08 Murks)
Ans. Kolmogorov- simrnov k st An1. In acce1>t:111ce rcj
This test compares the F(x) of the uniform distribution with the t:mpricnl SN(K) of the In thi:. 111c1hod a ra
sample of N observation. only then the rand
Problem
F(x)= x. 0:::: x :::I
If the sample from the random number gcnernlor is R 1• R•..... R, tht:n the cmprical Step I:- set n=O.
S"(x) is defined by · Step 2:- R, =-0.43
Step 3:- Since P-
S,i (x )- Number of R,. R~~.R~ Whichare:s: x
Stcp I:- Set n=0.
Stcp2:- R,=0.414
Step I:- Rank the data from smallest to largest Step 3:- Since P~
R,:::: R~:::.......RN Step I:- Set n=O.
Step 2:- Compute Step 2:- R,•0.83
Step 3:- Since p -
Step 1:- R,=0.99
riis~
D' =mm.Isis N { ~- R}
D" = ma:-. I s i S N { R, - ( i ~ 1) }
eral) and Gl(gem:ral
Step 3:- Compute D- max (D·, D·)
Step 4:- fhc critical value, Da for the specified significance level a, and the given
m. sample size N.
Step 5:- If D>Da, reject 11,. otherwise accept H...
Problem
• I)+
I R I
i/N -i-N 1 -NI -R,
R o-_ (i-1)
I N
I 0.05 0.20 0 0.15 0.05
2 0.14 0.40 0.20 0.26 -
.r 3 0.44 0.60 0.40 0.16 0.04
4 0.81 0.80 0.60 - 0.21
5 0.93 I 0.80 0.07 0.13
Compute D= max (D", D·)
D"' max (0.26, 0.21)
D=0.26
Check = D>Da , if reject
0.26<0.565
The hypothesis is accepte<l because D<Da.
eofrnndom numbers S. b. What is 11ccept11ncc rejection technique? Generate three Pois.,on variates with
,._ simrno,· test with
~c11n a=-0.2. The random numbers are 0.4357, 0.4146, 0.8353, 0.9952 and 0.8004.
on the interval (0.1) (08 Marks)
(08 Mnrks)
Ans. In accc1>t:mcc rejection technique
In this mc1hod a random number is generated and ifit satisfies the particular condition
e emprical S,(x) of the
only then the random variable is generated.
Problem
Step I:- sc::t n"'"O. P= I .
... R~ 1hen the empncal
Step2:- R1=0.4357, P= I. Rl)! 1=1•0.4357=0.4357
Step 3:- Since P=0.4357<0.8187, then accept N=0.
S1cp I :- Set n= O. P-= I
Step 2:- R1=0.4146, P=I. R0 • 1=1•0.4146=0.4146
Step 3:- Since P=0.4 I40<0.8187, then accept N=0
Step I:- Set n=0. P-= I
Step 2:- R,=0.8353. P= l *R0, 1= I *0.8353=0.8353
Step 3:- Since P=0.8353 >c"" then Reject n=I and return to step 2
Slcp 2:- R2=0.9952. P- R,. R,=0.8313
14
~~
h n=2
X =-_!_ln(l-R ) ➔ (2)
, "- '
For i= I, 2. 3...... one simplification that is usually in equation (2) is to replace
1-R, by R
x =-..!_ ln(R)
' A '
This alternative is justified by the fe<a:I thnt both R, and I- R, are uniformly distributed
on (0, I).
stribution with the rntc 11., ii) Uniform dis tri bution
Ile. Consider n random variable x that is uniformly distributed on the interval (a. b)
15
VIII Sem, ( CS'f(IS'f) CBCS · Mod.el,Q
Step I:- Find the cdf F{x) i.e., 2. Label the horiz
0, X <a
3. Find the frequc,
x-a 4. Label the vcrtic
F(x) = - - , as:xsb
{ b -a 5. Plot the freque
I, x>b 3. Selcc1i11g lhc
After cons1ructinij
Step 2:- Set upper part Off he rl
. X -11 Depending on thu
F( x ) = R 1.e.,--=R
b-a
Step 3:- Get
1
X = f- (R)
. x-11 R
I.C.• - - =
b-u
x-a = R(b - a)
16
2. Label the horizontal axis to conform to the inten ab ~eh:ctcd.
3. find the frequencies for each class interval.
4. Label the vertical axis so that the total occum:nces can be plotted for each interval.
5. Plot the frequencies on the vertical axis.
3. Selecting the distribution
After constructing a histogram a smooth curve passing through mid points from the
upper part of Jhc rectangle is drawn. lhis cun e 1s called the frequency curve.
Depending on the shape of the curve. appropriate guess to the distribution is made.
Poisson Uniform
n.
begin by practice or pre
Exponential Normal
ing Another methud to identify the distribution is to use the physical basis of the
hcckout whether the data
ns .needed as input to the distribution.
• Poison:- Number of independent events that occur in a lhed nmount of time.
homogeneity in successive • Exponential:- Time between independent periods
• Normal:- Some of component processes
I
ssive days.
a quantity of interest is not • Discrete/ continuous uniform:- Complete uncertainty cnn be used where there
is no dnta.
vo variables. • Gamm:1:- An extremely flexible distribution used to model non negative random
ations that appear to be vnriables.
• Em1,iric11l :- Re samples from the actual data collected.
r for successive customers. • Octa :- An extremely flexible distribution used to model bounded random
variablt:s.
owed by the data should be 4. Parameter estimation
Once the:: distribution is identified the parnmetcr are tu be idenriflcd and estimated.
The paramc::tcrs can be eliminated by using sample means and variance denoted by
chh.
17
V III Set>11 ( CSE/I SE) CBCS · lvt~Que.!
~ and S: s :
-1 t:i (unfroupeddnta)
6
7 .
x= I x,f, (grouped data) 8
9
~
''
, n
10
( ungrouped data) 11
!
s2 = I
,l
(grouped data) x: - I(o, - E.)
E,
Distribution Parameter Suggested estimator X ~ > .\ 2U, J.. - 5 -
Poison a
x=x
- 27.68 >3.6-t
:. Reject H) poth
Expotential
" A.==
I
l\
8. a. Brien,- explain c
Gamma p,e I Ans.
0==
X • To understnn
bc1wcen meas
Normal µ,o
~I= X
- • One way to rr
prediction inti
a' =S2 • Both confide
being produce
7. b. Apply chi- square goodness of fil lest to poison assumption nith a=3.64, data
Suppose that mo
size=IO0. Observed freq uency is 0i: 12 10 19 17 10 8 7 5 5 J 3 I ta ke le,cl of
unknown.
significance a=0.05 and critical , ·alue I I.I. Expected value should be greater
To lel J be lhe ~
tha n five. (06 l\.1nrks)
simulation
Ans. Therefore. 8 is t
of the average CJ
• lf gool is toe
lfwc arc plannir
(o,-E,}:
X
'
( or o x.( P(x) .
I.:. 0 -
I
I.:. '
I::.,
8 is a relevant p
• A confidenc~
• I I{
0 12 0 0.0262 :?.62 9.85 7.'1.7 s· =- L ('
R-1 .. 1
I 10 10 0.0955 9.55 - -
be the !>ample
2 19 38 0.1739 17.36 1.65 0.156
3 17 51 0.2110 21.10 -4.1 0.796
4 10 40 0.1920 19.16 -9.16 4.379
18
5 8 40 0.1397 13.9 -5.9 2.50
6 7 42 0.084 8.46 -1.46 0.25 1
7 5 35 0.044 4.4 . .
8 5 40 0.0200 2.0 . .
9 3 27 0.0080 0.8 9.51 I 1.062
10 3 30 0.0029 0.3 . .
II I 11 0.00097 0.09 . .
100 364 100 27.68
,,
9.85 7.117
R-1 ,.1
be the sample varinncc ncrosc; the R replications.
1.65 0.156
-4.1 0.796
-9. 16 4.379
19
VIII SeM11 ( CSE(ISE) Sy~~~ CM'\a1S~t.0nt CBCS · ~odeltQ
• · The usual confidence interval. \\hich assumes the Y, are normally distributed. is
-Y... ± tet / 2. R - I r.:-
s 9. a. Explain the replici\
"R Ans. Replication melho
Where to./2, R-1 is the quantile of the t distribution "ith R-1 degrees of freedom that • If initialization
cuts off u/2 of the area of each tail. level, then the 111
• A prediction interval is designed to be wide enough to contain the actual average
estimator variabi
cycle time on any particular day with high probability.
• The basic idea is
• A prediction interval is a measure of risk a confidence interval is a measure of
one the same wa
error.
• If significant bia
• The normal - theory prediction intcn al is
arc used to redu
Y... ± to. / 2, R-1 S✓l + I~ can be misleadin
• This happens be
The length of this interval will not go ofO as R increases. In the limit it becomes affected only hy
0 ± Za./2a • Thus, incre:ising
to rcOcct the fact that, no matter ho,, much \\I! simulate. our daily average cycle time intervals around
sti II varies. in,csting the init
• If the simulatio
8, b. Prescribe Quantiles in output unalysis for terminating simulations. (06 M:irks) observations in a
Ans. • The basic raw 0
• To present the interval estimator for quantitics,it is helpful to review the interval • Each y,, is derive
estimator for a mean in the special case when the mean represents a proportion ,,
Cusc I :- y is an in
or probability P. could be the delay 0
• When the number of independent replications Y ,......Y k is large enough that to./2, Case 2:- )' is a ba
n-1 =Za/2. the confidence interval for a probabilil) Pis often written as obscr\'alio,;~.
P±Zo. / 2
. ( )
P 1-P
,,
Case 3:- y is a bat
• When using the
sample for the p
R-1
- 1 "
Where I' is the sample proportion. y,.(n.d) = -
n- d d
• The quantile - estimation problem is the inverse for the probability - estimation
problem. find O such that Pr [Y S Ol=P. Thus IO estimate the P quantile we find as the sample mca,
the value O such that I 00% of the data in a histogram of Y is to the lefl ofO. different random n
• Extending this idea. an appro:-.imatc ( 1-u) I()0%. Confidence intc:rval for O can initial conditions (I
be obtained b) finding two values: Ot that cuts of I00 pt % of the histogram and y,.(n,d), .......)R·<-'
Ou that cuts IOOpn % of the histogram. where. are ind1:pcndent an
P(l - P) random sample fro
Pt -= P - Za. / 2
R-1 8n.d -E[y,.(n,d)
The overall point e
- I K
y ...(n.d) ~-
R ,.,
L
20
lly distributed. is Module- 5
9. a. Explain the replicntion method for steady state simulation. (10 Marks)
Ans. Replication method for steady state simulation
es of freedom that • If initialization bitts in the point estimator has been reduced to H negligible
level, then the method of independent replications can be used to estimate point
the actual average estimator variability and to construct a confidence interval.
• The basic idea is simple make R replications initializing and deleting from each
al is a measure of one the same way.
• If significant bias remarks in the point estimation and large number of replication
arc used to reduce point estimator variability, the resulting confidence interval
can be misleading.
• This happens because bias is not affected by the number of replications (R), it is
affected only by deleting more data or extending the length of each run.
limit it becomes • Thus, increasing the number of replications (R) could produce shorter confidence
intc.rvals around the wrong point. Tlum:fore. it is important to do a through job of
average cycle time investing the initial condition bias.
• If the simulation analyst decides to delete t observations of the total of n
ations. (06 Mnrks) observations in a replications then the point estimator of0 is Y. .. (11, d)
• The basic raw output data, {Y,i' r= I ....R, j= I ... n}
review the interval • Each yri is derived in one of the following ways
resents a proportion Case 1:- Y,i is an individual observation from within replication r, for example. Y,,
could be the delay of customer in a queue or the response time to job j in a job shop.
rge enough that to.J2, Case 2:- y,, is a batch mean from within replication r of nurnber of discrete time
written as observations.
Case 3:- y is a batch mean of a continuous time process over time interval j.
• When ~:sing the replication method, each replication is regarded as a single
sample for the purpose of estimating 0. for replication r, define
- I "
y,.(n.d) = - L
n - d ,-<1 H
Yri
obability • estimation
he P quantile we find as the sample mean of all observation in replication r. l-3ccausll all replications use
different random number streams and all initialized at time O by the same set of
is to the left of 0.
nee interval for 0 can initial conditions (IJ the replication averages.
o of the histogram and Y, ,YR.(
.(n.d ) ....... n,d)
are independent and identically distributed random variables that is. they constitute a
random sample from some underlying population having unknown mean
0n.d = E[y, .( n,d)]
The overall point estimator, is also given by
- l~-
y...(n.d) = - L., Y,.(n. d)
R r, I
21
VIII Set111 ( CSE/I S£) CBCS • ModeiQ
Thus, it follows that
E[y....(n,d)] =8n,d
also ,if d and n are chosen sulliciently larl!,e. then On, d ~- and y...(n,d) is an
approximately unbiased estimator of0. The bias in y..(n,d) is 0n,d -0.
9. b. Explain output analysis for stc:1dy state simulation. (06 Marks)
Ans. Consider a single run of a simulation model whose purpose is to estimate a steady
state or lone run characteristic of the system. Suppose that the single run produces
observ11tions y1, y:""' .... hich generating an: samples ofan autocorrelated time series.
The steady state measure of performance, 0 is defined by
I "
0 = lim 11 ➔ OO-LY,
n ,-1
With probability I, \\ here the , aluc of9 is independent of the initial conditions.
For t:xample ,if Y, was the time customer spent taking to an operator, the Owould be • The second st
the long-run average time a customer spends talking to an operator and because 8 is which include
defined as limit, it is independent lo the call confers conditions at time 0. Similarly system. ilp pa
the steady state performance for a continuous time output measure (y (I), t~O} , such • The third step i
as the number of customer in the call centers hold queue is defined as The model builde
♦ =limT1 ➔ . . !. ._ Jy(t)dt verifyi ng and vali
Verification:- Thi:
Tc o
is reOccted accura
With probability I .
The sample size n(or Tr) is a design choice. it is not inherently determined by the verification pmccs
• I lave the corn
nature of the problem. The simulation analyst\\ ill choose simulation run length with
developer.
several consideration in mind.
• Name a flow di
I. Any bias in the point estimator that is due to artificial or arbitrary conditions.
• Closely examin
2. The desired precision of the poi_nt estimator, as measured by the ~tandard error or
• Print the input p
confidence interval half width.
value are not ch
3. Budget constraints on computer resources.
• Mal-.e the com pt
OR • If the compute,
10. a. Exph1in lhe ncnl diagram model building verification mul validalion. (08 Mi1rks) animation imita
Ans. • Have a graphic,
• The first step in model building consists of observing the real system and • It simplifies the
interactions among it various components and collecting data on its behaviour Valid:1tion:- Verifi
also, persons familiar with system or sub system should be questioned to take validation is ovcrnl
advantage, their special knowledge. and ii~ behaviour.
For example operators, supervisors. h.-chnicians etc arc also questioned to gain more Calibration is the i
knowledge about the system. As model de\elopment proceeds new questions may system making ad
arise and model dcvclopers may have to return to this step. making additional
22
Real system
Verification
Operational model
c initial conditions.
pemtor, the Owould be • The second step in model building is the construction of a conceptual model.
perator and beca_us~ 0 is ,, hich includes collection of assumptions on a system components, structure of a
ions at time 0. S1m1larly system. i/p parameters, data assumption etc..
easure lY (t), t~O} • such • The third step is the translation of the conceptual model into operational model.
efined as The model builder has to return lo each of these steps many times while building.
verifying and validating the model.
Verification:- Thi: purpose of vcrifica1ion is to cnsun: that tht: conceptual model
is re0ected accurately in the computcri,ed reprc<,t:ntation, some suggestions in the
verification process arc,
rently dctermin~d by the • I lave the computerized reprcs1:n1a1ion checl-.ccl by someone other than its
1mulation run llrngth "ith developer.
• Name a flow diagram of the process.
arbitrary conditions. • Closely examine the model output tor reasonableness under a variety of inputs.
by the standard error or • Print the input paramett:rs at lhc i:nd ofsimulation ofachicvc that the i/p parameter
value arc not changcd.
• Make the computcrizt:d representation as self documentary as possible.
• If the computerized representation is animated verify that what is seen in the
d v11lidntion. (08 M11rks) animation imitates the actual system.
• Have a graphical representation of the model.
ing the real system and • It simplifies the task of model understanding trace the model.
ing data on its behaviour Val_id:1~io1~:- Verification and validaiion are conducted simultaneously by the model
Id be questioned to take validation 1s overall process of comparing the model and its behaviour to real system
and its behaviour.
questioned to gain more Calibratio11 is the itera1ive process of comparing the model and its behaviour to real
ccds new questions may sysl~m ma~i~g adju~lment to the model, comparing the revised model to reality
making aclcl111onal adJustment comparing again and again.
23
VIII Se.m, (CSE/ISf) Sy~ M ~ ~s~
10. b.Which are the common suggestions can be given for use in the , ·erification
process. (08 Marks) Ei
Ans. I. Have the operational model checked by someone other than its developer,
prefcmbl> an expert in the simulation software being used. SYS
2. Make a now diagram that includes each logically possible action a system can take Time: J hrs.
v. hen an event occurs and folio" the model logic for each action for each event type. Note : .-111.1 wer ""Y
3. Closely examine the model output for reasonablcne!>s under a variety of settings
of the input parameters.
4. Have the operational model print the input parameters at the end of the simulation, I. a. Lisi any five ci
to be sure that these parameter values have not been changed inadvertently. when ii is not.
5. Make the operational model as self documenting as possible. Ans. When the simul
6. If the operational model is animated, verify that what is seen in the animated I Simulation en
imitates the actual system. of a comple:-. sy
7. The interactive run controller (IRC) or debugger is an essential component of 2. Informational
successful simulation model building. the effect of the
8. Graphical interfaces are recommended for accomplishing verification and 3. ll1e knowled
validation. The graphical representation of the model is essentially a form of self suggesting imp
documentation. 4. Animations ,
vi:sualized. •
5. The modem Sl
simulation. ·
When the simu
I. When the pro
2. \\'hen the pro
3. When it is cat
4. When the sim
5. When the res,
I. b. A computer tc,
"ho t:1kcs call~
lime betnccn c
table 1.1. Able
which mcnns
their sen ice r
Table I. I Intc1
Table 1.2 Sc
24
e verification Eighth S emester B.E. Degree Examination,
(08 Marks)
its developer,
CBCS - Model Question Paper - 2
SYSTEM MODELING AND SIMULATION
system can take Time: 3 hrs. Max. Marks: 80
ach event type. 1- Note: Answer 1111y /CJV£/11ll 1111estiom, Je/ecti11g ONE / 11/1 q11esliim/ro111 e11d11110,/11/e.
ricty of settings
Module- I
fthe simulation, I. a. List any five circumst:rnces, when the simulation is the appropriate tool and
rtently. when it is not. (08 Marks)
Ans. When the simulation is the approp.-i:lte too l
in the animated I. Simulation enable the study of and expel'imentation with the internal interactions
of a complex systt·m or of a sub system within a complex system.
al component of 2. Informational, organintional and environmental changes can be simulated and
the effect of these alterations on the model's behaviour can be observed.
verification and 3. The knowledge gained. in designing a simulation model of great value toward
ly a form of self suggesting improvement in the system under investigation.
4. Animations shows a system in simulated operation so that the plan can be
visualized.
5. The modern system is so complex that the interactions can be treated only through
simulation.
When the simulation is not appropriate
I. When the problem can be solved using common- sense.
2. When the ,,roblem can be solved analytically.
3. When it is easier to perform direct experiments.
4. When the simulation costs exceed the savings.
5. When the resources or time arc not available.
I. b. A computer technical support centre is staffed by two people, Able and Baker
who takes calls and try to answer questions and solve computer problems. The
time between calls r.rnges from l to 4 minutes with the distribution os shown in
table 1.1. Able is more experilmccd und can providl.' service faster thnn Baker,
which means that, when both arc idle, Able tokes the call. The distribution of
their service times arc shown in table 1.2 and 1.3 respectively. (08 Marks)
Table 1.1 Inter A rriv.il time (IAT) Distribution
IAT I 2 3 4
Probability 0.25 0.40 0.20 0.15
Table 1.2 Scrvicl' time dis tribution of Able
IAT 2 3 4 5
Probability 0.30 0.28 0.25 0.17
25
VIII Semt ( CSE/ ISE)
Table 1.3 Service time distribution of Baker
3 4 s 6 2. a. Prepare a slm
Prob11bility 0.35 0.25 0.20 0.20 queu~ stoppin~
Random digits for inter 11rrival time- 26, 98, 26, 42 given belo
11 l'l'
Random digits for Service time- 95, 21, 51, 92, 89, 38 Simulate the t
Simulate thi11 system for 6 customers by finding
I) Average waiting time of customers
Ii) Average s~rvice time of Able
iii) Average servke time of B:1ker Ans.
ADIi, Inter Arrival deh:rminution
Customer no. Random digit Inter arrival time
I
2 26 2
3 98 4
4 90 4
s 26 2
6 42 2
Simulation of ABLE - Buker S
AUI.F. Uoke,
Cu)lumcr
'"'
I
2
lnltr
:uTl\'al
hme
2
Am"val
11n1r
2
Tune
~.,,~
riq,,n
0
S1,'1'\l(C
hmc
s
1,me
a-ff\!ICC
l'nds
~
~-
11111c
M.'fVl~t:
2
~"i\lt."C
lllll\':
T1a1c
M:r\lCC
Fnd
s
Cwa,....,
0.1.,
0
Time
Spent 1n
S~SICffl
s Sinmlution rablQ
l
1 ~ 6 l ')
' 0
u
)
J
b
Clock LQ
4 4 10 IU s IS 0 s 0
:, 2 12 0
12 b 11 0 6 -
6 2 14 11 l 18 I 4
I I
2 2
i) Average waiting time of customer = Totalt1mecustomer wait mqueue
total number of customers 4 I
6 0
=¾ =0.16 mins 8 I
ii)Averageservicc timeof Able 11 I
16 . IS 0
=-= 4 llllllS
4 16 0
iii) Average service time of Baker Total busy time<
Maximum queue
=~= 4.Smins
2
,, '"-'"... Sc.1,1t11
26
OR
2. a . Prepare II simulution htble miing the time a,h, a nee 11lgorlthm fo1· 11 slniclc chunnel
queue slopping c\"cnl will be Ill 30 minutes. Inter arrival lime 11nd service times
arc given below. Find the busy lime server and maximum queue lcn1tlh.
Slmululc lhe 111ble unlil fifth customer departs
IAT 6 3 7 S 2 4
Service time 4 2 5 4 I 5 4 4
(06 Murks)
Ans.
Serial No IAT Arri,al Sen ice
al time titll(' time
I I 0 4
2 I I 2
·'" 6 2 s
4 3 8 4
5 7 11 I
6 s IX s
7 2 23 4
8 4 25 I
Time l1n1c
,cc
CuSlotffl:I' Spall IP 9 I 29 4
0<1"' iyMefl\
bd Simulation table for checkout counter simulation
0
Cumulative stntic·s
0 l
Q l Clock LQ(t) Ls(t) FEL 13 MQ
0 0 0 I (A. I) ( D,4) (E.30) 0 0
18 0 6 I I I (A.2) ( D,4 ) ( E.30) I I
4
2 2 I (D,4) (/\,8) ( E.30) 2 2
mqueue
4 I I ( D.6) (A,8) ( E, 30) 4 2
ers (A,8) ( D, 11) (1::,30) 6 2
6 0 I
8 I I ( D.11) (A.I I) (E.30) II 2
l1 I I (D, 15) (A , 18) ( r,.30) 11 2
15 0 I ,(D.16) (A, 18) (E.30) 15 2
16 0 0 (A. 18) (E,30) 16 2
Total busy 1I111c ol server = 16
Maximum queue length ~2
27
VIII Set111 ( CSf(ISc) S y ~ M ~ ~ S~ L < m l CBCS ·ModeiQ~
2. b. Six dump trucks arc used to haul coal from lhc entra nce or a smali mine 10 i
the railroad. Each truck is loaded by one of two loaders. After loading, a truck 36 0
immediately moves to scale, to be weighted 11s soon 11s possible. The queue
discipline is FlFO. When It is weighed, a truck travels 10 lhe industry und I
returns lo the loader queue. The distribution of loading time, weighing time
52 0
and travel time are given tu the following table as follows.
Loading time 10 5 5 10 15 10 10
Weighing time 12 12 12 16 12 16 -
Travel time 60 100 40 40 80 - - 6-l 0
Calculate the total busy hme or both the loaders, the scale, averngc loader and
scale utilization. Assume 5 trucks arc at the lo:,dcr and o ne is at the scale at time
0 stopping Event TE=60 mins (10 Marks)
Ans, . a. Explain i) Inv,
4512 Mainlainabillf)
·1· .
Average Ioad erut11zat1on;--= 0 .35
64 A ns. i) In realistic i
least 3 random ,
A veragcscaIc11t1·1·1zallon
. =-=
64 I
64 I. The numbt:r c
Simulation table for dump truck o peration
2. The 11me bet
SunulahM !)t.df t:uurnl.ah\c ~l\1stn:s
3. The lead time
~l,n Load1."1 \\e1ah
In very simple
c1,,.q11 L(l) WQ(1) WCU) 1(1 o, B,
over time and I
L<JII) ~th,"U\! llli~U.:
28
~ S~t.Ot'\I
rn
ALQ. 92. OT,
EW SO.OT,
- 64 0 0 0 I
AU), 72. 0T,
Al,1), 124.0T,
4S 64
At<) 76 or
- Al.I). Y2 DI,
le, averuge loader and Al.l). 144. 01,
is at the scale at time Module - 2
( 10 Mark.'!)
3. a. Explain i) Inventory - supply chain management systems ii) Reliability nnd
Maintainability. · (08 Marks)
Ans. I) In renlistic inventory and supply - chnin mnnagcmcnt systems, there an: al
least 3 random variables.
I. The number of units demanded per order or per time period.
2. The time between demands.
3. The lead time
In very simple mathematical models of inventory systcms, demands is a constant
D,
over time and lead time is zero or a constant.
• In most real world cases hence in simulation models. demands occurs randomly
s.DT, in time and the number. of units demanded 1:ach time a demand occurs is also
0 0
10,0T,
11,DT, random.
10,DT, • Distributional assumptions for demand and lead time in inventory theory texts are
10.or; 10
12,DI, usually based on mathematical tractability but those assumptions could be invalid
10. ur, in a realistic context.
20 10
12. or • In practice, the lead time distribution can often be fitted fairly well by a gamma
, ?O, 1>1,
12, llT,
distribution.
10
20. 1)1, 20 • Unlike analytic models. simulation models can accommodated whatever
. 2s. 01. assumptions appear mosl reasonable.
_ 20. OT,
. 2S,OT, 24 12
• The geometric poisson and negative binomial distributions might be appropriate.
~.24.0T , • The passion di~tribution is oncn used to model demand because it is simple. it is
LQ 72 OT,
extensively tabulated and R is well lrnO\.\ n.
ps OT
20 • The tail of the poisson distribution is generally shorter than that of the negative
W.2·1, IH,
, _lJ 12 nr, binomial. which means that fewer large demands will occur if a poisson model is
1.,2S.Dl used than if a negati\'e binomial distribution is used.
W.3b.OI, 44
1.Q, 72,01 1 ii) reliability and maintainability:- Time to failure has been modelled with numerous
LQ 124, OT,
distributions including the exponential. gamma and wcibull. If only random failures
\W.36,0T.
\LQ, 72,DT,
15 occur. the time to failure distribution may be modelled as exponential. The gamma
~I.Q, 124, OT, distribution arises from modeling stand by. redundancy. where each component has
an exponential time to failure.
29
fS~WtV
a small mine to LW Sl 01,
r loading, a truck ,6 0 0 2 I
IH
or.
AU).72 IH,
1\1 (). Ill ur,
,, J6
sihle. The queue Al.(). 76. Dl,
the industry and EW<,1 Dl,
c, weighing time AL<). 72. 0 I,
S2 0 0 I I OT, AL(). 124, OT, 4$ $2
ALI), 76. OT,
ALI). 92. Dr,
f.W.80, OT,
Al.<),72.01,
AUJ, 12-1, 01 1
6-1 0 0 0 I
Al.I) 76, DI
41 64
Al.l).92.DI,
~ ,d.,,nd 0 Al.I). 144, DI,
10
• In practice, the lead time distribution can often be fitted fairly well by a gamma
distribution.
,.
r'. 20 • Unlike analytic models. simulation modds can accommodated whatever
assumptions appear most reasonable.
fl, • The geometric poisson and negative binomial distributions might be appropriate.
l. 24 12
i. • The passion distribution is of'tt:n used to model demand because it is simple. it is
OT
r
extensively tabulated and R is well known.
1)1, 40 zo • The tail of the poisson distribution is gem:rall> shorter than that of the negative
DI, binomial. which means that fewer large demands will occur if a poisson model is
~T
PT, 24
used than ifa negative binomial distribution is used.
44
DT 1 ii) reliability a nd malntaln:1bility:-Time 10 failure has been modelled with numerous
~ DI,
distributions including the exponential. gamma and weibull. If only random fai lures
I'll
oi", 4S 2S occur. the time to failure distribution may be modelled as exponential. The gamma
~ .OT, distribution arise~ from modeling stand by. redundancy. where each component has
an exponential time to failure.
t 29
1\1,+,.., f.c"M Sc,.,.M
VIII Sem,, ( CSE/ISE) CBCS - 1-,iode,l.,Q
• The wcibull distribution has extensively used to represent tirnt: to failure and its
nature is such that it can be made lo approximate many observed phenomena. 0,
When there arc a number ofcomponents in a system and failure is due to the most (x -,
serious of a large number of defects or po!;siblc defects, the weibull distributor
seems to de partically well as a model. (b-n)(
F(x) =
3. b. Explain the following continuous distributions I- (c
i) Tria ngular tlistribution (c-b
ii) Lognormal distribution (8 Murks)
Ans. i) Triungulnr distribution :- A random variable x has a triangular distribution if its
pdf is given by Ii) Log normal d
f (x)-
2
(x-a) , a!>x~b its pdif is given by
(b -a)(c-a) ~ex
f(x) J27rax
2(c-x)
(c-b)(c-n)'
0, else where Where a= >O. The
E ( X) = c••· 0
'''
Where a< h~ e. The mode occurs at x=b. A triangular pdf is shown below
f(x) V ( X) = e=,.. •' (e 01
Three lognormal A
a b c x
The pnrnn11.:ters (a. b, c) can be related to other measures. such as the mean and the
mode, as follows
E(x) =a+ b + c
3
the mode can be determined ns
TI1e parameters ►
Mode=b =3 ;1c (x)- (a + c) parameters come
Because a .S: b .S:c a lognormal distr
2a-+ c5 E( lognormal are kn
- · X) ~--
a+ 2c
J 3 given by
The mode is used more oftt:n than the mean to characterize the triangular distribution.
Its height is 2/ (c-a) above the x-axis.
µ= In[ ,µj ,
►lj_+ Gj
The cdffofthe triangular distribution is given by
µ~ +a2
a2 = In ,
[ µl
30
me to failure and its
served phenomena. 0, x=,;a
re is due to the most (x -a}2
wcibull distributor aS:xS:b
(b - a)(c -a)'
f(x) =
1-
(C- x}2 b·s xsc
(c-b)(c-a)'
(8 Murks) I, X >e
tar distribution if its
ii) Log normal distribution:- A random variable x has a log normal distribution if
its pdf is g.iven by
✓2~ox exp[
2
f(x)J (ln ~;lµ) ] X>0
l 0. other wise
Where a= >0. The mean and variance of a log normal random variable are
E( X) = e10• 0
"'
own below
V ( x) = eJi•+a' ( e"~ - I)
Three lognormal pelf's all having mean I, but various 1/2. I and 2, are shown below.
1.5
2 3
The parameters µ and a2 are not the mean and variance of the lognonnal. These
parameters come from the fact that when y has aµ(µ, o2) distribution then x= e> has
a lognormal distribution with parameters ~land o2. If the mean and variance of the
lognormal are known to be ~ and a 2 respectively, then the parameters µ and a2 are
triangular distribution.
::,::rlJ~,j_:t ,l +O"i.
..,
31
VIII Semt (CSE(ISE)
CBCS. J..iodel,
OR od·
4. a. Explain networks of queues in Queuing model. (06 Murk.~) p[n+IJ: = p[l
Ans. :fhe study of mathematical model of networks of queue is beyond the scope . do;
• A few fundamental principles arc very useful for rough cut modell ing. prior to for II from c 1
simul~tions study. do
• The following results assume a stable system with infinite calling population and p[n+ lJ != Pf I J
no limit no system 1:apacity. ' od;
I. Provided that no customers are created or destroyed in the qu1:uc, then the departure RETURN (cv
rate out of a queue is the same as the arrival rate into the queue, over the long run. end;
2. If customers arrive lo queue i at rate A and a fraction 05 pij 51 of them are routed
to queue j upon departure, Ihen the arrival rate from queue i to queue j is ,.jpij over
the long run. S. a. What ls llnca
3. The over all arrival rate into queue j, Aj, is the sum of the arrival rate from all genemte a seq
sources. If customers ~rri,e from outside the net\\ork at rate aj, then Ans. Linear congri
Ai =a,+ L A, P,1 sequence of in
olh Xi ti= (axi + c
4. If queue j has c < C1J parallel servers.each worl...ing :11 rntc ~lj. then long run The initial \'alu
I
utilization of each server is • lfc#O in eq
Al • If c=0, the fc
p=- xi
J C ~l
1 1
R,~- i =I'
Ill '
and pI <I is n:quired for the queue to be stable. X,., =(a>.iie)
S. if. for each queue j, arrivals from oulside the network from a poisson process
with raic a and if there are c identical service)> ddi,ering exponentially distributed X,=(iix,,+c)
I J
service times with mean ~lI tilen,in steady SIHlc, queue j behaves like an µ/µle J queue X,=(17x27+
with arrival rate x. =502 modl0
A1 =a,+ LA,
P,,
all I
x. =2
2
4. b. Write the mnplc procedure to calculate p0 for the M/M/C/K/K queue. R, =I00=0.02
( 10 Marks) X:=(ax,+c)n
Ans. mmcckk= proc {lambda, mu, c, k}
# return steady state probabilities for M/M/M/C/K/K queue. =(17•2+4
#notice that p[n+l] is p-n, n=0, ...... K = 77modl0
local crho, k fac, c foe, p, n; X1 = 77
P: vector {K+ I, O};
crho:- lambda/ mu; 77
R. =-=0.77
k fac:= k!; · 100
c fac:= c!;
p[ I]:= Sum ((k fac / (n"! (k-11)''!)) 1 crcho"n, n-0, c-1 )+
Sum ((k fac/ (c"(n-c) • (k-n)! • c foe))• crho"n, n=c ..k);
P[ I]:= 1/P[ IJ .
for n from I to c-1
· 32
od;
p[n+I): = p[l) ♦ (k fac/ (n!*(k-n)!))* crho"n;
(06 Marks)
do;
ond the scope . for n from c to k
.ut modeII i1tg. prior to do
p[n+ I]!= p[ 1J• (k fac) (c"(n-c)*(k-nWc facWcrho" n:
calling population and
od;
RETURN (eva / m (p));
eue, then the departure
end;
iue, over the long run.
\j $ 1 of them are routed Modulc-3
to queue j is i-.jpij over 5. a. Whnt is linenr, congruential method? Use the linear congruential method to
generate a sequence of rJntlom numbers x0=27, n= l7, c=43, m=I00. (06 Marks)
he arrival rate from all Ans. Linear congruenti11I method:- It was initially proposed by lchmcr, produces a
aj, then sequence of integers x1• x: between rand m-1 following recursive relationship
Xi ti = (axi + c) mod mi = 0, I, 2........ 1
The initial values x0 is called seed, a..... multiplier c ..... increment and m is models.
rate ~lj. then long run • If c#0 in eq I then the form is known as the mixed congruential method
• If c=0, the form is known. as multiplicative congruential method.
xi .
R,.-- 1=1,2......
m
X,. 1 z {a xi H )mod m
from a poisson process X 1 = (axu -I c)modm
exponentially distributed
X 1 = (17x27 +43) modl00
haves like an µ/µ/c, queue
X1 ~
502mod l00
x, -=2
2
RI =-=0.02
C/K/K queue. 100
( 10 Marks) X 2 =(ax1 +c)modm
= ( 17 • 2 + 43) mod I 00
ue.
= 77modl00
X2 = 77
77
R. ---=- 0.77
· 100
33
C13CS · Mo-dclt
VIII Se+ttt ( CSE/ISE)
X; =(17•77,+43)modl00 Stcp4:-Sim
= I352mod 100
X 3 = 52 0 s,m
52
R3 = =0.52
100 After computi
X 4 =(17*52+43)modt00 z0 S Zee/ 2
x. =27 Problt!m solu
27 Stt:p I:- To tcs
R~ =-=0.27 Step 2:- Calcu
100
it (M+I) m:::N
xj = (17 x 27 +'43)mod I00 3+(4+1)5$30
X5 = 2 28530
z M=4
R5 =-=0.02
100 Sim=..!_[(0.23)
5
5. b. Explain test for Autocorrelation'! Consider the sequence.
0.12 0.01 0.23 0.28 0.89 0.31 0.64 0.28 0.83 0.93
Sim =-0.1945
0.99 0.15 0.33 0.35 0.91 0.41 0.60 0.27 0.75 0.88
and
0.68 0.49 0.05 0.43 0.95 0.58 0.19 0.36 0.69 0.87
Test for whether the 3rd, 8th, 13th ans soon numbers in the sequence arc auto 13(4)+
corrected using x= 0.05, consider critical value 1.96. (10 Marks) O'.,, =
12(4 + I
Ans. Test for auto correlation :- It is concerned with the dependence between numbers
in a sequence. I !ere test to be described sho11ly requires the computation of the auto z = -0.1945 _
0
correlation between every 111 numbers starting with the i th number. 0.1280 -
The auto correlation between numbers Ri, Ri+m, Ri+2m .... Rit (M+l) m. The value -Zo./2 :!;20 :!;2
Mis largest integer such that it (M+l) m:SN where N is the total number of values -l.96!:i:-1.516
in the sequence.
Step 1:- Formulate the hypothesis :. llypothcsisof i
H0 : Sim=O Vs H1 : Sim -:/, O
Step 2:- Calculate M, using it (M+ I) m::,N
Sim 6, a. Explain lhl· foll
Stcp3 : - CalulatcZ0 = - - i) Weibull distri
CTsun
ii) Trinngul:tr di•
It is approximately normal if the values Ri, Ri+m, Ri+2m .... Ri+(M+I )m, are Ans. i) Weibull distri
uncorrelated. machines or elect
Which is distributl'd normally with a mean of O and variance of I, under the f3 xll-1
assumption of indepcndcn1.:c F{x) == ~ e
{
Where e<>O and 13
Step I:- The cdf i
Step 2:- Set F(x) ·
34
~S~LC1'\I
1
Step4 :-Sim = -
Ill
-[:f
+I k ,,
Ri + km. Ri + (k + l)m]- 0.2S
Jl3m + 7
(J ----
S,m - 12 (111 + 1)
Aller computing 20. do not reject the n all h) polhesis of independence if Z-oc/2 ~
zO ~ Zoc / 2
Prol,l~m solution
Step I :- To test 110 : Sim - 0 or 1-1 1: Sim#0
Step 2:- Calculate M
it (M+ I) msN
3+(4+ I )S:S30
28~30
M=4
Sim - t[( 0.23)(0.'.!8) + (0.'.!8)( 0.33) + ( 0.33 )(0.'.!7) + ( 0.27)
35
VIII Sent, ( CSE/ISE) Sy~ModeUrtfttMUi,S~wrt, CBCS · M~Q
Stcp 3:- Solving for x in tem1s of R yields. 6. b. Downtimes fo
x"'<l(- ln( I-R)] 11P ...(2) be gamma di8
By comparing I and 2. ifx is a weibull variate. then x'' is an exponential variate with 1}=2.30 and 0
mean aP, if y is on exponential variate with mean µ, then y•·P is a weibull variate. 0.434, o. 716
With shape parameter P and scale parameter a=µ I/13. Ans.
ii) Triangular distribution :- Consider a random variable x that has pdf;
x, Os x s I Step! :-a =
f(x) = 2-x, lsxs2
(2
{ Step2 :-Gene
0. othen\ ise
This distribution is cnlled n triangular distribution with endpoints (0. 2) and mode at SctV
I. Its cdf StcpJ. -Comp
0, xSO Step4:-x::5.3
x2
Osxsl
2
F(x) =
(2-xt = 0.91
I- I SxS2
2 '
I, X>2 Step2: -Gt:ner
f(x)
Stcp3 : - Compu
Stcp4 :-Since x
0.767,1
Step 5: - Divide ,
37
L-----'- -
VIII Se,.m, ( CSE/ISE)
J) Physical or conventional limitations:- Most real process have physical limits
on performance for example, computer data entry cannot be faster than a person
can type. IJecause of company policies, there could be upper limits on how long a cr, - -~x:- n-1
process may 1nke.
4. The nature of proces;:. The description of the distributions can be used to justify
a particular choice even when he data nre available. When data arc not available, the cr = ~
uniform, triangular and beta distributions arc often used as input models.
7. b. Let x1 represent the aver.1ge lend time to deliver and x2 be the annual demand, 104520
for industrial robots. The following data nrc nvnilable on demand and lead time
er =
for the lust ten industrial robots. The following data arc availnble on demand
and lead time for the last ten years. P= cov(x"x 2
Lead time 6.5 4.3 6.9 6.0 6.9 6.9 5.8 7.3 4.5 6.3 er,, cr,.
Demand 103 83 116 97 112 104 106 109 92 96
Lead time an
Find the dependency between the lead time and demand.
Lead lime Demand x 1 • x~ x2 x/
I
(x ) (x,)
8. a. Explain point c
6.5 103 669.5 42.25 10609 Ans. The point estima
4.3 83 356.9 18.49 I "
6.9 116 800.4 47.61
6889
13456
0:-I 0 I I
v
• I
6.3
4.5
96
92 414
604.8
20.25
39.69
8464
9216 I\
E(o)
Ix -61.4 2.,x.- 10111 6328.5 386.44 10452 and E(O)- 0 is cnl
that are unbiased
X. = ~::X,
n
6 4
= 1. =6.14
10
of 8.
Example to thee
-X2 =~::X2 l018
-- = - = 101.8
•
w: - ,Ew an
I
N ,..,
r,..
n 10
in \\ hich case yi i
cov(x.,xi)= n~l(.2:X,x 2 - nx. x2 ) The point estimat
run length is deli,
l • I
= IO-I (~)328.5-10 x 6.14x 101.8) t =T'" LTl y(t)
•
I and is cc1llcd a tin
= 9 (77.98) = 8.66
38
e physical limits
ter than a person '°'x2
L.., I - x2
I
'°'x2-x2
L.., 2 2
o,=
its on how long a n-1 n- 1
1 be used to justify cr = ✓386.44 10(6.14) =l.0Z4
• not available, the
11odels.
annual demand, cr = 104520 10(101.8) = _
9 930
~nd and lead time
lnblc on d emand
cov(x.,x 2 ) 8.66
4.5 6.3
p= cr,, cr,, = (1.024)(9.930) 0.86
92 96 Lead time and demand are dependent
-r-./ OR
8, a. Explain point estimation in measures of performance. ( 10 Marks)
P609 Ans. The point estimator ofO to based on the data ly,.....y.,} is defined by
,.. I n
889 0=-I
n 1• 1
Y, ....(1)
p456
~409 Where 0 is a sample mean based on a sample size n. Computer simulation languages
,nay refer to this as a discrete time, collect, fall y or observational statistic. The point
2544
estimator 0 is said to be unbiased for Oif its expected valm: is 0- that is, if
0816
In general E( e)- O ... (2)
1236
1181
... (3)
8464
I\
9216 and E(8)- 8 is called the bias in the point estimator 0. II is desirable to have estimators
~0452 that are unbiased or if.this is not possible, have a small bias relative to the magnitude
of0.
Example to the estimator of the form of equation I including wand w 0 of equations
a (N a IN,
W;::: - I w, andwo;::: - Iw~ ➔ w 0
N ' I N ,.,
in which case yi is the time spent in the system by customer i.
The point estimator of$ based on the data {y(t), O~ t~ Tr), where Tc is the simulation ·
run length is defined by
" I
4' ;:::,,~ foTE y(t)dt ... (4)
39
S y ~ M ~ ~S~i.,01'\, C6CS · Modet-Q
VIII Se.mt ( CS£/ISf)
Simulation languages may refer to this as a --continuous lime''. ·'discrete - change'' 5. A communicaf
components. It is
or ·'time - persistent" statistic.
In general E(0) * 0 ... (S)
40
•, ''discrete• change'' 5. A communications system consists of se,eral components plus several back up
components. 11 is represented schematicall) in the figure
41
VIII Se+111 (CSE/ISE) C13CS · tvfodeiQ
9. b. Explain Batch means for interval estimation in steady -state simulations. l ) Face validi
(06 Marks) appears reason
Ans. The method of batch means attempts to solve the disadvantages of a single- about real invo
replication design arises when \\C:: try to compute the:: standard error of the sample Sensitivity anal
mean, by diving the output data then treating the means of these batches as is they Model user is ,
were independent. input variables
When the raw output data aficr detection fonn a continuous time process {y(t) , Tc,S increased. We
t :S T0+ TL} such as the length of a queue or the level of inventory, then we form k 2) Validating
batches of size m=T /k and compute the batch means as assumption and
I ,m
Y, = -m f
( 1-l)m
Y(t+T0 )dt
Structural assu
consider custo
be I or 2 or 3 etc
for j= I. 2, .....k. be FIFO etc. T
When the raw output data after deletion form a discrete time process {Y i = d+ I during appropri
1
d+2.... n]. such as the customer delays in a queue or the cast per period ofan invento~ regarding bank ~
system, then we form k batches of size m= (n-d)/k and compute the batch means as should be based
y =..!_ ,''" Y, +d example, In abo
' M L..,. u- i, M + I service time etc.
For j= 1,2, ....k That is, the batch means are formed as shown here The reliability 0
consist of 3 steps
YI ...... Yd' vd...1••..···yd+ru' yd't-ftl♦I ......vtt ...:!m •••••Yd+(il.-1)111 ..1 •
I. Identifying pr
JddN V1
v, 2. Estimating pan
3. Validating mod
J) Input o utput
Starting with either continuous time or discrete time data the variance of the sample by using the actu
mean is esJimated by that the simulatio
42
VIII Set11.r ( CSE/IS£) Sy~ M~ ~ s ~ W l ' \ I CBCS · ModeuQ
9. b. Explain Ba tch means for intenial estima tion In steady -state simula tions.
(06 Marks) appears reas
A ns. The method of batch means attempts to solve the disadvantages of a single- about real inv
replication design arises when we try to compute the standard error of the sample Sensitivity an
mean, by diving the output data then treating the means of these batches as is they Model user i
were independent. input variabl
When tilt: raw output data after detection fonn a continuous time process {y(t) , T;, increased. W•
a
t 5 T0+ Tf} such as the length of queue or the level of inventory. then we form k 2) Validalin
batches of size m=T/k and compute the batch means as assumption a
I Structural as
J
JIU
YJ =111- Y(t+T0 )dt consider cust
t1-l)m
be I or2 or3
for j=- I, 2, .....k. be FIFO etc.
When tht: raw output data after deletion fom1 a discrt:te time process {Y,, i = d+ I. during appro
d+2.... n). such as the customer delays in a queue or the cast per period of an inventory regarding bar
system. then we form k batches of size m= (n-d)/k and compute the batch means as should be ba
y _ I ""''" Y, + <l example, In
' - M L,, o ·11 M + I service time c
For j= 1,2,....k That is, the batch means are fom1ed as shown here The reliabilil)1
consist of 3 st
~ • ~ • Yd+ni,l••••••YJ+'m•••••Yd♦(~-l)n1•I'
I. Identifying
deleted Yt Y1
2. Estimating
......Yd ♦~tn 3. Validating
3) Input O UI~
Starting with either continuous time or discrete time data the variance of the sample by using the
mean is estimated by that the simu
S
1
=.!.. i: (vj-Yr ±vf = -k~
occurred in tll
if is importan
time period.
k k ,., k-1 ,., k(k-1)
Where j is the overall sample mean of the data after deletion. J0, b. Explain Ilcrit
Ans.
OR • Calibratio
IO. a . E~plain three slcp approa ch for , ·alidation process as formulated by Naylor a nd making adj
Finger. • , (10 Marks) additional
Ans. The following 3- step approach aids in the \'Rlidation process. • The comp
I. Build a model that has high face value. • Subjective
2. Validnte model assumptions. · aspects of
3. Compare the model input output transformation 10 corresponding the input/ output • Objectivct
transformation of real system. data produ
• Then one
model dat,
42
es~im'\I 1) Face validity:- The first goal of simulation model is to construct a model that
c simulations.
appears reasonable on its face to model users and others who are knowledgeable
(06 Marks)
about real involved in model construction from its conceptualisation to implement.
antages of a single• Sensitivity analysis can also be used to check models face validity.
error of the sample Model user is asked if the model behaves in the expected \\ ay. When one or more
se batches as is they input variables arc changed for example; In queuing systems if arrival rate is
increased. We expect server utili1.atio11 and queue length to increase.
ne process {y(t) , T;, 2) Va lidating motlel assumptions:- Model assumptions are of 2 types structural
tory, then we form k assumption and data assumption.
Structural assumptions involve questions of how the system operates. For example,
consider customers queueing and service facility in a bank. Number of queues may
be I or 2 or 3 etc customers may be moving from queue to other queue discipline may
be FIFO etc . These structural assumption should be verified by actual observation
during appropriate time periods together discu~~ion with managers and teller
process {Y., i d+ I. n;garding bank policies and actual implementation of this policy. Data assumption
eriod of an inventory should be based on connection of reliable data and correct analysis of data. For
te the batch means as example, In above banking system, data may be collected on inter • arrival time,
service time etc.
The reliability of data is verified by consultation with bank mangers. Analysing data
ere consist of 3 steps
I. Identifying probability distance.
2. Estimating parameters.
3. Validating models by goodnesi. of fit test.
3) Input output validation us ing historical da ta:- Here the input data is generated
by using the actual historical record. When using this technology the model knows
variance of the sample that the simulation will duplicate as closely as possible the important events that
occurred in the real system. To conduct n validation test using historical input data
if is important that all input data and all response data be collected during the same
time period.
10. b. Explain Itera tive process of calibrating a model. (06 Marks)
Ans.
• Calibration is the iterative process of comparing the model to the real system,
making adjustment to the model, comparing the revised model to reality, making
ulated by Naylor a nd additional adjustment comparing again and so, on.
(10 Marks)
• The comparison of the model to reality is carried out by a variety of tests.
• Subjective tests usually involves people, who arc knowledge about the model
aspects of the output.
• Objective tests always require data on the system behaviour, plus the corresponding
ing the input / output
data produced by the model. ·
• Then one or more statistical tests are performed to compare some aspect of the
model data s et.
43
L_-----'--------------
VIII Se.-mt ( CSE(ISE) S y ~ M ~ CM'\d, Sim«la-t"i.ort,
• This iterative process ofcomparing model with the system and then revising both Eight Semest
the conceptual and operational models to accommodate any perceived model
deficiencies is continued until the model judged to be sufficiently accurate.
Time: 3 hrs.
Com(llll"C model to realit~ Note : Amwrr any
Initial model
Revise
Real <.:omp,1rc revised model to n:alit)
First re\ ision of model I. a. What is simul
System study•
• Revise Ans, A simulation is
Compare sc,;oml re, is,on tu realit) time. Whether
~-----------.i Second revision of model an artificial hist
inferences conce
Refer Q.no. I (a
b. A computer tee
who take calls ,
time between c •
table I.I. Able
which mean th
their aervice ti
distribution
Table l.J: late
,__
TableJ.2~
~
__
Scrvic
Probab
Table 13 : Serv~
Random digits
~
26, 98, 90, 26, 4:
Random digits J
95,21, SJ,92, 89,
Simulate this l)i
Finding
(l) Average late
(ii) Average se
(iii) Average se
Aas. Refer Q.no. I (b)
and then revising both Eight Semester B.E. Degree Examination, CBCS - June / July 2019
any perceived model System Modelling and Slmulatlon
cicntly accurate. Max. Marks: 80
Note: Amwer any FIVE full questions, Sl'lecting ONE full que:i.tlonfronr l'ach module.
ial model
Module- I
Revise
What Is simulation? Explain with Oowcbart the 1teps involved in slmulalion
ision of model study. (08 Marks)
A simulation is the imitation of the operation of a real-world process or system over
Revise
time. Whether done by hand or on a computer, simulation involves the generation of
an artifkial history of a system, and the observation of that artificial history to draw
vision of model
inferences concerning the operating characteristics of the real system.
Refer Q.no. I (a) of MQP - I
b. A compuler technical support centre in staff'ed by two people. Able and Buker,
who take calls and try to answer questions and solve computer problems. The
time between calls ranges from 1 to 4 minutes with the distribution as shown in
table 1.1. Able is more experienced and can provide service faster than Baker,
which mean that, when both are idle, Able takes the call. The distribution of
their service times are show in table 1.2 and Table 1.3: Inter arrival time (IAT)
distribution
Table 1.1: Inter Arrival Time (IAT) distribution
lAT (mlns) 1 2 3 4
Probability 0.25 0.40 0.20 0.15
Table 1.2: Service Time (ST) Distribution of Able
Service time (mins) 2 3 4 5
Probablllt 0.30 0.28 0.25 0.17
Table 13: Service time distribution of Baker
Service time (mins) 3 4 5 6
Probability 0.35 0.25 0.20 0.20
Random digits for inter arrival times are :
26,98, 90,26,42,74,80,68,22,48,34,45,24,34
Random digits for service time arc :
95,21 , 51,92, 89, 38, 13, 61, 50,49, 39,53, 88,01, 81
Simulate this system for 10 customers by
Finding
(i) Average inter arrival time
(ii) Average service time of Able
(iii) Average service time of Baker. (08 Marks)
Ans. ReferQ.no.l(b) of MQP-2
I
9'~5~tht'\I
m and then revising both Eight Semester B.E. Degree Examination, CBCS -June / July 2019
te any perceived model System Modelling and Slmulatlon
fficiently accurate. Time: 3 hrs. Max. Marks: 80
Nole: A11swer any FIJ/£/u/1 questions, 1>elecling ONE/111/ question from ea,·h module.
itial model
Module- I
Revise
1. a. What is simulation? Explain witb 0owchart the steps involved in simulation
, is ion of model study. (08 Marlu)
Ans. A simulation is the imitation of the operation of a real-world process or system over
Revise time. Whether done by hand or on a computer, simulation involves the generation of
an artifidal history of a system, and the observation of that artificial history to draw
revision of model
inferences concerning the operating characteristics of the real system.
Refer Q.no. I (a) of MQP - I
b. A computer tccboical support centre lo staffed by two people. Able and Baker,
who take calls and try to answer questions and solve computer problems. The
time between calls ranges from 1 to 4 minutes with the distribution as sbown in
table I.I. Able is more experienced and can provide service faster than Baker,
which mean that, when both arc Idle, Able takes the call. The distribution of
their service times are show in table 1.2 and Table 1.3: Inter arrival time (IAT)
distribution
Table I.I: Inter Arrival Time (IAT) distribution
IAT (mins) 1 2 3 4
Probability 0.25 0.40 0.20 0.15
Table 1.2 : Service Time (ST) Distribution of Able
I Service time (mlns) 2 3 4 5
IProbability 0.30 0.28 0.lS 0.17
Table 13 : Service time distribution of Baker
Service time (mins) 3 4 5 6
Probability 0.35 0.25 0.20 0.20
Random digih for later arrival times arc :
26,98, 90,26,42,74, 80, 68,22, 48, 34,45,24,34
Random digits for service time are :
95,21, Sl,92, 89, 38, 13, 61, 50,49, 39,53, 88,01, 81
Simulate this system for 10 customcn by
Finding
I\__A_.,-- -
and then revising both E ight Semester B.E. Degree Examination, CBCS - June / July 2019
any perceived model System Modelling and Simulation
ciently accurate. Time: 3 hrs. Max. Marks: 80
Note: A11swer any FIVE full questions, seleding ONE full questio11/ronr end, module.
al model
Module- I
Revise
I. a. What is simulation? Explain with flowchart the steps involved in simulation
sion of model study. (08 Marks)
Ans. A simulation is the imiuition of the operation of a real-world process or system over
time. Whether done by hand or on a computer, simulation involves the generation of
an artificial history of a system, and the observation of that artificial history to draw
inferences concerning the operating characteristics of the real system.
Refer Q.no. I {a) ofMQP - I
b. A computer technical support centre in stafl'ed by two people. Able and Baker,
who ta ke calls and try to answer questions and solve computer problems. The
time between calls range» from 1 to 4 minutes with the distribution as shown in
table 1.1. Able is more experienced and can provide service faster than Baker,
which mean that, when both are idle, Able ta kes the call. The distribution of
their service times are show in table 1.2 and Table 1.3: Inter arrival time {IAT)
distribution
Table I.I: Inter Arrival Time (IAT) distribution
IAT (mlns) 1 2 3 4
Probability 0.25 0.40 0.20 0.15
Table p : Service Time (ST) Distribution of Able
Service time (mlns) 2 3 4 5
Probability 0.30 0.28 0.25 0.17
Table 13 : Service time distribution of Baker
Service time (mios) 3 4 S 6
Probability 0.35 0.25 0.20 0.20
Random digits for inter arrival times are :
26,98,90,26,42,74,80,68,22,48,34,45,24,34
Random digits for senice time are :
95.21, 51,92, 89, 38, 13, 61, S0,49, 39,53, 88,01, 81
Simulate this system for 10 customers by
Finding
(i) Average Inter arrival time
(ii) Average service time of Able
(Iii) Average service time of Baker. (08 Marks)
Ans. Refer Q.no. I(b) of MQP - 2
1
VIII Sem, (CS£ I I S£) CBCS · Jl,(..t'l,(!/1July 2
OR
2. a. List the various concept used in discrete event simulation and explain a ny four 3. a. Explain binomial 8
of these with examples. (08 Marks) mean and variance
Ans. Concepts used in discrete event simulation Ans. Poisson distributio
System : A collection of entities that interact together overtime to accomplish one It describes many ra
or more goals. given by
Model : An abstract representation of the system
System state : A collection of variables that contain all the infonnation necessary to
describe the system at any time.
P(x)={e:t x
0, Oc
Entity : Any object or component in the system
Attributes : The property of an entity where a> O
List : A collection of associated entities, ordered in some logical fashion Poisson distribution
Event : An instantaneous occurrence that changes the state of the system E(X) = a = v(x)
Even Notice : A record of an event to occur at the current or sometime in future Binomial distributi,
olong with any associated data necessary to execute event. The random variabl
Event list : A list of event notices for future ordered by time of occurrence. a binomial distributi(
Activity : A duration of time of specified length which is known when it begins
Delay : A duration of time of unspecified indefinite length which is known when it P(x) = {(:} • q•·•
begins.
Example able and baker problem 0,
I. Busy state : Able and baker is busy severing It is computed by the
2. Idle state } by s, occurring in th
J. Waiting state cars waiting to get served
Entities P=[Sss. ... '. ·J
Customers, servers [Able and baker)
Event whcreq =I - P, The
Arrival event of cars
Able and baker service completion time
Activities
(:)= x!(n ~x}!
11
2
~s~ ~CS · ]IMWI(Ju½' 2019
Module-2
3. a. Explain binomial and Poisson distribution and give probability mass function
n and explain nny four
(08 Marks) mean and variance (06 Marks)
Ans. Poisson distribution
irtime to accomplish one Tt describes many random processes and is simple. The probability mass function is
given by
e""'a."
P(x) = ~ x=O,I.. ...
information necessary to
{
0, Otherwise
where a> 0
Poisson distribution mean and variance ai:e both equal to a., i.e.,
gical fashion E(X) = a= v(x)
of the system Binomial distribution
t or sometime in future The random vnriable x that denotes the number of successes in n Bernoulli trials has
a binomial distribution given by p(x) where
e of occurrence.
nown when it begins
which is known when it
P(x) =j(:)p•q•-•, x=0,1,2......n
0, otherwise
It is computed by the probability of a particular outcome with all the success denoted
bys, occurring in the first x trials, followed by n-x failures.
(:)= x!(nn~x)!
Outcomes having the required number of S, and F, . An easy approach lo calculate
mean and variance of tho binomial distribution is to consider X as the sum of n
independent bemoulli random variables each with mean p and variance p(l -p)=pq,
rival and service time Then
X = X, 4-X 2 +X3 + ..... +X11
and the mean E(X) is given by
E(X) = p+ p+ .......... + p = np
and the variance V( x) is given by
V(X) = pq + pq + ..... + pq = npq
3
VIII Sem,, ( CSE / ISE) Sy~ M~CUtd.-S ~ CBCS · J IM\el / J
b. Explain the following continuous distributions :
(l)p=}../µ
i) Uniform distribution
Ii) Exponential distribution
iii) Triangular distribution
Iv) Normal distribution. (10 Marks)
Ans. (i),(ii) : Refer Q.no. 3(a) of MQP - I
(iii) Refer Q.no. 3(b) ofMQP- 2
( iv) Normal distribution
A random variable x with mean -oo < µ < oo and variance a2 > 0 has a normal
distribution if it has the pdf.
2
I
f(x)= o.finexp
2 ""7
[ - I (x-µ) ] , - oo<x<oo
Properties :
~ µ
J I
F(x) = p(X~x)= _.,.x o,ff;cxp
27
[ - I('-µ)'] dt . b. T he sequence
Use Kolmogo
the numbers
OR Take D0 •0.
4. a. Explain the cha racteristics of a queuing system. (08 Marks) Ans. Refer Q.no. 5(
Ans. Refer Q.no. 4(n) of MQP - I.
b. Explain the various steady state paramcten of M/G/1 queue. (08 Marks) 6. a. Suggest a ste
Ans. Suppose the service times have mean I/µ and variance 0 2 and that there is one server. transform te
• If p = ,.Jµ < I, then the µ/g/1 queue has a steady • state probability distribution Ans. Step 1 : Com
with steady - slate characteristics as given in table. distribution th
• When "'<µ, the quantity p = Alµ is the server utilization or long - run proportion F(x) = I - e•l•
of time the server is busy. Step 2 : Set F
• l-p = p can also be interpreted as the steady state probability that the system
0
1-e-h = Ro
contains one or more customers. variable calle
• L- LQ=- p is the time - average number of customers being sened. R has a unifo
Table : Steady- state parameter of the µ/G/1 Queue Step 3: Solv
4
s
CU'\.d, ~ t m ' \ t CBCS - J IM'l.e/ I J u½1 201 9
(l)p=A/µ
A
2
(J;: 2 +a )
( S) LQ=___,>;2:..(. ,.'l--
--
2
P2(l+o2µ2)
p) ~ = ___;_2(-!--p-
) ....:..
(6)p = l- p
0
5
cuu;l,S~ cacs •JIMU!/(J~2019
(l)p=AIµ
4 w - ,{>;:2+02)
()o - 2(1-p)
SL - ,.2(J;:2+02)~P2(t+o2µ2)
( ) Q - 2(1-p) 2(1-p)
(6)p 0
= 1-p
Module-3
s zero as x approaches
S. · a. Use the linear congruential method to generate a sequence of random numbers
with X0 = 27, a = 17, C = 43 and m "' 100. Write 3 ways of acblevina maximal
ti mode are equal. period. (08 Marlu)
Ans. Refer Q.no. S(a) of MQP - 2
. b. Thesequence of random members 0.44, 0.81, 0.14, 0.05, 0.93 has been generated.
Use Kolmogorav Smirnov test with a = 0.05 to determine if the hypothesis that
the numbers are uniformly distributed on the interval f0, J) can be rejected.
Take D = 0.565. (08 Marks)
(08 Marks) Ans. Refer Q~no. S(a) of MQP - I
OR
e. (08 Marks) 6. a. Suggest a step by step procedure to generate random variates using inverse
at there is one server. transform technique for exponential distribution. (08 Marks)
obability distribution Ans. Step I : Compute the cdf of the desired random variable x. For the exponential
distribution the cdf is give by
long - run proportion F(x) = I - e·,._, , x ~ 0
Step 2 : Set F(x) =Ron the range ofx. For the exponential distribution, it becomes
bility that the system I - e-"" = R on the range x ~ 0, x is a random variable, so I - e·"' is also random
variable called R.
served. R has a uniform distribution over the interrupt
Step 3 : Solve the equation F(x) = R for interns of R
5
VIII Sem, ( CS£ I IS£) CBCS · JIM'l.,(V f JIA½'
1-e-i..=R b. Explain the types
-e-"-• =R-1
Ans. Types of steady st
-Ax= ln(l -R) • In analyzing si~
transient simulf
x =-±ln{l-R) ... (I) isa random variable generator for theeqn (I) is • A terminating s
a specified eve,
written as
• Such a simula
X= r 1 (R) and closes at t
Generating a sequence of values is done through step 4. Examples ofter
Step 4 : Generate unifonn random number R1 ~ and ~ and compute the desired i. The shady grov
random variates by 11 tellers workinij
x, = f·I (R,) open for 48° minui
between custome
ForexponentialcaseF-1 (R) = _2._ln (t - R) byeqn(l) and closing down
A
ii. A communicati
-I
So X, =,: ln{l-R.) .... (2) components. It is r
Fori= 1,2,3 .....onesimplification that is usually in eqn(2) is to replace 1- R, by R,
-I
X, =,:ln(R,)
6
~s~
CBCS · JWU!,,/ July 2019
b. Explain the types of simulation with respect to output analysis. Give examples.
(08 Marks)
Ans. Types of steady state simulation
• In analyzing simulation output date, a distinction is made between terminating or
transient simulation and steady stale simulation.
theeqn(l)is • A terminating simulation is one that runs for some duration of time Te, where e is
a specified event, that stops the stimulation.
• Such a simulated system op~ns at time O under well specified initial conditions
and closes at the stopping time TE'
Examples of terminating simulation
i. TI1e shady grove bank opens at 8.30 am with no customers present and 8 of the
compute the desired
11 tellers working and closes at 4.30 pm. Here the event E is the fact that the bank
open for 48° minutes. The simulation analyst is interesed in modeling the interaction
between customers and tellers over the entire day including the effect of starting up
and closing down at the end of the day.
ii. A communication system consists of several components plus several backdrop
components. It is represented as below.
lace!- R, byR,
7
VIII Sem.- (CSE ( ISE)
OR Eight Semest
10. a.Explain with neat diagram, a model building verification and validation.
(08 Marks) Time: 3 hrs.
Ans. Refer Q.no. IO(a) of MQP - I Note : An., wrr an
b. Describe the 3 steps approach formulated by Naylor and Finger in the validation
process. (08 Marks)
1. a. With a neat d
Ans. ReferQ.no. IO(a)ofMQP-2 Ans. Refer Q.No I.
b. A small gro
checkout cou,
inter-arrival
times vary fi
Serviet
Proba!
Develop slm
(ii) Average s
random digits
are 84, 10, 74
Ans. Distribution c
Time between
I
2
3
4
s
6
7
8
9
10
Distribution o
Service time
I
2
3
4
5
6
8
s~
E ight Semester B.E. Degree Examination, CBCS - Dec 2019 / Jan 2020
System Modelling and Simulation
d validation.
(08 Marks) Time: 3 hrs. Max. Marks: 80
Note : Answer any FIVE f11/I q11estions, selecting ONE full quest/011 f rom eaclt mod11le. _,
9
VIII Se,m,(CSE I IS'E) S y ~ M ~CMld,,S ~ CBCS · Veo2019 I J
Customers Random digilS Time between arrivals
I - - 2. a. A company uses 6
2 91 10 a railroad. Each t
3 72 8 moves to the weig
4 15 2 come first served
s 94 10 afterward returns
6 30 3 time are given in t
7 92 10
8 75 8 Loadin
9 23 3 Wei hin
10 30 3 Travel ti
Service time determination Calculate the total
Customers Random digits Time between arrivals uUlization. Assum
I 84 5 Stopping event ti
2 10 2 Ans. Refer Q.No. 2.b. 0
3 74 5 b. Explain the terms
4 53 4 i) Event ii
s 17. 3 v) Clock v
6 79 5 Ans. 1. Event : An insta
7 91 6
8 67 5 Ex: an arrivaVdepa
9 89 s 2. Event notice :
10 38 4 along with any as
Simulation table for single channel queue problem record includes the
Customer later Interval Service Time W1ltln1 Time Time customer Idle of an bank, we hav
■rrlv■I time time lffVke time la aervlce spends la time or a possible e,vent no
time be&ln• queue ends system server • Event type (e.g.
I - 0 5 0 0 s s 0 • Event time (e.g.
2 10 10 2 10 0 12 2 s • Customer num
3 8 18 s 18 0 23 5 6 3. FEL : A list ofe
4 2 20 4 23 3 27 7 0 known as the futu
s 10 30 3 30 0 33 3 3 4. Delay : Duratio
6 3 33 .s 33 0 38 .s 0 Ex: a customer's
7 10 43 6 43 0 49 6 s S. Clock : A varia
8 8 .SI .s .SI 0 56 5 2 Consider die Able.
9 3 54 5 56 2 61 7 0
10 3 51 4 61 4 65 8 0 LQ(t): The numbe
LA(t): 0 or I to in
·1) A verage wa1tmg
. . time=
. 9
- = 09 •
. mms LB(t) 0 or I to ind
10 6. System state :
44 to describe the sy
1·1·) Average service
. time=
. - = 4.4 mms
·
10 Ex: Busy or Idle s
"') Average time
· customer spend .m system • -53 • S.3 mins 7. Activity: Dura
111
10 Ex: a service time
10
~s~ CBCS · Vec,2019 / J<M\/2020
Iarrivals OR
2. a. A company usea 6 trucks to haul manganese from the entrance of a mine to
a railroad. Each truck is loaded by one of two loadeni. After loading a truck
moves to the weighting scale to be weighted. Both loaders and scale have lint-
come first served queue. After being weighted a truck begins'a travel time
afterward returns to loader queue. The activity ofloading, weighing and travel
· time are given in the following table:
Loading time 10 5 5 10 15 10 10
Weighing time 12 12 12 16 12 16 -
Travel time 60 100 40 40 80 - -
Calculate the total busy time ofw1th loaders, the scale, average loader and scale
p arrivals utilization. Assume 5 trucks are at the loader and one is at the scale, at time '0'.
Stopping event time TE = 24 min. (08 Marks)
Ans. Refer Q.No. 2.b. of MQP - 2
b. Explain the terms used in discrete event simulation with am example:
i) Event Ii) Event notice iii) FEL Iv) Delay
v) Clock vi) System state vii) Activity viii) System (08 Marks)
Ans. 1. Event : An instantaneous occurrence that changes the state of a system.
Ex: an arrival/departure of a customer
2. Event notice : A record of an event to occur at the current or some future time,
along with any associated data necessary to execute the event; at a minimum, the
record includes the event type and the event time. Ex :when simulating the operation
ofan bank, we have two types of events, Arrival &Departure. With these two events,
Time customu ldlt
a possible e.vent notice will be
spends in time or
system server • Event type (e.g. Arrival or Departure)
s 0 • Event time (e.g. 8)
2 s • Customer number
s 6 3. FEL : A list ofevent notices for future events, ordered by time of occurrence; also
7 0 known as the future event list (FEL).
3 3 4. Delay: Duration of time of unknown length, which is not known until it ends
5 0 Ex: a customer's waiting in-line for service to begin
6 5 S. Clock : A variable representing simulated time.
s 2 Consider the Able. Baker carhop system. The components ofdiscretc-cventmodel are:
7 0
8 0
LQ(t): The number of cars waiting to be served at time t
LA(t): 0 or I to indicate Able being idle or busy at time t
LB(t) 0 or I to indicate Baker being idle or busy at time t .
6. System state: A collection of variables that contain all the information necessary
to describe the system at any time.
Ex: Busy or Idle state of server
7. Activity : Duration of time of specified length .It is known when it begins.
Ex: a service time or inter-arrival time
11
VIII Sem., (CSE I ISE) S y ~ Model.Un.g, ~S~ CBCS · 'Deo2019
8. System : A collection of entities (e.g., people and machine~) that interact together 3) Geometric a
over time to accomplish one or more goals. Bernoulli trials,
Ex: Bank, Transportation to achieve the fi
Module-2
3. a. Discuss different types of discrete distribution. P(x) ={q':P,
(10 Marks)
Ans. Types of discrete distributions
1) Bernoulli trails and tbe Bernoulli distribution : Let x, = I if the jd' experiment The event [x •
resulted in success and let x = 0 if the jd' experiment resulted in a failure. Then the failures has
Bernoulli trails are called a ~moulli process if the trails are independent, each trial p. Thus
has only two possibilities outcomes and the probability of a success remains constant P(FFF .... FJ -=
from trial to trail. Thus The mean varia
P( x., x 2 ••..x = P, (x, ),P1 (x1 ) .. . ..P1 ( x 1 }
• E(x) =.!.
0 )
p
P, x, =I, j =1,2, ....n
and P; ( x,) = P( x,) = 1- P = q, x, = =
0, j 1,2,....n and V(x)=..9...
pl
{ 0, otherwise
4) Poisson dis
For one trial, the distribution given in equation (I) is called Bernoulli distribution.
The mean and variance of x, are calculated as follows.
E(x,) = 0.q + 1.P + P
and V(x)=
J
[(02. q)+ (12. P)-P2 = P(l -P)
:;:·~{·~~; 0,
2) Binomial distribution
The random variable x that denotes the number of successes in Bernoulli trials has a Where a> 0 o
binomial distribution given by p(x) where mean and vari
{(") .•-·
E(x) -= a = V(x
P(x) = x pq x = 0,1,2, ....n ....(1) The cumulativ
0, otherwise F(x) = f e-oi!
,-o
All the successes, each denoted bys, occurring in the first x trials, followed by the
n-x failures, each denoted by an F that i~ b. Explain the d
Ans. Refer Q.No. 4
P(SS;::. SS)(F; -~~-=-FF) p"q" •
4. a. .Explain the
where q -= I - P. There are Ans. Unifonn and
(:)= xl(nn~x)!
Triangular di
Nonnal distr
Each with mean P and variable P ( I - P) • pq. b. Explain Ke
also interp
Then x = x.+ '½ + ..... + x.
and the mean E(x) is given by Ans. The letters I
E(x) = p + p + ... p:. np A-+ represe
and variance V(x) is given by e-rep
V(x) = pq + pq + ..... + pq = npq
C-+rep
N-+re
K-+re
12 s.-+w
Cl,t'\.d, s~ CBCS - Oec-2019 / JOtKv2020
pthat interact together 3) Geometric and Negative Binomial distributions : It is related to sequence of
Bemoul Ii trials, the random variable of interest x, is defined to be the number of trials
to achieve the first success. The distribution ofx is given by
P(x) ={q•-'p, x=l,2.....
(10 Marks) 0 otherwise
The event [x = x) occurs when there are x - I failures followed by a success, Each of
I if the j"' experiment
the failures has an associated probability ofq = I - P and each success has probability
lted in a failure. Then
p. Thus
independent, each trial P(FFF .... F8) = q•- 1 p
ccess remains constant
The mean variance are given by
E(x)=.!_
p
and V(x)=.1.
p2
4) Poisson distribution :- It describes many random process quite well and is
Bernoulli distribution. mathematically simple. The poisson probability mass functfon is given by
N
-N
I
- -R
N I
R,-C~') uniformly
14
cacs . vec, 201 9 / J'4W 2020
s~ 4 5
l 2 3
I
orkov), D(Constant
yperx potential), C R 0.1306 0.0422 0.6597 0.796S 0.7696
0.1400 0.0431 1.078 l.592 1.468
X
b. Explain the properties of random numben and pseudo-random numben.
tenlial arrivals, the (05 Marks)
ed when N and K are
Ans. Properties of Random numben
A sequence of random numbers R1, R1.... must have two important statistical
properties : uniformity and independence. Each random number R, must be an
oiuon variates with independent sample drawn from a continuous uniform distribution between zero and
53,0.9952, 0.8004, 1 that is, the pdf is given by
(06 Marks)
f(x)~·{'• Osxsl
0, otherwise
achieving maximal
(OS Marks) Some consequences of the uniformity and independence properties are following :-
1) If the interval (0, I] is divided into n classes or subintervals of equal length, the
and 0.68 bas been
expected number of observations in each interval i,; N/n where N is the total number
to determine if the of observations.
2) The probability of observing a value in a particular interval is independent of the
tlaeinterval(O,llcan
(OS Marks) previous value drawn.
Properties of Pseudo Random Numbers
Refer Q.No. 6.a. ofMQP • I
c. Use the Chi-square test with a • O.OS to test whether the data shown below are
uniformly distributed or not. (Note: X,2...., • 6.9J
0.594 1>J5I 0.422 0.127 0.182
0.11
0.34 0.928 0.262 0.797 0.788 0.929
0.28 0.515 o.cm 0.798 0.825 0.852
0.13 0.507 0.474 0.227 0.07 0.05
0.18
(05 Marki)
Ans.
16
odel of input data with
(08 Marks)
(08 Marks)
ulated by Nayierand
(08 Marks)
a■d validation.
(08 Marks)
(08 Marks)
(08 Marks)
TECHNICAL POSTS
OROUP • B
ELECTRONICS
r,:COMMUMCATION