Open Source MGMT Options PDF
Jane Curry
Skills 1st Ltd
2 Cedar Chase
Taplow
Maidenhead
SL6 0EU
01628 782565
jane.curry@skills1st.co.uk
Synopsis
Nuts-and-bolts network and systems management is currently unfashionable. The
emphasis is far more on processes that implement service management, driven by
methodologies and best practices such as the Information Technology Infrastructure
Library (ITIL). Nonetheless, all service management disciplines ultimately rely on a
way to determine some of the following characteristics of systems and networks:
Configuration management
Availability management
Problem management
Performance management
Change management
Security management
The commercial marketplace for systems and network management offerings tends to
be dominated by the "big four": IBM, HP, CA and BMC. Each has large, modular
offerings which tend to be very expensive. Each has grown its portfolio by buying
up other companies and then performing some level of integration between their
respective branded products. One can argue that the resulting offerings tend to be
"marketechtures" rather than architectures.
This paper looks at Open Source software that addresses the same requirements.
Offerings from Netdisco, Cacti and The Dude are examined briefly, followed by an
in-depth analysis of Nagios, OpenNMS and Zenoss.
This paper is aimed at two audiences. For a discussion on systems management
selection processes and an overview of three main open source contenders, read the
first few chapters. The last few chapters then provide a product comparison.
For those who want lots more detail on Nagios, OpenNMS and Zenoss, the middle
sections provide in-depth discussions with plenty of screenshots.
Table of Contents
1 Defining Systems Management
1.1 Jargon and processes
1.2 Systems Management for this paper
2 Systems management tools
2.1 Choosing systems management tools
2.2 The advantages of Open Source
3 Open Source management offerings
4 Criteria for Open Source management tool selection
4.1 General requirements
4.1.1 Mandatory Requirements
4.1.2 Desirable Requirements
4.2 Defining network and systems management
4.2.1 Network management
4.2.2 Systems management
4.3 What is out of scope?
5 A quick look at Cacti, The Dude and netdisco
5.1 Cacti
5.2 netdisco
5.3 The Dude
6 Nagios
6.1 Configuration - Discovery and topology
6.2 Availability monitoring
6.3 Problem management
6.3.1 Event console
6.3.2 Internally generated events
6.3.3 SNMP TRAP reception and configuration
6.3.4 Nagios notifications
6.3.5 Automatic responses to events - event handlers
6.4 Performance management
6.5 Nagios summary
7 OpenNMS
7.1 Configuration - Discovery and topology
7.1.1 Interface discovery
7.1.2 Service discovery
7.1.3 Topology mapping and displays
7.2 Availability monitoring
7.3 Problem management
7.3.1 Event console
7.3.2 Internally generated events
7.3.3 SNMP TRAP reception and configuration
7.3.4 Alarms, notifications and automations
7.4 Performance management
7.4.1 Defining data collections
7.4.2 Displaying performance data
7.4.3 Thresholding
7.5 Managing OpenNMS
7.6 OpenNMS summary
8 Zenoss
8.1 Configuration - Discovery and topology
8.1.1 Zenoss discovery
8.1.2 Zenoss topology maps
8.2 Availability monitoring
8.2.1 Basic reachability availability
8.2.2 Availability monitoring of services - TCP / UDP ports and windows services
8.2.3 Process availability monitoring
8.2.4 Running commands on devices
8.3 Problem management
8.3.1 Event console
8.3.2 Internally generated events
8.3.3 SNMP TRAP reception and configuration
8.3.4 email / pager alerting
8.3.5 Event automations
8.4 Performance management
8.4.1 Defining data collection, thresholding and graphs
8.4.2 Displaying performance data graphs
8.5 Zenoss summary
9 Comparison of Nagios, OpenNMS and Zenoss
9.1 Feature comparisons
9.1.1 Discovery
9.1.2 Availability monitoring
9.1.3 Problem management
9.1.4 Performance management
9.2 Product high points and low points
9.2.1 Nagios "goodies" and "baddies"
9.2.2 OpenNMS "goodies" and "baddies"
9.2.3 Zenoss "goodies" and "baddies"
9.3 Conclusions
10 References
11 Appendix A Cacti installation details
1 Defining Systems Management
1.1 Jargon and processes
Every organisation and individual has their own perspective on systems management
requirements; the first essential step when looking for systems management solutions
is to define what those requirements are. This gives a means to measure success of a
project.
There are many different methodologies and disciplines for systems management, from
the International Standards Organization (ISO) FCAPS acronym (Fault,
Configuration, Accounting, Performance and Security) through to the Information
Technology Infrastructure Library (ITIL), which divides the ITIL V2 framework into
two categories:
Service Support, which includes the:
  Service Desk function
  Incident management process
  Problem management process
  Configuration management process
  Change management process
  Release management process
Service Delivery, which includes the:
  Service Level management process
  Capacity management process
  IT Service Continuity management process
  Availability management process
  Financial management for IT services
Key to the core of configuration management and the entire ITIL framework is the
concept of the Configuration Management Database (CMDB) which stores and
maintains Configuration Items (CIs) and their inter-relationships.
The art of systems management is defining what is important: what is in scope and,
perhaps more importantly, what is currently out of scope. The science of systems
management is then to effectively, accurately and reliably provide data to deliver your
systems management requirements. The devil really is in the detail here. A
"comprehensive" systems management tool that delivers a thousand metrics out of
the box but which is unreliable and/or not easily configurable is simply a recipe for a
project that is delivered late and over budget.
For smaller projects or Small / Medium Business (SMB) organisations, a pragmatic
approach is often helpful. Many people will want a say in the definition of
"management". Others, whose requirements may be equally valuable, may not know
"the art of the possible". Hence, combining top-down requirements definition
workshops with a bottom-up approach of demonstrating "top 10" metrics that can
easily be delivered by a tool can result in an iterative process that fairly quickly
delivers at least a prototype solution.
different piecemeal tools for different projects, especially when the cost of building and
maintaining skills and educating users is taken into account.
Tool integration is a large factor in the successful rollout of systems management.
The concept of a single Configuration Management Database (CMDB) that all tools
feed and use is key to this.
A good tool delivers "useful stuff" easily out of the box and provides a standard way to
then provide local customisation.
At its most basic, the "tool" is a compiler or interpreter (C, bash, ...) and the
"customisation" is writing programs from scratch. At the complex end of the spectrum,
the tool may be a large suite of modules from one of the big four commercial
suppliers: IBM, HP, CA and BMC. The really complex end is where you have
several of the big commercial products involved in addition to home-grown programs.
2.2 The advantages of Open Source
One attraction of Open Source to me is that you don't actually have to fund
salesfolk. Some costs do need to be invested in your own people to investigate the
offerings available, research their features and requirements, and participate in the
online fora that share experience around the globe. These costs may not be small but
at least the investment stays within the company and hopefully those people who have
done the research will then be a key part of the team implementing the solution. This
is often not the case if you purchase from a commercial supplier.
Open Source does not necessarily mean "you're on your own, pal!". Most of the Linux
distributions have a free version and a supported version, where a support contract is
available to suit your organisation and budget. Several of the Open Source
management offerings have a similar model, but do ensure that the free version has
sufficient features for your requirements and is not just a well-featured "demo".
All software has bugs in it. Ultimately, if you go Open Source, you have the source
code so you have some chance of fixing problems with local staff or buying in global
expertise, and that doesn't necessarily mean transporting a guru from Australia to
Paris. Open Source code is available to everyone so remote support and consultancy is
a distinct possibility. With the best will in the world, commercial organisations will
prioritise problem reports according to their criteria, not yours.
There are some excellent fora and discussion lists for commercial products (I have
participated in several of them for many years); some even have input from the support
and development teams; however, the source code is not open for discussion or
community development. With a very active Open Source offering, there tends to be a
much larger pool of developers and testers (i.e. "us") and the chance of getting problems
fixed may be higher, even if you cannot fix it yourself. I would emphasise very active
Open Source offerings: unless you really do have some very highly skilled local staff
that you are sure you are going to keep, it may be a risky choice to participate in a
small Open Source project.
V2 (1993) solved some performance issues. Never reached full standard
status.
V3 (2002) significantly improved performance and security issues. Much more
complex.
Of the Open Source management solutions available, some are excellent point
solutions for specific niche requirements. MRTG (Multi Router Traffic Grapher),
written by Tobi Oetiker, is an excellent example of a compact application that uses
SNMP to collect and log performance information and display it graphically. If that
satisfies your requirement, don't look any further, but it will not help you with
defining and collecting problems from different devices and then managing those
problems through to resolution.
An enhancement of MRTG is RRDTool (Round Robin Database Tool), again from Tobi
Oetiker. It is still fundamentally a performance tool, gathering periodic, numeric data
and displaying it, but RRDTool has a database at its heart. The size of the database is
predetermined on creation and newer data overwrites old data after a predetermined
interval. RRD can be found embedded in a number of other Open Source management
offerings (Cacti, Zenoss, OpenNMS).
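The round robin idea (fixed size at creation, newest data overwriting the oldest) can be illustrated with a few lines of Python. This is only a sketch of the concept, not RRDTool's file format or API; the class name RoundRobinArchive and its methods are made up for illustration.

```python
class RoundRobinArchive:
    """Fixed-size store: once full, each new sample overwrites the oldest."""

    def __init__(self, slots):
        if slots <= 0:
            raise ValueError("slots must be positive")
        self.slots = slots
        self.values = [None] * slots  # storage is allocated once, at creation
        self.next_index = 0           # position the next sample is written to
        self.count = 0                # samples stored so far, capped at slots

    def update(self, value):
        """Store one sample, overwriting the oldest when the archive is full."""
        self.values[self.next_index] = value
        self.next_index = (self.next_index + 1) % self.slots
        self.count = min(self.count + 1, self.slots)

    def fetch(self):
        """Return the stored samples, oldest first."""
        if self.count < self.slots:
            return self.values[:self.count]
        return self.values[self.next_index:] + self.values[:self.next_index]
```

However many samples are fed in, the archive never grows beyond the number of slots chosen at creation, which is why the size of such a database is known in advance.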
A further enhancement from RRDTool is Cacti, which provides a complete front end to
RRDTool. A back-end MySQL relational database can be used behind the Round Robin
databases; data sources can be pretty well any script in addition to SNMP; and there
is user management included. This is still a performance data collection and display
package, not a multi-discipline, framework, systems management solution.
Moving up the scale of features and complexity, some offerings are slanted more
towards network management (netdisco, The Dude); others towards systems
management (Nagios).
Some aim to encompass a number of systems management disciplines with an
architecture based around a central database (Nagios, Zenoss, OpenNMS).
Some are extremely active projects with hundreds of appends to maillists per month
(Nagios, Zenoss, OpenNMS, cacti); others have a regular but smaller community with
hundreds of maillist appends per year (netdisco).
Some are purely Open Source projects, typically licensed under the Gnu GPL (MRTG,
RRDTool, cacti) or BSD license (netdisco); some have free versions (again typically
under GPL) with extensions that have commercial licences (Zenoss). In addition to
free licences, several products offer support contracts (Zenoss, Nagios, OpenNMS).
Most are available on several versions of Linux; MRTG, RRDTool and cacti are also
available for Windows. The Dude is basically a Windows application but can run
under WINE on Linux.
Most have a web-based GUI supported on Open Source browsers. OpenNMS can only
display maps by using Internet Explorer.
4 Criteria for Open Source management tool selection
It is essential to define what is in scope and what is out of scope for a systems
management project. A prioritised list of mandatory and desirable requirements is
helpful.
4.2 Defining network and systems management
The "Integrated network and systems management" requirement needs some further
expansion:
Regular, customisable monitoring of SNMP MIB variables, both standard
  and enterprise-specific, with data storage and ability to threshold values to
  generate events
Ability to import any MIB
Ability to browse any MIB on any device
Customisable graphing of performance data
Central events console for network and systems management events with
  ability to prioritise events
Ability to categorise events for display to specific users
Ability to receive and format SNMP traps for SNMP V1, V2 and preferably
  V3
Ability to monitor Unix syslogs and Windows Event Logs and generate
  customisable events
Ideally the ability to monitor any text logfile and generate customisable
  events
Customisation of actions in response to events, both manual actions and
  automatic responses
Ability to correlate events to find root-cause problems (e.g. single point of
  failure router is root cause of availability failure for all devices in a network)
Performance
Regular, customisable monitoring of SNMP MIB variables, both standard
  and enterprise-specific, with data storage and ability to threshold values to
  generate events
Ability to import any MIB
Ability to browse any MIB on any device
Ability to gather performance data by methods other than SNMP (e.g. ssh)
Customisable graphing of performance data
5 A quick look at Cacti, The Dude and netdisco
Cacti, The Dude and netdisco do not meet my mandatory requirements; however, they
are interesting niche solutions that were investigated during the tools evaluation
process. Cacti and netdisco were installed; The Dude was only Internet-researched.
5.1 Cacti
Cacti is a niche tool for collecting, storing and displaying performance data. It is a
comprehensive front end to RRDTool, including the concept of user management.
Although the default method of data collection is SNMP, other data collectors,
typically scripts, are possible.
Data collection is very configurable and is driven by the Cacti Poller process which is
called periodically by the Operating System scheduler (cron for Unix). The default
polling interval is 5 minutes.
Devices need to be manually added using the Cacti web-based GUI. Basic information
such as hostname, SNMP parameters and device type should be supplied. Depending
on the device type selected (e.g. ucd/net SNMP Host, Cisco Router), one or more default
graph templates can be associated with a device along with one or more default SNMP
data queries. In addition to the web-based GUI, configuration of Cacti can be done by
Command Line, using PHP, which is a general-purpose scripting language especially
suited for web development.
Cacti now has support for SNMP V3.
For high-performance polling, Spine (used to be cactid) can replace the base cmd.php
polling engine. The user manual suggests that Spine could support polling intervals
of less than 60 seconds for at least 20,000 data sources.
Cacti is supported on both Unix and Windows platforms.
Get the Cacti User Manual from http://www.cacti.net/downloads/docs/pdf/manual.pdf.
Cacti has a very active user forum with hundreds of appends per month. There is also
a documented release roadmap going forward to 2nd quarter 2009.
Here are a few screenshots of Cacti to give a feel for the product.
Figure 1: Cacti main Devices panel
Figure 2: Cacti graph of interface traffic
Figure 3: Cacti graph of memory for device bino
5.2 netdisco
netdisco was created at the University of California, Santa Cruz (UCSC), Networking
and Technology Services (NTS) department. It is interesting as a network
management configuration offering. It uses SNMP and Cisco Discovery Protocol
(CDP) to try and automatically discover devices. Unlike most other management
offerings, netdisco is Layer 2 (switch) aware and can both display switch ports and
optionally provide access to control switch ports.
It provides an inventory of devices that you can sort either by OS or by device model,
displaying all ports for a device. It also has the ability to provide a network map.
User management is included so you can restrict who is allowed to actively manage
devices. There is good provision of both command line interface and web-based GUI.
netdisco is supported on various platforms. It was originally developed on FreeBSD; I
built it on a Centos 4 platform.
If your requirement is strictly for network configuration management and your
devices respond suitably to netdisco then this might be worth a try. I found it very
quirky as to what it would discover. It appears very dependent on the SNMP system
sysServices variable to decide whether a device supports network layer 2 and 3
protocols; if a device did not provide sysServices or didn't indicate layer 2 / 3, then
netdisco would not discover it. I also had very few devices supporting Cisco CDP so
the automatic discovery didn't work well for me. Although there is a file where you
can manually describe the topology, this would be a huge job in a sizeable network if
you had to hand-craft a significant amount of the network topology.
This project is not nearly so active as some of the other offerings discussed here
(around 500 appends to the users maillist in 2007) but there seems to be a steady flow.
Building the system was a fair marathon but the documentation is reasonably good.
Here are some screenshots of the main device inventory panel, plus the details of a
router and the details of a switch.
Figure 4: Netdisco main device inventory display
Figure 5: Netdisco details of router device
Figure 6: Netdisco details of a switch device, including ports
6 Nagios
Nagios evolved in 2002 out of an earlier systems management project called NetSaint,
which had been around since the late 1990s. It is far more a systems management
product than a network management product. It is available to build on most
flavours of Linux / Unix and the installation has become much easier over the years.
The Nagios Quickstart document is reasonably comprehensive (although it misses a
few prerequisites that I found necessary, like gd, png, jpeg, zlib, net-snmp and their
related development packages). I downloaded and built Nagios 3.0.1 on a SuSE 10.3
platform (hostname nagios3), and had it working inside half a day.
To start the Web Interface, point your browser at http://nagios3/nagios/ . The
Quickstart document has you create some user ids and passwords; the default logon
for the Web console is nagiosadmin with the password you specified during
installation.
Here is a screenshot of the Nagios Tactical Overview display.
Figure 7: Nagios Tactical Overview screen
6.1 Configuration - Discovery and topology
Nagios uses a number of files to configure discovery; out of the box it will find
nothing. Samples are available, by default, in /usr/local/nagios/etc . The main
configuration file is nagios.cfg which defines a large number of parameters, most of
which you can leave alone at the outset.
Typically the main things to discover are "hosts" and "services". These are defined in
an object-oriented way such that you can define host and service top-level classes with
particular characteristics and then define sub-classes and hosts that inherit from their
parent classes. Rather than having a single, huge nagios.cfg, it can reference other
files (typically in the objects subdirectory), where definitions for hosts, services and
other object types can be kept. So, for example, /usr/local/nagios/etc/nagios.cfg may
contain lines such as:
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
Definitions of hosts are built up in a hierarchical manner so the top-level definitions
may look like the following screenshot. Note the "use" stanza to denote inheritance of
characteristics from a previous definition.
Figure 8: Nagios hosts.cfg top-level definitions
Host availability parameters are shown in the screenshot above:
check_period (24x7)
check_interval (5 mins)
retry_interval (1 min)
max_check_attempts (10)
check_command (check_host_alive, which is based on check_ping)
Figure 9: Nagios hosts.cfg showing host template definitions
Subsequent definitions of sub-groups and real hosts will follow. Note the use of the
"parents" stanza to denote the network node that provides access to the device. This
means that Nagios can tell the difference between a node that is down and a node that
is unreachable because its access router is down.
Figure 10: Nagios hosts.cfg file showing real host definitions
Hosts can be defined to be a member of one or more host groups. This then makes
subsequent configuration more scalable (for example, a service can be applied to a host
group rather than to individual hosts). Host groups are typically defined in hosts.cfg.
Figure 11: Nagios hosts.cfg host group definitions
Host groups are also used in the GUI to display data based on host groups.
Figure 12: Nagios Hostgroup summary
Whenever changes have taken place to any configuration file, the command:
/etc/init.d/nagios reload
should be used. This does not stop and start the Nagios processes (use stop | start |
restart | status to control the background processes); the reload parameter simply
re-reads the configuration file(s). There is also a handy command to verify that your
configuration files are legal and consistent, before actually performing the reload:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
All objects to be managed need defining in the Nagios configuration files; there is no
form of automatic discovery. However, the ability to create object templates and thus
an object hierarchy makes definitions flexible and easy, once you have defined your
hierarchies.
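To make the template idea concrete, here is a small Python sketch of how a chain of "use" stanzas could be resolved into one effective host definition. It is a simplified model for illustration only, not Nagios's own parser: it assumes a single parent per definition, and the template name generic-host and the function name resolve are my own assumptions. The parameter values are the ones quoted earlier in this section.

```python
def resolve(name, definitions):
    """Merge a definition with everything it inherits through 'use'.

    Values set closer to the named object override inherited ones.
    """
    definition = definitions[name]
    parent = definition.get("use")
    merged = resolve(parent, definitions) if parent else {}
    merged.update({k: v for k, v in definition.items() if k != "use"})
    return merged


# Illustrative definitions; 'generic-host' is a hypothetical top-level template.
definitions = {
    "generic-host": {
        "check_period": "24x7",
        "check_interval": 5,
        "retry_interval": 1,
        "max_check_attempts": 10,
        "check_command": "check_host_alive",
    },
    "host_172.31.100": {"use": "generic-host"},
    "group-100-a1": {
        "use": "host_172.31.100",
        "parents": "group-100-r2",
        "check_command": "check_ifstatus",  # overrides the inherited command
    },
}

effective = resolve("group-100-a1", definitions)
```

The resolved host keeps check_period and the other inherited values from the top of the hierarchy while its own check_command wins over the inherited one, which is the behaviour the hierarchy is meant to give you.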
A great benefit of this configuration file is the ability to denote the network devices
that provide access to specific nodes (parent / child relationship). This means that a
map hierarchy can be displayed and also means that node reachability is encoded. If,
for example, all nodes on the 172.31.100.32 network inherit from a template that
includes a "parents group-100-r3" stanza, when group-100-r3 goes down then
Nagios knows that all nodes in that network are unreachable (rather than down).
Defining multiple parents for a meshed network seemed problematical though.
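The down-versus-unreachable distinction can be sketched in a few lines of Python. This is a simplified model under my own assumptions (the function name classify, the host name node-a and the rule that a host with no parents is directly attached to the monitoring server are all illustrative); it is not Nagios source code.

```python
def classify(host, parents, responding):
    """Return 'UP', 'DOWN' or 'UNREACHABLE' for a polled host.

    parents    maps a host name to the list of its parent host names
    responding is the set of hosts that currently answer their checks
    """
    if host in responding:
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        return "DOWN"  # assumed to be directly attached to the monitoring server
    if any(parent in responding for parent in host_parents):
        return "DOWN"  # a path to the host exists, so the host itself has failed
    return "UNREACHABLE"  # every access path is broken


# Illustrative topology: node-a (hypothetical) sits behind group-100-r3.
parents = {"node-a": ["group-100-r3"]}

status_when_router_fails = classify("node-a", parents, responding=set())
status_when_node_fails = classify("node-a", parents, responding={"group-100-r3"})
```

With the router failed the node is reported as UNREACHABLE; with the router answering and the node silent it is reported as DOWN.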
Nagios automatically generates a topology map, based on the "parents" stanzas in
the configuration files. Colour coding provides status for nodes.
Figure 13: Nagios Status map
other community plugins available, or you can write your own. The official plugins
should be installed alongside the base Nagios. The executables can be found
in /usr/local/nagios/libexec (use <plugin name> --help for usage on each plugin). The
official plugins include:
check_ping      configurable ping test with warning & critical thresholds
check_snmp      generic SNMP test to get MIB OIDs & test return values
check_ifstatus  check SNMP ifOperStatus against ifAdminStatus for all
                Administratively up interfaces
check_ssh       check that the ssh port can be contacted on a remote host
check_by_ssh    use ssh to run command on remote host
check_nt        check Windows parameters (disk, cpu, services, etc.). Needs
                NSClient++ agent installed on Windows targets
check_nrpe      check remote Linux parameters (disk, cpu, processes, etc.).
                Needs NRPE agent installed on Unix / Linux target
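For anyone tempted to write their own plugin, the essential contract is small: a plugin reports its result through its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) plus a line of text. The Python sketch below shows the warning / critical threshold logic only; the function names evaluate and run and the three-argument interface are my own assumptions, not part of any official plugin.

```python
# Exit codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(value, warning, critical):
    """Compare a measurement with two thresholds; higher values are worse."""
    if value >= critical:
        return CRITICAL, "CRITICAL - value %s (threshold %s)" % (value, critical)
    if value >= warning:
        return WARNING, "WARNING - value %s (threshold %s)" % (value, warning)
    return OK, "OK - value %s" % value


def run(args):
    """args holds value, warning and critical as strings (hypothetical interface)."""
    try:
        value, warning, critical = (float(a) for a in args)
    except ValueError:
        # Raised for non-numeric input and for the wrong number of arguments.
        return UNKNOWN, "UNKNOWN - expected three numbers: value warning critical"
    return evaluate(value, warning, critical)
```

A real plugin would print the returned message on one line and exit with the returned code, for example by passing the code to sys.exit().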
Nagios has two separate concepts, host monitoring and service monitoring, and there
is a known relationship between the state of the host and the state of its services.
Host monitoring is a reachability test and will generally use the check_ping Nagios
plugin. If you have devices that support SNMP but do not support ping (perhaps
because there is a firewall in the way that blocks ping), then the check_ifstatus plugin
works well to test all interfaces on a device, comparing the SNMP administrative
status with the operational status. Host monitoring is defined in the Nagios
configuration files with the "check_command" stanza, where typically this is defined
at a high level of the host definition hierarchy but can be overridden for sub-groups or
specific hosts. For example, in hosts.cfg:
define host {
host_name group-100-a1
use host_172.31.100 ;Inherits from this parent class
parents group-100-r2 ;This is n/w route to device
alias group-100-a1.class.example.org
address group-100-a1.class.example.org
check_command check_ifstatus ;SNMP status check, not ping
}
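The comparison described for check_ifstatus, administrative status against operational status, can be sketched in Python as follows. This is not the plugin's source; it only illustrates the rule, assuming the interface table has already been fetched by SNMP. The status value 1 means up for both ifAdminStatus and ifOperStatus in the standard interfaces MIB; the function name and the interface names are made up for illustration.

```python
SNMP_UP = 1  # value of 'up' for both ifAdminStatus and ifOperStatus


def interface_problems(interfaces):
    """Return the names of interfaces that should be up but are not.

    interfaces is an iterable of (name, admin_status, oper_status) tuples.
    Interfaces that are administratively down are deliberately ignored.
    """
    return [
        name
        for name, admin_status, oper_status in interfaces
        if admin_status == SNMP_UP and oper_status != SNMP_UP
    ]


# Hypothetical interface table for one device.
table = [
    ("FastEthernet0/0", 1, 1),  # up and working
    ("FastEthernet0/1", 1, 2),  # should be up but is down: a problem
    ("Serial0/0", 2, 2),        # administratively down: not a problem
]
problems = interface_problems(table)
```

An empty result would map to an OK host check and a non-empty one to a failure, which is how an SNMP-only device can still be given a meaningful reachability test.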
A summary of host status is given on the Tactical Overview display. The Host
Detail display then gives further information for each device. The hosts monitored
using check_ping show the Round Trip Average (RTA). Note that group-100-a1 is
monitored using the check_ifstatus plugin so shows different Status Information.
Figure 14: Nagios Host Detail display
Availability monitoring, especially for computers rather than network devices, can
mean many things. Nagios provides many plugins for port monitoring, including
generic TCP and UDP monitors. The check_snmp plugin could be used to check
SNMP parameters from the Host Resources MIB (if a target supports this). Nagios
also provides remote agents, NSClient++ for Windows and NRPE for Unix / Linux
systems, which provide a much more customisable definition of system monitoring.
Services are typically defined in services.cfg. As with host definitions, services can be
defined in a class hierarchy where characteristics of an object are inherited from its
parent.
Figure 15: Nagios services.cfg top-level objects
Again, note the check_period, max_check_attempts, normal_check_interval and
retry_check_interval stanzas. More specific service definitions can then be defined,
inheriting characteristics of parents through the "use" stanza:
Figure 16: Nagios services.cfg showing specific services
Note that services can be applied either to groups of hosts (hostgroup_name) or to
specific hosts (host_name).
As with hosts, it is possible to create groups of services to improve the flexibility of
configuration and the display of services.
Also note that some services run commands that are inherently local to the Nagios
system, e.g. check_local_disk. The check_dns command runs nslookup on the Nagios
system but the host_name parameter can be used to specify the DNS server to query
from. The commands are actually specified in the configuration file commands.cfg,
which, in turn, calls executable plugins in /usr/local/nagios/libexec .
Figure 17: Nagios Service detail
Service dependencies are an advanced feature of Nagios that allow you to suppress
notifications and active checks of services based on the status of one or more other
services (that may be on other hosts).
Both host and service monitoring can be configured to generate events on failure (and
this is the default).
notification(s) are generated to one or more users or groups of users. It is also possible
to create automated responses to events (typically scripts).
Note that Nagios tends to use the terms "event" and "alert" interchangeably.
Figure 18: Nagios Event Log
By default, the event log is displayed in one-hourly sections. The log shows the event
status and also shows whether a Notification has been generated (the megaphone
symbol). This display is effectively simply showing /usr/local/nagios/var/nagios.log .
Under the Reporting heading on the left-hand menu, there are further options to
display information on events (alerts). The Alert History is effectively the same as the
Event Log. The Alert Histogram produces graphs for either a host or service with
customisable parameters.
Figure 19: Nagios Configuration for Alert Histogram
Note in the figure above that a host / service selection has already been prompted for
and, having selected host, the specific host has been supplied. The following figure
shows the resulting graph. Note the blue links towards the top left of the display
providing access to a filtered view of the events log (View History for this Host) and to
notifications for this host.
Figure 20: Nagios Alert Histogram for host group-100-r1
The Alert Summary menu option can provide various reports, specific to hosts or
services.
Figure 21: Nagios Alert Summary configuration options
Limiting the report to a specific host, group-100-r1, produces the following report.
Figure 22: Nagios Alert Summary for group-100-r1
max_check_attempts    default 3 (number of attempts before HARD event)
When a non-OK status is detected, a soft error is generated for each sampling interval
until max_check_attempts are exhausted, after which a hard event will be generated.
At this point, the polling interval reverts to the check_interval rather than the
retry_interval.
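The soft/hard behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not Nagios code: the function name classify_checks and the tuple layout are invented for the example, the intervals (5 and 1 minutes) are the ones used in this paper's test setup, and recoveries and host dependencies are deliberately ignored.

```python
def classify_checks(results, max_check_attempts=3,
                    check_interval=5, retry_interval=1):
    """Label each check result and give the interval (minutes) to the next check.

    results: sequence of status strings, e.g. "OK" or "DOWN".
    Returns a list of (status, state_type, next_interval) tuples.
    Simplified model: recoveries are reported with a state type of None.
    """
    timeline = []
    attempt = 0  # consecutive non-OK results seen so far
    for status in results:
        if status == "OK":
            attempt = 0
            timeline.append((status, None, check_interval))
            continue
        attempt = min(attempt + 1, max_check_attempts)
        if attempt < max_check_attempts:
            # Still retrying: soft error, poll at the faster retry_interval.
            timeline.append((status, "SOFT", retry_interval))
        else:
            # Attempts exhausted: hard event, revert to check_interval.
            timeline.append((status, "HARD", check_interval))
    return timeline


for row in classify_checks(["OK", "DOWN", "DOWN", "DOWN", "DOWN", "OK"]):
    print(row)
```

With the default of 3 attempts, the first two failures are SOFT and the third is HARD, which is the point at which a notification would normally be considered.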
Figure 23: Nagios Event Log showing hard and soft events
Note from the earlier figure showing the topology layout, that group100r3 sits
behind group100r1. Each of these host devices is being polled every 5 minutes when
in an OK state (or max_check_attempts has been exceeded) and every 1 minute when
a problem has arisen. The actual problem that has caused the event log shown above
is that group100r1 has failed; however, group100r3 is polled first and results in the
first event for this device with a status of DOWN and a state type of SOFT.
Subsequently, group100r1 is polled and found to be DOWN which results in the
associated poll to group100r3 receiving a status of UNREACHABLE and a state type
of SOFT. The third poll of group100r3 again has a status of UNREACHABLE and a
state type of SOFT.
The next event for group100r3 is a service ping monitor (which runs every 5 minutes
for this device). Note that this event has a state type of HARD; this is because Nagios
knows that the host status associated with this service monitor is already
UNREACHABLE (or DOWN).
The fourth event results in a state type of HARD and the status of UNREACHABLE.
The hard event also generates a notification.
Once each of these filters for notification has been tested and passed, contact filters
are then applied for each contact in the group(s) indicated in the host or service
contact_groups stanza. Here is the default definition:
Figure 24: Nagios Default contact definition
Notifications for hosts and services can be sent 24x7. They are sent for all types of
events and use a Nagios command that drives the email system. As with all other
Nagios configurations, more specific users and groups of users can be defined which
change any of these parameters.
An event has to satisfy the global criteria, the specific host/service criteria and the
contact criteria, before a notification is actually sent.
Remember from the Alerts Histogram report, it is possible to see notifications for a
particular host.
Figure 25: Nagios Host Notifications
6.3.5 Automatic responses to events: event handlers
Nagios can run automatic actions (event handlers) when a service or host:
Is in a SOFT problem state
Initially goes into a HARD problem state
Initially recovers from a SOFT or HARD problem state
There is a global parameter, enable_event_handlers, which must take the value 1
(true) before any automation can take place.
There are two global parameters, global_host_event_handler and
global_service_event_handler, which can be used to run commands on all host/service
events. These might be used, say, to log all events to an external file.
In addition, individual hosts and services (or groups of either) can have their own
event_handler directive and their own event_handler_enabled directive. Note that if
the global enable_event_handlers is off then no individual host/service will run event
handlers. Individual event handlers will run immediately after any global event
handler.
Typically, an event handler will be a script or program, defined in the Nagios
commands.cfg file, to run any external program. The following parameters will be
passed to the event handler:
For Services: $SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$
For Hosts:    $HOSTSTATE$, $HOSTSTATETYPE$, $HOSTATTEMPT$
Event handler scripts will run with the same user privilege as that which runs the
nagios program.
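As an illustration of how an event handler might use the three service parameters, here is a minimal Python sketch. It is not one of the scripts shipped with Nagios: the function name choose_action, the action names and the policy (act on the third SOFT attempt or on any HARD CRITICAL state) are assumptions made purely for the example.

```python
def choose_action(state, state_type, attempt, act_on_attempt=3):
    """Decide what a hypothetical service event handler should do.

    state      -- value of $SERVICESTATE$ (OK, WARNING, UNKNOWN, CRITICAL)
    state_type -- value of $SERVICESTATETYPE$ (SOFT or HARD)
    attempt    -- value of $SERVICEATTEMPT$ as an integer
    Returns "restart" or "none" (illustrative action names only).
    """
    if state != "CRITICAL":
        return "none"      # only react to critical problems
    if state_type == "SOFT" and attempt == act_on_attempt:
        return "restart"   # try a fix just before the state goes HARD
    if state_type == "HARD":
        return "restart"   # the problem is confirmed
    return "none"


def main(argv):
    """Entry point if the sketch is saved as a script and given the 3 macros."""
    state, state_type, attempt = argv[1], argv[2], int(argv[3])
    print(choose_action(state, state_type, attempt))
    return 0
```

A command definition in commands.cfg would pass the three macros as command-line arguments, and the script would run with the privileges of the nagios user, so any action it takes must be permitted for that user.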
Sample event handler scripts can be found in the contrib/eventhandlers/ subdirectory
of the Nagios distribution. Here is the sample submit_check_result command:
Figure 26: Nagios Sample submit_check_result command for event handler from contrib directory
host_perfdata_file_processing_interval      process data file every <n> seconds
service_perfdata_file_processing_interval   process data file every <n> seconds
host_perfdata_file_processing_command       Nagios command to process data
service_perfdata_file_processing_command    Nagios command to process data
host_perfdata_file_template                 format of data file
service_perfdata_file_template              format of data file
Figure 27: Nagios Performance parameters in nagios.cfg
The default is that process_performance_data=0 (i.e. off) and all the other parameters
are commented out.
In addition to the global parameters, each host and service needs to either explicitly
configure or inherit a definition for:
process_perf_data=1    1 = data collection on, 0 = data collection off
By default, the generic-host and generic-service template definitions set these
parameters to 1 (on).
If a Nagios plugin is able to provide performance data, it is returned after the usual
status information, separated by a | (pipe) symbol. It can be retrieved as the
$HOSTPERFDATA$ or $SERVICEPERFDATA$ macro. It is then up to your Nagios
commands to interpret and manipulate that data.
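To make the pipe-separated format concrete, the following Python sketch splits a plugin output line into its status text and its performance data. The function name split_plugin_output and the sample output line are invented for the example; the sketch assumes the usual label=value;warn;crit;min;max layout of plugin performance data and does not handle quoted labels that contain spaces.

```python
def split_plugin_output(line):
    """Split one line of plugin output into (status_text, metrics).

    Everything before the first '|' is the human-readable status; everything
    after it is performance data, as space-separated label=value[;...] tokens.
    Only the value field (with its unit suffix) is kept for each label.
    """
    status_text, _, perfdata = line.partition("|")
    metrics = {}
    for token in perfdata.split():
        label, equals, rest = token.partition("=")
        if not equals:
            continue  # not a label=value token, ignore it
        metrics[label] = rest.split(";")[0]
    return status_text.strip(), metrics


status, metrics = split_plugin_output(
    "DNS OK: 0.012 seconds response time|time=0.012345s;;;0.000000")
print(status)
print(metrics)
```

A processing command could use something like this to turn each $SERVICEPERFDATA$ value into numbers suitable for graphing.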
The next figure shows performance data that has been gathered into /tmp/service-perfdata
using the default service_perfdata_file_template where the last field is the
$SERVICEPERFDATA$ value (if the plugin delivers performance data).
Figure 28: Nagios Performance data collected into /tmp/service-perfdata
The most recent performance data gathered for hosts and services can also be seen
from the Host Detail or Service Detail menu options.
Figure 29: Nagios Performance data highlighted (DNS Check service)
It is also possible to run checks on remote hosts by installing the NRPE agent
(available for both Unix/Linux and Windows hosts) and the required Nagios plugins
on the remote system. The check_nrpe plugin must also be installed on the Nagios
system. This allows plugins designed to be run local to the Nagios system to be run
on remote hosts. With NRPE agents, checks are run on a scheduled basis, initiated
from the Nagios system.
Another alternative is to install the NSCA addon on remote systems. This permits
remote machines to run their own periodic checks and report the results back to
Nagios, which can be defined as passive service checks.
The event subsystem of Nagios is less powerful and configurable than some of the
other offerings; it has less focus on an event console but includes more information
about host and service events from other menus. Nagios has no easy built-in way to
collect and process SNMP TRAPs.
If you want lots of performance graphs then Nagios alone is not going to deliver easily.
In summary, Nagios seems good for monitoring a relatively small number of systems,
provided you don't need historical performance reporting.
7 OpenNMS
OpenNMS presents itself as the first Enterprise-grade network management
platform developed under the Open Source model. It is a Java application that runs
under several flavours of Linux. A VMware Virtual Machine (VM) is also available
with the latest release of OpenNMS, which makes initial evaluation very easy without
having to go through a full build process. There is also an online demo system which
appears to be monitoring real kit, which gives a good first taste of the product.
The following section is based on the VM download, which is OpenNMS 1.5.93 based
on Mandriva; it worked very easily. The VM was set up for DHCP but I modified the
Operating System files to use a local fixed address, with the VM network bridged to
my local environment.
To access the OpenNMS Web Console, point your browser at
http://opennms:8980/opennms/. The default logon id is admin with a password of admin.
Here is a screenshot of the main default window of OpenNMS.
Figure 30: Main default window for OpenNMS
The following sections will describe how to configure different aspects of OpenNMS by
editing xml configuration files. It is possible to configure many aspects of OpenNMS
using GUI-driven menus. See section 7.5 Managing OpenNMS for a brief
description.
<end>10.0.0.254</end>
</include-range>
<include-range>
<begin>172.30.100.1</begin>
<end>172.30.100.10</end>
</include-range>
<specific>10.191.101.1</specific>
</discovery-configuration>
In the above example, ping discovery will start 300,000 ms (5 minutes) after
OpenNMS has started up; the discovery process will be restarted every 86,400,000 ms
(24 hours); 1 ping will be sent per second; the timeout for a ping will be 800 ms and
there will be 3 ping retries before the discovery process gives up on an address. All
devices on the Class C 10.0.0.0 network will be polled (with only 2 retries but a 3
second timeout). The 10 devices 172.30.100.1 through 10 will be polled with the
default characteristics. The specific node 10.191.101.1 will be polled.
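As a rough sizing exercise, the number of addresses covered by this example, and therefore the minimum length of one discovery sweep at 1 ping per second, can be worked out as follows. This is only a back-of-envelope sketch: the start of the first range is assumed to be 10.0.0.1, retries and timeouts are ignored, and the helper name count_range is invented for the example.

```python
import ipaddress


def count_range(begin, end):
    """Number of addresses in an inclusive begin..end IPv4 range."""
    return int(ipaddress.IPv4Address(end)) - int(ipaddress.IPv4Address(begin)) + 1


# Ranges and specific address from the example above.
# ASSUMPTION: the first range starts at 10.0.0.1.
include_ranges = [("10.0.0.1", "10.0.0.254"), ("172.30.100.1", "172.30.100.10")]
specifics = ["10.191.101.1"]

total_addresses = sum(count_range(b, e) for b, e in include_ranges) + len(specifics)
packets_per_second = 1
min_sweep_seconds = total_addresses / packets_per_second

print(total_addresses, "addresses, at least", min_sweep_seconds, "seconds per sweep")
```

For larger address ranges the same arithmetic shows quickly whether a sweep can complete comfortably within the restart interval.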
All that the discovery process does is to generate newSuspect events that are then
used by other OpenNMS processes. If the device does not respond to this ping polling
then it will not be added to the OpenNMS database.
Another way to generate such events (say for a box that does not respond to ping) is to
use a provided Perl script:
/opt/opennms/bin/send-event.pl --interface <ipaddr> uei.opennms.org/internal/discovery/newSuspect
<property key="retry" value="1"/>
</protocol-plugin>
This defines a service (protocol) called SSH that tests TCP port 22 using the TCP
plugin. It will look for the string SSH to be returned. Timeout is 3 seconds with 1
retry.
The first protocol entry in capsd-configuration.xml is for ICMP.
<protocol-plugin protocol="ICMP"
class-name="org.opennms.netmgt.capsd.IcmpPlugin" scan="on" user-defined="false">
<property key="timeout" value="2000"/>
<property key="retry" value="1"/>
</protocol-plugin>
It is possible to apply protocols to specific address ranges or exclude protocols from
address ranges (the default is inclusion).
<protocol-plugin protocol="ICMP"
class-name="org.opennms.netmgt.capsd.IcmpPlugin" scan="on" user-defined="false">
<protocol-configuration scan="off" user-defined="false">
<range begin="172.31.100.1" end="172.31.100.15"/>
<property key="timeout" value="4000"/>
<property key="retry" value="3"/>
</protocol-configuration>
</protocol-plugin>
Note the scan="off" for IP addresses 172.31.100.1 to 172.31.100.15.
The SNMP protocol is special in that, if supported, it provides a way to collect
performance data as well as poll for availability management information. SNMP
parameters for different devices and ranges of devices are specified
in /opt/opennms/etc/snmp-config.xml. Here is a sample:
<snmp-config retry="3" timeout="800" version="v1" port="161"
read-community="public" write-community="private">
<definition version="v2c">
<specific>10.0.0.121</specific>
</definition>
<definition retry="2" timeout="1000">
<range begin="172.31.100.1" end="172.31.100.254"/>
</definition>
<definition read-community="fraclmye" write-community="rrwatr">
<range begin="10.0.0.1" end="10.0.0.254"/>
</definition>
</snmp-config>
The first stanza in snmp-config.xml provides global default parameters for SNMP
access. Variations in any of these global parameters can be made using a definition
stanza and either a range or a specific statement. This file is used both for discovery
and for collecting performance data.
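The way global defaults and definition stanzas combine can be illustrated with a short Python sketch that reads the sample above and works out the effective SNMP parameters for one address. This is not how OpenNMS itself is implemented; in particular, the rule that the first matching definition wins is an assumption of the sketch, and the function names are invented for the example.

```python
import ipaddress
import xml.etree.ElementTree as ET

SAMPLE = """
<snmp-config retry="3" timeout="800" version="v1" port="161"
             read-community="public" write-community="private">
  <definition version="v2c">
    <specific>10.0.0.121</specific>
  </definition>
  <definition retry="2" timeout="1000">
    <range begin="172.31.100.1" end="172.31.100.254"/>
  </definition>
  <definition read-community="fraclmye" write-community="rrwatr">
    <range begin="10.0.0.1" end="10.0.0.254"/>
  </definition>
</snmp-config>
"""


def definition_matches(definition, addr):
    """True if addr is named by a <specific> or falls inside a <range>."""
    for specific in definition.findall("specific"):
        if ipaddress.IPv4Address(specific.text.strip()) == addr:
            return True
    for rng in definition.findall("range"):
        low = ipaddress.IPv4Address(rng.get("begin"))
        high = ipaddress.IPv4Address(rng.get("end"))
        if low <= addr <= high:
            return True
    return False


def effective_snmp(xml_text, ip):
    """Global defaults overridden by the first matching definition (assumed rule)."""
    root = ET.fromstring(xml_text)
    params = dict(root.attrib)          # start from the global defaults
    addr = ipaddress.IPv4Address(ip)
    for definition in root.findall("definition"):
        if definition_matches(definition, addr):
            params.update(definition.attrib)
            break
    return params


print(effective_snmp(SAMPLE, "172.31.100.7"))
```

Note that 10.0.0.121 appears in two definitions in the sample (once as a specific, once inside a range), so it is worth checking on a real system which parameters are actually applied to such an address.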
When testing SNMP, capsd makes an attempt to receive the sysObjectID MIB-2
variable (.1.3.6.1.2.1.1.2.0). If successful, then extra discovery processing takes place.
First, three threads are generated to collect the data from the SNMP MIB-2 system
tree and the ipAddrTable and ifTable tables. If, for some reason, the ipAddrTable or
ifTable are unavailable, the process stops (but the SNMP system data may show up on
the node page).
Second, all of the IP addresses in the ipAddrTable are run through the capsd
capabilities scan. Note that this is regardless of how management is configured in the
configuration file. This only happens on the initial scan and on forced rescans. On
normal rescans (by default, every 24 hours), IP addresses that are "unmanaged" in
capsd are not polled.
Third, every IP address in the ipAddrTable that supports SNMP is tested to see if it
maps to a valid ifIndex in the ifTable. If this is true, the IP address is marked as a
secondary SNMP interface and is a contender for becoming the primary SNMP
interface.
Figure 31: OpenNMS node detail for a switch showing switch ports
The first stanza in capsd-configuration.xml defines service polling parameters:
<capsd-configuration rescan-frequency="86400000"
initial-sleep-time="300000"
management-policy="managed"
max-suspect-thread-pool-size = "6"
max-rescan-thread-pool-size = "3"
abort-protocol-scans-if-no-route = "false">
This defines that capsd will wait 5 minutes after OpenNMS starts before starting the
capsd discovery process. It will rescan to discover services every 24 hours. The
default management policy for all IP addresses found in newSuspect events will be
to scan for each of the services. This managed parameter can be overridden at the
end of capsd-configuration.xml by unmanaged range stanzas:
<ip-management policy="unmanaged">
<specific>0.0.0.0</specific>
<range begin="127.0.0.0" end="127.255.255.255"/>
</ip-management>
When a newSuspect event is generated, provided the IP address is in a managed
management policy range, the IP address is checked for each of the services in
capsd-configuration.xml, starting from the top.
If the device does not respond to any configured service then, even if triggered with
send-event.pl, it will not be added to the OpenNMS database. Look
in /opt/opennms/logs/daemon/discovery.log for debugging information.
Figure 32: OpenNMS Node List of discovered nodes
Figure 33: OpenNMS node detail for group100r1
Note the services that have been discovered for the node. The list of services per
interface are those that have been actually detected; whether they are Monitored or
not will be discussed in the next section.
OpenNMS performs availability monitoring by polling devices with processes known
as monitors which connect to a device and perform a simple test. Polling only happens
to an interface that has already been discovered by capsd.
The configuration file for polling is /opt/opennms/etc/poller-configuration.xml. There
are many similarities between this and capsd-configuration.xml; however the
monitors are defined with monitor service stanzas (rather than protocol stanzas),
which define the Java class to use for monitoring.
<monitor service="DominoIIOP" class-name="org.opennms.netmgt.poller.DominoIIOPMonitor"/>
<monitor service="ICMP" class-name="org.opennms.netmgt.poller.IcmpMonitor"/>
<monitor service="Citrix" class-name="org.opennms.netmgt.poller.CitrixMonitor"/>
<monitor service="LDAP" class-name="org.opennms.netmgt.poller.LdapMonitor"/>
<monitor service="HTTP" class-name="org.opennms.netmgt.poller.HttpMonitor"/>
<monitor service="HTTP-8080" class-name="org.opennms.netmgt.poller.HttpMonitor"/>
<monitor service="HTTP-8000" class-name="org.opennms.netmgt.poller.HttpMonitor"/>
<monitor service="HTTPS" class-name="org.opennms.netmgt.poller.HttpsMonitor"/>
<monitor service="SMTP" class-name="org.opennms.netmgt.poller.SmtpMonitor"/>
<monitor service="DHCP" class-name="org.opennms.netmgt.poller.DhcpMonitor"/>
<monitor service="DNS" class-name="org.opennms.netmgt.poller.DnsMonitor" />
<monitor service="FTP" class-name="org.opennms.netmgt.poller.FtpMonitor"/>
<monitor service="SNMP" class-name="org.opennms.netmgt.poller.SnmpMonitor"/>
<monitor service="Oracle" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="Postgres" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="MySQL" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="Sybase" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="Informix" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="SQLServer" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="SSH" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="IMAP" class-name="org.opennms.netmgt.poller.ImapMonitor"/>
<monitor service="POP3" class-name="org.opennms.netmgt.poller.Pop3Monitor"/>
<monitor service="NSClient" class-name="org.opennms.netmgt.poller.NsclientMonitor"/>
<monitor service="NSClientpp" class-name="org.opennms.netmgt.poller.NsclientMonitor"/>
<monitor service="Windows-Task-Scheduler" class-name="org.opennms.netmgt.poller.Win32ServiceMonitor"/>
Preceding the monitor service stanzas in poller-configuration.xml are the definitions
of services. These look very similar to the entries in capsd-configuration.xml (which
makes sense as these are the regular polling definitions for the same services that capsd
has already found); however parameters in the poller file may well take different
values (for example, the discovery service may be allowed longer timeouts and more
retries than the polling service).
<service name="ICMP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
</service>
Note that the default poller-configuration.xml has the SNMP monitor service turned
off.
Services may be defined several times with different parameters; each service will
obviously require a unique name. This is so that different devices can receive
availability monitoring with different characteristics.
For availability polling, devices are grouped together in packages, where a package
defines:
target interfaces
services including the polling frequency
a downtime model (which controls how the poller will dynamically adjust its
polling on services that are down)
an outage calendar that schedules times when the poller is not to poll (i.e.
scheduled downtime).
There are two packages defined in the default poller-configuration.xml file, example1
and a separate package, strafer, to monitor StrafePing. A package definition must
include a single filter stanza; it may also have specific, include-range and
exclude-range stanzas. Here is the start of the default, as shipped:
<package name="example1">
<filter>IPADDR != '0.0.0.0'</filter>
<include-range begin="1.1.1.1" end="254.254.254.254"/>
It is then followed by the list of services pertinent to that package; example1 includes
many of the services, with each service set to status="on" except SNMP.
The opening stanza in poller-configuration.xml controls the overall behaviour of
polling:
<poller-configuration threads="30"
serviceUnresponsiveEnabled="false"
nextOutageId="SELECT nextval('outageNxtId')"
xmlrpc="false">
<node-outage status="on"
pollAllIfNoCriticalServiceDefined="true">
<critical-service name="ICMP"/>
</node-outage>
30 threads are available for polling. The basic event that is generated when a poll
fails is called "NodeLostService". If more than one service is lost, multiple
NodeLostService events will be generated. If all the services on an interface are down,
instead of a NodeLostService event, an "InterfaceDown" event will be generated. If all
the interfaces on a node are down, the node itself can be considered down, and this
section of the configuration file controls the poller behaviour should that occur. If a
"NodeDown" event occurs and node-outage status="on" then all of the InterfaceDown
and NodeLostService events will be suppressed and only a NodeDown event will be
generated. Instead of attempting to poll all the services on the down node, the poller
will attempt to poll only the critical service. Once the critical service returns, the
poller will then resume polling the other services.
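The event roll-up described above can be modelled with a small Python sketch. It is a simplified illustration rather than the poller's actual logic: the function name outage_events and the data layout are invented, and the behaviour when node-outage is off (one InterfaceDown per interface) is an assumption.

```python
def outage_events(node, node_outage_on=True):
    """Work out which events one polling pass would raise for a node.

    node: {interface_ip: {service_name: is_up}} for the monitored services.
    Returns a list of (event_name, interface_ip, service_name) tuples.
    """
    # An interface is down when it has services and every one of them failed.
    down_interfaces = [ip for ip, services in node.items()
                       if services and not any(services.values())]

    # Every interface down: with node-outage on, report a single NodeDown.
    if node and len(down_interfaces) == len(node) and node_outage_on:
        return [("NodeDown", None, None)]

    events = []
    for ip, services in node.items():
        if ip in down_interfaces:
            events.append(("InterfaceDown", ip, None))
            continue
        for name, is_up in services.items():
            if not is_up:
                events.append(("NodeLostService", ip, name))
    return events


partly_down = {
    "10.0.0.95": {"ICMP": True, "HTTP": False, "SSH": False},
    "172.16.1.1": {"ICMP": False},
}
print(outage_events(partly_down))
```

The value of the roll-up is that an operator sees one NodeDown event for a dead box rather than a flood of service and interface events.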
Note in the following screenshot that six services have been discovered on the
10.0.0.95 interface of the node called deodar.skills1st.co.uk, of which four are
monitored. The two interfaces on the 172.16 network have been detected through
SNMP queries but there is no monitoring of any services on these networks. There
are no current issues with deodar and availability has been 100% over the last 24
hours.
Figure 34: OpenNMS node detail with monitored services
OpenNMS includes a standard set of Availability reports. They can be selected from
the Reports menu:
Figure 35: OpenNMS Availability reports menu
Here is a sample:
Figure 36: OpenNMS Overall service availability report
Note that there is an /opt/opennms/etc/examples directory with extra samples of all
the OpenNMS configuration files.
Also note that OpenNMS needs recycling if any configuration files have been modified.
Use:
/etc/init.d/opennms stop
/etc/init.d/opennms start
7.3 Problem management
For problem management, OpenNMS has the concepts of:
Events          all sorts of both good and bad news
Alarms          important events
Notifications   typically email or pager but could be other methods
The events subsystem is driven by the eventd process which listens on port 5817. Out
of the box, eventd receives internal events from OpenNMS (such as newSuspect
events) and SNMP TRAPs. It is possible to also configure for other event sources
(such as from syslogs).
Figure 37: OpenNMS Events menu
The Advanced Search option provides several ways to filter events. By default
Outstanding events are displayed (i.e. events that have not been Acknowledged).
Figure 38: OpenNMS Advanced Event Search options
Note that if you wish to search on severity, you have to specify an exact severity; you
cannot specify "severity greater than ...".
Figure 39: OpenNMS display of All events
The column headers can be clicked on to use as sort keys (ascending/descending).
The Ack box can be ticked to Acknowledge one or more events; they will then
disappear from this display which only shows Outstanding events. Click on the
symbol beside Event(s) outstanding to see Event(s) Acknowledged, including the
name of the user that acknowledged the event.
The various [+] and [-] links can be used to filter in/out on the parameter (such as
node, interface, or service). The [<] and [>] beside the Time can be used to filter for
events before or after this time.
To see the event detail, click on the ID link.
Figure 40: OpenNMS Event detail for event 139192
alarm-type      1 = problem, 2 = resolution. alarm-type=2 also takes a
                clear-key parameter defining the problem event this resolves
auto-clean      true or false
operinstruct    optional instructions for operators using the web GUI
mouseovertext   text to display when mouse positioned over this event
autoaction      absolute pathname to executable program executed every
                event instance
Many of the tags can use data substituted from the event. These are documented on
the OpenNMS wiki:
Figure 41: OpenNMS event parameters that can be substituted
Here is an example event from the default eventconf.xml:
Figure 42: OpenNMS event definition for nodeLostService
The different severities available can be seen by selecting the Severity Legend option
from the top of an events list.
Figure 43: OpenNMS event severity legend
Note that there is no separate file to configure alarms; it is simply done with the
<alarm-data> tag in eventconf.xml.
OpenNMS comes with a huge number of events predefined. To make eventconf.xml
much more manageable, inclusion files can be specified at the end, such as:
<event-file>events/NetSNMP.events.xml</event-file>
The events subdirectory currently has around 100 files in it! For performance reasons,
it makes sense to edit eventconf.xml and remove any <event-file> stanzas that are not
relevant for your organisation.
Also note that the whole OpenNMS system must be recycled in order for changes to
eventconf.xml to take effect!
Figure 44: OpenNMS Unknown trap appears in the Events list
Clicking on the event ID gives the detail of the event which shows all the information
that arrived with the TRAP.
Figure 45: OpenNMS Event detail for an unformatted TRAP
TRAPs are configured in eventconf.xml (or an include file), using the <mask> tag.
This tag specifies mask elements with name/value pairs that must match data
delivered by the TRAP, in order for this particular event configuration to match.
Figure 46: OpenNMS Definition in default.events.xml for an unknown specific trap
This example event will match any TRAP whose generic field is equal to 6. Note, as
with other configurations in eventconf.xml, that this definition will only match the
incoming TRAP if no previous definition higher in the file (or include files) had already
matched it.
The mask element name tag must be one (or more) of the following:
uei
source
host
snmphost
nodeid
interface
service
id(OID)
specific
generic
It is possible to use the "%" symbol to indicate a wildcard in the mask values.
SNMP TRAPs often have additional data with them, known as varbinds. This data
can be accessed using the <parm> element, where:
Each parameter consists of a name and a value.
%parm[all]%: Will return a space-separated list of all parameter values in the
form parmName1="parmValue1" parmName2="parmValue2" etc.
%parm[values-all]%: Will return a space-separated list of all parameter
values associated with the event.
%parm[names-all]%: Will return a space-separated list of all parameter
names associated with the event.
%parm[<name>]%: Will return the value of the parameter named <name> if it
exists.
%parm[##]%: Will return the total number of parameters.
%parm[#<num>]%: Will return the value of parameter number <num>.
Any of this data can be used in the message or description fields.
In addition, the varbind data can also be used to filter the event within the <mask>
tags, following the <maskelement> tags. It is possible to match more than one
varbind, and more than one value per varbind. For example:
<varbind>
<vbnumber>3</vbnumber>
<vbvalue>2</vbvalue>
<vbvalue>3</vbvalue>
</varbind>
<varbind>
<vbnumber>4</vbnumber>
<vbvalue>2</vbvalue>
<vbvalue>3</vbvalue>
</varbind>
The above code snippet will match if the third parameter has a value of "2" or "3" and
the fourth parameter has a value of "2" or "3". It is also possible to use regular
expressions when matching varbind values.
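The matching rule for the snippet above (every varbind constraint must be satisfied, and any one of its listed values is enough) can be expressed in Python as follows. This is a sketch of the logic only, with an invented function name; it handles exact values and leaves out regular expression matching.

```python
def varbinds_match(constraints, trap_values):
    """Check a TRAP's varbinds against <varbind> constraints.

    constraints: list of (vbnumber, allowed_values) pairs; vbnumber is 1-based.
    trap_values: the varbind values delivered by the TRAP, in order, as strings.
    All constraints must hold (AND); within one constraint any listed
    value is acceptable (OR).
    """
    for number, allowed in constraints:
        if number < 1 or number > len(trap_values):
            return False  # the TRAP does not carry that varbind at all
        if trap_values[number - 1] not in allowed:
            return False
    return True


# The two <varbind> stanzas from the example above.
constraints = [(3, ["2", "3"]), (4, ["2", "3"])]
print(varbinds_match(constraints, ["x", "y", "2", "3"]))
print(varbinds_match(constraints, ["x", "y", "2", "5"]))
```

Because the first matching definition wins, a definition with tighter varbind constraints needs to appear before a more general one for the same TRAP.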
Again, note that the order in which events are listed is very important. Put the most
specific events first.
Here is an example definition that includes matching a varbind with a regular
expression. Note the <vbvalue> matches any string that contains either Bad or bad.
Extra stanzas have also been added for <operinstruct> help (which provides a web
link on one line and plain text on the second), a <mouseovertext> tag (which doesn't
appear to work) and a tag to run an automatic action (a shell script) whenever this
event occurs.
Figure 47: OpenNMS Configuration of specific TRAP with varbind matching a regular expression
If you have SNMP TRAP definitions in a mib file, the open source utility
mib2opennms can be obtained to convert SNMP V1 TRAPs and SNMP V2
NOTIFICATIONS into an OpenNMS event configuration xml file. For a source file
vcs.mib in /home/jane, use:
mib2opennms -f /opt/opennms/etc/events/vcs.events.xml -m /home/jane vcs.mib
Figure 48: OpenNMS Alarms display
Alarms are defined as part of an event definition in eventconf.xml and its include files.
It uses the <alarm-data> tag where:
reduction-key   fields to compare to determine duplicate event
alarm-type      1 = problem, 2 = resolution. alarm-type=2 also takes a
                clear-key parameter defining the problem event this resolves
auto-clean      true or false. True ensures that all events other than the
                latest one, that match the reduction-key, are removed (very
                useful for clearing out duplicate events)
One of the key characteristics of an alarm that differentiates it from an event is the
reduction-key field, which should ensure that duplicate events are treated as one
event with multiple instances, rather than as multiple events.
Most of the information provided with an event is also available in the Alarm display.
The new field is Count which shows the number of duplicate events that have been
integrated into this alarm. To see the individual events, click on the number in the
Count column.
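The effect of the reduction-key on the Count column can be shown with a minimal Python sketch. The function name reduce_events, the dictionary fields and the sample key strings are all invented for the illustration; the real reduction-key is built from fields of the event as configured in eventconf.xml.

```python
def reduce_events(events):
    """Fold events into alarms, one alarm per distinct reduction key.

    events: list of dicts, each with at least a "reduction_key" entry.
    Returns {reduction_key: {"count": n, "last_event": most_recent_event}}.
    """
    alarms = {}
    for event in events:
        key = event["reduction_key"]
        alarm = alarms.get(key)
        if alarm is None:
            alarms[key] = {"count": 1, "last_event": event}
        else:
            alarm["count"] += 1          # duplicate: bump the Count column
            alarm["last_event"] = event  # keep the latest instance
    return alarms


events = [
    {"id": 1, "reduction_key": "nodeLostService:12:10.0.0.95:HTTP"},
    {"id": 2, "reduction_key": "nodeLostService:12:10.0.0.95:HTTP"},
    {"id": 3, "reduction_key": "nodeDown:15"},
]
for key, alarm in reduce_events(events).items():
    print(key, alarm["count"])
```

Three events therefore appear as two alarms, the first with a Count of 2.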
At present (July 10th, 2008), acknowledging events has no effect on related alarms,
and vice versa. Note that the concepts of Acknowledging and Clearing are
completely different. An operator can acknowledge an event or an alarm, and then
owns it. This does not clear the event (i.e. remove it entirely from the events
database).
Automatic actions can be configured for an event using the <autoaction> tag but this
can only run an executable and it runs on every occurrence of the event (which may
not be what you want!).
OpenNMS's concept of automation, however, is triggered from alarms rather than
events. Automation is the concept of actions being performed on a scheduled basis,
provided the correct triggers exist. An <automation> tag includes:
name           the name of the automation
interval       the frequency in milliseconds at which the automation runs
trigger-name   a string that references a trigger definition
action-name    a string that references an action definition
The triggers and actions are SQL statements that operate on the events database.
Automation is defined in /opt/opennms/etc/vacuumd.xml where there are a number of
useful rules, by default:
Figure 49: OpenNMS Default definitions for automations in vacuumd.xml
Note that automations always require an action-name but do not necessarily need a
trigger-name.
The cosmicClear automation is the means by which an <alarm-data> alarm-type=2
tag in eventconf.xml can clear bad news events when good news events arrive.
Here is the definition of the selectResolvers trigger-name:
Figure 50: OpenNMS Definition of selectResolvers trigger in vacuumd.xml
... and the clearProblems action:
Figure 51: OpenNMS Definition of clearProblems action in vacuumd.xml
The trigger is keyed on the field alarmType=2. Note that the first version of the
action is commented out; the clear-uei element is now deprecated in the <alarm-data>
tag and only the clear-key element on the good news event is used to match
against the reduction-key element of the bad news event, setting the severity to 2
(i.e. Cleared). Also note from the <automation> tag that cosmicClear will run every 30
seconds.
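The matching performed by cosmicClear can be paraphrased in Python as follows. The real automation is a pair of SQL statements run against the database every 30 seconds; this sketch only mirrors the clear-key to reduction-key matching described above, works on an in-memory list, ignores event timestamps, and uses invented field names.

```python
CLEARED = 2  # severity value meaning Cleared, as noted above


def cosmic_clear(alarms):
    """Clear problem alarms that have a matching resolution alarm.

    alarms: list of dicts with "alarm_type" (1 = problem, 2 = resolution),
    "reduction_key", "clear_key" (resolutions only) and "severity".
    Returns the number of problem alarms whose severity was set to Cleared.
    """
    resolvers = [a for a in alarms if a["alarm_type"] == 2]   # the trigger
    cleared = 0
    for resolver in resolvers:
        for alarm in alarms:                                  # the action
            if (alarm["alarm_type"] == 1
                    and alarm["reduction_key"] == resolver["clear_key"]
                    and alarm["severity"] != CLEARED):
                alarm["severity"] = CLEARED
                cleared += 1
    return cleared


alarms = [
    {"alarm_type": 1, "reduction_key": "nodeDown:15", "severity": 6},
    {"alarm_type": 1, "reduction_key": "nodeDown:16", "severity": 6},
    {"alarm_type": 2, "reduction_key": "nodeUp:15",
     "clear_key": "nodeDown:15", "severity": 3},
]
print(cosmic_clear(alarms), "alarm(s) cleared")
```

Only the problem alarm whose reduction-key equals the good news event's clear-key is affected; the other problem alarm keeps its severity.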
If users need to be notified of an event then OpenNMS provides email and pager
notifications out of the box, run by the notifd daemon. It is also possible to create
other notification methods such as SNMP TRAPs or an arbitrary external program.
There are several related configuration files in /opt/opennms/etc:
destinationPaths.xml                who, when, how to notify / escalate
notifd-configuration.xml            global parameters for notifd
notificationCommands.xml            notification methods: email, http, page
notifications.xml                   what events generate notifications, where
javamail-configuration.properties   configuration for java emailer (default)
The main files that will need attention are destinationPaths.xml,
notifd-configuration.xml and notifications.xml. Here is part of the examples file provided
in /opt/opennms/etc/examples/destinationPaths.xml:
Figure 52: OpenNMS Example entries in destinationPaths.xml
The <name> tag specifies a user or group of users defined in OpenNMS. The
<command> tag specifies a method that must be defined in
notificationCommands.xml. Note that escalations are possible.
When an event is received for which a notification is required, OpenNMS "walks" the
destination path. We say that the destination path is "walked" because it is often a
series of actions performed over time and not necessarily just a single action (although
it can be). The destination path continues to be walked until all notifications and
escalations have been sent or the notification is acknowledged (automatically or by
manual intervention).
Out of the box, the only destinationPath that is configured is for javaEmail to the
Admin group of users.
The notifications.xml file specifies what events trigger notifications and to whom. Here
is an example from the default file:
Figure 53: OpenNMS Extract of notifications from notifications.xml
The notification called interfaceDown is turned on; it applies to all interfaces other
than 0.0.0.0; the notification is sent to the destination Email-Admin (defined in
destinationPaths.xml) and the text message of the email includes 3 parameters from
the event; 4 parameters are included on the email subject. The default
notifications.xml generates email to the Admin group for the following events:
interfaceDown
nodeDown
nodeLostService
nodeAdded
interfaceDeleted
HighThreshold
LowThreshold
HighThresholdRearmed
LowThresholdRearmed
Nothing, so far, has handled acknowledging notifications. This can either be done
manually by a user or can be performed automatically. Either way, when a
notification is acknowledged, it stops the destination path being walked for the
original notification. It will also create a new notification to tell users that the original
issue is resolved. Automatic acknowledgements are configured
in /opt/opennms/etc/notifd-configuration.xml where <auto-acknowledge> tags specify
the uei resolution/problem events, along with the parameters on the event which
must also match for the notification to be automatically acknowledged.
Figure 54: OpenNMS notifd-configuration.xml with auto-acknowledgements for notifications
Note that at present (July 2008) notifications are driven by events not alarms. Also
note that acknowledging notices has no effect on their associated events or alarms.
It would appear that there has been a discussion of a change in architecture around
events, alarms and notifications, at least throughout 2008. In the future, it is
suggested that alarms will be where most automation is driven from, including
notifications, and that events will become more of a background log.
datacollection-config.xml specifies collection names (just the snmp-collection
called default out of the box), which defines (typically MIB) values to collect
collectd-configuration.xml specifies packages for collection. A package combines
filters and ranges to determine which interfaces collections should be applied to,
with services which reference collections in datacollection-config.xml.
collectd-configuration.xml can also specify data collection intervals and whether the
collection is active.
Note that if a device has several interfaces that:
Support SNMP
Have a valid ifIndex
Are included in a collection package in collectd-configuration.xml
then the lowest IP address is marked as primary and will be used by default for all
performance data collection.
collectd is triggered when capsd generates a NodeGainedService event. The
discovered protocol name (e.g. SNMP, SSH) is passed from capsd to collectd, along with
the primary interface from the event. These are checked against the configuration in
collectd-configuration.xml to see whether any collection packages are valid (there
should be at least one, by definition!) and data collection is started.
Figure 55: OpenNMS collectd-configuration.xml as shipped
There is only one package specified in collectd-configuration.xml, as shipped, which
applies to all interfaces other than 0.0.0.0 and in the range 1.1.1.1 through
254.254.254.254. As with poller-configuration.xml, you must have one filter
statement per package and can then use multiple <specific>, <include-range> and
<exclude-range> statements to define which interfaces this package applies to. You
can also use the <include-url> tag to specify a file with a list of interfaces.
There is only one data collection service defined for OpenNMS out of the box, in
collectd-configuration.xml: the SNMP service. It will run every 5 minutes (300,000
ms) and will collect the MIB variables specified in the collection called default,
specified in datacollection-config.xml. The <service> stanza can also specify values for
SNMP timeouts, retries and port number which would override the default values in
snmp-config.xml.
The package definition can also use the <outage-calendar> tag to specify scheduled
downtime for devices, during which data collection will be suspended. This should be
used to prevent lots of failed SNMP collection events. Outage periods are defined in
the poll-outages.xml file.
Obviously you can specify different packages with different address ranges, collection
intervals and with different collection keys. You can also specify data collectors other
than SNMP, such as NSClient, JMX and HTTP. See http://blogs.opennms.org/?p=242
for a note on using an HTTP data collector.
The datacollection-config.xml file defines one or more SNMP data collections that
Tarus Balog (the prime developer behind OpenNMS) calls a "scheme", to differentiate
it from the package defined in the collectd configuration file. These schemes bring
together OIDs for collection into groups, and the groups are mapped to systems. The
systems are mapped to interfaces by a device's system OID. In addition, each "scheme"
controls how the data will be collected and stored.
Fundamentally, OpenNMS uses RRDTool (Round Robin Database Tool) to store
performance data. This paper is not a tutorial on RRDTool so please follow the
reference to RRD at the end of this paper for more information.
The basis of RRD is that a fixed amount of space is allocated for a given database
when it is created. It holds data for a given period of time, say 1 month, 1 year, etc.
The sampling interval is known so you know how many data points will go into the
database and hence how much space is required. Once the database is full, newer
data points will replace the oldest ones, cycling around.
Figure 56: OpenNMS datacollection-config.xml collection and RRD parameters
The <rrd> stanza specifies how data will be stored in a Round Robin Archive (RRA).
The snapshot shown in the figure above specifies:
<rrd step="300">
    data to be saved every 5 minutes, per step
RRA:AVERAGE:0.5:1:2016
    create an RRA with values AVERAGE'd over 1 step (ie. this data is raw,
    not consolidated). The RRA will have 2016 rows representing 7 days of data
    (5 minute steps = 12/hour * 24 hours * 7 days = 2016). Consolidate the
    samples provided 0.5 (half) of them are not UNKNOWN (otherwise the
    consolidated value will be UNKNOWN)
RRA:AVERAGE:0.5:12:1488
    create an RRA with values AVERAGE'd over 12 steps (ie. this data is
    consolidated over 1 hour). The RRA will have 1488 rows representing 2
    months of data (1 hour consolidations * 24 hours * 62 days = 1488).
    Consolidate the samples provided 0.5 (half) of them are not UNKNOWN
    (otherwise the consolidated value will be UNKNOWN)
RRA:AVERAGE:0.5:288:366
    create an RRA with values AVERAGE'd over 288 steps (ie. this data is
    consolidated over 288 * 5 min steps = 1 day). The RRA will have 366 rows
    representing 1 year of data (1 day consolidations * 366 days = 366).
    Consolidate the samples provided 0.5 (half) of them are not UNKNOWN
    (otherwise the consolidated value will be UNKNOWN)
RRA:MAX:0.5:288:366
    create an RRA holding the MAX value for each day and keep 1 year of data
RRA:MIN:0.5:288:366
    create an RRA holding the MIN value for each day and keep 1 year of data
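The retention arithmetic above is easy to check. The short Python sketch below recomputes how many days each RRA covers from the step size, the number of steps per row and the number of rows; the function and variable names are invented for the illustration and are not part of OpenNMS or RRDTool.

```python
STEP_SECONDS = 300  # from <rrd step="300">

# (consolidation function, xff, steps per row, rows) as listed above
RRAS = [
    ("AVERAGE", 0.5, 1, 2016),
    ("AVERAGE", 0.5, 12, 1488),
    ("AVERAGE", 0.5, 288, 366),
    ("MAX", 0.5, 288, 366),
    ("MIN", 0.5, 288, 366),
]

def rra_retention_days(step_seconds, steps_per_row, rows):
    """Days of history held by one RRA: each row spans steps_per_row steps."""
    return step_seconds * steps_per_row * rows / 86400

if __name__ == "__main__":
    for cf, xff, steps, rows in RRAS:
        days = rra_retention_days(STEP_SECONDS, steps, rows)
        print(f"RRA:{cf}:{xff}:{steps}:{rows} holds {days:g} days")
```

The same function can be used to size alternative RRA definitions, for example to see how many rows are needed to keep raw 5 minute samples for a month.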
The top of datacollection-config.xml defines where the RRD repositories are kept and
how many variables can be retrieved by an SNMP V2 GET-BULK command (10 is the
default). Within the repository directory, for each node, there will exist a directory
that consists of the node number. Thus, if the system was collecting data on node 21,
there would be a directory called /opt/opennms/share/rrd/snmp/21 containing a
data file for each MIB OID being collected. File names will match the alias parameter
for a MIB OID, in datacollection-config.xml.
The node number can be found by going to the detailed node information for a device
and choosing the Asset Info link:
Figure 57: OpenNMS Asset Info link for a device
The resulting page includes the Node ID at the top.
Figure 58: OpenNMS Asset information page, including Node ID
The snmpStorageFlag parameter in the snmp-collection stanza of
datacollection-config.xml defines for which interfaces of a device data will be
stored. Possible values are:
    all        (the old default)
    primary    the primary SNMP interface
    select     collect from all IP interfaces and can use Admin GUI to
               select additional non-IP interfaces to collect data from (new
               default since OpenNMS 1.1.0)
Figure 59: OpenNMS GUI Admin page for specifying interfaces to collect data from
Most of the content of datacollection-config.xml defines groups and systems:
    groups     define groups of SNMP MIB OIDs to collect
    systems    use a device's system OID as a mask to determine which groups of
               OIDs should be collected
Figure 60: OpenNMS group definitions in datacollection-config.xml
Unfortunately OpenNMS does not have a MIB compiler, so all MIB OIDs need to be
manually specified in this file (the good news is that there are lots there out of the
box). Once groups of MIB variables are declared, system stanzas say which group(s)
are to be collected for any device whose system OID matches a particular pattern.
Each SNMP MIB variable consists of an OID plus an instance. Usually, that instance
is either zero (0) or an index to a table. At the moment, OpenNMS only understands a
small number of table indices (for example, the ifIndex index to the ifTable and the
hrStorageIndex to the hrStorageTable). All other instances have to be explicitly
configured.
The ifType parameter can be used to specify the sort of interfaces to collect from.
Legal values are:
    all                collect from all interface types
    ignore             used when the value would be the same for all interfaces, eg.
                       CPU utilisation for a Cisco router
    <ifType number>    used to denote one or more specific interface types. For
                       example ifType=6 for ethernetCsmacd. See
                       http://www.iana.org/assignments/ianaiftype-mib for a
                       comprehensive list.
OpenNMS understands four types of variables to collect on: gauge, timeticks, integer
and octetstring. Note that RRD only understands numeric data.
Figure 61: OpenNMS systems definitions in datacollection-config.xml
In the figure above, any device which has satisfied the filtering in
collectd-configuration.xml and has a system OID starting with .1.3.6.1.4.1 (the start of
the Enterprise MIB tree) will collect performance data for MIB-2 interfaces, tcp and
icmp, as specified in the earlier <group> stanzas.
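The relationship between systems, system OID masks and groups can be sketched as a simple prefix match. The Python below is an illustration only: the SYSTEMS table, the group names and the second (Net-SNMP) entry are hypothetical examples, not a copy of the shipped datacollection-config.xml.

```python
# Hypothetical system definitions: each maps a system OID mask to groups.
SYSTEMS = [
    {"name": "Enterprise", "sysoid_mask": ".1.3.6.1.4.1.",
     "groups": ["mib2-interfaces", "mib2-tcp", "mib2-icmp"]},
    # Illustrative extra entry; the group name is invented for this sketch.
    {"name": "Net-SNMP", "sysoid_mask": ".1.3.6.1.4.1.8072.3.",
     "groups": ["example-net-snmp-group"]},
]

def groups_for(sysoid):
    """Return the groups to collect for a device with this system OID."""
    selected = []
    for system in SYSTEMS:
        if sysoid.startswith(system["sysoid_mask"]):
            for group in system["groups"]:
                if group not in selected:  # keep order, avoid duplicates
                    selected.append(group)
    return selected

if __name__ == "__main__":
    print(groups_for(".1.3.6.1.4.1.9.1.122"))
```

The trailing dot on each mask stops a mask such as .1.3.6.1.4.1 from accidentally matching .1.3.6.1.4.10; whether OpenNMS treats masks in exactly this way is not stated in this paper.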
Note that the defaults in collectd-configuration.xml and datacollection-config.xml
mean that a large number of SNMP data collections will be activated out of the box.
This is good in providing lots of samples in small environments, but it could be a
serious performance and disk usage factor if these defaults are left unchanged where
a large number of interfaces are monitored by OpenNMS.
7.4.2 Displaying performance data
OpenNMS provides a large number of reports out of the box, based on the default data
collection parameters. Use the Reports main menu to see the options.
Figure 62: OpenNMS Report categories available out of the box
    Resource Graphs                    provide lots of standard reports
    KSC Performance, Nodes, Domains    allows users to customise own reports
    Availability                       availability reports for interfaces & services
    Statistics Reports                 shows Top 20 ifInOctets across all nodes
Following the Resource Graphs link provides access to many standard reports.
Figure 63: OpenNMS Standard performance reports
The standard performance reports display various collected values for one particular
node which you choose from the menu provided. The different categories provide:
    Node-level performance data such as TCP connections, CPU, memory
    Interface data for each interface such as bits in / out
    Response time data for services such as ICMP, DNS, SSH
    Disk space information from the ucd-snmp MIB
Figure 64: OpenNMS Standard Resource graphs available for a selected node
Here is part of the node-level performance data set of graphs.
Figure 65: OpenNMS partial display of the node-level performance data graphs
If you wish to create more selective sets of graphs for other people to use, use the Key
SNMP Customized (KSC) Reports menu to create your own reports, which can include
graphs of selected MIB variables from one device or from several different devices.
Using the Create New button will prompt for nodes that have data
collections configured as Child Resources.
Figure 66: OpenNMS KSC Reports menu
Selecting a node and clicking View child resources results in a menu of report
categories.
Figure 67: OpenNMS Report categories available for customised reports
If you select the Node-level Performance Data option and the Choose child resource
button, then each of the MIB variables collected can be displayed and selected.
Figure 68: OpenNMS Selecting prefabricated reports to include in a customised report
The dropdown alongside the Prefabricated Report field allows you to select any of
the default reports to include in your own customised reports. You can include several
different graphs, from the same or different nodes, in your KSC report.
7.4.3 Thresholding
The thresholding capability in OpenNMS has changed fairly significantly over time;
see http://www.opennms.org/index.php/Thresholding#Merge_into_collectd for a good
explanation.
Pre OpenNMS 1.3.10, collectd collected data and threshd performed thresholding:
two separate processes. This design used a range parameter in
threshd-configuration.xml to get around problems caused by the asynchronous nature
of collectd and threshd.
OpenNMS 1.3.10 merged the thresholding functionality into collectd and introduced a
new parameter into collectd-configuration.xml:
<parameter key="thresholding-group" value="default-snmp"/>
where the value of the thresholding group matched a definition in
threshd-configuration.xml. The need for the range parameter disappeared. However, to
define different filters for thresholding, different packages had to be defined in
collectd-configuration.xml.
From OpenNMS 1.5.91 (this paper is based on version 1.5.93), filters can be defined
in threshd-configuration.xml so that packages in collectd-configuration.xml can be
kept simple. The parameter in collectd-configuration.xml changes; the
thresholding-group key disappears and is replaced by:
<parameter key="thresholding-enabled" value="true"/>
Here is the default collectd-configuration.xml:
Figure 69: OpenNMS Default collectd-configuration.xml
The lack of any thresholding parameter implies that thresholding is disabled.
...and the default threshd-configuration.xml:
Figure 70: OpenNMS Default threshd-configuration.xml
The default threshd-configuration.xml is set up for the interim design between
versions 1.3.10 and 1.5.90. For OpenNMS 1.5.93, collectd-configuration.xml should be
changed as shown below:
Figure 71: OpenNMS Modified collectd-configuration.xml to enable thresholds
threshd-configuration.xml can be modified with different packages of thresholding to
apply to different ranges of nodes.
Figure 72: OpenNMS Modified threshd-configuration.xml
Different filters are applied to each package. The thresholding-group parameter is
required here and the value points to a matching definition in thresholds.xml, where
the MIBs to threshold and the threshold values are specified.
Figure 73: OpenNMS Modified thresholds.xml for CCsnmp group and raddlesnmp group
The attributes of a threshold are:
    type: A "high" threshold triggers when the value of the data source exceeds the
    "value", and is re-armed when it drops below the "rearm" value. Conversely, a
    "low" threshold triggers when the value of the data source drops below the
    "value", and is re-armed when it exceeds the "rearm" value. "relativeChange" is
    for thresholds that trigger when the change in data source value from one
    collection to the next is greater than "value" percent.
    expression: A mathematical expression involving data source names which will
    be evaluated and compared to the threshold values. This is used in "expression"
    thresholding (supported from 1.3.3).
    ds-name: The name of the variable to be monitored. This matches the name in
    the alias parameter of the MIB statement in datacollection-config.xml.
    ds-type: Data source type. "node" for node-level data items, and "if" for
    interface-level items.
    ds-label: Data source label. The name of the collected "string" type data item to
    use as a label when reporting this threshold. Note: this is a data item whose
    value is used as the label, not the label itself.
    value: The value that must be exceeded (either above or below, depending on
    whether this is a high or low threshold) in order to trigger. In the case of
    relativeChange thresholds, this is the percent that things need to change in
    order to trigger (e.g. 'value="1.5"' means a 50% increase).
    rearm: The value at which the threshold will reset itself. Not used for
    relativeChange thresholds.
    trigger: The number of times the threshold must be "exceeded" in a row before
    the threshold will be triggered. Not used for relativeChange thresholds.
    triggeredUEI: A custom UEI to send into the events system when this
    threshold is triggered. If left blank, it defaults to the standard thresholds UEIs.
    rearmedUEI: A custom UEI to send into the events system when this
    threshold is re-armed. If left blank, it defaults to the standard thresholds UEIs.
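The interplay of value, rearm and trigger is easiest to see as a small state machine. The Python sketch below models a "high" threshold and a relativeChange check as described in the attribute list; it is not OpenNMS code, the class and function names are invented, and the treatment of a relativeChange value as a multiplier (1.5 meaning a 50% increase) follows the example given above.

```python
class HighThreshold:
    """Illustrative model of a 'high' threshold with rearm and trigger."""

    def __init__(self, value, rearm, trigger):
        self.value = value        # level that must be exceeded
        self.rearm = rearm        # level the data must drop below to reset
        self.trigger = trigger    # consecutive exceedances needed
        self.exceeded_count = 0
        self.armed = True

    def evaluate(self, sample):
        """Return 'triggered', 'rearmed' or None for one collected sample."""
        if self.armed:
            if sample > self.value:
                self.exceeded_count += 1
                if self.exceeded_count >= self.trigger:
                    self.armed = False
                    self.exceeded_count = 0
                    return "triggered"
            else:
                self.exceeded_count = 0  # the run of exceedances is broken
        elif sample < self.rearm:
            self.armed = True
            return "rearmed"
        return None


def relative_change_triggered(previous, current, value):
    """Assumed semantics: value is a multiplier, so 1.5 means +50%."""
    if previous == 0:
        return False
    if value >= 1.0:
        return current >= previous * value
    return current <= previous * value
```

Feeding the model samples of 95, 80, 95, 96 with value 90 and trigger 2 shows why a single spike does not raise an event: the count is reset by the sample of 80, so the event is only raised on the fourth sample.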
By default, standard threshold and rearm events will be generated but it is also
possible to create customised events with the threshold attributes. This would then
make it easier to generate notifications for specific thresholding / rearm events.
Here is a screenshot with standard events generated by thresholds on the raddle
network:
Figure 74: OpenNMS Threshold events from various devices in the raddle network
For those who prefer not to edit XML configuration files, the OpenNMS Admin menu
provides a GUI way to create and modify thresholds.
Figure 75: OpenNMS Admin menu
Selecting the Manage Thresholds option displays all thresholds currently configured
in thresholds.xml.
Figure 76: OpenNMS Configuring thresholds through the Admin menu
Using the Edit button permits modification of an existing threshold.
Figure 77: OpenNMS Modifying thresholds through the Admin GUI
Refer to Figure 75: OpenNMS Admin menu for a list of the areas which can be
configured this way.
8 Zenoss
Zenoss is a third Open Source, multi-function systems and network management tool.
Unlike Nagios and OpenNMS, there is a free, core offering (which does seem to have
most things you need), and Zenoss Enterprise that has extra add-on goodies, high
availability configurations, distributed management server configurations and various
support contract offerings which include some education. For a comparison of the
free and fee alternatives, try http://www.zenoss.com/product/#subscriptions.
Zenoss offers configuration discovery, including layer 3 topology maps, availability
monitoring, problem management and performance management. It is based around
the ITIL concept of a Configuration Management Database (CMDB), the Zenoss
Standard Model. Zope Enterprise Objects (ZEO) is the back-end object database that
stores the configuration model, and Zope is the web application development
environment used to display the console. The relational MySQL database is used to
hold current and historical events.
Zenoss 2.2 has recently been released, which provides stack builds: complete
bundles including Zenoss and all its prerequisites. These stack installers are
available for a wide variety of Linux platforms; standard RPM and source formats are
also available. For easy evaluation, a VMware appliance can be downloaded, ready to
go.
I tried both the VMware build and the 2.2 stack install for SuSE 10.3; both were
relatively painless. The rest of this section is based on the 2.2 stack installation on a
machine whose hostname is zenoss.
To access the Web console, point your browser at http://zenoss:8080. The default user
is admin with a password of zenoss. The default dashboard is completely configurable
but this screenshot is close to the default.
Figure 78: Zenoss default dashboard
Figure 79: Zenoss device classes
The class of Devices has a zProperties page, as do the classes Network, Server, Printer,
etc. Devices will initially be added to the Discovered class and can then be moved to a
more appropriate class.
Figure 80: Zenoss Server Device classes
Figure 81: Zenoss Linux Server devices
Discovery and monitoring are largely controlled by the combination of zProperties
applied to a device, of which there are a large number (most with sensible defaults).
Initially, basic SNMP and ping polling parameters should be configured in the
zProperties page for Devices.
Figure 82: Zenoss zProperties for the Device class (part 1)
Figure 83: Zenoss zProperties for the Device class (part 2)
Figure 84: Zenoss zProperties for the Device class (part 3)
The left-hand menus of the web console provide an Add Device option (nothing is
discovered automatically, out of the box).
Figure 85: Zenoss Add Devices dialogue
Once a device has been discovered (which by default uses ping), if the discovery
protocol is set to SNMP then the device will be queried for its SNMP routing table.
Any networks that the device has routes to will then be added to the object class of
networks.
Figure 86: Zenoss Networks class with dropdown menu
Once the presence of a network has been discovered, devices can automatically be
discovered on that network; this uses a spray ping mechanism. There is a dropdown
menu from the top left corner of the Networks page (which works fine for simple Class
C networks). Although the GUI does manage to display subnetworks accurately, even
if the subnet mask is not on a byte boundary, the Discover Devices menu does not
honour the subnet mask. However, a good feature of Zenoss is that there is a
command line (CLI) for virtually everything and the CLI for device discovery on a
network does honour supplied netmasks. For example:
zendisc run --net 10.0.0.0/24
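To see why honouring the netmask matters, the following Python sketch lists the host addresses a ping sweep would need to cover for a given network. It uses only the standard ipaddress module and does not call Zenoss; the function names and the 10.0.0.64/26 example network are invented for the illustration.

```python
import ipaddress

def discovery_targets(cidr):
    """Host addresses inside a network, honouring the supplied netmask."""
    network = ipaddress.ip_network(cidr, strict=True)
    return [str(host) for host in network.hosts()]

def on_byte_boundary(cidr):
    """True for /8, /16, /24 style masks, False for masks such as /26."""
    return ipaddress.ip_network(cidr).prefixlen % 8 == 0

if __name__ == "__main__":
    for cidr in ("10.0.0.0/24", "10.0.0.64/26"):
        targets = discovery_targets(cidr)
        print(cidr, len(targets), targets[0], targets[-1], on_byte_boundary(cidr))
```

A discovery that ignored the /26 mask and treated the network as a full Class C would probe roughly four times as many addresses, most of them outside the subnet of interest.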
Note that the Zenoss discovery algorithm is very dependent on getting routing tables
using SNMP, and the Zenoss server must support SNMP itself.
Devices that do not support ping but do support SNMP can be added
manually with the Add Device menu. The zProperties of the device (or class of
devices if you create a subclass) should have zPingMonitorIgnore=True and
zSnmpMonitorIgnore=False.
There are three Zenoss processes that implement discovery:
    zenmodeler can use SNMP, ssh and telnet to discover detailed information
    about devices. zenmodeler will only be run against devices that have already
    been discovered by zendisc. By default, zenmodeler runs every 6 hours.
    zenwin detects Windows (WMI) services
    zendisc is a subclass of zenmodeler. It traverses routing tables using SNMP
    and then uses ping to detect devices on discovered networks.
Figure 87: Zenoss Network Map showing 4 hops from group100r1
8.2 Availability monitoring
Availability monitoring in Zenoss can use 3 different methods:
    ping tests
        implemented via zenping
        detects device availability
    service tests
        implemented via zenstatus
        detects services as defined by TCP / UDP ports
    process tests and Windows Services tests
        implemented via zenprocess
        detects processes using the SNMP Host Resources MIB using the
        snmp.IpServiceMap zCollectorPlugin driven by zenmodeler
        detects Windows services using WMI using the WinServiceMap driven by
        zenwin
Figure 88: Zenoss Collectors (Monitors) overview
The devices being monitored are shown at the bottom of the screen. To change any of
these parameters, use the Edit tab. The defaults for availability monitoring are:
    Ping cycle time polling                           60 sec
    Ping timeout                                      1.5 sec
    Ping retries                                      2
    Status (TCP / UDP service) polling interval       60 sec
    Process (SNMP Host Resources) polling interval    180 sec
    SNMP performance cycle interval                   300 sec
What availability checks are carried out on a device is controlled by the zProperties of
that device, remembering that zProperties can be set at any level of the object
hierarchy. By default the /Devices class has zPingMonitorIgnore=False and
zSnmpMonitorIgnore=False so every device will get ping polling at 1 minute intervals
and SNMP polling at 5 minute intervals.
8.2.2 Availability monitoring of services - TCP / UDP ports and windows services
Service monitoring for TCP / UDP ports and Windows services is configured through
the Services menu.
Figure 89: Zenoss Services menu
A very large number of Windows services are preconfigured out of the box. These
services are actually monitored by the zenwin daemon, which uses (and requires) WMI
on the Windows target machine. Note the Count column showing on how many
devices these services have been detected.
Figure 90: Zenoss Windows services
Even more IP services come configured out of the box. There are two subclasses of IP
services, Privileged and Registered; either can monitor either TCP or UDP ports.
Figure 91: Zenoss Privileged IP services
Again, note the Count column. Clicking on the service name shows where the
service has been detected:
Figure 92: Zenoss devices running the domain (DNS) service on TCP 53 or UDP 53
The fact that a service has been detected does not imply that it is being monitored for
availability (the default, out of the box, is that nothing is monitored). The Monitor
column for devices shows whether active monitoring is taking place (and hence events
potentially being generated). The Monitor field in the top part of the window shows
the global default for this service.
To turn on service monitoring globally for a particular service, use the Services menu
to find the service in question. You can then use either the zProperties tab or the
Edit tab to change the Monitor global default to True (the default, as shipped, is
False).
To turn on service monitoring for a specific device, access the main page for a device
and open the OS tab. Under the IP Services section, click on the Name column
header to see services detected. Click on the service name, which brings up the service
status window for the device where the Monitor field can be changed; don't forget to
click the Save button. Note that the Monitored box in the IP Services heading bar
can be used to toggle the display between detected services and monitored services.
Note that the dropdown menu to Add IpService is driven by typing in a partial
match of the service name you want; the subsequent dropdown then shows
configured services that match your selection.
Figure 93: Zenoss Processes with dropdown menu
Supply a process name and it will be added to the list. To modify the definition of the
process, click on the process name and select the Edit tab.
Figure 94: Zenoss dialogue for modifying process definition
To modify the zProperties of a process, use the zProperties tab.
Figure 95: Zenoss zProperties for the firefox process
To apply process monitoring to a device, from the OS tab of the device page, select the
dropdown menu and use the Add OSProcess menu. Defined processes are selectable
from the dropdown window.
Figure 96: Zenoss Add OSProcess monitoring to a specific device
Note that there are currently (July 4th, 2008) a couple of bugs to do with process
monitoring whereby processes disappear from the OS tab of a device and / or show the
wrong status (tickets #3408, #3399, #3270). To mitigate these, the zenprocess
daemon should be stopped and restarted whenever modifications have been made to
do with processes. You can use the GUI by choosing Settings and selecting the
Daemons tab.
Temporarily, it would also be wise to use the menu for the process and select to Lock
the process from Deletion.
More sophisticated availability monitoring can be implemented using standard
zCollectorPlugins; note that these are modelling plugins as distinct from
performance plugins. zCollector plugins are applied to device classes or devices
through the zProperties tab; use the Edit link alongside zCollectorPlugins to show
or modify the plugins applied and available.
Figure 97: Zenoss zCollectorPlugins
Note that Add Fields / Hide Fields appears greyed out but does actually work. The
plugins shown on the left in the screenshot above are the default for the /Devices class.
The /Devices/Server class has several more SNMP-based plugins by default, and
the /Devices/Server/Windows class has an extra wmi.WinServiceMap plugin.
Documentation on these plugins seems a little sparse but here are a few clues:
Figure 98: Zenoss default plugins for class /Devices/Server/Windows
    zenoss.snmp.InterfaceMap    uses SNMP to query for interface info
    zenoss.snmp.IpServiceMap    zenstatus daemon queries TCP / UDP port info
    zenoss.snmp.HRSWRunMap      uses SNMP to get process info from Host
                                resources MIB
    zenoss.wmi.WinServiceMap    zenwin daemon uses WMI to query for Windows
                                services
One way to find what plugins are applied by default to device classes is to inspect the
migration script supplied
in /usr/local/zenoss/zenoss/Products/ZenModeler/migrate/zCollectorPlugins.py.
To see what plugins are active on a specific device, use the device's main page menu
and select the More menu to find the Collector Plugins menu.
Figure 99: Zenoss zCollectorPlugins for device group100r1.class.example.org
When modifying characteristics for specific devices, do note that the main page menu
(from the arrow dropdown at the top left corner) has both a More submenu (which
includes zProperties among other things) and a Manage submenu.
Figure 100: Zenoss Device More submenu
Figure 101: Zenoss Device Manage submenu
8.2.4 Running commands on devices
A few Commands are defined out of the box and can be seen using the left-hand
Settings menu and then selecting the Commands tab. New commands can be
added using the Add User Command dropdown menu.
Figure 102: Zenoss Commands provided out of the box
From a device's main page, there is a submenu to Run Commands.
Figure 103: Zenoss Run Commands for a particular device
Although much of the availability monitoring that has been demonstrated so far relies
on SNMP, it is also possible to use ssh or telnet to contact remote devices and run
monitoring scripts on them.
By default, status events of severity below Error are aged out to the Event History
database after 4 hours. Historical events are never deleted.
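The default ageing rule can be expressed as a simple filter. The Python sketch below is an illustration only, not Zenoss code: the event dictionaries, the last_seen field and the choice to measure age from the last time an event was seen are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Zenoss severity names, ordered from least to most severe.
SEVERITY = {"Clear": 0, "Debug": 1, "Info": 2, "Warning": 3, "Error": 4, "Critical": 5}
MAX_AGE = timedelta(hours=4)

def age_out(active_events, now, threshold="Error", max_age=MAX_AGE):
    """Split events into (still active, moved to history).

    Assumption: an event is aged when its severity is below the threshold
    and it was last seen more than max_age ago.
    """
    keep, history = [], []
    for event in active_events:
        low_severity = SEVERITY[event["severity"]] < SEVERITY[threshold]
        too_old = now - event["last_seen"] > max_age
        (history if low_severity and too_old else keep).append(event)
    return keep, history

if __name__ == "__main__":
    now = datetime(2008, 7, 4, 12, 0)
    events = [
        {"summary": "old warning", "severity": "Warning", "last_seen": now - timedelta(hours=5)},
        {"summary": "old error", "severity": "Error", "last_seen": now - timedelta(hours=5)},
    ]
    print(age_out(events, now))
```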
Figure 104: Zenoss Event Manager configuration
Figure 105: Zenoss Event Console
From the Console, events can be selected by checking the box alongside the event and
the dropdown can be used for various functions including Acknowledge and Move
to History. The dropdown can also be used to generate any test event with the Add
Event option (if you are a CLI person rather than a GUI person, the zensendevent
command is also available).
The column headers of the Event Console can be used to change the sorting criteria,
and the icon at the far right of the event can be used to display the detailed data of
fields.
Events are organised in class hierarchies which have zProperties, just like Devices.
To modify the properties of an event, select the Events option from the left-hand
menu.
Figure 106: Zenoss Event classes and subclasses
To modify the context of any event, select the event and use the zProperties tab.
Figure 107: Zenoss zProperties for the event class /Event/Status/OSProcess
Events are mapped to Event Classes by Event Class instances. Event Class instances
are looked up by a non-unique key called EventClassKey. When an event arrives it
is:
    Parsed
    Assigned to the appropriate class and class key
    Context is then applied:
        Event context is defined in the zProperties of an event class
        After the event context has been applied, then the device context is applied
        whereby the ProductionState, Location, DeviceClass, DeviceGroups, and
        Systems are all attached to the event in the event database.
        Once these properties have been associated with the event, Zenoss attempts to
        update the zEventProperties. This allows a particular device or class of devices
        to override the default values for any given event.
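The order in which context is applied can be sketched as a small pipeline. The Python below is a hedged illustration of the steps listed above, not the Zenoss implementation: the function, its arguments, the severity field and the fallback to an /Unknown class when no key matches are assumptions made for the example.

```python
DEVICE_FIELDS = ("ProductionState", "Location", "DeviceClass", "DeviceGroups", "Systems")

def process_event(raw, class_map, class_zprops, device, device_overrides):
    """Hypothetical walk through parse, classify and apply context."""
    event = dict(raw)                                        # 1. parsed fields
    key = event.get("eventClassKey")
    event["eventClass"] = class_map.get(key, "/Unknown")     # 2. class lookup by key
    event.update(class_zprops.get(event["eventClass"], {}))  # 3a. event context
    for field in DEVICE_FIELDS:                              # 3b. device context
        event[field] = device[field]
    # 3c. device (or device class) overrides of the event defaults
    event.update(device_overrides.get(event["eventClass"], {}))
    return event

if __name__ == "__main__":
    device = {"ProductionState": "Production", "Location": "/Lab",
              "DeviceClass": "/Server/Linux", "DeviceGroups": [], "Systems": []}
    print(process_event({"eventClassKey": "proc_down", "summary": "process missing"},
                        {"proc_down": "/Status/OSProcess"},
                        {"/Status/OSProcess": {"severity": 4}},
                        device,
                        {"/Status/OSProcess": {"severity": 2}}))
```

Because the device overrides are applied last, the event in this example ends up with the device-specific severity rather than the event class default.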
To change the event mapping, select the event class and use the Mappings tab.
Figure 108: Zenoss Event mapping
The Edit tab allows editing of any of these fields.
configured event class by selecting the occurrence of the event and using the pulldown
menu to select Map Events to Class; pick the correct class from the scrollable list.
It is also possible to create new event classes. Starting from Events on the left menu,
navigate to the place in the event class hierarchy under which you want to create a
new class and use the dropdown menu to Add New Organizer and give the class a
unique name.
Figure 109: Zenoss menu to create a new event class
Figure 110: Zenoss menu to create Alerting Rule
Using the Edit tab permits changes of existing alerting rules. Different rules can be
applied based on a combination of severity, event state, production state and a more
generic filter. The Production State is assigned to a device or device class:
    Production
    Pre-Production
    Test
    Maintenance
    Decommissioned
The Production State can be set or changed using the Edit tab from a device main
page. The default is Production. The Production State attribute can be used to
control whether a device is monitored at all, whether alerts are sent and whether a
device is represented on the Zenoss main dashboard. It is very simple to modify the
Production State to put a device or class of devices into maintenance, for example.
Figure 111: Zenoss Editing alerting rule
The email or pager message of the Alerting Rule is configured by the Message tab
and the Schedule tab can be used to create different alerting rules at different times.
Figure 112: Zenoss Alerting rule message format
Global parameters for email and paging, along with other useful parameters, can be
defined from the Settings left-hand menu.
Figure 113: Zenoss Settings parameters
The out-of-the-box email notifications provide handy links back to Zenoss to
manipulate the event that is being reported on.
Figure 114: Zenoss email generated by event notification, including links
Figure 115: Zenoss Event Command definition
8.4 Performance management
Zenoss can collect performance data and threshold it using either SNMP (through the
zenperfsnmp daemon) or by commands (typically ssh), using the zencommand daemon.
The data is stored and displayed using RRDTool.
Figure 116: Zenoss All Templates showing all defined performance templates
With the exception of the templates with HRMIB in the name, the above figure
shows the default templates as shipped. Note that these are defined templates;
there is no indication here as to which are active on what objects.
Note in the screenshot above that there are several templates called Device.
Templates can be bound to a device or device class to make them active. When
determining what data to collect, the zenperfsnmp (or zencommand) daemon first
determines the list of Template names that are bound to this device or component.
For device components this is usually just the meta type of the component (e.g.
FileSystem, CPU, HardDisk, etc.). For devices, this list is the list of names in the
device's zDeviceTemplates zProperty.
Figure 117: Zenoss zProperties showing zDeviceTemplate
The default, out of the box, is that the device template called Device is bound to each
device discovered. As noted in the previous screenshot, there are several templates
called Device. The Device template for the class /Devices simply collects sysUpTime.
The template called Device for /Devices/Server collects a number of parameters
supported by the net-snmp MIB. The template called Device
for /Devices/Server/Windows collects various MIB values from the Informant MIB.
For each template name Zenoss searches first the device itself and then up the Device
Class hierarchy looking for a template with that name. Zenoss uses the first template
that it finds with the correct name, ignoring others with the same name that might
exist further up the hierarchy.
So, the zenperfsnmp daemon will collect net-SNMP MIB information for Unix / Linux
servers and will collect Informant MIB information for Windows servers
(as /Devices/Server/Windows is more specific than /Devices/Server). Any actual device
can have a local copy of a template and change parameters to suit that specific device.
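The "nearest template wins" search described above can be sketched as a walk up the device class path. This Python is an illustration of the lookup rule only, not Zenoss code: the CLASS_TEMPLATES table summarises the three Device templates mentioned in the text, and the function name and return format are invented for the example.

```python
# Summary of the Device templates described above, keyed by device class.
CLASS_TEMPLATES = {
    "/Devices": {"Device": "collects sysUpTime"},
    "/Devices/Server": {"Device": "collects net-snmp MIB values"},
    "/Devices/Server/Windows": {"Device": "collects Informant MIB values"},
}

def find_template(name, device_class, local_templates=None):
    """Return (where found, template) for the first template with this name."""
    # A local copy on the device itself takes precedence.
    if local_templates and name in local_templates:
        return ("<device>", local_templates[name])
    path = device_class
    while path:
        templates = CLASS_TEMPLATES.get(path, {})
        if name in templates:
            return (path, templates[name])
        path = path.rsplit("/", 1)[0]  # step up one level in the hierarchy
    return None

if __name__ == "__main__":
    for cls in ("/Devices/Server/Linux", "/Devices/Server/Windows", "/Devices/Network/Router"):
        print(cls, "->", find_template("Device", cls))
```

A Linux server stops at /Devices/Server, a Windows server stops at its own more specific class, and a router falls all the way back to the /Devices template.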
Template bindings can be modified either by changing the zProperties
zDeviceTemplates field or by using the Bind Templates menu dropdown from the
templates display of any device. (Do remember that, for a device, both the Templates
menu and the zProperties menu are off the More dropdown submenu.)
Figure 118: Zenoss Bind Templates menu
Be aware that when selecting templates to bind, you need to select all the templates
you want bound (use the Ctrl key to select multiples).
So, what do these templates actually provide?
Templates contain three types of sub-objects:
    Data sources         what data to collect and method to use, eg. MIB OID
    Thresholds           expected bounds for data and events to raise if breached
    Graph definitions    how to graph the data points
Figure 119: Zenoss Device template for /Devices/Server
Zenoss provides two built-in types of Data Sources, SNMP and COMMAND. Other
types can be provided through ZenPacks. Clicking on the Data Source displays details
which can then be modified. Typically an SNMP Data Source will provide a single
Data Point (a MIB OID value). Typically the name of the data point will be the same
as the name of the data source. This means that when you come to select threshold
values or values to graph, you will be selecting names like
ssCpuRawWait_ssCpuRaw_wait.
Figure 120: Zenoss Data Source memAvailReal
Note that there is a useful Test button to check your OID against a node that Zenoss
knows about. However, beware that this Test button appears to use snmpwalk under
the covers, so if a MIB OID has multiple instances then the snmpwalk will return
values successfully. When zenperfsnmp actually collects data, it requires the correct
instance as well as the correct MIB OID. If your test is successful but you
subsequently see empty graphs with a message of Missing RRD file then the
problem is likely to be that the MIB instance is incorrect.
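The difference between a walk and an exact get can be shown with a toy agent held in a dictionary. The Python below does not talk SNMP at all; the AGENT table and its values are invented, and only the OIDs for memAvailReal (UCD-SNMP-MIB) and ifInOctets (MIB-2) are real.

```python
# Toy agent: full OID (including instance) -> value. Values are made up.
AGENT = {
    ".1.3.6.1.4.1.2021.4.6.0": 123456,   # memAvailReal, scalar instance .0
    ".1.3.6.1.2.1.2.2.1.10.1": 1000,     # ifInOctets, ifIndex 1
    ".1.3.6.1.2.1.2.2.1.10.2": 2000,     # ifInOctets, ifIndex 2
}

def walk(base_oid):
    """Like snmpwalk: return everything underneath base_oid."""
    return {oid: value for oid, value in AGENT.items()
            if oid.startswith(base_oid + ".")}

def get(oid):
    """Like a single GET: the OID must include the exact instance."""
    return AGENT.get(oid)

if __name__ == "__main__":
    base = ".1.3.6.1.4.1.2021.4.6"
    print("walk:", walk(base))          # succeeds without an instance
    print("get :", get(base))           # None, the instance is missing
    print("get :", get(base + ".0"))    # succeeds with the .0 instance
```

This mirrors the symptom described above: a test based on a walk looks healthy, while collection based on an exact OID returns nothing until the instance is added.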
Data sources can be added or deleted with the dropdown Add DataSource and
Delete DataSource menus.
Thresholds can be applied to any of the data points collected, along with events to
generate if the threshold is breached.
Figure 121: Zenoss Threshold on CPU collected data
All of the data points defined in the data sources section are supplied in the top
selection box. If an event is to be generated, dropdowns are provided to select the
event class and severity. You can also specify an escalation count.
Thresholds can be added or deleted from the Thresholds dropdown menu.
Figure 122: Zenoss Dropdown menu for data thresholds
Note that this dropdown menu (as is also true of the Data Sources dropdown) has an
option to Add to Graphs.
Graphs can be defined for a wide combination of the collected data points and
thresholds. The menu panels are basically a front-end to the RRD graphing tool and,
with lots of samples provided, you don't need to get into the details of RRDTool;
however if you wish to, there is plenty of scope to do so.
Graphs can be added, deleted or resequenced using the dropdown. Existing graphs
are modified by clicking on the graph name.
Figure 123: Zenoss Performance template graph definition
Note that graphs can display both data points and thresholds.
All graphs are stored, by default, under /usr/local/zenoss/zenoss/perf/Devices. There is
a subdirectory for each device. Component data rrd files are under the os subdirectory
with further subdirectories for filesystems, interfaces and processes.
Figure 124: Zenoss Performance graphs for eth1 interface on bino
You can change the range of data with the Hourly dropdown (to daily, weekly,
monthly or yearly). Data can be scrolled using the < > bars at either side and the +
and - magnifiers can be used to zoom in / out. By default, all graphs on the page are
linked (so that if you change the range on one, it changes for all). They can be
de-coupled with the Link Graphs? checkbox.
Here is a partial screenshot of the graphs for bino under the Perf tab.
Figure 125: Zenoss Performance graphs available under the Perf tab for bino
Note that the Reports left-hand menu also provides access to various reports,
including performance reports.
Figure 126: Zenoss Reports menu
Following the Performance Reports link provides access to all performance reports
for all devices.
Figure 127: Zenoss Performance Reports menu
Zenoss will use SNMP to gain status and performance information from a device but it
also has ssh and telnet as alternatives, for those devices where SNMP is
inappropriate.
The Quick Start Guide gets you running fast and the Admin Guide provides what it
says: a reasonably comprehensive Administrator's Guide. There is also a book by
Michael Badger, published June 2008, Zenoss Core Network and System
Monitoring, which is well worth the investment (available both in paper and in
electronic format). However, one feels that there is so much more in the detail of
Zenoss that one needs to know and can find no information on!
My only real negative comment on Zenoss, other than the lack of detailed technical
information, is that it is a rapidly evolving product and it feels rather buggy. The
current (August 2008) poll on the zenoss-users forum for input to Zenoss 2.3 has
many requesters with code reliability and better documentation at the top of their
lists!
There are advocates for and against "agentless" monitoring. Personally, I don't
believe in agentless. Once you have got past ping then you have to have some form
of agent to do monitoring. The question is, should a management paradigm use an
agent that is typically part of a box build (like ssh, SNMP or WMI for Windows), or
should the management solution provide its own agent, like Nagios provides NRPE
(and most of the commercial management products come with their own agents)? If
your management system wants its own agents, you then have the huge problem of
how you deploy them, check they are running, upgrade them, etc, etc. OpenNMS and
Zenoss have a strong dependency on SNMP although Zenoss also supports ssh and
telnet monitoring, out-of-the-box (if your environment permits these). SNMP may be
old and Simple, but all three products support SNMP V3 (for those who are worried
about the security of SNMP) and virtually everything has an SNMP agent available.
The other form of agentless monitoring basically comes down to port sniffing for
services. Whilst this can work fine for smaller installations, the n-squared nature of
lots of devices and lots of services doesn't scale too well. All three products do port
sniffing so it comes down to how easy it is to configure economic monitoring.
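To put rough numbers on that scaling concern, the following Python sketch counts the probes needed per polling cycle when every service is checked on every device, and shows the kind of TCP connect test that port sniffing boils down to. The function names, the device and service counts and the timeout are illustrative assumptions; the code is not taken from any of the three products.

```python
import socket


def probe_count(devices: int, services: int) -> int:
    """Port checks per polling cycle if every service is probed on every device."""
    return devices * services


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable or name lookup failed.
        return False


if __name__ == "__main__":
    # Hypothetical installation sizes, purely for illustration.
    for devices, services in [(10, 5), (100, 20), (1000, 50)]:
        total = probe_count(devices, services)
        print(f"{devices} devices x {services} services = {total} probes per cycle")
```

With 1000 devices and 50 services that is 50,000 connection attempts per cycle, which is why being able to restrict what gets probed matters more than the probe itself.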
9.1.1 Discovery
Feature                 Nagios                  OpenNMS                   Zenoss
----------------------  ----------------------  ------------------------  ------------------------
Node discovery          Config file for each    Config file with          GUI, CLI and batch import
                        node                    include/exclude ranges    from text or XML file

Automatic discovery     No                      Yes, nodes within         Yes, networks & nodes
                                                configured n/w ranges

Interface discovery     Possible through        Yes, including switch     Yes, including switch
                        config file             ports                     ports

Discover nodes that     Yes, use                Yes, send_event.pl        Yes, use SNMP, ssh or
don't support ping      check_ifstatus plugin                             telnet

SQL database            No                      PostgreSQL                MySQL & Zope ZEO

Service (port)          Yes, use plugin         Yes, various              Yes, TCP and UDP
discovery               (TCP, UDP, ....)        out-of-the-box

Application discovery   Yes, define service     Not without extra         Yes, with ssh,
                                                agent, e.g. NRPE          ZenPacks or plugins
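Feature                 Nagios                  OpenNMS                   Zenoss
----------------------  ----------------------  ------------------------  ------------------------
Supports NRPE/NSClient  Yes                     Yes                       Possible

SNMP support            V1, 2 & 3               V1, 2 & 3                 V1, 2 & 3

L3 topology map         Yes                     No                        Yes, up to 4 hops

L2 topology map         No                      No                        No (but may be in plan!)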
Feature                 Nagios                  OpenNMS                   Zenoss
----------------------  ----------------------  ------------------------  ------------------------
Event configuration     No                      Flexible. Lots OOTB       Flexible. Lots OOTB

SNMP TRAP handling      No                      Flexible. Lots OOTB       Flexible. Lots OOTB

email/pager             Yes                     Yes, with configurable    Yes
notifications                                   escalation

Automation                                      auto-actions on events;   auto-actions on events;
                                                good news/bad news        good news/bad news
                                                correlation on alarms     correlation on events
                                                and notifications         and notifications

Deduplication           No automatic repeat     Yes                       Yes
                        count mechanism but
                        events do not continue
                        to be raised for
                        existing problems

Service/host            Yes                                               No
dependencies

Root cause analysis     UNREACHABLE status      Outages/Path outages      No
                        for devices behind
                        network single point
                        of failure. Also,
                        host/service
                        dependencies.
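Feature                 Nagios                  OpenNMS                   Zenoss
----------------------  ----------------------  ------------------------  ------------------------
Threshold               No                      Yes                       Yes
performance data

Graph performance data  No                      Yes, lots provided        Yes, lots provided
                                                OOTB                      OOTB

MIB compiler            No                      No                        Yes

MIB browser             No                      No                        No (though a MIB
                                                                          Browser ZenPack is
                                                                          said to be available
                                                                          for 2.2)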
Good points                                 Bad points
------------------------------------------  ------------------------------------------
Good, stable code for systems management    No auto-discovery

Good correlation between service events     Weak event console
and host events

Command to check validity of config files   No OOTB collection or thresholding of
                                            performance data

Command to reload config files without      No easy way to receive and interpret SNMP
disrupting Nagios operation                 TRAPs

Good documentation                          No MIB compiler or browser
Good points                                 Bad points
------------------------------------------  ------------------------------------------
Good OOTB functionality                     Written in Java - log files hopeless!
                                            Difficult to get individual daemon status

Code feels solid                            No map (that works reasonably)

Clean, standard configuration through       GUI is wordy - difficult for the eye to
well-organised XML files                    focus on the important things

Single database (PostgreSQL)                Need to bounce entire OpenNMS when almost
                                            any config file is changed

LOTS of trap customisation OOTB             Event/alarm/notification architecture is
                                            currently a mess (under review)

Ability to do some configuration through    No way to change colours of events
web Admin menu

Easy import of TRAP MIBs (mib2opennms)      No MIB compiler or browser

Chargeable support available from The
OpenNMS Group

                                            No PDF documentation. Wiki - hard to find
                                            detailed information.

Supports Nagios plugins

Some good How-to documents for basic        Lots of things undocumented when you get
configuration on the wiki                   down to details.
Good points                                 Bad points
------------------------------------------  ------------------------------------------
Good OOTB functionality                     No correlation between service events and
                                            host events

Architecture good - based around            Implementation feels buggy
object-oriented CMDB database

Topology map (up to 4 hops)

Lots of plugins & ZenPacks available        No MIB browser

email notifications include URL links       No way to change colours of events
back to Zenoss

Commercial version available                Commercial version available

Good Quick Start manual,                    Lots of things undocumented when you get
Administrator's manual and book             down to details

Supports Nagios & Cacti plugins
9.3 Conclusions
What to choose? Back to your requirements!
For smallish systems management environments, Nagios is well tested and reliable
with a huge community behind it. For anything more than simple ping checks plus
SNMP checks, bear in mind that you may need a way to install remote plugins on
target hosts. Notifications are fairly easy to set up but if you need to produce analysis
on your event log then Nagios may not be the best choice.
OpenNMS and Zenoss are both extremely competent products covering automatic
discovery, availability monitoring, problem management and performance
management and reporting. Zenoss has some topology mapping and has better
documentation but the code feels less reliable. OpenNMS currently has a rather
messy architecture around events, alarms and notifications, though this is said to be
under review. I also struggle to believe that you have to recycle the whole of
OpenNMS if you have changed a configuration file! The code feels very stable though.
My choice, hoping fervently that code reliability and documentation improve, is
Zenoss.
10 References
1. itSMF Pocket Guide: IT Service Management - a Companion to ITIL, IT
Service Management Forum
2. Multi Router Traffic Grapher (MRTG) by Tobi Oetiker,
http://oss.oetiker.ch/mrtg/
3. RRDtool, high-performance data logging and graphing system for time series
data, http://oss.oetiker.ch/rrdtool/
4. netdisco, network management application, http://www.netdisco.org/
5. The Dude network monitor by MikroTik, http://www.mikrotik.com/thedude.php
6. nagios, host, service and network monitoring program, http://www.nagios.org/
7. Zenoss, network, systems and application monitoring, http://www.zenoss.com/
8. OpenNMS, distributed network and systems management platform,
http://www.opennms.org/
9. cacti, network graphing solution, http://www.cacti.net/
10. SNMP Requests For Comment (RFCs), http://www.ietf.org/rfc.html
11. V1 RFCs 1155, 1157, 1212, 1213, 1215
12. V2 RFCs 2578, 2579, 2580, 3416, 3417, 3418
13. V3 RFCs 2578-2580, 3416-18, 3411, 3412, 3413, 3414, 3415
14. SNMP Host Resources MIB, RFCs 1514 and 2790, http://www.ietf.org/rfc.html
15. PHP scripting language, http://www.php.net/
16. Zenoss Core Network and System Monitoring by Michael Badger, published
by PACKT Publishing, June 2008, ISBN 9781847194282.
MySQL (5.0.45-22)
Cacti, as well as all of the prerequisites, were available on the OpenSuSE 10.3
standard distribution DVD.
Use the "Installation under Unix" instructions available from
http://www.cacti.net/downloads/docs/html/install_unix.html.
A few modifications were required such as:
- No PHP5 configuration was done as the files documented in the installation
  guide did not exist
- Configuration of Apache2 required no modifications
  in /etc/apache2/conf.d/php5.conf
- Cacti was installed using the standard SuSE Yast mechanism
- Create the MySQL database by:
    cd /usr/share/cacti
    mysql --user=root -p     (and supply the root password when prompted)
    create database cacti;
    use cacti;
    source cacti.sql;
    GRANT ALL ON cacti.* TO cactiuser@localhost IDENTIFIED BY 'cacti';
  (Note that 'cacti' in the above command is the password for the user
  cactiuser)
- You need to manually create the Operating System user cactiuser with
  password cacti
- When pointing your web browser at http://<yourserver>/cacti/ ensure that you
  include the trailing slash. Use a web logon of admin, password admin.
- Ensure that apache2 and mysql are either manually started (/etc/init.d/<name>
  start) or start them automatically at system start using chkconfig
- Ensure that the cactiuser userid can execute the /usr/share/cacti/poller.php
  script that is run by /etc/crontab.
- Also ensure that the directory that the RRD data is written to (/var/lib/cacti) is
  writeable by this user.
- cacti.log is in /var/log/cacti
- I found (through /var/log/messages) that poller.php was being run twice, once in
  /etc/crontab as cactiuser and once in /etc/cron.d/cacti as user wwwrun.
  Comment out the line in /etc/cron.d/cacti and check again that cactiuser can
  write to the data files in /var/lib/cacti.
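The duplicate poller problem above can be checked for with a short script. This is a minimal Python sketch, assuming the cron file locations and the /var/lib/cacti data directory described above; the function name find_poller_entries is hypothetical, and the script only reports what it finds without changing anything.

```python
import os
from pathlib import Path


def find_poller_entries(cron_files, script="poller.php"):
    """Return (file, line number, line) for each active cron line that mentions the script."""
    hits = []
    for cron_file in cron_files:
        path = Path(cron_file)
        if not path.is_file():
            continue
        try:
            lines = path.read_text(errors="replace").splitlines()
        except OSError:
            # Unreadable file: skip it rather than abort the whole check.
            continue
        for number, line in enumerate(lines, start=1):
            stripped = line.strip()
            # Ignore blank lines and lines that are already commented out.
            if stripped and not stripped.startswith("#") and script in stripped:
                hits.append((str(path), number, stripped))
    return hits


if __name__ == "__main__":
    # Paths as described in the text; adjust for other distributions.
    entries = find_poller_entries(["/etc/crontab", "/etc/cron.d/cacti"])
    for file_name, number, line in entries:
        print(f"{file_name}:{number}: {line}")
    if len(entries) > 1:
        print("poller.php is scheduled more than once")
    # os.access tests the permissions of the calling user, so run this as cactiuser.
    print("RRD directory writable:", os.access("/var/lib/cacti", os.W_OK))
```

Run it as cactiuser so that the final check reflects the permissions the poller will actually have.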
The initial console page is a good starting point to add devices to monitor and
associated graphs.