
Open Source Management Options

September 30th, 2008


Jane Curry
Skills 1st Ltd
www.skills-1st.co.uk

Jane Curry
Skills 1st Ltd
2 Cedar Chase
Taplow
Maidenhead
SL6 0EU
01628 782565

jane.curry@skills1st.co.uk

Synopsis
Nuts-and-bolts network and systems management is currently unfashionable. The
emphasis is far more on processes that implement service management, driven by
methodologies and best practices such as the Information Technology Infrastructure
Library (ITIL). Nonetheless, all service management disciplines ultimately rely on a
way to determine some of the following characteristics of systems and networks:
Configuration management
Availability management
Problem management
Performance management
Change management
Security management
The commercial marketplace for systems and network management offerings tends to
be dominated by the "big four": IBM, HP, CA and BMC. Each has large, modular
offerings which tend to be very expensive. Each has grown its portfolio by buying
up other companies and then performing some level of integration between their
respective branded products. One can argue that the resulting offerings tend to be
"marketechtures" rather than architectures.
This paper looks at Open Source software that addresses the same requirements.
Offerings from Netdisco, Cacti and The Dude are examined briefly, followed by an
in-depth analysis of Nagios, OpenNMS and Zenoss.
This paper is aimed at two audiences. For a discussion on systems management
selection processes and an overview of three main open source contenders, read the
first few chapters. The last few chapters then provide a product comparison.
For those who want lots more detail on Nagios, OpenNMS and Zenoss, the middle
sections provide in-depth discussions with plenty of screenshots.

Table of Contents
1 Defining Systems Management
  1.1 Jargon and processes
  1.2 Systems Management for this paper
2 Systems management tools
  2.1 Choosing systems management tools
  2.2 The advantages of Open Source
3 Open Source management offerings
4 Criteria for Open Source management tool selection
  4.1 General requirements
    4.1.1 Mandatory Requirements
    4.1.2 Desirable Requirements
  4.2 Defining network and systems management
    4.2.1 Network management
    4.2.2 Systems management
  4.3 What is out-of-scope?
5 A quick look at Cacti, The Dude and netdisco
  5.1 Cacti
  5.2 netdisco
  5.3 The Dude
6 Nagios
  6.1 Configuration Discovery and topology
  6.2 Availability monitoring
  6.3 Problem management
    6.3.1 Event console
    6.3.2 Internally generated events
    6.3.3 SNMP TRAP reception and configuration
    6.3.4 Nagios notifications
    6.3.5 Automatic responses to events - event handlers
  6.4 Performance management
  6.5 Nagios summary
7 OpenNMS
  7.1 Configuration Discovery and topology
    7.1.1 Interface discovery
    7.1.2 Service discovery
    7.1.3 Topology mapping and displays
  7.2 Availability monitoring
  7.3 Problem management
    7.3.1 Event console
    7.3.2 Internally generated events
    7.3.3 SNMP TRAP reception and configuration
    7.3.4 Alarms, notifications and automations
  7.4 Performance management
    7.4.1 Defining data collections
    7.4.2 Displaying performance data
    7.4.3 Thresholding
  7.5 Managing OpenNMS
  7.6 OpenNMS summary
8 Zenoss
  8.1 Configuration Discovery and topology
    8.1.1 Zenoss discovery
    8.1.2 Zenoss topology maps
  8.2 Availability monitoring
    8.2.1 Basic reachability availability
    8.2.2 Availability monitoring of services - TCP/UDP ports and windows services
    8.2.3 Process availability monitoring
    8.2.4 Running commands on devices
  8.3 Problem management
    8.3.1 Event console
    8.3.2 Internally generated events
    8.3.3 SNMP TRAP reception and configuration
    8.3.4 email / pager alerting
    8.3.5 Event automations
  8.4 Performance management
    8.4.1 Defining data collection, thresholding and graphs
    8.4.2 Displaying performance data graphs
  8.5 Zenoss summary
9 Comparison of Nagios, OpenNMS and Zenoss
  9.1 Feature comparisons
    9.1.1 Discovery
    9.1.2 Availability monitoring
    9.1.3 Problem management
    9.1.4 Performance management
  9.2 Product high points and low points
    9.2.1 Nagios goodies and baddies
    9.2.2 OpenNMS goodies and baddies
    9.2.3 Zenoss goodies and baddies
  9.3 Conclusions
10 References
11 Appendix A - Cacti installation details

1 Defining Systems Management
1.1 Jargon and processes
Every organisation and individual has their own perspective on systems management
requirements; the first essential step when looking for systems management solutions
is to define what those requirements are. This gives a means to measure success of a
project.
There are many different methodologies and disciplines for systems management, from
the International Organization for Standardization (ISO) "FCAPS" acronym (Fault,
Configuration, Accounting, Performance and Security) through to the Information
Technology Infrastructure Library (ITIL), which divides the ITIL V2 framework into
two categories:
Service Support, which includes the:
  Service Desk function
  Incident management process
  Problem management process
  Configuration management process
  Change management process
  Release management process
Service Delivery, which includes the:
  Service Level management process
  Capacity management process
  IT Service Continuity management process
  Availability management process
  Financial management for IT services
Key to the core of configuration management and the entire ITIL framework is the
concept of the Configuration Management Database (CMDB), which stores and
maintains Configuration Items (CIs) and their inter-relationships.
The art of systems management is defining what is important, what is in scope and,
perhaps more importantly, what is currently out of scope. The science of systems
management is then to effectively, accurately and reliably provide data to deliver your
systems management requirements. The devil really is in the detail here. A
"comprehensive" systems management tool that delivers a thousand metrics out of
the box but which is unreliable and/or not easily configurable is simply a recipe for a
project that is delivered late and over budget.

For smaller projects or Small / Medium Business (SMB) organisations, a pragmatic
approach is often helpful. Many people will want a say in the definition of
management. Others, whose requirements may be equally valuable, may not know
"the art of the possible". Hence, combining top-down requirements definition
workshops with a bottom-up approach of demonstrating "top 10" metrics that can
easily be delivered by a tool can result in an iterative process that fairly quickly
delivers at least a prototype solution.

1.2 Systems Management for this paper


For the purposes of this paper, I shall define systems management as spanning:
Configuration management
Availability management
Problem management
Performance management
I shall further define "systems" to include local and wide-area networks, as well as
"PCs" and "Unix-like" systems. In my environment, I do not have mainframe or
proprietary midrange systems. "PCs" run a variety of versions of Windows.
"Unix-like" tends to mean a flavour of Linux rather than a vendor-specific Unix, though
there is some legacy IBM AIX and Sun Solaris.

2 Systems management tools


There are no systems management solutions for sale. The successful
implementation of systems management requirements is a combination of:
Appropriate requirements definition
Appropriate tools
Skills to translate the requirements into customisation of tools
Project management
User training
Documentation
In theory, the choice of tool should be driven by the requirements. In practice, this is
often not the case and a solution for one aspect of systems management in one area of
a business may become the de facto standard for a whole organisation.
There are good reasons why this might come about. It is not practical to run a
centralised Service Desk with a plethora of different tools. A Framework-based tool
with a centralised database, and a common look-and-feel across both Graphical User
Interface (GUI) and Command Line Interface (CLI), offering modules that deliver the
different systems management disciplines, is a much more cost-effective solution than
different piecemeal tools for different projects, especially when the cost of building and
maintaining skills and educating users is taken into account.
Tool integration is a large factor in the successful rollout of systems management.
The concept of a single Configuration Management Database (CMDB) that all tools
feed and use is key to this.
A good tool delivers "useful stuff" easily out-of-the-box and provides a standard way to
then provide local customisation.
At its most basic, "the tool" is a compiler or interpreter (C, bash, ...) and the
customisation is writing programs from scratch. At the complex end of the spectrum,
the tool may be a large suite of modules from one of the big four commercial
suppliers, IBM, HP, CA and BMC. At the really complex end is where you have
several of the big commercial products involved in addition to home-grown programs.

2.1 Choosing systems management tools


Every organisation has different priorities for the criteria that drive tool selection.
For the moment, let's leave aside the technical metrics and look at some of the other
decision factors:
Ease of use: not just what demos well but what implements well in your environment
Skills necessary to implement the requirements versus skills available
Requirements for and availability of user training
Cost: all of it, not just licences and tin but also evaluation time, maintenance, training, ...
Support from supplier and/or communities
Scalability
Deployability: management server(s) ease of installation and agent deployment
Reliability
Accountability: the ability to sue/charge the vendor if things go wrong
If accountability is high in your priorities and the software cost is a relatively low
priority then you are likely to choose one of the commercial offerings; however, if you
have a well-skilled workforce, or one prepared and able to learn quickly, and overall
cost is a limiting factor, then Open Source offerings are well worth considering.
Interestingly, you can find offerings that suit all the other bullets above, from both the
commercial and the Open Source stables.

2.2 The advantages of Open Source
One attraction of Open Source to me is that you don't actually have to fund
salesfolk. Some costs do need to be invested in your own people to investigate the
offerings available, research their features and requirements, and participate in the
online fora that share experience around the globe. These costs may not be small but
at least the investment stays within the company and hopefully those people who have
done the research will then be a key part of the team implementing the solution. This
is often not the case if you purchase from a commercial supplier.
Open Source does not necessarily mean "you're on your own, pal!". Most of the Linux
distributions have a free version and a supported version, where a support contract is
available to suit your organisation and budget. Several of the Open Source
management offerings have a similar model, but do ensure that the free version has
sufficient features for your requirements and is not just a well-featured demo.
All software has bugs in it. Ultimately, if you go Open Source, you have the source
code so you have some chance of fixing problems with local staff or buying in global
expertise, and that doesn't necessarily mean transporting a guru from Australia to
Paris. Open Source code is available to everyone so remote support and consultancy is
a distinct possibility. With the best will in the world, commercial organisations will
prioritise problem reports according to their criteria, not yours.
There are some excellent fora and discussion lists for commercial products (I have
participated in several of them for many years); some even have input from the support
and development teams; however, the source code is not open for discussion or
community development. With a very active Open Source offering, there tends to be a
much larger pool of developers and testers (ie. us) and the chance of getting problems
fixed may be higher, even if you cannot fix it yourself. I would emphasise very active
Open Source offerings: unless you really do have some very highly skilled local staff
that you are sure you are going to keep, it may be a risky choice to participate in a
small Open Source project.

3 Open Source management offerings


There are lots of different Open Source management offerings available. Many of them
rely on the Simple Network Management Protocol (SNMP), which defines both a
protocol for an SNMP manager to access a remote SNMP agent, and also defines the
data that can be transferred. SNMP data values that an SNMP manager can request
are defined in Management Information Bases (MIBs), which can either be standard
(MIB-2) or can be "enterprise-specific"; in other words, each different manufacturer
can provide different data about different types of device. Information events
emanating from an agent (typically problems) are SNMP traps. There are three
versions of the SNMP standard:
V1 (1988): still most prevalent. Significant potential security and performance issues.
V2 (1993): solved some performance issues. Never reached full standard status.
V3 (2002): significantly improved performance and security. Much more complex.
Of the Open Source management solutions available, some are excellent point
solutions for specific niche requirements. MRTG (Multi Router Traffic Grapher),
written by Tobi Oetiker, is an excellent example of a compact application that uses
SNMP to collect and log performance information and display it graphically. If that
satisfies your requirement, don't look any further, but it will not help you with
defining and collecting problems from different devices and then managing those
problems through to resolution.
An enhancement of MRTG is RRDTool (Round Robin Database Tool), again from Tobi
Oetiker. It is still fundamentally a performance tool, gathering periodic, numeric data
and displaying it, but RRDTool has a database at its heart. The size of the database is
predetermined on creation and newer data overwrites old data after a predetermined
interval. RRD can be found embedded in a number of other Open Source management
offerings (Cacti, Zenoss, OpenNMS).
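
The round robin idea can be illustrated with a short Python sketch. This is only a model of the principle (fixed size chosen at creation, newest sample overwrites the oldest); it is not RRDTool's file format or API, and the class name RoundRobinArchive is hypothetical.

```python
class RoundRobinArchive:
    """Fixed-size store: once full, each new sample overwrites the oldest."""

    def __init__(self, slots):
        if slots <= 0:
            raise ValueError("slots must be positive")
        self.slots = slots
        self._values = [None] * slots  # storage is allocated once, at creation
        self._next = 0                 # index of the slot to write next
        self._count = 0                # number of samples written so far

    def update(self, value):
        """Store a sample, wrapping round to the first slot when the end is reached."""
        self._values[self._next] = value
        self._next = (self._next + 1) % self.slots
        self._count += 1

    def samples(self):
        """Return the stored samples, oldest first."""
        if self._count < self.slots:
            return self._values[:self._count]
        return self._values[self._next:] + self._values[:self._next]


rra = RoundRobinArchive(slots=3)
for reading in [10, 20, 30, 40]:
    rra.update(reading)
print(rra.samples())  # [20, 30, 40]
```

The store never grows: the fourth reading has pushed out the first, which is why disk usage for this kind of database is known in advance.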
A further enhancement from RRDTool is Cacti, which provides a complete front-end to
RRDTool. A back-end MySQL relational database can be used behind the Round Robin
databases; data sources can be pretty well any script in addition to SNMP; and there
is user management included. This is still a performance data collection and display
package, not a multi-discipline, framework, systems management solution.
Moving up the scale of features and complexity, some offerings are slanted more
towards network management (netdisco, The Dude); others towards systems
management (Nagios).
Some aim to encompass a number of systems management disciplines with an
architecture based around a central database (Nagios, Zenoss, OpenNMS).
Some are extremely active projects with hundreds of appends to maillists per month
(Nagios, Zenoss, OpenNMS, Cacti); others have a regular but smaller community with
hundreds of maillist appends per year (netdisco).
Some are purely Open Source projects, typically licensed under the GNU GPL (MRTG,
RRDTool, Cacti) or BSD license (netdisco); some have free versions (again typically
under GPL) with extensions that have commercial licences (Zenoss). In addition to
free licences, several products offer support contracts (Zenoss, Nagios, OpenNMS).
Most are available on several versions of Linux; MRTG, RRDTool and Cacti are also
available for Windows. The Dude is basically a Windows application but can run
under WINE on Linux.
Most have a web-based GUI supported on Open Source browsers. OpenNMS can only
display maps by using Internet Explorer.

4 Criteria for Open Source management tool selection
It is essential to define what is in scope and what is out of scope for a systems
management project. A prioritised list of mandatory and desirable requirements is
helpful.

4.1 General requirements


For the purposes of this paper, here are my selection criteria.

4.1.1 Mandatory Requirements


Open Source free software
Very active fora / maillists
Established history of community support and regular fixes and releases
Integrated network and systems management including:
  Configuration management
  Availability management
  Problem management
  Performance management
Centralised, open database
Both Graphical User Interface (GUI) and Command Line Interface (CLI)
Easy deployment of agents
Scalability to several hundred devices
Adequate documentation

4.1.2 Desirable Requirements


Support for SNMP V3
User management to limit aspects of the tool to certain individuals
Graphical representation of network
Controllable remote access to discovered devices
Easy server installation
No requirement for proprietary web browsers
Scalability to several thousand devices
Good documentation
Availability of (chargeable) support

4.2 Defining network and systems management
The "Integrated network and systems management" requirement needs some further
expansion:

4.2.1 Network management


Configuration
  Automatic, controllable discovery of network Layer 3 (IP) devices
  Topology display of discovered devices
  Support for SNMP V1, V2 and, preferably, V3
  Ability to discover devices that do not support ping
  Ability to discover devices that do not support SNMP
  Central, open database to store information for these devices
  Ability to add to this information
  Ideally, ability to discover and display network Layer 2 (switch) topology
Availability monitoring
  Customisable "ping test" for all discovered devices and interfaces
  SNMP availability test for devices that do not respond to ping (eg. comparison of SNMP Interface administrative status with Interface operational status)
  Simple display of availability status of devices, preferably both tabular and graphical
  Events raised when a device fails its availability test
  Ability to monitor infrastructure of network devices (eg. CPU, memory, fan)
  Differentiation between device / interface down and network unreachable
Problem
  Events to be configurable for any discovered device
  Central events console with ability to prioritise events
  Ability to categorise events for display to specific users
  Ability to receive and format SNMP traps for SNMP V1, V2 and, preferably, V3
  Customisation of actions in response to events, both manual actions and automatic responses
  Ability to correlate events to find root-cause problems (eg. failure of a router device is root cause of all interface failure events for that device)
Performance
  Regular, customisable monitoring of SNMP MIB variables, both standard and enterprise-specific, with data storage and ability to threshold values to generate events
  Ability to import any MIB
  Ability to browse any MIB on any device
  Customisable graphing of performance data
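
The SNMP availability test listed above under Availability monitoring (comparing an interface's administrative status with its operational status) can be sketched in a few lines of Python. The numeric codes are the standard IF-MIB values for ifAdminStatus and ifOperStatus; the polled table, the interface names and the function names are hypothetical, and a real tool would fetch the values from the device with an SNMP library.

```python
IF_UP = 1    # IF-MIB value up(1), used by both ifAdminStatus and ifOperStatus
IF_DOWN = 2  # IF-MIB value down(2)


def interface_fault(admin_status, oper_status):
    """True when an interface is configured to be up but is not operationally up."""
    return admin_status == IF_UP and oper_status != IF_UP


def faulty_interfaces(status_table):
    """Return the names of interfaces that fail the availability test, sorted."""
    return sorted(
        name
        for name, (admin, oper) in status_table.items()
        if interface_fault(admin, oper)
    )


# Hypothetical poll result: interface name -> (ifAdminStatus, ifOperStatus)
polled = {"eth0": (1, 1), "eth1": (1, 2), "eth2": (2, 2)}
print(faulty_interfaces(polled))  # ['eth1']
```

An interface that has been administratively shut down (eth2 here) is deliberately not reported. Note that this sketch treats every operational state other than up(1) as a fault, which is an assumption; a tool might choose to handle states such as testing or dormant differently.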

4.2.2 Systems management


Many of the criteria for systems management are similar to the network management
bullets above but they are repeated here for convenience.
Configuration
  Automatic, controllable discovery of Windows and Unix devices
  Topology display of discovered devices
  Support for SNMP V1, V2 and, preferably, V3
  Ability to discover devices that do not support ping
  Ability to discover devices that do not support SNMP
  Central, open database to store information for these devices
  Ability to add to this information
Availability monitoring
  Customisable "ping test" for all discovered devices
  Availability test for devices that do not respond to ping (eg. comparison of SNMP Interface administrative status with Interface operational status, support for ssh tests)
  Ability to monitor customisable ports on a device (eg. tcp/80 for http servers)
  Ideally the ability to monitor applications (eg. ssh / snmp access to monitor for processes, wget to retrieve web pages)
  Simple display of availability status of devices, preferably both tabular and graphical
  Events raised when a device fails any availability test
  Ability to monitor basic system metrics: CPU, memory, disk space, processes, services (eg. the SNMP Host Resources MIB)
Problem
  Events to be configurable for any discovered device
  Central events console for network and systems management events with ability to prioritise events
  Ability to categorise events for display to specific users
  Ability to receive and format SNMP traps for SNMP V1, V2 and, preferably, V3
  Ability to monitor Unix syslogs and Windows Event Logs and generate customisable events
  Ideally the ability to monitor any text logfile and generate customisable events
  Customisation of actions in response to events, both manual actions and automatic responses
  Ability to correlate events to find root-cause problems (eg. single-point-of-failure router is root cause of availability failure for all devices in a network)
Performance
  Regular, customisable monitoring of SNMP MIB variables, both standard and enterprise-specific, with data storage and ability to threshold values to generate events
  Ability to import any MIB
  Ability to browse any MIB on any device
  Ability to gather performance data by methods other than SNMP (eg. ssh)
  Customisable graphing of performance data

4.3 What is out-of-scope?


In my environment, some things are specifically out of scope:
Software distribution
Remote configuration
Remote control of devices
High availability of management servers
Application response time
In the next few sections of this document I will explore some of the niche products
briefly and then take a slightly more in-depth look at OpenNMS, Nagios and Zenoss.
These sections are not intended to be a full analysis of the products, more an initial
impression and a comparison of strengths and weaknesses. Subsequent documents
will investigate Nagios, OpenNMS and Zenoss in more detail.

5 A quick look at Cacti, The Dude and netdisco
Cacti, The Dude and netdisco do not meet my mandatory requirements; however, they
are interesting niche solutions that were investigated during the tools evaluation
process. Cacti and netdisco were installed; The Dude was researched only on the Internet.

5.1 Cacti
Cacti is a niche tool for collecting, storing and displaying performance data. It is a
comprehensive front-end to RRDTool, including the concept of user management.
Although the default method of data collection is SNMP, other data collectors,
typically scripts, are possible.
Data collection is very configurable and is driven by the Cacti Poller process, which is
called periodically by the Operating System scheduler (cron for Unix). The default
polling interval is 5 minutes.
Devices need to be manually added using the Cacti web-based GUI. Basic information
such as hostname, SNMP parameters and device type should be supplied. Depending
on the device type selected (eg. ucd/net SNMP Host, Cisco Router), one or more default
graph templates can be associated with a device along with one or more default SNMP
data queries. In addition to the web-based GUI, configuration of Cacti can be done by
Command Line, using PHP, which is a general-purpose scripting language especially
suited for web development.
Cacti now has support for SNMP V3.
For high-performance polling, Spine (used to be cactid) can replace the base cmd.php
polling engine. The user manual suggests that Spine could support polling intervals
of less than 60 seconds for at least 20,000 data sources.
Cacti is supported on both Unix and Windows platforms.
Get the Cacti User Manual from http://www.cacti.net/downloads/docs/pdf/manual.pdf.
Cacti has a very active user forum with hundreds of appends per month. There is also
a documented release roadmap going forward to 2nd quarter 2009.
Here are a few screenshots of Cacti to give a feel for the product.

Figure 1: Cacti main Devices panel

Figure 2: Cacti graph of interface traffic

Figure 3: Cacti graph of memory for device bino

5.2 netdisco
netdisco was created at the University of California, Santa Cruz (UCSC), Networking
and Technology Services (NTS) department. It is interesting as a network
management configuration offering. It uses SNMP and Cisco Discovery Protocol
(CDP) to try and automatically discover devices. Unlike most other management
offerings, netdisco is Layer 2 (switch) aware and can both display switch ports and
optionally provide access to control switch ports.
It provides an inventory of devices that you can sort either by OS or by device model,
displaying all ports for a device. It also has the ability to provide a network map.
User management is included so you can restrict who is allowed to actively manage
devices. There is good provision of both command line interface and web-based GUI.
netdisco is supported on various platforms; it was originally developed on FreeBSD; I
built it on a CentOS 4 platform.

If your requirement is strictly for network configuration management and your
devices respond suitably to netdisco then this might be worth a try. I found it very
quirky as to what it would discover. It appears very dependent on the SNMP system
sysServices variable to decide whether a device supports network layer 2 and 3
protocols; if a device did not provide sysServices or didn't indicate layer 2/3, then
netdisco would not discover it. I also had very few devices supporting Cisco CDP so
the automatic discovery didn't work well for me. Although there is a file where you
can manually describe the topology, this would be a huge job in a sizeable network if
you had to hand-craft a significant amount of the network topology.
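
To see why sysServices matters, it helps to decode it. In MIB-2, sysServices is a sum in which each OSI layer L that the device services contributes 2 to the power (L - 1), so layer 2 contributes 2 and layer 3 contributes 4. The Python sketch below decodes a value and applies a layer 2/3 test; the test is only my guess at the kind of check netdisco appears to make, not its actual code, and the function names are hypothetical.

```python
def layers_from_sys_services(sys_services):
    """Return the OSI layers (1 to 7) flagged in a MIB-2 sysServices value."""
    return [layer for layer in range(1, 8) if sys_services & (1 << (layer - 1))]


def looks_like_network_device(sys_services):
    """True if the value flags layer 2 (switching) or layer 3 (routing)."""
    layers = layers_from_sys_services(sys_services)
    return 2 in layers or 3 in layers


print(layers_from_sys_services(72))   # [4, 7]
print(looks_like_network_device(72))  # False
print(looks_like_network_device(6))   # True
```

A value of 72 (layers 4 and 7) is typical of a host offering application services, and under this test it would be passed over, which is consistent with the discovery behaviour described above.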
This project is not nearly so active as some of the other offerings discussed here
(around 500 appends to the users maillist in 2007) but there seems to be a steady flow.
Building the system was a fair marathon but the documentation is reasonably good.
Here are some screenshots of the main device inventory panel, plus the details of a
router and the details of a switch.

Figure 4: Netdisco main device inventory display

Figure 5: Netdisco details of router device

Figure 6: Netdisco details of a switch device, including ports

5.3 The Dude


I put some research into The Dude as it apparently provides auto-discovery of a
network with graphical map layout, something that is hard to find done well. From
the Open Source perspective though, it really doesn't qualify. It is basically a
Windows application though it can apparently run under WINE on Linux. It comes
from a company called MikroTik and their website says it is "free" but it is unclear
what the licensing arrangement is for The Dude. It has a very active forum.
It offers more than simply discovery and configuration as it can apparently monitor
links and devices for availability and graph link performance. It can also generate
notifications.

6 Nagios
Nagios evolved in 2002 out of an earlier systems management project called NetSaint,
which had been around since the late 1990s. It is far more a systems management
product than a network management product. It is available to build on most
flavours of Linux / Unix and the installation has become much easier over the years.
The Nagios Quickstart document is reasonably comprehensive (although it misses a
few prerequisites that I found necessary, like gd, png, jpeg, zlib, net-snmp and their
related development packages). I downloaded and built Nagios 3.0.1 on a SuSE 10.3
platform (hostname nagios3), and had it working inside half a day.
To start the Web Interface, point your browser at http://nagios3/nagios/. The
Quickstart document has you create some user ids and passwords; the default logon
for the Web console is nagiosadmin with the password you specified during
installation.
Here is a screenshot of the Nagios Tactical Overview display.

Figure 7: Nagios Tactical Overview screen

6.1 Configuration Discovery and topology
Nagios uses a number of files to configure discovery; out-of-the-box it will find
nothing. Samples are available, by default, in /usr/local/nagios/etc. The main
configuration file is nagios.cfg, which defines a large number of parameters, most of
which you can leave alone at the outset.
Typically the main things to discover are "hosts" and "services". These are defined in
an object-oriented way such that you can define host and service top-level classes with
particular characteristics and then define sub-classes and hosts that inherit from their
parent classes. Rather than having a single, huge nagios.cfg, it can reference other
files (typically in the objects subdirectory), where definitions for hosts, services and
other object types can be kept. So, for example, /usr/local/nagios/etc/nagios.cfg may
contain lines such as:
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
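
As a quick way to see which object files a main configuration file pulls in, the cfg_file lines can be listed with a few lines of Python. This is a minimal sketch that only understands key=value lines and # comments; it ignores every other Nagios directive, and the function name object_files is hypothetical.

```python
def object_files(main_cfg_text):
    """Return the paths named by cfg_file= lines, in order of appearance."""
    paths = []
    for raw in main_cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "cfg_file":
            paths.append(value.strip())
    return paths


sample = """\
# main configuration
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
"""
for path in object_files(sample):
    print(path)
```

On a real server the text would be read from /usr/local/nagios/etc/nagios.cfg rather than from the sample string used here.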

Definitions of hosts are built up in a hierarchical manner so the top-level definitions
may look like the following screenshot. Note the use stanza to denote inheritance of
characteristics from a previous definition.

22
Figure8:Nagioshosts.cfgtopleveldefinitions

Host availability parameters are shown in the screenshot above:
    check_period          (24x7)
    check_interval        (5 mins)
    retry_interval        (1 min)
    max_check_attempts    (10)
    check_command         (check_host_alive, which is based on check_ping)
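The way the use stanza layers definitions can be illustrated with a small sketch. This is not how Nagios itself is implemented; it is a simplified Python model, assuming a single template per definition, and the dictionary contents are hypothetical values chosen to resemble the parameters listed above.

```python
# Hypothetical in-memory model of host definitions; names and values are
# illustrative and are not read from a real hosts.cfg.
definitions = {
    "generic_host": {
        "check_period": "24x7",
        "check_interval": 5,
        "retry_interval": 1,
        "max_check_attempts": 10,
        "check_command": "check_host_alive",
    },
    "host_172.31.100": {"use": "generic_host", "parents": "group-100-r2"},
    "group-100-a1": {"use": "host_172.31.100", "check_command": "check_ifstatus"},
}


def resolve(name, definitions):
    """Return the effective settings for `name`, following `use` to its ancestors."""
    definition = definitions[name]
    parent = definition.get("use")
    effective = resolve(parent, definitions) if parent else {}
    # Values set locally override anything inherited from the template.
    effective.update({k: v for k, v in definition.items() if k != "use"})
    return effective


host = resolve("group-100-a1", definitions)
print(host["check_command"], host["check_interval"], host["parents"])
```

The point of the sketch is that a real host only needs to state what differs from its template; everything else is filled in from further up the hierarchy.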

Figure 9: Nagios hosts.cfg showing host template definitions

Subsequent definitions of sub-groups and real hosts will follow. Note the use of the parents stanza to denote the network node that provides access to the device. This means that Nagios can tell the difference between a node that is down and a node that is unreachable because its access router is down.
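The down-versus-unreachable distinction can be sketched as follows. This is a simplified model written for illustration, not Nagios code: it assumes that a failed host is DOWN when it has no parents or when at least one parent is UP, and UNREACHABLE otherwise. The host name server-1 is hypothetical.

```python
def classify(host, responds, parents):
    """Classify `host` as UP, DOWN or UNREACHABLE.

    responds: dict of host -> bool (did the last check get an answer?)
    parents:  dict of host -> list of parent hosts (empty if directly attached)
    """
    if responds[host]:
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        return "DOWN"
    # If at least one parent is UP, a path to the host exists, so it is DOWN.
    if any(classify(p, responds, parents) == "UP" for p in host_parents):
        return "DOWN"
    return "UNREACHABLE"


# Hypothetical topology: server-1 sits behind group-100-r3, which sits
# behind group-100-r1.
parents = {
    "group-100-r1": [],
    "group-100-r3": ["group-100-r1"],
    "server-1": ["group-100-r3"],
}
responds = {"group-100-r1": False, "group-100-r3": False, "server-1": False}
for name in parents:
    print(name, classify(name, responds, parents))
```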

Figure 10: Nagios hosts.cfg file showing real host definitions

Hosts can be defined to be a member of one or more host groups. This then makes subsequent configuration more scalable (for example, a service can be applied to a host group rather than to individual hosts). Host groups are typically defined in hosts.cfg.

Figure 11: Nagios hosts.cfg host group definitions

Host groups are also used in the GUI to display data based on host groups.

Figure 12: Nagios Hostgroup summary

Whenever changes have taken place to any configuration file, the command:
/etc/init.d/nagios reload
should be used. This does not stop and start the Nagios processes (use stop | start | restart | status to control the background processes); the reload parameter simply re-reads the configuration file(s). There is also a handy command to verify that your configuration files are legal and consistent, before actually performing the reload:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
All objects to be managed need defining in the Nagios configuration files; there is no form of automatic discovery. However, the ability to create object templates, and thus an object hierarchy, makes definitions flexible and easy once you have defined your hierarchies.

A great benefit of this configuration file is the ability to denote the network devices that provide access to specific nodes (parent / child relationship). This means that a map hierarchy can be displayed and also means that node reachability is encoded. If, for example, all nodes on the 172.31.100.32 network inherit from a template that includes a parents group-100-r3 stanza, when group-100-r3 goes down then Nagios knows that all nodes in that network are unreachable (rather than down). Defining multiple parents for a meshed network seemed problematical though.
Nagios automatically generates a topology map, based on the parents stanzas in the configuration files. Colour coding provides status for nodes.

Figure 13: Nagios Status map

6.2 Availability monitoring


Nagios availability monitoring focuses much more on systems than on networks. Nagios provides a large number of official plugins for monitoring; in addition there are other community plugins available, or you can write your own. The official plugins should be installed alongside the base Nagios. The executables can be found in /usr/local/nagios/libexec (use <plugin name> --help for usage on each plugin). The official plugins include:
    check_ping       configurable ping test with warning & critical thresholds
    check_snmp       generic SNMP test to get MIB OIDs & test return values
    check_ifstatus   check SNMP ifOperStatus against ifAdminStatus for all
                     administratively up interfaces
    check_ssh        check that the ssh port can be contacted on a remote host
    check_by_ssh     use ssh to run command on remote host
    check_nt         check Windows parameters (disk, cpu, services, etc.). Needs
                     NSClient++ agent installed on Windows targets
    check_nrpe       check remote Linux parameters (disk, cpu, processes, etc.).
                     Needs NRPE agent installed on Unix / Linux target
Nagios has two separate concepts, host monitoring and service monitoring, and there is a known relationship between the state of the host and the state of its services.
Host monitoring is a reachability test and will generally use the check_ping Nagios plugin. If you have devices that support SNMP but do not support ping (perhaps because there is a firewall in the way that blocks ping), then the check_ifstatus plugin works well to test all interfaces on a device, comparing the SNMP administrative status with the operational status. Host monitoring is defined in the Nagios configuration files with the check_command stanza, where typically this is defined at a high level of the host definition hierarchy but can be overridden for sub-groups or specific hosts. For example, in hosts.cfg:
define host {
host_name group-100-a1
use host_172.31.100 ;Inherits from this parent class
parents group-100-r2 ;This is n/w route to device
alias group-100-a1.class.example.org
address group-100-a1.class.example.org
check_command check_ifstatus ;SNMP status check, not ping
}

A summary of host status is given on the Tactical Overview display. The Host Detail display then gives further information for each device. The hosts monitored using check_ping show the Round Trip Average (RTA). Note that group-100-a1 is monitored using the check_ifstatus plugin so shows different Status Information.

Figure 14: Nagios Host Detail display

Availability monitoring, especially for computers rather than network devices, can mean many things. Nagios provides many plugins for port monitoring, including generic TCP and UDP monitors. The check_snmp plugin could be used to check SNMP parameters from the Host Resources MIB (if a target supports this). Nagios also provides remote agents, NSClient++ for Windows and NRPE for Unix / Linux systems, which provide a much more customisable definition of system monitoring.
Services are typically defined in services.cfg. As with host definitions, services can be defined in a class hierarchy where characteristics of an object are inherited from its parent.

Figure 15: Nagios services.cfg top-level objects

Again, note the check_period, max_check_attempts, normal_check_interval and retry_check_interval stanzas. More specific service definitions can then be defined, inheriting characteristics of parents through the use stanza:

Figure 16: Nagios services.cfg showing specific services

Note that services can be applied either to groups of hosts (hostgroup_name) or to specific hosts (host_name).
As with hosts, it is possible to create groups of services to improve the flexibility of configuration and the display of services.
Also note that some services run commands that are inherently local to the Nagios system, e.g. check_local_disk. The check_dns command runs nslookup on the Nagios system but the host_name parameter can be used to specify the DNS server to query from. The commands are actually specified in the configuration file commands.cfg, which, in turn, calls executable plugins in /usr/local/nagios/libexec.

Figure 17: Nagios Service detail

Service dependencies are an advanced feature of Nagios that allow you to suppress notifications and active checks of services based on the status of one or more other services (that may be on other hosts).
Both host and service monitoring can be configured to generate events on failure (and this is the default).

6.3 Problem management


Nagios's event system displays events generated by Nagios's own host and service monitors. There is no built-in capability to collate events received as SNMP TRAPs or syslog messages. When an event is generated, it can be configured so that notification(s) are generated to one or more users or groups of users. It is also possible to create automated responses to events (typically scripts).
Note that Nagios tends to use the terms event and alert interchangeably.

6.3.1 Event console


The Nagios Event Log is displayed from the left-hand menu:

Figure 18: Nagios Event Log

By default, the event log is displayed in one-hourly sections. The log shows the event status and also shows whether a Notification has been generated (the megaphone symbol). This display is effectively simply showing /usr/local/nagios/var/nagios.log.

Under the Reporting heading on the left-hand menu, there are further options to display information on events (alerts). The Alert History is effectively the same as the Event Log. The Alert Histogram produces graphs for either a host or service with customisable parameters.

Figure 19: Nagios Configuration for Alert Histogram

Note in the figure above that a host / service selection has already been prompted for and, having selected host, the specific host has been supplied. The following figure shows the resulting graph. Note the blue links towards the top left of the display providing access to a filtered view of the events log (View History for this Host) and to notifications for this host.

Figure 20: Nagios Alert Histogram for host group-100-r1

The Alert Summary menu option can provide various reports, specific to hosts or services.

Figure 21: Nagios Alert Summary configuration options

Limiting the report to a specific host, group-100-r1, produces the following report.

Figure 22: Nagios Alert Summary for group-100-r1

6.3.2 Internally generated events


Nagios has the concept of soft errors and hard errors to allow for occasional glitches in host and service monitoring. Any host or service monitor can specify or inherit parameters for the check interval under OK conditions, the check interval under non-OK conditions and the number of check attempts that will be made.
Host parameters
    check_interval           default 5 mins (check interval when host OK)
    retry_interval           default 1 min (check interval when host non-OK)
    max_check_attempts       default 4 (number of attempts before HARD event)
Service parameters
    normal_check_interval    default 10 mins
    retry_check_interval     default 2 mins
    max_check_attempts       default 3 (number of attempts before HARD event)
When a non-OK status is detected, a soft error is generated for each sampling interval until max_check_attempts are exhausted, after which a hard event will be generated. At this point, the polling interval reverts to the check_interval rather than the retry_interval.
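The soft / hard progression can be sketched as a small state tracker. This is a simplified model for illustration, not the Nagios scheduler: it assumes that a recovery before max_check_attempts is reached counts as a SOFT recovery, and it ignores the influence of host state on service checks. The function name and return format are hypothetical.

```python
def track(results, max_check_attempts=3, check_interval=5, retry_interval=1):
    """Return (status, state_type, minutes_to_next_check) for each check result."""
    history = []
    attempt = 0  # consecutive non-OK results seen so far, capped at the maximum
    for status in results:
        if status == "OK":
            # Recovering before the limit was reached is a SOFT recovery.
            state_type = "SOFT" if 0 < attempt < max_check_attempts else "HARD"
            attempt = 0
            next_check = check_interval
        else:
            attempt = min(attempt + 1, max_check_attempts)
            state_type = "HARD" if attempt == max_check_attempts else "SOFT"
            # Retry quickly while SOFT; revert to the normal interval once HARD.
            next_check = check_interval if state_type == "HARD" else retry_interval
        history.append((status, state_type, next_check))
    return history


for entry in track(["OK", "DOWN", "DOWN", "DOWN", "DOWN", "OK"]):
    print(entry)
```

With max_check_attempts set to 3, the third consecutive failure is the first HARD one, and only from that point would a notification be considered.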

Figure 23: Nagios Event Log showing hard and soft events

Note from the earlier figure showing the topology layout that group-100-r3 sits behind group-100-r1. Each of these host devices is being polled every 5 minutes when in an OK state (or max_check_attempts has been exceeded) and every 1 minute when a problem has arisen. The actual problem that has caused the event log shown above is that group-100-r1 has failed; however, group-100-r3 is polled first and results in the first event for this device with a status of DOWN and a state type of SOFT. Subsequently, group-100-r1 is polled and found to be DOWN, which results in the associated poll to group-100-r3 receiving a status of UNREACHABLE and a state type of SOFT. The third poll of group-100-r3 again has a status of UNREACHABLE and a state type of SOFT.
The next event for group-100-r3 is a service ping monitor (which runs every 5 minutes for this device). Note that this event has a state type of HARD; this is because Nagios knows that the host status associated with this service monitor is already UNREACHABLE (or DOWN).
The fourth event results in a state type of HARD and the status of UNREACHABLE. The hard event also generates a notification.

6.3.3 SNMP TRAP reception and configuration


Nagios's own documentation says that it is not a replacement for a full-blown SNMP management application. It has no simple way to receive SNMP TRAPs or to parse them.
It is possible to integrate SNMP TRAPs by sending them to Nagios as passive checks but this will require significant effort. The documentation suggests using a combination of net-snmp and the SNMP TRAP Translator (SNMPTT) packages.

6.3.4 Nagios notifications


In Nagios, the terms event and alert are used interchangeably.
There is a comprehensive mechanism for notifications which is driven by parameters on the host and service checks. There is also configuration for notifications on a per-contact basis; each check can have a contact_groups stanza specifying who to contact. Contacts can appear in several different contact groups (although only a single notification will be sent to any individual). Notifications are only generated for HARD status type events, not SOFT ones.
Whether notifications are sent depends on the following parameters / characteristics (in this order):
    notifications_enabled
        global on/off parameter
    Each host / service can have scheduled downtime
        no notifications in downtime
    Each host / service can be flapping
        no notifications if flapping
    Host notification_options (d,u,r)
        specifies notifications on down, unreachable, recovery events
    Service notification_options (w,u,c,r)
        specifies notifications on service warning, unknown, critical, recovery events
    Host / service notification_period
        notifications only sent during this period (e.g. 24x7, workdays, ...)
    Host / service notification_interval
        if notification already sent, problem still extant and notification_interval
        exceeded then send another notification
Once each of these filters for notification has been tested and passed, contact filters are then applied for each contact in the group(s) indicated in the host or service contact_groups stanza. Here is the default definition:

Figure 24: Nagios Default contact definition

Notifications for hosts and services can be sent 24x7. They are sent for all types of events and use a Nagios command that drives the email system. As with all other Nagios configurations, more specific users and groups of users can be defined which change any of these parameters.
An event has to satisfy the global criteria, the specific host / service criteria and the contact criteria, before a notification is actually sent.
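The ordered filtering described in this section can be sketched as a chain of checks, each of which can veto the notification. This is an illustrative model only, not Nagios code: the Event class, the function name and the use of hours of the day to represent the notification period are all hypothetical, and the re-notification interval and per-contact filters are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Event:
    state: str                 # e.g. "DOWN", "UNREACHABLE", "UP"
    state_type: str            # "SOFT" or "HARD"
    in_downtime: bool = False
    flapping: bool = False
    hour: int = 12             # hour of day at which the event occurred (0-23)


def notification_decision(event, notifications_enabled=True,
                          notification_options=("DOWN", "UNREACHABLE", "UP"),
                          notification_hours=range(0, 24)):
    """Apply the host / service filters in order; return (send?, reason)."""
    checks = [
        (notifications_enabled, "notifications disabled globally"),
        (event.state_type == "HARD", "SOFT state"),
        (not event.in_downtime, "scheduled downtime"),
        (not event.flapping, "flapping"),
        (event.state in notification_options, "state not in notification_options"),
        (event.hour in notification_hours, "outside notification_period"),
    ]
    for passed, reason in checks:
        if not passed:
            return False, reason
    return True, "notify contacts"


print(notification_decision(Event("DOWN", "HARD")))
print(notification_decision(Event("DOWN", "HARD", hour=3),
                            notification_hours=range(8, 18)))
```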
Remember from the Alert Histogram report that it is possible to see notifications for a particular host.

Figure 25: Nagios Host Notifications
6.3.5 Automatic responses to events - event handlers
Nagios can run automatic actions (event handlers) when a service or host:
    - Is in a SOFT problem state
    - Initially goes into a HARD problem state
    - Initially recovers from a SOFT or HARD problem state
There is a global parameter, enable_event_handlers, which must take the value 1 (true) before any automation can take place.
There are two global parameters, global_host_event_handler and global_service_event_handler, which can be used to run commands on all host / service events. These might be used, say, to log all events to an external file.
In addition, individual hosts and services (or groups of either) can have their own event_handler directive and their own event_handler_enabled directive. Note that if the global enable_event_handlers is off then no individual host / service will run event handlers. Individual event handlers will run immediately after any global event handler.
Typically, an event handler will be a script or program, defined in the Nagios commands.cfg file, to run any external program. The following parameters will be passed to the event handler:
    For Services:  $SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$
    For Hosts:     $HOSTSTATE$, $HOSTSTATETYPE$, $HOSTATTEMPT$
Event handler scripts will run with the same user privilege as that which runs the nagios program.
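As an illustration of what such a handler might do with the three service parameters, here is a minimal sketch in Python. It is not one of the samples shipped with Nagios; the policy (act on the third SOFT attempt or on a HARD state) and the restart callable are assumptions made for the example.

```python
def handle_service_event(state, state_type, attempt, restart):
    """Decide what to do for one service event.

    state, state_type and attempt correspond to $SERVICESTATE$,
    $SERVICESTATETYPE$ and $SERVICEATTEMPT$; `restart` is a hypothetical
    callable that would restart the affected service.
    """
    if state != "CRITICAL":
        return "no action"
    if state_type == "SOFT" and attempt == 3:
        # Try a fix just before the problem would turn into a HARD state.
        restart()
        return "restarted (3rd soft attempt)"
    if state_type == "HARD":
        restart()
        return "restarted (hard state)"
    return "waiting"


def main(argv, restart=lambda: None):
    """Entry point: argv holds the three macro values passed by the command."""
    state, state_type, attempt = argv[1], argv[2], int(argv[3])
    return handle_service_event(state, state_type, attempt, restart)


print(main(["handler", "CRITICAL", "SOFT", "3"]))
```

A real handler would be invoked through a command definition in commands.cfg that passes the macros as arguments, and would need to run correctly under the privileges of the nagios user.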
Sample event handler scripts can be found in the contrib/eventhandlers/ subdirectory of the Nagios distribution. Here is the sample submit_check_result command:

Figure 26: Nagios Sample submit_check_result command for event handler from contrib directory

6.4 Performance management


Nagios does not have performance data collection and reporting out of the box; however, it does provide configuration parameters such that any host check or service check may also return performance data, provided the plugin supplies such data. This data can then either be processed by a Nagios command or the data can be written to a file to be processed asynchronously, either by a Nagios command or by some other mechanism; mrtg, RRDTool and Cacti may all be contenders for the post-processing.
There are a number of global parameters that control the collection of performance data, typically in /usr/local/nagios/etc/nagios.cfg:
    process_performance_data                     global on/off switch
    host_perfdata_command                        Nagios command to be executed on data
    service_perfdata_command                     Nagios command to be executed on data
    host_perfdata_file                           data file for asynchronous processing
    service_perfdata_file                        data file for asynchronous processing
    Note: either use the command parameter for data processing when the data
    is retrieved, or use the data file for later processing
    host_perfdata_file_processing_interval       process data file every <n> seconds
    service_perfdata_file_processing_interval    process data file every <n> seconds
    host_perfdata_file_processing_command        Nagios command to process data
    service_perfdata_file_processing_command     Nagios command to process data
    host_perfdata_file_template                  format of data file
    service_perfdata_file_template               format of data file

Figure 27: Nagios Performance parameters in nagios.cfg

The default is that process_performance_data=0 (i.e. off) and all the other parameters are commented out.

In addition to the global parameters, each host and service needs to either explicitly configure or inherit a definition for:
    process_perf_data=1        1 = data collection on, 0 = data collection off
By default, the generic_host and generic_service template definitions set these parameters to 1 (on).
If a Nagios plugin is able to provide performance data, it is returned after the usual status information, separated by a | (pipe) symbol. It can be retrieved as the $HOSTPERFDATA$ or $SERVICEPERFDATA$ macro. It is then up to your Nagios commands to interpret and manipulate that data.
The next figure shows performance data that has been gathered into /tmp/service-perfdata using the default service_perfdata_file_template, where the last field is the $SERVICEPERFDATA$ value (if the plugin delivers performance data).
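Since interpreting the performance data is left to your own commands, a sketch of that interpretation step may help. The following Python assumes the common label=value[unit];warn;crit;min;max layout used by the standard plugins; the sample output line is illustrative, and labels that contain spaces (which must be quoted) are not handled.

```python
import re


def split_plugin_output(output):
    """Split one line of plugin output into (status text, performance data)."""
    status, _, perfdata = output.partition("|")
    return status.strip(), perfdata.strip()


def parse_perfdata(perfdata):
    """Return {label: (value, unit)} for each space-separated perfdata item."""
    metrics = {}
    for item in perfdata.split():
        label, _, fields = item.partition("=")
        # The first ';'-separated field is the value with an optional unit.
        match = re.match(r"([-+]?[0-9]*\.?[0-9]+)(.*)$", fields.split(";")[0])
        if match:
            metrics[label] = (float(match.group(1)), match.group(2))
    return metrics


# Illustrative output in the style of a ping check.
sample = ("PING OK - Packet loss = 0%, RTA = 0.40 ms"
          "|rta=0.400000ms;100.000000;500.000000;0.000000 pl=0%;20;60;0")
status, perfdata = split_plugin_output(sample)
print(status)
print(parse_perfdata(perfdata))
```

Values parsed this way could then be handed to a graphing tool of your choice.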

Figure 28: Nagios Performance data collected into /tmp/service-perfdata

The most recent performance data gathered for hosts and services can also be seen from the Host Detail or Service Detail menu options.

Figure 29: Nagios Performance data highlighted DNS Check service

6.5 Nagios summary


Nagios is a mature systems management tool whose documentation is much better than the other open source offerings. Its strength is in checking availability of hosts and services that run on those hosts. Support for network management is less strong as there is no automatic discovery; however, it is possible to configure simple network topologies and it includes the concept of a set of devices being UNREACHABLE (rather than DOWN) if there is a network single point of failure. Handling meshed networks with multiple routing paths to a network is problematical.
Since all monitoring is performed by plugins, some of which come with the product and some of which are available as community contributions, the tool is as flexible as anyone requires. There are a large number of plugins available and you can also write your own.
One of the standard plugins is check_snmp, which can be used to query any host for any SNMP MIB variable; this obviously requires the target to support SNMP and the MIB in question.

It is also possible to run checks on remote hosts by installing the NRPE agent (available for both Unix / Linux and Windows hosts) and the required Nagios plugins on the remote system. The check_nrpe plugin must also be installed on the Nagios system. This allows plugins designed to be run local to the Nagios system to be run on remote hosts. With NRPE agents, checks are run on a scheduled basis, initiated from the Nagios system.
Another alternative is to install the NSCA addon on remote systems. This permits remote machines to run their own periodic checks and report the results back to Nagios, which can be defined as passive service checks.
The event subsystem of Nagios is less powerful and configurable than some of the other offerings; it has less focus on an event console but includes more information about host and service events from other menus. Nagios has no easy built-in way to collect and process SNMP TRAPs.
If you want lots of performance graphs then Nagios alone is not going to deliver easily.
In summary, Nagios seems good for monitoring a relatively small number of systems, provided you don't need historical performance reporting.

7 OpenNMS
OpenNMS presents itself as the first Enterprise-grade network management platform developed under the Open Source model. It is a Java application that runs under several flavours of Linux. A VMware Virtual Machine (VM) is also available with the latest release of OpenNMS, which makes initial evaluation very easy without having to go through a full build process. There is also an online demo system, which appears to be monitoring real kit and gives a good first taste of the product.
The following section is based on the VM download, which is OpenNMS 1.5.93 based on Mandriva; it worked very easily. The VM was set up for DHCP but I modified the Operating System files to use a local fixed address, with the VM network bridged to my local environment.
To access the OpenNMS Web Console, point your browser at http://opennms:8980/opennms/ . The default logon id is admin with a password of admin.
Here is a screenshot of the main default window of OpenNMS.

Figure 30: Main default window for OpenNMS

The following sections will describe how to configure different aspects of OpenNMS by editing xml configuration files. It is possible to configure many aspects of OpenNMS using GUI-driven menus. See section 7.5 Managing OpenNMS for a brief description.

7.1 Configuration Discovery and topology


7.1.1 Interface discovery
OpenNMS uses a straightforward file for interface discovery; by default this is /opt/opennms/etc/discovery-configuration.xml. It comes with some commented-out defaults, so by default it discovers nothing! This file needs modifying to specify include ranges and exclude ranges to ping; specific IP addresses for discovery can also be configured. The first stanza specifies the characteristics of the ping discovery mechanism. If there is a response within the timeout, a "newsuspect" event is generated.
<discovery-configuration threads="1" packets-per-second="1"
        initial-sleep-time="300000" restart-sleep-time="86400000"
        retries="3" timeout="800">

    <include-range retries="2" timeout="3000">
        <begin>10.0.0.1</begin>
        <end>10.0.0.254</end>
    </include-range>
    <include-range>
        <begin>172.30.100.1</begin>
        <end>172.30.100.10</end>
    </include-range>
    <specific>10.191.101.1</specific>
</discovery-configuration>

In the above example, ping discovery will start 300,000 ms (5 minutes) after OpenNMS has started up; the discovery process will be restarted every 86,400,000 ms (24 hours); 1 ping will be sent per second; the timeout for a ping will be 800 ms and there will be 3 ping retries before the discovery process gives up on an address. All devices on the Class C 10.0.0.0 network will be polled (with only 2 retries but a 3 second timeout). The 10 devices 172.30.100.1 through 10 will be polled for with the default characteristics. The specific node 10.191.101.1 will be polled.
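To check how such a file will be interpreted, the configuration can be read with a few lines of Python. This is an independent sketch, not part of OpenNMS; it embeds a copy of the example configuration, treats range-level retries and timeout as overrides of the top-level defaults, and simply counts the addresses that would be pinged.

```python
import ipaddress
import xml.etree.ElementTree as ET

DISCOVERY_XML = """
<discovery-configuration threads="1" packets-per-second="1"
        initial-sleep-time="300000" restart-sleep-time="86400000"
        retries="3" timeout="800">
    <include-range retries="2" timeout="3000">
        <begin>10.0.0.1</begin>
        <end>10.0.0.254</end>
    </include-range>
    <include-range>
        <begin>172.30.100.1</begin>
        <end>172.30.100.10</end>
    </include-range>
    <specific>10.191.101.1</specific>
</discovery-configuration>
"""


def discovery_targets(xml_text):
    """List each range / specific address with its effective ping settings."""
    root = ET.fromstring(xml_text.strip())
    defaults = {"retries": int(root.get("retries")),
                "timeout": int(root.get("timeout"))}
    targets = []
    for rng in root.findall("include-range"):
        begin = ipaddress.ip_address(rng.findtext("begin").strip())
        end = ipaddress.ip_address(rng.findtext("end").strip())
        targets.append({
            "range": f"{begin}-{end}",
            "addresses": int(end) - int(begin) + 1,
            # Attributes on the range override the top-level defaults.
            "retries": int(rng.get("retries", defaults["retries"])),
            "timeout": int(rng.get("timeout", defaults["timeout"])),
        })
    for spec in root.findall("specific"):
        targets.append({"range": spec.text.strip(), "addresses": 1, **defaults})
    return targets


targets = discovery_targets(DISCOVERY_XML)
for target in targets:
    print(target)
print("total addresses:", sum(t["addresses"] for t in targets))
```

At one packet per second, the total gives a rough lower bound, in seconds, for one discovery pass before any retries are counted.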
All that the discover process does is to generate newsuspect events that are then used by other OpenNMS processes. If the device does not respond to this ping polling then it will not be added to the OpenNMS database.
Another way to generate such events (say for a box that does not respond to ping) is to use a provided Perl script:
/opt/opennms/bin/send-event.pl --interface <ip addr> uei.opennms.org/internal/discovery/newsuspect

7.1.2 Service discovery


When a newsuspect event has been generated by the discovery process it is the capabilities daemon, capsd, that takes over and discovers services on a system. capsd is configured using /opt/opennms/etc/capsd-configuration.xml. Thus, discovery in OpenNMS consists of two parts: discovering an IP address to monitor (the discover process) and then discovering the services supported by that IP address (the capsd process).
The basic monitored element is called an "interface", and an interface is uniquely identified by an IP address. Services are mapped to interfaces, and if a number of interfaces are discovered to be on the same device (either via SNMP or SMB) then they may be grouped together as a "node".
capsd uses a number of plugins supplied with OpenNMS to discover services. Each service has a <protocol-plugin> stanza in capsd-configuration.xml. For example:
<protocol-plugin protocol="SSH" class-name="org.opennms.netmgt.capsd.TcpPlugin"
        scan="on" user-defined="false">
    <property key="banner" value="SSH"/>
    <property key="port" value="22"/>
    <property key="timeout" value="3000"/>
    <property key="retry" value="1"/>
</protocol-plugin>

This defines a service (protocol) called SSH that tests TCP port 22 using the TCP plugin. It will look for the string SSH to be returned. Timeout is 3 seconds with 1 retry.
The first protocol entry in capsd-configuration.xml is for ICMP.
<protocol-plugin protocol="ICMP"
class-name="org.opennms.netmgt.capsd.IcmpPlugin" scan="on" user-defined="false">
<property key="timeout" value="2000"/>
<property key="retry" value="1"/>
</protocol-plugin>

It is possible to apply protocols to specific address ranges or exclude protocols from address ranges (the default is inclusion).
<protocol-plugin protocol="ICMP"
class-name="org.opennms.netmgt.capsd.IcmpPlugin" scan="on" user-defined="false">
<protocol-configuration scan="off" user-defined="false">
<range begin="172.31.100.1" end="172.31.100.15"/>
<property key="timeout" value="4000"/>
<property key="retry" value="3"/>
</protocol-configuration>
</protocol-plugin>

Note the scan="off" for IP addresses 172.31.100.1 to 172.31.100.15.
The SNMP protocol is special in that, if supported, it provides a way to collect performance data as well as poll for availability management information. SNMP parameters for different devices and ranges of devices are specified in /opt/opennms/etc/snmp-config.xml. Here is a sample:
<snmp-config retry="3" timeout="800" version="v1" port="161"
        read-community="public" write-community="private">
    <definition version="v2c">
        <specific>10.0.0.121</specific>
    </definition>
    <definition retry="2" timeout="1000">
        <range begin="172.31.100.1" end="172.31.100.254"/>
    </definition>
    <definition read-community="fraclmye" write-community="rrwatr">
        <range begin="10.0.0.1" end="10.0.0.254"/>
    </definition>
</snmp-config>

The first stanza in snmp-config.xml provides global default parameters for SNMP access. Variations in any of these global parameters can be made using a definition stanza and either a range or a specific statement. This file is used both for discovery and for collecting performance data.
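The layering of global defaults and per-definition overrides can be sketched in Python. This is an illustration, not OpenNMS code. It embeds a copy of the sample file and assumes that the first definition whose range or specific entry matches an address is the one applied; OpenNMS's actual precedence when several definitions match the same address may differ.

```python
import ipaddress
import xml.etree.ElementTree as ET

SNMP_XML = """<snmp-config retry="3" timeout="800" version="v1" port="161"
        read-community="public" write-community="private">
    <definition version="v2c">
        <specific>10.0.0.121</specific>
    </definition>
    <definition retry="2" timeout="1000">
        <range begin="172.31.100.1" end="172.31.100.254"/>
    </definition>
    <definition read-community="fraclmye" write-community="rrwatr">
        <range begin="10.0.0.1" end="10.0.0.254"/>
    </definition>
</snmp-config>"""


def snmp_parameters(address, xml_text=SNMP_XML):
    """Return the effective SNMP settings (as strings) for one IP address."""
    ip = ipaddress.ip_address(address)
    root = ET.fromstring(xml_text)
    params = dict(root.attrib)  # start from the global defaults
    for definition in root.findall("definition"):
        in_specific = any(ipaddress.ip_address(s.text.strip()) == ip
                          for s in definition.findall("specific"))
        in_range = any(
            ipaddress.ip_address(r.get("begin")) <= ip <= ipaddress.ip_address(r.get("end"))
            for r in definition.findall("range"))
        if in_specific or in_range:
            # Assumption: the first matching definition wins.
            params.update(definition.attrib)
            break
    return params


for address in ("172.31.100.7", "10.0.0.5", "192.168.1.1"):
    print(address, snmp_parameters(address))
```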

When testing SNMP, capsd makes an attempt to receive the sysObjectID MIB-2 variable (.1.3.6.1.2.1.1.2.0). If successful, then extra discovery processing takes place. First, three threads are generated to collect the data from the SNMP MIB-2 system tree and the ipAddrTable and ifTable tables. If, for some reason, the ipAddrTable or ifTable are unavailable, the process stops (but the SNMP system data may show up on the node page).
Second, all of the IP addresses in the ipAddrTable are run through the capsd capabilities scan. Note that this is regardless of how management is configured in the configuration file. This only happens on the initial scan and on forced rescans. On normal rescans (by default, every 24 hours), IP addresses that are "unmanaged" in capsd are not polled.
Third, every IP address in the ipAddrTable that supports SNMP is tested to see if it maps to a valid ifIndex in the ifTable. If this is true, the IP address is marked as a secondary SNMP interface and is a contender for becoming the primary SNMP interface.

Figure 31: OpenNMS node detail for a switch showing switch ports

The first stanza in capsd-configuration.xml defines service polling parameters:
<capsd-configuration rescan-frequency="86400000"
initial-sleep-time="300000"
management-policy="managed"
max-suspect-thread-pool-size = "6"
max-rescan-thread-pool-size = "3"
abort-protocol-scans-if-no-route = "false">

This defines that capsd will wait 5 minutes after OpenNMS starts before starting the capsd discovery process. It will rescan to discover services every 24 hours. The default management policy for all IP addresses found in newsuspect events will be to scan for each of the services. This managed parameter can be overridden at the end of capsd-configuration.xml by unmanaged range stanzas:
<ip-management policy="unmanaged">
<specific>0.0.0.0</specific>
<range begin="127.0.0.0" end="127.255.255.255"/>
</ip-management>

When a newsuspect event is generated, provided the IP address is in a managed management policy range, the IP address is checked for each of the services in capsd-configuration.xml, starting from the top.
If the device does not respond to any configured service then, even if triggered with send-event.pl, it will not be added to the OpenNMS database. Look in /opt/opennms/logs/daemon/discovery.log for debugging information.

7.1.3 Topology mapping and displays


OpenNMS does not use a topology mapping function in the core code (indeed, some of its proponents are vociferous that you do not need a mapping ability). There is a mapping capability if you use an Internet Explorer web browser with a specific Adobe Scalable Vector Graphics (SVG) plugin; this is only supported in IE and did not work for me. There is also a maps-on-firefox code branch but performance is said to be poor and the mail lists suggest that neither mapping capability is heavily used.
A Node List is available from the main menu where each node name is a link to a detailed node page.

Figure 32: OpenNMS Node List of discovered nodes

Figure 33: OpenNMS node detail for group-100-r1

Note the services that have been discovered for the node. The list of services per interface are those that have actually been detected; whether they are Monitored or not will be discussed in the next section.

7.2 Availability monitoring

OpenNMS performs availability monitoring by polling devices with processes known as monitors, which connect to a device and perform a simple test. Polling only happens to an interface that has already been discovered by capsd.
The configuration file for polling is /opt/opennms/etc/poller-configuration.xml. There are many similarities between this and capsd-configuration.xml; however, the monitors are defined with monitor service stanzas (rather than protocol stanzas), which define the Java class to use for monitoring.
<monitor service="DominoIIOP" class-name="org.opennms.netmgt.poller.DominoIIOPMonitor"/>
<monitor service="ICMP" class-name="org.opennms.netmgt.poller.IcmpMonitor"/>
<monitor service="Citrix" class-name="org.opennms.netmgt.poller.CitrixMonitor"/>
<monitor service="LDAP" class-name="org.opennms.netmgt.poller.LdapMonitor"/>
<monitor service="HTTP" class-name="org.opennms.netmgt.poller.HttpMonitor"/>
<monitor service="HTTP-8080" class-name="org.opennms.netmgt.poller.HttpMonitor"/>
<monitor service="HTTP-8000" class-name="org.opennms.netmgt.poller.HttpMonitor"/>
<monitor service="HTTPS" class-name="org.opennms.netmgt.poller.HttpsMonitor"/>
<monitor service="SMTP" class-name="org.opennms.netmgt.poller.SmtpMonitor"/>
<monitor service="DHCP" class-name="org.opennms.netmgt.poller.DhcpMonitor"/>
<monitor service="DNS" class-name="org.opennms.netmgt.poller.DnsMonitor" />
<monitor service="FTP" class-name="org.opennms.netmgt.poller.FtpMonitor"/>
<monitor service="SNMP" class-name="org.opennms.netmgt.poller.SnmpMonitor"/>
<monitor service="Oracle" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="Postgres" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="MySQL" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="Sybase" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="Informix" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="SQLServer" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="SSH" class-name="org.opennms.netmgt.poller.TcpMonitor"/>
<monitor service="IMAP" class-name="org.opennms.netmgt.poller.ImapMonitor"/>
<monitor service="POP3" class-name="org.opennms.netmgt.poller.Pop3Monitor"/>
<monitor service="NSClient" class-name="org.opennms.netmgt.poller.NsclientMonitor"/>
<monitor service="NSClientpp" class-name="org.opennms.netmgt.poller.NsclientMonitor"/>
<monitor service="Windows-Task-Scheduler" class-name="org.opennms.netmgt.poller.Win32ServiceMonitor"/>

Preceding the monitor service stanzas in poller-configuration.xml are the definitions of services. These look very similar to the entries in capsd-configuration.xml (which makes sense as these are the regular polling definitions for the same services that capsd has already found); however, parameters in the poller file may well take different values (for example, the discovery service may be allowed longer timeouts and more retries than the polling service).
<service name="ICMP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
</service>

<service name="SNMP" interval="300000" user-defined="false" status="off">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="161"/>
<parameter key="oid" value=".1.3.6.1.2.1.1.2.0"/>
</service>

Note that the default poller-configuration.xml has the SNMP monitor service turned off.
Services may be defined several times with different parameters; each service will obviously require a unique name. This is so that different devices can receive availability monitoring with different characteristics.
For availability polling, devices are grouped together in packages, where a package defines:
    - target interfaces
    - services including the polling frequency
    - a downtime model (which controls how the poller will dynamically adjust its polling on services that are down)
    - an outage calendar that schedules times when the poller is not to poll (i.e. scheduled downtime).
There are two packages defined in the default poller-configuration.xml file: example1 and a separate package, strafer, to monitor StrafePing. A package definition must include a single filter stanza; it may also have specific, include-range and exclude-range stanzas. Here is the start of the default, as shipped:
<package name="example1">
    <filter>IPADDR != '0.0.0.0'</filter>
    <include-range begin="1.1.1.1" end="254.254.254.254"/>

It is then followed by the list of services pertinent to that package; example1 includes many of the services, with each service set to status="on" except SNMP.
The opening stanza in poller-configuration.xml controls the overall behaviour of polling:
<poller-configuration threads="30"
        serviceUnresponsiveEnabled="false"
        nextOutageId="SELECT nextval('outageNxtId')"
        xmlrpc="false">
    <node-outage status="on"
            pollAllIfNoCriticalServiceDefined="true">
        <critical-service name="ICMP"/>
    </node-outage>

30 threads are available for polling. The basic event that is generated when a poll fails is called "NodeLostService". If more than one service is lost, multiple NodeLostService events will be generated. If all the services on an interface are down, instead of a NodeLostService event, an "InterfaceDown" event will be generated. If all the interfaces on a node are down, the node itself can be considered down, and this section of the configuration file controls the poller behaviour should that occur. If a "NodeDown" event occurs and node-outage status="on" then all of the InterfaceDown and NodeLostService events will be suppressed and only a NodeDown event will be generated. Instead of attempting to poll all the services on the down node, the poller will attempt to poll only the critical service. Once the critical service returns, the poller will then resume polling the other services.
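The escalation from service to interface to node events can be sketched as a small function. This is a simplified illustration, not the OpenNMS poller: the input structure and the event strings are hypothetical, and the critical-service polling behaviour is not modelled.

```python
def outage_events(node, node_outage_on=True):
    """Return the events raised for one node.

    node: dict of interface -> dict of service -> bool (True = service responded)
    """
    all_down = all(not up for services in node.values() for up in services.values())
    if all_down and node_outage_on:
        # A single NodeDown replaces every InterfaceDown / NodeLostService event.
        return ["NodeDown"]
    events = []
    for interface, services in node.items():
        if all(not up for up in services.values()):
            events.append(f"InterfaceDown {interface}")
        else:
            events.extend(f"NodeLostService {interface} {service}"
                          for service, up in services.items() if not up)
    return events


# Hypothetical node with two interfaces.
node = {
    "10.0.0.95": {"ICMP": True, "SSH": False, "HTTP": True},
    "172.16.1.1": {"ICMP": False},
}
print(outage_events(node))
```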
Note in the following screenshot that six services have been discovered on the
10.0.0.95 interface of the node called deodar.skills-1st.co.uk, of which four are
monitored. The two interfaces on the 172.16 network have been detected through
SNMP queries but there is no monitoring of any services on these networks. There
are no current issues with deodar and availability has been 100% over the last 24
hours.

Figure 34: OpenNMS node detail with monitored services

OpenNMS includes a standard set of Availability reports. They can be selected from
the Reports menu:

Figure 35: OpenNMS Availability reports menu

Here is a sample:

Figure 36: OpenNMS Overall service availability report

Note that there is an /opt/opennms/etc/examples directory with extra samples of all
the OpenNMS configuration files.
Also note that OpenNMS needs recycling if any configuration files have been modified.
Use:
/etc/init.d/opennms stop
/etc/init.d/opennms start

7.3 Problem management
For problem management, OpenNMS has the concepts of:
Events          all sorts of both good and bad news
Alarms          important events
Notifications   typically email or pager but could be other methods
The events subsystem is driven by the eventd process which listens on port 5817. Out
of the box, eventd receives internal events from OpenNMS (such as newSuspect
events) and SNMP TRAPs. It is also possible to configure other event sources
(such as syslogs).

7.3.1 Event console


Events can be viewed from the web GUI by selecting the Events option.

Figure 37: OpenNMS Events menu

The Advanced Search option provides several ways to filter events. By default,
Outstanding events are displayed (i.e. events that have not been Acknowledged).

Figure 38: OpenNMS Advanced Event Search options

Note that if you wish to search on severity, you have to specify an exact severity; you
cannot specify "severity greater than ...".

Figure 39: OpenNMS display of All events

The column headers can be clicked on to use as sort keys (ascending / descending).
The Ack box can be ticked to Acknowledge one or more events; they will then
disappear from this display, which only shows Outstanding events. Click on the
symbol beside Event(s) outstanding to see Event(s) Acknowledged, including the
name of the user that acknowledged the event.
The various [+] and [-] links can be used to filter in / out on the parameter (such as
node, interface, or service). The [<] and [>] beside the Time can be used to filter for
events before or after this time.
To see the event detail, click on the ID link.

Figure 40: OpenNMS Event detail for event 139192

7.3.2 Internally generated events


Events (and indeed alarms) are configured in /opt/opennms/etc/eventconf.xml, where
the first match for an event defines its characteristics. For this reason, the ordering of
stanzas in eventconf.xml is very important. Any individual event is identified by a
Universal Event Identifier (uei).
Events are bracketed by <event> </event> tags. Within the event definition, the
following tags can also be used:
uei             a label to uniquely identify the event
event-label     a text label for the event, used in the web GUI
descr           description of the event
logmsg          summary of the event, where the dest parameter is one of:
    logndisplay     log to events database and display in web GUI
    logonly         log to database but don't display in web GUI
    suppress        don't log to database or web GUI
    donotpersist    don't log or display but do pass to other daemons (e.g. for
                    notification)
    discardtraps    trapd to discard TRAPs; no processing whatsoever
severity
alarm-data      create an alarm for this event with
    reduction-key   fields to compare to determine duplicate event
    alarm-type      1=problem, 2=resolution. alarm-type=2 also takes a
                    clear-key parameter defining the problem event this resolves
    auto-clean      true or false
operinstruct    optional instructions for operators using the web GUI
mouseovertext   text to display when mouse positioned over this event
autoaction      absolute pathname to executable program, executed on every
                event instance
Many of the tags can use data substituted from the event. These are documented on
the OpenNMS wiki:

Figure 41: OpenNMS event parameters that can be substituted

Here is an example event from the default eventconf.xml:

Figure 42: OpenNMS event definition for nodeLostService

The different severities available can be seen by selecting the Severity Legend option
from the top of an events list.

Figure 43: OpenNMS event severity legend

Note that there is no separate file to configure alarms; it is simply done with the
<alarm-data> tag (and its alarm-type parameter) in eventconf.xml.
OpenNMS comes with a huge number of events predefined. To make eventconf.xml
much more manageable, inclusion files can be specified at the end, such as:
<event-file>events/NetSNMP.events.xml</event-file>
The events subdirectory currently has around 100 files in it! For performance reasons,
it makes sense to edit eventconf.xml and remove any <event-file> stanzas that are not
relevant for your organisation.
Also note that the whole OpenNMS system must be recycled in order for changes to
eventconf.xml to take effect!

7.3.3 SNMP TRAP reception and configuration


OpenNMS will automatically monitor the SNMP TRAP port (UDP/162) with the
trapd process. The /opt/opennms/etc/events directory contains around 100 files which
specify SNMP TRAP translations into OpenNMS events. If a TRAP is sent to
OpenNMS that it has no configuration for, then it will use a default mapping found in
default.events.xml.

Figure 44: OpenNMS Unknown trap appears in the Events list

Clicking on the event ID gives the detail of the event, which shows all the information
that arrived with the TRAP.

Figure 45: OpenNMS Event detail for an unformatted TRAP

TRAPs are configured in eventconf.xml (or an include file), using the <mask> tag.
This tag specifies mask elements with name / value pairs that must match data
delivered by the TRAP, in order for this particular event configuration to match.

Figure 46: OpenNMS Definition in default.events.xml for an unknown specific trap

This example event will match any TRAP whose generic field is equal to 6. Note, as
with other configurations in eventconf.xml, that this definition will only match the
incoming TRAP if no previous definition higher in the file (or include files) had already
matched it.
The mask element name tag must be one (or more) of the following:
uei
source
host
snmphost
nodeid
interface
service
id (OID)
specific
generic
It is possible to use the "%" symbol to indicate a wildcard in the mask values.
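
To make the first-match rule and the "%" wildcard concrete, here is a small Python sketch. It is a simplified model rather than OpenNMS code: the function names, the uei strings and the decision to treat "%" as a wildcard anywhere in the value are all assumptions made for illustration.

```python
import re

def value_matches(pattern, value):
    """Treat '%' in a mask value as 'any run of characters'."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value) is not None

def mask_matches(mask, trap):
    """mask maps a mask element name to its list of acceptable values.
    Every named element must match one of its values."""
    return all(
        any(value_matches(pattern, str(trap.get(name, ""))) for pattern in patterns)
        for name, patterns in mask.items()
    )

def first_match(definitions, trap):
    """definitions is an ordered list of (uei, mask); the first match wins."""
    for uei, mask in definitions:
        if mask_matches(mask, trap):
            return uei
    return None

# Hypothetical definitions: the most specific one is listed first.
definitions = [
    ("uei.example/vendorTrap", {"id": [".1.3.6.1.4.1.9%"], "generic": ["6"], "specific": ["1"]}),
    ("uei.example/defaultTrap", {"generic": ["6"]}),
]
trap = {"id": ".1.3.6.1.4.1.9.9.41", "generic": "6", "specific": "1"}
print(first_match(definitions, trap))
```

If the two definitions were listed in the opposite order, the catch-all would always win, which is why ordering in eventconf.xml matters so much.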
SNMP TRAPs often have additional data with them, known as varbinds. This data
can be accessed using the <parm> element, where:
Each parameter consists of a name and a value.
%parm[all]%: Will return a space-separated list of all parameter values in the
form parmName1="parmValue1" parmName2="parmValue2" etc.
%parm[values-all]%: Will return a space-separated list of all parameter
values associated with the event.
%parm[names-all]%: Will return a space-separated list of all parameter
names associated with the event.
%parm[<name>]%: Will return the value of the parameter named <name> if it
exists.
%parm[##]%: Will return the total number of parameters.
%parm[#<num>]%: Will return the value of parameter number <num>.
Any of this data can be used in the message or description fields.
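
The substitution rules listed above can be mimicked with a short Python function. This is a sketch of the documented behaviour, not the OpenNMS implementation; the function name expand_parms, 1-based parameter numbering and the choice to return an empty string for a missing parameter are assumptions.

```python
import re

def expand_parms(template, parms):
    """parms is an ordered list of (name, value) string pairs, in the
    order the varbinds arrived with the TRAP."""
    def replace(match):
        key = match.group(1)
        if key == "all":
            return " ".join(f'{name}="{value}"' for name, value in parms)
        if key == "values-all":
            return " ".join(value for _, value in parms)
        if key == "names-all":
            return " ".join(name for name, _ in parms)
        if key == "##":
            return str(len(parms))
        if key.startswith("#") and key[1:].isdigit():
            index = int(key[1:]) - 1          # assumed to count from 1
            return parms[index][1] if 0 <= index < len(parms) else ""
        return dict(parms).get(key, "")       # lookup by parameter name
    return re.sub(r"%parm\[([^\]]+)\]%", replace, template)

parms = [("ifIndex", "3"), ("ifDescr", "eth0")]
print(expand_parms("Link %parm[#2]% (index %parm[ifIndex]%) of %parm[##]%", parms))
```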
In addition, the varbind data can also be used to filter the event within the <mask>
tags, following the <maskelement> tags. It is possible to match more than one
varbind, and more than one value per varbind. For example:
<varbind>
    <vbnumber>3</vbnumber>
    <vbvalue>2</vbvalue>
    <vbvalue>3</vbvalue>
</varbind>
<varbind>
    <vbnumber>4</vbnumber>
    <vbvalue>2</vbvalue>
    <vbvalue>3</vbvalue>
</varbind>

The above code snippet will match if the third parameter has a value of "2" or "3" and
the fourth parameter has a value of "2" or "3". It is also possible to use regular
expressions when matching varbind values.
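
The "or within a varbind, and across varbinds" logic is easy to get wrong, so here is a minimal Python sketch of it. The function names are hypothetical, and the convention that a value starting with "~" is treated as a regular expression is an assumption made for this example; check the OpenNMS documentation for the exact syntax your version expects.

```python
import re

def value_ok(pattern, actual):
    """Exact match, or a regular expression search if the pattern
    starts with '~' (an assumed convention for this sketch)."""
    if pattern.startswith("~"):
        return re.search(pattern[1:], actual) is not None
    return pattern == actual

def varbinds_match(constraints, values):
    """constraints maps a 1-based varbind number to its allowed values;
    values is the ordered list of varbind values from the TRAP.
    Every listed varbind must match at least one of its values."""
    for number, allowed in constraints.items():
        if number > len(values):
            return False
        if not any(value_ok(pattern, values[number - 1]) for pattern in allowed):
            return False
    return True

# Mirrors the snippet above: varbind 3 in {2, 3} AND varbind 4 in {2, 3}.
constraints = {3: ["2", "3"], 4: ["2", "3"]}
print(varbinds_match(constraints, ["a", "b", "2", "3"]))
```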
Again, note that the order in which events are listed is very important. Put the most
specific events first.
Here is an example definition that includes matching a varbind with a regular
expression. Note the <vbvalue> matches any string that contains either Bad or bad.
Extra stanzas have also been added for <operinstruct> help (which provides a web
link on one line and plain text on the second), a <mouseovertext> tag (which doesn't
appear to work) and a tag to run an automatic action (a shell script) whenever this
event occurs.

Figure 47: OpenNMS Configuration of specific TRAP with varbind matching a regular expression

If you have SNMP TRAP definitions in a mib file, the open source utility
mib2opennms can be obtained to convert SNMP V1 TRAPs and SNMP V2
NOTIFICATIONS into an OpenNMS event configuration xml file. For a source file
vcs.mib in /home/jane, use:
mib2opennms -f /opt/opennms/etc/events/vcs.events.xml -m /home/jane vcs.mib

7.3.4 Alarms, notifications and automations


In OpenNMS you can add an <alarm-data> tag to an event configuration to create an
alarm. Alarms are defined as Important Events and have a separate display. It is
similar to the Events display in that you can select All Alarms or you can specify a
search to filter for particular alarms.

Figure 48: OpenNMS Alarms display

Alarms are defined as part of an event definition in eventconf.xml and its include files.
It uses the <alarm-data> tag where:
reduction-key   fields to compare to determine duplicate event
alarm-type      1=problem, 2=resolution. alarm-type=2 also takes a
                clear-key parameter defining the problem event this resolves
auto-clean      true or false. True ensures that all events other than the
                latest one, that match the reduction-key, are removed (very useful
                for clearing out duplicate events)
One of the key characteristics of an alarm that differentiates it from an event is the
reduction-key field, which should ensure that duplicate events are treated as one
event with multiple instances, rather than as multiple events.
Most of the information provided with an event is also available in the Alarm display.
The new field is Count, which shows the number of duplicate events that have been
integrated into this alarm. To see the individual events, click on the number in the
Count column.
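
The effect of the reduction-key can be pictured as grouping events by a key and counting them. The Python sketch below is only an illustration of that idea; the field names (id, reduction_key) and the function name reduce_events are assumptions, not the OpenNMS schema.

```python
def reduce_events(events):
    """Fold a stream of events into alarms keyed by reduction key.
    Each alarm keeps a count and the ids of the events behind it."""
    alarms = {}
    for event in events:
        alarm = alarms.setdefault(
            event["reduction_key"], {"count": 0, "event_ids": []}
        )
        alarm["count"] += 1
        alarm["event_ids"].append(event["id"])
    return alarms

# Hypothetical events; the key combines the uei, node and interface.
events = [
    {"id": 101, "reduction_key": "nodeLostService:21:10.0.0.95:SSH"},
    {"id": 102, "reduction_key": "nodeLostService:21:10.0.0.95:SSH"},
    {"id": 103, "reduction_key": "nodeLostService:22:10.0.0.96:SSH"},
]
for key, alarm in reduce_events(events).items():
    print(key, alarm["count"], alarm["event_ids"])
```

Three events become two alarms; the first alarm has a Count of 2, and the stored event ids correspond to what you see when clicking on the Count column.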

At present (July 10th, 2008), acknowledging events has no effect on related alarms,
and vice versa. Note that the concepts of Acknowledging and Clearing are
completely different. An operator can acknowledge an event or an alarm, and then
owns it. This does not clear the event (i.e. remove it entirely from the events
database).
Automatic actions can be configured for an event using the <autoaction> tag but this
can only run an executable and it runs on every occurrence of the event (which may
not be what you want!).
OpenNMS's concept of automation, however, is triggered from alarms rather than
events. Automation is the concept of actions being performed on a scheduled basis,
provided the correct triggers exist. An <automation> tag includes:
name            the name of the automation
interval        the frequency in milliseconds at which the automation runs
trigger-name    a string that references a trigger definition
action-name     a string that references an action definition
The triggers and actions are SQL statements that operate on the events database.
Automation is defined in /opt/opennms/etc/vacuumd.xml where there are a number of
useful rules, by default:

Figure 49: OpenNMS Default definitions for automations in vacuumd.xml

Note that automations always require an action-name but do not necessarily need a
trigger-name.
The cosmicClear automation is the means by which an <alarm-data> alarm-type=2
tag in eventconf.xml can clear bad news events when good news events arrive.
Here is the definition of the selectResolvers trigger-name:

Figure 50: OpenNMS Definition of selectResolvers trigger in vacuumd.xml

... and the clearProblems action:

Figure 51: OpenNMS Definition of clearProblems action in vacuumd.xml

The trigger is keyed on the field alarmType=2. Note that the first version of the
action is commented out; the clear-uei element is now deprecated in the <alarm-data>
tag and only the clear-key element on the good news event is used to match
against the reduction-key element of the bad news event, setting the severity to 2
(i.e. Cleared). Also note from the <automation> tag that cosmicClear will run every 30
seconds.
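
In OpenNMS the trigger and action are SQL statements, but the matching they perform can be sketched in Python. This is a deliberately simplified model (it ignores event timestamps, for instance); the field names and the function name cosmic_clear are assumptions chosen for readability.

```python
CLEARED = 2  # severity value for Cleared, as noted above

def cosmic_clear(alarms):
    """Set severity to Cleared on every problem alarm (alarm_type 1)
    whose reduction key equals the clear key of a resolution alarm
    (alarm_type 2). Returns the number of alarms cleared."""
    # Trigger: select the resolution alarms.
    clear_keys = {a["clear_key"] for a in alarms if a["alarm_type"] == 2}
    # Action: clear the matching problem alarms.
    cleared = 0
    for alarm in alarms:
        if alarm["alarm_type"] == 1 and alarm["reduction_key"] in clear_keys:
            alarm["severity"] = CLEARED
            cleared += 1
    return cleared

alarms = [
    {"alarm_type": 1, "reduction_key": "nodeDown:21", "clear_key": None, "severity": 6},
    {"alarm_type": 1, "reduction_key": "nodeDown:22", "clear_key": None, "severity": 6},
    {"alarm_type": 2, "reduction_key": "nodeUp:21", "clear_key": "nodeDown:21", "severity": 3},
]
print(cosmic_clear(alarms), [a["severity"] for a in alarms])
```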
If users need to be notified of an event then OpenNMS provides email and pager
notifications out of the box, run by the notifd daemon. It is also possible to create
other notification methods such as SNMP TRAPs or an arbitrary external program.
There are several related configuration files in /opt/opennms/etc:
destinationPaths.xml                who, when, how to notify / escalate
notifd-configuration.xml            global parameters for notifd
notificationCommands.xml            notification methods: email, http, page
notifications.xml                   what events generate notifications, where
javamail-configuration.properties   configuration for java emailer (default)
The main files that will need attention are destinationPaths.xml,
notifd-configuration.xml and notifications.xml. Here is part of the examples file provided
in /opt/opennms/etc/examples/destinationPaths.xml:

Figure 52: OpenNMS Example entries in destinationPaths.xml

The <name> tag specifies a user or group of users defined in OpenNMS. The
<command> tag specifies a method that must be defined in
notificationCommands.xml. Note that escalations are possible.
When an event is received for which a notification is required, OpenNMS "walks" the
destination path. We say that the destination path is "walked" because it is often a
series of actions performed over time and not necessarily just a single action (although
it can be). The destination path continues to be walked until all notifications and
escalations have been sent or the notification is acknowledged (automatically or by
manual intervention).
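
"Walking" a destination path can be modelled as stepping through a time-ordered list of targets until an acknowledgement arrives. The Python below is a toy model under that assumption; the tuple layout, the group name Managers and the function name walk_path are invented for the example.

```python
def walk_path(steps, acknowledged_at=None):
    """steps is a list of (delay_seconds, recipient, command).
    Returns the (recipient, command) pairs sent before the notification
    is acknowledged; acknowledged_at=None means it never is."""
    sent = []
    for delay, recipient, command in sorted(steps):
        if acknowledged_at is not None and delay >= acknowledged_at:
            break  # acknowledgement stops the walk; no further escalation
        sent.append((recipient, command))
    return sent

# Initial target immediately, then an escalation after 15 minutes.
path = [(0, "Admin", "javaEmail"), (900, "Managers", "javaEmail")]
print(walk_path(path, acknowledged_at=300))   # only Admin is notified
print(walk_path(path))                        # both steps are sent
```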
Out of the box, the only destinationPath that is configured is for javaEmail to the
Admin group of users.
The notifications.xml file specifies what events trigger notifications and to whom. Here
is an example from the default file:

Figure 53: OpenNMS Extract of notifications from notifications.xml

The notification called interfaceDown is turned on; it applies to all interfaces other
than 0.0.0.0; the notification is sent to the destination Email-Admin (defined in
destinationPaths.xml) and the text message of the email includes 3 parameters from
the event; 4 parameters are included in the email subject. The default
notifications.xml generates email to the Admin group for the following events:
interfaceDown
nodeDown
nodeLostService
nodeAdded
interfaceDeleted
HighThreshold
LowThreshold
HighThresholdRearmed
LowThresholdRearmed
Nothing, so far, has handled acknowledging notifications. This can either be done
manually by a user or can be performed automatically. Either way, when a
notification is acknowledged, it stops the destination path being walked for the
original notification. It will also create a new notification to tell users that the original
issue is resolved. Automatic acknowledgements are configured
in /opt/opennms/etc/notifd-configuration.xml where <auto-acknowledge> tags specify
the uei resolution / problem events, along with the parameters on the event which
must also match for the notification to be automatically acknowledged.

Figure 54: OpenNMS notifd-configuration.xml with auto-acknowledgements for notifications

Note that at present (July 2008) notifications are driven by events, not alarms. Also
note that acknowledging notices has no effect on their associated events or alarms.
It would appear that there has been a discussion of a change in architecture around
events, alarms and notifications, at least throughout 2008. In the future, it is
suggested that alarms will be where most automation is driven from, including
notifications, and that events will become more of a background log.

7.4 Performance management


7.4.1 Defining data collections
There are several parallels between the capability discovery subsystem and the
performance data collection subsystem. Each uses the snmp-config.xml file, described
in section 7.1.2, to get SNMP parameters for each device such as SNMP version, port
number, community names.
The capability discovery process, capsd, uses the protocol definitions in
capsd-configuration.xml to determine what services (capabilities) to discover; these are
things like SNMP, DNS, ICMP, SSH. The performance data collection process,
collectd, uses 2 files to define what data to collect:
datacollection-config.xml specifies collection names (just the snmp-collection
called default out of the box), which define (typically MIB) values to collect
collectd-configuration.xml specifies packages for collection. A package combines
filters and ranges to determine which interfaces collections should be applied to,
with services which reference collections in datacollection-config.xml.
collectd-configuration.xml can also specify data collection intervals and whether the
collection is active.

Note that if a device has several interfaces that:
Support SNMP
Have a valid ifIndex
Are included in a collection package in collectd-configuration.xml
then the lowest IP address is marked as primary and will be used by default for all
performance data collection.
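
The primary interface rule can be expressed compactly in Python. This is a sketch of the rule as stated above, not the capsd code; the dictionary keys and the function name primary_interface are assumptions.

```python
import ipaddress

def primary_interface(interfaces):
    """Return the eligible interface with the numerically lowest IP
    address, or None if no interface satisfies all three conditions."""
    eligible = [
        i for i in interfaces
        if i["supports_snmp"] and i["if_index"] is not None and i["in_package"]
    ]
    # Compare as addresses: as plain strings, "10.0.0.95" sorts before "9.0.0.1".
    return min(eligible, key=lambda i: ipaddress.ip_address(i["ip"]), default=None)

interfaces = [
    {"ip": "10.0.0.95", "supports_snmp": True, "if_index": 2, "in_package": True},
    {"ip": "9.0.0.1", "supports_snmp": True, "if_index": 3, "in_package": True},
    {"ip": "1.2.3.4", "supports_snmp": False, "if_index": 4, "in_package": True},
]
print(primary_interface(interfaces)["ip"])
```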
collectd is triggered when capsd generates a NodeGainedService event. The
discovered protocol name (e.g. SNMP, SSH) is passed from capsd to collectd, along with
the primary interface from the event. These are checked against the configuration in
collectd-configuration.xml to see whether any collection packages are valid (there
should be at least one, by definition!) and data collection is started.

Figure 55: OpenNMS collectd-configuration.xml as shipped

There is only one package specified in collectd-configuration.xml, as shipped, which
applies to all interfaces other than 0.0.0.0 and in the range 1.1.1.1 through
254.254.254.254. As with poller-configuration.xml, you must have one filter
statement per package and can then use multiple <specific>, <include-range> and
<exclude-range> statements to define which interfaces this package applies to. You
can also use the <include-url> tag to specify a file with a list of interfaces.
There is only one data collection service defined for OpenNMS out of the box, in
collectd-configuration.xml: the SNMP service. It will run every 5 minutes (300,000
ms) and will collect the MIB variables specified in the collection called default,
specified in datacollection-config.xml. The <service> stanza can also specify values for
SNMP timeouts, retries and port number which would override the default values in
snmp-config.xml.
The package definition can also use the <outage-calendar> tag to specify scheduled
downtime for devices, during which data collection will be suspended. This should be
used to prevent lots of failed SNMP collection events. Outage periods are defined in
the poll-outages.xml file.
Obviously you can specify different packages with different address ranges, collection
intervals and with different collection keys. You can also specify data collectors other
than SNMP, such as NSClient, JMX and HTTP. See http://blogs.opennms.org/?p=242
for a note on using an HTTP data collector.
The datacollection-config.xml file defines one or more SNMP data collections that
Tarus Balog (the prime developer behind OpenNMS) calls a "scheme", to differentiate
it from the package defined in the collectd configuration file. These schemes bring
together OIDs for collection into groups, and the groups are mapped to systems. The
systems are mapped to interfaces by a device's system OID. In addition, each "scheme"
controls how the data will be collected and stored.
Fundamentally, OpenNMS uses RRDTool (Round Robin Database Tool) to store
performance data. This paper is not a tutorial on RRDTool so please follow the
reference to RRD at the end of this paper for more information.
The basis of RRD is that a fixed amount of space is allocated for a given database
when it is created. It holds data for a given period of time, say 1 month, 1 year, etc.
The sampling interval is known so you know how many data points will go into the
database and hence how much space is required. Once the database is full, newer
data points will replace the oldest ones, cycling around.
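
The "cycling around" behaviour is that of a fixed-size ring buffer. The Python sketch below shows the idea using the standard library; it is not how RRDTool is implemented internally, and the class name RoundRobinArchive is simply a label for the example.

```python
from collections import deque

class RoundRobinArchive:
    """Fixed-size store: once full, each new value displaces the oldest."""
    def __init__(self, rows):
        self.rows = deque(maxlen=rows)

    def store(self, value):
        self.rows.append(value)

archive = RoundRobinArchive(rows=4)
for sample in [10, 20, 30, 40, 50, 60]:
    archive.store(sample)

# Only the four most recent samples remain; size never grows beyond 4.
print(list(archive.rows))
```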

Figure 56: OpenNMS datacollection-config.xml collection and RRD parameters

The <rrd> stanza specifies how data will be stored in a Round Robin Archive (RRA).
The snapshot shown in the figure above specifies:
<rrd step="300">
    data to be saved every 5 minutes, per step
RRA:AVERAGE:0.5:1:2016
    create an RRA with values AVERAGE'd over 1 step (i.e. this data is raw,
    not consolidated). The RRA will have 2016 rows representing 7 days of data
    (5 minute steps = 12 / hour * 24 hours * 7 days = 2016). Consolidate the
    samples provided 0.5 (half) of them are not UNKNOWN (otherwise the
    consolidated value will be UNKNOWN)
RRA:AVERAGE:0.5:12:1488
    create an RRA with values AVERAGE'd over 12 steps (i.e. this data is
    consolidated over 1 hour). The RRA will have 1488 rows representing 2
    months of data (1 hour consolidations * 24 hours * 62 days = 1488).
    Consolidate the samples provided 0.5 (half) of them are not UNKNOWN
    (otherwise the consolidated value will be UNKNOWN)
RRA:AVERAGE:0.5:288:366
    create an RRA with values AVERAGE'd over 288 steps (i.e. this data is
    consolidated over 288 * 5 min steps = 1 day). The RRA will have 366 rows
    representing 1 year of data (1 day consolidations * 366 days = 366).
    Consolidate the samples provided 0.5 (half) of them are not UNKNOWN
    (otherwise the consolidated value will be UNKNOWN)
RRA:MAX:0.5:288:366
    create an RRA holding the MAX value seen in each day (288 steps) and keep
    1 year of data
RRA:MIN:0.5:288:366
    create an RRA holding the MIN value seen in each day (288 steps) and keep
    1 year of data
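
The arithmetic behind each RRA line can be checked with a few lines of Python. The helper name describe_rra is an assumption for this sketch; the RRA strings are the ones listed above, and the 300 second step comes from the <rrd> stanza.

```python
def describe_rra(rra, step_seconds=300):
    """Split an RRA definition of the form RRA:CF:xff:steps:rows and
    return (consolidation function, seconds per row, days retained)."""
    _, function, _xff, steps, rows = rra.split(":")
    seconds_per_row = int(steps) * step_seconds
    days_retained = seconds_per_row * int(rows) / 86400
    return function, seconds_per_row, days_retained

for rra in (
    "RRA:AVERAGE:0.5:1:2016",
    "RRA:AVERAGE:0.5:12:1488",
    "RRA:AVERAGE:0.5:288:366",
    "RRA:MAX:0.5:288:366",
    "RRA:MIN:0.5:288:366",
):
    print(rra, describe_rra(rra))
```

The three resolutions come out as 300 seconds for 7 days, 3600 seconds for 62 days and 86400 seconds for 366 days, matching the descriptions above.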

The top of datacollection-config.xml defines where the RRD repositories are kept and
how many variables can be retrieved by an SNMP V2 GET-BULK command (10 is the
default). Within the repository directory, for each node, there will exist a directory
that consists of the node number. Thus, if the system was collecting data on node 21,
there would be a directory called /opt/opennms/share/rrd/snmp/21 containing a
datafile for each MIB OID being collected. File names will match the alias parameter
for a MIB OID, in datacollection-config.xml.
The node number can be found by going to the detailed node information for a device
and choosing the Asset Info link:

Figure 57: OpenNMS Asset Info link for a device

The resulting page includes the Node ID at the top.

Figure 58: OpenNMS Asset information page, including Node ID
The snmpStorageFlag parameter in the snmp-collection stanza of
datacollection-config.xml defines for which interfaces of a device data will be stored.
Possible values are:
all         (the old default)
primary     the primary SNMP interface
select      collect from all IP interfaces and can use Admin GUI to
            select additional non-IP interfaces to collect data from (new
            default since OpenNMS 1.1.0)

Figure 59: OpenNMS GUI Admin page for specifying interfaces to collect data from

Most of the contents of datacollection-config.xml is defining groups and systems:
groups      define groups of SNMP MIB OIDs to collect
systems     use a device's System OID as a mask to determine which groups of
            OIDs should be collected

Figure 60: OpenNMS group definitions in datacollection-config.xml

Unfortunately OpenNMS does not have a MIB compiler so all MIB OIDs need to be
manually specified in this file (the good news is that there are lots there out of the
box). Once groups of MIB variables are declared, system stanzas say which group(s)
are to be collected for any device whose system OID matches a particular pattern.
Each SNMP MIB variable consists of an OID plus an instance. Usually, that instance
is either zero (0) or an index to a table. At the moment, OpenNMS only understands a
small number of table indices (for example, the ifIndex index to the ifTable and the
hrStorageIndex to the hrStorageTable). All other instances have to be explicitly
configured.
The ifType parameter can be used to specify the sort of interfaces to collect from.
Legal values are:
all               collect from all interface types
ignore            used when the value would be the same for all interfaces, e.g.
                  CPU utilisation for a Cisco router
<ifType number>   used to denote one or more specific interface types. For
                  example ifType=6 for ethernetCsmacd. See
                  http://www.iana.org/assignments/ianaiftype-mib for a
                  comprehensive list.

OpenNMS understands four types of variable to collect on: gauge, timeticks, integer,
octetstring. Note that RRD only understands numeric data.

Figure 61: OpenNMS systems definitions in datacollection-config.xml

In the figure above, any device which has satisfied the filtering in
collectd-configuration.xml and has a system OID starting with .1.3.6.1.4.1 (the start of the
Enterprise MIB tree) will collect performance data for MIB-2 interfaces, tcp and icmp,
as specified in the earlier <group> stanzas.
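
The mapping from a device's system OID to the groups it collects amounts to a prefix match. Here is a minimal Python sketch of that idea; the group names and the second (vendor-specific) entry are hypothetical, and simple string-prefix matching is an assumption about how the mask is applied.

```python
def groups_for(system_oid, systems):
    """systems is a list of (sysoid mask, [group names]). Return every
    group whose mask is a prefix of the device's system OID, without
    duplicates and in the order first seen."""
    selected = []
    for mask, groups in systems:
        if system_oid.startswith(mask):
            for group in groups:
                if group not in selected:
                    selected.append(group)
    return selected

# Hypothetical systems table: one catch-all entry plus one vendor entry.
systems = [
    (".1.3.6.1.4.1.", ["mib2-interfaces", "mib2-tcp", "mib2-icmp"]),
    (".1.3.6.1.4.1.9.", ["mib2-interfaces", "vendor-cpu"]),
]
print(groups_for(".1.3.6.1.4.1.9.1.122", systems))
print(groups_for(".1.3.6.1.4.1.8072.3.2.10", systems))
```

Ending each mask with a dot avoids a vendor mask such as .1.3.6.1.4.1.9 accidentally matching .1.3.6.1.4.1.99 under plain prefix matching.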
Note that the defaults in collectd-configuration.xml and datacollection-config.xml
mean that a large number of SNMP data collections will be activated out of the box.
This is good in providing lots of samples in small environments but it could be a
serious performance and disk usage factor if these defaults are left unchanged, where
a large number of interfaces are monitored by OpenNMS.

7.4.2 Displaying performance data
OpenNMS provides a large number of reports out of the box, based on the default data
collection parameters. Use the Reports main menu to see the options.

Figure 62: OpenNMS Report categories available out of the box

Resource Graphs                     provide lots of standard reports
KSC Performance, Nodes, Domains     allows users to customise own reports
Availability                        availability reports for interfaces & services
Statistics Reports                  shows Top 20 ifInOctets across all nodes
Following the Resource Graphs link provides access to many standard reports.

Figure 63: OpenNMS Standard performance reports

The standard performance reports display various collected values for one particular
node which you choose from the menu provided. The different categories provide:
Node-level performance data such as TCP connections, CPU, memory
Interface data for each interface such as bits in / out
Response time data for services such as ICMP, DNS, SSH
Disk space information from the ucd-snmp MIB

Figure 64: OpenNMS Standard Resource graphs available for a selected node

Here is part of the node-level performance data set of graphs.

Figure 65: OpenNMS partial display of the node-level performance data graphs

If you wish to create more selective sets of graphs for other people to use, use the Key
SNMP Customized (KSC) Reports menu to create your own reports, which can include
graphs of selected MIB variables from one device or can select MIB variables from
different devices. Using the Create New button will prompt for nodes that have data
collections configured as Child Resources.

Figure 66: OpenNMS KSC Reports menu

Selecting a node and clicking View child resources results in a menu of report
categories.

Figure 67: OpenNMS Report categories available for customised reports

If you select the Node-level Performance Data option and the Choose child resource
button then each of the MIB variables collected can be displayed and selected.

Figure 68: OpenNMS Selecting prefabricated reports to include in a customised report

The drop-down alongside the Prefabricated Report field allows you to select any of
the default reports to include in your own customised reports. You can include several
different graphs, from the same or different nodes, in your KSC report.

7.4.3 Thresholding
The thresholding capability in OpenNMS has changed fairly significantly over time;
see http://www.opennms.org/index.php/Thresholding#Merge_into_collectd for a good
explanation.
Before OpenNMS 1.3.10, collectd collected data and threshd performed thresholding
as two separate processes. This design used a range parameter in
threshd-configuration.xml to get around problems caused by the asynchronous nature
of collectd and threshd.
OpenNMS 1.3.10 merged the thresholding functionality into collectd and introduced a
new parameter into collectd-configuration.xml:
<parameter key="thresholding-group" value="default-snmp"/>
where the value of the thresholding-group matched a definition in
threshd-configuration.xml. The need for the range parameter disappeared. However, to
define different filters for thresholding, different packages had to be defined in
collectd-configuration.xml.

From OpenNMS 1.5.91 (this paper is based on version 1.5.93), filters can be defined
in threshd-configuration.xml so that packages in collectd-configuration.xml can be
kept simple. The parameter in collectd-configuration.xml changes; the
thresholding-group key disappears and is replaced by:
<parameter key="thresholding-enabled" value="true"/>
Here is the default collectd-configuration.xml:

Figure 69: OpenNMS Default collectd-configuration.xml

The lack of any thresholding parameter implies that thresholding is disabled.
... and the default threshd-configuration.xml:

Figure 70: OpenNMS Default threshd-configuration.xml

The default threshd-configuration.xml is set up for the interim design between
versions 1.3.10 and 1.5.90. For OpenNMS 1.5.93, collectd-configuration.xml should be
changed as shown below:

Figure 71: OpenNMS Modified collectd-configuration.xml to enable thresholds

threshd-configuration.xml can be modified with different packages of thresholding to
apply to different ranges of nodes.

Figure 72: OpenNMS Modified threshd-configuration.xml

Different filters are applied to each package. The thresholding-group parameter is
required here and the value points to a matching definition in thresholds.xml, where
the MIBs to threshold, and the threshold values, are specified.

Figure 73: OpenNMS Modified thresholds.xml for CCsnmp group and raddlesnmp group

The attributes of a threshold are:
type: A "high" threshold triggers when the value of the data source exceeds the
"value", and is re-armed when it drops below the "rearm" value. Conversely, a
"low" threshold triggers when the value of the data source drops below the
"value", and is re-armed when it exceeds the "rearm" value. "relativeChange" is
for thresholds that trigger when the change in data source value from one
collection to the next is greater than "value" percent.
expression: A mathematical expression involving data source names which will
be evaluated and compared to the threshold values. This is used in "expression"
thresholding (supported from 1.3.3).
ds-name: The name of the variable to be monitored. This matches the name in
the alias parameter of the MIB statement in datacollection-config.xml.
ds-type: Data source type. "node" for node-level data items, and "if" for
interface-level items.
ds-label: Data source label. The name of the collected "string" type data item to
use as a label when reporting this threshold. Note: this is a data item whose
value is used as the label, not the label itself.
value: The value that must be exceeded (either above or below, depending on
whether this is a high or low threshold) in order to trigger. In the case of
relativeChange thresholds, this is the percent that things need to change in
order to trigger (e.g. 'value="1.5"' means a 50% increase).
rearm: The value at which the threshold will reset itself. Not used for
relativeChange thresholds.
trigger: The number of times the threshold must be "exceeded" in a row before
the threshold will be triggered. Not used for relativeChange thresholds.
triggeredUEI: A custom UEI to send into the events system when this
threshold is triggered. If left blank, it defaults to the standard thresholds UEIs.
rearmedUEI: A custom UEI to send into the events system when this
threshold is re-armed. If left blank, it defaults to the standard thresholds UEIs.
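
The interplay of value, rearm and trigger for a "high" threshold is easiest to see as a small state machine. The Python class below is a sketch of the behaviour as described in the attribute list, not the OpenNMS evaluator; the class name and the use of strict comparisons are assumptions.

```python
class HighThreshold:
    """Fires after `trigger` consecutive samples above `value`, then
    stays quiet until a sample drops below `rearm`."""
    def __init__(self, value, rearm, trigger):
        self.value = value
        self.rearm = rearm
        self.trigger = trigger
        self.exceeded = 0    # consecutive samples above value
        self.armed = True

    def evaluate(self, sample):
        """Return the name of the event raised by this sample, or None."""
        if self.armed:
            if sample > self.value:
                self.exceeded += 1
                if self.exceeded >= self.trigger:
                    self.armed = False
                    self.exceeded = 0
                    return "HighThreshold"
            else:
                self.exceeded = 0   # the run of high samples is broken
        elif sample < self.rearm:
            self.armed = True
            return "HighThresholdRearmed"
        return None

threshold = HighThreshold(value=90, rearm=75, trigger=2)
for sample in [95, 80, 95, 96, 97, 70, 95]:
    print(sample, threshold.evaluate(sample))
```

With trigger set to 2, the isolated spike at the start raises nothing; the event fires on the second consecutive high sample and is not repeated until the value has dropped below the rearm level.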
By default, standard threshold and rearm events will be generated but it is also
possible to create customised events with the threshold attributes. This would then
make it easier to generate notifications for specific thresholding / rearm events.
Here is a screenshot with standard events generated by thresholds on the raddle
network:

Figure 74: OpenNMS Threshold events from various devices in the raddle network

For those who prefer not to edit XML configuration files, the OpenNMS Admin menu
provides a GUI way to create and modify thresholds.

Figure 75: OpenNMS Admin menu

Selecting the Manage Thresholds option displays all thresholds currently configured
in thresholds.xml.

Figure 76: OpenNMS Configuring thresholds through the Admin menu

Using the Edit button permits modification of an existing threshold.

Figure 77: OpenNMS Modifying thresholds through the Admin GUI

7.5 Managing OpenNMS


So far, this description of OpenNMS has focused very much on configuration by
editing xml files. It is well worth mentioning that there is now an Admin menu
(touched on in the Thresholding section previously), which means many of the
configuration tasks can be driven by a menu-based, fill-in-the-blanks GUI. Refer back
to Figure 75: OpenNMS Admin menu for a list of the areas which can be configured
this way.

7.6 OpenNMS summary


OpenNMS is a mature and very capable systems and network management product.
It satisfies most requirements for discovery, availability monitoring, problem
management and performance management.
It has a clean architecture for configuration with everything being defined in XML
files. It has an excellent mechanism for collecting and configuring SNMP TRAPs.
For those who prefer to customise through a GUI, the Admin menu provides access to
configure some of these files without needing to know an editor or XML.
It feels like a solid, reliable product and is designed (say the developers) to scale to
truly large enterprises. There are lots of good samples provided and the default
configurations provide rich functionality.
Areas where it is weak are around formal documentation and the lack of a usable
topology map. That said, the help that is provided with OpenNMS panels is very
good. Data collection and thresholding is strong. The addition of a MIB compiler and
browser would improve matters enormously. It is also short of a way to discover
applications that do not support port sniffing or SNMP.
There are two large problems with OpenNMS that give me great concern. The first is
that you have to bounce the whole OpenNMS system if you change any configuration
files!
The second big issue, known to be under review, is the association between events,
alarms and notifications. Currently, notifications are driven from events whereas
driving them from alarms would seem preferable. There is also no link between
acknowledging events, alarms and notifications.
I have two personal negative feelings with OpenNMS. The first is that it is written in
Java. Sorry, but I hate Java applications! To be fair, OpenNMS does not suffer from
performance issues that affect so many other Java applications but its logfiles are
Java logfiles and life is just too short to find anything useful in them! My second
personal non-preference is that OpenNMS is very wordy. The important information
never seems to hit the eye on most screens.

8 Zenoss
Zenoss is a third Open Source, multi-function systems and network management tool.
Unlike Nagios and OpenNMS, there is a free, core offering (which does seem to have
most things you need), and Zenoss Enterprise that has extra add-on goodies, high
availability configurations, distributed management server configurations and various
support contract offerings which include some education. For a comparison of the
free and fee alternatives, try http://www.zenoss.com/product/#subscriptions.
Zenoss offers configuration discovery, including layer 3 topology maps, availability
monitoring, problem management and performance management. It is based around
the ITIL concept of a Configuration Management Database (CMDB), the Zenoss
Standard Model. Zope Enterprise Objects (ZEO) is the back-end object database that
stores the configuration model, and Zope is the web application development
environment used to display the console. The relational MySQL database is used to
hold current and historical events.
Zenoss 2.2 has recently been released, which provides stack builds: complete
bundles including Zenoss and all its prerequisites. These stack installers are
available for a wide variety of Linux platforms; standard RPM and source formats are
also available. For easy evaluation, a VMware appliance can be downloaded, ready to
go.
I tried both the VMware build and the 2.2 stack install for SuSE 10.3; both were
relatively painless. The rest of this section is based on the 2.2 stack installation on a
machine whose hostname is zenoss.
To access the Web console, point your browser at http://zenoss:8080. The default user
is admin with a password of zenoss. The default dashboard is completely configurable
but this screenshot is close to the default.

Figure 78: Zenoss default dashboard

8.1 Configuration Discovery and topology


There is a good Zenoss Quickstart document available from
http://www.zenoss.com/community/docs. Similar to OpenNMS, the architecture is
based on object-oriented techniques.

8.1.1 Zenoss discovery


zProperties can be defined for devices, services, processes, products and events.
Objects can be grouped and sub-grouped with zProperties being refined and changed
throughout the hierarchy. So, for example, the Device object class has default
subclasses for different device types, as shown below.

Figure 79: Zenoss device classes

The class of Devices has a zProperties page as do the classes Network, Server, Printer,
etc. Devices will initially be added to the Discovered class and can then be moved to a
more appropriate class.

Figure 80: Zenoss Server Device classes
Figure 81: Zenoss Linux Server devices

Discovery and monitoring is largely controlled by the combination of zProperties
applied to a device, of which there are a large number (most with sensible defaults).
Initially, basic SNMP and ping polling parameters should be configured in the
zProperties page for Devices.

Figure 82: Zenoss zProperties for the Device class (part 1)

Figure 83: Zenoss zProperties for the Device class (part 2)

Figure 84: Zenoss zProperties for the Device class (part 3)

The left-hand menus of the web console provide an Add Device option (nothing is
discovered automatically, out-of-the-box).

Figure 85: Zenoss Add Devices dialogue

Once a device has been discovered (which by default uses ping), if the discovery
protocol is set to SNMP then the device will be queried for its SNMP routing table.
Any networks that the device has routes to will then be added to the object class of
networks.

Figure 86: Zenoss Networks class with drop-down menu

Once the presence of a network has been discovered, devices can automatically be
discovered on that network; this uses a "spray ping" mechanism. There is a drop-down
menu from the top left corner of the Networks page (which works fine for simple Class
C networks). Although the GUI does manage to display subnetworks accurately, even
if the subnet mask is not on a byte boundary, the Discover Devices menu does not
honour the subnet mask. However, a good feature of Zenoss is that there is a
command line (CLI) for virtually everything and the CLI for device discovery on a
network does honour supplied netmasks. For example:
    zendisc run --net 10.0.0.0/24
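
To illustrate why honouring the netmask matters, here is a minimal Python sketch. It
is not Zenoss code: it uses only the standard ipaddress module, the function name
sweep_targets is made up for this illustration, and the /26 network is an invented
example of a mask that is not on a byte boundary.

```python
import ipaddress


def sweep_targets(cidr):
    """Return the host addresses that a ping sweep of this network would cover."""
    network = ipaddress.ip_network(cidr, strict=True)
    # hosts() leaves out the network and broadcast addresses
    return [str(host) for host in network.hosts()]


class_c = sweep_targets("10.0.0.0/24")   # mask on a byte boundary
small = sweep_targets("10.0.0.64/26")    # mask not on a byte boundary

print(len(class_c), class_c[0], class_c[-1])
print(len(small), small[0], small[-1])
```

A sweep that ignored the /26 mask and treated the network as a Class C would ping
about four times as many addresses, most of them outside the subnet in question.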
Note that the Zenoss discovery algorithm is very dependent on getting routing tables
using SNMP and the Zenoss server must support SNMP itself.
Devices that do not support ping but do support SNMP can be added manually with
the Add Device menu. The zProperties of the device (or class of devices if you create
a subclass) should have zPingMonitorIgnore=True and zSnmpMonitorIgnore=False.
There are three Zenoss processes that implement discovery:
  • zenmodeler can use SNMP, ssh and telnet to discover detailed information
    about devices. zenmodeler will only be run against devices that have already
    been discovered by zendisc. By default, zenmodeler runs every 6 hours.
  • zenwin detects Windows (WMI) services.
  • zendisc is a subclass of zenmodeler. It traverses routing tables using SNMP
    and then uses ping to detect devices on discovered networks.

8.1.2 Zenoss topology maps


Zenoss has an automatic topology mapping option which can display up to 4 hops from
a selected device. It even seems to be able to understand networks served by several
routers!

Figure 87: Zenoss Network Map showing 4 hops from group100r1

8.2 Availability monitoring
Availability monitoring in Zenoss can use 3 different methods:
  • ping tests
      - implemented via zenping
      - detects device availability
  • service tests
      - implemented via zenstatus
      - detects services as defined by TCP / UDP ports
  • process tests and Windows Services tests
      - implemented via zenprocess
      - detects processes using the SNMP Host Resources MIB using the
        snmp.IpServiceMap zCollectorPlugin driven by zenmodeler
      - detects Windows services using WMI using the WinServiceMap driven by
        zenwin

8.2.1 Basic reachability availability


Basic availability monitoring is controlled by Collectors. These are also known as
Monitors (and the documentation can be confusing!). The Collectors menu can be
found on the left-hand side.

Figure 88: Zenoss Collectors (Monitors) overview

The devices being monitored are shown at the bottom of the screen. To change any of
these parameters, use the Edit tab. The defaults for availability monitoring are:
    Ping cycle time polling                           60 sec
    Ping timeout                                      1.5 sec
    Ping retries                                      2
    Status (TCP / UDP service) polling interval       60 sec
    Process (SNMP Host Resources) polling interval    180 sec
    SNMP performance cycle interval                   300 sec
What availability checks are carried out on a device is controlled by the zProperties of
that device, remembering that zProperties can be set at any level of the object
hierarchy. By default the /Devices class has zPingMonitorIgnore=False and
zSnmpMonitorIgnore=False so every device will get ping polling at 1 minute intervals
and SNMP polling at 5 minute intervals.
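
The way a zProperty set high in the hierarchy applies to everything below it, unless
overridden, can be sketched in a few lines of Python. This is an illustration of the
inheritance idea only, not the real Zenoss object model: the Organizer class, its
get_zprop method and the /Devices/Server/NoPing subclass are all hypothetical.

```python
class Organizer:
    """A node in a hypothetical device class tree (not the real Zenoss classes)."""

    def __init__(self, name, parent=None, **zprops):
        self.name = name
        self.parent = parent
        self.zprops = dict(zprops)   # only the values set locally at this level

    def get_zprop(self, key):
        """Walk up the tree and return the nearest value set for this key."""
        node = self
        while node is not None:
            if key in node.zprops:
                return node.zprops[key]
            node = node.parent
        raise KeyError(key)


devices = Organizer("/Devices", zPingMonitorIgnore=False, zSnmpMonitorIgnore=False)
server = Organizer("/Devices/Server", parent=devices)
# Hypothetical subclass for SNMP-only devices that do not answer ping
no_ping = Organizer("/Devices/Server/NoPing", parent=server, zPingMonitorIgnore=True)

print(server.get_zprop("zPingMonitorIgnore"))    # inherited from /Devices
print(no_ping.get_zprop("zPingMonitorIgnore"))   # overridden locally
print(no_ping.get_zprop("zSnmpMonitorIgnore"))   # still inherited
```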

8.2.2 Availability monitoring of services - TCP / UDP ports and windows services
Service monitoring for TCP / UDP ports and Windows services is configured through
the Services menu.

Figure 89: Zenoss Services menu

A very large number of Windows services are preconfigured out-of-the-box. These
services are actually monitored by the zenwin daemon which uses (and requires) WMI
on the Windows target machine. Note the Count column showing on how many
devices these services have been detected.

Figure 90: Zenoss Windows services

Even more IP services come configured out-of-the-box. There are two subclasses of IP
services, Privileged and Registered; either can monitor either TCP or UDP ports.

Figure 91: Zenoss Privileged IP services

Again, note the Count column. Clicking on the service name shows where the
service has been detected:

Figure 92: Zenoss devices running the domain (DNS) service on TCP 53 or UDP 53

The fact that a service has been detected does not imply that it is being monitored for
availability (the default, out-of-the-box, is that nothing is monitored). The Monitor
column for devices shows whether active monitoring is taking place (and hence events
potentially being generated). The Monitor field in the top part of the window shows
the global default for this service.
To turn on service monitoring globally for a particular service, use the Services menu
to find the service in question. You can then use either the zProperties tab or the
Edit tab to change the Monitor global default to True (the default, as shipped, is
False).
To turn on service monitoring for a specific device, access the main page for a device
and open the OS tab. Under the IP Services section, click on the Name column
header to see services detected. Click on the service name, which brings up the service
status window for the device where the Monitor field can be changed; don't forget to
click the Save button. Note that the Monitored box in the IP Services heading bar
can be used to toggle the display between detected services and monitored services.
Note that the drop-down menu to Add IpService is driven by typing in a partial
match of the service name you want; the subsequent drop-down then shows
configured services that match your selection.

8.2.3 Process availability monitoring


Unix / Linux process monitoring relies on the SNMP Host Resources MIB on the
target device. Processes to be monitored can be flexibly defined using regular
expressions. Start from the Processes menu to see processes defined (there are none
out-of-the-box). Use the drop-down menu to Add process.
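
As a rough idea of what regular-expression process definitions buy you, the Python
sketch below counts matching entries in a process table of the kind the Host
Resources MIB exposes. The process list is invented, count_matches is a hypothetical
helper, and whether Zenoss anchors its match at the start of the string is an
assumption not checked here (this sketch matches anywhere in the command line).

```python
import re


def count_matches(pattern, process_table):
    """Count the process entries whose command line matches the pattern."""
    regex = re.compile(pattern)
    return sum(1 for command in process_table if regex.search(command))


# Invented process table, as might be read from the Host Resources MIB
process_table = [
    "/usr/sbin/sshd",
    "/usr/lib/firefox/firefox-bin",
    "/usr/sbin/snmpd -Lsd",
    "sshd: jane [priv]",
]

print(count_matches(r"sshd", process_table))          # matches anywhere in the line
print(count_matches(r"^/usr/sbin/", process_table))   # anchored to a path prefix
print(count_matches(r"httpd", process_table))         # zero would mean "process down"
```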

Figure 93: Zenoss Processes with drop-down menu

Supply a process name and it will be added to the list. To modify the definition of the
process, click on the process name and select the Edit tab.

Figure 94: Zenoss dialogue for modifying process definition

To modify the zProperties of a process, use the zProperties tab.

Figure 95: Zenoss zProperties for the firefox process

To apply process monitoring to a device, from the OS tab of the device page, select the
drop-down menu and use the Add OSProcess menu. Defined processes are selectable
from the drop-down window.

Figure 96: Zenoss Add OSProcess monitoring to a specific device

Note that there are currently (July 4th, 2008) a couple of bugs to do with process
monitoring whereby processes disappear from the OS tab of a device and / or show the
wrong status (tickets #3408, #3399, #3270). To mitigate these, the zenprocess
daemon should be stopped and restarted whenever modifications have been made to
do with processes. You can use the GUI by choosing Settings and selecting the
Daemons tab.
Temporarily, it would also be wise to use the menu for the process and select to Lock
the process from Deletion.
More sophisticated availability monitoring can be implemented using standard
zCollectorPlugins; note that these are modelling plugins as distinct from
performance plugins. zCollector plugins are applied to device classes or devices
through the zProperties tab; use the Edit link alongside zCollectorPlugins to show
or modify the plugins applied and available.

Figure 97: Zenoss zCollectorPlugins

Note that the Add Fields / Hide Fields appears greyed out but does actually work. The
plugins shown on the left in the screenshot above are the default for the /Devices class.
The /Devices/Server class has several more SNMP-based plugins by default and
the /Devices/Server/Windows class has an extra wmi.WinServiceMap plugin.
Documentation on these plugins seems a little sparse but here are a few clues:
    zenoss.snmp.InterfaceMap    uses SNMP to query for interface info
    zenoss.snmp.IpServiceMap    zenstatus daemon queries TCP / UDP port info
    zenoss.snmp.HRSWRunMap      uses SNMP to get process info from Host
                                resources MIB
    zenoss.wmi.WinServiceMap    zenwin daemon uses WMI to query for Windows
                                services

Figure 98: Zenoss default plugins for class /Devices/Server/Windows

One way to find what plugins are applied by default to device classes is to inspect the
migration script supplied
in /usr/local/zenoss/zenoss/Products/ZenModeler/migrate/zCollectorPlugins.py.
To see what plugins are active on a specific device, use the device's main page menu
and select the More menu to find the Collector Plugins menu.

Figure 99: Zenoss zCollectorPlugins for device group100r1.class.example.org

When modifying characteristics for specific devices, do note that the main page menu
(from the arrow drop-down at the top left corner) has both a More submenu (which
includes zProperties among other things) and a Manage submenu.

Figure 100: Zenoss Device More submenu

Figure 101: Zenoss Device Manage submenu

8.2.4 Running commands on devices
A few Commands are defined out-of-the-box and can be seen using the left-hand
Settings menu and then selecting the Commands tab. New commands can be
added using the Add User Command drop-down menu.

Figure 102: Zenoss Commands provided out-of-the-box

From a device's main page, there is a submenu to Run Commands.

Figure 103: Zenoss Run Commands for a particular device

Although much of the availability monitoring that has been demonstrated so far relies
on SNMP, it is also possible to use ssh or telnet to contact remote devices and run
monitoring scripts on them.

8.3 Problem management


The Zenoss event management system can collect events from syslogs, Windows event
logs, SNMP TRAPs and XML-RPC, in addition to managing events generated by
Zenoss itself (such as availability and performance threshold events).
When an event arrives in the Status table of the events database, the default state of
the event is set to New. The event can then be Acknowledged, Suppressed or
Dropped. From there, an event will be archived into the Event History database in one
of four ways:
  • Manually moved to the historical database (historifying)
  • Automatic correlation (good event clears bad event)
  • An event class rule
  • A timeout
Events automatically have a duplication detection rule applied so that if an event of
the same class, from the same device, with the same severity arrives, then the repeat
count of an existing event will simply be incremented.
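
The duplication detection rule can be pictured with a small Python sketch. This is an
illustration of the idea only, assuming the key is exactly the three fields named
above; the record_event function, the in-memory dictionary standing in for the Status
table, and the event class name are all made up, and the real Zenoss rule may
consider further fields.

```python
def record_event(status_table, device, event_class, severity, summary):
    """Insert an event, or bump the repeat count of a matching existing one."""
    key = (device, event_class, severity)   # the three fields named in the text
    existing = status_table.get(key)
    if existing is None:
        status_table[key] = {"summary": summary, "count": 1}
    else:
        existing["count"] += 1
        existing["summary"] = summary        # keep the most recent text
    return status_table[key]


status = {}
record_event(status, "bino", "/Status/Ping", "Critical", "bino is down")
record_event(status, "bino", "/Status/Ping", "Critical", "bino is down")
record_event(status, "bino", "/Status/Ping", "Warning", "bino is slow")

print(len(status))                                            # distinct rows held
print(status[("bino", "/Status/Ping", "Critical")]["count"])  # repeat count
```

A different severity produces a separate row rather than incrementing the count, which
is why the third call above does not touch the Critical event.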
Global configuration parameters for the event system can be configured from the
Event Manager left-hand menu.

By default, status events of severity below Error are aged out to the Event History
database after 4 hours. Historical events are never deleted.
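
The default ageing rule can be sketched as follows. This is a hedged Python
illustration, not Zenoss code: the age_out function and the event dictionaries are
invented, the numeric ranks are used only to order the severities, and "age" is
assumed here to be measured from when the event was last seen.

```python
from datetime import datetime, timedelta

# Ranks used only for ordering the severities in this sketch
SEVERITY_RANK = {"Clear": 0, "Debug": 1, "Info": 2,
                 "Warning": 3, "Error": 4, "Critical": 5}


def age_out(status, history, now, max_age=timedelta(hours=4), keep_from="Error"):
    """Move old events below the keep_from severity into history."""
    remaining = []
    for event in status:
        too_old = now - event["last_seen"] > max_age
        low = SEVERITY_RANK[event["severity"]] < SEVERITY_RANK[keep_from]
        if too_old and low:
            history.append(event)      # archived, never deleted
        else:
            remaining.append(event)
    return remaining


now = datetime(2008, 9, 30, 12, 0)
status = [
    {"summary": "old warning", "severity": "Warning",
     "last_seen": now - timedelta(hours=5)},
    {"summary": "old error", "severity": "Error",
     "last_seen": now - timedelta(hours=5)},
    {"summary": "new info", "severity": "Info",
     "last_seen": now - timedelta(minutes=10)},
]
history = []
status = age_out(status, history, now)
print([e["summary"] for e in history])
print([e["summary"] for e in status])
```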

Figure 104: Zenoss Event Manager configuration

8.3.1 Event console


The main Event Console is reached from the Event Console menu on the left. The
default is to show all status events with a severity of Info or higher, sorted first by
severity and then by time (most recent first). Events are assigned different severities:
    Critical    Red
    Error       Orange
    Warning     Yellow
    Info        Blue
    Debug       Grey
    Clear       Green
The events system has the concept of active status events and historical events (two
different database tables in the MySQL events database).
Events in the console can be filtered by Severity (Info and above by default) and by
State (New, Acknowledged and Suppressed, where New and Acknowledged are shown
by default). Any event which has been Acknowledged changes to a wishy-washy
version of the appropriate colour. There is also a Search box at the top right for
filtering events.

Figure 105: Zenoss Event Console

From the Console, events can be selected by checking the box alongside the event and
the drop-down can be used for various functions including Acknowledge and Move
to History. The drop-down can also be used to generate any test event with the Add
Event option (if you are a CLI person rather than a GUI person, the zensendevent
command is also available).
The column headers of the Event Console can be used to change the sorting criteria
and the icon at the far right of the event can be used to display the detailed data of
fields.

8.3.2 Internally generated events


Events are automatically generated by Zenoss if an availability metric is missed (such
as a ping check failing or a service check failing). Similarly, if performance sampling
is set up along with thresholds, then events will be generated if the threshold is
breached. Reasonable defaults for such events are configured out-of-the-box.

Events are organised in class hierarchies which have zProperties, just like Devices.
To modify the properties of an event, select the Events option from the left-hand
menu.

Figure 106: Zenoss Event classes and subclasses

To modify the context of any event, select the event and use the zProperties tab.

Figure 107: Zenoss zProperties for the event class /Event/Status/OSProcess

Events are mapped to Event Classes by Event Class instances. Event Class instances
are looked up by a non-unique key called EventClassKey. When an event arrives it
is:
  • Parsed
  • Assigned to the appropriate class and class key
  • Context is then applied:
      - Event context is defined in the zProperties of an event class
      - After the event context has been applied, then the device context is applied,
        whereby the ProductionState, Location, DeviceClass, DeviceGroups and
        Systems are all attached to the event in the event database.
      - Once these properties have been associated with the event, Zenoss attempts to
        update the zEventProperties. This allows a particular device or class of devices
        to override the default values for any given event.
To change the event mapping, select the event class and use the Mappings tab.

Figure 108: Zenoss Event mapping

The Edit tab allows editing of any of these fields.

8.3.3 SNMP TRAP reception and configuration


Zenoss automatically listens for SNMP TRAPs on UDP/162 (the well-known trap port)
using the zentrap process. Some generic TRAPs (2, 3 and 4 for Link Down, Link Up
and Authentication Failure) are automatically mapped to defined classes. Other
generic TRAPs (such as 0, 1 for Cold Start and Warm Start) appear as the /Unknown
event class, as will any specific TRAPs. It is simple to map such events to an already
configured event class by selecting the occurrence of the event and using the pull-down
menu to select Map Events to Class; pick the correct class from the scrollable list.
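
The default behaviour described above amounts to a lookup with a fall-back, sketched
here in Python. The generic trap numbers and names are those of SNMP V1 (RFC 1157);
the mapped event class names starting with /Example are placeholders invented for this
sketch and are not the classes Zenoss actually ships.

```python
# Generic trap numbers and names as defined for SNMP V1 (RFC 1157)
GENERIC_TRAPS = {
    0: "coldStart",
    1: "warmStart",
    2: "linkDown",
    3: "linkUp",
    4: "authenticationFailure",
    5: "egpNeighborLoss",
    6: "enterpriseSpecific",
}

# Placeholder class names: only traps 2, 3 and 4 are mapped by default
DEFAULT_MAPPINGS = {
    2: "/Example/LinkDown",
    3: "/Example/LinkUp",
    4: "/Example/AuthFailure",
}


def classify(generic_trap, mappings=DEFAULT_MAPPINGS):
    """Return the event class for a generic trap number, or /Unknown."""
    return mappings.get(generic_trap, "/Unknown")


for number in (0, 2, 6):
    print(number, GENERIC_TRAPS[number], classify(number))

# Mapping an event to a class by hand corresponds to adding an entry
custom = dict(DEFAULT_MAPPINGS)
custom[0] = "/Example/Restart"
print(classify(0, custom))
```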
It is also possible to create new event classes. Starting from Events on the left menu,
navigate to the place in the event class hierarchy under which you want to create a
new class and use the drop-down menu to Add New Organizer and give the class a
unique name.

Figure 109: Zenoss menu to create a new event class

8.3.4 email / pager alerting


Alerting Rules are Zenoss's way of sending email and / or paging notifications. These
are configured on a per-user basis, starting from the Preferences menu towards the
top right of the web console. The Alerting Rule tab then shows existing rules and
permits rule creation / deletion.

Figure 110: Zenoss menu to create Alerting Rule

Using the Edit tab permits changes of existing alerting rules. Different rules can be
applied based on a combination of severity, event state, production state and a more
generic filter. The Production State is assigned to a device or device class:
  • Production
  • Pre-Production
  • Test
  • Maintenance
  • Decommissioned
The Production State can be set or changed using the Edit tab from a device main
page. The default is Production. The Production State attribute can be used to
control whether a device is monitored at all, whether alerts are sent and whether a
device is represented on the Zenoss main dashboard. It is very simple to modify the
Production State to put a device or class of devices into maintenance, for example.

Figure 111: Zenoss Editing alerting rule

The email or pager message of the Alerting Rule is configured by the Message tab
and the Schedule tab can be used to create different alerting rules at different times.

Figure 112: Zenoss Alerting rule message format

Global parameters for email and paging, along with other useful parameters, can be
defined from the Settings left-hand menu.

Figure 113: Zenoss Settings parameters

The out-of-the-box email notifications provide handy links back to Zenoss to
manipulate the event that is being reported on.

Figure 114: Zenoss email generated by event notification, including links

8.3.5 Event automations


Any event can be configured to run an automatic script. This can be in addition to the
email / pager alerting rules described above. Such automation scripts are known as
Zenoss Commands and are run by the zenactions daemon. They are configured from
the Event Manager left-hand menu using the Commands tab.

Figure 115: Zenoss Event Command definition

8.4 Performance management
Zenoss can collect performance data and threshold it using either SNMP (through the
zenperfsnmp daemon) or by commands (typically ssh), using the zencommand daemon.
The data is stored and displayed using RRDTool.

8.4.1 Defining data collection, thresholding and graphs


Configuration of performance data collection, thresholding and display is done
through templates. As with other Zenoss objects, templates can be applied to a specific
device or to a higher level in the device class object hierarchy. To see all the defined
templates, navigate to the Devices page and use the left-hand drop-down menu and the
More submenu to choose All Templates.

Figure 116: Zenoss All Templates showing all defined performance templates

With the exception of the templates with HRMIB in the name, the above figure
shows the default templates as shipped. Note that these are defined templates;
there is no indication here as to which are active on what objects.
Note in the screenshot above that there are several templates called Device.
Templates can be bound to a device or device class to make them active. When
determining what data to collect, the zenperfsnmp (or zencommand) daemon first
determines the list of Template names that are bound to this device or component.
For device components this is usually just the meta type of the component (e.g.
FileSystem, CPU, HardDisk, etc.). For devices, this list is the list of names in the
device's zDeviceTemplates zProperty.

Figure 117: Zenoss zProperties showing zDeviceTemplate

The default, out-of-the-box, is that the device template called Device is bound to each
device discovered. As noted in the previous screenshot, there are several templates
called Device. The Device template for the class /Devices simply collects sysUpTime.
The template called Device for /Devices/Server collects a number of parameters
supported by the net-snmp MIB. The template called Device
for /Devices/Server/Windows collects various MIB values from the Informant MIB.
For each template name Zenoss searches first the device itself and then up the Device
Class hierarchy looking for a template with that name. Zenoss uses the first template
that it finds with the correct name, ignoring others with the same name that might
exist further up the hierarchy.
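
The search order just described (device first, then up the class hierarchy, first
match wins) can be sketched in Python. This is a minimal illustration rather than the
Zenoss implementation: resolve_template, the dictionaries and the /Devices/Server/Linux
path are assumptions made for the example, and the template contents are reduced to
one-line descriptions.

```python
def resolve_template(name, device_templates, class_path, class_templates):
    """Return (where_found, template) for the nearest template with this name."""
    if name in device_templates:            # a local copy on the device wins
        return ("device", device_templates[name])
    path = class_path
    while path:                             # e.g. /Devices/Server/Windows
        templates = class_templates.get(path, {})
        if name in templates:
            return (path, templates[name])
        path = path.rsplit("/", 1)[0]       # step up one level in the hierarchy
    return None


class_templates = {
    "/Devices": {"Device": "sysUpTime only"},
    "/Devices/Server": {"Device": "net-snmp MIB values"},
    "/Devices/Server/Windows": {"Device": "Informant MIB values"},
}

print(resolve_template("Device", {}, "/Devices/Server/Windows", class_templates))
print(resolve_template("Device", {}, "/Devices/Server/Linux", class_templates))
print(resolve_template("Device", {}, "/Devices/Printer", class_templates))
print(resolve_template("Device", {"Device": "local copy"},
                       "/Devices/Server/Linux", class_templates))
```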

So, the zenperfsnmp daemon will collect net-SNMP MIB information for Unix / Linux
servers and will collect Informant MIB information for Windows servers
(as /Devices/Server/Windows is more specific than /Devices/Server). Any actual device
can have a local copy of a template and change parameters to suit that specific device.
Template bindings can either be modified by changing the zProperties
zDeviceTemplates field or there is a Bind Templates menu drop-down from the
templates display of any device. (Do remember that, for a device, both the Templates
menu and the zProperties menu are off the More drop-down submenu.)

Figure 118: Zenoss Bind Templates menu

Be aware that when selecting templates to bind, you need to select all the templates
you want bound (use the Ctrl key to select multiples).
So, what do these templates actually provide?
Templates contain three types of sub-objects:
    Data sources         what data to collect and method to use, e.g. MIB OID
    Thresholds           expected bounds for data and events to raise if breached
    Graph definitions    how to graph the data points

Figure 119: Zenoss Device template for /Devices/Server

Zenoss provides two built-in types of Data Sources, SNMP and COMMAND. Other
types can be provided through ZenPacks. Clicking on the Data Source displays details
which can then be modified. Typically an SNMP Data Source will provide a single
Data Point (a MIB OID value). Typically the name of the data point will be the same
as the name of the data source. This means that when you come to select threshold
values or values to graph, you will be selecting names like
ssCpuRawWait_ssCpuRaw_wait.

Figure 120: Zenoss Data Source memAvailReal

Note that there is a useful Test button to check your OID against a node that Zenoss
knows about. However, beware that this Test button appears to use snmpwalk under
the covers, so if a MIB OID has multiple instances then the snmpwalk will return
values successfully. When zenperfsnmp actually collects data, it requires the correct
instance as well as the correct MIB OID. If your test is successful but you
subsequently see empty graphs with a message of "Missing RRD file" then the
problem is likely to be that the MIB instance is incorrect.
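
The difference between a walk and a get can be shown with a toy agent in Python. No
SNMP library is used: the agent is a plain dictionary, snmp_get and snmp_walk are
hypothetical stand-ins for the real operations, and the values are invented. The two
OIDs are memAvailReal from the UCD-SNMP MIB and hrStorageUsed from the Host
Resources MIB.

```python
def oid_tuple(oid):
    """Turn '1.3.6.1...' into a tuple of integers so prefixes compare cleanly."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def snmp_get(agent, oid):
    """A get needs the exact OID including the instance; None if absent."""
    return agent.get(oid_tuple(oid))


def snmp_walk(agent, oid):
    """A walk returns every instance that lives under the given OID."""
    prefix = oid_tuple(oid)
    return [(key, value) for key, value in sorted(agent.items())
            if key[:len(prefix)] == prefix]


# Toy agent with invented values
agent = {
    oid_tuple("1.3.6.1.4.1.2021.4.6.0"): 123456,    # memAvailReal.0 (scalar)
    oid_tuple("1.3.6.1.2.1.25.2.3.1.6.1"): 1000,    # hrStorageUsed.1
    oid_tuple("1.3.6.1.2.1.25.2.3.1.6.2"): 2000,    # hrStorageUsed.2
}

print(len(snmp_walk(agent, "1.3.6.1.4.1.2021.4.6")))   # the walk "succeeds"
print(snmp_get(agent, "1.3.6.1.4.1.2021.4.6"))         # get without instance fails
print(snmp_get(agent, "1.3.6.1.4.1.2021.4.6.0"))       # get with instance works
print(len(snmp_walk(agent, "1.3.6.1.2.1.25.2.3.1.6"))) # several instances returned
```

In this sketch a walk on the bare OID always finds something, which mirrors how a
successful test can still be followed by a data source that collects nothing.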
Data sources can be added or deleted with the drop-down Add DataSource and
Delete DataSource menus.
Thresholds can be applied to any of the data points collected, along with events to
generate if the threshold is breached.

Figure 121: Zenoss Threshold on CPU collected data

All of the data points defined in the data sources section are supplied in the top
selection box. If an event is to be generated, drop-downs are provided to select the
event class and severity. You can also specify an escalation count.
Thresholds can be added or deleted from the Thresholds drop-down menu.

Figure 122: Zenoss Drop-down menu for data thresholds

Note that this drop-down menu (as is also true of the Data Sources drop-down) has an
option to Add to Graphs.
Graphs can be defined for a wide combination of the collected data points and
thresholds. The menu panels are basically a front end to the RRD graphing tool and,
with lots of samples provided, you don't need to get into the details of RRDTool;
however if you wish to, there is plenty of scope to do so.
Graphs can be added, deleted or re-sequenced using the drop-down. Existing graphs
are modified by clicking on the graph name.

Figure 123: Zenoss Performance template graph definition

Note that graphs can display both data points and thresholds.
All graphs are stored, by default, under /usr/local/zenoss/zenoss/perf/Devices. There is
a subdirectory for each device. Component data rrd files are under the os subdirectory
with further subdirectories for filesystems, interfaces and processes.

8.4.2 Displaying performance data graphs


To view performance graphs, the Operating System component graphs can be seen
from the OS page of a device, by clicking on the relevant interface, filesystem or
process. The rest of the performance graphs can be found under the Perf tab.

Figure 124: Zenoss Performance graphs for eth1 interface on bino

You can change the range of data with the Hourly drop-down (to daily, weekly,
monthly or yearly). Data can be scrolled using the < > bars at either side and the +
and - magnifiers can be used to zoom in / out. By default, all graphs on the page are
linked (so that if you change the range on one, it changes for all). They can be
de-coupled with the Link Graphs? checkbox.
Here is a partial screenshot of the graphs for bino under the Perf tab.

Figure 125: Zenoss Performance graphs available under the Perf tab for bino

Note that the Reports left-hand menu also provides access to various reports,
including performance reports.

Figure 126: Zenoss Reports menu

Following the Performance Reports link provides access to all performance reports
for all devices.

Figure 127: Zenoss Performance Reports menu

8.5 Zenoss summary


Zenoss is an extremely comprehensive systems and network management product,
satisfying most of my requirements. One feels that the object-oriented architecture is
extremely flexible and powerful with most things you require already configured
out-of-the-box. The automatic discovery and topology mapping options are the most
powerful of the products discussed here. It can accommodate Nagios and Cacti
plugins and has its own add-on architecture in the form of ZenPacks.
Zenoss will use SNMP to gain status and performance information from a device but it
also has ssh and telnet as alternatives, for those devices where SNMP is
inappropriate.
The Quick Start Guide gets you running fast and the Admin Guide provides what it
says: a reasonably comprehensive Administrator's Guide. There is also a book by
Michael Badger, published June 2008, "Zenoss Core Network and System
Monitoring", which is well worth the investment (available both in paper and in
electronic format). However, one feels that there is so much more in the detail of
Zenoss that one needs to know and can find no information on!
My only real negative comment on Zenoss, other than the lack of detailed technical
information, is that it is a rapidly evolving product and it feels rather buggy. The
current (August 2008) poll on the zenoss-users forum for input to Zenoss 2.3 has
many requesters with code reliability and better documentation at the top of their
lists!

9 Comparison of Nagios, OpenNMS and Zenoss


Necessarily, comparisons are based on a mixture of fact and feeling and you need a
clear definition of what features are important to your environment before
comparisons can be valid for you.
Nagios is an older, more mature product. It evolved from the NetSaint project,
emerging as Nagios in 2002. OpenNMS also dates back to 2002 but feels like the lead
developer, Tarus Balog, has learned some lessons from observing Nagios. Zenoss is a
more recent offering, evolving from an earlier project by developer Erik Dahl and
emerging to the community as Zenoss around 2006.
All the products expect to use SNMP; OpenNMS and Zenoss use SNMP as the
default monitoring protocol. They all provide other alternatives: Zenoss supports ssh
and telnet along with customised ZenPacks; Nagios has NRPE and NSCA agents (both
of which, of course, require installing on remote nodes); OpenNMS doesn't have much
else to offer out-of-the-box but it can support JMX and HTTP as well as having
support for Nagios plugins.
All the products have some user management to define users, passwords and roles
with customisation of what a user sees.
OpenNMS and Zenoss use RRD Tool to hold and display performance data; Nagios
doesn't really have a performance data capability, so Cacti might be a good companion
product.
Most surprisingly, given that they all rely on SNMP, none of the products has an
SNMP MIB Browser built in to assist with selecting MIBs for both status monitoring
and performance data collection.

There are advocates for and against "agentless" monitoring. Personally, I don't
believe in agentless. Once you have got past ping then you have to have some form
of "agent" to do monitoring. The question is, should a management paradigm use an
agent that is typically part of a box build (like ssh, SNMP or WMI for Windows), or
should the management solution provide its own agent, like Nagios provides NRPE
(and most of the commercial management products come with their own agents). If
your management system wants its own agents, you then have the huge problem of
how you deploy them, check they are running, upgrade them, etc, etc. OpenNMS and
Zenoss have a strong dependency on SNMP, although Zenoss also supports ssh and
telnet monitoring, out-of-the-box (if your environment permits these). SNMP may be
old and "Simple", but all three products support SNMP V3 (for those who are worried
about the security of SNMP) and virtually everything has an SNMP agent available.
The other form of "agentless" monitoring basically comes down to port sniffing for
services. Whilst this can work fine for smaller installations, the n-squared nature of
lots of devices and lots of services doesn't scale too well. All three products do port
sniffing so it comes down to how easy it is to configure economic monitoring.

9.1 Feature comparisons


The following tables start with my requirements definition and compare the three
products on a feature-by-feature basis. (OOTB = Out-Of-The-Box).

9.1.1 Discovery
                        Nagios                    OpenNMS                     Zenoss

Node discovery          Config file for each      Config file with            GUI, CLI and batch
                        node                      include / exclude           import from text or
                                                  ranges                      XML file

Automatic discovery     No                        Yes, nodes within           Yes, networks & nodes
                                                  configured n/w ranges

Interface discovery     Possible through          Yes, including switch       Yes, including switch
                        config file               ports                       ports

Discover nodes that     Yes, use                  Yes, send_event.pl          Yes, use SNMP, ssh or
don't support ping      check_ifstatus plugin                                 telnet

SQL Database            No                        PostgreSQL                  MySQL & Zope ZEO

Service (port)          Yes, use plugin           Yes, various                Yes, TCP and UDP
discovery               (TCP, UDP, ....)          out-of-the-box

Application discovery   Yes, define service       Not without extra           Yes, with ssh,
                                                  agent e.g. NRPE             ZenPacks or plugins

Supports NRPE /         Yes                       Yes                         Possible
NSClient

SNMP support            V1, 2 & 3                 V1, 2 & 3                   V1, 2 & 3

L3 topology map         Yes                       No                          Yes, up to 4 hops

L2 topology map         No                        No                          No (but maybe in plan!)

9.1.2 Availability monitoring


                        Nagios                    OpenNMS                     Zenoss

Ping status             Yes                       Yes                         Yes
monitoring

Alternatives to         Yes, any plugin           Nagios plugins              Yes, ssh, telnet,
ping status             e.g. check_ifstatus                                   ZenPacks, Nagios
                                                                              plugins

Port sniffing           Yes                       Yes                         Yes

Process monitoring      Yes, with plugins         Nagios plugins              Yes, Host Resources
                                                                              MIB

Agent technology        Generally relies          SNMP out-of-the-box;        SNMP, ssh client,
                        on Nagios plugins         customised plugins          WMI for Windows,
                        deployed                  possible                    ZenPacks to be
                                                                              deployed

Availability reports    Yes                       Yes                         Yes

9.1.3 Problem management

                        Nagios                    OpenNMS                     Zenoss

Configurable            No                        Yes                         Yes
event console

Severity                Yes                       Yes                         Yes
customisation

Event configuration     No                        Flexible. Lots OOTB         Flexible. Lots OOTB

SNMP TRAP handling      No                        Flexible. Lots OOTB         Flexible. Lots OOTB

email / pager           Yes                       Yes, with                   Yes
notifications                                     configurable
                                                  escalation

Automation                                        auto-actions on             auto-actions on
                                                  events;                     events;
                                                  good news / bad news        good news / bad news
                                                  correlation on alarms       correlation on events
                                                  and notifications           and notifications

De-duplication          No automatic repeat       Yes                         Yes
                        count mechanism but
                        events do not continue
                        to be raised for
                        existing problems

Service / host          Yes                                                   No
dependencies

Root-cause              UNREACHABLE               Outages / Path              No
analysis                status for devices        outages
                        behind network single
                        point of failure.
                        Also, host / service
                        dependencies.

9.1.4 Performance management

                        Nagios                    OpenNMS                     Zenoss

Collect performance     No                        Yes                         Yes
data using SNMP

Collect performance     No                        NSClient, JMX,              ssh, telnet, other
data using other                                  HTTP                        methods using
methods                                                                       ZenPacks

Threshold               No                        Yes                         Yes
performance data

Graph performance       No                        Yes, lots provided          Yes, lots provided
data                                              OOTB                        OOTB

MIB compiler            No                        No                          Yes

MIB Browser             No                        No                          No (though a MIB
                                                                              Browser ZenPack is
                                                                              said to be available
                                                                              for 2.2)

9.2 Product high points and low points


This section is far more subjective; your mileage may vary!

9.2.1 Nagios goodies and baddies

Good points:
  • Good, stable code for systems management
  • Good correlation between service events and host events
  • Command to check validity of config files
  • Command to reload config files without disrupting Nagios operation
  • Good documentation
Bad points:
  • No auto-discovery
  • Weak event console
  • No OOTB collection or thresholding of performance data
  • No easy way to receive and interpret SNMP TRAPs
  • No MIB compiler or browser

9.2.2 OpenNMS goodies and baddies

Good points:
  • Good OOTB functionality
  • Code feels solid
  • Clean, standard configuration through well-organised xml files
  • Single database (PostgreSQL)
  • LOTS of trap customisation OOTB
  • Ability to do some configuration through web Admin menu
  • Easy import of TRAP MIBs (mib2opennms)
  • Chargeable support available from The OpenNMS Group
  • Supports Nagios plugins
  • Some good How-to documents for basic configuration on the wiki
Bad points:
  • Written in Java; log files hopeless! Difficult to get individual daemon status
  • No map (that works reasonably)
  • GUI is wordy; difficult for the eye to focus on the important things
  • Need to bounce entire OpenNMS when almost any config file is changed
  • Event / alarm / notification architecture is currently a mess (under review)
  • No way to change colours of events
  • No MIB compiler or browser
  • No pdf documentation. Wiki: hard to find detailed information.
  • Lots of things undocumented when you get down to details.

9.2.3 Zenoss goodies and baddies

Good points:
  • Good OOTB functionality
  • Architecture good, based around object-oriented CMDB database
  • Topology map (up to 4 hops)
  • Lots of plugins & ZenPacks available
  • email notifications include URL links back to Zenoss
  • Commercial version available
  • Good Quick Start manual, Administrators manual and book
  • Supports Nagios & Cacti plugins
Bad points:
  • No correlation between service events and host events
  • Implementation feels buggy
  • No MIB browser
  • No way to change colours of events
  • Commercial version available
  • Lots of things undocumented when you get down to details

9.3 Conclusions
What to choose? Back to your requirements!
For smallish systems management environments, Nagios is well tested and reliable
with a huge community behind it. For anything more than simple ping checks plus
SNMP checks, bear in mind that you may need a way to install remote plugins on
target hosts. Notifications are fairly easy to set up but if you need to produce analysis
on your event log then Nagios may not be the best choice.
OpenNMS and Zenoss are both extremely competent products covering automatic
discovery, availability monitoring, problem management and performance
management and reporting. Zenoss has some topology mapping and has better
documentation but the code feels less reliable. OpenNMS currently has a rather
messy architecture around events, alarms and notifications, though this is said to be
under review. I also struggle to believe that you have to recycle the whole of
OpenNMS if you have changed a configuration file! The code feels very stable though.
My choice, hoping fervently that code reliability and documentation improve, is
Zenoss.

10 References
1. itSMF Pocket Guide: "IT Service Management - a Companion to ITIL", IT
   Service Management Forum
2. Multi Router Traffic Grapher (MRTG) by Tobi Oetiker,
   http://oss.oetiker.ch/mrtg/
3. RRDtool, high performance data logging and graphing system for time series
   data, http://oss.oetiker.ch/rrdtool/
4. netdisco, network management application, http://www.netdisco.org/
5. The Dude network monitor by MikroTik, http://www.mikrotik.com/thedude.php
6. nagios, host, service and network monitoring program, http://www.nagios.org/
7. Zenoss, network, systems and application monitoring, http://www.zenoss.com/
8. OpenNMS, distributed network and systems management platform,
   http://www.opennms.org/
9. cacti, network graphing solution, http://www.cacti.net/
10. SNMP Requests For Comment (RFCs), http://www.ietf.org/rfc.html
11. V1 RFCs 1155, 1157, 1212, 1213, 1215
12. V2 RFCs 2578, 2579, 2580, 3416, 3417, 3418
13. V3 RFCs 2578-2580, 3416-18, 3411, 3412, 3413, 3414, 3415
14. SNMP Host Resources MIB, RFCs 1514 and 2790, http://www.ietf.org/rfc.html
15. PHP scripting language, http://www.php.net/
16. "Zenoss Core Network and System Monitoring" by Michael Badger, published
    by PACKT Publishing, June 2008, ISBN 9781847194282.

11 Appendix A Cacti installation details


Cacti 0.8.6j-64.4 was installed on an OpenSuSE 10.3 Linux system.
Prerequisites are:
  A web server (Apache 2.2.4-70)
  PHP (5.2.5-8.1)
  RRDTool (1.2.23-47)
  net-snmp (5.4.1-19)
  MySQL (5.0.45-22)
Cacti and all of the prerequisites were available on the OpenSuSE 10.3
standard distribution DVD.
Use the "Installation under Unix" instructions available from
http://www.cacti.net/downloads/docs/html/install_unix.html.
A few modifications were required, such as:
No PHP 5 configuration was done, as the files documented in the installation
  guide did not exist
Configuration of Apache 2 required no modifications
  in /etc/apache2/conf.d/php5.conf
Cacti was installed using the standard SuSE YaST mechanism
Create the MySQL database by:
    cd /usr/share/cacti
    mysql --user=root -p    (and supply the root password when prompted)
    create database cacti;
    use cacti;    (select the new database before loading the schema)
    source cacti.sql;
    GRANT ALL ON cacti.* TO cactiuser@localhost IDENTIFIED BY 'cacti';
  (Note that 'cacti' in the GRANT command is the password for the user
  cactiuser)
You need to manually create the operating system user cactiuser with
  password cacti
When pointing your web browser at http://<yourserver>/cacti/ ensure that you
  include the trailing slash. Use a web logon of admin, password admin.
Ensure that apache2 and mysql are either started manually (/etc/init.d/<name>
  start) or started automatically at system start using chkconfig
Ensure that the cactiuser userid can execute the /usr/share/cacti/poller.php
  script that is run by /etc/crontab.
Also ensure that the directory that the RRD data is written to (/var/lib/cacti) is
  writeable by this user.
cacti.log is in /var/log/cacti
I found (through /var/log/messages) that poller.php was being run twice, once in
  /etc/crontab as cactiuser and once in /etc/cron.d/cacti as user wwwrun.
  Comment out the line in /etc/cron.d/cacti and check again that cactiuser can
  write to the data files in /var/lib/cacti.
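
The poller checks in the last few items can be scripted. What follows is a minimal sketch under stated assumptions, not part of Cacti: the function names (find_poller_jobs, scan_cron_files, check_poller_access) are hypothetical, the paths are the ones quoted above, and the script is assumed to be run as cactiuser so that the access checks reflect that user's permissions. It also assumes that cron runs poller.php through the php interpreter, so it tests that the script is readable rather than that it carries an execute bit.

```python
import os
from pathlib import Path


def find_poller_jobs(cron_text, source="<text>", needle="poller.php"):
    """Return (source, line_number, line) for each active cron line mentioning needle."""
    hits = []
    for number, raw in enumerate(cron_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and lines that have been commented out.
        if not line or line.startswith("#"):
            continue
        if needle in line:
            hits.append((source, number, line))
    return hits


def scan_cron_files(paths=("/etc/crontab", "/etc/cron.d/cacti")):
    """Collect active poller.php entries from the cron files named in the text."""
    hits = []
    for name in paths:
        try:
            text = Path(name).read_text()
        except OSError:
            # Missing or unreadable file: nothing to report for it.
            continue
        hits.extend(find_poller_jobs(text, source=name))
    return hits


def check_poller_access(script="/usr/share/cacti/poller.php",
                        data_dir="/var/lib/cacti"):
    """Check access for the user running this script (assumed to be cactiuser)."""
    return {
        "script_readable": os.path.isfile(script) and os.access(script, os.R_OK),
        # Creating RRD files needs both write and search permission on the directory.
        "data_dir_writable": os.path.isdir(data_dir)
        and os.access(data_dir, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    jobs = scan_cron_files()
    for source, number, line in jobs:
        print(f"{source}:{number}: {line}")
    if len(jobs) > 1:
        print("Warning: poller.php is scheduled more than once")
    for check, ok in check_poller_access().items():
        print(f"{check}: {'ok' if ok else 'FAILED'}")
```

If more than one active entry is listed, comment out the unwanted one (in my case the line in /etc/cron.d/cacti) and run the script again.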

The initial console page is a good starting point to add devices to monitor and
associated graphs.

About the author


Jane Curry has been a network and systems management technical consultant and
trainer for 20 years. During her 11 years working for IBM she fulfilled both pre-sales
and consultancy roles spanning the full range of IBM's SystemView products prior to
1996 and then, when IBM bought Tivoli, she specialised in the systems management
products of Distributed Monitoring & IBM Tivoli Monitoring (ITM), the network
management product, Tivoli NetView, and the problem management product, Tivoli
Enterprise Console (TEC). All these products are based around the Tivoli Framework
product and architecture.
Since 1997 Jane has been an independent businesswoman working with many
companies, both large and small, commercial and public sector, delivering Tivoli
consultancy and training. Over the last 5 years her work has been more involved with
Open Source offerings.
