
7/11/19

ISP Essentials Workshop – Network Monitoring
Manila, Philippines
8-12 July 2019

Agenda
• Intro to Network Management
• Configuration Management
• Device Monitoring
• Flow Monitoring
• Log Management


Module 1

INTRO TO NETWORK
MANAGEMENT

Hosts and Services


• Host
– Container for services
– Can be physical or virtual
– Both have CPU, Disk, Memory, Network interfaces
– Physical hosts also have
  • Vendors, service contracts
  • Power supplies, temperature
• Service
– An application software
– Runs on a host
– Have allocated resources
– Have vendors / suppliers


Managing Config Data


• Some Host Configuration Data to Track
– Physical Device Locations
– Installed CPU, Disk, Memory, Network Interfaces
– Serial Numbers, Licenses, OS Revision & Patch Details

• Some Service Configuration Data to Track


– Allocated Resources, Network Ports
– Service Permissions, Filters and ACLs, Logging
– Software Revision & Patch Details

Why Manage Config Data?


• Match Resource Allocation to Revenue Generation
• Ensure our hosts and applications have a secure
configuration
• Correlate operational results with config changes
• Roll back or restore config when fault occurs


Operational Data
• Host
– CPU Utilisation
– Memory Utilisation
– Disk Utilisation
– Network Interface Utilisation
– Fan State
– Port Errors
• Service
– Time to Respond to Request
– Processes in Use
– Queue Length
– State of a BGP session

Operational Data
• Availability
– Applies to Hosts & Services
– Percent of time host or service is performing to specification
– Typically measured as a percent, for example 99.99%
– Excludes planned outages
• Reachability
– Applies to Hosts & Services
– Percent of time host or service is reachable
– Typically measured as a percent, for example 99.99%
– Unreachable hosts may not be unavailable to everyone
– Unreachable hosts may be available from another location
• Performance
– Time to respond to request or forward packet
– Megabits or Packets Per Second
– Discards, Errors, Loss
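
The availability definition above can be sketched as a small calculation. This is a minimal sketch, assuming outage durations are tracked in minutes; the function name and the sample figures are illustrative and not part of the workshop material.

```python
def availability_percent(period_minutes, unplanned_outage_minutes,
                         planned_outage_minutes=0):
    """Percent of time a host or service performed to specification.

    Planned outages are excluded from the measurement period.
    All arguments are durations in minutes.
    """
    measured = period_minutes - planned_outage_minutes
    if measured <= 0:
        raise ValueError("measurement period must exceed planned outages")
    return 100.0 * (measured - unplanned_outage_minutes) / measured


# Hypothetical 30-day month with 4 minutes of unplanned downtime
# and a 2-hour planned maintenance window.
month = 30 * 24 * 60
print(f"{availability_percent(month, 4, planned_outage_minutes=120):.3f}%")
```

Reachability could be computed the same way, but it would need to be measured from more than one location, since a host unreachable from one vantage point may still be reachable from another.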

Why Monitor Operational Data


• Know about Problems Before your Customers Call
• Prove Hosts & Services are Delivering on SLAs
• Continue to Meet SLAs as your Network Grows

Common NMM Tools


Common Back-end Tools


• Data storage
– Config files, formats and locations
– Databases: SQL, key-value, NoSQL
• RRDTool
– Explain the idea of a round-robin database

• Check_mk
– Explain the idea of service checking
• Nagios Plugins
– Explain what Nagios is and what plugins are
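
The round-robin database idea mentioned under RRDTool can be sketched as a fixed-size store in which each new sample overwrites the oldest one, so the file never grows. This is a minimal sketch, not RRDTool's actual file format or API; the class and method names are hypothetical, and real RRDTool also consolidates samples into coarser archives.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size store: once full, each new sample overwrites the oldest."""

    def __init__(self, slots):
        # deque with maxlen silently drops the oldest item when full
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        self.samples.append((timestamp, value))

    def average(self):
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)


# Three slots, four samples taken at 5-minute intervals:
# the first sample is overwritten.
rra = RoundRobinArchive(slots=3)
for t, v in [(0, 10), (300, 20), (600, 30), (900, 40)]:
    rra.update(t, v)
print(list(rra.samples))
print(rra.average())
```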

Network Automation
• A continuous process of generation and deployment of
configuration changes, management, and operations of
network devices (from Network Automation at Scale)


Network Automation
• Automating config management
• Including config changes based on operational data
• Orchestrated with tools like Ansible, Chef, Puppet, and Salt
• This is the next step in network monitoring and
management


Module 2

ADDRESS MANAGEMENT


Address Management
• Planning and managing the assignment and use of IP
addresses and closely related resources of a computer
network

• IP Address Management (IPAM) tools


– Racktables
– Netbox
– A lot of others (commercial and open source)

https://en.wikipedia.org/wiki/IP_address_management
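
The core bookkeeping behind address management (carving a parent prefix into subnets and recording who holds each one) can be sketched with Python's standard `ipaddress` module. This is a minimal sketch, not how Racktables or Netbox work internally; the `PrefixPool` class, the owner names and the documentation prefix 192.0.2.0/24 are assumptions for illustration.

```python
import ipaddress


class PrefixPool:
    """Hands out equal-sized subnets from one parent prefix and records the owner."""

    def __init__(self, parent, new_prefix):
        network = ipaddress.ip_network(parent)
        self._free = list(network.subnets(new_prefix=new_prefix))
        self.assigned = {}

    def assign(self, owner):
        if not self._free:
            raise RuntimeError("pool exhausted")
        subnet = self._free.pop(0)
        self.assigned[subnet] = owner
        return subnet

    def owner_of(self, address):
        addr = ipaddress.ip_address(address)
        for subnet, owner in self.assigned.items():
            if addr in subnet:
                return owner
        return None


pool = PrefixPool("192.0.2.0/24", new_prefix=26)
print(pool.assign("customer-a"))
print(pool.assign("customer-b"))
print(pool.owner_of("192.0.2.70"))
```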

Tools - Racktables
• Asset management tool

https://www.racktables.org/demo.php

Tools - Netbox
• Open-source web application designed to help manage and
document computer networks

https://netbox.readthedocs.io/en/stable/

Module 3

CONFIG MANAGEMENT


Network Device Configuration


• How do you configure a device?
– Using the command line (Cisco)
– From a special tool (Mikrotik)
– From a web interface (Procurve)
– JSON files (Arista)
– XML files (Juniper)

• Who configures the device?


• How often do changes happen?

Why do you need to manage config?


• Know when changes are made
• Restore config after a failure
• Roll back changes with unexpected outcomes
• Track config changes over time (history)


What is Version Control?


• Also known as revision control or source control
• Manages changes to files or documents with a revision
number
• Allows users to find and highlight changes
• Allows users to restore previous versions of a file or
document


What’s a Diff?
• A comparison of two versions of a single file or document
• Highlighting the changes between the two versions
• Allowing users to quickly see only what’s changed
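
A diff of two config revisions can be produced with Python's standard `difflib` module. This is a minimal sketch; the two config snippets and the file labels are made up for illustration.

```python
import difflib

# Two hypothetical revisions of the same device config.
old_config = """\
interface Gi0/1
 description uplink
 ip address 192.0.2.1 255.255.255.0
""".splitlines(keepends=True)

new_config = """\
interface Gi0/1
 description uplink to core
 ip address 192.0.2.1 255.255.255.0
""".splitlines(keepends=True)

# unified_diff yields header lines, hunk markers, then context,
# removed ("-") and added ("+") lines.
diff = list(difflib.unified_diff(old_config, new_config,
                                 fromfile="router.cfg@r1",
                                 tofile="router.cfg@r2"))
print("".join(diff), end="")
```

Only the changed description line is marked with `-` and `+`; the unchanged lines appear as context.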


Config Management Tools


• Retrieve configuration files
• Allow for their storage as files or in versioning system
• Solve many problems with network operations


Tools - Rancid
• Really Awesome New Cisco config differ
• monitors a router's (or more generally a device's)
configuration
• Uses CVS, Subversion, or Git to maintain history
• Supports Cisco, Foundry, HP, Juniper, and more
• Runs on BSD, Linux, Mac OS
• Pros:
– The de-facto industry standard for config management
https://www.shrubbery.net/rancid/

Rancid Example
Index: configs/dc1-gw1
===================================================================
retrieving revision 1.677
diff -U 4 -r1.677 dc1-gw1
@@ -713,8 +713,10 @@
  remark permit eduroam to beta-login
  permit tcp any host 204.111.222.3 eq www 443
  remark permit eduroam to stats
  permit tcp any host 204.111.222.4 eq www 443
+ remark permit eduroam to net-api
+ permit tcp any host 204.111.222.5 eq www 443
  remark temp deny access to all
  deny ip any 204.111.222.0 0.0.0.64


Rancid Example
Index: configs/dc1-gw
===================================================================
retrieving revision 1.2213
diff -U 4 -r1.2213 dc1-gw
@@ -32,9 +32,8 @@
  !Flash: bootflash: Directory of bootflash:/
  !Flash: bootflash: 11 drwx 16384 Jan 11 2017 12:13:18 +10:00 lost+found
  !Flash: bootflash: 12 -rw- 371180156 Oct 5 2018 14:05:16 +10:00 asr1000rp1-adventerprisek9.03.13.10.S.154-3.S10-ext.bin
- !Flash: bootflash: 13 -rw- 4 Jul 9 2019 15:15:03 +10:00 .issu_loc_lock
  !Flash: bootflash: 48769 drwx 4096 Jan 11 2017 12:16:08 +10:00 .installer
  !Flash: bootflash: 438913 drwx 4096 Jan 11 2017 13:05:11 +10:00 core
  !Flash: bootflash: 829057 drwx 4096 Oct 11 2018 07:24:32 +10:00 .prst_sync
  !Flash: bootflash: 520193 drwx 4096 Jan 11 2017 12:19:19 +10:00 .rollback_timer


Tools - Oxidized
• Network device configuration backup tool (to replace
Rancid)
• Stores files in a version control system
• Supports a large number of manufacturers
– Cisco (CatOS, IOS, IOSXR, NXOS)
– Juniper (JunOS, ScreenOS)
– Huawei (VRP, SmartAX)
– Mikrotik (RouterOS)
• Pros:
– Integrates with LibreNMS
https://github.com/ytti/oxidized

Other Tools
• Fetchconfig
• Jazigo


Module 4

DEVICE MONITORING


Intro to SNMP
• Simple Network Management Protocol
• Used to communicate management information between
the network management stations and the agents in the
network elements.

• Even though SNMP is a protocol, we use the term SNMP to
describe the complete architecture of the management
system


Intro to SNMP
• Network management stations execute management
applications which monitor and control network elements.

• Network elements are devices such as hosts, gateways,
terminal servers

• The agent is a piece of software that runs on the network
devices you are managing. It can be a separate program, or it
can be incorporated into the operating system. Agents listen and
respond on UDP port 161.


SNMP Polling, Traps and MIB


• SNMP polling is the act of querying an agent for some piece of
information. SNMP managers use UDP to poll agents.

• A trap is a way for the agent to tell the NMS that something has
happened. Traps are sent asynchronously, not in response to queries
from the NMS. SNMP traps are sent using UDP port 162.

• MIB or Management Information Base is a database of managed
objects that the agent tracks. Any sort of status or statistical information
accessed by the NMS is defined in a MIB.
– OID or object identifier is the name of a managed object. OIDs are globally
unique.
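
Because OIDs form a tree, a manager can tell which MIB subtree an object belongs to by comparing prefixes. The sketch below illustrates that idea only; it does not speak SNMP, and the helper names are hypothetical. The OIDs used are the standard mib-2 and enterprises subtrees and sysDescr.0.

```python
def parse_oid(text):
    """Turn a dotted OID string such as '1.3.6.1.2.1.1.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def in_subtree(oid, root):
    """True when `oid` sits at or below `root` in the OID tree."""
    return oid[:len(root)] == root


MIB_2 = parse_oid("1.3.6.1.2.1")            # standard mib-2 subtree
ENTERPRISES = parse_oid("1.3.6.1.4.1")      # vendor-specific subtree
SYS_DESCR = parse_oid("1.3.6.1.2.1.1.1.0")  # sysDescr.0

print(in_subtree(SYS_DESCR, MIB_2))
print(in_subtree(SYS_DESCR, ENTERPRISES))
```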


SNMP Applications
• LibreNMS
• MRTG
• PRTG
• …


Beyond SNMP
• SNMP is a heavy-weight protocol with low information density
• SNMP was not designed for streaming high resolution data
• It’s seen as too slow, incomplete, network-specific, and hard to
operationalize

New protocols are being developed to stream telemetry data in real-time


• YANG data model
• XML, JSON and GPB (Google Protocol Buffers) encoding
• Data pushed from agents, not requested from managers
• UDP, TCP or gRPC transport available


Tools - LibreNMS
• An open-source network monitoring system (NMS)
• Capable of managing small or big networks
• Most management functions are supported or can be
integrated
• Details under the hood:
– Written in PHP, derived from the Observium project
– Configuration in MySQL
– Operational data is stored in Round Robin Database files

https://www.librenms.org/

LibreNMS Dashboard


Tools – Sensu
• Sensu is a multi-cloud monitoring system that allows for
automating monitoring workflow
– Monitor containers, instances, applications, and on-premises
infrastructure
– Integrates with PagerDuty, Slack, Grafana, etc
• Sensu Go is the latest version
• Uchiwa is an open-source dashboard for the Sensu
monitoring framework

https://sensu.io/about/

Sensu / Uchiwa Dashboard

https://github.com/sensu/uchiwa


Tools - Grafana
• Open platform for monitoring and analytics
• Does time series analytics
• Plugins to integrate with other applications


Grafana Dashboard

https://grafana.com/

Module 5

FLOW MONITORING


What is a Flow?
• A flow is defined as a unidirectional sequence of packets
with some common properties that pass through a network
device. (RFC3954)


Why do we monitor IP flows?


• Where is our traffic coming from?
• What kind of application traffic is it?
• Are the correct QoS bits set?
• Have routing changes impacted the network?


What’s Netflow?
• Cisco protocol for flow monitoring released in 1996
• Described by RFC 3954, but not an Internet Standard
• Netflow version 5 is supported by nearly all router platforms
• Versions:
– Version 5: IPv4 only
– Version 9: IPv4/v6 and MPLS


What is IPFIX?
• IP Flow Information Export
• Vendor neutral protocol for flow monitoring
• Started through the IETF process in 2004 & released in
2011
• Based on Cisco’s Netflow version 9
• IPFIX is an Internet Standard replacement for version 9


How do Netflow and IPFIX work?


• Packets with matching tuples are grouped into a flow
• First occurrence of a flow is recorded in a flow cache
• Cache entries are timestamped
• Number of packets and bytes matching the flow are tallied
• Details like next hop IP, ASN, subnet masks, and TCP flags
can be recorded
• Cache can be queried interactively, or flows can be
exported
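
The flow cache behaviour listed above can be sketched as a dictionary keyed on the matching tuple. This is a minimal sketch, assuming a 5-tuple key (source, destination, protocol, source port, destination port) and a 15-second idle timeout; real exporters match on more fields and use their own timers, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    def __init__(self):
        self.entries = {}

    def observe(self, timestamp, src, dst, proto, sport, dport, length):
        """Account one packet against its flow, creating the entry if new."""
        key = (src, dst, proto, sport, dport)  # assumed 5-tuple key
        entry = self.entries.get(key)
        if entry is None:
            entry = FlowEntry(first_seen=timestamp, last_seen=timestamp)
            self.entries[key] = entry
        entry.last_seen = timestamp
        entry.packets += 1
        entry.octets += length

    def export_idle(self, now, idle_timeout=15.0):
        """Remove and return flows that have seen no packets for idle_timeout seconds."""
        expired = {k: e for k, e in self.entries.items()
                   if now - e.last_seen >= idle_timeout}
        for key in expired:
            del self.entries[key]
        return expired


cache = FlowCache()
cache.observe(0.0, "10.0.0.1", "10.0.0.2", 6, 1234, 80, 100)
cache.observe(1.0, "10.0.0.1", "10.0.0.2", 6, 1234, 80, 200)
cache.observe(2.0, "10.0.0.2", "10.0.0.1", 6, 80, 1234, 1500)  # reply: separate flow
for key, entry in cache.entries.items():
    print(key, entry.packets, entry.octets)
```

Because flows are unidirectional, the reply traffic is tracked as its own entry.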

Setting up Netflow & IPFIX


• Cisco – Netflow Configuration
• Juniper – Monitoring, Sampling …
• Huawei – Netstream Configuration
• Mikrotik - IP Traffic Flow


Flow Sampling / Downsampling


• Tracking every flow can take a lot of device resources
• Some routers & switches can be crippled by turning on
Netflow
• Sampling helps by tracking one in n packets
• CPU load can be significantly reduced – but so can
resolution
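
The trade-off between load and resolution can be illustrated with a sketch of deterministic one-in-n sampling, where totals are estimated by scaling the sampled counters by n. The function names and the synthetic packet trace are assumptions for illustration; real devices may sample randomly rather than on a fixed count.

```python
def sample_one_in_n(packet_lengths, n):
    """Keep every n-th packet (deterministic sampling) and return the kept lengths."""
    return [length
            for index, length in enumerate(packet_lengths, start=1)
            if index % n == 0]


def estimate_totals(sampled_lengths, n):
    """Scale sampled counters back up to estimate real packet and byte totals."""
    return len(sampled_lengths) * n, sum(sampled_lengths) * n


# Synthetic trace of 1000 packets with a repeating size pattern.
lengths = [64, 1500, 576, 1500] * 250
sampled = sample_one_in_n(lengths, n=100)

print("estimated (packets, bytes):", estimate_totals(sampled, n=100))
print("actual    (packets, bytes):", (len(lengths), sum(lengths)))
```

In this contrived trace the packet estimate is exact but the byte estimate is well off, because every 100th packet happens to be a large one. Only 10 of the 1000 packets are inspected, which is where the saving in device resources comes from.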


Tools - Softflowd
• Software Flow Monitoring
• Passive Netflow probe and exporter
• Network traffic passing through a switch can be mirrored
• Attach a Unix computer to the mirrored port
• Softflowd tracks flows from the mirrored traffic
• Flows can be exported just as they are from routers &
switches


Ad-Hoc Flow Queries


• Cisco
show ip flow

• JunOS
show services accounting flow-detail


Tools – nfdump + nfsen


• Nfdump collects and processes netflow and sflow
– C application that receives flows & logs them to files
• Nfsen generates stats and displays graphs
– Web-based front-end to Nfdump

https://github.com/phaag/nfdump
http://nfsen.sourceforge.net/


Tools - ntopng
• Web-based traffic and security network monitoring tool

https://github.com/ntop/ntopng

Module 6

LOG MANAGEMENT


What generates logs?


• Operating Systems
– Linux, Mac, Windows
• System applications
– cron, init, RDBMS
• Network applications
– BGP, DHCP, HTTP, iptables …


What do servers log?


• Backups
• Connections
• Database messages
• Hardware messages
• Software versions and updates


What do Network Apps log?


• Connections
• DHCP details
• Hardware messages
• Port events
• Protocol information


Where are logs stored?


• Linux/Mac : /var/log
• Windows: Event Viewer
• Network devices: Memory

Is it useful to have logs stored all over the place? What
happens to events written to memory when devices are
turned off?


Firewall Log


Syslog Message Levels


Level Description
0 Emerg
1 Alert
2 Critical
3 Error
4 Warning
5 Notice
6 Info
7 Debug
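
On the wire, the level travels inside the `<PRI>` value at the start of each syslog message, where PRI = facility × 8 + severity. The sketch below decodes it using the level names from the table above; the function names are hypothetical and the sample message is the well-known example from RFC 3164.

```python
SEVERITIES = ["Emerg", "Alert", "Critical", "Error",
              "Warning", "Notice", "Info", "Debug"]


def decode_pri(pri):
    """Split a syslog PRI value into (facility number, severity name)."""
    if not 0 <= pri <= 191:
        raise ValueError("PRI must be between 0 and 191")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]


def parse_pri(message):
    """Extract and decode the <PRI> prefix of a raw syslog line."""
    if not message.startswith("<"):
        raise ValueError("missing PRI prefix")
    end = message.index(">")
    return decode_pri(int(message[1:end]))


line = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
print(parse_pri(line))
```

Here PRI 34 decodes to facility 4 (security/authorization messages) at level 2, Critical.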


Syslog aggregation


How to aggregate syslog


• Set up a remote syslog facility on a server
– Graylog
– Elastic Stack
– Rsyslog
– Splunk
– Syslog-ng

• Configure devices to send their logs


Tools - Graylog
• Commercial + Open source software
• Collection, Storage, Analysis, & Visualisation
• Tightly coupled software stack including:
– ElasticSearch for log storage and search
– MongoDB for configuration and metadata
• LibreNMS integration


Tools – Elastic Stack


• Open source with commercial support available
• Collection via Logstash
• ElasticSearch for Storage and Search
• Kibana for Search, Analytics, and Visualisation
• (ELK stack)


Tools - Rsyslog
• Open source with commercial support available
• TCP, SSL, TLS, RELP
• MySQL, PostgreSQL, Oracle and more
• Filter any part of syslog message
• Multi-threading and suitable for relay chains


Tools - Splunk
• Commercial software
• Free for small users at < 500 MB/day
• Collection, Storage, Analysis & visualization
• Real-time alerting engine included
• Popular corporate solution with 13k customers


Tools – Syslog-ng
• Free and open source with commercial support available
• Collection and storage
• Adds TCP and TLS to basic UDP transport
• Can extract structured information from log messages
• Can log directly to a database
• Requires external tools for Analysis and visualization


Log Alerting & Analysis


• No systems administrator has time to read all logs
• Log messages are unimportant until they aren’t
– Post-incident security reports
– Billing inquiries
– Law Enforcement Agency request

• Some platforms include analysis or alerting


• Others need external tools like Tenshi or Swatch


Beyond Alerting: Analysis


• The volume of log entries is as important as their content
– What’s your baseline number of entries?
– Has it changed?
– Do more log entries mean an attack?
• Similar log entries across a network can be important
– Port scanning, intrusion attempts

• Similar log entries across time can be important


– Is someone attacking you very slowly?
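
The baseline questions above can be sketched as a simple volume check: count entries per interval, learn a baseline, and flag intervals that rise well above it. This is a minimal sketch, assuming hourly counts, a 24-hour baseline and a threshold of three standard deviations; the function name and sample counts are illustrative, and this is not how Tenshi or Swatch work.

```python
from statistics import mean, stdev


def volume_anomalies(counts, threshold=3.0, baseline_size=24):
    """Return indexes of intervals whose entry count is far above the baseline.

    The first `baseline_size` counts define the baseline; later counts are
    flagged when they exceed mean + threshold * standard deviation.
    """
    baseline = counts[:baseline_size]
    limit = mean(baseline) + threshold * stdev(baseline)
    return [index
            for index, count in enumerate(counts[baseline_size:], start=baseline_size)
            if count > limit]


# Hypothetical hourly log-entry counts: a quiet day, then a burst.
hourly_counts = [100, 110] * 12 + [105, 400, 98]
print(volume_anomalies(hourly_counts))
```

A slow attack would not trip a check like this, which is why comparing similar entries across longer time windows also matters.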
