Monitoring System
Technical Proposal
(Shwe Bank)
Prepared by NEX4 ICT Solutions
Hardware List
No. Device Manufacturer Part Number Quantity
1 Access switch Cisco WS-C2960X-48TD-L 12
2 Access switch Cisco WS-C2960X-24TD-L 4
3 MGMT switch Cisco WS-C2960X-24PD-L 2
4 WAN switch Cisco C9200L-48T-4X 1
5 Core Switch Cisco N9K C93180YC-EX 2
6 Core Switch Cisco N7K C7010 2
7 Core Switch Cisco N3K-C3172TQ-1 4
8 CBM ISR Router Cisco ISR4431/K9 2
9 CBM switch Cisco C9200L-24 2
10 Edge Firewall Palo Alto PAN-PA-3410 2
11 Core Firewall Check Point CPAP-SG6700-PLUS-SNBT 2
12 Web Application Firewall F5 F5 R4800 LTM + AWAF 2
13 Hyperflex Servers Cisco HXAF240C-M6SN 4
14 Fabric Interconnect switches Cisco UCS-FI-6454-U 2
If additional devices or services need to be added to the monitoring tool, NEX4 will provide the additional configuration,
limited to a maximum of 5 devices or services per month.
General Scope of Work
Step : 1 Planning and Design:
➢ Assessment: Conduct a detailed assessment of the current network and server infrastructure to identify specific
monitoring requirements for each component.
➢ Requirement Gathering: Identify the key metrics, thresholds, and events that need to be monitored for each
device type (e.g., CPU, memory, network traffic, security events).
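The result of requirement gathering can be recorded as a simple matrix of device types, metrics, and thresholds. The sketch below is illustrative only: the `Requirement` structure and every threshold number are assumptions, not values agreed with Shwe Bank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    device_type: str   # e.g. "Cisco Catalyst switch"
    metric: str        # e.g. "CPU utilization"
    unit: str
    warning: float     # hypothetical warning threshold
    critical: float    # hypothetical critical threshold


# Placeholder rows for illustration; real thresholds come out of the assessment.
REQUIREMENTS = [
    Requirement("Cisco Catalyst switch", "CPU utilization", "%", 80.0, 90.0),
    Requirement("Cisco Catalyst switch", "Memory utilization", "%", 80.0, 90.0),
    Requirement("Palo Alto firewall", "Session table utilization", "%", 70.0, 90.0),
]


def requirements_for(device_type: str) -> list[Requirement]:
    """Return every requirement recorded for one device type."""
    return [r for r in REQUIREMENTS if r.device_type == device_type]


def validate(reqs: list[Requirement]) -> list[str]:
    """List rows whose warning threshold is not below the critical threshold."""
    return [f"{r.device_type}/{r.metric}" for r in reqs if r.warning >= r.critical]
```

Running `validate(REQUIREMENTS)` before templates are built is a cheap way to catch rows where the two thresholds were entered the wrong way round.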
Step : 2 Installation and Configuration:
➢ Zabbix Server Setup: Install and configure the Zabbix server, database, and web interface.
➢ Agent/Proxy Deployment: Deploy and configure Zabbix agents or proxies as necessary on VMware ESXi hosts and
Cisco HyperFlex Servers.
➢ SNMP Configuration: Set up SNMP (Simple Network Management Protocol) for monitoring Cisco Catalyst
Switches, Nexus Switches, and Firewalls (Palo Alto, Checkpoint).
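As an illustration of the SNMP configuration step, SNMP-monitored devices can be registered through the Zabbix JSON-RPC API method `host.create`. The sketch below only builds the request body. It assumes a Zabbix release whose host interface object carries the `details` field (5.0 or later) and SNMPv2c, which this proposal does not specify. Authentication is omitted, and the host name, IP address, group ID, and template ID are placeholders.

```python
import json


def build_host_create(hostname: str, ip: str, group_id: str, template_id: str) -> dict:
    """Build a JSON-RPC 2.0 request body for the Zabbix API method host.create."""
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": hostname,
            "interfaces": [{
                "type": 2,       # 2 = SNMP interface
                "main": 1,       # default interface of this type
                "useip": 1,      # connect by IP rather than DNS name
                "ip": ip,
                "dns": "",
                "port": "161",
                "details": {
                    "version": 2,                      # SNMPv2c (assumption)
                    "community": "{$SNMP_COMMUNITY}",  # user macro, not a literal secret
                },
            }],
            "groups": [{"groupid": group_id}],
            "templates": [{"templateid": template_id}],
        },
        "id": 1,
    }


# Hypothetical host name, address, and IDs.
payload = build_host_create("core-sw-01", "192.0.2.10", "2", "10001")
print(json.dumps(payload, indent=2))
```

Referencing the community string through a user macro keeps the secret out of scripts and lets it be set once per template or host group.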
Step : 3 Template Creation:
➢ Cisco Catalyst and Nexus Switches: Monitor CPU usage, memory, interface status, traffic, and error rates.
➢ Fabric Interconnect Switches: Monitor network performance, interface statuses, and connectivity issues.
➢ Palo Alto Firewall: Monitor threat logs, session counts, interface traffic, and system resources.
➢ Checkpoint Firewall: Monitor security events, VPN statuses, CPU, memory, and interface traffic.
➢ VMware ESXi: Monitor host performance, VM statuses, CPU, memory, datastore usage, and network metrics.
➢ Cisco HyperFlex servers: Monitor cluster health, storage performance, network performance, and hardware status.
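The template scope listed above can be kept as a small lookup table so that it is easy to see which templates cover a given metric. This is a minimal sketch; the dictionary only restates the bullets above, and the function name is hypothetical.

```python
# Template scope restated from the bullets above.
TEMPLATE_METRICS = {
    "Cisco Catalyst and Nexus Switches": [
        "CPU usage", "memory", "interface status", "traffic", "error rates"],
    "Fabric Interconnect Switches": [
        "network performance", "interface status", "connectivity issues"],
    "Palo Alto Firewall": [
        "threat logs", "session counts", "interface traffic", "system resources"],
    "Checkpoint Firewall": [
        "security events", "VPN status", "CPU", "memory", "interface traffic"],
    "VMware ESXi": [
        "host performance", "VM status", "CPU", "memory",
        "datastore usage", "network metrics"],
    "Cisco HyperFlex servers": [
        "cluster health", "storage performance", "network performance",
        "hardware status"],
}


def templates_covering(metric: str) -> list[str]:
    """Return the names of all templates that list the given metric."""
    return sorted(name for name, metrics in TEMPLATE_METRICS.items()
                  if metric in metrics)


print(templates_covering("memory"))
```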
Step : 4 Data Collection and Monitoring :
➢ Metric Collection: Configure data collection intervals and retention policies specific to the critical metrics of each
device type.
➢ Network Monitoring: Set up SNMP traps and polling for real-time monitoring of network devices (Cisco Switches,
Firewalls).
➢ Virtualization Monitoring: Configure VMware API integration to monitor ESXi hosts and VMs.
➢ Security Event Monitoring: Configure monitoring for security events and logs from Palo Alto and Checkpoint
Firewalls.
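Collection intervals and retention periods together determine how much database space history data needs. The sketch below follows the commonly used Zabbix sizing rule of thumb (number of values multiplied by bytes per value). The figure of roughly 90 bytes per value is an assumption that varies by database engine, and the item count in the example is hypothetical.

```python
def history_bytes(items: int, interval_s: int, retention_days: int,
                  bytes_per_value: int = 90) -> int:
    """Estimate history storage for `items` metrics polled every `interval_s` seconds."""
    values_per_item = retention_days * 24 * 3600 // interval_s
    return items * values_per_item * bytes_per_value


# Hypothetical sizing: 10,000 items, 60-second interval, 30 days of history.
estimate = history_bytes(items=10_000, interval_s=60, retention_days=30)
print(f"{estimate / 1e9:.2f} GB")
```

Doubling the interval halves the estimate, which makes this a quick way to compare retention options for critical and non-critical metrics.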
Step : 5 Alerts and Notifications:
➢ Thresholds Configuration: Set up specific thresholds for key metrics (e.g., high CPU usage, memory consumption,
network errors) and define corresponding triggers.
➢ Notification Setup: Establish notification channels (email, SMS, etc.) for different severity levels, ensuring timely
alerts to the relevant teams.
➢ Escalation Procedures: Define and configure escalation procedures for unresolved critical alerts.
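The relationship between thresholds, severities, and escalation can be sketched as follows. The threshold values, the escalation ladder, and the role names are all assumptions for illustration; in Zabbix this logic is expressed through triggers and action escalation steps rather than custom code.

```python
def classify(value: float, warning: float, critical: float) -> str:
    """Map a metric value to a severity using two thresholds."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


# Hypothetical escalation ladder: (minutes unresolved, who is notified).
ESCALATION = [(0, "on-call engineer"), (30, "team lead"), (60, "support manager")]


def escalation_target(minutes_unresolved: int) -> str:
    """Return the last ladder entry whose waiting time has already elapsed."""
    target = ESCALATION[0][1]
    for after_minutes, who in ESCALATION:
        if minutes_unresolved >= after_minutes:
            target = who
    return target


print(classify(95, warning=80, critical=90), escalation_target(45))
```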
Step : 6 Reporting and Dashboards:
➢ Custom Dashboards: Create real-time dashboards for each device type, providing a centralized view of
performance and health metrics. The plan is 10 dashboards per device; the detailed scope is to be discussed during the project.
➢ Scheduled Reports: Set up periodic reporting for trend analysis, capacity planning, and compliance with Service
Level Agreements (SLAs).
➢ Historical Analysis: Implement tools for historical data analysis to identify patterns and optimize resource usage.
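For capacity planning, a simple linear trend over historical samples gives a rough estimate of when a resource will reach a limit. The sketch below uses an ordinary least-squares fit. The sample data and the 90% limit are hypothetical, and a straight line is only one possible model.

```python
from typing import Optional


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until(threshold: float, days: list[float], usage: list[float]) -> Optional[float]:
    """Days after the last sample at which the fitted line reaches `threshold`."""
    slope, intercept = linear_fit(days, usage)
    if slope <= 0:
        return None  # usage is flat or falling, so no crossing is predicted
    return (threshold - intercept) / slope - days[-1]


# Hypothetical datastore usage in percent, one sample per day.
print(days_until(90, [0, 1, 2, 3, 4], [50, 52, 54, 56, 58]))
```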
Step : 7 Testing and Validation:
➢ System Testing: Conduct rigorous testing of the Zabbix setup to ensure accurate monitoring, data collection, and
alerting.
➢ Validation: Verify that all critical infrastructure components are being monitored according to the defined
requirements.
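One way to validate coverage is to compare the quantities in the hardware list with the hosts actually present in the monitoring tool. The sketch below is a minimal example: the three part numbers and quantities are copied from the hardware list, while the list of monitored hosts is hypothetical and would in practice come from an export of the Zabbix host inventory.

```python
from collections import Counter

# Part numbers and quantities copied from three rows of the hardware list.
EXPECTED = {"WS-C2960X-48TD-L": 12, "PAN-PA-3410": 2, "UCS-FI-6454-U": 2}


def coverage_gaps(expected: dict[str, int], monitored_parts: list[str]) -> dict[str, int]:
    """Return part numbers with fewer monitored hosts than expected, and how many are missing."""
    seen = Counter(monitored_parts)
    return {part: qty - seen[part]
            for part, qty in expected.items() if seen[part] < qty}


# Hypothetical export: one part number per monitored host.
monitored = ["WS-C2960X-48TD-L"] * 12 + ["PAN-PA-3410"]
print(coverage_gaps(EXPECTED, monitored))
```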
Step : 8 Training and Documentation:
➢ Training Sessions: Provide training for network and system administrators on using Zabbix, managing alerts, and
interpreting data.
➢ Documentation: Develop detailed documentation covering the configuration, monitoring templates, and
troubleshooting procedures for each device type.
Dashboard Information
(Cisco Switches)
Collected Items (Cisco Switches)
Name Description
ICMP ping
Uptime (network) The time (in hundredths of a second) since the network management portion of the
system was last re-initialized.
Uptime (hardware) The amount of time since this host was last initialized. Note that this is different from
sysUpTime in the SNMPv2-MIB [RFC1907] because sysUpTime is the uptime of the
network management portion of the system.
SNMP traps (fallback) The item is used to collect all the SNMP traps unmatched by the other snmp trap items.
System contact details The textual identification of the contact person for the managed node (or: this node),
together with the contact information of this person. If no contact information is known,
the value is a zero-length string.
System description The textual description of the entity. This value should include the full name and version
identification number of the system's hardware type, software operating-system, and the
networking software.
Hardware model name MIB: ENTITY-MIB.
Hardware serial number MIB: ENTITY-MIB.
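The uptime items above are reported as SNMP TimeTicks, that is, hundredths of a second. The sketch below converts a raw value to a readable duration and shows how a restart could be detected from it. The 10-minute window is an assumption, not a value taken from this proposal.

```python
from datetime import timedelta


def timeticks_to_timedelta(ticks: int) -> timedelta:
    """Convert SNMP TimeTicks (hundredths of a second) to a timedelta."""
    return timedelta(seconds=ticks / 100)


def recently_restarted(ticks: int, window_s: int = 600) -> bool:
    """True when the reported uptime is shorter than the window (assumed 10 minutes)."""
    return timeticks_to_timedelta(ticks).total_seconds() < window_s


print(timeticks_to_timedelta(8_640_000))   # one day of uptime
print(recently_restarted(30_000))          # 300 seconds of uptime
```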
Triggers (Cisco switches)
Trigger Events
Unavailable by ICMP ping
High ICMP ping loss
High ICMP ping response time
Device has been replaced
System name has changed
Operating system description has changed
Device has been restarted or reinitialized
No SNMP data collection
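The three ICMP triggers above depend on values that can be derived from a series of ping probes: whether any reply arrived, the share of lost probes, and the average response time. The following is a minimal sketch of that derivation. The function name and the input format are hypothetical, and the thresholds at which the triggers fire are left to the template configuration.

```python
from typing import Optional


def ping_summary(rtts_ms: list[Optional[float]]) -> dict:
    """Summarize probes; each entry is a round-trip time in ms, or None for a lost probe."""
    if not rtts_ms:
        raise ValueError("at least one probe is required")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    avg_ms = sum(replies) / len(replies) if replies else None
    return {"available": bool(replies), "loss_pct": loss_pct, "avg_ms": avg_ms}


# Hypothetical probe results: one of four probes was lost.
print(ping_summary([10.0, None, 20.0, 30.0]))
```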
Dashboard Information
(Palo Alto Firewall)
Collected Items (Palo Alto Firewall)
Name Description
App-ID content date Currently installed application definition release date. If no release date is found, unknown is returned.
App-ID Version Currently installed application definition version. If no application definition is found, 0 is returned.
Chassis type Chassis type for this Palo Alto device.
Global Protect Client Version Currently installed global-protect client package version. If package is not installed, 0.0.0 is returned.
GP active tunnels Number of active tunnels.
GP gateway utilization GlobalProtect Gateway utilization percentage.
GP tunnels supported Max tunnels allowed.
HA Mode Current high-availability mode (disabled, active-passive, or active-active).
HA Peer State Current peer high-availability state.
HA State Current high-availability state.
HW Version Hardware version of the unit.
ICMP Check Ping to device.
PAN-OS Version Full software version. The first two components of the full version are the major and minor versions. The third component indicates the maintenance release number.
Processor 1 Load (mgmt) The average, over the last minute, of the percentage of time that this processor was not idle. Implementations may approximate this one minute smoothing period if necessary.
Processor 2 Load (data) The average, over the last minute, of the percentage of time that this processor was not idle. Implementations may approximate this one minute smoothing period if necessary.
Serial Number The serial number of the unit. If not available, an empty string is returned.
Session table utilization Session table utilization percentage. Values should be between 0 and 100.
SNMP availability SNMP availability.
System Description A textual description of the entity. This value should include the full name and version identification of the system's hardware type, software operating-system, and networking software. It is mandatory that this only contain printable ASCII characters.
System Name An administratively-assigned name for this managed node. By convention, this is the node's fully-qualified domain name.
System Uptime The time (in hundredths of a second) since the network management portion of the system was last re-initialized. Preprocessed to seconds.
Threat Version Currently installed threat definition version. If no threat definition is found, 0 is returned.
Total active ICMP sessions Total number of active ICMP sessions.
Total active sessions Total number of active sessions.
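The utilization items above are percentages. For GlobalProtect, the reported gateway utilization presumably relates the number of active tunnels to the number of supported tunnels. The sketch below shows that arithmetic under this assumption; the firewall reports its own value, so this is only a cross-check, and the numbers are hypothetical.

```python
def utilization_pct(active: int, supported: int) -> float:
    """Active tunnels as a percentage of supported tunnels, rounded to one decimal."""
    if supported <= 0:
        return 0.0  # avoid division by zero when no tunnels are supported
    return round(100 * active / supported, 1)


# Hypothetical values for "GP active tunnels" and "GP tunnels supported".
print(utilization_pct(active=250, supported=1000))
```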
Triggers (Palo Alto Firewall)
Trigger Events
App-ID content date
App-ID Version
Chassis type
Global Protect Client Version
GP active tunnels
GP gateway utilization
GP tunnels supported
HA Mode
HW Version
ICMP Check
PAN-OS Version
Processor 1 Load (mgmt)
Processor 2 Load (data)
Serial Number
Session table utilization
SNMP availability
System Description
System Name
System Uptime
Threat Version
Total active ICMP sessions
Total active sessions (TCP/UDP)
Dashboard Information
(CheckPoint Firewall)
Type | Items | Value/Unit
FAN | Status of FAN, Speed of FAN |
CPU | Number of CPUs |
CPU | CPU Utilization | %
Memory | Total Memory | MB/GB
Memory | Active Memory | MB/GB
Memory | Free Memory | MB/GB
Memory | Used Memory | MB/GB
Memory | Memory Utilization | %
Storage | Storage size | MB/GB
Storage | Storage Utilization | %
Network Interface | Network Traffic received | bps
Network Interface | Network Traffic sent | bps
Network Interface | Operational status | Up/Down
Sessions | Concurrent Connection |
Sessions | Peak Concurrent Connection |
Overview info | SNMP Agent Availability |
Overview info | Uptime |
Overview info | ICMP Ping |
Collected Items (CheckPoint Firewall)
Name Description
Appliance product name MIB: CHECKPOINT-MIB
Appliance product name.
Appliance serial number MIB: CHECKPOINT-MIB
Appliance serial number.
Appliance manufacturer MIB: CHECKPOINT-MIB
Appliance manufacturer.
Remote Access users MIB: CHECKPOINT-MIB
Number of remote access users.
System contact details MIB: SNMPv2-MIB
Name and contact information of the contact person for the node. If not
provided, the value is a zero-length string.
System description MIB: SNMPv2-MIB
Full name and version identification of the system's hardware type,
software operating system, and networking software.
System name MIB: SNMPv2-MIB
An administratively-assigned name for the node (the node's fully-
qualified domain name). If not provided, the value is a zero-length string.
System object ID MIB: SNMPv2-MIB
The vendor's authoritative identification of the entity as part of the
vendor's SMI enterprises subtree with the prefix 1.3.6.1.4.1 (e.g., a vendor
with the identifier 1.3.6.1.4.1.4242 might assign a system object with the
OID 1.3.6.1.4.1.4242.1.1).
System uptime MIB: HOST-RESOURCES-V2-MIB
Time since the network management portion of the system was last re-
initialized.
Number of CPUs MIB: CHECKPOINT-MIB
Number of processors.
CPU utilization MIB: CHECKPOINT-MIB
CPU utilization per core in %.
Load average (1m avg) MIB: UCD-SNMP-MIB
Average number of processes being executed or waiting over the last
minute.
Load average (5m avg) MIB: UCD-SNMP-MIB
Average number of processes being executed or waiting over the last 5
minutes.
Load average (15m avg) MIB: UCD-SNMP-MIB
Average number of processes being executed or waiting over the last 15
minutes.
CPU user time MIB: CHECKPOINT-MIB
Average time the CPU has spent running user processes that are not
niced.
CPU system time MIB: CHECKPOINT-MIB
Average time the CPU has spent running the kernel and its processes.
CPU idle time MIB: CHECKPOINT-MIB
Average time the CPU has spent doing nothing.
Context switches per second MIB: UCD-SNMP-MIB
Number of context switches per second.
CPU interrupts per second MIB: CHECKPOINT-MIB
Number of interrupts processed per second.
Total memory MIB: CHECKPOINT-MIB
Total real memory in bytes. Memory used by applications.
Active memory MIB: CHECKPOINT-MIB
Active real memory (memory used by applications that is not cached to
the disk) in bytes.
Free memory MIB: CHECKPOINT-MIB
Free memory available for applications in bytes.
Used memory Used real memory calculated by total real memory and free real memory
in bytes.
Memory utilization Memory utilization in %.
Encrypted packets per second MIB: CHECKPOINT-MIB
Number of encrypted packets per second.
Decrypted packets per second MIB: CHECKPOINT-MIB
Number of decrypted packets per second.
ICMP ping Host accessibility by ICMP.
0 - ICMP ping fails.
1 - ICMP ping successful.
ICMP loss Percentage of lost packets.
ICMP response time ICMP ping response time (in seconds).
SNMP agent availability Availability of SNMP checks on the host. The value of this item corresponds to
the availability icons in the host list.
Possible values:
0 - not available
1 - available
2 - unknown
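The "Used memory" and "Memory utilization" items above are derived from the total and free memory values. The sketch below shows that derivation under the assumption that utilization is used memory divided by total memory; the byte counts are hypothetical.

```python
def used_memory(total_bytes: int, free_bytes: int) -> int:
    """Used real memory, calculated from total and free real memory."""
    return total_bytes - free_bytes


def memory_utilization_pct(total_bytes: int, free_bytes: int) -> float:
    """Used memory as a percentage of total memory."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100 * used_memory(total_bytes, free_bytes) / total_bytes


# Hypothetical readings: 8 GB total, 2 GB free.
print(memory_utilization_pct(8_000_000_000, 2_000_000_000))
```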
Triggers (CheckPoint Firewall)
Trigger Events
Interface Link Down
Interface High Bandwidth usage
Interface High error rate
Device has been replaced
System name has changed
Device has been restarted
High CPU utilization
Load average is too high
High memory utilization
Unavailable by ICMP ping
High ICMP ping loss
High ICMP ping response time
No SNMP data collection
Disk space is critically low
Temperature is above critical threshold
Temperature is above warning threshold
Temperature is too low
Power supply is in down state
License expires soon
License has been expired
Dashboard Information
(Cisco Hyperflex servers)
Collected Items (Cisco Hyperflex Servers)
Name Description
Uptime (network) MIB: SNMPv2-MIB
The time in seconds since the network management
portion of the system was last re-initialized.
Uptime (hardware) MIB: HOST-RESOURCES-MIB
The amount of time since this host was last initialized.
Note that this is different from sysUpTime in the SNMPv2-MIB
[RFC1907] because sysUpTime is the uptime of the
network management portion of the system.
SNMP traps (fallback) The item is used to collect all SNMP traps unmatched by other
snmptrap items
SNMP agent availability Availability of SNMP checks on the host. The value of this item
corresponds to availability icons in the host list.
Possible values:
0 - not available
1 - available
2 - unknown
Disk_arrays Disk array controller status
Disk_arrays Disk array controller model
Fans Fan status
Inventory Hardware model name
Inventory Hardware serial number
Physical_disks Physical disk status
Physical_disks Physical disk model name
Physical_disks Physical disk media type
Physical_disks Disk size
Power_supply Power supply status
Status Overall system health status
Temperature Ambient: Temperature
Temperature Front: Temperature
Temperature Rear: Temperature
Virtual_disks Status
Virtual_disks Layout type
Virtual_disks Disk size
Triggers (Cisco Hyperflex Servers)
Trigger Events
Host has been restarted
Disk array controller is in critical state
Disk array controller is in warning state
Disk array controller is not in optimal state
Disk array cache controller battery is in critical state!
Fan is in critical state
Fan is in warning state
Device has been replaced (new serial number received)
Physical disk failed
Power supply is in critical state
Power supply is in warning state
System status is in critical state
System status is in warning state
Temperature is above warning threshold
Temperature is above critical threshold
Unavailable by ICMP ping
Dashboard Information
(VMware ESXi Hypervisor)
Dashboard Information
(VMware Guest)
Collected Items (VMware Guest)
Name Description
Cluster name Cluster name of the guest VM.
Number of virtual CPUs Number of virtual CPUs assigned to the guest.
CPU ready Time that the virtual machine was ready, but could not get scheduled to
run on the physical CPU during last measurement interval (VMware
vCenter/ESXi Server performance counter sampling interval - 20
seconds)
CPU usage Current upper-bound on CPU usage. The upper-bound is based on the
host the virtual machine is currently running on, as well as limits
configured on the virtual machine itself or any parent resource pool. Valid
while the virtual machine is running.
Datacenter name Datacenter name of the guest VM.
Hypervisor name Hypervisor name of the guest VM.
Ballooned memory The amount of guest physical memory that is currently reclaimed through
the balloon driver.
Compressed memory The amount of memory currently in the compression cache for this VM.
Private memory Amount of memory backed by host memory and not being shared.
Shared memory The amount of guest physical memory shared through transparent page
sharing.
Swapped memory The amount of guest physical memory swapped out to the VM's swap
device by ESX.
Guest memory usage The amount of guest physical memory that is being used by the VM.
Host memory usage The amount of host physical memory allocated to the VM, accounting for
saving from memory sharing with other VMs.
Memory size Total size of configured memory.
Power state The current power state of the virtual machine.
Committed storage space Total storage space, in bytes, committed to this virtual machine across all
datastores.
Uncommitted storage space Additional storage space, in bytes, potentially used by this virtual
machine on all datastores.
Unshared storage space Total storage space, in bytes, occupied by the virtual machine across all
datastores, that is not shared with any other virtual machine.
Uptime System uptime.
Guest memory swapped Amount of guest physical memory that is swapped out to the swap
space.
Host memory consumed Amount of host physical memory consumed for backing up guest
physical memory pages.
Host memory usage in percents Percentage of host physical memory that has been consumed.
CPU usage in percents CPU usage as a percentage during the interval.
CPU latency in percents Percentage of time the virtual machine is unable to run because it is contending for access to the physical CPU(s).
CPU readiness latency in percents Percentage of time that the virtual machine was ready, but could not get scheduled to run on the physical CPU.
CPU swap-in latency in percents Percentage of CPU time spent waiting for swap-in.
Uptime of guest OS Total time elapsed since the last operating system boot-up (in seconds).
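The "CPU ready" item is a time value accumulated over the 20-second sampling interval, so it is often easier to read as a percentage of that interval. The sketch below shows the usual conversion, assuming the raw value is in milliseconds. Dividing by the number of vCPUs to get a per-vCPU average is a further assumption about how the value is aggregated.

```python
def cpu_ready_pct(ready_ms: float, interval_s: int = 20, vcpus: int = 1) -> float:
    """CPU ready time as a percentage of the sampling interval, averaged per vCPU."""
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


# Hypothetical sample: 1000 ms of ready time in one 20-second interval.
print(cpu_ready_pct(1000))
```

With these inputs the VM spent 5% of the interval waiting to be scheduled on a physical CPU.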
Triggers (VMware ESXi Hypervisor Cluster)
VMware | Trigger Events
VMware Cluster | Cluster status is Red
VMware Cluster | Cluster status is Yellow
Datastore | Free space is critically low
Datastore | Free space is low
VM | VM has been restarted
VMware Hypervisor | Hypervisor is down
VMware Hypervisor | The health is Red
VMware Hypervisor | The health is Yellow
VMware Hypervisor | Hypervisor has been restarted
Estimated Project Timeline
Kick Off Meeting & Requirement Gathering
Zabbix Implementation
Knowledge Transfer & Deliver Documents
Project Close
Schedule: Week 1 to Week 15, 5 man-days per week
Legend: High Level Tasks
NEX4 Support Scope Details
No. Support Case Type Description
1. Incidents and Alerts in the Zabbix monitoring tool An issue was reported with the NEX4 monitoring tool that led to
disruptions in the monitoring and management of critical
infrastructure.
2. Incident Triage and Escalation Classification of incidents by severity and impact. Escalation of
complex or high-priority issues to senior engineers or third-party
vendors.
3. Root Cause Analysis Conduct a thorough investigation to determine the underlying
cause of incidents and event problems in Zabbix and provide
solutions to prevent future occurrences.
4. Questionnaire General service questions related to the products NEX4 is
responsible for.
5. Additional configuration added If additional devices or services need to be added to the
NOC tool, NEX4 will provide the additional configuration,
limited to a maximum of 5 devices or services per month.
NEX4 Premium Support SLAs
Severity level | Initial Response | Onsite Support (if required) | Subsequent Support | Severity Description
1 Critical | Within 30 minutes | 24/7 x 4 hours | Every 30 minutes | A problem has made a critical service unusable or unavailable, and no workaround exists.
2 High | Within 1 hour | 24/7 x 6 hours | Every 2 hours | A problem has made a service unavailable, but a workaround exists.
3 Medium | Within 2 hours | 2 business days | Every 4 hours | A certain function in a service is degraded.
4 Low | Within 1 business day | Upon availability | Weekly | General assistance with configuration help, questions, etc.
Initial Response is when a ticket is opened and acknowledged by help desk staff. (Note: For Severity 1 and 2, we suggest informing the NEX4 team by
telephone for a faster response.)
Subsequent Support is the frequency with which the user who logged the ticket is updated on the resolution status.
Onsite Support starts once the NEX4 support manager decides that onsite support is required.
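The response times in the SLA table can be turned into concrete deadlines for a ticket. The sketch below covers severities 1 to 3 only, because the severity 4 target is expressed in business days and would need a business calendar. The function names and the example timestamp are hypothetical.

```python
from datetime import datetime, timedelta

# Initial response targets and update frequency, taken from the SLA table.
INITIAL_RESPONSE = {1: timedelta(minutes=30), 2: timedelta(hours=1), 3: timedelta(hours=2)}
UPDATE_INTERVAL = {1: timedelta(minutes=30), 2: timedelta(hours=2), 3: timedelta(hours=4)}


def response_deadline(opened: datetime, severity: int) -> datetime:
    """Latest time by which the ticket must be acknowledged."""
    return opened + INITIAL_RESPONSE[severity]


def next_update_due(last_update: datetime, severity: int) -> datetime:
    """Latest time by which the next status update is due."""
    return last_update + UPDATE_INTERVAL[severity]


# Hypothetical Severity 1 ticket opened at 09:00.
opened = datetime(2025, 1, 6, 9, 0)
print(response_deadline(opened, 1), next_update_due(opened, 1))
```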
24x7 Emergency Contact
Email: Support@nex4.net
Phone: +959 683 545 333, +959 765 203 073
Thank You