Network Automation V0.0
Network Devices' Open Programmability
Telemetry Experiment
Telemetry as new Data collection Mechanism
Why move to telemetry: traditional data collection mechanisms cannot cope with massive data volumes
Telemetry as new Data collection Mechanism: Legacy methods
Traditional models such as SNMP or CLI allow monitoring and management of network resources that were
relatively static (inventory, neighbor discovery, traffic, ...).
Telemetry as new Data collection Mechanism: Limits of legacy methods
With SDN and the centralization of the control plane, several types of data must be collected quickly so that
controllers can make the right network management decisions.
This data, collected at regular intervals, may include:
1. The available bandwidth of each interface in the network
2. The status of the queues on each node
3. The CPU and memory load of each card (MPU or LPU)
4. The status and utilization rate of the MPLS/TE tunnels in the network
5. …
Telemetry as new Data collection Mechanism: How telemetry works
Network TELEMETRY is a relatively new mechanism that uses a push model to continuously send high-resolution device operational data
to a network management system.

It sends data at a higher rate and with lower impact on the network devices than with other methods, like SNMP or the command-line
interface (CLI).
Data is selected by configuring either a periodic cadence, which can be subsecond, or an event trigger, such as a threshold breach (e.g., high
errors) or a status change (e.g., interface state change). Network managers have to determine the cadence or event triggers for streaming
each type of data so they don't overwhelm the processing capabilities of the network management system in question.
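
The two selection modes described above, periodic cadence and event trigger, can be sketched as a small simulation. This is only an illustration under stated assumptions, not a device implementation: the `TelemetrySensor` class, its parameter names, and the sample values are all hypothetical.

```python
class TelemetrySensor(object):
    """Hypothetical sensor that decides when a sample should be pushed."""

    def __init__(self, cadence_s, error_threshold):
        self.cadence_s = cadence_s              # periodic push interval (seconds)
        self.error_threshold = error_threshold  # event trigger: error count limit
        self.last_push = None                   # time of the last periodic push
        self.last_state = None                  # last seen interface state

    def evaluate(self, now, errors, state):
        """Return the list of reasons for pushing a sample at time `now`."""
        reasons = []
        if self.last_push is None or now - self.last_push >= self.cadence_s:
            reasons.append("cadence")
            self.last_push = now
        if errors > self.error_threshold:
            reasons.append("threshold")
        if self.last_state is not None and state != self.last_state:
            reasons.append("state-change")
        self.last_state = state
        return reasons


def run_demo():
    sensor = TelemetrySensor(cadence_s=10, error_threshold=100)
    # (time in seconds, error counter, interface state): made-up values
    samples = [(0, 0, "up"), (4, 0, "up"), (7, 250, "up"),
               (10, 3, "up"), (12, 3, "down")]
    return [(t, sensor.evaluate(t, e, s)) for t, e, s in samples]


if __name__ == "__main__":
    for t, reasons in run_demo():
        print(t, reasons or "no push")
```

A longer cadence or a higher threshold reduces the number of pushes, which is the lever network managers use to avoid overwhelming the collector.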

The volume of data that can be streamed from even a moderately sized network can be huge, requiring big data storage and processing
mechanisms.
The data is encoded as XML, JSON or Google protocol buffers.
Either UDP or TCP transport can be used, frequently in conjunction with Google Remote Procedure Calls (gRPC), with encryption.
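
To illustrate why the choice of encoding matters at high data rates, the sketch below encodes the same sample as JSON text and as a fixed binary layout. The binary layout built with `struct` is only a stand-in chosen for this example; it is not the protocol buffers wire format, and the field names and values are hypothetical.

```python
import json
import struct

# One made-up interface sample.
sample = {"if_index": 3, "in_octets": 1234567890,
          "out_octets": 987654321, "errors": 0}

# Self-describing text encoding: field names travel with every sample.
json_bytes = json.dumps(sample, sort_keys=True).encode("utf-8")

# Fixed binary layout agreed in advance by both ends (hypothetical):
# unsigned 32-bit index, two unsigned 64-bit counters, unsigned 32-bit errors,
# in network byte order without padding.
LAYOUT = "!IQQI"
FIELDS = ["if_index", "in_octets", "out_octets", "errors"]
binary_bytes = struct.pack(LAYOUT, *[sample[name] for name in FIELDS])

# The receiver needs the same layout to decode the payload.
decoded = dict(zip(FIELDS, struct.unpack(LAYOUT, binary_bytes)))

if __name__ == "__main__":
    print("JSON bytes:  ", len(json_bytes))
    print("binary bytes:", len(binary_bytes))
    print("round trip ok:", decoded == sample)
```

The trade-off is the usual one: text encodings are easier to inspect, while schema-based binary encodings are smaller but require both ends to share the schema.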

gRPC enables a collector to dynamically request a data stream from a network device. It can be used to establish new data streams or to
poll for data that rarely changes.

Model-driven telemetry, meanwhile, is based on YANG (Yet Another Next Generation) models and simplifies the selection of the data to
stream.

The OpenConfig working group is creating standardized models that can be applied across groups of network devices.
In addition, Google, through its gRPC Network Management Interface (gNMI) initiative, is attempting to define a standard that governs
how telemetry can be used to retrieve network state data.
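
As a rough illustration of what a collector asks for, the sketch below describes a subscription as plain data. The field names mirror the general shape of a gNMI subscription (a STREAM list, with each path either sampled periodically or sent on change, and the sample interval expressed in nanoseconds), but no gNMI library is used here: the `build_subscription` helper is hypothetical and the paths are examples in the style of the OpenConfig interfaces model.

```python
import json

NANOS_PER_SECOND = 1000000000


def build_subscription(paths, sample_interval_s=None, list_mode="STREAM"):
    """Describe a telemetry subscription as plain data (illustrative helper).

    With an interval, paths are sampled periodically (SAMPLE); without one,
    updates are sent only when the value changes (ON_CHANGE).
    """
    subscriptions = []
    for path in paths:
        entry = {"path": path}
        if sample_interval_s is None:
            entry["mode"] = "ON_CHANGE"
        else:
            entry["mode"] = "SAMPLE"
            entry["sample_interval"] = int(sample_interval_s * NANOS_PER_SECOND)
        subscriptions.append(entry)
    return {"subscribe": {"mode": list_mode,
                          "encoding": "JSON",
                          "subscription": subscriptions}}


# High-rate counters are sampled; rarely changing state is sent on change.
counters = build_subscription(["/interfaces/interface/state/counters"],
                              sample_interval_s=0.5)
oper_status = build_subscription(["/interfaces/interface/state/oper-status"])

if __name__ == "__main__":
    print(json.dumps(counters, indent=2))
    print(json.dumps(oper_status, indent=2))
```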
Telemetry Protocol Stack
Telemetry framework

Telemetry as new Data collection Mechanism: Comparing telemetry vs. SNMP

SNMP is used best when retrieving relatively static data, such as inventory or neighboring devices.
SNMP is useful for networks equipped with significant numbers of older devices that don't support telemetry.
It is also good for collecting nonperformance data, such as routing peers, bridge domain neighbors, NTP peers and device inventory information -- i.e., serial
numbers, modules and slot locations.
Finally, the protocol's use of UDP eliminates the need to allocate large receive buffers, enabling management servers to more efficiently allocate internal
memory.
However, its polling mechanism makes collecting high-volume, high-resolution performance data a challenge.
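
The polling limitation can be made concrete with a small simulation. It is a deliberately simplified sketch: the utilization series is made up, and each read is treated as an instantaneous gauge value, whereas real SNMP counters would average a burst over the polling interval rather than miss it entirely.

```python
def build_utilization(duration_s, spike_start, spike_len, base=20, peak=95):
    """Per-second link utilization (%) with one short burst (made-up data)."""
    return [peak if spike_start <= t < spike_start + spike_len else base
            for t in range(duration_s)]


def observed_peak(series, interval_s):
    """Highest value seen when the series is read once every interval_s seconds."""
    return max(series[t] for t in range(0, len(series), interval_s))


# 15 minutes of data with a 20-second burst starting at t = 130 s.
series = build_utilization(duration_s=900, spike_start=130, spike_len=20)
polled_peak = observed_peak(series, interval_s=300)   # 5-minute polling
streamed_peak = observed_peak(series, interval_s=1)   # 1-second streaming

if __name__ == "__main__":
    print("peak seen by 5-minute polling:  %d%%" % polled_peak)
    print("peak seen by 1-second streaming: %d%%" % streamed_peak)
```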

TELEMETRY is better for collecting high-resolution performance data, such as high-speed network interface statistics. It's becoming more practical as more
device and network management vendors begin to support the methodology.

In addition, newer RPC mechanisms make telemetry more efficient than SNMP or CLI in obtaining data from network devices, making telemetry the obvious
choice going forward.
However, the large number of vendor-specific YANG models can make it difficult to analyze streaming data.

For networks that contain a mix of old and new network devices, a combination of SNMP and telemetry will be best. A switch to telemetry is possible when all
network devices within an organization support it.

Regardless of how you may assess the data collection methods of telemetry vs. SNMP, network management is essentially a big data problem.
The management system needs to process large volumes of data to identify anomalies and alert the network operations team to problems.
The OpenConfig and gNMI initiatives are working to simplify data collection and analysis.
[OPS] Open Programmability System Experiment
[OPS] Open Programmability System

Definition
The open programmability system (OPS) is an open platform that provides Application Programming Interfaces (APIs) to achieve programmability, allowing
third-party applications to run on the platform.

Video about the Huawei AR router OPS feature:
https://youtu.be/-S36hWexWwE

Huawei OPS feature documentation:
https://support.huawei.com/enterprise/en/doc/EDOC1000174065/934a57df/overview-of-ops#:~:text=purpose%20of%20OPS.-,Definition,to%20run%20on%20the%20platform.
[OPS] Open Programmability System: Huawei OPS working process
This process is explained in the Huawei YouTube video linked above.
1. Make a Python script locally and use APIs to define functions.
2. Upload the Python script to a device.
3. Install the Python script. If an installed Python script needs to be modified, uninstall it first, and reinstall it after the modification. To uninstall the script bound to an assistant, delete the assistant first.
4. After installing the Python script, configure an assistant for the script, and register the subscription event in the script. When the specified trigger condition is met, the device automatically executes the working event in the script. If the subscribed contents in the Python script are incorrect, the device displays an error message to indicate the error location. You need to correct the subscribed contents, upload the script again, and then install the script.
5. After an assistant is configured and the subscription event is registered successfully, the device executes the working event in the script successfully when the trigger condition is met, as long as the script is correct. If the script is incorrect, the device cannot execute the working event, requiring you to correct the script and then upload and install the script again.
6. The device runs the Python script to implement the predefined functions.
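
The split between the subscription event and the working event in the steps above can be sketched off-device as follows. The function names `ops_condition` and `ops_execute` and the calls `o.route.subscribe` and `o.syslog` follow the routetrack.py example in this deck; everything else is an assumption. In particular, `FakeOps` is a hypothetical stand-in written for this sketch so the flow can be dry-run on a PC, and its return values only imitate the (value, error string) pairs seen in that example.

```python
class FakeOps(object):
    """Hypothetical stand-in for the on-device OPS object (dry runs only)."""

    CRITICAL = "critical"  # placeholder; on the device this is ops.CRITICAL

    class _Route(object):
        def __init__(self, owner):
            self._owner = owner

        def subscribe(self, tag, **conditions):
            # Record the subscription instead of registering it on a device.
            self._owner.subscriptions[tag] = conditions
            return 0, "success"   # assumed (value, err_str) shape

    def __init__(self):
        self.subscriptions = {}
        self.logs = []
        self.route = FakeOps._Route(self)

    def syslog(self, message, level, kind):
        self.logs.append((level, message))
        return 0, "success"       # assumed (status, err_log) shape


def ops_condition(o):
    """Subscription event: register what should trigger the script."""
    value, err_str = o.route.subscribe("route1", network="10.2.1.0", maskLen=24,
                                       optype="modify", protocol="ospf")
    return 0


def ops_execute(o):
    """Working event: runs when the subscribed trigger condition is met."""
    status, err_log = o.syslog("Syslog: The important route changed.",
                               FakeOps.CRITICAL, "syslog")
    return 0


if __name__ == "__main__":
    o = FakeOps()
    ops_condition(o)   # done once, when the assistant is configured
    ops_execute(o)     # done by the device each time the trigger is met
    print(o.subscriptions)
    print(o.logs)
```

Dry-running the two functions against a stub like this catches syntax and logic errors before the upload and install cycle, which otherwise has to be repeated for every correction.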
[OPS] Open Programmability System: Application Scenarios for OPS
Automatic Health Check
Traditionally, you need to log in to a device and check the hardware and service
running status using multiple commands to determine the health of the device.

You can configure the OPS function on a device to implement automatic health check,
as shown in the figure below.
The device then automatically runs the health check commands, periodically collects
health check results, and sends the results to a Secure Shell (SSH) server for analysis.
This function reduces maintenance workload.
When a fault occurs on the device, the system runs the preconfigured commands or
script to isolate the faulty module or rectify the fault.

Automatic Deployment of Unconfigured Devices
As shown in the figure below, RouterA has no configuration file.
After RouterA is powered on, it obtains the IP address of the script file server from the
DHCP server and downloads a Python script from the script file server.
The OPS on RouterA then runs the Python script to download the system software
and configuration file.
After that, RouterA restarts with the system software and configuration file, and the
automatic deployment is complete.
[OPS] Open Programmability System: Application Scenarios for OPS (2)

OPS can also implement intelligent diagnosis and configuration using a Python script.
Intelligent diagnosis:
• Threshold-crossing alarm function: determines whether the memory or CPU usage of a device exceeds the threshold.
• Neighbor information analysis function: determines whether neighbors of a device are working normally.
• Interface information diagnosis function: determines whether an interface is working normally.
• Route diagnostic analysis function: determines whether routes of a device are correct.
• Important route change monitoring function: generates a log when important routes have changed.
• Device diagnostic information query function: determines whether a device is working normally.
• Interface traffic monitoring function: determines whether traffic of an interface is normal.

Intelligent configuration:
• Automatic configuration backup function in configuration mode: automatically backs up the current configuration to the local and remote SSH servers before new configuration is performed.
• Adding user information to a configuration file: records the user name and IP address of the user that modifies the configuration.
• Risk notification function: generates a risk warning message before high-risk commands are executed.
• Configuration wizard: guides and simplifies the configuration after a device enters the configuration mode.
• Optional service disabling function: automatically detects whether optional services exist after a device enters the configuration mode. If so, optional services will be disabled.
[OPS] Open Programmability System: Licensing Requirements and Limitations for OPS

Licensing Requirements
OPS is a basic feature of a router and is not under license control.

Feature Limitations: When configuring the OPS, pay attention to the following points:
• Python scripts can run only after being installed.
• Before using Python scripts for automatic maintenance, familiarize yourself with the Python programming language and write the scripts correctly.
• The device supports Python 2.7.3. When writing Python scripts, use Python 2.7.3.

Note: The AR1220, AR1220V, AR1220W, AR1220VW and AR1220L do not support the OPS function.

Default Settings for OPS

Parameter | Default Setting
Script installation path | $_user (sd1:/$_user/ or flash:/$_user/) in the root directory of a storage device
Python script assistant | Not configured
Assistant function | Enabled
[OPS] Open Programmability System: OPS implementation

Example for Using a Python Script to Monitor Changes of Important Routes


1 Configure an IP address for the interface.
<Huawei> system-view
[Huawei] sysname Router
[Router] interface GigabitEthernet 1/0/0
[Router-GigabitEthernet1/0/0] ip address 10.2.1.1 24
[Router-GigabitEthernet1/0/0] quit
[Router] quit

2 Make Python scripts.
# Make Python scripts climuti.py and routetrack.py to implement the following functions:
• The script climuti.py defines the routetrack command to enable the function that monitors the changes of important routes and installs the script routetrack.py.
• The script routetrack.py monitors the changes of routes and generates logs when routes change.

3 Upload and install the Python script.
# Upload the Python script from the PC to the router.
# Install the Python script to the router.
<Router> ops install file climuti.py

4 Configure an assistant and register the command line event in the script climuti.py to wait for the event to be triggered.
<Router> system-view
[Router] ops
[Router-ops] script-assistant python climuti.py
[Router-ops] quit
[Router] quit

5 Verify the configuration.
# After the preceding configurations are complete, run the routetrack command to enable the function that monitors the changes of important routes, and
then run the dir command in the sd1:/$_user/ directory of the device to check whether the scripts have been installed successfully.
After a Python script assistant is configured, the system generates a .pyc file (intermediate file) for each script.
<Router> routetrack
<Router> cd $_user
<Router> dir
Directory of sd1:/$_user/
Idx Attr Size(Byte) Date Time(LMT) FileName
0 -rw- 1,672 Jul 22 2015 14:29:33 climuti.py
1 -rw- 2,000 Jul 22 2015 14:29:57 climuti.pyc
2 -rw- 441 Jul 22 2015 14:31:00 routetrack.py
3 -rw- 891 Jul 22 2015 14:31:03 routetrack.pyc

6 Turn on the log switch.
# When an important route changes, check whether the following log is generated.
<Router> system-view
[Router] info-center enable
[Router] quit
<Router> terminal monitor
<Router> terminal logging
<Router>
Jul 28 2015 14:29:17+08:00 Router %%01OPSA/2/SCRIPT_LOG(l)[0]:OPS: Syslog: The important route changed. (user="routetrack.py", session=964036020).
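
On the collector side, a log line like the one above can be split into fields for filtering or alerting. The regular expression below is a sketch fitted to this single sample line, so the field names (`module`, `severity`, `mnemonic`, and so on) and the assumed layout are this example's own and may not cover other log formats.

```python
import re

# The sample log line from the verification step, joined into one string.
LOG_LINE = ('Jul 28 2015 14:29:17+08:00 Router %%01OPSA/2/SCRIPT_LOG(l)[0]:'
            'OPS: Syslog: The important route changed. '
            '(user="routetrack.py", session=964036020).')

LOG_PATTERN = re.compile(
    r'^(?P<timestamp>\w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}) '
    r'(?P<host>\S+) %%\d{2}(?P<module>\w+)/(?P<severity>\d)/(?P<mnemonic>\w+)'
    r'\(\w\)\[\d+\]:(?P<text>.*?)'
    r'(?: \(user="(?P<user>[^"]+)", session=(?P<session>\d+)\)\.)?$'
)


def parse_ops_log(line):
    """Return a dict of fields, or None if the line does not fit the pattern."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


fields = parse_ops_log(LOG_LINE)

if __name__ == "__main__":
    for key in sorted(fields):
        print("%-10s %s" % (key, fields[key]))
```

The `user` field identifies the script that raised the log, which makes it possible to tell script-generated messages apart from other entries.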
[OPS] Open Programmability System: OPS implementation (2)

Example for Using a Python Script to Monitor Changes of Important Routes



1 Router configuration file

#
sysname Router
#
interface GigabitEthernet1/0/0
 ip address 10.2.1.1 255.255.255.0
#
ops
 script-assistant python climuti.py
 script-assistant python routetrack.py
#
return

2 Example of the script climuti.py

# coding=utf-8
import ops  # Import the OPS module.
import sys  # Import the sys module.
import re   # Import the re module.

# Subscription processing function
def ops_condition(ops):
    print("\r\n user.py: enter ops_condition()")  # Print information.
    # Define the routetrack command.
    value1, err_str1 = ops.cli.subscribe("cli1", "^routetrack$", enter=True, sync=True, sync_wait=60)
    print("\r\n reg_cli.subscribe.value: %-15d" % (value1))
    print("\r\n reg_cli.subscribe.err_str: %s" % (err_str1))
    # Define the no routetrack command.
    value2, err_str2 = ops.cli.subscribe("cli2", "^no routetrack$", enter=True, sync=True, sync_wait=60)
    # Combined event: the routetrack or the no routetrack command is entered.
    value10, err_str10 = ops.correlate("cli1 or cli2")
    print("\r\n correlate.value10:%d" % (value10))
    print("\r\n correlate.err_str10:%s" % (err_str10))
    return 0
…..

3 Example of the script routetrack.py

# coding=utf-8
import ops  # Import the OPS module.
import sys  # Import the sys module.

# Subscription processing function
def ops_condition(o):
    print("\r\n user.py: enter ops_condition()")
    # Monitor the changes of OSPF routes to network segment 10.2.1.0/24.
    value, err_str = o.route.subscribe("route1", network="10.2.1.0", maskLen=24, optype="modify", protocol="ospf")
    print("\r\n retrieve.route1.value:%d" % (value))
    print("\r\n retrieve.route1.err_str:%s" % (err_str))
    return 0

# Work processing function
def ops_execute(o):
    # Record user-defined critical user logs to notify route changes.
    status, err_log = o.syslog("Syslog: The important route changed.", ops.CRITICAL, "syslog")
    return 0
[OPS] Open Programmability System: Huawei device built-in OPS scripts
OPS script names and functions for checking the version configuration
Check Item | Script Name | Function | Severity
Patch information | pys/versioninfo-patch.py | Displays the current patch information of the device. | Minor
Startup configuration | pys/versioninfo-startup.py | Displays the startup configuration of main control boards. | Minor
Configuration file | pys/versioninfo-cfgfile.py | Displays the current and saved configurations of the device. | Info

OPS script names and functions for checking the device status
Check Item | Script Name | Function | Severity
CPU usage | pys/running-cpu.py | Displays the current CPU usage of the device. | Minor
Memory usage | pys/running-memory.py | Displays the current memory usage of the device. | Minor
Flash memory usage | pys/running-flash.py | Displays the flash memory usage of the device. | Minor
SD card usage | pys/running-sdcard.py | Displays the SD card usage of main control boards. NOTE: The AR150&AR200&AR1200 series does not support this script. | Minor
Power supply status | pys/running-power.py | Displays the running status of power modules on the device. | Major
Fan status | pys/running-fan.py | Displays the running status of fan modules on the device. NOTE: The AR150&AR200 series does not support this script. | Minor
Temperature | pys/running-environment.py | Displays the current ambient temperature of the device. | Major
Card status | pys/running-device.py | Displays the running status of all cards on the device. | Major
Exceptions | pys/running-exception.py | Displays exceptions and deadloop information of main control boards on the device. | Minor
Black box information | pys/running-blackbox.py | Displays current error information of the device. | Minor
Assertions | pys/running-assert.py | Displays the current assertions of main control boards on the device. | Minor
Error logs | pys/running-logerror.py | Displays the current error logs of main control boards on the device. | Minor

OPS script names and functions for checking the interface configuration
Check Item | Script Name | Function | Severity
Interface duplex mode | pys/intconfig-duplexmode.py | Displays the current interface duplex mode of the device. | Major
Interface statistics | pys/intconfig-intstatistic.py | Displays current interface statistics of the device. | Minor
Interface traffic | pys/intconfig-intflow.py | Displays current interface traffic information of the device. | Major
Interface CRC and conflicting packets | pys/intconfig-Intcrcandcollisions.py | Displays the current interface CRC and conflicting packet information of the device. | Major
Loopback address | pys/intconfig-loopback.py | Displays the loopback address of the device. | Minor
[OPS] Open Programmability System: Huawei device built-in OPS scripts (2)
OPS script names and functions for checking the WAN connection
Check Item | Script Name | Function | Severity
PPP | pys/wan-ppp.py | Displays the serial interface configuration of the device. | Major
Interface backup | pys/wan-backup.py | Helps determine whether the interface backup function is enabled on the device. | Minor
3G | pys/wan-3g.py | Displays 3G interface information of the device. | Major

OPS script names and functions for checking IP routing
Check Item | Script Name | Function | Severity
Default routes | pys/route-defaultroute.py | Displays default route information of the device. | Info
BGP status | pys/route-bgp.py | Helps determine whether BGP is configured on the device. | Major
MP-BGP status | pys/route-mpbgp.py | Helps determine whether MP-BGP is configured on the device. | Major
Routing table statistics | pys/route-statistic.py | Displays routing table statistics of the device. | Major

OPS script names and functions for checking IP services
Check Item | Script Name | Function | Severity
IP statistics | pys/ipservice-ipstatistic.py | Displays current error packet statistics and TTL-expired packet statistics on the device. | Major
ICMP statistics | pys/ipservice-icmpstatistic.py | Displays the current ICMP packet statistics of the device. | Major
DHCP server | pys/ipservice-dhcpserver.py | Helps determine whether the DHCP server function is enabled on the device. | Major
DHCP client | pys/ipservice-dhcpclient.py | Helps determine whether the DHCP client function is enabled on the device. | Major

OPS script names and functions for checking security services
Check Item | Script Name | Function | Severity
CPCAR statistics | pys/safe-cpcar.py | Displays statistics about packets currently sent to the CPU of the device. | Info
IPSec encrypted/decrypted traffic statistics | pys/safe-ipsecstatistic.py | Displays statistics about current IPSec encrypted/decrypted packets. | Info
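
The built-in script names follow a visible naming pattern: the part before the first hyphen corresponds to the check category of the table the script is listed in. The sketch below uses that observation to group script paths. The mapping is inferred from the tables above rather than taken from Huawei documentation, and the `categorize` helper is hypothetical.

```python
# Prefix-to-category mapping inferred from the tables of built-in scripts.
CATEGORY_BY_PREFIX = {
    "versioninfo": "Version configuration",
    "running": "Device status",
    "intconfig": "Interface configuration",
    "wan": "WAN connection",
    "route": "IP routing",
    "ipservice": "IP services",
    "safe": "Security services",
}


def categorize(script_path):
    """Map a path such as 'pys/running-cpu.py' to its check category."""
    name = script_path.rsplit("/", 1)[-1]   # 'running-cpu.py'
    prefix = name.split("-", 1)[0]          # 'running'
    return CATEGORY_BY_PREFIX.get(prefix, "Unknown")


if __name__ == "__main__":
    for path in ["pys/running-cpu.py", "pys/route-bgp.py", "pys/safe-cpcar.py"]:
        print("%-22s %s" % (path, categorize(path)))
```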
Huawei iMaster NCE Experiment
Huawei iMaster NCE (IP Domain)

Definition
iMaster NCE (IP Domain) centrally manages, controls, and analyzes IP devices such as NE, ATN, CX, and PTN series NEs in a unified manner.

Designed for IP private line, IP core, mobile transport, and metro network scenarios, it provides functions such as device plug-and-play and service automation
to enable automated full-lifecycle network management and maintenance.

With real-time monitoring of network traffic and quality, iMaster NCE (IP Domain) leverages big data analytics to identify network trends in real time and
implement proactive maintenance and closed-loop optimization through service control and optimization.
Huawei iMaster NCE (IP Domain): Use Cases

Network SLA Visualization


Through unified, reliable, and distributed southbound data collection, NCE provides real-time visibility into masses of data and monitors SLA performance at
the NE, link, and service levels.
It collects performance data at certain intervals (such as 1, 5, or 15 minutes).
It also supports various collection protocols, including Telemetry, Qx, and SNMP. Of these, Telemetry can collect huge volumes of data in seconds, enabling NCE
to detect network changes in real time.

After collecting the performance data, NCE displays it in multiple ways (such as dashboard, traffic quality map, and various reports), helping O&M personnel
visualize network SLAs and understand the network status in real time.

Centralized Path Computation
Based on collected network information, NCE deploys tunnels.
It then uses the collected tunnel inventory information and Layer 3 topology to centrally compute paths for TE tunnels based on preset optimization policies.
If network changes occur, NCE adjusts tunnel paths based on real-time bandwidth or fault convergence.
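
The idea of computing a tunnel path centrally from collected bandwidth data can be illustrated with a generic constrained shortest path search: links without enough available bandwidth are pruned, and the cheapest remaining path is chosen. This is a textbook-style sketch, not Huawei's path computation algorithm, and the topology, costs, and bandwidth figures are made up.

```python
import heapq


def constrained_shortest_path(links, source, target, required_bw):
    """Dijkstra over links that still have required_bw available.

    links: iterable of (node_a, node_b, cost, available_bw), bidirectional.
    Returns (total_cost, [nodes]) or None when no feasible path exists.
    """
    graph = {}
    for a, b, cost, bw in links:
        if bw >= required_bw:   # prune links that cannot carry the tunnel
            graph.setdefault(a, []).append((b, cost))
            graph.setdefault(b, []).append((a, cost))
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == target:
            return total, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (total + cost, neighbor, path + [neighbor]))
    return None


# Made-up topology: (node_a, node_b, cost, available bandwidth in Mbit/s)
LINKS = [("A", "B", 10, 400), ("B", "D", 10, 50),
         ("A", "C", 15, 800), ("C", "D", 15, 900)]

small_tunnel = constrained_shortest_path(LINKS, "A", "D", required_bw=40)
large_tunnel = constrained_shortest_path(LINKS, "A", "D", required_bw=200)
too_large = constrained_shortest_path(LINKS, "A", "D", required_bw=1000)

if __name__ == "__main__":
    print("40 Mbit/s tunnel:  ", small_tunnel)
    print("200 Mbit/s tunnel: ", large_tunnel)
    print("1000 Mbit/s tunnel:", too_large)
```

When telemetry reports that the available bandwidth of a link has changed, rerunning the search with the updated figures yields the adjusted path, which is the kind of reaction to network changes described above.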

MPLS network optimization, Huawei's innovative path computation algorithm, can optimize network-wide paths for optimal bandwidth usage. Centralized
local management of bandwidth resources simplifies service establishment on path computation clients (PCCs) and reduces the bandwidth conflicts caused by
distributed computation.

Open Programmability
In the agile cloud era, SDN-based automated service deployment and fast, agile, automated, and intelligent service O&M are key for carriers and enterprises to
build their core competitiveness.
Carriers and enterprises may find the service and O&M capabilities in network management and control systems are insufficient to meet the diverse and
unique needs of their customers.

To address this, NCE provides an open programmable framework. Customers can customize network services based on their O&M habits and requirements,
and deploy and provision network services as they see fit.
This framework provides one-stop device management and control as well as service provisioning in multi-vendor scenarios.
