
Network CI/CD Part 2 – Automated Testing with Robot Framework library for Arista devices

eos.arista.com/arista-robot-testing

Michael Kashin, May 21, 2018

Contents

Previously on Network CI/CD Part 1…


The problem of network testing
Why isn’t Ansible enough?
Why is scripting too much?
Robot Framework
Arista Network Validation
Test bed setup
Testing
Custom keywords
Further reading
Coming up

Previously on Network CI/CD Part 1…


We’ve established that the lack of simulated test environments and automated test tools
is among the main inhibitors of the transition from a traditional network operation model to a
DevOps workflow, where all changes are verified and tested prior to being deployed in
production. We’ve seen how to solve the first problem with Arista’s cEOS-Lab docker
container and a simple container orchestration tool. We’ve shown how the
new containerised EOS allows us to dramatically increase the number of nodes we can
deploy on a single host and decrease the total build and boot time compared to VM
orchestration methods, e.g. the ones based on Vagrant. Now that we have our virtualised
topology built, we can start thinking about how to test it, and it always helps to start with a
bit of an overview of the current lay of the land.

The problem of network testing


Network testing has always been an afterthought in both traditional network design and
network operation workflows. When designing a new data center or a new campus network,
most of the effort is focused on scalability, reliability, fault tolerance and automation.
When planning a network change, both implementation and testing procedures are written
by an engineer based on their expectations of what’s supposed to happen. Very rarely do
we verify our assumptions in a simulated lab environment, and even then our tests are
limited to a few ping and traceroute commands. However, a successful ping doesn’t mean the traffic is
taking the right path through the network, and traceroutes, especially in ECMP environments, can
be quite hard to verify visually.

Ultimately, even with very high test coverage, we’re still doing things manually and relying on
humans to interpret the output, which means there’s always a chance of a mistake. If the
networking industry is ever to transition to a DevOps operation model, having a fully-fledged,
robust and reliable test automation framework is a must. The question is: What’s the right
tool for this?

Why isn’t Ansible enough?


One of the unfortunate side-effects of network engineers learning Ansible is that now
everything looks like it can be solved with yet another intricate playbook and maybe a
custom module. I made this mistake myself a long time ago when I developed a network
TDD framework on top of Ansible to verify traffic paths inside the network. The truth
is, Ansible is not a general-purpose automation framework; it was designed to address a
very specific set of use cases and problems:

software provisioning, i.e. running a bunch of wget, yum, apt and pip commands
configuration management, i.e. creating/modifying configuration files
pushing data into a device, i.e. configuration files, binaries
running ad-hoc CLI commands over SSH

What Ansible isn’t very good at is:

state management, i.e. maintaining state between different playbook runs
event management, i.e. reacting to events or state changes on a managed device
parsing unstructured data, i.e. output of “show” commands
cross-device data correlation, e.g. triggering an action on one device using data collected
on another device

However, Ansible is very flexible and customisable, and with enough effort it can be “taught”
to do a lot of things it wasn’t designed to do originally. The problem is that at some
point those playbooks become too hard to manage and troubleshoot. This is where the
additional complexity outweighs any benefits of automation, and the obscurity of the
resulting DSL code outweighs the benefits of its readability.

Why is scripting too much?


On the other side of the spectrum are general-purpose programming languages, with the
most prominent in the networking community being Python. For general-purpose tasks it
certainly loses to DSL-based automation frameworks in readability and speed of
development; however, it makes up for that in flexibility and extensibility. At the point where
Ansible becomes hard to troubleshoot and manage, Python maintains the same level of complexity.

Another downside of using pure Python for network testing automation is the need to write
a lot of boilerplate code to create common testing abstractions and libraries (it took 6
months and 10k lines of code to write Brigade). Nevertheless, we should never discount the
possibility of using scripting for network testing. However, at least for this
specific use case, there may be a middle ground that on the one hand offers the simplicity and readability
of a DSL framework, but on the other hand allows as much customisation as necessary to
extend and augment the default behaviour. Enter the Robot.

Robot Framework
Robot is a generic test automation framework written in Python. Its DSL has a very
lightweight syntax which makes it very easy to write and read. The framework comes with a
set of standard libraries that implement the typical functionality expected from a test framework
– data types and structures, conditionals and expectations, automated interactions with UIs
and remote hosts (e.g. Selenium, Telnet, SSH) – as well as many other 3rd-party libraries. One of the most recent
additions is AristaLibrary – a library to interact with Arista devices over eAPI. At the time of
writing this library defines 18 new keywords that allow users to define the most typical test
scenarios. However, one of the major advantages of Robot Framework is the ability to define
your own keywords. As I will show later, we can re-use any of the existing keywords to define
our own higher-level keywords and use them in our test definitions.
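
To give a taste of the syntax before we get to the Arista-specific keywords, here is a minimal, self-contained suite using only standard-library keywords (the test case and values are purely illustrative):

*** Settings ***
Library    Collections

*** Test Cases ***
Lists behave as expected
    # Build a list and make a few simple assertions about it
    ${vlans}=    Create List    10    20    30
    Length Should Be    ${vlans}    3
    List Should Contain Value    ${vlans}    20

Each test case is a sequence of keyword calls, with arguments separated by two or more spaces; this is the same structure the Arista-flavoured tests below follow.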

Arista Network Validation


Arista Network Validation is a mini-wrapper on top of Robot Framework that makes it easier
to use for network testing. It’s available in software downloads as a tar.gz file that can be
installed using pip and installs all the required 3rd-party packages, including the Robot
Framework itself. The installer file is distributed with a “User Guide” pdf document, which
contains a detailed description of how to use the framework. It’d be pointless to repeat
information from the user guide here, so I’ll refer readers to it for a detailed description of the
framework and AristaLibrary. Now it’s time for a quick demo…

Test bed setup


The first thing we need to do is install the Arista Network Validation tool. To simplify
dependency management, we’ll install it inside a Python 2 virtual environment:

$ python2 -m virtualenv testing; cd testing

$ source bin/activate

$ pip install network_validation-1.0.1.tar.gz

We’ll do our testing against a virtual topology built from the cEOS devices described in
the previous post:

$ python3 -m pip install git+https://github.com/networkop/arista-ceos-topo.git

$ cat <<EOF >> topology.yml
PUBLISH_BASE: 9000
links:
  - ["Device-A:Interface-1", "Device-B:Interface-1"]
EOF

$ sudo docker-topo --create topology.yml

This will create a pair of cEOS devices interconnected back-to-back with Ethernet interfaces:

+------+ +------+
|cEOS 1|et1+-----+et1|cEOS 2|
+------+ +------+
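
Before moving on, it doesn’t hurt to sanity-check that both containers are running and that their eAPI ports are published (the exact container names depend on what docker-topo assigns):

$ docker ps --format 'table {{.Names}}\t{{.Ports}}'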

Testing
Let’s assume we’ve configured those devices with a simple BGP peering over their directly
connected interfaces and advertised their respective loopbacks into BGP. The pseudocode
for this config would look something like this:

interface Loopback0
   ip address X.X.X.X/32
router bgp 65XXX
   neighbor 12.12.12.Y remote-as 65YYY
   redistribute connected
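
For example, on cEOS 1 a concrete version of this pseudocode might look like the following. The addressing and AS numbers are my own assumption, chosen to line up with the test variables used later (link subnet 12.12.12.0/24, peer loopback 2.2.2.2):

interface Ethernet1
   no switchport
   ip address 12.12.12.1/24
interface Loopback0
   ip address 1.1.1.1/32
router bgp 65001
   neighbor 12.12.12.2 remote-as 65002
   redistribute connected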

Now we want to verify that our control plane has converged and we have reachability to the
loopback interfaces. We start by creating a simple YAML configuration file “test.yml”,
describing the device connection details:

TRANSPORT: https
PORT: 80
USERNAME: admin
PASSWORD: admin
RUNFORMAT: suite

nodes:
  SW1:
    host: localhost
    port: 9000
  SW2:
    host: localhost
    port: 9001

PROD_TAGS:
  - ignoretags

testfiles:
  - network_validation

The Arista Network Validation tool will look for test cases inside a “network_validation”
directory and execute all tests that match a particular tag (“ignoretags” will execute all of
them).
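
For reference, tags are attached to individual test cases with Robot’s standard [Tags] setting; a hypothetical tagged test case would look like this:

*** Test Cases ***
Controlplane verification
    [Tags]    bgp    production
    Get Command Output    cmd=show ip bgp summary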

Now it’s time to create our first test scenario. Each test case file contains a number of
sections responsible for various parts of the testing procedure. For now let’s focus on the main
section called “Test Cases”. In there we first check that our BGP peering with a neighbor is in
the “Established” state. We do that by issuing a “show ip bgp summary” command, using the “Get
Command Output” keyword, and picking apart the output until we get the “peerState”
attribute of the response. The second test case verifies that the peer loopback is reachable with a
special “Address Is Reachable” keyword, which behind the scenes issues a ping and verifies
that at least one ping request received a response.

*** Settings ***
Documentation     This test verifies control and dataplane connectivity between two BGP peers
Suite Setup       Connect To Switches
Suite Teardown    Clear All Connections
Library           AristaLibrary
Library           AristaLibrary.Expect
Library           Collections

*** Variables ***
# Neighbor peer address
${PEER_ADDRESS}     12.12.12.2
${PEER_LOOPBACK}    2.2.2.2

*** Test Cases ***
Controlplane verification
    [Documentation]    Check PEER Established
    Get Command Output    cmd=show ip bgp summary
    Expect    vrfs default peers ${PEER_ADDRESS} peerState    is    Established

Dataplane verification
    [Documentation]    Check the PEER Loopback is reachable
    ${result}=    Address Is Reachable    ${PEER_LOOPBACK}
    Should Be True    ${result}

Gather Post Change Output
    Record Output    cmd=show ip bgp summary

*** Keywords ***
Connect To Switches
    [Documentation]    Establish connection to a switch which gets used by test cases.
    Connect To    host=${SW1_HOST}    transport=${TRANSPORT}    username=${USERNAME}
    ...    password=${PASSWORD}    port=${SW1_PORT}
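
Note that the suite above only opens a connection to SW1. To run checks from both ends of the link, the setup keyword could be extended along the following lines – a sketch, assuming the wrapper injects ${SW2_HOST} and ${SW2_PORT} from the “nodes” section of test.yml the same way it does for SW1:

Connect To Switches
    [Documentation]    Establish connections to both switches under test.
    Connect To    host=${SW1_HOST}    transport=${TRANSPORT}    username=${USERNAME}
    ...    password=${PASSWORD}    port=${SW1_PORT}
    # Assumed: SW2 variables are injected from test.yml just like SW1's
    Connect To    host=${SW2_HOST}    transport=${TRANSPORT}    username=${USERNAME}
    ...    password=${PASSWORD}    port=${SW2_PORT}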

Finally, we can execute our test scenario and get the result:

$ validate_network.py --config test.yml --reportdir output

==============================================================================
Run Full Suite
==============================================================================
Run Full Suite.1 Bgp :: This test verifies control and dataplane connectivi…
==============================================================================
Controlplane verification :: Check PEER Established                   | PASS |
------------------------------------------------------------------------------
Dataplane verification :: Check the PEER Loopback is reachable        | PASS |
------------------------------------------------------------------------------
Run Full Suite.1 Bgp :: This test verifies control and dataplane c… | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================
Run Full Suite                                                        | PASS |
2 critical tests, 2 passed, 0 failed
2 tests total, 2 passed, 0 failed
==============================================================================

Now that we’ve seen how easy it is to write and read tests using standard AristaLibrary
keywords, let’s have a look at how to extend the Robot Framework by adding new high-level
keywords.

Custom keywords
Let’s assume we want to verify some internal behaviour that is not necessarily exposed
through the Arista CLI. One of the common tasks in acceptance testing is to run a
debug to record the timing of a certain event (e.g. a BGP keepalive or a RIP update). Normally, this
would involve some setup/teardown commands to turn the debugging on and off and some
match command to match an event signature. Instead of repeating all of these steps in every
test case, we can define our own keywords in the bottom “Keywords” section of a test case
file:

Enable tracing for ${agent} ${setting}
    Run Keyword And Ignore Error    Configure    bash timeout ${BASH_TIMEOUT} sudo rm /tmp/${TRACE_FILE}
    ${trace_on}=    Create List    trace ${agent} setting ${setting}    trace ${agent} filename ${TRACE_FILE}
    ${result}=    Configure    ${trace_on}
    Length Should Be    ${result}    2

Record all occurrences of ${event}
    ${result}=    Configure    bash timeout ${BASH_TIMEOUT} grep "${event}" /tmp/${TRACE_FILE}
    Log    ${result[0]['messages'][0]}

Disable tracing for ${agent}
    ${trace_off}=    Create List    no trace ${agent} setting    no trace ${agent} filename
    ${result}=    Configure    ${trace_off}
    Length Should Be    ${result}    2
    Run Keyword And Ignore Error    Configure    bash timeout ${BASH_TIMEOUT} sudo rm /tmp/${TRACE_FILE}

We can then make use of those keywords in the “Test Cases” section like this:

    [Setup]    Enable tracing for ${DEBUG_AGENT} ${DEBUG_SETTING}
    Sleep    ${DEBUG_TIMEOUT}
    Record all occurrences of ${DEBUG_EVENT}
    [Teardown]    Disable tracing for ${DEBUG_AGENT}

Assuming we’ve defined the debug variables in the config YAML file like this:

DEBUG_AGENT: "Rib"
DEBUG_SETTING: "Rib::Rip*/*"
DEBUG_EVENT: "RIP RECV"
DEBUG_TIMEOUT: 35

We get all occurrences of the “RIP RECV” event recorded during a 35-second window in the
output logs:

07:57:23.247214 RIP RECV 12.12.12.2 -> 224.0.0.9 vers 2, cmd Response, length 244

07:57:53.651828 RIP RECV 12.12.12.2 -> 224.0.0.9 vers 2, cmd Response, length 244
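
If we wanted the test to fail when too few events were captured, the recording keyword could be extended to assert on a count rather than merely logging. The following is a sketch building on the same Configure pattern, using grep -c to count matching lines (and assuming, as above, that the command output lands in result[0]['messages'][0]):

Expect at least ${min} occurrences of ${event}
    # grep -c prints the number of lines matching the event signature
    ${result}=    Configure    bash timeout ${BASH_TIMEOUT} grep -c "${event}" /tmp/${TRACE_FILE}
    ${count}=    Convert To Integer    ${result[0]['messages'][0]}
    Should Be True    ${count} >= ${min}    Expected at least ${min} "${event}" events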

Further reading
Obviously, since Robot Framework has its own DSL, some learning curve is expected.
However, once one gets familiar with the most common standard libraries and keywords, writing
Robot test cases becomes very easy. Thankfully, Robot boasts some of the best-written
documentation of any open-source project, which, along with the Arista Network Validation
user guide, should be enough for anyone to get up to speed and start writing test cases in a
matter of hours.

Coming up
Hopefully this post has given you a feel for how easily we can perform automated network
verification and validation, which brings us one step closer to our final goal – a fully
automated build and test pipeline for network devices. In the next and final post we’ll
complete our journey towards network CI/CD nirvana by building our own network CI
server based on GitLab and creating a simple CI/CD pipeline that will make use of both
cEOS and Robot Framework to build and test all network changes.
