Purpose
To help certification candidates organize their training and study plan by matching directly to the exam
blueprint, with information and resources to augment Cumulus Networks training. Some general networking
exposure and knowledge is assumed, but links are provided throughout for additional research at your
own pace.
Some general images of packet types and other reference information are included from web sources such
as Wikipedia and vendor web sites for quick reference.
Organization
This study guide was organized and generated directly from the exam study guide blueprint with
modifications and additions deemed appropriate.
https://education.cumulusnetworks.com/getting-started-materials/287534
Creation references
The document was created primarily using the Cumulus Linux 3.7 User Guide, Cumulus NetQ 1.4 User Guide
(commands validated in version 2.1), validated design documents, Cumulus provided free training resources,
and boot camp documentation. Additional information was added from prior knowledge and research.
Document formatting
Code, configuration, and examples
The document contains many examples of commands and their output. Some command output may be slightly
reformatted to fit inside this document, and large tables may be reduced to only the rows required for
clarity of information.
Command syntax that shows the options within a command, rather than the command itself with output, is
written with the following conventions:
·· Required variable information is enclosed in angle brackets, e.g. “<x>”
·· <required_variable>
·· Required items with a fixed set of choices are enclosed in parentheses, e.g. “(z)”
·· (choice1|choice2)
·· Some choices may be omitted for brevity
Contents
Purpose
Organization
Creation references
Document formatting
Code, configuration, and examples
Files, directories, paths, and commands without output
Commands and Syntax
NCLU & NetQ
Switching fundamentals
Describe & switching concepts
Frame switching
Frame flooding
MAC address table
MAC learning and aging
Interpret frame format
Routing fundamentals
Describe BGP and how it is used
Border Gateway Protocol (BGP) overview
Linux concepts
Describe the basics of GRUB
Display how to boot a switch, recover a password, and manually boot
Restart ONIE installer
Uninstall all images & remove configuration
Display how to add and remove users, set permissions on files, password
Add and remove users
Set password
Set file permissions
Describe the benefits and differences between password login and key-based
Describe the difference between Userspace and Kernel
Configure systemd service architecture
Display starting, enabling, disabling a service
Describe the basics of EVPN, a BGP EVPN control plane, and the different route types
Ethernet Virtual Private Network (EVPN)
BGP EVPN control plane
EVPN route types
Troubleshooting
Describe basic troubleshooting techniques
Basic steps
Isolate the problem
Implementing a fix
Verifying the fix resolves the problem
Automation
Identify potential automation templates
Describe the principles of automation
Describe a library/module
Describe groupings
Describe push vs. pull & agent vs. agentless
Push vs. pull
Agent vs. agentless
Articulate Linux automation strategy (push file restart -> service)
Enable and use the NCLU API
Enabling API
NCLU API usage examples
This document was designed directly from the exam blueprint (with additional details added), so it will
familiarize you with the general sections and outline of the exam, which follow:
·· Switching Fundamentals
·· Routing Fundamentals
·· Linux Concepts
·· Overlay Routing Concepts
·· Core Cumulus Concepts
·· Design Architecture Concepts
·· Troubleshooting
The exam is approximately 90 questions long, lasts 2 hours, and is delivered online at your own site and
convenience. Upon passing, you will receive a number based on the year and order of passage, for example
2019::2. The certification is valid for 3 years, and successfully recertifying keeps your existing number.
FIGURE 1
Cumulus Networks offers a Linux 101 ebook and other free curriculum resources to build a base
of knowledge.
https://education.cumulusnetworks.com/linux-101-ebook-educational-resources
They also provide free how-to videos covering open networking basics, Cisco configuration comparisons,
NCLU, and Cumulus tutorials.
https://education.cumulusnetworks.com/series/how-to-videos
Self-paced training
Linux networking fundamentals
https://education.cumulusnetworks.com/series/linux-fundamentals
Linux Networking Fundamentals training is 3.5 hours of training covering 9 areas of essential knowledge,
designed for those new to Linux or networking.
·· Linux Concepts
·· IP Addressing
·· Routing Fundamentals
·· BGP Fundamentals
·· OSPF Fundamentals
·· Linux Routing Proficiency
·· First Hop Redundancy
Cumulus core
https://education.cumulusnetworks.com/series/cumulus-core
4.5 hours of training focusing on operational knowledge for those new to Cumulus, teaching them
everything they need to be dangerous. The course is split into 5 modules.
·· Architecture
·· Configuration
·· Routing
·· Network Services and Security
·· Troubleshooting
https://cumulusnetworks.com/support/networking-training/
Cumulus Networks offers online boot camps with open enrollment, lasting 12 hours spread equally over
3 days. You will be led through practical exercises and hands-on labs to strengthen your knowledge and
understanding of the topics.
Boot camp XL
Delivered on site for a single company or organized group of up to 16 people, covering 16 hours over
2 days. This option uses the same modules, but provides more depth and in-person access to live responses
from the instructor, and is great for an organization with a large team.
Schedule exam
https://education.cumulusnetworks.com/certification-exam-registration
Switching fundamentals
Describe & switching concepts
Frame switching
Frames are the final layer of encapsulation before transmission over a physical medium. Ethernet is an
example of a network technology using frames for communication. Frame switching is the method by which
frames are transferred between hosts by a central switching device.
Frame flooding
Frame flooding is a method of frame switching to handle unknown destinations. The frame with an unknown
destination is flooded to all ports except for the port the frame was received on.
A media access control (MAC) address table, sometimes referred to as a Content Addressable Memory
(CAM) table, is used on Ethernet switches to determine how to forward frames in a local area network
(LAN). This table stores both static and dynamic MAC addresses for forwarding of layer 2 frames. MAC
addresses are 48 bits in length and are usually displayed as 12 hexadecimal digits, grouped in pairs
separated by colons or dashes.
Traditionally, MAC addresses are learned from the source MAC of frames ingressing a port, so that future
frames with that destination MAC can be switched directly to that port rather than flooded.
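On Cumulus Linux, the learned entries can be inspected with NCLU, for example:

```shell
# Show the MAC address (CAM) table: VLAN, interface, MAC, and whether each
# entry is dynamically learned or statically configured
cumulus@leaf01:~$ net show bridge macs
```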
Instead of keeping learned MAC addresses permanently and potentially running out of table space, learned
addresses are aged out after a certain time without being seen. By default, Cumulus Linux stores
MAC addresses in the Ethernet switching table for 1800 seconds (30 minutes). This timer can be changed
via NCLU.
The bridge-ageing option is in the NCLU blacklist, as it is not frequently used. To configure this setting,
you need to remove the bridge-ageing keyword from the ifupdown_blacklist in /etc/netd.conf, then restart
the netd service. After restarting the service, you can change the setting using NCLU.
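A sketch of that workflow using NCLU (the 600-second value is only an example):

```shell
# Edit /etc/netd.conf and delete the bridge-ageing entry from the
# ifupdown_blacklist section, then restart netd so NCLU accepts the command
cumulus@leaf01:~$ sudo nano /etc/netd.conf
cumulus@leaf01:~$ sudo systemctl restart netd.service

# Change the MAC ageing timer from the default 1800 seconds to 600 seconds
cumulus@leaf01:~$ net add bridge bridge ageing 600
cumulus@leaf01:~$ net commit
```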
https://en.wikipedia.org/wiki/Ethernet_frame
Ethernet frames are composed of a header, payload, and frame check sequence, preceded by a preamble
and start frame delimiter, and followed by an end of frame and an interpacket gap.
FIGURE 2
The Ethernet frame can be encapsulated in another protocol to accomplish overlay functions. For
example, Virtual Extensible LAN (VXLAN) technology attempts to address scalability issues associated
with hyper-scale computing deployments. Its goal is to provide layer 2 connectivity through a layer 3
infrastructure, avoiding the pitfalls of traditional layer 2 network spans.
FIGURE 3
https://docs.cumulusnetworks.com/display/DOCS/VLAN+Tagging
VLAN trunking allows traffic assigned to multiple VLANs to transit a single link. VLANs are tagged for
unique identification with a 4-byte 802.1Q header.
FIGURE 4
https://docs.cumulusnetworks.com/display/DOCS/VLAN-aware+Bridge+Mode
The VLAN-aware mode in Cumulus Linux implements a configuration model for large-scale L2 environments,
with one single instance of Spanning Tree. Each physical bridge member port is configured with the list of
allowed VLANs as well as its port VLAN ID. MAC address learning, filtering and forwarding are VLAN-aware.
This significantly reduces the configuration size, and eliminates the large overhead of managing the
port/VLAN instances as subinterfaces, replacing them with lightweight VLAN bitmaps and state updates.
·· Scale: The new VLAN-aware mode can support 2000 concurrent VLANs while the traditional mode
supports only 200 concurrent VLANs
·· Simplicity: VLAN-aware mode has a simpler configuration
FIGURE 5
Cumulus Networks recommends using VLAN-aware mode bridges, rather than traditional mode bridges.
The bridge driver in Cumulus Linux is capable of VLAN filtering, which allows for configurations that are
similar to incumbent network devices. You can configure both VLAN-aware and traditional mode bridges on
the same network in Cumulus Linux; however you should not have more than one VLAN-aware bridge on a
given switch.
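A minimal VLAN-aware bridge configuration sketch with NCLU (port and VLAN numbers are illustrative):

```shell
# Create a single VLAN-aware bridge containing two member ports
cumulus@leaf01:~$ net add bridge bridge ports swp1,swp2
# Allow VLANs 100-200 on the bridge member ports
cumulus@leaf01:~$ net add bridge bridge vids 100-200
# Set the untagged (port) VLAN ID for swp1
cumulus@leaf01:~$ net add interface swp1 bridge pvid 100
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
```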
https://docs.cumulusnetworks.com/display/DOCS/Traditional+Bridge+Mode
https://support.cumulusnetworks.com/hc/en-us/articles/204909397
Address Resolution Protocol (ARP), defined in RFC 826, is a communication protocol used for discovering
the link layer (MAC) address associated with a given network layer (IP) address.
The Cumulus Linux ARP implementation differs from standard Debian Linux ARP behavior in a few ways,
because Cumulus Linux is an operating system for routers/switches rather than servers.
arp_accept (BOOL) — 0: Do not create new entries in the ARP table.
arp_announce (INT) — 0 (default): Use any local address, configured on any interface. 2: Always use the
best local address for this target. In this mode, the source address in the IP packet is ignored and a
preferred local address is selected for talks with the target host. That local address is selected by
looking for primary IP addresses on all our subnets on the outgoing interface that include the target IP
address. If no suitable local address is found, the first local address on the outgoing interface or on
all other interfaces is selected, with the hope of receiving a reply to our request, sometimes regardless
of the source IP address announced.
arp_filter (BOOL) — 0 (default): The kernel can respond to ARP requests with addresses from other
interfaces. This may seem wrong, but it usually makes sense, because it increases the chance of
successful communication. IP addresses are owned by the complete host on Linux, not by particular
interfaces. Only for more complex setups, such as load balancing, does this behavior cause problems.
arp_ignore (INT) — 0 (default): Reply for any local target IP address, configured on any interface.
1: Reply only if the target IP address is a local address configured on the incoming interface.
https://docs.cumulusnetworks.com/display/DOCS/Network+Troubleshooting#NetworkTroubleshooting-ManipulatetheSystemARPCache
cumulus@leaf01:mgmt-vrf:~$ arp -a
? (169.254.0.1) at 44:38:39:00:06:01 [ether] PERM on swp51
? (10.1.3.12) at 44:38:39:00:03:05 [ether] PERM on vlan13
? (192.168.0.254) at 44:38:39:00:01:01 [ether] on eth0
? (10.2.4.12) at 44:38:39:00:03:05 [ether] PERM on vlan24
? (169.254.0.1) at 44:38:39:00:07:01 [ether] PERM on swp52
? (10.2.4.102) at 44:38:39:00:09:01 [ether] on vlan24
? (169.254.1.2) at 44:38:39:00:03:03 [ether] on peerlink.4094
? (10.1.3.101) at 44:38:39:00:08:01 [ether] on vlan13
To keep neighbors in the reachable state, Cumulus Linux includes a background process
(/usr/bin/neighmgrd) that tracks neighbors that move into a stale, delay, or probe state and attempts
to refresh their state ahead of any removal from the Linux kernel, and thus before removal from the
hardware forwarding tables.
Neighbor discovery
https://en.wikipedia.org/wiki/Neighbor_Discovery_Protocol
Hosts and routers use ND in IPv6 to determine link-layer addresses and to discover attached routers.
The Neighbor Discovery Protocol (NDP, ND) is a protocol in the Internet protocol suite used with IPv6. It
operates at the link layer of the Internet model (RFC 1122), and is responsible for gathering information
required for internet communication, including configuration of local connections, domain name servers,
and gateways used to communicate with more distant systems.
The protocol defines five ICMPv6 packet types for IPv6 functions similar to the ARP and ICMP Router
Discovery and Router Redirect protocols for IPv4, but provides improvements over its IPv4 counterparts
(RFC 4861, section 3.1).
With traditional Linux bridges, Per-VLAN Spanning Tree (PVST) creates a spanning tree instance per
bridge. Rapid PVST (PVRST) supports RSTP enhancements for each spanning tree instance. To use PVRST with
a traditional bridge, you must create a bridge corresponding to the untagged native VLAN, and all the
physical switch ports must be part of the same VLAN. When connected to a switch that has a native VLAN
configuration, the native VLAN must be configured as VLAN 1 only, for maximum interoperability.
VLAN-aware bridges only operate in RSTP mode. STP bridge protocol data units (BPDUs) are transmitted
on the native VLAN. If a bridge running RSTP (802.1w) receives a common STP (802.1D) BPDU, it falls back
to 802.1D operation automatically. RSTP interoperates with MST seamlessly, creating a single instance of
spanning tree, which transmits BPDUs on the native VLAN. RSTP treats the MST domain as one giant switch.
When connecting a VLAN-aware bridge to a proprietary PVST+ switch using STP, VLAN 1 must be allowed
on all 802.1Q trunks that interconnect them, regardless of the configured native VLAN. This is because only
VLAN 1 enables the switches to address the BPDU frames to the IEEE multicast MAC address.
Configuration
Most STP parameters are blacklisted in the ifupdown_blacklist section of the /etc/netd.conf file. Before
you configure those parameters, you must edit the file to remove them from the blacklist. A full list of
parameters — https://docs.cumulusnetworks.com/display/DOCS/Spanning+Tree+and+Rapid+Spanning+Tree#SpanningTreeandRapidSpanningTree-paramsSpanningTreeParameterList.
Some of the more commonly edited parameters and functions include turning STP on or off, changing its
priority, and manipulating port states and functions via configuration.
Priority
The bridge with the lowest priority is elected the root bridge. The priority must be a number between 0 and
61440 and must be a multiple of 4096; the default is 32768.
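With NCLU, the bridge priority can be lowered to prefer a switch as root; 8192 below is an illustrative value (any multiple of 4096 between 0 and 61440 works):

```shell
# Lower the STP priority so this switch is preferred as the root bridge
cumulus@leaf01:~$ net add bridge stp treeprio 8192
cumulus@leaf01:~$ net commit
```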
BPDU guard
To protect the spanning tree topology from unauthorized switches affecting the forwarding path, configure
BPDU (Bridge Protocol Data Unit) guard. One common example is when a new switch is connected to an
access port off of a leaf switch. If this new switch is configured with a low priority, it could become the new
root switch and affect the forwarding path for the entire layer 2 topology. Recovery of the port requires it
to be shut down and re-enabled, but the event will reoccur until the cause of the issue is corrected on the
connected device.
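A sketch of enabling BPDU guard on a host-facing port and recovering it after a violation (the interface name is illustrative):

```shell
# Enable BPDU guard on a host-facing port
cumulus@leaf01:~$ net add interface swp5 stp bpduguard
cumulus@leaf01:~$ net commit

# After a BPDU guard violation, bounce the port to recover it
cumulus@leaf01:~$ sudo ifdown swp5 && sudo ifup swp5
```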
Below is an example of the error message in /var/log/syslog for a BPDU Guard event.
Port Admin Edge is equivalent to the PortFast feature offered by other vendors. It enables or disables the
initial edge state of a port. Ports configured with PortAdminEdge bypass the listening and learning states
to move immediately into forwarding. Using PortAdminEdge mode has the potential to cause loops if not
accompanied by the BPDU guard feature. It is common, but not required, for edge ports to be configured
as access ports for a simple end host. In the data center, edge ports mostly connect to servers, which might
pass both tagged and untagged traffic.
PortAutoEdge is an enhancement to the standard PortAdminEdge (PortFast) mode, which allows for the
automatic detection of edge ports. PortAutoEdge enables and disables the auto transition to/from the edge
state of a port in a bridge.
When a BPDU is received on a port configured with PortAutoEdge, the port ceases to be in the edge port
state and transitions into a normal STP port. When BPDUs are no longer received on the interface, the
port again becomes an edge port, transitioning through the discarding and learning states before
resuming forwarding.
You can enable bpdufilter on a switch port, which filters BPDUs in both directions, effectively disabling
STP. Using BPDU filter inappropriately can cause layer 2 loops. Use this feature deliberately and with
extreme caution.
On a point-to-point link where RSTP is running, if you want to detect unidirectional links and put the port in
a discarding state (in error), enable bridge assurance by configuring port type network. The port will be in
bridge assurance inconsistent state until a BPDU is received from the peer. You need to configure the port
type network on both the ends of the link in order for bridge assurance to operate properly. The default
setting for bridge assurance is off.
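A sketch of enabling bridge assurance with NCLU (swp1 is illustrative; remember the same configuration is needed on the peer end of the link):

```shell
# Set the port type to network, enabling bridge assurance on this end
cumulus@leaf01:~$ net add interface swp1 stp portnetwork
cumulus@leaf01:~$ net commit
```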
Storm control
Storm control provides protection against excessive inbound BUM (broadcast, unknown unicast, multicast)
traffic on layer 2 switch port interfaces by limiting the packet rate. Configure storm control for each
physical port by configuring switchd. To enable unknown unicast and multicast storm control at 400 pps
and 3000 pps respectively for swp1:
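A sketch of that switchd configuration; the key names below follow the documented interface.<port>.storm_control format but should be verified against your release:

```shell
# Append per-port storm control rates (packets per second) to switchd.conf
cumulus@leaf01:~$ sudo bash -c 'cat >> /etc/cumulus/switchd.conf' <<'EOF'
interface.swp1.storm_control.unknown_unicast = 400
interface.swp1.storm_control.multicast = 3000
EOF

# Restart switchd for the change to take effect (disruptive to forwarding)
cumulus@leaf01:~$ sudo systemctl restart switchd.service
```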
Verification
Troubleshooting
The purpose of STP is to detect loops and block forwarding on ports to prevent them. STP is broken
when it cannot accomplish this task and an unmitigated loop occurs. General steps for troubleshooting
STP loops follow.
https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
You will need to configure switch1 and switch2, the steps for both are
very similar.
- create the peering; select one switch to be primary and the other secondary
- backup-ip is an optional (recommended) IP address that is separately reachable
- create VLANs 100-200
- configure a host facing interface for clag
- switch1 and switch2 MUST use the same clag-id for host-11
- connect the clag to host-11 to vlan 100 untagged
- review and commit
net commands
============
switch1# net add clag peer sys-mac 44:38:39:FF:01:01 interface swp3-4 primary backup-ip 10.0.0.2
switch1# net add vlan 100-200
switch1# net add clag port bond bond-to-host-11 interface swp1 clag-id 1
switch1# net add bond bond-to-host-11 bridge access 100
switch1# net pending
switch1# net commit
switch2# net add clag peer sys-mac 44:38:39:FF:01:01 interface swp3-4 secondary backup-ip 10.0.0.1
switch2# net add vlan 100-200
switch2# net add clag port bond bond-to-host-11 interface swp1 clag-id 1
switch2# net add bond bond-to-host-11 bridge access 100
switch2# net pending
switch2# net commit
Verification
============
switch1# net show interface
switch1# net show clag
Cumulus Networks recommends that you always enable STP in your layer 2 network. With MLAG, Cumulus
Networks recommends you enable BPDU guard on the host-facing bond interfaces. Best Practices for STP
with MLAG:
·· The STP global configuration must be the same on both the switches
·· The STP configuration for dual-connected ports should be the same on both peer switches
·· The STP priority must be the same on both peer switches
By default, when clagd is running, it logs its status to the /var/log/clagd.log file and syslog.
https://docs.cumulusnetworks.com/display/DOCS/Ethernet+Bridging+-+VLANs
Ethernet bridges provide a means for hosts to communicate through layer 2, by connecting all of the
physical and logical interfaces in the system into a single layer 2 domain. The bridge is a logical interface
with a MAC address and an MTU. The bridge MTU is the minimum MTU among all its members. By default,
the bridge’s MAC address is copied from eth0. The bridge can also be assigned an IP address.
Bridge members can be individual physical interfaces, bonds or logical interfaces that traverse an 802.1Q
VLAN trunk.
Single attached
The server is connected to a single switch for connectivity. The loss of the network device brings the
connected servers down, and it is the responsibility of the server and application infrastructure to
overcome the loss.
Linux bonding provides a method for aggregating multiple network interfaces (slaves) into a single logical
bonded interface (bond). Cumulus Linux supports two bonding modes:
·· IEEE 802.3ad link aggregation mode, which allows one or more links to be aggregated together to
form a link aggregation group (LAG), so that a media access control (MAC) client can treat the link
aggregation group as if it were a single link. IEEE 802.3ad link aggregation is the default mode
·· Balance-xor mode, where the bonding of slave interfaces is static and all slave interfaces are
active for load balancing and fault tolerance purposes. This is useful for MLAG deployments
A server can be bonded to a single switch or to multiple switches. Bonding to a single switch keeps all
links in the bond active, but lacks network-level redundancy for the connection: a single switch failure
will take the server offline. Traditionally, when connecting to multiple switches, a server would use a
version of NIC teaming to provide redundancy, failover, and outbound load sharing from the server. This
was dependent on the server manufacturer, with each having different mechanisms. For a host to achieve
active/active forwarding at layer 2 to multiple switches, Multi-Chassis Link Aggregation must be used.
MLAG's purpose is to reduce the dependence on spanning tree and enable 100% bandwidth utilization
while providing redundancy and failover at both the server and network portions of the access
layer connections.
Multi-Chassis Link Aggregation (MLAG), enables a server or switch with a two-port bond, such as a link
aggregation group/LAG, EtherChannel, port group or trunk, to connect those ports to different switches
and operate as if they are connected to a single, logical switch. This provides greater redundancy and
greater system throughput.
Dual-connected devices or hosts can create LACP bonds that contain links to each physical switch.
Therefore, active-active links from the dual-connected devices are supported even though they are
connected to two different physical switches.
Routing fundamentals
Describe BGP and how it is used
Border Gateway Protocol (BGP) overview
BGP is a path-vector routing protocol, defined in RFC 4271, where each organization's network is
referenced as a piece of the path. The path consists of the autonomous system numbers of the
organizations that must be traversed to reach the destination. The shortest path is considered the best
route, and BGP uses attributes to exchange paths, origins, next hops, and other preference settings in
order to manipulate and filter route prefixes. BGP uses TCP port 179 for building neighbor relationships
and exchanging information.
Name — Description
Atomic Aggregate — Indicates ASes have been dropped due to route aggregation
Multiple Exit Discriminator (MED) — Metric for external neighbors to reach the local AS; default 0
Origin — Prefer IGP-learned routes over routes learned via EGP, and EGP over unknown
Traditionally, BGP is an external gateway protocol intended for use between different autonomous systems,
or networks, and is the routing protocol used between ISPs, but it carries a significant use case inside
data centers.
iBGP
iBGP represents an internal BGP connection, signified by peering two devices in the same autonomous
system. Since the autonomous system represents a single administrative organization, the protocol behaves
differently for external and internal connections.
Routes learned from iBGP peers will only be advertised to eBGP peers, and a full mesh between iBGP peers
is required. Route reflectors and confederations are methods to improve the scalability of iBGP and relax
the default full mesh requirement.
eBGP
eBGP represents an external BGP connection, signified by the peering devices having different autonomous
system numbers. eBGP peers are assumed to be directly connected by default.
Routes learned from an eBGP peer will be advertised to all other peers, both iBGP and eBGP.
BGP placement and usage within the data center has evolved in recent years with the scale required.
eBGP has proven to be a capable and preferred routing protocol to solve current data center challenges
of scale and automation. Dinesh Dutt's book provides a good look into these scenarios and
choices. https://cumulusnetworks.com/lp/bgp-ebook/
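As an illustrative sketch, an eBGP unnumbered peering between a leaf and spine can be configured with NCLU like this (the ASN, router ID, and interface are assumptions):

```shell
# Define the local autonomous system and router ID
cumulus@leaf01:~$ net add bgp autonomous-system 65101
cumulus@leaf01:~$ net add bgp router-id 10.0.0.11
# BGP unnumbered: peer over the interface and accept any external ASN
cumulus@leaf01:~$ net add bgp neighbor swp51 remote-as external
# Advertise the loopback prefix
cumulus@leaf01:~$ net add bgp network 10.0.0.11/32
cumulus@leaf01:~$ net commit
```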
https://docs.cumulusnetworks.com/display/DOCS/Open+Shortest+Path+First+-+OSPF
https://en.wikipedia.org/wiki/Link-state_advertisement
OSPF is an interior gateway protocol using a link-state routing algorithm (Dijkstra) defined in RFC2328 with
updates for IPv6 with OSPFv3 in RFC5340. OSPF uses multicast groups for neighbor discovery and hellos
and supports MD5 and AH (v3) authentication.
OSPF maintains the view of the network topology conceptually as a directed graph. Each router represents
a vertex in the graph. Each link between neighboring routers represents a unidirectional edge and each
link has an associated weight (called cost) that is either automatically derived from its bandwidth or
administratively assigned. Using the weighted topology graph, each router computes a shortest path
tree (SPT) with itself as the root, and applies the results to build its forwarding table. The computation is
generally referred to as SPF computation and the resultant tree as the SPF tree.
An LSA (link-state advertisement) is the fundamental quantum of information that OSPF routers exchange
with each other. It seeds the graph building process on the node and triggers SPF computation. LSAs
originated by a node are distributed to all the other nodes in the network through a mechanism called
flooding. Flooding is done hop-by-hop. OSPF ensures reliability by using link state acknowledgement
packets. The set of LSAs in a router’s memory is termed link-state database (LSDB), a representation of
the network graph. Therefore, OSPF ensures a consistent view of LSDB on each node in the network in a
distributed fashion (eventual consistency model); this is key to the protocol’s correctness.
Type 1 — Router LSA: Generated by each router for each area it is located in. The link-state ID is the
originating router's ID.
Type 2 — Network LSA: Generated by the DR. The link-state ID is the router ID of the DR.
Type 3 — Summary LSA: Created by the ABR and flooded into other areas.
Type 4 — Summary ASBR LSA: Other routers need to know where to find the ASBR, so the ABR generates a
summary ASBR LSA, which includes the router ID of the ASBR in the link-state ID field.
Type 5 — External LSA: Generated by the ASBR.
Type 7 — NSSA External LSA: An external LSA for an NSSA (not-so-stubby area), which does not allow
external type 5 LSAs.
OSPF as a DC underlay
EVPN can be deployed with an OSPF or static route underlay if needed. This is a more complex
configuration than using eBGP. In this case, OSPF is responsible for neighboring and exchanging loopback
routing information. The loopbacks are used to peer for the overlay. A separate routing instance or protocol
is then required for the overlay. OSPF is IPv4 only and OSPFv3 may not support IPv4 in all implementations.
Cumulus Linux supports IP unnumbered interfaces for OSPF over Ethernet point-to-point interfaces
and recommends using Prescriptive Topology Manager to verify link connectivity in this scenario. IP
unnumbered can conserve address space in the network, reduce LSA table size (using less memory) as
well as make automation easier for larger networks.
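A sketch of OSPF unnumbered on a point-to-point fabric link with NCLU (the router ID and interface are illustrative):

```shell
# Use the loopback address as the router ID
cumulus@leaf01:~$ net add ospf router-id 10.0.0.11
# Place the fabric link in area 0 and make it point-to-point,
# which is required for OSPF unnumbered
cumulus@leaf01:~$ net add interface swp51 ospf area 0.0.0.0
cumulus@leaf01:~$ net add interface swp51 ospf network point-to-point
cumulus@leaf01:~$ net commit
```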
Each pod of a two-tier Clos network is assigned an area. For a single pod, all devices are in area 0. LSA
types 1 and 2 are never flooded outside of their local area.
The area border router (ABR) is placed on the spine switches. The connectivity to and from the data center
can be either in the pod’s area or moved to area 0. For ease of automation, the non-backbone areas could
be configured with the same area number in this scenario.
Beyond two tiers, in the massively scalable data center, each super-spine switch would be in area 0.
Area 0 can be discontiguous if the switches never need to talk with each other. Each pod would be in its
own area, so the SPF calculations would be limited to its local pod.
https://docs.cumulusnetworks.com/display/DOCS/Open+Shortest+Path+First+-+OSPF#OpenShortestPathFirst-OSPF-StubAreas
Stub areas help improve scalability for larger networks by reducing the LSA types flooded into them.
Type 5 external LSAs can take up a large percentage of the database size.
Normal non-zero area — LSA types 1, 2, 3, 4 area-scoped; type 5 externals; inter-area routes summarized
Stub area — LSA types 1, 2, 3, 4 area-scoped; no type 5 externals; inter-area routes summarized
Totally stubby area — LSA types 1, 2 area-scoped; default summary; no type 3, 4, 5 LSA types allowed
Bridges can be included as part of a routing topology once assigned an IP address. This enables hosts
within the bridge to communicate with other hosts outside of the bridge, via a switch virtual interface
(SVI), which provides layer 3 routing. The IP address of the bridge is typically from the same subnet as the
bridge’s member hosts.
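An SVI for VLAN 13 might be defined in /etc/network/interfaces like this (the addresses are illustrative):

```shell
# /etc/network/interfaces snippet: layer 3 SVI for VLAN 13
auto vlan13
iface vlan13
    address 10.1.3.1/24
    vlan-id 13
    vlan-raw-device bridge
```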
FIGURE 6
VRR enables hosts to communicate with any redundant router without reconfiguration, running dynamic
router protocols, or running router redundancy protocols. Redundant routers will respond to ARP requests
from hosts in an identical manner, but if one fails, the other redundant routers will continue to respond,
leaving the hosts with the impression that nothing has changed. Cumulus Linux only supports VRR on
switched virtual interfaces (SVIs). VRR is NOT supported on physical interfaces or virtual subinterfaces.
As the bridges in each of the redundant routers are connected, they will each receive and reply to ARP
requests for the virtual router IP address.
A range of MAC addresses is reserved for VRR to prevent MAC address conflicts with other interfaces in the
same bridged network. The reserved range is 00:00:5E:00:01:00 to 00:00:5E:00:01:ff. Cumulus Networks
recommends using MAC addresses from the reserved range when configuring VRR. The reserved MAC
address range for VRR is the same as for the Virtual Router Redundancy Protocol (VRRP), as they serve
similar purposes. VRRP is separate but can be configured instead, if preferred.
The VLAN interface must have unique IP addresses for both the physical (the address option) and
virtual (the address-virtual option) interfaces, as the unique address is used when the switch initiates an
ARP request.
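A VRR SVI sketch for /etc/network/interfaces, showing the unique physical address alongside the shared virtual address (the addresses and the MAC from the reserved range are illustrative; the peer switch uses the same address-virtual line with its own physical address):

```shell
# /etc/network/interfaces snippet: SVI with a virtual router address
auto vlan100
iface vlan100
    address 10.0.100.2/24                              # unique per switch
    address-virtual 00:00:5e:00:01:01 10.0.100.1/24    # shared gateway
    vlan-id 100
    vlan-raw-device bridge
```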
https://docs.cumulusnetworks.com/display/DOCS/Virtual+Router+Redundancy+-+VRR+and+VRRP#VirtualRouterRedundancy-VRRandVRRP-VRR
https://docs.cumulusnetworks.com/display/DOCS/Virtual+Router+Redundancy+-+VRR+and+VRRP#VirtualRouterRedundancy-VRRandVRRP-VRRP
Virtual Router Redundancy Protocol (VRRP) allows for a single virtual default gateway shared among two
or more network devices in active/standby. The VRRP master forwards packets, and if the master VRRP
router fails, another VRRP standby router automatically takes over the master role. VRRP advertisements
are sent to other VRRP routers in the same virtual router group, which include priority and state. VRRP
router priority determines the role that each virtual router plays and who becomes the new master if the
master fails.
All virtual routers use 00:00:5E:00:01:XX for IPv4 gateways and 00:00:5E:00:02:XX for IPv6 gateways as
their MAC address. The last byte of the address is the Virtual Router IDentifier (VRID), which is different
for each virtual router group in the network. This MAC address is used by only one physical router at a time,
which replies with this address when ARP requests or neighbor solicitation packets are sent for the
IP addresses of the virtual router.
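To make the VRID-to-MAC mapping concrete, here is a small illustrative sketch (the function name is hypothetical, not a Cumulus tool) that builds the virtual MAC for a given VRID in both the IPv4 and IPv6 reserved ranges:

```python
def vrrp_virtual_mac(vrid: int, ipv6: bool = False) -> str:
    """Build the VRRP virtual MAC for a given VRID; the last byte is the VRID."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    base = "00:00:5e:00:02" if ipv6 else "00:00:5e:00:01"
    return f"{base}:{vrid:02x}"

print(vrrp_virtual_mac(10))             # 00:00:5e:00:01:0a
print(vrrp_virtual_mac(10, ipv6=True))  # 00:00:5e:00:02:0a
```

With VRID 10, the IPv4 gateway MAC ends in 0a, matching the last-byte-is-VRID rule described above.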
FIGURE 7
Anycast gateway
An anycast gateway is used with VXLAN routing typically in a distributed routing architecture. A distributed
architecture involves configuring an SVI and enabling VXLAN routing on each leaf switch. Therefore, VXLAN
routing occurs closest to the host, keeping traffic local for more efficient routing and lower latency than the
centralized architecture.
Using an Anycast gateway, every leaf’s SVI is configured with the same IP address per VLAN. Since all hosts
within a VLAN are configured with the same IP default gateway address, all hosts or VMs can be easily
moved throughout the data center without changing their configuration.
ECMP routing is when multiple next hops are installed to the same destination, due to the same protocol
containing multiple routes with an identical cost or metric. ECMP is enabled by default in Cumulus Linux
and load sharing occurs automatically for all routes with multiple next hops installed. ECMP load sharing
supports both IPv4 and IPv6 routes.
Describe hashing
To prevent out-of-order packets, ECMP hashing is done on a per-flow basis, which means that all packets
with the same source and destination IP addresses and the same source and destination ports are always
hashed to the same next hop. ECMP hashing does not keep a record of flow states or packets, nor does it
guarantee that the traffic sent to each next hop is equal.
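The per-flow behavior can be illustrated with a simple hash over the flow tuple. This is a sketch only; the actual hash computed by the forwarding ASIC is different and implementation specific:

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, next_hops):
    """Hash the flow tuple and pick a next hop; the same flow always maps
    to the same next hop, keeping packets of a flow in order."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

hops = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
first = ecmp_next_hop("192.168.4.10", "172.16.0.10", 40000, 443, hops)
# Re-hashing the identical 5-tuple returns the identical next hop:
assert first == ecmp_next_hop("192.168.4.10", "172.16.0.10", 40000, 443, hops)
```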
For multiple routes to be installed for ECMP, they must:
·· Be the identical route, including network and prefix length. A /24 and /25 are NOT the same route.
·· Originate from the same routing protocol. Routes from different sources are not considered equal.
For example, a static route and an OSPF route are not considered for ECMP load sharing.
·· Have equal cost or metric within the routing protocol. If two routes from the same protocol are unequal,
only the best route is installed in the routing table.
When different protocols provide the same route prefix, their administrative distances are compared to
determine which route should be utilized, with the lower distance preferred.
·· eBGP: 20
·· iBGP: 200
·· OSPF: 110
·· RIP: 120
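A sketch of how the distances above are compared when two protocols offer the same prefix (the values and names come from the list above; the code itself is illustrative only):

```python
# Administrative distances from the list above (lower wins).
ADMIN_DISTANCE = {"ebgp": 20, "ospf": 110, "rip": 120, "ibgp": 200}

def preferred_source(candidates):
    """Given protocols offering the same prefix, prefer the lowest distance."""
    return min(candidates, key=lambda proto: ADMIN_DISTANCE[proto])

print(preferred_source(["ospf", "ebgp", "rip"]))  # ebgp
print(preferred_source(["ibgp", "ospf"]))         # ospf
```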
In the example below, BGP and OSPF provide routes for 10.0.0.11/32, 10.0.0.21/32, and 10.0.0.22/32,
which are the loopback interfaces of leaf01, spine01, and spine02. The longest prefix length match is chosen
before protocol sources are compared; for example, a matching /32 route is preferred over a /30 regardless
of protocol source.
1. If the route entry does not currently exist in the routing table, add it to the routing table
2. If the route entry is more specific than an existing route, add it to the routing table. Both the new and
less specific entry are retained in the routing table.
3. If the route entry is the same as an existing one, but is received from a more preferred route source,
replace the forwarding entry with the new entry
4. If the route entry is the same as an existing one, and is received from the same protocol:
a. Discard the new route if its metric is higher than the existing route
b. Replace the existing route if the metric of the new route is lower
c. If the metric for both routes is the same, use both routes for load balancing, and update
the database according to the way the dynamic routing protocol works.
FIGURE 8
A device checks for the longest match to select the most specific route to the destination. For example,
a packet destined for 172.16.0.10 with multiple matching routes will compare prefix lengths. In the example
pictured on the right, the device has three possible routes that match this packet: 172.16.0.0/12, 172.16.0.0/18,
and 172.16.0.0/26. Of the three routes, 172.16.0.0/26 has the longest match and is therefore chosen to
forward the packet.
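The longest-match selection can be reproduced with Python's standard ipaddress module. This is a sketch of the lookup logic, not the switch's actual FIB implementation:

```python
import ipaddress

def longest_prefix_match(dest, routes):
    """Return the most specific route that contains the destination address."""
    addr = ipaddress.ip_address(dest)
    matches = [n for n in map(ipaddress.ip_network, routes) if addr in n]
    return max(matches, key=lambda n: n.prefixlen)

routes = ["172.16.0.0/12", "172.16.0.0/18", "172.16.0.0/26"]
print(longest_prefix_match("172.16.0.10", routes))  # 172.16.0.0/26
```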
FIGURE 9
Routing protocols dynamically compute reachability to destinations based on information state. Dynamic
routing protocols can adjust to changes in the network without administrative interaction. They provide
significant advantages for overall uptime and scale to large numbers of devices.
An overview of BGP was provided in the earlier section, BGP Overview, and BGP is discussed in detail
throughout the study guide.
OSPF
An overview of OSPF was provided in the earlier section, OSPF Overview.
RIP is a distance vector protocol defined in RFC1058 (v1), RFC2453 (v2), and RFC2080 (RIPng for IPv6). It is
generally limited to use in smaller networks by its lack of features and scalability, and has been declining in
overall usage since the 1990s. It is simple and easy to use, but limited to 15 hops. For IPv4, RIPv1 and RIPv2
are the available versions. Both use hop count as a metric, send routing tables every 30 seconds, and are
assigned an administrative distance of 120, but version 1 uses broadcasts for updates and cannot advertise
subnet masks, while version 2 moves to multicast (224.0.0.9) updates and adds subnet mask support. RIP
employs split horizon to help prevent loops by blocking the advertisement of a network on the same
interface it was learned on.
A RIP route is replaced or invalidated when:
·· An update has been received from another router and the route goes to a metric of 16 (unreachable).
·· An update has been received from another router and the route goes to a higher metric than what it is currently using.
For modern usage, RIP is generally limited to consumer grade devices and networks in need of an upgrade.
IS-IS is a link-state interior gateway protocol designed for use within an administrative domain, defined in
ISO/IEC 10589:2002 within the OSI reference design, and sees its primary usage in large service provider
networks. IS-IS operates by flooding link state information through a network of devices, with each device
independently building a database of the network’s topology. Conceptually similar to OSPF, IS-IS also uses
Dijkstra’s algorithm for best path computation.
While OSPF is a layer 3 protocol and was built to run on IP, IS-IS is a layer 2 protocol with IP support defined
in RFC1195.
IS-IS differs from OSPF in the way that “areas” are defined and routed between. IS-IS routers are designated
as being: Level 1 (intra-area); Level 2 (inter area); or Level 1–2 (both). Routing information is exchanged
between Level 1 routers and other Level 1 routers of the same area, and Level 2 routers can only form
relationships and exchange information with other Level 2 routers. Level 1–2 routers exchange information
with both levels and are used to connect the inter area routers with the intra area routers.
In IS-IS, area borders are in between routers designated as Level 2 or Level 1–2. The result is that an IS-IS
router is only ever a part of a single area. IS-IS also does not require Area 0 (Area Zero) to be the backbone
area through which all inter-area traffic must pass. The logical view is that OSPF creates something of a
spider web or star topology of many areas all attached directly to Area Zero, while IS-IS by contrast creates a
logical topology of a backbone of Level 2 routers with branches of Level 1–2 and Level 1 routers forming the
individual areas.
OSPF term        IS-IS term
Link             Circuit
Area             Sub-domain
As of Cumulus Linux 3.7.3, NCLU does not support interacting with IS-IS and it must be configured through
direct FRR manipulation. http://docs.frrouting.org/en/latest/isisd.html
FIGURE 10
EIGRP is a hybrid distance vector routing protocol developed by Cisco Systems, which remained unique to
Cisco equipment until 2013, when it was published in an IETF draft; it was later published as RFC7868 in
2016 and adopted by some vendors.
EIGRP utilizes multicast (224.0.0.10) for hello packets to form and maintain neighbor adjacencies. It utilizes
bandwidth and delay as metrics by default, with the option to enable load, reliability, and MTU, as well as
customize their weights in metric calculation through K value manipulation. In order for neighbors to form
an adjacency, they must agree on ASN, subnet, and K values. The composite metric formula for EIGRP is
displayed below.
FIGURE 11
FIGURE 12
*Chart shows older Cisco proprietary information for EIGRP, rather than RFC7868
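The classic (pre-RFC7868) composite metric from that chart can be sketched as below; the function name is illustrative, and with default K values (K1=K3=1, K2=K4=K5=0) the formula reduces to 256 × (scaled bandwidth + scaled delay):

```python
def eigrp_classic_metric(min_bw_kbps, total_delay_usec,
                         k1=1, k2=0, k3=1, k4=0, k5=0,
                         load=1, reliability=255):
    """Classic EIGRP composite metric, as commonly documented for the
    pre-RFC7868 Cisco implementation. Integer math, as the protocol uses."""
    bw = 10**7 // min_bw_kbps          # scaled inverse of lowest-path bandwidth
    delay = total_delay_usec // 10     # cumulative delay in tens of microseconds
    metric = k1 * bw + (k2 * bw) // (256 - load) + k3 * delay
    if k5 != 0:                        # K5 of 0 means the reliability term is skipped
        metric = metric * k5 // (reliability + k4)
    return metric * 256

# 100 Mbps link (100000 kbps) with 100 usec delay, default K values:
print(eigrp_classic_metric(100_000, 100))  # 28160
```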
Internet Protocol version 4 is a core protocol for standards-based networking on the internet and still routes
most traffic today despite the ongoing deployment of IPv6 and the exhaustion of new IPv4 assignments. IPv4
is described in RFC791 and uses 32-bit addressing, providing a maximum of 4,294,967,296 (~4.295 billion) IP
addresses, although ~590 million are reserved for various purposes. The addressing is usually represented in
dot-decimal format (10.10.10.1), with four decimal octets separated by periods.
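A quick sketch (illustrative only) of dot-decimal rendering and the total address count:

```python
def to_dotted(addr32: int) -> str:
    """Render a 32-bit IPv4 address in dot-decimal notation."""
    return ".".join(str((addr32 >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(to_dotted(0x0A0A0A01))  # 10.10.10.1
print(2 ** 32)                # 4294967296, the ~4.295 billion total addresses
```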
FIGURE 13
IPv4 broadcast
The IP address is combined with a subnet mask to determine the size and scope of the network or subnet,
as well as the network and broadcast addresses and the available host addresses included. The network
address is the network portion with the host portion set to all zeros. The broadcast address is the network
portion with the host portion set to all ones. The network and host portions do not have to reside on a
dotted octet boundary, as shown in the example for the 192.168.4.0/23 network below.
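The network address, broadcast address, and host range for 192.168.4.0/23 can be verified with Python's standard ipaddress module:

```python
import ipaddress

net = ipaddress.ip_network("192.168.4.0/23")
print(net.network_address)    # 192.168.4.0  (host bits all zeros)
print(net.broadcast_address)  # 192.168.5.255 (host bits all ones)
print(net.num_addresses - 2)  # 510 usable unicast host addresses
```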
Broadcast addresses distribute a layer 3 packet to all hosts in the given network. Broadcasts are
traditionally filtered by routers, and need forwarding to reach an off-network address. A good example
of this is DHCP and DHCP relay: the router takes the broadcast and forwards it via unicast towards the
specific DHCP server.
The broadcast address can be the last address in a network for directing a broadcast to a specific network,
but more commonly is all 1s (255.255.255.255) when used.
IPv4 unicast
An IPv4 unicast address is used to send information to a specific host. Any IP address in the 192.168.4.0/23
network above that is not the network or broadcast address is a unicast address: 192.168.4.1 through
192.168.5.254. For multiple receivers to get the same data, the data must be sent once for each and
every receiver.
IPv4 multicast
Multicast addresses are designed to deliver data to multiple receivers at once, who are subscribed and joined
to the multicast information stream. A good example of this is IPTV video services offered by ISPs, where
set top boxes subscribe to the multicast stream of a specific IPTV channel. Multicast addresses are also
commonly used in routing protocol neighbor discovery and peering relationships, such as in OSPF, IS-IS,
and EIGRP.
Multicast addresses reside in the 224.0.0.0/4 range (224.0.0.0 through 239.255.255.255), which is carved up
for specific reserved purposes.
IPv6 overview
FIGURE 14
Unicast and anycast addresses are typically composed of two logical parts: a 64-bit network prefix used for
routing, and a 64-bit interface identifier used to identify a host’s network interface. The network prefix (the
routing prefix combined with the subnet id) is contained in the most significant 64 bits of the address. The
size of the routing prefix may vary; a larger prefix size means a smaller subnet id size. The bits of the subnet
id field are available to the network administrator to define subnets within the given network. The 64-bit
interface identifier is either automatically generated from the interface’s MAC address using the modified
EUI-64 format, obtained from a DHCPv6 server, automatically established randomly, or assigned manually.
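The modified EUI-64 derivation mentioned above (flip the universal/local bit of the MAC's first octet and insert ff:fe in the middle) can be sketched as:

```python
def modified_eui64(mac: str) -> str:
    """Derive the 64-bit IPv6 interface identifier from a MAC address
    using the modified EUI-64 format."""
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02                      # flip the universal/local bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]   # insert ff:fe in the middle
    return ":".join(f"{eui[i] << 8 | eui[i + 1]:04x}" for i in range(0, 8, 2))

print(modified_eui64("44:38:39:00:03:00"))  # 4638:39ff:fe00:0300
```

The MAC 44:38:39:00:03:00, which appears in output examples elsewhere in this guide, yields interface identifier 4638:39ff:fe00:0300.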
IPv6 unicast
Bits:  10 (prefix)  |  54 (zeros)  |  64 (interface identifier)
The prefix field contains the binary value 1111111010. The 54 zeros that follow make the total network prefix
the same for all link-local addresses (the fe80::/64 link-local address prefix), rendering them non-routable.
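A one-line check of the arithmetic: the 10 prefix bits followed by 54 zeros occupy the top 64 bits of the address and produce the familiar fe80 prefix:

```python
# 10-bit link-local prefix (1111111010) followed by 54 zero bits.
prefix = 0b1111111010 << 54
print(hex(prefix))  # 0xfe80000000000000 -> the fe80::/64 link-local prefix
```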
IPv6 multicast
Multicast addresses are formed according to several specific formatting rules, depending on the application,
but the general format is below. IPv6 does not use broadcast addresses as IPv4 does, rather using the
specially defined all-nodes multicast address.
Bits:  8 (prefix)  |  4 (flags)  |  4 (scope)  |  112 (group ID)
https://docs.cumulusnetworks.com/display/DOCS/Routing
Static routes are managed using NCLU or the Cumulus Linux ip route command. The routes are added to
the FRRouting routing table, and are then updated into the kernel routing table as well.
Static routes can be verified via NCLU show commands focused to their scope.
IPv6 static routes are configured in the same manner as IPv4 via NCLU.
https://docs.cumulusnetworks.com/display/DOCS/Virtual+Routing+and+Forwarding+-+VRF
Cumulus contributed VRFs to the Linux code. VRFs are individual logical routers running on a device to
logically segment layer 3 routing tables. This is useful for multitenancy, leveraging powerful resources
efficiently, and separating management and data plane routing tables. VRF is fully supported in the Linux
kernel, and has the following characteristics:
·· The VRF is presented as a layer 3 master network device with its own associated routing table.
·· The layer 3 interfaces (VLAN interfaces, bonds, SVIs) associated with the VRF are enslaved to that
VRF; IP rules direct FIB (forwarding information base) lookups to the routing table for the VRF device.
·· The VRF device can have its own IP address, known as a VRF-local loopback.
·· Applications can use existing interfaces to operate in a VRF context — by binding sockets to the
VRF device or passing the ifindex using cmsg. By default, applications on the switch run against
the default VRF. Services started by systemd run in the default VRF unless the VRF instance is
used. If management VRF is enabled, logins to the switch default to the management VRF. This
provides convenience, as users do not have to specify the management VRF in each command.
·· Listen sockets used by services are VRF-global by default unless the application is configured
to use a more limited scope, such as in the management VRF. Connected sockets (like TCP) are
then bound to the VRF domain in which the connection originates. The kernel provides a sysctl
that allows a single instance to accept connections over all VRFs. For TCP, connected sockets
are bound to the VRF on which the first packet was received. This sysctl is enabled in Cumulus Linux.
·· Connected and local routes are placed in appropriate VRF tables.
·· Neighbor entries continue to be per-interface, and you can view all entries associated with the
VRF device.
·· A VRF does not map to its own network namespace; however, you can nest VRFs in a
network namespace.
·· You can use existing Linux tools to interact with it, such as tcpdump.
https://docs.cumulusnetworks.com/display/DOCS/Management+VRF
Management VRF is a subset of VRF and provides separation between the out-of-band management
network and the in-band data plane network. The main routing table is the default table for all of the data
plane switch ports, and a management VRF, mgmt, is used for routing through the Ethernet ports of the
switch. The mgmt name is special-cased to distinguish the management VRF from a data plane VRF.
Cumulus Linux only supports eth0 or eth1 as the management interface, depending on the switch platform.
The Ethernet ports are software-only and not hardware accelerated by switchd. VLAN subinterfaces, bonds,
bridges, and the front panel switch ports are not supported as management interfaces.
When management VRF is enabled, logins to the switch are set into the management VRF context and
IPv4 and IPv6 networking applications (for example, Ansible, Chef, and apt-get) run by an administrator
communicate out the management network by default. This default context does not impact services run
through systemd and the systemctl command, and does not impact commands examining the state of the
switch, such as the ip command to list links, neighbors, or routes.
Configure VRF
You configure VRF by associating each subset of interfaces to a VRF routing table, and configuring an
instance of the routing protocol — BGP or OSPFv2 — for each routing table. Configure the VRF using NCLU,
then place the layer 3 interface(s) in the VRF.
When you commit the change to add the management VRF, all connections over eth0 are dropped. This
can impact any automation that might be running.
VRF verification
VRF Table
---------------- ------
mgmt 1001
rocket 1002
VRF: mgmt
----------------------
eth0 UP 44:38:39:00:03:00 <BROADCAST,MULTICAST,UP,LOWER_UP>
VRF: rocket
----------------------
VRF mgmt:
K>* 0.0.0.0/0 [0/0] via 192.168.0.254, eth0, 02:16:47
K * 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 02:17:23
C>* 192.168.0.0/16 is directly connected, eth0, 02:16:48
VRF mgmt:
K * ::/0 [255/8192] unreachable (ICMP unreachable), 02:17:23
C>* fe80::/64 is directly connected, eth0, 02:17:23
K>* ff00::/8 [0/256] is directly connected, eth0, 02:17:23
IGMP (Internet Group Management Protocol) and MLD (Multicast Listener Discovery) snooping are
implemented in the bridge driver in the Cumulus Linux kernel and are enabled by default. IGMP snooping
processes IGMP v1/v2/v3 reports received on a bridge port in a bridge to identify the hosts which would like
to receive multicast traffic destined to that group.
FIGURE 15
When an IGMPv2 leave message is received, a group specific query is sent to identify if there are any other
hosts interested in that group, before the group is deleted.
An IGMP query message received on a port is used to identify the port that is connected to a router and is
interested in receiving multicast traffic.
MLD snooping processes MLD v1/v2 reports, queries and v1 done messages for IPv6 groups. If IGMP or MLD
snooping is disabled, multicast traffic gets flooded to all the bridge ports in the bridge. Similarly, in the
absence of receivers in a VLAN, multicast traffic would be flooded to all ports in the VLAN. The multicast
group IP address is mapped to a multicast MAC address and a forwarding entry is created with a list of
ports interested in receiving multicast traffic destined to that group.
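The IP-to-MAC mapping described above copies the low 23 bits of the group address into the reserved 01:00:5e OUI; a sketch:

```python
import ipaddress

def multicast_mac(group_ip: str) -> str:
    """Map an IPv4 multicast group address to its layer 2 MAC address:
    01:00:5e plus the low 23 bits of the group address."""
    ip = int(ipaddress.ip_address(group_ip))
    low23 = ip & 0x7FFFFF
    octets = [0x01, 0x00, 0x5E,
              (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return ":".join(f"{o:02x}" for o in octets)

print(multicast_mac("239.1.1.1"))  # 01:00:5e:01:01:01
```

Because only 23 of the 28 group-address bits are copied, 32 different IPv4 groups map to the same MAC address.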
Linux concepts
Describe the basics of GRUB
GRUB is a boot loader that understands the underlying file system by maintaining a driver for each file
system the operating system supports. This approach eliminates the need for hardcoded locations of hard
disk sectors and existence of map files, and does not require Master Boot Record (MBR) updates after the
kernel images are added or moved around. Configuration of a boot loader is stored in a regular file, which
is also accessed in a file system-aware way to obtain boot configurations before the actual booting of any
kernel images.
Reprovisioning the system deletes all system data from the switch. A reboot is required for the reinstall to
begin. To initiate the provisioning and installation process, run the onie-select -i command:
To remove all installed images and configurations and return the switch to its factory defaults, run the
onie-select -k command:
If your system becomes broken in some way, you can correct certain issues by booting into ONIE rescue
mode. In rescue mode, the file systems are unmounted and you can use various Cumulus Linux utilities to
try to resolve the problem. To reboot the system into ONIE rescue mode, run the onie-select -r command:
Password recovery
Use single user mode to assist in troubleshooting system boot issues or for password recovery. To enter
single user mode, follow the steps below.
1. Boot the switch and wait for the GRUB menu to appear.
2. Use the ^ and v arrow keys to select Advanced options for Cumulus Linux GNU/Linux. A menu similar
to the following should appear:
3. Select Cumulus Linux GNU/Linux, with Linux 4.1.0-cl-1-amd64 (recovery mode).
6. Sync the /etc directory using btrfs, then reboot the system:
A changelog is a record of all notable changes made to a project or software program. Cumulus Linux
release notes keep a record of all changes from version to version.
https://support.cumulusnetworks.com/hc/en-us/articles/360007793174-Cumulus-Linux-3-7-Release-Notes
https://cdn.kernel.org/pub/Linux/kernel/v4.x/ChangeLog-4.19.2
commit ab5d01b6130a4faa37a393cf828c6f65c45e7251
Author: David Ahern <dsahern@gmail.com>
Date: Wed Oct 24 08:32:49 2018 -0700
Display how to add and remove users, set permissions on files, and set passwords
Add and remove users
You can configure user accounts in Cumulus Linux with read-only or edit permissions for NCLU:
·· For read-only permissions with NCLU, add users to the netshow group. Users in the netshow group
can run NCLU net show commands, such as net show interface or net show config, and certain general
Linux commands, such as ls, cd or man, but cannot run net add, net del or net commit commands.
·· For edit permissions with NCLU, add users to the netedit group. Users in the netedit group can run
NCLU configuration commands, such as net add, net del or net commit, in addition to NCLU net
show commands.
Set password
https://docs.cumulusnetworks.com/display/DOCS/User+Accounts
cumulus@oob-mgmt-server:~$ passwd
Changing password for cumulus.
(current) UNIX password:
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Linux permissions dictate three things you may do with a file: read, write, and execute, referred to in Linux
by a single letter each.
For every file there are three sets of people for whom permissions are specified. The order of permissions is
always read, then write, then execute.
·· owner — a single person who owns the file. (usually file creator, but ownership may be granted to
different user)
·· group — every file belongs to a single group.
·· others — everyone else who is not in the group or the owner.
cumulus@oob-mgmt-server:~$ ls -l
total 0
drwxr-xr-x 1 cumulus cumulus 96 Feb 21 15:10 cldemo-netq
drwx------ 1 cumulus cumulus 474 Oct 4 12:54 gateone
drwxr-xr-x 1 cumulus cumulus 136 Feb 19 18:48 local-git-repo
·· The first character identifies the file type. A dash “-“, is a normal file. “d”, is a directory.
·· Characters 2 through 4 represent permissions for the owner. A letter represents the presence of a
permission and a dash “-” represents the absence of a permission. The owner above has all permissions
(read, write and execute).
·· Characters 5 through 7 represent permissions for the group. In drwxr-xr-x above, the group has the
ability to read and execute but not write.
·· Characters 8 through 10 represent permissions for others (everyone else). In drwxr-xr-x above, others
likewise have read and execute permissions but not write.
·· Who are we changing the permission for? [ugoa] — user (or owner), group, others, all
·· Are we granting or revoking the permission — indicated with either a plus ( + ) or minus ( - )
·· Which permission are we setting? — read ( r ), write ( w ) or execute ( x )
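To connect the symbolic notation with the octal form commonly passed to chmod, here is a small illustrative converter (not a system tool); read, write, and execute map to the values 4, 2, and 1 within each triplet:

```python
def mode_to_octal(sym: str) -> str:
    """Convert an ls-style permission string (e.g. 'rwxr-xr-x') to octal."""
    if len(sym) != 9:
        raise ValueError("expected 9 characters, e.g. 'rwxr-xr-x'")
    digits = []
    for i in range(0, 9, 3):           # owner, group, others triplets
        triplet = sym[i:i + 3]
        value = sum(bit for ch, bit in zip(triplet, (4, 2, 1)) if ch != "-")
        digits.append(str(value))
    return "".join(digits)

print(mode_to_octal("rwxr-xr-x"))  # 755
print(mode_to_octal("rw-r--r--"))  # 644
```

For example, chmod 755 grants rwx to the owner and r-x to the group and others.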
cumulus@oob-mgmt-server:~/cldemo-netq$ ls -l
total 20
-rw-r--r-- 1 cumulus cumulus 281 Feb 21 15:10 ansible.cfg
drwxr-xr-x 1 cumulus cumulus 60 Feb 21 15:10 docker
drwxr-xr-x 1 cumulus cumulus 60 Feb 21 15:10 evpn
-rw-r--r-- 1 cumulus cumulus 733 Feb 21 15:10 hosts
-rw-r--r-- 1 cumulus cumulus 808 Feb 21 15:10 README.md
cumulus@oob-mgmt-server:~/cldemo-netq$ ls -l
total 20
-rw-r--r-- 1 cumulus cumulus 281 Feb 21 15:10 ansible.cfg
drwxr-xr-x 1 cumulus cumulus 60 Feb 21 15:10 docker
drwxr-xr-x 1 cumulus cumulus 60 Feb 21 15:10 evpn
-rw-r-xr-- 1 cumulus cumulus 733 Feb 21 15:10 hosts
-rw-r--r-- 1 cumulus cumulus 808 Feb 21 15:10 README.md
-rw-r--r-- 1 cumulus cumulus 6445 Feb 21 15:10 setup.yml
Describe the benefits and differences between password login and key-based login
Password authentication is accomplished by prompting the user for a password upon login; the password
provided by the user is hashed and checked against a file or database. This requires each user to
remember their (ideally complex) password and follow proper password security practices. Users forgetting
passwords or locking themselves out is quite common and can add administrative burden.
Key-based authentication is accomplished by generating a public and private key pair on your jump host(s)
or oob-mgmt device(s), and copying the public key file to all of your hosts. This requires the admin to
properly generate the key file and copy it successfully to thousands of devices, which makes it an excellent
candidate for automation. This method of login also enables easier automation by simplifying the process
for system-to-system interaction. The authentication lives in a file and is transparent to the user at the time
of login, as opposed to the user entering a password. The credential only transits the network once, upon
the initial copy, as opposed to each time with password-based logins.
https://docs.cumulusnetworks.com/display/DOCS/SSH+for+Remote+Access
·· The user space, which is a set of locations where normal user processes run (everything other than
the kernel). The role of the kernel is to keep applications running in this space from interfering
with each other, and with the system.
·· The kernel space, which is the location where the code of the kernel is stored, and executes under.
FIGURE 16
Processes running under the user space have access only to a limited part of memory, whereas the
kernel has access to all of the memory. Processes running in user space don’t have access to the kernel
space. User space processes can only access a small part of the kernel via an interface exposed by the
kernel — the system calls. If a process performs a system call, a software interrupt is sent to the kernel,
which then dispatches the appropriate interrupt handler and continues its work after the handler
has finished.
This separation is meant to ensure that Linux is as reliable and secure an operating system as possible.
Bash (Bourne Again Shell) is a sh-compatible shell that incorporates useful features from the Korn shell (ksh)
and C shell (csh). It is intended to conform to the IEEE POSIX P1003.2/ISO 9945.2 Shell and Tools standard.
It gives users an interactive command line environment to interface with the system and offers functional
improvements over sh for both programming and interactive use. The improvements offered by Bash include:
Normally, we get our output on the screen, which is convenient most of the time, but sometimes we want
to save it into a file to keep as a record, feed into another system, or send to someone else. The greater-than
operator “>” indicates to the command line that the user wants the program’s output (or whatever it
sends to STDOUT) to be saved in a file instead of printed to the screen.
If the user redirects to a file which does not exist, it will be created automatically, but if output is redirected
into a file which already exists, the file’s contents will be cleared and the new output saved to it. Instead, the
new data can be appended to the file by using the double greater-than operator “>>”.
Pipes
Pipes redirect the output of a command to another command. The output of ls -al can be piped to less
in order to scroll the output via keypress. The “|” character is used; the command from the example is
ls -al | less. The operator feeds the output from the program on the left as input to the program on the right.
The below example feeds the output of net show configuration files into grep to search for directory
markers at the start of lines in order to only show the configuration file paths rather than the entire file.
cumulus@leaf01:mgmt-vrf:/$ cd /
cumulus@leaf01:mgmt-vrf:/$ pwd
/
cumulus@leaf01:mgmt-vrf:/$ cd ~
cumulus@leaf01:mgmt-vrf:~$ pwd
/home/cumulus
cumulus@leaf01:mgmt-vrf:~$ cd ..
cumulus@leaf01:mgmt-vrf:/home$ pwd
/home
Linux users can create blank files with the command touch [options] <filename>. They can utilize the
redirection methods already covered to create a new file with pertinent information to their current task.
Users can copy files with or without manipulation to create new files as well.
cumulus@oob-mgmt-server:~/cldemo-netq$ ls
ansible.cfg docker evpn hosts README.md setup.yml testfile
cumulus@oob-mgmt-server:~/cldemo-netq$ ls
ansible.cfg docker evpn hosts README.md setup.yml testfile testfile2 testfile3
The sudo command allows you to execute a command as superuser or another user as specified by the
security policy. Examples of sudo usage are numerous in this document.
https://docs.cumulusnetworks.com/display/DOCS/Using+sudo+to+Delegate+Privileges
The grep program can be used for many purposes, such as finding files, searching inside files, including lines
before and/or after the search item, or filtering the output of show commands in Cumulus Linux.
# Management interface
auto eth0
iface eth0 inet dhcp
alias management interface
vrf mgmt
·· /home — contains a home folder for each user which contains the user’s data files and user-specific
configuration files. The cumulus user’s home folder is /home/cumulus. Each user only has write
permission to their own home folder and must obtain elevated permissions to modify other files
on the system.
·· /opt — contains subdirectories for optional software packages. It’s commonly used by proprietary
software that doesn’t obey the standard file system hierarchy, which might dump files in
/opt/application when installed.
·· /sbin — similar to the /bin directory. It contains essential binaries that are generally intended to be run
by the root user for system administration.
·· /usr — contains applications and files used by users, opposed to applications and files used by the
system. Non-essential applications are located inside the /usr/bin directory instead of the /bin
directory and non-essential system administration binaries are located in the /usr/sbin directory
instead of the /sbin directory.
·· /var — is the writable counterpart to the /usr directory, which must be read-only in normal operation.
Log files and everything else that would normally be written to /usr during normal operation are
written to the /var directory. Log files are one example found in /var/log.
/etc/lldpd.conf Link Layer Discovery Protocol (LLDP) daemon configuration
default-lease-time 600;
max-lease-time 7200;
host printer {
hardware ethernet 00:01:da:b4:3e:45;
fixed-address 10.1.1.100;
}
host web-server {
hardware ethernet 00:02:a9:df:31:90;
fixed-address 10.1.1.101;
}
auto eth0
iface eth0 inet dhcp
The dhclient program can be run to interact with DHCP manually on an interface. A release and renew
example is shown below.
Virtual Extensible LAN (VXLAN) is a standards-based overlay technology defined in RFC7348, designed to
provide layer 2 adjacency over a layer 3 network via encapsulation and decapsulation through a component
called a VTEP (VXLAN Tunnel End Point). A VTEP has an IP address in the underlay network and also has one
or more VNIs associated. When a frame from one of these VNIs arrives at the ingress VTEP, the VTEP
encapsulates it with VXLAN, UDP, and IP headers.
The encapsulated packet is sent over the IP network to the egress VTEP. When it arrives, the VTEP removes
the IP, UDP, and VXLAN headers, and delivers the frame as normal.
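The 8-byte VXLAN header that the ingress VTEP prepends (inside the new UDP/IP headers) has a simple layout per RFC7348: an 8-bit flags field with the I bit set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. A sketch of building it:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header from RFC7348: flags word with the
    I bit set (VNI is valid), then the 24-bit VNI shifted past 8 reserved bits."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI is a 24-bit value")
    flags_word = 0x08000000            # I flag in the first byte, rest reserved
    return struct.pack("!II", flags_word, vni << 8)

hdr = vxlan_header(10100)
print(hdr.hex())  # 0800000000277400 (VNI 10100 = 0x002774)
```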
VXLAN configuration
FIGURE 17
In distributed asymmetric routing, each VTEP acts as a layer 3 gateway, performing routing for its attached
hosts. The routing is called asymmetric because only the ingress VTEP performs routing; the egress
VTEP only performs bridging. Traffic in the two directions therefore travels on different VNIs: it is always
carried on the destination VNI. Asymmetric routing is easy to deploy, as it can be achieved with only host
routing and does not involve any interconnecting VNIs. However, each VTEP must be provisioned with all
VLANs/VNIs — the subnets between which communication can take place; this is required even if there are
no locally attached hosts for a particular VLAN.
The only additional configuration required to implement asymmetric routing beyond the standard
configuration for a layer 2 VTEP is to ensure each VTEP has all VLANs (and corresponding VNIs)
provisioned on it and the SVI for each such VLAN is configured with an Anycast gateway IP/MAC address.
Symmetric routing
In distributed symmetric routing, each VTEP acts as a layer 3 gateway, performing routing for its attached
hosts. This is the same as in asymmetric routing. The difference with symmetric routing is, both the ingress
VTEP and egress VTEP route the packets. Therefore, it can be compared to the traditional routing behavior
of routing to a next hop router. In the VXLAN encapsulated packet, the inner destination MAC address is
set to the router MAC address of the egress VTEP as an indication that the egress VTEP is the next hop and
also needs to perform routing. All routing happens in the context of a tenant (VRF).
For a packet received by the ingress VTEP from a locally attached host, the SVI interface corresponding
to the VLAN determines the VRF. For a packet received by the egress VTEP over the VXLAN tunnel, the
VNI in the packet has to specify the VRF. For symmetric routing, this is a VNI corresponding to the tenant
and is different from either the source VNI or the destination VNI. This VNI is referred to as the layer 3 VNI,
transit VNI, or interconnecting VNI; it has to be provisioned by the operator and is exchanged through the
EVPN control plane. In order to make the distinction clear, the regular VNI, which is used to map a VLAN, is
referred to as the layer 2 VNI.
FIGURE 18
FIGURE 19
·· Configure a per-tenant VXLAN interface that specifies the layer 3 VNI for the tenant. This VXLAN
interface is part of the bridge; router MAC addresses of remote VTEPs are installed over
this interface.
·· Configure an SVI (layer 3 interface) corresponding to the per-tenant VXLAN interface. This is
attached to the tenant’s VRF. Remote host routes for symmetric routing are installed over this SVI.
·· Specify the mapping of VRF to layer 3 VNI. This configuration is for the BGP control plane.
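A minimal NCLU sketch of the three steps above (the VRF name RED, VLAN 4001, and VNI 104001 are
assumptions):

```
net add vxlan vni4001 vxlan id 104001     # per-tenant (layer 3) VXLAN interface
net add vxlan vni4001 bridge access 4001  # place it in the bridge on its own VLAN
net add vlan 4001 vrf RED                 # SVI for the layer 3 VNI, attached to the tenant VRF
net add vrf RED vni 104001                # VRF-to-L3VNI mapping for the BGP control plane
```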
Describe the basics of EVPN, a BGP EVPN control plane, and the different route types
Ethernet Virtual Private Network (EVPN)
VXLAN is the de facto technology for implementing network virtualization in the data center, enabling
layer 2 segments to be extended over an IP core (the underlay). The initial definition of VXLAN (RFC 7348)
did not include any control plane and relied on a flood-and-learn approach for MAC address learning. An
alternate deployment model was to use a controller, or a technology such as Lightweight Network
Virtualization (LNV) in Cumulus Linux (EVPN and LNV cannot be used at the same time).
Ethernet Virtual Private Network (EVPN) is a standards-based control plane for VXLAN defined in RFC 7432
and draft-ietf-bess-evpn-overlay that allows for building and deploying VXLANs at scale. It relies on
multi-protocol BGP (MP-BGP) for exchanging information and is based on BGP-MPLS IP VPNs (RFC 4364).
It has provisions to enable not only bridging between end systems in the same layer 2 segment but also
routing between different segments (subnets). There is also inherent support for multi-tenancy. EVPN is
often referred to as the means of implementing controller-less VXLAN.
The EVPN address family is supported with both eBGP and iBGP peering. If the underlay routing is provisioned
using eBGP, the same eBGP sessions can also be used to carry EVPN routes. For example, in a typical 2-tier
Clos network topology where the leaf switches are the VTEPs, if eBGP sessions are in use between the leaf
and spine switches for the underlay routing, the same sessions can be used to exchange EVPN routes; the
spine switches merely act as "route forwarders" and, not being VTEPs themselves, install no forwarding
state. When EVPN routes are exchanged over iBGP peering, OSPF can be used as the IGP, or the next hops
can be resolved using iBGP.
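A hedged sketch of activating the EVPN address family on an existing eBGP unnumbered session (the ASN
and interface name are assumptions):

```
net add bgp autonomous-system 65011
net add bgp neighbor swp51 interface remote-as external
net add bgp l2vpn evpn neighbor swp51 activate
net add bgp l2vpn evpn advertise-all-vni    # on VTEPs only; spines just forward routes
```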
Key features of Cumulus Linux regarding EVPN as the control plane for VXLAN:
·· VNI membership exchange between VTEPs using EVPN type-3 (Inclusive multicast
Ethernet tag) routes.
·· Exchange of host MAC and IP addresses using EVPN type-2 (MAC/IP advertisement) routes.
·· Support for host/VM mobility (MAC and IP moves) through exchange of the MAC Mobility
Extended community.
·· Dual-attached host support via VXLAN active-active mode. MAC synchronization between
peers uses MLAG.
·· Support for ARP/ND suppression, providing VTEPs with the ability to suppress ARP flooding over
VXLAN tunnels.
·· Support for exchange of static (sticky) MAC addresses through EVPN.
·· Support for distributed symmetric routing between different subnets.
·· Support for distributed asymmetric routing between different subnets.
·· Support for centralized routing.
·· Support for prefix-based routing using EVPN type-5 routes (EVPN IP prefix route)
·· Support for layer 3 multi-tenancy.
·· Support for IPv6 tenant routing.
·· Symmetric routing, asymmetric routing and prefix-based routing are supported for IPv4/IPv6 hosts
and prefixes.
·· ECMP support for overlay networks on RIOT-capable Broadcom switches (Trident 3, Maverick,
Trident 2+) in addition to Mellanox Spectrum-A1 and Tomahawk switches.
EVPN route types (per RFC 7432 and later drafts):
0 Reserved
1 Ethernet Auto-Discovery (A-D)
2 MAC/IP Advertisement
3 Inclusive Multicast Ethernet Tag
4 Ethernet Segment
5 IP Prefix
6-11 Additional route types defined in later drafts
12-255 Unassigned
BGP EVPN type 2 route components:
·· Route Distinguisher
·· Ethernet Segment Identifier
·· Ethernet Tag ID
·· MAC Address Length
·· MAC Address
·· IP Address Length
·· IP Address
·· Label 1 (L2VNI)
·· Label 2 (L3VNI)
If external layer 3 connectivity is required, a separate route type, type 5, is used. BGP EVPN type 5
route components:
·· Route Distinguisher
·· Ethernet Segment Identifier
·· Ethernet Tag ID
·· IP Prefix Length
·· IP Prefix
·· GW IP address
·· Label (L3VNI)
FIGURE 20
Cumulus Networks implemented the ifreload feature in ifupdown2, the network interface configuration
utility. ifupdown2 interacts with the /etc/network/interfaces flat file, which controls all network
configuration (VLANs, MTU, IP addressing) except routing. When the /etc/network/interfaces file is edited
and overwritten, issuing an ifreload -a or systemctl reload networking.service causes ifupdown2 to apply
changes only to the part of the configuration that was modified.
FIGURE 21
Configure interfaces
https://docs.cumulusnetworks.com/display/DOCS/Interface+Configuration+and+Management
An interface can be placed into an admin-down state; the interface then remains down across future reboots
and after applying configuration changes with ifreload -a.
Port lists or ranges can be specified using commas and dashes.
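For example (interface names and MTU are assumptions):

```
net add interface swp1 link down        # persistent admin down
net add interface swp1-4,swp6 mtu 9000  # comma/dash list and range syntax
```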
Alias
-------
to Server01
Routing
---------
Interface swp1 is down
Link ups: 1 last: 2019/02/28 14:17:11.67
Link downs: 1 last: 2019/02/28 15:18:23.17
PTM status: disabled
vrf: default
Description: to Server01
index 7 metric 0 mtu 9000 speed 1000
flags: <BROADCAST,PROMISC,MULTICAST>
Type: Ethernet
HWaddr: 44:38:39:00:02:05
Interface Type Other
Alias
-------
to Server02
Routing
---------
Interface swp2 is down
Link ups: 1 last: 2019/02/28 14:17:11.67
Link downs: 1 last: 2019/02/28 15:18:23.25
PTM status: disabled
vrf: default
Description: to Server02
index 8 metric 0 mtu 9000 speed 1000
flags: <BROADCAST,PROMISC,MULTICAST>
Type: Ethernet
HWaddr: 44:38:39:00:02:06
Interface Type Other
IPv4 and IPv6 addresses can be added to interfaces in the following manner.
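A sketch of the NCLU form (addresses are assumptions; the same syntax accepts IPv4 and IPv6):

```
net add interface swp1 ip address 10.0.1.1/30
net add interface swp1 ip address 2001:db8::1/64
net commit
```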
Interface descriptions can be added by creating an alias, which is displayed in show output and in the SNMP
OID IF-MIB::ifAlias. The alias can be up to 255 characters long.
-----
hypervisor_port_1
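The alias shown above could be set as follows (the interface name is an assumption):

```
net add interface swp1 alias hypervisor_port_1
net commit
```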
PTM overview
In data center topologies, ensuring proper cabling can be a time-consuming and error-prone endeavor.
Prescriptive Topology Manager (PTM) is a dynamic cabling verification tool to help detect and eliminate
such errors. It takes a Graphviz-DOT specified network cabling plan, stored in a topology.dot file, and
couples it with runtime information derived from LLDP to verify that the cabling matches the specification.
The check is performed on every link transition on each node in the network.
FIGURE 22
The topology.dot file can be customized to control ptmd at both the global/network level and the node/port
level. PTM runs as a daemon, named ptmd. For more information, see man ptmd(8).
Use the same topology.dot file on all switches; do not split the file per device. Distributing one identical file
allows for easier automation, since the exact same file can be pushed or pulled to every device.
Host-only parameters apply to the entire host on which PTM is running. You can include the hostnametype
host-only parameter, which specifies whether PTM should use only the host name (hostname) or the fully-
qualified domain name (fqdn) while looking for the self-node in the graph file.
Global parameters apply to every port listed in the topology file. There are two global parameters: LLDP
and BFD. LLDP is enabled by default; if no keyword is present, default values are used for all ports. However,
BFD is disabled if no keyword is present, unless there is a per-port override configured.
Per-port parameters provide finer-grained control at the port level. These parameters override any global or
compiled defaults.
graph G {
"spine1":"swp1" -- "leaf1":"swp1";
"spine1":"swp2" -- "leaf2":"swp1";
"spine2":"swp1" -- "leaf1":"swp2";
"spine2":"swp2" -- "leaf2":"swp2";
"leaf1":"swp3" -- "leaf2":"swp3";
"leaf1":"swp4" -- "leaf2":"swp4";
"leaf1":"swp5s0" -- "server1":"eth1";
"leaf2":"swp5s0" -- "server2":"eth1";
}
PTM templates
Templates provide flexibility in choosing different parameter combinations and applying them to a given
port. A template instructs ptmd to reference a named parameter string instead of the default one. ptmd
supports two template parameter strings: lldptmpl, for LLDP parameters, and bfdtmpl, for BFD parameters.
In the example below, LLDP1 and LLDP2 are templates for LLDP parameters, while BFD1 and BFD2 are
templates for BFD parameters; the templates are then referenced in the connectivity entries.
graph G {
LLDP=""
BFD="upMinTx=300,requiredMinRx=100"
BFD1="upMinTx=200,requiredMinRx=200"
BFD2="upMinTx=100,requiredMinRx=300"
LLDP1="match_type=ifname"
LLDP2="match_type=portdescr"
"cumulus":"swp44" -- "qct-ly2-04":"swp20" [BFD="bfdtmpl=BFD1", LLDP="lldptmpl=LLDP1"]
}
https://docs.cumulusnetworks.com/display/DOCS/Border+Gateway+Protocol+-+BGP#BorderGateway
Protocol-BGP-unnumberedBGPUnnumberedInterfaces
BGP unnumbered enables the peering of devices without unique interface IP addresses. Unnumbered
interfaces use extended next hop encoding (ENHE), defined in RFC 5549, which provides a means of
advertising an IPv4 route with an IPv6 next hop. Prior to RFC 5549, an IPv4 route could be advertised only
with an IPv4 next hop.
FIGURE 23
BGP unnumbered interfaces are particularly useful in deployments where IPv4 prefixes are advertised
through BGP over links that have no IPv4 addresses configured. This saves vast amounts of IP space and
administrative effort, while making automation significantly easier.
Every router or end host must still have an IPv4 address to complete an IPv4 traceroute; in this case, the
IPv4 address used is that of the loopback device. Even if ENHE is not used in the data center, link addresses
are not typically advertised, so assigning an IP address to the loopback device is essential.
·· Link addresses take up valuable FIB resources, and the number of such addresses can be quite large,
increasing quickly with the number of spine switches, leaf switches, and their port density
·· 4 Spines * 32 interfaces * 2 IPs = 256 IP addresses or 8 * 96 * 2 = 1536 IP Addresses
·· Link addresses expose an additional attack vector for intruders to use to either break in or engage in
DDOS attacks
·· BGP unnumbered uses the interface’s IPv6 LLA to set up a BGP session with a peer
·· The IPv6 LLA of the remote end is discovered via IPv6’s Router Advertisement (RA) protocol
·· RA provides not only the remote end’s LLA, but also its corresponding MAC address
·· BGP uses RFC 5549 to encode IPv4 routes as reachable over an IPv6 nexthop, using the IPv6 LLA
as the nexthop
·· The RIB process programs a static ARP entry with a reserved IPv4 LLA, 169.254.0.1, with the MAC
address set to the one learned via RA
·· BGP hands IPv4 routes down to the RIB process with the IPv6 LLA as the next hop
·· The RIB process converts the next hop to 169.254.0.1, with the corresponding outgoing interface,
before programming the route in the forwarding table
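The link-address arithmetic quoted earlier (4 spines * 32 interfaces * 2 IPs, and 8 * 96 * 2) can be
sanity-checked with shell arithmetic:

```shell
# IPs consumed by numbered point-to-point links:
# spines * spine interfaces * 2 addresses per link
spines=4; interfaces=32
echo $(( spines * interfaces * 2 ))   # 256
spines=8; interfaces=96
echo $(( spines * interfaces * 2 ))   # 1536
```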
To configure a BGP unnumbered interface, IPv6 neighbor discovery router advertisements must be enabled.
The interval specified is measured in seconds and defaults to 10.
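In FRR's vtysh, enabling router advertisements on an interface might look like the following sketch (the
interface name is an assumption; NCLU configures this automatically for interface-based BGP neighbors):

```
switch# configure terminal
switch(config)# interface swp51
switch(config-if)# no ipv6 nd suppress-ra
switch(config-if)# ipv6 nd ra-interval 10
```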
In Cumulus Linux 3.7.1 and earlier, ENHE is sent only for link-local address peering. In Cumulus Linux
3.7.2 and later, extended next hop encoding can be sent for both link-local and global unicast
address peering.
Troubleshooting BGP unnumbered is not very different from troubleshooting numbered BGP, except that
you need to focus on IPv6 addresses and switch port information for the next hops.
A quick snapshot of neighbor information and the overall summary is useful for most issues.
The following command shows how the IPv4 link-local address 169.254.0.1 is used to install the route and
static neighbor entry to facilitate proper forwarding without having to install an IPv4 prefix with IPv6 next
hop in the kernel.
65020 65012
fe80::4638:39ff:fe00:601 from spine01(swp51) (10.0.0.21)
(fe80::4638:39ff:fe00:601) (used)
Origin IGP, valid, external, multipath, bestpath-from-AS 65020, best
AddPath ID: RX 0, TX 4
Last update: Thu Feb 28 14:16:51 2019
A more detailed view of a BGP Neighbor can be checked for neighbor related issues.
Notifications: 0 0
Updates: 143 85
Keepalives: 5585 5585
Route Refresh: 0 0
Capability: 0 0
Total: 5729 5671
Minimum time between advertisement runs is 0 seconds
65020 65012
fe80::4638:39ff:fe00:601 from spine01(swp51) (10.0.0.21)
(fe80::4638:39ff:fe00:601) (used)
Origin IGP, valid, external, multipath, bestpath-from-AS 65020, best
AddPath ID: RX 0, TX 4
Last update: Thu Feb 28 14:16:50 2019
Verify that the device has learned the neighbor's IPv6 link-local address.
Alias
-----
to Spine01
cl-netstat counters
-------------------
RX_OK RX_ERR RX_DRP RX_OVR TX_OK TX_ERR TX_DRP TX_OVR
------- -------- ------- -------- ------ ------- ------- --------
25507 0 2 0 13074 0 0 0
LLDP Details
------------
LocalPort RemotePort(RemoteHost)
--------- ----------------------
swp51 swp1(spine01)
Routing
-------
Interface swp51 is up, line protocol is up
Link ups: 0 last: (never)
Link downs: 0 last: (never)
PTM status: disabled
vrf: default
Description: to Spine01
index 3 metric 0 mtu 9216 speed 1000
flags: <UP,BROADCAST,RUNNING,MULTICAST>
Type: Ethernet
HWaddr: 44:38:39:00:02:01
inet6 fe80::4638:39ff:fe00:201/64
Interface Type Other
ND advertised reachable time is 0 milliseconds
ND advertised retransmit interval is 0 milliseconds
ND router advertisements sent: 1608 rcvd: 1603
ND router advertisements are sent every 10 seconds
ND router advertisements lifetime tracks ra-interval
ND router advertisement default router preference is medium
Hosts use stateless autoconfig for addresses.
Neighbor address(s):
inet6 fe80::4638:39ff:fe00:601/128
By default, FRR stores the configuration for all routing protocols in a single integrated file,
/etc/frr/frr.conf. If the integrated configuration is disabled, each routing protocol daemon saves its
configuration in a separate file.
FIGURE 24
NCLU can manage configurations and interact with FRR, as shown throughout this document and
specifically in the previous BGP unnumbered section.
The FRR interactive command line can be accessed with the command sudo vtysh. This command line
interface, with its question-mark completion, feels familiar to users of traditional networking vendors.
FRRouting inherits the IP addresses and associated routing tables for the network interfaces from the
/etc/network/interfaces file, and this is the recommended way to define addresses; do NOT create
interfaces using FRRouting.
Although static routes added via FRRouting can be deleted from the Linux shell, this should be avoided:
routes added by FRRouting should only be deleted through FRRouting, otherwise FRRouting might not be
able to clean up its internal state completely and incorrect routing could occur.
FRR can be manually enabled by editing the file /etc/frr/daemons, and then enabling and starting the
FRRouting service.
The default FRR configuration can be applied by deleting the /etc/frr/frr.conf file and restarting the service.
During configuration updates, FRR is reloaded for changes to take effect; the reload applies only the
changes made and synchronizes state with the configuration in /etc/frr/frr.conf. This option is not available
when using a non-integrated configuration with separate per-protocol configuration files.
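The enable, start, and reload steps above, sketched as shell commands:

```
# /etc/frr/daemons excerpt: enable the daemons you need, e.g. bgpd=yes
sudo systemctl enable frr.service
sudo systemctl start frr.service
# after editing /etc/frr/frr.conf, apply only the delta:
sudo systemctl reload frr.service
```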
https://docs.cumulusnetworks.com/display/DOCS/Network+Command+Line+Utility+-+NCLU
The Network Command Line Utility (NCLU) is a command line interface for Cumulus Networks products
simplifying the networking configuration process for all users. NCLU resides in the Linux user space and
provides consistent access to networking commands directly through bash, making configuration and
troubleshooting simple and easy; no need to edit files or enter modes and sub-modes. NCLU provides
these benefits:
·· Embeds help, examples, and automatic command checking with suggestions in case you enter a typo
·· Runs directly from and integrates with bash, while being interoperable with the regular way of
accessing underlying configuration files and automation
·· Configures dependent features automatically so that you don’t have to
·· Every configuration change with NCLU is saved in a snapshot
The NCLU wrapper utility “net” is capable of configuring layer 2 and layer 3 features of the networking
stack, installing ACLs and VXLANs, rolling back and deleting snapshots, as well as providing monitoring and
troubleshooting functionality for these features. You can configure both the /etc/network/interfaces and
/etc/frr/frr.conf files with net, in addition to running show and clear commands related to ifupdown2
and FRRouting.
Use the following workflow to stage and commit changes to Cumulus Linux with NCLU:
1. Use the net add and net del commands to stage and remove configuration changes
2. Use the net pending command to review staged changes
3. Use net commit and net abort to commit and delete staged changes
The net commit command applies the changes to the relevant configuration files, such as
/etc/network/interfaces, then runs necessary follow on commands to enable the configuration, such as
ifreload -a. If two different users try to commit a change at the same time, NCLU displays a warning but
implements the change according to the first commit received, and the second user will need to abort
their commit.
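A minimal staged-change workflow (the VLAN number is an assumption):

```
net add vlan 10
net pending                 # review the staged diff
net commit                  # apply the change (or discard it with: net abort)
net show commit history
```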
·· net show is a series of commands to view various parts of the network configuration. Use net show
configuration to view the entire network configuration, net show commit history for a history of
commits by NCLU.
·· net clear provides a way to clear net show counters, BGP and OSPF neighbor content, and more.
·· net rollback provides a mechanism to revert back to an earlier configuration.
·· net commit confirm requires the user to press Enter within 10 seconds to confirm the commit;
otherwise the commit automatically reverts and no changes are made.
·· net commit description <description> enables you to provide a descriptive summary of the changes
you are about to commit.
·· net commit permanent retains the snapshot taken when committing the change. Otherwise, the
snapshots created from NCLU commands are cleaned up periodically with a snapper cron job.
·· net commit delete deletes one or more snapshots created when committing changes with NCLU.
·· net del all deletes all configurations and stops the IEEE 802.1X service.
NCLU help
NCLU offers tab completion for individual commands, as well as a specific help command option that
searches for commands containing a given keyword. If the keyword returns no results, an error is displayed.
The basic form of the command returns comprehensive information to guide the user.
Usage:
# net <COMMAND> [<ARGS>] [help]
#
# net is a command line utility for networking on Cumulus Linux switches.
#
# COMMANDS are listed below and have context specific arguments which can
# be explored by typing "<TAB>" or "help" anytime while using net.
#
# Use 'man net' for a more comprehensive overview.
net abort
net commit [verbose] [confirm] [description <wildcard>]
net commit delete (<number>|<number-range>)
net help [verbose]
net pending
net rollback (<number>|last)
Options:
# Help commands
help : context sensitive information; see section below
example : detailed examples of common workflows
# Configuration commands
add : add/modify configuration
del : remove configuration
# Status commands
show : show command output
clear : clear counters, BGP neighbors, etc
NCLU has a number of built in examples to guide users through basic configuration setup.
Scenario
========
We are configuring switch1 and would like to configure the following
- configure switch1 as an L2 switch for host-11 and host-12
- enable vlans 10-20
- place host-11 in vlan 10
- place host-12 in vlan 20
- create an SVI interface for vlan 10
- create an SVI interface for vlan 20
- assign IP 10.0.0.1/24 to the SVI for vlan 10
- assign IP 20.0.0.1/24 to the SVI for vlan 20
- configure swp3 as a trunk for vlans 10, 11, 12 and 20
                  swp3
     *switch1 --------- switch2
         /\
   swp1 /  \ swp2
       /    \
      /      \
  host-11   host-12
Verification
============
switch1# net show interface
switch1# net show bridge macs
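A configuration sketch for the scenario above, using the VLAN-aware bridge model (syntax hedged; interface
assignments follow the diagram):

```
net add bridge bridge ports swp1-3
net add bridge bridge vids 10-20
net add interface swp1 bridge access 10     # host-11 in vlan 10
net add interface swp2 bridge access 20     # host-12 in vlan 20
net add vlan 10 ip address 10.0.0.1/24      # SVI for vlan 10
net add vlan 20 ip address 20.0.0.1/24      # SVI for vlan 20
net add interface swp3 bridge vids 10-12,20 # trunk toward switch2
net commit
```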
switchd is the daemon at the heart of Cumulus Linux, responsible for communicating between the switch
hardware, Cumulus Linux, and all the applications running on Cumulus Linux. The configuration for switchd
is stored in /etc/cumulus/switchd.conf. switchd peers directly with the networking ASIC, normalizes the
networking model, and peers with the kernel via netlink.
FIGURE 25
NCLU operates in the user space, and the user space versus kernel information is covered in the
User Space and Kernel section. Netlink is a Linux kernel interface used for inter-process communication
between the kernel and userspaces, as well as between different userspace processes.
ONIE enables a bare metal network switch ecosystem in which switch hardware suppliers can manage their
operations based on a small number of hardware SKUs, and end users have a choice among different
network operating system alternatives.
https://support.cumulusnetworks.com/hc/en-us/sections/200709257-ONIE
http://onie.org/
When a new machine boots up for the first time, ONIE locates and executes a NOS vendor’s
installation program.
FIGURE 26
ONIE is NOT used on every boot of the system. After the initial installation, subsequent boots go straight
into the NOS, bypassing ONIE.
FIGURE 27
Mechanisms exist for a system to re-enter the installation phase. An API is defined so that network
operating systems can direct the system to re-enter the installation phase.
http://opencomputeproject.github.io/onie/design-spec/discovery.html
6. TFTP waterfall
FIGURE 28
https://docs.cumulusnetworks.com/display/DOCS/Installing+a+New+Cumulus+Linux+Image
Describe ZTP
https://docs.cumulusnetworks.com/display/DOCS/Zero+Touch+Provisioning+-+ZTP
Zero touch provisioning (ZTP) enables you to deploy network devices quickly in large-scale environments.
On first boot, Cumulus Linux invokes ZTP, which executes the provisioning automation used to deploy the
device for its intended role in the network.
The provisioning framework allows for a one-time, user-provided script to be executed. You can develop
this script using a variety of automation tools and scripting languages, providing flexibility to design the
provisioning schemes to meet your needs.
While developing and testing the provisioning logic, you can use the ztp command in Cumulus Linux to
manually invoke your provisioning script on a device.
3. DHCP
ZTP best practices (can expand each to tier 3 and apply details to each)
Netfilter is the packet filtering framework in Cumulus Linux and most Linux distributions. Netfilter does not
require a separate software daemon to run; it is part of the Linux kernel itself. Netfilter asserts policies at
layers 2, 3, and 4 of the OSI model by inspecting packet and frame headers based on a list of rules. There
are a number of tools available for configuring ACLs in Cumulus Linux with varying functions.
·· iptables, ip6tables, and ebtables are Linux userspace tools used to administer filtering rules for IPv4
packets, IPv6 packets, and Ethernet frames (layer 2 using MAC addresses)
·· NCLU is a Cumulus Linux-specific userspace tool used to configure custom ACLs
·· cl-acltool is a Cumulus Linux-specific userspace tool used to administer filtering rules and configure
default ACLs
·· Without using cl-acltool, rules are NOT installed into hardware
·· Running cl-acltool -i (the installation option) resets all rules and deletes anything that is not stored
in /etc/cumulus/acl/policy.conf
Rules installed directly with iptables, ip6tables, or ebtables take effect only until cl-acltool -i is run or the
switch reboots, either of which removes them. To ensure all rules that can be in hardware are
hardware accelerated, place them in the /etc/cumulus/acl/policy.conf file, then run cl-acltool -i.
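For example:

```
sudo cl-acltool -L ip    # list the currently installed IPv4 (iptables) rules
sudo cl-acltool -i       # (re)install all rules from the policy files
```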
Netfilter tables
When building rules to affect the flow of traffic, the individual chains are accessed through tables. Linux
provides three tables by default: filter, nat, and mangle.
Each table has a set of default chains that can be used to modify or inspect packets at different points
of the path through the switch. Chains contain the individual rules to influence traffic. Each table and the
default chains they support are shown below. Tables and chains in green are supported by Cumulus Linux,
and those in red are NOT supported (NO hardware acceleration) at this time.
FIGURE 29
Netfilter chains
Netfilter chains reference the position the action is taken during packet flow and processing. The rules
created by these programs inspect or operate on packets at several points in the life of the packet through
the system. These five points are known as chains:
FIGURE 30
Netfilter rules
FIGURE 31
·· Table: The first argument is the table. The second example does not specify a table, because the filter
table is implied if not specified.
·· Chain: The second argument is the chain. Each table supports several different chains.
·· Matches: The third argument(s) are called the matches. You can specify multiple matches in a single
rule. However, the more matches you use in a rule, the more memory that rule consumes.
·· Jump: The jump specifies what action to take if the packet matches the rule. If the jump is omitted
from a rule, matching the rule has no effect on the packet's fate, but the rule's counters are incremented.
·· Target(s): The target can be a user-defined chain, one of the special built-in targets that decides the
fate of the packet immediately (like DROP), or an extended target.
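Mapping those parts onto a concrete (hypothetical) rule:

```
# table   chain      matches                    jump/target
-t filter -A FORWARD -i swp1 -p tcp --dport 80  -j DROP
```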
ACL rule assignment placement
·· If a switch port is assigned to a bond, any egress rules must be assigned to the bond.
·· When using the OUTPUT chain, rules must be assigned to the source. For example, if a rule is assigned
to the switch port in the direction of traffic but the source is a bridge (VLAN), the traffic is not affected
by the rule and must be applied to the bridge.
·· If all transit traffic needs to have a rule applied, use the FORWARD chain, not the OUTPUT chain.
https://docs.cumulusnetworks.com/display/DOCS/Netfilter+-+ACLs#Netfilter-ACLs-ControlPlaneand
DataPlaneTraffic
Control plane traffic can be monitored and identified with tcpdump. Cumulus Linux adds new extended
targets to iptables and ebtables, such as SPAN, ERSPAN, POLICE, TRICOLORPOLICE, and SETCLASS.
You can configure quality of service for traffic on both the control plane and the data plane. By using QoS
policers, you can rate limit traffic so incoming packets get dropped if they exceed specified thresholds.
Unfortunately, counters on POLICE ACL rules in iptables do not currently show the packets that are
dropped due to those rules.
Using the POLICE target with iptables takes the following arguments:
·· --set-class <value> sets the system internal class of service queue configuration to value.
·· --set-rate <value> specifies the maximum rate in kilobytes (KB) or packets.
·· --set-burst <value> specifies the number of packets or kilobytes (KB) allowed to arrive sequentially.
Must be greater than or equal to 1.
·· --set-mode (KB|pkt) sets the mode in KB (kilobytes) or pkt (packets) for rate and burst size.
Cumulus Linux ships with two files in /etc/cumulus/acl/policy.d/ for control plane filtering,
00control_plane.rules and 99control_plane_catch_all.rules. These files can be edited, or new files numbered
01 through 98 can be created for ordered processing. An excerpt of 00control_plane.rules is shown below.
[iptables]
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -p udp --dport $BFD_ECHO_PORT -j SETCLASS --class 7
-A $INGRESS_CHAIN -p udp --dport $BFD_ECHO_PORT -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000
SPAN overview
SPAN (Switched Port Analyzer) mirrors all packets coming into or going out of an interface (the source),
copying them and transmitting them out of a local port (the destination) for monitoring. The SPAN
destination port is also referred to as a mirror-to-port (MTP). The original packet is still switched, while a
mirrored copy of the packet is sent out of the MTP.
ERSPAN (Encapsulated Remote SPAN) enables the mirrored packets to be sent to a monitoring node
located anywhere across the routed network. The switch finds the outgoing port of the mirrored packets by
doing a lookup of the destination IP address in its routing table. The original L2 packet is encapsulated with
GRE for IP delivery.
SPAN and ERSPAN are configured via cl-acltool. The match criterion for SPAN and ERSPAN is usually an
interface, but selective SPAN can be used for more granular match terms. The SPAN source interface can be
a physical port, a subinterface, or a bond interface. Ingress traffic can be matched on all platforms; egress
traffic can be matched on Mellanox switches.
Cumulus Linux supports a maximum of two SPAN destinations. The SPAN destination (MTP) interface can be
a physical port, a subinterface, or a bond interface. The SPAN/ERSPAN action is independent of security ACL
actions; if packets match both a security ACL rule and a SPAN rule, both actions are carried out.
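A sketch of SPAN and ERSPAN rules as they might appear in a rules file under /etc/cumulus/acl/policy.d/
(interfaces and IP addresses are assumptions):

```
[iptables]
# SPAN: mirror everything arriving on swp1 out of local port swp2
-A FORWARD --in-interface swp1 -j SPAN --dport swp2
# ERSPAN: GRE-encapsulate mirrored packets toward a remote monitor
-A FORWARD --in-interface swp1 -j ERSPAN --src-ip 10.0.0.1 --dst-ip 10.0.0.2
```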
Install rules
Rule removal
Selective SPAN
https://docs.cumulusnetworks.com/display/DOCS/Network+Troubleshooting#NetworkTroubleshooting-
selective_spanning
SPAN/ERSPAN traffic rules can be configured to limit the traffic that is spanned to reduce the volume of
copied data. Cumulus Linux supports selective spanning for iptables only. ip6tables and ebtables are
NOT supported. The following matching fields are supported:
·· IPv4 SrcIP/DstIP
·· IP protocol
·· L4 (TCP/UDP) src/dst port
·· TCP flags
·· An ingress port/wildcard (swp+) can be specified in addition
With ERSPAN, a maximum of two --src-ip --dst-ip pairs are supported. Exceeding this limit produces an
error when you install the rules with cl-acltool.
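A selective-SPAN sketch matching only SSH traffic between two hosts, using the swp+ ingress wildcard
(all addresses are assumptions):

```
[iptables]
-A FORWARD --in-interface swp+ -s 192.168.0.10 -d 192.168.0.20 -p tcp --dport 22 -j ERSPAN --src-ip 10.0.0.1 --dst-ip 10.0.0.2
```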
The following rule blocks any traffic with source MAC address 00:00:00:00:00:12 and destination MAC
address 08:9e:01:ce:e2:04 ingressing or egressing any switch port.
In Cumulus Linux, atomic update mode is enabled by default. If you have Tomahawk switches and plan to
use SPAN and/or mangle rules, you must disable atomic update mode in the file /etc/cumulus/switchd.conf,
then restart switchd.
Other references
An overview of IPv6 is covered in the IPv6 Overview section earlier in the document.
AAA
Cumulus Networks offers add-on packages that enable RADIUS users to log in to Cumulus Linux switches
in a transparent way with minimal configuration. There is no need to create accounts or directories on the
switch. Authentication is handled with PAM and includes login, ssh, sudo and su. The general steps needed
to enable RADIUS are:
The RADIUS packages are not included in the base Cumulus Linux image; there is no RADIUS metapackage.
The libpam-radius-auth package supplied with the Cumulus Linux RADIUS client is a newer version than the
one in Debian Jessie. This package has added support for IPv6, the src_ip option described below, as well
as a number of bug fixes and minor features. The package also includes VRF support, provides man pages
describing the PAM and RADIUS configuration, and sets the SUDO_PROMPT environment variable to the
login name for RADIUS mapping support.
The libnss_mapuser package is specific to Cumulus Linux and supports the getgrent, getgrnam and
getgrgid library interfaces. During group lookups, these interfaces add logged-in unprivileged RADIUS
users to the member list of groups that contain the mapped_user (radius_user), and add privileged
RADIUS users to the member list of groups that contain the mapped_priv_user (radius_priv_user).
·· The PAM configuration is modified automatically using pam-auth-update (8), and the NSS
configuration file /etc/nsswitch.conf is modified to add the mapuser and mapuid plugins. If you
remove or purge the packages, these files are modified to remove the configuration for these plugins.
·· The radius_shell package is added, which installs the /sbin/radius_shell setcap cap_setuid
program used as the login shell for RADIUS accounts. The shell adjusts the UID when needed, then
runs the bash shell with the same arguments. When installed, the package changes the shell of the
RADIUS accounts to /sbin/radius_shell, and back to /bin/shell if the package is removed. This package is
required to enable privileged RADIUS users; it is not required for regular RADIUS client use.
·· The radius_user account is added to the netshow group and the radius_priv_user account to the
netedit and sudo groups. This change enables all RADIUS logins to run NCLU net show commands and
all privileged RADIUS users to also run net add, net del, and net commit commands, and to use sudo.
After installation is complete, either reboot the switch or run the sudo systemctl restart netd command.
·· Add the hostname or IP address of at least one RADIUS server (such as a freeradius server on Linux)
and the shared secret used to authenticate and encrypt communication with each server.
·· Multiple server configuration lines are verified in the order listed. Other than memory, there is no limit
to the number of RADIUS servers that can be used.
·· The server port number or name is optional. The system looks up the port in the /etc/services file.
However, you can override the ports in the /etc/pam_radius_auth.conf file.
·· If the server is slow or latencies are high, change the timeout setting. The setting defaults to 3 seconds.
·· If you want to use a specific interface to reach the RADIUS server, specify the src_ip option. You can
specify the hostname of the interface, an IPv4, or an IPv6 address. If you specify the src_ip option,
you must also specify the timeout option.
·· Set the vrf-name field. This is typically set to mgmt if you are using a management VRF. You cannot
specify more than one VRF.
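A hypothetical /etc/pam_radius_auth.conf fragment illustrating the options above (the server address, secret, and exact column layout are assumptions; check the man page shipped with the package):

```text
# server[:port]   shared_secret   timeout(s)   [src_ip]
192.168.0.254     mysecret        10           192.168.0.2
# bind RADIUS traffic to the management VRF
vrf-name mgmt
```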
The configuration file includes the mapped_priv_user field that sets the account used for privileged
RADIUS users and the priv-lvl field that sets the minimum value for the privilege level to be considered a
privileged login (the default value is 15). If you edit these fields, make sure the values match those set in the
/etc/nss_mapuser.conf file.
Debugging messages are written to /var/log/syslog. When the RADIUS client is working correctly, comment
out the debug line.
·· Add a local privileged user account with the same unique identifier as the privileged RADIUS user
·· Enable the local privileged user to run appropriate commands
·· Edit the /etc/passwd file to move the local user line before the radius entry
·· Set the local password for the user
The example is based on the radius_priv_user entry in the /etc/passwd file:
radius_priv_user:x:1002:1001::/home/radius_priv_user:/sbin/radius_shell.
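The steps above might look like the following (the account name is hypothetical; the UID is taken from the example entry, and the flags are standard useradd options):

```shell
# Create a local fallback account sharing the radius_priv_user UID (1002);
# -o permits a non-unique UID
sudo useradd -o -u 1002 -s /bin/bash localadmin
# Set a local password so login works when RADIUS is unreachable
sudo passwd localadmin
# Then edit /etc/passwd so the localadmin line precedes the radius_priv_user line
```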
Cumulus Linux handles this transparently once the libnss-mapuser package is installed. Specific details
can be found here. https://docs.cumulusnetworks.com/display/DOCS/RADIUS+AAA#RADIUSAAA-
EnableLoginwithoutLocalAccounts
To verify that the RADIUS client is configured correctly, log in as a non-privileged user and run a net add
interface command.
In this example, the ops user is not a privileged RADIUS user, so they cannot add an interface, while the
admin user is a privileged RADIUS user (with privilege level 15) and so is able to add interface swp1.
https://docs.cumulusnetworks.com/display/DOCS/Setting+Date+and+Time
NTP synchronizes time between devices configured in a client-server relationship. Cumulus Linux uses
eth0 by default, but this can be changed.
Prior to NTP configuration, validate the time zone and date/time, and change them if they need
modification. The /etc/timezone file can be modified or replaced, after which the change can be made to
take effect immediately.
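Using standard Debian tooling, a time zone change might be applied like this (a sketch; the exact workflow in the Cumulus documentation may differ):

```shell
# Replace the contents of /etc/timezone, then regenerate /etc/localtime
echo "US/Eastern" | sudo tee /etc/timezone
sudo dpkg-reconfigure --frontend noninteractive tzdata
```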
To set the system clock according to the time zone configured: man date(1)
To write the current value of the system (software) clock to the hardware clock: man hwclock(8)
If you use DHCP and want to specify your NTP servers, you must specify an alternate configuration file
for NTP.
Before you create the file, ensure that the DHCP-generated configuration file exists. In Cumulus Linux 3.6.1
and later (which uses NTP 1:4.2.8), the DHCP-generated file is named /run/ntp.conf. This file is generated
by the /etc/dhcp/dhclient-exit-hooks.d/ntp script and is a copy of the default /etc/ntp.conf with a modified
server list from the DHCP server. If this file does not exist and you plan on using DHCP in the future, you can
copy your current /etc/ntp.conf file to the location of the DHCP file.
To use an alternate configuration file that persists across upgrades of Cumulus Linux, create a systemd unit
override file called /etc/systemd/system/ntp.service.d/config.conf and add the following content:
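A plausible override file (the ExecStart line is an assumption, based on pointing ntpd at the DHCP-generated /run/ntp.conf; verify it against the documentation):

```text
# /etc/systemd/system/ntp.service.d/config.conf
[Service]
ExecStart=
ExecStart=/usr/sbin/ntpd -n -u ntp:ntp -g -c /run/ntp.conf
```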
With this unit file override present, changes to NTP settings made using NCLU do NOT take effect until the
DHCP script regenerates the alternate NTP configuration file.
The ntpd daemon running on the switch implements the NTP protocol. It synchronizes the system time with
time servers listed in /etc/ntp.conf. The ntpd daemon is started at boot by default. You can specify the NTP
server or servers you want to use with NCLU; include the iburst option to increase the sync speed.
These commands add the NTP server 4.cumulusnetworks.pool.ntp.org to the list of servers in /etc/ntp.conf.
Servers 0 through 3 are included by default.
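The NCLU commands referred to above would look like this (a sketch; validate the syntax against your NCLU version):

```shell
net add time ntp server 4.cumulusnetworks.pool.ntp.org iburst
net commit
```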
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 0.cumulusnetworks.pool.ntp.org iburst
server 1.cumulusnetworks.pool.ntp.org iburst
server 2.cumulusnetworks.pool.ntp.org iburst
server 3.cumulusnetworks.pool.ntp.org iburst
server 4.cumulusnetworks.pool.ntp.org iburst
https://docs.cumulusnetworks.com/display/DOCS/Simple+Network+Management+Protocol+
%28SNMP%29+Monitoring
SNMP is an IETF RFC standards-based network management architecture and protocol tracing back to
Carnegie-Mellon University. Subsequently modified by programmers at the University of California, the
code was made publicly available under the name ucd-snmp. This was further extended by the University
of Liverpool as well as in Denmark. The name was updated to net-snmp, and it became a fully fledged collaborative
open source project. The version used by Cumulus Networks is the latest net-snmp 5.7 branch with added
custom MIBs, and pass-through and pass-persist scripts.
SNMP management servers gather information from different systems in a consistent manner, and the
protocol's longevity is due to standardizing the objects collected from devices, the protocol used for
transport, and the architecture of the management systems. The most widely used versions of SNMP are
v1 and v2c, while v3 adds advanced security features and is the recommended choice.
Agents — The SNMP agents (snmpd) on the switches perform the bulk of the work by gathering information
about the local system and storing it in an internal database called the management information base (MIB).
The MIB is a standardized, hierarchical structure storing information to be queried. Parts of the MIB tree are
available and provided to incoming requests originating from an NMS host that has authenticated with the
correct credentials. You can configure the Cumulus Linux switch with usernames and credentials to provide
authenticated and encrypted responses to NMS requests. The snmpd agent can also proxy requests and
act as a master agent to sub-agents running on other daemons (FRR, LLDP).
Managers — An SNMP Network Management System (NMS) is a computer that is configured to poll SNMP
agents to gather and present information. The manager can be any machine capable of sending query
requests to SNMP agents with the correct credentials. The NMS can be a large monitoring suite or a simple
set of scripts that collect through polling the agents and present the data. SNMP agents can also send
unsolicited Traps/Inform messages to the SNMP Manager based on predefined criteria.
Management Information Base (MIB) — The MIB is a database implemented on the daemon or agent and
follows IETF RFC standards the same as the manager. It is a hierarchical structure that, in many areas, is
globally standardized, but also flexible enough to allow vendor-specific additions. Cumulus Networks
implements a number of custom enterprise MIB tables and these are defined in text files located on the
switch and in files named /usr/share/snmp/mibs/Cumulus*. The MIB structure is best understood as a top-
down hierarchical tree where each branch that forks is labeled both with an identifying number (starting
at 1) and an identifying string unique to that level of the hierarchy. These strings and numbers can be used
interchangeably in order for a specific node of the tree to be traced from the unnamed root of the tree to
the node in question.
Object Identifier (OID) — The parent IDs (numbers or strings) are strung together, starting with the most
general to form an address for the MIB Object with each junction in the hierarchy represented by a dot. The
series of ID strings or numbers separated by dots is an address known as an object identifier (OID).
No changes are required in the /etc/snmp/snmpd.conf file on the switch to support the custom Cumulus
Networks MIBs. The following lines are already included by default and provide support for both the
Cumulus Counters and the Cumulus Resource Query MIBs.
sysObjectID 1.3.6.1.4.1.40310
pass_persist .1.3.6.1.4.1.40310.1 /usr/share/snmp/resq_pp.py
pass_persist .1.3.6.1.4.1.40310.2 /usr/share/snmp/cl_drop_cntrs_pp.py
However, you need to copy several files to the NMS server(s) for the custom Cumulus MIB to be recognized
on the NMS server.
/usr/share/snmp/mibs/Cumulus-Snmp-MIB.txt
/usr/share/snmp/mibs/Cumulus-Counters-MIB.txt
/usr/share/snmp/mibs/Cumulus-Resource-Query-MIB.txt
Cumulus Linux 3.4 and later releases support configuring SNMP with NCLU. While NCLU does not have
100% coverage to configure every single snmpd feature, it is the recommended method of configuring
snmpd. You are not restricted to using NCLU for configuration and can edit the /etc/snmp/snmpd.conf file
and control snmpd with systemctl commands. For Cumulus Linux versions earlier than 3.0, snmpd has a
default configuration that listens to incoming requests on all interfaces.
Many options are available for review and implementation; below is an example of SNMPv3 tied to the mgmt
VRF using MD5 and reporting link-up, link-down, and authentication failure traps.
Inform keywords in the trap definition use SNMPv3 acknowledged informs rather than traps. NCLU restarts
the snmpd daemon after configuration changes are made and committed.
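A sketch of what such an NCLU SNMPv3 configuration could look like (the username, passwords, and trap settings are hypothetical, and the exact keywords should be verified against the SNMP documentation for your release):

```shell
net add snmp-server listening-address localhost vrf mgmt
net add snmp-server username snmpv3user auth-md5 md5password
net add snmp-server trap-link-up check-frequency 15
net add snmp-server trap-link-down check-frequency 10
net add snmp-server trap-snmp-auth-failures
net commit
```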
To use file manipulation and replacement for manual or automated SNMP configuration, see https://docs.
cumulusnetworks.com/display/DOCS/Simple+Network+Management+Protocol+%28SNMP%29+Monitoring
#SimpleNetworkManagementProtocol(SNMP)Monitoring-ConfigureSNMPManually.
DHCP Relay provides a forwarding action for DHCP broadcast packets, which would otherwise be dropped
at a layer 2/3 demarcation point, towards a set of layer 3 unicast addresses. This allows for centralized
DHCP servers to serve an entire site or organization.
DHCP and DHCP Relay daemons are disabled by default. After configuring the services needed, the
appropriate service(s) will need to be restarted. Both IPv4 and IPv6 DHCP Relays are supported but must
be started separately.
The following example sets IP address 10.0.0.1 on the loopback interface as the giaddr and creates the
corresponding configuration in the /etc/default/isc-dhcp-relay file:
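The resulting file might resemble the following (the interface names and server address are hypothetical, and the -U giaddr syntax is an assumption that should be verified against the dhcrelay documentation):

```text
# /etc/default/isc-dhcp-relay
SERVERS="172.16.1.102"
INTF_CMD="-i swp51 -i swp52 -U 10.0.0.1%lo"
OPTIONS=""
```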
When using VRR for redundancy, the configuration procedure for DHCP relay is the same, except that DHCP
relay must run on the SVI and not on the -v0 interface.
NCLU does not support IPv6 DHCP Relay configuration as of version 3.7.3.
The /etc/default/isc-dhcp-relay6 file has a different format than the /etc/default/isc-dhcp-relay file used
for IPv4 DHCP relays. Make sure to configure the variables appropriately by editing this file.
After you finish configuring the DHCP relay, save your changes, restart the dhcrelay6 service, then enable
the dhcrelay6 service so the configuration persists between reboots:
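The restart and enable steps described above, as shell commands:

```shell
sudo systemctl restart dhcrelay6.service
sudo systemctl enable dhcrelay6.service
```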
To see the status of the IPv6 DHCP relay, use the systemctl status dhcrelay6.service command:
Use the journalctl command to look at the behavior on the Cumulus Linux switch that is providing the DHCP
relay functionality:
If you experience issues with the DHCP relay, run the following commands to determine whether the issue is
with systemd. These commands manually activate the DHCP relay process and do not persist when you
reboot the switch:
A unique IP address is required from the device relaying the DHCP request, representing the gateway IP
address (giaddr), to identify the requesting subnet. This poses an issue when anycast is used with the same
gateway across multiple devices, as with EVPN. Cumulus Linux supports RFC 3527, which allows the
link selection sub-option to be specified to identify the subnet when the giaddr is insufficient.
FIGURE 32
Design concepts
Describe clos design
Clos is an architecture developed between the 1930s and 1950s, made popular by Charles Clos to meet
scalability requirements in circuit-switched networks. With the explosion of scale in modern data center
networks, the Clos design is being utilized for similar purposes.
For data center networks the clos design is represented by leaf and spine devices, where servers connect to
leafs, each leaf connects to each spine, spines do not connect to each other, and leafs only connect to each
other in support of device virtualization techniques such as MLAG. This provides consistency in bandwidth
and latency between servers as the majority of traffic in the data center is now east and west, rather than
north and south.
https://cumulusnetworks.com/media/resources/validated-design-guides/Cumulus-Linux-Layer-2-HA-
Validated-Design-Guide_v1.0.0.pdf
https://cumulusnetworks.com/media/cumulus/pdf/technical/validated-design-guides/Big-Data-Cumulus-
Linux-Validated-Design-Guide.pdf
Positively, this is an established topology, widely supported by multiple vendors, with good documentation
and easy configuration. This topology will provide Layer 2 reachability and allow the use of spanning
tree commands.
Negatively, this topology does not provide failover capability, wastes half of the available bandwidth, and
the switches can see multiple MAC addresses.
MLAG
Multi-Chassis Link Aggregation (MLAG) enables a server or switch with a two-port bond, such as a link
aggregation group (LAG), EtherChannel, port group or trunk, to connect those ports to different switches
and operate as if they are connected to a single, logical switch. This provides greater redundancy and
greater system throughput.
Dual-connected devices or hosts can create LACP bonds that contain links to each physical switch.
Therefore, active-active links from the dual-connected devices are supported even though they are
connected to two different physical switches.
FIGURE 33
MLAG reduces the dependence on spanning tree and enables 100% bandwidth utilization, but is vendor
specific without interoperability, requires an inter-switch link (ISL), more configuration, and can be
more complex.
FIGURE 34
L3 single-attached hosts
FIGURE 35
The server (physical host) has one or more links to only one ToR switch.
Layer 3 architecture with single-attached hosts provides simple networking configuration with no spanning
tree, no MLAG, no L2 loops, and no leaf inter-switch links, all of which allow for great route scaling and
flexibility. Unfortunately, this simplicity comes at the cost of redundancy for server connectivity and uses a
single top of rack switch as its gateway. It is the responsibility of the remaining infrastructure to overcome
the loss of the rack.
With redistribute neighbor, a daemon grabs ARP entries dynamically and utilizes redistribute table in
FRRouting to enter them into the fabric. This solution is limited to IPv4, provides no layer 2 adjacency
without VXLAN, makes host traffic dependent on proxy ARP, and withdrawal from the original leaf could
take up to 4 hours. First hop redundancy is handled with equal cost routes on the host to both top of rack
switches. This solution allows containers to be built and destroyed with their Layer 3 information
dynamically introduced to the network.
FIGURE 36
Routing on the host means there is a routing application (such as FRRouting) either on the bare metal host
(no VMs/containers) or the hypervisor (Ubuntu with KVM). This is preferred and recommended by Cumulus
Networks over redistribute neighbor.
There is no requirement for MLAG, no spanning tree or layer 2 domain, no loops, and 3 or more top of rack
devices can be used. Host and VM mobility is enabled and traffic engineering can be used to allow for
hardware and software upgrades. This solution is dependent on host routing support, and would require
VXLAN for layer 2 adjacencies.
FIGURE 37
Routing on the VM is very similar to routing on the host and includes those benefits, but instead of routing
on the hypervisor, each virtual machine utilizes its own routing stack. This removes the need for the
hypervisor platform to support routing, but all VMs must support routing, and 1 router per VM can create
scaling issues.
FIGURE 38
Virtual router
A virtual router (vRouter) runs as a VM on the hypervisor/host and sends routes to the ToR using BGP or
OSPF. In addition to the benefits of routing on the host, a virtual router enables same-rack multi-tenancy,
and the base OS does not need to be routing capable. The virtual router acts as a local gateway and
provides multiple routes through top of rack switches. Older Linux kernels will not provide per-flow ECMP.
https://support.cumulusnetworks.com/hc/en-us/articles/213177027-Installing-the-Cumulus-Linux-Quagga-
Package-on-an-Ubuntu-Server
FIGURE 39
Limiting the exchange of routing information at various parts of the network is a best practice to follow. One
way to do this is shown in the image.
FIGURE 40
In contrast to routing on the host (preferred), this method allows a user to route to the host. The ToRs are
the gateway, as with redistribute neighbor, except that there is no daemon running; the networks must
be manually configured under the routing process. It is easy to black-hole traffic unless a script is run to
remove the routes when the host no longer responds.
FIGURE 41
This provides most benefits of routing on the host without the host routing or redistribute neighbor
requirement. Moving subnets from one top of rack to another is a manual process and requires
synchronization between server and network teams.
https://docs.cumulusnetworks.com/display/DOCS/Lightweight+Network+Virtualization+Overview
FIGURE 42
Lightweight Network Virtualization (LNV) is a technique for deploying VXLANs without a central controller
on bare metal switches. This solution requires no external controller or software suite; it runs the VXLAN
service and registration daemons on Cumulus Linux itself. The data path between bridge entities is
established on top of a layer 3 fabric by means of a simple service node coupled with traditional MAC
address learning.
The host runs LACP (Etherchannel/bond) to the pair of ToRs. LNV (Lightweight Network Virtualization) then
transports the Layer 2 bridges across a Layer 3 fabric. The layer 2 domain is limited to the local top of rack
switches, and the aggregation is all layer 3 providing great route scaling and flexibility, and high availability.
FIGURE 43
FIGURE 44
FIGURE 45
ECMP overview
Cumulus Linux supports hardware-based equal cost multipath (ECMP) load sharing. ECMP is enabled by
default and load sharing occurs automatically for all routes with multiple next hops installed. ECMP load
sharing supports both IPv4 and IPv6 routes. For routes to be considered equal for ECMP, they must:
·· Originate from the same routing protocol. Routes from different sources are not considered equal.
For example, a static route and an OSPF route are not considered for ECMP load sharing.
·· Have equal cost. If two routes from the same protocol are unequal, only the best route is installed in
the routing table.
ECMP hashing
To prevent out-of-order packets, ECMP hashing is done on a per-flow basis, which means that all packets
with the same source and destination IP addresses and the same source and destination ports are always
hashed to the same next hop. ECMP hashing does not keep a record of flow states.
ECMP hashing does not keep a record of packets that have hashed to each next hop and does not
guarantee that traffic sent to each next hop is equal. The hash is calculated over the following fields:
·· IP protocol
·· Ingress interface
·· Source IPv4 or IPv6 address
·· Destination IPv4 or IPv6 address
·· Source port (TCP|UDP)
·· Destination port (TCP|UDP)
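A rough illustration of per-flow hashing, with cksum standing in for the ASIC's hash function (an assumption made purely for illustration):

```shell
# The same 5-tuple always yields the same hash, hence the same next hop index.
flow="proto=tcp in=swp1 src=10.0.0.1 dst=10.0.1.1 sport=33000 dport=443"
nhops=4   # number of equal-cost next hops
# Hashing the identical flow string twice produces identical results
hash1=$(printf '%s' "$flow" | cksum | cut -d' ' -f1)
hash2=$(printf '%s' "$flow" | cksum | cut -d' ' -f1)
# Map the hash onto one of the available next hops
bucket=$(( hash1 % nhops ))
echo "next hop index: $bucket"
```

Because the hash is deterministic, every packet of the flow follows the same path, which preserves packet ordering.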
FIGURE 46
In Cumulus Linux, when a next hop fails or is removed from an ECMP pool, the hashing or hash bucket
assignment can change. For deployments where there is a need for flows to always use the same next hop,
like TCP anycast deployments, this can create session failures. Resilient hashing helps prevent disruptions
when next hops fail or are removed, but does not assist when next hops are added. When one next hop fails,
other next hops are filled in its place to maintain the fixed total number of buckets.
The ECMP hash performed with resilient hashing is exactly the same as the default hashing mode; only
the method in which next hops are assigned to hash buckets differs. Resilient hashing is disabled by default
and supports both IPv4 and IPv6 routes.
Resilient Hashing is supported on switches with Broadcom Tomahawk, Trident II, Trident II+, Trident 3, and
Mellanox Spectrum chipsets.
When resilient hashing is enabled, 65,536 buckets are created to be shared among all ECMP groups, and
next hops are assigned in a round robin fashion. An ECMP group is a list of unique next hops that are
referenced by multiple ECMP routes. The number of buckets can be configured as 64, 128, 256, 512 or 1024;
the default is 128:
Hash buckets per ECMP group    Supported ECMP groups
64                             1024
128                            512
256                            256
512                            128
1024                           64
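The rows above follow from dividing the shared pool of 65,536 buckets by the buckets-per-group setting; a quick sketch:

```shell
# Maximum ECMP groups = 65536 total buckets / buckets per ECMP group
buckets=128   # the default setting
groups=$(( 65536 / buckets ))
echo "$groups"
```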
To enable resilient hashing, edit the /etc/cumulus/datapath/traffic.conf file. You may optionally modify
the number of hash buckets, which affects the supported number of ECMP groups per the chart. Then
restart switchd.service.
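A sketch of the traffic.conf settings (the option names are recalled from the file's documentation; verify them against the comments in the file on your switch):

```text
# /etc/cumulus/datapath/traffic.conf
resilient_hash_enable = TRUE
# optional: hash buckets per ECMP group (64, 128, 256, 512 or 1024; default 128)
resilient_hash_entries_ecmp = 256
```

After editing, restart switchd with sudo systemctl restart switchd.service.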
Spine and leaf Clos networks are often built with a 4:1 uplink-to-downlink speed ratio in the leafs; 48
downlink ports with 4, 6, or 8 uplink ports are most common. Leaf switches with 10Gb downlinks have
40Gb uplinks, with 25Gb downlink/100Gb uplink and 100Gb downlink/400Gb uplink models also available.
Spine switches commonly have 32 ports matching the leaf uplink port bandwidth, such as 40Gb,
100Gb, or 400Gb.
For optimal network performance, hosts are connected via dual 10|40|100Gb uplinks to the access/leaf
switch layer, which in turn is connected via 40|100|400Gb links to the aggregation/spine layer. The number
of servers in a two-tier Clos architecture can be determined by multiplying leaf ports per switch by ports per
spine switch and dividing that number by two, as each server has 2 connections. For example,
48 * 32 / 2 = 768 with 48-port leaf switches and 32-port spine switches.
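The arithmetic above can be sketched in shell (the numbers match the example in the text):

```shell
# Two-tier Clos capacity: each dual-homed server consumes two leaf ports,
# so servers = leaf downlink ports * spine ports / 2
leaf_ports=48
spine_ports=32
servers=$(( leaf_ports * spine_ports / 2 ))
echo "$servers"
```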
Scaling out the architecture involves adding more hosts to the access switch pairs, and then adding more
access switches in pairs or more as needed. In a Layer 2-only environment, additional spine switches should
be added in pairs. As the capacity limit of the spine switches approaches, an additional network pod of
spine/leaf switches may be added.
L2 use of MLAG/virtual port channel can increase oversubscription. 768 hosts can be connected in the pod
before expansion is required. Common oversubscription ratios are formed starting from the original 4:1
uplink:downlink speed ratio based on port count usage and layer 2 MLAG versus layer 3 solutions for host
connectivity redundancy.
FIGURE 47
Table bandwidth numbers are in Gbps. Leaf and Spine required throughput is calculated by totaling the
interface bandwidth. Numbers and ratios are shown for both the higher and lower bandwidth usage
of the uplinks. Utilizing the 100Gb port as the lower 40Gb capability is an option, but increases the
oversubscription ratios. When utilizing matching uplink and downlink speeds the oversubscription can
explode to unwelcome levels especially when used in conjunction with MLAG. This could be used in a
migration scenario where leafs are replaced first and utilize the uplinks at the lower speeds until the spines
are replaced.
In practice, it is common not to use all 48 ports on leaf switches, which changes the oversubscription ratios.
The chart shows numbers based on using all ports. For example, if 40 ports are used instead (a common
case), the oversubscription ratios drop to 2.5, 1.67, 1.25, 10.0, 6.67, and 5.0 for the top chart.
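As a sketch of the ratio calculation for the first of those values (40 x 10Gb host ports with 4 x 40Gb uplinks, a hypothetical but typical leaf):

```shell
# Oversubscription ratio = total downlink bandwidth / total uplink bandwidth
ratio=$(awk 'BEGIN { printf "%.2f", (40 * 10) / (4 * 40) }')
echo "$ratio"
```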
In a high-performance leaf design, a switch with all high-bandwidth ports is used, and the bandwidth is split
evenly between host ports and network ports to always form a 1:1 subscription ratio between uplink and
downlink bandwidth. With a one-to-one subscription ratio the network is nonblocking, as all traffic can flow
at capacity.
FIGURE 48
In the pictured example there are 32 x 100Gb ports, with 16 host ports and 16 network ports. The network
ports provide uplinks and the host ports provide downlink connections to servers. The host ports are
typically broken out so that there are 64 x 25Gb ports.
Troubleshooting
Describe basic troubleshooting techniques
Basic steps
·· Isolate the problem
·· Implement a fix
·· Verify the fix resolves the problem
Once a problem is reported, we need to determine the cause of the problem, and whether it is actually
caused by a network-related malfunction.
Traditionally this is the most time-consuming step, because it involved going device by device to find the
location of the problem. Not every problem you find during troubleshooting is the problem you are trying
to solve. Engineers had to refer to diagrams in large networks and investigate for a lengthy period of time
before the problem was isolated.
NetQ can perform the bulk of troubleshooting from a single device or management server in seconds. Any
NetQ command can be executed from any Cumulus NetQ device.
NCLU can locally provide this information on a per-device basis. In the above example, an administrator shut
down 2 connections, bringing BGP down as shown in the netq check bgp command output.
Implementing a fix
Once you’ve isolated the issue, you can take corrective action at the appropriate time based on network
impact. Continuing with the previous example, the last committed change can be rolled back via the
net rollback last command on device leaf01.
Verifying will depend on the nature of the issue and the corrective action taken, but NetQ makes it easy to
validate, much in the same manner it was easy to check the infrastructure to isolate the problem.
As with the previous example, we run show and check bgp again to validate the corrective action. BGP is
checked again via NetQ in this example, focusing on leaf01 with grep, and the netq check bgp output now
shows no failed sessions.
CLAG Interfaces
Our Interface Peer Interface CLAG Id Conflicts Proto-Down Reason
---------------- ---------------- ------- ------------------- ------------------
vni13 vni13 - - -
bond01 bond01 1 - -
vni24 vni24 - - -
bond02 bond02 2 - -
https://docs.cumulusnetworks.com/display/DOCS/Prescriptive+Topology+Manager+-+PTM
Verify counters
Verify bonding
Verify VLANs
You can view a list of configured VXLANs for all devices, including the VNI, protocol, address of associated
VTEPs, replication list for BUM traffic, and the last time it was changed. VXLAN information for a given
device is shown by adding a hostname to the show command.
Route peering
Refer to BGP Unnumbered Troubleshooting earlier in this document for more BGP examples and show
command output.
Route Table
EVPN
You can view the available unidirectional or bidirectional paths between two devices on the network
currently and historically using their IPv4 or IPv6 addresses, and view the output in json, pretty, or
detail format.
·· JSON output provides the output in a JSON file format for ease of importing to other applications
or software.
·· Pretty output lines up the paths in a pseudo-graphical manner to help visualize multiple paths.
·· Detail output is the default when not specified and is useful for traces with higher hop counts where
the pretty output wraps lines, making it harder to interpret the results. The detail output displays a
table with a row per hop and a set of rows per path.
Number of Paths: 2
Number of Paths with Errors: 0
Number of Paths with Warnings: 0
Path MTU: 1500
Top
cumulus@leaf01:mgmt-vrf:~$ top
top - 21:22:43 up 7:01, 2 users, load average: 0.22, 0.20, 0.18
Tasks: 116 total, 1 running, 115 sleeping, 0 stopped, 0 zombie
%Cpu(s): 10.1 us, 3.0 sy, 0.0 ni, 86.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.3 st
KiB Mem: 1981808 total, 487760 used, 1494048 free, 876 buffers
KiB Swap: 0 total, 0 used, 0 free. 242384 cached Mem
IOstat
02/21/2019 07:53:43 PM
avg-cpu: %user %nice %system %iowait %steal %idle
8.58 0.00 3.42 0.12 0.25 87.63
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await
svctm %util
vda 0.00 0.00 0.00 0.00 0.02 0.00 7.69 0.00 0.58 0.58 0.00
0.57 0.00
sda 0.00 0.29 0.27 1.32 8.19 56.79 81.62 0.01 7.29 20.40 4.60
2.57 0.41
MPstat (sysstat)
07:58:17 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
07:58:17 PM all 8.58 0.01 3.34 0.12 0.00 0.08 0.24 0.00 0.00 87.63
07:58:17 PM 0 8.58 0.01 3.34 0.12 0.00 0.08 0.24 0.00 0.00 87.63
Memory can be seen in the output of the top command as referenced in the CPU section, as well as the
below examples.
Free
cumulus@leaf01:mgmt-vrf:~$ free
total used free shared buffers cached
Mem: 1981808 492992 1488816 6188 876 244816
-/+ buffers/cache: 247300 1734508
Swap: 0 0 0
/proc/meminfo
SwapFree: 0 kB
Dirty: 136 kB
Writeback: 0 kB
AnonPages: 178324 kB
Mapped: 37620 kB
Shmem: 6188 kB
Slab: 42892 kB
SReclaimable: 30392 kB
SUnreclaim: 12500 kB
KernelStack: 2400 kB
PageTables: 6216 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 990904 kB
Committed_AS: 554544 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 11048 kB
VmallocChunk: 34359724992 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 71544 kB
DirectMap2M: 2025472 kB
cumulus@leaf01:mgmt-vrf:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 10M 0 10M 0% /dev
tmpfs 388M 5.8M 382M 2% /run
/dev/sda4 5.8G 1.2G 4.4G 21% /
tmpfs 968M 0 968M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 968M 0 968M 0% /sys/fs/cgroup
/dev/sda4 5.8G 1.2G 4.4G 21% /var/lib/libvirt/images
/dev/sda4 5.8G 1.2G 4.4G 21% /var/cache/cumulus
/dev/sda4 5.8G 1.2G 4.4G 21% /var/opt
Services can be checked directly within Linux or via NetQ. The systemctl status <unit>.service command
provides the status of an individual service. You can specify ntp, another service name, or a wildcard to
check service status.
Mar 01 15:02:26 oob-mgmt-server ntpd[799]: 1 Mar 15:02:26 ntpd[799]: Listen and drop
on 1 v4wildcard 0.0.0.0:123
Mar 01 15:02:26 oob-mgmt-server ntpd[799]: 1 Mar 15:02:26 ntpd[799]: Listen normally
on 2 lo 127.0.0.1:123
Mar 01 15:02:26 oob-mgmt-server ntpd[799]: 1 Mar 15:02:26 ntpd[799]: Listen normally
on 3 eth0 10.255.0.1:123
Mar 01 15:02:26 oob-mgmt-server ntpd[799]: 1 Mar 15:02:26 ntpd[799]: Listen normally
on 4 lo [::1]:123
Mar 01 15:02:26 oob-mgmt-server ntpd[799]: 1 Mar 15:02:26 ntpd[799]: Listen normally
on 5 eth0 [fe80::4638:39ff:fe00:100%2]:123
Mar 01 15:02:26 oob-mgmt-server ntpd[799]: 1 Mar 15:02:26 ntpd[799]: Listening on
routing socket on fd #22 for interface updates
Automation
Identify potential automation templates
Tasks and information that are repeated often in multiple places or events are great candidates for
templates and automation. Configuration settings such as NTP and SNMP may be common across every
device in a network or at a given site. IP addresses, MAC addresses, interface configuration, and other
items often share commonalities with minor per-device differences that templating can handle.
Commonly automated tasks:
·· Rapid Provisioning
·· Easy hardware swap for replacement
·· Configuration stored remotely and downloaded during provisioning
·· Elimination of manual cut and paste device configuration provisioning
FIGURE 49
FIGURE 50
·· Configuration Management
·· Configuring /etc/network/interfaces
·· Configuring daily messages (MOTD)
·· Configuring FRR
·· Interface and IP address templates
FIGURE 51
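The rapid-provisioning tasks listed above are commonly implemented on Cumulus Linux with zero touch provisioning (ZTP). A minimal sketch of a ZTP script follows; the download URL is hypothetical, but ZTP scripts do need to contain the CUMULUS-AUTOPROVISIONING flag string:

```shell
#!/bin/bash
# CUMULUS-AUTOPROVISIONING
# The flag comment above is required: ZTP only executes scripts containing it.
# Pull a remotely stored configuration during provisioning (URL is an example).
curl -sf http://oob-mgmt-server/configs/interfaces -o /etc/network/interfaces
ifreload -a
exit 0
```

A script like this eliminates manual cut-and-paste configuration and makes hardware replacement a matter of cabling and rebooting.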
Version Control makes it easier to push configurations to a test environment that mirrors your
production network.
Editing configuration files by hand is prone to error, so always test changes first in a simulated lab
environment. Contrary to intent-based networking hype, automation platforms cannot read your mind
to determine what you MEANT to achieve. Automation nonetheless offers many benefits.
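The interface and IP address templates mentioned above can be sketched with Python's standard-library string.Template; in practice Jinja2 or Mako templates are more typical on Cumulus Linux, and the variable names here are hypothetical:

```python
from string import Template

# A hypothetical /etc/network/interfaces stanza with per-device variables.
interface_template = Template("""\
auto $iface
iface $iface
    address $ip/$prefixlen
""")

# Per-device values would normally come from an inventory or variables file.
rendered = interface_template.substitute(iface="swp1", ip="10.1.1.1", prefixlen="31")
print(rendered)
```

The same template renders a correct stanza for every leaf or spine; only the per-device variable values change.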
Describe a library/module
A module is a reusable, non-volatile resource that defines one or more functions or classes intended for
reuse across different parts of your programs.
A library is a collection of non-volatile resources (modules) providing related functionality used by programs,
including configuration data, documentation, help information, message templates, pre-written code,
classes, values, or type specifications. It is also a collection of implementations of repeated behavior.
High-level programs can reference the library to handle system calls rather than writing the same or
similar code each time.
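As a minimal sketch in Python (the module and function names are hypothetical), a module collects reusable functions that many programs can import instead of re-implementing them:

```python
# netutils.py (hypothetical module): reusable helpers for network scripts.

def cidr_to_netmask(prefixlen):
    """Convert a prefix length (e.g. 24) to a dotted-quad netmask."""
    bits = (0xFFFFFFFF << (32 - prefixlen)) & 0xFFFFFFFF
    return ".".join(str((bits >> shift) & 0xFF) for shift in (24, 16, 8, 0))

# Any program can now `import netutils` and reuse the function
# rather than rewriting the conversion each time.
print(cidr_to_netmask(24))  # 255.255.255.0
print(cidr_to_netmask(31))  # 255.255.255.254
```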
Describe groupings
Grouping devices organizes them by commonalities for the ease of automation. For example all leafs may
be in a group called leafs or all spine devices could be in a group named spines, since the members of each
of those respective groups have many commonalities.
Further, platforms usually allow for the nesting of groups or membership in multiple groups, so that a
single device can belong, for example, to the groups switches, cumulus, california, and leaf.
[switches:children]
cumulus
eos
ios
[cumulus]
cumulus01.example.net
[eos]
eos01.example.net
[ios]
ios01.example.net
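Extending the inventory above, a single hypothetical device (the host and group names are illustrative) can appear in several groups at once:

```ini
[switches:children]
cumulus

[cumulus]
leaf01.example.net

[california]
leaf01.example.net

[leaf]
leaf01.example.net
```

A play or template targeting any of switches, cumulus, california, or leaf would then include leaf01.example.net.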
In the automation push model, a user or process on a server sends commands to remote clients for
immediate execution. The push model depends on the clients being reachable during execution, and its
scale is limited by the master.
In the automation pull model, an agent on the local system polls a server for information and applies it on
its own schedule. Because clients contact the master when they are able to, the pull model scales better,
but offers less control over when clients poll or receive information.
Agentless management requires no additional software on the managed device and relies on existing
protocols, such as SSH, to communicate with the control server. Ansible is an example of an agentless
automation model.
Describe idempotency
Running an operation multiple times does not change the outcome: the result of a successfully performed
operation is independent of the number of times it is executed.
If a configuration is re-applied over and over, the end result is the same configuration, although the
method of application may still be impactful.
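A minimal sketch of the idea in Python: applying the same desired configuration repeatedly leaves the system in the same state.

```python
# Desired state: these settings should exist with these values.
desired = {"ntp_server": "192.0.2.1", "snmp_community": "public"}

def apply_config(current, desired):
    """Idempotent apply: merge the desired settings into the current state."""
    updated = dict(current)
    updated.update(desired)
    return updated

state = apply_config({}, desired)      # first run changes the state
once = dict(state)
state = apply_config(state, desired)   # second run changes nothing
print(state == once)  # True: re-applying yields the same result
```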
Ansible
Ansible is an open source, lightweight configuration management tool that can be used to automate
many configuration tasks. Ansible does not require an agent be run on a switch; instead Ansible manages
nodes over SSH. Using Ansible, you can run automation tasks across many end points, whereas you use
Mako within the context of a single switch. A particular script that runs a variety of tasks is referred to as a
playbook in Ansible.
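A minimal, hypothetical playbook illustrating the idea (the host group, template name, and paths are examples only):

```yaml
---
- name: Apply common settings to all leaf switches
  hosts: leafs
  become: true
  tasks:
    - name: Deploy the interfaces configuration from a template
      template:
        src: interfaces.j2
        dest: /etc/network/interfaces
```

Running the playbook with ansible-playbook executes the tasks over SSH on every host in the leafs group.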
Puppet
Puppet is an open source tool that can automate configuration management through the use of a controller
that syncs with agents installed on each end point. While similar in functionality to Ansible, Puppet relies
on agents installed on each managed switch, SSL certificates, and DNS to function properly. Puppet
agents sync with the controller over HTTPS on TCP port 8140. In Puppet, a particular script that runs a
variety of tasks is referred to as a manifest, similar to the idea of an Ansible playbook.
Chef
Chef is an agent-based open source tool that uses Ruby to accomplish configuration management in a
client-server architecture. Chef uses resources to make up a recipe and collects recipes into cookbooks to
apply a desired state to the network as defined in the recipes. Agents (chef-client) poll the Chef server for
updated cookbooks over HTTPS and apply them locally.
SaltStack
Salt was designed to enable low-latency, high-speed communication for data collection and remote
execution in sysadmin environments. The platform is written in Python and uses the push model for
executing commands, communicating over a ZeroMQ message bus by default (an agentless salt-ssh
mode is also available). Salt allows parallel execution of multiple commands encrypted via AES and offers
both vertical and horizontal scaling. A single master can manage multiple masters, and the peer interface
allows users to control multiple agents (minions) directly from an agent. In addition to the usual queries
from minions, downstream events can also trigger actions from the master. The platform supports both
master-agent and decentralized, non-master models.
CFEngine
CFEngine is the oldest and most established configuration management tool, written in C, with a
lightweight agent system. It manages the configuration of a large number of computers using the
client-server paradigm or standalone. Any client state that differs from the policy description is reverted
to the desired state. Configuration state is specified via a declarative language. CFEngine’s paradigm is
convergent “computer immunology.”
With templates, one file can be rendered and transferred to the device rather than running hundreds of
NCLU commands in a procedural fashion. This method also provides commonality with existing Linux
servers and infrastructure.
To enable, start, or stop the HTTP API service, run the following systemd commands:
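Assuming the restserver unit name that Cumulus Linux uses for the HTTP API service, the commands take the form:

```shell
sudo systemctl enable restserver.service   # start the service at boot
sudo systemctl start restserver.service    # start the service now
sudo systemctl stop restserver.service     # stop the service
```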
There are two configuration files associated with the HTTP API services. The first configuration file is used
for non-chassis hardware; the second for chassis hardware. Generally, only the configuration file relevant to
your hardware needs to be edited, as the associated services determine the appropriate configuration file to
use at run time.
·· /etc/nginx/sites-available/nginx-restapi.conf
·· /etc/nginx/sites-available/nginx-restapi-chassis.conf
The HTTP API services are configured to listen on port 8080 for chassis hardware by default. However, only
HTTP traffic originating from internal link-local management IPv6 addresses is allowed. To configure the
services to also accept HTTP requests originating from external sources:
1. Open the chassis configuration file, /etc/nginx/sites-available/nginx-restapi-chassis.conf, in a text editor
2. Uncomment the server block lines near the end of the file
3. Change the port on the now-uncommented listen line if the default value, 8080, is not the preferred
port, and save the configuration file
4. Verify the configuration file is still valid; if it is not, return to step 1, review any changes that were
made, and correct the errors
The IP:port combinations that services listen to can be modified by changing the parameters of the listen
directive(s). By default, nginx-restapi.conf has only one listen parameter, whereas /etc/nginx/sites-available/
nginx-restapi-chassis.conf has two independently configurable server blocks, each with a listen directive.
One server block is for external traffic, and the other for internal traffic. All URLs must use HTTPS, rather
than HTTP. Do not set the same listening port for internal and external chassis traffic.
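As an illustrative fragment (the addresses and port numbers are examples, not defaults), the two server blocks in the chassis configuration might carry listen directives like these:

```nginx
server {
    # External traffic: example address and port
    listen 10.0.0.10:8080 ssl;
    # ...certificate and location directives omitted
}
server {
    # Internal link-local IPv6 management traffic, on a different port
    listen [fe80::1]:8443 ssl;
}
```

Note the two blocks use different ports, per the warning above about internal and external chassis traffic.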
All traffic must be secured in transport using TLSv1.2 by default. Cumulus Linux contains a self-signed
certificate and private key used server-side in this application so that it works out of the box, but Cumulus
Networks recommends you use your own certificates and keys. Certificates must be in the PEM format.
©2019 Cumulus Networks. All rights reserved. CUMULUS, the Cumulus Logo, CUMULUS NETWORKS, and the Rocket Turtle Logo (the “Marks”) are trademarks and service
marks of Cumulus Networks, Inc. in the U.S. and other countries. You are not permitted to use the Marks without the prior written consent of Cumulus Networks. The registered
trademark Linux® is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. All other marks are used under
fair use or license from their respective owners.
09042019