{misrakir, kharoliw}@msu.edu
May 1, 2001
Abstract: The Internet today is going through a high degree of transformation; new generations of innovative, fast applications such as e-commerce, video streaming, and the transfer of medical transcripts are placing high performance demands on the Internet infrastructure. To keep pace with these continuous demands, not only does bandwidth need to be increased, but the routers that power the Internet also have to evolve architecturally to keep up with the escalating use of the web. Here we look at the different router architectures and also study the merits of some of the commercially available routers.
1. Introduction
The Internet has developed at an exponential rate, and so has the traffic on it. This has placed enormous pressure on vendors to continually improve router performance. The diverse nature of the traffic on the Internet makes the task of improving router performance extremely challenging. The main demands on any Internet infrastructure today are:
• To utilize the network capacity to the fullest so as to transfer large volumes of traffic.
• To be able to scale networks quickly and cost-effectively with minimal impact on network operations.
• To ensure the delivery of packets in proper sequence and minimize packet loss.
The report starts by looking at the functionality of routers and the different popular architectures in use. We then look at router performance in light of various switching techniques. We also look at the use of caches in routers and how they affect performance. Finally, we look at two commercially available routers and examine their architecture and performance in light of the techniques discussed.
2. Related Work
Routers have adopted various techniques to overcome the high performance requirements of the Internet environment. Some of them use larger caches, while others have adopted a distributed architecture. In a centralized architecture, performance degrades as the volume of traffic increases; in a distributed architecture, the processing load gets distributed, ensuring faster and more reliable communication. In this report we look at the following techniques adopted to improve router performance: route caching, switching, and switch fabric design.
3. Router Functions
Generally, a router performs two main functions: control-path routines and data-path control (switching). Routers maintain and manipulate routing tables; they listen for updates and change the routing tables to reflect the new network topology. The network topology in the core of the Internet and in enterprise networks is extremely dynamic and changes very frequently. Routers also classify packets and perform control actions on them; they perform Layer 3 switching and sometimes maintain statistical data on the data flow. Typically, packets are received at an inbound network interface; they are then processed by the processing module (CPU) and possibly stored in the buffering module. The packets are then forwarded through the switching fabric to the outbound interface, which transmits them to the next-hop router. The architecture of a conventional router is given in Figure 1; the CPU typically performs functions such as path computation, routing table maintenance, and reachability propagation. The router adjusts the
Time-to-live (TTL) field in the packet to prevent packets from circulating endlessly. Packets whose lifetime is exceeded are dropped by the router (the sender may or may not receive an error message). The router also checks the validity of the data based on the checksum; since the router changes the TTL field, it needs to incrementally update the checksum before forwarding the packet. One of the performance bottlenecks in routers is looking up the address of the next hop.
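The incremental checksum update mentioned above follows standard ones'-complement arithmetic (RFC 1624). A minimal sketch in Python (illustrative, not router firmware; function names are our own):

```python
def incr_checksum_update(old_cksum: int, old_field: int, new_field: int) -> int:
    """RFC 1624 incremental checksum update: HC' = ~(~HC + ~m + m'),
    with all quantities treated as 16-bit ones'-complement values."""
    hc = (~old_cksum & 0xFFFF) + (~old_field & 0xFFFF) + (new_field & 0xFFFF)
    while hc >> 16:                      # fold carries (ones'-complement add)
        hc = (hc & 0xFFFF) + (hc >> 16)
    return ~hc & 0xFFFF

def forward_ttl(ttl: int, header_cksum: int):
    """Decrement TTL and patch the header checksum without a full recompute.
    Returns (new_ttl, new_cksum), or None when the packet must be dropped."""
    if ttl <= 1:
        return None  # lifetime exceeded: drop (sender may get an ICMP error)
    # TTL is the high byte of its 16-bit header word; the low (protocol)
    # byte is unchanged, so it cancels out of the incremental update.
    return ttl - 1, incr_checksum_update(header_cksum, ttl << 8, (ttl - 1) << 8)
```

The point of the incremental form is that the router touches only the one changed header word instead of re-summing the whole header on every hop.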
The first approaches used Patricia trees [1][2] combined with hash tables; these are binary trees that use the destination IP address as the lookup key. A key to overcoming this performance bottleneck was the introduction of lookup caches, which rely on there being enough locality in the traffic to maintain a high hit rate. The frequent changes in network topology in the core of the Internet cause cache entries to be invalidated frequently, resulting in lower hit rates. Hardware-based caching, route lookup, and forwarding solutions are generating a lot of interest due to the performance they promise.
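A destination lookup by longest prefix match can be sketched with a plain binary trie, a simplification of the Patricia trees of [1][2] (which additionally collapse one-child chains); the class and function names here are our own:

```python
class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]   # child for bit 0, child for bit 1
        self.next_hop = None           # set when a prefix terminates here

def _to_bits(addr: str):
    """Dotted-quad IPv4 address -> list of 32 bits, most significant first."""
    n = 0
    for octet in addr.split("."):
        n = (n << 8) | int(octet)
    return [(n >> (31 - i)) & 1 for i in range(32)]

class RouteTable:
    """Longest-prefix match over a plain binary trie (one node per bit).
    A real Patricia trie collapses one-child chains to shorten the walk."""
    def __init__(self):
        self.root = TrieNode()

    def add(self, prefix: str, plen: int, next_hop: str):
        node = self.root
        for bit in _to_bits(prefix)[:plen]:
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, addr: str):
        """Walk the address bits, remembering the deepest next_hop seen."""
        node, best = self.root, None
        for bit in _to_bits(addr):
            if node.next_hop is not None:
                best = node.next_hop
            node = node.children[bit]
            if node is None:
                return best
        return node.next_hop if node.next_hop is not None else best
```

The walk visits at most 32 nodes per lookup; the cost of this per-packet walk is exactly what the lookup caches discussed above try to avoid.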
4. Techniques for Improving Router Performance

The various techniques used for improving router performance are: route caching, switching, and switch fabric design.
4.1 Route Caching
Traditionally, all route lookups have been done centrally by the CPU, but this places a heavy load on the CPU, which then becomes a performance bottleneck. Routers have therefore moved towards providing interfaces (line cards) with processing power. The line cards now have a CPU of their own, which determines the outbound interface using a cache it maintains (figure 2). The cache contains lookup information on the next hop for a destination IP. The cache is updated based on packets that were forwarded recently, using replacement algorithms such as LRU or FIFO. Entries in the cache may be invalidated as the network topology changes. This not only speeds up packet forwarding, as packets are forwarded directly from one interface to another (the fast path), but also reduces the load on the system bus. Packets are transmitted only once over the shared bus, and the CPU is now free to perform other functions such as routing and determining the policies and resources used by the interfaces. If there is a cache miss, the packet is forwarded to the CPU, which performs a route table lookup (the slow path) before forwarding the packet to the appropriate interface. The use of
cache makes the throughput of the architecture dependent on the nature of the traffic. The key factors are:
• Cache hit rate
• Performance of the slow path
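The fast-path/slow-path split described above can be sketched with an LRU route cache (a toy software model with hypothetical names; real line cards implement this in hardware):

```python
from collections import OrderedDict

class LineCard:
    """Per-interface route cache with LRU eviction. On a miss, the packet
    takes the slow path: the central CPU's full routing-table lookup."""
    def __init__(self, capacity: int, slow_path_lookup):
        self.cache = OrderedDict()                # dest IP -> outbound interface
        self.capacity = capacity
        self.slow_path_lookup = slow_path_lookup  # CPU routing-table lookup
        self.hits = self.misses = 0

    def forward(self, dest_ip: str) -> str:
        if dest_ip in self.cache:
            self.hits += 1
            self.cache.move_to_end(dest_ip)       # mark most recently used
            return self.cache[dest_ip]            # fast path
        self.misses += 1
        out_if = self.slow_path_lookup(dest_ip)   # slow path via CPU
        self.cache[dest_ip] = out_if
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)        # evict least recently used
        return out_if

    def invalidate(self):
        """Topology change: flush the cached entries."""
        self.cache.clear()
```

The model makes the dependence on traffic locality concrete: throughput is high exactly while `forward` keeps hitting the cache, and every topology change (`invalidate`) sends traffic back down the slow path.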
4.2 Switching
As route lookup was off-loaded to the interfaces, the CPU became less of a bottleneck; however, the shared bus limited the speed at which packets could be forwarded from one interface to another. This led to the next improvement in routers: the shared bus was replaced by a switch fabric (figure 3). The switch fabric provides a large bandwidth for transmitting packets between interface cards and increases throughput considerably. All commercially available gigabit routers use a switch fabric. A switching fabric offers parallelism, but makes it difficult to multicast packets.
When the function of route lookup was delegated to the interfaces, it required a CPU and cache in every interface. This restricted the number of interfaces, since cost grew with the number of interfaces, putting a limit on port density. The approach adopted to overcome this problem was to separate the forwarding engines from the line cards.

Figure 4: Forwarding Engines [11]

Multiple
forwarding engines (figure 4) are connected in parallel to achieve high throughput. As packets arrive through the inbound interface, the header is stripped from the packet and a tag is attached. The tagged header is then assigned to a forwarding engine in round-robin fashion. Each forwarding engine has a FIFO queue in which the header is placed, so the load can be shared equally by the forwarding engines. Every forwarding engine has its own route cache.
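The round-robin dispatch of tagged headers to per-engine FIFOs might be modeled as follows (a simplified software sketch; the tags, queue structure, and names are our assumptions, not the actual design of [11]):

```python
from collections import deque
from itertools import count

class ForwardingPlane:
    """Round-robin dispatch of tagged headers to N forwarding-engine FIFOs."""
    def __init__(self, num_engines: int):
        self.queues = [deque() for _ in range(num_engines)]
        self._rr = 0            # next engine in round-robin order
        self._tag = count()     # tag links a header back to its payload

    def dispatch(self, header) -> int:
        """Tag the stripped header, queue it on the next engine's FIFO,
        and return the index of the engine that received it."""
        engine = self._rr
        self.queues[engine].append((next(self._tag), header))
        self._rr = (engine + 1) % len(self.queues)
        return engine
```

Round-robin assignment keeps the per-engine FIFOs equally loaded regardless of which input interface the headers came from, which is exactly the load-sharing property the text describes.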
The forwarding engine performs basic error checks and then computes the hash offset into the route cache. The forwarding engine then generates the appropriate header and forwards the IP packet (along with that header) to the appropriate interface. Separating headers from payloads in this way eliminates unnecessary payload traffic over the system bus. Packets are transferred directly between interfaces; they never go to the forwarding engines or the route processor unless they are destined for them. The use of forwarding engines is based on the presumption that it is unlikely that all interfaces will be bottlenecked at the same time; hence, sharing forwarding engines can increase the port density of the router. In some applications the order in which packets are sent/received may be important, so forwarding engines can be made to output their data in the
same order they received it. The performance of a forwarding engine can be considerably improved by designing it as an ASIC; however, the Internet is constantly evolving, and any change in protocols would require the ASIC to be redesigned.
Figure 5 shows an example functional partitioning: the CPU handles routing protocols (RIP, BGP, etc.), error and maintenance protocols (ICMP, IGMP), network management, QoS policies, applications (UDP, TCP, etc.), IP options processing, packet fragmentation and reassembly, and ARP, while each line card attached to the switch fabric performs IP header validation, route lookup and packet classification, TTL update, and checksum update. The distributed architecture contains a switch fabric, which increases the number of packets that can be transmitted between
interfaces. Communication between interfaces and the CPU can also go through the switch fabric, making the CPU equivalent to an interface, but with additional functionality. The routing decisions are made by the forwarding engines, which contain their own route caches. The various functionalities of a router may be built into the slow path or the fast path based on their nature. For example, sometimes the IP datagram received is too large for the MTU of the output port; the datagram is then fragmented and forwarded, to be reassembled at the destination. Although fragment reassembly is resource intensive, the number of packets that are fragmented is normally quite low, so fragmentation is generally built into the slow path. An example of the functional partitioning of a distributed router architecture is displayed in figure 5.
5. Switch Fabrics

A switch fabric is a mechanism for allowing each line card to transmit data to any other line card as needed. High performance switch fabrics need to be non-blocking, i.e., they eliminate IP traffic jams within the router. However, the use of high-speed memory to achieve this goal is expensive and impractical; a more efficient design relies on buffering, queuing, and scheduling techniques. The switch fabric is a big influence on the performance of routers, especially in the gigabit domain. The design of a switch fabric is influenced by many factors, such as the need for multicasting, fault tolerance, and delay priorities; this leads to redundancy being built into switch fabrics. The four basic types of switch fabric are: shared medium, shared memory, distributed output buffered, and space division.

5.1 Shared Medium
This is the simplest type of switch fabric: IP packets are transferred from one interface to another over a shared medium, e.g., a bus, ring, or dual bus. Data is transmitted over the shared medium using time-division multiplexing (TDM); however, the bandwidth of the bus is a major bottleneck, and bus arbitration adds further overhead, which is significant in the gigabit domain. For all its limitations, the shared medium architecture has found several applications.
5.2 Shared Memory
The incoming data is stored in a shared memory pool. The headers are examined to determine the appropriate output port, which can then read the data from the shared memory. This method has the advantage that it minimizes the need for per-port buffering, as the shared memory absorbs large bursts. However, since packets are written into and read out of the memory one at a time, the memory must operate at least as fast as the aggregate throughput, which may be a problem.
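The shared-memory data path can be sketched as a single pool of per-output queues (a toy model; a dict of queues stands in for shared memory addressed by pointers):

```python
from collections import deque

class SharedMemorySwitch:
    """All inbound packets go into one shared pool; each output port later
    reads the packets addressed to it. Every packet costs the shared memory
    one write and one read, which is why memory speed bounds throughput."""
    def __init__(self, num_ports: int):
        self.pool = {p: deque() for p in range(num_ports)}

    def write(self, out_port: int, packet):
        self.pool[out_port].append(packet)    # one write into shared memory

    def read(self, out_port: int):
        q = self.pool[out_port]
        return q.popleft() if q else None     # one read out of shared memory
```

Because every packet passes through `write` and `read` on the same memory, the memory's access rate, not the number of ports, is the limiting resource, as the paragraph above notes.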
5.3 Distributed Output Buffered

Figure 6 shows the various components of this type of switch fabric. There are N² independent paths from the inbound interfaces to the outbound interfaces. Every output port determines whether a packet is destined for it using the address filters (AF). There is no waiting at the input; all the data is buffered at the output. The memory buffer at the output only needs to be as fast as the output port. Ideally, the depth of the output buffer needs to be N to ensure no packet loss; in practice, the depth is usually kept at L < N, at the cost of some packet loss.
5.4 Space Division
Another popular name for the space division switch fabric is the crossbar switch (figure 3). This architecture establishes a connection between an inbound interface and an outbound interface based on the destination IP. It has the following advantages: (i) low cost, (ii) good scalability, (iii) non-blocking properties, and (iv) convenience in providing QoS guarantees. The speed at which the switch fabric must operate is at least equal to the aggregate speed of all input links connected to the fabric. A serious drawback of crossbar switches is head-of-line blocking (figure 7). If an interface receives a number of packets destined for the same output port, the packets queue up. Worse, when an interface puts several inbound packets into its input queue and the packet at the head of the queue is blocked because its destination output port is not free, the packets behind the head, which may be destined for different (available) output ports, are blocked as well. This slows down the forwarding rate. There are several ways to combat this. One is to keep track of the traffic on every output port: every cycle (epoch), each input port is matched to an output port depending on the traffic; the input ports bid for output ports, and an allocator arbitrates and assigns them, with the scheduling pipelined. Another method is to give
every output port its own lane (input buffer), so that a packet waiting for one output port does not block packets bound for a different output port. Increasing the speed of the input/output channels is another alternative. These approaches can be used to maximize the throughput of the crossbar switch.
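The "lane per output" remedy is essentially virtual output queuing. Its effect versus a single input FIFO can be sketched for one scheduling cycle (a deliberately simplified model, not a full crossbar scheduler):

```python
from collections import deque

def hol_forwarded(packets, busy_output):
    """Single input FIFO: once the head packet's output port is busy,
    everything behind it stalls this cycle, even packets bound for free
    ports. `packets` is the queue of destination output ports, in order."""
    q = deque(packets)
    sent = []
    while q and q[0] != busy_output:
        sent.append(q.popleft())
    return sent

def voq_forwarded(packets, busy_output, num_outputs):
    """Virtual output queues: one lane per output port, so only packets for
    the busy port wait. (Simplified: everything not bound for the busy
    port proceeds; a real crossbar sends one packet per output per cycle.)"""
    lanes = [deque() for _ in range(num_outputs)]
    for p in packets:
        lanes[p].append(p)
    sent = []
    for out, lane in enumerate(lanes):
        if out != busy_output:
            sent.extend(lane)
    return sent
```

With the same arrivals, the single FIFO can forward nothing when its head packet is blocked, while the per-output lanes let every packet bound for a free port through.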
The shared memory and shared medium approaches achieve a throughput limited by memory access time. The crossbar switch does not have any such limitation; however, fabrication density limits its switching speed. It is generally accepted that large router switch fabrics (of 1 Tb/s or higher) cannot be obtained by simply scaling up a fabric design intended for lower speeds.
6. Other Issues
The nature of traffic is another factor influencing the performance of routers. Although not strictly under the control of the designer, it is helpful to know the behavior of routers under different data profiles. Figure 8 displays a graph of the number of packets transmitted per second as a function of frame size over various connections such as T1, Ethernet, and T3. It is observed that routers perform better when the packet size is larger than when it is smaller (for the same amount of data). This is logical, as the per-packet lookup overhead is then amortized over more data.
7. Case Studies
We have performed case studies of two router architectures as part of the survey: the Cisco 7500 and the Cisco 10000 Edge Service Router (ESR).
Cisco Express Forwarding (CEF) [3][4]: CEF is a non-cache-based switching mode for IP packets. In cache-based switching, the first packet of a flow is sent up to the process level (CPU), where its destination address is compared with the routing table to obtain the forwarding information. A route cache entry for the corresponding forwarding information is built so that subsequent packets of the same flow can be fast-switched based on the route cache. With CEF, instead of building route cache entries on demand, a Forwarding Information Base (FIB) is built. It is based on the entire routing table and is downloaded to all the interfaces for distributed switching. The FIB has a one-to-one correspondence with the routing table, and is updated only when the routing table changes.
Tag Switching [3][4]: Tag switching is a new IP packet switching scheme that adds a tag to each IP packet. The tag is used by tag switches (which can be routers or ATM switches) as the basis for packet switching, instead of the original IP destination address. A separate Tag Distribution Protocol (TDP) maintains the mapping between IP addresses and tags. Tag switching offers the following benefits: (i) flexible traffic engineering, (ii) IP-ATM integration, (iii) scalable Border Gateway Protocol operation, and (iv) Virtual Private Network (VPN) implementations. A VPN uses encryption or tunneling to connect users or sites over a public network.
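The core of tag switching is an exact-match lookup on a short, fixed-length tag rather than a longest-prefix match on the IP address. A toy version (the table contents are hypothetical; TDP itself, which distributes these mappings, is not modeled):

```python
class TagSwitch:
    """Forward on a fixed-length tag: swap the incoming tag for the
    outgoing one and send the packet out the listed port. The tag table
    is assumed to have been distributed in advance (by TDP in Cisco's
    scheme), so the per-packet work is a single exact-match lookup."""
    def __init__(self, tag_table):
        # (in_port, in_tag) -> (out_port, out_tag)
        self.tag_table = dict(tag_table)

    def switch(self, in_port: int, in_tag: int):
        return self.tag_table[(in_port, in_tag)]   # O(1) exact match
```

Replacing the variable-length prefix match with a constant-time table index is what lets ATM-style hardware forward IP traffic at line rate.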
Parallel Express Forwarding (PXF) [5]: The Cisco PXF is a multiprocessor technology that enables forwarding performance on the order of millions of packets per second. PXF has the ability to support new services such as Multiprotocol Label Switching (MPLS), VPNs, etc. Like CEF, PXF avoids the potential overhead of continuous cache churn by using a FIB for the destination switching decision. PXF enhances the FIB model by separating control-plane functions from forwarding-plane functions. PXF uses a parallel array of processors for an accelerated switching path. The architecture allocates independent memory to each processor, as well as memory for each column of processors, to optimize memory access. A more detailed description of the PXF processor array is given below.
The Cisco 7500 [3] is designed for environments requiring high-performance, high-availability routing. It was introduced in 1995 with a centralized architecture. It uses CEF and tag switching to obtain higher performance. The Route Switch Processor (RSP) in the Cisco 7500 performs the following functions: switching data packets; providing additional services (such as encryption, compression, access control, QoS, and traffic accounting) to data packets; running routing protocols to maintain switching intelligence; and handling other system maintenance functions such as network management. The architecture of the RSP is shown in figure 9; at its heart lies a RISC processor (MIPS R4600/R4700/R5000). The Cisco
Internetwork Operating System (IOS) executes on this processor to perform all the functions of
the RSP. A description of the other major components of the RSP and their functionality is as
follows [7]:
Boot ROM: Contains the ROM monitor, which holds the startup diagnostic code and exception handling.
NVRAM: Contains the startup configuration file and the 16-bit configuration register, which controls boot behavior.
Boot Flash: Houses the RxBoot software component; in host mode, RxBoot is used for booting over the network.
SRAM: The RAM is divided functionally into two parts, main and I/O. The I/O part contains buffers for interfaces and some system buffers; the main part contains the image of the Cisco IOS software.
In 1996, a distributed architecture using Versatile Interface Processors (VIP; figure 10) was introduced. Each VIP has its own processor, capable of switching IP data packets and providing network services such as encryption, compression, access control, QoS, and traffic accounting. The RSP can then devote all its CPU cycles to other essential tasks, such as
routing protocols, non-IP traffic, tunneling, network management, etc. The VIP2-50 [3] is the most recent addition to the VIP2 family. It uses the MIPS R5000 processor, with up to 8 MB of SRAM and up to 128 MB of DRAM. The additional memory capacity gives the VIP2-50 more queuing capability and more storage for large routing tables. The VIP2-50 supports all available WAN and LAN port adapters (PAs), and there are also Packet over SONET (POS) interfaces based on the VIP2-50 platform. The VIP2-50 requires a minimum Cisco IOS software release.
The Cisco 10000 Edge Service Router (ESR) [6] is a Layer-3, 10-slot platform optimized to meet
the large-scale leased line aggregation requirements of Internet service providers (ISPs). The
Cisco 10000 ESR dedicates two slots for active and redundant processor modules and eight slots
for interface modules. Interface modules can be configured in any of the eight available slots. All
modules require a single slot and are hot-swappable. The interface modules supported by the Cisco 10000 ESR include a:
• Channelized OC-12 module
The heart of the Cisco 10000 ESR's high performance and throughput is the Performance Routing Engine (PRE). The PRE uses PXF to support high-performance throughput with IP services enabled on every port. The PRE has two PCMCIA slots, 32 MB of Flash memory, and a 128 MB packet buffer. It also supports 512 MB of SDRAM for use by Cisco IOS software. The Cisco 10000 ESR has two major blocks: the line cards and the PRE. Each line card manages its own interface type, sending and receiving complete packets to the PRE across the backplane. Most communication devices are based on a shared system bus to which all circuit cards are attached, but the Cisco 10000 ESR replaces it with a line-card interconnect that uses point-to-point links between each line card and each PRE. This provides high bandwidth and fault isolation. Transfer speeds of up to 3.2 Gbps in each direction can be achieved over the point-to-point links.
Two PREs can be configured in a single router to provide fault tolerance and high availability. The PRE consists of the forwarding path (FP) and the route processor (RP). The FP executes the packet-forwarding algorithm on every packet flowing through the router. The FP is based on the PXF network processor; each PXF network processor provides a packet-processing pipeline consisting of 16 processors.
Figure 12: Cisco 10000 ESR Forwarding Path Processor Array [6]
The RP runs the routing protocols, does update calculations, and handles other control-plane functions such as the SNMP agent and the command-line interface (CLI). Each of the 16 processors in the PXF network processor is optimized for packet processing. Each processor, called an eXpress Micro Controller (XMC), is designed to perform sophisticated packet-processing tasks efficiently. Within a single PXF network processor, the 16 XMCs are linked together in four parallel pipelines. Each pipeline comprises four microcontrollers arranged as a systolic array, where each processor can efficiently pass its result to its neighboring downstream processor. The four parallel pipelines further increase throughput. Within the Cisco 10000 ESR, two PXF network processor ASICs are used, yielding four parallel processing pipelines, each containing eight processors in a row (figure 12). In this array of processors, hardware, microcode, and Cisco IOS software resources are combined to provide advanced, high-touch feature processing on the Cisco 10000 ESR. The allocation of features is constantly changing, but one such allocation could be: Layer 2 analysis (level 1), FIB switching (level 2), additional features (levels 3, 4, 5, 7), MAC rewrite (level 6), enqueue/dequeue (level 8).
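The systolic arrangement, where each XMC hands its result to the processor downstream, can be modeled as a chain of stages applied in order (the stage names loosely follow the example allocation above; the model is our own sketch, not Cisco microcode, and real hardware overlaps many packets, one per stage per cycle):

```python
def make_pipeline(stages):
    """Compose per-stage functions so each stage passes its result to the
    next one downstream, as in a systolic array. Here one packet context
    flows all the way through for clarity."""
    def run(packet):
        for stage in stages:
            packet = stage(packet)
        return packet
    return run

# Hypothetical stage allocation following the text:
pipeline = make_pipeline([
    lambda p: {**p, "l2_ok": True},                 # Layer 2 analysis
    lambda p: {**p, "out_if": hash(p["dst"]) % 4},  # FIB switching (toy lookup)
    lambda p: {**p, "ttl": p["ttl"] - 1},           # additional features
    lambda p: {**p, "mac_rewritten": True},         # MAC rewrite
])
```

Because each stage only touches the context it receives and passes it on, stages can run concurrently on different packets, which is where the pipeline's throughput gain comes from.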
The RP also includes standard Cisco IOS facilities such as Flash memory and nonvolatile RAM for configuration storage.
8. Conclusion
The growth of the Internet has propelled the emergence of routers with forwarding rates in the gigabit and terabit range. This report has briefly presented the various approaches used for designing high-speed routers. We have seen that shared buses become a bottleneck in high-speed routers, necessitating the use of a switched backplane. High-speed routers must be robust and must have enough parallelism to support QoS, multicast, etc. Routing table lookup and data movement are the main performance bottlenecks.
To improve performance, critical functions are now performed in ASICs. Parallelism is being exploited through pipelining in the forwarding path. The cost of a router port depends on the type and size of memory at the port. Faster memory is expensive, but may be required to meet performance criteria. The buffer size should be large enough to avoid packet losses, or at least to contain them within reasonable limits. In crossbar switches, head-of-line blocking plays a major role in determining the effective switching speed; crossbar switches require techniques such as faster input/output channels or the allocation of lanes (input queues) for every output port.
Industry has developed its own architectural designs to improve performance. The use of CEF and tag switching is popular in all Cisco high-speed routers, and the use of PXF (pipelining) cuts the average time spent in the forwarding path. Extensive research is being carried out in industry to improve router performance to keep up with the exponential growth of the Internet. The field presents many unique and interesting challenges, on which the research community and industry continue to work.
References
[1] K. Sklower “A Tree-Based Packet Routing Table for Unix”, USENIX Winter’91, Dallas, TX,
1991.
[5] White Paper, “Parallel eXpress Forwarding in the Cisco 10000 Edge Service Router”, http://www.cisco.com
http://www.cisco.com/warp/public/cc/pd/rt/12000/tech/ruar_wp.htm
http://www.cisco.com/univercd/cc/td/doc/product/atm/c8540/12_0/13_19/trouble/l3_net.htm
[11] James Aweya, “IP Router Architectures: An Overview”, Nortel Networks, Ottawa, Canada,
K1Y 4H7
[12] White Paper, “The Evolution of high-end Router Architectures-Basic Scalability and
[13] Partridge, Carvey, Burgess, Castineyra, Clarke, Graham, Hathaway, Herman, King, Kohlami,
Ma, McAllen, Mendez, Milliken, Osterlind, Pettyjohn, Rokosz, Seeger, Sollins, Storch, Tober,
Troxel, Waitzman, Winterble, “A Fifty Gigabit Per Second IP Router”, BBN Technologies (a part
of GTE Corporation)
[14] Vibhavasu Vuppala, Lionel M. Ni, “Virtual Network Ports: An Inter-network Switching
[15] Newman, Minshall, Lyon, Hutson, “IP Switching and Gigabit Routers”, Ipsilon Networks
Inc.