COMMUNICATIONS

Packet Scheduling in Next-Generation Multiterabit Networks

Itamar Elhanany, Michael Kahane, and Dan Sadot
Ben-Gurion University

Multiterabit networks will require innovative queuing strategies and high-performance scheduling algorithms to meet the future packet-scheduling challenge.

The infrastructure required to govern Internet traffic volume, which doubles every six months, consists of two complementary elements: fast point-to-point links and high-capacity switches and routers. Dense wavelength division multiplexing (DWDM) technology, which permits transmission of several wavelengths over the same optical media, will enable optical point-to-point links to achieve an estimated 10 terabits per second by 2008. However, the rapid growth of Internet traffic coupled with the availability of fast optical links threatens to cause a bottleneck at the switches and routers.

SWITCH FABRICS
Switches and routers consist of line cards and a switch fabric. A network processor that interfaces to the physical data links, decodes incoming traffic, and applies traffic shaping, filtering, and other policy functions resides on each line card, which is associated with a port. Actual data transmission across ports occurs on the switch fabric, which includes a crosspoint, a scheduler, and buffers. The crosspoint is a configurable interconnecting element that dynamically establishes links between input and output ports. The scheduler configures the crosspoint, allows buffered data to traverse it, and repeatedly reconfigures the crosspoint for successive transmissions.

Switching nodes process network protocols that deploy fixed packet sizes, such as asynchronous transfer mode (ATM), more efficiently because fixed sizes simplify buffer and transmission management. The network processor typically segments variable-length packets, such as IP packets, into smaller, fixed-size data units that traverse the switch core, and it later reassembles them at the output ports for transmittal in their original format.

The length of the data units passing through the switch core directly affects switch architecture and performance. Larger data units relax the timing requirements imposed on the switching and scheduling mechanisms, while smaller units offer finer switching granularity. Designs commonly use 64-byte data units as a trade-off.

Optical versus electrical
A major debate exists on whether next-generation multiterabit routing scenarios should use optical or electrical switch fabrics. At first glance, all-optical switches are the straightforward solution because they enable high-speed multiterabit aggregation and concentration. However, emerging router functions, such as quality-of-service (QoS) provisioning, require that packets be buffered, typically at the network processors, until the scheduler grants transmission. Because dynamic optical buffering is impractical, delayed transmissions currently dictate using opto-electric and electro-optical conversions.

QoS classes
Accompanying the rapid pace of Internet traffic is the increasing popularity of multimedia applications sensitive to mean packet delay and jitter. To provide differentiated services, developers categorize incoming packets into QoS classes. For each QoS class, the router must not exceed the specified latency and jitter requirements.

For routers to operate under heavy traffic loads while supporting QoS, designers must implement smarter scheduling schemes, a difficult task given that routers usually configure the crosspoint and transmit data on a per-data-unit basis. The scheduling algorithm must make a new configuration decision with each incoming data unit. In an ATM switch, where the data unit is a 53-byte cell, the algorithm must issue a scheduling decision every 168 ns at 2.5-Gbit-per-second line rates. As 10-Gbit-per-second port rates become standard for high-end routers, the decision time reduces fourfold, to a mere 42 ns.
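As a back-of-the-envelope illustration (not part of the original column), the per-cell scheduling budget is simply the cell size divided by the line rate:

```python
def cell_time_ns(cell_bytes, line_rate_gbps):
    """Time budget for scheduling one fixed-size cell, in nanoseconds."""
    bits = cell_bytes * 8
    return bits / line_rate_gbps   # 1 Gbit/s carries exactly 1 bit per nanosecond

# 53-byte ATM cells: ~170 ns at 2.5 Gbit/s and ~42 ns at 10 Gbit/s, consistent
# with the figures quoted above (the small differences are rounding).
print(cell_time_ns(53, 2.5))    # 169.6
print(cell_time_ns(53, 10.0))   # 42.4
```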
QUEUING STRATEGIES
The switch fabric core embodies the crosspoint element responsible for matching N input and output ports. Currently, routers incorporate electrical crosspoints, but optical crosspoint solutions, such as those based on dynamic DWDM, show promise. Regardless of the underlying technology, the basic functionality of determining the crosspoint configuration and transmitting data remains the same.

Input queuing
We can generally categorize switch-queuing architectures as input- or output-queued. In input-queued switches, network processors store arriving packets in FIFO buffers that reside at the input, or ingress, port until the processor signals them to traverse the crosspoint.

A disadvantage of input queuing is that a packet at the front of a queue can prevent other packets from reaching potentially available destination, or egress, ports, a phenomenon called head-of-line (HOL) blocking. Consequently, the overall switching throughput degrades significantly: For uniformly distributed Bernoulli i.i.d. traffic flows, the maximum achievable throughput using input queuing is 58 percent of the switch core capacity.
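The 58 percent figure corresponds to the classical 2 − √2 ≈ 0.586 saturation-throughput limit for FIFO input queuing under uniform traffic. The toy simulation below, which is not from the column and only illustrates the effect, saturates every input of an N × N switch and measures how much traffic actually gets through when only the head-of-line cell of each FIFO can contend:

```python
import random
from collections import defaultdict

def hol_saturation_throughput(n_ports=32, slots=20000, seed=1):
    """Toy estimate of the saturation throughput of a FIFO input-queued switch
    suffering from head-of-line blocking (uniform traffic, always-full inputs)."""
    rng = random.Random(seed)
    # Each input always exposes a head-of-line cell destined to a random output.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        requests = defaultdict(list)          # output port -> contending input ports
        for inp, out in enumerate(hol):
            requests[out].append(inp)
        for out, inputs in requests.items():  # each output serves one HOL cell per slot
            winner = rng.choice(inputs)
            delivered += 1
            hol[winner] = rng.randrange(n_ports)  # winner reveals a fresh HOL cell
    return delivered / (n_ports * slots)

# Approaches 2 - sqrt(2) ~ 0.586 for large N; small switches land slightly higher.
print(hol_saturation_throughput())  # ~0.59 for N = 32
```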
Virtual output queuing (VOQ) entirely eliminates HOL blocking. As Figure 1 shows, every ingress port in VOQ maintains N separate queues, each associated with a different egress port. The network processor automatically classifies and stores packets upon arrival in the queue corresponding to their destination. VOQ thus assures that held-back packets do not block packets destined for available outputs. Assigning several prioritized queues instead of one queue to each egress port renders per-class QoS differentiation.

Figure 1. Router-switch architecture consisting of virtual output queuing buffering, crosspoint, and scheduling modules. The network processor automatically classifies packets arriving at the input ports to queues corresponding to their destination. The crosspoint scheduler receives transmission requests from the various queues and determines which queue is granted transmission at each time slot. Packets accordingly traverse the crosspoint to their designated output ports and depart the switch. (The diagram shows N port processors, each holding prioritized virtual output queues, an N × N crosspoint, and the crosspoint scheduler.)
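For illustration only (the column gives no code), a VOQ buffer at one ingress port can be modeled as an array of FIFOs indexed by egress port and QoS class; the class count and the enqueue/dequeue interface below are assumptions of the sketch:

```python
from collections import deque

class VOQIngressPort:
    """Virtual output queuing at a single ingress port: one FIFO per
    (egress port, QoS class), so a blocked destination never holds up
    traffic headed for other outputs."""

    def __init__(self, n_egress_ports, n_classes=4):
        self.queues = [[deque() for _ in range(n_classes)]
                       for _ in range(n_egress_ports)]

    def enqueue(self, packet, egress_port, qos_class):
        self.queues[egress_port][qos_class].append(packet)

    def requests(self):
        """Egress ports this ingress port would like to send to this time slot."""
        return [e for e, per_class in enumerate(self.queues) if any(per_class)]

    def dequeue(self, egress_port):
        """Serve the highest-priority nonempty class for the granted egress port."""
        for q in self.queues[egress_port]:   # class 0 = highest priority
            if q:
                return q.popleft()
        return None
```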
Output queuing
Output-queuing strategies directly transfer arriving packets to their designated egress ports. A contention-resolution mechanism handles cases in which two or more ingress ports simultaneously request packet transmission to the same egress port.

A time-division-multiplexing (TDM) solution accelerates the switch fabric's internal transmission rates by N with respect to the port bit rates. The growing prevalence of 10- and 40-Gbit-per-second rates, however, makes acceleration by N infeasible. Trading space for time by deploying space-division multiplexing (SDM) techniques requires a dedicated path within the switch fabric for each input-output pair. However, SDM implies having O(N²) internal paths, only N of which can be used at any given time because at most N input ports simultaneously transmit to N output ports. This requirement renders SDM impractical for large port densities such as N = 64.

SCHEDULING APPROACHES
The main challenge of packet scheduling is designing fast yet clever algorithms to determine input-output matches that, at any given time:

• maximize switch throughput utilization by matching as many input-output pairs as possible,
• minimize the mean packet delay as well as jitter,
• minimize packet loss resulting from buffer overflow, and
• support strict QoS requirements in accordance with diverse data classes.

Intuitively, these objectives appear contradictory. Temporarily maximizing input-output matches, for example, may not result in optimal bandwidth allocation in terms of QoS, and vice versa. Scheduling is clearly a delicate task of assigning ingress ports to egress ports while optimizing several performance parameters. Moreover, as port density and bit rates increase, the scheduling task becomes increasingly complex because more decisions must be made during shorter time frames. Advanced scheduling schemes exploit concurrency and distributed computation to offer a faster, more efficient decision process.
PIM and RRM
Commonly deployed scheduling algorithms derive from parallel iterative matching (PIM), an early discipline developed by the Digital Equipment Corp. for a 16-port, 1-Gbit-per-second switch (http://www.cs.washington.edu/homes/tom/atm.html). PIM and its popular derivatives use randomness to avoid starvation and maximize the matching process. Unmatched inputs and outputs contend during each time slot in a three-step process:

• Request. All unmatched inputs send requests to every output to which they have packets to send.
• Grant. Each output randomly selects one of its requesting inputs.
• Accept. Each input randomly selects a single output from among those outputs that granted it.

By eliminating the matched pairs in each iteration, PIM assures convergence to a maximal match in O(log N) iterations. It also ensures that all requests are eventually granted. However, PIM has significant drawbacks, principally large queuing latencies in the presence of traffic loads exceeding 60 percent of the maximal switch capacity (calculated as the number of ports multiplied by a port's data rate). Moreover, PIM's inability to provide prioritized QoS and its requirement for O(N²) connectivity make it impractical for modern switching cores.
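The following sketch is an illustration of one request-grant-accept round of PIM, not DEC's implementation; the boolean request matrix and matched-port sets are assumptions of the sketch. Repeating the round on the still-unmatched ports for a few iterations yields the maximal match described above.

```python
import random

def pim_iteration(requests, matched_in, matched_out, rng=random):
    """One request-grant-accept round of parallel iterative matching (PIM).

    requests[i][j] is True when unmatched input i has traffic for output j.
    matched_in / matched_out are sets of already matched ports; newly matched
    (input, output) pairs are returned and added to those sets.
    """
    n = len(requests)
    # Grant: each unmatched output randomly picks one requesting unmatched input.
    grants = {}  # input -> list of outputs that granted it
    for out in range(n):
        if out in matched_out:
            continue
        contenders = [i for i in range(n)
                      if i not in matched_in and requests[i][out]]
        if contenders:
            grants.setdefault(rng.choice(contenders), []).append(out)
    # Accept: each input randomly picks one of the outputs that granted it.
    new_pairs = []
    for inp, granting_outputs in grants.items():
        out = rng.choice(granting_outputs)
        matched_in.add(inp)
        matched_out.add(out)
        new_pairs.append((inp, out))
    return new_pairs
```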
Advanced scheduling acterizing emerging multimedia Internet
Developers designed the round-robin matching (RRM) algorithm to overcome PIM's disadvantages in terms of both fairness and complexity. Instead of arbitrating randomly, RRM makes selections based on a prescribed rotating-priority discipline. Two pointers update after every grant and accept. RRM is a minor improvement over PIM, but its overall performance remains poor under nonuniformly distributed traffic loads because of pointer synchronization.

iSLIP
iSLIP, the widely implemented iterative algorithm developed by Stanford University's Nick McKeown (http://www-ee.stanford.edu/~nickm), is a popular descendant of PIM and RRM that consists of the following steps:

• Request. All unmatched inputs send requests to every output to which they have packets to send.
• Grant. Each output selects a requesting input that coincides with a predefined priority sequence. A pointer indicates the current location of the highest-priority element and, if the grant is accepted, increments (modulo N) to one beyond the granted input.
• Accept. Each input selects one granting output according to a predefined priority order. A unique pointer indicates the position of the highest-priority element and increments (modulo N) to one location beyond the accepted output.

Instead of updating after every grant, as RRM does, the output's grant pointer updates only if an input accepts the grant. iSLIP significantly reduces pointer synchronization and accordingly increases throughput with a lower average packet delay. The algorithm does, however, suffer from degraded performance in the presence of nonuniform and bursty traffic flows, lack of inherent QoS support, and limited scalability with respect to high port densities. Despite its weaknesses, iSLIP's low implementation complexity promotes its extensive deployment side by side with various crosspoint switches.
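A minimal sketch of the rotating-pointer grant/accept logic appears below. It illustrates the idea rather than reproducing McKeown's reference implementation, and the data structures (request matrix, pointer arrays, matched-port sets) are assumptions of the sketch.

```python
def islip_iteration(requests, grant_ptr, accept_ptr, matched_in, matched_out):
    """One request-grant-accept round of an iSLIP-style scheduler.

    requests[i][j] is True when unmatched input i has a cell for output j.
    grant_ptr[j] / accept_ptr[i] hold each port's current highest-priority position.
    Note: full iSLIP advances the pointers only in the first iteration of a slot.
    """
    n = len(requests)
    # Grant phase: each unmatched output picks the requesting input closest to its pointer.
    grants = {}  # input -> list of granting outputs
    for out in range(n):
        if out in matched_out:
            continue
        for k in range(n):
            inp = (grant_ptr[out] + k) % n
            if inp not in matched_in and requests[inp][out]:
                grants.setdefault(inp, []).append(out)
                break
    # Accept phase: each input picks the granting output closest to its pointer.
    new_pairs = []
    for inp, granting in grants.items():
        for k in range(n):
            out = (accept_ptr[inp] + k) % n
            if out in granting:
                matched_in.add(inp)
                matched_out.add(out)
                new_pairs.append((inp, out))
                # Pointers move one beyond the matched ports only on acceptance.
                grant_ptr[out] = (inp + 1) % n
                accept_ptr[inp] = (out + 1) % n
                break
    return new_pairs
```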
Central prioritized scheduling
The independent pointers of request-grant-accept-based algorithms such as RRM and iSLIP tend to synchronize, degrading overall performance. An alternative approach, based on a centralized scheduling module, replaces the pointer mechanisms with a priority-generating function at each ingress port. During each time slot, the central module gathers prioritized requests from the input ports and produces matching decisions using global contention logic. Although more difficult to implement, this scheme yields lower packet delay at the expense of longer switching intervals. It also introduces lower connectivity, global contention resolution, and inherent QoS support.
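As one possible reading of this idea (an illustrative sketch only; the column specifies neither the priority function nor the contention logic), a central arbiter can collect a per-slot priority value from every ingress VOQ and resolve contention in favor of the globally highest-priority requests:

```python
def central_prioritized_match(priorities):
    """Toy centralized arbiter.

    priorities[i][j] is the priority value ingress port i advertises for
    egress port j this time slot (None means no request). The arbiter serves
    the globally highest-priority requests first while keeping the match
    one-to-one, so higher QoS classes win contention.
    """
    requests = [(prio, inp, out)
                for inp, row in enumerate(priorities)
                for out, prio in enumerate(row) if prio is not None]
    requests.sort(reverse=True)           # highest priority first
    used_in, used_out, match = set(), set(), {}
    for prio, inp, out in requests:
        if inp not in used_in and out not in used_out:
            match[inp] = out
            used_in.add(inp)
            used_out.add(out)
    return match                          # ingress port -> granted egress port
```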
Multiterabit packet-switched networks will require high-performance scheduling algorithms and architectures. With port densities and data rates growing at an unprecedented rate, future prioritized scheduling schemes will be necessary to pragmatically scale toward multiterabit capacities. Further, support of strict QoS requirements for the diverse traffic loads characterizing emerging multimedia Internet traffic will increase. Continuous improvements in VLSI and optical technologies will stimulate innovative solutions to the intricate packet-scheduling task.

Itamar Elhanany is a PhD student in the Computer and Electrical Engineering Department at Ben-Gurion University. Contact him at itamar@ieee.org.

Michael Kahane is an MSc student in the Computer and Electrical Engineering Department at Ben-Gurion University. Contact him at michaelk@bgumail.bgu.ac.il.

Dan Sadot is professor and head of the Communications Systems Engineering Department at Ben-Gurion University. Contact him at sadot@ee.bgu.ac.il.

Editor: Upkar Varshney, Department of CIS, Georgia State University, Atlanta, GA 30002-4015; voice +1 404 463 9139; fax +1 404 651 3842; uvarshney@gsu.edu
