Professional Documents
Culture Documents
Network Systems Design (CS490N) : Douglas Comer Computer Science Department Purdue University West Lafayette, IN 47907
Network Systems Design (CS490N) : Douglas Comer Computer Science Department Purdue University West Lafayette, IN 47907
(CS490N)
Douglas Comer
Computer Science Department
Purdue University
West Lafayette, IN 47907
http://www.cs.purdue.edu/people/comer
I
Course Introduction
And Overview
CS490N -- Chapt. 1
2003
CS490N -- Chapt. 1
2003
Network systems
Concept
Software implementation
Special languages
Switching fabrics
CS490N -- Chapt. 1
2003
Cross-development environment
CS490N -- Chapt. 1
2003
Economic details
Price points
CS490N -- Chapt. 1
2003
Background Required
Basic knowledge of
Packet headers
Registers
Memory organization
CS490N -- Chapt. 1
2003
Schedule Of Topics
Quick review of basic networking
Protocol processing tasks and classification
Software-based systems using conventional hardware
Special-purpose hardware for high speed
Motivation and role of network processors
Network processor architectures
CS490N -- Chapt. 1
2003
Schedule Of Topics
(continued)
An example network processor technology in detail
Programming model
Design tradeoffs
Scaling a network processor
Classification languages and programs
CS490N -- Chapt. 1
2003
Course Administration
Textbook
Grade
Quizzes 5%
CS490N -- Chapt. 1
2003
Agere Systems
IBM
Intel
CS490N -- Chapt. 1
10
2003
CS490N -- Chapt. 1
11
2003
Programming Projects
A packet analyzer
IP datagrams
TCP segments
An Ethernet bridge
An IP fragmenter
A classification program
A bump-in-the-wire system using low-level packet
processors
CS490N -- Chapt. 1
12
2003
Questions?
A QUICK OVERVIEW
OF NETWORK PROCESSORS
CS490N -- Chapt. 1
14
2003
CS490N -- Chapt. 1
15
2003
The Challenge
CS490N -- Chapt. 1
16
2003
What speeds?
CS490N -- Chapt. 1
17
2003
What applications?
CS490N -- Chapt. 1
18
2003
The Challenge
(restated)
CS490N -- Chapt. 1
19
2003
Special Difficulties
Ambitious goal
Vague problem statement
Problem is evolving with the solution
Pressure from
Changing infrastructure
Changing applications
CS490N -- Chapt. 1
20
2003
Desiderata
High speed
Flexible and extensible to accommodate
Arbitrary protocols
Arbitrary applications
Low cost
CS490N -- Chapt. 1
21
2003
Desiderata
High speed
Flexible and extensible to accommodate
Arbitrary protocols
Arbitrary applications
Low cost
CS490N -- Chapt. 1
21
2003
Statement Of Hope
(1995 version)
CS490N -- Chapt. 1
22
2003
Statement Of Hope
(1999 version)
???
CS490N -- Chapt. 1
22
2003
Statement Of Hope
(2003 version)
programmers!
If there is hope, it lies in ASIC designers.
CS490N -- Chapt. 1
22
2003
Programmability
Key to low-cost hardware for next generation network
systems
More flexibility than ASIC designs
Easier / faster to update than ASIC designs
Less expensive to develop than ASIC designs
What we need: a programmable device with more capability
than a conventional CPU
CS490N -- Chapt. 1
23
2003
CS490N -- Chapt. 1
24
2003
CS490N -- Chapt. 1
25
2003
Disclaimer #1
CS490N -- Chapt. 1
26
2003
Definition
CS490N -- Chapt. 1
27
2003
By Definition
CS490N -- Chapt. 1
28
2003
In Our Defense
CS490N -- Chapt. 1
29
2003
Questions?
II
Basic Terminology And Example Systems
(A Quick Review)
CS490N -- Chapt. 2
2003
Generic term
Travels independently
CS490N -- Chapt. 2
2003
IP datagram
Internet packet
CS490N -- Chapt. 2
2003
Types Of Networks
Paradigm
Connectionless
Connection-oriented
Access type
Point-To-Point
CS490N -- Chapt. 2
2003
Point-To-Point Network
Connects exactly two systems
Often used for long distance
Example: data circuit connecting two routers
CS490N -- Chapt. 2
2003
Data Circuit
Leased from phone company
Also called serial line because data is transmitted bitserially
Originally designed to carry digital voice
Cost depends on speed and distance
T-series standards define low speeds (e.g. T1)
STS and OC standards define high speeds
CS490N -- Chapt. 2
2003
T1
T3
OC-1
OC-3
OC-12
OC-24
OC-48
OC-192
OC-768
CS490N -- Chapt. 2
Bit Rate
Voice Circuits
0.064 Mbps
1.544 Mbps
44.736 Mbps
51.840 Mbps
155.520 Mbps
622.080 Mbps
1,244.160 Mbps
2,488.320 Mbps
9,953.280 Mbps
39,813.120 Mbps
1
24
672
810
2430
9720
19440
38880
155520
622080
2003
T1
T3
OC-1
OC-3
OC-12
OC-24
OC-48
OC-192
OC-768
Bit Rate
Voice Circuits
0.064 Mbps
1.544 Mbps
44.736 Mbps
51.840 Mbps
155.520 Mbps
622.080 Mbps
1,244.160 Mbps
2,488.320 Mbps
9,953.280 Mbps
39,813.120 Mbps
1
24
672
810
2430
9720
19440
38880
155520
622080
CS490N -- Chapt. 2
2003
Signaling
Unimportant to us
Layer 2 standards
CS490N -- Chapt. 2
2003
MAC Addressing
Three address types
CS490N -- Chapt. 2
2003
More Terminology
Internet
Network scope
CS490N -- Chapt. 2
10
2003
Network System
Individual hardware component
Serves as fundamental building block
Used in networks and internets
May contain processor and software
Operates at one or more layers of the protocol stack
CS490N -- Chapt. 2
11
2003
Bridge
Ethernet switch
VLAN switch
CS490N -- Chapt. 2
12
2003
VLAN Switch
Similar to conventional layer 2 switch
CS490N -- Chapt. 2
13
2003
Broadcast Domain
Determines propagation of broadcast / multicast
Originally corresponded to fixed hardware
CS490N -- Chapt. 2
14
2003
Layer 4
TCP terminator
CS490N -- Chapt. 2
15
2003
Firewall
Application gateway
Set-top box
CS490N -- Chapt. 2
16
2003
Traffic monitor
Traffic policer
Traffic shaper
CS490N -- Chapt. 2
17
2003
Questions?
III
Review Of Protocols And Packet Formats
CS490N -- Chapt. 3
2003
Protocol Layering
Application
Layer 5
Transport
Layer 4
Internet
Layer 3
Network Interface
Layer 2
Physical
Layer 1
CS490N -- Chapt. 3
2003
Layer 2 Protocols
Two protocols are important
Ethernet
ATM
CS490N -- Chapt. 3
2003
Ethernet Addressing
48-bit addressing
Unique address assigned to each station (NIC)
Destination address in each packet can specify delivery to
CS490N -- Chapt. 3
2003
Ethernet Addressing
(continued)
Broadcast address is all 1s
Single bit determines whether remaining addresses are
unicast or multicast
multicast bit
x x x x x x xm x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
CS490N -- Chapt. 3
2003
Source Frame
Address Type
Data In Frame
46 - 1500
Header
Payload
CS490N -- Chapt. 3
2003
Internet
Set of (heterogeneous) computer networks interconnected by
IP routers
End-user computers, called hosts, each attach to specific
network
Protocol software
CS490N -- Chapt. 3
2003
Layer 3
Layer 4
CS490N -- Chapt. 3
2003
IP Datagram Format
0
4
VERS
8
HLEN
16
SERVICE
24
31
TOTAL LENGTH
ID
TTL
19
FLAGS
TYPE
F. OFFSET
HDR CHECKSUM
SOURCE
DESTINATION
IP OPTIONS (MAY BE OMITTED)
PADDING
BEGINNING OF PAYLOAD
..
.
CS490N -- Chapt. 3
2003
IP Datagram Fields
Field
Meaning
VERS
HLEN
SERVICE
TOTAL LENGTH
ID
FLAGS
F. OFFSET
TTL
TYPE
HDR CHECKSUM
SOURCE
DESTINATION
IP OPTIONS
PADDING
CS490N -- Chapt. 3
10
2003
IP addressing
32-bit Internet address assigned to each computer
Virtual, hardware independent value
Prefix identifies network; suffix identifies host
Network systems use address mask to specify boundary
between prefix and suffix
CS490N -- Chapt. 3
11
2003
Next-Hop Forwarding
Routing table
Route lookup
CS490N -- Chapt. 3
12
2003
Next-Hop Forwarding
Routing table
Route lookup
CS490N -- Chapt. 3
12
2003
16
31
SOURCE PORT
DESTINATION PORT
MESSAGE LENGTH
CHECKSUM
BEGINNING OF PAYLOAD
.
.
.
Field
Meaning
SOURCE PORT
DESTINATION PORT
MESSAGE LENGTH
CHECKSUM
ID of sending application
ID of receiving application
Length of datagram including the header
Ones-complement checksum over entire datagram
CS490N -- Chapt. 3
13
2003
10
16
SOURCE PORT
24
31
DESTINATION PORT
SEQUENCE
ACKNOWLEDGEMENT
HLEN
NOT USED
CODE BITS
WINDOW
CHECKSUM
URGENT PTR
PADDING
BEGINNING OF PAYLOAD
..
.
Sent end-to-end
Fixed-size fields make parsing efficient
CS490N -- Chapt. 3
14
2003
Meaning
SOURCE PORT
DESTINATION PORT
SEQUENCE
ACKNOWLEDGEMENT
HLEN
NOT USED
CODE BITS
WINDOW
CHECKSUM
URGENT PTR
OPTIONS
PADDING
ID of sending application
ID of receiving application
Sequence number for data in payload
Acknowledgement of data received
Header length measured in 32-bit units
Currently unassigned
URGENT, ACK, PUSH, RESET, SYN, FIN
Receivers buffer size for additional data
Ones-complement checksum over entire segment
Pointer to urgent data in segment
Special handling
To make options a 32-bit multiple
CS490N -- Chapt. 3
15
2003
Illustration Of Encapsulation
UDP HEADER
IP HEADER
ETHERNET HDR.
UDP PAYLOAD
IP PAYLOAD
ETHERNET PAYLOAD
CS490N -- Chapt. 3
16
2003
16
24
31
OPERATION
CS490N -- Chapt. 3
17
2003
End Of Review
Questions?
IV
Conventional Computer Hardware Architecture
CS490N -- Chapt. 4
2003
Allocates memory
CS490N -- Chapt. 4
2003
Present
CS490N -- Chapt. 4
2003
CS490N -- Chapt. 4
2003
CS490N -- Chapt. 4
2003
Serious Question
Which is growing faster?
Processing power
Network bandwidth
CS490N -- Chapt. 4
2003
Growth Of Technologies
10 Gbps
OC-192
2.4 Gbps
OC-48
10,000
1,000
100
10
.. . .. .. .. .
. .. .. . . .. ..
.........
...
...........
.
.
.
.
.
....
.
.
.
..
....
....
.
.
.
.
.
.
.
.
622 Mbps
....
....
.
.
.
.
.
.
. Pent.-3GHz
OC-12 .....
....
.
.
.
.
.
.
.
.
....
....
.
.
.
.
.
.
.
.
.
...
....
.. ...
.
.
.
.
.
.
.
.
.
.
..
..
.. ... ...
.. ...
100 Mbps
.
.
.
.
.
.
.
.
.
.
.
Pent.-400
...
. ..
FDDI
.. ... ...
. .. ..
.
.
.
.
.
.
.
.
.
.
.
.
..
... .
. .. .. . . ..
.. ...
.
.
.
.
.
.
.
.
.
.
.
Pent.-166
.
. .. .
...
. . . ... .. .
.
.
.
.
.
.
.
.
.
.
.
...
486-33
...............
.
.
.
.
.
.
.
486-66
......... .....
.
.
...
.
.
.
...
.
.
..
10 Mpbs
Ethernet
1990
CS490N -- Chapt. 4
1992
1994
1996
6
1998
2000
2002
2003
Processor
Memory
I/O interfaces
CS490N -- Chapt. 4
2003
Illustration Of Conventional
Computer Architecture
CPU
MEMORY
bus
. . .
CS490N -- Chapt. 4
2003
control lines
...
address lines
...
data lines
An address of K bits
CS490N -- Chapt. 4
2003
Bus Width
Wider bus
Costs more
CS490N -- Chapt. 4
2003
Bus Paradigm
Only two basic operations
Fetch
Store
CS490N -- Chapt. 4
10
2003
Fetch/Store
Fundamental paradigm
Used throughout hardware, including network processors
CS490N -- Chapt. 4
11
2003
Fetch Operation
Place address of a device on address lines
Issue fetch on control lines
Wait for device that owns the address to respond
If successful, extract value (response) from data lines
CS490N -- Chapt. 4
12
2003
Store Operation
Place address of a device on address lines
Place value on data lines
Issue store on control lines
Wait for device that owns the address to respond
If unsuccessful, report error
CS490N -- Chapt. 4
13
2003
Stop disk
CS490N -- Chapt. 4
14
2003
Returns an error
CS490N -- Chapt. 4
15
2003
I/O device
CS490N -- Chapt. 4
16
2003
CS490N -- Chapt. 4
17
2003
memory
CS490N -- Chapt. 4
18
2003
Network I/O On
Conventional Hardware
Network Interface Card (NIC)
CS490N -- Chapt. 4
19
2003
CS490N -- Chapt. 4
20
2003
Discard others
CS490N -- Chapt. 4
21
2003
CS490N -- Chapt. 4
22
2003
NIC
CS490N -- Chapt. 4
23
2003
Buffer Chaining
CPU
NIC
CS490N -- Chapt. 4
24
2003
Operation Chaining
CPU
NIC
CS490N -- Chapt. 4
25
2003
Illustration Of
Operation Chaining
r
packet buffer
packet buffer
packet buffer
CS490N -- Chapt. 4
26
2003
NIC
memory
data leaves
CS490N -- Chapt. 4
27
2003
Summary
Software-based network systems run on conventional
hardware
Processor
Memory
I/O devices
Bus
CS490N -- Chapt. 4
28
2003
Questions?
V
Basic Packet Processing:
Algorithms And Data Structures
CS490N -- Chapt. 5
2003
Copying
Used when packet moved from one memory location to
another
Expensive
Must be avoided whenever possible
CS490N -- Chapt. 5
2003
Buffer Allocation
Possibilities
Variable-size buffers
CS490N -- Chapt. 5
2003
Buffer Addressing
Buffer address must be resolvable in all contexts
Easiest implementation: keep buffers in kernel space
CS490N -- Chapt. 5
2003
Integer Representation
Two standards
CS490N -- Chapt. 5
2003
little endian
big endian
CS490N -- Chapt. 5
2003
Integer Conversion
Needed when heterogeneous computers communicate
Protocols define network byte order
Computers convert to network byte order
Typical library functions
Function
ntohs
htons
ntohl
htonl
CS490N -- Chapt. 5
data size
16 bits
16 bits
32 bits
32 bits
Translation
Network byte order to hosts byte order
Hosts byte order to network byte order
Network byte order to hosts byte order
Hosts byte order to network byte order
2003
Ethernet bridge
Layer 3
IP forwarding
Layer 4
Other
CS490N -- Chapt. 5
2003
CS490N -- Chapt. 5
2003
CS490N -- Chapt. 5
2003
CS490N -- Chapt. 5
2003
CS490N -- Chapt. 5
2003
CS490N -- Chapt. 5
2003
Ethernet Bridge
Used between a pair of Ethernets
Provides transparent connection
Listens in promiscuous mode
Forwards frames in both directions
Uses addresses to filter
CS490N -- Chapt. 5
2003
Bridge Filtering
Uses source address in frames to identify computers on each
network
Uses destination address to decide whether to forward frame
CS490N -- Chapt. 5
10
2003
Bridge Algorithm
Assume: two network interfaces each operating in promiscuous
mode.
Create an empty list, L, that will contain pairs of values;
Do forever {
Acquire the next frame to arrive;
Set I to the interface over which the frame arrived;
Extract the source address, S;
Extract the destination address, D;
Add the pair ( S, I ) to list L if not already present.
If the pair ( D, I ) appears in list L {
Drop the frame;
} Else {
Forward the frame over the other interface;
}
}
CS490N -- Chapt. 5
11
2003
CS490N -- Chapt. 5
12
2003
Hashing
Optimizes number of probes
Works well if table not full
Practical technique: double hashing
CS490N -- Chapt. 5
13
2003
Hashing Algorithm
Given: a key, a table in memory, and the table size N.
Produce: a slot in the table that corresponds to the key
or an empty table slot if the key is not in the table.
Method: double hashing with open addressing.
Choose P1 and P2 to be prime numbers;
Fold the key to produce an integer, K;
Compute table pointer Q equal to ( P1 K ) modulo N;
Compute increment R equal to ( P2 K ) modulo N;
While (table slot Q not equal to K and nonempty ) {
Q (Q + R) modulo N;
}
At this point, Q either points to an empty table slot or to the
slot containing the key.
CS490N -- Chapt. 5
14
2003
Address Lookup
Computer can compare integer in one operation
Network address can be longer than integer (e.g., 48 bits)
Two possibilities
CS490N -- Chapt. 5
15
2003
Folding
Maps N-bit value into M-bit key, M < N
Typical technique: exclusive or
Potential problem: two values map to same key
Solution: compare full value when key matches
CS490N -- Chapt. 5
16
2003
IP Forwarding
Used in hosts as well as routers
Conceptual mapping
(next hop, interface) f(datagram, routing table)
Table driven
CS490N -- Chapt. 5
17
2003
IP Routing Table
One entry per destination
Entry contains
CS490N -- Chapt. 5
18
2003
Address
Mask
Next-Hop
Address
Interface
Number
255.255.255.0
255.255.0.0
0.0.0.0
128.210.30.5
128.210.141.12
128.210.30.5
2
1
2
CS490N -- Chapt. 5
19
2003
IP Forwarding Algorithm
Given: destination address A and routing table R.
Find: a next hop and interface used to route datagrams to A.
For each entry in table R {
Set MASK to the Address Mask in the entry;
Set DEST to the Destination Address in the entry;
If (A & MASK) == DEST {
Stop; use the next hop and interface in the entry;
}
}
If this point is reached, declare error: no route exists;
CS490N -- Chapt. 5
20
2003
IP Fragmentation
Needed when datagram larger than network MTU
Divides IP datagram into fragments
Uses FLAGS bits in datagram header
0
FLAGS bits
CS490N -- Chapt. 5
21
2003
IP Fragmentation Algorithm
(Part 1: Initialization)
Given: an IP datagram, D, and a network MTU.
Produce: a set of fragments for D.
If the DO NOT FRAGMENT bit is set {
Stop and report an error;
}
Compute the size of the datagram header, H;
Choose N to be the largest multiple of 8 such
that H+N MTU;
Initialize an offset counter, O, to zero;
CS490N -- Chapt. 5
22
2003
IP Fragmentation Algorithm
(Part 2: Processing)
Repeat until datagram empty {
Create a new fragment that has a copy of Ds header;
Extract up to the next N octets of data from D and place
the data in the fragment;
Set the MORE FRAGMENTS bit in fragment header;
Set TOTAL LENGTH field in fragment header to be H+N;
Set FRAGMENT OFFSET field in fragment header to O;
Compute and set the CHECKSUM field in fragment
header;
Increment O by N/8;
}
CS490N -- Chapt. 5
23
2003
Reassembly
Complement of fragmentation
Uses IP SOURCE ADDRESS and IDENTIFICATION fields
in datagram header to group related fragments
Joins fragments to form original datagram
CS490N -- Chapt. 5
24
2003
Reassembly Algorithm
Given: a fragment, F, add to a partial reassembly.
Method: maintain a set of fragments for each datagram.
Extract the IP source address, S, and ID fields from F;
Combine S and ID to produce a lookup key, K;
Find the fragment set with key K or create a new set;
Insert F into the set;
If the set contains all the data for the datagram {
Form a completely reassembled datagram and process it;
}
CS490N -- Chapt. 5
25
2003
80
fragment in
reassembly buffer
reassembly buffer
CS490N -- Chapt. 5
40
26
2003
TCP Connection
Involves a pair of endpoints
Started with SYN segment
Terminated with FIN or RESET segment
Identified by 4-tuple
( src addr, dest addr, src port, dest port )
CS490N -- Chapt. 5
27
2003
CS490N -- Chapt. 5
28
2003
CS490N -- Chapt. 5
29
2003
TCP Splicing
Join two TCP connections
Allow data to pass between them
To avoid termination overhead translate segment header
fields
Acknowledgement number
Sequence number
CS490N -- Chapt. 5
30
2003
Host
A
TCP connection #1
sequence 200
CS490N -- Chapt. 5
splicer
sequence 50
TCP connection #2
sequence 860
sequence 1200
Connection
& Direction
Sequence
Number
Connection
& Direction
Sequence
Number
Incoming #1
Outgoing #2
200
860
Incoming #2
Outgoing #1
1200
50
Change
660
Change
-1150
31
Host
B
2003
CS490N -- Chapt. 5
32
2003
CS490N -- Chapt. 5
33
2003
Summary
Packet processing algorithms include
Ethernet bridging
IP forwarding
TCP splicing
CS490N -- Chapt. 5
34
2003
Questions?
CS490N -- Chapt. 5
36
2003
VI
Packet Processing Functions
CS490N -- Chapt. 6
2003
Goal
Identify functions that occur in packet processing
Devise set of operations sufficient for all packet processing
Find an efficient implementation for the operations
CS490N -- Chapt. 6
2003
CS490N -- Chapt. 6
2003
CS490N -- Chapt. 6
2003
Checksum
CRC
CS490N -- Chapt. 6
2003
CS490N -- Chapt. 6
2003
CS490N -- Chapt. 6
2003
Assigned on output
CS490N -- Chapt. 6
2003
Packet Classification
Alternative to demultiplexing
Crosses multiple layers
Achieves lower cost
More on classification later in the course
CS490N -- Chapt. 6
2003
CS490N -- Chapt. 6
10
2003
Queueing Priorities
Multiple queues used to enforce priority among packets
Incoming packet
Queueing discipline
CS490N -- Chapt. 6
11
2003
CS490N -- Chapt. 6
12
2003
CS490N -- Chapt. 6
13
2003
CS490N -- Chapt. 6
14
2003
Multiple processors
CS490N -- Chapt. 6
15
2003
Confidentiality mechanisms
CS490N -- Chapt. 6
16
2003
CS490N -- Chapt. 6
17
2003
Traffic Shaping
Make traffic conform to statistical bounds
Typical use
Smooth bursts
Only possibilities
Delay packets
CS490N -- Chapt. 6
18
2003
Easy to implement
Popular
CS490N -- Chapt. 6
19
2003
CS490N -- Chapt. 6
20
2003
packets
leave
Packets
Arrive in bursts
CS490N -- Chapt. 6
21
2003
Timer Management
Fundamental piece of network system
Needed for
Scheduling
Traffic shaping
Cost
Can be high
CS490N -- Chapt. 6
22
2003
Summary
Primary packet processing functions are
Security functions
CS490N -- Chapt. 6
23
2003
Questions?
VII
Protocol Software On A
Conventional Processor
CS490N -- Chapt. 7
2003
Possible Implementations Of
Protocol Software
In an application program
Easy to program
CS490N -- Chapt. 7
2003
Possible Implementations Of
Protocol Software
(continued)
In an embedded system
CS490N -- Chapt. 7
2003
Possible Implementations Of
Protocol Software
(continued)
In an embedded system
CS490N -- Chapt. 7
2003
Possible Implementations Of
Protocol Software
(continued)
In an operating system kernel
CS490N -- Chapt. 7
2003
Asynchronous
Synchronous
Blocking
Polling
CS490N -- Chapt. 7
2003
Asynchronous API
Also known as event-driven
Programmer
CS490N -- Chapt. 7
2003
CS490N -- Chapt. 7
2003
CS490N -- Chapt. 7
2003
Embedded systems
Operating systems
Asynchronous API
CS490N -- Chapt. 7
2003
Includes
I/O functions
CS490N -- Chapt. 7
10
2003
on_startup()
on_shutdown()
recv_frame()
Output functions
new_fbuf()
send_frame()
CS490N -- Chapt. 7
11
2003
delayed_call()
periodic_call()
cancel_call()
console_command()
CS490N -- Chapt. 7
12
2003
Processing Priorities
Determine which code CPU runs at any time
General idea
CS490N -- Chapt. 7
13
2003
Illustration Of Priorities
Applications
lowest priority
protocol
processing
medium priority
device drivers
handling frames
highest priority
packet queue
between levels
NIC1
CS490N -- Chapt. 7
NIC2
14
2003
Implementation Of Priorities
In An Operating System
Two possible approaches
Interrupt mechanism
Kernel threads
CS490N -- Chapt. 7
15
2003
Interrupt Mechanism
Built into hardware
Operates asynchronously
Saves current processing state
Changes processor status
Branches to specified location
CS490N -- Chapt. 7
16
2003
Software interrupt
CS490N -- Chapt. 7
17
2003
Hardware interrupts
CS490N -- Chapt. 7
18
2003
Kernel Threads
Alternative to interrupts
Familiar to programmer
Finer-grain control than software interrupts
Can be assigned arbitrary range of priorities
CS490N -- Chapt. 7
19
2003
Conceptual Organization
Packet passes among multiple threads of control
Queue of packets between each pair of threads
Threads synchronize to access queues
CS490N -- Chapt. 7
20
2003
Possible Organization Of
Kernel Threads For Layered Protocols
One thread per layer
One thread per protocol
Multiple threads per protocol
Multiple threads per protocol plus timer management
thread(s)
One thread per packet
CS490N -- Chapt. 7
21
2003
CS490N -- Chapt. 7
22
2003
applications
app. sends
app. receives
queue
T4
Layer 4
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
queue
T3
Layer 3
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
queue
T2
packets arrive
CS490N -- Chapt. 7
Layer 2
packets leave
23
2003
Finer-grain control
CS490N -- Chapt. 7
24
2003
queue
queue
udp
tcp
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
CS490N -- Chapt. 7
25
2003
CS490N -- Chapt. 7
26
2003
Illustration Of Multiple
Threads Used With TCP
applications
queue
tcp
timer thread
tim.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
CS490N -- Chapt. 7
27
2003
TCP
*
Retransmission timeout
2MSL timeout
ARP
*
IP
*
CS490N -- Chapt. 7
Reassembly timeout
28
2003
CS490N -- Chapt. 7
29
2003
Yes
In practice
CS490N -- Chapt. 7
30
2003
Small-granularity timer
CS490N -- Chapt. 7
31
2003
Thread Synchronization
Thread for layer i
CS490N -- Chapt. 7
32
2003
Thread Synchronization
Thread for layer i
CS490N -- Chapt. 7
32
2003
Context Switch
OS function
CPU passes from current thread to a waiting thread
High cost
Must be minimized
CS490N -- Chapt. 7
33
2003
CS490N -- Chapt. 7
34
2003
Summary
Packet processing software usually runs in OS
API can be synchronous or asynchronous
Priorities achieved with
Software interrupts
Threads
CS490N -- Chapt. 7
35
2003
Questions?
VIII
Hardware Architectures
For Protocol Processing
And
Aggregate Rates
CS490N -- Chapt. 8
2003
A Brief History Of
Computer Hardware
1940s
Beginnings
1950s
1960s
CS490N -- Chapt. 8
2003
I/O Processing
Evolved from after-thought to central influence
Low-end systems (e.g., microcontrollers)
CS490N -- Chapt. 8
2003
I/O Processing
(continued)
Mid-range systems (e.g., minicomputers)
CPU
*
Device
*
CS490N -- Chapt. 8
2003
I/O Processing
(continued)
High-end systems (e.g., mainframes)
CS490N -- Chapt. 8
2003
CS490N -- Chapt. 8
2003
CS490N -- Chapt. 8
2003
Protocol Processing In
First Generation Network Systems
NIC1
Standard CPU
NIC2
framing &
address
recognition
all other
processing
framing &
address
recognition
CS490N -- Chapt. 8
2003
CS490N -- Chapt. 8
2003
Aggregate rate
Aggregate rate
CS490N -- Chapt. 8
10
2003
1960: 10 Kbps
1970: 1 Mbps
1980: 10 Mbps
CS490N -- Chapt. 8
11
2003
1960: 10 Kbps
1970: 1 Mbps
1980: 10 Mbps
Soon: 10 Gbps???
CS490N -- Chapt. 8
12
2003
Aggregate
CS490N -- Chapt. 8
12
2003
CS490N -- Chapt. 8
13
2003
Per-bit processing
Per-packet processing
CS490N -- Chapt. 8
14
2003
For protocol processing tasks that have a fixed cost per packet,
the number of packets processed is more important than the
aggregate data rate.
CS490N -- Chapt. 8
15
2003
Network
Data Rate
In Gbps
Packet Rate
For Small Packets
In Kpps
Packet Rate
For Large Packets
In Kpps
10Base-T
100Base-T
OC-3
OC-12
1000Base-T
OC-48
OC-192
OC-768
0.010
0.100
0.156
0.622
1.000
2.488
9.953
39.813
19.5
195.3
303.8
1,214.8
1,953.1
4,860.0
19,440.0
77,760.0
0.8
8.2
12.8
51.2
82.3
204.9
819.6
3,278.4
CS490N -- Chapt. 8
16
2003
77760.0
19440.0
104 Kpps
4860.0
1953.1
1214.8
103 Kpps
195.3
303.8
102 Kpps
19.5
101 Kpps
100 Kpps
10Base-T 100Base-T
CS490N -- Chapt. 8
OC-3
17
OC-192
OC-768
2003
77760.0
19440.0
104 Kpps
4860.0
1953.1
1214.8
103 Kpps
195.3
303.8
102 Kpps
19.5
101 Kpps
100 Kpps
10Base-T 100Base-T
OC-3
OC-192
OC-768
17
2003
10Base-T
100Base-T
OC-3
OC-12
1000Base-T
OC-48
OC-192
OC-768
51.20
5.12
3.29
0.82
0.51
0.21
0.05
0.01
1,214.40
121.44
78.09
19.52
12.14
4.88
1.22
0.31
CS490N -- Chapt. 8
18
2003
Conclusion
CS490N -- Chapt. 8
19
2003
CS490N -- Chapt. 8
20
2003
Fine-Grain Parallelism
Multiple processors
Instruction-level parallelism
Example:
CS490N -- Chapt. 8
21
2003
CS490N -- Chapt. 8
22
2003
Assessment
CS490N -- Chapt. 8
23
2003
Special-Purpose Coprocessors
Special-purpose hardware
Added to conventional processor to speed computation
Invoked like software subroutine
Typical implementation: ASIC chip
Choose operations that yield greatest improvement in speed
CS490N -- Chapt. 8
24
2003
General Principle
To optimize computation, move operations that account for the
most CPU time from software into hardware.
CS490N -- Chapt. 8
25
2003
General Principle
To optimize computation, move operations that account for the
most CPU time from software into hardware.
CS490N -- Chapt. 8
25
2003
Onboard buffering
DMA
CS490N -- Chapt. 8
26
2003
Effectively a multiprocessor
CS490N -- Chapt. 8
27
2003
Standard CPU
Smart NIC2
all other
processing
CS490N -- Chapt. 8
28
2003
Cell Switching
Alternative to new hardware
Changes
Basic paradigm
Connection-oriented
CS490N -- Chapt. 8
29
2003
Example: ATM
CS490N -- Chapt. 8
30
2003
Data Pipeline
Move each packet through series of processors
Each processor handles some tasks
Assessment
CS490N -- Chapt. 8
31
2003
stage 2
stage 3
packets leave
the pipeline
stage 4
stage 5
CS490N -- Chapt. 8
32
2003
Summary
Packet rate can be more important than data rate
Highest packet rate achieved with smallest packets
Rates measured per interface or aggregate
Special hardware needed for highest-speed network systems
CS490N -- Chapt. 8
33
2003
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
Questions?
IX
Classification
And
Forwarding
CS490N -- Chapt. 9
2003
Recall
Packet demultiplexing
CS490N -- Chapt. 9
2003
CS490N -- Chapt. 9
2003
Packet Classification
Alternative to demultiplexing
Designed for higher speed
Considers all layers at the same time
Linear in number of fields
Two possible implementations
Software
Hardware
CS490N -- Chapt. 9
2003
Example Classification
Classify Ethernet frames carrying traffic to Web server
Specify exact header contents in rule set
Example
CS490N -- Chapt. 9
2003
Example Classification
(continued)
Field sizes and values
2-octet IP type is 6
CS490N -- Chapt. 9
2003
10
16
19
24
31
HLEN
SERVICE
IP IDENT
IP TTL
ETHERNET TYPE
IP TOTAL LENGTH
FLAGS
IP TYPE
FRAG. OFFSET
IP HDR. CHECKSUM
IP SOURCE ADDRESS
IP DESTINATION ADDRESS
TCP SOURCE PORT
TCP ACKNOWLEDGEMENT
HLEN
NOT USED
CODE BITS
TCP WINDOW
TCP CHECKSUM
CS490N -- Chapt. 9
2003
Software Implementation
Of Classification
Compare values in header fields
Conceptually a logical and of all field comparisons
Example
if ( (frame type == 0x0800) && (IP type == 6) && (TCP port == 80) )
declare the packet matches the classification;
else
declare the packet does not match the classification;
CS490N -- Chapt. 9
2003
CS490N -- Chapt. 9
2003
Example Of Optimizing
Software Classification
Assume
CS490N -- Chapt. 9
10
2003
Example Of Optimizing
Software Classification
(continued)
if ( (TCP port == 80) && (IP type == 6) && (frame type == 0x0800) )
declare the packet matches the classification;
else
declare the packet does not match the classification;
At each step, test the field that will eliminate the most
packets
CS490N -- Chapt. 9
11
2003
CS490N -- Chapt. 9
12
2003
Concatenate bits
Perform comparison
CS490N -- Chapt. 9
13
2003
constant to compare
comparator
result of comparison
CS490N -- Chapt. 9
14
2003
CS490N -- Chapt. 9
15
2003
In Practice
Classification usually involves multiple categories
Packets grouped together into flows
May have a default category
Each category specified with rule set
CS490N -- Chapt. 9
16
2003
CS490N -- Chapt. 9
17
2003
Rule Sets
Web server traffic
2-octet IP type is 6
2-octet IP type is 1
CS490N -- Chapt. 9
18
2003
CS490N -- Chapt. 9
19
2003
CS490N -- Chapt. 9
20
2003
Rule Set 2
CS490N -- Chapt. 9
21
2003
CS490N -- Chapt. 9
22
2003
Hybrid Classification
packets classified into
flows by hardware
...
...
hardware
classifier
packets arrive
for classification
software
classifier
packets unrecognized
by hardware
exit for
unclassified packets
CS490N -- Chapt. 9
23
2003
Dynamic
CS490N -- Chapt. 9
24
2003
CS490N -- Chapt. 9
25
2003
IP source address
IP destination address
CS490N -- Chapt. 9
26
2003
CS490N -- Chapt. 9
27
2003
CS490N -- Chapt. 9
28
2003
Flow Identification
Connection-oriented network
Connectionless network
CS490N -- Chapt. 9
29
2003
CS490N -- Chapt. 9
30
2003
CS490N -- Chapt. 9
31
2003
Interconnects NICs
CS490N -- Chapt. 9
32
2003
Standard CPU
Layer 1 & 2
Class-
Forward-
(framing)
ification
ing
CS490N -- Chapt. 9
Control
And
Exceptions
fast data path
33
Interface2
Forward-
Class-
Layer 1 & 2
ing
ification
(framing)
2003
CS490N -- Chapt. 9
34
2003
Summary
Classification faster than demultiplexing
Can be implemented in hardware or software
Dynamic classification
CS490N -- Chapt. 9
35
2003
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
Questions?
XI
Network Processors: Motivation And Purpose
CS490N -- Chapt. 11
2003
CS490N -- Chapt. 11
2003
Powerful NIC
CS490N -- Chapt. 11
2003
Standard CPU
Forwarding
Control
And
Exceptions
fast data path
Interface2
Forwarding
CS490N -- Chapt. 11
2003
CS490N -- Chapt. 11
2003
ASIC hardware
NIC handles
Classification
Forwarding
Traffic policing
CS490N -- Chapt. 11
2003
Embedded Processor
Two possibilities
Smaller
CS490N -- Chapt. 11
2003
CS490N -- Chapt. 11
2003
Layers 1 & 2
standard CPU
Interface2
Layer 4
Other processing
Layer 4
Embedded
processor
Embedded
Processor
switching fabric
Layers 1 & 2
CS490N -- Chapt. 11
2003
CS490N -- Chapt. 11
10
2003
CS490N -- Chapt. 11
10
2003
CS490N -- Chapt. 11
10
2003
CS490N -- Chapt. 11
11
2003
CS490N -- Chapt. 11
12
2003
A Fourth Generation
Goal: combine best features of first generation and third
generation systems
CS490N -- Chapt. 11
13
2003
CS490N -- Chapt. 11
14
2003
Memory
Programmable
Ability to scale to higher
Data rates
Packet rates
CS490N -- Chapt. 11
15
2003
Memory
Programmable
Ability to scale to higher
Data rates
Packet rates
CS490N -- Chapt. 11
15
2003
Faster time-to-market
For consumers
CS490N -- Chapt. 11
16
2003
Choices difficult
CS490N -- Chapt. 11
17
2003
Instruction Set
Need to choose one instruction set
No single instruction set best for all uses
Other factors
Power consumption
Heat dissipation
Cost
CS490N -- Chapt. 11
18
2003
Scalability
Two primary techniques
Parallelism
Data pipelining
Questions
CS490N -- Chapt. 11
19
2003
CS490N -- Chapt. 11
20
2003
?
Network
Processor
Designs
Increasing
Performance
ASIC
Designs
Software
On Conventional
Processor
Increasing cost
CS490N -- Chapt. 11
21
2003
What is unknown
CS490N -- Chapt. 11
22
2003
Result
CS490N -- Chapt. 11
23
2003
An Interesting Observation
System developed using ASICs
CS490N -- Chapt. 11
24
2003
Summary
Third generation network systems have embedded processor
on each NIC
Network processor is programmable chip with facilities to
process packets faster than conventional processor
Primary motivation is economic
CS490N -- Chapt. 11
25
2003
Questions?
XII
The Complexity Of
Network Processor Design
CS490N -- Chapt. 12
2003
CS490N -- Chapt. 12
2003
Goals
Generality
High speed
Elegance
CS490N -- Chapt. 12
2003
CS490N -- Chapt. 12
2003
CS490N -- Chapt. 12
2003
CS490N -- Chapt. 12
2003
Questions
Does our list of functions encompass all protocol
processing?
Which function(s) are most important to optimize?
How do the functions map onto hardware units in a typical
network system?
Which hardware units in a network system can be replaced
with network processors?
What minimal set of instructions is sufficiently general to
implement all functions?
CS490N -- Chapt. 12
2003
Division Of Functionality
Partition problem to reduce complexity
Basic division into two parts
Functions applied when packet arrives known as
ingress processing
Functions applied when packet leaves known as
egress processing
CS490N -- Chapt. 12
2003
Ingress Processing
Security and error detection
Classification or demultiplexing
Traffic measurement and policing
Address lookup and packet forwarding
Header modification and transport splicing
Reassembly or flow termination
Forwarding, queueing, and scheduling
CS490N -- Chapt. 12
2003
Egress Processing
Addition of error detection codes
Address lookup and packet forwarding
Segmentation or fragmentation
Traffic shaping
Timing and scheduling
Queueing and buffering
Output security processing
CS490N -- Chapt. 12
10
2003
packets
arrive
packets
leave
CS490N -- Chapt. 12
P
H
Y
S
I
C
A
L
I
N
T
E
R
F
A
C
E
Egress Processing
Addition of error detection codes
Address lookup and packet forwarding
Segmentation or fragmentation
Traffic shaping
Timing and scheduling
Queueing and buffering
Output security Processing
11
2003
CS490N -- Chapt. 12
12
2003
CS490N -- Chapt. 12
13
2003
CS490N -- Chapt. 12
13
2003
CS490N -- Chapt. 12
14
2003
An Interesting Potential
Role For Network Processors
CS490N -- Chapt. 12
15
2003
Extant processors
Alternative designs
CS490N -- Chapt. 12
16
2003
Slow
FPGA implementation
CS490N -- Chapt. 12
17
2003
CS490N -- Chapt. 12
18
2003
CS490N -- Chapt. 12
19
2003
Summary
Protocol processing divided into ingress and egress
operations
Network processor design is challenging because
CS490N -- Chapt. 12
20
2003
Questions?
XVI
Languages Used For Classification
CS490N -- Chapt. 16
2003
CS490N -- Chapt. 16
2003
CS490N -- Chapt. 16
2003
Choices Depend On
Underlying hardware architecture
Data and packet rates
Software support available (e.g., compilers)
CS490N -- Chapt. 16
2003
Desiderata
High-level
Declarative
Data-oriented
Efficient
General or extensible
Explicit links to actions
CS490N -- Chapt. 16
2003
CS490N -- Chapt. 16
2003
Example Classification
Languages
We will consider two examples
NCL
FPL
CS490N -- Chapt. 16
2003
NCL
Expands to Network Classification Language
Developed by Intel
Runs on StrongARM
Not part of fast path
CS490N -- Chapt. 16
2003
NCL
Network Classification Language
Developed by Intel
Runs on StrongARM
Not part of fast path
CS490N -- Chapt. 16
2003
NCL Characteristics
High level
Mixes declarative and imperative paradigms
Associates action with each classification
Optimized for protocols with fixed-size header fields
Basic unit of data is eight-bit byte
CS490N -- Chapt. 16
2003
NCL Notation
Item specified by tuple
Size
Syntax:
base [ offset : size ]
Example
CS490N -- Chapt. 16
10
2003
Default
CS490N -- Chapt. 16
11
2003
CS490N -- Chapt. 16
12
2003
NCL Comments
C-style
/* ... */
C++-style
// ...
CS490N -- Chapt. 16
13
2003
NCL Primitives
Can name header fields
Names defined by offset / length
No predefined protocol specifications
CS490N -- Chapt. 16
14
2003
ip[0:1] }
( vershl & 0xf0 ) >> 4 }
( vershl & 0x0f ) }
tmphl << 2 }
ip[2:2] }
ip[4:2] }
ip[6:2] }
ip[8:1] }
ip[9:1] }
ip[10:2] }
ip[12:4] }
ip[16:4] }
demux
{
( proto == 6 )
{ tcp at hlen }
( proto == 17 )
{ udp at hlen }
( default ) { ipunknown at hlen }
// note: other protocol types can be added here
}
}
CS490N -- Chapt. 16
2003
IP Header Length
Found in second four bits of IP header
Gives header size in 32-bit multiples
NCL can
CS490N -- Chapt. 16
16
2003
CS490N -- Chapt. 16
{
{
{
{
ip[0:1] }
( vershl & 0xf0 ) >> 4 }
( vershl & 0x0f ) }
tmphl << 2 }
17
2003
Encapsulation
Needed for layered protocols
Type field from layer N specifies type of message
encapsulated at layer N + 1
Specified using at keyword
CS490N -- Chapt. 16
18
2003
If IP protocol is 6, tcp
If IP protocol is 17, icmp
Other cases classified as ipunknown
CS490N -- Chapt. 16
19
2003
Actions
Refers to imperative execution
Names function to execute
Associated with each classification
Specified by rule statement
CS490N -- Chapt. 16
20
2003
Specifies
Name is web_filter
CS490N -- Chapt. 16
21
2003
Intrinsic Functions
Built into language
Handle checksum
Verification
Calculation
CS490N -- Chapt. 16
22
2003
Available Intrinsic
Functions
CS490N -- Chapt. 16
Function Name
Purpose
ip.chksumvalid
ip.genchksum
tcp.chksumvalid
tcp.genchksum
udp.chksumvalid
udp.genchksum
23
2003
Named Predicates
Bind name to predicate
Can be used later in program
Associated with specific protocol
CS490N -- Chapt. 16
24
2003
Example Predicate
predicate ip.fragmentable { ! ( (ip.frags >> 14) & 0x01 ) }
CS490N -- Chapt. 16
25
2003
CS490N -- Chapt. 16
26
2003
Illustration Of NCP
Conditional Rule Execution
predicate TCPvalid { tcp && tcp.cksumvalid }
with { ( TCPvalid ) {
predicate SynFound { (tcp.flags & 0x02) }
rule NewConnection { SynFound } { start_conn (tcp_dport ) }
}
CS490N -- Chapt. 16
27
2003
CS490N -- Chapt. 16
28
2003
CS490N -- Chapt. 16
29
2003
size_hint { 256 }
}
CS490N -- Chapt. 16
30
2003
CS490N -- Chapt. 16
31
2003
NCL Preprocessor
Adopts syntax from C-preprocessor
Provides
#include
#ifndef
CS490N -- Chapt. 16
32
2003
Questions?
FPL
Functional Programming Language
Developed by Agere Systems
Runs on Agere FPP
In the fast path
CS490N -- Chapt. 16
34
2003
FPL
Functional Programming Language
Developed by Agere Systems
Runs on Agere FPP
In the fast path
CS490N -- Chapt. 16
34
2003
FPL Characteristics
High level
Unusual, declarative language
Follows pattern paradigm
Designed for networking
Differs from conventional syntactic pattern languages like
SNOBOL or Awk
Permits parallel evaluation
CS490N -- Chapt. 16
35
2003
FPL Syntax
C++ comments
Decimal
CS490N -- Chapt. 16
36
2003
CS490N -- Chapt. 16
37
2003
Interface Hardware
Divides each incoming packet into blocks of 64 octets
Transfers one block at a time to FPP chip
Sends additional information with each block
Flags
*
CS490N -- Chapt. 16
38
2003
Pass 1 Processing
(blocks)
Program invoked once for each block
Accommodates blocks from multiple interfaces
Collects the blocks for each packet into a separate queue
Forwards complete queue to Pass 2 when
CS490N -- Chapt. 16
39
2003
Collecting Blocks
Blocks arrive asynchronously
In arbitrary order
CS490N -- Chapt. 16
40
2003
Pass 2 Processing
(packets)
Program invoked once for each packet
Handles complete packet
Receives
Chooses disposition
Discard packet
CS490N -- Chapt. 16
41
2003
complete packet
sent back
block enqueued
FPL program
block
arrives
first
pass
processing
..
..
..
..
..
..
..
..
..
..
..
..
..
..
..
second
pass
processing
packet
leaves
CS490N -- Chapt. 16
42
2003
FPL Invocation
Handled by hardware
Pass 1 invoked when block arrives
Pass 2 invoked when packet ready
No explicit polling
CS490N -- Chapt. 16
43
2003
Designating Passes
Programmer specifies starting label for each pass
Handled by language directives
To designate first pass:
SETUP ROOT( starting_label )
To designate second pass:
SETUP REPLAYROOT( starting_label )
CS490N -- Chapt. 16
44
2003
CS490N -- Chapt. 16
45
2003
Example Conditional In
First Pass Processing
Hardware sets variable $FramerEOF
CS490N -- Chapt. 16
46
2003
Example Conditional In
First Pass Processing
(continued)
Conceptual algorithm for enqueuing a block:
if ( $FramerEOF ) {
fQueueEOF( );
} else {
fQueue( );
}
CS490N -- Chapt. 16
47
2003
CS490N -- Chapt. 16
48
2003
// ARP frames
// IP frames
// Other frames (not the above)
SETUP REPLAYROOT(HandleFrame);
// Pass 2
HandleFrame:
fSkip(96)
// skip past dest & src addresses
cl = CLASS
// compute class from frame type
fSkipToEnd()
fTransmit(cl:21, 0:16, 0:5, 0:6);
CLASS: 0x0806:16
CLASS: 0x0800:16
CLASS: BITS:16
CS490N -- Chapt. 16
fReturn(FRAMETYPEA);
fReturn(FRAMETYPEI);
fReturn(FRAMETYPEO);
49
2003
CS490N -- Chapt. 16
50
2003
Processed in order
Tree function
Set of patterns
CS490N -- Chapt. 16
51
2003
CS490N -- Chapt. 16
52
2003
fReturn(FRAMETYPEA);
CLASS: 0x0800:16
fReturn(FRAMETYPEI);
CLASS: BITS:16
fReturn(FRAMETYPEO);
CS490N -- Chapt. 16
53
2003
Special Patterns
fSkip
fSkipToEnd
fTransmit
BITS
CS490N -- Chapt. 16
54
2003
Pattern Optimization
FPL compiler optimizes patterns
Typical optimization: eliminate common prefix
CS490N -- Chapt. 16
55
2003
IP:
IP:
IP:
IP:
0x4
0x4
0x4
0x4
0x5 fSkip(152)
0x6 fSkip(160)
0x7 fSkip(168)
BITS:4
fReturn(IPOPTIONS0);
fReturn(IPOPTIONS1);
fReturn(IPOPTIONS2);
fReturn(IPUNKNOWN);
(a)
IP:
IPHDR:
IPHDR:
IPHDR:
IPHDR:
0x4 IPHDR;
0x5 fSkip(152)
0x6 fSkip(160)
0x7 fSkip(168)
BITS:4
fReturn(IPOPTIONS0);
fReturn(IPOPTIONS1);
fReturn(IPOPTIONS2);
fReturn(IPUNKNOWN);
(b)
CS490N -- Chapt. 16
56
2003
IP Address Example
ipAddrMatch:
ipAddrMatch:
ipAddrMatch:
ipAddrMatch:
ipAddrMatch:
CS490N -- Chapt. 16
*.*.*.*
10.*.*.*
128.10.*.*
128.211.*.*
128.210.*.*
57
fReturn(0);
fReturn(1);
fReturn(2);
fReturn(2);
fReturn(3);
2003
FPL Variables
Variable
Pass
Meaning
$framerSOF
$ferr
$portNumber
$framerEOF
$offset
$currOffset
$currLength
$pass
$tag
1
1
1
1
1 or 2
1 or 2
1 or 2
1 or 2
1 or 2
CS490N -- Chapt. 16
58
2003
Dynamic Classification
Can extend tree function at run-time
Requires use of ASI
Pattern converted to internal form
CS490N -- Chapt. 16
59
2003
Summary
Programming languages for classification are
Special-purpose
High-level
Declarative
Data-oriented
CS490N -- Chapt. 16
60
2003
Questions?
XVII
Design Tradeoffs And Consequences
CS490N -- Chapt. 17
2003
CS490N -- Chapt. 17
2003
Programmability
Vs.
Processing Speed
Programmable hardware is slower
Flexibility costs...
CS490N -- Chapt. 17
2003
Speed
Vs.
Functionality
Generic idea:
CS490N -- Chapt. 17
2003
Speed
Difficult to define
Can include
Packet Rate
Data Rate
Burst size
CS490N -- Chapt. 17
2003
Per-Interface Rates
Vs.
Aggregate Rates
Per-interface rate important if
CS490N -- Chapt. 17
2003
CS490N -- Chapt. 17
2003
Lookaside Coprocessors
Vs.
Flow-Through Coprocessors
Flow-through pipeline
Difficult to change
Lookaside
CS490N -- Chapt. 17
2003
Uniform Pipeline
Vs.
Synchronized Pipeline
Uniform pipeline
Synchronized pipeline
Synchronization expensive
CS490N -- Chapt. 17
2003
Explicit Parallelism
Vs.
Cost And Programmability
Explicit parallelism
Implicit parallelism
Easier to program
CS490N -- Chapt. 17
10
2003
Parallelism
Vs.
Strict Packet Ordering
Increased parallelism
Improves performance
CS490N -- Chapt. 17
11
2003
Stateful Classification
Vs.
High-Speed Parallel Classification
Static classification
Keeps no state
Is the fastest
Dynamic classification
Keeps state
CS490N -- Chapt. 17
12
2003
Memory Speed
Vs.
Programmability
Separate memory banks
Difficult to program
Non-banked memory
Easier to program
Lower performance
CS490N -- Chapt. 17
13
2003
I/O Performance
Vs.
Pin Count
Bus width
CS490N -- Chapt. 17
14
2003
Programming Languages
A three-way tradeoff
Can have two, but not three of
Ease of programming
Functionality
Performance
CS490N -- Chapt. 17
15
2003
CS490N -- Chapt. 17
16
2003
CS490N -- Chapt. 17
17
2003
CS490N -- Chapt. 17
18
2003
Multithreading:
Throughput
Vs.
Ease Of Programming
Multiple threads of control can increase throughput
Planning the operation of threads that exhibit less contention
requires more programmer effort
CS490N -- Chapt. 17
19
2003
Traffic Management
Vs.
High-Speed Forwarding
Traffic management
Blind forwarding
CS490N -- Chapt. 17
20
2003
Generality
Vs.
Specific Architectural Role
General-purpose network processor
More expensive
CS490N -- Chapt. 17
21
2003
Special-Purpose Memory
Vs.
General-Purpose Memory
General-purpose memory
Special-purpose memory
CS490N -- Chapt. 17
22
2003
Backward Compatibility
Vs.
Architectural Advances
Backward compatibility
Architectural advances
CS490N -- Chapt. 17
23
2003
Parallelism
Vs.
Pipelining
Both are fundamental performance techniques
Usually used in combination: pipeline of parallel processors
CS490N -- Chapt. 17
24
2003
Summary
Many design tradeoffs
No easy answers
CS490N -- Chapt. 17
25
2003
Questions?
XVIII
Overview Of The Intel Network Processor
CS490N -- Chapt. 18
2003
CS490N -- Chapt. 18
2003
CS490N -- Chapt. 18
2003
Intel IXP1200
Name of first generation IXP chip
Four models available
Model
Pin
Support
Support
Possible Clock
Number
Count
For CRC
For ECC
IXP1200
432
no
no
IXP1240
432
yes
no
IXP1250
520
yes
yes
IXP1250
520
yes
yes
166 only
CS490N -- Chapt. 18
2003
IXP1200 Features
One embedded RISC processor
Six programmable packet processors
Multiple, independent onboard buses
Processor synchronization mechanisms
Small amount of onboard memory
One low-speed serial line interface
Multiple interfaces for external memories
Multiple interfaces for external I/O buses
A coprocessor for hash computation
Other functional units
CS490N -- Chapt. 18
2003
PCI bus
SRAM
bus
serial
line
SRAM
FLASH
IXP1200
chip
Memory
Mapped
I/O
SDRAM
SDRAM
bus
High-speed I/O bus
CS490N -- Chapt. 18
IX bus
2003
Bus Width
Clock Rate
Data Rate
Serial line
PCI bus
IX bus
SDRAM bus
SRAM bus
(NA)
32 bits
64 bits
64 bits
16 or 32 bits
(NA)
33-66 MHz
66-104 MHz
232 MHz
232 MHz
38.4 Kbps
2.2 Gbps
4.4 Gbps
928.0 MBps
464.0 MBps
CS490N -- Chapt. 18
2003
Component
Purpose
6
1
several
CS490N -- Chapt. 18
Onboard buses
2003
PCI bus
SRAM
bus
SRAM
IXP1200 chip
SRAM
access
PCI access
scratch
memory
multiple,
independent
internal
buses
Embedded
RISC
processor
(StrongARM)
serial
line
FLASH
Memory
Mapped
I/O
Microengine 1
Microengine 2
Microengine 3
Microengine 4
Microengine 5
SDRAM
access
SDRAM
IX access
Microengine 6
SDRAM
bus
High-speed I/O bus
CS490N -- Chapt. 18
IX bus
2003
Onboard?
Programmable?
no
yes
yes
yes
no
yes
yes
yes
no
no
CS490N -- Chapt. 18
10
2003
Maximum
Size
GP Registers
Inst. Cache
Data Cache
Mini Cache
Write buffer
Scratchpad
Inst. Store
FlashROM
SRAM
SDRAM
128 regs.
16 Kbytes
8 Kbytes
512 bytes
unspecified
4 Kbytes
64 Kbytes
8 Mbytes
8 Mbytes
256 Mbytes
CS490N -- Chapt. 18
On
Chip?
yes
yes
yes
yes
yes
yes
yes
no
no
no
11
Typical
Use
Intermediate computation
Recently used instructions
Recently used data
Data that is reused once
Write operation buffer
IPC and synchronization
Microengine instructions
Bootstrap
Tables or packet headers
Packet storage
2003
Addressable
Data Unit (bytes)
Relative
Access Time
Special
Features
Scratch
12 - 14
synchronization via
test-and-set and other
bit manipulation,
atomic increment
SRAM
16 - 20
queue manipulation,
bit manipulation,
read/write locks
SDRAM
32 - 40
CS490N -- Chapt. 18
12
2003
Memory Addressability
Each memory specifies minimum addressable unit
CS490N -- Chapt. 18
13
2003
Example Of Complexity:
SRAM Access Unit
SRAM access unit
clock
signals
SRAM
pin
interface
service priority
arbitration
AMBA
bus
interface
SRAM address
memory
& FIFO
data
data
AMBA
from
StrongARM
buffer
Flash
(Boot
ROM)
addr
AMBA addr.
queues
command
decoder
& addr.
generator
microengine addr.
& command queues
Memory
Mapped
I/O
Microengine
commands
microengine data
CS490N -- Chapt. 18
14
2003
Summary
We will use Intel IXP1200 as example
IXP1200 offers
CS490N -- Chapt. 18
15
2003
Questions?
XIX
Embedded RISC Processor (StrongARM Core)
CS490N -- Chapt. 19
2003
StrongARM Role
General-Purpose
Processor
GPP
GPP
Embedded
RISC
Processors
IXP1200
IXP1200
IXP1200
IXP1200
physical
interfaces
(a)
(b)
CS490N -- Chapt. 19
2003
2003
StrongARM Characteristics
Reduced Instruction Set Computer (RISC)
Thirty-two bit arithmetic
Vector floating point provided via a coprocessor
Byte addressable memory
Virtual memory support
Built-in serial port
Facilities for a kernelized operating system
CS490N -- Chapt. 19
2003
Arithmetic
StrongARM is configurable in two modes
Big endian
Little endian
CS490N -- Chapt. 19
2003
CS490N -- Chapt. 19
2003
Contents
SDRAM Bus:
SDRAM
Scratch Pad
FBI CSRs
Microengine xfer
Microengine CSRs
C0 0 0 0 0 0 0
B0 0 0 0 0 0 0
A0 0 0 0 0 0 0
9000 0000
8000 0000
AMBA xfer
Reserved
System regs
Reserved
PCI Bus:
PCI memory
PCI I/O
PCI config
Local PCI config
4000 0000
SRAM Bus:
SlowPort
SRAM CSRs
Push/Pop cmd
Locks
BootROM
0000 0000
CS490N -- Chapt. 19
2003
Summary
Embedded processor on IXP1200 is StrongARM
StrongARM addressing
Byte addressable
CS490N -- Chapt. 19
2003
Questions?
XXI
Reference System And Software Development Kit
(Bridal Veil, SDK)
CS490N -- Chapt. 21
2003
Reference System
Provided by vendor
Targeted at potential customers
Usually includes
Hardware testbed
Development software
Reference implementations
CS490N -- Chapt. 21
2003
CS490N -- Chapt. 21
2003
Quantity or Size
Item
256
CS490N -- Chapt. 21
2003
Purpose
C compiler
MicroC compiler
Assembler
Downloader
Monitor
Bootstrap
Reference Code
CS490N -- Chapt. 21
2003
Basic Paradigm
Build software on conventional computer
Load into reference system
Test / measure results
CS490N -- Chapt. 21
2003
Bootstrapping Procedure
1. Powering on host causes network processor board to run boot monitor
from Flash memory
2. Device driver on host communicates across the PCI bus with boot
monitor to load operating system (Embedded Linux) and initial RAM disk
configuration into network processors memory
3. Host signals boot monitor to start the StrongARM
4. Operating system runs login process on serial line and starts telnet server
5. Operating systems on host and StrongARM configure PCI bus to act as
Ethernet emulator; the StrongARM uses NFS to mount two file systems,
R and W, from a server on the host
CS490N -- Chapt. 21
2003
Starting Software
1. Compile code for StrongARM and microengines
2. Create system configuration file named ixsys.config
3. Copy code and configuration file to read-only public download
directory, R
4. Run terminal emulation program on host and log onto StrongARM
5. Change to NSF-mounted directory R, and run shell script ixstart
with argument ixsys.config
6. Later, to stop the IXP1200, run ixstop script
CS490N -- Chapt. 21
2003
Summary
Reference systems
Provided by vendor
Usually include
*
Hardware testbed
Cross-development software
Reference implementations
CS490N -- Chapt. 21
2003
Questions?
XXII
Programming Model
(Intel ACE)
CS490N -- Chapt. 22
2003
CS490N -- Chapt. 22
2003
ACE Features
Fundamental software building block
Used to construct packet processing systems
Runs on StrongARM, microengine, or host
Handles control plane and fast or slow data path processing
Coordinates and synchronizes with other ACEs
Can have multiple inputs or outputs
Can serve as part of a pipeline
CS490N -- Chapt. 22
2003
ACE Terminology
Library ACE
Built by Intel
Conventional ACE
MicroACE
CS490N -- Chapt. 22
2003
More Terminology
Source microblock
Sink microblock
Transform microblock
CS490N -- Chapt. 22
2003
CS490N -- Chapt. 22
2003
CS490N -- Chapt. 22
2003
input
ports
ingress ACE
process ACE
egress ACE
output
ports
CS490N -- Chapt. 22
2003
Components Of MicroACE
(review)
Single ACE with two components
CS490N -- Chapt. 22
2003
ingress ACE
(core)
IP ACE
(core)
egress ACE
(core)
ingress ACE
(microblock)
IP ACE
(microblock)
egress ACE
(microblock)
StrongARM
microengine
input
ports
output
ports
CS490N -- Chapt. 22
10
2003
Microblock Group
Set of one or more microblocks
Treated as single unit
Loaded onto microengine for execution
Can be replicated on multiple microengines for higher speed
CS490N -- Chapt. 22
11
2003
ingress ACE
(core)
IP ACE
(core)
egress ACE
(core)
StrongARM
microengine 1
input
ports
microengine 2
ingress ACE
(microblock)
IP ACE
(microblock)
egress ACE
(microblock)
output
ports
CS490N -- Chapt. 22
12
2003
CS490N -- Chapt. 22
13
2003
Illustration Of Replication
microengine 1
ingress ACE
(microblock)
IP ACE
(microblock)
input
ports
microengine 2
egress ACE
(microblock)
ingress ACE
(microblock)
output
ports
IP ACE
(microblock)
microengine 3
CS490N -- Chapt. 22
14
2003
Microblock Structure
Asynchronous model
Programmer creates
Initialization macro
Dispatch loop
CS490N -- Chapt. 22
15
2003
Dispatch Loop
Determines disposition of each packet
Uses return code to either
Discard packet
CS490N -- Chapt. 22
16
2003
17
2003
Arguments Passed To
Processing Macro
A buffer handle for a frame that contains a packet
A set of state registers
Can be modified
CS490N -- Chapt. 22
18
2003
Packet Queues
Placed between ACE components
Buffer packets
Permit asynchronous operation
CS490N -- Chapt. 22
19
2003
Stack ACE
ingress ACE
(core)
IP ACE
(core)
egress ACE
(core)
IP ACE
microblock
egress ACE
microblock
StrongARM
microengines
input
ports
CS490N -- Chapt. 22
ingress ACE
microblock
20
output
ports
2003
Exceptions
Packets passed from microblock to core component
Mechanism
CS490N -- Chapt. 22
21
2003
CS490N -- Chapt. 22
22
2003
Crosscall Mechanism
Used between
CS490N -- Chapt. 22
23
2003
Crosscall Implementation
Both caller and callee programmed to use crosscall
Declaration given in Interface Definition Language (IDL)
IDL compiler
Reads specification
CS490N -- Chapt. 22
24
2003
Crosscall Implementation
(continued)
Three types of crosscalls
CS490N -- Chapt. 22
25
2003
CS490N -- Chapt. 22
26
2003
Summary
Intel SDK uses ACE programming model
ACE
CS490N -- Chapt. 22
27
2003
Questions?
XXIII
ACE Run-Time Structure
And
StrongARM Facilities
CS490N -- Chapt. 23
2003
StrongARM Responsibilities
Loading ACE software
Creating and initializing ACE
Resolving names
Managing the operation of ACEs
Allocating and reclaiming resources (e.g., memory)
Controlling microengine operation
Communication among core components
Interface to non-ACE applications
Interface to operating system facilities
Forwarding packets between core component and
microblock(s)
CS490N -- Chapt. 23
2003
. . .
OMS
Operating System
Specific Library
(shared library)
Resolver
(process)
Action Services
Library
(loadable kernel module)
Name
Server
(process)
Resource Manager
(loadable kernel module)
StrongARM
microengines
microblocks
. . .
CS490N -- Chapt. 23
2003
Components
Object Management System
Contains
*
Resolver
Name server
Resource Manager
Access to OS
Memory management
CS490N -- Chapt. 23
2003
Components
(continued)
Operating System Specific Library
Shared library
TCP/IP support
CS490N -- Chapt. 23
2003
Microengine Assignment
Automated by SDK
Programmer specifies
SDK chooses
CS490N -- Chapt. 23
2003
CS490N -- Chapt. 23
Numeric Value
Meaning
0
1
2
3
4
2003
CS490N -- Chapt. 23
2003
Reserved Names
SDK software uses fixed names for some functions
Examples
CS490N -- Chapt. 23
2003
CS490N -- Chapt. 23
10
2003
CS490N -- Chapt. 23
*/
*/
*/
*/
*/
*/
*/
11
2003
Event Loop
Central to asynchronous programming model
Uses polling
Repeatedly checks for presence of event(s) and calls
appropriate handler
Can be hidden from programmer
In ACE model
Explicit
CS490N -- Chapt. 23
12
2003
CS490N -- Chapt. 23
13
2003
CS490N -- Chapt. 23
14
2003
CS490N -- Chapt. 23
15
2003
CS490N -- Chapt. 23
16
2003
Function to invoke
Arguments to pass
Callback function
Called program
Invoked asynchronously
CS490N -- Chapt. 23
17
2003
/* Synchronous version
*/
/* Call f (potentially blocking)
*/
/* Use the result from f to call g */
/* Use the value of z to update q*/
}
h1 {
/* Asynchronous version
allocate global variables y, z, and q;
establish cbf1 as the callback function for f1;
establish cbg1 as the callback function for g1;
Start f1(x) with a nonblocking call;
return;
*/
CS490N -- Chapt. 23
18
2003
*/
function cbg1(retval) {
z = retval;
q += z;
return;
}
*/
CS490N -- Chapt. 23
19
2003
Code Complexity
CS490N -- Chapt. 23
20
2003
CS490N -- Chapt. 23
21
2003
CS490N -- Chapt. 23
22
2003
Memory Allocation
Performed by StrongARM
Function RmMalloc (in Resource Manager) allocates
memory
Request must specify type of memory
SRAM
SDRAM
Scratch
CS490N -- Chapt. 23
23
2003
CS490N -- Chapt. 23
24
2003
CS490N -- Chapt. 23
25
2003
CS490N -- Chapt. 23
26
2003
structmyace
*/
struct
int
mycount;
CS490N -- Chapt. 23
27
2003
CS490N -- Chapt. 23
28
2003
Crosscall
Uses OMS
Three types
Oneway: no return
CS490N -- Chapt. 23
29
2003
CS490N -- Chapt. 23
Caller
Called Procedure
no
non-ACE application
no
non-ACE application
no
non-ACE application
non-ACE application
yes
30
Block?
2003
Crosscall Declaration
Uses Interface Definition Language (IDL)
Declares type of
Exported functions
Arguments
CS490N -- Chapt. 23
31
2003
CS490N -- Chapt. 23
32
2003
Timer Management
Performed on StrongARM
Uses OMS
Follows asynchronous model
CS490N -- Chapt. 23
33
2003
Using A Timer
1. Initialize H, a handle for an ix_event.
2. Associate handle H with a callback function, CB.
3. Calculate T, a time for the event to occur.
4. Schedule event H at time T.
5. At any time prior to T, event H can be cancelled.
6. At time T, function CB will be called if the event has
not been cancelled.
CS490N -- Chapt. 23
34
2003
Summary
Core component of ACE
CS490N -- Chapt. 23
35
2003
Questions?
XX
Packet Processor Hardware
(Microengines And FBI)
CS490N -- Chapt. 20
2003
Role Of Microengines
Packet ingress from physical layer hardware
Checksum verification
Header processing and classification
Packet buffering in memory
Table lookup and forwarding
Header modification
Checksum computation
Packet egress to physical layer hardware
CS490N -- Chapt. 20
2003
Microengine Characteristics
Programmable microcontroller
RISC design
One hundred twenty-eight general-purpose registers
One hundred twenty-eight transfer registers
Hardware support for four threads and context switching
Five-stage execution pipeline
Control of an Arithmetic Logic Unit (ALU)
Direct access to various functional units
CS490N -- Chapt. 20
2003
Microengine Level
Not a typical CPU
Does not contain native instruction for each operation
Really a microsequencer
CS490N -- Chapt. 20
2003
Consequence Of Microsequencing
CS490N -- Chapt. 20
2003
CS490N -- Chapt. 20
2003
CS490N -- Chapt. 20
2003
Description
1
2
3
4
CS490N -- Chapt. 20
2003
stage 1
stage 2
stage 3
stage 4
stage 5
inst. 1
inst. 2
inst. 1
inst. 3
inst. 2
inst. 1
inst. 4
inst. 3
inst. 2
inst. 1
inst. 5
inst. 4
inst. 3
inst. 2
inst. 1
inst. 6
inst. 5
inst. 4
inst. 3
inst. 2
inst. 7
inst. 6
inst. 5
inst. 4
inst. 3
inst. 8
inst. 7
inst. 6
inst. 5
inst. 4
Time
CS490N -- Chapt. 20
2003
Instruction Stall
Occurs when operand not available
Processor temporarily stops execution
Reduces overall speed
Should be avoided when possible
CS490N -- Chapt. 20
10
2003
CS490N -- Chapt. 20
11
2003
stage 1
stage 2
stage 3
stage 4
stage 5
inst. K
inst. K-1
inst. K-2
inst. K-3
inst. K-4
inst. K+1
inst. K
inst. K-1
inst. K-2
inst. K-3
inst. K+2
inst. K+1
inst. K
inst. K-1
inst. K-2
inst. K+3
inst. K+2
inst. K+1
inst. K
inst. K-1
inst. K+3
inst. K+2
inst. K+1
inst. K
inst. K+3
inst. K+2
inst. K+1
inst. K+4
inst. K+3
inst. K+2
inst. K+1
inst. K+5
inst. K+4
inst. K+3
inst. K+2
inst. K+1
Time
CS490N -- Chapt. 20
12
2003
Sources Of Delay
Access to result of previous / earlier operation
Conditional branch
Memory access
CS490N -- Chapt. 20
13
2003
Scratchpad
SRAM
SDRAM
12 - 14
16 - 20
32 - 40
CS490N -- Chapt. 20
14
2003
Threads Of Execution
Technique used to speed processing
Multiple threads of execution remain ready to run
Program defines threads and informs processor
Processor runs one thread at a time
Processor automatically switches context to another thread
when current thread blocks
Known as hardware threads
CS490N -- Chapt. 20
15
2003
time t2
time t3
thread 1
context switch
thread 2
thread 3
thread 4
time
CS490N -- Chapt. 20
16
2003
CS490N -- Chapt. 20
17
2003
CS490N -- Chapt. 20
18
2003
CS490N -- Chapt. 20
19
2003
General-Purpose Registers
One hundred twenty-eight per microengine
Thirty-two bits each
Used for computation or intermediate values
Divided into banks
Context-relative or absolute addresses
CS490N -- Chapt. 20
20
2003
Forms Of Addressing
Absolute
Context-relative
CS490N -- Chapt. 20
21
2003
Register Banks
Mechanism commonly used with RISC processor
Registers divided into A bank and B bank
Maximum performance achieved when each instruction
references a register from the A bank and a register from the
B bank
CS490N -- Chapt. 20
22
2003
A bank
(64 regs.)
B bank
(64 regs.)
absolute addr.
used by
relative addr.
48 - 63
0 - 15
32 - 47
0 - 15
16 - 31
0 - 15
0 - 15
0 - 15
48 - 63
0 - 15
32 - 47
0 - 15
16 - 31
0 - 15
0 - 15
0 - 15
Note: half of the registers for each context are from A bank
and half from B bank
CS490N -- Chapt. 20
23
2003
Transfer Registers
Used to buffer external memory transfers
Example: read a value from memory
SRAM or SDRAM
Read or write
CS490N -- Chapt. 20
24
2003
SDRAM read
( 32 regs. )
SDRAM write
( 32 regs. )
SRAM read
( 32 regs. )
SRAM write
( 32 regs. )
CS490N -- Chapt. 20
absolute addr.
used by
relative addr.
24 - 31
context 3 ( 8 regs. )
0-7
16 - 23
context 2 ( 8 regs. )
0-7
8 - 15
context 1 ( 8 regs. )
0-7
0-7
context 0 ( 8 regs. )
0-7
24 - 31
context 3 ( 8 regs. )
0-7
16 - 23
context 2 ( 8 regs. )
0-7
8 - 15
context 1 ( 8 regs. )
0-7
0-7
context 0 ( 8 regs. )
0-7
24 - 31
context 3 ( 8 regs. )
0-7
16 - 23
context 2 ( 8 regs. )
0-7
8 - 15
context 1 ( 8 regs. )
0-7
0-7
context 0 ( 8 regs. )
0-7
24 - 31
context 3 ( 8 regs. )
0-7
16 - 23
context 2 ( 8 regs. )
0-7
8 - 15
context 1 ( 8 regs. )
0-7
0-7
context 0 ( 8 regs. )
0-7
25
2003
CS490N -- Chapt. 20
26
2003
Local CSRs
Local CSR
USTORE_ADDRESS
USTORE_DATA
ALU_OUTPUT
ACTIVE_CTX_STS
ENABLE_SRAM_JOURNALING
CTX_ARB_CTL
CTX_ENABLES
CC_ENABLE
CTX_n_STS
CTX_n_SIG_EVENTS
CTX_n_WAKEUP_EVENTS
Purpose
Load the microengine control store
Load a value into the control store
Debugging: allows StrongARM to read
GPRs and transfer registers
Determine context status
Debugging: place journal in SRAM
Context arbiter control
Debugging: enable a context
Enable condition codes
Determine context status
Determine signal status
Determine which wakeup events
are currently enabled
CS490N -- Chapt. 20
27
2003
Interrupt
Signal event
Signal event
Ready bus
CS490N -- Chapt. 20
28
2003
FBI Unit
Interface between processors and other functional units
Scratchpad memory
Hash unit
29
2003
FIFOs
Hardware buffers
Transfer performed in 64 byte blocks
Misleading name: access is random
Only path between external device and microengine
Receive FIFO (RFIFO)
Handles input
CS490N -- Chapt. 20
30
2003
FIFOs
(continued)
Transmit FIFO (TFIFO)
Handles output
CS490N -- Chapt. 20
31
2003
CS490N -- Chapt. 20
32
2003
FBI
Initiates and controls data transfer
Contains active components
Push engine
Pull engine
CS490N -- Chapt. 20
33
2003
Pull Engine
data enters
command queues
commands from
microengine
command bus
8-entry
(pull)
TFIFO
commands arrive
8-entry
(hash)
CSRs
Push Engine
8-entry
(push)
Scratch
Hash
commands arrive
data to
SRAM transfer
registers
CS490N -- Chapt. 20
data leaves
34
RFIFO
2003
ScratchPad Memory
Organized into 1K words of 4 bytes each
Offers special facilities
CS490N -- Chapt. 20
35
2003
Hash Unit
Configurable coprocessor
Operates asynchronously
Intended for table lookup when multiplication or division
required (ALU does not have multiply instruction)
CS490N -- Chapt. 20
36
2003
CS490N -- Chapt. 20
37
2003
Hash Mathematics
Integer value interpreted as polynomial over field [0,1]
Example:
2040116
Is interpreted as
x17 + x10 + 1
Similarly, value G(x) used in 48-bit hash
100100200040116
Is interpreted as
x48 + x36 + x25 + x10 + 1
CS490N -- Chapt. 20
38
2003
Hash Example
A = 80000000000116
G = 100100200040116
M = 20D16
( x47 + 1 )
( x48 + x36 + x25 + x10 + 1 )
( x9 + x3 + x2 + 1 )
CS490N -- Chapt. 20
39
2003
Hash Example
(continued)
Multiplying yields
M A = x56 + x50 + x49 + x47 + x9 + x3 + x2 + 1
Furthermore
MA = QG + R
Where
Q = x8 + x2 + x1
CS490N -- Chapt. 20
40
2003
Hash Example
(continued)
So, Q is:
10616
And R is:
x47 + x44 + x38 + x37 + x33 + x27 + x26 + x18 + x12 +
x11 + x9 + x8 + x3 + x1 + 1
The hash unit returns R as the value of the computation:
90620C041B0B16
CS490N -- Chapt. 20
41
2003
CS490N -- Chapt. 20
42
2003
Summary
Microengines
CS490N -- Chapt. 20
43
2003
Questions?
XXIV
Microengine Programming I
CS490N -- Chapt. 24
2003
Microengine Code
Many low-level details
Close to hardware
Written in assembly language
CS490N -- Chapt. 24
2003
CS490N -- Chapt. 24
2003
Statement Syntax
General form:
label:
CS490N -- Chapt. 24
2003
Comment Statements
Three styles available
CS490N -- Chapt. 24
2003
Assembler Directives
Begin with period in column one
Can
Generate code
CS490N -- Chapt. 24
myname
2003
Destination register
Operation
CS490N -- Chapt. 24
2003
Memory Operations
Programmer specifies
Type of memory
Direction of transfer
Optional token
CS490N -- Chapt. 24
2003
Memory Operations
(continued)
General forms
CS490N -- Chapt. 24
2003
Memory Addressing
Specified with operands addr1 and addr2
Each operand corresponds to register
Use of two operands allows
Base + offset
CS490N -- Chapt. 24
10
2003
Immediate Instruction
Place constant in thirty-two bit register
immed [ dst, ival, rot ]
Upper sixteen bits of ival must be all zeros or all ones
Operand rot specifies bit rotation
0
<< 0
<< 8
<< 16
CS490N -- Chapt. 24
No rotation
No rotation (same as 0)
Rotate to the left by eight bits
Rotate to the left by sixteen bits
11
2003
Register Names
Usually automated by assembler
Directives available for manual assignment
Directive
.areg
.breg
.$reg
.$$reg
CS490N -- Chapt. 24
12
2003
Assembler
Assigns registers
CS490N -- Chapt. 24
13
2003
.
.
.
.endlocal
CS490N -- Chapt. 24
14
2003
.
.
.
.endlocal
.
.
.
.endlocal
.endlocal
CS490N -- Chapt. 24
15
2003
Conflicts
Operands must come from separate banks
Some code sequences cause conflict
Example:
Z Q + R;
Y R + S;
X Q + S;
No assignment is valid
Programmer must change code
CS490N -- Chapt. 24
16
2003
CS490N -- Chapt. 24
17
2003
Use
#include
#define
#define_eval
#undef
#macro
#endm
Include a file
Define symbolic constant
Eval arithmetic expression & define constant
Remove previous definition
Start macro definition
End a macro definition
#ifdef
#ifndef
#if
#else
#elif
#endif
#for
#while
#repeat
#endloop
CS490N -- Chapt. 24
18
2003
Macro Definition
Can occur at any point in program
General form:
#macro name [ parameter1 , parameter2 , . . . ]
lines of text
#endm
CS490N -- Chapt. 24
19
2003
Macro Example
Compute a = b + c + 5
/* example macro add5 computes a=b+c+5 */
#macro add5[a, b, c]
.local tmp
alu[tmp, c, +, 5]
alu[a, b, +, tmp]
.endlocal
#endm
Assumes values a, b, and c in registers
CS490N -- Chapt. 24
20
2003
CS490N -- Chapt. 24
21
2003
CS490N -- Chapt. 24
22
2003
CS490N -- Chapt. 24
23
2003
CS490N -- Chapt. 24
Meaning
Conditional execution
Terminate previous conditional execution and
start a new conditional execution
Terminate previous conditional execution and
define an alternative
End .if conditional
Indefinite iteration with test before
End .while loop
Indefinite iteration with test after
End .repeat loop
Leave a loop
Skip to next iteration of loop
24
2003
ctx_arb instruction
CS490N -- Chapt. 24
25
2003
CS490N -- Chapt. 24
26
2003
CS490N -- Chapt. 24
27
2003
Indirect Reference
Poor choice of name
Hardware optimization
Found on other RISC processors
Result of one instruction modifies next instruction
Avoids stalls
Typical use
CS490N -- Chapt. 24
28
2003
CS490N -- Chapt. 24
29
2003
ALU operation
CS490N -- Chapt. 24
30
2003
CS490N -- Chapt. 24
31
2003
External Transfers
Microengine cannot directly access
Memory
CS490N -- Chapt. 24
32
2003
CS490N -- Chapt. 24
33
2003
CS490N -- Chapt. 24
34
2003
Summary
Microengines programmed in assembly language
Intels assembler provides
Macro preprocessor
CS490N -- Chapt. 24
35
2003
Questions?
XXV
Microengine Programming II
CS490N -- Chapt. 25
2003
CS490N -- Chapt. 25
2003
CS490N -- Chapt. 25
2003
Processor Coordination
Via Bit Testing
Provided by SRAM and Scratchpad memories
Atomic test-and-set
Mask used to specify bit in a word
General form
scratch [ bit_wr, $xfer, addr1, addr2, op ]
CS490N -- Chapt. 25
2003
Operation
Meaning
set_bits
clear_bits
test_and_set_bits
test_and_clear_bits
CS490N -- Chapt. 25
2003
StrongARM
Microengines
CS490N -- Chapt. 25
2003
Processor Coordination
Via Memory Locking
Word of memory acts as mutual exclusion lock
Achieved with memory lock instruction
Single read_lock instruction
CS490N -- Chapt. 25
2003
Processor Coordination
Via Memory Locking
(continued)
General form of read_lock
sram [ read_lock, $xfer, addr1, addr2, count ], ctx_swap
Token ctx_swap required (thread blocks until lock obtained)
General form of write_unlock
sram [ unlock, , addr1, addr2, 1 ]
CS490N -- Chapt. 25
2003
CS490N -- Chapt. 25
2003
Configure
Control
Interrogate
Monitor
Access
CS490N -- Chapt. 25
10
2003
Csr Instruction
Used on microengines
General form
csr [ cmd, $xfer, CSR, count ]
Alternatives
CS490N -- Chapt. 25
11
2003
CS490N -- Chapt. 25
12
2003
Purpose
Allocate a packet buffer
Get the SDRAM address of a packet buffer
(note: the name is misleading)
Drop a packet and recycle the buffer
Compute the length (in bytes) of the packet
portion of a buffer
Compute the offset of packet data within a buffer
Obtain the input port over which the packet arrived
Find the port over which the packet will be sent
Obtain the receive status of the packet
Initialize the dispatch loop macros
Send a packet to next microblock group
Send a packet to the StrongARM
Receive a packet from the StrongARM
Specify the microblock that is handling a packet
so the StrongARM will know
Specify the length of packet data in the buffer
Specify the offset of packet data in the buffer
Specify the exception code for the StrongARM
Specify the port over which the packet arrived
Specify the port on which the packet will be sent
Specify the receive status of the packet
13
2003
packet from
core component
on StrongARM
dispatch
loop
exception passed
up to the
StrongARM
CS490N -- Chapt. 25
packet
dropped
14
packet sent to
physical device or
next microACE
2003
Macro is RoundRobinSource
Macro is FifoSource
CS490N -- Chapt. 25
15
2003
StrongARM
CS490N -- Chapt. 25
16
2003
Implementation Of Priority
Macro EthernetIngress obtains Ethernet frame
Macro DL_SASource obtains packet from StrongARM
Each returns IX_BUFFER_NULL, if no packet waiting
To achieve priority, DL_SASource counts
SA_CONSUME_NUM times before returning a packet
CS490N -- Chapt. 25
17
2003
Packet Disposition
Auxiliary variable used
Dispatch loop finds a packet and calls macro X to process
Macro X sets variable dl_buffer_next
Dispatch loop examines dl_buffer_next and invokes
CS490N -- Chapt. 25
18
2003
Buffer manipulation
Packet processing
Other tasks
CS490N -- Chapt. 25
19
2003
CS490N -- Chapt. 25
20
2003
CS490N -- Chapt. 25
21
2003
SA_CONSUME_NUM
IX_EXCEPTION
SEQNUM_IGNORE
CS490N -- Chapt. 25
22
2003
CS490N -- Chapt. 25
23
2003
CS490N -- Chapt. 25
24
2003
DispatchLoop_h.uc
DispatchLoopImportVars.h
EthernetIngress.uc
EthernetEgress.uc
CS490N -- Chapt. 25
25
2003
CS490N -- Chapt. 25
26
2003
Packet I/O
Physical frame divided into sixty-four octet units for transfer
Each unit known as mpacket
Division performed by interface hardware
Microengine transfers each mpacket separately
Header uses two bits to specify position of mpacket
CS490N -- Chapt. 25
27
2003
Packet I/O
(continued)
No interrupts
Dispatch loop uses polling
CS490N -- Chapt. 25
28
2003
CS490N -- Chapt. 25
29
2003
Packet Egress
Steps required
CS490N -- Chapt. 25
30
2003
CS490N -- Chapt. 25
31
2003
Other Details
Microengine must check status of mpacket to determine if
CS490N -- Chapt. 25
32
2003
Summary
Special operations used for
Synchronization
Memory access
CSR access
CS490N -- Chapt. 25
33
2003
Summary
(continued)
Many details required to use Intel dispatch macros
Packet I/O performed on sixty-four byte units called
mpackets
Mpacket can be transferred from RFIFO to
SDRAM
CS490N -- Chapt. 25
34
2003
Questions?
XXVI
An Example ACE
CS490N -- Chapt. 26
2003
We Will
Consider an ACE
Examine all the user-written code
See how the pieces fit together
CS490N -- Chapt. 26
2003
Basic concepts
Need to
CS490N -- Chapt. 26
2003
The Example
Trivial network system
In-line paradigm using two ports
Ethernet-to-Ethernet connectivity
Known as bump-in-the-wire
Frame carries IP
Destination port is 80
CS490N -- Chapt. 26
2003
Illustration Of WWBump
switch
router
wwbump
to Internet
...
connections to
local systems
CS490N -- Chapt. 26
2003
Design
Code written for bridal veil testbed
Accepts input from either port zero or port one
Forwards incoming traffic to opposite port
Uses the ingress and egress library ACEs supplied by Intel
Defines a wwbump MicroACE
Passes each Web packet to the core component (exception)
Provides access to current packet count via the crosscall
Uses an ixsys.config file to specify binding of targets
CS490N -- Chapt. 26
2003
Organization Of ACEs
wwbump ACE
(core)
crosscall
client
wwbump ACE
(microblock)
egress ACE
(microblock)
StrongARM
microengine
input
ports
CS490N -- Chapt. 26
ingress ACE
(microblock)
output
ports
2003
CS490N -- Chapt. 26
2003
Type
Purpose
name
tag
ue
bname
ccbase
string
int
int
string
ix_base_t
CS490N -- Chapt. 26
2003
<ix/asl.h>
<ix/microace/rm.h>
<ix/asl/ixbasecc.h>
<wwbump_import.h>
/*
/*
/*
/*
/*
*/
*/
*/
*/
*/
*/
};
/* Exception handler prototype */
int
exception(void *ctx, ix_ring r, ix_buffer b);
#endif /* __WWBUMP_H */
CS490N -- Chapt. 26
10
2003
CS490N -- Chapt. 26
11
2003
CS490N -- Chapt. 26
12
2003
"WWBUMP_TAG"
#endif
#endif /* _WWBUMP_IMPORT_H */
CS490N -- Chapt. 26
13
2003
Division Of Microcode
Into Files
Two main files
Follow Intel naming convention: macro and file names
related to microcode start with uppercase letter
WWBump.uc
WWB_dl.uc
CS490N -- Chapt. 26
14
2003
WWBump Macro
Performs two functions
Classification
Represented as disposition
*
StrongARM (exception)
CS490N -- Chapt. 26
15
2003
WWBump Details
Uses Intel ingress macro
Calls DL_GetInputPort to determine input port (0 or 1)
Calls DL_SetOutputPort to specify output port
When passing packet to core component
CS490N -- Chapt. 26
16
2003
CS490N -- Chapt. 26
17
2003
0x800
6
80
#macro WWBumpInit[]
/* empty because no initialization is needed */
#endm
#macro WWBump[]
xbuf_alloc[$$hdr,6]
/* Reserve a register (ifn) and compute the output port for the
/* frame; a frame that arrives on port 0 will go out port 1, and
/* vice versa
.local ifn
DL_GetInputPort[ifn]
alu [ ifn, ifn, XOR, 1 ]
DL_SetOutputPort[ifn]
.endlocal
CS490N -- Chapt. 26
*/
*/
*/
18
2003
*/
*/
.local etype
immed[etype, ETH_IP]
alu_shf[ --, etype, -, $$hdr3, >>16]
br!=0[NotWeb#]
.endlocal
*/
CS490N -- Chapt. 26
19
2003
*/
*/
*/
.local mask
immed[mask, 0x3c]
; Mask out upper and lower 2 bits
alu [ dpoff, dpoff, AND, mask ]
.endlocal
CS490N -- Chapt. 26
20
2003
*/
*/
*/
*/
*/
CS490N -- Chapt. 26
21
2003
.local wprt
immed[wprt, TCP_WWW]
alu[--, dport, -, wprt]
br!=0[NotWeb#]
.endlocal
.endlocal
CS490N -- Chapt. 26
22
2003
CS490N -- Chapt. 26
23
2003
CS490N -- Chapt. 26
24
2003
CS490N -- Chapt. 26
25
2003
CS490N -- Chapt. 26
26
2003
CS490N -- Chapt. 26
27
2003
Macro DLSAsource
CS490N -- Chapt. 26
28
2003
*/
*/
*/
*/
br[Send_MB#]
CS490N -- Chapt. 26
29
2003
*/
*/
*/
Drop_Packet#:
/* Drop the frame and start over getting a new frame */
DL_Drop[ ]
.endw
nop
nop
nop
.endlocal
CS490N -- Chapt. 26
30
2003
CS490N -- Chapt. 26
31
2003
Exception Handler
Function named exception
Receives packets from microblock
Must call Intel function RmGetExceptionCode
Can call Intel function RmSendPacket to forward packet to a
microblock
Three arguments
CS490N -- Chapt. 26
32
2003
Default Action
Function named ix_action_default
Required part of every ACE
Called when frame arrives from source other than
microblock component
Our version merely discards the packet
Code in file action.c
CS490N -- Chapt. 26
33
2003
CS490N -- Chapt. 26
34
2003
CS490N -- Chapt. 26
35
2003
*/
*/
*/
*/
*/
CS490N -- Chapt. 26
36
2003
CS490N -- Chapt. 26
37
2003
CS490N -- Chapt. 26
38
2003
<stdlib.h>
<string.h>
<wwbump.h>
<wwbcc.h>
CS490N -- Chapt. 26
39
2003
CS490N -- Chapt. 26
40
2003
CS490N -- Chapt. 26
41
2003
CS490N -- Chapt. 26
42
2003
CS490N -- Chapt. 26
43
2003
CS490N -- Chapt. 26
44
2003
CS490N -- Chapt. 26
44
2003
An Example Crosscall
Trivial example
CS490N -- Chapt. 26
44
2003
CS490N -- Chapt. 26
45
2003
interface wwbump
{
twoway long getcnt();
};
CS490N -- Chapt. 26
46
2003
Contents
Description
IFACE_intName
IFACE_FCN_opName
stub_IFACE_init()
stub_IFACE_fini()
IFACE_FCN_fptr
stub_IFACE_FCN()
CC_VMT_IFACE
deferred_cb_IFACE_IFCN()
IFACE_FCN_cb_fptr
CB_VMT_IFACE
getCBVMT_IFACE()
sk_IFACE_init()
sk_IFACE_fini()
getCCVMT_IFACE()
invoke_IFACE_IFCN()
IFACE_stub_c.h
IFACE_stub_c.h
IFACE_sk_c.h
CS490N -- Chapt. 26
47
2003
Contents
IFACE_cc_c.h
Description
IFACE_FCN()
IFACE_sk_c.h
IFACE_FCN_cb()
IFACE_stub_c.h
IFACE_cb_c.h
IFACE_stub_c.c
IFACE_sk_c.c
IFACE_FCN()
IFACE_sk_c.h
IFACE_FCN_cb()
IFACE_stub_c.h
IFACE_cc_c.h
IFACE_cb_c.h
CS490N -- Chapt. 26
48
2003
Description
IFACE_cc_c.c
IFACE_cb_c.c
CS490N -- Chapt. 26
49
2003
Initialization function
Finalization function
CS490N -- Chapt. 26
50
2003
<stdlib.h>
<string.h>
<wwbump.h>
<wwbcc.h>
"wwbump_sk_c.h"
"wwbump_cc_c.h"
long Webcnt;
CS490N -- Chapt. 26
51
2003
CS490N -- Chapt. 26
52
2003
}
/* Function that is invoked each time a crosscall occurs */
ix_error getcnt(ix_base_t* bp, long* rv)
{
(void)bp; /* Reference unused arg to prevent compiler warnings */
/* Actual work: copy the web count into the return value */
*rv = Webcnt;
/* Return 0 for success */
return 0;
}
CS490N -- Chapt. 26
53
2003
ACEs to be loaded
Named ixsys.config
CS490N -- Chapt. 26
54
2003
Configuration Examples
file 0 /mnt/SlowIngressWWBump.uof
CS490N -- Chapt. 26
55
2003
Configuration Examples
(continued)
Specifies
wwbump is a microace
CS490N -- Chapt. 26
56
2003
Configuration Examples
(continued)
Specifies
CS490N -- Chapt. 26
57
2003
Configuration Examples
(continued)
sh command
CS490N -- Chapt. 26
58
2003
Summary
CS490N -- Chapt. 26
59
2003
Questions?
X
Switching Fabrics
CS490N -- Chapt. 10
2003
Physical Interconnection
Physical box with backplane
Individual blades plug into backplane slots
Each blade contains one or more network connections
CS490N -- Chapt. 10
2003
Logical Interconnection
Known as switching fabric
Handles transport from one blade to another
Becomes bottleneck as number of interfaces scales
CS490N -- Chapt. 10
2003
output ports
2
switching
fabric
input
arrives
output
leaves
..
.
..
.
CS490N -- Chapt. 10
2003
CS490N -- Chapt. 10
2003
CS490N -- Chapt. 10
2003
input
arrives
switching fabric
output ports
..
.
..
.
output
leaves
CS490N -- Chapt. 10
2003
input
arrives
switching fabric
output ports
..
.
..
.
output
leaves
CS490N -- Chapt. 10
2003
Desires
CS490N -- Chapt. 10
2003
Desires
High speed
CS490N -- Chapt. 10
2003
Desires
High speed
Low cost
CS490N -- Chapt. 10
2003
Desires
High speed and low cost!
CS490N -- Chapt. 10
2003
Possible Compromise
Separation of physical paths
Less parallel hardware
Crossbar design
CS490N -- Chapt. 10
2003
controller
interface hardware
input ports
switching fabric
1
active
connection
2
inactive
connection
..
.
N
output ports
CS490N -- Chapt. 10
10
..
2003
Crossbar
Allows simultaneous transfer on disjoint pairs of ports
Can still have port contention
CS490N -- Chapt. 10
11
2003
Crossbar
Allows simultaneous transfer on disjoint pairs of ports
Can still have port contention
CS490N -- Chapt. 10
11
2003
Solving Contention
Queues (FIFOs)
Attached to input
Attached to output
At intermediate points
CS490N -- Chapt. 10
12
2003
input queues
input ports
switching fabric
1
..
.
N
output queues
output ports
CS490N -- Chapt. 10
13
..
2003
...
input ports
... M
output ports
CS490N -- Chapt. 10
14
2003
controller
input ports
output ports
shared memory
switching fabric
..
.
..
.
CS490N -- Chapt. 10
15
2003
Multi-Stage Fabrics
Compromise between pure time-division and pure spacedivision
Attempt to combine advantages of each
CS490N -- Chapt. 10
16
2003
Banyan Fabric
Example of multi-stage fabric
Features
Scalable
Self-routing
CS490N -- Chapt. 10
17
2003
2-input
switch
"0"
"1"
input #2
CS490N -- Chapt. 10
18
2003
SW1
SW3
"01"
inputs
outputs
"10"
SW2
SW4
"11"
(a)
8-input switch
"000"
SW1
"001"
4-input switch
(for details
see above)
SW2
"010"
"011"
inputs
outputs
"100"
SW3
"101"
4-input switch
(for details
see above)
SW4
"110"
"111"
(b)
CS490N -- Chapt. 10
19
2003
Summary
Switching fabric provides connections inside single network
system
Two basic approaches
CS490N -- Chapt. 10
20
2003
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????????????????????????????
Questions?
XIII
Network Processor Architectures
CS490N -- Chapt. 13
2003
Architectural Explosion
CS490N -- Chapt. 13
2003
Principle Components
Processor hierarchy
Memory hierarchy
Internal transfer mechanisms
External interface and communication mechanisms
Special-purpose hardware
Polling and notification mechanisms
Concurrent and parallel execution support
Programming model and paradigm
Hardware and software dispatch mechanisms
CS490N -- Chapt. 13
2003
Processing Hierarchy
Consists of hardware units
Performs various aspects of packet processing
Includes onboard and external processors
Individual processor can be
Programmable
Configurable
Fixed
CS490N -- Chapt. 13
2003
Processor Type
Programmable?
On Chip?
8
7
5
6
4
3
2
1
yes
yes
yes
no
no
no
no
no
possibly
typically
typically
typically
typically
typically
possibly
possibly
CS490N -- Chapt. 13
2003
Memory Hierarchy
Memory measurements
Throughput
Cost
Can be
Internal
External
CS490N -- Chapt. 13
2003
Rel. Speed
Approx. Size
On Chip?
Control store
G.P. Registers
Onboard Cache
Onboard RAM
Static RAM
Dynamic RAM
100
90
40
7
2
1
103
102
103
103
107
108
yes
yes
yes
yes
no
no
CS490N -- Chapt. 13
2003
CS490N -- Chapt. 13
2003
CS490N -- Chapt. 13
2003
Example Interfaces
System Packet Interface Level 3 or 4 (SPI-3 or SPI-4)
SerDes Framer Interface (SFI)
CSIX fabric interface
Note: The Optical Internetworking Forum (OIF) controls the SPI and SFI
standards.
CS490N -- Chapt. 13
10
2003
Arrival of packet
Timer expiration
Two paradigms
Polling
Notification
CS490N -- Chapt. 13
11
2003
CS490N -- Chapt. 13
12
2003
I/O processors
No operating system
Low-overhead or zero-overhead
CS490N -- Chapt. 13
13
2003
CS490N -- Chapt. 13
14
2003
Assigns to a processor
CS490N -- Chapt. 13
15
2003
Implicit parallelism
CS490N -- Chapt. 13
16
2003
CS490N -- Chapt. 13
17
2003
Single processor
Passes packet on
Known as run-to-completion
CS490N -- Chapt. 13
18
2003
Parallel Architecture
coordination
mechanism
..
.
f(); g(); h()
CS490N -- Chapt. 13
19
2003
Pipeline Architecture
f ()
g ()
h ()
CS490N -- Chapt. 13
20
2003
Clock Rates
Embedded processor runs at > wire speed
Parallel processor runs at < wire speed
Pipeline processor runs at wire speed
CS490N -- Chapt. 13
21
2003
Software Architecture
Central program that invokes coprocessors like subroutines
Central program that interacts with code on intelligent,
programmable I/O processors
Communicating threads
Event-driven program
RPC-style (program partitioned among processors)
Pipeline (even if hardware does not use pipeline)
Combinations of the above
CS490N -- Chapt. 13
22
2003
CS490N -- Chapt. 13
23
2003
CS490N -- Chapt. 13
24
2003
almost no
data
Embedded (RISC) Processor
small amount
of data
I/O Processor
data to / from
programmable processors
Lower Levels Of Processor Hierarchy
data
arrives
CS490N -- Chapt. 13
data
leaves
25
2003
Summary
Network processor architectures characterized by
Processor hierarchy
Memory hierarchy
Internal buses
External interfaces
Programming model
Dispatch mechanisms
CS490N -- Chapt. 13
26
2003
Questions?
XIV
Issues In Scaling A Network Processor
CS490N -- Chapt. 14
2003
Design Questions
Can we make network processors
Faster?
Easier to use?
More powerful?
More general?
Cheaper?
Scale is fundamental
CS490N -- Chapt. 14
2003
CS490N -- Chapt. 14
2003
CPU
Embedded Proc.
I / O Processors
CS490N -- Chapt. 14
2003
CS490N -- Chapt. 14
2003
Memory Speed
Access latency
SRAM 2 - 10 ns
DRAM 50 - 70 ns
CS490N -- Chapt. 14
2003
Memory Speed
(continued)
Memory cycle time
CS490N -- Chapt. 14
2003
External DRAM
CS490N -- Chapt. 14
2003
Memory Bandwidth
General measure of throughput
More parallelism in access path yields more throughput
Cannot scale arbitrarily
Pinout limits
CS490N -- Chapt. 14
2003
Types Of Memory
Memory Technology
Abbreviation
Synchronized DRAM
SDRAM
QDR-SRAM
ZBT-SRAM
FCRAM
DDR-DRAM
RLDRAM
CS490N -- Chapt. 14
10
Purpose
Synchronized with CPU
for lower latency
Optimized for low latency
and multiple access
Optimized for random
access
Low cycle time optimized
for block transfer
Optimized for low
latency
Low cycle time and
low power requirements
2003
Memory Cache
General-purpose technique
May not work well in network systems
CS490N -- Chapt. 14
11
2003
Memory Cache
General-purpose technique
May not work well in network systems
CS490N -- Chapt. 14
11
2003
Memory Cache
General-purpose technique
May not work well in network systems
CS490N -- Chapt. 14
11
2003
CS490N -- Chapt. 14
12
2003
Arrangement Of CAM
one slot
CAM
..
.
CS490N -- Chapt. 14
13
2003
Known as key
CAM returns
CS490N -- Chapt. 14
14
2003
CS490N -- Chapt. 14
15
2003
T-CAM Lookup
Each slot has bit mask
Hardware uses mask to decide which bits to test
Algorithm
for each slot do {
if ( ( key & mask ) == ( slot & mask ) ) {
declare key matches slot;
} else {
declare key does not match slot;
}
}
CS490N -- Chapt. 14
16
2003
key
08
00
45
06
00
50
00
00
slot #1
08
00
45
06
00
50
00
02
mask
ff
ff
ff
ff
ff
ff
00
00
slot #2
08
00
45
06
00
35
00
03
mask
ff
ff
ff
ff
ff
ff
00
00
CS490N -- Chapt. 14
17
2003
CS490N -- Chapt. 14
18
2003
pointer
..
.
CAM
CS490N -- Chapt. 14
RAM
19
2003
Software Scalability
Not always easy
Many resource constraints
Difficulty arises from
Explicit parallelism
CS490N -- Chapt. 14
20
2003
Summary
Scalability key issue
Primary subsystems affecting scale
Processor hierarchy
Memory hierarchy
SRAM
SDRAM
CAM
CS490N -- Chapt. 14
21
2003
Questions?
XV
Examples Of Commercial Network Processors
CS490N -- Chapt. 15
2003
Commercial Products
Emerge in late 1990s
Become popular in early 2000s
Exceed thirty vendors by 2003
CS490N -- Chapt. 15
2003
Examples
Chosen to
Illustrate concepts
CS490N -- Chapt. 15
2003
CS490N -- Chapt. 15
2003
Routing
Switch
Processor
( RSP )
packets
arrive
packets sent
to fabric
configuration bus
Agere
System
Interface
( ASI )
CS490N -- Chapt. 15
2003
Non-procedural
ASL
CS490N -- Chapt. 15
2003
input
framer
data buffer
controller
output
interface
program
memory
pattern
engine
checksum /
CRC engine
control
memory
queue
engine
ALU
conf. bus
interface
functional bus
interface
CS490N -- Chapt. 15
2003
Purpose
Perform pattern matching on each packet
Control packet queueing
Compute checksum or CRC for a packet
Conventional operations
Divide incoming packet into 64-octet blocks
Control access to external data buffer
Connect to external configuration bus
Connect to external functional bus
Connect to external RSP chip
CS490N -- Chapt. 15
2003
input
input
interface
stream
editor
output
interface
output
config.
interface
traffic
manager
engine
queue
manager
logic
ext. sched.
SSRAM
CS490N -- Chapt. 15
packet
assembler
SSRAM
ext. sched.
interface
ext. queue
entry SSRAM
traffic
shaper
engine
ext. linked
list SSRAM
2003
Purpose
Input interface
Packet assembler
Queue manager logic
Configuration bus interface
Output interface
CS490N -- Chapt. 15
10
2003
Five-stage pipeline
Memory cache
I/O interfaces
CS490N -- Chapt. 15
11
2003
Alchemy Architecture
to
SDRAM
SDRAM controller
MIPS-32
embed.
proc.
fast IrDA
EJTAG
instruct.
cache
DMA controller
bus unit
Ethernet MAC
SRAM
bus
MAC
data
cache
LCD controller
USB-Host contr.
SRAM controller
USB-Device contr.
RTC (2)
interrupt controller
power management
GPIO
CS490N -- Chapt. 15
SSI (2)
I2 S
AC 97 controller
12
2003
Packet metering
Packet transform
Packet policy
CS490N -- Chapt. 15
13
2003
AMCC Architecture
external search
interface
external memory
interface
policy
engine
host
interface
metering
engine
six
nP cores
input
control iface
CS490N -- Chapt. 15
onboard
memory
debug port
inter mod.
14
output
test iface
2003
Pipeline Of Homogeneous
Processors (Cisco)
Parallel eXpress Forwarding (PXF)
Arranged in parallel pipelines
Packet flows through one pipeline
Each processor in pipeline dedicated to one task
CS490N -- Chapt. 15
15
2003
Cisco Architecture
input
MAC classify
Accounting & ICMP
FIB & Netflow
MPLS classify
Access Control
CAR
MLPPP
WRED
output
CS490N -- Chapt. 15
16
2003
CS490N -- Chapt. 15
17
2003
Cognigine Architecture
pointer
file
dictionary
packet buffers
registers & scratch memory
source
route
addr.
calc.
source
route
instr.
cache
source
route
dict.
decode
pipeline
ctl.
execut.
unit
CS490N -- Chapt. 15
source
route
data
memory
execut.
unit
18
execut.
unit
execut.
unit
2003
Pipeline Of Parallel
Heterogeneous Processors
(EZchip)
Four processor types
Each type optimized for specific task
CS490N -- Chapt. 15
19
2003
EZchip Architecture
CS490N -- Chapt. 15
TOPparse
TOPsearch
TOPresolve
TOPmodify
..
..
..
..
..
.
..
..
..
..
..
.
..
..
..
..
..
.
..
..
..
..
..
.
memory
memory
memory
memory
20
2003
CS490N -- Chapt. 15
Optimized For
Header field extraction and classification
Table lookup
Queue management and forwarding
Packet header and content modification
21
2003
CS490N -- Chapt. 15
22
2003
ingress
data
store
SRAM
for
ingress
data
CS490N -- Chapt. 15
ingress
switch
interface
external DRAM
and SRAM
internal
SRAM
from switching
fabric
egress
switch
interface
ingress
physical
MAC
multiplexor
egress
physical
MAC
multiplexor
packets from
physical devices
packets to
physical devices
23
egress
data
store
traffic
manag.
and
sched.
egress
data store
2003
H0
H1
H2
H3
to external memory
H4
D0
D1
D2
D3
D4
D5
D6
completion unit
hardware regs.
exceptions
ingress
data
store
ingress
data
iface
instr. memory
ingress
data
store
CS490N -- Chapt. 15
programmable
protocol processors
(16 picoengines)
classifier assist
frame dispatch
24
egress
queue
PCI
bus
egress
data
store
internal
bus
egress
data
store
2003
Packet Engines
Found in Embedded Processor Complex
Programmable
Handle many packet processing tasks
Operate in parallel (sixteen)
Known as picoengines
CS490N -- Chapt. 15
25
2003
Purpose
Data Store
Checksum
Enqueue
Interface
String Copy
Counter
Policy
Semaphore
CS490N -- Chapt. 15
26
2003
Dedicated
Parallel clusters
Pipeline
CS490N -- Chapt. 15
27
2003
C-Port Architecture
switching fabric
CS490N -- Chapt. 15
network
processor
1
network
processor
2
physical
interface 1
physical
interface 2
...
network
processor
N
physical
interface N
28
2003
C-Port Configured As
Parallel Clusters
SRAM
switching
fabric
SRAM
queue
mgmt.
unit
fabric
proc.
table
lookup
unit
pci
ser.
prom
Exec. Processor
DRAM
buffer
mgmt.
unit
CP-0
CP-1
CP-2
CP-3
...
connections to
physical interfaces
CS490N -- Chapt. 15
29
2003
Internal Structure Of A
C-Port Channel Processor
To external DRAM
memory bus
RISC Processor
extract
space
merge
space
Serial Data
Processor
(in)
Serial Data
Processor
(out)
packets arrive
packets leave
30
2003
Summary
Many network processor architecture variations
Examples include
CS490N -- Chapt. 15
31
2003
Questions?
XXVII
Intels Second Generation Processors
CS490N -- Chapt. 27
2003
Terminology
CS490N -- Chapt. 27
2003
General Characteristics
Same basic architecture
Embedded processor
Parallel microengines
Enhancements
Speed
Amount of parallelism
Functionality
CS490N -- Chapt. 27
2003
IXP2850
CS490N -- Chapt. 27
2003
Solution
Increase parallelism
CS490N -- Chapt. 27
2003
fabric
gasket
IXP2400
(ingress)
F
A
B
input
demux
R
IXP2400
(egress)
I
C
Memory interfaces
CS490N -- Chapt. 27
2003
IXP2400 Features
XScale embedded processor (ARM compliant) with caches
Eight microengines (400 or 600 MHz)
Eight hardware threads per microengine
Multiple microengine instruction stores of one thousand
instructions each
Two hundred fifty-six general-purpose registers
Five hundred twelve transfer registers
Addressing for two gigabyte DDR-DRAM
Addressing for thirty-two megabyte QDR-SRAM
Sixteen words of Next Neighbor Registers
CS490N -- Chapt. 27
2003
Memory Hierarchy
External DDR-DRAM
External QDR-SRAM
CS490N -- Chapt. 27
2003
Utopia 1, 2, or 3 interface
CS490N -- Chapt. 27
2003
PCI bus
coprocessor
bus
QDR
SRAM
classif.
acceler.
..
.
DDR
DRAM
ASIC
IXP2400
chip
Flash
Mem.
Slow Port
interface
Media or
Switch Fabric
CS490N -- Chapt. 27
flow
control
bus
input and output demux
2003
Internal Architecture
optional host connection
PCI bus
coprocessor
bus
IXP2400 chip
PCI
iface.
XScale
RISC
processor
classif.
acceler.
..
.
coproc.
iface.
ASIC
multiple,
independent
internal
buses
MEs 1 - 4
MEs 5 - 8
Flash
Mem.
slow
port
Media or
Switch Fabric
interface
Media or
Switch Fabric
CS490N -- Chapt. 27
SRAM
iface.
QDR
SRAM
DRAM
iface.
DDR
DRAM
hash
unit
scratch
memory
FC bus
iface.
receive
transmit
flow
control
bus
10
2003
Microengine Enhancements
Multiplier unit
Pseudo-random number generator
CRC calculator
Four thirty-two bit timers and timer signaling
Sixteen entry CAM
Timestamping unit
Support for generalized thread signaling
Six hundred forty words of local memory
Queue manipulation mechanism that eliminates the need for
mutual exclusion
CS490N -- Chapt. 27
11
2003
Microengine Enhancements
(continued)
ATM segmentation and reassembly hardware
Byte alignment facilities
Two ME clusters with independent buses
Called Microengine Version 2
CS490N -- Chapt. 27
12
2003
Other Facilities
Reflector mode pathways
CS490N -- Chapt. 27
13
2003
IXP2800 Features
Same general architecture as 2400
Sixteen microengines (1.4 GHz)
Two unidirectional sixteen-bit Low Voltage Differential
Signaling data interfaces that can be configured as
SPI-4 Phase 2
CS490N -- Chapt. 27
14
2003
Summary
Intel is moving to second generation of network processors
Three models announced
IXP2400
IXP2800
IXP2850
Faster processors
More parallelism
CS490N -- Chapt. 27
15
2003
Questions?
Stop Here!
Stop Here!
Stop Here!
Stop Here!
Stop Here!
Questions?