[Figure: crossbar switch fabric. Line interfaces connect to the switch chips over SerDes links; backlog information from the inputs drives the crossbar configuration.]
Switch Throughput
Throughput is the maximum normalized traffic rate between the line card and the switch card; it cannot be larger than one. Throughput is usually demonstrated with a plot of average delay versus normalized offered rate. Theoretically the curve looks like a hockey stick: delay stays low until the load approaches the throughput limit, then grows sharply. In practice, because buffering is limited, the delay curve saturates instead of growing without bound.
[Figure: average delay (µsec, log scale) versus normalized offered rate; the delay curve rises sharply as the load approaches one.]
Scheduling Problem
The scheduling algorithm resolves input-output contention. We can model the switch as a bipartite graph with two sets of nodes corresponding to the input and output ports. There is an edge between an input and an output if there is a buffered cell for that connection. The scheduling algorithm finds a matching in this bipartite graph.
[Figure: maximum weight matching (MWM) example; edges are weighted by the number of buffered cells for each input-output pair.]
ENTS689L: Packet Processing and Switching Buffer-less Switch Fabric Architectures
[Figure: LPF and MNCM examples.]
Practical Approaches
These algorithms are not amenable to hardware implementation. Instead we use simple algorithms that can be implemented in hardware. To compensate for their lower performance, we make the switch run faster than the line card (speedup). It has been proved that any maximal size matching with 2x speedup can achieve 100% throughput. A matching is maximal if it is not possible to add any more edges to it.
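A maximal (not necessarily maximum) matching can be built greedily; the sketch below is illustrative and not the arbitration logic of any particular fabric.

```python
def maximal_matching(requests):
    """Greedily build a maximal matching on the bipartite request graph.
    `requests` maps each input port to the set of outputs it has
    buffered cells for.  Greedy scanning yields a matching to which
    no further edge can be added, i.e. a maximal matching."""
    used_outputs = set()
    match = {}                     # input -> output
    for inp, outs in requests.items():
        for out in outs:
            if out not in used_outputs:
                match[inp] = out
                used_outputs.add(out)
                break
    return match

# Example: 3x3 switch; each input lists the outputs it has cells for.
reqs = {0: {0, 1}, 1: {1}, 2: {1, 2}}
print(maximal_matching(reqs))      # one maximal matching, e.g. {0: 0, 1: 1, 2: 2}
```

Greedy maximality is what the 2x-speedup result above requires; no augmenting-path search (as a maximum matching would need) is involved, which is why such schemes map well to hardware.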
Arbiter Connections
[Figure: connections between the input arbiters and the output arbiters.]
Inside an Arbiter
Multiple Iterations
We can increase the matching size by performing multiple iterations. The arbiter pointers are updated only after the first iteration. The grant and accept arbiters can each perform their function in one clock cycle, so k iterations require 2k clock cycles without pipelining. By pipelining the iterations we can reduce the time required.
[Figure: pipelined iteration schedule: Grant1, Accept1, Grant2, Accept2, Grant3, Accept3.]
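The grant/accept iterations above can be sketched in software, assuming iSLIP-style round-robin arbiters (the function and data layout here are assumptions for illustration, not a hardware description):

```python
def islip(requests, grant_ptr, accept_ptr, iterations=3):
    """One iSLIP-style match with multiple grant/accept iterations.
    requests[i] is the set of outputs input i has cells for;
    grant_ptr[o] and accept_ptr[i] are round-robin pointers, advanced
    only for matches made in the first iteration."""
    N = len(grant_ptr)
    match, matched_out = {}, set()     # input -> output
    for it in range(iterations):
        # Grant phase: each unmatched output grants to the requesting,
        # still-unmatched input closest to its pointer.
        grants = {}                    # input -> list of granting outputs
        for o in range(N):
            if o in matched_out:
                continue
            for step in range(N):
                i = (grant_ptr[o] + step) % N
                if i not in match and o in requests.get(i, ()):
                    grants.setdefault(i, []).append(o)
                    break
        # Accept phase: each input accepts the grant closest to its pointer.
        for i, outs in grants.items():
            o = min(outs, key=lambda o: (o - accept_ptr[i]) % N)
            match[i] = o
            matched_out.add(o)
            if it == 0:                # pointers updated after 1st iteration only
                grant_ptr[o] = (i + 1) % N
                accept_ptr[i] = (o + 1) % N
    return match

gp, ap = [0, 0, 0], [0, 0, 0]
print(islip({0: {0, 1}, 1: {0}, 2: {1, 2}}, gp, ap))
```

Updating the pointers only for first-iteration matches is what desynchronizes the arbiters over successive cell times and avoids starvation; later iterations merely fill in extra edges.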
http://tiny-tera.stanford.edu/~nickm/papers/adisak_thesis.pdf
12. Switch Architecture
13. Guaranteed Latency
14. TDM Support
15. Sub-ports per 10-Gbit/s Line Interface
16. Traffic Flows per 10-Gbit/s Port
17. Frame Payload (Bytes)
18. Frame Distribution Across Fabric
19. Fabric Overspeed
20. Backplane Link Speed
21. Backplane Links per 10-Gbit/s Port
22. Redundancy Modes
23. Host Interface
Performance Benchmarking
Traffic Modeling
Performance Metrics
Benchmark Suites
Traffic Modeling
Destination distribution: the Zipf law has been proposed to model nonuniform traffic distribution among destinations.
Zipf(i) = i^(-k) / Σ_{j=1}^{N} j^(-k)

where N is the number of destinations.
k = 0 corresponds to uniform traffic; k = ∞ corresponds to a single completely preferred destination. Typically k varies from 0 to 5.
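A small sketch of the Zipf destination distribution (the helper names are hypothetical):

```python
import random

def zipf_pmf(n, k):
    """Zipf destination probabilities: p(i) proportional to i^(-k)
    for i = 1..n.  k = 0 gives uniform traffic; large k concentrates
    the traffic on destination 1."""
    w = [i ** -k for i in range(1, n + 1)]
    s = sum(w)
    return [x / s for x in w]

def pick_destination(pmf, rng=random):
    """Draw one destination (1-based index) according to the pmf."""
    return rng.choices(range(1, len(pmf) + 1), weights=pmf)[0]

pmf = zipf_pmf(16, 1.0)
print([round(p, 3) for p in pmf[:4]])   # probabilities of the 4 hottest ports
```

Sweeping k from 0 toward 5, as the slide suggests, moves the generator from uniform traffic to an almost completely preferred destination.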
Traffic Modeling
Packet arrival process: Bernoulli i.i.d. arrivals; ON/OFF model; ON/OFF model with non-delimited burst streams; ON/OFF model with minimum burst size.
Multicast: multiplicity factor (realistically it should not exceed 10, with an average value of 2-4) and the distribution of the destinations.
QoS: distribution of the traffic among a number of classes.
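As a minimal sketch of the ON/OFF arrival model, assuming geometric ON and OFF holding times (all names and parameters are illustrative):

```python
import random

def on_off_arrivals(slots, p_on, p_off, rng=None):
    """Two-state ON/OFF arrival process: while ON, one cell arrives
    per slot, giving bursty traffic.  p_on is Prob(OFF -> ON) per
    slot, p_off is Prob(ON -> OFF), so the offered load is
    p_on / (p_on + p_off)."""
    rng = rng or random.Random(0)
    state_on, arrivals = False, []
    for _ in range(slots):
        if state_on:
            if rng.random() < p_off:
                state_on = False
        else:
            if rng.random() < p_on:
                state_on = True
        arrivals.append(1 if state_on else 0)
    return arrivals

a = on_off_arrivals(100_000, p_on=0.1, p_off=0.3)
print(sum(a) / len(a))   # offered load, close to 0.1 / (0.1 + 0.3) = 0.25
```

Bernoulli i.i.d. arrivals are the degenerate case where the state is redrawn independently every slot; lowering p_off at a fixed load lengthens the bursts, which is what stresses the fabric buffers.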
Performance Metrics
Fabric latency: latency between points 2 and 3. Total latency: latency between points 1 and 3. Accepted vs. offered bandwidth: the number of cells the fabric accepts at point 2 divided by the number of cells offered to it at point 1. Jitter: the difference between the time intervals separating a pair of consecutive cells of the same flow at the ingress and at the egress.
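The jitter definition above can be computed directly from per-cell timestamps (an illustrative sketch; the function name is hypothetical):

```python
def jitter(ingress_times, egress_times):
    """Per-flow jitter as defined above: for each pair of consecutive
    cells of the same flow, the egress inter-cell gap minus the
    ingress inter-cell gap."""
    gaps_in = [b - a for a, b in zip(ingress_times, ingress_times[1:])]
    gaps_out = [b - a for a, b in zip(egress_times, egress_times[1:])]
    return [o - i for i, o in zip(gaps_in, gaps_out)]

# Cells enter every 10 µs; per-cell fabric delays vary, so egress
# gaps differ from the ingress gaps.
print(jitter([0, 10, 20, 30], [5, 16, 25, 38]))   # [1, -1, 3]
```

A fabric with a constant per-cell delay would report all-zero jitter even though its total latency is nonzero, which is why the two metrics are listed separately.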
Benchmark Suites
Hardware benchmarks: memory speed, processing speed, port-to-port minimum latency, switch fabric overhead, internal cell size. In these tests there is no contention between packets, to minimize scheduling and arbitration impacts. Measured quantities include zero-load latency and maximum port load.
[Figure: basic port-pair test with variable-size packets; measured load ratio versus packet size (bytes).]
Benchmark Suites
Arbitration benchmarks study the performance of the fabric when there is contention. Performance is studied for different traffic patterns and load/destination distributions.
Summary Performance Chart
[Figure: fabric latency, total latency, and jitter (µsec, log scale), together with the submitted/offered ratio, plotted against offered load (Gb/sec).]