Quiz 1
• NIOS II processor – basics
• FPGA – basics
• Caches
– Performance
– Size, number of bits
– Block placement
– Block identification
– Block replacement
– Write strategy
Quiz 1 (Cont.)
Key terms:
• Flynn’s taxonomy
• Shared memory architectures
– Cache coherence
– NUMA, UMA, COMA
– Symmetric Multiprocessors
• Distributed memory systems
• Classification based on communication
• Classification based on type of parallelism
• Chapter 1 from the textbook
Quiz 1 (Cont.)
• Amdahl's law
• Speedup, Efficiency
• Parallelism profile, average parallelism, MIPS
• Scalability
• Understanding the performance of the parallel addition
program
Overview
• Network properties
• Switches
• Single and multistage Interconnection networks
• Crossbar
Network properties
Bisection width
• Bisection width is the minimum number of wires that must be cut to
divide the network into two equal halves.
– A small bisection width means low bandwidth between the two halves
– A large bisection width means many extra wires
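The definition above can be checked by brute force on small networks. The following is a minimal sketch, not an efficient algorithm: it assumes the network is given as an undirected edge list over nodes 0..N-1 with N even, and the function name `bisection_width` and the three example topologies are illustrative choices, not taken from the slides.

```python
from itertools import combinations


def bisection_width(num_nodes, edges):
    """Smallest number of edges crossing any split into two equal halves.

    Brute force over all balanced splits; only practical for small networks.
    """
    if num_nodes % 2 != 0:
        raise ValueError("this sketch assumes an even number of nodes")
    best = len(edges)
    # Node 0 always stays in the complement, so each split is tried once.
    for half in combinations(range(1, num_nodes), num_nodes // 2):
        side = set(half)
        # An edge is cut when its endpoints fall in different halves.
        cut = sum((u in side) != (v in side) for u, v in edges)
        best = min(best, cut)
    return best


# Three 8-node example topologies (assumed for illustration).
linear_array = [(i, i + 1) for i in range(7)]
ring = [(i, (i + 1) % 8) for i in range(8)]
hypercube = [(i, i ^ (1 << b)) for i in range(8) for b in range(3)
             if i < i ^ (1 << b)]

for name, edges in [("linear array", linear_array),
                    ("ring", ring),
                    ("3-cube", hypercube)]:
    print(name, bisection_width(8, edges))
```

The linear array can be split by cutting a single wire, while the 3-cube needs four wires cut, which illustrates the trade-off between bandwidth and wiring.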
Factors Affecting Performance
• Functionality – how the network supports data routing,
interrupt handling, synchronization, request/message
combining, and coherence
• Network latency – worst-case time for a unit message to
be transferred
• Bandwidth – maximum data rate
• Hardware complexity – implementation costs for wire,
logic, switches, connectors, etc.
2 × 2 Switches
Switches
Single-stage networks
Multistage Interconnection Networks
• The capability of single-stage networks is limited, but if enough of them
are cascaded, they form a completely connected MIN (Multistage
Interconnection Network).
• Switches can perform their own routing or can be controlled by a central
router
• These networks can be classified into the following categories:
• Nonblocking
– A network is called strictly nonblocking if it can connect any idle input to any idle
output regardless of what other connections are currently in progress
• Rearrangeable nonblocking
– A network is rearrangeable nonblocking if it can establish all possible connections
between inputs and outputs by rearranging its existing connections.
• Blocking interconnection
– A network is said to be blocking if it can perform many, but not all, possible
connections between terminals.
– Example: the Omega network
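To see why an Omega network is blocking, the sketch below tracks which switch output line each connection occupies after every stage and reports the first stage where two connections collide. It assumes an 8 × 8 Omega network (3 stages of 2 × 2 switches, a perfect shuffle before each stage) with destination-tag routing; the function names `omega_links` and `conflict_stage` are hypothetical.

```python
def omega_links(source, dest, num_bits=3):
    """Output line occupied after each stage for one source-to-dest connection."""
    mask = (1 << num_bits) - 1
    line = source
    links = []
    for stage in range(num_bits):
        # Perfect shuffle: rotate the line number left by one bit.
        line = ((line << 1) | (line >> (num_bits - 1))) & mask
        # The switch output is chosen by the next destination bit, MSB first.
        dest_bit = (dest >> (num_bits - 1 - stage)) & 1
        line = (line & ~1) | dest_bit
        links.append(line)
    return links


def conflict_stage(conn_a, conn_b, num_bits=3):
    """First stage where two (source, dest) connections need the same line, else None."""
    links_a = omega_links(conn_a[0], conn_a[1], num_bits)
    links_b = omega_links(conn_b[0], conn_b[1], num_bits)
    for stage, (a, b) in enumerate(zip(links_a, links_b)):
        if a == b:
            return stage
    return None


# Inputs 000 and 100 share a first-stage switch and both need its upper output.
print(conflict_stage((0b000, 0b000), (0b100, 0b001)))
```

The two connections have different sources and different destinations, yet they cannot be set up at the same time, which is exactly the blocking behavior described above.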
Omega networks
Example:
• Connect input 101 to output 001
• Use the bits of the destination address, 001, to select a path dynamically
• Routing:
– 0 means use the upper output
– 1 means use the lower output
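The routing rule above can be traced step by step. This is a minimal sketch assuming an 8 × 8 Omega network with three stages of 2 × 2 switches and a perfect shuffle in front of each stage; the function name `omega_route` and the tuple layout of the result are illustrative, not from the slides.

```python
def omega_route(source, dest, num_bits=3):
    """Trace destination-tag routing; one (stage, switch, output, line) per stage."""
    mask = (1 << num_bits) - 1
    line = source
    path = []
    for stage in range(num_bits):
        # Perfect shuffle between stages: rotate the line number left by one bit.
        line = ((line << 1) | (line >> (num_bits - 1))) & mask
        # Examine the destination bits from most to least significant.
        bit = (dest >> (num_bits - 1 - stage)) & 1
        # 0 selects the upper output of the switch, 1 the lower output.
        line = (line & ~1) | bit
        path.append((stage, line >> 1, "upper" if bit == 0 else "lower", line))
    return path


for stage, switch, output, line in omega_route(0b101, 0b001):
    print(f"stage {stage}: switch {switch}, {output} output, line {line:03b}")
```

For input 101 and destination 001, the switch settings follow the destination bits 0, 0, 1 (upper, upper, lower), and the message leaves the last stage on line 001.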
Baseline networks
Crossbar Network
Crossbar Network
Problem
A) Use two-input AND and OR gates to construct an N × N
crossbar switch network between N processors and N
memory modules. Use the signal cij as the enable signal for
the switch in the ith row and jth column. Let the width of each
crosspoint be w bits.
B) Estimate the total number of AND and OR gates
needed as a function of N and w.
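One possible way to approach part B is sketched below; it is an estimate under stated assumptions, not necessarily the intended solution. It assumes each crosspoint gates its w data bits with cij using w two-input AND gates, and each memory column merges the N gated copies of every bit with a chain of N - 1 two-input OR gates. Only the processor-to-memory data path is counted; the logic that generates cij (address decoding, arbitration) and any return path are left out. The function names are hypothetical.

```python
def crossbar_gate_count(n, w):
    """Estimated (AND, OR) gate counts for an n x n crossbar with w-bit crosspoints."""
    and_gates = n * n * w        # w AND gates in each of the n*n crosspoints
    or_gates = n * (n - 1) * w   # per column and per bit: n - 1 two-input ORs
    return and_gates, or_gates


def column_output(enables, words, w):
    """Bit-level model of one memory column.

    enables[i] is c_ij for this column; words[i] is the w-bit word from processor i.
    Returns the merged word plus the number of AND and OR gates exercised.
    """
    out = 0
    and_count = or_count = 0
    for bit in range(w):
        merged = None
        for enable, word in zip(enables, words):
            gated = enable & ((word >> bit) & 1)   # one two-input AND gate
            and_count += 1
            if merged is None:
                merged = gated
            else:
                merged |= gated                    # one two-input OR gate
                or_count += 1
        out |= merged << bit
    return out, and_count, or_count


print(crossbar_gate_count(4, 8))
print(column_output([0, 1, 0, 0], [0xAA, 0x5C, 0xFF, 0x01], 8))
```

Under these assumptions both counts grow as N²w, which is why crossbars become expensive for large N.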
Problem (cont.)
[Figure: N × N crossbar; Cij is the crosspoint joining processor Pi and memory module Mj]
      M1   M2   ...  Mn
P1    C11  C12  ...  C1n
P2    C21  C22  ...  C2n
...
Pn    Cn1  Cn2  ...  Cnn
Problem (cont.)
[Figure: the N × N crossbar grid (processors P1 ... Pn, memory modules M1 ... Mn, crosspoints C11 ... Cnn), with a detail view of crosspoint C11 between P1 and M1]
Problem (cont.)
[Figure: address decoders for P1 and P2, each with outputs 1 and 2; enable signals C11, C12, C21, C22]
Performance Comparison
Network | Latency | Switching complexity | Wiring complexity | Blocking
Some Commercial Solutions [3]
• System-on-chip crossbar networks:
– Nexus from Fulcrum Microsystems
• The core is used in the PMC-Sierra dual MIPS processor RM9000
References
1. H. El-Rewini and M. Abd-El-Barr, Advanced Computer Architecture and
Parallel Processing, John Wiley and Sons, 2005.
2. K. Hwang, Advanced Computer Architecture: Parallelism, Scalability,
Programmability, McGraw-Hill, 1993.
3. A. Lines, “Nexus: an asynchronous crossbar interconnect for
synchronous system-on-chip designs”, Proc. of High Performance
Interconnects, pp. 2-7, 2003.