Parallel Processing and Parallel Algorithms
Two-Dimensional Mesh SIMD Model
Given a two-dimensional mesh SIMD architecture with wraparound connections,
there is an algorithm that uses n^2 processors to perform matrix-matrix
multiplication. Consider an arbitrary element C[i,j] of the product matrix. If
B_j denotes the jth column vector of B and A_i denotes the ith row vector of A,
then C[i,j] is the product of row i of matrix A and column j of matrix B. The
parallel algorithm computes the product in three phases. Initially, the
processor P(i,j), located at row i and column j, stores the elements A[i,j] and
B[i,j] of the matrices. In this distribution, only n processors contain a pair
of elements of A and B whose product is a term of the corresponding element of
C. However, it is possible to move the elements so that every processor has the
appropriate elements to produce its specific element of the product C = A * B.
This can be done by an upward rotation of the element of B and a leftward
rotation of the element of A stored in each processor. This initial
redistribution of the elements of the matrices is phase one of the algorithm.
In phase two of the algorithm, the product of the pair of elements stored in
each processor is computed. In phase three, the elements of A and B are passed
to the neighboring processors in the leftward and upward directions,
respectively, and each processor adds the product of its new pair to the result
of phase two. After n iterations of phase three of the algorithm, the element
C[i,j] of the product is present in the processor P(i,j).
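The three phases can be traced with a small sequential simulation. The following Python sketch is an illustration only, not the procedure from the text: the function name `mesh_matrix_multiply` and the list-of-lists representation of the mesh are assumptions, and every rotation wraps around modulo n to model the wraparound connections. Because the phase-three pseudocode is incomplete here, the sketch assumes the usual scheme of n-1 rotate-and-accumulate steps after phase two, which together with phase two supplies the n terms of each dot product.

```python
def mesh_matrix_multiply(A, B):
    """Sequentially simulate the three-phase mesh algorithm on n x n lists."""
    n = len(A)
    # a[i][j] and b[i][j] model the values held by processor (i, j).
    a = [row[:] for row in A]
    b = [row[:] for row in B]

    # Phase 1: stagger the matrices. Each step is synchronous, so a new
    # grid is built from the old one. Row i of A ends up rotated left
    # i times; column j of B ends up rotated up j times.
    for k in range(n):
        a = [[a[i][(j + 1) % n] if i > k else a[i][j] for j in range(n)]
             for i in range(n)]
        b = [[b[(i + 1) % n][j] if j > k else b[i][j] for j in range(n)]
             for i in range(n)]

    # Phase 2: every processor multiplies the pair it now holds.
    c = [[a[i][j] * b[i][j] for j in range(n)] for i in range(n)]

    # Phase 3 (assumed form): rotate all of A left and all of B up by one
    # position, then accumulate the new product; repeat n-1 times.
    for _ in range(n - 1):
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                c[i][j] += a[i][j] * b[i][j]
    return c


print(mesh_matrix_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# prints [[19, 22], [43, 50]]
```

After phase one, processor (i,j) holds A[i,(i+j) mod n] and B[(i+j) mod n,j], which is why the pair in every processor can be multiplied directly in phase two.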
Procedure Parallel_Matrix_Matrix(A,B,C)
Phase 1
  for k = 0 to n-1 do
    for P(i,j) where 0 ≤ i,j ≤ n-1 do in parallel
      if i > k
        then A[i,j-1] = A[i,j]
      endif
      if j > k
        then B[i-1,j] = B[i,j]
      endif
    endfor P(i,j)
  endfor
Phase 2
  for P(i,j) where 0 ≤ i,j ≤ n-1 do in parallel
    C[i,j] = A[i,j] * B[i,j]
  endfor P(i,j)
Phase 3
  for k
  to n-1 do
    for P(i,j) where 0