
Forming $C = A B^T$

For this operation to be well-defined, we require the matrices $A$, $B$, and $C$ to have dimensions $m \times k$, $n \times k$, and $m \times n$, respectively.

If we partition
\[
C = \left( c_1 \mid c_2 \mid \cdots \mid c_n \right)
\quad \mbox{and} \quad
B = \left( \begin{array}{c} b_1^T \\ b_2^T \\ \vdots \\ b_n^T \end{array} \right) ,
\]
then
\[
c_j = A b_j
\]
for $j = 1, \ldots, n$. Thus, a matrix-matrix multiply can be written as a sequence of matrix-vector multiplies.
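
As a concrete sequential illustration of this column-by-column view, the following sketch forms $C = A B^T$ one matrix-vector multiply at a time. It is not the code from the text; the routine name, row-major storage, and argument order are assumptions made for the example.

\begin{verbatim}
/* Sketch only: form C = A * B^T as a sequence of matrix-vector
   multiplies c_j = A b_j, where b_j^T is the j-th row of B.
   A is m x k, B is n x k, C is m x n, all stored in row-major
   order; these storage and naming choices are assumptions.     */
void form_C_by_matvecs(int m, int n, int k,
                       const double *A, const double *B, double *C)
{
    for (int j = 0; j < n; j++) {          /* j-th column of C        */
        const double *bj = &B[j * k];      /* j-th row of B, length k */
        for (int i = 0; i < m; i++) {      /* c_j = A b_j             */
            double sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += A[i * k + p] * bj[p];
            C[i * n + j] = sum;
        }
    }
}
\end{verbatim}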

Basic Parallel Algorithm:

We can implement the above sequence of matrix-vector multiplies by using views, splitting off one column of $C$ and one row of $B$ at a time: Let $C$ and $B$ be partitioned like
\[
C = \left( c_1 \mid C_2 \right)
\quad \mbox{and} \quad
B = \left( \begin{array}{c} b_1^T \\ B_2 \end{array} \right) ,
\]
where $c_1$ and $b_1^T$ equal the first column of $C$ and the first row of $B$, respectively. Then
\begin{eqnarray*}
\left( c_1 \mid C_2 \right) & = & A \left( b_1 \mid B_2^T \right) \\
 & = & \left( A b_1 \mid A B_2^T \right) .
\end{eqnarray*}
So the computation of $C$ can be viewed as a matrix-vector multiply of $A$ times the (transposed) first row of $B$, after which the operation $C_2 = A B_2^T$ can be performed recursively, using the same approach.

This leads to the following algorithm:

Again, the ``$\leftarrow$'' symbol is used to indicate assignment, while ``$=$'' indicates a reference or partitioning.
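
A minimal sequential sketch of this view-based formulation is given below; it expresses the recursion as a loop that shrinks the views of $C$ and $B$ by one column and one row per iteration. The column-major storage, leading dimensions, and helper routine are assumptions for the example, not code from the text.

\begin{verbatim}
/* Sketch only: loop form of the recursion C = (c_1 | C_2),
   B = (b_1^T ; B_2).  Each step computes c_1 <- A b_1, then the views
   of C and B shrink by one column and one row, respectively.
   Column-major storage with leading dimensions (BLAS style) and the
   helper routine below are assumptions, not code from the text.      */
static void matvec_strided(int m, int k,
                           const double *A, int lda,
                           const double *b, int incb, double *c)
{
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        for (int p = 0; p < k; p++)
            sum += A[i + p * lda] * b[p * incb];
        c[i] = sum;
    }
}

void form_C_by_views(int m, int n, int k,
                     const double *A, int lda,   /* A is m x k */
                     const double *B, int ldb,   /* B is n x k */
                     double *C, int ldc)         /* C is m x n */
{
    while (n > 0) {
        /* c_1 <- A b_1: the first row of the current view of B has
           stride ldb in column-major storage.                       */
        matvec_strided(m, k, A, lda, B, ldb, C);
        C += ldc;   /* C = C_2: split off the first column */
        B += 1;     /* B = B_2: split off the first row    */
        n -= 1;
    }
}
\end{verbatim}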

Blocked Algorithm:

In order to get better performance, it is advantageous to set up the algorithm to perform a sequence of matrix-panel-of-vectors operations: Let $C$ and $B$ be partitioned like
\[
C = \left( C_1 \mid C_2 \right)
\quad \mbox{and} \quad
B = \left( \begin{array}{c} B_1 \\ B_2 \end{array} \right) ,
\]
where $C_1$ and $B_1$ now equal a panel of $s$ columns of $C$ and a panel of $s$ rows of $B$, respectively. Then
\begin{eqnarray*}
\left( C_1 \mid C_2 \right) & = & A \left( B_1^T \mid B_2^T \right) \\
 & = & \left( A B_1^T \mid A B_2^T \right) .
\end{eqnarray*}
So the computation of $C$ can be viewed as a matrix-times-panel-of-vectors multiply ($A$ times the transposed first panel of rows of $B$), followed by the recursive operation $C_2 = A B_2^T$, using the same approach.

This leads to the following algorithm:

Again, the ``$\leftarrow$'' symbol is used to indicate assignment, while ``$=$'' indicates a reference or partitioning. The resulting code is given in the figure below. We will often refer to this approach to parallel matrix-matrix multiplication as a matrix-panel variant, to indicate that the basic operation uses matrix-panel-of-vectors (rows or columns) multiplication.

[Figure: code for the matrix-panel variant of the parallel matrix-matrix multiplication (not reproduced here).]
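
Since the figure itself is not reproduced here, the following sequential sketch illustrates the blocked loop structure: each iteration forms a panel of $s$ columns of $C$ from $A$ and the corresponding panel of $s$ rows of $B$. The routine name, column-major storage, and leading dimensions are assumptions for the example; the parallel details of the original code are not reconstructed.

\begin{verbatim}
/* Sketch only: blocked (matrix-panel) variant.  Each iteration forms a
   panel of s columns of C as A times the transpose of a panel of s rows
   of B.  Column-major storage with leading dimensions; the names and
   layout are assumptions for the example.                              */
void form_C_by_panels(int m, int n, int k, int s,
                      const double *A, int lda,   /* A is m x k */
                      const double *B, int ldb,   /* B is n x k */
                      double *C, int ldc)         /* C is m x n */
{
    for (int j = 0; j < n; j += s) {
        int sb = (n - j < s) ? (n - j) : s;       /* current panel width */

        /* C_1 = A B_1^T: columns j..j+sb-1 of C from rows j..j+sb-1 of B. */
        for (int jj = j; jj < j + sb; jj++)
            for (int i = 0; i < m; i++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++)
                    sum += A[i + p * lda] * B[jj + p * ldb];
                C[i + jj * ldc] = sum;
            }
    }
}
\end{verbatim}

The panel width $s$ is the block size referred to in the text; larger panels turn the innermost work into matrix-matrix rather than matrix-vector operations, which is the source of the performance benefit mentioned above.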

Pipelined Algorithm:

For large $n$, the above algorithm can be improved by introducing pipelining. This can be accomplished by setting the natural flow of computation to be to the right or left, for communication within rows, or to the top or bottom, for communication within columns.
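
To make the idea of such a flow concrete, the sketch below shows one common way pipelining can be realized with point-to-point messages: rather than waiting for a collective broadcast, each process receives a panel from its left neighbor, uses it, and forwards it to its right neighbor, so communication and computation overlap along the row. The mechanism (MPI, a panel circulating within a row of processes) and all names are assumptions for the example, not the algorithm given in the text.

\begin{verbatim}
/* Sketch only: pipelined ("flow to the right") panel passing with MPI.
   The process that owns the panel starts the pipeline; every other
   process receives the panel from its left neighbor, would use it in a
   local matrix-panel multiply, and forwards it to the right.  The
   communicator, naming, and placement of the local computation are
   assumptions for the example.                                        */
#include <mpi.h>

void pass_panel_right(double *panel, int count, int me, int np,
                      int owner, MPI_Comm row_comm)
{
    int left  = (me - 1 + np) % np;
    int right = (me + 1) % np;

    if (me != owner)        /* everyone except the owner receives first */
        MPI_Recv(panel, count, MPI_DOUBLE, left, 0, row_comm,
                 MPI_STATUS_IGNORE);

    /* ... local computation with the panel would go here ... */

    if (right != owner)     /* forward, unless that would wrap around   */
        MPI_Send(panel, count, MPI_DOUBLE, right, 0, row_comm);
}
\end{verbatim}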


