next up previous contents
Next: 1.6.2 Attaining better performance Up: 1.6 Implementation of Basic Previous: 1.6 Implementation of Basic

1.6.1 Matrix-matrix multiplication

   

C = A B : Parallelizing C = A B becomes straightforward when one observes that

\[
C = A B = \sum_{k} a_k \tilde{b}_k^T ,
\]

where a_k denotes the k-th column of A and \tilde{b}_k^T the k-th row of B. Thus the parallelization of this operation can proceed as a sequence of rank-1 updates, with the vectors y and x equal to the appropriate column and row of matrices A and B, respectively.
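The rank-1 decomposition above can be checked with a minimal serial sketch (my own illustration in plain Python, not the parallel implementation; the function names are hypothetical): C is accumulated one outer product of a column of A with a row of B at a time, and compared against a direct triple-loop product.

```python
def matmul_rank1(A, B):
    # Form C = A B as a sequence of rank-1 updates C += a_k * b~_k,
    # where a_k is column k of A and b~_k is row k of B.
    m, kdim, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for k in range(kdim):                      # one rank-1 update per k
        col_a = [A[i][k] for i in range(m)]    # column k of A
        row_b = B[k]                           # row k of B
        for i in range(m):
            for j in range(n):
                C[i][j] += col_a[i] * row_b[j]
    return C

def matmul_direct(A, B):
    # Reference: ordinary inner-product formulation of C = A B.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]
assert matmul_rank1(A, B) == matmul_direct(A, B)
```

In the parallel setting each such update is one step, with the column of A spread as the vector y and the row of B spread as the vector x.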

C = A B^T : For this case, we note that

\[
C = A B^T = \left[\, A \tilde{b}_0 \mid A \tilde{b}_1 \mid \cdots \,\right],
\]

where \tilde{b}_j denotes the j-th row of B, viewed as a column vector, so that the j-th column of C equals A \tilde{b}_j. This time, the parallelization of the operation can proceed as a sequence of matrix-vector multiplications, with the vectors y and x equal to the appropriate column and row of matrices C and B, respectively.
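Taking the operation in question to be C = A B^T (the inline math was garbled in this copy), the column-at-a-time scheme can be sketched serially as follows; this is my own illustration with hypothetical names, not the book's implementation. Each column of C is one matrix-vector product of A with a row of B.

```python
def matmul_abt_by_columns(A, B):
    # C = A B^T formed one column at a time: column j of C equals
    # A times (row j of B)^T, a matrix-vector multiplication.
    m, n = len(A), len(B)                # C is m x n; n = number of rows of B
    C = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x = B[j]                         # row j of B, used as the vector x
        for i in range(m):
            C[i][j] = sum(A[i][k] * x[k] for k in range(len(x)))
    return C

def matmul_abt_direct(A, B):
    # Reference: C[i][j] = sum_k A[i][k] * B[j][k].
    return [[sum(A[i][k] * B[j][k] for k in range(len(A[0])))
             for j in range(len(B))] for i in range(len(A))]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
assert matmul_abt_by_columns(A, B) == matmul_abt_direct(A, B)
```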

C = A^T B : Notice that computing C = A^T B is equivalent to computing C^T = B^T A, and thus the computation can proceed by computing

\[
C^T = B^T A = \left[\, B^T a_0 \mid B^T a_1 \mid \cdots \,\right],
\]

where a_j denotes the j-th column of A. The matrix-vector multiplication schemes described earlier can be easily adjusted to accommodate this special case.
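Taking this case to be C = A^T B (the inline math was garbled in this copy), the equivalence with C^T = B^T A can be sketched serially as follows; this is my own illustration with hypothetical names, not the book's implementation. C^T is built one column at a time as B^T a_j and then transposed back.

```python
def matmul_atb_via_transpose(A, B):
    # C = A^T B computed as C^T = B^T A: column j of C^T is B^T a_j,
    # where a_j is column j of A; the result is then transposed back.
    ka, m = len(A), len(A[0])            # A is ka x m, so A^T is m x ka
    n = len(B[0])                        # B is ka x n
    Ct = [[0.0] * m for _ in range(n)]   # C^T is n x m
    for j in range(m):
        a_j = [A[i][j] for i in range(ka)]                       # column j of A
        for r in range(n):
            Ct[r][j] = sum(B[k][r] * a_j[k] for k in range(ka))  # (B^T a_j)_r
    return [[Ct[r][c] for r in range(n)] for c in range(m)]      # transpose back

def matmul_atb_direct(A, B):
    # Reference: C[i][j] = sum_k A[k][i] * B[k][j].
    ka = len(A)
    return [[sum(A[k][i] * B[k][j] for k in range(ka))
             for j in range(len(B[0]))] for i in range(len(A[0]))]

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]                  # 3 x 2
B = [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0], [0.0, 3.0, 1.0]]   # 3 x 3
assert matmul_atb_via_transpose(A, B) == matmul_atb_direct(A, B)
```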

C = A^T B^T : On the surface, this operation appears quite straightforward:

\[
C = A^T B^T = \sum_{k} \tilde{a}_k b_k^T ,
\]

where \tilde{a}_k is the k-th row of A, viewed as a column vector, and b_k is the k-th column of B. Thus the parallelization of this operation can proceed as a sequence of rank-1 updates, with the vectors y and x equal to the appropriate row and column of matrices A and B, respectively. However, notice that the required spreading of vectors is quite different, now requiring a matrix row to be spread within rows of nodes and a matrix column within columns of nodes. It should be noted that without the observations made in Section 1.4.2 about spreading matrix rows and columns by first redistributing like the inducing vector distribution, this operation is by no means trivial when the mesh of nodes is non-square.
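Taking this case to be C = A^T B^T (the inline math was garbled in this copy), the rank-1 scheme with a row of A and a column of B can be checked serially as follows; this is my own illustration with hypothetical names, not the book's implementation.

```python
def matmul_atbt_rank1(A, B):
    # C = A^T B^T formed as a sum of rank-1 updates: the k-th update uses
    # row k of A (as the column vector y) and column k of B (as x).
    kdim = len(A)                        # A is kdim x m, B is n x kdim
    m, n = len(A[0]), len(B)
    C = [[0.0] * n for _ in range(m)]
    for k in range(kdim):
        y = A[k]                             # row k of A
        x = [B[i][k] for i in range(n)]      # column k of B
        for i in range(m):
            for j in range(n):
                C[i][j] += y[i] * x[j]
    return C

def matmul_atbt_direct(A, B):
    # Reference: C[i][j] = sum_k A[k][i] * B[j][k].
    kdim = len(A)
    return [[sum(A[k][i] * B[j][k] for k in range(kdim))
             for j in range(len(B))] for i in range(len(A[0]))]

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]     # 3 x 2, so A^T is 2 x 3
B = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]       # 2 x 3, so B^T is 3 x 2
assert matmul_atbt_rank1(A, B) == matmul_atbt_direct(A, B)
```

In the parallel algorithm the communication pattern differs from the C = A B case: here the row of A plays the role of y and the column of B the role of x, which is what forces the row/column spreading discussed above.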



rvdg@cs.utexas.edu