
General implementation

The above algorithm generalizes in a straightforward manner to $A \leftarrow A + y x^T$, where x and y can have any valid vector distribution, including projected and/or duplicated. As for the matrix-vector multiply, some care must be taken in creating xdup and ydup. Notice that xdup must be aligned with the columns of a, while ydup must be aligned with the rows of a. Creating xdup and ydup is now accomplished through the calls

PLA_Pvector_create_conf_to( a, PLA_PROJ_ONTO_ROW, PLA_ALL_ROWS, &xdup );
PLA_Pvector_create_conf_to( a, PLA_PROJ_ONTO_COL, PLA_ALL_COLS, &ydup );
After this, all required communication and alignment is again hidden in the PLA_Copy routines. A code that generalizes even further, implementing the full functionality of the sequential BLAS ger operation, is given in the figure below. Notice that in the implementation we reverse the meaning of vectors x and y with respect to the explanation given above, to reflect the order of x and y traditionally given when defining the BLAS. Thus, the code implements $A \leftarrow \alpha x y^T + A$!
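To make the ordering of x and y in the BLAS convention concrete, the following is a minimal sequential sketch of what a ger operation computes, $A \leftarrow \alpha x y^T + A$, with x indexed by the rows of A and y by its columns. It is not PLAPACK code: the name ger_reference, the argument list, and the row-major storage are assumptions made only for this illustration.

```c
#include <stddef.h>

/* Sequential reference sketch of the ger operation
 *     A <- alpha * x * y^T + A
 * a   : m x n matrix, stored row-major with leading dimension lda
 *       (an assumption of this sketch; the Fortran BLAS store
 *       matrices column-major)
 * x   : m entries, one per row of A
 * y   : n entries, one per column of A
 * The name ger_reference is hypothetical and not part of PLAPACK. */
void ger_reference(size_t m, size_t n, double alpha,
                   const double *x, const double *y,
                   double *a, size_t lda)
{
    for (size_t i = 0; i < m; ++i) {
        /* Scale once per row so the inner loop is a single multiply-add. */
        double scaled_xi = alpha * x[i];
        for (size_t j = 0; j < n; ++j) {
            a[i * lda + j] += scaled_xi * y[j];
        }
    }
}
```

In this convention x plays the role that y played in the explanation above (it is aligned with the rows of A), and y plays the role of x (it is aligned with the columns of A).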

[Figure: code for the general parallel implementation of the rank-1 update (ger).]



rvdg@cs.utexas.edu