We start by explaining the simple case of matrix-vector multiplication:
where A is an matrix. For our explanations, we will assume that matrix A and vectors x and y are created according to the driver given in Example 2.3. Thus, the template is created with a communicator and a blocking of . Given this template, vectors x and y are aligned with the first element of the template vector and matrix A with the upper-left element of the template matrix. This yields objects a, x, and y.
As explained in Section 1.5, the following steps will perform the matrix-vector multiply y = A x :
We will now show how to translate these operations into PLAPACK code.
- spread (collect) the entries of x within columns of nodes,
- perform the local matrix-vector multiply, and
- perform a distributed reduce (summation) of the local results within rows, leaving the global result in vector y .
The mechanism used by PLAPACK to communicate is to describe
the initial and final distribution as objects, and perform
a copy or reduce.
Thus, the following statements will perform the
spread of x within columns of nodes:
Finally, the local results (in the different versions of the duplicated
projected vector ydup) must be reduced into a single vector y :
PLA_Obj_datatype( a, &datatype );
PLA_Pvector_create( datatype, PLA_PROJ_ONTO_ROW, PLA_ALL_ROWS,
n, template, PLA_ALIGN_FIRST, &xdup );
PLA_Copy( x, xdup );
After this, all information is available locally to perform
the local matrix-vector multiply. Before doing so, we need to create
duplicated multiscalars to hold the constants ``0'' and ``1''.
Also, a duplicated projected vector (column) must be created
to hold the result:
PLA_Mscalar_create( datatype, PLA_ALL_ROWS, PLA_ALL_COLS, 1, 1, template, &one );
PLA_Obj_set_to_one( one );
PLA_Mscalar_create( datatype, PLA_ALL_ROWS, PLA_ALL_COLS, 1, 1, template, &zero );
PLA_Obj_set_to_zero( zero );
PLA_Pvector_create( datatype, PLA_PROJ_ONTO_COL, PLA_ALL_COLS,
m, template, PLA_ALIGN_FIRST, &ydup );
PLA_Local_gemv( PLA_NO_TRANSPOSE, one, a, xdup, zero, ydup );
PLA_Obj_set_to_zero( y );
PLA_Reduce( ydup, MPI_SUM, y );
Next: General implementation
Up: Example: Parallelizing Matrix-Vector Multiplication
Previous: Example: Parallelizing Matrix-Vector Multiplication
rvdg@cs.utexas.edu