The presented solver PAISS initially distributes data (i.e. the
coefficient matrix and vectors as used by an iterative method)
heterogeneously onto the used processors according to their currently
available computational performances, which are measured with a micro
benchmark. As the load - and thus the available computational
performance for an individual process - may change dynamically, a
redistribution of data may be necessary to achieve a balanced load
and thus a minimal run time. Balancing the load in PAISS is done by
removing data (i.e. the rows of a coefficient matrix and vector
elements) from overloaded processors and by distributing it to others.
Appropriate new matrix data structures - both for dense and
unstructured sparse matrices - have been designed and included in
PAISS, that allow a dynamic redistribution at run time, but still
achieve good floating-point performances (compared to static matrix
data structures as implemented in other parallel iterative solvers).
Timing experiments on different hardware platforms have shown that
the new designed dynamic matrix data structures are almost as
efficient as static ones.
A basic DLB algorithm has been developed, that redistributes matrix
rows and corresponding vector elements by removing them from
overloaded processors and allocating them to others. As the overhead
introduced by DLB in an iterative solver routine is mainly the
communication of matrix rows, a main effort has been made in reducing
this overhead appropriately. Six different methods have been
implemented to redistribute matrix rows. The first performs the
redistribution with blocking communication, while the second and
third methods allow an overlapping of the DLB communication with
useful computation (e.g. a matrix-vector multiplication). In the
fourth method, only the most overloaded processor is permitted to
send data to other processors, so communication is reduced (this
method is only useful on shared networks like Ethernet
or Token-Ring). Finally, two methods operate with overlapping
matrix regions, i.e. matrix rows which are stored redundantly on
several processors (but are only used on one processor within a
matrix-vector multiplication). DLB is then performed in a logical
fashion by adjusting the set of used rows within the bounds of the
stored rows on a processor.
An essential part of a DLB algorithm is the measurement of all
processor's loads. In PAISS, the wall clock time spent in the
computational routines is measured and distributed in each iteration.
This approach is totally system independent and it introduces almost
no overhead. The DLB algorithm then attempts to balance these
computational times on all processors by appropriately distributing
data.
PAISS is intended to solve large sparse (or dense) linear systems
of equations with iterative methods like Conjugate Gradient (CG).
Iterative solver routines can be implemented by using the basic
routines that are provided by PAISS. Preconditioning routines that
only use the given arithmetic operations (i.e. matrix-vector
multiplication, vector additions (DAXPY) and dot products) can also
be implemented easily. As an example, the CG method with diagonal and
polynomial preconditioning has been implemented and tested. A PAISS
solver routine can be run in serial or parallel mode without any
changes in the code.
Experiments on different hardware platforms have shown the suitability
of the DLB approach as applied within PAISS. Tests have been performed
with dense, band as well as sparse matrices from the
Harwell-Boeing collection. Compared with the state-of-the-art parallel
iterative solver package Aztec, the results achieved by PAISS are quite
good. On an idle homogeneous platform, similar run times have been
measured for both packages, whereas on a heterogeneous or dynamic
platform PAISS often achieves better results than Aztec.
The PAISS implementation is written in ANSI C and uses MPI for
communication. A first prototype runs on SUN workstations. Additionally,
it has been ported to an IBM SP-2 and
to a Swiss SCS Gigabooster.
Peter Christen
April 5, 1999