mpirun -np 8 ./heat < heat.input
The program mpiAllGathRedScatt n performs an all-gather and a reduce-scatter on buffers of n words. It has three versions of each algorithm, and is run, for example, as:

mpirun -np 8 ./mpiAllGathRedScatt 1000
You will notice that it prints timings for version 0 only, as the other versions are yet to be implemented (your task!).
Implement the version 1 algorithms and test them. When satisfied, use the batch file. Which version was better, and for which values of n and numbers of processes?
Time permitting, add the code for version 2 and repeat. Hint: avoid having a process send to itself, as this may cause subtle deadlock issues; use memcpy() instead.
The synopsis of the MPI functions needed for versions 1 and 2 is:

int MPI_Bcast(void *buf, int count, MPI_Datatype dt, int root, MPI_Comm comm);
int MPI_Reduce(const void *sbuf, void *rbuf, int count, MPI_Datatype dt, MPI_Op op, int root, MPI_Comm comm);
int MPI_Scatter(const void *sbuf, int scount, MPI_Datatype sdt, void *rbuf, int rcount, MPI_Datatype rdt, int root, MPI_Comm comm);
int MPI_Gather(const void *sbuf, int scount, MPI_Datatype sdt, void *rbuf, int rcount, MPI_Datatype rdt, int root, MPI_Comm comm);
int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);
int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status);