bash-4.1$ . /short/c37/modules/chapel-1.16.0/util/setchplenv.bash
Chapel requires a newer version of OpenMPI than we have been using (it needs support for the MPI_THREAD_MULTIPLE threading mode).
You can confirm that your Chapel environment is correctly configured for Raijin as follows:
bash-4.1$ chpl --version
chpl Version 1.16.0
bash-4.1$ env | grep CHPL
CHPL_COMM=gasnet
CHPL_COMM_SUBSTRATE=ibv
CHPL_HOST_PLATFORM=linux64
CHPL_TARGET_ARCH=sandybridge
CHPL_HOME=/short/c37/modules/chapel-1.16.0
CHPL_LAUNCHER=gasnetrun_ibv
bash-4.1$ mpirun -version
mpirun (Open MPI) 3.0.0
Report bugs to http://www.open-mpi.org/community/help/
The file daxpy.chpl contains a simple Chapel program which allocates two arrays X and Y, and then performs the following update:
Y = α × X + Y
Both arrays share the same domain, but they have different distributions (i.e. the points in the domain are mapped differently to the set of locales).
Compile and run the program on four locales:
make
./daxpy -nl 4 --N 10000
You should see that X is divided into equal chunks for each locale using a block distribution, whereas Y is stored entirely on Locales[0].
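For orientation, the layout described above can be sketched as follows. This is not the contents of daxpy.chpl: it is a minimal illustration written against the standard BlockDist module as it exists in Chapel 1.16, and the names alpha, Space and BlockSpace, as well as the initial values, are assumptions made for the example.

```chapel
use BlockDist;

config const N = 8;          // problem size, settable with --N
config const alpha = 2.0;    // hypothetical scaling factor

// One index set, declared twice: once plain, once block-distributed.
const Space = {1..N};
const BlockSpace = Space dmapped Block(boundingBox=Space);

var X: [BlockSpace] real = 1.0;  // contiguous chunks spread over all locales
var Y: [Space] real = 1.0;       // not distributed: stored where it was declared (Locales[0])

// The update from the exercise, written as a whole-array expression.
Y = alpha * X + Y;

// Report which locale owns each element of X and Y.
for i in Space do
  writeln(i, ": X on locale ", X[i].locale.id,
             ", Y on locale ", Y[i].locale.id);
```

Compiled with chpl and run with -nl 4, a sketch like this should report X spread over locales 0 to 3 and every element of Y on locale 0. Declaring Y over BlockSpace instead of Space is one way to give both arrays the same domain map.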
Change the program so that both arrays use the same domain map. You should see a significant improvement in performance - can you explain why?
Now change the program so that Y uses a cyclic distribution across all locales. How does this affect the performance? What about if both arrays use the same cyclic distribution?
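A cyclic domain map can be declared in much the same way as a block one. The sketch below assumes the standard CyclicDist module from Chapel 1.16; CyclicSpace and Owner are hypothetical names that do not come from daxpy.chpl. It only shows how indices are assigned to locales and does not perform the daxpy update.

```chapel
use CyclicDist;

config const N = 8;   // problem size, settable with --N

const Space = {1..N};
// Indices are dealt out to the locales round-robin, starting from Space.low.
const CyclicSpace = Space dmapped Cyclic(startIdx=Space.low);

var Owner: [CyclicSpace] int;

// Each iteration runs on the locale that owns the element,
// so this records the owning locale's id in place.
forall o in Owner do
  o = o.locale.id;

writeln(Owner);
```

Running this with different values of -nl shows how neighbouring indices land on different locales, which is worth keeping in mind when comparing the cyclic timings with the block ones.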
The file pingPong.chpl contains a Chapel version of the 'ping-pong' benchmark, which measures communication bandwidth between locales.
Review the code; you will see that there are no explicit messages like there were in the MPI version of ping-pong. Where does communication take place, and why?
Run the program on 1, 2 and 4 locales using the batch_pingPong script. How does the bandwidth measured for Chapel compare with the bandwidth you measured for MPI?