In this exercise, we will develop a Chapel version of the heat program that we used in previous exercises.
Set up your Chapel environment on Raijin just as for the previous session.
The variable mxDiff is computed using an array reduction; make sure you understand how this works.
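A reduction over a promoted expression can compute the maximum difference in one line. The following is a minimal sketch, assuming told and tnew are real arrays over the same domain (the names mirror the exercise, but the exact declarations here are assumptions):

```chapel
config const n = 8;
const D = {1..n, 1..n};
var told, tnew: [D] real;

// ... update tnew from told ...

// max reduce folds the promoted abs() expression down to a single scalar
const mxDiff = max reduce abs(tnew - told);
```

The expression abs(tnew - told) is promoted elementwise over the arrays, and max reduce collapses the result without an explicit loop.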
bash-4.1$ make
bash-4.1$ ./heat -nl 1 < heat.input
Change the program so that the loop that computes individual points in tnew is performed in parallel. Does this speed up the computation?
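One way to parallelise the update is to replace the serial nested loop with a forall over the interior of the grid. This is a hedged sketch assuming a 2-D Jacobi-style stencil; the array names follow the exercise, but the grid bounds and update formula are assumptions:

```chapel
config const n = 8;
var told, tnew: [{0..n+1, 0..n+1}] real;

// Each (i, j) update is independent of the others, so Chapel is free
// to execute the iterations as parallel tasks
forall (i, j) in {1..n, 1..n} do
  tnew[i, j] = (told[i-1, j] + told[i+1, j] +
                told[i, j-1] + told[i, j+1]) / 4.0;
```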
Try setting the environment variable CHPL_RT_NUM_THREADS_PER_LOCALE to different values -- e.g. 1, 2, 4, 8 -- to control the number of Chapel tasks used to execute forall loops. Do your results change as you vary the number of Chapel tasks?
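A sweep over task counts might look like the fragment below, run from the shell or added to a batch script. It assumes the ./heat binary built by the earlier make step:

```
for t in 1 2 4 8; do
  CHPL_RT_NUM_THREADS_PER_LOCALE=$t ./heat -nl 1 < heat.input
done
```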
Change the program to partition the arrays told and tnew across all locales using a block distribution.
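In Chapel this can be done by declaring the arrays over a block-distributed domain. A minimal sketch, assuming the standard BlockDist module and the dmapped syntax used by Chapel releases of this era (the grid size and bounding box are assumptions):

```chapel
use BlockDist;

config const n = 8;
// Distribute the interior of the grid evenly across all locales
const Space = {0..n+1, 0..n+1};
const D = Space dmapped Block(boundingBox = {1..n, 1..n});
var told, tnew: [D] real;
```

The boundingBox argument determines how indices are partitioned among locales; indices outside it (here, the boundary rows and columns) are assigned to the nearest block.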
Run the program on [1,2,4,8] locales using the batch_heat script, and observe the results.
Note the following line from the batch script:

MPIRUN_CMD="mpirun -np %N --bind-to socket --map-by ppr:1:socket %C"

This modifies the MPI command used to launch the Chapel program. What effect does it have? What happens if you remove this line? You may wish to consult the documentation for mpirun, particularly the options for mapping processes.
The communication performance of the program using a block distribution is far from optimal. In every iteration of the main loop, each locale needs to access multiple elements of told that are stored on other locales. In the current version of the program, each remote element is accessed individually, resulting in many more messages than in the original MPI version of the program.
Improve your program by using a stencil distribution. Run the improved program on [1,2,4,8] locales using the batch_heat script, and observe the results.
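The stencil distribution caches a halo of neighbouring elements (its "fluff") on each locale, so boundary accesses become local reads after a bulk exchange. A hedged sketch, assuming Chapel's StencilDist module and a one-element halo in each dimension (the names and grid size are assumptions):

```chapel
use StencilDist;

config const n = 8;
const Space = {0..n+1, 0..n+1};
// fluff=(1,1) caches one ghost row/column from each neighbouring locale
const D = Space dmapped Stencil(boundingBox = {1..n, 1..n}, fluff = (1, 1));
var told, tnew: [D] real;

// After writing told, refresh each locale's cached halo in one bulk exchange
told.updateFluff();
```

Calling updateFluff() once per iteration replaces the many fine-grained remote reads with a small number of bulk transfers, which is much closer to the communication pattern of the MPI version.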