s ANU - College of Engineering and Computer Science - SoCS - COMP3320
CECS Home | ANU Home | Search ANU
ANU College of Engineering and Computer Science
Research School of Computer Science
Printer Friendly Version of this Document

UniSAFE

High Performance Scientific Computing

COMP3320/COMP6464 Molecular Dyanmics Project

Due by 18:00 on Thu 5 April 2012

This assignment is worth 20% of your total course mark.

  • For COMP3320 students your assignment will be marked out of 40
  • For COMP6464 students your assignment will be marked out of 50
Details of how marks are assigned and what you are required to submit are given below.

The aim of the project is to develop an elementary molecular dynamics (MD) code and associated visualization software. MD is extremely widely used in computational chemistry, physics, biology, material science etc. It was also one of the first scientific uses of computers going back to work by Alder and Wainwright who, in 1957, simulated the motions of about 150 argon atoms. (See Alder, B. J. and Wainwright, T. E. J. Chem. Phys. 27, 1208 (1957) and J. Chem. Phys. 31, 459 (1959))

In MD each atom is represented as a point in space. The atoms interact with each other via pairwise potentials, in a similar manner to gravity interacting between the planets, sun, moons and stars. In practice this is an approximation since for atoms quantum effects are often important - but we'll ignore this and confine our discussions to "classical" MD.

Gravitational bodies interact via a potential that goes as m1m2/r12 where m1 and m2 are the masses of the bodies involved and r12 is the distance between their centres of mass. For the purpose of this work we will use a potential of the form

p12 = 1/(r12)12 - 2/(r12)6

To evaluate the interaction for many bodies we must simply perform a sum over all unique two body interactions. Thus for 3 particles we will have interactions 1-2, 1-3 and 2-3.

Consider now a cubic box with a side length of R. Along each dimension of the box we arrange N atoms as indicated by the following figure (not all atoms are shown!), such that there are a total of N*N*N atoms.

For this configuration of atoms we could calculate the total interaction energy as a sum over all atoms of all the pairwise interactions of that atom with all others - we term this the total "potential" energy (PE) of the system.


STEP 0: Visual display and potential energy evaluation (preliminary, no mark)


    Your initial objective is to produce a basic code that can read as input N (the number of atoms in each dimension) and R (the size of the box in each dimension). Your code should also compute and print out the value for the potential energy.

    Develop this code using Visual Python (VPython).

    Your code should display a visual window showing all the atoms neatly arranged in a cube, and you should printout the total potential energy (in the text window).

    Note that your python code, call it md1.py should accept the following arguments in order, i) N - number of atoms in each dimension of the box ii) R - the size of the box in each dimension, iii) tstep - the time step, and iv) iter - the number of time steps in your simulation. It should therefore run as follows: python md1.py N R tstep iter.


STEP 1: Molecular Dynamics: force, velocity, and movement (10 marks)


    Given the pairwise potential there will be a pairwise force. This corresponds to the derivative of the potential with respect to R12. Assuming the location of each atom is given in Cartesian coordinates the Cartesian forces are given by:

    Fx = (12/r1214 - 12/r128)(x1-x2)
    Fy = (12/r1214 - 12/r128)(y1-y2)
    Fz = (12/r1214 - 12/r128)(z1-z2)

    The total force on a given atom will be the sum of all the pairwise forces.

    Assuming that the forces on at least some of the atoms are non-zero then the atoms will want to move. MD is the process of moving the atoms according to the forces they are subjected to. Moving the atoms equates to integrating the equations of motion over time or solving a differential equation. There are a variety of ways in which we can do this, but a popular method is the so called "velocity verlet" algorithm (W.C Swope et al J. Chem. Physics 76 637 (1982)). This has the following form

    coord(t+tstep) = coord(t) + tstep * (velocity(t) + tstep*accel(t)/2)
    velocity(t+tstep) = velocity(t) + tstep * (accel(t) + accel(t+tstep))/2

    where t is the current time, and tstep is the timestep (the integration step). And by assuming that all particles have unit mass we can replace the acceleration by the force (F = ma).

    When the particles begin to move they will gain kinetic energy (KE = 1/2 mv2) with the total KE just a simple sum of all the atomic KEs. If there is no source or sink of energy then the total energy (PE+KE) should remain constant, so the gain in KE is offset by a lose of PE.

    Extend your md1.py to compute the force on each atom and allow the particles to move. Your code will need to also read in the timestep and the total number of timesteps and to compute and printout the potential, kinetic and total energy at each timestep.

    HERE are some sample outputs for you to verify that your results are correct

    Step 1 Submission Requirements

    You are required to submit your md1.py code.

    • Your md1.py code will be tested on PARTCH - so make sure it works without errors on this machine.
    • Your code will be run from the command line by typing
      python md1.py N R tstep iter
      where N is the integer number of atoms in each dimension, R is the floating point dimension of the initial cube, tstep is the floating point timestep, and iter is the integer total number of iterations.
    • Your code should output to stdout the total energy, potential energy, and kinetic energy at every timestep using the following format:
            Dimension of Cube 2.000000
            Total atoms       27
            Time step         0.001000
            Number of steps   10
      
            TIME         Total Energy            PE                 KE
            -------------------------------------------------------------------
            0.000000      -7.577011e+01      -7.577011e+01       0.000000e+00 
            0.001000      -7.577011e+01      -7.577026e+01       1.504815e-04 
            0.002000      -7.577011e+01      -7.577071e+01       6.018923e-04 
            0.003000      -7.577011e+01      -7.577147e+01       1.354131e-03 
            0.004000      -7.577011e+01      -7.577252e+01       2.407030e-03 
            0.005000      -7.577011e+01      -7.577387e+01       3.760354e-03 
            0.006000      -7.577011e+01      -7.577552e+01       5.413798e-03 
            0.007000      -7.577011e+01      -7.577748e+01       7.366993e-03 
            0.008000      -7.577011e+01      -7.577973e+01       9.619501e-03 
            0.009000      -7.577011e+01      -7.578228e+01       1.217082e-02 
           
    The marks for this step will be given based on the following:
    • compliance with submission process as detailed at end of this assignment;
    • the readability and structure of your code;
    • whether your code can display structures correctly;
    • the correctness in calculating the energy values, forces, and velocities;
    • whether your code generates the correct format as the sample output.

STEP 2: Python/C interfacing (10 marks)


    In this step you are required to re-write the compute intensive parts of your Python code in C, and modify your Python code to call the C module. The Python code should read in the various parameters and display the positions of the atoms as they move. The Python code should communicate the starting parameters to the C code. This should then run the MD simulation and periodically to communicate the coordinates of the atoms back to the Python program to be displayed. It is not expected that you communicate the coordinates at every timestep.

    In addition to the visual output, you should provide text output that details the input parameters, and provides the total energy, potential energy and kinetic energy periodically during the course of the simulation.

    Within the Python portion of your code use time.clock() routine to report the total time taken to perform the simulation. The start time is from the beginning of the first simulation step, while the end time is after the last iteration. This means you do not have to include setup time, but once the clock is started it should not be stopped again until everything is complete.

    Here are some useful links on Python/C API and the example codes presented in the course lecture:

    Step 2 Submission Requirements

    Name your modified Python code md2.py and your C code md2Module.c. You are required to submit there two files together with a Makefile that will compile the C code and create the relevant shared object library.

    • Your md2.py code will be tested on PARTCH - so make sure it works without errors on this machine. The paths to the Python include files in Partch for your C compiler are /usr/include/python2.6 and /usr/include/python2.6/Numeric. You may need to use -fPIC option in your compiler to cope with the bugs in the build system, and use the -shared option to link the shared library. (If you develop your code on another machine you will need to work out these details for that machine)
    • Your code will be run from the command line by typing
      python md2.py N R tstep iter update
      where the new "update" argument is an integer specifying how often the python code should update the positions of the atoms. A value of 5 would imply that Python only visually moves the atoms every 5th timestep. Also only print out the energy values every 5th timestep.
    • Your code should generate the same output as for md1.py
    The marks for this step will be given based on the following:
    • compliance with submission process as detailed at end of this assignment;
    • the readability and structure of your Python and C codes;
    • your choice of what is done in Python and what is done in C;
    • correct interfacing between Python and C code;
    • correctness of your code in calculating energy values, forces, velocities;

STEP 3: Accuracy and performance analysis of Python and C codes (14 marks)


    You have now developed both Python and Python/C versions of the same application code. In this step you are required to analyze and compare their accuracy and performance. Your are required to submit a short report of no more than 4 A4 pages addressing the issues raised below. You are encouraged to use the tools introduced to you during the labs and lectures, and present your results using tables, graphs and figures as you see fit.

    Step 3 Submission Requirements

    You are required to submit a report titled Step3_Report.pdf addressing the following issues:

    1. Compare the numerical results printed by md1.py with those of md2.py for a range of different input values. Include input parameters that lead to results where rounding is the dominant source of numerical error and input parameters that lead to results where truncation is the dominant source of numerical error. Clearly identifying in your report which case is which and why. (4 marks)
    2. What is the per-iteration (i.e. each timestep) execution time of md1.py and md2.py as a function of different problem dimensions? In each case detail how you expect your code to scale as a function of this input parameter, then provide performance data to validate or invalidate your expectation. If you were asked to say what the largest possible calculation was that you could perform with each version of the code - what would your answer be? (4 marks)
    3. What is the most computationally intensive component of your code in terms of time taken to execute? Is this the same for both versions of the code? During each iteration, what proportion of the total execution time is spent calculating the new i) coordinates, ii) forces, and iii) velocities? How does this proportion change as problem dimension increases? Does the most computationally intensive component correspond to the lines of code that is executed most often? (4 marks)
    4. Good data structure and algorithms can improve the efficiency of your code. Describe in plain language what you have done to improve the efficiency of your code, for example, in sequence of accessing the atoms, and so on. (2 marks)

STEP 4: Benchmarking and Performance Tuning (6 marks)


    In lectures we discussed the role of benchmarks. In this step you are required to convert your md code into a benchmark that will run on both Partch and NCI XE system (IDs for the XE will be distributed in the labs in week 3). For the benchmark produce a pure C version of your code, which can perform the same functions as listed in step 2 and 3, but without displaying the atoms and only printing energies every 10 timesteps. We call this version mdc.c

    Step 4 Submission Requirements

    You are required to submit mdc.c and a report titled Step4_Report.pdf (max 2 A4 pages).

    Your mdc.c code should

    • Compile on Partch and the XE using the same Makefile as mentioned above by typing either
      Make mdc_partch or Make mdc_XE.
    • In both cases the executable produced should be called mdc.exe
    • The executable should be run by issuing the following command mdc.exe N R tstep iter
      where the parameters are as defined above.
    • Tune the code to achieve the best possible performance on the two platforms.
    • On the XE system you may want to note that there are a number of different compilers available that can be activated by loading different modules.
    Your report should consider the performance of the benchmark on both platforms. Is the performance what you expected? How did you tune the benchmark to run fast on each platform?

    Marks for this step will be given based on:

    • compliance with submission process as detailed at end of this assignment;
    • readability and quality of your benchmark code;
    • correctness of the benchmark test results;
    • how fast your code runs;
    • quality of interpretation and analysis of the results in terms of correctness, breadth and depth.

STEP 5 (COMP6464 ONLY): Structure Optimisation (10 marks)


    Your objective is to determine the value of R that minimizes the total potential energy for a given value of N. We will do this numerically.

    This process works as follows. Given an initial size R and number of atoms N, set a range [R1, R2] so that R is in [R1, R2]. Divide [R1, R2] into evenly distributed test values of R. For each test value run you md code to get the value of the total potential energy at this R value. Which test value of R has the lowest potential energy? Revise your test range around this value and repeat the process using R values that are closer together. Continue the process until you have determined the optimal value of R to within some numerical threshold.

    In what follows you can perform the above manually, or you can write a python script to do this. You can also use more elaborate minimization techniques. Some useful numerical implementation of the minimisation methods can be found in the following book:
    William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian R. Flannery. Numerical Recipes in C - The Art of Scientific Computing . Second Edition, Cambridge University Press, 1992.

    Step 5 Submission Requirements

    You are required to submit a report titled Step5_Report.pdf (max 2 A4 pages). The report should address the following

    1. For N=5 and N=10 what are the values of R that minimizes the total energies of these two systems. Provide details to explain how you determined these values. How accurate are each of your reported values for R?(4 Marks)
    2. For N=5 and N=10 and with the optimized R values determined above what is the spacing between adjacent atoms? Is this value larger or smaller for the N=5 system compared to the N=10 system? Explain this observation.(2 Marks)
    3. For N=5 and N=10 and with the optimized R values determined above is the force on each individual atom zero (to within the uncertainty associated with R)? Run one of your md codes to verify whether you are right. Explain what you did in verifying this and why you observe what you do from your code.(4 Marks)

Submission Details


All your submitted code should include this header at the top:

    Name:
    Student Number:
    Course:
    Assignment Number:
    Name of this file:
    Lab Group:
    
    I declare that the material I am submitting in this file is entirely
    my own work. I have not collaborated with anyone to produce it, nor
    have I copied it, in part or in full, from work produced by someone else.
    I have not given access to this material to any other student in this course.
    

You are required to submit a tar file using the submit tool on one of the student computers:

    submit comp3320 md_project md.tar
or if you are a COMP6464 student
    submit comp6464 md_project md.tar
The tar file should expand to give the following:
  • a directory called YOURSTUDENTID_ass1_part1 (such as u1000000_ass1_part1) that contains md1.py, md2.py, md2Module.c, and Makefile. This directory should also contains a plain text file called README that tells the users (me) exactly how to run your code and either contains or points the user to some sample input/output data so they can verify that your code works (or gives identical results to what you got). This need not be as detailed as the report for Step 3. In the same directory, you should also include a plain text file called WRITEUP that details how long it took you to complete this part of the project and details of who you collaborated with (if anyone).
  • a directory called YOURSTUDENTID_ass1_part2 which contains mdc.c in step 4, and the Makefile. A README file shall be provided to explain how to run the codes. You also should put your Step3_Report.pdf and Step4_Report.pdf here.
  • (COMP6464 student only:) a directory called YOURSTUDENTID_ass1_part3 that contains Step5_Report.pdf. If you choose to write a Python code for the optimisation task, please name your code md3.py, and put it in this directory with a README file.