
Tobias Neckel

Max-Planck, October 2013

Bash course - Tutorial 4


This tutorial is about extracting data from files and plotting it. This is the last tutorial on bash; you might use spare time to work on the first three tutorials as well.

Simulation runtime results


On the course website, you find a .tgz file containing runtime results of a parallel molecular dynamics (MD) simulation. The .tgz file contains a number of folders, which are named as follows: jg yyyymmdd hhmm CO2OPPKD????. The first part gives the date and time of the simulation, followed by the material simulated (CO2), a shortcut for the simulated scenario, and the parallelisation algorithm that was used. For this tutorial, only the last part (????) is important, as it contains the number of CPUs used in the simulation. Each folder contains a single file with 200 lines, where each line corresponds to one time step of the simulation. Each line has as many entries as processors were used for the simulation. Each entry is the time consumed by the corresponding process for the force calculation in the MD simulation.

Get familiar with the data and gnuplot. E.g., take the file for a single processor and plot the runtime for the 200 simulation steps (so the x-axis is the step number and the y-axis the runtime).

From a user's point of view, only the runtime of the slowest processor is relevant, as this is the time the user has to wait for results. Use one of the files that show the results for several processors and plot, for each time step, the runtime of the slowest one.

From a high performance computing point of view, it is important to find out how an algorithm scales with a growing number of processes. So for the x-axis, you should use the number of processors, and for the y-axis the average runtime (averaged over the 200 steps) of the slowest process (important: in each step, a different process can be the slowest one).

If the parallelisation algorithm is to be improved, the load imbalance between the processors has to be known. Thus, instead of plotting just the slowest runtime, plot the average runtime of all processors and the standard deviation of the runtime.
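
As a starting point, the quantities asked for above (slowest runtime per step, its average over all steps, and mean and standard deviation over all processors) could be computed with awk roughly as follows. This is only a sketch, not a reference solution: it assumes the entries in each line are separated by whitespace, it reads the number of processors from the column count of the first line rather than from the folder name, and the function names slowest_per_step, scaling_row and mean_std are made up for illustration. The standard deviation here is the population standard deviation over all entries of one file; other readings of the task are possible.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: whitespace-separated numeric columns,
# one line per time step, one column per process.
# LC_ALL=C keeps "." as the decimal separator in awk's output.

# Print "step max" for every line: the runtime of the slowest process per step.
slowest_per_step() {
    LC_ALL=C awk '{
        max = $1 + 0
        for (i = 2; i <= NF; i++) if ($i + 0 > max) max = $i + 0
        printf "%d %.6f\n", NR, max
    }' "$1"
}

# Print "nproc avg": the column count of the first line and the average,
# over all steps, of the per-step maximum (one data point of a scaling plot).
scaling_row() {
    LC_ALL=C awk '{
        if (NR == 1) nf = NF
        max = $1 + 0
        for (i = 2; i <= NF; i++) if ($i + 0 > max) max = $i + 0
        sum += max
    }
    END { if (NR > 0) printf "%d %.6f\n", nf, sum / NR }' "$1"
}

# Print "mean std" over all entries of the file (population standard deviation).
mean_std() {
    LC_ALL=C awk '{
        for (i = 1; i <= NF; i++) { n++; s += $i; ss += $i * $i }
    }
    END {
        if (n > 0) {
            m = s / n
            v = ss / n - m * m
            if (v < 0) v = 0      # guard against tiny negative rounding errors
            printf "%.6f %.6f\n", m, sqrt(v)
        }
    }' "$1"
}
```

With these functions loaded (e.g. via source), something like slowest_per_step FILE > slowest.dat followed by gnuplot -persist -e 'plot "slowest.dat" using 1:2 with lines' would give the per-step plot, where FILE stands for one of the result files. For the scaling plot, one could call scaling_row once per result file, collect the lines in a single file, and sort it with sort -n before plotting.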
