
Discrete Element Modelling Of

Granular Snow Particles Using


LIGGGHTS


Author

Vinodh Vedachalam





Supervisor

Davy Virdee
EPCC




Edinburgh Parallel Computing Centre
The University of Edinburgh
UK

August 2011







Discrete Element Modelling Of Granular
Snow Particles Using LIGGGHTS

Author
Vinodh Vedachalam

A thesis submitted in partial fulfilment of the requirements for the degree of
M.Sc. High Performance Computing

Thesis Supervisor
Davy Virdee


MSc in High Performance Computing
The University of Edinburgh
Year of Presentation: 2011

Abstract
This thesis investigates and develops a large-scale, three-dimensional
Discrete Element Model that can simulate several million snow particles falling
under the influence of gravity. The model is used to simulate the behaviour of the
snow particles, allowing for inelastic collisions and cohesion, using high performance
computing. The model is then profiled and benchmarked for scalability, and finally
some suggestions are listed for optimisation in future research. The project extensively
studies and discusses the development of the model and the various driving factors
behind the high performance computing (HPC) solution.
Contents
Chapter 1 Introduction ...................................................................................................... 1
1.1 What is a Granular Material?.................................................................................. 2
1.2 Mechanics of Snow................................................................................................. 2
1.3 Computer Simulation of Granular Materials.......................................................... 2
1.4 Need for High Performance Computing................................................................. 3
1.5 Research Objective and Approach.......................................................................... 4
1.6 Literature Review.................................................................................................... 5
1.7 Organization of the report ....................................................................................... 7
Chapter 2 Background....................................................................................................... 8
2.1 Introduction to Molecular Modelling ..................................................................... 9
2.2 Review of Molecular Dynamics ............................................................................. 9
2.3 Force Calculations and Ensembles ....................................................................... 10
2.4 Interaction.............................................................................................................. 11
2.5 Integration ............................................................................................................. 11
2.6 Periodic Boundary Condition ............................................................................... 12
2.7 Neighbour List....................................................................................................... 13
2.8 Discrete Element Method ..................................................................................... 14
2.9 Parallelisation and Communication of DEM Simulations ................................... 15
2.10 Summary ............................................................................................................. 17
Chapter 3 Experimental Setup ........................................................................................ 18
3.1 The Platforms ........................................................................................................ 19
3.1.1 HECToR Cray XE6 ....................................................................................... 19
3.1.2 Ness ................................................................................................................ 22
3.2 The Software ......................................................................................................... 22
3.2.1 LAMMPS - Code Introduction..................................................... 23
3.2.2 LAMMPS - Installation................................................................. 23
3.2.3 LAMMPS - Working .................................................................... 25
3.2.4 LAMMPS - Input Script Structure ................................................................ 25
3.2.5 LAMMPS - Input Script Basics..................................................................... 26
3.2.6 LAMMPS - Parsing Rules............................................................................. 26
3.2.7 LIGGGHTS.................................................................................................... 27
3.3 Visualization Setup ............................................................................................... 28
3.4 Description of the Mechanical Properties of Snow.............................................. 28
3.4.1 Size and shape of snow particles ................................................................... 28
3.4.2 Density ........................................................................................................... 29
3.4.3 Young's Modulus........................................................... 29
3.4.4 Poisson's Ratio.............................................................. 29
3.4.5 Coefficient of restitution................................................................................ 30
3.4.6 Coefficient of kinetic friction......................................................................... 30
3.5 Summary ............................................................................................................... 30
Chapter 4 Modelling of Cohesive Interactions............................................................... 31
4.1 DEM Revisited...................................................................................................... 32
4.2 Defining a particle and particle collision.............................................................. 33
4.2.1 Particle Definition.......................................................................................... 33
4.2.2 Cohesive forces.............................................................................................. 34
4.3 Modelling Cohesive Contacts............................................................................... 35
4.3.1 Contact Point and Collision Normal.............................................................. 35
4.3.2 Normal Deformation and Contact force........................................................ 36
4.3.3 Collision Detection ........................................................................................ 37
4.4 Basics of Contact force models............................................................................. 37
4.5 Physical Models of Cohesive Contact .................................................................. 38
4.5.1 Linear cohesion model................................................................................... 38
4.5.2 JKR cohesion model ...................................................................................... 39
4.6 Summary ............................................................................................................... 40
Chapter 5 Implementation Details .................................................................................. 41
5.1 Porting LAMMPS Cohesion Add-on to HECToR............................................... 42
5.1.1 Modifying the fix_cohesive.h header file...................................................... 43
5.1.2 Modifying the fix_cohesive.cpp header file.................................................. 43
5.2 Building the granular module within LAMMPS.................................................. 46
5.3 LAMMPS Granular Essentials ............................................................................. 46
5.4 Determination of simulation time-step ................................................................. 47
5.5 LAMMPS Simulation........................................................................................... 47
5.5.1 Implementation Details.................................................................................. 47
5.5.2 Visualisation................................................................................................... 50
5.5.3 LAMMPS Simulation Results and Discussions............................................ 50
5.6 LIGGGHTS Simulation........................................................................................ 52
5.6.1 Material parameter values.............................................................................. 52
5.6.2 LIGGGHTS Implementation Details ............................................................ 53
5.6.3 LIGGGHTS Simulation Results.................................................................... 55
5.6.4 Improved Chute Geometry ............................................................................ 57
5.6.5 Improved Simulation Results ........................................................................ 58
5.7 Summary ............................................................................................................... 60
Chapter 6 Benchmarking................................................................................................. 61
6.1 Cost of Accessing HECToR................................................................................. 62
6.2 Performance Benchmarks ..................................................................................... 62
6.3 Performance per time-step .................................................................................... 65
6.4 Performance Comparison - Cohesion and Non-Cohesion ................................... 66
6.5 Summary ............................................................................................................... 66
Chapter 7 Profiling and Performance Analysis .............................................................. 67
7.1 Description about profiling tools available........................................................... 68
7.2 Profiling using CrayPAT ...................................................................................... 69
7.2.1 Profiling - User functions............................................................................... 69
7.2.2 Profiling - Percentage Time of MPI Calls..................................................... 70
7.2.3 Profiling - Messages/Sizes ............................................ 71
7.2.4 Profiling - Memory Usage ............................................ 72
7.3 Timing O/P directly by the code and its description............................................ 72
7.4 Summary ............................................................................................................... 74
Chapter 8 Conclusions and Future Work........................................................................ 75
8.1.1 Summary ........................................................................................................ 75
8.1.2 Recommendations for future Research ......................................................... 77
References ....................................................................................................................... 79
Appendix A Project Management................................................................................ 82
Appendix B Parallel Processing on Ness & HECToR................................................ 84
Appendix C AutoCAD Details .................................................................................... 87
List of Tables
Table 1.1: Estimate of Number of Snow Particles ......................................................... 3
Table 3.1: Ness Specification August 2011.................................................................. 22
Table 5.1: Material Parameters .................................................................................... 53
Table 5.2: Chute Specification...................................................................................... 57
Table 7.1: LIGGGHTS timing output........................................................................... 73
Table A.1: Updated work plan...................................................................................... 82
Table A.2: Risk Assessment ......................................................................................... 83


List of Figures
Figure 1.1: A small avalanche slab of 100 m x 50 m x 10 m.............................. 4
Figure 2.1: Flowchart of Molecular Dynamics Approach............................................ 10
Figure 2.2: 2D MD simulation with periodic images of itself ..................................... 13
Figure 2.3: Neighbour list of a single particle (in red), drawn in 2D...................... 14
Figure 2.4: DEM Contact model in normal and tangential direction........................... 15
Figure 2.5: Spatial decomposition approach................................................................. 16
Figure 3.1: Picture of the Cray XE6 ............................................................................. 19
Figure 3.2: Magny-Cours architecture diagram............................................ 20
Figure 3.3: HECToR file system................................................................................... 21
Figure 3.4: Flowchart for LAMMPS execution ........................................................... 25
Figure 3.5: Graupel snow particles ............................................................................... 29
Figure 4.1: Diagram to illustrate the typical flow of a DEM simulation ..................... 32
Figure 4.2: Definition of a computational particle ....................................................... 33
Figure 4.3: Different possible collision states............................................................... 34
Figure 4.4: Two ways to calculate cohesion normal and contact point ....................... 35
Figure 4.5: Contact zone between two spherical particles ........................................... 36
Figure 4.6: Stages of collision detection......................................................... 37
Figure 4.7: Hertz contact force model .......................................................................... 40
Figure 5.1: LAMMPS Simulation screenshot 1 ........................................................... 51
Figure 5.2: LAMMPS Simulation screenshot 2 ........................................................... 51
Figure 5.3: LAMMPS Simulation screenshot 3 ........................................................... 52
Figure 5.4: LIGGGHTS Simulation screenshot 1 ........................................................ 55
Figure 5.5: LIGGGHTS Simulation screenshot 2 ........................................................ 56
Figure 5.6: LIGGGHTS Simulation screenshot 3 ........................................................ 56
Figure 5.7: Cross section of the improved chute .......................................................... 57
Figure 5.8: Improved LIGGGHTS Simulation screenshot 1 ....................................... 58
Figure 5.9: Improved LIGGGHTS Simulation screenshot 2 ....................................... 59
Figure 5.10: Improved LIGGGHTS Simulation screenshot 3 ..................................... 59
Figure 6.1: Performance of the simulation model ........................................................ 63
Figure 6.2: Benchmark of different system size on 480, 960 processors .................... 63
Figure 6.3: Execution time of 75,000 and 1,000,000 particles...................... 64
Figure 6.4: Speedup of 75,000 and 1,000,000 particles ............................... 65
Figure 6.5: Speedup of 75,000 and 1,000,000 particles ............................... 65
Figure 6.6: Speedup of 75,000 and 1,000,000 particles ............................... 66
Figure 7.1: Profiling results of the code by function groups ....................................... 69
Figure 7.2: Top time consuming user functions ........................................ 70
Figure 7.3: Top time consuming MPI functions .......................................................... 70
Figure 7.4: Top time consuming MPI_SYNC functions ............................................. 71
Figure 7.5: Profiling by message size .......................................................................... 72
Figure 7.6: LIGGGHTS timing output ........................................................................ 73
Figure 8.1: Linear regression analysis of scaled size benchmark ............................... 77

Acknowledgements


I wish to sincerely thank my supervisor Davy Virdee for his guidance and support
during this research. He has continuously encouraged me and has contributed greatly to
my professional growth. I also thank Dr. Fiona Reid of EPCC for her help in the initial
stages of the project in setting up LAMMPS on HECToR and Ness.

I wish to thank Dr. Jin Sun of the Institute for Infrastructure and Environment, University
of Edinburgh, for providing the cohesion add-on code for LAMMPS. I also wish to thank
Dr. Jane Blackford of the Centre for Materials Science and Engineering and the Institute of
Materials and Processes, School of Engineering and Electronics, University of
Edinburgh, for her input on the physics of snow particles and for her feedback on the
simulation model.
Nomenclature
A       Hamaker constant
a       acceleration (m/s^2)
E*      reduced elastic modulus (Pa)
E       Young's modulus (Pa)
F       force vector acting on particles (N)
F_ij    force of particle i at contact with neighbouring particles j (N)
f_n^d   repulsive force at contact (N)
f_n^c   cohesion force (N)
f_n^e   viscous force (N)
g       acceleration due to gravity (m/s^2)
k       normal spring stiffness
m       mass of the particle (kg)
r       radius of the particle (m)
r_b     buffer radius (m)
r_cut   cut-off radius (m)
r_nl    neighbour list cut-off radius (m)
U       potential energy (J m^-2)
u_pq    relative velocity between two particles p, q (m/s)
V_n     relative normal velocity of the colliding particle (m/s)
x_pq    relative position of two particles p, q (m)
ẍ_i     translational acceleration (m/s^2)

Greek Symbols

Δt      time-step (s)
Δs      distance between two particles in tangential direction (m)
δ       normal overlap between particle body surfaces (m)
ρ       density of particles (kg/m^3)
ε       coefficient of restitution
θ       inclination angle of the chute (degrees)
μ       coefficient of friction
γ       damping coefficient
ν       Poisson's ratio
Abbreviations
2D two dimensional
3D three-dimensional
AU accounting unit
CFD computational fluid dynamics
CPU central processing unit
CAD computer aided design
DEM discrete element method
EAM embedded atom method
EDM event driven method
FEM finite element method
GPU graphics processing unit
HECToR high end computing terascale resource
JKR Johnson Kendall Roberts
KB kilobytes
LAMMPS large-scale atomic/molecular massively parallel simulator
LIGGGHTS LAMMPS improved for general granular and granular heat transfer
simulations
NUMA non-uniform memory architecture
MB megabytes

MD molecular dynamics
MPP massively parallel processor
MPI message passing interface
PBC periodic boundary conditions
STL stereolithography
TDM time driven method

Chapter 1

Introduction



This chapter is an introduction to the thesis, providing an overview of the objective
and approach of this research and emphasising the need for High Performance
Computing. Snow is introduced as a granular material, and some background
information about snow avalanches and their formation is discussed. In the past,
many researchers have proposed different models to describe the movement of snow
and snow mechanics, ranging from simple theoretical models to complicated
computational models. Many such techniques to model snow particles and snow
avalanches are reviewed.












1.1 What is a Granular Material?
A granular material is an assembly of many discrete solid particles interacting with
each other through dissipative collisions, dispersed in a vacuum or an interstitial
fluid. Granular matter can be considered a fourth state of matter, very different
from solids, liquids or gases. For example, a pile of granular sand particles at rest on an
inclined plane behaves like a solid if the angle of inclination is less than a
certain angle called the angle of repose. This is due to the static friction between the
granular particles and the slope. If the inclined plane is tilted a little above the angle of
repose, the grains start to flow, exhibiting fluid-like behaviour (though the
flow is very different from the actual flow of a fluid). Granular particles
behave like gaseous particles in highly agitated systems (Jenkins and Savage,
1983). The force interactions between the particles play a key role in defining the
mechanics of granular flows.
Several forms of granular flow exist in nature and in industrial processes, ranging from
avalanches in the form of landslides to powder mixing in the chemical industry. Granular
materials cover a broad area of research at the intersection of different scientific fields,
including soft matter physics, soil mechanics, powder technology and geological
processes. Despite the wide variety of properties, the discrete granular structure of
these materials leads to rich generic phenomena, which has motivated research towards
their fundamental understanding.
1.2 Mechanics of Snow
Snow is a form of precipitation made up of crystalline water ice. It is an
example of a geo-material whose microstructure plays a significant role in its overall
behaviour. After snow falls, the physical structure of the snowpack is affected by factors
such as interaction with the ground, temperature and other meteorological conditions. The
initial snow crystals are transformed into ice grains through wind-driven deformation
or melting/freezing processes. The resulting snow cover can be considered a
porous granular material made up of ice grains, water droplets and dust. At the
macroscopic level snow is treated as a continuous medium, in that the granular
structure may no longer be visible; at the microscopic level, however, a snow sample
can be considered a cohesive granular assembly of elementary particles that are
assumed to be rigid. Thus, in the approach presented in this thesis, snow is
treated as a granular medium.
1.3 Computer Simulation of Granular Materials
In the past, experimental studies were carried out to study the behaviour of granular
particles. In recent years, due to the advancement in computer processing speed,
numerical simulation of granular flow is seen as an effective alternative tool to
study and understand its behaviour. Since a granular system is
composed of individual particles and each particle moves independently of the others,
it is difficult to predict the behaviour of a granular system using continuous models. In
this context, the discrete approach developed for particle-scale numerical modelling of
granular materials has become a powerful and reliable tool, and is considered an
alternative to the continuum approach. This discrete approach is called the Discrete
Element Method (DEM). The philosophy behind the DEM simulation of granular flows
is to model the system at the microscopic or particle level and study its behaviour,
including the detection of collisions between particles and their environment. DEM can
efficiently and effectively model the dynamics of assemblages of particles. Technically,
the discrete approach requires a time-discretised form of the equations of motion governing
particle displacements and rotations, and a force law or force-displacement relation
describing particle interactions. DEM is particularly useful in modelling materials that
undergo discontinuous deformations because of contact with other particles in the
system, breakage of contact bonds and compaction of broken fragments. In this thesis,
DEM is employed to simulate the flow of snow particles under gravity: a snow
avalanche.

Since snow can be considered a granular material (Section 1.2), a DEM approach is chosen
instead of a continuum approach; it is, in principle, possible to capture almost all the
granular physical phenomena of snow particles using DEM. Chapter 2 discusses the
background of the DEM approach in detail.
1.4 Need for High Performance Computing
There are two main reasons to use supercomputers to develop the DEM model of snow
avalanches. First, the number of snow particles in a real snow avalanche is huge. To
give an idea of the number of particles in a powder snow avalanche, an estimate of the
number of snow particles of size 2 mm and 5 mm in 5-litre and 1000-litre volumes
is summarised in Table 1.1.


Particle        Volume of a Spherical   Fill Volume         Number of Particles
Diameter (mm)   Particle (m^3)          (m^3)               per Fill Volume
-------------------------------------------------------------------------------
2               4.18 x 10^-9            0.005 (5 litres)    1.2 million
2               4.18 x 10^-9            1 (1000 litres)     240 million
5               6.54 x 10^-8            0.005 (5 litres)    76 thousand
5               6.54 x 10^-8            1 (1000 litres)     15 million

Note: 0.005 m^3 = 5 litres and 1 m^3 = 1000 litres

Table 1.1: Estimate of number of snow particles
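
The counts in Table 1.1 follow from dividing the fill volume by the volume of a single
sphere; no packing fraction is applied. A minimal sketch of this estimate (an illustrative
script, not part of the simulation code):

```python
from math import pi

def particle_count(diameter_m, fill_volume_m3):
    """Estimate as in Table 1.1: fill volume / volume of one sphere,
    with no packing fraction applied."""
    particle_volume = (4.0 / 3.0) * pi * (diameter_m / 2.0) ** 3
    return fill_volume_m3 / particle_volume

for d in (0.002, 0.005):            # 2 mm and 5 mm diameters
    for v in (0.005, 1.0):          # 5 litres and 1000 litres
        print(f"{d * 1e3:.0f} mm, {v} m^3: {particle_count(d, v):.2e} particles")
```

The same ratio applied to the 50,000 m^3 slab below reproduces the 10^13-particle
estimate.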

If a slab of snow measuring 100 m in length, 50 m in width and 10 m in depth were to
slide (see Figure 1.1), the volume would be 50,000 m^3. At 240 million particles per
cubic metre (Table 1.1), this would contain roughly 1.2 x 10^13 snow particles of
2 mm diameter.



Figure 1.1: A small avalanche slab of 100 m x 50 m x 10 m (photograph from D. Virdee
collection)

Second, in order to accurately simulate the behaviour of the particles, all DEM
simulations require a very large number of particles in the system. This signifies the
importance of high performance infrastructure for our model, not only for
computation but also for post-processing and visualisation.

Given these constraints, it would clearly be impossible to model such large-scale
avalanches. Hence this thesis will look at a small-scale snow slide/avalanche of volumes
up to 5 litres (0.005 m^3), with about a million particles flowing under the influence of
gravity, and study the computational behaviour of the model to make an estimate of the
resources required to model a much larger number of particles.

1.5 Research Objective and Approach
The research will aim to address the following questions: Is it possible to model
spherical snow particles using DEM? Can the model be profiled and benchmarked for
scalability? Can it be optimised for better performance?

The aims of this research encompass both the experimental and computational aspects,
investigating the discrete numerical simulation of granular (spherical) snow
particles flowing down a slope: an avalanche. The experimental aims are to:

1. Design a discrete model of 1000 spherical snow particles of 5 mm diameter
surrounded by granular walls on the x and y boundaries.

2. Implement the model using LAMMPS/LIGGGHTS. Validate the model. Extend it
for up to 1 million particles, roughly the number in a 5-litre volume.

3. Profile the code to understand the performance bottlenecks and suggest
optimisation strategies.

4. Visualise the LAMMPS/LIGGGHTS output. Identify a suitable technique to
visualise the LAMMPS/LIGGGHTS snapshot of the simulation.

Predicting a snow avalanche is a very complex process, and this project will not attempt
to predict or forecast the occurrence of an avalanche. The goal of our model is to
attempt to capture the movement and deformation of a small, loose snow avalanche with
cohesion as discrete particles using high performance computing, to benchmark this
model, and to examine the HPC artifacts observed after profiling the code. The intention
is to model the system approximately in terms of size, shape (preferably) and
material characteristics, subject to the computational limit. This can be further
developed, in the future, into a full-fledged model to investigate the mechanics of
granular snow avalanches.

1.6 Literature Review
In this section, the theories and computational models for snow avalanches are
reviewed as a basis for understanding snow avalanche modelling. Various models,
ranging from simple theoretical methods to complicated computational models, can
describe the movement of snow and snow mechanics. The basic approaches used for
snow modelling employ statistical, analytical and numerical models.
Traditional avalanche models use point-wise and piece-wise analytical solutions of the
governing differential equations to describe momentum conservation laws. For the past
80 years, only simple models have been used to obtain crude estimations of important
avalanche features such as velocity, pressure and run-out distance, yet they are
capable of producing reasonably accurate results (Christophe, 2002). Logotala (1924) first
computed the velocity of avalanches down a predetermined path. Voellmy, Salm and
Gubler (VSG) examined the model proposed by Logotala and proposed new extensions
to increase its accuracy. In the VSG model, the flowing avalanche is treated like a
sliding block. The main advantage of this model is that it can be used to predict the
maximum velocity, run-out distance and impact pressure of an avalanche very easily.
However, according to Mear (2005), this model is not reliable because of the unrealistic
assumptions made about the avalanche path. The avalanche path is divided into three
segments: the release zone, the avalanche track and the run-out zone. All of these
segments are assumed to have constant slope, constant width and uniform flow. Mear
(2005) points out that these assumptions about the avalanche path are not adequate to
model all avalanche terrain, and that a dynamic snow model cannot be built on such
unrealistic assumptions.

The run-out distance of an avalanche is also described by many statistical approaches.
Bovis and Mears (1976), Lied and Bakkehoi (1980) and McClung and Lied (1987)
proposed a different approach to the problem: they predicted the run-out distances of
avalanches from the topographic features of a particular avalanche path. The statistical
approaches are developed by relating the slope angle of the avalanche path to the
run-out distance. The advantages of this kind of model over more complicated models
are its simplicity and ease of use, and that it is derived from the local history of
avalanches in the region of interest. Statistical models are very useful and accurate when
historic data on snow avalanches are available. They can be classified into two types:
the alpha-beta (α-β) model and the run-out ratio (Δx/Xβ) model. Both are based on the
correlation between run-out distances and certain topographic features, and on the
assumption that the avalanche dynamics is governed by the longitudinal profile of the
avalanche path. Since such models are based on historic data, according to Mear (2005)
they have their own limitations: each mountain terrain and run-out path is unique,
which makes a model applicable only to the region where the data was gathered, not
generic. Another drawback is that they provide only very limited information about the
velocity and impact pressure of an avalanche. Hydraulic models were developed as a
means to capture the dynamics of an avalanche, providing more accurate information
about the velocity, flow depth and impact pressure over the run-out path.

A different approach to avalanche modelling is to model the flow using a depth-averaged
continuum approach. Voellmy (1955) was the first to develop such a model (the
Voellmy-fluid model). He described dense-snow avalanche movement based on the
principles of conservation of mass and momentum, treating the avalanche as a sliding
block of snow moving with a drag force proportional to the square of its flow speed.
This model is widely accepted as it gives reliable results for many practical cases;
however, it gives unsatisfactory results for large-scale avalanches (Bartelt, 1999).
Eglit et al. (1960) of Moscow State University developed a model, based on the
Voellmy-fluid model, in which there is no upper limit on the sliding friction and no
distinction is made between the active and passive parts of the avalanche flow. This
makes the Russian model suitable for large-scale avalanches.

Savage and Hutter (1988) modelled the motion of a finite mass of cohesionless granular
material down an inclined plane as a fluid (continuum approach). This is one of the most
advanced concepts for modelling granular motion. It is based on a system of
differential equations for the conservation of mass and momentum, with the velocity
profile assumed constant. The model best describes the motion of the front and rear
edges of a finite mass of granular material released from rest.

Hopkins and Johnson (2005) of the US Army Engineer Research and Development Center
(ERDC) have developed a dynamic DEM-based model (μSNOW) of dry snow deformation
in which particles can sinter together and break apart. Their model identifies the micro-
structural deformation mechanics of snow particles. In this model, the snow particles
are represented as randomly oriented cylinders of random length with hemispherical
ends. Contact detection is handled by a new iterative method based on the dilation
operation (Hopkins, 1995). This model accurately represents the mechanics that
control dry-snow deformation. My first interaction with the group, during the
literature review phase of the project, provided a few insights into the μSNOW
model they have developed. They have aimed to develop a virtual snow laboratory
with mechanical, heat transfer, visible light interaction, metamorphism and
air flow/permeability modules.

During my telephone conversation with A. Hopkins of ERDC, we discussed the
DEM modelling of snow particles. According to him, DEM modelling of snow
mechanics is, as they say, a mixed bag. With DEM it is possible to model snow
particles made of discrete grains of various shapes that have sintered together. But to
model a more general, metamorphosed sample, the finite element method (FEM)
may be more capable of resolving the complex structure that is difficult, if not
impossible, to resolve into grains. However, there are a couple of unanswered questions:
(a) what to do with the FEM structure once it starts breaking apart into grains and
fragments, and (b) whether DEM can take over the problem at that stage. These
questions are still not resolved. This conversation helped me gain a lot of useful
information for the thesis.

1.7 Organization of the report
This dissertation report is organised into eight chapters, following the rational and
progressive order in which the work for the thesis was done.

This chapter is the introduction to the thesis.

Chapter 2 provides background information describing DEM and some of the
Molecular Dynamics (MD) components required to begin simulations. Methods of
decreasing the computation time are also discussed.

The hardware and software requirements are discussed in Chapter 3.
In addition, the material properties of snow that are required for the
simulations are described.

Chapter 4 is devoted to more advanced developments dealing with complex particle
shapes, cohesion forces, hydrodynamic and thermal interactions and modelling of
complex granular systems. In addition, the two physical cohesion models that are
employed in building the DEM model are explained in detail.

The implementation details, numerical results and validation of the DEM model are
discussed in detail in Chapter 5.

The benchmarking results are presented in Chapter 6.

The code is profiled using CrayPAT and the profiling results are discussed in Chapter
7.

Chapter 8 summarises the conclusions of this research and provides optimisation
suggestions and recommendations for future enhancements to the project.
Chapter 2

Background

This chapter provides background information about MD-DEM and discusses its
advantages and disadvantages. Numerical modelling requires not only a simulation
method but also a toolbox of different methods for the management of initial and
boundary conditions and the choice of parameters. All these topics are discussed in this
chapter in the framework of a minimalistic model. An explanation of how to calculate
the forces and the potential energy functions required to predict the trajectories of
particles is provided. The integration method used to calculate the velocities and
positions of particles is described in detail. In addition, several optimisation
techniques, such as neighbour lists and periodic boundary conditions, are discussed,
and the constraints of DEM simulations and the parallelisation and communication
strategies are explored.










2.1 Introduction to Molecular Modelling
Snow particles are very much bigger than molecules; however, molecular modelling
is the basis for DEM. This section and the next discuss molecular modelling
to give some background for DEM.

Molecular modelling is the computational technique of constructing or mimicking
the behaviour of molecules and performing a variety of calculations on them
to predict their characteristics and behaviour. Molecular modelling is used in the
field of materials science for studying three main characteristics of an individual
molecule or system of molecules: its chemical structure (number and type of
particles), its properties (energy) and its characteristic behaviour in the presence of
other molecules (electrostatic potentials). These determinations help in validating
experimental studies or can help in predicting experimental results. The key feature of
molecular modelling techniques is the atomistic-level description of molecular
systems, that is, describing the system at the level of individual atoms (or small groups
of atoms). The main advantage of such an approach is that it reduces the complexity of
the system, allowing many more particles to be considered during the simulations. There
are a number of techniques employed in molecular modelling; MD is the most popular
and commonly used.

2.2 Review of Molecular Dynamics
MD is a well-established microscopic computer simulation method for
studying the properties and behaviour of complex systems like solids, liquids and
gases by calculating the motion of every particle in the system over a given time. A basic
MD simulation contains five steps, as summarised below.

1. Initialise - read the initial states of the particles.

2. Ensembles and interaction - calculate the forces acting on each particle
based on the neighbour list.

3. Force calculations - compute the acceleration of each particle.

4. Integration - obtain the new velocity and position of the particles after each
time step.

5. Analyse - compute magnitudes of interest and measure observables. Repeat
steps 2 through 5 for the required number of time-steps.

P. Cundall (1979) developed the DEM approach nearly 30 years ago to model granular
materials. MD was also used at that time for the simulation of molecular systems,
with classical schemes that could be directly applied to granular media. For this
reason, many authors use the acronyms MD and DEM indiscriminately for the
discrete simulation methods of granular materials.


Figure 2.1: Flowchart of MD approach

Note that the MD and DEM approaches are identical in spirit, but the physics is
fundamentally different. Nevertheless, especially for modelling granular materials, the
particle properties, force interactions and integration laws that are often referred to as
MD techniques are used in DEM in order to understand the collective behaviour of
a dissipative large-particle system. These MD techniques are explained in detail in
Sections 2.3 to 2.6 before proceeding to DEM.

2.3 Force Calculations and Ensembles
Particles in the MD simulation move due to the forces acting on them, as governed by
Newton's second law of motion, which is given by equation 2.1:

$$\vec{F} = m\vec{a} = m\,\frac{d\vec{v}}{dt} = m\,\frac{d^{2}\vec{r}}{dt^{2}} \tag{2.1}$$




Calculating the potential energy between the individual particles helps in determining
the atomic interactions, which are described by inter-atomic forces between particles.
The sum of the potential energy associated with all types of atomic interactions gives
the total potential energy of the system. The force acting between a pair of
particles is computed by evaluating the negative first-order derivative of the potential
energy with respect to the separation distance, as given by equation 2.2:

$$\vec{F}_{i} = -\frac{\partial U}{\partial r_{ij}} \tag{2.2}$$

There are three commonly used ensembles in MD simulations.

1. NVE - the number of atoms, volume and energy of the system are kept constant. This
is called the microcanonical ensemble.

2. NVT - the number of atoms, volume and temperature of the system are kept
constant. This is called the canonical ensemble.

3. NPT - the number of atoms, pressure and temperature of the system are kept
constant. This is called the Gibbs ensemble.

The NVE ensemble approach is used in this thesis.
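
As a hypothetical illustration of equation 2.2, consider a simple harmonic contact
potential with stiffness k and contact distance d (a sketch only; the actual granular
potential used in this thesis is described in Section 2.4 and Chapter 4):

$$U(r_{ij}) = \tfrac{1}{2}\,k\,(d - r_{ij})^{2}, \quad r_{ij} < d
\qquad\Longrightarrow\qquad
\vec{F}_{i} = -\frac{\partial U}{\partial r_{ij}} = k\,(d - r_{ij}) = k\,\delta$$

where $\delta = d - r_{ij}$ is the normal overlap, recovering the linear spring force
used in the contact models of Section 2.8.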

2.4 Interaction
Potential energy calculations play a key role in MD simulations. The selection of an
appropriate potential is the fundamental step in any MD simulation, so that it provides
useful results without being computationally expensive. The potential used in this
thesis is the granular potential. The granular potential uses Hertzian interactions when the
distance $r$ between two particles of radii $R_{i}$ and $R_{j}$ is less than their contact
distance $d = R_{i} + R_{j}$. There is no force between the particles when the distance $r$
is greater than the contact distance $d$. Refer to Chapter 4 for more details about the
interaction potential used in this thesis.
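
A minimal sketch of this contact test, using a simplified Hertzian-style normal force
F = k·δ^(3/2) with a hypothetical placeholder stiffness k (the full model, with material
parameters, is given in Chapter 4):

```python
import numpy as np

def hertz_normal_force(pos_i, pos_j, R_i, R_j, k=1.0e6):
    """Return the normal force magnitude between two spheres.

    Hertzian contact: the force scales with overlap^(3/2) and is zero
    when the centre distance r exceeds the contact distance d = R_i + R_j.
    The stiffness k is a placeholder; Chapter 4 derives it from E and nu.
    """
    r = np.linalg.norm(np.asarray(pos_j) - np.asarray(pos_i))
    d = R_i + R_j
    overlap = d - r
    if overlap <= 0.0:          # particles not touching: no force
        return 0.0
    return k * overlap ** 1.5   # Hertzian normal repulsion

# two 5 mm particles whose centres are 4.9 mm apart (0.1 mm overlap)
print(hertz_normal_force([0, 0, 0], [0.0049, 0, 0], 0.0025, 0.0025))
```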
2.5 Integration
The atoms in a molecular dynamics simulation move throughout the
system. For the atoms to move, the forces must be integrated. It is very difficult to
solve such MD systems analytically, as they usually involve millions of atoms;
therefore, a numerical integration method is necessary. There
are many numerical integration methods available, such as the Verlet algorithm, the
leapfrog algorithm, Velocity-Verlet and Beeman's algorithm. LAMMPS, the molecular
dynamics software used in this thesis, uses the Velocity-Verlet integration scheme. The
Velocity-Verlet algorithm, a modified version of the Verlet algorithm, is a numerical
integration method that determines the positions of the atoms after every time-step.
For a given position, the basic Verlet algorithm is derived from Taylor expansions, one
forward in time and one backward in time, as follows.



$$\vec{r}(t+\Delta t) = \vec{r}(t) + \vec{v}(t)\,\Delta t + \tfrac{1}{2}\,\vec{a}(t)\,\Delta t^{2} + \tfrac{1}{6}\,\vec{b}(t)\,\Delta t^{3} + O(\Delta t^{4}) \tag{2.3}$$

$$\vec{r}(t-\Delta t) = \vec{r}(t) - \vec{v}(t)\,\Delta t + \tfrac{1}{2}\,\vec{a}(t)\,\Delta t^{2} - \tfrac{1}{6}\,\vec{b}(t)\,\Delta t^{3} + O(\Delta t^{4}) \tag{2.4}$$

Adding equations 2.3 and 2.4 we get

$$\vec{r}(t+\Delta t) = 2\,\vec{r}(t) - \vec{r}(t-\Delta t) + \vec{a}(t)\,\Delta t^{2} + O(\Delta t^{4}) \tag{2.5}$$

Equation 2.5 gives the position after the time-step as a function of
the current position, the previous time-step position and the acceleration. The
truncation error of the algorithm is of the order $\Delta t^{4}$. One constraint
of the basic Verlet algorithm is that, for the very first time-step, the previous position
$\vec{r}(t - \Delta t)$ is not defined. This is fixed in the Velocity-Verlet algorithm by
explicitly including the velocities of the atoms, that is, velocity and position are
calculated at the same time-step:

$$\vec{r}(t+\Delta t) = \vec{r}(t) + \vec{v}(t)\,\Delta t + \tfrac{1}{2}\,\vec{a}(t)\,\Delta t^{2} \tag{2.6}$$

$$\vec{v}(t+\Delta t) = \vec{v}(t) + \frac{\vec{a}(t) + \vec{a}(t+\Delta t)}{2}\,\Delta t \tag{2.7}$$
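
A minimal sketch of one Velocity-Verlet update (equations 2.6 and 2.7), assuming a
user-supplied acceleration callback; this is illustrative, not LAMMPS's internal
implementation:

```python
import numpy as np

def velocity_verlet_step(r, v, a, accel_fn, dt):
    """Advance positions and velocities by one time-step dt.

    r, v, a: (n, 3) arrays of positions, velocities, accelerations.
    accel_fn: callback returning accelerations for given positions.
    Implements equations 2.6 and 2.7.
    """
    r_new = r + v * dt + 0.5 * a * dt**2      # eq. 2.6
    a_new = accel_fn(r_new)                   # forces at the new positions
    v_new = v + 0.5 * (a + a_new) * dt        # eq. 2.7
    return r_new, v_new, a_new

# usage: constant gravity, single particle
g = np.array([[0.0, 0.0, -9.81]])
r, v, a = np.zeros((1, 3)), np.zeros((1, 3)), g.copy()
for _ in range(100):
    r, v, a = velocity_verlet_step(r, v, a, lambda pos: g, 1e-3)
print(r)  # z ~ -0.5 * 9.81 * (0.1)^2
```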

2.6 Periodic Boundary Condition
Almost all MD simulations take place in a box or container of some shape and size.
They aim to model infinite systems at the microscopic level, given finite means. If a
container with rigid boundary conditions is chosen, then at the microscopic level most of
the particles are affected by the edge and wall effects of the system. According to
Rapaport (2004), in a microscopic simulation with 1000 particles nearly 500 particles
stick to the boundary walls and edges.

This situation is avoided by using Periodic Boundary Conditions (PBC), in which
the system is treated as an infinite array of identical translated images of itself, as
shown in figure 2.2. There are two consequences of the PBC approach. First,
a particle moving out at one boundary re-enters at the opposite boundary,
creating a periodic movement. Second, particles that are within a distance $r_{c}$
of a boundary interact with particles within the same distance of the
opposite boundary. Both consequences are taken into account in LAMMPS
when doing the force calculations and the integration of position and velocity.



Figure 2.2: A 2D MD simulation with periodic images
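
A minimal sketch of these two consequences for a cubic box of side L (hypothetical
helper names): wrapping positions back into the box, and the minimum-image
convention for separations across the boundary:

```python
import numpy as np

def wrap_positions(r, box_length):
    """Map positions back into [0, L): a particle leaving one face
    re-enters through the opposite face."""
    return r % box_length

def minimum_image(r_i, r_j, box_length):
    """Shortest separation vector from i to j across periodic images,
    so particles near one boundary feel particles near the opposite one."""
    d = r_j - r_i
    return d - box_length * np.round(d / box_length)

L = 10.0
print(wrap_positions(np.array([10.5, -0.2, 3.0]), L))   # [0.5, 9.8, 3.0]
print(minimum_image(np.array([0.5, 0.0, 0.0]),
                    np.array([9.5, 0.0, 0.0]), L))      # [-1.0, 0.0, 0.0]
```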

2.7 Neighbour List
Figure 2.3 illustrates the neighbour list visually. Since each of the $n$ particles in the
simulation can interact with every other particle in a simulation step, there are

$$\binom{n}{2} = \frac{n(n-1)}{2} \in O(n^{2}) \tag{2.8}$$

interactions/forces to be calculated. Even for short-range forces, $O(n^{2})$ pairs have to
be checked, while only a few of them make useful contributions. To tackle this problem
of a huge number of interactions, LAMMPS implements a neighbour list strategy for the
force calculations. The cut-off radius $r_{cut}$ is used to limit the number of interactions
associated with individual atoms: only the interactions associated with the atoms in the
neighbour list are accounted for in the force calculations. This greatly helps in reducing
the time of the simulation (Subramani, 2008). Even this approach has a drawback: the
neighbour list requires frequent updates, which is undesirable because generating the
list consumes a considerable amount of time. To avoid very frequent updates to the
neighbour list, a buffer radius $r_{b}$ is added to the cut-off radius $r_{cut}$ to give a
neighbour list cut-off radius $r_{nl}$ greater than the cut-off radius (e.g. $r_{nl} =
2\,r_{cut}$). This buffer allows atoms to be displaced beyond the cut-off radius while
still remaining within the neighbour list radius, thereby reducing the number of times
the neighbour list is refreshed. Though this requires additional memory of order $O(n)$,
it reduces the complexity of the computation to $O(n)$, which is better than the naïve
$O(n^{2})$ for a large number of particles.



Figure 2.3: Neighbour list of a single particle (in red), drawn in 2D
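
A minimal sketch of building and reusing a neighbour list with a buffer (skin) radius,
using a hypothetical O(n^2) build for clarity (production codes such as LAMMPS use
cell/bin structures to build the list more efficiently):

```python
import numpy as np

def build_neighbour_list(r, r_nl):
    """All pairs closer than the neighbour-list radius r_nl (> r_cut)."""
    n = len(r)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(r[j] - r[i]) < r_nl:
                pairs.append((i, j))
    return pairs

def forces_from_list(r, pairs, r_cut, pair_force):
    """Only list entries within r_cut contribute; the buffer r_nl - r_cut
    lets the same list be reused for several time-steps."""
    f = np.zeros_like(r)
    for i, j in pairs:
        d = r[j] - r[i]                     # vector from i to j
        dist = np.linalg.norm(d)
        if dist < r_cut:
            fij = pair_force(dist) * d / dist   # force on j from i (repulsive > 0)
            f[j] += fij
            f[i] -= fij                     # Newton's third law
    return f
```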

2.8 Discrete Element Method
The DEM approach is very similar to the standard MD approach, in which the
positions of the particles are updated gradually in time using many discrete time steps
(Plimpton, 1995). The basic properties of a DEM system are:

- There are many particles.
- Particles are relatively stationary.
- Forces between the particles are very short-range.
- The time taken to compute forces dominates the simulation.

A DEM analysis starts with a collection of particles, or by creating particles in a
designated region. Each physical particle is represented mathematically by a sphere,
another geometrically well-defined volume, or a combination of them. The movement
of these spherical particles is based on the corresponding momentum balances. Along
with the current position and velocity of a particle, the particle's physical
characteristics are used to calculate the current forces upon it. The forces
typically include gravity, friction, pressure from contact with other particles and the
physical system boundaries, and may include other effects such as those caused by
cohesion. These forces are then used to predict the particle's future location and
velocity over some small increment called the time-step, normally of the order of
millionths of a second. This process is repeated for every particle in the
system at each time step. When particles collide with each other or with other parts
of the system, the contacts are modelled with linear springs, dashpots and joints in the
normal and tangential directions, as shown in figure 2.4.





Figure 2.4: Contact model for DEM simulations in the (a) normal and (b) tangential
directions.


The contact force is the result of elastic, viscous and frictional resistance between the
moving particles, which can be modelled as a spring, a dashpot and a shear slider. The
spring models the elastic interactions, while the dashpot expresses the dissipation of
energy in the system. Such a model helps in calculating the forces acting on each
particle. The new position, velocity and acceleration of the particle are estimated by
numerical integration of Newton's second law:

$$m_{i}\,\ddot{\vec{x}}_{i} = m_{i}\,\vec{g} + \sum_{j}\vec{F}_{ij}, \qquad
I_{i}\,\ddot{\vec{\omega}}_{i} = \sum_{j}\left(\vec{r}_{ij} \times \vec{F}_{ij}\right) \tag{2.9}$$

where:

$\ddot{\vec{x}}_{i}$ - translational acceleration of particle i
$\ddot{\vec{\omega}}_{i}$ - rotational acceleration of particle i
$m_{i}$ - mass of particle i
$I_{i}$ - moment of inertia of particle i
$\vec{g}$ - acceleration due to gravity
$\vec{F}_{ij}$ - force at contact with neighbouring particles j
$\vec{r}_{ij}$ - vector directed from the centre of the particle i to the contact point with particle j

The implementation specifics of the deformations enforced by contacts between the
particles, and the methods of modelling interactions, are discussed in detail in Chapter 4.
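
A minimal sketch of the translational part of equation 2.9 for one time-step, assuming a
hypothetical pair_force helper that returns the contact force on particle i due to
particle j (the rotational equation is handled analogously with the moment of inertia):

```python
import numpy as np

def dem_accelerations(r, m, pairs, pair_force,
                      g=np.array([0.0, 0.0, -9.81])):
    """Translational part of eq. 2.9: m_i * a_i = m_i * g + sum_j F_ij.

    r: (n, 3) positions; m: (n,) masses; pairs: contact list (i, j).
    """
    f = np.tile(g, (len(r), 1)) * m[:, None]   # gravity on every particle
    for i, j in pairs:                          # sum contact forces
        fij = pair_force(r[i], r[j])            # force on i due to j
        f[i] += fij
        f[j] -= fij                             # Newton's third law
    return f / m[:, None]                       # a_i = F_i / m_i
```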
2.9 Parallelisation and Communication of DEM Simulations
The main constraint of an MD-DEM simulation is that it requires many calculations and a
lot of memory and CPU time. The simulation time is usually proportional to the
square of the number of particles in the system, so as the system grows, the time taken
for the simulation grows rapidly. In order to overcome these constraints, most
MD-DEM simulations are designed to be implemented in parallel. There are a
number of ways to do this parallelisation. One common approach is to run different
scenarios, each with different start-up conditions, on independent processors in
parallel. There are two ways to implement this parallel strategy: either
each parallel processor has access to shared memory, or the processors execute the
scenario on independent memory and communicate data and results among themselves. Since the
170 I S,|a/ / a|
rately having its own mass, velocity and contact properties.
Contacts between cylindrical elements in the system are
modelled by the set of linear springs, dashpots and joints in
normal and tangential directions and frictional slider in tan-
gential direction (Figure 1).
The contact force results from elastic, viscous and fric-
tional resistance which can be modelled as the spring, dash-
pot and shear slider. The spring expresses elastic interaction
while the dashpot models dissipation of energy in the system.
Shear slider, that represents frictional force at contact point,
slurls ils opcrulion whcn lhc lollowing incquulily is mcl:

, F F
n t
>
(1)
whcrc: I
t
and F
n
are tangential and normal forces, respective-
ly, and is a coefficient of friction.
Such a contact model allows calculation of forces acting
on each particle. Position, velocity and acceleration of the
particle are estimated by numerical integration of the New-
lon`s sccond luw:

m
i
x
i
= m
i
g+
j
F
ij
I
i

i
=
j
(r
ij
F
ij
) (2)
whcrc: m
i
is a mass of particle i, x
i
is its translational accelera-
tion, g is acceleration of the gravity, F
ij
is the force at contact
with neighbouring particles j, I
i
is a moment of inertia,

i
is
particle rotational acceleration, r
ij
is the vector directed from
the centre of the particle i to the contact point with particle j.
Deformations enforced by contacts between particles or
particle and boundary during collision are represented by
overlap at contact point. Two methods of modelling inter-
action of two particles during impact have been proposed.
Hard-sphere model does not require as short simulation time
step as soft-sphere model does as a result of different way of
description of velocities and positions of elements in the two
mclhods. Using hurd-sphcrc modcl involvcs lhc loss ol somc
data, which limits its applicability to dilute free-flowing sys-
lcms |i Rcnzo & i Muio. 2OO5].
To cnublc uppliculion ol IM lor morc complcx und limc-
consuming proccsscs Iwushilu & Odu |2OOO] improvcd ulgorilhm
ol Cundull & Slruck |199]. Owing lo ils cupucilics IM is now-
adays an effective tool used for modelling numerous processes
occurring in grunulur syslcms. such us: purliclcs scgrcgulion |Suk-
aguchi et al., 1998], particles flow [Langston et al., 1997, 2004],
silos dischurgc |Yung & Hsiuu. 2OO1: Sukuguchi et al.. 1994: Mus-
son & Murlincz. 2OOO]. shcur bchuvior |Iwushilu & Odu. 2OOO:
Sukuguchi & Iuvicr. 2OOO]. etc.
Determination of simulation time step. Calculation
of a simulation time step is one of essential questions in DE
modelling. Sufficiently short time step ensures stability of the
system and enables stimulation of the real processes.
According lo Timoshcnko & Coodicr |19O] und 1ohn-
son [2004], during motion of particles in a granular system
the disturbances propagate in a form of Rayleigh waves along
surface of solid. The simulation time step is a part of Rayleigh
time which is taken by energy wave to transverse the smallest
element in the particular system. It should be so short that
disturbance of particles motion propagates only to the near-
est neighbours. It is assumed that velocity and acceleration
are constant during the time step. The time step should be
smaller than critical time increment calculated from theory.
A number of equations have been proposed for calculation
ol u crilicul limc slcp |Cundull & Slruck. 199: O`Sullivun &
Bray, 2004], however usually it is estimated based on natural
lrcqucncy in u lincur spring syslcm |Ru|i & Iuvicr. 2OO4]:

, k / f t
i c
= A
(3)
whcrc: k is cllcclivc slillncss und l is u luclor. Thc choicc ol
proper value of the constant f is very important but not easy.
The reason of difficulties is a fact that the f depends strongly
on packing configuration, number of contacts and proper-
ties of particles. It is different for two- and three-dimensional
simululions us wcll |O`Sullivun & Bruy. 2OO4 ].
Models of contact interaction. Application of a proper
conlucl modcl wilhin IM simululions cnublcs uccurulc
description of particles collision. The particle contact forces
occur only when particles penetrate or overlap. For circular
purliclcs lhc ovcrlup occurs whcn:

A
ij
=(r
j
r
i
)|x
j
x
i
|> 0, (4)
where: Δ_ij is the amount of overlap between particles i and j, r is the particle radius and x is the position vector of the particle centre. Linear and non-linear contact models may be applied. In the former, the normal contact force is a linear function of the overlap Δ_ij and the relative velocity of the particles Δ̇_ij:

F_n = k Δ_ij + c Δ̇_ij        (5)
where: k is the coefficient of stiffness and c is the coefficient of damping. The linear model is sufficient to investigate simple processes occurring in a grain assembly for elastic collisions. In certain cases, in spite of its simplicity, the linear contact model provides results close to experimental data [Di Maio & Di Renzo, 2004]. More complex processes should be examined by application of the non-linear contact model proposed by Hertz [1882] for the normal direction:

F_n = k Δ_ij^α + c Δ_ij^β Δ̇_ij        (6)
where: α and β are indices, and by Mindlin [1949] for the tangential direction. Together these models constitute the Hertz-Mindlin contact theory for elastic granular materials. The need to examine materials with various properties enforced extending the ...

Figure 1: Contact model for DEM simulations in the normal (a) and tangential (b) directions.
shared memory approach is difficult to implement at large scale, as it requires special memory access patterns to avoid synchronisation issues among processors, the communication-based approach is preferred. The simplest way to implement the communication-based parallelisation approach is to replicate all the data to every processor, which requires additional memory and communication time. The challenge here is to reduce the memory and communication time by providing each processor with only the minimum data it needs for the computation.
There are three types of decomposition strategies discussed by Beazley and Lomdahl (1994) to overcome this challenge for parallel computing: force decomposition, atom decomposition, and spatial decomposition. The MD simulation package used in this thesis (LAMMPS) employs the Spatial Decomposition (also known as Domain Decomposition) technique.




Figure 2.5: Spatial Decomposition approach

The Spatial Decomposition approach divides the whole simulation domain geometrically into small regions and assigns each region to an individual processor. The regions are further divided into cells. The advantage of this approach is that each processor has to work only on the particles in its own region (and cells). The only exception is the particles close to the boundary of a neighbouring processor's region; in this case, message passing has to happen between the neighbouring processors to communicate the information required for the computation. This limits the communication to particles that move in and out of a region and particles on boundaries, which involves a smaller data size. Thus, the domain decomposition algorithm works efficiently with minimal communication overheads.

If there are N processors used in the simulation, each with the same calculation speed, the estimated speed-up S can be obtained simply by counting the number of instructions I (or, equivalently, the number of processor clock cycles):

S = t_sequential / t_parallel = I_sequential / I_parallel        (2.10)


Let α denote the fraction of parallelizable instructions. The parallel part then executes α × I_sequential instructions shared across the N processors, and (1 − α) × I_sequential instructions remain non-parallel. Let I_overhead be the overhead spent on the initialization, communication and synchronization process. The speed-up S is then given by

S = I_sequential / [(1 − α) × I_sequential + (α × I_sequential) / N + I_overhead]
  = 1 / [(1 − α) + α/N + I_overhead / I_sequential]
  = 1 / [(1 − α) + α/N + β],   where β = I_overhead / I_sequential        (2.11)


When β = 0, equation 2.11 is known as Amdahl's law (Jordan and Alaghband, 2003). It implies that, even with negligible overhead time, a program with 90% parallelizable sections can be sped up by at most a factor of 10, no matter how many processors are used.
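A quick numerical check of equation 2.11 (a sketch, not part of the thesis code) makes this concrete: with α = 0.9 and β = 0 the speedup saturates at 1/(1 − α) = 10.

def speedup(alpha, n, beta=0.0):
    # equation 2.11: serial fraction + parallel fraction + relative overhead
    return 1.0 / ((1.0 - alpha) + alpha / n + beta)

for n in (1, 8, 64, 1024, 10**6):
    print(n, round(speedup(0.9, n), 3))   # approaches 10 as n grows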

2.10 Summary
With the advent of powerful computational resources, DEM simulations have become a popular method of addressing problems in granular and discontinuous media. Several decomposition techniques are available that help in parallelizing large DEM simulations. Potential functions to compute the forces on each particle and a numerical integration technique to evaluate the motion of particles are essential for all DEM simulations. In this thesis, a granular potential was chosen because of its ability to model granular materials. The motion of particles is evaluated using the Velocity-Verlet algorithm.
Chapter 3

Experimental Setup



This chapter describes the details of the experimental set-up used in the current study.
Included in the description are the salient features of each hardware platform used,
details about the software package and the visualization setup. It also describes the
choice of the experimental parameters required for the simulation and the ways to calculate those parameter values.



















3.1 The Platforms
3.1.1 HECToR Cray XE6
Architecture Overview
The HECToR service (phase 2b) consists of a Cray XE6 massively parallel processor (MPP) distributed-memory system. It uses AMD processors with a custom-built memory and communication system, tightly coupled with the operating and management system, making it a highly scalable and reliable system (cray.com, 2011).

Figure 3.1: The Cray XE6 (cray.com, 2011)
The XE6 as part of the HECToR service consists of 1856 compute nodes, each containing two AMD Opteron 6172 2.1 GHz 12-core processors, code-named Magny-Cours; each such processor is essentially two hexa-core dies connected within the same socket, and two Magny-Cours processors form a 24-core node in HECToR. This gives 44,544 cores, with a theoretical peak performance of 373 Tflops. In addition to this, there are 64 service nodes: 6-core, 2.2 GHz, with 16 GB of memory.
The processors on the XE6 are used as compute nodes or service nodes. Compute nodes run Compute Node Linux (CNL), are configured for user applications, and ensure that there is little OS noise during application execution.
Service nodes run SuSE Linux and can be configured for login, network and system functions.
Memory and Cache
There is 32 GB of main memory per 24-core node, accessible using Non-Uniform Memory Architecture (NUMA), giving the XE6 a total memory capacity of 58 TB. Each core has 64 KB of dedicated L1 data cache, 64 KB of dedicated L1 instruction cache, 512 KB of dedicated L2 cache and 6 MB of shared L3 cache. Out of the 6 MB of shared L3 cache, 1 MB is allocated to maintaining cache coherency. On the XE it is possible to allocate all 32 GB of main memory to only one core in a node, which helps in running sparsely populated jobs so that more memory is available per core, although this requires more compute cycles overall.

Figure 3.2: Magny-Cours Architecture diagram (hector.ac.uk, 2011)
Communication Network

The key feature of the XE6 is the Gemini interconnect. The Gemini ASIC is capable of handling tens of millions of MPI messages per second, improving performance drastically. Each dual-socket node is interfaced to the Gemini interconnect through a HyperTransport link. Each link has a very high bandwidth of 8 GB/s and a low latency of around 1.5 μs.

Data I/O and Storage

Out of the 64 service nodes on Phase 2b, 12 nodes are configured as I/O nodes. These are integrated into the 3D torus communication network via their own Gemini chips. They provide the connection between the machine and its 576 TB RAID disk via InfiniBand fibre. A high-performance parallel file system (esFS) is used to allow access to the disk by all the I/O nodes. The Phase 2b system has a 70 TB home file system that is backed up to a 168 TB backup system. Figure 3.3 summarises the current file system.


Figure 3.3: HECToR File System (hector.ac.uk, 2011)


Programming Environment

On login to HECToR, the Cray XE programming environment is loaded; this sets the environment variables for the compilers and parallel libraries. Cray provides wrapper scripts for the compilers: ftn for Fortran and cc/CC for C/C++. These wrappers serve as a single command to compile and link all the necessary parallel libraries.


The Portable Batch System (PBS) batch scheduler is used to run jobs on HECToR. The easiest way to submit jobs is using a batch script that launches the job with the aprun command. Refer to the Appendix for more details on the PBS submission script and the parameters that need to be specified in it.
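The following is a minimal sketch of such a submission script; the job name, budget code, core count and input file name are placeholders, not the values used in this project:

#!/bin/bash --login
#PBS -N lmp_pour                  # placeholder job name
#PBS -l mppwidth=24               # total number of cores requested
#PBS -l walltime=00:20:00
#PBS -A z01-budget                # placeholder budget code

cd $PBS_O_WORKDIR                 # run from the submission directory
aprun -n 24 ./lmp_hector < in.pour   # launch LAMMPS on 24 cores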

3.1.2 Ness
Ness is a much smaller parallel machine that is mostly used by students of EPCC and for general research activities (epcc.ed.ac.uk, 2011). Ness is a Sun Fire system with two 16-core shared-memory (SMP) nodes, at most one of which can be allocated to a single job. It has the same combination of processor technology, operating system and compiler suite as HECToR. Thus, it is very useful for inexpensive code development, correctness studies and some performance tests before transferring to HECToR. Table 3.1 summarises the specification of Ness.

Property Ness
Machine Type Sun Fire X4600
Machine Category Shared Memory
Processor Type 2.6 GHz AMD Opteron (AMD64e)
Theoretical Peak Flop rate per core 10.4 Gflop/s
Cores per node 16
Total Cores 32
Maximum Job Size 16
Memory Per Core 2 GB
Interconnect N/A
MPI Bandwidth 1 GB/s
MPI Latency 0.8 μs
L1 Data Cache Type Private Data
L1 Data Cache Size 64 KB
L1 Instruction Cache Type Private Instruction
L1 Instruction Cache Size 64 KB
L2 Cache Type Private Unified
L2 Cache Size 1024 KB

Table 3.1: Ness Specification

3.2 The Software
There are a number of commercial simulation packages available to perform DEM modelling. Two software packages, LAMMPS and LIGGGHTS, are used to carry out the computations for this thesis.
3.2.1 LAMMPS Code Introduction
LAMMPS (sandia.gov, 2011) stands for Large-scale Atomic/Molecular Massively Parallel Simulator. It is written in C++ and is an open-source code distributed under the GNU public license, developed at Sandia National Laboratories, USA. LAMMPS can be used to model a wide range of materials, from large-scale atomic to molecular systems, using a variety of force fields and boundary conditions. Given the boundary conditions, LAMMPS simulates the materials by integrating Newton's equations of motion for a system of interacting particles via short- or long-range forces. These forces include pairwise potentials, many-body potentials such as the Embedded Atom Method (EAM), and long-range coulombic force fields such as Particle-Particle Particle-Mesh (PPPM).

LAMMPS can be run on single-processor as well as multi-processor machines, and even on multiple PCs connected through an Ethernet network. The LAMMPS code has been parallelised using MPI for parallel communication, and it uses a spatial decomposition strategy to decompose the domain into small sub-domains that can be given to each processor of the parallel machine. All of the LAMMPS and LIGGGHTS simulations for this thesis were run on Ness as well as on the HECToR supercomputer to take advantage of the parallel processing capabilities of LAMMPS.

Why LAMMPS
1. LAMMPS is free and easy to use; its source code is well structured and easy to understand and modify.
2. LAMMPS is fast and suitable for massively parallel computing.
3. LAMMPS is well documented and has a large user community.
4. LAMMPS has a good MPI coupling interface.
5. LAMMPS scales well to large numbers of processors; its speedup generally increases with the number of atoms used in the simulation (Reid and Smith, 2005).
6. GPU acceleration is possible in LAMMPS.

3.2.2 LAMMPS Installation
The latest version (January 2011) of LAMMPS was downloaded from the LAMMPS website as a tar file. This tar file was copied via ftp to an appropriate directory on HECToR. Unpacking the file automatically creates the directory structure for LAMMPS on HECToR. The /src directory contains all the C++ source and header files required for LAMMPS. Given below are the instructions for building the January 2011 version of LAMMPS on the HECToR XE6.
1. Create a Makefile for HECToR
LAMMPS contains two levels of Makefiles. A top-level Makefile is located in the /src directory, and a MAKE subdirectory holds the low-level, machine-specific Makefiles. The first stage in building LAMMPS is to create a makefile specific to HECToR. A Makefile called Makefile.hector was created using the sample makefile that comes with LAMMPS as a starting template. The Makefile contains a number of system-specific settings, rules to compile and link the source files, and many other dependencies required to build the executable.
The first step is to change the first line of the Makefile to list the word HECToR after the #. This line is displayed first when building LAMMPS. In addition, this will include HECToR in the list of available options for the make command.
# HECToR XT4 system
The compiler/linker section lists the compiler and linker settings for HECToR. The PGI compiler is the default compiler on HECToR, but there is an open bug when using the PGI compilers with the new version of LAMMPS, so it was decided to use the GNU compiler instead. Thanks to Dr. Fiona Reid for her help in implementing this change. The following flags were used to compile LAMMPS:
CC = CC
CCFLAGS = -O3 -g -DFFT_FFTW -DMPICH_IGNORE_CXX_SEEK
DEPFLAGS = -M
LINK = CC $(CCFLAGS)
USRLIB = -ldfftw
SIZE = size
The location of the FFTW libraries needs to be specified in the CCFLAGS option. The -DFFT_FFTW option selects the centrally installed one-dimensional FFTW library.
2. Load the appropriate environment
Load the xtpe-mc12 environment and also the FFTW 2.1.5 environment.
module load xtpe-mc12
module load fftw/2.1.5.2
To properly link the FFTW libraries, a few modifications to the source code are required. The header file name referenced in fft3d.h needs to be changed from fftw.h to dfftw.h so that the code compiles correctly against FFTW 2.1.5, the version of the FFTW library supported by LAMMPS.
3. Build LAMMPS
Now execute the command below from within the /src directory of LAMMPS.
make hector
This creates the executable lmp_hector in the same directory when the build is complete.
3.2.3 LAMMPS Working
This section provides some background on the way LAMMPS works, which is very different from other commercial MD/DEM simulation packages. It is not necessary to recompile the code for different MD scenarios. Instead, LAMMPS provides its own scripting style in the form of an input file. It works by first reading the input file, which holds the initialization parameters such as dimension, units, boundary and particle definitions (size, shape and initial coordinates) and the time step required for the simulation. LAMMPS then builds the atomic system and periodically writes the thermodynamic information to a log file. Once the simulation is complete, the final state of the system is printed to the log file along with other information such as the total execution time. All of this is discussed in detail in the subsequent sections.


Figure 3.4: Execution of LAMMPS flow chart
3.2.4 LAMMPS - Input Script Structure
This section describes the structure of a typical LAMMPS input script. As shown in figure 3.4, the input file of LAMMPS is divided into four parts.
Initialization: This is the very first stage; it sets the parameters that define the molecular system and the simulation domain. Example: define the processor topology.
Particle Definition: This sets the positions and forces of particles in the simulation domain. There are three ways to do this: the details can be read from a new data file or a restart file from a previous simulation, or an input lattice can be created as part of the simulation itself.

Settings: Once the simulation topology and the particles are created, this stage sets various parameters for the simulation, such as boundary conditions, time steps and force-field options.

Execute: Once all the required properties are set, the simulation is started and run for the desired number of time steps.
3.2.5 LAMMPS - Input Script Basics
LAMMPS works by reading the input commands from the input file one at a time. Each command in the input file prompts LAMMPS to take some action, such as setting internal variables, reading a data file, setting material parameters, or running the simulation. Execution stops when LAMMPS reaches the end of the input file. In many cases the order in which the commands are placed in the input file is not important, but the following rules apply.
1. The ordering of commands in the input file can affect the behaviour of LAMMPS. Thus the sequence of commands
timestep 0.5
run 100
run 100
does something different than this sequence:
run 100
timestep 0.5
run 100
In the first case, both runs of 100 iterations use a time step of 0.5 femtoseconds, but in the second case, the first 100 iterations use the default time step (1.0 femtoseconds) and a time step of 0.5 femtoseconds is used for the second 100 iterations.
2. Sometimes the output of command A might be used in command B. This means
command A must precede command B in the input script for the desired effect to
happen.
3. Some commands are valid only when they follow other commands. For example, the
command to set the temperature of group of particles in the simulation cannot be
carried out until the particles are created and the region is defined.
3.2.6 LAMMPS - Parsing Rules
Each non-blank line in the input file is treated as a command by LAMMPS. LAMMPS commands are case sensitive. All LAMMPS commands are in lower case, while upper case may be used for some user-defined identifiers or in file names. Here is how LAMMPS parses the input script.

The very first word in any non-blank line is the command name, which is followed by the list of arguments (specified on the same line).

If the last character of a line is &, the command is assumed to continue on the next line. The previous line is concatenated with the next line by removing the &.

Any line that starts with # is treated as a comment and is discarded.

All user-defined variables are referenced with a $ sign. If the variable name is a single character, it immediately follows the $ sign; if the variable name is more than a single printable character, it is specified within curly brackets. For example: $x, ${my_var}. The variables in this case are x and my_var.

All output text to the screen or to the log files is enclosed within double quotes.
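For illustration, here is a small hypothetical input fragment exercising these rules (the variable name and its value are made up; the wall/gran line mirrors the script used later in Chapter 5):

variable dt equal 0.00001          # '#' starts a comment
timestep ${dt}                     # curly brackets: name longer than one character
fix zwalls all wall/gran 2000000.0 NULL 50.0 NULL 0.5 0 &
zplane 0.0 0.15
print "time step set to ${dt}"     # output text in double quotes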
3.2.7 LIGGGHTS
LIGGGHTS (liggghts.com, 2011) is an open-source, C++, MPI-parallel DEM code for modelling granular materials. LIGGGHTS stands for LAMMPS Improved for General Granular and Granular Heat Transfer Simulations, developed and distributed by Christoph Kloss of the Christian Doppler Laboratory on Particulate Flow Modelling at Johannes Kepler University, Austria. LIGGGHTS is part of the CFDEM project, whose goal is to develop a new CFD-DEM approach. LIGGGHTS is based on LAMMPS, which provides potentials for modelling soft materials, solid-state materials and coarse-grained granular materials and can be used to model particles at the atomic, meso or continuum scale. DEM methods involve the simulation of coarse-grained granular particles, and LAMMPS offers both linear and non-linear granular potentials for this purpose. All these granular-simulation features of LAMMPS are improved on in LIGGGHTS. The following are some of the new features that LIGGGHTS adds over LAMMPS:

It is possible to import complex geometry from computer-aided design (CAD) into a LIGGGHTS simulation.
Pair style parameters like stiffness and damping can be linked to material properties that can be derived from lab experiments (e.g. density, Young's Modulus, Poisson's ratio and coefficient of restitution).
It has the potential to model macroscopic cohesion.
LIGGGHTS has dynamic load balancing.

All the LAMMPS features, rules, commands and behaviour discussed in the previous sections are applicable to LIGGGHTS as well.

3.3 Visualization Setup
LAMMPS does not do any post-processing or visualization of the simulations. However, many available visualization tools can be coupled with LAMMPS to visualise the output. The Pizza.py toolkit (sandia.gov, 2011) and Paraview (paraview.org, 2011) are used in this thesis for visualisation of LAMMPS output.

Pizza.py is an integrated collection of tools that provide post-processing capabilities for the LAMMPS package. The August 2010 version of the Pizza.py toolkit is used in this thesis.
3.4 Description of the Mechanical Properties of Snow
Much of the theory and many of the results discussed in this thesis involve some or all of the mechanical properties discussed in this section. This section outlines the physical properties; refer to section 5.7.2 for the actual values of these parameters used in the simulation.
3.4.1 Size and shape of snow particles
There are several forms of snow particles, of which three are described here:
Snow Crystals: Typically the size of crystals ranges from microscopic to at most a few millimetres in diameter.

Snow Flakes: Several snow crystals clump together to form a snowflake. A snowflake can grow up to 10 mm across in some cases, when the snow is especially wet and thick.

Graupel: When loose collections of supercooled water droplets coat a snowflake, they form graupel. The typical size of a graupel particle is 2 to 5 millimetres in diameter.
For this thesis, it was decided to use spherical snow particles like graupel of size 5 mm, the maximum size of particles observed in the field (in a conversation on 8th February 2011, Virdee stated that he had observed graupel snow particles of size 5 mm while in the field).




Figure 3.5: Graupel Snow particles (Wikipedia.com, 2010, creative commons
licence)
3.4.2 Density
Symbol: ρ Unit: kg/m³

Density is the fundamental parameter of any material; it is calculated as mass per unit volume (kg/m³). For porous materials, density refers to the bulk density, which is the total mass per volume. It is determined by weighing snow of a known volume. Total snow density includes all constituents of snow, that is, ice, liquid water and air (Armstrong et al., 2009).
3.4.3 Young's Modulus
Symbol: E Unit: pascal (N/m²)

Young's modulus is used to characterise the stiffness of an elastic material (Godbout et al., 2000). It is the ratio of stress (measured in units of pressure) to strain (dimensionless). For snow ice, Young's modulus can be obtained from Drouin and Michel (1971):

E = 5442.3 − 67.3 T_i,   where T_i is the temperature of the ice (≤ 0 °C)   (3.1)
3.4.4 Poisson's Ratio
Symbol: ν Unit: dimensionless

In 3D, when an elastic material is stretched in one direction, it tends to get thinner in the other two directions. Poisson's ratio is defined as the ratio of the contraction, or lateral strain, to the extension, or longitudinal strain, under the influence of a uniform uniaxial stress.

Poisson's ratio is related to K, the bulk modulus, G, the shear modulus, and E, the Young's modulus, by the following (Sinha, 1987):

ν = (3K − 2G) / (6K + 2G)   (3.2)
3.4.5 Coefficient of restitution
Symbol: ε Unit: dimensionless

The coefficient of restitution is defined as the ratio of the rebound velocity v_r to the impact velocity v_i in the normal direction (Higa et al., 1995):

ε = v_r / v_i   (3.3)
3.4.6 Coefficient of kinetic friction
Symbol: f Unit: dimensionless

The coefficient of friction is a dimensionless scalar given by the ratio of the force of friction between two bodies to the force pressing them together (Schaerer, n.d.):

f = 5 / u,   where u is the avalanche speed in m/s   (3.4)
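Relations 3.1-3.4 are straightforward to evaluate; the Python sketch below simply collects them (the sample inputs are illustrative, not the calibrated values of section 5.7.2):

def youngs_modulus(t_ice):             # equation 3.1 (Drouin and Michel)
    return 5442.3 - 67.3 * t_ice

def poissons_ratio(k, g):              # equation 3.2
    return (3.0 * k - 2.0 * g) / (6.0 * k + 2.0 * g)

def restitution(v_rebound, v_impact):  # equation 3.3
    return v_rebound / v_impact

def friction(u):                       # equation 3.4, u in m/s
    return 5.0 / u

print(youngs_modulus(-5.0), poissons_ratio(8.9e9, 3.4e9),
      restitution(0.5, 2.0), friction(20.0))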

3.5 Summary
This chapter presented both the hardware and software setup used in the project. Both LAMMPS and LIGGGHTS are installed on Ness as well as on HECToR. Taking advantage of the configuration similarities between Ness and HECToR, the model was first developed and tested on Ness before being ported to HECToR. The material parameters discussed in this chapter are used in the actual implementation of the model.








Chapter 4
Modelling of Cohesive Interactions

Snow particles stick together as they flow in an avalanche; therefore, this study needs to investigate cohesive numerical models within DEM. In recent years, DEMs of granular materials have been developed extensively, and they have great potential in both industrial application and academic research. This chapter presents numerical models of cohesion in a discrete element framework. There are three levels of cohesion: adhesion, cementation and capillarity. The focus of this thesis is on one level of cohesion only: adhesion. Numerical modelling of cohesive phenomena must take into account the shapes of the particles and hydrodynamic interactions. The numerical implementation of these interactions depends on the solving strategy: Molecular Dynamics, discussed in Chapter 2, which is based on the equations of dynamics and pair-wise contact interactions. These can be extended with cohesive interactions, which supplement the repulsive elastic and frictional interactions of cohesionless materials.








4.1 DEM Revisited
In this work, DEM is applied to individual snow particles whose larger-scale bulk behaviour is defined by the way these particles interact with each other. Each particle is computationally defined along with its shape, initial position, velocity and other physical properties, and changes in these parameters over time are calculated for each particle as it moves around and interacts or collides with other particles in the simulation domain. There are two ways to resolve these collisions between particles: the hard or event-driven approach (EDM) and the soft or time-driven approach (TDM). Since EDM assumes instantaneous collisions between particles (impulses), it is more suited for dilute granular materials. Only the TDM approach helps in resolving the collision forces between particles. Each particle is treated as a rigid body that can overlap with its neighbour; Cundall and Strack (1979) first developed such an approach. TDM simulations are time driven, that is, the state of all particles at time t is updated after a time step Δt. A more detailed review of the DEM approach followed in this thesis is presented in the rest of this chapter.




Figure 4.1: Diagram to illustrate the typical flow of a DEM simulation
4.2 Defining a particle and particle collision
A DEM algorithm tracks the trajectory and forces of each particle individually at a microscopic level. This section discusses the definition of a particle and of a collision from a computational perspective.
4.2.1 Particle Definition
In general, a particle can be defined as a small localised object with physical properties such as volume and mass, bounded by two surfaces. The boundary S_b defines the physical surface of the particle. In a practical scenario, particles can interact with other particles in the system even before physical contact occurs between them (e.g. through electrostatic forces). Therefore, each particle is considered to have a virtual boundary called the effect surface, S_e, which defines the boundary at which the particle interacts with its neighbours. The entire body volume of the particle should fall within the effect volume enclosed by S_e. Each particle has a co-ordinate centre/origin O_xyz and a centre of mass O_cm; O_xyz and O_cm need not always coincide. Figure 4.2 illustrates the basic computational aspects of a particle.


Figure 4.2: Definition of a computational particle

The body and effect surfaces vary depending on the shape of the particles. The particle shape has a direct impact on the DEM simulation (Cleary and Sawley, 2002; Jensen et al., 2001). The use of complex geometry for particle shapes increases the computation time for collision detection significantly. For this reason, S_b and S_e are defined as simple spheres in this thesis, such that O_xyz = O_cm. If r_p is the base radius of the particle and ρ its density, the particle mass m_p is given by

m_p = (4/3) π ρ r_p³   (4.1)
4.2.2 Cohesive forces
Each particle has two surface boundaries, the body surface S_b and the effect surface S_e, and two volumes, the body volume and the effect volume. There are three possible collision states between particles, as shown in figure 4.3.

Independent: Two particles are said to be independent of each other when their effect volumes due to the effect surface S_e do not overlap.

Interacting: Two particles are said to be interacting with each other when the effect volumes of the two particles overlap. The particles are not in physical contact, but they interact through some kind of long-range forces.

Colliding: In this state, the particles are in physical contact with each other and both their effect volumes and body volumes overlap.



Figure 4.3: Different possible collision states

When two spherical particles p and q are in the collision state, a number of their properties can be combined to describe the pair in collision. The approach below is followed in this thesis to calculate the position, velocity and mass of particles in collision. The relative position x_pq of two particles p and q is the vector from particle q to particle p:

x_pq = x_p − x_q   (4.2)

The relative velocity u_pq is calculated in the same way:

u_pq = u_p − u_q   (4.3)




The reduced mass m_pq is the effective inertial mass of the two particles:

m_pq = m_p m_q / (m_p + m_q)   (4.4)
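A small Python sketch of equations 4.2-4.4 (the positions, velocities and masses are arbitrary sample values):

import numpy as np

x_p, x_q = np.array([0.0, 0.0, 0.005]), np.zeros(3)
u_p, u_q = np.array([0.0, 0.0, -1.0]), np.zeros(3)
m_p, m_q = 1.6e-4, 1.6e-4              # masses in kg

x_pq = x_p - x_q                       # relative position (4.2)
u_pq = u_p - u_q                       # relative velocity (4.3)
m_pq = m_p * m_q / (m_p + m_q)         # reduced mass (4.4)
print(x_pq, u_pq, m_pq)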

4.3 Modelling Cohesive Contacts
This section briefly presents a general framework for contact cohesion in which various cohesion laws can be implemented. The framework is based on the determination of a behaviour law and a failure criterion. The specific physical models implemented in this thesis are discussed in section 4.5.
4.3.1 Contact Point and Collision Normal
There are two commonly used methods to determine the contact point and collision normal between two colliding particles: the Common Normal approach and the Intersection approach (Hogue and Newland, 1994; Dziugys and Peters, 2001). In the common normal approach, a point on each of the interacting particles' boundaries that possesses a common normal vector is identified first. The contact point is the mid-point of the line joining these two points, and the normal vector is the common normal vector between the two points. In the intersection method, the contact point is the mid-point of the line joining the intersection points of the particles' boundaries, and the collision normal is defined perpendicular to this line. The same holds in 3D, with the change that the intersection line becomes a plane, with the contact point at its centre.


Figure 4.4: Two ways to calculate collision normal and contact point

For the spherical particles used in this thesis, both the intersection and the common normal methods give the same normal vectors but different contact points. For contacting particles with similar material parameters, the common normal approach is more realistic (Dziugys and Peters, 2001). For this reason, the common normal approach is used in this thesis.
4.3.2 Normal Deformation and Contact force
The particles deform in size and shape because of the collision. Although it is possible to model the actual particle deformation (Feng, 2000), in practice it is not feasible because of the computational effort required for such modelling. Instead, the deformation process is approximated by an overlap volume or distance between the two colliding particles. The normal overlap distance between the two particles is denoted δ, as shown in figure 4.5.


Figure 4.5: Contact zone between two spherical particles

For two colliding particles p and q with radii r_p and r_q respectively, the normal body surface overlap distance δ_pq is given by

δ_pq = r_p + r_q − |x_pq|   (4.5)

The relative velocity is decomposed into normal and tangential components, as given in equations 4.6 and 4.7 respectively:

u_pq,n = (u_pq · n̂_pq) n̂_pq   (4.6)
u_pq,t = u_pq − u_pq,n   (4.7)

The forces acting between the particles normal to the contact plane are expressed as in equation 4.8:

f_n = f_n^e + f_n^d + f_n^c   (4.8)

f_n^e - repulsive (elastic) force at contact
f_n^d - viscous force
f_n^c - cohesion force

For spherical particles, the contact region takes the shape of a disk of radius a, over which the pressure is not uniformly distributed. The contact force is given by the Hertz contact law, expressed in equation 4.9:

f_n^e = (4/3) E* √R* (δ_n)^(3/2)   (4.9)

with 1/E* = (1 − ν₁²)/E₁ + (1 − ν₂²)/E₂, where E₁ and E₂ are the Young's moduli of the two particles, ν₁ and ν₂ are their Poisson's ratios, and R* is the harmonic mean of the particle radii.
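A Python sketch of equation 4.9 together with the effective modulus E* (the material values are placeholders, not the snow parameters of Chapter 3):

import math

def effective_modulus(e1, nu1, e2, nu2):
    # 1/E* = (1 - nu1^2)/E1 + (1 - nu2^2)/E2
    return 1.0 / ((1.0 - nu1**2) / e1 + (1.0 - nu2**2) / e2)

def hertz_normal_force(e_star, r_star, delta_n):
    # equation 4.9: f_n^e = (4/3) E* sqrt(R*) delta_n^(3/2)
    return (4.0 / 3.0) * e_star * math.sqrt(r_star) * delta_n**1.5

e_star = effective_modulus(1.0e6, 0.3, 1.0e6, 0.3)
print(hertz_normal_force(e_star, 0.0025, 1.0e-6))   # overlap of 1 micron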

In the case of cohesive granular systems, the macroscopic behaviour is dominated much more by the cohesive interactions than by the non-linear behaviour at the contact. In such cases, the linear approximation of the contact force is more relevant:

f_n^e = k_n δ_n   (4.10)
4.3.3 Collision Detection
In DEM, collision detection consists of two phases: neighbour searching and geometric restitution. The neighbour-searching phase identifies the list of particle pairs that might come into physical contact with a given particle, and the geometric restitution phase calculates the exact collision properties. Building a neighbour list helps in optimizing the neighbour-searching process (section 2.7).
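The cell-binning idea behind the neighbour-searching phase can be sketched in a few lines of Python (a toy version; the actual binning in LAMMPS is more sophisticated):

from collections import defaultdict
from itertools import product

def candidate_pairs(positions, cutoff):
    # bin particles into cells no smaller than the interaction range
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        cells[(int(x // cutoff), int(y // cutoff), int(z // cutoff))].append(i)
    # candidate pairs come only from a particle's own and adjacent cells
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for i in members:
                for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                    if i < j:
                        pairs.add((i, j))
    return pairs

print(candidate_pairs([(0.0, 0.0, 0.0), (0.004, 0.0, 0.0), (0.1, 0.1, 0.1)], 0.005))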


Figure 4.6: Stages of collision detection

4.4 Basics of Contact force models
The particle interaction forces play a major role in the DEM simulation of granular materials. This section introduces the physical basis needed to understand the interaction models.


Direct Contact Interaction
When two particles interact with each other, repulsive forces are produced because of elastic surface deformation. This elastic deformation of spherical particles is defined by Hertz theory (Hertz, 1882). Direct contact of two particle surfaces produces frictional forces that resist the sliding (tangential) motion of the particles. The approximation used for friction forces is known as Coulomb friction, given by the product of the normal force and the friction coefficient μ. The value of μ depends on whether the flow is static or sliding.

Contact-Independent Interaction
Some forces act between particles even when they are not physically in contact with each other. Cohesion between particles arises from the Van der Waals forces. For non-deformable spherical particles, the Hamaker constant defines the Van der Waals forces (Hamaker, 1937). For deformable spheres using the Hertz contact model, the Van der Waals forces are defined by what is known as the JKR model (section 4.5.2).

Defining Force Models
Almost all the particles in the system influence the interacting particles, for the reasons discussed in the previous section. In the DEM approach, a numerical model is used to evaluate the magnitude of the forces exerted by the particles upon each other. These models are based on the contact geometry described in section 4.3 and divide the forces into a normal component F_n and a tangential component F_t that act through the point of contact C.

4.5 Physical Models of Cohesive Contact
The general framework discussed in the previous section can be used to model several cohesive interactions; many cohesive contact force models exist. An early model of contact adhesion was developed by Bradley. In this model, the adhesion is calculated using the Van der Waals interactions and the contact deformation of the surface is completely neglected. Since contact deformation is key in cohesive granular particles, this model is not chosen for this thesis. In this thesis two popular force models are implemented:

1. Linear cohesion model (LAMMPS simulation) (Aarons et al., 2009)
2. JKR cohesion model (LIGGGHTS simulation) (Radjai and Dubois, 2008)
4.5.1 Linear cohesion model
As in any standard DEM approach, the particles are allowed to overlap when they collide, at which point they exert a repulsive force on each other. This repulsive force, acting in the normal direction, is given by the linear spring-dashpot normal force model:

F_n = k δ − γ v_n   (4.11)

Here:
k - normal spring stiffness
δ - overlap
γ - damping coefficient
v_n - relative normal velocity of the colliding particles

The inelastic behaviour of the model is characterised by the damping term. The elasticity is defined by the coefficient of restitution, which is given by

ε = exp(−πγ / √(2mk − γ²))   (4.12)

Here m = πρd³/6 is the mass of a particle. A linear spring-slider model gives the force exerted by the colliding particles in the tangential direction:

F_t = min(k_t Δs, μF_n)   (4.13)

Here k_t is the tangential spring constant and Δs is the distance between the two particles in the tangential direction. The cohesive particles interact with each other via the van der Waals force, whose magnitude is given by

F_vdW = A d⁶ / [6 s² (s + 2d)² (s + d)³]   (4.14)

Here A is the Hamaker constant and s is the distance between the surfaces of the two colliding particles.

When the particles collide, this model tends to diverge. To avoid this, a minimum cut-off separation s_min is used, such that for s < s_min the van der Waals force remains equal to the force experienced at s_min. The strength of the cohesion is expressed as the ratio of the maximum van der Waals force experienced by a particle to its weight.
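A sketch of equation 4.14 with that cut-off applied (the parameter values are illustrative only):

def vdw_force(a_h, d, s, s_min):
    s = max(s, s_min)   # clamp: below s_min the force stays at its s_min value
    return a_h * d**6 / (6.0 * s**2 * (s + 2.0 * d)**2 * (s + d)**3)

print(vdw_force(9.6e-8, 0.005, 1e-7, 4.0e-5))   # clamped at s_min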
4.5.2 JKR cohesion model
The JKR model, developed by Johnson, Kendall and Roberts, is based on the Hertz elastic model. Figure 4.7 is a pictorial representation of a particle of diameter d attached to a flat surface (in green). Let P be the external force applied to the particle, a the contact radius, and F_ad the adhesion force between the particle and the surface.


Figure 4.7: Hertz contact force model

In the Hertzian model, the normal pushback force between two particles is proportional to the area of overlap between the two particles. By considering the contact forces between two smooth particle surfaces, and under the assumptions of the Hertz elastic model, the JKR model leads to the expression in equation 4.15:

a³ = (R*/E*) [ f_n + 3πγR* + √(6πγR* f_n + (3πγR*)²) ]   (4.15)

E* - reduced elastic modulus
R* - reduced radius of the particles in contact
γ - surface energy in J/m²
f_n - normal force

The JKR cohesion model is more accurate for systems with large cohesion density and larger particles.
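A Python sketch of equation 4.15 (the inputs are placeholders, not calibrated snow values):

import math

def jkr_contact_radius(f_n, r_star, e_star, gamma):
    adh = 3.0 * math.pi * gamma * r_star   # adhesion term 3*pi*gamma*R*
    a_cubed = (r_star / e_star) * (f_n + adh
              + math.sqrt(6.0 * math.pi * gamma * r_star * f_n + adh**2))
    return a_cubed ** (1.0 / 3.0)

print(jkr_contact_radius(1.0e-4, 0.0025, 1.0e6, 0.1))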

4.6 Summary
This chapter introduced DEM as a suitable method to capture snow particle behaviour. Its context has been explained in terms of its constituent parts: collision detection, contact force evaluation and inter-particle force models, each of which was explained and critically reviewed. Details of the physical cohesion models implemented in this thesis were presented.
Chapter 5
Implementation Details

For the purposes of this thesis two cohesive models were considered: one supplied after discussions with Dr. Jin Sun, a granular materials expert at the Institute for Infrastructure and Environment at the University of Edinburgh, and the other the built-in model that comes with LIGGGHTS. The implementation details of these two models using LAMMPS/LIGGGHTS are presented in this chapter, along with some discussion of the simulation results.











5.1 Porting LAMMPS Cohesion Add-on to HECToR
The LAMMPS granular package does not have functionality to model cohesion forces between granular materials. Dr. Jin Sun developed a cohesion potential based on the linear cohesion model discussed in section 4.5.1 for a different project of his; thanks to Dr. Jin Sun for providing the cohesion add-on code for use in this project. The first step in the implementation phase was to port the cohesion add-on code to HECToR. The add-on consists of two files: fix_cohesive.cpp and fix_cohesive.h. There were two main challenges in porting the code to HECToR. First, the code had been tested on a serial Linux machine but had not been ported to any massively parallel machine like HECToR. Second, Dr. Sun's code was developed for a very old version of LAMMPS (June 2007) and might not be compatible with the version currently installed on HECToR (January 2011).
The code was ported to HECToR, LAMMPS was compiled, and the binary executable was built successfully. Nevertheless, it did not execute because of compatibility issues: many of the data structures used in the June 2007 version were deprecated, and new data structures were added in the new version. It was first considered simply to use the version of LAMMPS (June 2007) that Dr. Jin Sun used for his code development. However, the new version (January 2011) of LAMMPS has many new features that are of great help to the project, so it was decided instead to fix the issue and make the add-on compatible with the new version. The project plan was updated accordingly (see Appendix A). The fix was not as easy as first estimated; after thorough analysis of the code, it was found that the problem had two causes, discussed in sections 5.1.1 and 5.1.2.
Keeping in mind the budget constraint on HECToR, it was decided to use Ness for code development and testing of the code changes before porting them to HECToR. There is no central installation of LAMMPS on Ness, so a local installation of LAMMPS was set up on Ness, along with the pizza.py toolkit and Paraview, which are required for visualisation.
To port LAMMPS to Ness, a makefile called Makefile.Ness was created using the Makefile for HECToR as a starting template. Due to the configuration of Ness, it is necessary to include the absolute paths to the directories containing the compiler and required libraries; refer to Appendix B for the makefile used to build the parallel version of LAMMPS on Ness. Ness has both PGI and GNU compilers. Because of the issue between the PGI compilers and the new version of LAMMPS, the GNU compiler was used on Ness as well. The Ness helpdesk team was contacted to obtain the location of the necessary libraries (the FFTW libraries).


5.1.1 Modifying the fix_cohesive.h header file
The first problem was that the new cohesion add-on was not recognised by the new version of LAMMPS (January 2011): the way header files are implemented in the new version has changed. The fix_cohesive.h file was re-written to match the style followed in the new LAMMPS version. fix_cohesive.h should be structured as follows:
#ifdef FIX_CLASS
FixStyle(cohesive,FixCohe)
#else

(class definition of FixCohe)

#endif
Here cohesive is the new fix style that is added to LAMMPS, and FixCohe is the class name defined in the fix_cohesive.h and fix_cohesive.cpp files. When LAMMPS is re-built, the new fix style becomes part of the executable and can be invoked with the fix command, as given below.
fix cohe all cohesive 9.6e-8 0.01 4.0E-5 0.25 1
In this command, cohe is the ID given to the new fix, all is the particle group, and cohesive is the new style name. Because two formulas for the van der Waals force are implemented, the last parameter is an option to choose which one to use: 0 selects a slightly more complicated version, while option 1, a common formula, is normally used. 9.6e-8 is the Hamaker constant, 0.01 is the London retardation wavelength (not used for option 1), and 4.0e-5 and 0.25 are the minimum and maximum separations, respectively.
5.1.2 Modifying the fix_cohesive.cpp source file
The second issue arises because the way the neighbour list is built in the new version (January 2011) of LAMMPS has changed significantly.
fix_cohesive.cpp:120:21: error: 'class LAMMPS_NS::Neighbor'
has no member named 'firstneigh'

fix_cohesive.cpp:121:23: error: 'class LAMMPS_NS::Neighbor'
has no member named 'numneigh'

When investigating the reason for these error messages, it was found that the variables firstneigh and numneigh, which were declared and initialised in the neighbor.h header file in the old version of LAMMPS (June 2007), are deprecated in the new version (January 2011). Fixing this second issue was a bigger challenge. LAMMPS has more than 350,000 lines of code organised as parent and child classes and virtual functions, and it was a challenge to find out how the variables are declared and used.


In the June 2007 version of LAMMPS, both the neighbour and neighbour-list objects are part of the Neighbor class. In the new version of LAMMPS, a new class called NeighList was created to do all the neighbour-list related processing. In Jin's code the firstneigh and numneigh variables belong to the Neighbor class; the code was modified so that they are taken from the NeighList class, as given below.

neighs = neigh_list->firstneigh[i];
numneigh = neigh_list->numneigh[i];

This didn't fix the issue: the code failed with a segmentation fault and core files were created. Two things were tried. Intermediate print statements were placed in the code to print the objects, pointers and loop variables to the screen, and the compiler's debugging features were turned on so that a debugger could be used to find out where the code crashed. These two steps helped identify the issue: the segmentation fault occurred because the list object was empty. The two functions below were overridden in the fix_cohesive.cpp class to populate the list object.

void FixCohe::init()
{
  // request a neighbour list for this fix (not for a pair style)
  int irequest = neighbor->request((void *) this);
  neighbor->requests[irequest]->pair = 0;
  neighbor->requests[irequest]->fix = 1;

  if (strcmp(update->integrate_style,"respa") == 0)
    nlevels_respa = ((Respa *) update->integrate)->nlevels;
}

// called by Neighbor once the requested list has been built
void FixCohe::init_list(int id, NeighList *ptr)
{
  list = ptr;
}
The way the force fields are calculated also required some changes so that the cohesive code blends well with the new LAMMPS structure. The modified code is given below.
for (ii = 0; ii < nlocal; ii++) {
  i = ilist[ii];
  if (!(mask[i] & groupbit)) continue;
  xtmp = x[i][0];
  ytmp = x[i][1];
  ztmp = x[i][2];
  radi = radius[i];
  jlist = firstneigh[i];
  jnum = numneigh[i];

  for (jj = 0; jj < jnum; jj++) {
    j = jlist[jj];
    delx = xtmp - x[j][0];
    dely = ytmp - x[j][1];
    delz = ztmp - x[j][2];
    rsq = delx*delx + dely*dely + delz*delz;
    radj = radius[j];
    radsum = radi + radj;

    // interact only within the maximum separation cut-off smax
    if (rsq < (radsum + smax)*(radsum + smax)) {
      r = sqrt(rsq);
      del = r - radsum;   // surface-to-surface separation
      if (del > lam*PInv)
        ccel = - ah*radsum*lam*
          (6.4988e-3 - 4.5316e-4*lam/del + 1.1326e-5*lam*lam/del/del)/del/del/del;
      else if (del > smin)
        ccel = - ah * (lam + 22.242*del)*radsum*lam/24.0/(lam + 11.121*del)
          /(lam + 11.121*del)/del/del;
      else   // clamp the force below the minimum separation smin
        ccel = - ah * (lam + 22.242*smin)*radsum*lam/24.0/(lam + 11.121*smin)
          /(lam + 11.121*smin)/smin/smin;
      rinv = 1/r;

      // resolve the scalar cohesive force along the line of centres
      ccelx = delx*ccel*rinv;
      ccely = dely*ccel*rinv;
      ccelz = delz*ccel*rinv;
      f[i][0] += ccelx;
      f[i][1] += ccely;
      f[i][2] += ccelz;

      // apply the equal and opposite force to owned neighbours
      if (newton_pair || j < nlocal) {
        f[j][0] -= ccelx;
        f[j][1] -= ccely;
        f[j][2] -= ccelz;
      }
    }
  }
}
These changes fixed the compatibility issue and the code ran successfully on Ness. The code was tested by executing some of the example simulations that come with LAMMPS and verifying their output for correctness. The code was then ported successfully to HECToR; no issues were encountered in the port, and it was tested on HECToR as well for correctness before proceeding with development of the snow model.

5.2 Building the granular module within LAMMPS
LAMMPS comes with a granular module exclusively for granular DEM simulations. In the LAMMPS distribution, the granular module is distributed as an add-on package, which means it is not included in the default compilation of LAMMPS and has to be built separately. In order to do that, go to the LAMMPS source directory (/src) and type
make yes-granular
followed by
make hector
to compile LAMMPS with the granular package on HECToR.
5.3 LAMMPS Granular Essentials
Granular systems are composed of spherical particles of a certain diameter, which means they have an angular velocity and it is possible to make them rotate by imparting torque. As a general guideline, running a granular simulation in LAMMPS requires the following commands:
atom_style sphere
fix nve/sphere
fix gravity
This compute calculates the rotational kinetic energy, which can be output to the dump file:
compute erotate/sphere
Use one of the three pair potentials that calculate the forces and torques between the particles:
pair_style gran/history
pair_style gran/no_history
pair_style gran/hertzian
Any of the fix options below are specific to granular systems:
fix freeze
fix pour
fix viscous
fix wall/gran
5.4 Determination of simulation time-step
Determining the simulation time step is one of the key steps in DE modelling. The computational cost of the simulation is determined by the time step chosen. If the time step is very small, the trajectory will cover only a limited proportion of the simulation space, while a larger time-step value will result in instabilities due to high-energy overlaps. Such instabilities might lead to LAMMPS failure. The disturbances due to the motion of particles propagate in the form of Rayleigh waves along the surface of a solid. The simulation time step should be so short that the disturbance of particle motion propagates only to the nearest neighbours, and it should be smaller than the critical time increment calculated from theory. The velocity and acceleration are kept constant in the calculation. The time step is estimated based on the natural frequency of a linear spring system (Raji and Favier, 2004):

Δt_c = f √(m_i / k)   (5.1)

where k is the effective stiffness and f is a factor.
Choosing the correct value for f is not an easy task, because f depends strongly on the packing configuration, the number of contacts and the properties of the particles. Different values of the simulation time step were chosen to examine their influence on the accuracy of the results obtained; a time-step value of 0.00001 was found to be optimal for this simulation.
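Equation 5.1 is easy to evaluate; the sketch below uses placeholder values for the stiffness k and the factor f, with the particle mass taken from the 5 mm, density-2500 particles used in the simulation:

import math

def critical_timestep(m_i, k, f):
    return f * math.sqrt(m_i / k)    # equation 5.1

m_i = 2500.0 * math.pi * 0.005**3 / 6.0   # mass of a 5 mm diameter particle (kg)
print(critical_timestep(m_i, 2.0e6, 0.2))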
5.5 LAMMPS Simulation
5.5.1 Implementation Details
The following scheme is used in this thesis to generate the input file for LAMMPS.
The input file is given in Algorithm 1.

1. The first step is to define the simulation domain. This includes the definition of the particle style, the boundary conditions, how the ghost particles are handled, and the units used in the simulation. The simulation requires that the granular atom style be used; this is defined in the input script on line 2. An associated command tells LAMMPS to create a data structure used to index particles. This is specified using the atom_modify command (line 3). The atom_modify command modifies the properties of the particles and determines the way to find the local particle index associated with a global particle ID (particle look-up). The array storage technique is used in the simulation because it is the fastest method for particle look-up. The keyword array (line 3) means each processor maintains a look-up table of size N (the number of particles in the simulation); it requires memory proportional to N/P, where P is the total number of processors used. As discussed in section 2.6, periodic boundary conditions are used in this thesis for bulk simulations (line 4). Inter-processor communication is turned on to communicate velocity information to the ghost particles; the ghost particles store this velocity information since it is needed for calculating the pairwise potential (line 6). For the purposes of this simulation, all quantities specified in the input script and data file, as well as the quantities output to the log file and the dump files, use SI units (line 7).

2. Next, the inter-atomic potential equations and their parameters are specified. Inter-atomic potential equations describe the forces acting between particles. This simulation uses the Hertz potential (line 12).

3. The third step is particle creation. This can be done in several ways, such as reading from a data file, using the restart file from a previous simulation, or directly using the pour command. In this thesis, the particles are defined using the pour command (line 19), the most important command in the simulation. It inserts granular particles into the system every few time steps until all the particles have been inserted. The region for insertion (the insertion volume) is specified using the region command on line 18. Inserted particles are of granular type with diameter 0.005 m and density 2500. At each insertion time step, a fraction of the total particles is inserted inside the simulation region, mimicking a stream of poured particles. The inserted particles flow out of the insertion region under the influence of gravity; until they do, the next fraction of particles is not inserted. More particles are inserted at any one time step if the insertion volume fraction specified using the vol keyword is high. However, the insertion volume fraction cannot be more than 0.6 (on a scale of 0 to 1); above 0.6 the particles tend to overlap, which is not correct. For this simulation, the insertion volume is set to 0.5.

4. Then, energy minimization algorithms are used to stabilise the atomic structure. Once the system is stable and the initial boundary coordinates are assigned, an initial round of equilibration is required before starting the simulation. A random number generator is used to set the initial velocities of particles in the simulation domain, and the time step and duration of the MD simulation are specified. The NVE (constant number of particles, volume and energy) integration algorithm is used to update the position, velocity and angular velocity of the particles.

5. The simulation is started using the run command (line 25). Initially the simulation is run for only one time step, and the thermodynamic details of the particles are written to the dump file (the LAMMPS output file) using the dump command (line 26); this is required for the proper functioning of the visualisation tool. The integration algorithm (velocity-Verlet in this thesis) is then used to solve Newton's equations and calculate the new positions and velocities of the particles. The simulation is run up to time step 5000 (line 27).

6. Once the particle insertion is complete and the particles have settled in the simulation box, the cohesion potential is applied to all the particles in the system using the fix cohe command (line 31); the first parameter of the fix cohe command is the Hamaker constant. Then chute flow is induced under the influence of gravity, with the flow occurring along the x-axis at a specified inclination of 35° (line 32). The simulation is then run up to time step 35000 to visualise the chute effect.

7. The output file is examined to extract the final positions and velocities of the particles and their thermodynamic statistics, such as energy, pressure and temperature, for visualisation in Paraview.


Algorithm 1: Input Script for LAMMPS Simulation

1: # Pour granular particles into container, then induce chute flow
2: atom_style granular
3: atom_modify map array
4: boundary p p m
5: newton off
6: communicate single vel yes
7: units si
8: region reg block 0 0.15 0 0.15 0 0.15 units box
9: create_box 1 reg
10: neighbor 0.005 bin
11: neigh_modify delay 0
12: pair_style gran/hertz/history 2000000.0 NULL 50.0 NULL 0.5 1
13: pair_coeff * *
14: timestep 0.00005
15: fix 1 all nve/sphere
16: fix 2 all gravity 9.81 vector 0.0 0.0 -1.0
17: fix zwalls all wall/gran 2000000.0 NULL 50.0 NULL 0.5 0 zplane 0.0 &
    0.15
18: region slab block 0.02 0.14 0.02 0.14 0.13 0.14 units box
19: fix ins all pour 10000 1 1 vol 0.5 100 diam 0.005 0.005 dens 2500 &
    2500 vel 0. 0. 0. 0. -1. region slab
20: compute 1 all erotate/sphere
21: thermo_style custom step atoms ke c_1 vol
22: thermo 1000
23: thermo_modify lost ignore norm no
24: compute_modify thermo_temp dynamic yes
25: run 1
26: dump mydmp all custom 100 dump.pourCohe id type x y z ix iy iz vx vy vz &
    fx fy fz omegax omegay omegaz radius
27: run 5000 upto
28: unfix ins
29: dump_modify mydmp every 10000
30: run 15000 upto
31: fix cohe all cohesive 9.6e-8 0.01 4.0E-5 0.25 1
32: fix 3 all gravity 9.81 chute 35.0
33: dump_modify mydmp every 100
34: run 35000 upto


5.5.2 Visualisation
The post-processing stage consists of two separate steps. The first is to extract snapshots of the simulation from the dump files created by LAMMPS. The dump file contains the energy of each particle for every specified frame, and it is tough to read the information from the dump file manually. The pizza.py toolkit provides the dump.py tool for this purpose. The dump tool reads the LAMMPS dump file and stores its contents as snapshots with 2D arrays of atom attributes, which can be accessed and manipulated as required. It is then possible to read the snapshots and convert them to VTK format; the vtk.py tool is used for this purpose. It reads the LAMMPS snapshots and converts them to the VTK format used by various visualisation packages. The VTK files are visualised using Paraview. Below is the usage of the tools:

# Load pizza.py
python -i /home/gran_pizza_17Aug10/src/pizza.py
# Create snapshots; d is the dump object containing particle coordinates
d = dump(dump_filename)
# Convert the snapshots to VTK format
v = vtk(d)
# Write the snapshots to imag0.vtk, imag1.vtk, etc.
v.manyGran()
5.5.3 LAMMPS Simulation Results and Discussions
Representative snapshots of the simulation, taken at various time intervals, are shown in figures 5.1 to 5.3. In this simulation, a stream of 10,000 granular snow particles is poured from the top of a 3D box. The particles are coloured by the magnitude of their normal velocity, from slowest to fastest: particles in red have the highest normal velocity, while those in blue have the lowest.
In the simulation there are three qualitatively distinct regimes, determined by the height of the pour/simulation box, the angle of inclination of the lower surface with respect to the direction of gravity, and the inter-particle cohesive stresses set by the value of the Hamaker constant. When the angle of inclination is small (i.e., below the angle of repose) or the cohesive force is weak (i.e., a small Hamaker constant), the granular system is stationary: after the pour completes, the particles settle on the lower surface and do not flow. Below the angle of repose for a given cohesive energy, the particles dissipate more energy than they gain in gravitational potential energy and hence do not flow; in avalanche terms this is referred to as the no-flow regime. At much larger angles, on the other hand, the energy dissipation of the particles is much lower than the gravitational potential energy, and the particles continue to accelerate along the axis of inclination, resulting in an unstable regime. For angles intermediate between these two values the particles flow steadily, and this is referred to as steady-state behaviour. The value of the Hamaker constant for steady flow is 9.6×10⁻⁶ and the angle of inclination is 35°.

Figure 5.1: LAMMPS Simulation Screenshot 1

Figure 5.2: LAMMPS Simulation Screenshot 2

Figure 5.3: LAMMPS Simulation Screenshot 3

This cohesion model is based on Hamaker theory, which produces a force singularity that must be avoided when particle surfaces overlap (the van der Waals attraction between two spheres grows without bound as the separation between their surfaces tends to zero). The particles do stick to one another because they are cohesive, but they do not bond, sinter or join crystallographically at a grain boundary in the way snow particles do. For this reason, and after viewing the output and having discussions with Dr. Jane Blackford, it was decided to consider the cohesive model provided with LIGGGHTS.

5.6 LIGGGHTS Simulation
The LIGGGHTS simulation is based on the JKR cohesion model (section 4.5.2), which predicts greater cohesion between particles. The material parameters discussed in section 3.4 are used in the LIGGGHTS simulation to determine the pairwise potential of the particles; the actual values used are given in section 5.6.1. Since it is possible to import complex CAD geometry as granular walls in LIGGGHTS, a simple chute geometry is used in the simulation. Particles are poured from the top of the chute and slide down the inclined surface. Details are given in sections 5.6.2 and 5.6.3.
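For orientation, the characteristic quantity of the JKR model is its pull-off force, F_c = (3/2)πwR*, where w is the work of adhesion and R* = R/2 for two identical spheres of radius R; the full model is described in section 4.5.2. The small Python sketch below evaluates this with the particle radius from Table 5.1 and a purely hypothetical value of w.

import math

# JKR pull-off (maximum adhesive) force for two identical spheres:
# F_c = (3/2) * pi * w * R_eff, with R_eff = R/2 for equal spheres.
def jkr_pulloff_force(work_of_adhesion, radius):
    r_eff = radius / 2.0
    return 1.5 * math.pi * work_of_adhesion * r_eff

print(jkr_pulloff_force(0.1, 0.0025))  # w = 0.1 J/m^2 (assumed), R = 2.5 mm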
5.6.1 Material parameter values
Table 5.1 summarises the values of the material properties used in the simulation. All the values used in the simulation are for a constant temperature (5 °C).

Parameter                    Unit     Value      Reference
Shape                        No unit  Spherical
Diameter                     mm       5
Density                      kg/m³    500        Armstrong et al., 2009
Coefficient of Restitution   No unit  0.89       Higa et al., 1995
Coefficient of Friction      No unit  0.1        Schaerer, n.d.
Young's Modulus              Pascal   5e6        Godbout et al., 2000
Poisson's Ratio              No unit  0.32       Sinha, 1987
Gravity Acceleration         m/s²     9.81
Table 5.1: Material Parameters
5.6.2 LIGGGHTS Implementation Details
The input file used for LIGGGHTS is given in Algorithm 2. Since LIGGGHTS is based on LAMMPS, it follows the same implementation style as LAMMPS. The key implementation features of LIGGGHTS are given below.

1. Just as in the LAMMPS simulation, the first step is to define the simulation domain with granular particles (lines 1 to 3). In this simulation, all three dimensions are defined as non-periodic, so that particles do not interact across the boundary and the position of each face is fixed. Inter-processor communication is turned on to exchange velocity information for ghost particles; the ghost particles store this velocity information since it is needed for calculating the pairwise potential. This is implemented in line 5 of the algorithm.

2. The material properties required to calculate the stiffness and damping coefficients and other pair potentials are set using the fix style property/global in lines 11 to 16. This fix style is not available in LAMMPS. The command fixes global properties that can be accessed by other fix or pair styles. The variables Young's modulus, Poisson's ratio, etc. used in the fix are standard C++ variables.

3. The pair potential is defined using the pair_style command (line 17). This simulation uses Hertzian potential forces for the interaction between particles. The mesh/gran fix style (line 22) allows complex wall geometry for granular simulations to be imported from CAD by means of ASCII STL files or legacy ASCII VTK files. It is possible to apply an offset or scaling to the geometry; for this simulation, the geometry is scaled by a factor of 1.0.

4. The imported geometry is used to bound the simulation domain of the granular system with a frictional wall. All particles in the group interact with the wall when they are close enough to touch it. The equation for the force between the wall and the particles touching it is the same as the Hertzian potential defined using the pair_style command; the elastic part of this force law is sketched below.


5. The particles are poured into the simulation domain using the pour command. This command is the same as in the LAMMPS simulation, with some improvements: the particles are now generated in such a way that they are completely located within the insertion region, so it is possible to use the whole simulation box as the insertion region.

Algorithm 2: Input Script for LIGGGHTS Simulation

1:  atom_style granular
2:  atom_modify map array
3:  boundary f f f
4:  newton off
5:  communicate single vel yes
6:  units si
7:  region reg block 0.00 1.17 -0.3 0.99 -0.1 1.12 units box
8:  create_box 1 reg
9:  neighbor 0.0028 bin
10: neigh_modify delay 0
11: fix m1 all property/global youngsModulus peratomtype 5.e6
12: fix m2 all property/global poissonsRatio peratomtype 0.325
13: fix m3 all property/global coefficientRestitution peratomtypepair 1 0.89
14: fix m4 all property/global coefficientFriction peratomtypepair 1 0.1
15: fix m5 all property/global characteristicVelocity scalar 100.
16: fix m6 all property/global cohesionEnergyDensity peratomtypepair 1 120000
17: pair_style gran/hertz/history 1 1    # Hertzian with cohesion
18: pair_coeff * *
20: timestep 0.00005
21: fix gravi all gravity 9.81 vector 0.0 0.0 -1.0
22: fix cad all mesh/gran mytinymesh.stl 1 1.0 0. 0. 0. 0. 0. 0.
23: fix granwalls all wall/gran/hertz/history 1 1 mesh/gran 1 cad
24: group nve_group region reg
25: region bc cylinder z 0.27 0.21 0.06 0.36 0.58 units box
26: fix ins nve_group pour 500000 1 29494 region bc diam uniform 0.005 0.005 dens uniform 500 500 region bc
27: fix integr nve_group nve/sphere
28: compute 1 all erotate/sphere
29: thermo_style custom step atoms ke c_1 vol
30: thermo 4000
31: thermo_modify lost ignore norm no
32: compute_modify thermo_temp dynamic yes
33: dump dumpstl all stl 100 dump.stl
34: run 1
35: dump dmp all custom 5000 dump.chute id type x y z ix iy iz vx vy vz fx fy fz omegax omegay omegaz radius
36: undump dumpstl
37: run 100000 upto
38: unfix ins
5.6.3 LIGGGHTS Simulation Results
Figures 5.4 to 5.6 show snapshots of the first LIGGGHTS simulation. This simulation uses the JKR cohesion model with Hertzian potentials. As seen in the figures, the chute geometry is used as the granular wall. The snow particles are poured from a certain height and flow down the chute until they reach the end of the granular wall. Since the simulation uses non-periodic boundary conditions, once the particles reach the end of the chute they disappear and do not re-enter the domain.
In a realistic scenario, granular snow avalanches start to flow from a point and gather mass progressively in a fan-like shape as they flow down the slope. An attempt was made to reproduce this effect in the simulation, but it did not succeed: LIGGGHTS does not provide a facility to model such a feature. Coding a new fix style for LIGGGHTS to simulate this kind of behaviour was considered, but this was not carried further due to time constraints. It is, however, possible to use the fix adapt command to change the physical properties of the particles: the diameter of the particles can be increased at a certain rate every few time-steps to create an effect of particle (mass) accumulation, though this cannot achieve the fan-like shape of the avalanche flow. A sketch of this idea is given below.
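As a rough illustration, such a growth rule might look like the lines below. This is a minimal sketch only: the growth rate is hypothetical, and the availability of the atom diameter option of fix adapt in the LIGGGHTS version used here is an assumption.

# Grow particle diameters from 5 mm at 10% per 100000 steps to mimic mass
# accumulation; the equal-style variable is re-evaluated every 1000 steps.
variable dgrow equal 0.005*(1.0+0.1*elapsed/100000)
fix      grow all adapt 1000 atom diameter v_dgrow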

Figure 5.4: LIGGGHTS Simulation Screenshot 1

Figure 5.5: LIGGGHTS Simulation Screenshot 2


Figure 5.6: LIGGGHTS Simulation Screenshot 3

5.6.4 Improved Chute Geometry
Most avalanches do not occur on a regular inclined surface: there is an initial inclined path where the avalanche is triggered, followed by a run-out path. It was therefore decided to modify the chute geometry accordingly. The cross-section of the new chute geometry used in the simulation is shown in figure 5.7. The chute is divided into three parts: the upper inclined zone, the circular transition zone and the horizontal run-out zone. The inclination angle θ of the upper inclined zone is fixed at 37°. According to Perla (1977), most avalanches run on slopes between 25° and 45°, with the optimal slope angle for an avalanche being 37°; hence the chute inclination angle was fixed at 37°.
The dimensions of the chute are summarised in Table 5.2 (Chiou, 2005).
Chute Detail                  Value
Upper inclined part, l1       936 mm
Transition part, l2           144 mm
Horizontal run-out part, l3   835 mm
Chute width                   100 mm
Inclination angle, θ          37°
Table 5.2: Chute Specifications
AutoCAD was used to draw the chute, first as a solid geometry. It was not possible to use the CAD geometry directly in LIGGGHTS as a granular wall, so the solid object was converted into a mesh using the gmsh conversion tool: the AutoCAD geometry was exported as an ASCII stereolithography (STL) file and imported into gmsh to convert it to a mesh. The mesh is then used in the simulation as a granular wall using the fix command. Refer to section 5.8 for more details on the implementation, and to Appendix C for more details on the AutoCAD work.

Figure 5.7: Cross section of the chute
5.6.5 Improved Simulation Results
Figures 5.8 to 5.10 show snapshots of the improved LIGGGHTS simulation. The new chute geometry simulates the flow of an avalanche in a more realistic scenario from the terrain perspective. An avalanche terrain is divided into three parts: the acceleration zone, the steady-flow zone and the deceleration zone. The upper inclined part (l1) of the chute corresponds to the acceleration zone, the transition part (l2) is the steady-flow zone (which is kept very short for computational efficiency) and the horizontal run-out part (l3) corresponds to the deceleration zone of an avalanche path. From a computational perspective, this improved LIGGGHTS simulation requires more time-steps to complete (i.e. takes more time) because of the new chute geometry.

Figure 5.8: Improved LIGGGHTS Simulation Screenshot 1

Figure 5.9: Improved LIGGGHTS Simulation Screenshot 2


Figure 5.10: Improved LIGGGHTS Simulation Screenshot 3

5.7 Summary
The implementation details of the cohesion models have been presented in this chapter, along with the simulation results. From the results it is clear that the Hamaker-theory-based cohesion model is not suitable for modelling snow particles, as it fails to capture their sintering behaviour. The JKR-based cohesion model is more suitable for the simulation.
Chapter 6
Benchmarking

This chapter presents benchmarks of the implementation, beginning with the most interesting property: speed-up. The performance is then investigated for larger processor counts, where communication overheads start to increase, and the speed-up and scaling of the different parts of the program are shown. All benchmarks were carried out with the code based on the neighbour-list approach.












6.1 Cost of Accessing HECToR
It was decided not to use Ness to study the performance of the code, for two reasons. First, Ness does not guarantee exclusive access to its nodes, so timings may vary due to factors such as system noise and the use of resources by other jobs running on the same node at the same time. Second, it is not well suited to scalability tests, as it has only 32 nodes. HECToR is a massive machine and is very fast in computation; hence it was decided to use HECToR for the performance and scalability tests. Due to the limited time and budget available on HECToR (the MSc students' resource group), it was decided to limit the tests to 100 nodes (2,400 cores) and to 1 million particles.
On HECToR, 1 core hour is set to 7.5 AUs (hector.ac.uk, 2011). The total number of
AUs used is calculated as:
Total AUs = Number of Processors * Run time in hours * 7.5 (6.1)
For example, a simulation with 1 million particles running on 2400 processors for 1 hour will consume 2400 × 1 × 7.5 = 18,000 AUs. This number increases with the number of processors and the number of particles.
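Equation 6.1 is trivial to script; the following Python sketch reproduces the worked example above (the 7.5 AUs per core-hour rate is the published HECToR figure).

# Sketch of equation 6.1: HECToR AU cost at 7.5 AUs per core-hour.
def total_aus(processors, hours, rate=7.5):
    return processors * hours * rate

print(total_aus(2400, 1.0))  # 18000.0 AUs, matching the example above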
Since the student resource group on HECToR has a limited allocation that is shared by all students of the MSc, a new resource group was created on HECToR exclusively for this project, with 30,000 AUs moved from the student project reserve. All tests carried out for this project used this new resource group.
6.2 Performance Benchmarks
In this section the performance of the model is measured. Figure 6.1 shows the performance of the LIGGGHTS simulation measured on HECToR, for system sizes ranging from 75,000 to 1,000,000 particles. The performance is measured in steps per second so that different simulation sizes can be compared in the same plot. The timings are taken from the loop time reported by the code, which is the total time spent in the main MD loop. In order to reduce the start-up cost and the cost of any instability, two runs of each system were performed and the average values recorded. As expected, the number of steps computed per second decreases as the number of particles increases, since each time-step then involves more computation per processor; the steps-per-second metric is sketched below.
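The metric itself is simply the number of time-steps divided by the reported loop time, as in the sketch below (the numbers are illustrative, not measured values).

# Steps-per-second metric used in figure 6.1, from the reported loop time.
def steps_per_second(n_steps, loop_time_s):
    return n_steps / loop_time_s

print(steps_per_second(5000, 2.5))  # 2000.0 steps/s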

Figure 6.1: Performance of the model. System size ranges from 75,000 to 1,000,000 particles

Figure 6.2: Benchmark of different system sizes on 480 and 960 processors


From figure 6.2 it is clear that the run time decreases as the number of processors increases; for large numbers of particles the simulation therefore performs better on a large number of processors.
Figures 6.3 and 6.4 show the execution time and the speed-up of the model respectively. Since each HECToR node has 24 cores, the speed-ups were calculated relative to the 24-processor loop times. From figure 6.3 it is clear that the execution time continues to fall up to 2400 processors on HECToR, and from figure 6.4 it is evident that the system scales almost linearly with the number of processors and that the speed-up increases with the number of particles. The speed-up calculation is sketched below.
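The speed-up values are ratios of loop times against the 24-core baseline; the loop times below are hypothetical stand-ins, chosen only to reproduce the 1.99 and 2.52 figures quoted in chapter 8.

# Speed-up relative to the 24-core baseline used for figure 6.4.
def speedup(t_baseline, t_p):
    return t_baseline / t_p

t24, t480, t960 = 2000.0, 1005.0, 794.0   # hypothetical loop times (seconds)
print(speedup(t24, t480))  # ~1.99
print(speedup(t24, t960))  # ~2.52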

Figure 6.3: Execution time of 75,000 and 1,000,000 particles on different numbers of processors

Figure 6.4: Speed-up of 75,000 and 1,000,000 particles on different numbers of processors
6.3 Performance per time-step
The amount of time taken per time-step over the entire simulation is studied here. Previous studies have shown that the timings vary over the first hundred time-steps, so the time taken per time-step was analysed for all the benchmarking runs discussed in the previous section. Figure 6.5 shows the results for the 40-node, 75,000-particle benchmark run over 500 time-steps on HECToR.

Figure 6.5: Comparison of time taken per time-step
The results show that the performance is almost constant over the first 100 time-steps, with only a small deviation at 250 and 500 time-steps. The reason for this slight deviation is that these are the points at which a small amount of data is written to the output file.
6.4 Performance Comparison - Cohesion and Non-Cohesion
It is interesting to compare the run time of the simulation with and without cohesion, to understand the impact of cohesion on the run time. The model with cohesion between particles takes 15% longer to execute than the model without cohesion for the same system size. Due to the cohesive forces, particles interact with a larger number of neighbouring particles, and hence the simulation takes more time to compute the inter-particle forces than the system without cohesion.

Figure 6.6: Comparison of the model run time with and without cohesion, for a system of 10,000 particles run on 24 processors on HECToR

6.5 Summary
A number of timing results have been presented and analysed. Considering all the results, the simulation gives good performance and good scaling, and the scalability of the code can be improved further by increasing the problem size. The results were obtained from two runs of each configuration and are reproducible on HECToR.

Chapter 7
Profiling and Performance Analysis
Profiling helps to identify which parts of a program take most of the execution time, which in turn helps to explain the performance bottlenecks of the code on a particular system. The traditional way of conducting performance analysis is program counter sampling: interrupting the program at regular intervals and recording the currently executing instruction, from which the relative amount of time spent in each procedure can be computed. A number of performance analysis tools are available to help programmers analyse and optimise their applications, ranging from source code profilers to sophisticated tracers for the analysis of communication and of the memory system, or a combination of the two. The sole purpose of these tools is to help developers identify whether or not the application is running efficiently on the available computing resources. In this chapter the performance of the simulation is analysed in terms of MPI functions and user-defined functions.









7.1 Description of the profiling tools available
There are many performance analysis tools installed on HECToR, including TotalView, CrayPAT, TAU and Scalasca. For this thesis, the Cray Performance Analysis Tools (CrayPAT) are used to analyse the performance of the programs. CrayPAT provides an integrated infrastructure for a variety of profiling experiments, including analysis of computation, communication, I/O and memory utilisation, as well as hardware counter analysis, and it supports all programming models. CrayPAT is centrally installed on HECToR and working correctly. A CrayPAT experiment typically involves a five-phase cycle:
1. Program instrumentation
2. Data measurement
3. Analysis of the performance data
4. Presentation of the captured data
5. Optimisation of the program
CrayPAT consists of two components: the CrayPAT performance collector and the Cray Apprentice performance analyser. CrayPAT is the data capture tool used to prepare the program for a performance analysis experiment; Cray Apprentice is a post-processing data visualisation tool used to further explore and study the captured data.
Here is an overview of how to use CrayPAT with APA (Automatic Program Analysis) on HECToR; a consolidated example is given after the list.
1. On HECToR, all the performance analysis tools are merged into one module called perftools. The first step is to load this module using the module load perftools command.
2. Compile and link the application.
3. The pat_build -Oapa [exe_name] command is used to instrument the program. This inserts the instructions needed for analysis at various points in the program and creates the instrumented executable exe_name+pat.
4. The instrumented executable is run as normal using the aprun command. This generates a series of data files of the form exe_name+pat+PID.xf.
5. The pat_report command is used to generate the .apa file from the experiment data files produced in step 4.
6. Build from the .apa file using pat_build to generate a new instrumented executable for the tracing experiment.
7. Run the new executable using the aprun command.
8. Generate the new report using the pat_report command.
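Putting the steps together, a typical session might look as follows. This is a sketch only: the executable name lmp_hector and the core counts are taken from the batch script in Appendix B, the build command is illustrative, and the generated file names follow the patterns described above.

module load perftools                          # step 1
make hector                                    # step 2: compile and link (illustrative)
pat_build -Oapa lmp_hector                     # step 3: creates lmp_hector+pat
aprun -n 48 -N 24 lmp_hector+pat < in.chute    # step 4: writes lmp_hector+pat+PID.xf
pat_report lmp_hector+pat+*.xf                 # step 5: generates the .apa file
pat_build -O lmp_hector+pat+*.apa              # step 6: tracing executable
aprun -n 48 -N 24 lmp_hector+apa < in.chute    # step 7
pat_report lmp_hector+apa+*.xf                 # step 8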



7.2 Profiling using CrayPAT
Using CrayPAT, statistics were obtained for three function groups: MPI functions, USER functions and MPI_SYNC functions. MPI_SYNC is used in the trace wrapper of each collective subroutine to measure the time spent waiting at the barrier call before entering the subroutine; the MPI_SYNC statistics can therefore be a good indication of load imbalance. The time percentage of each group is shown in figure 7.1.

Figure 7.1: Profile by function group

As the processor count grows from 960 to 2400, the time spent in MPI calls increases from 28.7% to 33.1%, while the time spent in user functions drops from 45.5% to 24.9%. The time spent in MPI_SYNC increases from 25.7% to 42.0%.
7.2.1 Profiling - User functions
Figure 7.2 shows the top time-consuming user functions. According to the CrayPAT tracing results, the speed-up of the bin_atoms function is about 3.5% on 2400 processors compared with 960 processors.

Figure 7.2: Top time-consuming user functions obtained from CrayPAT
7.2.2 Profiling - Percentage Time of MPI Calls
The most time-consuming of the MPI calls is MPI_Allreduce. This MPI collective operation does not scale well, which is expected behaviour.

Figure 7.3: Top time-consuming MPI functions
However, on the XE6 the scaling is relatively good from 1920 to 2400 processors (figure 7.3). From the call graph generated by CrayPAT it is clear that the MPI_Allreduce bottleneck is in the pre_force function of the FixTriNeighlist class.

Figure 7.4: Top time-consuming MPI_SYNC functions

From figure 7.4 it is clear that MPI_Allreduce accounts for most of the waiting time spent at the barrier. It is worth investigating the possibility of combining several of the MPI_Allreduce calls. Compared with the runs on 960 and 1920 processors, the MPI_Scan and MPI_Bcast calls become more significant on 2400 processors.
7.2.3 Profiling Messages/Sizes
An advantage of LAMMPS is that most of its messages are small to medium sized, mostly between 64 B and 64 KB. From figure 7.5 it is clear that the number of messages increases dramatically with the number of nodes.

Figure 7.5: Profile by message sizes
7.2.4 Profiling Memory Usage
LAMMPS prints an estimate of its memory use, but this is a lower limit, since it only accounts for large memory allocations. Currently LAMMPS does not output more accurate information on memory usage; this feature would have to be added to LAMMPS. It may not be easy to add: LAMMPS may reserve memory via malloc() but never actually use it, depending on the specific selection of features, so the reservation affects address space but not the physical memory used. On the other hand, allocated and used address space may be paged out to swap space. If LAMMPS is run in parallel over an RDMA architecture (e.g. InfiniBand or Myrinet), things get even more complicated, since there may be "pinned" memory backing device memory, which can be accessed by multiple processes at the same time, alongside regular shared memory. To complicate matters further, some MPI implementations use "lazy" memory allocation that keeps allocated memory blocks around for later use, since messages of the same size are likely to be sent multiple times; such blocks could be freed when no more free address space remains, or based on some heuristic.
7.3 Timing output from the code and its description
The LIGGGHTS code outputs timings for its main functions. Using these timings, a stacked bar chart was plotted showing how the time spent in each section of the code varies with processor count. Figure 7.6 shows such a chart for the 1000K-particle simulation; the time spent in each section is given as a percentage of the total loop time. Table 7.1 summarises the timings reported by LIGGGHTS and their descriptions.
Name    Description
Pair    Time taken to compute the pairwise interactions between the atoms
Neigh   Time taken to compute new neighbour lists for each atom
Comm    Time spent in communications
Outpt   Time taken to output the restart, atom position, velocity and force files
Main    Time taken for the main MD loop to execute, minus the sum of the above times
Table 7.1: LIGGGHTS timing output
From figure 7.6 it is clear that as the processor count increases, the percentage of time spent in Outpt and Comm begins to dominate. These phases involve MPI_Irecv/MPI_Send and MPI_Allreduce, so this is not unexpected behaviour. There is a significant decrease in the Pair and Neigh timings; this is expected because, as the number of processors increases, the number of particles each processor handles decreases, and hence these timings are reduced.
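The breakdown can be collected automatically from the log files. The sketch below assumes timing lines of the form "Pair time (%) = <seconds> (<percent>)", as printed by LAMMPS-era codes, and a hypothetical log file name.

import re

# Pull the per-section timing breakdown out of a LIGGGHTS/LAMMPS log file.
pattern = re.compile(r"^(\w+)\s+time \(%\) = ([\d.]+) \(([\d.]+)\)")

timings = {}
with open("log.liggghts") as log:               # hypothetical file name
    for line in log:
        match = pattern.match(line)
        if match:
            name = match.group(1)
            timings[name] = (float(match.group(2)), float(match.group(3)))

for name, (seconds, percent) in timings.items():
    print(f"{name:6s} {seconds:10.3f} s  {percent:6.2f} %")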

Figure 7.6: LIGGGHTS timing output



7.4 Summary
Among the MPI functions, MPI_Send, MPI_Allreduce and MPI_Waitany are the most used calls. The majority of the communication time is spent in MPI_Send and MPI_Allreduce, which is the reason for the increase in communication time at higher processor counts. Mostly small messages are used in the communications.
Chapter 8
Conclusions and Future Work
8.1.1 Summary
The main aim of the project was to demonstrate, via a 3D model, the flow of a very large number of snow particles under the influence of gravity, allowing for inelastic collisions and cohesion, using high performance computing, and to analyse the scalability and performance of the model. Understanding how particles behave under collision and cohesion will help in modelling granular snow avalanches. From an industrial point of view, it will help tyre manufacturers to understand how snow interacts with tyres, and in snow studies the model can be used to study the effects of a micro-penetrometer in snow.
Choosing the appropriate methodology for the model was a big challenge. Existing literature on snow particle modelling and snow avalanche modelling was studied. From this study it was understood that the traditional continuum approach can represent only homogeneous materials effectively, while a discrete approach is the most suitable for predicting the motion of individual, independently moving particles; hence it was decided to implement the model using DEM.
It was also necessary to study the background of the MD/DEM approach and the mechanics of snow particles. Understanding the physics and mathematics behind the DEM approach helped in choosing the most appropriate techniques and algorithms for the model in terms of efficiency and accuracy, for example the Velocity-Verlet integration and the PBC approach; this will also greatly help in further optimisation of the model.
The physical constants governing the flow of snow particles, such as density, Young's modulus, friction and the coefficient of restitution, were investigated carefully. These properties greatly affect the flow of particles, both in reality and in the simulation, so the choice of these parameters and of the simulation time-step was considered carefully.
Typically, DEM simulations involve a large number of particles, require a very long time to execute and are thus computationally expensive; the number of particles in an avalanche is also huge. The challenge was how to accommodate the size of the system while meeting the need for speedy results that are computationally affordable to manage and run. The use of high performance computing was the obvious choice, and hence all the simulation runs for the model were carried out on HECToR. LAMMPS/LIGGGHTS was chosen for the model for two reasons: it can effectively model granular particles, and it has a good MPI coupling interface for parallel implementation.
Benchmarking the model helped in understanding its scalability; it is very important that the model scales well in order to handle large system sizes. The model was found to scale very well: with an increase in the number of processors, the speed-up achieved is quite good, for example 1.99 on 480 processors for 75K particles and 2.52 on 960 processors for the same number of particles.
Due to the budget constraint on HECToR, the number of particles for this thesis was limited to 1 million. It was thought that it would be interesting to extrapolate the scaled-size benchmark shown in figure 6.2 to a very large number of particles. Least-squares regression analysis was carried out to predict the runtime for more than 1 million particles: the least-squares technique was applied to the data to arrive at a linear regression model, from which it was easy to extrapolate to very large numbers of particles. Figure 8.1 shows the plot of the regression analysis; the x-axis represents the number of particles (in ten-thousands) and the y-axis the run time. The actual data were extended to 10⁸ particles. The analysis shows that a system with 10⁸ particles run on 960 processors would take approximately 4.4 hours to complete, which is equivalent to 31,700 AUs on HECToR; on 480 processors it would take approximately 5.5 hours and require 20,000 AUs.
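The extrapolation itself is a one-line fit. The sketch below shows the procedure with hypothetical stand-ins for the measured 960-processor data; the stand-in values are chosen so that the fit lands near the 4.4-hour figure quoted above, but they are not the actual benchmark numbers.

import numpy as np

# Least-squares linear fit of run time against particle count, then
# extrapolation to 1e8 particles (all data values are hypothetical).
particles = np.array([75_000, 150_000, 250_000, 500_000, 1_000_000])
runtime_s = np.array([512.0, 524.0, 540.0, 580.0, 658.0])

slope, intercept = np.polyfit(particles, runtime_s, 1)   # linear model
predicted = slope * 1e8 + intercept                       # extrapolate
print(f"predicted runtime for 1e8 particles: {predicted / 3600.0:.1f} hours")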


Figure 8.1: Linear regression analysis of the scaled-size benchmark
To understand the performance of the model and to identify areas for optimisation, the model was profiled. Vampir was first considered for profiling, but technical difficulties in using Vampir on HECToR prevented this, so CrayPAT was used to profile the model. The top time-consuming MPI functions and user functions were identified.
8.1.2 Recommendations for Future Research
HPC Perspective: Suggestions for Optimisation
The amount of time spent in MPI calls, particularly point-to-point communications and collective operations, increases with the number of processors. As the processors need to communicate with each other at every time-step, there is little scope to reduce the bottleneck due to point-to-point communications, but there is scope to reduce the bottleneck due to collective operations.
For the user functions, the time taken by the main computational loop increases with the number of processors. Currently the main class methods of the LAMMPS/LIGGGHTS code are not vectorised; it might be worth converting arrays to vectors to see how this improves the performance of the simulation. However, the irregular memory access pattern of LAMMPS might be a hindrance to vector code.
Though there is a bottleneck in the time spent on point-to-point communications, in reality the actual time spent on this seems reasonable. The bottleneck could arise because, as the number of processors increases, the workload of each process is reduced to the point where the amount of computation done by each processor is too small relative to the communication with its neighbours. Investigating load balancing and domain decomposition techniques could help improve this situation. Currently it is not possible to calculate the memory requirements of each processor for the simulation; calculating the amount of memory required per processor would help in understanding the load balance of the simulation.
Simulation Perspective: Improving the Model
Most granular snow avalanches originate at a point, growing wider as they flow down the slope and sweeping up more and more snow particles in their descent. This kind of feature is not possible to simulate in LAMMPS/LIGGGHTS; it would be worthwhile to code it, to make the model much closer to a real snow avalanche.
In the current model, the density of the particles is fixed. In reality the density of snow particles varies due to several environmental factors such as heat. An attempt was made to vary the density in the simulation, but it resulted in an irregular pattern. It would be an interesting study to analyse the effect of the material parameter values on the simulation.
Changes made to the original work plan and the risk assessment details are presented in
Appendix A.
References

Ancey, C., 2002. Snow Avalanches. Geomorphological Fluid Mechanics.
Beazley, D.M. and Lomdahl, P.S., 1994. Message-passing multi-cell molecular dynamics on the Connection Machine 5. Parallel Computing, 20(2), pp. 173-195.
Campbell, C.S., Cleary, P. and Hopkins, M.A., 1995. Long-runout landslides: a study by computer simulation. Journal of Geophysical Research, 100(B5), pp. 8267-8283.
Cray Inc., 2011. Cray XE6. http://www.cray.com/Products/XE/CrayXE6System.aspx; accessed on 11th August 2011.
Chiou, M.C., 2005. Modelling dry granular avalanches past different obstructs: numerical simulations and laboratory analyses. Ph.D. thesis, Technical University Darmstadt, Germany.
Rapaport, D.C., 2004. The Art of Molecular Dynamics Simulation. Cambridge University Press.
Dziugys, A. and Peters, B., 2001. An approach to simulate the motion of spherical and non-spherical fuel particles in combustion chambers. Granular Matter, 3, pp. 231-265.
EPCC, 2011. Ness. http://www.epcc.ed.ac.uk/facilities/ness/; accessed on 4th August 2011.
Radjai, F. and Dubois, F. Discrete-element Modelling of Granular Materials. Mechanics and Civil Engineering Laboratory (LMGC), University of Montpellier 2, France. ISBN 9781848212602.
Feng, J.Q., 2000. Contact behavior of spherical elastic particles: a computational study of particle adhesion and deformations. Colloids and Surfaces A: Physicochemical and Engineering Aspects, 172(1-3), pp. 175-198.
Fierz, C., Armstrong, R.L., Durand, Y., Etchevers, P., Greene, E., McClung, D.M., Nishimura, K., Satyawali, P.K. and Sokratov, S.A., 2009. The International Classification for Seasonal Snow on the Ground. IHP-VII Technical Documents in Hydrology No. 83, IACS Contribution No. 1, UNESCO-IHP, Paris.
Fredston, J. and Fesler, D., 1999. Snow Sense. Alaska Mountain Safety Center. (Fredston and Fesler cite research by Perla (1977) stating that the most frequent angle of slope on which avalanches occur is 37 degrees.)
Perla, R., 1977. Slab avalanche measurements. Canadian Geotechnical Journal, 14(2), pp. 206-213, doi:10.1139/t77-021; cited in Fredston and Fesler (1999).
Reid, F.J.L. and Smith, L.A., 2005. Performance and profiling of the LAMMPS code on HPCx. Technical report, HPCx Consortium, May 2005.
Godbout, S., Chenard, L. and Marquis, A., 2000. Instantaneous Young's modulus of ice from liquid manure. Canadian Agricultural Engineering, 42(2), pp. 6.1-6.14.
Hamaker, H.C., 1937. The London-van der Waals attraction between spherical particles. Physica, 4(10), pp. 1058-1072.
Hertz, H., 1882. Über die Berührung fester elastischer Körper (On the contact of elastic solids). Journal für die reine und angewandte Mathematik, 92, pp. 156-171.
Jordan, H.F. and Alaghband, G., 2003. Fundamentals of Parallel Processing, Chapter 2.7.2: A Simple Performance Model - Amdahl's Law. Pearson Education, Inc.
HECToR, 2010. Architecture details. http://www.hector.ac.uk/cse/documentation/Phase2b/#arch; accessed on 15th August 2011.
HECToR, 2011. Cost of access to HECToR. http://www.hector.ac.uk/howcan/admin/costs/index.php; accessed on 5th August 2011.
HECToR, 2011. HECToR hardware. http://www.hector.ac.uk/support/documentation/userguide/hardware.php; accessed on 8th August 2011.
Higa, M., Arakawa, M. and Maeno, N., 1995. Measurements of restitution coefficients of ice at low temperatures. Institute of Low Temperature Science, Hokkaido University, Sapporo, Japan.
Hogue, C. and Newland, D., 1994. Efficient computer simulation of moving granular particles. Powder Technology, 78(1), pp. 51-66.
Jenkins, J.T. and Savage, S.B., 1983. A theory for the rapid flow of identical, smooth, nearly elastic spherical particles. Journal of Fluid Mechanics, 130, pp. 187-202.
Jensen, R., Edil, T., Bosscher, P., Plesha, M. and Kahla, N., 2001. Effect of particle shape on interface behaviour of DEM-simulated granular materials. International Journal of Geomechanics, 1(1), pp. 1-19.
Aarons, L.R., Sun, J. and Sundaresan, S., 2009. Unsteady shear of dense assemblies of cohesive granular materials under constant volume conditions. Department of Chemical Engineering, Princeton University, Princeton, New Jersey.
Liggghts.com, 2011. LIGGGHTS Open Source Discrete Element Method Particle Simulation Code. http://www.liggghts.com/; accessed on 7th August 2011.
Cundall, P.A. and Strack, O.D.L., 1979. A discrete numerical model for granular assemblies. Geotechnique, 29(1), pp. 47-65.
Paraview.org, 2011. Paraview. http://www.paraview.org/; accessed on 9th August 2011.
Schaerer, P.A., n.d. Friction coefficients and speed of flowing avalanches.
Cleary, P.W. and Sawley, M.L., 2002. DEM modelling of industrial granular flows: 3D case studies and the effect of particle shape on hopper discharge. Applied Mathematical Modelling, 26, pp. 89-111.
Raji, A.O. and Favier, J.F., 2004. Model for the deformation in agricultural and food particulate materials under bulk compressive loading using discrete element method. I: Theory, model development and validation. 64, pp. 359-371.
Sandia.gov. LAMMPS Molecular Dynamics Simulator. http://lammps.sandia.gov/; accessed on 4th August 2011.
Sandia.gov. Pizza.py Toolkit. http://www.sandia.gov/~sjplimp/pizza.html; accessed on 8th August 2011.
Sinha, N.K., 1987. Effective Poisson's ratio of isotropic ice. In: Proceedings of the Sixth International Offshore Mechanics and Arctic Engineering Symposium, Houston, TX, March 1-5, 1987, Vol. IV, pp. 189-195 (IRC Paper No. 1472).
Plimpton, S.J., 1995. Fast parallel algorithms for short-range molecular dynamics. Journal of Computational Physics, 117, pp. 1-19.
Subramani, V.S., 2008. Potential applications of nanotechnology for improved performance of cement based materials. M.S. thesis, University of Arkansas.
Wikipedia, 2010. File: Graupel, Westwood. http://en.wikipedia.org/wiki/File:Graupel,_Westwood,_MA_2010-02-02.jpg; accessed on 12th August 2011.
Appendix A

Project Management
A.1 Work Plan
Overall the project was on schedule without any major delays. The anticipated work plan was submitted along with the project preparation course report. There were some slight deviations to it, mainly in the implementation phase. Initially it was thought that the cohesion add-on provided for the LAMMPS code would suffice for modelling the cohesive behaviour of snow particles, but after implementation it was found to be unsuitable, so the LIGGGHTS cohesion model was considered. Installation of LIGGGHTS and the implementation using LIGGGHTS were not in the initial plan, and the work plan was modified accordingly to extend the implementation phase. Details are summarised in table A.1 below.
Task | Planned Start Date | Planned End Date | Actual Start Date | Actual End Date
Phase 1 - Background Research | 26/01/11 | 28/02/11 | 26/01/11 | 28/02/11
Background reading | 26/01/11 | 02/04/11 | 26/01/11 | 02/04/11
Literature Review | 07/02/11 | 28/02/11 | 07/02/11 | 28/02/11
Phase 2 - Experimental Setup | 01/03/11 | 11/03/11 | 01/03/11 | 30/06/11
Installing LAMMPS/LIGGGHTS | 01/03/11 | 03/03/11 | 01/03/11 | 30/06/11
Granular Simulation | 03/04/11 | 03/11/11 | 03/04/11 | 03/11/11
Phase 3 - Project Presentation | 03/14/11 | 03/22/11 | 03/14/11 | 03/22/11
Phase 4 - Design | 01/06/11 | 15/06/11 | 01/06/11 | 18/06/11
Construct the DEM Model | 01/06/11 | 15/06/11 | 01/06/11 | 18/06/11
Phase 5 - Implementation | 16/06/11 | 09/08/11 | 16/06/11 | 13/08/11
Implementation of the Model | 16/06/11 | 18/07/11 | 16/06/11 | 18/07/11
Profiling & Analysis | 18/07/11 | 27/07/11 | 25/07/11 | 03/08/11
Benchmarking | 28/07/11 | 09/08/11 | 03/08/11 | 13/08/11
Visualisation of the simulation | 10/08/11 | 17/08/11 | 16/06/11 | 13/08/11
Phase 6 - Improvements | 17/08/11 | 19/08/11 | 18/07/11 | 30/07/11
Changes to the model/code | 17/08/11 | 19/08/11 | 18/07/11 | 30/07/11
Phase 7 - Final Write-up | 15/08/11 | 08/30/11 | 06/20/11 | 08/18/11
Table A.1: Work Plan








A.2 Risk Assessment
The table below summarises the risk assessment details. The severity has been categorised as high, medium or low. In addition to the severity, the likelihood of the risk occurring is also considered, categorised from high (more likely to happen) to low (less likely to happen).
Risk: Open bug with the PGI compiler flags for the latest version of LAMMPS (January 2011)
Severity: High | Likelihood: Low | Status: Mitigated
Mitigation plan: Used the GNU compiler instead of the PGI compiler and modified the flags appropriately.

Risk: Porting Dr. Jin Sun's cohesive add-on to HECToR
Severity: High | Likelihood: High | Status: Mitigated
Mitigation plan: Try to fix the code if it is a minor issue; if not, consider using LIGGGHTS.

Risk: Installing LIGGGHTS on HECToR
Severity: High | Likelihood: Low | Status: Mitigated
Mitigation plan: The LAMMPS makefile was used as a template and a new Makefile for LIGGGHTS was created.

Risk: The work plan was modified to include the LIGGGHTS implementation, which had an impact on the overall project schedule
Severity: High | Likelihood: N/A | Status: Mitigated
Mitigation plan: It was possible to overlap many of the stages and still complete the project on time; for example, once a cohesion model was developed, code profiling and benchmarking could be done in parallel.

Table A.2: Risk Assessment



Appendix B

Parallel Processing on Ness & HECToR

B.1 Batch script on HECToR

#!/bin/bash --login
#PBS -N lammps
#PBS -v DISPLAY
#PBS -l mppwidth=48
#PBS -l mppnppn=24
#PBS -l walltime=00:20:00
#PBS -A d04

# Change to the directory from which the job was submitted
echo "PBS_O_WORKDIR =" $PBS_O_WORKDIR
cd $PBS_O_WORKDIR

# Launch the parallel job
aprun -n 48 -N 24 lmp_hector < in.chute










B.2 Makefile for LAMMPS/LIGGGHTS on HECToR


# HECToR XT4 system

SHELL = /bin/sh
.SUFFIXES: .cpp .d
.IGNORE:

# System-specific settings

CC = CC
CCFLAGS = -O3 -g -DFFT_FFTW -DMPICH_IGNORE_CXX_SEEK
DEPFLAGS = -M
LINK = CC $(CCFLAGS)
USRLIB = -ldfftw
SIZE = size

# Link rule

$(EXE): $(OBJ)
$(LINK) $(LINKFLAGS) $(OBJ) $(USRLIB) $(SYSLIB) -o $(EXE)
$(SIZE) $(EXE)

# Library target

lib: $(OBJ)
$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)

# Compilation rules

.cpp.o:
$(CC) $(CCFLAGS) -c $<

# Individual dependencies

$(OBJ): $(INC)





B.3 Makefile for LAMMPS/LIGGGHTS on Ness

# Ness system
SHELL = /bin/sh
.SUFFIXES: .cpp .d
.IGNORE:

# System-specific settings

CC = g++
CCFLAGS = -O3 -DFFT_FFTW -DMPICH_IGNORE_CXX_SEEK -DMPICH_SKIP_MPICXX \
          -I/opt/local/packages/fftw/fftw-2.1.5-gcc/include \
          -I/opt/local/packages/mpich2/1.0.5p4-ch3_sock-gcc4/include
DEPFLAGS = -M
LINK = g++ $(CCFLAGS)
LINKFLAGS = -L/opt/local/packages/fftw/fftw-2.1.5-gcc/lib \
            -L/opt/local/packages/mpich2/1.0.5p4-ch3_sock-gcc4/lib \
            -L/usr/local/cuda3.2.16/lib64
USRLIB = -lfftw -lmpich -lpthread -lssl
SIZE = size

# Link rule

$(EXE): $(OBJ)
$(LINK) $(LINKFLAGS) $(OBJ) $(USRLIB) $(SYSLIB) -o $(EXE)
$(SIZE) $(EXE)

# Library target

lib: $(OBJ)
$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)

# Compilation rules

.cpp.o:
$(CC) $(CCFLAGS) -c $<

# Individual dependencies

$(OBJ): $(INC)

Appendix C
AutoCAD Details
A chute was drawn using AutoCAD 2011. The commands used were line, fillet, pedit, offset and extrude. The steps followed in drawing the chute are given below.
1. Units:
(Dialogue box opens; in the Insertion Scale box, choose Millimeters)

2. Limits:
Specify lower left corner or [ON/OFF] <0.0000,0.0000>: 0,0
Specify upper right corner <12.0000,9.0000>: 1.2,0.9

3. Grid:
Specify grid spacing(X) or
[ON/OFF/Snap/Major/aDaptive/Limits/Follow/Aspect] <0.5000>:0.01

4. ZOOM:
Specify corner of window, enter a scale factor (nX or nXP), or
[All/Center/Dynamic/Extents/Previous/Scale/Window/Object] <real time>: a

5. Snap:
Specify grid spacing(X): 0.01

6. Menu → View → Viewports → click Named (dialogue box opens) → New Viewport → choose Three: Left → click OK.

7. (Click the cursor in the left screen.)
Menu → View → Views → choose SE Isometric.

8. (Click the cursor in the right bottom screen.)
Menu → View → Views → choose Top.

9. (Click the cursor in the right top screen.)
Menu → View → Views → choose Front.

10. (Click the cursor in the right top screen.)
Line:
Specify first point: 0.5026,0.2498 (Point C)
(Press Function Key F8 for ortho on)
Specify next point or [Undo]: <Ortho on>0 .03 (Point D)
Specify next point or [Undo]: 0.8 (Point E)
Specify next point or [Close/Undo]: Mouse right click → choose Cancel.

11. Line:
Specify first point: Choose the Point C
(Press Function Key F8 for ortho off)
Specify next point or [Undo]: <Ortho off> @.03<143(Point B)
Specify next point or [Undo]: @.41<143 (Point A)
Specify next point or [Close/Undo]: Mouse right click → choose Cancel.

12. Zoom:
Specify corner of window, enter a scale factor (nX or nXP), or
[All/Center/Dynamic/Extents/Previous/Scale/Window/Object] <real time>: a

13. Pan
(Move the object in such a way that points B,C,D were visible)

14. Fillet:
Select first object or [Undo/Polyline/Radius/Trim/Multiple]: Radius
Specify fillet radius <0.0000>: 0.03
Select first object or [Undo/Polyline/Radius/Trim/Multiple]: (Choose the line
BC)
Select second object or shift-select to apply corner: (Choose the line CD)
( The line BC and CD were transformed to curve)

15. Pedit:
Select polyline or [Multiple]: (choose the line AB)
Do you want to turn it into one? <Y>:Y
Enter an option [Close/Join/Width/Edit vertex/Fit/Spline/Decurve/Ltype
gen/Reverse/Undo]: Join
Select objects: (Choose all the other lines)
Enter an option [Close/Join/Width/Edit vertex/Fit/Spline/Decurve/Ltype
gen/Reverse/Undo]: (Mouse right click , choose cancel)
(All the lines became a single line)

16. Offset:
Specify offset distance or [Through/Erase/Layer] <Through>: 0.001
Select object to offset or [Exit/Undo] <Exit> : (choose the line ABCDE)
Specify point on side to offset or [Exit/Multiple/Undo] <Exit>: (Click above
the line ABCDE)
Select object to offset or [Exit/Undo] <Exit>: Exit.
(A line parallel to ABCDE was created, i.e. A′B′C′D′E′)

17. Pan:
(Move the object in such a way that Point A was clearly seen)

18. Line:
Specify first point: (Choose Point A)
Specify next point or [Undo]: (Choose Point A′)
Specify next point or [Undo]: (Mouse right click, choose cancel)

19. Pan:
(Move the object in such a way that Point E was clearly seen)

20. Line:
Specify first point: (Choose Point E)
Specify next point or [Undo]: (Choose Point E′)
Specify next point or [Undo]: (Mouse right click, choose cancel)

21. Pedit:
Select polyline or [Multiple]: (Choose the line AA′)
Do you want to turn it into one? <Y>: Y
Enter an option [Close/Join/Width/Edit vertex/Fit/Spline/Decurve/Ltype
gen/Reverse/Undo]: Join
Select objects: (Select all the other lines)
Enter an option [Open/Join/Width/Edit vertex/Fit/Spline/Decurve/Ltype
gen/Reverse/Undo]: (Mouse right click , choose cancel)
(ABCDE and A′B′C′D′E′ became a single closed polyline)

22. Extrude:
Select objects to extrude: (Choose the closed polyline ABCDE-A′B′C′D′E′)
Specify height of extrusion or [Direction/Path/Taper angle]:0.41
(Solid Tiny chute was created)

Convert Solid geometry to Mesh using AutoCAD

23. Meshsmooth
Select objects to convert: (Choose the solid)
(Dialogue box opens; choose Create Mesh → OK)
(Solid was converted to Mesh)
(Mouse Left click the object, Dialogue Box opens, change the smoothness to
none)

To convert AutoCAD file to stl file format

24. (AutoCAD file menu → Export → Other Format → File Types → choose *.stl, type the file name)
Select solids or watertight meshes: (select the solid or watertight mesh) (STL file was created)
