Introduction
Fluent Parallel Architecture
4 Basic Components of Parallel UDFs
Troubleshooting Tips for Parallel UDFs
Questions?
[Figure: example case with wall zones w_right, w_left_b and w_bottom]
• The case shown here will be used as the basis for numerous
examples in this session
• If it is read into a Fluent parallel session using 4 CPUs, the mesh and
solution data are distributed into grid partitions as shown on the
next slide
8 © 2011 ANSYS, Inc. February 26, 2014
Mesh with 4 Partitions
[Figure: mesh split into 4 partitions (Parallel, 4 CPUs)]
#if RP_HOST
/* Coding here only performed on HOST process */
#endif
#if RP_NODE
/* Coding here only performed on NODE processes */
#endif
#if PARALLEL
/* Coding here only performed on HOST & NODE processes */
#endif
#if !PARALLEL
/* Coding here only performed on SERIAL process */
#endif
#if !RP_HOST
/* Coding here only performed on NODE & SERIAL processes */
#endif
#if !RP_NODE
/* Coding here only performed on HOST & SERIAL processes */
#endif
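How the six directives above resolve for one process type can be checked with a small stand-alone C sketch. It does not include Fluent's udf.h. Instead it defines RP_HOST and RP_NODE itself (here as if compiled for a compute node) and assumes PARALLEL behaves like RP_HOST || RP_NODE, which agrees with the comments above but is an assumption about Fluent's own definition. The names BLOCK_* and active_blocks are hypothetical.

```c
#include <assert.h>

/* Assumption: stand-ins for the flags Fluent sets when it builds a UDF.
   Here the file is treated as if compiled for a compute node. */
#define RP_HOST 0
#define RP_NODE 1
#define PARALLEL (RP_HOST || RP_NODE)

/* One bit per guarded block in the list of directives. */
enum {
    BLOCK_HOST     = 1 << 0,  /* #if RP_HOST   */
    BLOCK_NODE     = 1 << 1,  /* #if RP_NODE   */
    BLOCK_PARALLEL = 1 << 2,  /* #if PARALLEL  */
    BLOCK_SERIAL   = 1 << 3,  /* #if !PARALLEL */
    BLOCK_NOT_HOST = 1 << 4,  /* #if !RP_HOST  */
    BLOCK_NOT_NODE = 1 << 5   /* #if !RP_NODE  */
};

/* Return a bit mask of the guarded blocks that survive preprocessing. */
int active_blocks(void)
{
    int mask = 0;

    /* A process is never host and node at the same time. */
    assert(!(RP_HOST && RP_NODE));

#if RP_HOST
    mask |= BLOCK_HOST;
#endif
#if RP_NODE
    mask |= BLOCK_NODE;
#endif
#if PARALLEL
    mask |= BLOCK_PARALLEL;
#endif
#if !PARALLEL
    mask |= BLOCK_SERIAL;
#endif
#if !RP_HOST
    mask |= BLOCK_NOT_HOST;
#endif
#if !RP_NODE
    mask |= BLOCK_NOT_NODE;
#endif
    return mask;
}
```

With the node settings above, active_blocks() reports the RP_NODE, PARALLEL and !RP_HOST blocks. Changing the two #define lines to RP_HOST 1 / RP_NODE 0, or to 0 / 0, shows the host and serial cases.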
[Figure: partition boundary showing interior cells, an exterior cell and interior faces]
• The correct way to avoid looping over faces that belong to exterior cells is to
use PRINCIPAL_FACE_P(f,t)
- Do not use begin_f_loop_int(f,t)
- This is usually required only when a quantity is summed over all the faces of a thread
– Generally not necessary for assigning boundary conditions in DEFINE_PROFILE
macros
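The double counting that PRINCIPAL_FACE_P guards against can be illustrated with a plain C model that does not use Fluent headers. The Face struct, its principal flag (standing in for what PRINCIPAL_FACE_P(f,t) reports on each node) and the function names are hypothetical, and the two-node layout is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one face as seen by one compute node.
   `principal` plays the role of PRINCIPAL_FACE_P(f,t): it is 1 on exactly
   one of the nodes that hold a copy of the face. */
typedef struct {
    double area;
    int principal;
} Face;

/* Sum face areas on one node. If principal_only is nonzero, faces that
   this node does not own are skipped. */
double sum_area(const Face *faces, size_t n, int principal_only)
{
    double total = 0.0;
    size_t i;

    assert(faces != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        if (principal_only && !faces[i].principal)
            continue;
        total += faces[i].area;
    }
    return total;
}

/* Two nodes; the face of area 2.0 sits on their shared partition boundary,
   so both nodes hold a copy but only node 0 owns it. */
static const Face node0_faces[] = { {1.0, 1}, {2.0, 1} };
static const Face node1_faces[] = { {2.0, 0}, {3.0, 1} };

/* Total over both nodes without the ownership test: the shared face
   is counted twice. */
double naive_total(void)
{
    return sum_area(node0_faces, 2, 0) + sum_area(node1_faces, 2, 0);
}

/* Total over both nodes with the ownership test: each face counted once. */
double principal_total(void)
{
    return sum_area(node0_faces, 2, 1) + sum_area(node1_faces, 2, 1);
}
```

The true area of the three distinct faces is 6.0. The naive total comes out as 8.0 because the shared face is counted on both nodes, while the total with the ownership test gives 6.0.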
Global Reductions
• The example has shown that as the UDF executes in parallel, variables
can have different values on different compute nodes
• In many cases, the workflow of the UDF requires the execution of the
commands to be synchronized, for instance so that at a certain point
in the program execution, a variable has the same value on every
compute node
• This synchronization is achieved through the use of global reductions
– The type of reduction depends on the objective
• For a total value over all nodes, use a summation reduction
• For maximum or minimum values, use a high or low reduction
• For a logical test over all nodes, use an AND or OR reduction
• The form of the macro depends on the variable type
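What each kind of reduction leaves on every compute node can be sketched in plain C. This is a model of the idea only, not Fluent code: the ReduceOp enum, reduce_over_nodes and sample_counts are hypothetical names, and the per-node values are made up. In an actual UDF the work is done by the PRF_G... macros, whose name encodes the variable type (for example PRF_GISUM1 for an integer sum).

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    REDUCE_SUM,   /* total over all nodes        */
    REDUCE_HIGH,  /* maximum over all nodes      */
    REDUCE_LOW,   /* minimum over all nodes      */
    REDUCE_AND,   /* true only if true on all    */
    REDUCE_OR     /* true if true on at least one */
} ReduceOp;

/* Combine one integer per compute node into the single value that every
   node would hold after the reduction. */
int reduce_over_nodes(const int *per_node, size_t n_nodes, ReduceOp op)
{
    size_t i;
    int result;

    assert(per_node != NULL && n_nodes > 0);

    /* Logical reductions work on truth values, the others on raw values. */
    result = (op == REDUCE_AND || op == REDUCE_OR) ? (per_node[0] != 0)
                                                   : per_node[0];

    for (i = 1; i < n_nodes; ++i) {
        int v = per_node[i];
        switch (op) {
        case REDUCE_SUM:  result += v;                   break;
        case REDUCE_HIGH: if (v > result) result = v;    break;
        case REDUCE_LOW:  if (v < result) result = v;    break;
        case REDUCE_AND:  result = result && (v != 0);   break;
        case REDUCE_OR:   result = result || (v != 0);   break;
        }
    }
    return result;
}

/* Made-up cell counts on four compute nodes. */
const int sample_counts[4] = { 10, 12, 11, 9 };
```

For the sample counts, the summation reduction gives 42, the high reduction 12 and the low reduction 9. Before the reduction each node only knows its own entry; afterwards all nodes hold the same result.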
• We want to sum the cell count over all the compute nodes, so PRF_GISUM1 is
used
• Global reductions should always be inside an RP_NODE compiler directive
• Through the PRF_GISUM1 reduction, the value of the ncount variable is the same
on each compute node
• But its value on the host process is still zero
• Global reductions operate only on node processes
• The final piece of the puzzle is how to communicate the value of ncount to the
host process
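The sequence described above (reduce on the nodes, then pass the result to the host) can be modelled in plain C. This is a sketch under assumptions, not Fluent code: slot 0 of the array stands for the host, slots 1 to 4 for the compute nodes, and model_node_sum and model_node_to_host are hypothetical stand-ins for the reduction and for a node-to-host transfer (the UDF manual documents macros such as node_to_host_int_1 for that step). The cell counts are made up.

```c
#include <assert.h>

#define N_NODES 4
#define HOST 0   /* slot 0 models the host, slots 1..N_NODES the compute nodes */

/* Model of an integer summation reduction: only node slots take part. */
void model_node_sum(int ncount[N_NODES + 1])
{
    int total = 0;
    int i;

    for (i = 1; i <= N_NODES; ++i)
        total += ncount[i];
    for (i = 1; i <= N_NODES; ++i)
        ncount[i] = total;
    /* ncount[HOST] is deliberately left untouched. */
}

/* Model of a node-to-host transfer: the first node sends its value. */
void model_node_to_host(int ncount[N_NODES + 1])
{
    ncount[HOST] = ncount[1];
}

/* Run the sequence and report the value the host ends up with,
   with or without the final transfer. */
int host_ncount(int with_transfer)
{
    int ncount[N_NODES + 1] = { 0, 10, 12, 11, 9 };

    model_node_sum(ncount);
    assert(ncount[1] == ncount[N_NODES]);  /* the nodes now agree */

    if (with_transfer)
        model_node_to_host(ncount);
    return ncount[HOST];
}
```

Without the transfer the host still reports 0; with it the host reports 42, the same total every node holds after the reduction.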
[Figure: example case annotated with T(w_right) = f(w_left_a)]
manual)
– Sometimes required in other macros but much less common
• Other common examples where parallelization is
required include using the values of user-defined
scheme variables in a UDF, or using non-local values to
control boundary conditions
– Example: want to set the temperature of wall w_right as a
function of the temperature at w_left_a, but these are
located on different grid partitions
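The w_right / w_left_a example can be sketched as a plain C model. It assumes that the quantity of interest is the area-weighted mean temperature of w_left_a and that f() simply adds 10 K; the source states neither, so both are placeholders. PartialSum, mean_temperature, wall_temperature and the sample values are hypothetical. The point of the model is that numerator and denominator are summed over all nodes separately, because a node with no faces of w_left_a has nothing to average locally.

```c
#include <assert.h>
#include <stddef.h>

/* Partial sums one compute node accumulates over its own faces of w_left_a.
   A node whose partition holds no such faces contributes zeros. */
typedef struct {
    double temp_times_area;  /* sum of T_face * A_face */
    double area;             /* sum of A_face */
} PartialSum;

/* Area-weighted mean temperature of w_left_a, as every node would know it
   after both fields have been summed over all nodes. */
double mean_temperature(const PartialSum *per_node, size_t n_nodes)
{
    double ta = 0.0;
    double a = 0.0;
    size_t i;

    assert(per_node != NULL && n_nodes > 0);
    for (i = 0; i < n_nodes; ++i) {
        ta += per_node[i].temp_times_area;
        a += per_node[i].area;
    }
    assert(a > 0.0);  /* the zone must have faces on at least one node */
    return ta / a;
}

/* Placeholder for f(): the deck only states T(w_right) = f(w_left_a). */
double wall_temperature(double t_left_a)
{
    return t_left_a + 10.0;
}

/* Made-up layout: w_left_a lives on nodes 0 and 2 only. */
const PartialSum sample_partials[4] = {
    { 600.0, 2.0 }, { 0.0, 0.0 }, { 300.0, 1.0 }, { 0.0, 0.0 }
};
```

With the sample data the mean is 300.0 on every node, so whichever node holds the faces of w_right can apply 310.0 there, even if it holds no faces of w_left_a itself.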
When Parallelization is Unnecessary
• Most DEFINE_DPM_ macros require no parallelization
• Particles carry information with them as they cross mesh partitions
• Some exceptions for file writing, see UDF Manual, Section 7.4
• Many UDFs operate on one cell, or one face, at a time, such as
DEFINE_PROFILE, DEFINE_SOURCE, DEFINE_PROPERTY
• These generally do not need to be modified
Simple DEFINE_PROFILE UDFs mostly need no parallelization. Even PRINCIPAL_FACE_P is not needed at boundary faces.
This example is from a DEFINE_ADJUST macro
Incorrect Usage
Correct Usage
This will probably also result in a runtime error - why?
Performance Tips: Looping and Directives
• For parallel use, prefer {begin,end}_c_loop over {begin,end}_c_loop_int whenever possible
• Communication between compute processes will be reduced
– More communication can lower parallel efficiency
• The same consideration applies to PRINCIPAL_FACE_P in {begin,end}_f_loop
– Sometimes these are needed; restrict their use to those cases
• Also, use compiler directives judiciously
so that commands execute only where
needed
Webinar Recording:
Available in one week’s time in the
ANSYS Resource Library at
www.ansys.com/Resource+Library