
Tutorial 25.

Parallel Processing
Introduction
This tutorial illustrates the setup and solution of a simple 3D problem using FLUENT's
parallel processing capabilities. In order to be run in parallel, the mesh must be divided
into smaller, evenly sized partitions. Each FLUENT process, called a compute node,
will solve on a single partition, and information will be passed back and forth across all
partition interfaces. FLUENT's solver allows parallel processing on a dedicated parallel
machine, or a network of workstations running Linux, UNIX, or Windows.
The tutorial assumes that both FLUENT and network communication software have been
correctly installed (see the separate installation instructions and related information for
details). The case chosen is the mixing elbow problem you solved in Tutorial 1.
This tutorial demonstrates how to do the following:
Start the parallel version of FLUENT using either Linux/UNIX or Windows.
Partition a grid for parallel processing.
Use a parallel network of workstations.
Check the performance of the parallel solver.
Prerequisites
This tutorial assumes that you are familiar with the menu structure in FLUENT and that
you have completed Tutorial 1. Some steps in the setup and solution procedure will not
be shown explicitly.
Problem Description
The problem to be considered is shown schematically in Figure 25.1. A cold fluid at 20°C
flows into the pipe through a large inlet, and mixes with a warmer fluid at 40°C that
enters through a smaller inlet located at the elbow. The pipe dimensions are in inches,
and the fluid properties and boundary conditions are given in SI units. The Reynolds
number for the flow at the larger inlet is 50,800, so a turbulent flow model will be
required.
Density: ρ = 1000 kg/m³
Viscosity: μ = 8 × 10⁻⁴ Pa·s
Conductivity: k = 0.677 W/m·K
Specific heat: Cp = 4216 J/kg·K
Large inlet (4 in. diameter): Ux = 0.4 m/s, T = 20°C, I = 5%
Small inlet (1 in. diameter): Uy = 1.2 m/s, T = 40°C, I = 5%

Figure 25.1: Problem Specification
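As a check on the quoted Reynolds number, take the large inlet diameter (4 in. = 0.1016 m)
as the characteristic length and use the values from Figure 25.1:

Re = ρ Ux D / μ = (1000 kg/m³)(0.4 m/s)(0.1016 m) / (8 × 10⁻⁴ Pa·s) ≈ 50,800

which is well into the turbulent regime for pipe flow, hence the need for a turbulence model.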
Setup and Solution
Preparation
1. Download parallel_process.zip from the Fluent Inc. User Services Center or
copy it from the FLUENT documentation CD to your working folder (as described
in Tutorial 1).
2. Unzip parallel_process.zip.
elbow3.cas can be found in the parallel_process folder created after unzipping
the file.
You can partition the grid before or after you set up the problem (define models,
boundary conditions, etc.). It is best to partition after the problem is set up, since
partitioning has some model dependencies (e.g., sliding-mesh and shell-conduction
encapsulation). Since you have already followed the procedure for setting up the
mixing elbow in Tutorial 1, elbow3.cas is provided to save you the effort of
redefining the models and boundary conditions.
Step 1: Starting the Parallel Version of FLUENT
Since the procedure for starting the parallel version of FLUENT depends on the
type of machine(s) you are using, two versions of this step are provided here. Follow the
procedure for the machine configuration that is appropriate for you.
Step 1A: Multiprocessor Windows, Linux, or UNIX Computer
Step 1B: Network of Windows, Linux, or UNIX Computers
Step 1A: Multiprocessor Windows, Linux, or UNIX Computer
You can start the 3D parallel version of FLUENT on a Windows, Linux, or UNIX machine
using 2 processes by performing either of the following steps:
At the command prompt, type
fluent 3d -t2
See Chapter 31 of the User's Guide for additional information about parallel command
line options.
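For example, a common variation (not required for this tutorial) is to request more
processes and run without the graphical interface, reading commands from a journal file;
the journal file name run.jou below is only a placeholder:

fluent 3d -t4 -g -i run.jou

Here -t4 requests four compute node processes, -g runs FLUENT without the GUI, and
-i specifies a journal file to execute.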
For Linux or UNIX, at the command prompt, type fluent.
For Windows, type fluent -t2.
! Do not specify any argument (e.g., 3d).
1. Specify the 3D parallel version.
File → Run...
(a) Enable the 3D and the Parallel options in the Versions group box.
(b) Set Processes to 2 in the Options group box.
(c) Retain the selection of Default in the Interconnect drop-down list.
(d) Click Run.
Step 1B: Network of Windows, Linux, or UNIX Computers
You can start the 3D parallel version of FLUENT on a network of Windows, Linux, or
UNIX machines using 2 processes and check the network connectivity by performing the
following steps:
1. Start parallel FLUENT.
At the command prompt, type
fluent 3d -t2 -cnf=fluent.hosts
where -cnf indicates the location of the hosts text file. The hosts file is a text
file that contains a list of the computers on which you want to run the parallel
job. If the hosts file is not located in the directory where you are typing the
startup command, you will need to supply the full pathname to the file.
For example, the fluent.hosts le may look like the following:
my_computer
another_computer
See Chapter 31 of the User's Guide for additional information about hosts files
and parallel command line options.
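If fluent.hosts is not in the directory from which you launch FLUENT, give its full
path with -cnf; in general you can also list a machine more than once to place more
than one compute node process on it. The following sketch assumes a Linux/UNIX-style
path and the same placeholder machine names as above:

fluent 3d -t3 -cnf=/home/user/fluent.hosts

with fluent.hosts containing:

my_computer
my_computer
another_computer

This would start two processes on my_computer and one on another_computer.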
For Linux or UNIX, at the command prompt, type fluent.
For Windows, type fluent -t2.
! Do not specify any additional arguments (e.g., 3d).
(a) Specify the 3D network parallel version.
File → Run...
i. Enable the 3D and the Parallel options in the Versions group box.
ii. Retain the default value of 1 for Processes in the Options group box.
iii. Specify the name and location of the hosts text le in the Hosts File
text box.
iv. Retain the selection of Default in the Interconnect drop-down list.
v. Click Run.
2. Check the network connectivity information.
Although FLUENT displays a message confirming the connection to each new compute
node and summarizing the host and node processes defined, you may find it useful
to review the same information at some point during your session, especially if more
compute nodes are spawned on several different machines.
Parallel → Show Connectivity...
(a) Set Compute Node to 0.
For information about all defined compute nodes, you will select node 0, since
this is the node from which all other nodes are spawned.
(b) Click Print.
------------------------------------------------------------------------------
ID Comm. Hostname O.S. PID Mach ID HW ID Name
------------------------------------------------------------------------------
n1 mpich2 another_computer Windows-32 21240 1 1 Fluent Node
host net my_computer Windows-32 1204 0 3 Fluent Host
n0* mpich2 my_computer Windows-32 1372 0 0 Fluent Node
------------------------------------------------------------------------------
ID is the sequential identifier of each compute node (the host process is
always host), Comm. is the communication library (i.e., MPI type), Hostname
is the name of the machine hosting the compute node (or the host process),
O.S. is the architecture, PID is the process ID number, Mach ID is the compute
node ID, and HW ID is an identifier specific to the communicator used.
(c) Close the Parallel Connectivity panel.
Step 2: Reading and Partitioning the Grid
When you use the parallel solver, you need to subdivide (or partition) the grid into groups
of cells that can be solved on separate processors. If you read an unpartitioned grid into
the parallel solver, FLUENT will automatically partition it using the default partition
settings. You can then check the partitions to see if you need to modify the settings and
repartition the grid.
1. Inspect the automatic partitioning settings.
Parallel → Auto Partition...
If the Case File option is enabled (the default setting), and there exists a valid partition
section in the case file (i.e., one where the number of partitions in the case file
divides evenly into the number of compute nodes), then that partition information
will be used rather than repartitioning the mesh. You need to disable the Case File
option only if you want to change other parameters in the Auto Partition Grid panel.
(a) Retain the Case File option.
When the Case File option is enabled, FLUENT will automatically select a
partitioning method for you. This is the preferred initial approach for most
problems. In the next step, you will inspect the partitions created and be able
to change them, if required.
(b) Click OK to close the Auto Partition Grid panel.
2. Read the case file elbow3.cas.
File → Read Case...
3. Display the grid (Figure 25.2).
Display → Grid...
Figure 25.2: Grid Along the Symmetry Plane for the Mixing Elbow
4. Check the partition information.
Parallel → Partition...
(a) Click Print Active Partitions.
FLUENT will print the active partition statistics in the console.
>> 2 Active Partitions:
P Cells I-Cells Cell Ratio Faces I-Faces Face Ratio Neighbors
0 11329 1900 0.168 37891 2342 0.062 1
1 11329 359 0.032 38723 2342 0.060 1
----------------------------------------------------------------------
Collective Partition Statistics: Minimum Maximum Total
----------------------------------------------------------------------
Cell count 11329 11329 22658
Mean cell count deviation 0.0% 0.0%
Partition boundary cell count 359 1900 2259
Partition boundary cell count ratio 3.2% 16.8% 10.0%
Face count 37891 38723 74272
Mean face count deviation -1.1% 1.1%
Partition boundary face count 2342 2342 2342
Partition boundary face count ratio 6.0% 6.2% 3.2%
Partition neighbor count 1 1
----------------------------------------------------------------------
Partition Method Principal Axes
Stored Partition Count 2
Done.
Note: FLUENT distinguishes between two cell partition schemes within a parallel
problem: the active cell partition and the stored cell partition. Here,
both are set to the cell partition that was created upon reading the case file.
If you repartition the grid using the Partition Grid panel, the new partition
will be referred to as the stored cell partition. To make it the active cell
partition, you need to click the Use Stored Partitions button in the Partition
Grid panel. The active cell partition is used for the current calculation,
while the stored cell partition (the last partition performed) is used when
you save a case file. This distinction is made mainly to allow you to partition
a case on one machine or network of machines and solve it on a different one.
See Chapter 31 of the User's Guide for details.
(b) Review the partition statistics.
An optimal partition should produce an equal number of cells in each parti-
tion for load balancing, a minimum number of partition interfaces to reduce
interpartition communication bandwidth, and a minimum number of partition
neighbors to reduce the startup time for communication. Here, you will be
looking for relatively small values of mean cell and face count deviation, and
total partition boundary cell and face count ratio.
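For example, in the partition statistics printed above, each of the two partitions
contains exactly 11329 cells (a mean cell count deviation of 0.0%), the partitions
share a single interface of 2342 faces, and each partition has only one neighbor,
so all three criteria are met for this case.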
(c) Close the Partition Grid panel.
5. Examine the partitions graphically.
(a) Initialize the solution using the default values.
Solve → Initialize → Initialize...
In order to use the Contours panel to inspect the partition you just created,
you have to initialize the solution, even though you are not going to solve the
problem at this point. The default values are sufficient for this initialization.
(b) Display the cell partitions (Figure 25.3).
Display → Contours...
i. Enable Filled in the Options group box.
ii. Select Cell Info... and Active Cell Partition from the Contours of drop-down
lists.
iii. Select symmetry from the Surfaces selection list.
iv. Set Levels to 2, which is the number of compute nodes.
v. Click Display and close the Contours panel.
Figure 25.3: Cell Partitions

As shown in Figure 25.3, the cell partitions are acceptable for this problem.
The position of the interface reveals that the criteria mentioned earlier will be
matched. If you are dissatisfied with the partitions, you can use the Partition
Grid panel to repartition the grid. Recall that, if you wish to use the modified
partitions for a calculation, you will need to make the Stored Cell Partition the
Active Cell Partition by either clicking the Use Stored Partitions button in the
Partition Grid panel, or saving the case file and reading it back into FLUENT.
See Section 31.5.4 of the User's Guide for details about the procedure and
options for manually partitioning a grid.
6. Save the case file with the partitioned mesh (elbow4.cas).
File → Write Case...
Step 3: Solution
1. Initialize the flow field using the boundary conditions set at velocity-inlet-5.
Solve → Initialize → Initialize...
(a) Select velocity-inlet-5 from the Compute From drop-down list.
(b) Click Init.
A Warning dialog box will open, asking if you want to discard the data generated
during the first initialization, which was used to inspect the cell partitions.
(c) Click OK in the Warning dialog box to discard the data.
(d) Close the Solution Initialization panel.
2. Enable the plotting of residuals during the calculation.
Solve → Monitors → Residual...
3. Start the calculation by requesting 200 iterations.
Solve → Iterate...
The solution will converge in approximately 180 iterations.
4. Save the data le (elbow4.dat).
File → Write Data...
Step 4: Checking Parallel Performance
Generally, you will use the parallel solver for large, computationally intensive problems,
and you will want to check the parallel performance to determine if any optimization is
required. Although the example in this tutorial is a simple 3D case, you will check the
parallel performance as an exercise.
See Chapter 31 of the User's Guide for details.
Parallel → Timer → Usage
Performance Timer for 179 iterations on 2 compute nodes
Average wall-clock time per iteration: 0.574 sec
Global reductions per iteration: 123 ops
Global reductions time per iteration: 0.000 sec (0.0%)
Message count per iteration: 70 messages
Data transfer per iteration: 0.907 MB
LE solves per iteration: 7 solves
LE wall-clock time per iteration: 0.150 sec (26.1%)
LE global solves per iteration: 2 solves
LE global wall-clock time per iteration: 0.001 sec (0.1%)
AMG cycles per iteration: 12 cycles
Relaxation sweeps per iteration: 479 sweeps
Relaxation exchanges per iteration: 141 exchanges
Total wall-clock time: 102.819 sec
Total CPU time: 308.565 sec
The most accurate way to evaluate parallel performance is by running the same par-
allel problem on 1 CPU and on n CPUs, and comparing the Total wall-clock time
(elapsed time for the iterations) in both cases. Ideally you would want to have the Total
wall-clock time with n CPUs be 1/n times the Total wall-clock time with 1 CPU.
In practice, this improvement will be reduced by the performance of the communication
subsystem of your hardware, and the overhead of the parallel process itself. As a rough
estimate of parallel performance, you can compare the Total wall-clock time with the
CPU time. In this case, the CPU time was approximately 3 times the Total wall-clock
time. For a parallel process run on two compute nodes, this reveals very good parallel
performance, even though the advantage over a serial calculation is small, as expected for
this simple 3D problem.
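A convenient way to summarize such a comparison is in terms of speedup and efficiency:
if T(1) is the Total wall-clock time on 1 CPU and T(n) the Total wall-clock time on
n CPUs, then

speedup = T(1) / T(n)      efficiency = speedup / n

For illustration only (the serial figure below is hypothetical, not taken from an actual
run), if a serial calculation of this case took 190 sec of wall-clock time, the 2-node
run above (102.819 sec) would correspond to a speedup of about 1.85 and a parallel
efficiency of about 92%.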
Note: The wall-clock time, the CPU time, and the ratio of iterations to convergence time
may differ depending on the type of computer you are running (e.g., Windows 32,
Linux 64, etc.).
Step 5: Postprocessing
See Tutorial 1 for complete postprocessing exercises for this example. Here, two plots are
generated so that you can confirm that the results obtained with the parallel solver are the
same as those obtained with the serial solver.
1. Display an XY plot of temperature across the exit (Figure 25.4).
Plot → XY Plot...
(a) Select Temperature... and Static Temperature from the Y Axis Function drop-
down lists.
(b) Select pressure-outlet-7 from the Surfaces selection list.
(c) Click Plot and close the Solution XY Plot panel.
Figure 25.4: Temperature Distribution at the Outlet (Static Temperature (K) vs. Position (in))
2. Display filled contours of the custom field function dynam-head (Figure 25.5).
Display → Contours...
(a) Select Custom Field Functions... from the Contours of drop-down list.
The custom field function you created in Tutorial 1 (dynam-head) will be selected
in the lower drop-down list (its definition is recalled after these steps).
(b) Enter 80 for Levels.
(c) Select symmetry from the Surfaces selection list.
(d) Click Display and close the Contours panel.
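Recall that dynam-head is the custom field function defined in Tutorial 1 as the
dynamic head of the flow, density × velocity-magnitude² / 2. If for some reason it
is not present in your case file, it can be recreated with Define → Custom Field
Functions... before displaying the contours.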
Figure 25.5: Contours of the Custom Field Function, Dynamic Head
Summary
This tutorial demonstrated how to solve a simple 3D problem using FLUENT's parallel
solver. Here, the automatic grid partitioning performed by FLUENT when you read the
mesh into the parallel version was found to be acceptable. You also learned how to check
the performance of the parallel solver to determine if optimizations are required.
See Section 31.6 of the User's Guide for additional details about using the parallel solver.