Parallel Computational Fluid Dynamics '99, edited by D. Keyes, A. Ecer, and N. Satofuka


PLENARY PAPERS

Parallel multigrid solution and optimization in compressible flow simulation and design

J.-A. Désidéri, L. Fournier, S. Lanteri and N. Marco,     INRIA Projet Sinus, 2004 Route des Lucioles, 06902 Sophia-Antipolis Cedex

B. Mantel and J. Périaux,     Dassault Aviation, France

J.F. Wang,     Nanjing Institute of Aeronautics and Astronautics, China

This paper describes recent achievements regarding the development of parallel multi-grid (PMG) methods and parallel genetic algorithms (PGAs) in the framework of compressible flow simulation and optimization. More precisely, this work is part of a broad research objective aimed at building efficient and robust optimization strategies for complex multi-disciplinary shape-design problems in aerodynamics. Ultimately, such a parallel optimization technique should combine the following ingredients:

I1. parallel multigrid methods for the acceleration of the underlying flow calculations;

I2. domain decomposition algorithms for an efficient and mathematically well posed distribution of the global optimization process on a network of processors;

I3. robust parallel optimization techniques combining non-deterministic algorithms (genetic algorithms in the present context) and efficient local optimization algorithms to accelerate the convergence to the optimal solution;

I4. distributed shape parametrization techniques.

In this contribution, we address mainly topics I1 and I3 in the context of optimal airfoil design.

1 PARALLEL STRATEGIES IN GENETIC ALGORITHMS

1.1 Introduction

Genetic algorithms (GAs) are search algorithms based on mechanisms that mimic natural selection, by analogy with Darwin's principle of survival of the fittest. John Holland introduced, in the 1970s, the idea that difficult optimization problems could be solved by such an evolutionary approach. The technique operates on a population of potential solutions represented by strings of binary digits (called chromosomes or individuals), which are subjected to several semi-stochastic operators (selection, crossover and mutation). The population evolves over the generations according to the fitness values of the individuals; when a stationary state is reached, the population has converged to an optimized solution (see [3] for an introduction to the subject). GAs differ from classical optimization procedures, such as the steepest-descent or conjugate-gradient methods, in many ways:

- the entire parameter set is coded;

- the iteration applies to an entire population of potential solutions, in contrast to classical algorithms, in which a single candidate solution is driven to optimality by successive steps;

- the iteration is an evolution step, or new generation, conducted by semi-stochastic operators;

- the search space is investigated (more) globally, enhancing robustness;

- two keywords are linked to GAs: exploration and exploitation. Exploration of the search space matters most at the beginning of the GA process, while exploitation becomes desirable once the process is close to the global optimum.
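The generation loop described above (selection, crossover, mutation over a binary-coded population) can be sketched in a few lines of Python. The one-max fitness used below is a stand-in for illustration, not one of the aerodynamic functionals of this paper:

```python
import random

def run_ga(fitness, n_bits=16, pop_size=20, generations=60,
           p_cross=0.8, p_mut=0.02, seed=0):
    """Minimal binary GA: tournament selection, one-point crossover, mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select(scored):
        # binary tournament: the fitter of two random individuals wins
        a, b = rng.sample(scored, 2)
        return list(a[1] if a[0] >= b[0] else b[1])

    for _ in range(generations):
        scored = [(fitness(c), c) for c in pop]
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(scored), select(scored)
            if rng.random() < p_cross:          # one-point crossover
                cut = rng.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                # bit-flip mutation with probability p_mut per bit
                nxt.append([b ^ (rng.random() < p_mut) for b in child])
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

# toy fitness: "one-max" (count of 1-bits); the GA drives the population
# toward the all-ones chromosome
best = run_ga(lambda c: sum(c))
```

The tournament operator embodies the exploration/exploitation trade-off noted above: weak selection pressure early on keeps the search global, while repeated tournaments concentrate the population near the optimum.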

GAs were introduced into aerodynamic shape-design problems about fifteen years ago (see Kuiper et al. [4], Périaux et al. in [9], Quagliarella in [10] and Obayashi in [8], who presents 3D results for a transonic flow around a wing geometry). The main concern related to the use of GAs for aerodynamic design is the computational effort needed for the accurate evaluation of a design configuration which, in a crude application of the technique, might lead to unacceptable computer times compared with more classical algorithms. In addition, hard problems need larger populations, and this translates directly into higher computational costs. It is widely accepted that GAs can be effectively parallelized and can in principle take full advantage of (massively) parallel computer architectures. This point of view is above all motivated by the fact that, within a generation (iteration) of the algorithm, the fitness values associated with the individuals of the population can be evaluated in parallel.

In this study, we have developed a shape optimum design methodology that combines the following ingredients:

- the underlying flow solver discretizes the Euler or full Navier-Stokes equations using a mixed finite element/finite volume formulation on triangular meshes. Time integration to steady state is achieved using a linearized Euler implicit scheme which results in the solution of a linear system for advancing the solution to the next time step;

- a binary-coded genetic algorithm is used as the main optimization kernel. In our context, the population of individuals is represented by airfoil shapes. The shape parametrization strategy is based on Bézier curves.
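As an illustration of the Bézier-based shape parametrization, a curve point can be evaluated by de Casteljau's algorithm. The control polygon below is hypothetical, not the paper's airfoil data:

```python
def bezier_point(control_points, t):
    """Evaluate a Bézier curve at parameter t via de Casteljau's algorithm:
    repeated linear interpolation of the control polygon."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

# hypothetical upper-surface control polygon: the abscissas are prescribed,
# the ordinates are the design variables encoded in the chromosome
upper = [(0.0, 0.0), (0.1, 0.06), (0.5, 0.08), (1.0, 0.0)]
mid = bezier_point(upper, 0.5)
```

Only the ordinates of the control points need to be coded in the chromosome, which keeps the number of design variables small and the resulting shapes smooth.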

1.2 Parallelization strategy

Several possible strategies can be considered for the parallelization of the GA-based shape design optimization described above:

- a first strategy stems from the following remark: within a given generation of a GA, the evaluations of the fitness values associated with the population of individuals define independent processes. This makes GAs particularly well suited to massively parallel systems; we also note that a parent/child approach is a standard candidate for the implementation of this first level of parallelism, especially when the size of the population is greater than the available number of processors;

- a second strategy consists of concentrating the parallelization effort on the process underlying a fitness value evaluation, here the flow solver. This approach finds its main motivation in the fact that, when complex field analysers are used in conjunction with a GA, the aggregate cost of fitness evaluations can represent between 80% and 90% of the total optimization time. An SPMD paradigm is particularly well suited for the implementation of this strategy;

- the third option combines the above two approaches and clearly yields a two-level parallelization strategy; it is the one considered here and detailed in the sequel. Our choice has been motivated by the following remarks: (1) a parallel version of the two-dimensional flow solver was available and adapted to the present study; (2) we targeted a distributed-memory SPMD implementation and did not want the resulting optimization tool to be limited by memory-capacity constraints, especially since the present study will find its sequel in an adaptation to 3D shape-optimization problems based on more complex aerodynamical models (Navier-Stokes equations coupled with a turbulence model); and (3) we believe that the adopted parallelization strategy will define a good starting point for the construction and evaluation of sub-population-based parallel genetic algorithms (PGAs).
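The first strategy, parallel evaluation of the fitness values within a generation, can be sketched independently of MPI with a worker pool. A thread pool stands in here for the parallel processes of the paper; the quadratic fitness is a placeholder:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_population(pop, fitness, max_workers=4):
    """Evaluate all fitness values of one generation concurrently.

    Within a generation the evaluations are independent, so they can be
    dispatched to a pool of workers; Executor.map returns the results in
    the order of the input population."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fitness, pop))

# toy population of 8 "individuals" scored by a placeholder fitness
scores = evaluate_population(range(8), lambda x: x * x)
```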

In our context, the parallelization strategy adopted for the flow solver combines domain-partitioning techniques and a message-passing programming model [6]. The underlying mesh is assumed to be partitioned into several submeshes, each one defining a subdomain, and the same base serial code is executed within every subdomain. Applying this strategy to the flow solver results in modifications occurring in the main time-stepping loop, in order to take into account one or several assembly phases of the subdomain results. The coordination of subdomain calculations through information exchange at artificial boundaries is implemented using calls to functions of the MPI library.

The parallelization described above aims at reducing the cost of the fitness evaluation for a given individual. However, another level of parallelism can clearly be exploited here, directly related to the binary-tournament approach and the crossover operator. In practice, during each generation, individuals of the current population are treated pairwise; this applies to the selection, crossover, mutation and fitness-evaluation steps. The main remark is that, for this last step, the evaluations of the fitness functions associated with the two selected individuals define independent operations. We have chosen to exploit this fact using the notion of process groups, one of the main features of the MPI environment. Two groups are defined, each containing the same number of processes; this number is given by the number of subdomains in the partitioned mesh. Each group is then responsible for the evaluation of the fitness function of a given individual. We note in passing that such an approach based on process groups will also be of interest in the context of sub-population-based PGAs (see [1] for a review of the subject); this will be considered in future work.
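The splitting of the Np processes into Ng process groups amounts to simple rank arithmetic; the helper below is a hypothetical illustration of the mapping, not the MPI group-management calls themselves:

```python
def group_of(rank, n_procs, n_groups):
    """Map a global process rank to (group id, rank within group).

    Mirrors splitting a communicator into n_groups equal groups, each
    group holding one copy of the partitioned mesh (one subdomain per
    process) and evaluating the fitness of one individual."""
    assert n_procs % n_groups == 0, "groups must be equally sized"
    group_size = n_procs // n_groups
    return rank // group_size, rank % group_size

# Ng = 2 groups over Np = 4 processes: ranks 0-1 form group 0, ranks 2-3 group 1
layout = [group_of(r, 4, 2) for r in range(4)]
```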

1.3 An optimum shape design case

The method has been applied to a direct optimization problem consisting of designing the shape of an airfoil, symbolically denoted by γ, so as to reduce the shock-induced drag, CD, while preserving the lift, CL, of the RAE2822 airfoil immersed in an Eulerian flow at 2° of incidence and a freestream Mach number of 0.73. Thus, the cost functional was given the following form:

(1)

The non-linear convergence tolerance has been set to 10⁻⁶. The computational mesh consists of 14747 vertices (160 vertices on the airfoil) and 29054 triangles. Here, each chromosome represents a candidate airfoil defined by a Bézier spline whose support is made of 7+7 control points at prescribed abscissas for the upper and lower surfaces. A population of 30 individuals has been considered. After 50 generations, the shape has evolved and the shock has been notably reduced; the initial and final flows (iso-Mach values) are shown in Figure 1. Additionally, initial and final values of CD and CL are given in Table 1.

Table 1

Drag reduction: initial and optimized values of the CD and CL coefficients

Figure 1 Drag reduction: initial and optimized flows (steady iso-Mach lines)

The calculations have been performed on the following systems: an SGI Origin 2000 (equipped with 64 MIPS R10000/195 MHz processors) and an experimental Pentium Pro (P6/200 MHz, running the Linux system) cluster whose interconnection is realized through Fast Ethernet (100 Mbit/s) switches. The native MPI implementation has been used on the SGI Origin 2000 system, while MPICH 1.1 has been used on the Pentium Pro cluster. Performance results are given for 64-bit arithmetic computations.

We compare timing measurements for the overall optimization using one and two process groups. Timings are given for a fixed number of generations (generally 5 optimization iterations). In Tables 2 and 3 below, Ng and Np respectively denote the number of process groups and the total number of processes (Ng = 2 and Np = 4 means 2 processes for each of the two groups), CPU is the total CPU time, Flow is the accumulated flow solver time, and Elapsed is the total elapsed time (the distinction between the CPU and the elapsed times is particularly relevant for the Pentium Pro cluster). Finally, S(Np) is the parallel speed-up ratio Elapsed(Ng = 1, Np = 5)/Elapsed(Ng, Np), the case Ng = 1, Np = 5 serving as a reference. For the multiple processes cases, the given timing measures (CPU and Flow) always correspond to the maximum value over the per-process measures.
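With the elapsed times in hand, the speed-up is a single ratio. The timing values below are hypothetical, not those of Tables 2 and 3:

```python
def speedup(elapsed_ref, elapsed):
    """Parallel speed-up S(Np) = Elapsed(reference case) / Elapsed(Ng, Np),
    with the Ng = 1, Np = 5 run serving as the reference."""
    return elapsed_ref / elapsed

# hypothetical elapsed times (seconds): reference run vs a 2-group run
s = speedup(1200.0, 640.0)
```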

Table 2

Parallel performance results on the SGI Origin 2000

Table 3

Parallel performance results on the Pentium Pro cluster

For both architectures, the optimal speed-up of 2 is close to being achieved. For the Pentium Pro cluster, the communication penalty is larger, and this favors the usage of fewer processors and more groups. For the SGI Origin 2000, the situation is different: communication only involves memory access (shared memory), and parallelization remains efficient as the number of processors increases; moreover, additional gains are achieved due to the larger impact of cache memory when subdomains are small.

1.4 High-lift multi-element airfoil optimization by GAs and multi-agent strategies

In this section, we report on numerical experiments conducted to optimize the configuration (shape and position) of a high-lift multi-element airfoil, by both conventional GAs and more novel ones based on multi-agent strategies better suited to parallel computation. These experiments have also been described, in part, in [7].

The increased performance requirements for high-lift systems, as well as the availability of (GA-based) optimization methods, tend to renew the emphasis on multi-element aerodynamics. High-lift systems, as depicted in Figure 2, consist of a leading-edge device (slat), whose effect is to delay stall, and a trailing-edge device (flap), which increases the lift while maintaining a high L/D ratio.

Figure 2 Geometry of multi-element airfoil including slat, main body and flap (shape and position definition)

The lift coefficient CL of such an airfoil is very sensitive to the flow features around each element and to its position relative to the main body; in particular, the location of the separation point can change rapidly owing to the wake/boundary-layer interaction. As a result, the functional is non-convex and presents several local optima, making a robust algorithm necessary for a successful optimization.

Here, the 2D flow computation is conducted with the Dassault Aviation code Damien, which combines an inviscid flow calculation by a panel method with a wake/boundary-layer interaction evaluation. This code, which incorporates data concerning transition criteria, separated zones and wake/boundary-layer interaction, has been thoroughly calibrated and globally validated through comparisons with ONERA wind-tunnel measurements. As a result, viscous effects can be computed, and this provides, at very low cost, a fairly accurate determination of the aerodynamic coefficients. In counterpart to this simplified numerical model, the flow solver is non-differentiable and can only be treated as a black box in the optimization process. Evolutionary algorithms are thus a natural choice to conduct the optimization in a situation of this type.

Figure 3 relates to a first experiment in which only the 6 parameters determining the positions of the two high-lift devices (slat and flap) relative to the main body (deflection angle, overlap and gap) have been optimized by a conventional GA, similar to that of the previous section. Provided reasonable ranges are given for each parameter, an optimum solution is successfully found by the GA, corresponding to an improved lift coefficient of 4.87.

Figure 3 Initial (top) and final (bottom) configurations in position optimization problem

The second experiment is a first step towards the optimization of the entire configuration, consisting of the shapes of the three elements and the positions of both slat and flap. More precisely, only the inverse problem of reconstructing a given pressure field is considered at present. If γS, γB and γF denote the design variables associated with the slat, main body and flap respectively, and γ = (γS, γB, γF) represents a candidate configuration, one minimizes the following functional:

J(γ) = JS(γ) + JB(γ) + JF(γ)   (2)

in which, for example:

JS(γ) = ∫γS (P(γ) − Pt)² ds   (3)

is a positive integral extending over one element only (the slat) but reflecting, in the integrand, the interaction of the whole set of design variables. Here, Pt is the given target pressure; similar definitions hold for JB(γ) and JF(γ).
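A discrete counterpart of these per-element functionals, using hypothetical surface pressure samples and quadrature weights, might look as follows:

```python
def element_cost(p, p_target, ds):
    """Discrete counterpart of JS: squared pressure mismatch summed over
    one element's surface points, weighted by the arc-length elements ds."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(p, p_target, ds))

def total_cost(costs):
    """J(gamma): sum of the per-element contributions (slat, body, flap)."""
    return sum(costs)

# toy data: the computed pressure matches the target at two of three points
js = element_cost([1.0, 2.0, 3.0], [1.0, 2.0, 2.5], [0.1, 0.1, 0.1])
```

Because every contribution is non-negative, the global optimum of the inverse problem corresponds to each element's cost being driven to zero, a property used again in the remarks on Nash strategies below.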

In order to reduce the number of design variables and enforce smoothness of the geometry, shapes are represented by Bézier curves.

This inverse problem has first been solved successfully by a global GA, in which the chromosomes contain the coded information associated with all design variables indiscriminately. The convergence history of this experiment is shown in Figure 4 for the first 200 generations.

Figure 4 Optimization of shape and position design variables by various strategies all involving the same number of cost functional evaluations

An alternative to this approach is provided by optimization algorithms based on (pseudo-)Nash strategies, in which the design variables are partitioned a priori into appropriate subsets. The population attached to a given subset evolves independently at each new generation according to its own GA, the remaining design variables being held fixed and equal to the best elements of their respective populations found at the previous generation. Evidently, in such an algorithm, the computational tasks of the different GAs, or players, to use a term from game theory, can be performed in parallel [11].

Many different groupings of the design variables can be considered, two of which are illustrated in Figure 4: a 3-player strategy (slat shape and position; body shape; flap shape and position) and a 5-player strategy (slat, body and flap shapes; slat and flap positions). The two algorithms achieve about the same global optimum, and the parameters of the GAs (population size) have been adjusted so that the endpoints of the two convergence paths correspond to the same number of functional evaluations as 100 generations of the global algorithm. Thus, this experiment simulates a comparison of algorithms at equal serial cost. It demonstrates the effectiveness of the multi-agent approach, which achieves the global optimum by computations that could evidently be performed in parallel.
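The round structure of such a pseudo-Nash iteration can be sketched as follows. A coarse 1-D scan stands in for each player's GA, and the cost function is a toy inverse problem, not the airfoil functional:

```python
def nash_play(players, best, rounds=20):
    """Pseudo-Nash iteration: at each round every player re-optimizes its
    own design variable, the other players' variables held fixed at the
    best values of the previous round.  `players[i]` maps the previous
    best vector to a new value for variable i.  (A sketch; the paper runs
    one GA per player, and the players can run in parallel.)"""
    for _ in range(rounds):
        prev = list(best)
        for i, optimize in enumerate(players):
            best[i] = optimize(prev)
    return best

def scan(f, lo=-5.0, hi=5.0, n=2001):
    """Coarse 1-D line search standing in for a per-player GA."""
    return min((lo + k * (hi - lo) / (n - 1) for k in range(n)), key=f)

# toy inverse problem with global minimum 0 at (x, y) = (1, 2)
cost = lambda x, y: (x - 1) ** 2 + (y - 2) ** 2 + (x * y - 2) ** 2
players = [lambda v: scan(lambda x: cost(x, v[1])),
           lambda v: scan(lambda y: cost(v[0], y))]
best = nash_play(players, [0.0, 0.0])
```

Since every term of this toy cost can be driven to zero simultaneously, the Nash iteration reaches the global optimum here, in line with the second remark below.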

We conclude this section with two remarks concerning Nash strategies.

First, in a preliminary experiment related to the slat/flap position inverse problem, a 2-player game in which the main body (γB) is fixed, an attempt was made to evaluate the fitness of the population of γS by the slat contribution JS alone (the flap variables being set to the best γF found at the previous generation), and symmetrically for γF. Figure 5 indicates that, in such a case, the algorithm fails to achieve the desired global optimum.

Figure 5 Effect of the fitness definition on the convergence of slat/flap position parameters

Second, observe that in the case of a general cost function, a Nash equilibrium, in which a minimum is found with respect to each subgroup of variables, the other variables being held fixed, does not necessarily realize a global minimum. For example, in the simple case of a function f(x, y) of two real variables, a standard situation in which the partial functions:

φ(x) = f(x, y*) and ψ(y) = f(x*, y)   (4)

achieve local minima at x* and y* respectively, is realized typically when:

fx(x*, y*) = fy(x*, y*) = 0,  fxx(x*, y*) > 0,  fyy(x*, y*) > 0   (5)

and this does not imply that the Hessian matrix be positive definite. However, in the case of an inverse problem in which each individual positive component of the cost function is driven to 0, the global optimum is indeed achieved.
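A concrete instance of this remark, assuming the simple function f(x, y) = x² + y² + 3xy (an illustrative choice, not taken from the paper): both partial functions are minimized at the origin, yet the Hessian [[2, 3], [3, 2]] is indefinite there, so (0, 0) is a saddle of f, a Nash equilibrium that is not a global minimum:

```python
# f restricted to either axis is minimized at 0, so (0, 0) is a Nash point;
# along the diagonal y = -x, however, f is negative, so (0, 0) is a saddle
f = lambda x, y: x ** 2 + y ** 2 + 3 * x * y

along_axes = [f(t, 0.0) for t in (-0.1, 0.1)]   # values of the partial functions
along_diag = [f(t, -t) for t in (-0.1, 0.1)]    # values along the diagonal
```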

2 PARALLEL MULTIGRID ACCELERATION

2.1 Introduction

Reducing the time spent in flow calculations is clearly crucial to make GAs a viable alternative to other optimization techniques. One possible strategy to achieve this goal consists in using a multigrid method to accelerate the solution of the linear systems resulting from the linearized implicit time-integration scheme. As a first step, we have developed parallel linear multigrid algorithms for the acceleration of compressible steady flow calculations, independently of the optimization framework; this is justified by the fact that the flow solver is mainly used as a black box by the GA. The starting point is an existing flow solver based on the averaged compressible Navier-Stokes equations coupled with a k–ε turbulence model [2]. The spatial discretization combines finite element and finite volume concepts and is designed for unstructured triangular meshes.

Steady state solutions of the resulting semi-discrete equations are obtained by using an Euler implicit time advancing strategy which has the following features: linearization (approximate linearization of the convective fluxes and exact differentiation of the viscous terms); preconditioning (the Jacobian matrix is based on a first-order Godunov scheme); and local time stepping and CFL law (a local time step is computed on each control volume). Each pseudo time step requires the solution of two sparse linear systems (respectively, for the mean flow variables and for the variables associated with the turbulence model).
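The structure of one such pseudo time step can be sketched on a scalar model problem. The residual below is a toy, not the discretized flow equations, and the scalar division stands in for the sparse linear solve:

```python
def implicit_steady(r, dr, u0, dt=10.0, steps=50):
    """Linearized Euler implicit pseudo-time stepping toward R(u) = 0.

    Each step solves the (here scalar) linear system
        (1/dt - dR/du) * du = R(u),   then   u <- u + du.
    The flow solver solves the analogous sparse system for all unknowns."""
    u = u0
    for _ in range(steps):
        du = r(u) / (1.0 / dt - dr(u))
        u += du
    return u

# toy residual R(u) = 1 - u^2 with steady state u = 1, starting from u = 2
u = implicit_steady(lambda u: 1 - u * u, lambda u: -2 * u, 2.0)
```

For large dt the step approaches a Newton iteration on R(u) = 0, which is why accelerating the linear solves dominates the cost of reaching steady state.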

The multigrid strategy is adopted to gain efficiency in the solution of these two subsystems. In the present method, the coarse-grid approximation is based on the construction of macro-elements, more specifically macro control-volumes obtained by volume agglomeration. Starting from the finest mesh, a greedy coarsening algorithm is applied to automatically generate the coarse discretizations (see Lallemand et al. [5]).
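A greedy agglomeration sweep in the spirit of the coarsening described above can be sketched on a cell-adjacency graph; the 1-D chain mesh below is a toy example:

```python
def greedy_agglomerate(adjacency):
    """Greedy volume-agglomeration coarsening: sweep the fine cells in
    order; each still-unassigned cell seeds a new macro-cell and absorbs
    its unassigned neighbours.  Returns (fine-to-coarse map, macro count)."""
    coarse_of = {}
    n_macro = 0
    for cell in sorted(adjacency):
        if cell in coarse_of:
            continue
        coarse_of[cell] = n_macro
        for nb in adjacency[cell]:
            if nb not in coarse_of:
                coarse_of[nb] = n_macro
        n_macro += 1
    return coarse_of, n_macro

# 1-D chain of 6 control volumes: 0-1-2-3-4-5
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
coarse, n = greedy_agglomerate(chain)
```

Applied recursively, such sweeps produce the hierarchy of coarse levels; the parallel variant mentioned below must additionally coordinate the sweeps across subdomain boundaries.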

Parallelism is introduced in the overall flow solver by using a strategy that combines mesh-partitioning techniques and a message-passing programming model; the MPI environment is used for the implementation of the required communication steps. Both the discrete flux calculations and the linear system solutions are performed on a submesh basis. In particular, for the basic linear multigrid algorithm, which is multiplicative (i.e. the different levels are treated in sequence, with inter-dependencies between the partial results produced on the different levels), this can be viewed as an intra-level parallelization which concentrates on the smoothing steps performed on each member of the grid hierarchy. A necessary and important step in this adaptation was the construction of appropriate data structures for the distribution of coarse-grid calculations. Here, this has been achieved by developing a parallel variant of the original greedy-type coarsening algorithm, which now includes additional communication steps for a coherent construction of the communication data structures on the partitioned coarse grids.