DETC/CIE2015-46476
Cameron Turner
College of Engineering and Computer Science
Colorado School of Mines
Golden CO 80401, USA
ABSTRACT
In this paper we address the particular need for high-speed or "real-time" characterization of realistic anisotropic material systems such as laminated composites. This is driven by the desire to dynamically alter the loading paths applied by a multiaxial robotic test frame during the testing of a specimen, so that strain states are developed in the specimen in a manner that activates the maximum excitation of the specimen's constitutive properties. In order to achieve this goal, we present an evolutionary adaptation of earlier work into computationally efficient material characterization using response-surface surrogate models. This approach is enhanced by the adoption of highly parallel General-Purpose Graphics Processing Unit (GPGPU) computing. We discuss the challenges of adapting the characterization problem for GPGPU computing, particularly in terms of parallelization, synchronization, and approximation. Two parallelized algorithms for characterization are developed, and the merits of each are discussed. We then demonstrate validation results on a simple linear-elastic material system, and present statistical data which demonstrate the robustness of the approach in the presence of experimental noise. We conclude with remarks regarding the performance of the GPGPU-enabled characterization algorithm, and its applicability to more complex material systems.

INTRODUCTION
Composite materials play an important role in many engineering design and manufacturing disciplines. As the usage of these materials continues to grow, the development of computational tools that allow deeper understanding of these materials and their responses is necessary. The problem of characterization is particularly important, both to analyze the properties of novel material systems and the as-built properties of more conventional analogs. One of our current research needs is the development of "real-time" characterization algorithms which can compute the properties of test specimens at least several times per second. The development of real-time characterization is desirable as it will allow multiaxial robotic test frames, such as the "NRL66.3" that was custom-designed at the Naval Research Laboratory [1, 2] shown in Fig. 1, to apply dynamically varying direction load paths in order to properly excite all of the constitutive parameters of the specimen undergoing testing. In this context we use the term "excite" to refer to the development of strain states in the test specimen which produce high sensitivity in the material's response relative to its constitutive properties. This feature will essentially remove the requirement to use one specimen per linear loading path as it will enable a non-linear but optimized loading path and therefore will lead to lower numbers of material specimens than the ones used under the current practices,
This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States.
Approved for public release; distribution is unlimited.
research on their use in materials characterization. In the subsequent section, we discuss the opportunities for parallelization of the characterization problem. Expanding on this theme, we then discuss the topics of parallelization, synchronization, and approximation in depth, and develop a pair of parallel algorithms for characterization. Following this, we give some sample results from the real-time characterization of a linear-elastic specimen. Statistical analyses of these results are presented to demonstrate the robustness of the approach in the presence of experimental noise. In a final section, we present our conclusions regarding GPGPU-accelerated material characterization, and focus in particular on the immediate extension of the proposed methodology to more complex physical systems.
FIGURE 2. Illustration of the FEA-in-the-loop numerical/experimental method for material characterization.

of this method is that it is limited to cases where such fields are derivable. Another recent technique reduces the problem to that of finding the pseudoinverse of a matrix formed using the forward material model by taking advantage of the full-field strain measurements achieved via the Meshless Random Grid Method (MRGM) [26]. Energy-based characterization methods using total and dissipated strain energy density determination [27,28], or taking advantage of the total potential energy [29], have also been demonstrated. Nonparametric constitutive models of anisotropic materials have also been used to identify elastic constants. The use of stochastic corrections to parameter estimates using Kalman filtering is shown in [30]. Another approach based on the use of artificial neural networks [31] is presented in [32] and further demonstrated in [33].

In the present work we focus on the acceleration of the FEA-in-the-loop characterization approach (as opposed to defining a new method for characterization) as it is conceptually straightforward. Figure 2 illustrates this technique: a physical experiment produces an experimental dataset and is associated with a material system model (finite element or otherwise) of the tested specimen. Initial constitutive parameters for the material system are estimated using any a priori knowledge available. From this initial estimate, an optimization algorithm iteratively adjusts this model until the model response matches the experimental data. Once this has been achieved, the model parameters are extracted and reported.

CHARACTERIZATION USING SURROGATE MODELS
One approach to accelerating inverse material characterization is to remove the computationally expensive combination of the FEM model with the constitutive model from the optimization loop and replace it with an efficient surrogate, as seen in Fig. 3. Generally speaking, surrogate models are computationally efficient but physics-agnostic representations of more expensive physics-specific models which are generated by sampling the underlying model and subsequently fitting some functional form to the sampled data. They can also be conceptualized as a model of another model. This is the reason the term metamodel is frequently found in the literature, especially in the field of design engineering. We avoid the use of this term, however, as it conflicts with an existing definition in the field of mathematical logic.

A wide variety of surrogate models exists. They can be broadly grouped into three families [3, 34]. Rigorous comparisons of the different aspects of these surrogate modeling techniques are presented in [34, 35]. Generally, there are tradeoffs associated with the selection of a surrogate model type; for instance, a technique capable of approximating highly non-linear data may be more computationally expensive than other alternatives. In this work we use Response Surface Model (RSM) surrogates of material constitutive models composed with the shape of structural components. This choice is driven by the relatively compact and computationally efficient nature of these models. In the next section, we turn to the discussion of how these surrogate models are used to characterize materials.

Surrogate Characterization Methodology
Our characterization approach employs a mixed experimental-numerical methodology. We assume a known model of the material system that operates over a physical domain $\Omega \subset \mathbb{R}^3$ (i.e. a test specimen such as that shown in Fig. 4), and we approach characterizing this system as the task of determining a set of independent constitutive parameters $a$ such that,

$$ {}^{t}\varepsilon(x_t, f_t) + \varepsilon^{e} + \varepsilon^{r} = {}^{c}\varepsilon(a, x_c, f_c) + \varepsilon^{m}, \tag{1} $$

where ${}^{t}\varepsilon$ is a vector field function that collects experimentally observed strain tensor components at a set of points $x_t \in S_o \subseteq \Omega \subset \mathbb{R}^3$ and ${}^{c}\varepsilon$ is likewise a system model vector function of strain tensor components, which are functions of a constitutive model. The specimen's response is sampled at locations $x_c \in S_o' \subseteq \Omega'$. The domain $\Omega'$ is the computational domain the system model evaluates into. The subdomains $S_o$ and $S_o'$ serve the purpose of providing an optimization space for the inverse problem that is compatible with the capabilities of experimental measurement techniques. The parameters $f_c$ dictate the specific loading of the test specimen. Given a well-designed experimental/numerical setup, it is reasonable to assume that $x_t$ can be taken to coincide with $x_c$. Similarly, the loading of the experimental and numerical forms can be considered to be approximately equal, $f_t \approx f_c$. The terms $\varepsilon^{e}$, $\varepsilon^{r}$, and $\varepsilon^{m}$ represent respectively systematic experimental errors, random experimental errors, and errors due to system model formulation. In a well designed experiment and a properly constructed computational
system model, these errors should be relatively small in magnitude.

[FIGURE 3: flowchart. Stage labels: Preprocessing (Evaluate System Model, Sampled Data, Construct Surrogate Model); Conduct Physical Experiment, Experimental Data; Inverse Characterization (Constitutive Properties, Evaluate Surrogate Model, Model Data, Converged? Yes/No, Adjust Constitutive Properties).]

[FIGURE 4: specimen schematic. Labels: Material domain ($\Omega$); Domain surface ($S$); Observable surface region ($S_o$); Observable region boundary; sampling point $x_i$; axes x, y, z; Direction of major orthotropic axis; Multiple-ply layup.]

We note at the outset that ${}^{c}\varepsilon$ is typically computationally expensive to formulate and to evaluate. In some simple cases, analytical constitutive models are available, but ${}^{c}\varepsilon$ is more typically formulated via the solution of a FEA model. In order to achieve a more computationally efficient characterization, we introduce a surrogate model, ${}^{m}\varepsilon$, of the original strain response model ${}^{c}\varepsilon$ that involves the original constitutive model. Therefore, Eq. (1) becomes,

$$ {}^{t}\varepsilon(x_t, f_t) + \varepsilon^{e} + \varepsilon^{r} = {}^{m}\varepsilon(\bar{a}, x_c, f_c) + \varepsilon^{m} + \varepsilon^{s} \tag{2} $$

where $\bar{a}$ are the approximated material parameters of the surrogate model and $\varepsilon^{s}$ denotes additional error introduced by the surrogate model. Successful characterization depends on constructing ${}^{m}\varepsilon$ such that it

1. represents ${}^{c}\varepsilon$ sufficiently well, i.e. $\varepsilon^{s}$ is small;
2. can be evaluated with minimum computational expense;
3. is robust with respect to experimental noise in the sampling of ${}^{t}\varepsilon$, i.e. increases in $\varepsilon^{e}$ or $\varepsilon^{r}$ have little effect on the results produced;
4. does not require a priori assumptions on the values of $\bar{a}$ or the structure of ${}^{c}\varepsilon$.

If the physical experiment, underlying model, and surrogate model are all well designed, then all of the error terms will be small. Combined with the choice of common measurement locations, geometry and loadings between the experiment and model, Eq. (4.9) can be expressed compactly as

Definition of the Surrogate Model
The development and definition of the surrogate model are largely driven by experimental considerations. We assume that the test specimen is represented by a domain $\Omega \subset \mathbb{R}^3$ with a surface boundary $S$. Since mechanical elements of the physical test apparatus (e.g. the grips) typically do not permit experimental measurements on the entirety of $S$, we define the observable subset $S_o \subset S$ as shown in Fig. 4. The experimental measurement locations are thus restricted to $x \in S_o$.

Because FEA-based system models evaluate to a set of domain nodes, we also assume that our sampling occurs on a discrete number of such points represented as vectors $x_i \in S_o$, $i = 1 \ldots n_m$. Consequently, our data consists of a set of $n_m$ individual surrogate models, ${}^{m}\varepsilon_i$, each of which approximates ${}^{c}\varepsilon$ at a single point $i$ represented by a vector $x_i$.

$$ {}^{m}\varepsilon_i(\bar{a}) = {}^{m}\varepsilon(\bar{a}, x_i), \quad i = 1 \ldots n_m \tag{4} $$

The purpose of each of these surrogates is to generate a vector of scalars measurable at a point on the material specimen that depend on the constitutive parameters. We have decided to use a full-field strain measurement methodology and therefore construct these surrogates to consist of the measured surface strain tensor components of the form,

$$ {}^{m}\varepsilon_i(\bar{a}) = \left[ {}^{m}\varepsilon^{i}_{xx},\ {}^{m}\varepsilon^{i}_{yy},\ {}^{m}\varepsilon^{i}_{xy} \right]^{T}, \quad i = 1 \ldots n_m \tag{5} $$

Based on Eq. 3, such a problem can be formulated as,

$$ \min\ h(\bar{a}) = \int_{S_o} \left\lVert {}^{m}\varepsilon(\bar{a}, x) - {}^{t}\varepsilon(x) \right\rVert \, dS \tag{8} $$
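Because sampling occurs at discrete points $x_i$, the surface integral in Eq. (8) is evaluated in practice as a sum over the sample locations. The following is a minimal sketch of that discrete objective, assuming a Euclidean norm (the text does not specify one) and two purely hypothetical linear surrogates created by `make_surrogate`; none of these names come from the original implementation.

```python
import math

def objective(surrogates, measured, a):
    """Discrete analogue of Eq. (8): sum over sample points of ||m_i(a) - t_i||.

    surrogates: list of callables, each returning (exx, eyy, exy) for parameters a
    measured:   list of measured (exx, eyy, exy) tuples, one per sample point
    The Euclidean norm is an assumption made for this sketch.
    """
    total = 0.0
    for m_i, t_i in zip(surrogates, measured):
        predicted = m_i(a)
        total += math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, t_i)))
    return total

def make_surrogate(cx, cy, cxy):
    """Hypothetical surrogate: strain components linear in two parameters."""
    return lambda a: (cx * a[0], cy * a[1], cxy * (a[0] + a[1]))

surrogates = [make_surrogate(1.0, 2.0, 0.5), make_surrogate(3.0, 1.0, 0.25)]
a_true = (2.0, 4.0)
# Synthetic "experimental" data generated from the surrogates themselves
measured = [m(a_true) for m in surrogates]

h_at_truth = objective(surrogates, measured, a_true)
h_perturbed = objective(surrogates, measured, (3.0, 4.0))
```

With noise-free synthetic data the objective vanishes at the true parameters and grows as the estimate moves away from them, which is the property the optimizer relies on.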
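[Figure: a constitutive model (e.g. FEA mesh) is sampled at locations $x_i$ (e.g. FEA nodes) to build surrogate models $m_i$; these are compared with experimental full-field strain measurements ($\varepsilon_{xx}$, $\varepsilon_{yy}$, $\varepsilon_{xy}$) through the objective $\min \sum_{i=1}^{n_m} \lVert m_i(a_1, a_2) - t_i \rVert$.]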
Evaluating the System Model. Each sample of the system model presents an opportunity for parallelization. At the present time, and for most material systems, a forward model evaluation is equivalent to a finite element analysis. This requires the inversion of a linear system which is potentially very large. The possibilities for GPU computing for this application have already been well explored (e.g. [40,41]), and are already offered as features in most commercial FEA packages. For this reason, we chose not to pursue further research in this direction.

Computing Evaluation Batches. As discussed previously, the system model is evaluated many times for different combinations of constitutive material parameters. Because these system model evaluations are independent, it is possible to conduct them in parallel. However, because each evaluation is computationally complex, most GPU architectures will be constrained by memory capacity to conducting only a few simultaneous evaluations. For this reason, it seems that this opportunity for parallelization is better exploited by traditional multi-computer clusters in a distributed memory setting. Many commercial FEA solvers already offer this feature, and allow "parametric sweeps" to be distributed across a cluster of workstation or server computers [42] where each node is responsible for the evaluation of the system model for a particular subset of constitutive parameter values. For these reasons we chose to pursue the rest of the areas of opportunity as enumerated earlier. It should be noted, however, that the latest generation of GPU hardware features greatly increased memory capacity. As these devices decrease in cost their use for computing evaluation batches will become competitive.

Fitting Surrogate Models. As outlined in the previous section, the surrogate models must be fit to the data produced by evaluating the system model by solving a system of linear equations. The solution of linear systems on the GPU is an extensively researched application of GPU computing [43,44], and suitable algorithms have reached a high level of sophistication. As before, this encouraged us to study other avenues of parallelization.

Batch-fitting Surrogate Models. Just as the possibility for computing system model samples in parallel exists, so does the possibility of fitting the surrogates in parallel. In this case the results of all evaluations of the system model must be collected at every point in $S_o$ corresponding to a surrogate model. As before, the limited memory resources on the GPU do not encourage parallelization in this fashion.

Objective Function Computation. From the perspective of the broader goal of this research, the four methods of parallelization mentioned earlier do not address the goal of achieving real-time characterization. These four steps are effectively preprocessing operations which can be computed "off-line" before physical experimentation begins. While GPU computing may be used to accelerate this preprocessing, especially in the system model evaluation, we elected to focus upon the fifth opportunity enumerated previously: the computation of the objective function for optimization in order to achieve the inverse solution. This led us to the development of a fast parallel implementation of the optimization algorithm used to recover the material constitutive parameters.
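The surrogate-fitting step mentioned above (fitting a functional form to sampled system model data by solving a linear system) can be illustrated with a small least-squares polynomial fit. This is a sketch under simplifying assumptions: a single constitutive parameter rather than the two used later in the paper, a hypothetical stand-in `system_model`, and plain normal equations solved by Gaussian elimination. The helper names are illustrative and are not taken from GAMMA.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_polynomial(samples, values, order):
    """Least-squares polynomial fit via the normal equations (V^T V) c = V^T y."""
    V = [[s ** k for k in range(order + 1)] for s in samples]
    n = order + 1
    VtV = [[sum(row[i] * row[j] for row in V) for j in range(n)] for i in range(n)]
    Vty = [sum(row[i] * y for row, y in zip(V, values)) for i in range(n)]
    return solve(VtV, Vty)

def evaluate(coeffs, a):
    """Evaluate the fitted response surface at parameter value a."""
    return sum(c * a ** k for k, c in enumerate(coeffs))

def system_model(a):
    """Hypothetical expensive model: a strain that scales inversely with stiffness."""
    return 1.0 / a

samples = [1.0, 1.25, 1.5, 1.75, 2.0]          # off-line parameter samples
values = [system_model(a) for a in samples]    # one "FEA run" per sample
coeffs = fit_polynomial(samples, values, order=3)
linear = fit_polynomial([0.0, 1.0, 2.0], [2.0, 5.0, 8.0], order=1)
```

Once `coeffs` is known, each later evaluation costs only a handful of multiplications, which is why this fitting can be done off-line while the on-line loop only evaluates polynomials.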
FIGURE 6. Two possible options for the implementation of the GPGPU-accelerated characterization algorithm. At left, a concept termed "GPU Parallel Evaluation" (GPE) is shown, while at right another concept termed "GPU Parallel Optimization" (GPO) is given.
PARALLELIZATION, SYNCHRONIZATION & APPROXIMATION
Upon consideration of the topics above, it is clear that a focus on the inversion problem itself will be of the greatest utility. As expressed in Eq. 9, the inversion of the forward material system model is achieved by the use of a non-linear optimization algorithm which is used to discover the global minimum of an objective function. This objective function is itself the summation of many terms, each corresponding to a sampling location $x_i$ within the domain $S_o$ as illustrated in Fig. 5. The development of the GPGPU-accelerated characterization algorithm was conducted with great care given to the relationships between parallelization, synchronization, and approximation. The algorithm developed is termed the GPU-Accelerated Material Mechanics Analysis (GAMMA). It is outlined in Fig. 6, and explored in further detail herein.

In terms of implementation, GAMMA is principally written in C++. The CUDA framework [45-47] was used to implement this algorithm in order to achieve good computational performance on the GPU. However, this choice sacrifices platform interoperability, as CUDA-enabled codes will only run on graphics hardware manufactured by NVIDIA.
FIGURE 7. A pair of objective functions with two independent variables corresponding to the horizontal axes, and one dependent variable corresponding to the vertical direction. At left, a poorly conditioned objective function is shown, while at right a well conditioned example is given.

Parallelization. The parallelization concept used to implement GAMMA is based on spatial discretization. A paradigm of one thread per term of the objective function is used. This also corresponds to a concept of one thread per sample location, as each sample location contributes one distinct term to the objective function. Given typical formulations of the forward problem, the number of threads is typically no greater than $10^5$. This paradigm does not enforce a specification for the grouping of threads into thread blocks. As a result, the computational grid is initialized in the simplest fashion: a one-dimensional grid of thread blocks, each consisting of a one-dimensional grid of 64 threads. The thread block size is selected in a semi-arbitrary fashion, as the computational performance does not appear to be particularly sensitive to this parameter. While this parallelization paradigm is conceptually straightforward, and indeed seems to be the only option readily at hand, it has serious consequences in terms of synchronization needs.
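The one-thread-per-term layout with 64-thread blocks can be sketched as a serial emulation of the index arithmetic a CUDA kernel would perform. The function names below are hypothetical, and the bounds guard is the usual CUDA idiom, assumed here rather than taken from the GAMMA source.

```python
BLOCK_SIZE = 64  # threads per block, as stated in the text

def launch_config(n_terms, block_size=BLOCK_SIZE):
    """Number of 1-D blocks needed so that every objective term gets a thread."""
    n_blocks = (n_terms + block_size - 1) // block_size  # ceiling division
    return n_blocks, block_size

def global_indices(n_terms, block_size=BLOCK_SIZE):
    """Emulate the per-thread index computation and the out-of-range guard."""
    n_blocks, _ = launch_config(n_terms, block_size)
    active = []
    for block in range(n_blocks):
        for thread in range(block_size):
            # mirrors blockIdx.x * blockDim.x + threadIdx.x in a CUDA kernel
            i = block * block_size + thread
            if i < n_terms:  # surplus threads in the last block do nothing
                active.append(i)
    return active

config_1600 = launch_config(1600)
config_1601 = launch_config(1601)
indices_100 = global_indices(100)
```

For the 1600 sample locations used in the numerical experiment reported later, this layout yields exactly 25 full blocks; any count that is not a multiple of 64 leaves idle threads in the final block.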
It should also be noted that GAMMA utilizes previously developed parallel evaluation techniques when possible. If a GPU-enabled finite element solver is available, for instance, it is employed to collect the initial samples of the system model. Additionally, surrogate models are fit to the sample data using a traditional Message Passing Interface (MPI) framework for distributing the workload across all available CPU cores.

Synchronization. The one thread per term parallelization paradigm clearly creates a situation where synchronization is required. In particular the summation of the terms to compute the entire objective function is problematic, and has the potential to collapse the parallelization of the problem by forcing threads to serially access an accumulator variable. In order to avert a possible loss of performance, a pair of schemes were investigated. These were termed GPU-Parallel Evaluation (GPE) and GPU-Parallel Optimization (GPO). Both possibilities for addressing synchronization needs are shown in Fig. 6.

The GPE algorithm is based on the concept of evaluating the terms of the objective function in parallel, accumulating their sum, and then passing the objective function value back to the Central Processing Unit (CPU). The CPU is then responsible for computing the adjustments to the material parameters based on a predetermined optimization scheme. This is the most conventional and cognitively straightforward option, but it does require a synchronization at every iteration of the optimization solver.

In order to avoid repeated synchronization, an alternate GPO scheme was investigated. In this algorithm, the optimization loop for each point in the domain $S_o$ is conducted separately in parallel. In essence, each optimization loop is recovering the material properties at one point in the domain, and the bulk properties are thus computed by averaging all of these point computations. This seems promising from a synchronization standpoint, as each optimizer loop can run completely independently with only one synchronization operation required in total. As a result, the GPO scheme was implemented and tested. It was immediately found to produce inaccurate or incorrect results during synthetic experimental benchmarking. This aberrant behavior was explained by plotting the objective functions on which each of the optimizers were operating. Examples of these are shown in Fig. 7. It was found that a large number of the objective functions in the GPO formulation were poorly conditioned, and did not exhibit a single minimum corresponding to the true material properties. Therefore, the numerical optimization routine produced incorrect results, which when averaged resulted in poor performance.

As a result of the problems found in the GPO formulation, the GPE algorithm was ultimately selected for use. In order to mitigate the effects of repeated synchronizations a parallel reduction algorithm was implemented. The parallel reduction algorithm implemented is outlined in Fig. 8, and discussed in much greater detail in [48]. All possible options for optimizing this algorithm were employed, including sequential addressing of data in fast shared memory and full loop unrolling. Further optimization of the GPE algorithm was conducted by the careful consideration of the approximation systems in use.

Approximation. GAMMA involves several layers of approximation. The first has already been mentioned, and consists of the sampling of the system model response at a finite set of points. This approximation is tied very tightly to the parallelization scheme outlined previously. There is another explicit approximation employed, which consists of the set of surrogate models employed to interpolate the system model response. In the previous research described earlier, NURBS-based surrogates were employed. However, in the process of visualizing the objective functions mentioned in the previous section, it was seen that the design spaces were not strongly nonlinear. This implies that the choice of NURBS as an interpolator is non-ideal.
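The GPE control flow described in this section (terms evaluated in parallel, summed, and returned to the CPU, which then adjusts the parameters) can be sketched as a serial emulation. Everything here is an illustrative assumption: a single parameter, hypothetical linear per-point surrogates `c_i * a`, a squared-residual objective, and plain gradient descent with a finite-difference gradient standing in for the optimizer actually used in GAMMA.

```python
def device_evaluate(coeffs, measured, a):
    """Emulates the GPU step: one objective term per sample point, then a sum."""
    terms = [(c * a - t) ** 2 for c, t in zip(coeffs, measured)]  # one "thread" each
    return sum(terms)  # reduction; a single scalar goes back to the host

def gpe_optimize(coeffs, measured, a0, lr=0.05, eps=1e-6, iters=50):
    """Host-side loop: every objective evaluation is a device round trip."""
    a = a0
    syncs = 0
    for _ in range(iters):
        h_plus = device_evaluate(coeffs, measured, a + eps)
        h_minus = device_evaluate(coeffs, measured, a - eps)
        syncs += 2  # each evaluation implies one host/device synchronization
        grad = (h_plus - h_minus) / (2.0 * eps)  # central difference
        a -= lr * grad
    return a, syncs

coeffs = [1.0, 0.5, 2.0, 1.5]              # hypothetical surrogate slopes
a_true = 3.0
measured = [c * a_true for c in coeffs]     # noise-free synthetic data
a_found, n_syncs = gpe_optimize(coeffs, measured, a0=1.0)
```

The synchronization count grows linearly with the number of optimizer iterations, which is the cost the text attributes to GPE and the reason the summation inside `device_evaluate` is worth optimizing.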
to all other memories on the GPU. Because only a few thousand threads are running on the GPU, it is important that the occurrence of stalls when waiting for memory access be minimized. Since the RSM can be stored in a very fast memory, this choice of approximation has a large impact on the performance of the characterization algorithm.

[Figure: parallel reduction trace]
Input data vector:  5 1 1 2 1 0 2 5 0 9 8 1 5 1 2 4 ...
Thread IDs 0-7, stride = 8
Reduction step 1:   5 10 9 3 6 1 4 9 0 9 8 1 5 1 2 4
Thread IDs 0-3, stride = 4
Reduction step 2:   11 11 13 12 6 1 4 9 0 9 8 1 5 1 2 4
Thread IDs 0-1, stride = 2
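The reduction trace above can be reproduced with a short serial emulation of a sequentially addressed tree reduction. This sketch assumes a power-of-two input length and runs the "threads" of each step in a plain loop; in CUDA they would execute concurrently with a barrier between steps. The function name is illustrative.

```python
def reduce_sequential_addressing(values):
    """Tree reduction with sequential addressing.

    At each step, thread tid adds element tid + stride into element tid, and
    the stride halves until one value remains. Assumes len(values) is a power
    of two.
    """
    data = list(values)
    steps = []
    stride = len(data) // 2
    while stride > 0:
        for tid in range(stride):  # concurrent threads on a GPU
            data[tid] += data[tid + stride]
        steps.append((stride, list(data)))  # snapshot after the barrier
        stride //= 2
    return data[0], steps

fig8_input = [5, 1, 1, 2, 1, 0, 2, 5, 0, 9, 8, 1, 5, 1, 2, 4]
total, steps = reduce_sequential_addressing(fig8_input)
```

Sixteen values are summed in four steps instead of fifteen serial additions, and within a step no thread reads a location that another thread writes, so only one barrier per step is required.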
TABLE 1. Sampling data for the first numerical experiment.

FIGURE 9. An infinite isotropic domain with a hole of radius $a$ subjected to in-plane loading $\sigma_\infty$. Only a subset of the actual sampling locations $x_T$ are shown.

$$
\begin{aligned}
\sigma_{\theta\theta} &= \frac{\sigma_\infty}{2}\left[\left(1 + \frac{a^2}{r^2}\right) - \left(1 + \frac{3a^4}{r^4}\right)\cos(2\theta)\right] \\
\sigma_{rr} &= \frac{\sigma_\infty}{2}\left[\left(1 - \frac{a^2}{r^2}\right) + \left(1 - \frac{4a^2}{r^2} + \frac{3a^4}{r^4}\right)\cos(2\theta)\right] \\
\sigma_{r\theta} &= -\frac{\sigma_\infty}{2}\left(1 + \frac{2a^2}{r^2} - \frac{3a^4}{r^4}\right)\sin(2\theta)
\end{aligned}
\tag{11}
$$

The frame reference transformation from the polar to the Cartesian coordinate system is implemented by using the well known transform,

$$
\begin{aligned}
\sigma_{xx} &= \tfrac{1}{2}\left(\sigma_{rr} + \sigma_{\theta\theta} + (\sigma_{rr} - \sigma_{\theta\theta})\cos(2\theta) - 2\sigma_{r\theta}\sin(2\theta)\right) \\
\sigma_{yy} &= \tfrac{1}{2}\left(\sigma_{rr} + \sigma_{\theta\theta} + (-\sigma_{rr} + \sigma_{\theta\theta})\cos(2\theta) + 2\sigma_{r\theta}\sin(2\theta)\right) \\
\sigma_{xy} &= \sigma_{r\theta}\cos(2\theta) + (\sigma_{rr} - \sigma_{\theta\theta})\cos(\theta)\sin(\theta)
\end{aligned}
\tag{12}
$$

The strains are then computed from the stresses using the inverse form of Hooke's constitutive law,

$$ \varepsilon = C^{-1} : \sigma \tag{13} $$

which for the isotropic case takes the matrix form

$$
\begin{Bmatrix} \varepsilon_{xx} \\ \varepsilon_{yy} \\ \varepsilon_{zz} \\ 2\varepsilon_{yz} \\ 2\varepsilon_{xz} \\ 2\varepsilon_{xy} \end{Bmatrix}
= \frac{1}{E}
\begin{bmatrix}
1 & -\nu & -\nu & 0 & 0 & 0 \\
  & 1 & -\nu & 0 & 0 & 0 \\
  &   & 1 & 0 & 0 & 0 \\
  &   &   & 2(1+\nu) & 0 & 0 \\
  & \text{(sym)} & & & 2(1+\nu) & 0 \\
  &   &   &   &   & 2(1+\nu)
\end{bmatrix}
\begin{Bmatrix} \sigma_{xx} \\ \sigma_{yy} \\ \sigma_{zz} \\ \sigma_{yz} \\ \sigma_{xz} \\ \sigma_{xy} \end{Bmatrix}
\tag{14}
$$

This strain model was evaluated on 1600 locations $x_i$ in a uniform radial pattern around the hole (r = 1), a subset of which are seen in Fig. 9. The samples were taken with values of $\theta = 0$ to $2\pi$ in increments of $\pi/40$ and $r = 1$ to $3$ in increments of 0.1. A single frame of data was used, with $\sigma_\infty = 50$ MPa. Table 1 shows data regarding the sampling of the constitutive parameters $E$ and $\nu$ (Young's modulus and Poisson's ratio, respectively). The true parameter values in Table 1 reflect the properties of Al-6061-T6. With 1600 values of $x_i$, at each of which the model of Equations (30-33) was evaluated at 25 combinations of constitutive parameter values, the constitutive model was evaluated 40,000 times in total.

GAMMA was executed on a workstation computer featuring a quad-core processor operating at 4.8 GHz, 16 GB of system memory and an NVIDIA GeForce GTX 650 GPU. It should be noted that this is a very inexpensive GPU, costing less than $100 at the present time. GAMMA was run with the following parameters:

1. RSM polynomial order: 3
2. Optimization methodology: Newton / gradient descent
3. CUDA block size: 64 threads

Results from the application of GAMMA to the test problem are shown in Fig. 10, below. The effects of synthetic experimental noise up to 200 microstrain in magnitude are investigated. On average, 0.035 seconds were required to recover the material properties.
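The synthetic strain field used for this test problem follows from the hole-in-plate stress solution, the polar-to-Cartesian transform, and Hooke's law. The sketch below chains those three steps for a single sample point. It uses the standard Kirsch form with $\sin(2\theta)$ in the shear term, and it assumes plane stress ($\sigma_{zz} = 0$) and typical handbook values for Al-6061-T6 (E = 68.9 GPa, ν = 0.33). These values are assumptions, since the Table 1 values are not reproduced here, and the function names are illustrative.

```python
import math

def kirsch_polar(r, theta, a, s_inf):
    """Polar stresses around a circular hole of radius a under remote tension s_inf."""
    k2 = (a / r) ** 2
    k4 = k2 * k2
    c2, s2 = math.cos(2.0 * theta), math.sin(2.0 * theta)
    s_rr = 0.5 * s_inf * ((1.0 - k2) + (1.0 - 4.0 * k2 + 3.0 * k4) * c2)
    s_tt = 0.5 * s_inf * ((1.0 + k2) - (1.0 + 3.0 * k4) * c2)
    s_rt = -0.5 * s_inf * (1.0 + 2.0 * k2 - 3.0 * k4) * s2
    return s_rr, s_tt, s_rt

def to_cartesian(s_rr, s_tt, s_rt, theta):
    """Rotate the polar stress components into the x-y frame."""
    c2, s2 = math.cos(2.0 * theta), math.sin(2.0 * theta)
    s_xx = 0.5 * (s_rr + s_tt + (s_rr - s_tt) * c2 - 2.0 * s_rt * s2)
    s_yy = 0.5 * (s_rr + s_tt + (s_tt - s_rr) * c2 + 2.0 * s_rt * s2)
    s_xy = s_rt * c2 + (s_rr - s_tt) * math.cos(theta) * math.sin(theta)
    return s_xx, s_yy, s_xy

def strains_plane_stress(s_xx, s_yy, s_xy, E, nu):
    """In-plane rows of the isotropic compliance relation with s_zz = 0 (assumed)."""
    e_xx = (s_xx - nu * s_yy) / E
    e_yy = (s_yy - nu * s_xx) / E
    e_xy = (1.0 + nu) * s_xy / E  # tensor shear strain; 2*e_xy = 2(1+nu)/E * s_xy
    return e_xx, e_yy, e_xy

E, nu = 68.9e3, 0.33       # MPa and dimensionless; assumed handbook values
s_inf, a = 50.0, 1.0       # remote stress in MPa and hole radius

# Point on the hole edge, perpendicular to the load direction
polar_edge = kirsch_polar(a, math.pi / 2.0, a, s_inf)
cart_edge = to_cartesian(*polar_edge, math.pi / 2.0)
strain_edge = strains_plane_stress(*cart_edge, E, nu)

# Point far from the hole, along the load direction
cart_far = to_cartesian(*kirsch_polar(1.0e4, 0.0, a, s_inf), 0.0)
```

At the hole edge the sketch reproduces the classical stress concentration factor of 3, and far from the hole it recovers the remote uniaxial stress. Both are quick sanity checks on the reconstructed equations.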
The results show that the error in $E$ is roughly constant at a magnitude of 0.5% irrespective of experimental noise. This implies that the error in the RSM approximation is dominating errors caused by other factors. However, given that the error is small, and well within the pre-established goal of 2.5%, this appears to be an acceptable compromise. The error in $\nu$ rises proportionally to experimental noise, indicating that this noise is the primary determinant of the accuracy of the recovery of $\nu$.

FIGURE 10. Mean error in the recovery of constitutive parameters $E$ and $\nu$ (plot markers) and the associated standard deviation (whiskers) as a function of noise level for the first numeric experiment, when using the GAMMA GPE algorithm. [Plot title: Error in Parameters vs Noise Magnitude; horizontal axis: Noise Level (με), 0 to 200; vertical axis: % Error, 0 to 2.5.]

Compared to the timing results given in previous work [3], GAMMA exhibits a significant performance advantage: compute time is reduced by roughly a factor of 10. However, it is important to note that the results given in that section were developed using a different surrogate model formulation, a different optimization algorithm, and an optimized commercial solver. Compared to an efficient multi-core CPU implementation using the same algorithms as GAMMA, the performance margin shrinks slightly to a factor of 8. Given the small number of threads employed, along with the very inexpensive GPU processor used, this is evidence that the GAMMA implementation is reasonably efficient.

CONCLUSIONS AND PLANS
In this paper, we have presented the background, theory, and methodology associated with a surrogate model-based technique for materials characterization. Following this, the discussion and details of a GPGPU-accelerated implementation of this algorithm are given. The characterization algorithm was developed from the ground up with an awareness of the massively parallel nature of this computing technology. As a result, the close interrelationships between parallelization, synchronization, and approximation have been carefully considered and are readily observable.

In the materials characterization algorithm, the choice to parallelize the problem by sampling the material system response at a finite set of discrete points essentially defines an approximation of the underlying material physics. Furthermore, this parallelization strategy forces a choice between synchronization paradigms (GPO vs GPE). Unfortunately, the approximation of the material response at a set of points invalidates the GPO synchronization paradigm, and promotes the choice of GPE. The GPE strategy requires repeated synchronizations and vector reductions, which reduce the performance of the algorithm. Fortunately, by making judicious choices regarding the surrogate model approximation basis and implementing a highly optimized parallel-reduce algorithm, this disruption is mitigated.

Because the concepts of parallelization, synchronization, and approximation were considered from the start, the GAMMA implementation effectively moderates the problematic interactions of these concepts. Because of this, high computational performance is achieved, which enables real-time materials characterization.

The preliminary results given in this paper are for a linear-isotropic continuum. However, the underlying surrogate-based methodology has been validated on more complex material system models, including those relevant to engineering composites. The most immediate future work is thus updating the GAMMA tool to accommodate these higher-dimensional surrogates. This will only require the implementation of a higher-order RSM kernel, and it is anticipated that the evaluation of higher-dimensional polynomials will not significantly impact the computational performance of GAMMA.

Expanding GAMMA into the domain of composite material systems will enable the testing of composite materials. Live feedback from the characterization algorithm will be available during the testing of material specimens, which will allow for the test frame's load path to be dynamically adjusted to achieve maximum excitation of the material's constitutive parameters. In the future, GAMMA may also be used to detect time- or load-dependent changes in the properties of materials. Finally, with the ability to rapidly achieve characterizations, it may be feasible to partition the test data into sub-regions and capture the variations in material properties throughout the spatial domain.

Acknowledgments
The authors acknowledge the Office of Naval Research for supporting this research through both the Naval Research Enterprise Internship Program (NREIP) and the Naval Research Laboratory (NRL) core funding. This research was performed while one author held a National Research Council Research Associateship Award at NRL.
References
[1] J. Michopoulos, J. Hermanson, and A. Iliopoulos, 2010, “Towards a Recursive Hexapod for the Multidimensional Mechanical Testing of Composites,” Proceedings of the ASME 2010 IDETC/CIE Conferences, Montreal, Canada. DETC2010-13221
[2] Michopoulos JG, Hermanson JC, Iliopoulos AP, Lambrakos SG, Furukawa T. Data-driven design optimization for composite material characterization. Journal of Computing and Information Science in Engineering 2011; 11(2):1-10.
[3] J. Steuben, J. Michopoulos, A. Iliopoulos, and C. Turner, 2012, “Inverse Characterization of Composite Materials Using Surrogate Models,” Proceedings of the ASME 2013 IDETC/CIE Conferences, Portland, OR. DETC2013-13221
[4] Sevenhuijsen J. Two simple methods for deformation demonstration and measurement. Strain 1981; 17(1):20-24.
[5] Bruck H, McNeill S, Sutton M, Peters W. Digital image correlation using Newton-Raphson method of partial differential correction. Experimental Mechanics 1989; 29:261-267.
[6] Andrianopoulos NP, Iliopoulos AP. Strain measurements by a hybrid experimental-numerical method using a mesh-free field function, Honorary Volume for Professor P. S. Theocaris Armenian Academy of Sciences, 2005; 31-41.
[7] Michopoulos JG, Iliopoulos AP. A Computational Workbench for Remote Full Field 2D Displacement and Strain Measurements. ASME Conference Proceedings, 2009; 55-63.
[8] Iliopoulos A, Michopoulos JG. Effects of Anisotropy on the Performance Sensitivity of the Mesh-Free Random Grid Method for Whole Field Strain Measurement. ASME Conference Proceedings 2009; 65-74.
[9] Michopoulos JG, Iliopoulos AP. A Computational Workbench for Remote Full Field 2D Displacement and Strain Measurements. ASME Conference Proceedings 2009; 55-63.
[10] Michopoulos JG, Iliopoulos A. A Computational Workbench for Remote Full Field 3D Displacement and Strain Measurements. ASME Conference Proceedings, 2011; 489-498.
[11] Iliopoulos A, Michopoulos J. Direct Strain Imaging For Full Field Measurements. Proceedings of the ASME 2012 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference August 13-15, 2012, Chicago, IL, USA.
[12] Iliopoulos AP, Michopoulos JG, Andrianopoulos NP. Performance Analysis of the Mesh-Free Random Grid Method for Full-Field Synthetic Strain Measurements. Strain, Blackwell Publishing Ltd, 2012; 48:1-15.
[13] Iliopoulos A, Michopoulos JG, Hermanson JC. Performance analysis and experimental validation of the direct strain imaging method. ASME 2013 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference August 4-7, 2013, Portland, OR, USA.
[14] Iliopoulos A, Michopoulos JG. Direct strain tensor approximation for full-field strain measurement methods. International Journal for Numerical Methods in Engineering 2013; 95:313-330.
[15] Michopoulos J, Hermanson JC, Furukawa T, Iliopoulos A. A Framework For The Automated Data-Driven Constitutive Characterization Of Composites. Proceedings of the 17th International Conference on Composite Materials, ICCM-17 2009.
[16] Michopoulos J, Hermanson JC, Iliopoulos AP, Lambrakos S, Furukawa T. On the Constitutive Response Characterization for Composite Materials via Data-Driven Design Optimization, Proceedings of the 2011 ASME International Design Engineering Technical Conferences & Computers and Information in Engineering Conference August 28-31, 2011, Washington, DC, USA, DETC2011-47740.
[17] Michopoulos J, Iliopoulos A, Hermanson JC, Orifici AC, Thomson RS. Preliminary Validation Of Composite Material Constitutive Characterization. Proceedings of the ASME 2012 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference August 13-15, 2012, Chicago, IL, USA.
[18] Michopoulos J, Hermanson JC, Iliopoulos A. First Industrial Strength Multi-Axial Robotic Testing Campaign For Composite Material Characterization. Proceedings of the ASME 2012 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference August 13-15, 2012, Chicago, IL, USA.
[19] Iliopoulos A, Michopoulos J, Hermanson JC. Composite Material Testing Data Reduction To Adjust For The Systematic 6-Dof Testing Machine Aberrations. Proceedings of the ASME 2012 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference August 13-15, 2012, Chicago, IL, USA.
[20] Meuwissen M, Oomens C, Baaijens F, Petterson R, Janssen J. Determination of the elasto-plastic properties of aluminum using a mixed numerical-experimental method. Journal of Materials Processing Technology 1998; 75(1):204-211.
[21] Kajberg J, Lindkvist G. Characterisation of materials subjected to large strains by inverse modeling based on in-plane displacement fields. International Journal of Solids and Structures 2004; 41(13):3439-3459.
[22] Schmidt T, Tyson J, Galanulis. Full-field dynamic displacement and strain measurement using advanced 3d image correlation photogrammetry: Part 1. Experimental Techniques 2003; 27(3):47-50.
[23] Molimard J, Le Riche R, Vautrin A, Lee J. Identification of the four orthotropic plate stiffnesses using a single open-hole tensile test. Experimental Mechanics 2005; 45:404-411.
[24] Grediac M, Toussaint E, Pierron F. Special virtual fields for the direct determination of material parameters with the virtual fields method. 1: Principle and definition. International Journal of Solids and Structures 2002; 39(10):2691-2705.
[25] Pierron F, Vert G, Burguete R, Avril S, Rotinat R, Wisnom MR. Identification of the orthotropic elastic stiffnesses of composites with the virtual fields method: Sensitivity study and experimental validation. Strain 2007; 43(3):250-259.
[26] Michopoulos JG, Iliopoulos AP, Furukawa T. Accuracy of inverse composite laminate characterization via the mesh free random grid method. ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference 2009; 2:367-374.
[27] Furukawa T, Michopoulos JG, Kelly DW. Elastic characterization of laminated composites based on multiaxial tests. Composite Structures 2008; 86(1-3):269-278.
[28] Furukawa T, Michopoulos JG. Online planning of multiaxial loading path for elastic material identification. Computer Methods in Applied Mechanics and Engineering 2008; 197(9-12):885-901.
[29] Michopoulos JG, Furukawa T, Lambrakos SG. Data-driven characterization of composites based on virtual deterministic and noisy multiaxial data. ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference 2008; 2:1095-1106.
[30] Furukawa T, Pan JW. Stochastic identification of elastic constants for anisotropic materials. International Journal for Numerical Methods in Engineering 2010; 81(4):429-452.
[31] Samarasinghe S. Neural networks for applied sciences and engineering, in From Fundamentals to Complex Pattern Recognition. Auerbach Publications: Massachusetts, 1988.
[32] Michopoulos J. Constitutive characterization of composites via artificial neural nets. Tech. Rep. Code 6383 Internal Report. Naval Research Laboratory: Washington DC 1992.
[33] Furukawa T, Yagawa G. Implicit constitutive modeling for viscoplasticity using neural networks. International Journal for Numerical Methods in Engineering 1998; 43(2):195-219.
[34] Turner CJ. HyPerModels: Hyperdimensional Performance Models for engineering Design. PhD thesis, The University of Texas at Austin 2005.
[35] Turner CJ, Crawford RH. Selecting an appropriate metamodel: The case for NURBs metamodels. ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference 2005; 2:759-771.
[36] Aarts EHL, Korst J. Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing, Wiley, 1989.
[37] Kirkpatrick S. Optimization by Simulated Annealing: Quantitative Studies. Journal of Statistical Physics 1984; 34(5-6):975-986.
[38] Nelder JA, Mead R. A Simplex Method for Function Minimization. The Computer Journal 1965; 7(4):308-313.
[39] Lagarias JC, Reeds JA, Wright MH, Wright PE. Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions. SIAM Journal of Optimization 1998; 9(1):112-147.
[40] Goddeke, D., 2011, Fast and Accurate Finite-Element Multigrid Solvers for PDE Simulations on GPU Clusters, Logos Verlag, Berlin, Germany.
[41] Krawezik, G.P., Poole, G., 2010, “Accelerating the ANSYS direct sparse solver with GPUs,” Proceedings of the 2010 Symposium on Application Accelerators in High Performance Computing, Available: http://saahpc.ncsa.illinois.edu/09/papers/Krawezik paper.pdf.
[42] Tabatabaian M, 2013, Comsol for Engineers, Mercury Learning and Information.
[43] Kruger, J., Westermann, R., 2003, “Linear algebra operators for GPU implementation of numerical algorithms,” ACM Transactions on Graphics, 22(3):908-916.
[44] Li, R., Saad, Y., 2013, “GPU-accelerated preconditioned iterative linear solvers,” Journal of Supercomputing, 63(2):443-466.
[45] Courtier, R., 2013, Designing Scientific Applications on GPUs, CRC Press, Boca Raton, FL.
[46] NVIDIA, 2010, “Introduction to CUDA C”, GPU Technology Conference 2010, Available at http://www.nvidia.com/content/GTC-2010/pdfs/2131_GTC2010.pdf.
[47] NVIDIA, 2012, “NVIDIA CUDA C Programming Guide”, Available at http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf.
[48] Harris M, “Optimizing Parallel Reduction in CUDA,” NVIDIA Developer Technology, Presentation available online at http://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/reduction/doc/reduction.pdf.