
Self-supervised optimization of random material microstructures in the small-data regime

Maximilian Rixner1 and Phaedon-Stelios Koutsourelakis*,1,2


1 Technical University of Munich, Faculty of Mechanical Engineering, Professorship of Continuum Mechanics, www.contmech.mw.tum.de
2 Munich Data Science Institute (MDSI - Core Member), www.mdsi.tum.de

arXiv:2108.02606v1 [stat.ML] 5 Aug 2021

Abstract
While the forward and backward modeling of the process-structure-property chain has received a lot of attention from the materials community, fewer efforts have taken uncertainties into consideration. Those arise from a multitude of sources and their quantification and integration in the inversion process are essential in meeting the materials design objectives. The first contribution of this paper is a flexible, fully probabilistic formulation of such optimization problems that accounts for the uncertainty in the process-structure and structure-property linkages and enables the identification of optimal, high-dimensional, process parameters. We employ a probabilistic, data-driven surrogate for the structure-property link which expedites computations and enables the handling of non-differentiable objectives. We couple this with a novel active learning strategy, i.e. a self-supervised collection of data, which significantly improves accuracy while requiring small amounts of training data. We demonstrate its efficacy in optimizing the mechanical and thermal properties of two-phase, random media but envision that its applicability encompasses a wide variety of microstructure-sensitive design problems.

Introduction

Inverting the process-structure-property (PSP) relationships represents a grand challenge in materials science as it holds the potential of expediting the design of new materials with superior performance [1, 2].

While significant progress has been made in the forward and backward modeling of the process-structure and structure-property linkages and in capturing the nonlinear and multiscale processes involved [3], much fewer efforts have attempted to integrate uncertainties, which are an indispensable component of materials' analysis and design [4, 5], since a) process variables do not fully determine the resulting microstructure but rather a probability distribution on microstructures [6], b) noise and incompleteness are characteristic of the experimental data that are used to capture process-structure (most often) and structure-property relations [7], c) models employed for the process-structure or structure-property links are often stochastic and there is uncertainty in their parameters or form, especially in multiscale formulations [8], and d) model compression and dimension reduction employed in order to gain efficiency unavoidably lead to some loss of information which in turn gives rise to predictive uncertainty [9]. As a result, microstructure-sensitive properties can exhibit stochastic variability which should be incorporated in the design objectives.

(Back-)propagating uncertainty through complex and potentially multiscale models poses significant computational difficulties [10]. Data-based surrogates can alleviate these as long as the number of training data, i.e. the number of solves of the complex models they would substitute, is kept small. In this small-data setting additional uncertainty arises due to the predictive inaccuracy of the surrogate. Quantifying it can not only lead to more accurate estimates but also guide the acquisition of additional experimental/simulation data.

We note that problem formulations based on Bayesian Optimization [11, 12, 13] account for uncertainty in the objective solely due to the imprecision of the surrogate and not due to the aleatoric, stochastic variability of the underlying microstructure. In the context of optimization/design problems in particular, a globally-accurate surrogate would be redundant. It would suffice to have a surrogate that can reliably drive the optimization process to the vicinity of the local optimum (or optima) and can sufficiently resolve this (those) in order to identify the optimal value(s). Since the location of optima is unknown a priori, this necessitates adaptive strategies where the surrogate-training and optimization are intertwined.

We emphasize that, unlike successful efforts e.g. in topology optimization [14] or general heterogeneous media [15] which find a single, optimal microstructure that maximizes some property-based objective, our goal is more ambitious but also more consistent with the physical reality. We attempt to find the optimal distribution of microstructures from the ones that are realizable from a set of processing conditions (Fig. 1). To address the computational problem arising from the presence of uncertainties, we recast the stochastic optimization as a probabilistic inference task and employ approximate inference techniques based on Stochastic Variational Inference (SVI, [16]).

In terms of the stochastic formulation of the problem, our work most closely resembles that of [17], where they seek to identify a probability density on microstructural features which would yield a target probability density on the corresponding properties. While this poses a challenging optimization problem, producing a probability density on microstructural features does not provide unambiguous design guidelines. In contrast, we operate on (and average over) the whole microstructure and consider a much wider range of design objectives. In [18] random microstructures were employed but their macroscopic properties were insensitive to their random variability (due to scale-separation) and low-dimensional parametrizations of the two-point correlation function were optimized using gradient-free tools. In a similar fashion, in [19, 20] analytic, linear models are employed which, given small and Gaussian uncertainties on the macroscopic properties, find the underlying orientation distribution function (ODF) of the crystalline microstructure. In [21, 22], averaged macroscopic properties (ignoring the effects of crystal size and shape) are computed with respect to the ODF of the polycrystalline microstructure and, on the basis of their targeted values, the corresponding ODF is found. While data-based surrogates were also employed, the problem formulation did not attempt to quantify the effect of microstructural uncertainties.

In terms of surrogate development, in this work we focus on the microstructure-property link and consider random, binary microstructures, the distribution of which depends on some processing-related parameters. We develop active learning strategies that are tailored to the optimization objectives. The latter account for the stochasticity of the material properties (as well as the predictive uncertainty of the surrogate), i.e. we enable the solution of optimization-under-uncertainty problems. The methodological framework presented enables the control of a stochastic process, giving rise to the discovery of random heterogeneous materials that exhibit favourable properties according to some notion of optimality.

Methods

A conceptual overview of the proposed stochastic-inversion framework is provided in Fig. 1 and is contrasted with deterministic formulations. We present the main building blocks and modeling assumptions and subsequently define the optimization problems of interest. We then discuss associated challenges and algorithmic steps, and conclude this section with details regarding the probabilistic surrogate model and the active learning strategy.

[Figure 1 schematic: a stochastic process p(x|ϕ) maps process parameters ϕ₁, ϕ₂, ϕ₃ to random microstructures x, whose structure-property linkage κ(x) induces the property densities p(κ|ϕ₁), p(κ|ϕ₂), p(κ|ϕ₃) over the (κ₁, κ₂) plane; stochastic inversion: ϕ∗ = arg max_ϕ E_{p(x|ϕ)}[u(κ(x))], e.g. maximizing Pr(κ ∈ K|ϕ₁); deterministic inversion: x∗ = arg max_x u(κ(x)).]

Figure 1: Conceptual overview. Given stochastic process-structure and structure-property links, we identify the process parameters ϕ∗ which maximize the expected utility E_{p(x|ϕ)}[u(κ)] (illustration based on the specific case u(κ) = I_K(κ)). (Micro)Structures x arise from a stochastic process through the density p(x|ϕ) that depends on the process parameters ϕ. A data-driven surrogate is employed to predict properties κ, which introduces additional uncertainty.

We define the following variables/parameters:

• process parameters ϕ ∈ R^{d_ϕ}: These are the optimization variables and can parametrize actual processing variables (e.g. chemical composition, annealing temperature) or statistical descriptors (e.g. ODF) that might be linked to the processing. In general, high-dimensional ϕ would need to be considered, which can afford greater flexibility in the design process.

• random microstructures x: This is in general a very high-dimensional vector that represents the microstructure with the requisite detail to predict its properties. In the numerical illustrations, which involve two-phase media in d = 2 dimensions represented on a uniform grid with N_p subdivisions per dimension, x ∈ {0, 1}^{N_p^d} consists of binary variables which indicate the material phase of each of the pixels (Figure 4). We emphasize that x is a random vector due to the stochastic variability of microstructures even in cases where ϕ is the same (see process-structure link below).

• properties κ: This vector represents the material properties of interest which depend on the microstructure x. We denote this dependence with some abuse of notation as κ(x) and discuss it in the structure-property link below. Due to this dependence, κ ∈ R^{d_κ} will also be a random vector. In the numerical illustrations κ consists of mechanical and thermal, effective (apparent) properties.

Furthermore, we note:

• process-structure link: We denote the dependence between ϕ and x with the conditional density p(x|ϕ) (Figure 1) that reflects the fact that processing parameters do not in general uniquely determine the microstructural details. Formally, experimental data [23] and/or models [24] would need to be used to determine p(x|ϕ) (the binary microstructures considered for our numerical illustrations could arise from the solution of the Cahn-Hilliard equation describing phase separation occurring in a binary alloy under thermal annealing). We also note that no a-priori dimensionality reduction is implied, i.e., the full microstructural details are retained and employed in the property-predicting, high-fidelity models. In this work, we assume the process-structure link p(x|ϕ) is given, and its particular form for the binary media examined is explained in the sequel.

• structure-property link: The calculation of the properties κ for a given microstructure x involves in general the solution of a stochastic or deterministic, complex, high-fidelity model (in our numerical illustrations, this consists of partial differential equations (PDEs)). We denote the corresponding conditional density as p(κ|x), which in the case of a deterministic model degenerates to a Dirac-delta. For optimization purposes, repeated such computations are needed and, especially in high-dimensional settings, derivatives of the properties with respect to x are also required in order to drive the search. Such derivatives might be either unavailable (e.g. when x is binary as above) or, at the very least, will add to the computational burden. To overcome this major efficiency hurdle we advocate the use of a data-driven surrogate model. We denote with D the training data (i.e. pairs of inputs-microstructures and outputs-properties κ(x)) and explain in the sequel how these are selected (see section on Active Learning). We employ a probabilistic surrogate model denoted by M and use p_M(κ|x, D) to indicate its predictive density (we will elaborate on the reasons for using a probabilistic surrogate in the subsequent sections).

With these definitions in hand, we proceed to define two closely related optimization problems (O1) & (O2) that we would like to address. For the first optimization problem (O1) we make use of a utility function u(κ) ≥ 0, specific examples of which are provided in the sequel (negative-valued utility functions can be employed as long as they are bounded from below, i.e. u(κ) ≥ u_0 > −∞, in which case u(κ) − u_0 should be used in place of u(κ)). Due to the uncertainty in κ we consider the expected utility U(ϕ), which is defined as:

U_1(ϕ) = E_{p(x|ϕ)} [ ∫ u(κ) p(κ|x) dκ ]    (1)

where E_{p(x|ϕ)}[·] implies an expectation with respect to p(x|ϕ).
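For intuition, the nested structure of the expectation in Eq. (1) can be made concrete with a plain Monte Carlo estimator. The following minimal Python sketch is illustrative only; `sample_x_given_phi` and `sample_kappa_given_x` are hypothetical stand-ins for the process-structure link p(x|ϕ) and the (possibly stochastic) structure-property link p(κ|x), and the box domain used in the example utility corresponds to Eq. (10) below.

```python
import numpy as np

def expected_utility(phi, u, sample_x_given_phi, sample_kappa_given_x,
                     n_outer=256, n_inner=1):
    """Monte Carlo estimate of U_1(phi) = E_{p(x|phi)}[ int u(kappa) p(kappa|x) dkappa ]."""
    total = 0.0
    for _ in range(n_outer):
        x = sample_x_given_phi(phi)                    # x ~ p(x|phi)
        total += np.mean([u(sample_kappa_given_x(x))   # inner average over kappa
                          for _ in range(n_inner)])
    return total / n_outer

def u_indicator(kappa, lo=(8.5, 6.75), hi=(11.0, 9.0)):
    """Indicator utility I_K for a box K; then U_1(phi) = Pr(kappa in K | phi)."""
    kappa = np.asarray(kappa)
    return float(np.all((kappa >= lo) & (kappa <= hi)))
```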
[Figure 2 schematic: convolutional blocks 1-4 with feature-map depths 4, 8, 12 and 16, followed by fully connected layers fc1, fc2a and fc2b.]

Figure 2: Architecture of the convolutional-neural-network surrogate for property κ prediction. Features are extracted from the microstructure x using a sequence of 4 blocks (each comprised of a sequence of convolutional layer, non-linear activation function and pooling), where in each block the size of the feature map is reduced, while the depth of the feature map increases. Fully connected feedforward layers map the extracted convolutional features to the mean m_θ(x) and the covariance S_θ(x) of the predictive Gaussian distribution p_M(κ|x, θ) = N(κ | m_θ(x), S_θ(x)), where θ denotes the neural network parameters.

We seek the processing parameters ϕ that maximize the expected utility, i.e.:

(O1): ϕ∗ = arg max_ϕ U(ϕ)    (2)

Consider for example u(κ) being the indicator function I_K(κ) of some target domain K, defining the desired range of property values (Figure 3a). In this case, solving (O1) above will lead to the value of ϕ that maximizes the probability that the resulting material will have properties in the target domain K, i.e. U(ϕ) = Pr(κ ∈ K|ϕ). Similar probabilistic objectives have been proposed for several other materials' classes and models (e.g. [25]). Another possibility of potential practical interest involves u(κ) = e^{−τ ||κ − κ_target||²}, where τ is merely a scaling parameter, in which case solving (O1) leads to the material with properties which, on average, are closest to the prescribed target κ_target (Figure 3b).

The second problem we consider involves prescribing a target density p_target(κ) on the material properties and seeking the ϕ that leads to a marginal density of properties p(κ|ϕ) = E_{p(x|ϕ)}[p(κ|x)] that is as close as possible to the target density (Figure 3c). While there are several distance measures in the space of densities, we employ here the Kullback-Leibler divergence KL(p_target(κ) || p(κ|ϕ)), the minimization of which is equivalent to:

(O2): ϕ∗ = arg max_ϕ U_2(ϕ),   where U_2(ϕ) = ∫ p_target(κ) log p(κ|ϕ) dκ    (3)
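As an illustration of the (O2) objective only (not of the solution strategy developed below), Eq. (3) could in principle be approximated by pairing samples from p_target(κ) with a kernel density estimate of p(κ|ϕ) built from forward simulations; the sketch below does exactly that, with `simulate_kappa_given_phi` a hypothetical composition of the process-structure and structure-property samplers. The paper instead bounds this quantity with the surrogate-based construction of Eq. (5).

```python
import numpy as np
from scipy.stats import gaussian_kde

def u2_estimate(phi, simulate_kappa_given_phi, target_mean, target_cov,
                S=20, n_forward=512, seed=0):
    """Crude estimate of U_2(phi) = int p_target(kappa) log p(kappa|phi) dkappa."""
    rng = np.random.default_rng(seed)
    kappa_s = rng.multivariate_normal(target_mean, target_cov, size=S)  # kappa^(s)
    # forward samples kappa ~ p(kappa|phi) used to density-estimate p(kappa|phi)
    kappa_fwd = np.stack([simulate_kappa_given_phi(phi) for _ in range(n_forward)])
    kde = gaussian_kde(kappa_fwd.T)
    return float(np.mean(np.log(kde(kappa_s.T) + 1e-300)))  # (1/S) sum_s log p(kappa^(s)|phi)
```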
The aforementioned objective resembles the one employed in [17], but rather than finding a density on the microstructure (or features thereof) that leads to a close match of p_target(κ), we identify the processing variables ϕ that do so.

We note that both problems are considerably more challenging than their deterministic counterparts, as in both cases the objectives involve expectations with respect to the high-dimensional vector(s) x (and potentially κ) representing the microstructure (and their effective properties). Additionally, in the case of (O2), the analytically intractable density p(κ|ϕ) appears explicitly in the objective. While one might argue that a brute-force Monte Carlo approach with a sufficiently large number of samples would suffice to carry out the aforementioned integrations, we note that propagating the uncertainty from x to the properties κ would also require commensurate solves of the expensive structure-property model, which would need to be repeated for various ϕ-values. To overcome challenges associated with the structure-property link, we make use of the surrogate model, i.e. instead of the true p(κ|x) in the expressions above we make use of p_M(κ|x, D), based on a surrogate model M. The reformulated objectives are denoted with U^D_{1,M} and U^D_{2,M}, and the specifics of the probabilistic surrogate, as well as the solution strategy proposed, are explained in the sequel.

Expectation-Maximization and Stochastic Variational Inference

We present the proposed algorithm for the solution of (O1) and discuss the requisite changes for (O2) afterward. Due to the intractability of the objective function U^D_{1,M}(ϕ) (see Eq. (2)) and its derivatives, we employ the Expectation-Maximization scheme [26], which is based on the so-called Evidence Lower BOund (ELBO) F:

log U^D_{1,M}(ϕ) = log ∫ u(κ) p_M(κ|x, D) p(x|ϕ) dκ dx
               ≥ E_{q(x,κ)} [ log ( u(κ) p_M(κ|x, D) p(x|ϕ) / q(x, κ) ) ]
               = F(q(κ, x), ϕ)    (4)

where E_{q(x,κ)}[·] denotes the expectation with respect to the auxiliary density q(x, κ). The algorithm alternates between maximizing F with respect to the density q(x, κ) while ϕ is fixed (E-step) and maximizing with respect to ϕ (M-step) while q(x, κ) is fixed. We employ a Variational-Bayesian relaxation [27], in short VB-EM, according to which, instead of the optimal q, we consider a family Q_ξ of densities parameterized by ξ and in the E-step maximize F with respect to ξ. This, as well as the maximization with respect to ϕ in the M-step, is done by using stochastic gradient ascent where the associated derivatives are substituted by noisy Monte Carlo estimates (i.e., Stochastic Variational Inference [16]). The particulars of ξ as well as of the E- and M-steps are discussed in the Supplementary Information. We illustrate the basic, numerical steps in the inner loop of Algorithm 1 (the algorithm starts from an initial, typically random, guess of ξ and ϕ). Colloquially, the VB-EM iterations can be explained as follows: In the E-step and given the current estimate for ϕ, one averages over microstructures that are not only a priori more probable according to p(x|ϕ) but also achieve a higher score according to u(κ) p_M(κ|x, D). Subsequently, in the M-step, we update the optimization variables ϕ on the basis of the average above (see Supplementary Information - VB-EM-Algorithm).

The second objective, U_{2,M} (Eq. (3)), can be dealt with in a similar fashion. As it involves an integration over κ with respect to the target density p_target(κ), it can first be approximated using S Monte Carlo samples {κ^(s)}_{s=1}^S from p_target(κ) and subsequently each of the terms in the sum can be lower-bounded as follows:

U^D_{2,M}(ϕ) = ∫ p_target(κ) log p_M(κ|ϕ, D) dκ
            ≈ (1/S) Σ_{s=1}^S log p_M(κ^(s)|ϕ, D)
            = (1/S) Σ_{s=1}^S log ∫ p_M(κ^(s)|x, D) p(x|ϕ) dx
            ≥ (1/S) Σ_{s=1}^S E_{q^(s)(x)} [ log ( p_M(κ^(s)|x, D) p(x|ϕ) / q^(s)(x) ) ]
            = (1/S) Σ_{s=1}^S F_s(q^(s)(x), ϕ)    (5)
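A conceptual sketch of one such stochastic-gradient VB-EM sweep is given below. It is only a schematic under strong simplifying assumptions: the callables standing in for the variational family Q_ξ, its reparameterized sampler and the log-densities are hypothetical (the actual parameterizations are given in the Supplementary Information), and a smooth, strictly positive utility is assumed so that log u(κ) is well defined.

```python
import torch

def vb_em_step(xi, phi, sample_q, log_q, log_u, log_p_M, log_p_x_given_phi,
               n_mc=16, lr=1e-2):
    """One VB-EM sweep on the ELBO of Eq. (4); xi and phi are leaf tensors with
    requires_grad=True (fresh optimizers each call keep the sketch short)."""
    opt_e = torch.optim.Adam([xi], lr=lr)    # E-step: update q_xi, phi fixed
    opt_m = torch.optim.Adam([phi], lr=lr)   # M-step: update phi, q_xi fixed
    for opt in (opt_e, opt_m):
        opt.zero_grad()
        elbo = 0.0
        for _ in range(n_mc):                # noisy Monte Carlo ELBO estimate
            x, kappa = sample_q(xi)          # reparameterized (x, kappa) ~ q_xi
            elbo = elbo + (log_u(kappa) + log_p_M(kappa, x)
                           + log_p_x_given_phi(x, phi) - log_q(x, kappa, xi))
        (-(elbo / n_mc)).backward()          # stochastic gradient ascent on F
        opt.step()
    return xi, phi
```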
[Figure 3 panels over the (κ₁, κ₂) plane, each showing p(κ|ϕ∗): (a) Target domain K, (b) Target value κ_target, (c) Target distribution p_target(κ).]

Figure 3: Illustration of various materials design objectives. Different optimization objectives with respect to the density p(κ|ϕ) that expresses the likelihood of property values κ for given processing conditions ϕ. We illustrate the following cases: (a) we seek to maximize the probability that the material properties κ fall within a target domain K; (b) we seek to minimize the mean deviation of the properties κ from a target value κ_target; (c) we seek to minimize the deviation between p(κ|ϕ) and a target probability density p_target(κ) on the material properties.

In this case, the aforementioned stochastic variational inference tools will need to be applied for updating each q^(s)(x), s = 1, ..., S in the E-step, but the overall algorithm remains conceptually identical. We note that incremental and partial versions where, e.g., a subset of the q^(s) are updated with one or more steps of stochastic gradient ascent are possible [28] and can lead to improved computational performance.

Probabilistic surrogate model

In order to overcome the computational roadblock imposed by the high-fidelity model employed in the structure-property link (i.e. κ(x) or p(κ|x)), we substitute it with a data-driven surrogate which is trained on N pairs of data

D = { x^(n), κ^(n) = κ(x^(n)) }_{n=1}^N    (6)

While such supervised machine learning problems have been studied extensively and a lot of the associated tools have found their way into materials applications [29], we note that their use in the context of the optimization problems presented requires significant adaptations.

In particular, and unlike canonical, data-centric applications which rely on the abundance of data (Big Data), we operate under a smallest-possible-data regime. This is because in our setting training data arise from expensive simulations, and consequently we are interested in minimizing the number of data points which have to be generated. The shortage of information generally leads to increased predictive uncertainty which, rather than dismissing, we quantify by employing a probabilistic surrogate that yields a predictive density p_M(κ|x, D) instead of mere point estimates.

More importantly though, we note that the distribution of the inputs in D, i.e. the microstructures x, changes drastically with ϕ (Figure 1). As we do not know a priori the optimal ϕ∗, we cannot generate training data from p(x|ϕ∗), and data-driven surrogates generally produce poor extrapolative, out-of-distribution predictions [30]. It is clear therefore that the selection of the training data, i.e. the microstructures-inputs x^(n) for which we pay the price of computing the output-property of interest κ^(n), should be integrated with the optimization algorithm in order to produce a sufficiently accurate surrogate while keeping N as small as possible. We defer a detailed discussion of this aspect to the next section, and will first present the particulars of the surrogate model employed.

The probabilistic surrogate adopted for the purpose of driving the optimization algorithm makes use of a multivariate Gaussian likelihood, i.e. κ|x ∼ N(m_θ(x), S_θ(x)), where the mean m_θ(x) and covariance S_θ(x) are modeled with a convolutional neural network (CNN) (see Figure 2 and Suppl. Information - Probabilistic Surrogate), with θ denoting the associated parameters. The use of such convolutional neural networks has been previously proposed for property prediction in binary media in e.g., [31, 32]. Point estimates θ_D of the parameters are obtained with the help of training data D by maximizing the corresponding likelihood p(D|θ). On the basis of these estimates, the predictive density (i.e. for a new input-microstructure x) of the surrogate follows as p_M(κ|x, D) = N(m_{θ_D}(x), S_{θ_D}(x)). We emphasize the dependence of the probabilistic surrogate on the dataset D, for which we will discuss an adaptive acquisition strategy in the following section. While the results obtained are based on this particular architecture of the surrogate, the methodological framework proposed can accommodate any probabilistic surrogate and integrate its predictive uncertainty in the optimization procedure.
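To make the above concrete, a minimal PyTorch sketch of such a surrogate is given below. It mirrors the spirit of Figure 2 (four convolution/activation/pooling blocks followed by fully connected heads), but all layer sizes are our own assumptions and, for simplicity, only a diagonal covariance is parameterized rather than the full S_θ(x).

```python
import torch
import torch.nn as nn

class GaussianCNNSurrogate(nn.Module):
    """p_M(kappa|x) = N(m_theta(x), diag(exp(logvar_theta(x)))) for 64x64 inputs."""
    def __init__(self, n_props=2):
        super().__init__()
        layers = []
        for c_in, c_out in [(1, 4), (4, 8), (8, 12), (12, 16)]:   # blocks 1-4
            layers += [nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                       nn.MaxPool2d(2)]                            # 64->32->16->8->4
        self.features = nn.Sequential(*layers)
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(16 * 4 * 4, 64), nn.ReLU())
        self.mean_head = nn.Linear(64, n_props)      # m_theta(x)
        self.logvar_head = nn.Linear(64, n_props)    # log-diagonal of S_theta(x)

    def forward(self, x):
        h = self.fc(self.features(x))
        return self.mean_head(h), self.logvar_head(h)

def gaussian_nll(mean, logvar, kappa):
    """Negative Gaussian log-likelihood; minimizing it yields point estimates theta_D."""
    return 0.5 * (logvar + (kappa - mean) ** 2 / logvar.exp()).sum(dim=1).mean()
```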
Active Learning

Active learning refers to a family of methods whose goal is to improve learning accuracy and efficiency by selecting particularly salient training data [33]. This is especially relevant in our applications as data acquisition is de facto the most computationally expensive component. The basis of all such methods is to progressively enrich the training dataset by scoring candidate inputs (i.e. microstructures x in our case) based on their expected informativeness [34]. The latter can be quantified with a so-called acquisition function α(x), for which many different forms have been proposed (depending on the specific setting). We note though that in most cases in the literature, acquisition functions associated with the predictive accuracy of the supervised learning model have been employed, which in our formulation translates to the accuracy of our surrogate in predicting the properties κ for an input microstructure. In this regard, since a measure of the predictive uncertainty is the covariance S_{θ_D}(x) above, one might define e.g. α(x) = trace(S_{θ_D}(x)). Alternative acquisition functions have been proposed in the context of Bayesian Optimization problems which, as explained in the introduction, exhibit significant differences with ours [11]. While it is true that a perfect surrogate, i.e., when p(κ|x) = p_M(κ|x, D) ∀x, would yield the exact optimum, imperfect surrogates could be useful as long as they can correctly guide the search in the ϕ-space and lead to the optimal value of (O1) or (O2). Hence an accurate surrogate for ϕ-values (and corresponding microstructures x) far away from the optimum is not necessary. The difficulty of course is that we do not know a priori where the optimum lies, and a surrogate trained on microstructures drawn from p(x|ϕ^(0)), where ϕ^(0) is the initial guess in the optimization scheme (see Algorithm 1), will generally perform poorly at other ϕ's.

The acquisition function that we propose incorporates the optimization objectives. In particular, for the (O1) problem (Eq. (1)) we propose:

α(x) = Var_{p_M(κ|x,D)} [u(κ)]    (7)

In this particular form, α scores each microstructure x in terms of the predictive uncertainty in the utility u (the expected value of which we seek to maximize) due to the predictive density of the surrogate. In the case discussed earlier where u(κ) = I_K(κ) (and U_1(ϕ) = Pr(κ ∈ K|ϕ)), the acquisition function reduces to the variance of the event κ ∈ K. This suggests that the acquisition function yields the largest scores for microstructures for which the surrogate is most uncertain whether their corresponding properties belong in the target domain K.

In terms of the overall procedure, we propose an outer loop, within which the VB-EM-based optimization is embedded, such that, if D^(l) denotes the training dataset at iteration l, p_M(κ|x, D^(l)) the corresponding predictive density of the surrogate, q^(l)(x) the marginal variational density found in the last E-step and ϕ^(l) the optimum found in the last M-step, we augment the training dataset as follows (see Algorithm 1):
• we randomly generate a pool of candidate microstructures {x^(l,n)}_{n=1}^{N_pool} from q^(l)(x) and select a subset of N_add < N_pool microstructures which yield the highest values of the acquisition function α(x^(l,n)) (a minimal sketch of this scoring-and-selection step is given after Eq. (8) below);

• we solve the high-fidelity model for the aforementioned N_add microstructures and construct a new training dataset D^(l)_add, which we add to D^(l) in order to form D^(l+1) = D^(l) ∪ D^(l)_add. We retrain the surrogate based on D^(l+1), i.e. we compute p_M(κ|x, D^(l+1)), and restart the VB-EM-based optimization algorithm with the now updated surrogate (the retraining could be avoided by making use of online learning [35], which does not rely on an a-priori fixed dataset and can deal with incrementally arriving (or streamed) data).

For the (O2) problem above, we propose to select microstructures that yield the highest predictive log-score on the sample representation {κ^(s)}_{s=1}^S of the target distribution, i.e.

α(x) = (1/S) Σ_{s=1}^S log p_M(κ^(s)|x, D),    κ^(s) ∼ p_target(κ)    (8)
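A compact sketch of how the (O1) acquisition of Eq. (7) and the subsequent subset selection could look is given below, under two simplifying assumptions of our own: the utility is the box indicator I_K of Eq. (10), for which u(κ) is a Bernoulli variable under the surrogate's Gaussian predictive density (so that Var[u] = p(1 − p)), and that predictive density is taken to be diagonal.

```python
import numpy as np
from scipy.stats import norm

def acquisition_O1(mean, var, lo=(8.5, 6.75), hi=(11.0, 9.0)):
    """alpha(x) = Var_{p_M(kappa|x,D)}[I_K(kappa)] for a box domain K (Eq. (7))."""
    mean, sd = np.atleast_2d(mean), np.sqrt(np.atleast_2d(var))
    p_in = np.prod(norm.cdf((hi - mean) / sd) - norm.cdf((lo - mean) / sd), axis=1)
    return p_in * (1.0 - p_in)          # variance of a Bernoulli(p_in) utility

def select_batch(scores, n_add=1024):
    """Indices of the n_add pool candidates with the highest acquisition values."""
    return np.argsort(scores)[-n_add:]
```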
Results & Discussion

In the following, we present two applications of the proposed framework for (O1)- and (O2)-type optimization problems. We first elaborate on the specific choices for the process parameters ϕ, the random microstructures x and their properties κ, as well as the associated PSP links.

Process ϕ - Microstructure x: In all numerical illustrations we consider statistically homogeneous, binary (two-phase) microstructures which upon spatial discretization (on a uniform, two-dimensional N_p × N_p grid with N_p = 64) are represented by a vector x ∈ {0, 1}^4096. The binary microstructures are modeled by means of a thresholded zero-mean, unit-variance Gaussian field [36, 37]. If the vector x_g denotes the discretized version of the latter (on the same grid), then the value at each pixel i is given by x_i = H(x_{g,i} − x_0), where H(·) denotes the Heaviside function and x_0 the cutoff threshold, which determines the volume fractions of the resulting binary field. We parameterize with ϕ the spectral density function (SDF) of the underlying Gaussian field (i.e. the Fourier transform of its autocovariance) using a combination of radial basis functions (RBFs, see Supplementary Information - Process-Structure linkage), which automatically ensures the non-negativity of the resulting SDF. The constraint of unit variance is enforced using a softmax transformation. The density p(x|ϕ) implicitly defined above affords great flexibility in the forms of the resulting binary medium (as can be seen in the ensuing illustrations), which increases as the dimension of ϕ does. Figure 1 illustrates how different values of the process parameters ϕ can lead to profound changes in the microstructures (and correspondingly, their effective physical properties κ). While the parameters ϕ selected do not have explicit physical meaning, they can be linked to actual processing variables given appropriate data. Naturally, not all binary media can be represented by this model and a more flexible p(x|ϕ) could be employed with small modifications in the overall algorithm [38, 39, 40].
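The following sketch illustrates how a sample from such a p(x|ϕ) could be generated. It follows the construction above only loosely and under stated assumptions: the RBF centers, bandwidth and frequency grid are made up for illustration, the field is standardized empirically instead of enforcing unit variance through the SDF normalization, and the cutoff is chosen per sample to match the desired volume fraction.

```python
import numpy as np

def sample_microstructure(phi, Np=64, vol_frac=0.5, bandwidth=2.0, rng=None):
    """Draw x ~ p(x|phi): FFT-sample a Gaussian field with an RBF-mixture SDF,
    then threshold it into a binary (two-phase) image."""
    rng = rng or np.random.default_rng()
    phi = np.asarray(phi, dtype=float)
    # radial frequency grid and RBF-mixture SDF with softmax(phi) weights
    fx, fy = np.meshgrid(np.fft.fftfreq(Np) * Np, np.fft.fftfreq(Np) * Np)
    r = np.hypot(fx, fy)
    centers = np.linspace(0.0, Np / 2.0, phi.size)
    w = np.exp(phi - phi.max()); w /= w.sum()
    sdf = sum(wk * np.exp(-0.5 * ((r - ck) / bandwidth) ** 2)
              for wk, ck in zip(w, centers))
    # colour white noise by sqrt(SDF) in Fourier space, standardize, threshold
    field = np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((Np, Np)))
                                 * np.sqrt(sdf)))
    field = (field - field.mean()) / field.std()
    x0 = np.quantile(field, 1.0 - vol_frac)        # cutoff threshold x_0
    return (field > x0).astype(np.uint8)           # binary pixel vector x
```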
Microstructure x - Properties κ: In this study we consider a two-dimensional, representative volume element (RVE) Ω_RVE = [0, 1]² and assume that each of the two phases is isotropic, linear elastic in terms of its mechanical response and is characterized by an isotropic, linear conductivity tensor in terms of its thermal response. We denote with C the fourth-order elasticity tensor and with a the second-order conductivity tensor, which are also binary (vector and tensor) fields. The vector κ consists of various combinations of macroscopic, effective (apparent), mechanical or thermal properties of the RVE, which we denote by C_eff and a_eff, respectively. The effective properties for each microstructure occupying Ω_RVE were computed using finite element simulations and Hill's averaging theorem [41, 42] (see Supplementary Information: Structure-Property linkage). We assumed a contrast ratio of 50 in the properties of the two phases, i.e., E_1/E_0 = 50 (where E_0, E_1 are the elastic moduli of phases 0 and 1; Poisson's ratio was ν = 0.3 for both phases) and a_1/a_0 = 50 (where a_0, a_1 are the conductivities of phases 0 and 1). In the following plots, phase 1 is always shown as white and phase 0 always as black. We note that the dependence of effective properties on (low-dimensional) microstructural features (analogous to ϕ) has been considered in e.g. [43, 44], but the random variability in these properties has been ignored, either by considering very large RVEs or by averaging over several of them. We emphasize finally that the framework proposed can accommodate any high-fidelity model for the structure-property link, as this is merely used as a generator of training data D.
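Since the paper's FEM/Hill-averaging pipeline is described in the Supplementary Information and not reproduced here, the sketch below is only a rough finite-volume stand-in showing how an apparent conductivity κ₁ = [a_eff]_11 could be extracted from a binary pixel image: a unit temperature drop is applied in the horizontal direction with insulated top/bottom boundaries, and the resulting total flux is read off.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def effective_conductivity_x(x, a0=1.0, a1=50.0):
    """Apparent horizontal conductivity of a binary image on the unit square."""
    Np = x.shape[0]
    a = np.where(x > 0, a1, a0)                       # per-pixel conductivity
    idx = np.arange(Np * Np).reshape(Np, Np)
    rows, cols, vals, b = [], [], [], np.zeros(Np * Np)
    for r in range(Np):
        for c in range(Np):
            i, diag = idx[r, c], 0.0
            for dr, dc in ((0, -1), (0, 1), (-1, 0), (1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < Np and 0 <= cc < Np:     # interior face, harmonic mean
                    g = 2.0 * a[r, c] * a[rr, cc] / (a[r, c] + a[rr, cc])
                    rows.append(i); cols.append(idx[rr, cc]); vals.append(-g)
                    diag += g
                elif dr == 0:                         # left/right Dirichlet half-cell
                    g = 2.0 * a[r, c]
                    diag += g
                    b[i] += g * (1.0 if cc < 0 else 0.0)   # T = 1 left, T = 0 right
            rows.append(i); cols.append(i); vals.append(diag)
    A = sp.csr_matrix((vals, (rows, cols)), shape=(Np * Np, Np * Np))
    T = spla.spsolve(A, b).reshape(Np, Np)
    # with a unit domain and unit drop, the total flux through the left boundary
    # equals the apparent conductivity
    return float(np.sum(2.0 * a[:, 0] * (1.0 - T[:, 0])))
```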
Case 1: Target domain of multi-physics properties (O1)

In the following we will demonstrate the performance of the proposed formulation in an (O1)-type stochastic optimization problem, with regards to both thermal as well as mechanical properties. Additionally, we will provide a systematic and quantitative assessment of the benefits of the active learning strategy proposed (as compared to randomized data generation).

We consider a combination of mechanical and thermal properties of interest, namely:

κ_1 = [a_eff]_11,    κ_2 = (1/2) ( [C_eff]_1111 + [C_eff]_2222 )    (9)

i.e., κ ∈ R²₊, and define the target domain:

K = [8.5, 11.0] × [6.75, 9.0]    (10)

The utility function u(κ) = I_K(κ) is the (non-differentiable) indicator function of K ⊂ R²₊, which implies that the objective of the optimization (type (O1) - see Figure 3a) is to find the ϕ that maximizes the probability that the resulting microstructures have properties κ that lie in K. The two-phase microstructures have volume fraction 0.5 and the parameters ϕ ∈ R¹⁰⁰ as well as p(x|ϕ) were defined as discussed in the beginning of this section (in this case, the centers of the 100 RBFs are fixed to a uniform grid and the bandwidths were prescribed; hence the 100 entries of ϕ correspond to the weights of the RBFs).

With regards to the adaptive learning strategy (appearing as the outer loop in Algorithm 1), we note: the initial training dataset D^(0) consists of N_0 = 2048 data pairs. This is generated via ancestral sampling, i.e. we randomly draw samples ϕ from N(0, I) (this choice is not arbitrary, as - given the adopted parametrization - it envelopes all possible SDFs) and conditionally on each ϕ^(n) we sample p(x|ϕ^(n)) to generate a microstructure. In each data acquisition step l, N_pool = 4096 candidates were generated and a subset of N_add = 1024 candidates was selected based on the acquisition function. Hence the size of the dataset increased by 1024 data pairs at each iteration l, with L = 4 data augmentation steps performed in total.
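The ancestral-sampling construction of D^(0) amounts to a few lines; in the sketch below, `sample_x_given_phi` and `solve_properties` are the hypothetical samplers/solvers sketched earlier, and n0 = 2048 and d_phi = 100 match the Case 1 setting.

```python
import numpy as np

def build_initial_dataset(sample_x_given_phi, solve_properties,
                          n0=2048, d_phi=100, seed=0):
    """D^(0): pairs (x^(n), kappa^(n)) with phi ~ N(0, I) and x ~ p(x|phi)."""
    rng = np.random.default_rng(seed)
    data = []
    for _ in range(n0):
        phi = rng.standard_normal(d_phi)       # ancestral draw phi ~ N(0, I)
        x = sample_x_given_phi(phi)            # microstructure x ~ p(x|phi)
        data.append((x, solve_properties(x)))  # expensive high-fidelity label
    return data
```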
The optimal process parameters at each data acquisition step are denoted as ϕ∗_{M,D^(l)}, with the subscript indicating the dependence on the surrogate model M and the dataset D^(l) on which it has been trained. Once the algorithm has converged to its final estimate of the process parameters after L data acquisition steps, i.e. ϕ∗_{M,D^(L)}, we can assess ϕ∗_{M,D^(L)} by obtaining a reference estimate of the expected utility U(ϕ∗_{M,D^(L)}) = Pr(κ ∈ K|ϕ∗_{M,D^(L)}) using a Monte Carlo simulation, i.e. by sampling microstructures x ∼ p(x|ϕ∗_{M,D^(L)}) and running the high-fidelity model instead of the inexpensive surrogate. This approach will also enable us to compare the optimization results obtained with the active learning strategy with those obtained by using randomized training data D for the surrogate (i.e. without adaptive learning). We argue that the former has a competitive advantage if, for the same number of datapoints N, we can achieve a higher score in terms of our materials design objective Pr(κ ∈ K|ϕ∗). As the optimization objective F is non-convex and the optimization algorithm itself non-deterministic (due to the randomized generation of the data, the stochastic initialization of the neural network and the randomized initial guess ϕ^(0) ∼ N(0, I)), the process parameters ϕ∗ identified can generally vary across different runs. For this reason the optimization problem is solved several times (with different randomized initializations) and we report on the aggregate performance of active learning vs. randomized data generation (serving as a baseline).

In the following we discuss the results obtained and displayed in Figs. 4, 5, 6 and 7.

[Figure 4 panels: top row, four samples from p(x|ϕ^(0)) with (κ₁, κ₂) = (9.91, 6.43), (10.22, 6.07), (7.45, 6.98), (8.75, 6.75); bottom row, four samples from p(x|ϕ∗_{M,D^(4)}) with (κ₁, κ₂) = (10.72, 7.10), (10.92, 6.20), (10.19, 7.33), (9.76, 6.53); phase properties E₁ = 50, a₁ = 50 (white) and E₀ = 1, a₀ = 1 (black).]

Figure 4: Case 1: Optimal random microstructures. (Top row) Samples of microstructures drawn from p(x|ϕ) for the initial guess ϕ^(0) of processing variables. (Bottom row) Samples of microstructures drawn from p(x|ϕ) for the optimal value ϕ∗_{M,D^(L)} of processing variables which maximizes the probability that the corresponding material properties will fall in the target domain K = [8.5, 11.0] × [6.75, 9.0] (Eq. (10)). Underneath each microstructure, the thermal κ₁ and mechanical κ₂ properties of interest (Eq. (9)) are reported.

[Figure 5 panels over κ₁ = [a]₁₁ (conductivity) and κ₂ = ½([C]₁₁₁₁ + [C]₂₂₂₂) (stiffness): p(κ|ϕ^(0)) with Pr(κ ∈ K|ϕ^(0)) = 0.14; p(κ|ϕ∗_{M,D^(0)}) with Pr(κ ∈ K|ϕ∗_{M,D^(0)}) = 0.31; p(κ|ϕ∗_{M,D^(4)}) with Pr(κ ∈ K|ϕ∗_{M,D^(4)}) = 0.66.]

Figure 5: Case 1: Evolution of the process-property density during optimization. The actual process-property density p(κ|ϕ) was estimated using 1024 Monte Carlo samples, the high-fidelity structure-property model (see Supplementary Information: Structure-Property linkage) and for three values of the process parameters ϕ. (Left) for the initial guess ϕ^(0); (Middle) for the optimal ϕ as obtained using the initial training dataset D^(0) and without adaptive learning; (Right) for the optimal ϕ obtained with the augmented training dataset D^(4) identified by the active learning scheme proposed and as detailed in the text. The target domain K (Eq. (10)) is drawn with a green rectangle and the colorbar indicates the value of the density p(κ|ϕ).
 
[Figure 6 panels: (left) achieved metric S = Pr(κ ∈ K|ϕ∗_{M,D^(l)}) versus the number N of labeled microstructures, for active learning and for the random baseline; (right) marginal densities p(κ₁|ϕ∗_{M,D^(4)}) from Monte Carlo, from the surrogate trained on D^(0), and from the surrogate trained on D^(4), plotted over the effective property κ₁.]

Figure 6: Case 1: Assessment of active learning approach. (Left) The probability we seek to maximize with respect to ϕ, i.e. Pr(κ ∈ K|ϕ), is plotted as a function of the size N of the training dataset (i.e. the number of simulations of the high-fidelity model). The lines depict the medians obtained over 80 independent runs of each algorithm. The red line corresponds to the results obtained without adaptive learning and the blue with adaptive learning. (Right) For the optimal ϕ∗_{M,D^(4)} identified using active learning, we compare the actual process-property density p(κ₁|ϕ) (black line - estimated with 1024 Monte Carlo samples and the high-fidelity model) with the one predicted by the surrogate trained only on the initial dataset D^(0) (blue line) and with the one predicted by the surrogate trained on the augmented dataset D^(4) (red line).
[Figure 7 panels: (a) ELBO F versus EM-iteration for l = 0; (b) Pr(κ ∈ K|ϕ∗_{M,D^(l)}) versus data acquisition step l.]

Figure 7: Case 1: Convergence characteristics of the optimization algorithm. (a) Evolution of the ELBO F as a function of the iteration number in the inner loop (see Algorithm 1) and for l = 0 (outer loop - see Algorithm 1). (b) Evolution of the probability we seek to maximize, Pr(κ ∈ K|ϕ) (estimated with 1024 Monte Carlo samples and the high-fidelity model), at the optimal ϕ values identified by the algorithm at various data acquisition steps l (outer loop - see Algorithm 1).

• In Fig. 4 we depict sample microstructures drawn from p(x|ϕ) for two values of ϕ, i.e. for the initial guess ϕ^(0) (top row) and for the optimal process parameters ϕ∗_{M,D^(L)} (bottom row). While the microstructures in the bottom row remain random, one observes that the connectivity of phase 1 (stiffer) is increased as compared to the microstructures of the top row. The diagonal, connected paths of the lesser conducting phase (black) effectively block heat conduction in the horizontal direction. This is reflected in the effective properties reported underneath each image. The value of the objective, i.e. of the probability that the properties κ ∈ K, is ≈ 0.65 for the microstructures in the bottom row (see Fig. 6 - right), whereas for the top ones it is ≈ 0.14.

• Fig. 5 provides insight into the optimization algorithm proposed by looking at the process-property density p(κ|ϕ) for various ϕ values. We note that this is implicitly defined by propagating the randomness in the microstructures (quantified by p(x|ϕ)) through the high-fidelity model that predicts the properties of interest. Based on the Monte Carlo estimates depicted in Fig. 5, one observes that the density p(κ|ϕ), which only minimally touches upon the target domain K for the initial process parameters ϕ^(0) (left), gradually moves closer to K as the iterations proceed and by using the surrogate trained on the initial batch of data D^(0) (middle). The incorporation of additional training data through the successive applications of the adaptive learning scheme described earlier enables the surrogate to sufficiently resolve details in the structure-property map that eventually lead to the density p(κ|ϕ) shown on the right panel, which maximally overlaps (in comparison) with the target domain K.

• On the left side of Fig. 6 we illustrate the performance advantage gained by the active learning approach proposed over a brute-force strategy that employs randomized training data for the surrogate. To this end, we compare the values of the objective function, i.e. Pr(κ ∈ K|ϕ∗_{M,D}), achieved for datasets D of equal size, with the dataset being either generated randomly or constructed based on our active learning approach. Evidently, the active learning approach was able to achieve a better material design at comparably significantly lower numerical cost (as measured by the number of evaluations of the high-fidelity model of the microstructure-property link). We observe that while the addition of more training data generally leads to more accurate surrogates, when this is done without regard to the optimization objectives (red line) then it does not necessarily lead to higher values of the objective function. On the right panel of Fig. 6 we provide further insight as to why the adaptive data acquisition was able to outperform a randomized approach. To this end, and for one of the two properties, we compare the model-based belief p(κ₁|ϕ∗_{M,D^(4)}, D^(0)) of the surrogate conditional on D^(0) against a reference density obtained using Monte Carlo (black line). We can see that a model only informed by D^(0) (blue line) identifies an incorrect density and as such fails to converge to the optimal process parameters. The active learning approach (red line) was able to correct the initially erroneous model belief and as a result performs better in the optimization task.

• In Fig. 7a we illustrate the evolution of the ELBO during the inner-loop iterations of the proposed VB-EM algorithm (see Algorithm 1). Finally, in Fig. 7b we depict the evolution of the maximum of the objective identified at various data acquisition steps l of the proposed active learning scheme. As can be seen, the targeted data enrichment enables the surrogate to resolve details in the structure-property map and identify higher-performing processing parameters ϕ.

Case 2: Target density of properties (O2)

In this second numerical illustration, we investigate the performance of the proposed methodological framework for an (O2)-type optimization problem (Eq. (3)), where we seek to identify the processing parameters ϕ that lead to a property density p(κ|ϕ) that is closest to a prescribed target p_target(κ). In particular, we considered the following two properties

κ_1 = [a_eff]_11,    κ_2 = [a_eff]_22    (11)

i.e. κ ∈ R², and a target density:

p_target(κ) = N(μ̂, Σ̂)    (12)

with μ̂ = [20.5, 3.5]^T and Σ̂_11 = 0.60, Σ̂_22 = 0.01, Σ̂_12 = −0.03 (indicated with green iso-probability lines in Fig. 9). These values were selected to promote anisotropic behavior, i.e. microstructures are desired to have a large effective conductivity in the first spatial dimension, while being comparatively insulating in the second spatial dimension.
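For concreteness, the prescribed target and the S = 20 samples κ^(s) used below to approximate the (O2) objective (Eq. (5)) can be set up as follows (a trivial but exact transcription of the reported moments):

```python
import numpy as np

mu_hat = np.array([20.5, 3.5])                 # target mean of (kappa_1, kappa_2)
sigma_hat = np.array([[0.60, -0.03],
                      [-0.03, 0.01]])          # target covariance Sigma_hat
rng = np.random.default_rng(0)
kappa_s = rng.multivariate_normal(mu_hat, sigma_hat, size=20)  # kappa^(s) ~ p_target
```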
The characteristics of the active learning procedure (outer loop in Algorithm 1) remain identical, with the only difference that D^(0) now comprises N_0 = 4096 datapoints, with N_add = 1024 datapoints (out of 4096 candidates) added in each of the L = 6 data-enrichment steps. We used S = 20 samples from p_target(κ) to approximate the objective (see Eq. (5)).

We present and discuss the results obtained based on Figures 8 and 9:

• In Fig. 8 we showcase sample microstructures drawn from p(x|ϕ), both for the initial guess ϕ^(0) (top row) as well as for the optimal process parameters ϕ∗_{M,D^(L)} identified by the optimization algorithm using the active learning approach. The leftmost three columns pertain to volume fraction 0.5 (of the more conducting phase a_1) whereas the rightmost three columns to volume fraction 0.3.
[Figure 8 panels: (a) volume fraction 0.5; (b) volume fraction 0.3. Top row, samples from p(x|ϕ^(0)) with (κ₁, κ₂) = (7.9, 8.5), (10.5, 8.7), (10.8, 8.3) in (a) and (4.9, 4.7), (4.8, 4.9), (4.9, 4.7) in (b); bottom row, samples from p(x|ϕ∗_{M,D^(6)}) with (κ₁, κ₂) = (20.3, 3.5), (20.7, 3.3), (19.8, 3.7) in (a) and (5.8, 1.8), (9.7, 2.0), (5.8, 1.8) in (b); phase conductivities a₁ = 50 (white), a₀ = 1 (black).]

Figure 8: Case 2: Optimal random microstructures. (Top row) Samples of microstructures drawn from p(x|ϕ) for the initial guess ϕ^(0) of processing variables. (Bottom row) Samples of microstructures drawn from p(x|ϕ) for the optimal value ϕ∗_{M,D^(6)} of the processing variables which minimizes the KL-divergence between p(κ|ϕ) and the target density p_target (Eq. (12)). Underneath each microstructure, the thermal properties κ₁, κ₂ of interest (Eq. (11)) are reported. The illustrations correspond to two volume fractions, 0.5 (in (a), left) and 0.3 (in (b), right), of the high-conductivity phase (a₁ = 50).

[Figure 9 grid over κ₁ = [a]₁₁ and κ₂ = [a]₂₂ (conductivities): columns show the initial ϕ, an intermediate ϕ, the converged ϕ and a zoomed-in (cropped) view of the converged ϕ; rows show the model belief p_M(κ|ϕ, D) of the baseline/randomized surrogate, the model belief p_M(κ|ϕ, D^(l)) under active learning, and the Monte Carlo reference p(κ|ϕ); p_target(κ) is marked in each panel.]

Figure 9: Case 2: Evolution of the process-property density p(κ|ϕ) with and without active learning in relation to the target p_target(κ). We plot the evolution of the process-property density p(κ|ϕ) at three different stages of each optimization run, i.e. for the initial ϕ (left column), for the ϕ at an intermediate stage of the optimization (middle column) and for the optimal ϕ identified upon convergence (right column). The fourth column is a zoomed-in version of the third that enables closer comparisons of the densities involved. (Top row) illustrates p(κ|ϕ) as predicted by the surrogate trained on a randomized dataset without active learning. (Middle row) illustrates p(κ|ϕ) as predicted by the surrogate trained using the adaptive learning proposed. (Bottom row) illustrates the actual p(κ|ϕ) (estimated with 1024 Monte Carlo samples and the high-fidelity model) for the optimal ϕ identified by the active learning approach. The target distribution p_target(κ) is indicated with green iso-probability lines.
As one would expect, we observe that the optimal family of microstructures identified (determined by ϕ∗_{M,D^(L)}) exhibits connected paths of the more conductive phase (white) along the horizontal direction. The connected paths of the lesser conducting phase (black) are also aligned in the horizontal direction so as to reduce the effective conductivity along the vertical direction. The optimal microstructures therefore exhibit a strongly anisotropic behavior by funneling heat through pipe-like structures of high-conductivity material in the horizontal direction. This is also reflected in the indicative property values reported under each image.

• Finally, Fig. 9 assesses the advantage of the active learning strategy advocated for this problem. In particular, we plot the evolution of the process-property density p(κ|ϕ) in relation to the target p_target(κ) (depicted with green iso-probability lines) at different stages of each optimization (initial, intermediate, converged). Using the optimal process parameters ϕ at each of these stages, we see that the optimization scheme without active learning (top row) results in a density that is quite far from the target. In contrast, the optimization algorithm with active learning (middle row) is able to identify a ϕ which brings the p(κ|ϕ) into close alignment with the target distribution p_target(κ). The validity of this result is assessed on the bottom row, where the actual p(κ|ϕ) (estimated with Monte Carlo and the high-fidelity model) is depicted for the ϕ values identified by the active learning approach in the middle row. We observe a very close agreement, which reinforces the evidence that the active learning strategy advocated enables the surrogate to accurately resolve the details in the structure-property map that are needed for the solution of the optimization problem.

In summary, we presented a flexible, fully probabilistic, data-driven formulation for materials design that accounts for the uncertainty in the process-structure and structure-property links and enables the identification of the optimal, high-dimensional, process parameters ϕ. We have demonstrated that a variety of different objectives can be accommodated. The methodology relies on the availability of a process-structure linkage, which we generically represented with the density p(x|ϕ) and which could be learned from experimental or simulation data. It is not restricted to the particular parametrizations adopted in terms of the process ϕ or microstructural x variables, and very high-dimensional descriptions can be employed due to the VB-EM scheme advocated. We have also demonstrated the use of probabilistic surrogates in combination with novel, active learning formulations which can significantly reduce the computational effort associated with the structure-property link and enable the solution of problems with a small number of such simulations. The optimization framework is also agnostic regarding the nature of the structure-property link p(κ|x) (and whether it is deterministic or probabilistic), as it simply defines the data generation process for the probabilistic surrogate. As a result, other material properties or other physical descriptions can be readily incorporated. The framework advocated does not rely on a particular architecture of the data-driven surrogate: its predictive uncertainty is incorporated and the self-supervised active learning mechanism can control the number of training data that each particular surrogate would need. While not discussed, it is also possible to assess the optimization error, albeit with additional runs of the high-fidelity model, by using an Importance Sampling step [45]. Lastly, we mention further potential for improvement by a fully Bayesian treatment of the surrogate's parameters θ, which would be particularly beneficial in the small-data regime we are operating in.

Algorithm 1: Obtain ϕ∗ = arg max_ϕ U_{1,2} (Eq. (1) or Eq. (3)) using a probabilistic surrogate and active learning

Data: l = 0, t = 0, D^(0), structure-property model, surrogate p(κ|x, D^(0)), initial ϕ^(0), variational family Q_ξ
Result: converged process parameters ϕ∗_{D^(L)}

while ϕ not converged do
    while ELBO not converged do
        /* Execute E-step */
        ξ^(t+1) = arg max_ξ F(ϕ^(t), q_ξ(κ, x))
        /* Execute M-step */
        ϕ^(t+1) = arg max_ϕ F(ϕ, q_{ξ^(t+1)}(κ, x))
        t → t + 1
    end
    /* Optimal ϕ conditional on current data */
    ϕ∗_{M,D^(l)} ← ϕ^(t)
    /* Create microstructure candidates */
    sample x^(l,n) ∼ q(x), n = 1, ..., N_pool
    compute α(x^(l,n)) (Eq. (7))
    /* Select most informative subset */
    select the N_add < N_pool microstructures from D^(l)_pool which yield the highest acquisition function values and compute the corresponding property values κ(x) in order to form D^(l)_add
    D^(l+1) ← D^(l) ∪ D^(l)_add
    /* Update probabilistic surrogate */
    p(κ|x, D^(l+1)) ← p(κ|x, D^(l))
    l → l + 1
end

Figure 10: Pseudo-code for proposed algorithm. The inner VB-EM iterations are wrapped within the adaptive data acquisition as an outer loop.

Author Contributions

M.R.: conceptualization, physics and machine-learning modeling and computations, algorithmic and code development, writing of the paper. P-S.K.: conceptualization, writing of the paper.

Competing Interests

The authors declare no competing interests.

Code Availability

The source code will be made available at https://github.com/bdevl/SMO.

Data Availability

The accompanying data will be made available at https://github.com/bdevl/SMO.
References

[1] National Science and Technology Council, Executive Office of the President. Materials Genome Initiative: A Renaissance of American Manufacturing. June 2011.
[2] David L. McDowell, Jitesh Panchal, Hae-Jin Choi, Carolyn Seepersad, Janet Allen, and Farrokh Mistree. Integrated design of multiscale, multifunctional materials and products. Butterworth-Heinemann, 2009.
[3] Raymundo Arróyave and David L. McDowell. Systems Approaches to Materials Design: Past, Present, and Future. Annual Review of Materials Research, 49(1):103–126, 2019. eprint: https://doi.org/10.1146/annurev-matsci-070218-125955.
[4] Aleksandr Chernatynskiy, Simon R. Phillpot, and Richard LeSar. Uncertainty Quantification in Multiscale Simulation of Materials: A Prospective. Annual Review of Materials Research, 43(1):157–182, 2013.
[5] Pejman Honarmandi and Raymundo Arróyave. Uncertainty Quantification and Propagation in Computational Materials Science and Simulation-Assisted Materials Design. Integrating Materials and Manufacturing Innovation, 9(1):103–143, March 2020.
[6] Liu, Xuan, Furrer, David, Kosters, Jared, and Holmes, Jack. NASA Vision 2040: A Roadmap for Integrated, Multiscale Modeling and Simulation of Materials and Systems. Technical report, March 2018.
[7] Frederic E. Bock, Roland C. Aydin, Christian J. Cyron, Norbert Huber, Surya R. Kalidindi, and Benjamin Klusemann. A Review of the Application of Machine Learning and Data Mining Approaches in Continuum Materials Mechanics. Frontiers in Materials, 6, 2019. Publisher: Frontiers.
[8] Jitesh H. Panchal, Surya R. Kalidindi, and David L. McDowell. Key computational modeling issues in Integrated Computational Materials Engineering. Computer-Aided Design, 45(1):4–25, January 2013.
[9] Constantin Grigo and Phaedon-Stelios Koutsourelakis. Bayesian Model and Dimension Reduction for Uncertainty Propagation: Applications in Random Media. SIAM/ASA Journal on Uncertainty Quantification, 7(1):292–323, January 2019.
[10] N. Zabaras and B. Ganapathysubramanian. A scalable framework for the solution of stochastic inverse problems using a sparse grid collocation approach. Journal of Computational Physics, 227(9):4697–4735, April 2008.
[11] Peter I. Frazier and Jialei Wang. Bayesian optimization for materials design. In Information Science for Materials Discovery and Design, pages 45–75. Springer, 2016.
[12] Yichi Zhang, Daniel W. Apley, and Wei Chen. Bayesian Optimization for Materials Design with Mixed Quantitative and Qualitative Variables. Scientific Reports, 10(1):4924, March 2020. Publisher: Nature Publishing Group.
[13] Jaimyun Jung, Jae Ik Yoon, Hyung Keun Park, Hyeontae Jo, and Hyoung Seop Kim. Microstructure design using machine learning generated low dimensional and continuous design space. Materialia, 11:100690, June 2020.
[14] Chun-Teh Chen and Grace X. Gu. Machine learning for composite materials. MRS Communications, 9(2):556–566, June 2019. Publisher: Cambridge University Press.
[15] S. Torquato. Optimal Design of Heterogeneous Materials. Annual Review of Materials Research, 40(1):101–129, 2010.
[16] Matthew D. Hoffman, David M. Blei, Chong Wang, and John Paisley. Stochastic Variational Inference. J. Mach. Learn. Res., 14(1):1303–1347, May 2013.
[17] Anh Tran and Tim Wildey. Solving stochastic inverse problems for property–structure linkages using data-consistent inversion and machine learning. JOM, 73(1):72–89, January 2021.
[18] Fayyaz Nosouhi Dehnavi, Masoud Safdari, Karen Abrinia, Ali Hasanabadi, and Majid Baniassadi. A framework for optimal microstructural design of random heterogeneous materials. Computational Mechanics, 66(1):123–139, July 2020.
[19] Pınar Acar, Siddhartha Srivastava, and Veera Sundararaghavan. Stochastic Design Optimization of Microstructures with Utilization of a Linear Solver. AIAA Journal, 55(9):3161–3168, 2017. Publisher: American Institute of Aeronautics and Astronautics. eprint: https://doi.org/10.2514/1.J056000.
[20] Pinar Acar and Veera Sundararaghavan. Stochastic Design Optimization of Microstructural Features Using Linear Programming for Robust Design. AIAA Journal, 57(1):448–455, 2019. Publisher: American Institute of Aeronautics and Astronautics. eprint: https://doi.org/10.2514/1.J057377.
[21] Ruoqian Liu, Abhishek Kumar, Zhengzhang Chen, Ankit Agrawal, Veera Sundararaghavan, and Alok Choudhary. A predictive machine learning approach for microstructure optimization and materials design. Scientific Reports, 5(1):1–12, 2015. Publisher: Nature Publishing Group.
[22] Arindam Paul, Pinar Acar, Wei-keng Liao, Alok Choudhary, Veera Sundararaghavan, and Ankit Agrawal. Microstructure optimization with constrained design objectives using machine learning-based feedback-aware data-generation. Computational Materials Science, 160:334–351, April 2019.
[23] Evdokia Popova, Theron M. Rodgers, Xinyi Gong, Ahmet Cecen, Jonathan D. Madison, and Surya R. Kalidindi. Process-Structure Linkages Using a Data Science Approach: Application to Simulated Additive Manufacturing Data. Integrating Materials and Manufacturing Innovation, 6(1):54–68, March 2017.
[24] Xian Yeow Lee, Joshua R. Waite, Chih-Hsuan Yang, Balaji Sesha Sarath Pokuri, Ameya Joshi, Aditya Balu, Chinmay Hegde, Baskar Ganapathysubramanian, and Soumik Sarkar. Fast inverse design of microstructures via generative invariance networks. Nature Computational Science, 1(3), March 2021.
[25] Hisaki Ikebata, Kenta Hongo, Tetsu Isomura, Ryo Maezono, and Ryo Yoshida. Bayesian molecular design with a chemical language model. Journal of Computer-Aided Molecular Design, 31(4):379–391, April 2017.
[26] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society B, 39(1):1–38, 1977.
[27] Matthew J. Beal and Zoubin Ghahramani. The Variational Bayesian EM Algorithm for Incomplete Data: with Application to Scoring Graphical Model Structures. Bayesian Statistics, (7), 2003.
[28] Radford M. Neal and Geoffrey E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models, pages 355–368. Springer, 1998.
[29] Surya R. Kalidindi. A Bayesian framework for materials knowledge systems. MRS Communications, 9(2):518–531, June 2019. Publisher: Cambridge University Press.
[30] Gary Marcus and Ernest Davis. Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon, September 2019.
[31] Zijiang Yang, Yuksel C. Yabansu, Reda Al-Bahrani, Wei-keng Liao, Alok N. Choudhary, Surya R. Kalidindi, and Ankit Agrawal. Deep learning approaches for mining structure-property linkages in high contrast composites from simulation datasets. Computational Materials Science, 151:278–287, 2018.
[32] Ahmet Cecen, Hanjun Dai, Yuksel C. Yabansu, Surya R. Kalidindi, and Le Song. Material structure-property linkages using three-dimensional convolutional neural networks. Acta Materialia, 146:76–84, 2018.
[33] Simon Tong and Stanford University Computer Science Dept. Active learning: theory and applications. Stanford University, 2001.
[34] D. J. C. MacKay. Information-Based Objective Functions for Active Data Selection. Neural Computation, 4(4):590–604, July 1992.
[35] Doyen Sahoo, Quang Pham, Jing Lu, and Steven C. H. Hoi. Online deep learning: Learning deep neural networks on the fly. arXiv preprint arXiv:1711.03705, 2017.
[36] M. Teubner. Level surfaces of Gaussian random fields and microemulsions. EPL (Europhysics Letters), 14(5):403, 1991. Publisher: IOP Publishing.
[37] Anthony P. Roberts and Max Teubner. Transport properties of heterogeneous materials derived from Gaussian random fields: bounds and simulation. Physical Review E, 51(5):4141, 1995. Publisher: APS.
[38] P. S. Koutsourelakis. Probabilistic characterization and simulation of multi-phase random media. Probabilistic Engineering Mechanics, 21(3), 2006.
[39] Ramin Bostanabad, Anh Tuan Bui, Wei Xie, Daniel W. Apley, and Wei Chen. Stochastic microstructure characterization and reconstruction via supervised learning. Acta Materialia, 103:89–102, January 2016.
[40] Ruijin Cang, Yaopengxiao Xu, Shaohua Chen, Yongming Liu, Yang Jiao, and Max Yi Ren. Microstructure Representation and Reconstruction of Heterogeneous Materials Via Deep Belief Network for Computational Material Design. Journal of Mechanical Design, 139(7), May 2017.
[41] Christian Miehe and Andreas Koch. Computational micro-

10
to-macro transitions of discretized microstructures undergo-
ing small strains. Archive of Applied Mechanics, 72(4):300–
317, 2002.
[42] Rodney Hill. On constitutive macro-variables for heterogeneous solids at finite strain. Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences, 326(1565):131–147, 1972.
[43] G. Saheli, H. Garmestani, and B. L. Adams. Microstructure design of a two phase composite using two-point correlation functions. Journal of Computer-Aided Materials Design, 11(2):103–115, January 2004.
[44] David T. Fullwood, Stephen R. Niezgoda, Brent L. Adams, and Surya R. Kalidindi. Microstructure sensitive design for performance optimization. Progress in Materials Science, 55(6):477–562, August 2010.
[45] Raphael Sternfels and Phaedon-Stelios Koutsourelakis. Stochastic design and control in random heterogeneous materials. International Journal for Multiscale Computational Engineering, 9(4), 2011.
[46] Andrew Wilson and Ryan Adams. Gaussian process kernels for pattern discovery and extrapolation. In International Conference on Machine Learning, pages 1067–1075. PMLR, 2013.
[47] Masanobu Shinozuka and George Deodatis. Simulation of multi-dimensional Gaussian stochastic fields by spectral representation. Applied Mechanics Reviews, 49(1):29–53, 1996.
[48] B. Hu and W. Schiehlen. On the simulation of stochastic processes by spectral representation. Probabilistic Engineering Mechanics, 12(2):105–113, 1997.
[49] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[50] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[51] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32:8026–8037, 2019.
[52] Arnaud Doucet, Nando de Freitas, and Neil Gordon, editors. Sequential Monte Carlo Methods in Practice. Springer, 2001.
Supplementary Information

In the following, we provide details specific to the algorithm implementation and the numerical simulations.

Process-Structure linkage

As mentioned earlier, the discretized, two-phase random microstructures employed in the numerical illustrations are represented by a random vector x, which arises by thresholding a two-dimensional, zero-mean, unit-variance Gaussian field whose discretized form is denoted by the vector x_g. The cutoff threshold x_0 is specified by the desired volume fraction, and the parameters φ are associated with the spectral density function (SDF) G(w) of the underlying Gaussian field. The SDF G(w) arises as the Fourier dual of the autocovariance, where w = [w_1, w_2]^T ∈ R^2 denotes the wavenumbers. We express the SDF as:

G(w) = \sum_{i=1}^{Q} \gamma_i \, h_i(w; \mu_i, \sigma_i)    (13)

where the functions h_i are Radial Basis Functions (RBFs), which depend on the parameters \mu_i, \sigma_i and have the functional form:

h(w; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{\|w - \mu\|^2}{2\sigma^2}}    (14)

This form is adopted because it automatically ensures the positivity of the resulting SDF. Eq. (13) is also known as a spectral mixture kernel, which defines a universal approximator for sufficiently large Q [46]. In our simulations, the parameters {\mu_i}, i.e. the centers of the RBFs, were fixed to a uniform grid in [0, w_max]^2, with w_max = 65.0 and \sigma_i = 12.0 for all i. Finally, the weights \gamma_i are related to the optimization variables φ through a softmax transformation:

\gamma_i = \frac{e^{\varphi_i}}{\sum_{j=1}^{Q} e^{\varphi_j}}    (15)

This is employed so that the resulting SDF integrates to 1, which is the variance of the corresponding Gaussian field.
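For concreteness, the following minimal NumPy sketch evaluates Eqs. (13)-(15); the values w_max = 65.0 and σ_i = 12.0 follow the text above, while the grid size (Q = 25) and all function and variable names are our illustrative choices.

import numpy as np

# Spectral mixture SDF, Eqs. (13)-(15): RBF centers mu_i on a uniform
# grid in [0, w_max]^2, fixed width sigma, softmax-normalized weights gamma.
w_max, sigma, n_grid = 65.0, 12.0, 5            # Q = n_grid**2 = 25 (illustrative)
centers = np.stack(np.meshgrid(*2 * [np.linspace(0.0, w_max, n_grid)]), -1).reshape(-1, 2)

def sdf(w, phi):
    """Evaluate G(w) for wavenumbers w of shape (N, 2), given phi of shape (Q,)."""
    gamma = np.exp(phi - phi.max())
    gamma /= gamma.sum()                        # softmax weights, Eq. (15)
    sq_dist = ((w[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    h = np.exp(-sq_dist / (2.0 * sigma ** 2)) / np.sqrt(2.0 * np.pi * sigma ** 2)
    return h @ gamma                            # Eq. (13)

G = sdf(np.array([[10.0, 20.0]]), np.zeros(n_grid ** 2))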
We made use of a spectral representation of the underlying Gaussian field (and therefore of x_g) on the basis of its φ-controlled SDF, according to the formulations detailed in [47, 48, 45]. The thresholded Gaussian vector x_g gives rise to the binary microstructure x as described above, and we denote the corresponding transformation summarily as:

x = F_\varphi(\Psi)    (16)

where Ψ denotes a vector of so-called random phase angles [47]. It consists of independent random variables uniformly distributed in [0, 2π], and its dimension depends on the discretization of the spectral domain. A direct implication of Eq. (16) is that the process-structure density p(x|φ) can now be expressed as:

p(x \mid \varphi) = \int \delta\left(x - F_\varphi(\Psi)\right) p(\Psi) \, d\Psi    (17)

where p(Ψ) is the product of uniform densities U[0, 2π]. As a result, expectations of arbitrary functions, say f(x), with respect to p(x|φ) can now be written as (with some abuse of notation):

\mathbb{E}_{p(x|\varphi)}[f(x)] = \int f(x) \, \delta\left(x - F_\varphi(\Psi)\right) p(\Psi) \, dx \, d\Psi    (18)
= \int f\left(F_\varphi(\Psi)\right) p(\Psi) \, d\Psi    (19)
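To illustrate the resulting process-structure link x = F_φ(Ψ) of Eq. (16), a deliberately simplified sampler is sketched below; the random wavenumber quadrature is a crude stand-in for the structured spectral grids of [47, 48], the functions sdf and the centers grid are carried over from the previous sketch, and all names are our assumptions.

import numpy as np

# Simplified spectral-representation sampler (cf. [47, 48]): a zero-mean
# Gaussian field is synthesized as a cosine series with random phase angles
# Psi ~ U[0, 2*pi] and thresholded at x0 to yield the binary microstructure.
n_pix, n_w = 64, 32
s = np.stack(np.meshgrid(*2 * [np.linspace(0.0, 1.0, n_pix)]), -1)  # pixel coordinates
w = np.random.rand(n_w, 2) * 65.0               # wavenumber quadrature nodes
dw = 65.0 ** 2 / n_w                            # associated cell size

def sample_microstructure(phi, x0, rng):
    psi = rng.uniform(0.0, 2.0 * np.pi, n_w)    # random phase angles Psi
    amp = np.sqrt(2.0 * sdf(w, phi) * dw)       # mode amplitudes from the SDF
    x_g = np.sqrt(2.0) * (amp * np.cos(s @ w.T + psi)).sum(-1)
    return (x_g > x0).astype(float)             # hard threshold -> binary x

x = sample_microstructure(np.zeros(25), 0.0, np.random.default_rng(0))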
VB-EM-Algorithm

By making use of Eq. (19) above, we can write the ELBO for the log-expected utility in Eq. (4) as

\log U_{1,M}^{\mathcal{D}}(\varphi) = \log \mathbb{E}_{p(x|\varphi)}\left[\int u(\kappa) \, p_M(\kappa \mid x, \mathcal{D}) \, d\kappa\right]
= \log \int u(\kappa) \, p_M\left(\kappa \mid F_\varphi(\Psi), \mathcal{D}\right) p(\Psi) \, d\kappa \, d\Psi
\geq \mathbb{E}_{q_\xi(\kappa,\Psi)}\left[\log \frac{u(\kappa) \, p_M\left(\kappa \mid F_\varphi(\Psi), \mathcal{D}\right) p(\Psi)}{q_\xi(\kappa,\Psi)}\right]
= \mathcal{F}\left(\varphi, q_\xi(\kappa,\Psi)\right)    (20)

where expectations with respect to x have been substituted by integrations with respect to the (primal) random variables Ψ arising from the spectral representation. Similarly, the variational density is expressed with respect to Ψ, i.e. q_ξ(κ, Ψ) (as opposed to q_ξ(κ, x)). The ELBO of the (O2)-type problems in Eq. (5) can similarly be written as

\log U_{2,M}^{\mathcal{D}}(\varphi) = \int p_{target}(\kappa) \log p_M(\kappa \mid \varphi, \mathcal{D}) \, d\kappa
\approx \frac{1}{S} \sum_{s=1}^{S} \log p_M\left(\kappa^{(s)} \mid \varphi, \mathcal{D}\right), \qquad \kappa^{(s)} \overset{i.i.d.}{\sim} p_{target}(\kappa)
= \frac{1}{S} \sum_{s=1}^{S} \log \int p_M\left(\kappa^{(s)} \mid F_\varphi(\Psi), \mathcal{D}\right) p(\Psi) \, d\Psi
\geq \frac{1}{S} \sum_{s=1}^{S} \mathbb{E}_{q_\xi^{(s)}(\Psi)}\left[\log \frac{p_M\left(\kappa^{(s)} \mid F_\varphi(\Psi), \mathcal{D}\right) p(\Psi)}{q_\xi^{(s)}(\Psi)}\right]
= \sum_{s=1}^{S} \mathcal{F}_s\left(q_\xi^{(s)}(\Psi), \varphi\right)    (21)

Since the maximization of the ELBO with respect to ξ and φ is not possible in closed form, noisy estimates of the gradients ĝ_φ ≈ ∇_φ F(φ, q_ξ(κ, Ψ)) and ĝ_ξ ≈ ∇_ξ F(φ, q_ξ(κ, Ψ)) are obtained using Monte Carlo and the reparametrization trick ([49]; see the ensuing discussion). For our numerical illustrations, the optimization with respect to φ and ξ is carried out with stochastic gradient ascent and the Adam optimizer [50] in PyTorch [51].
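A schematic PyTorch fragment of one such stochastic-gradient update is given below; log_joint is an assumed callable standing in for the (surrogate-based) integrand of Eq. (20), and the mean-field Gaussian over Ψ_t is a simplification of the low-rank family introduced below.

import torch

def svi_step(phi, q_mu, q_log_sig, log_joint, opt, n_mc=8):
    # Reparametrization trick [49]: Psi_t = mu + sigma * eps, eps ~ N(0, I)
    eps = torch.randn(n_mc, q_mu.shape[0])
    psi_t = q_mu + q_log_sig.exp() * eps
    log_q = torch.distributions.Normal(q_mu, q_log_sig.exp()).log_prob(psi_t).sum(-1)
    elbo = (log_joint(psi_t, phi) - log_q).mean()   # Monte Carlo estimate of F
    opt.zero_grad()
    (-elbo).backward()                              # ascend the ELBO
    opt.step()
    return float(elbo)

phi = torch.zeros(25, requires_grad=True)
q_mu = torch.zeros(128, requires_grad=True)
q_log_sig = torch.full((128,), -1.0, requires_grad=True)
opt = torch.optim.Adam([phi, q_mu, q_log_sig], lr=1e-2)  # learning rate is an assumption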
Representation

Instead of the bounded phase angles Ψ, we employ the unbounded and normally distributed variables Ψ_t (i.e. Ψ_t ∼ N(0, I)), which are related to Ψ through the error function erf(·) as follows:

\Psi_i = 0.5 \left(1 + \operatorname{erf}\left(\frac{\Psi_{t,i}}{\sqrt{2\pi}}\right)\right) \cdot 2\pi    (22)

As a result, all expectations with respect to p(Ψ) are substituted by expectations with respect to p(Ψ_t) = N(0, I). In addition, instead of the Heaviside function defining the binary vector x from the underlying Gaussian x_g as x_i = H(x_{g,i} − x_0), we employ the differentiable transformation

x_i = \frac{\tanh\left(\varepsilon \, (x_{g,i} - x_0)\right) + 1}{2}    (23)

We note that as ε → ∞ we recover the Heaviside function and therefore a hard truncation. While the resulting x_i are only approximately binary for finite ε (we used ε = 25), these were used in all computations involving the surrogate and the optimization. An advantage is that this enables the use of the reparametrization trick [49] to estimate the ELBO and its gradients, as explained in the previous section.
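In code, the two transformations of Eqs. (22) and (23) amount to a short differentiable chain; a minimal sketch (with ε = 25 as in the text, and the erf argument following our reconstruction of Eq. (22)) could read:

import torch

def phase_angles(psi_t):
    # Eq. (22): map standard-normal Psi_t to phase angles in [0, 2*pi]
    return 0.5 * (1.0 + torch.erf(psi_t / (2.0 * torch.pi) ** 0.5)) * 2.0 * torch.pi

def soft_threshold(x_g, x0, eps=25.0):
    # Eq. (23): smooth relaxation of the Heaviside cutoff; eps -> inf recovers H
    return 0.5 * (torch.tanh(eps * (x_g - x0)) + 1.0)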
Low-rank Variational Approximation

For q_ξ(κ, Ψ_t) ∈ Q we adopt the choice of a low-rank multivariate Gaussian distribution (with z = [κ, Ψ_t]^T ∈ R^{d_z}), i.e.

q_\xi(z) = \mathcal{N}\left(z \mid \mu, \, \Sigma = \operatorname{diag}(d) + L L^T\right)    (24)

with L ∈ R^{d_z × M} and M ≪ d_z. The variational parameters are given by ξ = {µ, d, L} with dim(ξ) = O(d_z · M). This particular choice makes it possible to capture enough of the correlation structure to drive the EM updates, while remaining scalable with regards to the (generally large) dimension d_z of the problem (we used M = 50). We note that a fully Bayesian treatment of the surrogate could be accomplished by including the neural network parameters θ in the variational inference framework. While we could also drive the EM-algorithm via, e.g., Markov Chain Monte Carlo or Sequential Monte Carlo, the choice of variational inference is computationally faster and additionally enables monitoring of convergence through the ELBO F.
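This family is available directly in PyTorch; a minimal sketch follows, with M = 50 as above and an illustrative, assumed dimension d_z.

import torch

d_z, M = 4098, 50                                  # d_z is an illustrative assumption
mu = torch.zeros(d_z, requires_grad=True)
L = (0.01 * torch.randn(d_z, M)).requires_grad_()  # low-rank factor of Sigma
log_d = torch.zeros(d_z, requires_grad=True)       # log of the diagonal d

q = torch.distributions.LowRankMultivariateNormal(mu, cov_factor=L, cov_diag=log_d.exp())
z = q.rsample((16,))                               # reparametrized samples z = [kappa, Psi_t]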
Tempering

When specifying the material design objective, it is numerically advantageous to pursue a tempering schedule, in particular if the desired material behaviour deviates strongly from the initially observed dataset D or from the properties κ associated with the initial guess φ^(0). In the following we discuss an adaptive tempering strategy which, for the sake of illustration, we explain in the context of a utility function u(κ) = I_K(κ) (see Fig. 3a). Instead of trying to obtain φ* = arg max_φ p(κ ∈ K | φ) directly, we introduce a sequence of target domains K^(r), r = 1, ..., R, such that K^(R) = K. To this end we may define K^(0) in such a way that (according to the model belief) a non-negligible number of samples κ ∼ p_M(κ | φ^(0), D) fall into the domain K^(0). In order to assess how strongly the tempered target domain K^(r) can be shifted towards the desired K in each step r, we can make use of the effective sample size (ESS) [52]. For an ensemble of N_w phase angles {Ψ_t^(n)}_{n=1}^{N_w} generated from q(Ψ_t), we introduce the corresponding weights as the model-based belief that the material properties κ reside in the tempered target domain K^(r):

w_n^r = \int I_{K^{(r)}}(\kappa) \, p_M\left(\kappa \mid F_\varphi\left(\Psi_t^{(n)}\right), \mathcal{D}\right) d\kappa    (25)

Denoting the normalized weights as \tilde{w}_n^r = w_n^r / \sum_{n=1}^{N_w} w_n^r, the ESS is defined as

\mathrm{ESS} = \left(N_w \sum_{n=1}^{N_w} \left(\tilde{w}_n^r\right)^2\right)^{-1} = \frac{\left(\sum_{n=1}^{N_w} w_n^r\right)^2}{N_w \sum_{n=1}^{N_w} \left(w_n^r\right)^2}    (26)

where ESS ∈ [0, 1] reflects the deterioration of sample quality induced by shifting the domain K^(r). Let q(Ψ_t) be an approximation to the posterior over the phase angles conditional on the optimality criteria (i.e. K^(r)) and the current value of φ. One may then adaptively choose to shift the target domain K^(r) → K^(r+1) in such a manner that the ESS of the samples generated from q(Ψ_t) does not deteriorate beyond a certain threshold value (e.g. using a bisection approach, as sketched below). When defining the material design objective by means of a target distribution p_target(κ), one may similarly introduce tempering by gradually shifting the sample representation giving rise to the evidence lower bound.
representation giving rise to the evidence lower bound. denotes a spatial average of microscopic quantities over ΩRVE .
We then characterize the macroscopic, effective behaviour of the
microstructure via [41]
Structure-Property linkage
Probabilistic Surrogate
Σ = Ceff : E (30)
The probabilistic surrogate employed is based on a parametric
convolutional neural network (see. Fig. 2), where a split in the fi- under the constraint that (30) satisfies the averaging theorem
nal dense layers gives rise to (separately) the mean vector mθ (x), by Hill [41, 42]
as well as the covariance matrix Sθ (x) (assumed to be diagonal).
The specific choices made regarding the neural network architec-
1
Z
ture are based on prior published work (e.g. [31, 32]). Each Σ:E= t · u dA (31)
block in Fig. 2 corresponds to 2d convolutions with a subsequent |ΩRVE | ∂ΩRVE
non-linear activation function (Leaky ReLU) ,followed by aver-
age pooling. The convolutional layers employ a (3 × 3) kernel, with microscopic tractions t and displacements u. The homog-
which in combination with appropriate padding leaves the size of enized properties κ are based on various entries of Ceff which is
the feature map unchanged 9 . The subsequent average pooling computed by solving an ensemble of elementary load cases, as
always employs a (2 × 2) kernel (and identical stride), such that given by the following macroscopic strain modes
the size of the feature maps is reduced by half in each block. For
the numerical results presented, after a sequence of 4 such blocks 
1 0
 
0 0
 
0 0.5

(with an increasing depth of 4, 8, 12 and 16 channels in the fea- Ê (1) = , Ê (2) = , Ê (3) =
0 0 0 1 0.5 0
ture maps), the resulting feature representation extracted from
(32)
the microstructure is flattened and enters first a shared hidden
layer (of width 30), subsequently splitting up into two more lay- corresponding to either a pure tension or shear mode, such that
ers that map to the mean µθ and the diagonal covariance matrix Ceff is recovered by integrating the microscopic stress σ to obtain
Sθ (x) (the positivity of the latter is ensured via an exponential Σ = hσi. One possible approach of imposing these elementary
transformation). The two phases were encoded as a (+1) and strain modes is based on the introduction of periodic boundary
(−1) for the CNN, as this is numerically more expedient com- condition (as opposed to defining load cases based on displace-
pared to an {1, 0} representation of the phases. For all numerical ment or tractions), which augments Eq. (27) and (28) by
experiments presented, the neural network was trained with a
batch size of Nbs = 128. To add regularization, a weight decay of
10−5 was used, and additionally a dropout layer (with p = 0.05)  = Ê (c) + ∇s v in ΩRVE (33)
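A PyTorch sketch of this architecture and of the Gaussian log-likelihood training objective is given below; hyperparameters not stated above (e.g. the learning rate) are our assumptions, and the final batch uses dummy data for illustration.

import torch
import torch.nn as nn

class Surrogate(nn.Module):
    def __init__(self, n_props):
        super().__init__()
        blocks, c_in = [], 1
        for c_out in (4, 8, 12, 16):                          # four conv blocks
            blocks += [nn.Conv2d(c_in, c_out, 3, padding=1),  # (3x3), size-preserving
                       nn.LeakyReLU(),
                       nn.AvgPool2d(2)]                       # halves the feature maps
            c_in = c_out
        self.features = nn.Sequential(*blocks, nn.Flatten(),
                                      nn.Dropout(p=0.05),     # before the first dense layer
                                      nn.Linear(16 * 4 * 4, 30), nn.LeakyReLU())
        self.mean_head = nn.Linear(30, n_props)               # mu_theta(x)
        self.log_var_head = nn.Linear(30, n_props)            # exp() yields diagonal S_theta(x)

    def forward(self, x):                                     # x: (B, 1, 64, 64), phases +-1
        h = self.features(x)
        return self.mean_head(h), self.log_var_head(h).exp()

model = Surrogate(n_props=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
x, kappa = torch.randn(128, 1, 64, 64).sign(), torch.randn(128, 3)  # dummy batch
mu, var = model(x)
nll = 0.5 * ((kappa - mu) ** 2 / var + var.log()).sum(-1).mean()    # Gaussian negative log-likelihood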
Physical model for the computation of properties κ

In the following we provide a more detailed description of the physical models in the structure-property linkage, abstractly represented as κ(x), i.e., the link between the microstructures and the physical properties we want to control in this study. Note that the specific choice of κ(x) does not have any direct bearing on the optimization, as the structure-property linkage only enters into the generation of the training data D for the surrogate. For our numerical illustrations we make use of both the effective thermal and the effective mechanical properties of the microstructures x ∼ p(x|φ). We present details regarding the numerical computation of the effective mechanical properties, with the thermal properties following by analogy. In order to quantify the macroscopic response of a microstructure, we consider a linear, isotropic elasticity problem on the microscopic scale for a representative volume element (RVE). At this scale the behaviour of the microstructure is characterized by the balance equation (BE) and the constitutive equation (CE):

(BE): \operatorname{div}(\sigma) = 0 \quad \forall s \in \Omega_{RVE}    (27)
(CE): \sigma = C(s) : \epsilon \quad \forall s \in \Omega_{RVE}    (28)

where σ, ε denote the microscopic stress and strain, and C(s) is the heterogeneous elasticity tensor. For a binary microstructure where the two phases occupy (random) subdomains V_0 and V_1 (with V_0 ∩ V_1 = ∅ and V_0 ∪ V_1 = Ω_RVE), the elasticity tensor follows as:

C(s) = \begin{cases} C_1, & \text{if } s \in V_1 \\ C_0, & \text{if } s \in V_0 \end{cases}    (29)

In our case, the elasticity tensors C_0 and C_1 are fully defined by the Young's moduli E_0 and E_1 of the two phases (a common Poisson's ratio of ν = 0.3 was used). We define the macroscopic stress Σ = ⟨σ⟩ as well as the macroscopic strain E = ⟨ε⟩, where ⟨·⟩ denotes a spatial average of microscopic quantities over Ω_RVE. We then characterize the macroscopic, effective behaviour of the microstructure via [41]

\Sigma = C_{eff} : E    (30)

under the constraint that (30) satisfies the averaging theorem by Hill [41, 42]

\Sigma : E = \frac{1}{|\Omega_{RVE}|} \int_{\partial\Omega_{RVE}} t \cdot u \, dA    (31)

with microscopic tractions t and displacements u. The homogenized properties κ are based on various entries of C_eff, which is computed by solving an ensemble of elementary load cases, as given by the following macroscopic strain modes

\hat{E}^{(1)} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad \hat{E}^{(2)} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad \hat{E}^{(3)} = \begin{pmatrix} 0 & 0.5 \\ 0.5 & 0 \end{pmatrix}    (32)

corresponding to either a pure tension or a shear mode, such that C_eff is recovered by integrating the microscopic stress σ to obtain Σ = ⟨σ⟩. One possible approach for imposing these elementary strain modes is based on the introduction of periodic boundary conditions (as opposed to defining load cases based on displacements or tractions), which augments Eq. (27) and (28) by

\epsilon = \hat{E}^{(c)} + \nabla_s v \quad \text{in } \Omega_{RVE}    (33)
v \ \text{is } \Omega_{RVE}\text{-periodic}    (34)
t = \sigma \cdot n \ \text{is } \Omega_{RVE}\text{-antiperiodic}    (35)
Here v denotes a periodic fluctuation (i.e., u = E s + v), and t are antiperiodic tractions on the boundary of the domain Ω_RVE. We solve for the periodic fluctuations v for all three elementary load cases c ∈ {1, 2, 3} using the Bubnov-Galerkin approach and the standard Finite Element Method (an additional Lagrange multiplier has to be included in the variational problem to disambiguate it with regards to rigid-body transformations). The effective tangent moduli C_eff of the RVE thus obtained by the solution of the differential equations (Eqs. (27), (28), (33), (34) and (35)) can be shown [41] to satisfy the averaging theorem by Hill. We finally note that C_eff and the properties of interest κ vary depending on the underlying, random microstructure x, and the need for their repeated computation represents the computational bottleneck for the optimization of the process parameters φ.
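To make the bookkeeping concrete, the sketch below assembles the entries of C_eff from the volume-averaged stresses of the three load cases of Eq. (32); solve_rve is an assumed callable wrapping the periodic FE solver, and the Voigt-style ordering (Σ_11, Σ_22, Σ_12) is our convention.

import numpy as np

E_hats = [np.array([[1.0, 0.0], [0.0, 0.0]]),     # Eq. (32): tension in direction 1
          np.array([[0.0, 0.0], [0.0, 1.0]]),     # tension in direction 2
          np.array([[0.0, 0.5], [0.5, 0.0]])]     # shear mode

def effective_stiffness(solve_rve, microstructure):
    cols = []
    for E_hat in E_hats:
        sigma = solve_rve(microstructure, E_hat)  # microscopic stresses, (n_elem, 2, 2)
        S = sigma.mean(axis=0)                    # macroscopic stress <sigma>, Eq. (30)
        cols.append([S[0, 0], S[1, 1], S[0, 1]])
    return np.array(cols).T                       # columns of C_eff in Voigt-style notation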