
Knowledge-Based Systems 184 (2019) 104903


Gradient-enhanced high dimensional model representation via Bayesian inference

Kai Cheng, Zhenzhou Lu∗, Kaichao Zhang

School of Aeronautics, Northwestern Polytechnical University, Xi'an, 710072, PR China

∗ Corresponding author. E-mail address: zhenzhoulu@nwpu.edu.cn (Z. Lu).

highlights

• An HDMR surrogate model is developed based on the Bayesian inference technique.
• An efficient method for the HDMR model integrating gradient information is presented.
• Eight benchmark examples are used to validate the effectiveness of the method.

article info

Article history:
Received 9 September 2018
Received in revised form 19 June 2019
Accepted 28 July 2019
Available online 31 July 2019

Keywords:
Surrogate model
Bayesian inference
Gaussian process
High dimensional model representation

abstract

Recently, gradient-enhanced surrogate models have drawn extensive attention for function approximation, in which the gradient information is utilized to improve the surrogate model accuracy. In this work, a gradient-enhanced high dimensional model representation (HDMR) is established based on the Bayesian inference technique. The proposed method first assigns a Gaussian process prior for the model response function and its partial derivative functions (with respect to all the input variables). Then the auto-covariance functions and the cross-covariance functions of these random processes are established respectively by the HDMR basis functions. Finally, the posterior distribution of the response function is analytically obtained through Bayes' theorem. The proposed method combines the sample information and gradient information in a seamless way to yield a highly accurate HDMR prediction model. We demonstrate our method via several examples, and the results all suggest that combining gradient information with sample information provides more accurate prediction results at reduced computational cost.

© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Over the last decades, along with the rapid development of computer science and technology, a variety of complex computational models have been developed for simulating and predicting the behavior of systems in nearly all fields of engineering and science. Meanwhile, surrogate models have become more and more popular to meet the growing engineering demand for computational power [1]. For decades, various types of surrogate model techniques have been continuously developed, such as Kriging/Gaussian process (GP) [2–6] and its gradient-enhanced versions [7], support vector regression (SVR) [8–12] and its gradient-enhanced variants [13–15], radial basis functions (RBF) [16–18], polynomial chaos expansion (PCE) [13,19–23], artificial neural networks (ANN) [24–26] and high dimensional model representation (HDMR) [27–35], among which HDMR has attracted significant interest due to its ability to handle complex high dimensional models.

HDMR was developed as a set of quantitative model assessment and analysis tools for capturing complex high-dimensional input–output system behavior [31,32,36]. Its theoretical foundation lies in the observation that often only low-order correlations amongst the input variables have a significant effect on the model outputs; thus HDMR permits expressing a multidimensional function as a sum of many low dimensional component functions that reflect the independent and cooperative contributions of the inputs. In practical applications, two types of HDMR expansions have been developed in the literature. One decomposes the model output into components defined on hyper-planes passing through a fixed cut point or anchor point in the parameter space, and is therefore known as Cut-HDMR. The other conforms to the analysis of variance (ANOVA) decomposition widely used in uncertainty quantification and sensitivity analysis, and is hence called ANOVA-HDMR.

To construct the HDMR surrogate model, various methods have been developed to evaluate the low-order component terms of the truncated dimension-wise decomposition. For Cut-HDMR, the usual practice is to set up a look-up table of the component function values evaluated at the selected cut points or

anchor points in the parameter space, and the Lagrange or linear interpolation method is utilized to estimate the component functions. Rabitz et al. [37] used equidistantly distributed sample points along each axis of the input parameters to develop Cut-HDMR. Liu et al. [27] suggested using non-uniform optimal nodes to mitigate Runge's phenomenon; the optimal nodes are selected as the nodes of Legendre–Gauss, Chebyshev–Gauss and Clenshaw–Curtis quadratures. ANOVA-HDMR, also known as random sampling (RS)-HDMR, expands the component functions in terms of analytical basis functions, and Monte Carlo simulation or other techniques are utilized to estimate the basis function coefficients. Li et al. [36] suggested using orthogonal polynomials, cubic B-splines and ordinary polynomials to approximate the ANOVA-HDMR component functions. Luo et al. [34] used the reproducing kernel technique to estimate arbitrary order ANOVA-HDMR component functions. Lambert et al. [38] utilized the group method of data handling (GMDH) to construct the ANOVA-HDMR surrogate model based on Legendre polynomial basis functions.

This work aims at developing an ANOVA-HDMR surrogate model that integrates gradient information based on the Bayesian inference technique, namely, gradient-enhanced HDMR (GE-HDMR). Firstly, the response function of an underlying model is approximated by the HDMR component functions with unknown coefficients. Then we assign a GP prior to the HDMR model and its partial derivative functions. The auto-covariance functions and the cross-covariance functions of these GPs are defined by the HDMR component functions respectively. Given a set of samples (response information and gradient information), the posterior distribution of the model response can be computed by means of Bayes' theorem, and the GE-HDMR surrogate model is analytically derived as its mean function. The proposed method combines the sample information and the gradient information to approximate the response function, thus it provides much more accurate predictions compared to the classic HDMR model with no gradient information. Several examples are used to validate the accuracy and efficiency of the presented method; the results demonstrate that the developed GE-HDMR surrogate model is much more efficient and accurate than the classic HDMR model.

The rest of this paper is organized as follows: Section 2 reviews the basic theory of classic HDMR. The construction of the GE-HDMR surrogate model is introduced in detail in Section 3. In Section 4, several test examples are used to illustrate the performance of the proposed GE-HDMR method. Finally, some conclusions are drawn in Section 5.

2. High dimensional model representation

High-dimensional function estimation faces the so-called "curse of dimensionality": the sample size needed to approximate a function to a satisfying accuracy level increases exponentially with the dimensionality of the function [34]. However, HDMR provides a remarkable way to overcome this predicament by approximating a high-dimensional function with a sum of low-dimensional functions [27,30]. Consider a square-integrable response function y = g(x), where the n-dimensional independent input parameters are gathered in the vector x. HDMR expresses the response function g(x) as a finite number of components in terms of the input parameters as

g(x) = g_0 + \sum_{k_1=1}^{n} g_{k_1}(x_{k_1}) + \sum_{1 \le k_1 < k_2 \le n} g_{k_1 k_2}(x_{k_1}, x_{k_2}) + \cdots + g_{1,2,\ldots,n}(x_1, \ldots, x_n),   (1)

where

g_0 = E[g(x)],
g_{k_1}(x_{k_1}) = E[g(x) \mid x_{k_1}] - g_0,
g_{k_1 k_2}(x_{k_1}, x_{k_2}) = E[g(x) \mid x_{k_1}, x_{k_2}] - g_{k_1}(x_{k_1}) - g_{k_2}(x_{k_2}) - g_0,
\vdots   (2)

In Eq. (1), g_0 is a constant term denoting the zeroth order effect on g(x). The component functions g_{k_1}(x_{k_1}) (k_1 = 1, \ldots, n) are the first-order terms expressing the individual effects of the input parameters x_{k_1} on the output response g(x). The component functions g_{k_1 k_2}(x_{k_1}, x_{k_2}) (1 \le k_1 < k_2 \le n) are the second-order terms describing the interactive effects of the input parameters x_{k_1} and x_{k_2} upon the output response. Similarly, the higher-order component functions g_u(x_u) (u \subseteq \{1, \ldots, n\}) denote the cooperative effects of a set of parameters x_u. Experience shows that the high-order interactions among input parameters in Eq. (1) are often negligible, so the HDMR can be truncated to include terms up to the second-order components [30,34], namely,

g(x) = g_0 + \sum_{k_1=1}^{n} g_{k_1}(x_{k_1}) + \sum_{1 \le k_1 < k_2 \le n} g_{k_1 k_2}(x_{k_1}, x_{k_2}).   (3)

The total number of summands in Eq. (3) is 1 + n + n(n-1)/2, and the component functions in Eq. (3) can be approximated as [38]:

g_{k_1}(x_{k_1}) \approx \sum_{r=1}^{d} \alpha_r^{(k_1)} \varphi_r(x_{k_1}),
g_{k_1 k_2}(x_{k_1}, x_{k_2}) \approx \sum_{p=1}^{l} \sum_{q=1}^{m} \beta_{pq}^{(k_1 k_2)} \varphi_p(x_{k_1}) \varphi_q(x_{k_2}),   (4)

where \{\varphi_p\}_{p \in \mathbb{N}^*} (\mathbb{N}^* = 1, 2, \ldots) are the orthogonal polynomial basis functions, \alpha_r^{(k_1)} and \beta_{pq}^{(k_1 k_2)} are the basis function coefficients, and d, l and m are the predefined polynomial orders of the bases.
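As a concrete illustration of Eq. (4), the following Python/numpy sketch evaluates probabilists' Hermite polynomials (the basis chosen later in Section 4) and assembles the first- and second-order HDMR basis values at one point. The function names and the choice d = l = m are illustrative assumptions, not from the paper:

```python
import numpy as np

def hermite(r, t):
    """Probabilists' Hermite polynomial He_r(t) via the recurrence
    He_0 = 1, He_1 = t, He_{k+1} = t*He_k - k*He_{k-1}."""
    h_prev, h = np.ones_like(t), t
    if r == 0:
        return h_prev
    for k in range(1, r):
        h_prev, h = h, t * h - k * h_prev
    return h

def hdmr_features(x, d=3):
    """First- and second-order HDMR basis values at a single point x.

    Returns [phi_r(x_k1)] for r = 1..d, k1 = 1..n, followed by
    [phi_p(x_k1) * phi_q(x_k2)] for p, q = 1..d and k1 < k2
    (Eq. (4) with l = m = d)."""
    n = len(x)
    first = [hermite(r, x[k]) for k in range(n) for r in range(1, d + 1)]
    second = [hermite(p, x[k1]) * hermite(q, x[k2])
              for k1 in range(n) for k2 in range(k1 + 1, n)
              for p in range(1, d + 1) for q in range(1, d + 1)]
    return np.array(first + second)
```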
3. Gradient-enhanced HDMR based on Bayesian inference

3.1. Bayesian inference method

In the Bayesian framework, the crux is to select an appropriate prior for the model output g(x). In general, a GP is chosen as the prior for g(x) [2]. Given the training sample set D = (X, Y), where X = \{x^{(1)}, \ldots, x^{(N)}\}^T is the input data, Y = \{g(x^{(1)}), \ldots, g(x^{(N)})\}^T is the corresponding model response and N is the size of the training sample set, the joint distribution of Y is Gaussian under the GP prior hypothesis, i.e., Y \sim N(\mu_Y, K), and its joint probability density function (PDF) f(Y) is

f(Y) = (2\pi)^{-N/2} |K|^{-1/2} \exp\left( -\frac{1}{2} (Y - \mu_Y)^T K^{-1} (Y - \mu_Y) \right),   (5)

where \mu_Y and K are the mean vector and covariance matrix of Y respectively. Generally, the covariance matrix K is conveniently built from the squared exponential covariance function:

K_{ij} = \mathrm{Cov}\left( g(x^{(i)}), g(x^{(j)}) \right) = \exp\left( -\frac{1}{2} \sum_{d=1}^{n} \frac{(x_d^{(i)} - x_d^{(j)})^2}{w_d^2} \right),   (6)

where w = (w_1, \ldots, w_n) is the hyper-parameter vector of the covariance function. Under the GP prior assumption, the posterior distribution of the model output is also a GP, which can be obtained analytically by means of Bayes' theorem. The analytic construction of the posterior distribution of the model output forms the state of the art of GP-based surrogate modeling [2].
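For reference, the standard GP workflow of this subsection — the squared exponential covariance of Eq. (6) and the textbook conditioning step — can be sketched in a few lines of Python/numpy. The hyper-parameters w are taken as fixed rather than optimized here, and the jitter term is our own addition for numerical stability:

```python
import numpy as np

def se_cov(X1, X2, w):
    """Squared exponential covariance of Eq. (6) between two sample sets.
    X1: (N1, n), X2: (N2, n), w: (n,) length-scale hyper-parameters."""
    diff = (X1[:, None, :] - X2[None, :, :]) / w   # shape (N1, N2, n)
    return np.exp(-0.5 * np.sum(diff ** 2, axis=-1))

def gp_posterior_mean(x_new, X, Y, w, jitter=1e-10):
    """Posterior mean at x_new given training data (X, Y) under the GP prior."""
    K = se_cov(X, X, w) + jitter * np.eye(len(X))
    k_star = se_cov(np.atleast_2d(x_new), X, w)    # covariance K(x_new, X)
    mu_Y = Y.mean()
    return mu_Y + k_star @ np.linalg.solve(K, Y - mu_Y)
```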
In the next subsection, we provide the formulation for developing GE-HDMR by the Bayesian inference technique.

3.2. Gradient-enhanced HDMR

Following the procedure in Section 2, the HDMR can be approximated as

g(x) \approx g_0 + \sum_{k_1=1}^{n} \sum_{r=1}^{d} \alpha_r^{(k_1)} \varphi_r(x_{k_1}) + \sum_{1 \le k_1 < k_2 \le n} \sum_{p=1}^{l} \sum_{q=1}^{m} \beta_{pq}^{(k_1 k_2)} \varphi_p(x_{k_1}) \varphi_q(x_{k_2}).   (7)

Thus the gradient g_\partial(x) = \left[ \frac{\partial g(x)}{\partial x_1}, \ldots, \frac{\partial g(x)}{\partial x_n} \right] of g(x) can be obtained as

\frac{\partial g(x)}{\partial x_{i_1}} \approx \sum_{k_1=1}^{n} \sum_{r=1}^{d} \alpha_r^{(k_1)} \frac{\partial \varphi_r(x_{k_1})}{\partial x_{i_1}} + \sum_{1 \le k_1 < k_2 \le n} \sum_{p=1}^{l} \sum_{q=1}^{m} \beta_{pq}^{(k_1 k_2)} \frac{\partial (\varphi_p(x_{k_1}) \varphi_q(x_{k_2}))}{\partial x_{i_1}} \quad (i_1 = 1, \ldots, n).   (8)

Next, we develop the GE-HDMR surrogate model based on the Bayesian inference technique. Firstly, we assign GP priors to the response function g(x) and all the partial derivative functions \partial g(x)/\partial x_{i_1} (i_1 = 1, \ldots, n). The covariance function of the model response g(x) can be defined by the inner product of the HDMR basis functions [10,13] according to Eq. (7) as

K(x^{(i)}, x^{(j)}) = \mathrm{Cov}\left( g(x^{(i)}), g(x^{(j)}) \right) = \sum_{k_1=1}^{n} \sum_{r=1}^{d} \varphi_r(x_{k_1}^{(i)}) \varphi_r(x_{k_1}^{(j)}) + \sum_{1 \le k_1 < k_2 \le n} \sum_{p=1}^{l} \sum_{q=1}^{m} \varphi_p(x_{k_1}^{(i)}) \varphi_q(x_{k_2}^{(i)}) \varphi_p(x_{k_1}^{(j)}) \varphi_q(x_{k_2}^{(j)}).   (9)

Then, the cross-covariance functions between any two partial derivative functions \partial g(x)/\partial x_{i_1} and \partial g(x)/\partial x_{i_2} (i_1, i_2 = 1, \ldots, n) can be obtained from the second derivative of the covariance function in Eq. (9) [7,39], namely

Q(x^{(i)}, x^{(j)}) = \mathrm{Cov}\left( \frac{\partial g(x^{(i)})}{\partial x_{i_1}}, \frac{\partial g(x^{(j)})}{\partial x_{i_2}} \right) = \frac{\partial^2 K(x^{(i)}, x^{(j)})}{\partial x_{i_1}^{(i)} \partial x_{i_2}^{(j)}} = \sum_{k_1=1}^{n} \sum_{r=1}^{d} \frac{\partial^2 \left( \varphi_r(x_{k_1}^{(i)}) \varphi_r(x_{k_1}^{(j)}) \right)}{\partial x_{i_1}^{(i)} \partial x_{i_2}^{(j)}} + \sum_{1 \le k_1 < k_2 \le n} \sum_{p=1}^{l} \sum_{q=1}^{m} \frac{\partial^2 \left( \varphi_p(x_{k_1}^{(i)}) \varphi_q(x_{k_2}^{(i)}) \varphi_p(x_{k_1}^{(j)}) \varphi_q(x_{k_2}^{(j)}) \right)}{\partial x_{i_1}^{(i)} \partial x_{i_2}^{(j)}}.   (10)

In addition, the cross-covariance functions between g(x) and \partial g(x)/\partial x_{i_1} (i_1 = 1, \ldots, n) can be deduced from the derivative of the covariance function in Eq. (9) as

C(x^{(i)}, x^{(j)}) = \mathrm{Cov}\left( g(x^{(i)}), \frac{\partial g(x^{(j)})}{\partial x_{i_1}} \right) = \frac{\partial K(x^{(i)}, x^{(j)})}{\partial x_{i_1}^{(j)}} = \sum_{k_1=1}^{n} \sum_{r=1}^{d} \varphi_r(x_{k_1}^{(i)}) \frac{\partial \varphi_r(x_{k_1}^{(j)})}{\partial x_{i_1}^{(j)}} + \sum_{1 \le k_1 < k_2 \le n} \sum_{p=1}^{l} \sum_{q=1}^{m} \varphi_p(x_{k_1}^{(i)}) \varphi_q(x_{k_2}^{(i)}) \frac{\partial \left( \varphi_p(x_{k_1}^{(j)}) \varphi_q(x_{k_2}^{(j)}) \right)}{\partial x_{i_1}^{(j)}}.   (11)
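Since the covariances in Eqs. (9)–(11) are plain inner products of the HDMR basis functions, all three blocks can be assembled from a feature matrix and its sample-wise Jacobian. A minimal numpy sketch (the matrix names follow Eqs. (9)–(11); the feature matrices themselves are assumed given, e.g. by the earlier hdmr_features sketch):

```python
import numpy as np

def covariance_blocks(Phi, dPhi):
    """Assemble K, C, Q of Eqs. (9)-(11) from HDMR feature matrices.

    Phi:  (N, P)      basis values phi(x^(i)) at the N samples,
    dPhi: (N * n, P)  rows are d phi / d x_{i1} at each sample, stacked
                      sample by sample in the same order as Y_partial."""
    K = Phi @ Phi.T     # Cov(g, g),       Eq. (9),  shape (N, N)
    C = Phi @ dPhi.T    # Cov(g, dg/dx),   Eq. (11), shape (N, N*n)
    Q = dPhi @ dPhi.T   # Cov(dg, dg),     Eq. (10), shape (N*n, N*n)
    return K, C, Q
```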
For N realizations X = \{x^{(1)}, \ldots, x^{(N)}\}^T of the random input vector, the response vector Y = \{g(x^{(1)}), \ldots, g(x^{(N)})\}^T is computed by the deterministic solver at these points. In addition, the gradient information Y_\partial = \left( \frac{\partial g(x^{(1)})}{\partial x_1}, \ldots, \frac{\partial g(x^{(1)})}{\partial x_n}, \ldots, \frac{\partial g(x^{(N)})}{\partial x_1}, \ldots, \frac{\partial g(x^{(N)})}{\partial x_n} \right) can be obtained efficiently via the direct or adjoint method [7,40]. Therefore, the joint distribution of Y, Y_\partial and g(x) for an untried site x in the parameter space is

\begin{bmatrix} g(x) \\ Y \\ Y_\partial \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_g \\ \mu_Y \\ 0 \end{bmatrix}, \begin{bmatrix} K(x,x) & K(x,X) & C(x,X) \\ K(X,x) & K(X,X) & C(X,X) \\ C(X,x) & C^T(X,X) & Q(X,X) \end{bmatrix} \right),   (12)

where \mu_g indicates the mean value of g(x), \mu_Y and 0 represent the mean value vectors of Y and Y_\partial respectively, K(x,x) is the variance of g(x), K(x,X) is the covariance vector between g(x) and Y, C(x,X) is the cross-covariance vector between g(x) and Y_\partial, K(X,x) is the covariance vector between Y and g(x), K(X,X) is the covariance matrix of Y, C(X,X) is the cross-covariance matrix between Y and Y_\partial, C(X,x) is the cross-covariance vector between Y_\partial and g(x), and Q(X,X) is the covariance matrix of Y_\partial. Then the conditional (posterior) GP distribution of g(x) can be derived as

g(x) \sim N\left( \mu(x), \sigma^2(x) \right),   (13)

where

\mu(x) = \mu_g + \begin{bmatrix} K(x,X) & C(x,X) \end{bmatrix} \begin{bmatrix} K(X,X) & C(X,X) \\ C^T(X,X) & Q(X,X) \end{bmatrix}^{-1} \begin{bmatrix} Y - \mu_Y \\ Y_\partial \end{bmatrix},   (14)

\sigma^2(x) = K(x,x) - \begin{bmatrix} K(x,X) & C(x,X) \end{bmatrix} \begin{bmatrix} K(X,X) & C(X,X) \\ C^T(X,X) & Q(X,X) \end{bmatrix}^{-1} \begin{bmatrix} K(X,x) \\ C(X,x) \end{bmatrix}.   (15)

Here \begin{bmatrix} K(X,X) & C(X,X) \\ C^T(X,X) & Q(X,X) \end{bmatrix} is the joint covariance matrix, which is positive semi-definite. The proof follows [39]:

\forall v \in \mathbb{R}^{N(n+1)}: \quad v^T \begin{bmatrix} K(X,X) & C(X,X) \\ C^T(X,X) & Q(X,X) \end{bmatrix} v = \mathrm{Var}\left( v^T \begin{bmatrix} Y \\ Y_\partial \end{bmatrix} \right) \ge 0.   (16)

For brevity, we write \begin{bmatrix} K(X,X) & C(X,X) \\ C^T(X,X) & Q(X,X) \end{bmatrix}^{-1} = \begin{bmatrix} K_1 & K_2 \\ K_3 & K_4 \end{bmatrix}.
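Putting Eqs. (12)–(15) together, the GE-HDMR posterior at an untried point x can be evaluated with one dense block solve. The following numpy sketch is illustrative only (the paper prescribes no implementation; in practice a Cholesky factorization with a small jitter would be the more robust choice):

```python
import numpy as np

def ge_posterior(k_x, c_x, K, C, Q, Y, Y_partial, mu_g, k_xx):
    """Posterior mean and variance of Eqs. (14)-(15).

    k_x:  (N,)    covariance K(x, X) between g(x) and Y,
    c_x:  (N*n,)  cross-covariance C(x, X) between g(x) and Y_partial,
    k_xx: scalar  prior variance K(x, x)."""
    M = np.block([[K, C], [C.T, Q]])             # joint covariance, Eq. (12)
    rhs = np.concatenate([Y - mu_g, Y_partial])  # here mu_Y is taken as mu_g
    kc = np.concatenate([k_x, c_x])
    z = np.linalg.solve(M, rhs)
    mean = mu_g + kc @ z                         # Eq. (14)
    var = k_xx - kc @ np.linalg.solve(M, kc)     # Eq. (15)
    return mean, var
```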
Table 1
Eight benchmark examples.

ID | Expression | Variable space
F1 | g(x) = \sum_{i=1}^{5} 5 \ln(x_i)^2 - \left( \prod_{i=1}^{5} x_i \right)^{0.2} | x_i \sim N(5, 1)
F2 | g(x) = \sum_{i=1}^{10} x_i \left( c_i + \ln \frac{x_i^2}{x_1^2 + \cdots + x_{10}^2} \right) | x_i \sim N(5, 2)
F3 | g(x) = \sum_{i=1}^{10} \exp(x_i) \left[ c_i + x_i - \ln\left( \sum_{i=1}^{10} \exp(x_i) \right) \right] | x_i \sim N(0, 0.5)
F4 | g(x) = x_1^2 + x_2^2 + x_1 x_2 - 14x_1 - 16x_2 + (x_3 - 10)^2 + 4(x_4 - 5)^2 + (x_5 - 3)^2 + 2(x_6 - 1)^2 + 5x_7^2 + 7(x_8 - 11)^2 + 2(x_9 - 10)^2 + (x_{10} - 7)^2 + 45 | x_i \sim N(0, 2)
F5 | g(x) = \sum_{i=1}^{10} \left( x_i^2 - 10\cos(x_i) + 10 \right) | x_i \sim N(0, 1)
F6 | f_V = 1.21 - 3.7534 \times 10^{-5} T - 3.1534 \times 10^{-4} T \ln C_{ME} - 0.7499 + 6.62 \times 10^{-5} T \ln F_{AIR} - 6.9897 \exp(916.91/T - 4.6392) I - [1.2658 \times 10^{5} I^3 + 46196 I^2 - 428 I - 0.4029 T - 18.8094 C_{ME}^2 + 18.8094 C_{ME} + 10.496] \times [\ln I - 3.9056 + 2.9582 \times 10^{-4} (\ln C_{ME} + \ln(1 - 1/(5.3466 \times 10^{7} \exp(-5182.4/T) C_{ME}^2)))] - [-1.2687 \times 10^{5} I^3 - 46221 I^2 + 4283.6 I + 0.4033 T + 18.818 C_{ME}^2 - 10.572 - 18.818 C_{ME}] \times [\ln I - 3.8959 - 8.2402 \times 10^{-4} \ln F_{AIR}] + 31.583 I^2 \ln F_{ME} | I \sim N(0.04, 0.008), T \sim N(320, 8), C_{ME} \sim N(1.1, 0.2), F_{ME} \sim N(4, 0.3), F_{AIR} \sim N(110, 8)
F7 | f_{flow} = \dfrac{2\pi T_u (H_u - H_l)}{\ln(r/r_w) \left( 1 + \dfrac{2 L T_u}{\ln(r/r_w) r_w^2 K_w} + \dfrac{T_u}{T_l} \right)} | r_w \sim N(0.1, 0.013), r \sim N(25050, 5237.5), T_u \sim N(89335, 6566.25), H_u \sim N(1050, 15), T_l \sim N(89.55, 6.613), H_l \sim N(760, 15), L \sim N(1400, 70), K_w \sim N(10950, 273.75)
F8 | f_w = 0.036 S_w^{0.758} W_{fw}^{0.0035} (A/\cos^2\Lambda)^{0.6} q^{0.006} \lambda^{0.04} (100 t_c/\cos\Lambda)^{-0.3} (N_z W_{dg})^{0.49} + S_w W_p | S_w \sim N(175, 8.33), W_{fw} \sim N(260, 13.33), \Lambda \sim N(-\pi/18, \pi/18), \lambda \sim N(0.75, 0.083), W_p \sim N(0.0525, 0.00917), A \sim N(8, 0.67), t_c \sim N(0.13, 0.016), N_z \sim N(4.25, 0.58), W_{dg} \sim N(2100, 133.33), q \sim N(30, 5)

For F2 and F3: c_{1 \le i \le 10} = [-6.089, -17.164, -34.054, -5.914, -24.721, -14.986, -24.1, -10.708, -26.662, -22.179].
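For data generation, each benchmark must supply both responses and gradients. As an illustration (not from the paper), F5 has the closed-form gradient \partial g/\partial x_i = 2x_i + 10\sin(x_i); a short numpy sketch, assuming the second parameter of N(·,·) in Table 1 is a standard deviation:

```python
import numpy as np

def f5(x):
    """Benchmark F5 of Table 1: g(x) = sum(x_i^2 - 10*cos(x_i) + 10)."""
    return np.sum(x ** 2 - 10.0 * np.cos(x) + 10.0, axis=-1)

def f5_grad(x):
    """Analytic gradient: dg/dx_i = 2*x_i + 10*sin(x_i)."""
    return 2.0 * x + 10.0 * np.sin(x)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(50, 10))  # 50 samples, x_i ~ N(0, 1)
Y = f5(X)                                # responses, shape (50,)
Y_partial = f5_grad(X).reshape(-1)       # gradients stacked sample by sample
```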

Therefore, the mean value function (prediction function) \mu(x) can be expressed as

\mu(x) = \mu_g + \begin{bmatrix} K(x,X) & C(x,X) \end{bmatrix} \begin{bmatrix} K_1 & K_2 \\ K_3 & K_4 \end{bmatrix} \begin{bmatrix} Y - \mu_Y \\ Y_\partial \end{bmatrix}
= \mu_g + K(x,X)\left[ K_1 (Y - \mu_Y) + K_2 Y_\partial \right] + C(x,X)\left[ K_3 (Y - \mu_Y) + K_4 Y_\partial \right]
= \mu_g + K(x,X)\,\omega_1 + C(x,X)\,\omega_2
= \mu_g + \sum_{i=1}^{N} \omega_1^{(i)} \left( \sum_{k_1=1}^{n} \sum_{r=1}^{d} \varphi_r(x_{k_1}^{(i)}) \varphi_r(x_{k_1}) + \sum_{1 \le k_1 < k_2 \le n} \sum_{p=1}^{l} \sum_{q=1}^{m} \varphi_p(x_{k_1}^{(i)}) \varphi_q(x_{k_2}^{(i)}) \varphi_p(x_{k_1}) \varphi_q(x_{k_2}) + \cdots \right)
+ \sum_{i=1}^{N} \omega_2^{(i)} \left( \sum_{k_1=1}^{n} \sum_{r=1}^{d} \frac{\partial \varphi_r(x_{k_1}^{(i)})}{\partial x_{i_1}^{(i)}} \varphi_r(x_{k_1}) + \sum_{1 \le k_1 < k_2 \le n} \sum_{p=1}^{l} \sum_{q=1}^{m} \frac{\partial (\varphi_p(x_{k_1}^{(i)}) \varphi_q(x_{k_2}^{(i)}))}{\partial x_{i_1}^{(i)}} \varphi_p(x_{k_1}) \varphi_q(x_{k_2}) + \cdots \right),   (17)

where

\omega_1 = [\omega_1^{(1)}, \ldots, \omega_1^{(N)}] = K_1 (Y - \mu_Y) + K_2 Y_\partial,
\omega_2 = [\omega_2^{(1)}, \ldots, \omega_2^{(N)}] = K_3 (Y - \mu_Y) + K_4 Y_\partial.   (18)

In this paper, the mean value of the response function is estimated by the sample mean value, namely, g_0 = \mu_g = \mu_Y = \sum_{i=1}^{N} g(x^{(i)})/N. Thus \mu(x) can be simplified as

\mu(x) = g_0 + \sum_{k_1=1}^{n} \sum_{r=1}^{d} \alpha_r^{(k_1)} \varphi_r(x_{k_1}) + \sum_{1 \le k_1 < k_2 \le n} \sum_{p=1}^{l} \sum_{q=1}^{m} \beta_{pq}^{(k_1 k_2)} \varphi_p(x_{k_1}) \varphi_q(x_{k_2}) + \cdots,   (19)

where

\alpha_r^{(k_1)} = \sum_{i=1}^{N} \omega_1^{(i)} \varphi_r(x_{k_1}^{(i)}) + \sum_{i=1}^{N} \omega_2^{(i)} \frac{\partial \varphi_r(x_{k_1}^{(i)})}{\partial x_{i_1}^{(i)}},
\beta_{pq}^{(k_1 k_2)} = \sum_{i=1}^{N} \omega_1^{(i)} \varphi_p(x_{k_1}^{(i)}) \varphi_q(x_{k_2}^{(i)}) + \sum_{i=1}^{N} \omega_2^{(i)} \frac{\partial (\varphi_p(x_{k_1}^{(i)}) \varphi_q(x_{k_2}^{(i)}))}{\partial x_{i_1}^{(i)}}.   (20)

From Eq. (19), we see that the HDMR surrogate model is developed incorporating gradient information based on the Bayesian inference technique. The coefficients of the GE-HDMR surrogate model are composed of two parts of information, namely, sample information (the first part in Eq. (20)) and gradient information (the second part in Eq. (20)).
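Read as matrix products, Eqs. (17)–(20) reduce to one block solve followed by two projections. A minimal numpy sketch under the same assumptions as the earlier blocks (Phi and dPhi are the basis and basis-gradient feature matrices; the names are ours):

```python
import numpy as np

def ge_hdmr_coefficients(Phi, dPhi, K, C, Q, Y, Y_partial):
    """Explicit HDMR coefficients of Eqs. (17)-(20).

    Solving the joint system gives omega1 (N,) and omega2 (N*n,);
    Eq. (20) is then the product of the feature matrices with
    these weights."""
    N = len(Y)
    g0 = Y.mean()                               # g0 = mu_g = mu_Y
    M = np.block([[K, C], [C.T, Q]])
    omega = np.linalg.solve(M, np.concatenate([Y - g0, Y_partial]))
    omega1, omega2 = omega[:N], omega[N:]
    coeffs = Phi.T @ omega1 + dPhi.T @ omega2   # alphas and betas, Eq. (20)
    return g0, coeffs

# Prediction at a new point x then reduces to Eq. (19):
# mu_x = g0 + hdmr_features(x) @ coeffs
```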
From Eq. (14), it follows that the prediction gradient \mu_\partial(x) = \left[ \frac{\partial \mu(x)}{\partial x_1}, \ldots, \frac{\partial \mu(x)}{\partial x_n} \right] can be expressed as

\mu_\partial(x) = \frac{\partial \begin{bmatrix} K(x,X) & C(x,X) \end{bmatrix}}{\partial x} \begin{bmatrix} K(X,X) & C(X,X) \\ C^T(X,X) & Q(X,X) \end{bmatrix}^{-1} \begin{bmatrix} Y - \mu_Y \\ Y_\partial \end{bmatrix} = \begin{bmatrix} C^T(X,x) & Q(x,X) \end{bmatrix} \begin{bmatrix} K(X,X) & C(X,X) \\ C^T(X,X) & Q(X,X) \end{bmatrix}^{-1} \begin{bmatrix} Y - \mu_Y \\ Y_\partial \end{bmatrix},   (21)

where Q(x,X) indicates the cross-covariance vector between g_\partial(x) and Y_\partial.

It is worth noting that the proposed GE-HDMR surrogate model is an interpolation model, which can be proved easily as follows (here we respectively abbreviate K(X,X), C(X,X) and Q(X,X) as K, C and Q):

\mu(X) = \mu_g + \begin{bmatrix} K & C \end{bmatrix} \begin{bmatrix} K & C \\ C^T & Q \end{bmatrix}^{-1} \begin{bmatrix} Y - \mu_Y \\ Y_\partial \end{bmatrix}
= \mu_g + \begin{bmatrix} K & C \end{bmatrix} \begin{bmatrix} K^{-1} + K^{-1} C (Q - C^T K^{-1} C)^{-1} C^T K^{-1} & -K^{-1} C (Q - C^T K^{-1} C)^{-1} \\ -(Q - C^T K^{-1} C)^{-1} C^T K^{-1} & (Q - C^T K^{-1} C)^{-1} \end{bmatrix} \begin{bmatrix} Y - \mu_Y \\ Y_\partial \end{bmatrix}
= Y.   (22)

In the meanwhile, it is found that

\mu_\partial(X) = \begin{bmatrix} C^T & Q \end{bmatrix} \begin{bmatrix} K & C \\ C^T & Q \end{bmatrix}^{-1} \begin{bmatrix} Y - \mu_Y \\ Y_\partial \end{bmatrix}
= \begin{bmatrix} C^T & Q \end{bmatrix} \begin{bmatrix} K^{-1} + K^{-1} C (Q - C^T K^{-1} C)^{-1} C^T K^{-1} & -K^{-1} C (Q - C^T K^{-1} C)^{-1} \\ -(Q - C^T K^{-1} C)^{-1} C^T K^{-1} & (Q - C^T K^{-1} C)^{-1} \end{bmatrix} \begin{bmatrix} Y - \mu_Y \\ Y_\partial \end{bmatrix}
= Y_\partial.   (23)

One can conclude from Eqs. (22)–(23) that two constraints (namely, the response constraint \mu(X) = Y and the gradient constraint \mu_\partial(X) = Y_\partial) are enforced on the proposed GE-HDMR surrogate model, which improves the surrogate model accuracy to a great degree compared to the classic HDMR model (which has no gradient constraint). In the next section, we will demonstrate the accuracy and efficiency of the proposed GE-HDMR surrogate model by several examples.
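These two constraints can be checked numerically on any small training set. The sketch below does so under the linear-kernel view of Eqs. (9)–(11), assuming the joint covariance matrix is non-singular (i.e., the basis is rich enough relative to the amount of data); all helper names are ours:

```python
import numpy as np

def check_interpolation(Phi, dPhi, Y, Y_partial, tol=1e-8):
    """Verify the interpolation property of Eqs. (22)-(23): the posterior
    mean must reproduce every training response and the posterior gradient
    every training gradient. Assumes M below is non-singular."""
    K, C, Q = Phi @ Phi.T, Phi @ dPhi.T, dPhi @ dPhi.T
    M = np.block([[K, C], [C.T, Q]])
    g0 = Y.mean()
    omega = np.linalg.solve(M, np.concatenate([Y - g0, Y_partial]))
    mu_X = g0 + np.block([K, C]) @ omega   # Eq. (22): should equal Y
    mu_dX = np.block([C.T, Q]) @ omega     # Eq. (23): should equal Y_partial
    assert np.allclose(mu_X, Y, atol=tol)
    assert np.allclose(mu_dX, Y_partial, atol=tol)
```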
4. Numerical examples

This section is dedicated to the validation and assessment of the proposed GE-HDMR model. In this paper, the HDMR basis functions are chosen to be Hermite polynomials, since they possess the convenient property that they and their derivatives are orthogonal with respect to the same measure [40]. To validate the performance of the GE-HDMR method, various nonlinear mathematical functions and engineering problems are employed, and the corresponding descriptions are listed in Table 1.

To demonstrate the efficiency and accuracy of the proposed method, the GE-HDMR is compared with the classic HDMR surrogate model. Here the relative root mean square error (RRMSE) and the relative maximum absolute error (RMAE) are used as accuracy metrics. The RRMSE is adopted to gauge the overall accuracy of the meta-model, while the RMAE is used to measure the local accuracy of the meta-model. The two accuracy metrics are defined as follows:

\mathrm{RRMSE} = \frac{1}{\mathrm{STD}} \sqrt{ \frac{1}{N_1} \sum_{i=1}^{N_1} (y_i - \tilde{y}_i)^2 },
\mathrm{RMAE} = \frac{1}{\mathrm{STD}} \max_i |y_i - \tilde{y}_i|, \quad i = 1, \ldots, N_1,
\mathrm{STD} = \sqrt{ \frac{1}{N_1 - 1} \sum_{i=1}^{N_1} (y_i - \bar{y})^2 },   (24)

where y_i (i = 1, \ldots, N_1) are the test samples, \bar{y} is the mean of the y_i, \tilde{y}_i is the ith prediction of the meta-model, and N_1 = 5000 is the number of test samples. In each example, the training samples and test samples are generated by the Latin Hypercube Sampling (LHS) technique.

Figs. 1 and 2 present the comparisons of the errors for the eight benchmark examples [41,42] in Table 1, where RRMSE and RMAE are plotted in log10 scale. The corresponding exact results are also listed in Tables 2–9. As expected, the proposed GE-HDMR algorithm yields more accurate meta-models than the traditional HDMR meta-model. It is observed that both the RRMSE and RMAE obtained by the presented method are reduced by almost one or two orders of magnitude for all the examples, especially when the sample size is small. As the sample size increases, the GE-HDMR consistently outperforms the HDMR.

Table 2
The comparisons of the surrogate model errors of Function 1.

Sample size | GE-HDMR RRMSE | GE-HDMR RMAE | HDMR RRMSE | HDMR RMAE
20 | 0.0079 | 0.1360 | 0.0481 | 0.2120
40 | 0.0059 | 0.0870 | 0.0101 | 0.2070
60 | 0.0038 | 0.0815 | 0.0070 | 0.1450
80 | 0.0033 | 0.0415 | 0.0063 | 0.0890
100 | 0.0022 | 0.0302 | 0.0060 | 0.0650

Table 3
The comparisons of the surrogate model errors of Function 2.

Sample size | GE-HDMR RRMSE | GE-HDMR RMAE | HDMR RRMSE | HDMR RMAE
20 | 0.0204 | 0.0831 | 0.0252 | 0.117
40 | 0.0111 | 0.0594 | 0.0162 | 0.116
60 | 0.0091 | 0.0586 | 0.0152 | 0.108
80 | 0.0089 | 0.0505 | 0.0148 | 0.082
100 | 0.0076 | 0.0476 | 0.0146 | 0.067
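The two metrics in Eq. (24) translate directly into code; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def rrmse_rmae(y_true, y_pred):
    """Error metrics of Eq. (24) evaluated on a test set."""
    std = np.std(y_true, ddof=1)  # STD with the 1/(N1 - 1) normalization
    rrmse = np.sqrt(np.mean((y_true - y_pred) ** 2)) / std
    rmae = np.max(np.abs(y_true - y_pred)) / std
    return rrmse, rmae
```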
Fig. 1. Comparison of errors between the GE-HDMR and HDMR meta-models for functions 1–4.

Fig. 2. Comparison of errors between the GE-HDMR and HDMR meta-models for functions 5–8.

Table 4
The comparisons of the surrogate model errors of Function 3.

Sample size | GE-HDMR RRMSE | GE-HDMR RMAE | HDMR RRMSE | HDMR RMAE
50 | 0.069 | 0.590 | 0.783 | 4.711
100 | 0.019 | 0.212 | 0.169 | 1.722
150 | 0.014 | 0.116 | 0.126 | 1.417
200 | 0.012 | 0.070 | 0.110 | 0.854
250 | 0.012 | 0.066 | 0.099 | 0.744

Table 5
The comparisons of the surrogate model errors of Function 4.

Sample size | GE-HDMR RRMSE | GE-HDMR RMAE | HDMR RRMSE | HDMR RMAE
10 | 0.0144 | 0.1157 | 0.4941 | 2.1779
20 | 6.19×10^{−13} | 6.85×10^{−12} | 0.1823 | 1.0847
30 | 1.34×10^{−14} | 9.09×10^{−14} | 0.0181 | 0.1157
40 | 1.11×10^{−14} | 7.99×10^{−14} | 0.0142 | 0.1085
50 | 1.02×10^{−14} | 5.17×10^{−14} | 0.0139 | 0.0899

In Table 10, we list the training time of the proposed GE-HDMR and the classic HDMR model with the same training samples. The results are obtained on a standard computer with an Intel Xeon Processor W3550 (3.07 GHz). It is observed that the computational cost of the presented method is larger than that of the classic HDMR model. Indeed, since the gradient information is included, one can see from Eq. (14) that training a GE-HDMR model requires inverting an (n + 1)N × (n + 1)N matrix, which leads to a higher computational cost than the classic HDMR model. However, compared to classic HDMR, the presented method requires much fewer true model evaluations to obtain an accurate surrogate model, which saves much computational cost, since a single model evaluation of a complex engineering model is usually very expensive.

In addition, we ran the presented GE-HDMR model and the classic HDMR model 50 times with the same sample size listed in Table 10 for each test example. In Figs. 3 and 4, box-plots are provided to show the robustness of the two performance metrics for each test example, where 1 and 2 on the x-axis represent the classic HDMR and the GE-HDMR model respectively, and RRMSE and RMAE are plotted in log10 scale. From Figs. 3 and 4, one can conclude that both methods provide relatively robust results for RRMSE, but the variability of RMAE is relatively large.
Fig. 3. Comparison of robustness between the GE-HDMR and HDMR meta-models for functions 1–4.

Fig. 4. Comparison of robustness between the GE-HDMR and HDMR meta-models for functions 5–8.

Table 6
The comparisons of the surrogate model errors of Function 5.

Sample size | GE-HDMR RRMSE | GE-HDMR RMAE | HDMR RRMSE | HDMR RMAE
50 | 0.096 | 1.529 | 1.172 | 4.342
100 | 0.022 | 0.349 | 0.408 | 2.330
150 | 0.021 | 0.331 | 0.278 | 1.846
200 | 0.014 | 0.167 | 0.263 | 1.487
250 | 0.009 | 0.102 | 0.241 | 1.405

Table 7
The comparisons of the surrogate model errors of Function 6.

Sample size | GE-HDMR RRMSE | GE-HDMR RMAE | HDMR RRMSE | HDMR RMAE
20 | 0.00021 | 0.0105 | 0.0038 | 0.043
40 | 7.31×10^{−5} | 0.0060 | 0.0035 | 0.064
60 | 6.18×10^{−5} | 0.0055 | 0.00015 | 0.024
80 | 4.43×10^{−5} | 0.0054 | 5.92×10^{−5} | 0.0125
100 | 4.36×10^{−5} | 0.0044 | 5.42×10^{−5} | 0.0134

Table 8
The comparisons of the surrogate model errors of Function 7.

Sample size | GE-HDMR RRMSE | GE-HDMR RMAE | HDMR RRMSE | HDMR RMAE
30 | 0.00149 | 0.02530 | 0.1360 | 1.0700
60 | 0.00111 | 0.02300 | 0.0191 | 0.1830
90 | 0.00078 | 0.01450 | 0.0137 | 0.0930
120 | 0.00064 | 0.00610 | 0.0122 | 0.1050
150 | 0.00065 | 0.00520 | 0.0108 | 0.0746

Table 9
The comparisons of the surrogate model errors of Function 8.

Sample size | GE-HDMR RRMSE | GE-HDMR RMAE | HDMR RRMSE | HDMR RMAE
20 | 0.00577 | 0.08259 | 0.12745 | 0.76794
40 | 0.00329 | 0.05478 | 0.10428 | 0.56946
60 | 0.00224 | 0.05280 | 0.09820 | 0.50323
80 | 0.00161 | 0.05010 | 0.02589 | 0.23139
100 | 0.00069 | 0.03802 | 0.01977 | 0.14373
Since RRMSE measures the overall accuracy of the surrogate model, it is more stable than RMAE, which measures the maximum absolute error. It is also observed that the GE-HDMR model provides more robust RRMSE results for most of the test examples, which demonstrates the robustness of the presented method.

On the whole, the method proposed in this paper is an efficient and robust way to improve the accuracy of the HDMR meta-model by integrating the gradient information.

Table 10
The comparisons of the computational time of Functions 1–8.

Function ID | GE-HDMR time | HDMR time | Sample size
1 | 2.21 s | 1.72 s | 100
2 | 9.18 s | 1.06 s | 100
3 | 35.96 s | 1.47 s | 250
4 | 1.87 s | 0.96 s | 50
5 | 82.91 s | 5.58 s | 250
6 | 2.22 s | 1.70 s | 100
7 | 6.84 s | 4.17 s | 150
8 | 5.76 s | 1.45 s | 100

5. Conclusions

In this paper, we investigated the high dimensional model representation surrogate model when gradient information of a response function is present. Assuming the response function and its partial derivative functions are all Gaussian processes, we developed the auto-covariance functions and cross-covariance functions of all these random processes. Then the analytical expression of the GE-HDMR surrogate model is derived from the joint distribution of samples and gradients. The proposed method combines the sample information and the gradient information as a whole to make a more accurate prediction. Thus the GE-HDMR surrogate model is promising for function approximation when gradient information can be easily obtained.

Eight benchmark examples are used to validate the performance of the GE-HDMR surrogate model, and the results all suggest that the proposed method is an effective way to integrate gradient information to improve the accuracy of the HDMR meta-model.

Although the presented method improves the accuracy of the classic HDMR model, it usually requires more training time to obtain an accurate surrogate model. Also, the presented method leads to a full HDMR model; however, recent studies have shown that a sparse representation of the HDMR model could improve its performance [43]. Thus future work will concentrate on reducing the computational cost of training the GE-HDMR model as well as on developing efficient algorithms to construct a sparse GE-HDMR model.

Declaration of competing interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.104903.

Acknowledgments

The authors would like to express their gratitude to the three reviewers for their helpful comments and constructive suggestions. This work was supported by the National Natural Science Foundation of China (Grant No. NSFC 51775439), the National Science and Technology Major Project (2017-IV-0009-0046) and the "Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University" (project code CX201933).

References

[1] H. Liu, Y.-S. Ong, J. Cai, A survey of adaptive sampling for global metamodeling in support of simulation-based complex engineering design, Struct. Multidiscip. Optim. 57 (2018) 393–416.
[2] C. Rasmussen, C. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
[3] L.L. Gratiet, Multi-Fidelity Gaussian Process Regression for Computer Experiments, 2013.
[4] L. Parussini, D. Venturi, P. Perdikaris, G.E. Karniadakis, Multi-fidelity Gaussian process regression for prediction of random fields, J. Comput. Phys. 336 (2017) 36–50.
[5] J.P.C. Kleijnen, Regression and Kriging metamodels with their experimental designs in simulation: A review, European J. Oper. Res. 256 (2017) 1–16.
[6] A. Melkumyan, F. Ramos, Multi-kernel Gaussian processes, in: International Joint Conference on Artificial Intelligence (IJCAI), 2009.
[7] Z.-H. Han, S. Görtz, R. Zimmermann, Improving variable-fidelity surrogate modeling via gradient-enhanced kriging and a generalized hybrid bridge function, Aerosp. Sci. Technol. 25 (2013) 177–189.
[8] J.M. Bourinet, F. Deheeger, M. Lemaire, Assessing small failure probabilities by combined subset simulation and support vector machines, Struct. Saf. 33 (2011) 343–353.
[9] X. Zhu, Z. Gao, An efficient gradient-based model selection algorithm for multi-output least-squares support vector regression machines, Pattern Recognit. Lett. 111 (2018) 16–22.
[10] K. Cheng, Z. Lu, Y. Zhou, Y. Shi, Y. Wei, Global sensitivity analysis using support vector regression, Appl. Math. Model. (2017).
[11] P. Tsirikoglou, S. Abraham, F. Contino, C. Lacor, G. Ghorbaniasl, A hyperparameters selection technique for support vector regression models, Appl. Soft Comput. 61 (2017) 139–148.
[12] K. Cheng, Z. Lu, Y. Wei, Y. Shi, Y. Zhou, Mixed kernel function support vector regression for global sensitivity analysis, Mech. Syst. Signal Process. 96 (2017) 201–214.
[13] K. Cheng, Z. Lu, Adaptive sparse polynomial chaos expansions for global sensitivity analysis based on support vector regression, Comput. Struct. 194 (2018) 86–96.
[14] X. Zhou, T. Jiang, An effective way to integrate ε-support vector regression with gradients, Expert Syst. Appl. 99 (2018) 126–140.
[15] T. Jiang, X. Zhou, Gradient/Hessian-enhanced least square support vector regression, Inform. Process. Lett. 134 (2018) 1–8.
[16] Q. Zhou, Y. Wang, P. Jiang, X. Shao, S.-K. Choi, J. Hu, et al., An active learning radial basis function modeling method based on self-organization maps for simulation-based design problems, Knowl.-Based Syst. 131 (2017) 10–27.
[17] R. Schaback, Error estimates and condition numbers for radial basis function interpolation, Adv. Comput. Math. 3 (1995) 251–264.
[18] M. Björkman, K. Holmström, Global optimization of costly nonconvex functions using radial basis functions, Opt. Eng. 1 (4) (2000) 373–397.
[19] G. Blatman, B. Sudret, Adaptive sparse polynomial chaos expansion based on least angle regression, J. Comput. Phys. 230 (2011) 2345–2367.
[20] V. Keshavarzzadeh, R.G. Ghanem, S.F. Masri, O.J. Aldraihem, Convergence acceleration of polynomial chaos solutions via sequence transformation, Comput. Methods Appl. Mech. Engrg. 271 (2014) 167–184.
[21] G. Blatman, B. Sudret, Efficient computation of global sensitivity indices using sparse polynomial chaos expansions, Reliab. Eng. Syst. Saf. 95 (2010) 1216–1229.
[22] S. Salehi, M. Raisee, M.J. Cervantes, A. Nourbakhsh, An efficient multifidelity ℓ1-minimization method for sparse polynomial chaos, Comput. Methods Appl. Mech. Engrg. 334 (2018) 183–207.
[23] B. Sudret, Global sensitivity analysis using polynomial chaos expansions, Reliab. Eng. Syst. Saf. 93 (2008) 964–979.
[24] V. Papadopoulos, D.G. Giovanis, N.D. Lagaros, M. Papadrakakis, Accelerated subset simulation with neural networks for reliability analysis, Comput. Methods Appl. Mech. Engrg. 223–224 (2012) 70–80.
[25] W. Hao, Z. Lu, P. Wei, J. Feng, B. Wang, A new method on ANN for variance based importance measure analysis of correlated input variables, Struct. Saf. 38 (2012) 56–63.
[26] M. Marseguerra, R. Masini, E. Zio, G. Cojazzi, Variance decomposition-based sensitivity analysis via neural networks, Reliab. Eng. Syst. Saf. 79 (2003) 229–238.
[27] Y. Liu, M. Yousuff Hussaini, G. Ökten, Accurate construction of high dimensional model representation with applications to uncertainty quantification, Reliab. Eng. Syst. Saf. 152 (2016) 281–295.
[28] X. Ma, N. Zabaras, An adaptive high-dimensional stochastic model representation technique for the solution of stochastic partial differential equations, J. Comput. Phys. 229 (2010) 3884–3915.
[29] H. Rabitz, Ö.F. Aliş, General foundations of high-dimensional model representations, J. Math. Chem. 25 (1999) 197–233.
[30] E. Li, H. Wang, G. Li, High dimensional model representation (HDMR) coupled intelligent sampling strategy for nonlinear problems, Comput. Phys. Comm. 183 (2012) 1947–1955.
[31] G. Li, C. Rosenthal, H. Rabitz, High dimensional model representations, J. Phys. Chem. A 105 (2001) 7765–7777.
[32] G. Li, S.W. Wang, High dimensional model representations generated from low dimensional data samples, I. mp-Cut-HDMR, J. Math. Chem. 30 (2001) 1–30.
[33] G. Li, J. Hu, S.W. Wang, P.G. Georgopoulos, J. Schoendorf, H. Rabitz, Random sampling-high dimensional model representation (RS-HDMR) and orthogonality of its different order component functions, J. Phys. Chem. A 110 (2006) 2474–2485.
[34] X. Luo, Z. Lu, X. Xu, Reproducing kernel technique for high dimensional model representations (HDMR), Comput. Phys. Comm. 185 (2014) 3099–3108.
[35] I.M. Sobol, Theorems and examples on high dimensional model representation, Reliab. Eng. Syst. Saf. 79 (2003) 187–193.
[36] G. Li, S.W. Wang, H. Rabitz, Practical approaches to construct RS-HDMR component functions, J. Phys. Chem. A 106 (2002) 8721–8733.
[37] H. Rabitz, Ö.F. Aliş, J. Shorter, K. Shim, Efficient input–output model representations, Comput. Phys. Commun. 117 (1999) 11–20.
[38] R.S.C. Lambert, F. Lemke, S.S. Kucherenko, S. Song, N. Shah, Global sensitivity analysis using sparse high dimensional model representations generated by the group method of data handling, Math. Comput. Simulation 128 (2016) 42–54.
[39] L. Laurent, R.L. Riche, B. Soulier, P.A. Boucard, An overview of gradient-enhanced metamodels with applications, Arch. Comput. Methods Eng. (2017) 1–46.
[40] J. Peng, J. Hampton, A. Doostan, On polynomial chaos expansion via gradient-enhanced ℓ1-minimization, J. Comput. Phys. 310 (2016) 440–458.
[41] X. Cai, H. Qiu, L. Gao, P. Yang, X. Shao, An enhanced RBF-HDMR integrated with an adaptive sampling method for approximating high dimensional problems in engineering design, Struct. Multidiscip. Optim. 53 (2016) 1209–1229.
[42] H. Liu, J.-R. Hervas, Y.-S. Ong, J. Cai, Y. Wang, An adaptive RBF-HDMR modeling approach under limited computational budget, Struct. Multidiscip. Optim. 57 (2018) 1233–1250.
[43] R.S.C. Lambert, F. Lemke, S.S. Kucherenko, S. Song, N. Shah, Global sensitivity analysis using sparse high dimensional model representations generated by the group method of data handling, Math. Comput. Simulation 128 (2016) 42–54.
