

Sample-Based Neural Approximation Approach


for Probabilistic Constrained Programs
Xun Shen , Member, IEEE, Tinghui Ouyang , Nan Yang , and Jiancang Zhuang

Abstract— This article introduces a neural approximation-based method for solving continuous optimization problems with probabilistic constraints. After reformulating the probabilistic constraints as the quantile function, a sample-based neural network model is used to approximate the quantile function. The statistical guarantees of the neural approximation are discussed by showing the convergence and feasibility analysis. Then, by introducing the neural approximation, a simulated annealing-based algorithm is revised to solve the probabilistic constrained programs. An interval predictor model (IPM) of wind power is investigated to validate the proposed method.

Index Terms— Neural network model, nonlinear optimization, probabilistic constraints, quantile function, sample average approximation.

Manuscript received 29 May 2020; revised 3 January 2021 and 6 June 2021; accepted 31 July 2021. Date of publication 10 August 2021; date of current version 6 February 2023. This work was supported in part by the Graduate University for Advanced Studies, SOKENDAI. (Corresponding author: Xun Shen.)
Xun Shen is with the Department of Statistical Sciences, Graduate University for Advanced Studies, SOKENDAI, Tokyo 106-8569, Japan (e-mail: shenxun@ism.ac.jp).
Tinghui Ouyang is with the National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan (e-mail: ouyang.tinghui@aist.go.jp).
Nan Yang is with the Department of Hubei Provincial Collaborative Innovation Center for New Energy Microgrid, China Three Gorges University, Yichang 443002, China (e-mail: ynyyayy@ctgu.edu.cn).
Jiancang Zhuang is with The Institute of Statistical Mathematics, Tokyo 190-8562, Japan (e-mail: zhuangjc@ism.ac.jp).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TNNLS.2021.3102323.
Digital Object Identifier 10.1109/TNNLS.2021.3102323

I. INTRODUCTION

PROBABILISTIC-constrained programs (PCPs) or chance-constrained programs (CCPs) are mathematical programs involving random variables in constraints that are required to be satisfied at a given probability level [1]. In this study, the appellation of PCPs is adopted. PCPs have been used in various fields, such as decision-making and filtering in uncertain systems [2], motion planning with probabilistic constraints [5], engine combustion control [3], probabilistic bound estimation of uncertain state trajectories [4], and energy storage modeling toward distribution systems in the electricity market [6].

PCPs have great value for practical applications. However, it is NP-hard to solve PCPs directly due to the existence of probabilistic constraints. Thus, a lot of research has been conducted to develop approximation approaches for solving PCPs. For instance, Calafiore and Campi [7] proposed the scenario approach, in which the probabilistic constraints are replaced by deterministic constraints imposed for finite sets of independently extracted samples of the random variables. The scenario approach ensures that the solution of the approximately formulated deterministic program satisfies the original probabilistic constraints with a determined bound of probability. The bound of probability is determined by the number of extracted samples. Afterward, the scenario approach was improved to ensure tighter confidence bounds of probability by a sampling-and-discarding approach [8]. In the sampling-and-discarding scenario approach, a certain proportion of the extracted samples is used to define the deterministic constraints, while the remaining samples are discarded. With fewer samples, the violation probabilities of the original probabilistic constraints are still preserved. However, the scenario approach has a fatal drawback: it gives a too conservative solution, which converges to the totally robust solution of the uncertain constraints rather than to the original probabilistic constraints. Namely, as the sample number goes to infinity, the solution will satisfy the uncertain constraints with probability 1 rather than at the given level of probability.

On the other hand, a sample average approach has been proposed in [9] and [10] to satisfy the probabilistic constraints strictly. The sample average program is formulated as an approximation of the PCP by replacing the probabilistic constraints with a measure that indicates the violation probability. Besides, an inner-outer approximation approach is proposed in [11] to get a tight sample average approximate program. A randomized optimization-based method is used to solve PCPs in [12].

This article extends the sample average approach by introducing neural network-based constraints to replace the probabilistic constraints. After reformulating the probabilistic constraints as the quantile function, a sample-based neural network model is used to approximate the quantile function. The statistical guarantees of the neural approximation are discussed by showing the convergence and feasibility analysis. Then, by introducing the neural approximation, a simulated annealing-based algorithm is revised to solve the PCPs. An interval predictor model (IPM) of wind power is investigated to validate the proposed method.

This article is organized as follows. Section II gives the problem formulation. Section III demonstrates the proposed neural approximation approach, and the proposed algorithm is introduced in Section IV. Section V presents how to use the proposed method to establish the IPM of wind power. Finally, Section VI concludes this article.

II. PROBLEM FORMULATION


Consider the following optimization problem with probabilistic constraints:

$$\min_{u\in\mathcal{U}}\ J(u)\quad \text{s.t.}\ \ \Pr\{h(u,\delta)\le 0\}\ge 1-\alpha,\quad \delta\in\Delta,\ \alpha\in(0,1) \tag{1}$$

where u represents the decision variable with a closed and compact feasible domain U ⊆ R^{n_u}, J : R^{n_u} → R is the objective function, δ represents an uncertain parameter vector with sample space Δ ⊆ R^{n_δ}, the probability measure is well defined on B (the sigma algebra of Δ), the probability density of δ is denoted as p_δ(δ), h : R^{n_u} × R^{n_δ} → R^{n_h} is a vector-valued function, and α is a risk parameter. The meaning of the probabilistic constraint is that the constraints represented by h(u, δ) ≤ 0 should be satisfied with a probability larger than 1 − α. Besides, this study focuses on problems in which J and h are continuous and differentiable ∀u ∈ U and ∀δ ∈ Δ. The uncertain variable h(u, δ) has a continuous cumulative distribution function (CDF) for all u ∈ U.

There are several difficulties in addressing the problem described by (1).
1) The structural properties of the feasible domain defined by h(u, δ) ≤ 0 may not carry over to the domain defined by the constraint Pr{h(u, δ) ≤ 0} ≥ 1 − α. For instance, even if all components of h are linear in u, the probabilistic constraint may not define a convex domain.
2) Instead of knowing the distribution of δ, only samples of δ are available.
3) A tractable analytical expression of the probabilistic constraint does not exist even with knowledge of the distribution of δ.

This study aims to find a tractable analytical function H(u) that defines a feasible domain Ũ_f = {u ∈ U | H(u) ≤ 0}. The feasible domain Ũ_f is equal to the feasible domain U_f = {u ∈ U | Pr{h(u, δ) ≤ 0} ≥ 1 − α} with probability 1. Then, the problem with deterministic constraints

$$\min_{u\in\mathcal{U}}\ J(u)\quad \text{s.t.}\ \ H(u)\le 0 \tag{2}$$

is the equivalent problem of (1). By solving (2), the optimal solution of (1) can be approximated.

III. NEURAL APPROXIMATION APPROACH

A. Quantile Function-Based Problem Reformulation

The cumulative probability function of a random variable X is denoted as

$$F(x)=\Pr\{X\le x\}. \tag{3}$$

The 1 − α level quantile of a random variable X is defined as [13]

$$Q_{1-\alpha}(X)=\inf\{x\in\mathbb{R}\mid \Pr\{X\le x\}\ge 1-\alpha\}. \tag{4}$$

For the case n_h = 1, h : R^{n_u} × R^{n_δ} → R is a real-valued function and h(u, δ) is a scalar random variable. Thus, the cumulative probability function of h(u, δ) can be defined as

$$F(\gamma,u)=\Pr\{h(u,\delta)\le\gamma\},\quad \gamma\in\mathbb{R}. \tag{5}$$

The quantile condition Q_{1−α}(h(u, δ)) ≤ 0 is equivalent to F(0, u) ≥ 1 − α. For the case n_h > 1, Q_{1−α}(h(u, δ)) ≤ 0 cannot be well defined since h(u, δ) is not a scalar random variable anymore. Instead of h(u, δ), h̄(u, δ) = max_{j=1,...,n_h} h_j(u, δ) is used. Then, the cumulative probability function F(γ, u) takes the same form as (5), only replacing h(·, ·) by h̄(·, ·). Consequently, the quantile constraint is written as Q_{1−α}(h̄(u, δ)) ≤ 0. Without losing generality, Q_{1−α}(h̄(u, δ)) ≤ 0 can be used for n_h = 1 as well. Then, the following reformulation of problem (1) is written as:

$$\min_{u\in\mathcal{U}}\ J(u)\quad \text{s.t.}\ \ Q_{1-\alpha}\left(\bar h(u,\delta)\right)\le 0,\quad \delta\in\Delta,\ \alpha\in(0,1). \tag{6}$$

The advantage of using the quantile is that the quantile function is much less flat compared with the probability function Pr{h(u, δ) ≤ 0}. A comparison example of the quantile function and the probability function is shown in Fig. 1. By the quantile function-based reformulation, the feasible region is measured in the image of h̄(u, δ) instead of the bounded image [0, 1] of the probability function.

Fig. 1. Comparison of 1 − α − Pr{h(u, δ) ≤ 0} and Q_{1−α}(h(u, δ)), where h(u, δ) = 1.5u² − 3 + δ and δ ∼ N(0, 1) (α = 0.1).
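As a concrete illustration of this reformulation, the short sketch below reproduces the Fig. 1 comparison by Monte Carlo for the toy constraint h(u, δ) = 1.5u² − 3 + δ with δ ∼ N(0, 1) and α = 0.1. The sample size, random seed, and grid of u values are illustrative choices, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1
delta = rng.standard_normal(100_000)              # samples of delta ~ N(0, 1)

def h(u, d):
    # toy constraint from the Fig. 1 caption
    return 1.5 * u ** 2 - 3.0 + d

for u in (0.0, 0.5, 1.0, 1.5):
    hu = h(u, delta)
    prob_gap = (1 - alpha) - np.mean(hu <= 0.0)   # 1 - alpha - Pr{h(u, delta) <= 0}
    quantile = np.quantile(hu, 1 - alpha)         # Q_{1-alpha}(h(u, delta))
    print(f"u = {u:3.1f}   1-alpha-Pr = {prob_gap:+.3f}   Q_0.9 = {quantile:+.3f}")
```

Both quantities change sign at the same values of u, but the quantile varies with a much steeper slope, which is why it is the better target for function approximation.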

B. Neural Approximation of Quantile Function

The sample set Δ_N = {δ_1, ..., δ_N} is obtained by extracting samples independently from Δ according to the identical distribution p_δ(δ). For a given u, the empirical CDF of H_N = {h̄(u, δ_1), ..., h̄(u, δ_N)} is written as

$$\tilde F_N(t,u)=\frac{1}{N}\sum_{i=1}^{N}\mathbb{I}\left(\bar h(u,\delta_i)\le t\right) \tag{7}$$

where I(h̄(u, δ_i) ≤ t) denotes the indicator function written as

$$\mathbb{I}\left(\bar h(u,\delta_i)\le t\right)=\begin{cases}0, & \text{if } \bar h(u,\delta_i)>t\\[2pt] 1, & \text{if } \bar h(u,\delta_i)\le t.\end{cases} \tag{8}$$

The (1 − α)-empirical quantile at u can be obtained from a value t such that F̃_N(t, u) ≈ 1 − α, which is defined as

$$\tilde Q_{1-\alpha}\left(\bar h(u,\delta)\right)=\inf\left\{y\ \middle|\ \frac{1}{N}\sum_{i=1}^{N}\mathbb{I}\left(\bar h(u,\delta_i)\le y\right)\ge 1-\alpha\right\} \tag{9}$$

equivalently written as

$$\tilde Q_{1-\alpha}\left(\bar h(u,\delta)\right)=\bar h_M(u) \tag{10}$$

where M = ⌈(1 − α)N⌉ and h̄_M(u) denotes the Mth smallest observation among the values in H_N for a given u.
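The empirical quantile of (9) and (10) therefore reduces to an order statistic. The following minimal helper computes it; the function and variable names are illustrative.

```python
import numpy as np

def empirical_quantile(h_bar_values, alpha):
    """(1 - alpha)-empirical quantile of eqs. (9)-(10):
    the M-th smallest observation with M = ceil((1 - alpha) * N)."""
    h_sorted = np.sort(np.asarray(h_bar_values, dtype=float))
    N = h_sorted.size
    M = int(np.ceil((1 - alpha) * N))
    return h_sorted[M - 1]                        # M-th smallest value (1-based M)

# example: quantile of h_bar(u, delta_i) for one fixed u
samples = np.random.default_rng(1).standard_normal(500) - 1.0
print(empirical_quantile(samples, alpha=0.1))
```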
We can use the (1 − α)-empirical quantile as an approximation of the probabilistic constraints and form the following approximate problem:

$$\min_{u\in\mathcal{U}}\ J(u)\quad \text{s.t.}\ \ \tilde Q_{1-\alpha}\left(\bar h(u,\delta)\right)\le 0,\quad \delta\in\Delta,\ \alpha\in(0,1). \tag{11}$$

Using the (1 − α)-empirical quantile as an approximation of the probabilistic constraints has two drawbacks.
1) Q̃_{1−α}(h̄(u, δ)) is usually not differentiable. M changes for different values of u. Thus, even if the constraint h(u, δ) is smooth for a fixed value of δ, Q̃_{1−α}(h̄(u, δ)) does not have to be smooth.
2) For small N, the uncertainty of Q̃_{1−α}(h̄(u, δ)) is large and the inward kinks of the feasible boundary will be very nonsmooth. Large N can bring some smoothness back; however, this also increases the computation burden.

Thus, it is necessary to look for a smooth approximation of the (1 − α)-empirical quantile.

Given the sample set Δ_N, the (1 − α)-empirical quantile Q̃_{1−α}(h̄(u, δ)) is essentially a function of u. Q_{1−α}(h̄(u, δ)) is also a function of u. By using Q̃_{1−α}(h̄(u, δ)) as an approximation of Q_{1−α}(h̄(u, δ)), a smooth function can be used to approximate the (1 − α) quantile Q_{1−α}(h̄(u, δ)). In this study, a single-layer neural network model is used to approximate the (1 − α) quantile Q_{1−α}(h̄(u, δ)). Using N independently extracted samples of δ, the neural approximation of Q_{1−α}(h̄(u, δ)) with S hidden nodes and activation function g(·) is defined as

$$\hat H_S(u)=\sum_{i=1}^{S}\beta_i\, g(u,a_i,b_i) \tag{12}$$

where β_i denotes the weight vector connecting the ith hidden node and the output nodes, a_i = [a_{i,1}, ..., a_{i,k}] represents the weight vector toward u, and b_i is the scalar threshold of the ith hidden node. Here, the activation function adopts the sigmoid function expressed as

$$g(u,a_i,b_i)=\frac{1}{1+e^{-a_i^{T}u+b_i}}. \tag{13}$$

The purpose is to achieve

$$\hat H_S(u)\approx Q_{1-\alpha}\left(\bar h(u,\delta)\right). \tag{14}$$

Then, Ĥ_S(·) maps the decision variable u ∈ R^k into the image of Q_{1−α}(h̄(u, δ)), which is R. In order to find solutions for (1) and (6), it is equivalent to solve the following problem:

$$\min_{u\in\mathcal{U}}\ J(u)\quad \text{s.t.}\ \ \hat H_S(u)\le 0. \tag{15}$$

The above formulation is a sample-based neural approximation of the original PCP, which is obtained by a two-layer approximation: sample approximation and neural approximation. The convergence and feasibility of the two-layer approximation should be analyzed.
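Before turning to the analysis, the following sketch spells out the model of (12) and (13) as code; the array shapes are assumptions made for illustration (each a_i stored as a row of a, with b and beta as vectors).

```python
import numpy as np

def H_S(u, beta, a, b):
    """Single-layer neural approximation of eqs. (12)-(13):
    H_S(u) = sum_i beta_i * g(u, a_i, b_i),
    g(u, a_i, b_i) = 1 / (1 + exp(-a_i^T u + b_i))."""
    u = np.atleast_1d(u)                          # decision variable, u in R^k
    hidden = 1.0 / (1.0 + np.exp(-(a @ u) + b))   # shape (S,); a is (S, k), b is (S,)
    return float(hidden @ beta)                   # scalar approximation of the quantile
```

Solving (15) then amounts to minimizing J(u) subject to H_S(u) ≤ 0 for the trained parameters.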
C. Convergence and Feasibility Analysis

We make the following assumptions.

Assumption 1: h̄(u, δ) is a Carathéodory function. For every fixed u ∈ U, h̄(u, δ) is a continuous random variable, which is measurable and has a continuous distribution. For every fixed δ ∈ Δ, h̄(u, δ) is a continuous function of u. Besides, for every u ∈ U, F_h̄(h̄(u, δ)), the CDF of h̄(u, δ), is continuously differentiable and has a strictly positive derivative f_h̄(h̄(u, δ)) (it is strictly monotonically increasing) over the domain of h̄(u, δ): (−∞, +∞).

Remark 1: The CDF F_h̄(h̄(u, δ)) is continuous and strictly monotonically increasing over (−∞, +∞). Thus, the quantile function Q(h̄(u, δ)) : [0, 1] → D_{h̄(u,δ)} is the inverse of F_h̄ and thus continuous on [0, 1]. Moreover, for a fixed δ ∈ Δ, h̄(·, δ) is continuous on U. Thus, the values of the CDF F_h̄(h̄(u, δ)) and the quantile function Q(h̄(u, δ)) vary continuously on U, because the composition of continuous functions is continuous. Thus, Assumption 1 implies that the 1 − α level quantile Q_{1−α}(h̄(u, δ)) is a continuous function of u on U.

Assumption 2: The problem expressed by (1) has a globally optimal solution u*, such that for any ε_u > 0, there is u ∈ U such that ‖u − u*‖ ≤ ε_u and F(0, u) > 1 − α (or equivalently Q_{1−α}(h̄(u, δ)) < 0).

Remark 2: Assumption 2 implies that there exists a sequence {u_k}_{k=1}^∞ ⊆ U that converges to an optimal solution u* such that F(0, u_k) > 1 − α (or equivalently Q_{1−α}(h̄(u_k, δ)) < 0) for all k ∈ N.

By applying [14, Corollary 21.5], we have the following lemma about the convergence of Q̃_{1−α}(h̄(u, δ)).

Lemma 1: Suppose that Assumption 1 holds. For any fixed α ∈ (0, 1) and any fixed u ∈ U, if N samples of δ, δ_i, i = 1, ..., N, are independently extracted, then

$$\tilde Q_{1-\alpha}-Q_{1-\alpha}=-\frac{1}{N}\sum_{i=1}^{N}\frac{\mathbb{I}\left(\bar h_i\le Q_{1-\alpha}\right)-(1-\alpha)}{f\left(Q_{1-\alpha}\right)}+o_P\!\left(1/\sqrt{N}\right) \tag{16}$$

where Q̃_{1−α}, Q_{1−α}, and h̄_i are shorts for Q̃_{1−α}(h̄(u, δ)), Q_{1−α}(h̄(u, δ)), and h̄(u, δ_i), respectively.

Consequently, the sequence {Q̃_{1−α} − Q_{1−α}} is asymptotically normal with mean 0 and variance (α(1 − α))/(N · f²(Q_{1−α})). As N increases to ∞, the variance vanishes to zero.

For the convergence of Ĥ_S(u) to Q_{1−α}(h̄(u, δ)), we have the following theorem.

Theorem 1: Suppose that Assumption 1 holds. As N → ∞, ∀ε_H > 0, there exist S, a, b, and β such that

$$\sup_{u\in\mathcal{U}_c}\left|\hat H_S(u)-Q_{1-\alpha}\left(\bar h(u,\delta)\right)\right|<\epsilon_H,\quad \text{w.p.1} \tag{17}$$

where U_c ⊆ U represents any compact set inside the feasible area.

Proof (Theorem 1): Due to Assumption 1 and Remark 1, Q_{1−α}(h̄(u, δ)) is a continuous function of u. Then, according to the universal approximation theorem [15], [16], ∀ε_H > 0, ∀Q_{1−α}(h̄(u, δ)), ∃S ∈ N₊, β, b ∈ R^S, and a ∈ R^{S×k} such that

$$\left|\hat H_S(u)-Q_{1-\alpha}\left(\bar h(u,\delta)\right)\right|<\epsilon_H \tag{18}$$


for all u ∈ U. However, Q_{1−α}(h̄(u, δ)) is not used directly for fitting the quantile function. The one used is the approximate value Q̃_{1−α}(h̄(u, δ)), which has a bias compared with Q_{1−α}(h̄(u, δ)). The bias is given by (16). Denote d_{Q_{1−α}} = Q̃_{1−α} − Q_{1−α}. According to Lemma 1, d_{Q_{1−α}} is normal with mean 0 and variance (α(1 − α))/(N · f²(Q_{1−α})). As N → ∞, (α(1 − α))/(N · f²(Q_{1−α})) → 0. Namely, ‖d_{Q_{1−α}}‖ → 0 with probability 1.

Since we can only obtain Q̃_{1−α}(h̄(u, δ)), instead of (18), which needs Q_{1−α}(h̄(u, δ)), the following inequality:

$$\left|\hat H_S(u)-\tilde Q_{1-\alpha}\left(\bar h(u,\delta)\right)\right|<\epsilon_H \tag{19}$$

can be obtained ∀ε_H > 0, ∀Q_{1−α}(h̄(u, δ)), ∃S ∈ N₊, β, b ∈ R^S, and a ∈ R^{S×k}. Since |Ĥ_S(u) − Q_{1−α}(h̄(u, δ))| = |Ĥ_S(u) − Q̃_{1−α}(h̄(u, δ)) + Q̃_{1−α}(h̄(u, δ)) − Q_{1−α}(h̄(u, δ))| ≤ |Ĥ_S(u) − Q̃_{1−α}(h̄(u, δ))| + ‖d_{Q_{1−α}}‖, we have

$$\left|\hat H_S(u)-Q_{1-\alpha}\left(\bar h(u,\delta)\right)\right|<\epsilon_H+\left\|d_{Q_{1-\alpha}}\right\|. \tag{20}$$

Namely, for any ε_H > 0, we have parameters to satisfy (20). Notice that ‖d_{Q_{1−α}}‖ becomes 0 with probability 1 as N → ∞. Besides, for any N, there exist S, a, b, and β to satisfy (20) for any ε_H > 0. Thus, (17) is proven. ∎

Let A and A_N^S denote the sets of optimal solutions for problems (1) and (15), respectively. Notice that problem (6) shares the same set of optimal solutions as problem (1). Besides, denote by J* and J_N^S the optimal values of the cost functions in problems (1) and (15), respectively. The convergences of J_N^S and A_N^S as N and S increase are summarized in the following.

Theorem 2: Suppose that U is compact, the function J(·) is continuous, and Assumptions 1 and 2 hold. Then, J_N^S → J* and D(A_N^S, A) → 0 w.p.1.

Proof (Theorem 2): Due to Assumption 2, the set A is nonempty and there exists u ∈ U such that Q_{1−α}(h̄(u, δ)) < 0. According to Theorem 1, Ĥ_S(u) converges to Q_{1−α}(h̄(u, δ)) < 0, and thus, we can find S_0, a_0, b_0, β_0, and N_0 such that Ĥ_S(u) − ε_H − ‖d_{Q_{1−α}}‖ ≤ 0. (ε_H can be any small value close to 0, and ‖d_{Q_{1−α}}‖ decreases to 0 with probability 1 as N → ∞.) Because Ĥ_S(u) is continuous in u and U is compact, the feasible set of the approximation problem (15) is compact as well. Besides, A_N^S is not empty w.p.1 for all N ≥ N_0 and all S, a, b, β, and N that give a smaller error bound (with higher probability) ε_H + ‖d_{Q_{1−α}}‖ in (20). Here, we denote by Ĥ_{S,0}(u) the model Ĥ_S(u) with parameters {S_0, a_0, b_0, β_0, N_0} and input u. Analogously, Ĥ_{S,k}(u_k) is Ĥ_S(u) with parameters {S_k, a_k, b_k, β_k, N_k} and input u_k. With parameters {S_k, a_k, b_k, β_k, N_k}, the corresponding notations for the optimal solution set and optimal cost value are defined as A_{N,k}^{S,k} and J_{N,k}^{S,k}.

Let {N_k}_{k=1}^∞ ≥ N_0 and {S_k, a_k, b_k, β_k, N_k} be two sequences such that the corresponding error bound in (20) decreases to 0. Let û_k ∈ A_{N,k}^{S,k}, which means that û_k ∈ U, Ĥ_{S,k}(û_k) ≤ 0, and J_{N,k}^{S,k} = J(û_k). Let û ∈ U be any cluster point of {û_k}_{k=1}^∞. Define {û_l}_{l=1}^∞ a subsequence converging to û. Since Ĥ_{S,l}(u) defined by {S_l, a_l, b_l, β_l, N_l} is continuous and converges uniformly to Q_{1−α}(h̄(u, δ)) on U w.p.1, we have that Q_{1−α}(h̄(û, δ)) = lim_{l→∞} Ĥ_{S,l}(û_l) w.p.1. Therefore, Q_{1−α}(h̄(û, δ)) ≤ 0, û is feasible for the true problem, and J(û) ≥ J*. Moreover, J(û_l) → J(û) w.p.1, which means that lim_{l→∞} J_{N,l}^{S,l} ≥ J*. The above is true for any cluster point of {û_k}_{k=1}^∞ in the compact set U, and we have

$$\liminf_{k\to\infty} J_{N,k}^{S,k}\ge J^{*},\quad \text{w.p.1.} \tag{21}$$

Now, by Assumption 2 and Remark 2, there exist an optimal solution u* and a sequence {û_l}_{l=1}^∞ converging to u* with Q_{1−α}(h̄(û_l, δ)) < 0. Note that Ĥ_{S,l}(û_l) converges to Q_{1−α}(h̄(û_l, δ)) w.p.1, and thus, there exists K(l) such that Ĥ_{S,k}(û_l) ≤ 0 for every k ≥ K(l) and every l, w.p.1. Assume that K(l) < K(l + 1) for every l without loss of generality and define the sequence {ū_k}_{k=K(1)}^∞ by setting ū_k = û_l for all k and l with K(l) ≤ k < K(l + 1). We then have Ĥ_{S,k}(ū_k) ≤ 0, which implies that J_{N,k}^{S,k} ≤ J(ū_k) for all k ≥ K(1). Since J is continuous and ū_k also converges to u*, we have

$$\limsup_{k\to\infty} J_{N,k}^{S,k}\le J\left(u^{*}\right)=J^{*},\quad \text{w.p.1.} \tag{22}$$

Thus, J_{N,k}^{S,k} → J(u*) w.p.1 when k → ∞. Namely, J_N^S → J*.

For the proof of D(A_N^S, A) → 0 w.p.1, we refer to [18, Th. 5.3]. ∎

For the finite-sample feasibility analysis of the approximation problem solutions, we will make use of Hoeffding's inequality [18], [19].

Theorem 3: Denote by Z_1, ..., Z_N independent random variables with bounded sample spaces, namely Pr{Z_i ∈ [z_{i,min}, z_{i,max}]} = 1, ∀i ∈ {1, ..., N}. Then, if s > 0

$$\Pr\left\{\sum_{i=1}^{N}\left(Z_i-\mathbb{E}\{Z_i\}\right)\ge sN\right\}\le \exp\!\left(-\frac{2N^{2}s^{2}}{\sum_{i=1}^{N}\left(z_{i,\max}-z_{i,\min}\right)^{2}}\right). \tag{23}$$

Based on Hoeffding's inequality, the probabilistic feasibility guarantee of the sample approximation method is proven in [9] and summarized here as follows.

Theorem 4: Let u ∈ U be such that u ∉ U_f. Then,

$$\Pr\left\{\tilde F_N(-\gamma,u)\ge 1-\beta\right\}\le e^{-2N\tau_u^{2}} \tag{24}$$

where γ > 0, β ∈ [0, α], and τ_u > 0 is written as

$$\tau_u=F(0,u)-F(-\gamma,u)+(\alpha-\beta). \tag{25}$$

Theorem 4 shows an important property of approximating the cumulative probability function by samples. Set γ ≈ 0 and β ≈ α; then, as the sample number goes to ∞, Pr{F̃_N(−γ, u) ≥ 1 − β} goes to 0. Namely, with more samples, the solution of the sample-based neural approximation problem satisfies the original probabilistic constraint with higher probability.
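The practical reading of Theorems 3 and 4 is that the empirical constraint probability computed from samples concentrates around the true one at an exponential rate in N. A hedged sketch of such a posterior feasibility check for a candidate solution is given below; the function names and the tolerance t are illustrative assumptions, and this is a sanity check rather than the formal certificate of Theorem 4 itself.

```python
import numpy as np

def posterior_feasibility(h_bar, u, delta_validation, alpha, t=0.01):
    """Estimate Pr{h_bar(u, delta) <= 0} on an independent validation sample and
    report the Hoeffding-type tail exp(-2*N*t**2) for an overestimate of size t."""
    vals = np.array([h_bar(u, d) for d in delta_validation], dtype=float)
    N = vals.size
    empirical = float(np.mean(vals <= 0.0))       # empirical F(0, u)
    tail = float(np.exp(-2.0 * N * t ** 2))       # bound on Pr{empirical - true >= t}
    return empirical >= 1.0 - alpha, empirical, tail
```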
IV. PROPOSED ALGORITHMS FOR PCPs

Two algorithms are used to solve PCPs. First, a sample-based algorithm is summarized to train the deterministic constraints, which have a neural network form. Then, a simulated annealing algorithm is summarized to solve the deterministic program obtained by the neural approximation approach.


A. Algorithm for Training Ĥ_S(u)

Following the discussion in Section III, the algorithm for training Ĥ_S(u) is summarized as follows (a sketch of an implementation is given after the list).
1) Generate the sample set of the uncertain parameter vector Δ_N = {δ_1, ..., δ_N} by extracting samples independently from the sample space Δ according to the identical distribution p_δ(δ).
2) Generate the sample set of the decision variable U_K = {u_1, ..., u_K} by extracting samples independently from the feasible domain U according to the uniform distribution.
3) For all u_k ∈ U_K, calculate the (1 − α)-empirical quantile at u_k, Q̃_{1−α}(h̄(u_k, δ)), using (9) and obtain the output sequence Y_K = {Q̃_{1−α}(h̄(u_1, δ)), ..., Q̃_{1−α}(h̄(u_K, δ))}.
4) Train β, a, and b in (12) based on U_K and Y_K by the extreme learning machine (ELM) algorithm introduced in [20].
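A compact sketch of steps 1)-4) is shown below. Following the usual ELM recipe of [20], the hidden-layer parameters a and b are drawn at random and only the output weights β are fitted by least squares; the sampling ranges and names are illustrative assumptions rather than settings reported in the article.

```python
import numpy as np

def train_quantile_network(u_samples, h_bar, delta_samples, alpha, S,
                           rng=np.random.default_rng(0)):
    """Steps 1)-4): build empirical-quantile targets, then fit eq. (12) ELM-style."""
    U = np.atleast_2d(np.asarray(u_samples, dtype=float))   # (K, k) decision samples
    # step 3): empirical (1 - alpha)-quantile target for every u_k, via eq. (10)
    Y = []
    for u in U:
        vals = np.sort([h_bar(u, d) for d in delta_samples])
        M = int(np.ceil((1 - alpha) * len(vals)))
        Y.append(vals[M - 1])
    Y = np.asarray(Y)
    # step 4): random hidden layer, least-squares output weights (ELM)
    k = U.shape[1]
    a = rng.uniform(-1.0, 1.0, size=(S, k))
    b = rng.uniform(-1.0, 1.0, size=S)
    G = 1.0 / (1.0 + np.exp(-(U @ a.T) + b))                 # (K, S) hidden outputs
    beta, *_ = np.linalg.lstsq(G, Y, rcond=None)
    return a, b, beta
```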
B. Simulated Annealing Algorithm for Approximate Program

The simulated annealing algorithm has been widely used to address global optimization with constraints in many fields [21]–[25] and has also been proved to be practical for nonconvex optimization [26], [27]. After using the neural approximation approach, we obtain the original problem with deterministic constraints. The deterministic constraints have the neural network formulation. The original edition of the simulated annealing algorithm is not presented here; it can be found in [22] and [26]. After a minor revision of the original algorithm, it can be used to solve the deterministic constrained problem. The algorithm is summarized as follows, and a compact sketch of the loop is given after the steps.

1) Initialize a temperature T_0 and a decision value u* and calculate the cost value as

$$E^{*}=J\left(u^{*}\right)+C\left(\hat H_S\left(u^{*}\right)\right) \tag{26}$$

where

$$C\left(u^{*}\right)=\begin{cases}0, & \text{if } \hat H_S\left(u^{*}\right)<0\\[2pt] \bar C, & \text{if } \hat H_S\left(u^{*}\right)\ge 0\end{cases} \tag{27}$$

and C̄ should be chosen as a very large value.
2) For iterations m = 1, 2, ..., M, do the following steps iteratively.
   a) Randomly select another point u_m ∈ V_r(u*) = {u_m ∈ U | ‖u_m − u*‖ ≤ r}, where V_r(u*) is a neighborhood of the previous point and r is the radius. Calculate the corresponding cost value E_m similarly to (26) and (27).
   b) Calculate
   $$\Delta E_m=E_m-E^{*}. \tag{28}$$
   c) Move to the new point by setting u* = u_m if a random variable μ distributed uniformly over (0, 1) satisfies
   $$\mu\le e^{-\frac{\Delta E_m}{T_{m-1}}} \tag{29}$$
   or equivalently
   $$\Delta E_m\le -T_{m-1}\log\mu. \tag{30}$$
   d) Update the temperature as
   $$T_m=\begin{cases}T_{m-1}, & \text{if (29) holds}\\[2pt] \rho\cdot T_{m-1}, & \text{if (29) does not hold}\end{cases} \tag{31}$$
   where ρ ∈ (T_min/T_0, 1).
   e) Terminate the algorithm if T_m < T_min, where T_min is the lower boundary for the temperature, or if m = M.
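A minimal sketch of the loop in steps 1) and 2) follows; the neighborhood sampling, the penalty rule of (27), and the stopping logic are written out directly, with parameter values left to the caller as assumptions.

```python
import numpy as np

def simulated_annealing(J, H_S, u0, r, T0, T_min, rho, M, C_bar=1e8,
                        rng=np.random.default_rng(0)):
    """Revised simulated annealing of Section IV-B, eqs. (26)-(31)."""
    def E(u):                                     # eq. (26) with penalty (27)
        return J(u) + (0.0 if H_S(u) < 0.0 else C_bar)

    u_star = np.asarray(u0, dtype=float)
    E_star, T = E(u_star), T0
    for m in range(1, M + 1):
        u_m = u_star + rng.uniform(-r, r, size=u_star.shape)   # point in V_r(u*)
        dE = E(u_m) - E_star                                    # eq. (28)
        if dE <= -T * np.log(rng.uniform()):                    # eqs. (29)-(30)
            u_star, E_star = u_m, E(u_m)                        # accept, keep T
        else:
            T = rho * T                                         # cool down, eq. (31)
        if T < T_min:
            break
    return u_star, E_star
```

In the wind-power application of Section V, J and H_S would be the cost of (39) and the trained constraint network, respectively.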
V. APPLICATION TO IPM OF WIND POWER

A. Wind-Turbine Power Curve and Dataset

The predictor model of wind power is used to predict the wind power based on wind speed measurements [28]. Fig. 2 shows the power curve constructed from the industrial data. The data come from a large wind farm located in Jiugongshan, Hubei, China. The dataset was collected at turbines at a sampling interval of 10 min. In total, 56 618 data points were collected from March 17, 2009, to April 17, 2009. The unit of the active wind power is kW, and the value of power is normalized for an air density of 1.18 kg/m³.

Fig. 2. Power curve of an industrial wind turbine.

Piecewise models are often used to improve the prediction accuracy [29], [30]. The wind speed range is discretized into intervals, and the corresponding wind speed and power data make the partitions. Supposing that the cut-out speed is v_co, the speed range [0, v_co] is divided into N_w equal-length intervals. The data points in each interval are defined as

$$D_i=\left\{(v,p(v))\in\mathbb{R}^{2}\ \middle|\ v\in[v_i,v_{i+1})\right\},\quad i=1,\ldots,N_w \tag{32}$$

where D_i represents the point set of the ith partition, v represents the wind speed, p(v) is the corresponding wind power, and v_i = ((i − 1)/N_w)v_co is the demarcation speed between the ith and (i − 1)th partitions. In this study, v_co is chosen as 20, and N_w is 10, as shown in Fig. 2. Besides, we do not establish models for all partitions in this study. As an example to demonstrate the proposed method, the data in partitions 4 and 5 are used as a unity, which are marked as red circles in Fig. 2. Notice that some under-power points or stopping points are wiped out by the data preprocessing method introduced in [28, Sec. III]. For convenience, denote the used dataset as D.
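The partitioning of (32) is a simple binning of the measured (speed, power) pairs; a sketch under the stated choices v_co = 20 and N_w = 10 follows, with the array names as assumptions.

```python
import numpy as np

def partition_by_speed(v, p, v_co=20.0, Nw=10):
    """Split (wind speed, power) samples into the Nw equal-width bins D_i of eq. (32)."""
    v, p = np.asarray(v, dtype=float), np.asarray(p, dtype=float)
    edges = np.linspace(0.0, v_co, Nw + 1)
    idx = np.digitize(v, edges[1:-1])             # bin index 0..Nw-1 for each sample
    return [(v[idx == i], p[idx == i]) for i in range(Nw)]

# e.g., partitions 4 and 5 merged into the working dataset D:
# D4, D5 = partition_by_speed(v_all, p_all)[3:5]
```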


B. IPM of Wind Power

Consider a system with the mechanism or model from the input to the output written as

$$y=f\left(\varphi,\delta_y\right) \tag{33}$$

where y ∈ R denotes the system output and φ ∈ R^{n_φ} is a regression vector containing input variables u ∈ R^{n_u} or decision variables on which the system output y depends. Besides, δ_y denotes the uncertain parameter vector and δ_y ∈ Δ_y ⊆ R^{n_{δ_y}}. Besides, assume that the probability measure is well defined on the sigma algebra of Δ_y, B_y. Due to the existence of δ_y, for a fixed φ, y is supposed to be located in a sample space with a well-defined sigma algebra and probability measure. Let Y(φ) be the sample space and p_y(y|φ) be the probability density function conditioned on φ. Standard predictor models can only give a specific value of y for φ. However, the probabilistic interval of y conditioned on φ is important for various applications.

Differently from standard predictor models, IPMs return a prediction interval instead of a single prediction value [31]. As defined in [31], an IPM is a set-valued map

$$I:\varphi\to I(\varphi)\subseteq\mathcal{Y}\subset\mathbb{R} \tag{34}$$

where φ ∈ R^{n_φ} is a regression vector containing input variables u ∈ R^{n_u} or decision variables on which the system output y depends. Given an observed φ, I(φ) is an informative interval, or prediction interval, containing y with a given guaranteed probability 1 − α, α ∈ [0, 1). Output intervals are obtained by considering the span of parametric families of functions. Here, we consider the family of linear regression functions defined by

$$\mathcal{M}=\left\{y=\theta^{T}\varphi+e,\ \theta\in\Theta\subseteq\mathbb{R}^{n_\theta},\ |e|\le\varepsilon\in\mathbb{R}\right\}. \tag{35}$$

Then, a parametric IPM is obtained by associating to each φ the set of all possible outputs given by θ^Tφ + e as θ varies over Θ and e varies over E = {e ∈ R | |e| ≤ ε}, which is defined as

$$I(\varphi)=\left\{y:\ y=\theta^{T}\varphi+e,\ \forall\theta\in\Theta,\ \forall e\in E\right\}. \tag{36}$$

A possible choice for the set Θ is a ball with center c and radius r

$$\Theta=\mathcal{B}_{c,r}=\left\{\theta\in\mathbb{R}^{n_\theta}:\ \|\theta-c\|\le r\right\}. \tag{37}$$

Then, the interval output of the IPM can be explicitly computed as

$$I(\varphi)=\left[c^{T}\varphi-\left(r\|\varphi\|+\varepsilon\right),\ c^{T}\varphi+\left(r\|\varphi\|+\varepsilon\right)\right]. \tag{38}$$
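The closed-form interval of (38) is straightforward to evaluate; the helper below assumes the quadratic regressor φ = [v, v²]ᵀ that is used next for the wind-power IPM.

```python
import numpy as np

def ipm_interval(v, c, r, eps):
    """Interval output of eq. (38) for the regressor phi = [v, v^2]^T."""
    phi = np.array([v, v ** 2], dtype=float)
    center = float(c @ phi)                       # c^T phi
    half = r * float(np.linalg.norm(phi)) + eps   # r*||phi|| + eps
    return center - half, center + half
```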
We compare the proposed method with scenario approach,
For the wind power prediction problem, the input is the wind speed and the output is the wind power, so u = v and y = p. Besides, φ is chosen as φ = [u u²]ᵀ. Then, φ = [v v²]ᵀ in the wind power prediction problem. Identifying the IPM is essentially identifying (c, r, ε). Without loss of generality, we use I_(c,r,ε)(v) for I(φ) in the latter part for the IPM identification problem of wind power prediction.

Due to the above discussions, the IPM identification problem of wind power prediction is to find the IPM I_(c,r,ε)(v) (or find (c, r, ε)) such that

$$\min_{(c,r,\varepsilon)}\ wr+\varepsilon \tag{39}$$

$$\text{subject to}\quad r,\varepsilon\ge 0,\quad \Pr\left\{p\in I_{(c,r,\varepsilon)}(v)\right\}\ge 1-\alpha,\ \alpha\in[0,1). \tag{40}$$

Notice that w is a weight and is often chosen as w = E{‖φ‖} [31]. In this study, w is chosen as 20.
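Problem (39) and (40) is exactly a PCP of the form (1): the decision variable is u = (c, r, ε), the uncertainty is δ = (v, p), and membership of p in the interval can be expressed as a single scalar constraint. The sketch below makes that mapping explicit; the flattened parameterization and function names are illustrative assumptions.

```python
import numpy as np

def h_bar_ipm(u, delta):
    """Scalar constraint for (39)-(40): h_bar <= 0 iff p lies in I_(c,r,eps)(v).
    u = (c1, c2, r, eps) stacks the decision variables; delta = (v, p) is one sample."""
    c1, c2, r, eps = u
    v, p = delta
    phi = np.array([v, v ** 2], dtype=float)
    return abs(p - (c1 * v + c2 * v ** 2)) - (r * float(np.linalg.norm(phi)) + eps)

def cost_ipm(u, w=20.0):
    """Objective w*r + eps of eq. (39), with w = 20 as in this study."""
    return w * u[2] + u[3]
```

With h̄ and the cost in this form, the training algorithm of Section IV-A and the simulated annealing loop of Section IV-B can be applied without modification.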
C. Results and Discussion

Fig. 3 shows one example of the estimation results of the quantile function by using neural networks; 5000 different combinations of (c, r, ε) are considered. The dataset was divided into a train set and a test set. The blue circles show the quantile function value given by the trained fitting model. The red dotted line shows the empirical quantile function value calculated by using the data of the test set. The mean value of the error is −0.0421 and the mean of the absolute value of the error is 12.5882.

Fig. 3. Estimation results of quantile function by using neural networks.

We compare the proposed method with the scenario approach proposed in [7]. The required α is set as 0.05. Figs. 4 and 5 show the results of interval predictions by the scenario approach and the proposed method, respectively. The number of used data samples is 150, 750, or 7500. As the number of samples increases, the scenario approach gives more conservative results, while the proposed method is more robust with respect to the sample number.

Fig. 4. Results of interval predictions by scenario approach using different choices of sample number.


Fig. 5. Results of interval predictions by neural approximation approach using different choices of sample number. The limitation of violation probability is α = 0.05.

TABLE I: Probability That Constraint Failure Probability α > 0.05: Pr{α > 0.05}.

TABLE II: Mean Value of Cost Function.

A more comprehensive validation was conducted through a posterior Monte Carlo analysis. In the validation, the number of data samples was increased from 150 to 7500. For each number of data samples, 500 rounds of sampling and calculation of (c, r, ε) were performed. For instance, if the number of data samples is set as 150, then 500 extractions of 150 samples of (v, p) from the training set are carried out, and for each extracted set of 150 samples the corresponding (c, r, ε) is calculated by the scenario approach or the proposed method. The statistical analysis results are summarized in Tables I and II. Apparently, the proposed method can achieve a better tradeoff between the probability constraint and the cost value.
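The validation procedure can be summarized as the following sketch, where solve_ipm stands for either identification method (the scenario approach or the proposed one) and is assumed rather than defined here; the two returned statistics correspond to the quantities reported in Tables I and II.

```python
import numpy as np

def posterior_validation(solve_ipm, train, test, N, trials=500, alpha=0.05, w=20.0,
                         rng=np.random.default_rng(0)):
    """Resample N training points, re-identify (c, r, eps), and measure the
    constraint-failure frequency Pr{alpha_hat > 0.05} and the mean cost on the test set."""
    v_tr, p_tr = train
    v_te, p_te = test
    phi = np.stack([v_te, v_te ** 2])                  # (2, n_test) regressors
    failures, costs = 0, []
    for _ in range(trials):
        pick = rng.choice(v_tr.size, size=N, replace=False)
        c, r, eps = solve_ipm(v_tr[pick], p_tr[pick])
        half = r * np.linalg.norm(phi, axis=0) + eps   # interval half-width, eq. (38)
        violation = np.mean(np.abs(p_te - c @ phi) > half)
        failures += violation > alpha
        costs.append(w * r + eps)
    return failures / trials, float(np.mean(costs))
```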
VI. CONCLUSION

In this article, a neural approximation-based method for solving PCPs has been proposed. The statistical guarantees of the proposed method are discussed. Furthermore, an IPM of wind power is investigated with experimental data to validate the proposed method. In the validation, the proposed method is compared with the scenario approach. The results show that the proposed method performs better in the tradeoff between satisfying the probabilistic constraints and minimizing the cost.

REFERENCES

[1] A. Charnes and W. W. Cooper, "Chance constrained programming," Manage. Sci., vol. 6, no. 1, pp. 73–79, 1959.
[2] M. C. Campi and S. Garatti, Introduction to the Scenario Approach (MOS-SIAM Series on Optimization). Philadelphia, PA, USA: SIAM, 2019.
[3] X. Shen, Y. Zhang, T. Shen, and C. Khajorntraidet, "Spark advance self-optimization with knock probability threshold for lean-burn operation mode of SI engine," Energy, vol. 122, pp. 1–10, Mar. 2017.
[4] X. Shen, T. Ouyang, Y. Zhang, and X. Zhang, "Computing probabilistic bounds on state trajectories for uncertain systems," IEEE Trans. Emerg. Topics Comput. Intell., early access, Sep. 3, 2020, doi: 10.1109/TETCI.2020.3019040.
[5] M. Guo and M. M. Zavlanos, "Probabilistic motion planning under temporal tasks and soft constraints," IEEE Trans. Autom. Control, vol. 63, no. 12, pp. 4051–4066, Dec. 2018.
[6] P. Gautam, R. Karki, and P. Piya, "Probabilistic modeling of energy storage to quantify market constrained reliability value to active distribution systems," IEEE Trans. Sustain. Energy, vol. 11, no. 2, pp. 1043–1053, Apr. 2020.
[7] G. Calafiore and M. Campi, "The scenario approach to robust control design," IEEE Trans. Autom. Control, vol. 51, no. 5, pp. 742–753, May 2006.
[8] M. C. Campi and S. Garatti, "A sampling-and-discarding approach to chance-constrained optimization: Feasibility and optimality," J. Optim. Theory Appl., vol. 148, no. 2, pp. 257–280, 2011.
[9] J. Luedtke and S. Ahmed, "A sample approximation approach for optimization with probabilistic constraints," SIAM J. Optim., vol. 19, no. 2, pp. 674–699, Jul. 2008.
[10] A. Peña-Ordieres, J. R. Luedtke, and A. Wächter, "Solving chance-constrained problems via a smooth sample-based nonlinear approximation," SIAM J. Optim., vol. 30, no. 3, pp. 2221–2250, Jan. 2020.
[11] A. Geletu, A. Hoffmann, M. Kloppel, and P. Li, "An inner-outer approximation approach to chance constrained optimization," SIAM J. Optim., vol. 27, no. 3, pp. 1834–1857, 2017.
[12] X. Shen, J. Zhuang, and X. Zhang, "Approximate uncertain program," IEEE Access, vol. 7, pp. 182357–182365, 2019.
[13] G. Steinbrecher and W. T. Shaw, "Quantile mechanics," Eur. J. Appl. Math., vol. 19, no. 2, pp. 87–112, 2008.
[14] A. W. van der Vaart, Asymptotic Statistics (Cambridge Series in Statistical and Probabilistic Mathematics). Cambridge, U.K.: Cambridge Univ. Press, 1998.
[15] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Math. Control, Signals Syst., vol. 2, no. 4, pp. 303–314, 1989.
[16] M. Hassoun, Fundamentals of Artificial Neural Networks. Cambridge, MA, USA: MIT Press, 1995.
[17] R. R. Selmic and F. L. Lewis, "Neural-network approximation of piecewise continuous functions: Application to friction compensation," IEEE Trans. Neural Netw., vol. 13, no. 3, pp. 745–751, May 2002.
[18] A. Shapiro, D. Dentcheva, and A. Ruszczynski, Lectures on Stochastic Programming: Modeling and Theory, 2nd ed. Philadelphia, PA, USA: SIAM, 2014.
[19] W. Hoeffding, "Probability inequalities for sums of bounded random variables," J. Amer. Stat. Assoc., vol. 58, no. 301, pp. 13–30, 1963.
[20] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, nos. 1–3, pp. 489–501, 2006.
[21] S. Kirkpatrick, C. D. Gelatt, and M. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671–680, 1983.
[22] S. P. Brooks and B. J. T. Morgan, "Optimization using simulated annealing," J. Roy. Stat. Soc., Ser. D, Statistician, vol. 44, no. 2, pp. 241–257, 1995.
[23] A. S. Azad, M. Islam, and S. Chakraborty, "A heuristic initialized stochastic memetic algorithm for MDPVRP with interdependent depot operations," IEEE Trans. Cybern., vol. 47, no. 12, pp. 4302–4315, Dec. 2017.
[24] A. Albanna and H. Yousefi'Zadeh, "Congestion minimization of LTE networks: A deep learning approach," IEEE/ACM Trans. Netw., vol. 28, no. 1, pp. 347–359, Feb. 2020.
[25] A. Dekkers and E. Aarts, "Global optimization and simulated annealing," Math. Program., vol. 50, pp. 367–393, Mar. 1991.
[26] H. Szu and R. Hartley, "Fast simulated annealing," Phys. Lett. A, vol. 122, no. 3, pp. 157–162, 1987.
[27] L. Ingber, "Simulated annealing: Practice versus theory," Math. Comput. Model., vol. 18, no. 11, pp. 29–57, 1993.
[28] T. Ouyang, A. Kusiak, and Y. He, "Modeling wind-turbine power curve: A data partitioning and mining approach," Renew. Energy, vol. 102, pp. 1–8, Mar. 2017.
[29] M. Lydia, A. I. Selvakumar, S. S. Kumar, and G. E. P. Kumar, "Advanced algorithms for wind turbine power curve modeling," IEEE Trans. Sustain. Energy, vol. 4, no. 3, pp. 827–835, Jul. 2013.

[30] M. G. Khalfallah and A. M. Koliub, "Suggestions for improving wind turbines power curves," Desalination, vol. 209, nos. 1–3, pp. 221–229, Apr. 2007.
[31] M. C. Campi, G. Calafiore, and S. Garatti, "Interval predictor models: Identification and reliability," Automatica, vol. 45, no. 2, pp. 382–392, Feb. 2009.

Xun Shen (Member, IEEE) was born in Yueyang, China. He received the B.S. degree in electrical engineering from Wuhan University, Wuhan, China, in 2012, and the Ph.D. degree in mechanical engineering from Sophia University, Tokyo, Japan, in 2018. He is currently pursuing the Ph.D. degree with the Department of Statistical Sciences, Graduate University for Advanced Studies, Tokyo.
He has been a Post-Doctoral Research Associate with the Department of Engineering and Applied Sciences, Sophia University, since April 2018. Since June 2019, he has been an Assistant Professor with the Department of Mechanical Systems Engineering, Tokyo University of Agriculture and Technology, Fuchu, Japan. His current research interests include chance constrained optimization, point process, and their industrial applications.

Tinghui Ouyang received the B.S. and Ph.D. degrees in electrical engineering and automation from Wuhan University, Wuhan, Hubei, China, in 2012 and 2017, respectively.
He is currently a Research Scientist at the Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan. Before joining AIST, he was a Research Fellow with Nanyang Technological University (NTU), Singapore, from 2018 to 2019, and a Post-Doctoral Fellow at the University of Alberta, Edmonton, AB, Canada, from 2017 to 2018. His major research interests include computational intelligence, data mining, machine learning, and deep learning.

Nan Yang was born in Xiangyang, China. He received the B.S. degree in electrical engineering from the College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan, China, in 2009, and the Ph.D. degree in electrical engineering from Wuhan University, Wuhan, China, in 2014.
He is currently an Associate Professor with the Department of Hubei Provincial Collaborative Innovation Center for New Energy Microgrid, China Three Gorges University, Yichang, China. His major research interests include optimal operation of power systems, planning of power systems, uncertainty modeling, and artificial intelligence technology.

Jiancang Zhuang received the Ph.D. degree in statistics from the Department of Statistical Science, Graduate University for Advanced Studies, Tokyo, Japan, in 2003.
From 2003 to 2004, he was a Lecturer with The Institute of Statistical Mathematics, Tokyo. From 2004 to 2006, he was a Post-Doctoral Fellow at The Institute of Statistical Mathematics, funded by the Japanese Science Promotion Society. From 2006 to 2007, he was a Post-Doctoral Fellow at the Department of Earth and Space Sciences, University of California at Los Angeles, Los Angeles, CA, USA. In 2007, he returned to The Institute of Statistical Mathematics, where he was an Assistant Professor until 2011. Since 2011, he has been an Associate Professor with The Institute of Statistical Mathematics and the Department of Statistical Sciences, Graduate University for Advanced Studies. His research interests include statistics, probability, and modeling.

