Abstract— Airborne wind energy systems are built to exploit the stronger and more consistent wind prevalent at high altitude. This requires a reliable controller design that keeps the airborne system flying while respecting all operational and safety constraints. A frequent design for such a system includes a flying airfoil tethered to a ground station. Here, we demonstrate an on-line data based method that optimizes the average towing force of such a system in the presence of altitude constraints and varying wind. We utilize Gaussian Processes to learn the mapping from the input to the objective, constraint and state dynamic functions of the system. We then formulate a chance-constrained optimization problem that takes into consideration uncertainty in the learned functions and finds feasible directions for improvement. Simulation studies show that we can find near optimal set points for the controller without the use of significant assumptions on model dynamics while respecting the unknown constraint function. The results demonstrate an improved performance over our previous work, which was restricted to steady state measurements.

I. INTRODUCTION

The power available in a wind stream is proportional to the cube of its speed [1]. This is a major reason behind the continuous increase in the height of modern wind turbines. However, further scaling the support mast incurs considerable costs, can arouse environmental opposition and has certain structural limitations. To circumvent this obstacle a number of commercial and research efforts have been focusing on Airborne Wind Energy (AWE) systems that operate without the need for structural support [2], [3].

We demonstrate here a data based optimization algorithm on an AWE design actively used by a commercial company (Skysails) in large marine vessels to increase their fuel savings [4]. A tethered flexible airfoil is launched from a mounting station at the front of the ship towards the sky, where it performs figure eight loops using a custom made low level controller. In favourable wind conditions the aerodynamic force generated upon the foil is transferred through the tether to the boat and pulls it forward, reducing the effort of its onboard motors. However, such a controller can neither guarantee an optimal trajectory nor that the airfoil will keep flying above an altitude safety threshold.

We have previously addressed this problem in the framework of steady state constrained optimization and have demonstrated that a data based approach can be a potential solution [5]. We were, however, examining stable wind conditions and using training measurements only after reaching steady state (for the system here this implies at least 10 loops with the same inputs applied and fairly constant wind). We now present a more flexible solution where transient measurements (after each loop) are utilised. The general optimization setting is

    max_{x,u}  P(x, u, w)
    s.t.  ẋ = f(x, u, w),    (1)
          0 ≤ G(x, u, w)

where P : R^nx × R^nu × R^nw → R is the objective function, f : R^nx × R^nu × R^nw → R^nx describes the system dynamics and G : R^nx × R^nu × R^nw → R is the constraint function. The variables x ∈ R^nx describe the states of the system and can be indirectly manipulated through the inputs u, which might be bounded by additional constraints in a set X ⊆ R^nu. The variables w represent the exogenous signals (here the wind) on which we have no control. The functions can potentially be nonlinear and non-convex, unknown, and are being actively learned through measurements using the framework of Gaussian Processes (GP) [6]. GPs can be used for regression with relatively few assumptions on the structure of the unknown function and have been extensively used in unconstrained optimization [7], [8].

The challenge arises in the presence of constraints that need to be learned as well. A first approach to this problem, but with known constraints, was given in [9], while [10] assumes that the outputs of the objective and constraint functions are dependent. The authors in [11] use an augmented Lagrangian approach to transform the problem to an unconstrained one. We follow here an approach where we also learn the system dynamics and make predictions about its future states, which incur additional reachability type constraints. We use simulation results to demonstrate that the AWE system can be both optimized and remain adaptive to wind variation while respecting the altitude safety thresholds.

Section II describes the AWE system, section III gives an overview of GPs for optimization, section IV introduces the proposed algorithm, section V presents simulation results, while section VI provides a conclusion and discusses future perspectives.

1 Sanket Sanjay Diwale, Dr. Ioannis Lymperopoulos and Prof. Colin N. Jones are with the Automatic Control Laboratory at the Department of Mechanical Engineering, EPFL, Lausanne 1015, Switzerland (sanket.diwale@epfl.ch, ioannis.lymperopoulos@epfl.ch, colin.jones@epfl.ch).

II. SYSTEM DESCRIPTION

Skysails has developed a commercial system to assist the propulsion of large maritime vessels. The system consists of a flying foil, a control pod applying deflections to the flexible
[Figure 1 plot: yaw reference and tracking response vs. time (arb. units), angles in radians.]
Fig. 1: Low level tracking control for changing set points. The yaw (ψ, kite orientation) controller tracks the reference signal, which changes value when a crossing occurs. The positional states (θ and φ) are affected by ψ.

[Figure 2 plot: kite trajectory in the (φ, θ) plane, coordinates in degrees, with switching surfaces φ1–φ4.]
Fig. 2: Example of kite trajectory (blue) and switching surfaces (red) for the Skysails controller in the spherical coordinate system. Arrows denote the direction of crossing for which a switching surface is active.
surface and a mounting unit that transfers the tension from the tether to the boat.

A. System Dynamics

The Skysails model neglects the mass in the dynamics (a reasonable assumption for large counteracting aerodynamic forces) and substitutes the sagging tether by a rigid rod (acceptable while the tether remains under substantial tension). The equations of motion describing the system are

    ϑ̇ = (wE/L) cos ϑ cos ψ − (w/L) sin ϑ    (2)
    ϕ̇ = −(wE/L) (cos ϑ / sin ϑ) sin ψ    (3)
    ψ̇ = wEg cos(ϑ) δ + ϕ̇ cos ϑ.    (4)

The spherical coordinates (ϑ, ϕ) and orientation (ψ) of the airfoil represent the states of the system. The exogenous signal is the wind speed (w). Note that the system coordinates are defined in such a way that one of the horizontal axes remains always aligned to the wind direction. The main system parameters are the airfoil glide ratio (E), the deflection coefficient (g) and the tether length (L). Finally, δ is the deflection applied to the kite affecting its orientation.

The system exhibits nonlinear behaviour even with considerable simplification of the aerodynamics, and it is difficult to find closed form expressions for the functions of interest. Numerical optimization results using model based approaches, while useful, would be challenged in realistic situations where the model mismatch and wind conditions are unknown.

B. Skysails Control

A simple but robust low level controller developed by Skysails is used for tracking "figure eight" loops. The controller uses a cascaded control scheme where a given set-point on the kite orientation (yaw angle, ψ) is tracked by applying deflections to the kite, see Figure 2. This set-point changes during every single loop at predefined switching positions. The set point values at each corresponding switch represent the high level decision variables employed by our optimization algorithms.

When the kite crosses a particular section the yaw set point is changed to a predefined level such that the kite flies a stable loop. The kite is considered to be crossing the ith switching surface when:

    ϕ − ϕi = 0  and  k ϕ̇ > 0,  k = 1 or −1.    (5)

Each surface is defined by a constant ϕi giving its position in the ϕ space and a constant k taking values 1 or −1 (to define the orientation of the crossing). Each surface considers crossings only in one direction and ignores crossings in the other direction. A depiction can be seen in Figure 2. A detailed analysis can be found in [12], [13].

III. GAUSSIAN PROCESSES FOR CONSTRAINED OPTIMIZATION

Gaussian processes are an extension of the multivariate Gaussian distribution to infinite dimensional spaces, where any finite combination of outputs is jointly Gaussian. They are used in supervised learning, mainly for regression from a Bayesian perspective. A GP essentially describes a distribution in the function space. GPs are more flexible than parametric models and, due to Gaussianity assumptions, can make predictions, incorporate new measurements and quantify uncertainty using mostly closed-form expressions [14].

A. Gaussian Processes - Regression

A GP is fully specified by its mean m(x) and covariance (or kernel) function k(x, x′). For a finite set of N training data points D = {xi, yi}i=1:N generated by an unknown function h : R^d → R, the GP assumes a multivariate distribution

    y1:N ∼ N(m1:N, K(x1:N, x1:N))    (6)

where y1:N = h(x1:N) and m1:N is compact notation for m(x1:N). For any point (or collection of points) x∗, we can
analytically predict the output of the learned function y∗ = h(x∗) as a conditional distribution

    y∗ | D, x∗ ∼ N(µ(x∗|D), σ(x∗|D)),    (7)

where

    µ(x∗|D) = m∗ + k(x∗, x1:N) K1:N⁻¹ (y1:N − m1:N)

and

    σ(x∗|D) = k(x∗, x∗) − k(x∗, x1:N) K1:N⁻¹ k(x1:N, x∗),

with compact notation K1:N = K(x1:N, x1:N).

The element that encodes our assumptions on the shape and properties of the learned function is the kernel. We use here an ARD (Automatic Relevance Determination) Squared Exponential (SE) kernel, defined as

    kSE(x, x′) = σy² exp( −(x − x′)ᵀ Λ⁻¹ (x − x′) / 2 ) + σn² δii′,

where Λ = diag(λ1, ..., λd) and δii′ is Kronecker's delta (= 1 iff i = i′, 0 otherwise). The parameters η = {σy, λ1:d, σn} (usually called hyperparameters in the GP terminology) are learned from the measurements D. This kernel includes measurement uncertainty (σn). We select as appropriate hyperparameters η those that maximize the log marginal likelihood for the observed data (for many cases this function is nonconvex and presents an optimization challenge). The selection of hyperparameters is essentially the learning phase that allows for the construction of a covariance matrix including any point of interest and the already sampled points. The function output at this point of interest can then be easily calculated in closed form using (7).

B. Gaussian Processes - Optimization

Instead of directly optimizing over the mean of the learned function, we follow a methodology where an auxiliary acquisition function indicates the next sampling point and gradually progresses towards the optimum while also learning the unknown function [15]. This auxiliary function is called Expected Improvement (EI) and quantifies the expected magnitude of improvement over the current best value ymax = max y1:N+K (where N are the initial training points and K the additional points sampled during the process). This function can be analytically derived for any given point x in the search space as

    EI(x) = σ(x) [v Φ(v) + φ(v)],    (8)

where v = (y(x) − ymax)/σ(x), σ(x) is the variance as predicted by the GP and Φ(·), φ(·) are the cumulative distribution and probability density functions of the normal distribution. EI promotes points with high variance and large mean, and allows for sampling in locations where the mean is low but there exists substantial potential for improvement.

IV. LEARNING AND OPTIMIZATION WITH TRANSIENT MEASUREMENTS

Before proceeding to the main contribution of the paper, where we accommodate the use of transient measurements, we review our previous approach, which uses only steady state measurements.

A. Steady State Optimization

For this setting we assume that the decision variables (u) are repeatedly applied until the system settles in a stationary orbit. This way the state (x) of the system does not affect the objective (P) or the constraint function (G), which now depend only on the input applied. Thus, we solve the problem in the static optimization setting (no dynamics learned) with a chance-constrained formulation

    max_u  EIP(u)
    s.t.  P(GPG(u) ≥ 0) ≥ 1 − β    (9)
          σG(u) ≤ ασn,

where GPG is a GP approximating the constraint function and EIP is the expected improvement for GPP (approximating the objective function). The constraint function is replaced with an auxiliary statement requiring that the probability of satisfying the constraint at a sample point u is larger than (1 − β). By setting β appropriately we can adjust the conservativeness of our approach. Due to the use of GPs we can easily derive this probability at any point u.

The second constraint restricts the search only to locations where the variance (σG) of GPG is below a threshold proportional to the measurement uncertainty (ασn). Large variance implies that this part of the domain has not been adequately sampled and usually the estimates are not accurate even in a probabilistic setting (GPs, like most regression methods, have difficulties in extrapolating). This reduces the convergence rate but does not allow significant constraint violations. The full procedure can be seen in Algorithm 1.

Algorithm 1 Steady State Optimization
1: Initialization. Choose N points in the feasible set (forming D)
2: Training. Learn the hyper-parameters for GPG and GPP
3: Optimization. Find Pmax among the sampled points
4:   - Calculate EIP using Pmax for the objective function
5:   - Calculate the Chance Constraint on GPG
6:   - Calculate the Variance Constraint on GPG
7: Next Point. Select the point ũ = arg max EIP that satisfies all constraints
8: if P(ũ) ≥ Pmax and max EIP ≤ tolerance then
9:   Terminate and use u∗ = ũ
10: else
11:   Add (ũ, P(ũ)) in D and Go To 2
12: end if

B. Optimization with Transient Measurements

We now solve the optimization problem, as described in the general setting (1), that incorporates system dynamics. We define the discrete dynamics as the change in the system state after one complete loop. A loop is complete when the kite crosses a particular switching surface (described in subsection II-B). We assume that the evolution of the state
can be directly observed within some measurement error. The system dynamics can then be represented as

    xn+1 = f(xn, un, wn)    (10)

where (xn, wn) are the state of the system and the exogenous signal (here the wind) at the beginning of the nth loop, un is the control sequence throughout the loop and xn+1 is the state at the end of the loop. We also measure the objective P(xn, un, wn) and the constraint G(xn, un, wn) function outputs over a single loop.

Since we can only observe the outcome of applying un at a specific (xn, wn) and have no model for f(·), P(·) and G(·), we learn them as GP regression models, as described in subsection III-A, and denote them GPf, GPP and GPG respectively.

For this setting the formulation of the optimization problem becomes

    max_{xs,us}  EIP(xs, us, wn)    (11a)
    s.t.
        x̂n+1 ∼ GPf(xn, us, wn)    (11b)
        Ĝ ∼ GPG(xn, us, wn)    (11c)
        P( ‖xs − x̂n+1‖ < εs | xn = xs ) > 1 − βs    (11d)
        P( ‖xs − x̂n+1‖ < εr ) > 1 − βr    (11e)
        P( Ĝ > 0 | xn = xs ) > 1 − βGs    (11f)
        P( Ĝ > 0 ) > 1 − βGt    (11g)
        σf(x0, us, wn) < ασnf  for x0 = xs, xn    (11h)
        σG(x0, us, wn) < ασnG  for x0 = xs, xn    (11i)

    Pbest = max_{xs,us} µP(xs, us, wn)    (12)
    s.t.  (11b), (11c), (11d), (11e), (11f), (11g), (11h), (11i)

The objective function, in (11a), is the EI for GPP (here P is the average force for a single loop). We call the optimum Expected Improvement obtained from (11) EIP∗ and the corresponding arguments x∗s, u∗s. The subscript (s) denotes steady state conditions. It is important to note here that we are searching for a steady state solution (xs, us) while the system might be residing in a transient state xn. Furthermore, we can only directly manipulate us, while xs is the steady outcome of that input sequence.

The dynamics of the system are described in a probabilistic setting through GPf, as in (11b). The probabilistic output of the constraint function is given in (11c).

The constraint (11d) accepts only the (xs, us) which approximate a stationary orbit (tolerance εs) with high probability. The chance constraint (11e) restricts the search to steady states xs close to states that are reachable in one loop starting from the current state xn. This is important to maintain consistency in the decisions taken by the algorithm from one iteration to another and to prevent the chosen steady states from continuously jumping in inconsistent directions. The constraints (11f) and (11g) restrict the choice of xs, us to pairs which do not violate the altitude constraint in steady state and in transient respectively.

The bounds on the uncertainty of the GP outputs are described in constraints (11h) and (11i), which act as a trust region constraint. This way we avoid selecting points for which a GP exhibits very large variance. Here, σnf and σnG are the measurement noise of the state (angles ϑ, ψ) and the altitude respectively. The rest of the variables (εs, εr, βs, βr, βGs, βGt, α, with common values ε· = β· = 0.1 and α = 3) act as tuning parameters and govern the aggressiveness of the search.

A fundamental difficulty with the transient approach is that we do not have direct measurements of P for any steady state (xs, us). To evaluate EIP, however, we need a Pbest over which all the possible next sampling points are to be compared. We choose as Pbest, using (12), the point that maximises the mean prediction of the average force by GPP. With this formulation, Pbest plays the role of ymax in (8). The corresponding input is called ubest.

The complete method can be seen in Algorithm 2. Note that here there is no terminating condition, since the exogenous signal might be constantly changing and the system seldom resides in steady state.

Algorithm 2 Transient Optimization
1: Initialization. Choose N set-points (u) in the feasible set and observe the transients (for Objective, Constraint, Dynamics), forming an initial D
2: Training. Learn hyper-parameters for GPG, GPP, GPf
3: Optimization. Find Pbest, ubest from (12) for the current conditions (xn, wn)
     - Calculate EIP∗ and u∗s using Pbest for the objective function by solving (11)
4: Next Point. Select the next set-point ũn as follows:
5: if EIP∗(xs, us, wn) ≤ EIthreshold and ubest is feasible then
6:   Use ũ = ubest
7: else
8:   Use ũ = u∗s
9: end if
10: Add measured data to D and Go To 2

V. RESULTS

We implement our algorithm in Matlab and the Skysails controller in Simulink. For the GP regression we use the object-oriented Matlab toolbox TacoPig [16]. We use the Skysails model described in II-A (with parameters E = 10, L = 520, g = 1/7) to generate data corrupted with noise. No algorithm is provided with any direct knowledge of this model.

For constant wind conditions, we start both Algorithms 1 and 2 with N = 15 initial training points in the feasible set. This is possible from prior experience of the operator. For Algorithm 1, set points are repeated for 10 loops, until the kite reaches a steady trajectory. On the other hand,
[Figure 3 plots: towing force and kite altitude vs. iterations.]
Fig. 3: Convergence of Algorithm 1 in constant wind condition and 200 m altitude constraint. Red circles represent measured values and black crosses represent Pmax (predicted).

[Figure 4 plots: towing force, kite altitude and wind speed (m/s) vs. iterations.]
Fig. 4: Maximum objective tracking with Algorithm 1 under varying wind conditions.
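The core of the steady-state scheme, Algorithm 1, combines the GP posterior (7), Expected Improvement (8) and the chance constraint of (9). The following is an illustrative sketch in Python/NumPy, not the authors' Matlab/TacoPig implementation: the one-dimensional objective P, constraint G, grid search, and all kernel hyperparameters are assumed values chosen only for the demo.

```python
import numpy as np

def sq_exp_kernel(A, B, sigma_y=1.0, lam=0.3):
    """SE kernel, eq. for kSE with a single lengthscale in this 1-D sketch."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return sigma_y**2 * np.exp(-d2 / (2 * lam**2))

def gp_posterior(X, y, Xs, sigma_n=0.05):
    """Posterior mean and standard deviation at test points Xs, cf. eq. (7)."""
    K = sq_exp_kernel(X, X) + sigma_n**2 * np.eye(len(X))
    Ks = sq_exp_kernel(X, Xs)
    Kss = sq_exp_kernel(Xs, Xs)
    alpha = np.linalg.solve(K, y)           # K^{-1} y (zero prior mean assumed)
    mu = Ks.T @ alpha
    var = np.diag(Kss - Ks.T @ np.linalg.solve(K, Ks))
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, y_max):
    """EI(x) = sigma * (v*Phi(v) + phi(v)), v = (mu - y_max)/sigma, cf. eq. (8)."""
    from math import erf
    v = (mu - y_max) / sigma
    Phi = 0.5 * (1 + np.vectorize(erf)(v / np.sqrt(2)))
    phi = np.exp(-v**2 / 2) / np.sqrt(2 * np.pi)
    return sigma * (v * Phi + phi)

# Toy problem (assumed for the demo): maximize P(u) = -(u - 0.6)^2
# subject to G(u) = 0.8 - u >= 0, both observed only at sampled points.
u_train = np.array([0.1, 0.3, 0.5, 0.9])
P_train = -(u_train - 0.6) ** 2
G_train = 0.8 - u_train

u_grid = np.linspace(0.0, 1.0, 201)
muP, sdP = gp_posterior(u_train, P_train, u_grid)
muG, sdG = gp_posterior(u_train, G_train, u_grid)

# Chance constraint of (9): Prob(G(u) >= 0) >= 1 - beta under the GP posterior,
# which for a Gaussian reduces to muG - z * sdG >= 0 with z = Phi^{-1}(1 - beta).
beta = 0.1
z = 1.2816                                  # standard-normal quantile for 0.9
feasible = muG - z * sdG >= 0.0

ei = expected_improvement(muP, sdP, P_train.max())
ei[~feasible] = -np.inf                     # discard infeasible candidates
u_next = u_grid[np.argmax(ei)]              # next set point proposed by the sketch
```

In the paper the acquisition step is a constrained optimization over the set-point space; the grid search above merely makes the mechanics explicit for one decision variable.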