used to compute the flight path of the ornithopter in real time.
I. INTRODUCTION AND RELATED WORK
Authorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on August 30,2022 at 10:12:09 UTC from IEEE Xplore. Restrictions apply
considered a non-linear model for describing the ornithopter behaviour during flapping and gliding maneuvers. Unfortunately, OSPA is only valid for online application in short-distance scenarios. Furthermore, in these scenarios OSPA only minimizes the positional distance, without taking into account the ornithopter velocity and pitch values at the target configuration.

Here, we present a new approach to plan efficient trajectories online for an ornithopter, in both long- and short-distance scenarios. The method uses neural networks and significantly improves the computational time and energy consumption of the most recent trajectory optimization approach. To train the networks, we build a data set of energy-optimized trajectories based on the OSPA algorithm. We design two types of neural networks that can learn an optimal policy from the data set and predict the sequence of states of an energy-efficient trajectory to reach the target.

In summary, our main contributions are the following:
• We create a novel data set for trajectory optimization based on the planner OSPA.
• We design an algorithm using an artificial neural network (ANN) to learn maneuvers and predict energy-optimized trajectories that can be computed online.
• We use an alternative recurrent neural network (RNN) to learn states instead of maneuvers and predict the next state for accurate trajectories.
• We show that our neural network-based optimization algorithms can be used for real-time trajectory optimization both for medium and short distances. Indeed, our methods significantly reduce computational costs while resulting in trajectories comparable to those produced by OSPA.

The remainder of the paper is organized as follows: the problem description is stated in Section II. Afterwards, the created data set is introduced in Section III. The NN-based algorithms are presented in Section IV, while computational analysis and results take place in Section V. Finally, conclusions are outlined in Section VI.

II. PROBLEM DESCRIPTION

Suppose that we have an autonomous ornithopter with a known model of its dynamics. The problem is to plan efficient trajectories online to navigate the ornithopter from a starting point to a target location. Those trajectories have to comply with the ornithopter dynamics, and thus be flyable. We assume the existence of lower-level algorithms that control the ornithopter along the computed trajectory.

A trajectory is a sequence of flight states, where a state describes the ornithopter configuration at a given instant of time. A flight state is given by a tuple s = (x, z, u, w, Θ, q), where x and z are the positional values, u and w are velocity components in the body reference frame, Θ is the pitch angle and q is the pitch angular velocity. Our trajectories are constrained to the XZ plane, as we use a longitudinal motion model for the ornithopter.

The ornithopter reaches a (dynamically feasible) state by performing a specific control maneuver. A control maneuver is a tuple of values m = (δ, f), where δ is the tail deflection, determined by the deflection angle (up and down), and f is the wing flapping, determined by the flapping frequency (a value of zero corresponds to gliding). Thus an optimal trajectory can be built by combining flapping and gliding actions (see [9] for more details).

We assume an open space where the drone flies without obstacles and we consider two scenarios: the mid-range problem, where the X-distance between starting and target points is within a range of 25 to 100 meters, and the landing problem, where the X-distance is within a range of 15 to 25 meters.

The optimization function is the total energy cost (battery) consumed by the ornithopter. Two main metrics will be used in the experiments: the cost and the precision. The cost is given by the energy consumption determined by the maneuvers performed by the ornithopter; it is dominated by the flapping maneuvers. The precision measures how far the final state is from the target in the XZ-plane.

There are some approaches to address this optimization problem but, unfortunately, solving the non-linear system that describes the dynamics of the ornithopter is time consuming and the methods from the literature are not suitable for online use. Thus, the main goal in this paper is to use a neural network to compute (in real time) an energy-efficient trajectory for the ornithopter.

For modelling the longitudinal motion of the ornithopter prototype, the following non-dimensional Newton–Euler equations are used in [9]:

\begin{align}
2M \frac{du}{dt} &= U_b^2 \left[ (C_L + \Lambda C_{Lt}) \sin\alpha + (C_T - C_D - \mathrm{Li} - \Lambda C_{Dt}) \cos\alpha \right] - \sin\theta - 2Mqw \tag{1} \\
2M \frac{dw}{dt} &= U_b^2 \left[ -(C_L + \Lambda C_{Lt}) \cos\alpha + (C_T - C_D - \mathrm{Li} - \Lambda C_{Dt}) \sin\alpha \right] + \cos\theta + 2Mqu \tag{2} \\
\frac{1}{\chi U_b^2} \frac{dq}{dt} &= C_L \cos\alpha - (C_T - C_D) \sin\alpha + L\Lambda \left[ C_{Lt} \cos\alpha + C_{Dt} \sin\alpha \right] - R_{HL} \left[ C_L \sin\alpha + (C_T - C_D) \cos\alpha \right] \tag{3} \\
\frac{d\theta}{dt} &= q, \tag{4}
\end{align}

where $\alpha$ is the angle of attack, defined as $\alpha = \arctan(w/u)$, and $U_b = \sqrt{u^2 + w^2}$ is the modulus of the velocity. $M$, $\chi$, $\Lambda$, $L$ and $R_{HL}$ are characteristic non-dimensional parameters of the UAV. These parameters are obtained by scaling the variables with the characteristic speed, length and time:

\begin{equation}
U_c = \sqrt{\frac{2mg}{\rho S}}, \quad L_c = \frac{c}{2}, \quad t_c = \sqrt{\frac{\rho S c^2}{8mg}}, \tag{5}
\end{equation}

where m is the mass of the UAV, ρ the air density, S the wing surface, c the mean aerodynamic chord and g the gravity
acceleration.

Figure 2 shows the forces acting on the vehicle, as well as the representative variables and reference frames used. In [9], the authors explain how to compute the coefficients for the aerodynamic forces of the wing and the tail.

Fig. 2: Schematics of the ornithopter with the forces acting on it. Axis XZ represents the Earth frame; axis X′Z′ a translation of the Earth frame; and axis XbZb the body frame.

III. DATA SET

As far as we know, there are no data sets available for ornithopter trajectory optimization. Thus, we created a novel data set to train the NNs using the OSPA planner. For the scenarios described before (the mid-range and landing problems), we generate a data set of pairs (s, m) where s is a flight state and m is the corresponding control maneuver.

In the mid-range problem we aim to obtain low-energy trajectories within a precision bounded by 3 meters. For this problem we generate over 400 different trajectories (3053 states) between random initial and target states sampled within the intervals in Table I. From each trajectory, we store the state-control waypoints returned by OSPA; we use 80% to train the models and the rest for validation.

For the landing problem the ornithopter needs accurate trajectories, thus precision is more important than energy. We select trajectories within a precision bounded by 0.25 meters. Also, in order to be ready for a perching maneuver, the final state must have speed and pitch values close to 0 m/s and 30°, respectively. We generate over 600 different landing trajectories (11024 states) with initial and target states randomly sampled within the intervals in Table I. From this data we use 80% to train the models and the rest for validation.

We release the generated data set¹ for further study and reproducibility.

TABLE I
Variable    Initial State   Target State (MR)   Target State (L)
X (m)       0               [25, 100]           [15, 25]
Z (m)       0               [−40, 0]            [−10, 0]
Θ (°)       [−30, 30]       30                  30
Ub (m/s)    [1, 4]          0                   0

IV. NETWORKS

The OSPA planner performance depends on the maneuvers it takes at a given state. We propose two types of networks to evaluate performance against OSPA: a network to predict actions and a network to predict states. This leads to two different kinds of problems, classification and regression, respectively.

The advantage of the maneuver classification network is that it generates feasible paths for the ornithopter according to the predefined dynamic model, and that the cost in terms of energy can be computed. However, this network only considers a discrete set of available maneuvers. Therefore, we propose a second type of network to directly predict a set of future states given the past states in the flight. Thus, we tackle the continuous problem, even if the training data is generated in a discrete manner.

A. Artificial Neural Network (ANN)

The data set is used to train a deep neural network in a supervised manner. We model a trajectory from the data as a discrete set of state-maneuver pairs. Since the number of maneuvers is fixed, we train the network to learn the next control maneuver depending on the current and target states.

Before training, the data set is pre-processed to remove non-representative data. We observed that some of the control maneuvers are used in fewer than 2% of the trajectories in the whole data set; the trajectories that contain these maneuvers are deleted, keeping 99% of the data intact and 7 maneuvers in total.

1) ANN architecture: The goal is to build an ANN to learn the best maneuver in each scenario. For this, we label the maneuvers with a tag from L = {0, 1, ..., 6}. The input of the network is the difference between the current and target states, and the output is the maneuver to be performed. We build an ANN with only two hidden layers in order to obtain fast inference. We consider a six-unit input layer and a seven-unit output layer. The first hidden layer quadruples the size of its input, and the second hidden layer doubles it. Moreover, each of these layers contains a ReLU activation and a Dropout layer as a regularization technique. All the input variables are normalized by subtracting the mean and dividing by the standard deviation. Attempts with other architectures did not noticeably improve the results.

2) Training: The network is trained with the Adam optimizer using a learning rate of 0.001, a β1 value of 0.9, a β2 value of 0.999, an epsilon value of 1e-8 and no weight decay. The learning is stopped after 300 epochs. The loss function minimized by the learning process is a weighted cross entropy function defined by:
m and q represents the probability of input s being of class m following the ANN output distribution. We use a softmax function to obtain these probabilities.

The distance measure ε is defined by the distance matrix between labels (maneuvers), $M_{7 \times 7}$:

\[
\varepsilon(N(s), y(s)) = M[\operatorname{argmax}(N(s)), \operatorname{argmax}(y(s))]
\]

Each cell (i, j) in M represents the Euclidean distance between action i and action j. The lower the value of a cell, the more similar the maneuvers are. Using this measure as a weight for the loss function speeds up the training phase and also improves the generalization of the network. Figure 3a shows the loss value during training. The validation loss decreases with the training loss, which implies that the network is learning properly and that there are no overfitting effects.

Fig. 3: (a) and (b) show the behaviour of the $L_{CE}$ and $L_{MSE}$ loss functions during the training phase for the ANN and RNN, respectively. In both cases the loss value has converged for the testing data set.

3) Path generation: Once trained, the ANN can predict the control maneuver m from a flight state s. We also implement the kinodynamic model introduced in [9], K(s, m), to compute the next state given an initial state and a control maneuver. This allows us to compute the full trajectory by applying the following formula in a loop, starting at the initial state:

\[
s_{i+1} = K(s_i, N(s_i)).
\]

The loop is stopped when the next state is further from the target than the current one.

B. Recurrent Neural Network (RNN)

The goal of the proposed ANN is to predict the next maneuver at each time step using the information of the current state. For the sake of comparison, we propose an RNN to calculate the ornithopter trajectory based on a sequence of flight states.

1) RNN architecture: We build an RNN with one recurrent layer and one output layer. The output layer has as many units as flight states are considered for prediction (two states for the mid-range problem and six states for the landing problem). The input is the set of differences between each element in the sequence of past states and the target state. All input variables are normalized following the same strategy as in the ANN scenario. The output is the distance of the next state to the target, and has a linear activation function. This is a convenient strategy to push the network to predict low values while keeping the ability to compute the next state, which can be obtained by a simple subtraction between the output of the network and the target state. The recurrent layer has 11 LSTM neurons. We have chosen LSTM neurons to deal with the vanishing gradient problem encountered by traditional recurrent neurons. We choose to keep a small architecture to obtain a fast prediction rate.

2) Training: The network is trained with one trajectory at a time, by comparing its output with the expected values. The optimization process is carried out by the Adam optimizer using the same parameters as the ANN. The learning is stopped after 100 epochs. The loss function used is the MSE (Mean Squared Error):

\[
L_{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( d(y_i, \hat{y}_i) \right)^2,
\]

where $N$ is the number of considered flight states, $y_i$ is the i-th component of the OSPA state and $\hat{y}_i$ is the i-th component of the RNN predicted state. In our specific case, minimizing the MSE implies that the likelihood function between the predicted state distribution and the OSPA states is maximized. Figure 3b shows the loss value during training.

3) Path computation: The RNN algorithm in prediction mode uses an initial input and develops a complete trajectory sequence starting from it. The inputs for our trajectory computation are just the start and target states of the desired trajectory. These states are used to compute the initial normalized distance to the target ($x_0$), which is used to start the RNN prediction algorithm. At every time step $t$, the information used to predict the next state consists of the previous prediction of the distance at $t-1$ ($y_{t-1}$) and the recurrent layer hidden states at $t-1$ ($h_{t-1}$). The algorithm ends when an output $y_t$ is close enough to zero, i.e. the target has been reached and the trajectory is over.

V. RESULTS

A. Mid-range problem

In this section we compare the three methods: the two NN-based algorithms and the OSPA planner. We use the whole data set, including training and validation samples, to compute the performance metrics (this is not an advantage for the NNs). Both networks iteratively compute a trajectory from the initial state to the target. This means that the networks
Algorithm   Cost (W)   Time (s)   Precision (m)   Error
OSPA        34.68      520        2.84            na
ANN         35.39      0.49       2.99            1.57
RNN         na         0.40       0.37            1.31
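
As a quick reading of the table above, the following sketch computes the speed-up of each network over OSPA and the relative energy overhead of the ANN. It is only an illustrative calculation: the numbers are copied from the table and taken at face value, and the dictionary layout and function names are hypothetical.

```python
# Values copied from the results table above.
# Names are illustrative; None marks entries reported as "na".
results = {
    "OSPA": {"cost_w": 34.68, "time_s": 520.0, "precision_m": 2.84},
    "ANN": {"cost_w": 35.39, "time_s": 0.49, "precision_m": 2.99},
    "RNN": {"cost_w": None, "time_s": 0.40, "precision_m": 0.37},
}


def speedup(method, baseline="OSPA"):
    """Ratio of the baseline planning time to the method's planning time."""
    return results[baseline]["time_s"] / results[method]["time_s"]


def cost_overhead_pct(method, baseline="OSPA"):
    """Relative extra energy cost in percent, or None if no cost is reported."""
    cost = results[method]["cost_w"]
    if cost is None:
        return None
    base = results[baseline]["cost_w"]
    return 100.0 * (cost - base) / base


for name in ("ANN", "RNN"):
    overhead = cost_overhead_pct(name)
    overhead_txt = "na" if overhead is None else f"{overhead:.2f}%"
    print(f"{name}: {speedup(name):.0f}x faster than OSPA, cost overhead {overhead_txt}")
```

With the tabulated values, both networks are roughly three orders of magnitude faster than OSPA, while the energy cost of the ANN stays within about 2% of the OSPA cost.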
|z/x|        Cost (ANN | RNN)   Time (ANN | RNN)   Precision (ANN | RNN)   Error (ANN | RNN)
[0.8, 1]     21.61 | na         0.45 | 0.40        1.57 | 0.18             0.91 | 1.90
[0.6, 0.8]   24.48 | na         0.42 | 0.38        1.85 | 0.12             0.46 | 1.08
[0.4, 0.6]   28.80 | na         0.50 | 0.43        3.06 | 0.29             1.73 | 1.32
[0.2, 0.4]   31.96 | na         0.52 | 0.46        3.96 | 0.42             1.69 | 1.22
[0, 0.2]     52.26 | na         0.95 | 0.81        1.89 | 0.47             1.55 | 1.41
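
The rows of the table above group targets by the ratio |z/x|. A minimal sketch of that grouping is given below. The function name, the bin-edge convention (lower edge inclusive, with 1 included in the last bin) and the handling of ratios above 1 are assumptions, since the text does not specify them.

```python
# Bin edges taken from the first column of the table above.
BIN_EDGES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def slope_bin(x, z):
    """Return the (low, high) |z/x| interval for a target at (x, z).

    Assumption: intervals are closed on the left, and the last one also
    includes its upper edge. Returns None when |z/x| exceeds 1, which the
    table does not cover.
    """
    if x == 0:
        raise ValueError("x must be non-zero to form the ratio z/x")
    ratio = abs(z / x)
    for low, high in zip(BIN_EDGES[:-1], BIN_EDGES[1:]):
        if low <= ratio < high or (high == BIN_EDGES[-1] and ratio == high):
            return (low, high)
    return None


# Example: a target 50 m ahead and 20 m below the start has |z/x| = 0.4.
print(slope_bin(50.0, -20.0))
```

Under these assumptions, shallow descents fall in the bottom row of the table, which is also the row with the highest reported ANN cost and time.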