
Trajectory Learning for Stable Bipedal Walking Robots using Sequential Networks


Gaurav Kumar Yadav, Member IEEE, IIIT Allahabad, gauravkumaryadav51@gmail.com
Tanej Kumar Kata, Guru Ghasidas University, tanej14040@gmail.com
Shruti Jaiswal, Member IEEE, IIIT Allahabad, shruti.jaiswal123@gmail.com
G. C. Nandi, Senior Member IEEE, IIIT Allahabad, gcnandi@iiita.ac.in

Abstract—Bipedal walking in an unknown environment is an extremely challenging problem because of the inherently unstable structure of a biped. So far, the problem has mostly been approached with analytical methods using predefined sets of parameters, with limited success: such robots could walk only in a structured environment with a flat floor and other restrictions. Walking with learning ability is an excellent way to negotiate unstructured or uneven terrain, and it is closer to the way humans walk, since we also learn over time (a child takes 8-12 months to learn to walk) how to walk by balancing our bipedal structure. In this paper, we propose models with learning ability, to be imparted to biped robots, based on walking sequences. More specifically, two sequential network models, Long Short Term Memory (LSTM) and Gated Recurrent Unit (GRU), have been configured to learn the joint trajectories of the hip, knee, and ankle joints separately, and their performances have been studied. Using these two models, learning has been imparted to a robot in both the sagittal plane and the frontal plane. We use a biped robot model in the Pybullet physics engine for simulation. Data were collected by running several simulations, and our sequential network models were trained on the collected data. We observe that the walking patterns of the present model could be learned by both GRU and LSTM within ten epochs. Since the GRU model has far fewer network parameters than LSTM, we suggest using GRU for controlling the walking of simple biped robots like the one we have studied. For more complex modes, such as walking with long steps, brisk walking, or walking with push recovery capability, we may need to collect larger sets of data for training and testing before suggesting which model would be more suitable.

Index Terms—Biped robot model, Pybullet, Gated Recurrent Unit (GRU), Long Short Term Memory (LSTM).

I. INTRODUCTION
In this global pandemic caused by COVID-19, the world feels the need for more advanced humanoid robots that can support doctors and take care of patients. Wheel-based humanoid robots are commonly seen in hospitals during this time; they help with thermal screening, sample collection, and patient care. Legged humanoid robots, although more suitable for unstructured environments, still face the challenge of walking stably in unknown environments, owing to the inherent instability of their inverted-pendulum-like structure with a large number of degrees of freedom. Over the years, many researchers have worked intensively on model-based approaches, which produced good results in structured environments with very limited uncertainties and disturbances. Many such models used analytical techniques such as ZMP (Zero Moment Point) based walking, inverted pendulum based walking, and the Central Pattern Generator. The ZMP based model [1] states that if the ZMP lies within the support polygon of the feet, then stable walking is guaranteed. Inverted pendulum based models are also capable of achieving stable walking [2]. The fundamental inverted pendulum model assumes that the mass of the whole robot body is concentrated at the centre of mass (COM) and that the link connecting the COM to the pivot is massless. Another very efficient way to generate walking trajectories is the central pattern generator method. A central pattern generator is a group of neural circuits modelled by recurrent neural networks [3]. Bipedal walking using a neural oscillator is a model-free approach that controls biped walking by generating joint motion trajectories [4]. However, almost all analytical model based approaches are limited in making a robot capable of negotiating (walking in) an unknown environment with uncertainties. Hence such models have largely remained confined to laboratory environments and have not reached wide-ranging practical applications. Currently, researchers are working to build models with a more constructive approach that combines analytical and learning-based models. The aim is to exploit the analytical model's success in achieving stable walking in the laboratory, in both static and dynamic walking (although only in a structured environment), together with the learning-based model's capability of learning walking trajectories in unstructured environments.

In learning-based approaches, researchers are increasingly using machine learning, reinforcement learning, and deep learning (particularly sequential models) to achieve stable walking. Machine learning based models [5] are used to decide which leg should move forward first, where to place the swing leg, and so on, in a particular gait cycle. The authors of [6] use the deep deterministic policy gradient algorithm to train a bipedal walking robot through trial and error, where the model has no prior information about itself or the environment dynamics. After training, the authors report successful stable walking of the robot. Another paper [7] discusses how robots could learn gait cycles using reinforcement learning. In this paper, we focus on making a biped robot capable of walking stably using a learning-based approach. Walking, being a periodic and sequential phenomenon, can be modelled well using LSTM and GRU (the fundamentals of which are explained in Section III) for learning joint trajectories for stable walking.
In our experiment, we use a bipedal robot model and record the data of stable walking by varying its parameters for various possible combinations of walking cycles. The main contributions of our paper are:
• Generating stable walking data from the Pybullet biped robot for several walking patterns (walk-1, walk-2, and walk-3) with different strides, as shown in Table I.
• Training two sequence models, LSTM and GRU, to learn joint trajectories for stable walking.
• Comparing the post-learning joint trajectories generated by the LSTM and GRU models.
• Implementing both proposed models on a biped robot in the Pybullet physics engine and suggesting which model is preferable from the performance point of view.
The rest of this paper is structured as follows. We discuss previous work under the title “Analysis of Previous Research” in Section II. Preliminaries and details of the experiments performed are discussed in Section III. The methodology, with a flow diagram, is discussed in Section IV. Section V describes the results obtained from our experiments and their analysis. Finally, we conclude and make recommendations for future work in Section VI.

II. ANALYSIS OF PREVIOUS RESEARCH
Joint trajectories decide the characteristics of biped walking. The curved paths of the hip, knee, and ankle joints are defined by the joint angles over time. In generic walking in the sagittal plane, the hip trajectory follows a sinusoidal pattern, the knee trajectory has a double-hump shape, and the ankle trajectory resembles an inverted double hump. The authors of [8] discussed trajectory generation for the hip, knee, and ankle joints, using this fundamental shape information to generate the trajectories with sinusoidal and cubic spline functions. Walking of a bipedal robot is a non-trivial task because of its nonlinear and underactuated structure. During the swing phase, the knee joint moves freely. The hip, knee, and ankle joint angle trajectories decide the next angle of the joints. The authors of [9] discussed generating online paths of the swing leg during walking using particle swarm optimization algorithms. The swing leg contains the underactuated degree of freedom, so the trajectories of the swing leg help to decide its control movement. Generating stable paths for joints using a hybrid automata model is another way to achieve stable walking. The authors of [10] treat the bipedal walk as a rocking block: during the double support phase they model the biped as a vertical rectangular plane, and during the left and right swing phases as a tilted rectangular plane. Using this approach, they achieved stable walking for their model. Joint trajectories for stable walking using a sequential network with time delay were proposed in [11], which showed results for different architectures of recurrent neural networks. Long short term memory combined with a deep neural network can help to detect the walking gait phase from acceleration signals [12]. In bipedal walking, other trajectories, such as the centre of mass trajectory and the foot trajectory, also decide the walking characteristics of the robot. The authors of [13] discuss the generation of both centre of mass and foot trajectories using a neural oscillator. In this work, we analyze two sequential networks, LSTM and GRU, to generate the joint trajectories of the various joints of both legs.

III. PRELIMINARIES AND EXPERIMENTAL DETAILS
A. Biped robot model
We have used a biped model [14] in the Pybullet physics engine for our experiments. The biped robot model is written in URDF (Unified Robot Description Format), so it can be imported into any simulator or physics engine with URDF support. The biped model has a trunk and two legs, as shown in Fig. 1. It has six degrees of freedom in each leg (three at the hip joint, one at the knee, and two at the ankle). In our experiment, we use two degrees of freedom at the hip joint, one at the knee, and two at the ankle joint, so the biped has a total of ten degrees of freedom. We consider the robot walking in the sagittal and frontal planes.

Fig. 1. Biped Robot

B. Pybullet
Pybullet is the Python interface to the Bullet Physics SDK (Software Development Kit). Unlike MuJoCo, which was proprietary at the time of this work, Bullet is a free and open-source physics engine. Pybullet is a quick and convenient Python module used for machine learning and robotics simulation [15]. It supports URDF, SDF (Simulation Description Format), and MJCF (MuJoCo file format). One can simulate forward dynamics, compute inverse dynamics, compute forward and inverse kinematics, and run collision detection queries using Pybullet. TensorFlow and OpenAI Gym environments are supported in Pybullet. Pybullet is designed around a client-server
API. The client sends commands, and the physics server returns the status of the control. Nowadays, many research labs, such as Google Brain, OpenAI, and the Stanford AI Lab, use Pybullet in their research and development.

C. Biped Walking Phases
Walking is a periodic phenomenon: after one gait cycle, the same motion repeats, provided there is no external disturbance and the walking surface is flat. A gait cycle can be divided into phases, and the combination of phases defines the characteristics of walking. The stance phase and the swing phase are the two broad divisions of one gait cycle, and they are the ones we consider in our work, as shown in Fig. 2. In normal walking, the stance phase accounts for 60% of the gait cycle and the swing phase for 40%. The subdivision of these two phases plays a major role in various walking behaviours [16].

Fig. 2. Stance Phase and Swing Phase

1) Stance Phase: It is also called the double supported phase. In this phase, both legs are on the ground. It starts with heel strike, continues with flat foot and heel off, and ends with toe-off. These stages are termed Initial Contact, Loading Response, Mid Stance, Terminal Stance, and Pre-Swing [17].
2) Swing Phase: The single supported phase, or swing phase, decides the forward movement of walking. In normal walking, the swing phase takes 20 percentage points less of the gait cycle than the stance phase (40% versus 60%), but in brisk walking the swing phase is longer than the stance phase. In running, there is a period when both legs leave the ground; this is called the lift phase. In walking, the swing phase consists of Initial Swing, Mid Swing, and Terminal Swing.

D. Long Short Term Memory
Long Short Term Memory (LSTM) is used to learn patterns in time-dependent data. Learning very long-term dependencies, such as tasks with long time lags, is difficult for a standard recurrent neural network because the error signal decays during back-propagation. LSTM addresses this issue by enforcing constant error flow through its gated units. The computational complexity of LSTM per time step and weight is O(1) [18]. Sequential models are efficient at learning and predicting sequential data such as audio, video, text, and time series. LSTM uses the concepts of a cell state and gates. It has three gates: the forget gate, the input gate, and the output gate.
1) Forget Gate: This gate decides which information is irrelevant and can be forgotten from the cell state. The output of the previous state and the input of the current state pass through a sigmoid function, as shown in equation (1), whose output lies between 0 and 1. A value close to 1 means the previous information is essential and relevant, so it is passed on to the next cell state. A value close to 0 means the previous information is irrelevant, so the network discards it.

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{1}$$

Here $f_t$ is the output of the forget gate, $W_f$ and $U_f$ are the weights of the forget gate, $x_t$ is the input of the current state, $h_{t-1}$ is the output of the previous state, and $b_f$ is the bias of the forget gate.
2) Input Gate: The input gate comprises two layers. The first, shown in equation (2), is a sigmoid layer that decides which values will be updated. The second, shown in equation (3), is a tanh layer that creates the vector of new candidate values. The combination of both, together with the forget gate, decides the new cell state value, as shown in equation (4).

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{2}$$

Here $i_t$ is the output of the sigmoid layer, $W_i$ and $U_i$ are the weights, $h_{t-1}$ is the output of the previous state, and $b_i$ is the bias.

$$C'_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{3}$$

Here $C'_t$ is the vector of new candidate values.

$$C_t = f_t \odot C_{t-1} + i_t \odot C'_t \tag{4}$$

$C_t$ is the new cell state. It is the old state multiplied element-wise by the forget gate, plus $i_t \odot C'_t$. The new cell state is then passed on through the network.
3) Output Gate: The role of the output gate is to decide what goes to the output.

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{5}$$

The cell state passes through the tanh function, which gives a value between -1 and 1. This is multiplied element-wise by the output of the sigmoid function to give the output.

$$h_t = o_t \odot \tanh(C_t) \tag{6}$$

$h_t$, shown in equation (6), is the output of the current LSTM cell unit, and it is passed as input to the next cell unit. $C_t$ is passed as the cell state to the next cell.
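Equations (1) to (6) can be traced in code as a single forward step of an LSTM cell. The following is only a minimal sketch under stated assumptions, not the implementation used in this work (which relies on Keras): the function names, the dictionary layout of the weights, and the small random initialisation are all hypothetical, and plain Python lists are used so that the example has no dependencies.

```python
import math
import random


def sigmoid(v):
    """Element-wise logistic function on a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(W, x, U, h, b):
    """Compute W x + U h + b, with W and U given as lists of rows."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; returns the new hidden state and cell state."""
    f_t = sigmoid(affine(p["Wf"], x_t, p["Uf"], h_prev, p["bf"]))  # Eq. (1)
    i_t = sigmoid(affine(p["Wi"], x_t, p["Ui"], h_prev, p["bi"]))  # Eq. (2)
    c_cand = [math.tanh(a)
              for a in affine(p["Wc"], x_t, p["Uc"], h_prev, p["bc"])]  # Eq. (3)
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]  # Eq. (4)
    o_t = sigmoid(affine(p["Wo"], x_t, p["Uo"], h_prev, p["bo"]))  # Eq. (5)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # Eq. (6)
    return h_t, c_t


def random_params(n_in, n_hidden, seed=0):
    """Hypothetical helper: small random weights and zero biases per gate."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for gate in "fico":  # forget, input, candidate, output
        p["W" + gate] = mat(n_hidden, n_in)
        p["U" + gate] = mat(n_hidden, n_hidden)
        p["b" + gate] = [0.0] * n_hidden
    return p


# Usage: feed three identical 12-dimensional inputs through a 4-unit cell.
params = random_params(n_in=12, n_hidden=4)
h, c = [0.0] * 4, [0.0] * 4
for _ in range(3):
    h, c = lstm_step([0.1] * 12, h, c, params)
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), every component of the hidden state stays strictly inside (-1, 1), whatever the weights are.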
E. Gated Recurrent Unit
The Gated Recurrent Unit (GRU) was proposed by Cho et al. in 2014 [19] [20]. It aims to solve the vanishing gradient problem that comes with standard recurrent neural networks. Like LSTM, GRU uses gates: an update gate and a reset gate. These two gates are vectors that decide what information is passed on to the future. A GRU can retain information for a long time without it vanishing, and can discard information that is not relevant to the network.
1) Update Gate: The update gate determines how much past information needs to be passed on to the future. It is capable of learning long-term time series dependencies. As shown in equation (7), the update gate takes input from the previous state and the current state. Unlike LSTM, here the output of the previous state also carries the cell state information.

$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z) \tag{7}$$

Here $W_z$ and $U_z$ are the update gate weights, $x_t$ is the input to the current state, $h_{t-1}$ is the output of the previous state, and $b_z$ is the bias of the update gate.
2) Reset Gate: The reset gate decides how much past information the model can forget. It is capable of remembering short-term time series dependencies. In equation (8), the sigmoid function takes the inputs and produces a value that decides how much previous information should be carried forward in the network.

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r) \tag{8}$$

Here $W_r$ and $U_r$ are the weights of the reset gate, $x_t$ and $h_{t-1}$ are the input of the current state and the output of the previous state, and $b_r$ is the bias.
3) Current Memory: The current memory content makes use of the reset gate. As shown in equation (9), the value of the tanh function lies between -1 and 1. It introduces non-linearity into the input and keeps the values centred around zero.

$$h'_t = \tanh(W x_t + U (r_t \odot h_{t-1})) \tag{9}$$

4) Final Memory: The final output of the current state is given by equation (10). $h_t$ is the output of the current unit, and it is passed as input, carrying the information of the past and current states, to the next state of the network.

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot h'_t \tag{10}$$

Here $h_t$ is the output of the state, $z_t$ is the update gate output, $h_{t-1}$ is the output of the previous state, and $h'_t$ is the current memory.

IV. METHODOLOGY
A. Collecting Data
The bipedal robot model walks in the simulator with various walking parameters, given in Table I, in order to collect joint trajectory data for healthy walking. These parameters are bodyMovePoint, legMovePoint, height, stride, sit, swayBody, swayFoot, bodyPositionForward, swayShift, liftPush, landPull, timeStep, damping, and incline. bodyMovePoint defines the number of data points collected during the stance phase. legMovePoint defines the number of data points collected during the swing phase. In our analysis, we consider only the stance phase and the swing phase. We set both bodyMovePoint and legMovePoint to 8, so 8 data points are recorded for each of the right stance phase, right swing phase, left stance phase, and left swing phase, resulting in 32 data points for one gait cycle. height and stride define the swing foot lift height and the stride length of walking, respectively. sit represents the initial knee fold. swayBody represents the body sway length, swayFoot represents the foot sway length, and swayShift describes the start point of the sway. landPull defines how far the swing foot moves further forward before being put on the ground and is then pulled back on landing. liftPush represents the momentum gained while lifting the foot, by pushing the lifting foot backwards. timeStep is the simulation time step. bodyPositionForward is used to make the body move forward. damping sets the damping at the start and end of the foot lift. incline gives the tangent angle of the incline. For each set of parameters, we make the robot walk for two gait cycles in the following order. It starts with the right leg forward, which gives data of shape (16, 12), then moves the left leg forward to complete one gait cycle, which gives (32, 12) in total. The same is repeated for the second gait cycle, which again gives (32, 12). Combining both gait cycles, we get data of shape (64, 12). These parameters must be chosen with care: if they are changed arbitrarily, the form of the data will vary. Three sets of these parameters [walk1, walk2, walk3], given in Table I, were used to record the data points. We collected a total of 864 data sets for our analysis. We used 85% of the data for training and the remaining 15% for testing.

TABLE I
BIPED MODEL PARAMETERS
Parameters            Walk1               Walk2                Walk3
bodyMovePoint         8                   8                    8
legMovePoint          8                   8                    8
height                50                  50                   50
stride                [80, 85, 90, 95]    [70, 75, 100, 105]   [70, 75, 100, 105]
sit                   [40, 45, 43]        [40, 43, 45]         [40, 43, 45]
swayBody              30                  30                   30
swayFoot              0.0                 0.0                  [0, 1]
bodyPositionForward   5                   5                    5
swayShift             3                   3                    3
liftPush              [0.3, 0.4, 0.5]     [0.3, 0.4, 0.5]      [0.3, 0.4, 0.5]
landPull              [0.5, 0.6, 0.7]     [0.5, 0.6, 0.7]      [0.5, 0.6, 0.7]
timeStep              0.06                0.06                 0.06
damping               0.0                 0.0                  [0.0, 0.1, 0.2]
incline               0.0                 0.0                  0.0

B. Defining Models
We have built vanilla LSTM and GRU models using the Keras API. Both models have a single hidden layer with 100 cell units. The optimizer and loss function used in our models are Adam and mean squared error, respectively. As discussed above, the total number of data sets we have used is 864. To overcome the underfitting problem in our models, we have used an l2 regularizer with value 1e-3, which gave low training losses (reported in the training procedure below). We trained both the LSTM and GRU models for ten epochs, and we have tested our models after these epochs.
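The size of the two models can be worked out by hand from the layer dimensions. The sketch below is an illustration under assumptions that the text does not state: a 12-unit dense output layer on top of the recurrent layer, and a GRU variant with two bias vectors per gate (as in the Keras `reset_after=True` option). The helper names are hypothetical. Under these assumptions the totals are 46,412 for LSTM and 35,412 for GRU, which agree with the parameter counts quoted in Section V.

```python
def lstm_params(n_in, n_units):
    """Trainable parameters of one LSTM layer.

    Four weight sets (forget, input, candidate, output), each with input
    weights W, recurrent weights U, and one bias vector b.
    """
    return 4 * (n_units * n_in + n_units * n_units + n_units)


def gru_params(n_in, n_units, biases_per_gate=2):
    """Trainable parameters of one GRU layer.

    Three weight sets (update, reset, candidate). biases_per_gate=2 is an
    assumption (separate input and recurrent biases); the textbook GRU
    uses one bias per gate.
    """
    return 3 * (n_units * n_in + n_units * n_units + biases_per_gate * n_units)


def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer with bias."""
    return n_in * n_out + n_out


N_FEATURES = 12  # joint-angle columns per time step
N_UNITS = 100    # recurrent units in the single hidden layer
N_OUT = 12       # assumed dense output layer predicting the next joint angles

lstm_total = lstm_params(N_FEATURES, N_UNITS) + dense_params(N_UNITS, N_OUT)
gru_total = gru_params(N_FEATURES, N_UNITS) + dense_params(N_UNITS, N_OUT)
print(lstm_total, gru_total)  # prints: 46412 35412
```

The difference between the two totals comes entirely from the recurrent layer: the GRU has three weight sets where the LSTM has four.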
1) Training Procedure: We have 864 data samples, each of shape (64, 12), so the overall data shape is (864, 64, 12). Of these, we use 85% (734 samples) for training and the remaining 15% (130 samples), which have the same per-sample shape, for testing. The 12 in the shape corresponds to the twelve degrees of freedom of the biped model, of which two, right hip yaw and left hip yaw, are columns of zeros. For both models, the input at each time step is one row of joint angles, of shape (1, 12), and the output is the next row, also of shape (1, 12). We keep a batch size of one, so each batch reads the one CSV file that corresponds to one set of parameters. Each CSV file holds (64, 12) data points: the input is the first 63 rows (rows 1 to 63), and the target is rows 2 to 64, that is, the input shifted forward by one step. So the input is (63, 12) and the output is (63, 12). We trained both models for ten epochs; the training loss is 0.00083552 for the LSTM model and 0.00061025 for the GRU model. After training, we use both trained models on the test cases discussed in the results section.
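The sample counts and shapes in this procedure can be checked with a short script. This is a sketch under assumptions: it takes each walk's data sets to be the full Cartesian product of the multi-valued entries in Table I (an assumption, although it does reproduce the total of 864), and it uses a placeholder trajectory in place of recorded joint angles. The names `WALKS`, `parameter_sets`, and `make_pairs` are hypothetical.

```python
from itertools import product

LIFT_PUSH = [0.3, 0.4, 0.5]
LAND_PULL = [0.5, 0.6, 0.7]

# Only the Table I parameters that take more than one value are listed;
# all other parameters are fixed within a walk.
WALKS = {
    "walk1": {"stride": [80, 85, 90, 95], "sit": [40, 45, 43],
              "liftPush": LIFT_PUSH, "landPull": LAND_PULL},
    "walk2": {"stride": [70, 75, 100, 105], "sit": [40, 43, 45],
              "liftPush": LIFT_PUSH, "landPull": LAND_PULL},
    "walk3": {"stride": [70, 75, 100, 105], "sit": [40, 43, 45],
              "swayFoot": [0, 1], "liftPush": LIFT_PUSH,
              "landPull": LAND_PULL, "damping": [0.0, 0.1, 0.2]},
}


def parameter_sets(grid):
    """Yield one dict per combination in the Cartesian product of the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def make_pairs(trajectory):
    """Split a (T, 12) trajectory into inputs (rows 1..T-1) and targets (rows 2..T)."""
    return trajectory[:-1], trajectory[1:]


n_total = sum(len(list(parameter_sets(grid))) for grid in WALKS.values())
n_train = int(0.85 * n_total)
n_test = n_total - n_train
print(n_total, n_train, n_test)  # prints: 864 734 130

# Placeholder (64, 12) trajectory: row t holds the value t in every column.
trajectory = [[float(t)] * 12 for t in range(64)]
inputs, targets = make_pairs(trajectory)
print(len(inputs), len(targets))  # prints: 63 63
```

Each target row is the input row that follows it in time, so the model is trained to predict the joint angles one step ahead.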

C. Flowchart
In this work, we first imported a biped robot model into the Pybullet physics engine. The biped model has a total of twelve degrees of freedom, of which we have used ten in our analysis. We then initialize the model parameters, such as body move point, leg move point, stride length, time step of walking, sway body, sway foot, lift push, and land pull, for three different walking behaviours, as shown in Table I. After the parameters are initialized, the biped robot model walks stably using the analytical method of walking. For stable walking with the different walking styles, we recorded the joint trajectories. The data set was collected over two gait cycles for the walk1, walk2, and walk3 parameters. We recorded 864 data samples of shape (64, 12). We use 85% of the data for training and 15% for testing. Using the Keras API, we build a vanilla LSTM model and a vanilla GRU model separately, train each model on the training data, and then use the trained model to predict the joint trajectories of the biped model. We observe that both models (LSTM and GRU) give paths close to the real one.

Fig. 3. Flow chart

V. RESULTS AND ANALYSES
We train both the LSTM and GRU models for ten epochs on the sagittal (pitch) as well as frontal (roll) planes. Without l2 regularization, the training loss of the models stopped updating after 4 epochs. With l2 regularization, the models start to learn (update themselves). The results obtained from both models after training are shown in Fig. 4 and Fig. 5. Both the LSTM and GRU models have generated joint trajectories for the hip, knee, and ankle joints. Fig. 4 shows the right leg joint trajectories, in five plots. The first plot of Fig. 4 shows the hip joint pitch trajectories. Of the three trajectories, the blue one is the original trajectory against which we benchmark our models, the green one is generated by the GRU model, and the black one is generated by the LSTM model. One can see that the trajectories generated by both the LSTM and GRU models are quite close to the original trajectories. The second plot in Fig. 4 shows the hip roll trajectories, the third the knee pitch trajectories, the fourth the ankle roll trajectories, and the fifth the ankle pitch trajectories. The characteristics shown in Fig. 5 are the same as discussed above. After analysis, we conclude that for simple walking gaits the GRU model can be selected, since it requires fewer parameters for learning (35,412) than LSTM (46,412), and hence fewer computations. However, one must be careful in generalizing this statement. Complex gait cycles (such as those with the capability of slow as well as brisk walking, walking with long strides, or walking with push recovery capability) may require learning more complex trajectories, and in such cases we need to carefully check the performance of both the LSTM and GRU models before making a general statement. Nevertheless, for humanoid robots with multimodal walking capability, the GRU model should be selected as a first choice because it requires fewer computations.

Fig. 4. Right leg joints trajectories for one gait cycle, generated by LSTM and GRU

Fig. 5. Left leg joints trajectories for one gait cycle, generated by LSTM and GRU

VI. CONCLUSION AND FUTURE WORKS
In this paper, we have emphasized that a learning-based model should be developed for controlling the walking of biped robots, since learning-based walking has the capability of adapting walking gaits to a new walking environment. In the present investigation, we have collected real-time data from a biped robot available in the Pybullet physics engine. These data have been used for training two sequence models, LSTM and GRU. From the simulation studies, it has been observed
that for this particular robot the GRU model, which has far fewer parameters than LSTM, is sufficient to learn its walking patterns. However, we need to collect more data by making the robot walk in different walking environments with unevenness and other disturbances (such as pushes), and we need to check the performance of both the GRU and LSTM models with the help of limit cycle analysis. Also, as ongoing work, we are currently collecting large data sets from actual human walking using a mobile phone [21], and we plan to train our sequence networks with such data to validate and generalize which sequence model would be better for an unstructured and cluttered environment.

REFERENCES
[1] M. Vukobratović and B. Borovac, “Zero-moment point—thirty five years of its life,” International Journal of Humanoid Robotics, vol. 1, no. 01, pp. 157–173, 2004.
[2] S. Kajita, F. Kanehiro, K. Kaneko, K. Yokoi, and H. Hirukawa, “The 3D linear inverted pendulum mode: A simple modeling for a biped walking pattern generation,” in Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No. 01CH37180), vol. 1. IEEE, 2001, pp. 239–246.
[3] J. Shan and F. Nagashima, “Neural locomotion controller design and implementation for humanoid robot HOAP-1,” in 20th Annual Conference of the Robotics Society of Japan, 2002.
[4] D. Hein, M. Hild, and R. Berger, “Evolution of biped walking using neural oscillators and physical simulation,” in Robot Soccer World Cup. Springer, 2007, pp. 433–440.
[5] C. Kouppas, Q. Meng, M. King, and D. Majoe, “Sarah: The bipedal robot with machine learning step decision making,” Int. J. Mech. Eng. Robot. Res., vol. 7, no. 4, 2018.
[6] A. Kumar, N. Paul, and S. Omkar, “Bipedal walking robot using deep deterministic policy gradient,” arXiv preprint arXiv:1807.05924, 2018.
[7] K. Zhang, Z. Hou, C. W. de Silva, H. Yu, and C. Fu, “Teach biped robots to walk via gait principles and reinforcement learning with adversarial critics,” arXiv preprint arXiv:1910.10194, 2019.
[8] G. K. Yadav, S. Jaiswal, and G. Nandi, “Generic walking trajectory generation of biped using sinusoidal function and cubic spline,” in 2020 7th International Conference on Signal Processing and Integrated Networks (SPIN). IEEE, 2020, pp. 745–750.
[9] J.-W. Kim, “Online joint trajectory generation of human-like biped walking,” International Journal of Advanced Robotic Systems, vol. 11, no. 2, p. 19, 2014.
[10] V. B. Semwal and G. C. Nandi, “Generation of joint trajectories using hybrid automate-based model: a rocking block-based approach,” IEEE Sensors Journal, vol. 16, no. 14, pp. 5805–5816, 2016.
[11] B. Ammar, N. Chouikhi, A. M. Alimi, F. Chérif, N. Rezzoug, and P. Gorce, “Learning to walk using a recurrent neural network with time delay,” in International Conference on Artificial Neural Networks. Springer, 2013, pp. 511–518.
[12] T. Zhen, L. Yan, and P. Yuan, “Walking gait phase detection based on acceleration signals using LSTM-DNN algorithm,” Algorithms, vol. 12, no. 12, p. 253, 2019.
[13] C. Liu, J. Yang, W. Bu, and Q. Chen, “A trajectory generation method for biped walking based on neural oscillators,” in 2016 IEEE 13th International Conference on Networking, Sensing, and Control (ICNSC). IEEE, 2016, pp. 1–6.
[14] K. Sunbin, “bipedal-robot-walking-simulation,” https://github.com/Einsbon/bipedal-robot-walking-simulation, 2019.
[15] E. Coumans and Y. Bai, “Pybullet, a python module for physics simulation for games, robotics and machine learning,” http://pybullet.org, 2016–2020.
[16] V. B. Semwal, M. Raj, and G. C. Nandi, “Biometric gait identification based on a multilayer perceptron,” Robotics and Autonomous Systems, vol. 65, pp. 65–75, 2015.
[17] J. Taborri, E. Palermo, S. Rossi, and P. Cappa, “Gait partitioning methods: A systematic review,” Sensors, vol. 16, no. 1, p. 66, 2016.
[18] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[19] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
[20] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
[21] V. B. Semwal and G. C. Nandi, “Toward developing a computational model for bipedal push recovery–a brief,” IEEE Sensors Journal, vol. 15, no. 4, pp. 2021–2022, 2015.
