(5)
where G tries to minimize this objective. The distribution p_z^i(z) is defined in equation 4. Note that we calculate the weighted average of the scores from the discriminators on h_i, i = 2, 3, ..., T. This objective makes G pay more attention to its later outputs, while allowing G to try different outputs in the beginning. During the training process, G learns to generate heuristics that satisfy the standards of D1 and D2.
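As a concrete reading of this weighting scheme, the sketch below averages per-step discriminator scores with weights that grow linearly in the step index i, so later outputs count more. The linear-in-i weights are an illustrative assumption for exposition; the exact weights are those fixed by the objective in equation 5.

```python
def weighted_score(step_scores):
    """Weighted average of discriminator scores on the generator's
    intermediate outputs h_i, i = 2..T, with later steps weighted
    more heavily (linearly in i here, an illustrative choice)."""
    # step_scores[k] holds the score for h_{k+2}, k = 0..T-2.
    weights = list(range(2, len(step_scores) + 2))
    total = sum(w * s for w, s in zip(weights, step_scores))
    return total / sum(weights)
```

Under any such increasing weighting, improving a late output h_T moves the objective more than improving an early one, which is what lets G explore freely in its first steps.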
The objective for training the safety discriminator D1 is:

\mathcal{L}(D_1) = \mathbb{E}_{h \sim p_n(h)}[\log D_1(h|m)] + \frac{1}{T} \sum_{i=1}^{T} \mathbb{E}_{z \sim p_z^i(z)}[\log(1 - D_1(G_{h_i}(z|m,q)|m))],  (6)
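A Monte-Carlo estimate of equation 6 can be sketched as follows, assuming D1's scores on real heuristics and on each intermediate generator output have already been computed (the function name and array layout are illustrative, not the authors' implementation):

```python
import numpy as np

def d1_objective(real_scores, fake_scores_per_step):
    """Estimate of equation 6: E[log D1(h|m)] over real heuristics
    h ~ p_n(h), plus the average over the T generator steps of
    E[log(1 - D1(G_hi(z|m,q)|m))] on the fake heuristics h_i.

    real_scores:          D1 outputs on real heuristics, values in (0, 1].
    fake_scores_per_step: list of T arrays of D1 outputs on h_i, i = 1..T.
    """
    real_term = np.mean(np.log(real_scores))
    T = len(fake_scores_per_step)
    fake_term = sum(np.mean(np.log(1.0 - s)) for s in fake_scores_per_step) / T
    return real_term + fake_term
```

D1 is trained to maximize this quantity, pushing its score toward 1 on real heuristics and toward 0 on every intermediate output of G.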
columns in Fig. 6. The training set is comprised of Maps 2, 4 and 5, which include 200, 83 and 112 different maps, respectively. For each map we randomly set 20, 50 and 20 pairs of start and end states. For each pair of states, we run RRT 50 times to generate paths in one image. Besides, we also set the state information manually to generate data that is relatively rare during the random generation process. Therefore, the training sets belonging to Maps 2, 4 and 5 have 4000, 5351 and 3086 pairs of images, respectively. For the test set, the state information is set randomly. The sizes of our training and test sets are shown in the third and fourth columns in Fig. 6.

Because the paths in the dataset generated by RRT are not nearly optimal in some cases, due to the randomness of the RRT algorithm, we adopt the one-sided label smoothing technique, which replaces the 0 and 1 targets for the discriminators with smoothed values such as 0 and 0.9 [28]. We also implement the weight clipping technique [15] to prevent gradient explosion in the neural network during training. The test results are presented in Fig. 6, which shows that the RGM model not only generates efficient heuristics for maps similar to those in the training set, but also generalizes well to maps that have not been seen before. Examples of heuristics generated by the RGM model are shown in Fig. 7.

Fig. 9. Four maps with different types. Yellow points denote the heuristic.

B. Quantitative Analysis of RRT* with Heuristic

After generating a heuristic with the RGM model, we combine the heuristic with RRT* (denoted as HRRT* for simplicity), similar to [11] and [7]: define the possibility of sampling from the heuristic at every iteration as P(i). If P(i) > Ph, then RRT* samples nodes from the heuristic; otherwise it samples nodes from the whole map, where Ph is a fixed positive number in [0, 1]. The value of Ph is set manually, and its impact on the performance of HRRT* needs further research.

We compare this HRRT* with conventional RRT*, with Ph set to 0.4. We execute both HRRT* and RRT* 120 times on each map to collect data for statistical comparison, including the lengths of the initial and optimal paths and the corresponding numbers of iterations consumed. The experiment results are presented in Fig. 8. Herein, we select four pairs of map and state information, shown in Fig. 9. The discretized heuristic is presented by the yellow points. The green paths are generated by HRRT*. For all four maps, HRRT* finds an initial path with a shorter length (very close to the optimal path) and fewer iterations compared with RRT*. Because the heuristic almost covers the region where the optimal solution exists, HRRT* has more chances to expand its nodes in this region, with a constant possibility of sampling from the heuristic, whereas RRT* has to sample uniformly from the whole map. This is why HRRT* also converges to the optimal path in far fewer iterations, which means that the heuristic generated by RGM satisfies equation 2 and provides efficient guidance for the sampling-based path planner.

V. CONCLUSIONS AND FUTURE WORK

In this paper, we present a novel RGM model which can learn to generate efficient heuristics for robot path planning. The proposed RGM model achieves good performance in environments similar to the maps in the training set with different state information, and generalizes to environments which have not been seen before. The generated heuristic can significantly improve the performance of a sampling-based path planner. For future work, we are constructing a real-world path planning dataset to evaluate and improve the performance of the proposed RGM model.
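The biased sampling rule used by HRRT* can be sketched as below. The map bounds, the representation of the heuristic as a point list, and the reading of P(i) as a fresh uniform draw per iteration are illustrative assumptions, not the authors' implementation; the comparison direction (sample from the heuristic when the draw exceeds Ph) follows the text.

```python
import random

def sample_node(heuristic_points, map_width, map_height, p_h=0.4, rng=random):
    """Biased sampling for HRRT*: each iteration draw P(i) uniformly in
    [0, 1); if P(i) > Ph, pick a node from the discretized heuristic,
    otherwise sample uniformly from the whole map, as plain RRT* does."""
    if rng.random() > p_h and heuristic_points:
        # Sample from the heuristic region (the yellow points in Fig. 9).
        return rng.choice(heuristic_points)
    # Fall back to uniform sampling over the whole map.
    return (rng.uniform(0, map_width), rng.uniform(0, map_height))
```

Under this reading, with Ph = 0.4 as in the experiments, roughly 60% of the samples are drawn from the heuristic region, which concentrates tree expansion where the optimal solution is likely to lie.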
REFERENCES

[1] H. M. Choset, S. Hutchinson, K. M. Lynch, G. Kantor, W. Burgard, L. E. Kavraki, S. Thrun, and R. C. Arkin, Principles of Robot Motion: Theory, Algorithms, and Implementation. MIT Press, 2005.
[2] P. E. Hart, N. J. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum cost paths," IEEE Transactions on Systems Science and Cybernetics, vol. 4, no. 2, pp. 100–107, 1968.
[3] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.
[4] S. M. LaValle and J. J. Kuffner Jr, "Randomized kinodynamic planning," The International Journal of Robotics Research, vol. 20, no. 5, pp. 378–400, 2001.
[5] L. Kavraki, P. Svestka, and M. H. Overmars, Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces. Unknown Publisher, 1994, vol. 1994.
[6] S. Karaman and E. Frazzoli, "Sampling-based algorithms for optimal motion planning," The International Journal of Robotics Research, vol. 30, no. 7, pp. 846–894, 2011.
[7] J. Wang, W. Chi, C. Li, C. Wang, and M. Q.-H. Meng, "Neural RRT*: Learning-based optimal path planning," IEEE Transactions on Automation Science and Engineering, 2020.
[8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[9] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," in Proceedings of the International Conference on Learning Representations (ICLR), 2014.
[10] A. H. Qureshi, A. Simeonov, M. J. Bency, and M. C. Yip, "Motion planning networks," in International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 2118–2124.
[11] B. Ichter, J. Harrison, and M. Pavone, "Learning sampling distributions for robot motion planning," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 7087–7094.
[12] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434, 2015.
[13] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4401–4410.
[14] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
[15] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein generative adversarial networks," in Proceedings of the 34th International Conference on Machine Learning - Volume 70, 2017, pp. 214–223.
[16] L. Mescheder, A. Geiger, and S. Nowozin, "Which training methods for GANs do actually converge?" in International Conference on Machine Learning (ICML). PMLR, 2018, pp. 3481–3490.
[17] T. Mikolov, S. Kombrink, L. Burget, J. Černockỳ, and S. Khudanpur, "Extensions of recurrent neural network language model," in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2011, pp. 5528–5531.
[18] T. Takahashi, H. Sun, D. Tian, and Y. Wang, "Learning heuristic functions for mobile robot path planning using deep neural networks," in Proceedings of the International Conference on Automated Planning and Scheduling, vol. 29, no. 1, 2019, pp. 764–772.
[19] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[20] S. Choudhury, M. Bhardwaj, S. Arora, A. Kapoor, G. Ranade, S. Scherer, and D. Dey, "Data-driven planning via imitation learning," The International Journal of Robotics Research, vol. 37, no. 13-14, pp. 1632–1672, 2018.
[21] M. Mirza and S. Osindero, "Conditional generative adversarial nets," arXiv preprint arXiv:1411.1784, 2014.
[22] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in International Conference on Machine Learning, 2015, pp. 448–456.
[23] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," in NIPS 2014 Workshop on Deep Learning, December 2014, 2014.
[24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[25] K. Gregor, I. Danihelka, A. Graves, D. Rezende, and D. Wierstra, "DRAW: A recurrent neural network for image generation," in International Conference on Machine Learning, 2015, pp. 1462–1471.
[26] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, "Self-attention generative adversarial networks," in International Conference on Machine Learning. PMLR, 2019, pp. 7354–7363.
[27] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, 2019, pp. 8026–8037.
[28] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, "Improved techniques for training GANs," in Advances in Neural Information Processing Systems, 2016, pp. 2234–2242.