
2022 Latin American Robotics Symposium (LARS), 2022 Brazilian Symposium on Robotics (SBR), and 2022 Workshop on Robotics in Education (WRE) | 978-1-6654-6280-8/22/$31.00 ©2022 IEEE | DOI: 10.1109/LARS/SBR/WRE56824.2022.9995874

Autonomous Robot Navigation in Crowd

1st Paulo de Almeida Afonso
Programa de Pós-Graduação em Computação
Centro de Desenvolvimento Tecnológico (CDTec)
Universidade Federal de Pelotas (UFPEL)
Pelotas, Brasil
paafonso@inf.ufpel.edu.br

2nd Paulo Roberto Ferreira Jr.
Programa de Pós-Graduação em Computação
Centro de Desenvolvimento Tecnológico (CDTec)
Universidade Federal de Pelotas (UFPEL)
Pelotas, Brasil
paulo@inf.ufpel.edu.br

Abstract—This study presents a review of the literature that addresses the problem of autonomous navigation in crowded indoor environments. Navigation in this type of environment is a particularly challenging task because a robot must be able to navigate autonomously without endangering nearby people. We analyze a few selected studies in this field published in the last seven years. The analysis shows that a combination of different techniques is necessary for safe navigation in this type of environment. Therefore, we follow a line of previous studies to seek new solutions, or to investigate and improve existing solutions to address the aforementioned problem.

Index Terms—autonomous navigation, indoor environments, crowded environment, crowd navigation

I. INTRODUCTION

Studies related to the integration of robots in daily human activities in places such as airports, museums, and malls have shown that autonomous navigation in this type of environment is a challenging task [1]–[4].

In crowded scenarios, the movement of people can generate obstructions, sensing hindrances, and impairment of the robot's perception of its position in the environment [5]. Moreover, the uncertainty of human behavior can lead to unsafe situations for robots and people during navigation. Faced with this challenge, several solutions have been explored for robots to navigate efficiently and safely, considering the movements of the people around them.

This study presents a literature review on how autonomous navigation in crowded indoor scenarios is addressed. Articles published in the last seven years in important scientific journals and conferences in the field of robotics were considered.

The analysis of the selected articles shows that the existing approaches are mainly divided into two methods: reactive methods, in which the decision-making process of the agent is initiated by identifying an imminent collision [6], [7]; and predictive methods, in which the collision-free trajectories or future actions of pedestrians are predicted through behavioral modeling of humans [8]–[10]. Additionally, there are techniques that consider solutions based on learning [11]–[13], which are sometimes combined with reactive algorithms [14] or use strategies aimed at social navigation [15]–[18].

The remainder of this paper is organized as follows. Section II presents the research methodology adopted in this study. Section III presents the main reactive methods, including methods based on reinforcement learning (RL) and deep reinforcement learning (DRL). Section IV presents the predictive methods, covering the most commonly used algorithms and some of their derivations for efficient modeling of human beings and prediction of their trajectories. The obtained results are discussed in Section V. Finally, the conclusions are presented, and the references are listed.

II. METHODOLOGY

The main sources used in this study were the CAPES Journal Portal (https://www.periodicos.capes.gov.br/), the IEEE Xplore digital library (https://ieeexplore.ieee.org/), Google Scholar (https://scholar.google.com.br/), and the Scopus database (https://www.scopus.com/). By searching with relevant terms and keywords (autonomous navigation, crowded scenarios, crowd navigation, collision avoidance in crowd, robot navigation in crowded), papers published in the last seven years in important scientific journals and conferences in the field of robotics were selected, such as: International Journal of Robotics Research, Autonomous Robots, International Conference on Autonomous Robot Systems and Competitions (ICARSC), International Conference on Robotics and Automation (ICRA), and International Conference on Intelligent Robots and Systems (IROS).

The selected papers were then classified based on their techniques. Those directly related to the object of this study were ranked first. Their references were consulted, and according to the proposed approach, the papers were analyzed and the observations were noted.

III. REACTIVE METHODS

A. Reinforcement Learning Based

Traditional Reinforcement Learning (RL) methods can be used to model reactive behaviors by processing sensor readings and the relative positions of objects, which are key elements of a navigation environment.

In this method, collision avoidance is performed based on the speed and direction of a model; the navigation task is performed by processing the received data, issuing linear and angular speed commands to control the robot, and avoiding obstacles [19].

Generally, training data are preprocessed through a neural network, and the feature map extracted from the supervised learning model is used as data input to the network, resulting in a set of commands that determine the actions of the robot [20].

RL problems involve learning a task by mapping situations to actions with the aim of maximizing a reward value [21], [22]. To use RL in situations of real-world complexity, agents should derive efficient representations of the environment and use the received sensory inputs to generalize past experiences for application in future situations [23].

The Q-learning algorithm proposed by Watkins and Dayan [24] is one of the most popular RL algorithms. It has been adopted in several studies for planning different navigation and collision avoidance models [25], [26].

There are four basic components of RL: agent, environment, reward, and action. A typical RL algorithm operates with limited knowledge of the environment and feedback on the quality of decisions [27].

Fig. 1. Reinforcement Learning

At each step of the interaction with the environment, the agent receives an input i as an indication of the current state s and chooses an action a to generate an output. The action changes the state of the environment, and the value of this state transition is communicated to the agent through a reinforcement signal r [28, Fig. 1].

After multiple repetitions of this process, the agent is able to choose actions that tend to increase the sum of the values of the reinforcement signal. An agent can learn to do this over time through systematic trial and error, guided by a wide variety of algorithms.

B. Deep Reinforcement Learning Based

DRL is a combination of RL and deep learning (DL). Its use has been researched for a variety of applications, such as games [29], autonomous navigation in indoor environments [30], [31], driving autonomous vehicles in urban scenarios [32], [33], recognition of static and mobile objects [34], and collision avoidance during autonomous navigation [35]–[38].

Unlike purely reactive methods, DRL approaches seek to encode cooperative behaviors by learning a value function [39], [40] or by learning from the experiences of various agents during the training steps [41]. These strategies allow the algorithms to choose the actions to be performed based on the observations of an arbitrary number of nearby agents, without assuming that the other agents follow any specific behavioral model.

In DRL approaches, different algorithms have been combined to implement collision prevention policies. Studies based on decentralized environments demonstrate that the data obtained from each agent belonging to the environment can be fed to deep neural networks in the planning stage, whereas RL algorithms such as proximal policy optimization (PPO) [42]–[44] can be used in the training stage to update the collision avoidance policy.

To improve algorithm performance and reduce the computational cost, artificially generated maps and training datasets containing real-world samples can be used [45], or independent decision models can be created using repeated offline simulations [46].

It is also possible to store experience transitions in a memory module to minimize the learning time [47]. The computation time can be further reduced by eliminating unnecessary repetitions during the network training step.

C. Artificial Potential Field Based

Solutions based on artificial potential field algorithms that combine modifications of different techniques with the traditional algorithm have been proposed to improve path planning and obstacle avoidance [48].

Modifications to the Artificial Potential Field (APF) can be adopted to create new attractive force points and help the moving robot escape from a local minimum [49, Fig. 2].

Fig. 2. T-APF

In this type of solution, the robot follows the route computed by the traditional APF algorithm and, upon reaching a local minimum, a function based on the modified APF is applied to stimulate the robot to change its trajectory and then resume the route towards its goal.

Sampling-based techniques that identify collision-free paths can be combined with artificial potential field planning methods for navigation in dynamic environments [50], [51].
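The classical attractive/repulsive construction that these modified-APF methods build on can be sketched as follows. This is a simplified illustration; the gains k_att and k_rep and the influence radius d0 are arbitrary choices, not values from the surveyed papers:

```python
import math

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.0):
    """Classical APF: an attractive force pulls the robot toward the goal,
    and each obstacle closer than the influence radius d0 pushes it away."""
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0 < d < d0:
            # Repulsion magnitude grows sharply as the robot nears the obstacle.
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy
```

The robot moves along this force at each step; the well-known failure mode is a local minimum where the two terms cancel, which is exactly what modified-APF variants such as [49] address by injecting new attractive force points.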

The solution consists of calculating a path without collisions with static obstacles, which can be used as an intermediate attribute to achieve this goal. To improve the safety of the algorithm, a repulsive potential field can be incorporated for each moving obstacle based on precomputed stochastic reachable sets.

A modification of the repulsion field function that introduces a relative distance value between the target point and the robot was presented in [52]. The solution adopted a combination of APF with a fuzzy control algorithm, creating a function that increases the perception of the mobile robot beyond the distance from the obstacles. As the robot approaches the goal, the repulsion force is updated by a regulating factor tending to zero until the goal is reached.

D. Velocity Obstacle Based

The concept of a velocity obstacle based on the geometric structure of the collision cone was first presented in [53]. In the proposed structure, obstacles are observed in the local horizontal plane (XY) of the agent with their flat cross-section centered at p⃗j, as shown in Fig. 3 [54].

Fig. 3. Velocity Obstacles

A study by [54] analyzed and compared the performance of several well-established approaches for avoiding collisions in non-cooperative multiagent systems. The algorithms Velocity Obstacle (VO) [53], Reciprocal Velocity Obstacle (RVO) [55], Hybrid Reciprocal Velocity Obstacle (HRVO) [56], and Optimal Reciprocal Collision Avoidance (ORCA) [57] were studied in several scenarios with different levels of difficulty. Analysis of the results showed that reactive collision avoidance methods may be sufficient to avoid collisions in environments with no communication between agents. The HRVO and ORCA methods proved to be more efficient in dense environments when handling trajectory uncertainties. The ORCA method also exhibited smoother trajectories and better computing times.

An extension of the VO method was presented in [58], aiming to determine the safest path between the robot's current and destination positions. In the proposed solution, the authors assumed that the speeds of the robot and obstacles are known or measurable. The SVO method uses a velocity-vector component to calculate the safest speed at each sampling step. Depending on the needs of the application and its time constraints, the method can be adopted to ensure greater safety during navigation.

IV. PREDICTIVE METHODS

A. Based on Human Behavior

Some studies have highlighted that purely reactive obstacle avoidance techniques may not be sufficient for solving navigation problems in dynamic environments [59], [60]. According to the authors, in this type of scenario, the robot should coexist and cooperate with humans and pay attention to other obstacles in motion.

Modern DRL and human-robot interaction (HRI) frameworks can be used to encode the prior knowledge of humans, which is achieved by introducing cooperative behavioral characteristics into the actions of the robot and reinforcing safety during navigation [61]. In this type of solution, the robot learns appropriate social navigation using data obtained from sensor inputs, whereas the reward update can occur based on human feedback or on actions performed during interaction with the environment.

Fig. 4. RL Architecture for social navigation and HRI.

Figure 4 represents the learning architecture proposed by [61]. The data obtained are processed by a neural network and used for path prediction and for calculating proxemic distances for social navigation and collision avoidance, a solution based on a framework called Composite Reinforcement Learning (CRL).

Solutions based on pedestrian behavior can also be implemented through the extraction and combination of the global characteristics of the crowd [61]–[63]. In this context, the theory of personality traits can be used to dynamically learn the behavior of pedestrians in a given scene and to calculate a movement model for each pedestrian [64].

A widely used solution is the integration of the long short-term memory (LSTM) algorithm [65] with DRL. In this type of solution, LSTM can be used to reflect the memory characteristics of humans and to accelerate the learning of policies for autonomous agent navigation [66]–[69].

An LSTM-LMC architecture using LSTM and a Local-Map Critic (LMC) was proposed for autonomous navigation in complex environments with a limited field of view [70]. The solution used a map with static obstacles to reinforce the learning ability, combined with a dataset containing 75 maps randomly sampled at each training episode. To overcome performance failures in real scenarios and improve the robustness of the learned policy, different randomization techniques were applied during tests in a simulated environment.
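In velocity space, the collision-cone test underlying the VO family reviewed in Section III-D can be sketched as follows. Circular agents with a combined radius are assumed; the function and variable names are illustrative, not from any of the cited implementations:

```python
def in_velocity_obstacle(p_r, v, p_o, v_o, radius):
    """True if candidate velocity v of the robot at p_r lies inside the
    velocity obstacle of a disc obstacle (combined radius) at p_o moving
    with v_o, i.e. the relative-velocity ray eventually pierces the disc."""
    dx, dy = p_o[0] - p_r[0], p_o[1] - p_r[1]    # relative position
    rvx, rvy = v[0] - v_o[0], v[1] - v_o[1]      # relative velocity
    speed2 = rvx * rvx + rvy * rvy
    if speed2 == 0.0:
        return dx * dx + dy * dy < radius * radius  # static overlap only
    t = (dx * rvx + dy * rvy) / speed2           # time of closest approach
    if t < 0.0:
        return False                             # moving apart: no collision
    cx, cy = dx - t * rvx, dy - t * rvy          # miss vector at closest approach
    return cx * cx + cy * cy < radius * radius

def pick_velocity(p_r, goal, candidates, obstacles, radius):
    """VO-style selection: among candidate velocities outside every velocity
    obstacle, prefer the one best aligned with the direction to the goal."""
    gx, gy = goal[0] - p_r[0], goal[1] - p_r[1]
    safe = [v for v in candidates
            if not any(in_velocity_obstacle(p_r, v, p_o, v_o, radius)
                       for p_o, v_o in obstacles)]
    if not safe:
        return (0.0, 0.0)                        # no admissible velocity: stop
    return max(safe, key=lambda v: v[0] * gx + v[1] * gy)
```

RVO, HRVO, and ORCA [55]–[57] refine how this cone is shared between mutually reacting agents, but the geometric membership test stays the same.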

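The trajectory-prediction methods surveyed in Section IV commonly improve on a constant-velocity baseline; a minimal sketch of that baseline and of a clearance check against a planned robot path follows (the function names and the clearance threshold are illustrative assumptions):

```python
import math

def predict_positions(track, horizon, dt=1.0):
    """Constant-velocity baseline: extrapolate the last observed displacement
    of a pedestrian track over `horizon` future steps."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return [(x1 + vx * dt * k, y1 + vy * dt * k) for k in range(1, horizon + 1)]

def earliest_conflict(robot_path, ped_path, clearance):
    """Index of the first timestep at which the planned robot position comes
    closer than `clearance` to the predicted pedestrian position, else None."""
    for k, ((rx, ry), (px, py)) in enumerate(zip(robot_path, ped_path)):
        if math.hypot(rx - px, ry - py) < clearance:
            return k
    return None
```

A predictive planner replans whenever a conflict index is reported; the learning-based predictors cited in this section, such as LSTM models [65]–[69], replace the linear extrapolation step with a learned one.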
The construction of a navigation policy for unknown dynamic environments using a multiagent approach applied DRL by combining LSTM with the PPO and RVO algorithms [71]. The PPO algorithm, based on the actor-critic method, was used to train agents to learn how to achieve their goals, whereas RVO was used to avoid collisions during the navigation task. The tests showed that the proposed solution could simultaneously train multiple agents with different objectives, obtain good results during path planning, and present a good capacity for self-learning. Further, the authors highlighted that LSTM can address two common problems that may occur when training traditional recurrent neural networks: the vanishing gradient problem [72] and the exploding gradient problem [73].

B. Based on Trajectory Prediction

Studies focusing on the planning of trajectories in environments involving everyday situations have highlighted that humans do not have the habit of following the shortest trajectories when navigating crowds [74]. In this type of environment, social norms are generally considered when implementing mechanisms that make the robot avoid the densest areas, where the probability of collision is high.

To obtain viable trajectories, in addition to predicting the future evolution of obstacles, the need to observe the kinodynamic restrictions of the robot has also been highlighted, based on safety, maneuverability, and the restrictions of the environment [75]. In a complementary manner, the agent should be able to learn an interaction model from trajectory data [76], [77] or from real human behavior, model the velocities of the other agents in the crowd, or identify the variable personality of each pedestrian [78].

Some solutions use agent-based methods to predict pedestrian trajectories using velocity-space reasoning [79]. In addition to not relying on prior knowledge of the environment, this type of approach can be integrated with other local navigation techniques to improve the rate of task completion and to reduce instances of the robot freezing problem.

Fig. 5. Dynamic Channel

The dynamic channels architecture for autonomous crowd navigation (Fig. 5) combines global optimization with a heuristic to manage agent dynamics in the crowd [80]. The proposed method uses a modification of the A* algorithm to calculate the ideal path, abstracting the environment through a Delaunay triangulation [81] and then projecting the dynamics of the obstacles onto the triangulation.

The solution presented did not explicitly address socially acceptable navigation. However, it can be extended to incorporate social norms, such as a person taking pictures of others or a group of friends walking together. In such cases, the edges between those pedestrians can be denoted as non-crossable for the robot.

V. DISCUSSION

This section presents a brief analysis of the studies selected for this review. Among the selected studies, 55% used the reactive approach to prevent collisions during the navigation task, and 45% used predictive methods. A significant rise in the use of learning-based techniques was noted, particularly after 2017, corresponding to 61% of the selected studies. Additionally, 33% of the studies addressed the use of techniques aimed at social navigation, with significant progress in this type of solution from 2021.

The analysis also revealed a significant increase in studies focused on social navigation and on the use of DRL for navigation in crowded environments; this demonstrates the need to adopt collision prevention policies and to combine techniques that provide efficient navigation without endangering people close to the robot.

Therefore, for autonomous navigation in dynamic and crowded environments, three essential points should be observed: a) the need for a collision prevention policy based on reactive or predictive behaviors; b) the ability to promptly calculate a viable trajectory and to change the trajectory of the robot based on updated information about the environment; and c) the ability to identify people, to improve decision-making and perform the navigation task more safely.

VI. CONCLUSIONS

This study presents a literature review on addressing autonomous navigation in crowded indoor environments. We considered studies in this field published in the last seven years. The analysis showed that the navigation of autonomous robots in crowded environments remains an open problem.

The existing approaches disregard the number of agents as a dynamic variable or consider the trajectories or velocities of the other agents to be known. However, some studies have shown that in this type of scenario, the robot should be able to navigate without prior knowledge of the environment or of the actions of other agents and obstacles.

It was also noted that most of the proposed solutions consider people as simple moving obstacles, which can affect the safety of navigation in real-world applications. Therefore, it is possible to follow a line of studies in this field to address new solutions, particularly through a combination of techniques, or to investigate and improve existing solutions to efficiently address the problem in question.

REFERENCES

[1] J. Godoy, S. J. Guy, M. Gini, and I. Karamouzas, "C-Nav: Distributed coordination in crowded multi-agent navigation," Robotics and Autonomous Systems, vol. 133, p. 103631, 2020.
[2] M. Nishimura and R. Yonetani, "L2B: Learning to balance the safety-efficiency trade-off in interactive crowd-aware robot navigation," in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 11004–11010.
[3] D. Dugas, J. Nieto, R. Siegwart, and J. J. Chung, "IAN: Multi-behavior navigation planning for robots in real, crowded environments," in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 11368–11375.
[4] Y. Chen, C. Liu, B. E. Shi, and M. Liu, "Robot navigation in crowds by graph convolutional networks with attention learned from human gaze," IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2754–2761, 2020.
[5] V. N. Zhidkov, N. V. Kim, and N. V. Udalova, "Robot navigation in a dynamically changing environment," in 2020 International Multi-Conference on Industrial Engineering and Modern Technologies (FarEastCon), 2020, pp. 1–6.
[6] D. Zhang, Z. Xie, P. Li, J. Yu, and X. Chen, "Real-time navigation in dynamic human environments using optimal reciprocal collision avoidance," in 2015 IEEE International Conference on Mechatronics and Automation (ICMA). IEEE, 2015, pp. 2232–2237.
[7] D. J. Gonon, D. Paez-Granados, and A. Billard, "Reactive navigation in crowds for non-holonomic robots with convex bounding shape," IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4728–4735, 2021.
[8] M. Pfeiffer, U. Schwesinger, H. Sommer, E. Galceran, and R. Siegwart, "Predicting actions to act predictably: Cooperative partial motion planning with maximum entropy models," in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 2096–2101.
[9] Y. Xu, D. Ren, M. Li, Y. Chen, M. Fan, and H. Xia, "Tra2Tra: Trajectory-to-trajectory prediction with a global social spatial-temporal attentive neural network," IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1574–1581, 2021.
[10] Y. Cui, H. Zhang, Y. Wang, and R. Xiong, "Learning world transition model for socially aware robot navigation," in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 9262–9268.
[11] T. Fan, X. Cheng, J. Pan, D. Manocha, and R. Yang, "CrowdMove: Autonomous mapless navigation in crowded scenarios," arXiv preprint arXiv:1807.07870, 2018.
[12] Y. F. Chen, M. Everett, M. Liu, and J. P. How, "Socially aware motion planning with deep reinforcement learning," in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 1343–1350.
[13] S. S. Samsani and M. S. Muhammad, "Socially compliant robot navigation in crowded environment by human behavior resemblance using deep reinforcement learning," IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 5223–5230, 2021.
[14] D. Bareiss and J. van den Berg, "Generalized reciprocal collision avoidance," The International Journal of Robotics Research, vol. 34, no. 12, pp. 1501–1514, 2015.
[15] F. d. A. M. Pimentel and P. T. Aquino-Jr, "Evaluation of ROS navigation stack for social navigation in simulated environments," Journal of Intelligent & Robotic Systems, vol. 102, no. 4, pp. 1–18, 2021.
[16] M. Daza, D. Barrios-Aranibar, J. Diaz-Amado, Y. Cardinale, and J. Vilasboas, "An approach of social navigation based on proxemics for crowded environments of humans and robots," Micromachines, vol. 12, no. 2, p. 193, 2021.
[17] H. Zeng, R. Hu, X. Huang, and Z. Peng, "Robot navigation in crowd based on dual social attention deep reinforcement learning," Mathematical Problems in Engineering, vol. 2021, 2021.
[18] F. Haarslev, W. Juel, A. Kollakidou, N. Krüger, and L. Bodenhagen, "Context-aware social robot navigation," in Proceedings of the 18th International Conference on Informatics in Control, Automation and Robotics (ICINCO), INSTICC. SciTePress, 2021, pp. 426–433.
[19] Y. Liu, H. Liu, and B. Wang, "Autonomous exploration for mobile robot using Q-learning," in 2017 2nd International Conference on Advanced Robotics and Mechatronics (ICARM). IEEE, 2017, pp. 614–619.
[20] L. Tai and M. Liu, "A robot exploration strategy based on Q-learning network," in 2016 IEEE International Conference on Real-time Computing and Robotics (RCAR). IEEE, 2016, pp. 57–62.
[21] R. S. Sutton, "Introduction: The challenge of reinforcement learning," in Reinforcement Learning. Springer, 1992, pp. 1–3.
[22] ——, "Reinforcement learning: Past, present and future," in Asia-Pacific Conference on Simulated Evolution and Learning. Springer, 1998, pp. 195–197.
[23] L. Qiang, D. Nanxun, L. Huican, and W. Heng, "A model-free mapless navigation method for mobile robot using reinforcement learning," in 2018 Chinese Control And Decision Conference (CCDC). IEEE, 2018, pp. 3410–3415.
[24] C. J. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[25] T. Ribeiro, F. Gonçalves, I. Garcia, G. Lopes, and A. F. Ribeiro, "Q-learning for autonomous mobile robot obstacle avoidance," in 2019 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC). IEEE, 2019, pp. 1–7.
[26] S. Yang and C. Li, "Behavior control algorithm for mobile robot based on Q-learning," in 2017 International Conference on Computer Network, Electronic and Automation (ICCNEA). IEEE, 2017, pp. 45–48.
[27] D. Zhang, X. Han, and C. Deng, "Review on the research and practice of deep learning and reinforcement learning in smart grids," CSEE Journal of Power and Energy Systems, vol. 4, no. 3, pp. 362–370, 2018.
[28] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[29] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, p. 529, 2015.
[30] L. Tai and M. Liu, "Towards cognitive exploration through deep reinforcement learning for mobile robots," arXiv preprint arXiv:1610.01733, 2016.
[31] T. Okuyama, T. Gonsalves, and J. Upadhay, "Autonomous driving system based on deep Q learning," in 2018 International Conference on Intelligent Autonomous Systems (ICoIAS). IEEE, 2018, pp. 201–205.
[32] A. E. Sallab, M. Abdou, E. Perot, and S. Yogamani, "Deep reinforcement learning framework for autonomous driving," Electronic Imaging, vol. 2017, no. 19, pp. 70–76, 2017.
[33] M. Wulfmeier, D. Rao, D. Z. Wang, P. Ondruska, and I. Posner, "Large-scale cost function learning for path planning using deep inverse reinforcement learning," The International Journal of Robotics Research, vol. 36, no. 10, pp. 1073–1087, 2017.
[34] G. Zuo, T. Du, and J. Lu, "Double DQN method for object detection," in 2017 Chinese Automation Congress (CAC). IEEE, 2017, pp. 6727–6732.
[35] P. K. Mohanty, A. K. Sah, V. Kumar, and S. Kundu, "Application of deep Q-learning for wheel mobile robot navigation," in 2017 3rd International Conference on Computational Intelligence and Networks (CINE). IEEE, 2017, pp. 88–93.
[36] X. Ruan, D. Ren, X. Zhu, and J. Huang, "Mobile robot navigation based on deep reinforcement learning," in 2019 Chinese Control And Decision Conference (CCDC). IEEE, 2019, pp. 6174–6178.
[37] H. T. Le, D. T. Nguyen, and X. T. Truong, "Socially aware robot navigation framework in crowded and dynamic environments: A comparison of motion planning techniques," in 2021 8th NAFOSTED Conference on Information and Computer Science (NICS), 2021, pp. 95–101.
[38] L. Liu, D. Dugas, G. Cesari, R. Siegwart, and R. Dubé, "Robot navigation in crowded environments using deep reinforcement learning," in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 5671–5677.
[39] Y. F. Chen, M. Liu, M. Everett, and J. P. How, "Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning," in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 285–292.
[40] Y. Zhai, Y. Miao, and H. Wang, "Robot navigation with interaction-based deep reinforcement learning," in 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO), 2021, pp. 1974–1979.
[41] M. Everett, Y. F. Chen, and J. P. How, "Motion planning among dynamic, decision-making agents with deep reinforcement learning," in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 3052–3059.
[42] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.

[43] P. Long, T. Fan, X. Liao, W. Liu, H. Zhang, and J. Pan, "Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 6252–6259.
[44] S. Yao, G. Chen, Q. Qiu, J. Ma, X. Chen, and J. Ji, "Crowd-aware robot navigation for pedestrians with multiple collision avoidance strategies via map-based deep reinforcement learning," in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 8144–8150.
[45] Y. Liu, A. Xu, and Z. Chen, "Map-based deep imitation learning for obstacle avoidance," in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 8644–8649.
[46] P. Long, W. Liu, and J. Pan, "Deep-learned collision avoidance policy for distributed multiagent navigation," IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 656–663, 2017.
[47] J. Wu, S. Shin, C.-G. Kim, and S.-D. Kim, "Effective lazy training method for deep Q-network in obstacle avoidance and path planning," in 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2017, pp. 1799–1804.
[48] P. Wu, S. Xie, H. Liu, J. Luo, Q. Li, and J. Gu, "A novel obstacle avoidance strategy of nonholonomic mobile robot based on virtual simulation platform," in 2015 IEEE International Conference on Information and Automation. IEEE, 2015, pp. 185–190.
[49] D. Lee, J. Jeong, Y. H. Kim, and J. B. Park, "An improved artificial potential field method with a new point of attractive force for a mobile robot," in 2017 2nd International Conference on Robotics and Automation Engineering (ICRAE). IEEE, 2017, pp. 63–67.
[50] H.-T. Chiang, N. Malone, K. Lesser, M. Oishi, and L. Tapia, "Path-guided artificial potential fields with stochastic reachable sets for motion planning in highly dynamic environments," in 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015, pp. 2347–2354.
[51] N. Malone, H.-T. Chiang, K. Lesser, M. Oishi, and L. Tapia, "Hybrid dynamic moving obstacle avoidance using a stochastic reachable set-based potential field," IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1124–1138, 2017.
[52] X. Gu, M. Han, W. Zhang, G. Xue, G. Zhang, and Y. Han, "Intelligent vehicle path planning based on improved artificial potential field algorithm," in 2019 International Conference on High Performance Big
[63] S. Matsuzaki, S. Aonuma, and Y. Hasegawa, "Dynamic window approach with human imitating collision avoidance," in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 8180–8186.
[64] A. Bera, T. Randhavane, and D. Manocha, "Aggressive, tense or shy? Identifying personality traits from crowd videos," in IJCAI, 2017, pp. 112–118.
[65] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[66] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, "Social LSTM: Human trajectory prediction in crowded spaces," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 961–971.
[67] M. Lisotto, P. Coscia, and L. Ballan, "Social and scene-aware trajectory prediction in crowded spaces," in 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019, pp. 2567–2574.
[68] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, "Social GAN: Socially acceptable trajectories with generative adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2255–2264.
[69] M. Pfeiffer, G. Paolo, H. Sommer, J. Nieto, R. Siegwart, and C. Cadena, "A data-driven model for interaction-aware pedestrian motion prediction in object cluttered environments," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 1–8.
[70] J. Choi, K. Park, M. Kim, and S. Seok, "Deep reinforcement learning of navigation in a complex and crowded environment with a limited field of view," in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 5993–6000.
[71] L. Sun, J. Zhai, and W. Qin, "Crowd navigation in an unknown and dynamic environment based on deep reinforcement learning," IEEE Access, vol. 7, pp. 109544–109554, 2019.
[72] S. Hochreiter, "The vanishing gradient problem during learning recurrent neural nets and problem solutions," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 6, no. 02, pp. 107–116, 1998.
[73] R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," in International Conference on Machine Learning. PMLR, 2013, pp. 1310–1318.
[74] R. Bresson, J. Saraydaryan, J. Dugdale, and A. Spalanzani, "Socially
Data and Intelligent Systems (HPBD&IS). IEEE, 2019, pp. 104–109. compliant navigation in dense crowds,” in 2019 IEEE Intelligent Vehicles
[53] P. Fiorini and Z. Shiller, “Motion planning in dynamic environments us- Symposium (IV), 2019, pp. 64–69.
ing velocity obstacles,” The International Journal of Robotics Research, [75] A. Vemula, K. Muelling, and J. Oh, “Modeling cooperative navigation in
vol. 17, no. 7, pp. 760–772, 1998. dense human crowds,” in Robotics and Automation (ICRA), 2017 IEEE
[54] J. A. Douthwaite, S. Zhao, and L. S. Mihaylova, “A comparative study of International Conference on. IEEE, 2017, pp. 1685–1692.
velocity obstacle approaches for multi-agent systems,” in 2018 UKACC [76] M. Fahad, G. Yang, and Y. Guo, “Learning human navigation behav-
12th International Conference on Control (CONTROL). IEEE, 2018, ior using measured human trajectories in crowded spaces,” in 2020
pp. 289–294. IEEE/RSJ International Conference on Intelligent Robots and Systems
[55] J. Van den Berg, M. Lin, and D. Manocha, “Reciprocal velocity obsta- (IROS), 2020, pp. 11 154–11 160.
cles for real-time multi-agent navigation,” in 2008 IEEE International [77] A. J. Sathyamoorthy, J. Liang, U. Patel, T. Guan, R. Chandra, and
Conference on Robotics and Automation. IEEE, 2008, pp. 1928–1935. D. Manocha, “Densecavoid: Real-time navigation in dense crowds using
[56] J. Snape, J. Van Den Berg, S. J. Guy, and D. Manocha, “The hybrid anticipatory behaviors,” in 2020 IEEE International Conference on
reciprocal velocity obstacle,” IEEE Transactions on Robotics, vol. 27, Robotics and Automation (ICRA), 2020, pp. 11 345–11 352.
no. 4, pp. 696–706, 2011. [78] A. Bera, T. Randhavane, R. Prinja, and D. Manocha, “Sociosense:
[57] J. Van Den Berg, S. J. Guy, M. Lin, and D. Manocha, “Reciprocal n- Robot navigation amongst pedestrians with social and psychological
body collision avoidance,” in Robotics research. Springer, 2011, pp. constraints,” in Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ
3–19. International Conference on. IEEE, 2017, pp. 7018–7025.
[58] Z. Gyenes and E. G. Szadeczky-Kardoss, “Motion planning for mobile [79] S. Kim, S. J. Guy, W. Liu, D. Wilkie, R. W. Lau, M. C. Lin, and
robots using the safety velocity obstacles method,” in 2018 19th In- D. Manocha, “Brvo: Predicting pedestrian trajectories using velocity-
ternational Carpathian Control Conference (ICCC). IEEE, 2018, pp. space reasoning,” The International Journal of Robotics Research,
389–394. vol. 34, no. 2, pp. 201–217, 2015.
[59] M.-T. Lorente, E. Owen, and L. Montano, “Model-based robocentric [80] C. Cao, P. Trautman, and S. Iba, “Dynamic channel: A planning
planning and navigation for dynamic environments,” The International framework for crowd navigation,” in 2019 International Conference on
Journal of Robotics Research, vol. 37, no. 8, pp. 867–889, 2018. Robotics and Automation (ICRA), 2019, pp. 5551–5557.
[60] G. Ferrer and A. Sanfeliu, “Anticipative kinodynamic planning: multi- [81] L. P. Chew, “Constrained delaunay triangulations,” Algorithmica, vol. 4,
objective robot navigation in urban and dynamic environments,” Au- no. 1, pp. 97–108, 1989.
tonomous Robots, pp. 1–16, 2018.
[61] P.-H. Ciou, Y.-T. Hsiao, Z.-Z. Wu, S.-H. Tseng, and L.-C. Fu, “Compos-
ite reinforcement learning for social robot navigation,” in 2018 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS).
IEEE, 2018, pp. 2553–2558.
[62] S. H. Kiss, K. Katuwandeniya, A. Alempijevic, and T. Vidal-Calleja,
“Probabilistic dynamic crowd prediction for social navigation,” in 2021
IEEE International Conference on Robotics and Automation (ICRA),
2021, pp. 9269–9275.
