To cite this article: Hongtao Hu, Xurui Yang, Shichang Xiao & Feiyang Wang (2021): Anti-conflict
AGV path planning in automated container terminals based on multi-agent reinforcement learning,
International Journal of Production Research, DOI: 10.1080/00207543.2021.1998695
CONTACT Xurui Yang yang_xurui@163.com Institute of Logistics Science & Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
(2) A method based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) strategy in reinforcement learning is developed to solve the AGV path planning problem. Since the scenarios created by the node network are discrete while the MADDPG algorithm is designed for continuous scenarios, the Gumbel-Softmax technique is applied to discretise the problem.

The remainder of the paper is organised as follows: Section 2 gives a brief review of the related literature. Section 3 addresses the port layout and task definition and establishes the integer programming model. The multi-agent reinforcement learning algorithm is introduced in Section 4. Section 5 shows the results of numerical experiments. Section 6 draws some conclusions and puts forward some suggestions for future research.

2. Literature review

This section summarises the literature related to the AGV path planning problem and related research on reinforcement learning.

2.1. AGV path planning problem

Vehicle path planning involves planning a reasonable path for vehicles from the starting point to the target point to meet objectives such as minimum path length or shortest travel time. As one of the methods to improve AGV operation efficiency, the path planning problem has been studied by many scholars. Qiu et al. (2002) and Fazlollahtabar and Saidi-Mehrabad (2015) provided comprehensive reviews of methodologies to optimise AGV dispatching and routing problems, respectively. Considering the application scenario of AGVs, the literature on AGV path planning is reviewed from two aspects: the general scenario and the port scenario.

2.1.1. AGV path planning in the general scenario

AGVs are used as material handling equipment in manufacturing systems, warehouses, and transportation systems. Many previous researchers have explored AGV path planning for flexible manufacturing systems, in which AGVs are responsible for performing material delivery tasks between workstations. Ghasemzadeh, Behrangi, and Azgomi (2009) investigated a multi-AGV path search method to avoid network congestion based on a static network graph in the flexible manufacturing system. Nishi, Hiranaka, and Grossmann (2011) studied scheduling and conflict-free routing of AGVs, setting AGV capacity to prevent collisions on both paths and nodes. Hu et al. (2020) proposed a new real-time scheduling method based on deep reinforcement learning to reduce the completion time and delay rate of AGVs in the flexible shop floor. Drótos et al. (2021) proposed a centralised control method by which vehicle scheduling is optimised to ensure conflict-free route control of a large fleet of AGVs in the factory scenario. Zhang et al. (2021) used the topology map approach to represent the manufacturing workshop transportation network, established an energy-saving path planning model, and proposed a two-stage solution method and a particle swarm optimisation algorithm to solve the problem. Farooq et al. (2021) investigated the path planning problem of AGVs in the production system of a textile spinning flow shop; a simplified scheduling model was established to meet real-time application requirements, and simulation experiments were carried out with an improved genetic algorithm.

With the development of technology, the application of AGVs has gradually extended to other scenarios, such as warehouse and transportation systems. Adamo et al. (2018) studied the vehicle path and the speed on each arc of the path in the transportation network, so as to avoid vehicle conflicts and save energy at the same time; a branch and bound algorithm was proposed to improve
the solution efficiency. Fransen et al. (2020) proposed a dynamic path planning method for a mobile sorting system, establishing a graph representation of the grid system layout whose vertex weights update over time, so as to avoid deadlock. An et al. (2020) studied the conflict path planning of automated vehicles in a traffic scenario, established a conflict point network and a space–time routing framework, and developed the Dijkstra algorithm with time windows and the k-shortest path (KSP) algorithm to solve this problem.

2.1.2. AGV path planning in the port scenario

The horizontal transportation area of an ACT includes roads and intersections. Therefore, many scholars regard the horizontal transportation area of ports as a highway traffic network and use the zone control strategy to study path planning, which divides the planning area into multiple non-overlapping areas and restricts the existence of only one vehicle in any area at any time. Kim, Jeon, and Ryu (2006) used the grid method and reserved graph technology to study the collision and deadlock of two AGVs during turning. Ho and Liao (2009) designed a dynamic zone strategy based on a network guide path to solve the AGV path planning problem in a fixed zone. Li et al. (2016) proposed a flow control strategy to avoid AGV congestion in a local area and reduce the probability of collision. Hu, Dong, and Xu (2020) used a topology diagram to study the multi-AGV dispatching and routing problem of the container terminal and proposed a three-stage decomposition algorithm, which combines the A* algorithm with the time-window method for AGV path planning.

In addition, some scholars have studied the anti-collision problem of AGV path planning in the port, with the effectiveness of the proposed methods verified by simulation. Lehmann, Grunow, and Gunther (2006) proposed two different methods to detect AGV deadlock, and their effectiveness was verified by simulation. Park, Kim, and Lee (2009) proposed a grid-based deadlock representation, which prevents deadlock by imposing constraints on vehicle movement; simulation results show that this method can significantly improve the operational performance of container terminals. Li et al. (2018) developed a probability formula for AGV catch-up conflict and studied the influence of travel time uncertainty on AGV catch-up conflict through simulation.

The truck conflict problem in the traditional container terminal is also useful for solving the AGV conflict prevention path planning problem. Bai et al. (2015) developed a set-covering model for the vehicle routing problem based on a novel route representation and a container-flow mapping. Zhen (2016) studied truck congestion in the yard and established a combination of probabilistic and physics-based models for truck interruptions. Jin, Lee, and Cao (2016) built an integer programming model considering traffic congestion in the yard and solved the congestion problem by limiting lane capacity. Li et al. (2020) studied the vehicle routing problem with discrete demand and multiple time windows, and a branch-and-price-and-cut algorithm was proposed to solve the problem.

The above literature solves the problem of multi-AGV path planning in different environments and realises collision avoidance to a certain extent. Table 1 shows a summary and comparison of the problem, modelling, and solution of AGV path planning in the shop floor scenario and the ACT scenario in the existing literature. It can be seen that although there are some commonalities across scenarios, such as the method used to describe the system and the solution method, the focus of modelling differs because the tasks of AGVs differ across scenarios. In a shop-floor system, AGV path planning is usually combined with task scheduling and is limited by the production process. The path between workstations is usually fixed: after the task sequence is determined, the AGV path is also determined. However, in the ACT, it is still necessary to plan the path for each AGV to complete the task after the task assignment.

Table 1. Comparison of AGV path planning in the shop floor scenario and the ACT scenario.

Method to describe the system
  Shop-floor: Paths are described as a mesh or a grid, which is usually considered to be a regular network with nodes and arcs.
  ACT: Zone control strategy (divide the planning area into multiple non-overlapping areas and restrict the existence of only one vehicle in any area at any time); grid (network with nodes and arcs).
Task
  Shop-floor: Material delivery tasks between workstations. One path consists of a series of tasks.
  ACT: Container transportation between the seaside and the landside. One path only completes one task.
Mathematical model
  Shop-floor: Mathematical models are rarely established.
  ACT: Conflict avoidance is rarely considered in mathematical models.
Objective
  Shop-floor: Minimise the delay time.
  ACT: Minimise the path length or time.
Solution method/algorithm
  Shop-floor: Simulation, heuristic algorithm.
  ACT: Simulation, heuristic algorithm.
and the demand for solution efficiency in practical application, so as to provide decision support for the operation of the ACT. In this paper, a new method for describing the AGV path planning problem is proposed considering the characteristics of the ACT, and an integer programming model is established to find the shortest paths for multiple AGVs at the same time, which also takes into account the driving direction of AGVs and the conflict between paths. To effectively solve this problem, a reinforcement learning method based on the multi-agent deep deterministic policy gradient (MADDPG) is designed.

3. Problem description and mathematical model

3.1. Problem description

The main steps of AGV operation are as follows: firstly, receive the container operation task instruction; then, plan a reasonable path to the loading/unloading position of the operation task instruction; next, work with the handling equipment to load the container onto the AGV or unload the container from the AGV, so as to complete the loading/unloading operation; finally, wait for the next task instruction.

The task assignment of AGVs is completed in the upper-level decision. Since the AGV path planning problem is an optimisation problem at the operational level, the planning horizon is relatively short. In this case, most AGVs can only complete one task during the planning horizon. In this research, we only consider the path planning for the first task of each AGV during such a short planning horizon. For a longer planning horizon, our algorithm can be incorporated into a rolling horizon optimisation approach.

Planning the shortest path for each AGV while ensuring that there is no conflict between the paths is the key factor affecting the operation efficiency of the terminal. This paper takes Shanghai Yangshan Deepwater Port Phase IV, the largest automated container terminal in the world, as the research object. There are 61,199 magnetic nails buried underground at Yangshan Phase IV so that each AGV can sense its position and be guided while driving.

3.1.1. Development of port layout

The horizontal transportation area of the ACT is an area with a regular shape and unmanned operation. Unlike the general manufacturing system, there are no obstacles or workstations in this area. In order to precisely describe the AGV path planning problem in the ACT, according to the characteristics of magnetic nail guided driving, a node network of the port layout is constructed, as shown in Figure 2. After the task assignment, an AGV needs to go through several nodes to complete its task.

Ignoring the acceleration and deceleration of the AGV during driving, the horizontal distance between nodes is the distance that the AGV drives at maximum speed per second, while the vertical nodes count as one node per lane. As shown in Figure 2, the horizontal transportation area of the ACT is generally divided
into the following three parts: the quay crane operation area, buffer lanes, and yard side lanes. In this paper, the anti-conflict AGV path planning in the horizontal transportation area is investigated.

3.1.2. Driving rules

An AGV can only be located at one node at any time, and may either stay at its current position or drive to an adjacent node allowed by the rules. Each node can hold at most one AGV. AGVs are not allowed to travel outside the network. As shown in Figure 3, in the horizontal direction there are seven one-way lanes in the quay crane operation area and six one-way lanes arranged alternately at the yard side; the nodes in the vertical direction are bidirectional.

The adjacency matrix is set according to the driving rules of the ACT to constrain the driving direction of AGVs in the mathematical model. Figure 4(a) shows the driving rules when there are 6 nodes. Adjacent nodes can pass through each other in the vertical direction, and only the right node can reach the adjacent left node in the horizontal direction. Figure 4(b) shows the corresponding adjacency matrix: 1 means a node can be reached, 0 means it cannot. Taking the first line as an example, node 0 can only reach node 0 (indicating that the AGV can stay at this node) and node 3.

3.1.3. The definition of task

Generally, the tasks of ACTs are divided into loading tasks and unloading tasks. In a loading task, the AGV picks up a container and transports it to the designated quay crane operation position through the horizontal transportation area; an unloading task is the opposite.

Due to the short period of AGV path planning, the task definition in this paper differs from the traditional definition of loading and unloading tasks. In other words, this paper does not distinguish whether a task is a loading task or an unloading task; only the starting and end points of AGV tasks are considered. AGV path planning then generates the path from the current position to the end point.
(1) The initial position of each AGV and the task assignment are known;
(2) We consider the AGV planning problem within a short period, and only one task is assigned to each AGV;
(3) The maximum battery capacity of each AGV is enough to cover the energy consumption of the assigned task;
(4) The conflict between the lanes inside the yard and the AGV-mate exchange area is not considered.

C_ij  the adjacency matrix, which reflects the connectivity between nodes. If node i and node j are adjacent in the port layout, and an AGV can drive from node i to node j according to the driving rules, then the corresponding value in the matrix is 1; otherwise, it is 0, i, j ∈ N
ns_k  the starting point of AGV k, k ∈ K, ns_k ∈ N^s
ne_k  the end point of AGV k, k ∈ K, ne_k ∈ N^e
l_k   the latest arrival time of AGV k, k ∈ K
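The adjacency matrix C_ij can be illustrated with a small sketch of the 6-node example from the driving rules above. The 2-lane-by-3-column layout is an assumption chosen so that the stated example (node 0 reaching only itself and node 3) holds; it is not the paper's exact figure.

```python
# Build the adjacency matrix C for a 6-node example under the driving
# rules described above: an AGV may stay at its node (self-loop),
# vertical neighbours are reachable in both directions, and in the
# horizontal direction only the right node may reach its left neighbour.
# The 2-row x 3-column node layout is an assumption for illustration.
ROWS, COLS = 2, 3
N = ROWS * COLS

def build_adjacency():
    C = [[0] * N for _ in range(N)]
    for r in range(ROWS):
        for c in range(COLS):
            i = r * COLS + c
            C[i][i] = 1                      # the AGV may stay at node i
            if r + 1 < ROWS:                 # vertical: bidirectional
                j = (r + 1) * COLS + c
                C[i][j] = C[j][i] = 1
            if c + 1 < COLS:                 # horizontal: right -> left only
                C[i + 1][i] = 1
    return C

C = build_adjacency()
reachable_from_0 = [j for j in range(N) if C[0][j] == 1]
print(reachable_from_0)  # node 0 can stay (node 0) or move down (node 3)
```

With this layout, the first row of C reproduces the example in the text: node 0 reaches only node 0 and node 3.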
Decision variables:

α_kij  a binary variable, equals 1 if node i is passed through immediately before node j by AGV k, 0 otherwise, i, j ∈ N, i ≠ j, k ∈ K
β_kit  a binary variable, equals 1 if AGV k is at node i at time t, 0 otherwise, i ∈ N, k ∈ K, t ∈ T

3.2.3. Model formulation

min 0.5 Σ_{i∈N} Σ_{t∈T} Σ_{k∈K} |β_kit − β_ki(t+1)|    (1)

subject to:

Σ_{j∈N} α_{k,ns_k,j} = 1, ∀k ∈ K    (2)
Σ_{i∈N,i≠j} α_kij = Σ_{m∈N,m≠j} α_kjm, ∀k ∈ K, j ∈ N_k^r    (3)
Σ_{i∈N} α_{k,i,ne_k} = 1, ∀k ∈ K    (4)
α_kij ≤ C_ij, ∀k ∈ K, i, j ∈ N, i ≠ j    (5)
β_{k,ns_k,0} = 1, ∀k ∈ K    (6)
β_{k,ns_k,t} = 1, ∀k ∈ K2, t ∈ T    (7)
Σ_{i∈N} β_kit = 1, ∀k ∈ K, t ∈ T    (8)
Σ_{k∈K} β_kit ≤ 1, ∀i ∈ N, t ∈ T    (9)
β_kit ≤ Σ_{j∈N} C_ij · β_kj(t+1), ∀k ∈ K, i ∈ N, t ∈ T    (10)
(t + 1) · |β_{k,ne_k,t+1} − β_{k,ne_k,t}| ≤ l_k, ∀k ∈ K1, t ∈ T    (11)
C_ij · (β_nit · β_mjt + β_mi(t+1) · β_nj(t+1)) ≤ 1, ∀m, n ∈ K, m ≠ n, t ∈ T, i, j ∈ N, i ≠ j    (12)
Σ_{i∈N} α_kij ≤ Σ_{t∈T} β_kjt, ∀k ∈ K, j ∈ N    (13)
α_kij ∈ {0, 1}, ∀k ∈ K, i, j ∈ N    (14)
β_kit ∈ {0, 1}, ∀k ∈ K, i ∈ N, t ∈ T    (15)

Objective (1) is to minimise the path length: by minimising the number of nodes each AGV passes through, the shortest path is obtained. Constraints (2) guarantee that there is one and only one connection point after the starting point of AGV k. Constraints (3) are flow balance constraints for all nodes except the starting and end points. Constraints (4) guarantee that there is one and only one connection point before the end point of AGV k. Constraints (5) ensure that nodes must be connected before an AGV can pass through. Constraints (6) ensure that the AGV is located at the specified initial position at the initial moment. Constraints (7) guarantee that an AGV without a task stays at its starting point. Constraints (8) guarantee that at any given time step, an AGV is at exactly one node. Constraints (9) limit the number of AGVs that can be held by any node at any time, mainly to prevent two AGVs from occupying the same point. Constraints (10) enforce the moving rule of AGVs, that is, an AGV can only drive to a node connected with its current node. Constraints (11) ensure that each AGV must reach its destination within the specified time. Constraints (12) prevent the opposite conflict, that is, two AGVs cannot drive to each other's current node at the next time step. Constraints (13) give the relationship between the two decision variables. Constraints (14) and (15) are the integrality restrictions on the decision variables.

3.2.4. Linearisation

Since the objective function (1), constraints (11), and constraints (12) are all nonlinear, new auxiliary decision variables are defined to linearise the original model. The model can then be transformed into:

min 0.5 Σ_{i∈N} Σ_{t∈T} Σ_{k∈K} ω_kit    (16)

subject to:

Constraints (2)–(10), (13)–(15)
ω_kit ≥ β_kit − β_ki(t+1), ∀k ∈ K, i ∈ N, t ∈ T    (17)
ω_kit ≥ β_ki(t+1) − β_kit, ∀k ∈ K, i ∈ N, t ∈ T    (18)
(t + 1) · τ_{k,ne_k,t} ≤ l_k, ∀k ∈ K, t ∈ T    (19)
τ_{k,ne_k,t} ≥ β_{k,ne_k,t+1} − β_{k,ne_k,t}, ∀k ∈ K, t ∈ T    (20)
τ_{k,ne_k,t} ≥ β_{k,ne_k,t} − β_{k,ne_k,t+1}, ∀k ∈ K, t ∈ T    (21)
C_ij · (φ_nmijt + φ_nmij(t+1)) ≤ 1, ∀m, n ∈ K, m ≠ n, t ∈ T, i, j ∈ N, i ≠ j    (22)
φ_nmijt ≥ β_nit + β_mjt − 1, ∀m, n ∈ K, m ≠ n, t ∈ T, i, j ∈ N, i ≠ j    (23)
φ_nmijt ≥ β_mit + β_njt − 1, ∀m, n ∈ K, m ≠ n, t ∈ T, i, j ∈ N, i ≠ j    (24)
ω_kit ∈ {0, 1}, ∀k ∈ K, i ∈ N, t ∈ T    (25)
τ_{k,ne_k,t} ∈ {0, 1}, ∀k ∈ K, t ∈ T    (26)
φ_nmijt ∈ {0, 1}, ∀m, n ∈ K, m ≠ n, t ∈ T, i, j ∈ N, i ≠ j    (27)

The new auxiliary decision variable ω_kit is defined to substitute the terms |β_kit − β_ki(t+1)|. The objective function (1) is linearised by (16) together with constraints (17) and (18). The new auxiliary decision variables τ_{k,ne_k,t} and φ_nmijt are
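As a quick sanity check on this style of linearisation, the sketch below enumerates all binary values and confirms that (i) the smallest ω satisfying constraints of the form (17)–(18) equals the absolute difference it replaces, and (ii) the smallest φ satisfying constraints of the form (23)–(24) equals the bilinear product appearing in constraint (12). The unsubscripted variable names are shorthand for the model's subscripted symbols.

```python
# Sanity check of the linearisation: enumerate all binary assignments and
# verify that the auxiliary variables reproduce the nonlinear terms.
from itertools import product

# omega >= b_t - b_t1 and omega >= b_t1 - b_t (cf. constraints (17)-(18)):
# the smallest feasible binary omega equals |b_t - b_t1|.
for b_t, b_t1 in product((0, 1), repeat=2):
    omega = min(w for w in (0, 1) if w >= b_t - b_t1 and w >= b_t1 - b_t)
    assert omega == abs(b_t - b_t1)

# phi >= x + y - 1 (cf. constraints (23)-(24)): the smallest feasible
# binary phi equals the product x * y that appears in constraint (12).
for x, y in product((0, 1), repeat=2):
    phi = min(p for p in (0, 1) if p >= x + y - 1)
    assert phi == x * y

print("linearisation reproduces the |.| and product terms")
```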
contains the observations of all agents, and the output is the Q-value of agent k.

In a policy gradient algorithm, the parameter θ of policy π is usually adjusted directly to maximise the objective function J(θ) = E_{s∼p^π, a∼π_θ}[R], where p^π is the state distribution. In the multi-agent environment, the expectation of the policy gradient of each agent k is J(θ_k) = E[R_k], which can be written as follows:

∇_{θk} J(θ_k) = E_{s∼p^π, a∼π_θ}[∇_{θk} log π_k(a_k|s_k) · Q_k^π(x, a_1, ..., a_K)]    (28)

power direction can the driving rules be met. A judgment is made when the AGV's randomly sampled action is carried out: when the velocity direction of the AGV is contrary to the power direction of the node, the velocity drops to 0, the AGV stops at the node and waits for the next action at the next time step, and a corresponding penalty is added to the reward function.

The conflict risk between AGV k and AGV k′ can be obtained through the distance function:

d(s_t^k, s_t^{k′}) > d_min    (33)
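The random sampling of discrete actions mentioned above relies on the Gumbel-Softmax technique (Jang, Gu, and Poole 2016). A minimal pure-Python sketch is given below; the three-action space and the temperature value are illustrative assumptions, not the paper's settings.

```python
# Gumbel-Softmax sampling sketch: perturb the action logits with Gumbel
# noise, apply a tempered softmax to get a differentiable "soft" sample,
# then take the argmax as the discrete action (straight-through style).
# The action space size and temperature are illustrative assumptions.
import math
import random

def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    # g_i = -log(-log(u_i)) with u_i ~ Uniform(0, 1) is Gumbel(0, 1) noise
    gumbels = [-math.log(-math.log(rng.random())) for _ in logits]
    scores = [(l + g) / temperature for l, g in zip(logits, gumbels)]
    m = max(scores)                       # softmax with max-subtraction
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    soft = [e / total for e in exps]      # differentiable soft sample
    action = max(range(len(soft)), key=soft.__getitem__)  # discrete action
    return soft, action

random.seed(0)
soft, action = gumbel_softmax_sample([2.0, 0.5, -1.0], temperature=0.5)
print(action, [round(p, 3) for p in soft])
```

In training, the soft sample keeps the gradient path through the actor network, while the argmax gives the discrete node-to-node move required by the environment.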
The average approximation Q-function error L(θ_k) over the F samples extracted in the current iteration is reduced by training the critic network, and the deterministic policy gradient ∇_{θk}J of each AGV is then updated with the same idea. After completing these two steps, θ_k and the data in the experience pool are updated. The sampling-to-training process is repeated until the training episode reaches the specified number of episodes M, at which point training terminates.

5. Numerical experiment

In this section, a series of numerical experiments are conducted to evaluate the effectiveness of the proposed model and solution method.

5.1. Parameter setting

Table 3. Algorithm procedure: the time-window based Dijkstra algorithm.

Algorithm: The time-window based Dijkstra algorithm in AGV path planning
1: Initialise the adjacency matrix C_ij and the node matrix H_it to record whether node i is occupied at time t
2: Generate the AGV priority list K_p
3: While ()
4:   Plan a path P for the AGV k = 1 using the Dijkstra algorithm and update the node matrix H_it
5:   Plan a path P′ for the AGV k′ using the Dijkstra algorithm and record the times at which all nodes are occupied by P′ to generate a new matrix H′_it
6:   Same-point occupation conflict detection: compare the two matrices for conflicts:
       if there is a conflict, mark the conflict node as unavailable and go to Step 5
       if there is no conflict, go to opposite collision detection
7:   Opposite collision detection: check whether the adjacent node of the current node is occupied at the current time:
       if there is a conflict, mark the conflict node as unavailable and go to Step 5
       if there is no conflict, k′ = k′ + 1, update the node matrix H_it, go to Step 5
8:   if all path planning has been completed
       break;
transportation. A new method for describing the AGV path planning problem in the ACT is presented: the node network is established according to the distribution of the port's magnetic nails. This description method can meet the real-time and precision requirements of AGV path planning in the ACT. To model the anti-conflict AGV path planning problem, an integer programming model is established, in which the driving rules of the ACT are represented through the adjacency matrix and conflicting paths are prevented by the relevant constraints. Because the proposed description method has the characteristics of large scale and complexity, in order to improve the solution efficiency while meeting the ACT's requirements for solution quality, a method based on the Multi-Agent Deep Deterministic Policy Gradient strategy in reinforcement learning is proposed to find a better solution for the constructed model. Each agent is given an actor-critic structure so that each AGV can adjust its strategy to achieve better performance. The Gumbel-Softmax strategy is applied to discretise the action space, addressing traditional MADDPG's requirement for a continuous action space. Finally, a series of numerical experiments are conducted to verify the effectiveness and efficiency of the model and the algorithm. Compared with CPLEX and the time-window based Dijkstra algorithm, MADDPG can obtain a better solution when time efficiency is taken into account, which provides a new solution method for AGV anti-conflict path planning research. The proposed algorithm balances efficiency and performance: a path planning solution with near-optimal performance can be obtained with higher computational efficiency.

Further research on AGV path planning can proceed along the following lines: AGV conflict prevention path planning considering the yard exchange area, and the application of deep reinforcement learning to the cooperative transportation of port equipment.

Data availability statement

The authors confirm that the data supporting the findings of this study are available within the article [and/or] its supplementary materials.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

This work was supported by National Natural Science Foundation of China: [Grant Number 71771143]; Soft Science Key Project of Shanghai Science and Technology Innovation Action Plan: [Grant Number 20692193300]; China: [Grant Number 20SG46].

Notes on contributors

Hongtao Hu received his B.S. degree from Fudan University, China, and his Ph.D. degree from Shanghai Jiaotong University, China, and then went to the National University of Singapore, Singapore, as a research fellow for two years. He is now a professor and vice president at the school of logistics engineering, Shanghai Maritime University, China. His main research fields include port and shipping management and optimization, supply chain network design and optimization, and production and logistics system optimization. He has published more than 30 papers in international journals such as International Journal of Production Research, Transportation Research Part C: Emerging Technologies, and Computers & Industrial Engineering.

Xurui Yang received her B.S. degree from Shanghai University of Engineering Science, China, and her M.S. degree from the University of Duisburg-Essen, Germany. She is now a Ph.D. candidate at the institute of logistics science & engineering, Shanghai Maritime University, China. Her main research fields include port operation management and optimization, and operations research optimization algorithms.
Shichang Xiao received his B.S. degree from Henan University of Technology, China, his M.S. degree from Xi'an University of Technology, China, and his Ph.D. degree from Northwestern Polytechnical University, China, during which he studied at the University of Michigan, Ann Arbor, USA, for one year. He is now a lecturer at the school of logistics engineering, Shanghai Maritime University, China. His main research fields include logistics and manufacturing system modeling, automated container terminal scheduling, job shop scheduling, and operations research optimization algorithms. He has published more than 10 papers in international journals such as Computers & Operations Research and Energies.

Feiyang Wang received his B.S. degree from Nanjing University of Finance & Economics, China. He is now a graduate student at the school of logistics engineering, Shanghai Maritime University, China. His main research fields include port operation management and optimization, and machine learning.

References

Adamo, T., T. Bektas, G. Ghiani, E. Guerriero, and E. Manni. 2018. "Path and Speed Optimization for Conflict-Free Pickup and Delivery Under Time Windows." Transportation Science 52 (4): 739–755.
An, Y. L., M. Li, X. Lin, F. He, and H. L. Yang. 2020. "Space-time Routing in Dedicated Automated Vehicle Zones." Transportation Research Part C: Emerging Technologies 120: 102777.
Bai, R. B., N. Xue, J. J. Chen, and G. W. Roberts. 2015. "A Set-covering Model for a Bidirectional Multi-Shift Full Truckload Vehicle Routing Problem." Transportation Research Part B: Methodological 79: 134–148.
Brockman, G., V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. 2016. "OpenAI Gym." arXiv:1606.01540, Jun 5.
Chen, B., R. Bai, J. Li, Y. Liu, N. Xue, and J. F. Ren. 2020. "A Multiobjective Single Bus Corridor Scheduling Using Machine Learning-Based Predictive Models." International Journal of Production Research. doi:10.1080/00207543.2020.1766716.
Dandanov, Nikolay, Hussein Al-Shatri, Anja Klein, and Vladimir Poulkov. 2017. "Dynamic Self-Optimization of the Antenna Tilt for Best Trade-off Between Coverage and Capacity in Mobile Networks." Wireless Personal Communications 92 (1): 251–278.
Drótos, M., P. Györgyi, M. Horváth, and T. Kis. 2021. "Suboptimal and Conflict-Free Control of a Fleet of AGVs to Serve Online Requests." Computers & Industrial Engineering 152: 106999.
Farooq, B., J. S. Bao, H. Raza, Y. C. Sun, and Q. W. Ma. 2021. "Flow-shop Path Planning for Multi-Automated Guided Vehicles in Intelligent Textile Spinning Cyber-Physical Production Systems Dynamic Environment." Journal of Manufacturing Systems 59: 98–116.
Fazlollahtabar, H., and M. Saidi-Mehrabad. 2015. "Methodologies to Optimize Automated Guided Vehicle Scheduling and Routing Problems: A Review Study." Journal of Intelligent & Robotic Systems 77 (3): 525–545.
Foerster, J., Y. Assael, N. De Freitas, and S. Whiteson. 2016. "Learning to Communicate with Deep Multi-Agent Reinforcement Learning." In Advances in Neural Information Processing Systems, 2137–2145.
Foerster, J., R. Chen, M. Al-Shedivat, S. Whiteson, P. Abbeel, and I. Mordatch. 2018. "Learning with Opponent-Learning Awareness." International Foundation for Autonomous Agents and Multiagent Systems 19: 122–130.
Fransen, K. J. C., J. A. W. M. van Eekelen, A. Pogromsky, M. A. A. Boon, and I. J. B. F. Adan. 2020. "A Dynamic Path Planning Approach for Dense, Large, Grid-Based Automated Guided Vehicle Systems." Computers & Operations Research 123: 105046.
Fujii, K. 2018. "Mathematical Reinforcement to the Minibatch of Deep Learning." Advances in Pure Mathematics 8 (3): 307–320.
Ghasemzadeh, H., E. Behrangi, and M. A. Azgomi. 2009. "Conflict-free Scheduling and Routing of Automated Guided Vehicles in Mesh Topologies." Robotics and Autonomous Systems 57 (6–7): 738–748.
Ho, Y. C., and T. W. Liao. 2009. "Zone Design and Control for Vehicle Collision Prevention and Load Balancing in a Zone Control AGV System." Computers & Industrial Engineering 56 (1): 417–432.
Hu, Y. J., L. C. Dong, and L. Xu. 2020. "Multi-AGV Dispatching and Routing Problem Based on a Three-Stage Decomposition Method." Mathematical Biosciences and Engineering 17 (5): 5150–5172.
Hu, H., X. L. Jia, Q. X. He, S. F. Fu, and K. Liu. 2020. "Deep Reinforcement Learning Based AGVs Real-Time Scheduling with Mixed Rule for Flexible Shop Floor in Industry 4.0." Computers & Industrial Engineering 149: 106749.
Jang, E., S. Gu, and B. Poole. 2016. "Categorical Reparameterization with Gumbel-Softmax." International Conference on Learning Representations, arXiv:1611.01144.
Jin, J. G., D. H. Lee, and J. X. Cao. 2016. "Storage Yard Management in Maritime Container Terminals." Transportation Science 50 (4): 1300–1313.
Kim, K. H., S. M. Jeon, and K. R. Ryu. 2006. "Deadlock Prevention for Automated Guided Vehicles in Automated Container Terminals." OR Spectrum 28 (4): 659–679.
Kim, C. W., and J. M. A. Tanchoco. 1991. "Conflict-free Shortest-Time Bidirectional AGV Routeing." International Journal of Production Research 29 (12): 2377–2391.
Lehmann, M., M. Grunow, and H. O. Gunther. 2006. "Deadlock Handling for Real-Time Control of AGVs at Automated Container Terminals." OR Spectrum 28 (4): 631–657.
Li, Q., A. Pogromsky, T. Adriaansen, and J. T. Udding. 2016. "A Control of Collision and Deadlock Avoidance for Automated Guided Vehicles with a Fault-Tolerance Capability." International Journal of Advanced Robotic Systems 13: 1–24. doi:10.5772/62685.
Li, J. L., H. Qin, R. Baldacci, and W. B. Zhu. 2020. "Branch-and-price-and-cut for the Synchronized Vehicle Routing Problem with Split Delivery, Proportional Service Time and Multiple Time Windows." Transportation Research Part E: Logistics and Transportation Review 140: 101955. doi:10.1016/j.tre.2020.101955.
Li, J. J., B. W. Xu, O. Postolache, Y. S. Yang, and H. F. Wu. 2018. "Impact Analysis of Travel Time Uncertainty on AGV Catch-Up Conflict and the Associated Dynamic Adjustment." Mathematical Problems in Engineering 2018 (5): 1–11. doi:10.1155/2018/4037695.
Lowe, R., Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch. 2017. "Multi-agent Actor-Critic for Mixed Cooperative-Competitive Environments." In Advances in Neural Information Processing Systems 30: 6379–6390.
Lu, H., X. Zhang, and S. Yang. 2020. "A Learning-Based Iterative Method for Solving Vehicle Routing Problems." International Conference on Learning Representations, published online.
Mnih, V., K. Kavukcuoglu, D. Silver, et al. 2015. "Human-level Control Through Deep Reinforcement Learning." Nature 518 (7540): 529–533.
Nishi, T., Y. Hiranaka, and I. E. Grossmann. 2011. "A Bilevel Decomposition Algorithm for Simultaneous Production Scheduling and Conflict-Free Routing for Automated Guided Vehicles." Computers & Operations Research 38 (5): 876–888.
Park, J. H., H. J. Kim, and C. Lee. 2009. "Ubiquitous Software Controller to Prevent Deadlocks for Automated Guided Vehicle Systems in a Container Port Terminal Environment." Journal of Intelligent Manufacturing 20 (3): 321–325.
Qiu, L., W. J. Hsu, S. Y. Huang, and H. Wang. 2002. "Scheduling and Routing Algorithms for AGVs: A Survey." International Journal of Production Research 40 (3): 745–760.
Samvelyan, M., T. Rashid, C. S. Witt, and G. Farquhar. 2019. "The StarCraft Multi-Agent Challenge." Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 13–17.
Sutton, R. S., and A. G. Barto. 2014. Reinforcement Learning: An Introduction. London: The MIT Press.
Toda, M. 1958. "On the Theory of the Brownian Motion." Journal of the Physical Society of Japan 13 (11): 1266–1280.
Vinitsky, Eugene, Aboudy Kreidieh, Luc Le Flem, Nishant Kheterpal, Kathy Jang, Cathy Wu, Fangyu Wu, Richard Liaw, Eric Liang, and Alexandre M. Bayen. 2018. "Benchmarks for Reinforcement Learning in Mixed-Autonomy Traffic." Proceedings of the 2nd Conference on Robot Learning, PMLR 87: 399–409.
Watkins, C., and J. Dayan. 1992. "Q-learning." Machine Learning 8 (3–4): 279–292.
Zhang, K., F. He, Z. C. Zhang, X. Lin, and M. Li. 2020. "Multi-vehicle Routing Problems with Soft Time Windows: A Multi-Agent Reinforcement Learning Approach." Transportation Research Part C: Emerging Technologies 121: 102861.
Zhang, Z. W., L. H. Wu, W. Q. Zhang, T. Peng, and J. Zheng. 2021. "Energy-efficient Path Planning for a Single-Load Automated Guided Vehicle in a Manufacturing Workshop." Computers & Industrial Engineering 158: 107397.
Zhen, L. 2016. "Modeling of Yard Congestion and Optimization of Yard Template in Container Ports." Transportation Research Part B: Methodological 90: 83–104.