Keywords:
Reinforcement learning
Ant colony optimization
Genetic algorithm
Particle swarm optimization
School bus routing and scheduling
Combinatorial optimization

Abstract

The bi-objective school bus scheduling optimization problem, a subset of the vehicle fleet scheduling problem, is the focus of this paper. In the literature, the school bus routing and scheduling problem is proven to be NP-Hard. The processed data supplied by our framework is utilized to search for a near-optimum schedule with the aid of reinforcement learning combined with evolutionary algorithms. These are named the reinforcement learning-enabled genetic algorithm (RL-enabled GA), the reinforcement learning-enabled particle swarm optimization algorithm (RL-enabled PSO), and the reinforcement learning-enabled ant colony optimization algorithm (RL-enabled ACO). In this paper, the performance characterization of reinforcement learning-enabled evolutionary algorithms for the integrated school bus routing and scheduling problem is investigated. The efficiency of the conventional algorithms is improved, and the near-optimal schedule is achieved in a significantly shorter duration with the active guidance of the reinforcement learning algorithm. We carried out an extensive performance evaluation and conducted experiments on a geospatial dataset comprising road networks, trip trajectories of buses, and the addresses of students. Both the conventional and the reinforcement learning integrated algorithms improve the travel time of the buses and the students. More than 50% saving by the conventional and the reinforcement learning-enabled ant colony optimization algorithm compared to the constructive heuristic algorithm is achieved from the 92nd and 54th iterations, respectively. Similarly, the saving by the conventional and the reinforcement learning-enabled genetic algorithm is 41.34% at the 500th iteration and more than 50% improvement from the 281st iteration, respectively. Lastly, more than 10% saving by the conventional and the reinforcement learning-enabled particle swarm algorithm is achieved from the 432nd and 28th iterations, respectively.
∗ Corresponding author.
E-mail address: eda_koksal@u.nus.edu (E. Koksal).
https://doi.org/10.1016/j.ijcce.2021.02.001
Received 26 November 2020; Received in revised form 19 January 2021; Accepted 4 February 2021
Available online 8 February 2021
2666-3074/© 2021 The Authors. Publishing Services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC
BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
E. Koksal, A.R. Hegde, H.P. Pandiarajan et al. International Journal of Cognitive Computing in Engineering 2 (2021) 47–56
Table 1
Literature classification on the integrated school bus routing and scheduling.

Reference | Classification | Approach
(Stodola, Mazal, Podhorec and Litvaj, 2014) | Integrated | ACO, homogenous vehicle fleet
(Arias-Rojas, Jiménez and Montoya-Torres, 2012) | Individual | ACO for allocation; then the problem is converted to a Travelling Salesman Problem
(Kang et al., 2015) | Integrated | GA for the integrated problem
(Kim and Son, 2012) | Integrated | PSO to minimize the number of vehicles
(Alinezhad et al., 2018) | Integrated | PSO, homogenous vehicle fleet
(Kiriş and Özcan, 2020) | Individual | GA-based approach with k-means
(Mahmoudzadeh and Wang, 2020) | Individual | Cluster-based, with dynamic demand characteristics
Table 2
Literature classification on the improvement approaches.

Reference | Category | Approach
(Karafotias, Smit and Eiben, 2012) | Adaptive | The parameters are predicted by an Artificial Neural Network (ANN) based on diversity and fitness values, with online calibration.
(Gong, Tang, Li and Zhang, 2019) | Deterministic | Separate subpopulations for GA are created with fixed parameters.
(Böttcher, Doerr and Neumann, 2010) | Adaptive | The mutation probability of GA is altered based on an equation.
(Ratnaweera, 2002; Zheng, Ma, Zhang and Qian, 2003) | Adaptive | A linearly time-varying inertia weight w of PSO is adapted.
(Naka, Genji, Yura and Fukuyama, 2001) | Adaptive | A nonlinearly time-varying inertia weight w is applied.
(Ratnaweera, Halgamuge and Watson, 2004) | Adaptive | Linearly varying c1 and c2 coefficients are used.
(Lessing, Dumitrescu and Stützle, 2004) | Adaptive | A dynamic heuristic matrix is applied.
(Chusanapiputt, Nualhong, Jantarang and Phoomvuthisarn, 2006) | Adaptive | Based on pheromone dispersion, α, β, and ρ are adapted.
(Li and Li, 2007) | Adaptive | Time-varying α and β are applied.
(Martens et al., 2007) | Self-Adaptive | The ant determines the value of the parameters by decision rules.
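Several of the deterministic and adaptive schemes in Table 2 are simple closed-form schedules. As an illustration only (this is not the control scheme proposed in this paper), the linearly time-varying inertia weight of Ratnaweera (2002) and Zheng et al. (2003) can be sketched as follows; the start and end values are common illustrative choices, not taken from those works:

```python
def linear_inertia_weight(iteration, max_iterations, w_start=0.9, w_end=0.4):
    """Linearly decrease the PSO inertia weight w over the run.

    w_start and w_end are illustrative values: early iterations (large w)
    favour exploration, later ones favour local exploitation.
    """
    frac = iteration / max_iterations
    return w_start - (w_start - w_end) * frac

# At the midpoint of a 500-iteration run:
w_mid = linear_inertia_weight(250, 500)  # 0.9 - 0.5 * 0.5 = 0.65
```

A deterministic schedule like this needs no feedback from the search; the adaptive entries in Table 2 instead react to diversity, fitness, or pheromone signals.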
Table 3
The notations and terminology of the problem statement, with their summary.
Notation Description
Eiben, 2015). Since the EAs are iterative algorithms, the overall result may be inefficient if their parameter values are not chosen correctly. The most popular EAs are the GA, PSO, and ACO algorithms. The studies that improve the efficiency of these algorithms during the computation, to increase the quality of the near-optimum values, are consolidated in Table 2.

School bus route generation and scheduling problem statement and constructive heuristic

The SBRS problem was modeled by Newton and Thomas (Newton and Thomas, 1974). There are various configurations, and this NP-Hard optimization problem can be approached in various ways; refer to Section 2.

In this study, we considered two stakeholders: transport operators and students. The transport operators attempt to minimize their cost; the students, on the other hand, expect their travel time to be minimized. Even though the nature of these objective functions is supportive at the beginning of the algorithm, they conflict with each other when the algorithms are converging. The notations and terminology used are summarized in Table 3.

Problem statement - formulation of the SBRS problem

The aim is to minimize the travel time of buses and students. The data range of these two functions is normalized and mapped onto another scale bounded from 0 to 1, based on the work of Bowerman et al. (Bowerman, Hall and Calamai, 1995). Based on our assumptions mentioned above, the problem is formulated as follows:

min f = ω_b · φ(f_b) + ω_s · φ(f_s)    (1)

subject to

Σ_{j=1}^{n_i} H_{r_i(j)} ≤ η · C_i    (2)

Σ_{i=1}^{m} n_i = n_p    (3)

Σ_{i=1}^{m} Σ_{j=1}^{n_i} H_{r_i(j)} = n_s    (4)

r_i(j) ≠ r_k(j), ∀ i, k ∈ B, ∀ j ∈ P    (5)

T_{b_i} ≤ T_max, for i = 1, 2, …, m    (6)

ω_b, ω_s ∈ [0, 1], and ω_b + ω_s = 1    (7)

Objective function Eq. (1) combines these two functions with the weighted-sum approach. The effect of different weightings can be examined by the decision-makers to find the desired solution. Our bus fleet capacity is not homogenous, and some spare seats are reserved by Eq. (2). Eqs. (3) and (4) enforce picking up all students and visiting all selected roadsides. Eq. (5) prevents each pick-up point from being assigned to
Table 4
The notations and terminology of the reinforcement learning algorithm, with their summary.

Notation | Description
Q(S, A) | The Q-Table; stores the expected long-term impact of taking a specific action from a specific state.
E(S, A) | The E-Table; the eligibility trace mechanism signifies the influence of the action taken from the specific state on the gained reward.
σ(t) | The reward function.
ϖ | The learning rate; controls the influence of the target on the current Q-values.
δ | The target, the temporal-difference (TD) error.
γ | The discount rate; determines the current value of future rewards.
λ | The trace decay; determines the fallback rate of the eligibility trace.
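The notations in Table 4 suggest a tabular temporal-difference update with eligibility traces. A minimal sketch, assuming a SARSA(λ)-style rule with accumulating traces (the excerpt does not spell out the exact update), where the learning rate, discount, and trace-decay arguments play the roles of ϖ, γ, and λ:

```python
import random
from collections import defaultdict

def td_lambda_update(Q, E, state, action, reward, next_state, next_action,
                     lr=0.1, gamma=0.9, lam=0.8):
    """One SARSA(lambda)-style update of the Q-Table Q(S, A) using the
    eligibility traces E(S, A). delta is the TD error; every previously
    visited (state, action) pair is credited in proportion to its decayed
    trace. The rates lr, gamma, and lam are illustrative, not the paper's."""
    delta = reward + gamma * Q[(next_state, next_action)] - Q[(state, action)]
    E[(state, action)] += 1.0              # accumulate the trace for the current pair
    for key in list(E):
        Q[key] += lr * delta * E[key]      # credit past pairs through their traces
        E[key] *= gamma * lam              # decay all traces

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

Q, E = defaultdict(float), defaultdict(float)
td_lambda_update(Q, E, "s0", "increase", reward=1.0,
                 next_state="s1", next_action="maintain")
```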
different buses. Eq. (6) enforces the upper bound on the duration of a route. Lastly, Eq. (7) imposes that the weights of the objective functions stay within their range.

The two objective functions f_b and f_s are converted to a linear combination, as shown in Eq. (1). Still, the problem falls within the variants of the Travelling Salesman Problem (TSP), which is also NP-Hard, due to the requirements of the problem. Initially, the identification of intermediate nodes/edges and the travel time predictions within a 15-minute time interval is challenging. Furthermore, the starting node/edge is unknown, and the two-way tour distance and time between any two nodes/edges are not strictly equal. Furthermore, the proposed service is part of our framework, and SBRS is a test case. In generic cases, the service has to respond and achieve a near-optimum schedule under dynamic demand.

Bus stop selection

In this study, the LAR strategy is followed. Note that the number of bus stops is greater than the number of students. Furthermore, students cannot cross the street, and bus drivers have to drive on the left side of the road. Thus, it is crucial to determine the relative position between the pick-up point and its nearest road segment before deciding the intermediate nodes/edges. The student is assigned to a pick-up point based on the angle and the equation between two lines, considering geographical coordinates.

Initial state generation process – constructive heuristic

This constructive heuristic method allocates buses to roadsides for the first time to generate initial solutions. A school-centered system is built to scan roadsides and allocate buses in a radar mode. We developed this constructive heuristic from the sectoring idea of Thangiah and Nygard (Thangiah and Nygard, 1992) and Corberán et al. (Corberán, Fernández, Laguna and Marti, 2002). Mostly, pick-up points close to each other are to be visited by the same bus; thus, the pick-up points are sectored. Initial solutions are obtained by changing the phasing. The phasing is the starting state of the scanning in radar mode, measured clockwise from the north line.

Reinforcement learning

Reinforcement learning is an area of machine learning, and it is an agent-based approach. In a given state of the environment, the goal-directed agent takes an action based on its policy and receives a reward from the environment. Based on the received feedback, the agent changes its state in the environment. In the RL context, the aim is to maximize the expected sum of future rewards (Sutton and Barto, 2018); refer to Table 4 for notations.

Solving combinatorial optimization problems with a complex dataset can be a challenging task, particularly with adequate performance from the perspective of computation time. This paper introduces reinforcement learning-enabled combinatorial optimization algorithms to solve the integrated bus route generation and scheduling problem. RL plays the role of guiding the optimization algorithms on the fly, to aid in achieving a near-optimal result within a smaller number of iterations. RL aims to explore the search space of the problem and to learn the best-performing policy from this experience.

In this study, RL is integrated with the GA, PSO, and ACO algorithms to improve their performance. These EAs are sensitive to the choice of the values of their parameters. Each of these algorithms has two parameters that influence its performance. The following method is applied to these algorithms to integrate RL.

The continuous range of both parameters is discretized into d intervals. There are d² different combinations of these two parameters, and these combinations represent different states. Each state defines the range for the values of the probability parameters. Later, a random value (using a uniform distribution) is assigned to these parameters from the range that is set by the RL. There are five actions (increasing or decreasing either parameter, or maintaining both) to find the new state based on the current state and the reward.

The learning process is traced by an estimated state-action table, the Q-Table Q(S, A), for each state and the five possible actions from this state. The action is selected from the Q-Table based on the ϵ-greedy policy, which either chooses a random action with a given probability or chooses the action with the highest Q-value.

σ(t) = (1/2) · 1/f_{t+1} + (ω_b/2) · Ω_B + (ω_s/2) · Ω_S    (8)
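The parameter-control loop described above can be sketched as follows; the state is a pair of interval indices for the two parameters, and the interval count d, the interval bounds, and the action names are our illustrative choices:

```python
import random

def make_intervals(lo, hi, d):
    """Discretize the continuous range [lo, hi] of one parameter into d intervals."""
    step = (hi - lo) / d
    return [(lo + i * step, lo + (i + 1) * step) for i in range(d)]

# Five actions on the pair of interval indices (i, j): raise or lower either
# parameter's interval, or keep both (the "maintaining" action).
ACTIONS = {
    "p1_up": (1, 0), "p1_down": (-1, 0),
    "p2_up": (0, 1), "p2_down": (0, -1),
    "maintain": (0, 0),
}

def apply_action(state, action, d):
    """Move to the neighbouring state, clamped to the d x d state grid."""
    di, dj = ACTIONS[action]
    i, j = state
    return (min(max(i + di, 0), d - 1), min(max(j + dj, 0), d - 1))

def sample_parameters(state, intervals_p1, intervals_p2):
    """Draw both parameter values uniformly from the intervals of the state."""
    i, j = state
    return (random.uniform(*intervals_p1[i]), random.uniform(*intervals_p2[j]))

# Example for RL-enabled GA: both pc and pm range over (0.15, 0.95).
d = 4  # illustrative; the paper does not state d
pc_intervals = make_intervals(0.15, 0.95, d)
```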
Reinforcement learning-enabled genetic algorithm

GA is a class of optimization algorithms inspired by the process of natural selection and was proposed by Holland (Holland, 1975). Through biologically inspired operators such as mutation, crossover, and selection, the possible solutions are represented as genes; refer to Table 5 for notations. The terminology of GA is as follows.

One gene represents a selected roadside P_j where students are mapped. A single gene stores the number of students at this roadside (H_j), the source src_j and destination dest_j nodes for the direction of the road segment, and the ID of the assigned bus b_i. One solution is represented in the form of a chromosome.

The constructive heuristic generates a group of different chromosomes as the first population. During each generation, GA selects individual chromosomes at random from the previous population to produce the new chromosomes for the next generation. GA applies genetic operations to generate different chromosomes. A complete GA includes three basic genetic operations: selection, crossover, and mutation. Selection selects the individual chromosomes that contribute to the population at the next generation. Crossover combines two chromosomes to form new chromosomes in the next generation. Lastly, mutation applies random changes to a chromosome to generate new chromosomes.

Every particle also consists of a velocity vector of the same length as the position vector, with floating-point values representing the velocity with which the particle moves in the search space. After the intermediate vector Y is generated, the PSO algorithm proceeds to the update of the velocity vector V and the position vector λ of the particle in the continuous domain. Firstly, the distance d1 between the position vector XV and the personal best solution of the particle, ζ, is measured by Eq. (14). The distance d2 between the position vector XV and the global best solution of the swarm, G, is measured by Eq. (15).

d1 = −1 − y^{iter}_{pd}    (14)

d2 = 1 − y^{iter}_{pd}    (15)

v^{iter}_{pd} = w · v^{iter−1}_{pd} + c1 · r1 · d1 + c2 · r2 · d2    (16)

λ^{iter}_{pd} = y^{iter}_{pd} + v^{iter}_{pd}    (17)

Secondly, the velocity vector V for each particle is updated based on the distances d1 and d2 and the velocity vector of the previous iteration,
Table 6
The notations and terminology of the particle swarm optimization algorithm, with their summary.

Notation | Description
XV_p | The position vector of particle p; xv_pn is the index of the nth bus stop with students to be scheduled.
V_p | The velocity vector of particle p.
ζ_p | The position vector of particle p's personal best solution; p_pn is the index of the nth bus stop with students to be scheduled.
G | The position vector of the global best solution for the entire swarm.
Y^{iter}_p | The intermediate discrete vector of particle p for iteration iter.
λ^{iter}_p | The position vector of particle p in the continuous domain for iteration iter.
w | The inertia weight; the coefficient of the previous velocity.
c1, c2 | Acceleration coefficients.
refer to Eq. (16). At last, the position vector λ of the particle in the continuous domain is calculated from the intermediate vector Y and the velocity vector V; refer to Eq. (17).

After the position vector λ of the particle in the continuous domain is updated, the intermediate vector Y and the position vector XV in the discrete domain are updated; refer to Eqs. (18) and (19). The vector XV represents the bus assignment of the students, and these processes repeat for each iteration until PSO finds a near-optimum bus assignment. At last, the new bus assignment is directed to the school bus routing process to find the intermediate route between the bus stops. For each particle, the final travel time f is calculated by the school bus routing process; refer to Eq. (1). The personal best position of the particle, ζ, and the global best position, G, are updated based on Eqs. (20) and (21).

y^{iter}_{pd} = 1, if λ^{iter}_{pd} > α; −1, if λ^{iter}_{pd} < −α; 0, otherwise    (18)

xv^{iter}_{pd} = G^{iter−1}_d, if y^{iter}_{pd} = 1; ζ^{iter−1}_{pd}, if y^{iter}_{pd} = −1; any vehicle, otherwise    (19)

(ζ_best_p)^{iter} = xv^{iter}_p, if f(xv^{iter}_p) < f((ζ_best_p)^{iter−1}); (ζ_best_p)^{iter−1}, otherwise    (20)

The first component (w · v^{iter−1}_{pd}) retains part of the velocity from the previous iteration and hence the previous direction. The aim is to guide the particle to move along the same direction. The inertia weight w has an impact on the convergence of the algorithm. When w ≥ 1, the velocity increases over time, thereby accelerating the particles toward the maximum velocity; this leads the swarm to diverge. When w < 1, the particle decelerates. Thus, the inertia weight w makes the particle either explore the search space or exploit locally.

The second component (c1 · r1 · d1), named the cognitive component, is the distance of the particle from its current position to its own personal best position. This term introduces the effect of returning to the personal best position of the particle, and hence it is also named the "nostalgia" of the particle.

The third component (c2 · r2 · d2), named the social component, is the distance of the particle from its current position to the global best particle position. This component acts more like a group norm that individual particles try to attain.

There are two random parameters, r1 and r2, in the range [0, 1]. They help to randomize the influence of the cognitive and social components. These parameters are drawn anew for each index of the velocity vector of each particle for each iteration. The acceleration terms (c1 · r1) and (c2 · r2) determine the importance given to the personal and swarm experience.

Consequently, the performance of PSO depends on the inertia weight w and the acceleration coefficients c1 and c2. Similar to the RL-enabled GA, RL is integrated into PSO to guide the swarm to a near-optimum solution, named RL-enabled PSO.
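The per-dimension discrete PSO update of Eqs. (14)–(19) can be sketched as follows; note that reading the second case of Eq. (18) as λ < −α is our assumption, since the extracted equation shows α in both cases, and the "otherwise" branch of Eq. (19) is left abstract, as in the text:

```python
import random

def pso_update_dimension(y_prev, v_prev, w, c1, c2, alpha,
                         g_assign, p_assign, random_vehicle):
    """One discrete-PSO step for a single index d of one particle (Eqs. (14)-(19)).

    y in {-1, 0, 1} is the intermediate value, v the velocity, and the returned
    xv is the bus assignment for this stop.
    """
    d1 = -1 - y_prev                                  # Eq. (14): pull toward the personal best
    d2 = 1 - y_prev                                   # Eq. (15): pull toward the global best
    r1, r2 = random.random(), random.random()
    v = w * v_prev + c1 * r1 * d1 + c2 * r2 * d2      # Eq. (16)
    lam = y_prev + v                                  # Eq. (17): continuous position
    if lam > alpha:                                   # Eq. (18): discretize back to {-1, 0, 1}
        y = 1
    elif lam < -alpha:                                # assumed threshold; source is garbled here
        y = -1
    else:
        y = 0
    if y == 1:                                        # Eq. (19): map y to a bus assignment
        xv = g_assign                                 # copy the global best's assignment
    elif y == -1:
        xv = p_assign                                 # copy the personal best's assignment
    else:
        xv = random_vehicle                           # any feasible vehicle
    return y, v, xv
```

With c1 = c2 = 0 and w = 1 the update is deterministic, which makes the discretization easy to check by hand.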
Table 7
The notations and terminology of the ant colony optimization algorithm, with their summary.

Notation | Description
ξ_jk | The pheromone level between the jth stop and the kth stop.
D_jk | The minimum distance between locations j and k.
ψ_jk | The heuristic matrix; the reciprocal of D_jk.
C_i | The capacity of the ith bus.
Θ | The first-stop selection vector.
α | The relative influence of the pheromone concentration.
β | The relative influence of the heuristic value.
ρ | The evaporation factor.
T_h | The solution constructed by ant h.
F_b | The fitness of the global-best solution.
F_s | The fitness of the iteration-best solution.
F_w | The fitness of the worst solution.
T_b | The global-best tour.
T_s | The iteration-best tour.
n_j | The number of stops in the bus tour.
Prob^h_jk | The probability of assigning student location P_j to bus k by ant h.

Each artificial ant in the colony has a vector that stores the tour R_i for all the available m buses. The tour vector of each bus, R_i, represents a sequence of roadsides P(j) visited by the feasible ith bus. P(j) is a selected roadside where students are mapped. The length of the tour vector R_i is bounded by the capacity constraint C_i of the ith bus.

The constructive heuristic generates a group of different solutions. The pheromone matrix ξ_jk is initialized based on the best and the worst solutions generated by the constructive heuristic; refer to Eqs. (22) and (23) for the global update rule. Lastly, the first-stop selection vector Θ is initialized based on the bus assignment of the constructive heuristic.

Until the termination iteration number is reached, the following processes are applied. Firstly, the first stop of every tour vector of each ant is determined. The first-stop selection vector Θ is utilized as a discrete probability distribution to choose the first stop for each bus. After the first-stop selection, the tour is constructed for all ants. From the unvisited student locations, one student is sampled at a time. Then the probability of assigning student location P_j to any feasible bus tour is calculated; refer to Eqs. (24) and (25), where k0 refers to the first stop of the bus tour.

The idea of assigning a student to a bus is based on two factors. Firstly, the pheromone value suggests whether bus stop j can be clustered with the first bus stop k0 or not. Secondly, the heuristic value represents the nearest neighborhood between bus stop j and the bus stop already visited by the bus. This process continues until all students are assigned to a bus.

At last, the pheromone matrix ξ and the first-stop selection vector Θ are first updated for all ants using the local update rule; refer to Eqs. (26) and (27). Secondly, the pheromone matrix ξ and the first-stop selection vector Θ are updated using the global update rules; refer to Eqs. (22) and (28).

ξ^{iter+1}_{i j0} = (1 − ρ) · ξ^{iter}_{i j0} + ρ · Δξ_0, if edge i ∈ bus tour and j0 ∈ first stop of the bus tour    (26)

Θ^{iter+1}_j = (1 − ρ) · Θ^{iter}_j + ρ · Θ_0, for all first stops j ∈ T_l    (27)

Θ^{iter+1}_j = (1 − ρ) · Θ^{iter}_j + ρ · ΔΘ^{iter}_j    (28)

ΔΘ^{iter}_j = ((F_w − F_b) + (F_w − F_s)) / (F_w · n_j), if j ∈ first stop of T_b or T_s and i ∈ T_b or T_s; 0, otherwise    (29)

The local update is applied only to those bus stops and first stops present in an ant tour. Hence, the effect of evaporation influences only those pairs of bus stops and first stops belonging to the respective bus tour.

The probability of assigning a student to any feasible bus tour is calculated by Eqs. (24) and (25). These equations contain two parameters that influence the performance of ACO; they determine the relative influence of the pheromone concentration (α) and of the heuristic matrix (β). In this study, RL determines the values of these two parameters based on the reward calculated by Eq. (8).

School bus routing

In this study, the two-way tour distance between two nodes/edges is not strictly equal. Additionally, the starting roadside is unknown. Furthermore, the intermediate nodes/edges of the routes need to be calculated based on the travel time prediction within a 15-minute time interval. These characteristics make our problem a variant of the TSP and prevent us from using deterministic path query algorithms such as Dijkstra's algorithm (Dijkstra, 1959). In the literature, SA is the commonly used meta-heuristic algorithm for the TSP (Adzhar and Salleh, 2014, Fan and Machemehl, 2006, Spada, Bierlaire and Liebling, 2005).

SA approximates the global schedule for the traveling tour in a large search space. It accepts a solution with a certain probability. Furthermore, since a simple order change cannot reach the global minimum in one step, SA prevents getting trapped in a local minimum and settles in a near-optimal solution.

Thus, the SA algorithm is integrated with the GA, PSO, and ACO algorithms and their RL-enabled versions. During each iteration, SA calculates the intermediate nodes/edges and the route of the students that are allocated to the same buses.

Experimental design

We carried out an extensive performance evaluation of our proposed strategies RL-enabled GA, RL-enabled PSO, and RL-enabled ACO. Each algorithm is compared with the baseline mechanism with static values for the parameters.

We tested our algorithms using the time datasets of a real-world case of a school. Our framework retrieves the Global Positioning System (GPS) records of 1000 private buses via 3G/4G. These data records are processed by the Travel Time Prediction (TTP) service, which employs machine learning algorithms for prediction (Ren, Han, Li and Veeravalli, 2017). Our framework feeds our algorithms with the predicted travel time and retrieves the schedule for the requested vehicle fleets. Even if the demand variations might not be significant for some schools, in generic cases the demand varies significantly. Thus, the convergence speed to a near-optimum is as crucial as the efficiency of the results.

The dataset used in our experiments comprises 58,440 edges, 27,179 nodes, 1000 bus trip trajectories, and 14 school buses with 330 students. The algorithm aims to assign these 330 students while considering the intermediate nodes among these 58,440 edges and 27,179 nodes for every 15-minute interval.

We conducted a brute-force experiment to determine the static values of the parameters for the conventional versions of GA, PSO, and ACO, as suggested in the literature (Eiben and Smith, 2003). After some initial experimentation, the best results for the conventional GA were mostly between 0.65 and 0.75 for both the pc and pm parameters. The probability of these operators for the RL-enabled GA varies between 0.15 and 0.95 for both parameters. The best results for the conventional PSO were mostly between 0.3-0.4 and 1.2-1.4 for c1 and c2, respectively. The values of c1 and c2 for the RL-enabled PSO vary between 1 and 5. Lastly, the best results for the conventional ACO were mostly between 2-3 and 5-6 for α and β, respectively. The values of α and β for the RL-enabled ACO vary between 1 and 10.

At last, the computation time might be affected by the platform; thus, the number of iterations is more accurate to quantify the performance
Fig. 1. The performance comparison of conventional GA, PSO, and ACO algorithms with RL-enabled GA, RL-enabled PSO, and RL-enabled ACO to improve TTB and
TTS.
while the RL-enabled PSO algorithm achieved the same improvement level from the 28th iteration onward; refer to Table 8.

The average travel time of one bus under the existing schedule provided by the transport operators is 26.4 minutes. The achieved average travel times of one bus by the conventional GA, PSO, and ACO are 22.2, 23.9, and 20.7 minutes, and by the RL-enabled GA, RL-enabled PSO, and RL-enabled ACO are 21.1, 23.8, and 19.3 minutes, respectively. Similarly, the average travel time of one student under the existing schedule provided by the transport operators is 16.8 minutes. The achieved average travel times of one student by the conventional GA, PSO, and ACO are 11.5, 14.1, and 10.4 minutes, and by the RL-enabled GA, RL-enabled PSO, and RL-enabled ACO are 10.7, 13.9, and 11 minutes, respectively.

Conclusions

This study aimed to design an efficient methodology to achieve a near-optimum schedule for the SBRS problem. In our context, certain factors influence and magnify the complexity of our problem. To this end, we augment an agent-based RL approach to guide the combinatorial optimization algorithms. Our hypothesis is that with the guidance of RL, the algorithm may achieve a near-optimum schedule within a smaller number of iterations. The augmentation process dynamically controls the hyper-parameters by introducing additional exploitation/exploration factors along with the conventional operators and acting according to the convergence rate.

In this paper, we have validated and demonstrated the usefulness of fusing RL into the conventional EAs, which showed a significant improvement. From the literature survey, this work is the first of its kind to validate and demonstrate the above-mentioned hypothesis with a complex real-world dataset. In this study, we considered three popular algorithms, namely the GA, PSO, and ACO algorithms. To improve the efficiency of these conventional algorithms, the RL algorithm is integrated to guide them. We carried out extensive performance evaluations.

The results indicate that both the conventional and the reinforcement learning integrated algorithms improve the travel time of the buses and the students. More than 50% saving by the conventional and the reinforcement learning-enabled ant colony optimization algorithm compared to the constructive heuristic algorithm is achieved from the 92nd and 54th iterations, respectively. Similarly, the saving by the conventional and the reinforcement learning-enabled genetic algorithm is 41.34% at the 500th iteration and more than 50% improvement from the 281st iteration, respectively. Lastly, more than 10% saving by the conventional and the reinforcement learning-enabled particle swarm algorithm is achieved from the 432nd and 28th iterations, respectively.

The work reported in this paper is almost industry-ready, and we have conclusively demonstrated its effectiveness based on a real-world case study using real-life data. As such, our Intelligent Transportation System (ITS) framework can be readily deployed over the cloud. An extension of this work could consider the city's congestion conditions. Furthermore, our generic cases could be extended to multiple schools, factories, Central Business District (CBD) zones, or even city scale. We could expect a significant potential improvement by our algorithms because of the increase of the application scale in both the temporal and spatial dimensions.

knowledge provided by SOLO Pte Ltd. Also, we are grateful for the help of Dr. Zengxiang Li, IHPC A∗STAR Singapore, for his support.

References

Adzhar, N., & Salleh, S. (2014). Simulated annealing technique for routing in a rectangular mesh network. Modelling and Simulation in Engineering, 2014.
Alinezhad, H., Yaghoubi, S., Hoseini Motlagh, S. M., Allahyari, S., & Saghafi Nia, M. (2018). An improved particle swarm optimization for a class of capacitated vehicle routing problems. International Journal of Transportation Engineering, 5(4), 331–347.
Arias-Rojas, J. S., Jiménez, J. F., & Montoya-Torres, J. R. (2012). Solving of school bus routing problem by ant colony optimization. Revista EIA, (17), 193–208.
Babaee Tirkolaee, E., Goli, A., Pahlevan, M., & Malekalipour Kordestanizadeh, R. (2019). A robust bi-objective multi-trip periodic capacitated arc routing problem for urban waste collection using a multi-objective invasive weed optimization. Waste Management & Research, 37(11), 1089–1101.
Bengio, Y., Lodi, A., & Prouvost, A. (2020). Machine learning for combinatorial optimization: a methodological tour d'horizon. European Journal of Operational Research.
Böttcher, S., Doerr, B., & Neumann, F. (2010). Optimal fixed and adaptive mutation rates for the LeadingOnes problem. Paper presented at the International Conference on Parallel Problem Solving from Nature.
Bowerman, R., Hall, B., & Calamai, P. (1995). A multi-objective optimization approach to urban school bus routing: Formulation and solution method. Transportation Research Part A: Policy and Practice, 29(2), 107–123.
Chusanapiputt, S., Nualhong, D., Jantarang, S., & Phoomvuthisarn, S. (2006). Selective self-adaptive approach to ant system for solving unit commitment problem. Paper presented at the Proceedings of the 8th annual conference on Genetic and evolutionary computation.
Corberán, A., Fernández, E., Laguna, M., & Marti, R. (2002). Heuristic solutions to the problem of routing school buses with multiple objectives. Journal of the Operational Research Society, 53(4), 427–435.
Davoodi, S. M. R., & Goli, A. (2019). An integrated disaster relief model based on covering tour using hybrid Benders decomposition and variable neighborhood search: Application in the Iranian context. Computers & Industrial Engineering, 130, 370–380.
Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1), 269–271.
Dorigo, M., Maniezzo, V., & Colorni, A. (1996). Ant system: optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 26(1), 29–41.
Eiben, A. E., & Smith, J. E. (2003). Introduction to evolutionary computing. Springer.
Fan, W., & Machemehl, R. B. (2006). Using a simulated annealing algorithm to solve the transit route network design problem. Journal of Transportation Engineering, 132(2), 122–132.
Fisher, M. L., Jaikumar, R., & Van Wassenhove, L. N. (1986). A multiplier adjustment method for the generalized assignment problem. Management Science, 32(9), 1095–1103.
Goli, A., & Davoodi, S. M. R. (2018). Coordination policy for production and delivery scheduling in the closed loop supply chain. Production Engineering, 12(5), 621–631.
Goli, A., Zare, H. K., Tavakkoli-Moghaddam, R., & Sadeghieh, A. (2019). Hybrid artificial intelligence and robust optimization for a multi-objective product portfolio problem. Case study: The dairy products industry. Computers & Industrial Engineering, 137, Article 106090.
Goli, A., Zare, H. K., Tavakkoli-Moghaddam, R., & Sadegheih, A. (2020). Multiobjective fuzzy mathematical model for a financially constrained closed-loop supply chain with labor employment. Computational Intelligence, 36(1), 4–34.
Gong, M., Tang, Z., Li, H., & Zhang, J. (2019). Evolutionary multitasking with dynamic resource allocating strategy. IEEE Transactions on Evolutionary Computation, 23(5), 858–869.
Holland, J. (1975). Adaptation in artificial and natural systems (p. 232). Ann Arbor: The University of Michigan Press.
Kang, M., Kim, S.-K., Felan, J. T., Choi, H. R., & Cho, M. (2015). Development of a genetic algorithm for the school bus routing problem. International Journal of Software Engineering and Its Applications, 9(5), 107–126.
Karafotias, G., Hoogendoorn, M., & Eiben, A. (2015). Parameter control in evolutionary algorithms: Trends and challenges. IEEE Transactions on Evolutionary Computation, 19(2), 167–187.
Karafotias, G., Smit, S. K., & Eiben, A. (2012). A generic approach to parameter control. Paper
sions. presented at the European Conference on the Applications of Evolutionary Computa-
tion.
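As a minimal illustration of the idea of an RL agent guiding an evolutionary search, the sketch below couples a toy genetic algorithm for bus-route ordering with an epsilon-greedy bandit that learns which mutation rate yields the most cost improvement per generation. Everything here — the toy travel-time matrix, the candidate rates, and the population parameters — is an assumption for illustration, not the paper's actual implementation.

```python
import random

random.seed(0)

# Toy travel-time matrix over 8 bus stops; a route is a permutation of stops.
# Matrix and all parameters below are illustrative assumptions.
N = 8
dist = [[abs(i - j) + (i * j) % 4 for j in range(N)] for i in range(N)]

def route_cost(route):
    return sum(dist[route[k]][route[k + 1]] for k in range(N - 1))

def mutate(route, rate):
    r = route[:]
    for k in range(N):
        if random.random() < rate:      # swap mutation keeps r a valid permutation
            j = random.randrange(N)
            r[k], r[j] = r[j], r[k]
    return r

# Hypothetical RL controller: an epsilon-greedy bandit over mutation rates,
# rewarded by the cost improvement each chosen rate produces.
rates = [0.05, 0.2, 0.5]
q = [0.0] * len(rates)                  # value estimate per rate
counts = [0] * len(rates)

def pick_arm(eps=0.1):
    if random.random() < eps:
        return random.randrange(len(rates))
    return max(range(len(rates)), key=lambda a: q[a])

pop = [random.sample(range(N), N) for _ in range(20)]
init_cost = min(route_cost(p) for p in pop)

for it in range(200):
    a = pick_arm()
    prev = min(route_cost(p) for p in pop)
    children = [mutate(random.choice(pop), rates[a]) for _ in range(20)]
    pop = sorted(pop + children, key=route_cost)[:20]   # elitist survivor selection
    reward = prev - route_cost(pop[0])                  # improvement drives the bandit
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]                 # incremental mean update

best = pop[0]
```

In the paper's setting, the same pattern would instead adapt ACO pheromone parameters or PSO acceleration coefficients, with the travel-time objective of the school bus schedule supplying the reward signal.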
Declaration of Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors are grateful for the support of the NUS SINGA scholarship, for the datasets (i.e., vehicle GPS record) and domain
E. Koksal, A.R. Hegde, H.P. Pandiarajan et al. International Journal of Cognitive Computing in Engineering 2 (2021) 47–56